What is in a PDB file?
Shuchismita Dutta
May 7, 2009

Overview
- Exploring the PDB format file
- File formats and dictionaries
- Validation
- Finding PDB format files and other files

Exploring PDB file
- Meta data
- Coordinates

Title section

OBSLTE     18-JUL-84 1HHB      2HHB 3HHB 4HHB
SPLIT      1JGP 1JGQ 1JGO
CAVEAT     1B86    THERE ARE CHIRALITY ERRORS IN C-ALPHA CENTERS
REVDAT   4   24-FEB-09 4HHB    1       VERSN
REVDAT   3   01-APR-03 4HHB    1       JRNL
REVDAT   2   15-OCT-89 4HHB    3       MTRIX
REVDAT   1   17-JUL-84 4HHB    0
SPRSDE     17-JUL-84 4HHB      1HHB

Remarks: the numbers mean something

REMARK   0 THIS ENTRY (2Q41) REFLECTS AN ALTERNATIVE MODELING OF THE
REMARK   0 ORIGINAL STRUCTURAL DATA (R1XJ5SF) DETERMINED BY AUTHORS OF
REMARK   0 THE PDB ENTRY 1XJ5:
REMARK   0 G.E.WESENBERG,D.W.SMITH,
REMARK   0 G.N.PHILLIPS JR.,E.BITTO,C.A.BINGMAN,S.T.M.ALLARD,
REMARK   0 CENTER FOR EUKARYOTIC STRUCTURAL GENOMICS (CESG).

- Data collection details:
  – X-ray source, detector, data collection details (200)
  – Fiber diffraction (205)
  – NMR (210, 215, 217)
  – Neutron diffraction (230)
  – Electron crystallography (240)
  – Electron microscopy (245)
- Crystallographic details:
  – Vm, Matthews coefficient
  – Crystallographic symmetry
- Remark 3: data from each refinement software has its own template and details

Remarks: the numbers mean something
- Biological assembly information
- Example of a virus (1AYN)

Remarks
- Compound details
- Missing residues, atoms
- Geometry: close contacts; bond length, angle and torsion deviations; stereochemistry
- Ligand details
- Related entries
- Sequence details

Chemistry sections: Primary Structure & Ligand

DBREF  1BH0 A    1    29  UNP    P01275   GLUC_HUMAN      53     81
SEQADV 1BH0 LYS A   17  UNP  P01275    ARG    69 ENGINEERED
SEQADV 1BH0 LYS A   18  UNP  P01275    ARG    70 ENGINEERED
SEQADV 1BH0 GLU A   21  UNP  P01275    ASP    73 ENGINEERED
SEQRES   1 A   29  HIS SER GLN GLY THR PHE THR SER ASP TYR SER LYS TYR
SEQRES   2 A   29  LEU ASP SER LYS LYS ALA GLN GLU PHE VAL GLN TRP LEU
SEQRES   3 A   29  MET ASN THR

MODRES 2F4K NLE A   65  LEU  NORLEUCINE
MODRES 2F4K NLE A   70  LEU  NORLEUCINE

HET    PO4  D 147       1
HET    PO4  B 147       1
HET    HEM  A 142      43
HET    HEM  B 148      43
HET    HEM  C 142      43
HET    HEM  D 148      43
HETNAM     PO4 PHOSPHATE ION
HETNAM     HEM PROTOPORPHYRIN IX CONTAINING FE
HETSYN     HEM HEME
FORMUL   5  PO4    2(O4 P 3-)
FORMUL   7  HEM    4(C34 H32 FE N4 O4)
FORMUL  11  HOH   *221(H2 O)

Secondary Structure & Connectivity

HELIX    1  AA SER A    3  GLY A   18  1                                  16
HELIX    2  AB HIS A   20  SER A   35  1                                  16
HELIX    3  AC PHE A   36  TYR A   42  1                                   7

SHEET    1   A 4 ILE A  18  LEU A  23  0
SHEET    2   A 4 LEU A 111  VAL A 118 -1  O  GLY A 115   N  TRP A  19

SSBOND   1 CYS A    6    CYS A  127                          1555   1555  2.02
SSBOND   2 CYS A   30    CYS A  115                          1555   1555  2.02
SSBOND   3 CYS A   64    CYS A   80                          1555   1555  2.03
SSBOND   4 CYS A   76    CYS A   94                          1555   1555  2.01

LINK         NE2 HIS A  87                FE   HEM A 143     1555   1555  1.94
LINK         NE2 HIS B  92                FE   HEM B 147     1555   1555  2.07
LINK        FE   HEM B 147                 O1  OXY B 150     1555   1555  1.87
LINK        FE   HEM A 143                 O1  OXY A 150     1555   1555  1.66

CISPEP   1 PRO A   98    PRO A   99          0         0.53
CISPEP   2 GLY A  109    PRO A  110          0        -0.01

Miscellaneous

SITE     1 ACT  3 HIS H  57  ASP H 102  SER H 195
SITE     1 AC1 12 HIS H  57  ASN H  98  LEU H  99  ILE H 174
SITE     2 AC1 12 ASP H 189  ALA H 190  SER H 195  TRP H 215
SITE     3 AC1 12 GLY H 216  GLY H 219  HOH H 264  HOH H 270

REMARK 800 SITE
REMARK 800 SITE_IDENTIFIER: ACT
REMARK 800 EVIDENCE_CODE: AUTHOR
REMARK 800 SITE_DESCRIPTION: CATALYTIC SITE
REMARK 800 SITE_IDENTIFIER: AC1
REMARK 800 EVIDENCE_CODE: SOFTWARE
REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE MID H 1

Crystallographic info, Coordinate Transformations & coordinates

CRYST1   88.814   95.207   89.164  90.00 104.96  90.00 P 1 21 1      8
ORIGX1      1.000000  0.000000  0.000000        0.00000
ORIGX2      0.000000  1.000000  0.000000        0.00000
ORIGX3      0.000000  0.000000  1.000000        0.00000
SCALE1      0.011259  0.000000  0.003009        0.00000
SCALE2      0.000000  0.010503  0.000000        0.00000
SCALE3      0.000000  0.000000  0.011609        0.00000
MODEL        1
ATOM      1  N   SER A  41      -9.122 -10.304  89.511  0.12 51.94           N
ATOM      2  CA  SER A  41      -8.282 -11.187  88.650  0.12 52.75           C
ATOM      3  C   SER A  41      -7.051 -11.693  89.414  0.12 52.51           C
ATOM      4  O   SER A  41      -6.646 -11.108  90.421  0.12 53.15           O
ATOM      5  CB  SER A  41      -7.845 -10.416  87.393  0.12 51.93           C
ATOM      6  OG  SER A  41      -7.250 -11.264  86.423  0.12 52.59           O
ATOM      7  N   THR A  42      -6.473 -12.792  88.935  0.12 51.75           N
ATOM      8  CA  THR A  42      -5.290 -13.380  89.552  0.12 50.38           C
...
ENDMDL

Coordinate section: A closer look

Columns, left to right: record name, serial number (S.#), atom name, alternate conformer ID, residue name, chain ID, residue #, x coordinate, y coordinate, z coordinate, occupancy, B-factor, atom type.

Microheterogeneity (1ENM)

ATOM     49  N   GLY A   8       2.326   4.110   1.416  1.00 42.03           N
ATOM     50  CA  GLY A   8       3.121   3.079   2.065  1.00 42.27           C
ATOM     51  C   GLY A   8       3.533   3.408   3.476  1.00 42.32           C
ATOM     52  O   GLY A   8       4.302   2.642   4.092  1.00 44.09           O
ATOM     53  N   GLY A   9       3.080   4.526   4.038  1.00 40.18           N
ATOM     54  CA  GLY A   9       3.330   4.880   5.396  1.00 40.11           C
ATOM     55  C   GLY A   9       4.552   5.685   5.709  1.00 39.75           C
ATOM     56  O   GLY A   9       4.720   6.098   6.885  1.00 40.96           O
ATOM     57  N  ASER A  10       5.404   6.014   4.753  0.33 39.21           N
ATOM     58  CA ASER A  10       6.598   6.814   5.042  0.33 38.11           C
ATOM     59  C  ASER A  10       6.236   8.234   5.479  0.33 36.87           C
ATOM     60  O  ASER A  10       5.150   8.733   5.233  0.33 32.77           O
ATOM     61  CB ASER A  10       7.516   6.864   3.822  0.33 39.46           C
ATOM     62  OG ASER A  10       8.894   6.884   4.237  0.33 40.79           O
ATOM     63  N  BGLY A  10       5.404   6.014   4.753  0.67 39.21           N
ATOM     64  CA BGLY A  10       6.598   6.814   5.042  0.67 38.11           C
ATOM     65  C  BGLY A  10       6.236   8.234   5.479  0.67 36.87           C
ATOM     66  O  BGLY A  10       5.150   8.733   5.233  0.67 32.77           O

Residue numbering (1DWD): insertion codes

ATOM      1  N   GLU L   1C     63.677  26.331  17.947  1.00 31.77           N
ATOM      2  CA  GLU L   1C     64.338  26.818  16.736  1.00 35.78           C
ATOM      3  C   GLU L   1C     63.351  27.360  15.717  1.00 41.73           C
ATOM      4  O   GLU L   1C     63.320  28.565  15.489  1.00 49.37           O
ATOM      5  CB  GLU L   1C     65.320  25.825  16.101  1.00 38.64           C
ATOM      6  N   ALA L   1B     62.537  26.499  15.096  1.00 36.03           N
ATOM      7  CA  ALA L   1B     61.571  26.988  14.116  1.00 33.01           C
ATOM      8  C   ALA L   1B     60.631  28.018  14.729  1.00 32.42           C
ATOM      9  O   ALA L   1B     60.238  27.865  15.872  1.00 31.68           O
ATOM     10  CB  ALA L   1B     60.810  25.845  13.511  1.00 33.36           C
ATOM     11  N   ASP L   1A     60.262  29.089  14.012  1.00 33.13           N
ATOM     12  CA  ASP L   1A     59.378  30.016  14.691  1.00 35.05           C
ATOM     13  C   ASP L   1A     57.965  29.526  14.760  1.00 31.74           C
ATOM     14  O   ASP L   1A     57.476  28.873  13.851  1.00 36.72           O
ATOM     15  CB  ASP L   1A     59.593  31.557  14.587  1.00 41.32           C
ATOM     16  CG  ASP L   1A     58.724  32.268  13.564  1.00 46.17           C
ATOM     17  OD1 ASP L   1A     57.452  32.455  13.924  1.00 47.60           O
ATOM     18  OD2 ASP L   1A     59.188  32.658  12.472  1.00 48.99           O
ATOM     19  N   CYS L   1      57.321  29.802  15.860  1.00 22.52           N
ATOM     20  CA  CYS L   1      56.005  29.353  16.036  1.00 15.35           C
ATOM     21  C   CYS L   1      55.351  30.160  17.077  1.00 15.83           C
ATOM     22  O   CYS L   1      56.002  30.636  17.968  1.00 18.73           O

Connectivity & Book keeping

CONECT   73   80
CONECT   80   73   81
CONECT   81   80   82   84
CONECT   82   81   83   88
MASTER     2487    0   28   47   52    0    0    673322   31  280  104
END

MASTER fields labeled on the slide: remarks (2487), non-standard residues (28), helix (47), sheet (52), coordinates (73322), SEQRES (104). The run "673322" is two adjacent five-column fields, 6 and 73322.

The PDB format guide
- Located at: http://www.wwpdb.org/documentation/format32/v3.2.html
- Defines all the records that appear in the PDB file
- Includes templates for all records and remarks
- Website: www.wwpdb.org

Keeping track of all the information
- The PDB format file is a report from a database
- The database is built on the PDB exchange and chemical component dictionaries
- The PDB exchange dictionary captures every piece of data from the PDB format file to build the mmCIF format file
- Validation uses dictionaries to
  – Check inter-relationships between different data components
  – Match information to the chemical component dictionary

[Figure: side-by-side excerpts of an mmCIF format file and a PDB format file (snipped)]

PDB Format vs mmCIF Format

PDB format:
- 80 characters wide
- Includes header and coordinates (x, y, z, occupancy and B-factors) for all atoms
- Includes name, source and sequence of all polymers
- Can include a maximum of 62 chains and 99999 atoms

mmCIF format:
- Free format
- Includes header and coordinates (x, y, z, occupancy and B-factors) for all atoms
- Includes name, source and sequence of all polymers
- No restriction on the number of chains or atoms in the file

Dictionaries

PDB Exchange (pdbx) dictionary (http://mmcif.pdb.org/)
- Includes the syntax, definitions, relations, boundaries
- Includes examples for the contents of the mmCIF format file

Chemical Component Dictionary
- Describes all residues in the PDB files (standard and non-standard amino acids, nucleotides, and other ligands: ions, drugs, cofactors, inhibitors)
- 1-3 alphanumeric character identifier
- Includes model & idealized coordinates for components, connectivities, name, formula, SMILES strings
- Maintained by the wwPDB
- Used for data processing and validation of structures

Ligand cif file

Ligand Expo - Search Options
- Also use for component building

Ligand Expo - Substructure Search

Ligand Expo - Browse Options

PDB Exchange Dictionary includes syntax & definitions for mmCIF format files

[Figure: side-by-side excerpts of a PDB format file and an mmCIF format file (snipped)]

Instance of valine matched to VAL in Chemical Component Dictionary

Validation

Quality assessment
- Is the structure well determined overall?
- Is the structure suitable for your analysis and/or modeling requirements?
- Are local regions that you are interested in well determined?

When to Validate?

[Flow diagram of the deposition and annotation pipeline. Labels, in extraction order: Step 0: Validation; Refinement; Step 2: Validation Report; Step 1: PDB ID; Archival Data; Depositor; Data Deposition; Primary Annotation; Validation; PDB Entry; Core Database; Distribution Site; Step 3: Corrections; Step 4: Depositor Approval; Download Data; Step 5: Functional Annotation; Validation; Use of PDB data]

What is validated?
- Chemistry
  – Of polymer (match to DB and internal consistency)
  – Of ligands, ions, inhibitors (match to dictionary)
- Geometry
  – Close contacts
  – Bond length, angle, torsion etc. deviations
  – Ramachandran plot
- Experimental data
  – SF check
  – R factors

How to validate?
- MolProbity
- EDS server
- PROCHECK
- WHAT_CHECK/WHAT IF
- Validation server at RCSB PDB

Electron Density Server report
- Real-space R-value

Validation at RCSB PDB
- Close contacts
- Bond length deviations
- Torsion angles
- Planarity
- Missing residues
- Link records

SFCheck report

Downloading files
- Coordinate
  – PDB: 80 characters wide; created for X-ray structures; updated for NMR, EM and other methods
  – mmCIF: more flexible format; based on the mmCIF (PDBx) dictionary
  – PDBML: XML translation of mmCIF format files
  – Biological Unit
- Experimental data
  – SF files: distributed in mmCIF format
  – Constraints file: validated by BMRB

Archive download
- The ftp archive

RCSB PDB website

The Structure Summary page

Asymmetric and Biological Unit

Structure Analysis (RCSB tables)

Summary
- Exploring the PDB format file
  – Documentation available from the wwPDB website
- File formats and dictionaries
  – Documentation and links available from the wwPDB website
- Validation
  – Links available from RCSB PDB website
- Finding PDB format files and other files
  – Links available from wwPDB and RCSB PDB websites

Funding
NSF, NIGMS, DOE, NLM, NCI, NINDS, NIDDK
Wellcome Trust, EU, CCP4, BBSRC, MRC, EMBL
BIRD-JST, MEXT
NLM
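
Returning to the coordinate section covered earlier: because the PDB format is fixed-width (80 columns), an ATOM record is best read by slicing columns rather than splitting on whitespace. The sketch below assumes the column positions given in the wwPDB format guide (v3.2) for ATOM/HETATM records. The `AtomRecord` class and `parse_atom_line` function are hypothetical names introduced only for illustration; they are not part of any library. The two sample records are the 1ENM alternate-conformer line and the 1DWD insertion-code line from the slides.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    serial: int       # S.#
    name: str         # atom name
    alt_loc: str      # alternate conformer ID ("" if none)
    res_name: str     # residue name
    chain_id: str     # chain ID
    res_seq: int      # residue #
    i_code: str       # insertion code ("" if none)
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str      # atom type


def parse_atom_line(line: str) -> AtomRecord:
    """Slice one ATOM/HETATM record by fixed columns (illustrative helper)."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    line = line.ljust(80)  # pad short lines so every slice is in range
    return AtomRecord(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        alt_loc=line[16].strip(),      # column 17
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21].strip(),     # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        i_code=line[26].strip(),       # column 27
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
        occupancy=float(line[54:60]),  # columns 55-60
        b_factor=float(line[60:66]),   # columns 61-66
        element=line[76:78].strip(),   # columns 77-78
    )


# Alternate conformer A of residue 10 in 1ENM (occupancy 0.33).
ser_a = parse_atom_line(
    "ATOM     57  N  ASER A  10       5.404   6.014   4.753  0.33 39.21           N"
)
# Residue 1C of chain L in 1DWD: residue number 1 with insertion code C.
glu = parse_atom_line(
    "ATOM      1  N   GLU L   1C     63.677  26.331  17.947  1.00 31.77           N"
)
print(ser_a.alt_loc, ser_a.res_name, ser_a.occupancy)
print(glu.res_seq, glu.i_code, glu.x)
```

Column slicing matters because neighbouring fields may touch with no space between them. The MASTER record in the slides is an example: the fields 6 and 73322 appear as the single run 673322, which whitespace splitting would misread as one number.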