What is in a PDB file?

Shuchismita Dutta

Spring 2010

Overview

• Exploring the PDB format file

• File formats and dictionaries

• Finding PDB format files and other files

• Validation

Exploring the PDB file

Metadata

Coordinates

Title section

OBSLTE 18-JUL-84 1HHB 2HHB 3HHB 4HHB

SPLIT 1JGP 1JGQ 1JGO

CAVEAT 1B86 THERE ARE CHIRALITY ERRORS IN C-ALPHA CENTERS

REVDAT 4 24-FEB-09 4HHB 1 VERSN

REVDAT 3 01-APR-03 4HHB 1 JRNL

REVDAT 2 15-OCT-89 4HHB 3 MTRIX

REVDAT 1 17-JUL-84 4HHB 0

SPRSDE 17-JUL-84 4HHB 1HHB

Remarks: the numbers mean something

REMARK 0

REMARK 0 THIS ENTRY (2Q41) REFLECTS AN ALTERNATIVE MODELING OF THE

REMARK 0 ORIGINAL STRUCTURAL DATA (R1XJ5SF) DETERMINED BY AUTHORS

REMARK 0 OF THE PDB ENTRY 1XJ5: G.E.WESENBERG,D.W.SMITH,

REMARK 0 G.N.PHILLIPS JR.,E.BITTO,C.A.BINGMAN,S.T.M.ALLARD,

REMARK 0 CENTER FOR EUKARYOTIC STRUCTURAL GENOMICS (CESG).

Data collection details:

X-ray source, detector, data collection details (200)

Fiber diffraction (205)

NMR (210, 215, 217)

Neutron diffraction (230)

Electron crystallography (240)

Electron Microscopy (245)

Crystallographic details:

Vm, Matthews coefficient

Crystallographic symmetry

REMARK 3

Each refinement program has its own REMARK 3 template and details

Remarks: the numbers mean something

Biological assembly information

Example of a virus (1AYN)

Remarks

Compound details

Missing residues, atoms

Geometry: close contacts; bond length, angle and torsion deviations; stereochemistry

Ligand details

Related entries

Sequence details
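
Because each remark number is reserved for one topic, REMARK records can be grouped by number before they are read. The following is a minimal sketch, assuming whitespace-separated records as printed on these slides (real files use fixed columns). The function name `group_remarks` and the topic labels in `REMARK_TOPICS` are illustrative, not part of the format.

```python
from collections import defaultdict

# Topics for the remark numbers named on the slides above (labels are paraphrased).
REMARK_TOPICS = {
    0: "alternative modeling of existing data",
    3: "refinement details",
    200: "X-ray data collection",
    205: "fiber diffraction",
    210: "NMR",
    215: "NMR",
    217: "NMR",
    230: "neutron diffraction",
    240: "electron crystallography",
    245: "electron microscopy",
}

def group_remarks(lines):
    """Collect the text of REMARK records under their remark number."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split(None, 2)
        if len(fields) < 2 or fields[0] != "REMARK" or not fields[1].isdigit():
            continue
        text = fields[2].strip() if len(fields) == 3 else ""
        if text:  # skip the empty separator line that opens each remark
            groups[int(fields[1])].append(text)
    return dict(groups)

sample = [
    "REMARK 0",
    "REMARK 0 THIS ENTRY (2Q41) REFLECTS AN ALTERNATIVE MODELING OF THE",
    "REMARK 0 ORIGINAL STRUCTURAL DATA (R1XJ5SF) DETERMINED BY AUTHORS",
    "SPRSDE 17-JUL-84 4HHB 1HHB",
]
remarks = group_remarks(sample)
for number, texts in remarks.items():
    print(number, REMARK_TOPICS.get(number, "see the format guide"), len(texts))
```

Remark numbers that are not in the small table fall back to a pointer to the format guide, which defines the template for every remark.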

Chemistry sections:

Primary Structure & Ligand

DBREF 1BH0 A 1 29 UNP P01275 GLUC_HUMAN 53 81

SEQADV 1BH0 LYS A 17 UNP P01275 ARG 69 ENGINEERED

SEQADV 1BH0 LYS A 18 UNP P01275 ARG 70 ENGINEERED

SEQADV 1BH0 GLU A 21 UNP P01275 ASP 73 ENGINEERED

SEQRES 1 A 29 HIS SER GLN GLY THR PHE THR SER ASP TYR SER LYS TYR

SEQRES 2 A 29 LEU ASP SER LYS LYS ALA GLN GLU PHE VAL GLN TRP LEU

SEQRES 3 A 29 MET ASN THR

MODRES 2F4K NLE A 65 LEU NORLEUCINE

MODRES 2F4K NLE A 70 LEU NORLEUCINE

HET PO4 D 147 1

HET PO4 B 147 1

HET HEM A 142 43

HET HEM B 148 43

HET HEM C 142 43

HET HEM D 148 43

HETNAM PO4 PHOSPHATE ION

HETNAM HEM PROTOPORPHYRIN IX CONTAINING FE

HETSYN HEM HEME

FORMUL 5 PO4 2(O4 P 3-)

FORMUL 7 HEM 4(C34 H32 FE N4 O4)

FORMUL 11 HOH *221(H2 O)
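
The SEQRES records above list the full polymer sequence, 13 residues per line. As a rough sketch, they can be joined into a one-letter sequence per chain. This assumes whitespace-separated fields and a non-blank chain ID, as on the slide; the helper name `seqres_to_sequences` is hypothetical, and non-standard residues are simply written as `X`.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def seqres_to_sequences(lines):
    """Return {chain ID: one-letter sequence} built from SEQRES records."""
    residues = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SEQRES":
            continue
        # fields: SEQRES, line number, chain ID, residue count, residue names...
        residues.setdefault(fields[2], []).extend(fields[4:])
    return {
        chain: "".join(THREE_TO_ONE.get(name, "X") for name in names)
        for chain, names in residues.items()
    }

seqres = [
    "SEQRES 1 A 29 HIS SER GLN GLY THR PHE THR SER ASP TYR SER LYS TYR",
    "SEQRES 2 A 29 LEU ASP SER LYS LYS ALA GLN GLU PHE VAL GLN TRP LEU",
    "SEQRES 3 A 29 MET ASN THR",
]
sequences = seqres_to_sequences(seqres)
print(sequences["A"])
```

Positions 17, 18 and 21 of the result are K, K and E, matching the three engineered changes listed in the SEQADV records.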

Secondary Structure & Connectivity

HELIX 1 AA SER A 3 GLY A 18 1 16

HELIX 2 AB HIS A 20 SER A 35 1 16

HELIX 3 AC PHE A 36 TYR A 42 1 7

SHEET 1 A 4 ILE A 18 LEU A 23 0

SHEET 2 A 4 LEU A 111 VAL A 118 -1 O GLY A 115 N TRP A 19

SSBOND 1 CYS A 6 CYS A 127 1555 1555 2.02

SSBOND 2 CYS A 30 CYS A 115 1555 1555 2.02

SSBOND 3 CYS A 64 CYS A 80 1555 1555 2.03

SSBOND 4 CYS A 76 CYS A 94 1555 1555 2.01

LINK NE2 HIS A 87 FE HEM A 143 1555 1555 1.94

LINK NE2 HIS B 92 FE HEM B 147 1555 1555 2.07

LINK FE HEM B 147 O1 OXY B 150 1555 1555 1.87

LINK FE HEM A 143 O1 OXY A 150 1555 1555 1.66

CISPEP 1 PRO A 98 PRO A 99 0 0.53

CISPEP 2 GLY A 109 PRO A 110 0 -0.01

Miscellaneous

SITE 1 ACT 3 HIS H 57 ASP H 102 SER H 195

SITE 1 AC1 12 HIS H 57 ASN H 98 LEU H 99 ILE H 174

SITE 2 AC1 12 ASP H 189 ALA H 190 SER H 195 TRP H 215

SITE 3 AC1 12 GLY H 216 GLY H 219 HOH H 264 HOH H 270

REMARK 800 SITE

REMARK 800 SITE_IDENTIFIER: ACT

REMARK 800 EVIDENCE_CODE: AUTHOR

REMARK 800 SITE_DESCRIPTION: CATALYTIC SITE

REMARK 800 SITE_IDENTIFIER: AC1

REMARK 800 EVIDENCE_CODE: SOFTWARE

REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE MID H 1

Crystallographic info, coordinate transformations & coordinates

CRYST1 88.814 95.207 89.164 90.00 104.96 90.00 P 1 21 1 8

ORIGX1 1.000000 0.000000 0.000000 0.00000

ORIGX2 0.000000 1.000000 0.000000 0.00000

ORIGX3 0.000000 0.000000 1.000000 0.00000

SCALE1 0.011259 0.000000 0.003009 0.00000

SCALE2 0.000000 0.010503 0.000000 0.00000

SCALE3 0.000000 0.000000 0.011609 0.00000

MODEL 1

ATOM 1 N SER A 41 -9.122 -10.304 89.511 0.12 51.94 N

ATOM 2 CA SER A 41 -8.282 -11.187 88.650 0.12 52.75 C

ATOM 3 C SER A 41 -7.051 -11.693 89.414 0.12 52.51 C

ATOM 4 O SER A 41 -6.646 -11.108 90.421 0.12 53.15 O

ATOM 5 CB SER A 41 -7.845 -10.416 87.393 0.12 51.93 C

ATOM 6 OG SER A 41 -7.250 -11.264 86.423 0.12 52.59 O

ATOM 7 N THR A 42 -6.473 -12.792 88.935 0.12 51.75 N

ATOM 8 CA THR A 42 -5.290 -13.380 89.552 0.12 50.38 C

...

ENDMDL
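
The SCALEn records are not independent data: they follow from the unit cell in CRYST1. The sketch below recomputes the matrix that converts orthogonal coordinates (in angstroms) to fractional coordinates and compares it with the SCALE values shown above. It assumes the standard orthogonalization, with the a axis along x and the b axis in the xy plane, which is consistent with the identity ORIGX matrix in this entry. The function name `scale_matrix` is illustrative.

```python
import math

def scale_matrix(a, b, c, alpha, beta, gamma):
    """Matrix taking orthogonal angstrom coordinates to fractional coordinates.

    Assumes a lies along x and b lies in the xy plane. Angles are in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # v is the cell volume divided by a*b*c.
    v = math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    return [
        [1 / a, -cg / (a * sg), (ca * cg - cb) / (a * v * sg)],
        [0.0, 1 / (b * sg), (cb * cg - ca) / (b * v * sg)],
        [0.0, 0.0, sg / (c * v)],
    ]

# Cell lengths and angles from the CRYST1 record above.
cell = (88.814, 95.207, 89.164, 90.00, 104.96, 90.00)
computed = scale_matrix(*cell)

# Rotation part of SCALE1..SCALE3 as printed in the file.
from_file = [
    [0.011259, 0.000000, 0.003009],
    [0.000000, 0.010503, 0.000000],
    [0.000000, 0.000000, 0.011609],
]
max_diff = max(
    abs(computed[i][j] - from_file[i][j]) for i in range(3) for j in range(3)
)
print(f"largest difference from the SCALE records: {max_diff:.1e}")
```

A check of this kind is one way to confirm that the cell and the transformation records in a file agree with each other.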

Coordinate section: A Closer look

Alternate conformer ID

ATOM 49 N GLY A 8 2.326 4.110 1.416 1.00 42.03 N

ATOM 50 CA GLY A 8 3.121 3.079 2.065 1.00 42.27 C

ATOM 51 C GLY A 8 3.533 3.408 3.476 1.00 42.32 C

ATOM 52 O GLY A 8 4.302 2.642 4.092 1.00 44.09 O

ATOM 53 N GLY A 9 3.080 4.526 4.038 1.00 40.18 N

ATOM 54 CA GLY A 9 3.330 4.880 5.396 1.00 40.11 C

ATOM 55 C GLY A 9 4.552 5.685 5.709 1.00 39.75 C

ATOM 56 O GLY A 9 4.720 6.098 6.885 1.00 40.96 O

ATOM 57 N ASER A 10 5.404 6.014 4.753 0.33 39.21 N

ATOM 58 CA ASER A 10 6.598 6.814 5.042 0.33 38.11 C

ATOM 59 C ASER A 10 6.236 8.234 5.479 0.33 36.87 C

ATOM 60 O ASER A 10 5.150 8.733 5.233 0.33 32.77 O

ATOM 61 CB ASER A 10 7.516 6.864 3.822 0.33 39.46 C

ATOM 62 OG ASER A 10 8.894 6.884 4.237 0.33 40.79 O

ATOM 63 N BGLY A 10 5.404 6.014 4.753 0.67 39.21 N

ATOM 64 CA BGLY A 10 6.598 6.814 5.042 0.67 38.11 C

ATOM 65 C BGLY A 10 6.236 8.234 5.479 0.67 36.87 C

ATOM 66 O BGLY A 10 5.150 8.733 5.233 0.67 32.77 O

Microheterogeneity (1ENM)
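
In the real file the alternate conformer ID sits in its own column (column 17), directly before the residue name, which is why it appears fused as ASER and BGLY above. The sketch below reads ATOM records by column position and checks that the occupancies of the alternates add up to 1. Because the slide text has lost its column spacing, the two lines are rebuilt with a hypothetical helper, `atom_line`; the column positions follow the PDB format guide.

```python
from collections import defaultdict

def atom_line(serial, name, alt, res, chain, seq, icode, x, y, z, occ, b, elem):
    """Rebuild a fixed-column ATOM record (helper written for this example)."""
    return (
        "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   "
        "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2}"
    ).format("ATOM", serial, name, alt, res, chain, seq, icode, x, y, z, occ, b, elem)

def parse_atom(line):
    """Read the fields of an ATOM record by column position."""
    return {
        "name": line[12:16].strip(),
        "alt": line[16].strip(),     # alternate conformer ID, column 17
        "res": line[17:20],          # residue name, columns 18-20
        "chain": line[21],
        "seq": int(line[22:26]),
        "icode": line[26].strip(),   # insertion code, column 27
        "occ": float(line[54:60]),
    }

# The backbone N atoms of residue 10 from the records above.
lines = [
    atom_line(57, " N", "A", "SER", "A", 10, " ", 5.404, 6.014, 4.753, 0.33, 39.21, "N"),
    atom_line(63, " N", "B", "GLY", "A", 10, " ", 5.404, 6.014, 4.753, 0.67, 39.21, "N"),
]
atoms = [parse_atom(line) for line in lines]

occupancy = defaultdict(float)
residue_names = defaultdict(set)
for atom in atoms:
    key = (atom["chain"], atom["seq"], atom["icode"], atom["name"])
    occupancy[key] += atom["occ"]
    residue_names[key].add(atom["res"])

key = ("A", 10, "", "N")
print(occupancy[key], sorted(residue_names[key]))
```

Two different residue names at one position is what marks this as microheterogeneity rather than two conformations of a single residue.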

Insertion codes

ATOM 1 N GLU L 1C 63.677 26.331 17.947 1.00 31.77 N

ATOM 2 CA GLU L 1C 64.338 26.818 16.736 1.00 35.78 C

ATOM 3 C GLU L 1C 63.351 27.360 15.717 1.00 41.73 C

ATOM 4 O GLU L 1C 63.320 28.565 15.489 1.00 49.37 O

ATOM 5 CB GLU L 1C 65.320 25.825 16.101 1.00 38.64 C

ATOM 6 N ALA L 1B 62.537 26.499 15.096 1.00 36.03 N

ATOM 7 CA ALA L 1B 61.571 26.988 14.116 1.00 33.01 C

ATOM 8 C ALA L 1B 60.631 28.018 14.729 1.00 32.42 C

ATOM 9 O ALA L 1B 60.238 27.865 15.872 1.00 31.68 O

ATOM 10 CB ALA L 1B 60.810 25.845 13.511 1.00 33.36 C

ATOM 11 N ASP L 1A 60.262 29.089 14.012 1.00 33.13 N

ATOM 12 CA ASP L 1A 59.378 30.016 14.691 1.00 35.05 C

ATOM 13 C ASP L 1A 57.965 29.526 14.760 1.00 31.74 C

ATOM 14 O ASP L 1A 57.476 28.873 13.851 1.00 36.72 O

ATOM 15 CB ASP L 1A 59.593 31.557 14.587 1.00 41.32 C

ATOM 16 CG ASP L 1A 58.724 32.268 13.564 1.00 46.17 C

ATOM 17 OD1 ASP L 1A 57.452 32.455 13.924 1.00 47.60 O

ATOM 18 OD2 ASP L 1A 59.188 32.658 12.472 1.00 48.99 O

ATOM 19 N CYS L 1 57.321 29.802 15.860 1.00 22.52 N

ATOM 20 CA CYS L 1 56.005 29.353 16.036 1.00 15.35 C

ATOM 21 C CYS L 1 55.351 30.160 17.077 1.00 15.83 C

ATOM 22 O CYS L 1 56.002 30.636 17.968 1.00 18.73 O

Residue numbering (1DWD)
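
In the records above, four consecutive residues of chain L share residue number 1 and differ only in the insertion code. A residue is therefore identified by chain, number and insertion code together, and file order should be kept rather than re-sorted. The following is a small sketch; `split_resid` is a hypothetical helper that works on the combined tokens as printed on the slide.

```python
import re

# Residue number followed by an optional one-letter insertion code.
RESID = re.compile(r"(-?\d+)([A-Za-z]?)")

def split_resid(token):
    """Split a token such as '1C' into (residue number, insertion code)."""
    match = RESID.fullmatch(token)
    if match is None:
        raise ValueError(f"not a residue identifier: {token!r}")
    return int(match.group(1)), match.group(2)

# Residue identifiers of chain L in the order they appear in the file.
file_order = ["1C", "1B", "1A", "1"]
keys = [split_resid(token) for token in file_order]

numbers_only = {number for number, _ in keys}
print(keys)
print("distinct residue numbers:", len(numbers_only))
print("file order equals sorted order:", keys == sorted(keys))
```

Here the chain runs 1C, 1B, 1A, 1, so sorting by insertion code would reverse the actual order of the residues.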

Connectivity & Bookkeeping

CONECT 73 80

CONECT 80 73 81

CONECT 81 80 82 84

CONECT 82 81 83 88

MASTER 2487 0 28 47 52 0 0 673322 31 280 104

END
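
Each CONECT record names an atom serial number followed by the serial numbers of the atoms bonded to it. A minimal sketch of turning the records above into a bond table follows. It assumes whitespace-separated fields as printed here; in real files the serial numbers occupy fixed 5-character columns and can run together, so column slicing is safer. The function name `build_bonds` is illustrative.

```python
from collections import defaultdict

def build_bonds(lines):
    """Return {atom serial: set of bonded atom serials} from CONECT records."""
    bonds = defaultdict(set)
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "CONECT":
            continue
        atom = int(fields[1])
        for partner in map(int, fields[2:]):
            # Store both directions so the table is symmetric even when
            # only one of the two reciprocal records is present.
            bonds[atom].add(partner)
            bonds[partner].add(atom)
    return dict(bonds)

conect = [
    "CONECT 73 80",
    "CONECT 80 73 81",
    "CONECT 81 80 82 84",
    "CONECT 82 81 83 88",
]
bonds = build_bonds(conect)
print(sorted(bonds[81]))
```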

The PDB format guide

• Located at

– http://www.wwpdb.org/documentation/format32/v3.2.html

• Defines all the records that appear in the PDB file

• Includes templates for all records and remarks

www.wwpdb.org

Keeping track of all the information

• The PDB format file is a report generated from a database

• The database is built on the PDB exchange and chemical component dictionaries

• The PDB exchange dictionary captures every piece of data in the PDB format file and is used to build the mmCIF format file

• Validation uses the dictionaries to

– Check inter-relationships between different data components

– Match information to the chemical component dictionary

-snip-

PDB Format vs mmCIF Format

PDB format

• 80 characters wide

• Includes header and coordinates (x, y, z, occupancy and B-factors) for all atoms

• Includes name, source and sequence of all polymers

• Can include a maximum of 62 chains and 99999 atoms

mmCIF format

• Free format

• Includes header and coordinates (x, y, z, occupancy and B-factors) for all atoms

• Includes name, source and sequence of all polymers

• No restriction on the number of chains or atoms in the file
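
The two limits quoted for the PDB format make a simple pre-check possible before writing a large structure. The sketch below assumes that the limit of 62 chains corresponds to single-character chain IDs drawn from A-Z, a-z and 0-9, and that 99999 is the largest serial number that fits the 5-character atom serial field. The function name `fits_pdb_format` is hypothetical.

```python
import string

# Assumed pool of single-character chain IDs: 26 + 26 + 10 = 62.
CHAIN_IDS = string.ascii_uppercase + string.ascii_lowercase + string.digits
MAX_ATOMS = 99999  # largest value that fits a 5-character serial number

def fits_pdb_format(n_chains, n_atoms):
    """True if a structure of this size stays within the PDB format limits."""
    return n_chains <= len(CHAIN_IDS) and n_atoms <= MAX_ATOMS

print(len(CHAIN_IDS))
print(fits_pdb_format(62, 99999))
print(fits_pdb_format(63, 1000))
```

Structures that fail the check need the mmCIF format, which has no such restriction.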

Dictionaries

• PDB Exchange (PDBx) dictionary

– (http://mmcif.pdb.org/)

– Includes the syntax, definitions, relationships and boundary values

– Includes examples for the contents of the mmCIF format file

• Chemical Component Dictionary

– Describes all residues in the PDB files (standard and non-standard amino acids, nucleotides and other ligands: ions, drugs, cofactors, inhibitors)

– Each component has a 1-3 character alphanumeric identifier

– Includes model and idealized coordinates for components, connectivities, name, formula and SMILES strings

– Maintained by the wwPDB

– Used for data processing and validation of structures
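
The identifier rule above (1-3 alphanumeric characters) can be checked mechanically before a residue name is looked up in the dictionary. This is a sketch only: it tests the shape of an identifier, not whether the component actually exists, and the function name `is_valid_component_id` is hypothetical. The sample identifiers are taken from the HET and MODRES records shown earlier.

```python
import re

# One to three letters or digits, as stated for component identifiers.
COMPONENT_ID = re.compile(r"[A-Za-z0-9]{1,3}")

def is_valid_component_id(identifier):
    """True if the identifier has the 1-3 alphanumeric character shape."""
    return COMPONENT_ID.fullmatch(identifier) is not None

for candidate in ["HEM", "PO4", "NLE", "HOH", "HEME"]:
    print(candidate, is_valid_component_id(candidate))
```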

Ligand cif file

Ligand Expo - Search Options

Ligand Expo – Substructure Search

Ligand Expo - Browse Options

-snip-

[Figure: a PDB format file and the corresponding mmCIF format file. The PDB Exchange Dictionary includes syntax and definitions for mmCIF format files; an instance of valine is matched to VAL in the Chemical Component Dictionary.]

Downloading files

• Coordinate

  – PDB

    • 80 characters wide

    • Created for X-ray structures

    • Updated for NMR, EM and other methods

  – mmCIF

    • More flexible format

    • Based on the mmCIF (PDBx) dictionary

  – PDBML

    • XML translation of mmCIF format files

  – Biological Unit

• Experimental data

  – Structure factor (SF) files

    • Distributed in mmCIF format

  – NMR constraints files

    • Validated by BMRB

Archive download

The ftp archive

RCSB PDB website

The Structure Summary page

Asymmetric and Biological Unit

Structure Analysis (RCSB tables)

Validation

• Quality assessment

• Is the structure well determined overall?

• Is the structure suitable for your analysis and/or modeling requirements?

• Are local regions that you are interested in well determined?

When to Validate?

[Flowchart of the deposition and annotation pipeline. Labels: Depositor, Refinement, Data Deposition, Primary Annotation, Validation, PDB Entry, Archival Data, Core Database, Distribution Site, Download Data, Use of PDB data.]

Step 1: PDB ID

Step 2: Validation Report

Step 3: Corrections

Step 4: Depositor Approval

Step 5: Functional Annotation

Validation

What is validated?

• Chemistry

– Of polymers (match to sequence database and internal consistency)

– Of ligands, ions, inhibitors (match to dictionary)

• Geometry

– Close contacts

– Bond length, angle and torsion deviations

– Ramachandran plot

• Experimental data

– Structure factor (SF) check

– R factors
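
As an illustration of the geometry checks listed above, the sketch below measures three backbone bond lengths of SER A 41 from the ATOM records shown earlier and compares them with reference values. The reference lengths are rounded, approximate values assumed for this example; validation software uses its own, more detailed targets.

```python
import math

# Backbone atoms of SER A 41, copied from the ATOM records shown earlier.
atoms = {
    "N": (-9.122, -10.304, 89.511),
    "CA": (-8.282, -11.187, 88.650),
    "C": (-7.051, -11.693, 89.414),
    "O": (-6.646, -11.108, 90.421),
}

# Approximate reference bond lengths in angstroms (assumed, rounded values).
REFERENCE = {("N", "CA"): 1.46, ("CA", "C"): 1.52, ("C", "O"): 1.23}

def bond_length(first, second):
    """Distance in angstroms between two named atoms."""
    return math.dist(atoms[first], atoms[second])

deviations = {
    pair: bond_length(*pair) - reference for pair, reference in REFERENCE.items()
}
for (first, second), deviation in deviations.items():
    print(f"{first}-{second}: {bond_length(first, second):.3f} A, deviation {deviation:+.3f} A")
```

All three deviations are a few hundredths of an angstrom at most; a validation report flags bonds whose deviations are large compared with the expected spread.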

How to validate?

• MolProbity

• EDS (Electron Density Server)

• PROCHECK

• WHAT_CHECK/WHAT IF

• Validation server at RCSB PDB

Electron Density Server report

Real-space R-value
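
For reference, the conventional crystallographic R factor and the real-space R-value are commonly defined as below. This is the usual textbook form, not necessarily the exact expression used by the server: the first sum runs over reflections, the second over map grid points around a residue, and scaling between observed and calculated values is omitted.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}| - |F_{\mathrm{calc}}| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}|}
\qquad
\mathrm{RSR} = \frac{\sum \left| \rho_{\mathrm{obs}} - \rho_{\mathrm{calc}} \right|}
                    {\sum \left| \rho_{\mathrm{obs}} + \rho_{\mathrm{calc}} \right|}
```

The R factor gives one number for the whole structure, while the real-space R-value is reported per residue, which is why it helps with judging local regions.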

Validation at RCSB PDB

Summary

• Exploring the PDB format file

– Documentation available from the wwPDB website

• File formats and dictionaries

– Documentation and links available from the wwPDB website

• Finding PDB format files and other files

– Links available from wwPDB and RCSB PDB websites

• Validation

– Links available from RCSB PDB website

Funding

NSF, NIGMS, DOE, NLM, NCI,

NINDS, NIDDK

Wellcome Trust, EU,

CCP4, BBSRC, MRC, EMBL

BIRD-JST, MEXT NLM
