READING PDB FILES Claire Shoemake

Definitions
• Protein is used interchangeably with receptor.
• The implication is that the drug target (receptor) being considered is protein in nature.
• Ligand: the small molecule bound to the protein. This could be an endogenous molecule or a drug.
• Protein:ligand complex: the small molecule bound to its receptor. Normally the small molecule modulates receptor function (agonist/antagonist).
What is a Protein Data Bank (PDB) File?
• It is a text file format describing the three-dimensional structures of molecules held in the Protein Data Bank.
http://bip.weizmann.ac.il/oca-bin/ocamain
• Most of the information in that database pertains to proteins, and the pdb format accordingly provides for rich description and annotation of protein properties. However, proteins are often crystallized in association with other molecules or ions, such as water, nucleic acids and drug molecules, which can therefore be described in the pdb format as well.
• The pdb file used as an example in this lecture is 1UZF
http://bip.weizmann.ac.il/oca-bin/send-pdb?id=1uzf
which describes the Angiotensin Converting Enzyme (ACE) bound to the ACE-inhibiting drug captopril.
Protein Classification
PDB ID
1. Gives information regarding the content of the file.
2. Indicates that the protein is human, in this case human testicular ACE.
3. Indicates the nature of the tissue culture that is used to express, or grow, the protein described in this file.
4. Indicates the analytical technique (X-ray crystallography or NMR) that was used by the authors to resolve the protein structure. In this case the crystal being considered is testicular ACE complexed with captopril.
X-ray Crystallography
http://en.wikipedia.org/wiki/X-ray_crystallography
• X-ray crystallography is a method of determining the arrangement of atoms within a crystal, in which a beam of X-rays strikes a crystal and diffracts into many specific directions.
• From the angles and intensities of these diffracted beams, a crystallographer can produce a three-dimensional picture of the density of electrons within the crystal.
• From this electron density, the mean positions of the atoms in the crystal can be determined, as well as their chemical bonds, and various other information.
• Since many materials can form crystals (such as salts, metals, minerals, semiconductors, as well as various inorganic, organic and biological molecules), X-ray crystallography has been fundamental in the development of many scientific fields.
• In its first decades of use, this method determined the size of atoms, the lengths and types of chemical bonds, and the atomic-scale differences among various materials, especially minerals and alloys. The method also revealed the structure and functioning of many biological molecules, including vitamins, drugs, proteins and nucleic acids such as DNA.
• X-ray crystallography is also a chief method supporting the design of pharmaceuticals against diseases.
• In an X-ray diffraction measurement, a crystal is mounted on a goniometer and gradually rotated while being bombarded with X-rays, producing a diffraction pattern of regularly spaced spots known as reflections. The two-dimensional images taken at different rotations are converted into a three-dimensional model of the density of electrons within the crystal using the mathematical method of Fourier transforms, combined with chemical data known for the sample. Poor resolution (fuzziness) or even errors may result if the crystals are too small, or not uniform enough in their internal makeup.
5. The crystallographic team, who are also the authors of the peer-reviewed journal paper associated with the deposition in the Protein Data Bank.
6. Details of the journal publication submitted by the crystallographic team. It is of vital importance to obtain a copy of this publication when attempting drug design projects, since it contains further information that may not be included in the pdb file.
It is necessary to choose the best possible crystallographic structure prior to embarking on a drug design project. This is because this structure serves as the starting point and template on which all successive steps depend.
One critical factor in crystallographic data selection is its resolution. Resolution is the smallest distance at which atoms may be reliably distinguished.
The higher the resolution, that is, the smaller the distance at which atoms may be reliably distinguished, the better the crystallographic structure.
Resolutions ranging from 2 to 3.5 Å are considered acceptable starting points for drug design projects.
This particular crystal structure was resolved at 2.0 Å.
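As a minimal sketch of how the resolution could be read programmatically, the snippet below looks for the REMARK 2 record of a pdb file and compares the value against the 3.5 Å upper limit quoted above. The function names and the one-line sample are illustrative assumptions; only the 2.0 Å value comes from this lecture.

```python
import re

def parse_resolution(pdb_text):
    """Return the resolution in Angstroms from the REMARK 2 record, or None."""
    for line in pdb_text.splitlines():
        if line.startswith("REMARK   2 RESOLUTION."):
            match = re.search(r"(\d+\.\d+)\s+ANGSTROMS", line)
            if match:
                return float(match.group(1))
    return None

def acceptable_for_design(resolution, worst=3.5):
    """Apply the rule of thumb from the text: smaller values are better."""
    return resolution is not None and resolution <= worst

# Illustrative sample written for this sketch (assumed layout of REMARK 2).
sample = "REMARK   2\nREMARK   2 RESOLUTION.    2.00 ANGSTROMS.\n"
resolution = parse_resolution(sample)
print(resolution, acceptable_for_design(resolution))
```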
About 85% of the models (entries) in the Protein Data Bank were determined by X-ray crystallography. (Most of the remaining 15% were determined by solution nuclear magnetic resonance.) Analysis of X-ray diffraction patterns from protein crystals produces an electron density map, into which an atomic model of the protein is fitted. Major errors sometimes occur when fitting models into low-resolution electron density maps.
The value of free R is the best clue as to whether major errors may be present in a published model.
Obtaining diffraction-quality crystals of proteins remains very difficult, despite many recent advances. Of the new protein sequences targeted for X-ray crystallography, only about one in twenty is solved.
Free R is a statistical quantity introduced in
1992 by Axel T. Brünger to assess the quality of
a model from X-ray crystallographic data.
It is calculated in the same manner as the R value, but from a subset of the data set aside for the calculation of free R,
and not used in the refinement of the model. It is a more reliable tool for assessing the model than the R value because
it is not self-referential -- that is, as an estimation of errors, free R is free of any bias that may have been introduced
during refinement.
As a rule of thumb, free R should not exceed the R value by more than 0.05; that is, if the R value is 0.20, free R should
not significantly exceed 0.25. Free R values exceeding 0.40 raise serious doubts about the model.
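The two free R rules of thumb above can be sketched as a small check. The function name, the return labels and the example values are assumptions made for illustration; the thresholds (a gap of 0.05 and a ceiling of 0.40) are the ones quoted in the text.

```python
def assess_free_r(r_value, free_r, max_gap=0.05, doubtful=0.40):
    """Classify a model using the free R rules of thumb quoted in the text."""
    if free_r > doubtful:
        return "serious doubt"   # free R above 0.40
    if free_r - r_value > max_gap:
        return "caution"         # free R exceeds R by more than 0.05
    return "ok"

# Example values invented for this sketch.
print(assess_free_r(0.20, 0.24))
print(assess_free_r(0.20, 0.30))
print(assess_free_r(0.30, 0.45))
```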
The R Value
• The R value is used to assess progress in the refinement of a model from
X-ray crystallographic data, and can be used as one factor in evaluating
the quality of a model. R is a measure of error between the observed
intensities from the diffraction pattern and the predicted intensities that
are calculated from the model. R values of 0.20 or less are taken as
evidence that the model is reliable.
• As a rule of thumb, models with R values substantially exceeding
(resolution/10) should be treated with caution. Thus, if the resolution of
a model is 2.5 Å, that model's R value should not exceed 0.25.
Completely erroneous models (e.g. random models) give R values of 0.40
to 0.60.
• However, R values themselves must be treated with caution. Unlike free R, acceptable R values can be achieved despite serious errors in the model.
Kleywegt, GJ, AT Brünger. 1996. Checking your imagination: applications of the free R value. Structure
4:897-904.
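As a hedged illustration of the R value, the sketch below uses the conventional crystallographic definition, the sum of the absolute differences between observed and calculated structure-factor amplitudes divided by the sum of the observed amplitudes. This formula is the standard one rather than something stated on these slides, and the amplitudes are invented numbers. The second function applies the resolution/10 rule of thumb.

```python
def r_value(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), the conventional R factor."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def exceeds_resolution_rule(r, resolution):
    """True if R is above resolution/10, the rule of thumb from the text."""
    return r > resolution / 10.0

# Invented amplitudes, for illustration only.
observed = [10.0, 20.0]
calculated = [8.0, 22.0]
r = r_value(observed, calculated)
print(round(r, 3), exceeds_resolution_rule(r, 2.5))
```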
It is incumbent on the authors to submit
experimental details to the Protein Data Bank.
This allows their experimental conditions to be
re-created, and their results to be reproduced.
The related entries section of the pdb file is valuable since it points the researcher to further structural information that may be available about the protein, or receptor, of interest.
In this case, three further depositions, with pdb IDs 1O86 (ACE + lisinopril), 1O8A (the unbound form of ACE), and 1UZE (ACE + enalaprilat), are available.
It is of interest from a drug design point of view to visualise and compare these depositions in order to identify whether or not the tertiary structure of ACE is in any way ligand dependent.
The primary amino acid sequence, i.e. the linear sequence of the unfolded protein, in this case of the testicular ACE enzyme, is listed in this section of the pdb file.
At this point of the file it is also possible to deduce that the protein is a monomer. This may be seen from the fact that the third column of every line in this section contains the letter A. There is therefore only one chain, labelled A, implying the monomeric status of the protein.
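The chain check described above can be sketched by reading the SEQRES records, which carry the chain identifier in column 12 and the residue names from column 20 onwards. The function name is an assumption, and the sample records are invented for this sketch rather than copied from 1UZF.

```python
def parse_seqres(pdb_text):
    """Collect the residue names of each chain from SEQRES records."""
    chains = {}
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            chain_id = line[11]               # column 12: chain identifier
            residues = line[19:70].split()    # columns 20-70: residue names
            chains.setdefault(chain_id, []).extend(residues)
    return chains

# Invented 16-residue sequence, 13 residues per SEQRES line as in the format.
residues = ["MET", "GLY", "ALA", "SER", "LEU", "LYS", "GLU", "THR",
            "PRO", "VAL", "ASP", "GLN", "HIS", "PHE", "TRP", "TYR"]
sample = "\n".join([
    f"SEQRES {1:>3} A {16:>4}  " + " ".join(residues[:13]),
    f"SEQRES {2:>3} A {16:>4}  " + " ".join(residues[13:]),
])

chains = parse_seqres(sample)
is_single_chain = len(chains) == 1   # one chain label only, as for chain A here
print(list(chains), len(chains["A"]), is_single_chain)
```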
The term heteroatom is used in pdb files to designate all atoms that do not form part of the protein, i.e. all atoms that do not form part of the primary structure of the protein. This part of the pdb file lists all the heteroatoms (excluding water molecules) that form part of the protein (ACE):ligand (captopril) complex.
The areas highlighted in blue are searchable, and lead to windows in which the structures of the heteroatoms may be found.
In this case the presence of the Zn atom indicates that ACE is a metalloprotease; MCO is the code given by the authors for captopril. HOH indicates water.
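A minimal sketch of listing the heteroatom groups is given below, assuming the HETNAM record layout (identifier in columns 12-14, name text from column 16). The function name is hypothetical and the sample records are simplified for illustration; in particular, the name shown for MCO is just the drug name used in this lecture.

```python
def parse_hetnam(pdb_text):
    """Map each heteroatom group identifier to its name from HETNAM records."""
    names = {}
    for line in pdb_text.splitlines():
        if line.startswith("HETNAM"):
            het_id = line[11:14].strip()   # columns 12-14: group identifier
            text = line[15:70].strip()     # columns 16-70: name text
            # Continuation lines repeat the identifier; join them with a space.
            names[het_id] = (names.get(het_id, "") + " " + text).strip()
    return names

# Simplified sample records written for this sketch.
groups = [("NAG", "N-ACETYL-D-GLUCOSAMINE"), ("ZN", "ZINC ION"), ("MCO", "CAPTOPRIL")]
sample = "\n".join(f"HETNAM     {het_id:>3} {name}" for het_id, name in groups)

het_names = parse_hetnam(sample)
print(het_names)
```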
Helices and sheets constitute the secondary structure of a protein, or more clearly the nature of the folding that occurs along segments of the protein.
This section of the pdb file yields information regarding the secondary structure of the protein being described.
The areas highlighted in blue are searchable, and parts of the resulting windows are shown above. In this case, the entry shows which amino acids form helix 1 of ACE.
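Which residues form a helix can also be read directly from a HELIX record. The sketch below assumes the fixed-column layout of that record (start residue in columns 16-25, end residue in columns 28-37); the function name and the residue numbers in the sample are invented and are not the real helix 1 of ACE.

```python
def parse_helix(line):
    """Extract the start and end residues of one HELIX record."""
    start_seq = int(line[21:25])
    end_seq = int(line[33:37])
    return {
        "helix_id": line[11:14].strip(),
        "chain": line[19],
        "start": (line[15:18], start_seq),
        "end": (line[27:30], end_seq),
        "n_residues": end_seq - start_seq + 1,
    }

# Invented record built column by column; residue numbers are hypothetical.
sample = f"HELIX  {1:>3} {'1':>3} ASP A {40:>4}  GLN A {71:>4}"
helix = parse_helix(sample)
print(helix)
```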
The coordinate section of the pdb file describes the coordinates of the atoms that are part of the protein.
For example, the first ATOM line on the left describes the alpha-N atom of the first residue of peptide chain A, which is an aspartate residue.
The first three floating point numbers are its x, y and z coordinates, in units of Ångströms.
The next three columns are the occupancy, the temperature factor, and the element symbol, respectively.
The red rectangles delineate individual amino acids. The atoms making up any one amino acid have the same number in column 5 of the coordinate file.
Thus, in this case, the coordinates shown are those of the first 6 amino acids in the primary amino acid sequence, specifically aspartate, glutamine, alanine, glutamine, alanine and serine.
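The fields of an ATOM line described above can be pulled out by character position, since the pdb format is fixed-width and splitting on whitespace can fail when adjacent fields run together. This is a sketch only: the function name is an assumption and the sample line uses invented coordinates and an invented residue number.

```python
def parse_atom_line(line):
    """Split one ATOM/HETATM record into its fixed-width fields."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),       # same value for all atoms of a residue
        "x": float(line[30:38]),           # coordinates in Angstroms
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }

# Invented backbone nitrogen of an aspartate, built field by field.
sample = ("ATOM  " + f"{1:>5}" + " " + " N  " + " " + "ASP" + " " + "A"
          + f"{40:>4}" + " " + "   "
          + f"{12.345:8.3f}{-6.789:8.3f}{101.112:8.3f}"
          + f"{1.00:6.2f}{25.50:6.2f}" + " " * 10 + " N")

atom = parse_atom_line(sample)
print(atom)
```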
The temperature factor or B-factor can be thought of as a measure of how much an atom oscillates or vibrates around the position specified in the model. Atoms at side-chain termini are expected to exhibit more freedom of movement than main-chain atoms, and this movement amounts to spreading each atom over a small region of space.
Occupancy is one of several parameters included in refinement. The occupancy n_j of atom j is a measure of the fraction of molecules in the crystal in which atom j actually occupies the position specified in the model. If all molecules in the crystal are precisely identical, then occupancies for all atoms are 1.00.
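To make the two quantities concrete, the sketch below compares the mean B-factor of main-chain and side-chain atoms of one residue and lists atoms whose occupancy is below 1.00. The names and every number are invented for illustration; the only assumption about the format is that N, CA, C and O are the main-chain atom names.

```python
from statistics import mean

BACKBONE = {"N", "CA", "C", "O"}   # main-chain atom names

def mean_b_factors(atoms):
    """Return (mean main-chain B, mean side-chain B) for (name, B, occupancy) tuples."""
    main = [b for name, b, _ in atoms if name in BACKBONE]
    side = [b for name, b, _ in atoms if name not in BACKBONE]
    return mean(main), mean(side)

def partially_occupied(atoms):
    """Names of atoms present at the modelled position in only some molecules."""
    return [name for name, _, occ in atoms if occ < 1.0]

# Invented values for a single residue.
atoms = [("N", 20.0, 1.0), ("CA", 22.0, 1.0), ("C", 21.0, 1.0), ("O", 23.0, 1.0),
         ("CB", 28.0, 1.0), ("CG", 35.0, 0.5), ("OD1", 42.0, 0.5)]

print(mean_b_factors(atoms), partially_occupied(atoms))
```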
This part of the pdb file shows the last amino acid in the primary amino acid sequence of the protein. Its end is indicated by the TER entry encircled above.
The pdb file then continues with the first in the series of heteroatoms included in this entry, that is, those atoms which are not part of the protein molecule. The first is NAG, or N-acetylglucosamine. As indicated previously, two NAG molecules were crystallised in this protein:ligand complex.
The coordinates for the metal ion (Zn) and the bound ligand molecule (captopril), designated as previously indicated by the code identifier MCO, are shown above.
The CONECT records list, for each atom in the chemical component, how many and which other atoms that atom is bonded to. The list of CONECT records is concluded with an END record.
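A short sketch of reading this connectivity is shown below, assuming the CONECT layout of a serial number in columns 7-11 followed by up to four bonded serial numbers in 5-character fields. The function name and the serial numbers are invented for illustration.

```python
def parse_conect(pdb_text):
    """Map each atom serial number to the serial numbers it is bonded to."""
    bonds = {}
    for line in pdb_text.splitlines():
        if line.startswith("CONECT"):
            serial = int(line[6:11])
            partners = [int(line[i:i + 5]) for i in range(11, 31, 5)
                        if line[i:i + 5].strip()]
            bonds.setdefault(serial, []).extend(partners)
    return bonds

# Invented records: atom 4701 bonded to 4702 and 4703, followed by END.
rows = [(4701, 4702, 4703), (4702, 4701), (4703, 4701)]
sample = "\n".join("CONECT" + "".join(f"{n:>5}" for n in row) for row in rows) + "\nEND"

bonds = parse_conect(sample)
print(bonds, len(bonds[4701]))
```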
Ligand Protein Contacts (LPC)
http://bip.weizmann.ac.il/oca-bin/lpc?PDB_ID=1uzf
Ligand:protein contact information is available for most pdb entries. This is of vital importance from a drug design point of view:
• A clear idea of the amino acids which line the ligand binding pocket is obtained
• Critical binding interactions between the ligand and the receptor may be identified
• Unstable contacts may also be identified and improved upon in the context of the design project
In this case, the table above lists the amino acids of ACE which make contact with captopril. The bond length, the contact surface area, and the nature of the bond are also indicated. The table above left is a glossary which explains the terms used in the table above.
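A rough idea of which residues contact a ligand can be obtained from the coordinates alone with a simple distance cutoff. The sketch below is a simplification and not the method used to build the LPC tables, which also report contact surface areas; the 4.0 Å cutoff, the function name, the atom labels and the coordinates are all assumptions made for illustration.

```python
import math

def contacts(ligand_atoms, protein_atoms, cutoff=4.0):
    """List (ligand atom, protein atom, distance) pairs closer than the cutoff."""
    found = []
    for lig_name, lig_xyz in ligand_atoms:
        for prot_label, prot_xyz in protein_atoms:
            distance = math.dist(lig_xyz, prot_xyz)
            if distance <= cutoff:
                found.append((lig_name, prot_label, round(distance, 2)))
    return sorted(found, key=lambda pair: pair[2])

# Invented coordinates in Angstroms; labels are hypothetical.
ligand = [("L1", (0.0, 0.0, 0.0)), ("L2", (4.0, 0.0, 0.0))]
protein = [("ALA10:CB", (0.0, 3.0, 0.0)),
           ("GLY11:CA", (4.0, 0.0, 3.5)),
           ("SER12:OG", (10.0, 10.0, 10.0))]

pairs = contacts(ligand, protein)
pocket_residues = sorted({label.split(":")[0] for _, label, _ in pairs})
print(pairs, pocket_residues)
```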
Hydrogen bonds play an important role in binding ligands to the ligand binding pocket of a receptor. They are different from hydrophobic or Van der Waals interactions. The latter are more numerous and are considered to be largely responsible for ligand stabilisation within a binding pocket. Hydrogen bonds, on the other hand, are associated with selectivity. This means that a ligand and its cognate receptor recognise each other on the basis of the hydrogen bonds they are capable of forging between them.
This is very important from a drug design point of view, where selectivity is of paramount importance. The hydrogen bonds forged between protein and ligand are conveniently listed in a separate table in the LPC section of the file.
In the above table, the first section on the extreme left describes the ligand atoms which are involved in hydrogen bond contacts with the protein amino acid side chains. In the first entry, for example, oxygen atom no. 1 (in the pdb entry) forges a hydrogen bond with the hydroxyl group of tyrosine 520 of ACE. The protein atom section correspondingly describes the receptor atoms which forge hydrogen bond contacts with the ligand atoms. This hydrogen bond is 2.7 Å long and occupies a total surface area of 19.4 Å².
The classification section (Class in the table above) is discussed later on.
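Hydrogen bond candidates can be approximated geometrically by keeping only pairs of nitrogen or oxygen atoms that lie close together. The sketch below is a heuristic under assumed settings (a 3.5 Å limit, no angle check) and is not how the LPC table is computed. The coordinates are invented, with the tyrosine 520 hydroxyl oxygen placed 2.7 Å from the ligand oxygen to echo the example in the text; the VAL100 label is hypothetical.

```python
import math

POLAR = {"N", "O"}   # elements treated as possible donors or acceptors

def hbond_candidates(ligand_atoms, protein_atoms, max_dist=3.5):
    """Return (ligand atom, protein atom, distance) for close N/O pairs."""
    hits = []
    for lig_name, lig_elem, lig_xyz in ligand_atoms:
        for prot_label, prot_elem, prot_xyz in protein_atoms:
            if lig_elem in POLAR and prot_elem in POLAR:
                distance = math.dist(lig_xyz, prot_xyz)
                if distance <= max_dist:
                    hits.append((lig_name, prot_label, round(distance, 2)))
    return hits

# Invented coordinates in Angstroms.
ligand = [("O1", "O", (0.0, 0.0, 0.0)), ("C2", "C", (1.5, 0.0, 0.0))]
protein = [("TYR520:OH", "O", (2.7, 0.0, 0.0)),
           ("VAL100:CB", "C", (0.0, 3.0, 0.0))]

candidates = hbond_candidates(ligand, protein)
print(candidates)
```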
This table lists each atomic contact between the protein and the ligand. It is similar to that for the hydrogen bond interactions on the previous slide. It differs in that it does not single out hydrogen bond interactions, but includes all the bond types.
It also indicates the unstable interactions in red. Drug designers will often try to optimise these unstable contacts in order to create drug molecules that reside within a ligand binding pocket with improved stability.
These are the reference tables included in the LPC section
of a pdb file. They indicate the nature of the interactions
forged between protein and ligand (listed under the Class
Section), and in the case of the table on the right, there is
also information regarding which types of interactions will
give rise to stable or unstable contacts between the
protein and the ligand.
This data may be viewed graphically using specialised software such as VMD.