Practical validation of X-ray protein structures for modelers

By Joost van Kempen and Hans Raaijmakers.
Part one
Please read through the text. You will come across some questions. The answers can be
found in the Answers section at the end. Try to answer the questions together and don’t be afraid to
discuss them. After discussing your answer, compare it to ours. If your
answer is not exactly the same as ours, it is not necessarily wrong: it is an
opportunity for discussion. If you have doubts, ask one of the practical assistants.
You might not be able to finish the entire practical. If you can’t find an answer, look at
our answers and try to understand them. Try to stop 15 minutes before the end and go to
part 3 (if you are not there yet) to see if you understand the “take home” message.
Good luck.
Introduction
In X-ray crystallography, X-rays are diffracted by the electrons. Unlike the signals in NMR data, these
electrons are quite anonymous: at medium resolution it is usually impossible to tell whether
electrons belong to carbon, nitrogen or oxygen atoms. The electron density is an average of
all the protein molecules in the crystal. In flexible parts of the protein, this leads to a
superposition of multiple poorly defined conformations, while the data/parameter ratio only
allows refinement of a single conformation. In these parts the protein structure depends
largely on the interpretation of the crystallographer.
Nowadays protein structures are mostly deposited together with the structure factors (the
raw data, i.e. a list of 3 index numbers, the intensity and the reliability of each diffraction
spot). This allows anyone to recalculate and reinterpret the electron density. To make this
even easier, the Electron Density Server (EDS) at Uppsala University provides electron
density maps for many protein structures.
Correction of erroneous structures might require the eyes of an expert, but, before spending
weeks on modeling, it’s easy enough to check whether the protein region of interest received
the attention it deserves.
Gathering the data
PDB entry 1NDE contains the structure of the estrogen receptor in complex with
4-(2-{[4-{[3-(4-chlorophenyl)propyl]sulfanyl}-6-(1-piperazinyl)-1,3,5-triazin-2-yl]amino}ethyl)phenol.
We’ll use this structure to spot some severe errors.
[Figure: 2D drawing of the compound]
Before we start our editor, we will first collect the necessary data. We will download the PDB
model and the electron density data. This can easily be done on the EDS.
Log in to the CMBI server as described in How to run Programs from your course
account director.
cd to bioinf4/coot.
>cd bioinf4/coot
Start your web browser e.g. firefox:
>firefox
Surf to the website http://eds.bmc.uu.se.
Enter the PDB code “1NDE” and press submit.
A screen appears showing some information about the structure, such as resolution, R
value and completeness of data. Take a second to look at the data.
In the bar on the left some plots can be viewed and data can be downloaded. We will first
download the coordinates of the predicted protein model.
To do so, click on the Coordinates link.
In a new window, the data that is in the file is displayed. As some of you might already
recognize, this is a normal PDB file. The default name of the file you are saving will be
pdb1nde.ent. You can leave it as it is. This is exactly the same file as you would get from the
PDB website.
Click File, Save Page As and save the file to the directory you will work in
(choose Browse for other folders, click on your PC number, e.g. ws200062, then bioinf4
and coot, and then Save).
Before we close the file we will have a look inside it to see what information can be found in
the header of the file. You can see this information also by opening the file in a normal text
editor, for example nedit. To start nedit, type in a shell in the coot directory:
>nedit pdb1nde.ent
Look through the file until you reach the part that consists only of ATOM lines. You should come across
information about:

The organism the protein comes from.

The authors of the structure and their publication. It’s always worthwhile to read the
paper before you spend weeks of modeling based on this structure!!!

The resolution of the structure.

Links to SwissProt.

Much information about experimental conditions, used software and refinement
parameters.

Properties that are strange or should be interpreted in a special way.

And much more.
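The header can also be inspected with a short script. The following is a minimal Python sketch, not part of the original practical, that pulls the resolution out of the REMARK 2 record. The function name `read_resolution` and the sample header text are made up for illustration, and the numbers are not taken from pdb1nde.ent; real files may format the record slightly differently.

```python
import re

# Hypothetical header fragment in the usual PDB layout; the values are
# illustrative and are NOT copied from pdb1nde.ent.
SAMPLE_HEADER = """\
HEADER    EXAMPLE PROTEIN                         01-JAN-00   XXXX
REMARK   2
REMARK   2 RESOLUTION.    2.50 ANGSTROMS.
REMARK   3 REFINEMENT.
"""

# REMARK 2 carries the resolution as free text after the word RESOLUTION.
RESOLUTION_PATTERN = re.compile(
    r"^REMARK\s+2\s+RESOLUTION\.\s+([0-9.]+)\s+ANGSTROMS"
)


def read_resolution(pdb_text):
    """Return the resolution in angstrom from REMARK 2, or None if absent."""
    for line in pdb_text.splitlines():
        match = RESOLUTION_PATTERN.match(line)
        if match:
            return float(match.group(1))
    return None


print(read_resolution(SAMPLE_HEADER))
```

To try it on the downloaded file, pass the file contents instead of the sample, for example `read_resolution(open("pdb1nde.ent").read())`.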
A. Look at a line starting with ATOM. What do you think each column
means?
Close this window to return to the EDS page.
Now we will download the electron density maps.
Click on maps. In the window that appears set the Map format to CCP4. Leave the
type as it is (2mFo-DFc) before clicking Generate map. Within a few seconds a
link appears called 1nde.ccp4.gz.
Download this file by right clicking on the link and choosing Save Link As followed
by Save to save it in the same directory.
Save it to the same directory as the coordinate file and leave the filename as it is. The file
you just downloaded contains the “best” electron density calculated from the experimental
data and the model.
We will now view the data in coot.
To do so, close your web browser and go back to your shell and to the directory
you saved the files in.
First we need to extract the downloaded maps.
To do so run the command:
>gzip -d 1nde.ccp4.gz
Now we have to change the extension of the file from .ccp4 to .map.
This can be done using the command:
>mv 1nde.ccp4 1nde.map
Working with Coot
We’ll use the program Coot to look at the protein model and the electron density. Coot is an
excellent freeware tool used for crystallographic model building, model completion and
validation. Structure representation hasn’t been a development focus, but it will do.
Before starting Coot, make sure you are using the T-shell with the -l option, as described
in How to run Programs from your course account director.
Start Coot using the command:
>Coot
Click Close on the Tip screen to close it and then enlarge the Coot window
somewhat by dragging the edges.
Click File, Open Coordinates… and choose “pdb1nde.ent”.
Now, let’s open the electron density file.
You do this by choosing Open map… from the File menu. Choose the file
containing the electron density map (1nde.map).
As you will see, only a part of the electron density map is shown. This is done to increase the
handling speed.
Move the protein around for a few seconds to get familiar with the program. Find out what
clicking (on an atom) and dragging with all 3 mouse buttons does, and what the scroll wheel
does. The CTRL key changes the behavior of the mouse. Beware: middle-clicking on an atom
centers it but does not select it. Selecting is done using the Go To Atom… window as
described later on.
While rotating the protein you will see that there is a clipping plane in front of the protein.
This is the reason that amino acids disappear if they come too close to the screen. You can
adjust this by dragging the mouse: CTRL-right dragging horizontally moves the
protein perpendicular to the screen, while CTRL-right dragging vertically adjusts the thickness of the
visible slab.
Try the scroll wheel: it changes the contour level, and the sigma value is shown while you scroll.
Sigma values between 1 and 1.5 are usually best for looking at the electron density map.
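The sigma level is a contour threshold expressed in units of the map's standard deviation. As a rough illustration, the Python sketch below assumes the common convention that the threshold equals the map mean plus n times the RMS deviation. How Coot computes its level internally is not covered in this practical, and the density values and function names here are made up.

```python
from statistics import mean, pstdev


def contour_threshold(density_values, n_sigma):
    """Absolute density level that corresponds to a contour at n_sigma."""
    return mean(density_values) + n_sigma * pstdev(density_values)


def fraction_inside_contour(density_values, n_sigma):
    """Fraction of grid points whose density is at or above the contour level."""
    level = contour_threshold(density_values, n_sigma)
    inside = [value for value in density_values if value >= level]
    return len(inside) / len(density_values)


# Made-up grid values: mostly flat solvent with a few strong peaks.
toy_map = [0.0] * 16 + [1.0] * 4

for n_sigma in (1.0, 1.5, 3.0):
    print(n_sigma, fraction_inside_contour(toy_map, n_sigma))
```

Raising the sigma level shrinks the contoured volume, which is why weak or disordered density disappears first when you scroll up.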
We will now inspect some specific parts of the protein. We will start with Glutamine 267. To
find it you can proceed in 2 ways.
1. By pressing the space bar, you can walk through all residues one by one.
Pressing shift + space bar will go in the opposite direction. Holding the
space bar (with or without shift) will move you rapidly through all residues.
2. Go to the Draw menu and choose Go To Atom…. Now type the residue
number (in this case 267) in the Residue Number edit box and fill in CA (the C
alpha atom) in the Atom name box. Or pick the residue from the sequence
tree box as pointed out below (click on the + sign first). Now press Apply.
In both cases, the residue is selected and centered on the screen.
Exploring the Crystal
Go to Glutamine 267. Use the scroll wheel to lower the contour level a bit (to about 1
sigma). You will see that the side chain has very little electron density. The same is true for
Arg 329, Lys 353 and Gln 450.
B. Look at these side chains. Can you think of a reason why there is so little
electron density?
Go to Tyrosine 411. Near this residue you will see a lot of electron density where no atoms
are in the model.
C. Can you imagine what that is?
To get an idea of the packing of the protein molecules in the crystal, and the space between
them, we will now expand the view beyond one crystal cell.
Click Reset View twice.
Open the Display Manager by clicking on Display Manager. Click on the Display
button behind 1nde.map to hide the electron density.
Click on Draw and then on Cell & Symmetry…. Then click Symmetry by Molecule
and select Display as CAs followed by OK.
Set the Master Switch: Show Symmetry Atoms? to Yes and set the Radius to 40
Å. Now click OK.
You can now see a part of the crystal. To keep an overview, we’ll only look at the Cα traces.
On the computers used for this practical, 40 Å is the maximal size to show at reasonable
speed. On our 2 laptops, you can view 70 Å of the crystal (or even 100 Å if you’re patient),
to get an idea of what a crystal looks like at molecular scale.
Zoom out until you see the whole visible part of the crystal.
Now rotate the structure. There are 3 orientations that show a symmetry axis spot on. (6 if
you count looking in the opposite direction). You’ll see solvent channels that run throughout
the whole crystal. These channels provide access for many molecules, such as:

Heavy metals (to solve new protein structures)

Compounds (exchange them to study their binding modes)

Substrates (to show that enzymes can still be active in the crystal)

Reducing/oxidizing agents (to change the redox state of some electron carriers)

…
It may take seconds, hours or days to soak such molecules into a crystal.
D. In one of the 3 directions the symmetry is higher than the 2-fold in the
other two directions. How high?
Set the Master Switch: Show Symmetry Atoms? to No again in the Draw > Cell &
Symmetry… window and press Apply.
Unhide the electron density from the Display Manager (click once more on the Display
button behind the 1nde.map file).
Exploring the open space 1
In the open space between the proteins in the crystal that we just saw, a lot of solvent is
present. At some places a water molecule can make good hydrogen bonds with the protein.
Such an ordered water molecule may be visible as a small spherical density contour near
H-bond donors or acceptors on the protein. But, especially in low resolution structures like this
one, it is difficult to see whether it is water, ammonia, Na+, Cl- or another additive used for
crystallization, or just plain noise in the electron density map.
In this model, many waters are included. Let’s check how certain we can be about their
position.
Click on Measures, then Environment Distances, then select Show Residue
Environment and press OK.
The dashed lines show distances between atoms that cause “clashes”: atoms that are
closer together than the sum of their van der Waals radii. The pink lines show probable
errors, the yellow ones could be hydrogen bonds.
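The clash criterion can be written down in a few lines. The Python sketch below is a simplification under stated assumptions: the radii are approximate Bondi van der Waals values, and the rule that close contacts between two N or O atoms count as possible hydrogen bonds is an illustrative guess, not the rule Coot applies when it colours the lines. All function names are hypothetical.

```python
import math

# Approximate van der Waals radii in angstrom (Bondi values). The radii
# and cutoffs that Coot itself uses may differ.
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

# Elements treated as potential hydrogen bond donors or acceptors here.
POLAR = {"N", "O"}


def classify_contact(elem_a, xyz_a, elem_b, xyz_b):
    """Label an atom pair by comparing its distance with the vdW radii sum."""
    separation = math.dist(xyz_a, xyz_b)
    if separation >= VDW_RADIUS[elem_a] + VDW_RADIUS[elem_b]:
        return "no contact"
    if elem_a in POLAR and elem_b in POLAR:
        return "possible hydrogen bond"
    return "probable clash"


water = (0.0, 0.0, 0.0)
print(classify_contact("O", water, "N", (2.9, 0.0, 0.0)))
print(classify_contact("O", water, "C", (3.0, 0.0, 0.0)))
print(classify_contact("O", water, "C", (4.0, 0.0, 0.0)))
```

Hydrogen-bonded pairs are themselves closer than the sum of the van der Waals radii, which is why a short contact has to be judged by the chemistry of the two atoms and not by distance alone.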
Select water 1 (HOH 1) and move through the first 10 waters by pressing the
space bar (shift-space bar to go backwards).
E. Which waters make sense chemically (hydrogen bonds/clashes) and/or
according to the electron density?
Exploring the model
Go to residue 411.
F. What has happened to residues 412-415? Or even 412-418? Or to
residues 483-485?
Go to Glutamine 493.
G. Can you find an alternative location for Gln 493?
Click Calculate, Model/Fit/Refine… and then Real Space Refine Zone. Click
Gln 493 twice, and you will see that the conformation is minimized (don’t accept it
yet). Now drag the Gln 493 residue to its new home. When you release it you will see
that it fits in the space very well.
Try to find the ligand. If you can’t find it use the Go To Atom… like before and use Residue
number 101 and Atom name C18. You must clear the Chain box or just use the tree in the
left box below.
H. While looking at the chemical structure, what must be wrong? Do you
have an idea how this has happened? (See the 2d-drawing of the
compound above).
Now we will load a parameter file so Coot will understand the chemical properties of the
ligand.
Click File, Import CIF dictionary… and select 1NDE_restraints.cif (from the coot
directory) followed by OK.
Click Calculate, Model/Fit/Refine… and then Real Space Refine Zone. Now click
twice on a ligand atom.
Before accepting the refined conformation you can try another position/conformer. Drag an
atom of your compound (left mouse button for the entire molecule, CTRL+left for a single atom). When
you release the atom you can see whether the ligand returns to the same conformation or
settles in another local minimum. Choose the best conformation,
according to the chemistry and the electron density.
Look at the changes before accepting the new conformation.
I. After accepting, do you think the 2 waters in the ligand density should
still be there (scroll to a sigma of 4 or 5 to have an even better view)?
Go to Serine 311.
Click Calculate and then Model/Fit/Refine… to bring up the menu again. Then
click Rotamers… and click on the serine in the structure.
A window appears containing all (3) plausible conformations. Click on a conformation to see
it previewed in the structure. Look at the electron density and the surrounding atoms.
J. Which conformation do you prefer, and why?
Select the rotamer with the highest percentage (of occurrence in other high
resolution protein structures) and press accept.
As you can see, the side chain conformation has now changed to the selected one. We will
now show all distances to other atoms surrounding the side chain.
Click on Measures, then Environment Distances, and make sure Show Residue
Environment is selected.
K. Can you think of a reason why the crystallographer chose the original
conformation?
Part two
Exploring a high resolution Crystal
We will now look at a well refined structure with a much higher resolution: 1cz9. It contains
a domain of 'avian sarcoma virus integrase'.
Download the coordinates and the map file from the EDS like you did before.
Extract and rename the map file.
Now first open the PDB file (.ent) in a text editor or from the website again, and
find out what the resolution of the structure is.
Don’t close the file yet, we will look into it again in a moment.
Restart Coot to remove all old models, then open the coordinate file in Coot.
Browse through all amino acids using the space bar.
L. You will come across some strange looking amino acids (positions 70,
74, 81, 114, 129, 136, 165, 177). What is happening there? In which part
of the protein do you find these amino acids?
Look back in the PDB file and see how the atoms that are in one of the 2 conformations are
annotated (look especially in the occupancy column, but disregard the ANISOU lines).
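The same check can be scripted. The Python sketch below is an illustration only: it lists residues that carry alternate location labels (column 17 of an ATOM line) together with their occupancies, and it skips ANISOU lines. The sample atoms are invented, built by a hypothetical helper `make_atom_line`, and are not taken from pdb1cz9.ent.

```python
def make_atom_line(serial, name, alt_loc, res_name, chain, res_seq,
                   x, y, z, occupancy, b_factor, element):
    """Build a fixed-width ATOM line (hypothetical helper for the sample)."""
    return (
        f"ATOM  {serial:>5} {name:<4}{alt_loc:1}{res_name:>3} "
        f"{chain:1}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}"
        f"          {element:>2}"
    )


def alternate_conformers(pdb_lines):
    """Map (chain, residue number, residue name) to {label: occupancy}."""
    found = {}
    for line in pdb_lines:
        # ANISOU and header lines are ignored; only coordinates count.
        if not line.startswith(("ATOM", "HETATM")):
            continue
        alt_loc = line[16]
        if alt_loc == " ":
            continue
        key = (line[21], int(line[22:26]), line[17:20])
        found.setdefault(key, {})[alt_loc] = float(line[54:60])
    return found


# Invented serine with two side chain positions, A (60%) and B (40%).
sample_lines = [
    make_atom_line(1, " CA", " ", "SER", "A", 12, 1.0, 2.0, 3.0, 1.00, 15.0, "C"),
    make_atom_line(2, " OG", "A", "SER", "A", 12, 1.5, 2.5, 3.5, 0.60, 18.0, "O"),
    make_atom_line(3, " OG", "B", "SER", "A", 12, 0.5, 2.5, 3.5, 0.40, 20.0, "O"),
    "ANISOU    2  OG ASER A  12     placeholder, skipped by the parser",
]

print(alternate_conformers(sample_lines))
```

In a well-behaved entry the occupancies of the alternate positions of one atom add up to at most 1, as in the invented example.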
Now open the map file. Also open the display manager and hide the coordinate
file (pdb1cz9.ent).
Browse through the visible part of the electron density. Which amino acids do you recognize
from the electron density alone? You might want to scroll the contour level to 4
sigma, to get an even better view.
Unhide the coordinate file and see if you were right.
Go to His 93 and turn on the Environment Distances (from the Draw menu).
M. Do you agree with this conformation? Also regarding the hydrogen
bonds?
Exploring the open space 2
Go to citrate 399.
N. Does the citrate fit well in its electron density? Does it also fit
chemically (what is the charge of the citrate and the surrounding part
of the molecule)?
Go to sulphate 400.
O. How well does it fit in its electron density?
If you look into the PDB file you will see in REMARK 280 that the protein was crystallized in a
buffer containing 10% isopropanol (see both images below).
P. Could it be isopropanol that sits in the electron density instead of the
sulfate? Why (also look at the chemistry)?
Part 3
Please read this text and, for each of the 12 points mentioned, ask yourself whether you understand it. If
not, ask someone to explain it to you.
QUESTIONS YOU SHOULD ASK
ABOUT CRYSTALLOGRAPHIC MODELS
Copyright 2007 Gale Rhodes. Adapted by permission of the author.
INTRODUCTION
Molecular modeling programs and fine graphics computers are becoming common, making it possible
for many researchers and students to explore the wealth of structural information that comes from
x-ray crystallography. Many students, teachers, and researchers in biochemistry and molecular biology
use crystallographic models to help them understand structure-function relationships. Despite the best
educational efforts of crystallographers, many users still treat molecular structures as objects that have
been seen directly, rather than as models resulting from a demanding interpretative process. Such users
are often unaware of the strengths and weaknesses of crystallographic models.
REVEALING THE ESOTERICA OF CRYSTALLOGRAPHY
According to the American Heritage Dictionary, esoterica are mysteries of a special type: "What is
esoteric is mysterious because it is known and understood by only a small, select group, as by a circle
of initiates or the members of a profession." Following is an attempt to disseminate widely some of the
esoterica of x-ray crystallography, and thus to enlarge the circle of those who might discern more
clearly the elusive truths that lie behind each crystallographic model.
I find in conversations with non crystallographers that a significant number of them are surprised to
learn the following facts about crystallographic models (some are annotated with the gist of their
response to dawning awareness):
1. that the structure obtained is not of one molecule, but of the average of many
molecules ("Oh yes, that's a basic point of all molecular science, I guess, but
sometimes I don't stop to think about it.");
2. that the model is obtained from molecules in the solid state, rather than in solution
("Oh -- I guess that's what crystallography means...but it never sank in that those
pictures are not of the molecule in solution"); but
3. that many macromolecules are demonstrably functional in the crystalline state ("Do
you mean that enzyme molecules can sit in a crystal and still be active???"); and in
fact,
4. that crystallographers go to great lengths to demonstrate that the crystalline
substance is still functional, and that it is consistent with what is known about the
molecule in solution ("Well, that's a comfort...");
5. that macromolecular crystals contain a large amount of water, some ordered and thus
detectable, and some disordered ("Oh, so in a sense, the molecules are still in
solution...? That would help to explain how they might still be active.").
6. that in many published models, the crystallographer has been unable to locate all of
the amino acid residues ("What? They can't find some parts of the molecule at
all???");
7. that in some published models, there is unexplained electron density, to which no
known parts of the protein or associated cofactors can be assigned ("You mean like
when I reassembled my carburetor and had parts left over?");
8. that some macromolecules in the crystalline state contain distortions due to crystal
packing ("Well, I'm not surprised -- but why isn't it more common, and how can you
detect it??");
9. that, despite being in the solid state, macromolecules are still in motion ("Now wait a
minute, I thought you said they were sitting pretty in the crystal..."), and that
crystallographic study provides some suggestions about the relative mobility of
various parts of the molecule ("Hey, that might be useful! Can I view that information
in the form of different colors on a graphics model??");
10. that the resolution of the model is not constant throughout, because i) different
portions of macromolecules in crystals possess different ranges and types of motion,
and ii) some portions adopt different ordered conformations in different unit cells
("You mean that all unit cells are not identical???"); and for this and other reasons,
11. that there is some tolerance or uncertainty in the atom positions, usually expressed in
a statistical way for the molecule as a whole, and that this tolerance, in part, reflects
the quality of the model ("Whattaya mean, quality -- are some models better than
others?"); and finally,
12. that you do not have to be a crystallographer to assess, at least roughly, the quality
of a model from data in the original publication of a crystallographic structure ("Oh, I
can never make heads or tails of the experimental section in a new structure paper -- but I love the stereo pictures!").
TOWARDS BETTER-INFORMED USE OF MODELS
When we study a striking computer display of an enzyme's active site or a protein/DNA complex, we
are able to make discerning use of what we see only if we are fully aware of the strengths and
limitations of crystallographic models. The facts listed above suggest a series of questions that protein
scientists should ask of all models before using them in attempts to explain their own observations.
Crystallographers, in turn, should not assume that other researchers are aware of these points of
common crystallographic knowledge, and should make a special effort to enlarge the proportion of
users who can extract the most from the fruits of structure determination.
And by the way, what questions should we ask about macromolecular structures derived from NMR
data?
Useful resources:
Crystallography Made Crystal Clear: A Guide for Users of Macromolecular Models, 2nd
Edition, Gale Rhodes, San Diego: Academic Press, 2000, (ISBN 0-12-587072-8).
A Glossary of Terms from Crystallography, NMR, and Homology Modeling, by Gale Rhodes
http://www.usm.maine.edu/~rhodes/ModQual/index.html
Model validation, a very useful guide by an expert in the field, Gerard Kleywegt
http://xray.bmc.uu.se/gerard/embo2001/modval/index.html
The Uppsala Electron Density Server, to verify that the important bits of the crystal structure
are measured, not modeled. http://eds.bmc.uu.se/eds/
This is the end of the practical.
Answers
A
The ATOM line contains information for each atom in the structure. In the line below
you can read what information is in which column:
ATOM   1582  O   LYS A 471     110.193   4.757-116.230  1.00 64.84           O

Atom number:         1582
Atom name:           O
Residue name:        LYS
Chain identifier:    A
Residue number:      471
X coordinate:        110.193
Y coordinate:        4.757
Z coordinate:        -116.230
Occupancy:           1.00
Temperature factor:  64.84
Atom type:           O
Occupancy is the fraction of molecules in the crystal in which the atom has this position. Especially
in high resolution structures you will see that some side chains have e.g. 2
conformations. Sometimes the second conformation is not present in the model.
Sometimes loops or side chains have occupancy 0.00. This means that their position
was modeled, not measured. The (total) occupancy always lies between 0 and 1.
The temperature factor or B-factor is a measure of the spread of the atom around its
position, more or less a mean square displacement. Factors that influence this value are
temperature (usually 100 K), crystal imperfections, flexibility, experimental setup etc. In
poorly refined structures it may be an extra degree of freedom (for each atom!) that
masks mistakes made elsewhere. Beware of low resolution structures with much
variation in the B-factor.
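As a sketch of how these columns can be read programmatically, the Python example below parses the ATOM line from answer A by fixed column positions. Splitting on whitespace would fail here because the Y and Z values run together (4.757-116.230). The column ranges follow the usual PDB fixed-width layout, the function names are illustrative, and the conversion from B-factor to displacement assumes the isotropic relation B = 8π²⟨u²⟩.

```python
import math

# The ATOM line from answer A, rebuilt field by field so that every value
# sits in its fixed column range (1-based PDB columns in the comments).
SAMPLE_ATOM = (
    "ATOM  "      # 1-6   record name
    " 1582"       # 7-11  atom number
    " "           # 12    blank
    " O  "        # 13-16 atom name
    " "           # 17    alternate location label
    "LYS"         # 18-20 residue name
    " "           # 21    blank
    "A"           # 22    chain identifier
    " 471"        # 23-26 residue number
    "    "        # 27-30 insertion code and blanks
    " 110.193"    # 31-38 x coordinate
    "   4.757"    # 39-46 y coordinate
    "-116.230"    # 47-54 z coordinate
    "  1.00"      # 55-60 occupancy
    " 64.84"      # 61-66 temperature factor
    "          "  # 67-76 blank
    " O"          # 77-78 atom type (element)
)


def parse_atom_line(line):
    """Split one ATOM/HETATM line into named fields by column position."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "alt_loc": line[16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


def rms_displacement(b_factor):
    """Isotropic RMS displacement in angstrom, from B = 8 * pi^2 * <u^2>."""
    return math.sqrt(b_factor / (8 * math.pi ** 2))


atom = parse_atom_line(SAMPLE_ATOM)
print(atom)
print(round(rms_displacement(atom["b_factor"]), 2))
```

For this atom the B-factor of 64.84 corresponds to an RMS displacement of roughly 0.9 Å, which gives a feeling for how smeared out a poorly ordered atom is.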
B
The side chains are long floppy chains of atoms. There are no hydrogen bonds to the
side chain to stabilize it and no van der Waals contacts with ordered molecules, only
with disordered solvent. The side chain can adopt many different conformations. The
average electron density may be similar to that of the solvent, so it won’t show in the
contour.
C
Remember. We are looking at only one protein molecule from a protein crystal. Do you
know the answer now? Of course, the answer is simpler than it seems. This is the same
protein molecule, but in the next unit cell.
D
The symmetry of the crystal is P6122. That means there is a 6-fold rotational symmetry
(in fact a screw-axis) and two 2-fold axes perpendicular to this one. Below you see 6
possible unit cells that represent the whole crystal. The dotted hexagon highlights the
6-fold symmetry: The triangles mark identical parts of the crystal.
E
Water  Electron density  Chemically OK
1      Yes               No, too many hydrophobic clashes.
2      Yes               There is an error in a ligand we will look at later on.
3      Yes               Yes
4      Yes               There is a hydrogen bond, but the water should be
                         moved slightly away from the hydrophobic clash. Or
                         choose a different conformation for the methionine.
5      Poor              There is only one H-bond in a hydrophobic
                         environment. It’s probably not there: the carbonyl of
                         Thr 290 should make a proper H-bond to the nitrogen of
                         Met 294 to create an ideal alpha-helix.
6      Not enough        No H-bond, vdW clashes. This water models residue
                         415, which has been omitted from the model because
                         the density was so poor.
7      Yes               Again the problem with the ligand.
8      Yes               Not a nice H-bond. Let’s assume the histidine is
                         slightly mobile.
9      Yes               Yes
10     No                Yes
F
These residues are disordered. They have a different position in every unit cell.
Averaged, there is no density left. The crystallographer has not included these atoms in
the model.
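Missing stretches like these show up as jumps in the residue numbering. The Python sketch below, with the hypothetical helper `find_gaps`, reports such jumps for a list of residue numbers. The numbers in the example are invented for illustration, not read from pdb1nde.ent, and a jump can also come from the numbering scheme itself instead of disorder, so REMARK 465 or the paper should confirm it.

```python
def find_gaps(residue_numbers):
    """Return (first_missing, last_missing) ranges in a numbering series."""
    ordered = sorted(set(residue_numbers))
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > 1:
            gaps.append((previous + 1, current - 1))
    return gaps


# Invented residue numbers for one chain, e.g. collected from columns
# 23-26 of the CA atom lines.
observed = [10, 11, 12, 17, 18, 30]
print(find_gaps(observed))
```

Running this on the residue numbers of a real chain gives a quick list of regions that deserve a look in the electron density before they are used in modeling.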
G
Next to the density that the glutamine is in, there is a big density with a water (38) in
it. The glutamine would fit in there as well. Placing the glutamine in its current position
is just the interpretation of the crystallographer. It would be more correct to model
two conformations, each at half occupancy, but the data/parameter ratio would suffer,
increasing the risk of over-refinement. At medium resolution or worse (>2.2 Å) explicit
double conformations are rare.
H
In the center of the ligand a triangle is formed by 2 carbons and a sulfur. Coot draws
bonds based on distance criteria, and these atoms are far too close. The bond between
the sulfur and the carbon having 3 visible bonds does not really exist. It is inexplicable
how this error could have been made, especially since the binding mode of the
compound is the sole focus of the paper.
I
No, the space is nicely filled by the ligand in this conformation. No density is left to
explain the waters.
J
This serine can fit in the electron density in 2 ways (conformer 1 and 2). Conformer 3 is
very unlikely because there is no density on the oxygen atom at that position. This
means that there is a big chance the serine does adopt one of the 2 conformations but
it may adopt a different one in different molecules of the crystal. What you see is the
average of the whole crystal, not one structure. At this resolution we can’t even be
absolutely sure that two separate conformations exist. In such a case, the conformation
that makes most sense, chemically speaking, is chosen.
K
In the conformation we introduced, the Serine fits the electron density slightly worse.
The hydrogen bond it may form with the oxygen from His 308 is shared with 2 other
hydrogen bond donors: its own backbone NH and the one of the next residue. Two
donors is already plenty for a carbonyl.
L
These amino acids have double conformations. As you can imagine, this happens more
often on the outside of the protein where they point into the disordered solvent
surrounding the protein than in the restrained packing inside the protein.
M
Yes, this is a well-positioned histidine. If threonine 91 did not have a hydrogen
bond with the imidazole ring, the ring could have been in the mirror conformation. But the
hydrogen bonds block that conformation. In this case, a well ordered histidine
measured at high resolution, the nitrogens even show a higher electron density than
the carbons. That’s a rare sight in protein structures.
N
Yes, it fits very well in the electron density. And chemically the negative citrate fits well
between the positively charged arginines (did you see the arginine from the
neighbouring molecule in the crystal?) and at the positive end of an alpha helix.
O
It does not fit so nicely. Only the sulfur and 2 oxygens sit in the electron density even
though the molecule is rigid. Sulphates are often partly disordered, but usually there
are positively charged residues nearby.
P
Yes, isopropanol seems to fit much better. The 3 carbons and the oxygen would fit quite
well in the electron density and the oxygen could also make the hydrogen bond to Val
90. Because of the hydrophobic environment the hydrophobic isopropanol would be
much more comfortable than the charged sulfate at this position. In such a case it is
wise to read the accompanying paper: the authors refined 6 crystal structures
simultaneously. The best one was crystallised in 2 M ammonium sulphate. So maybe
the sulphate was just inherited from that crystal, as the authors probably used that
model to solve this crystal structure. Maybe it really is a sulphate after all. If it’s
important to your model, it’s worth recalculating the structure.