Practical validation of X-ray protein structures for modelers
By Joost van Kempen and Hans Raaijmakers

Part one

Please read through the text. You will come across some questions; the answers can be found on the last pages. Try to answer the questions together, and don’t be afraid to discuss them. After discussing your answer, compare it to ours (in the Answers section at the end). If your answer is not exactly the same as ours, it is not necessarily wrong; it is an opportunity for discussion. If you have doubts, ask one of the practical assistants.

You might not be able to finish the entire practical. If you can’t find an answer, look at our answer and try to understand it. Try to stop 15 minutes before the end and go to Part 3 (if you are not there yet) to see whether you understand the “take home” message. Good luck.

Introduction

In X-ray crystallography, X-rays are diffracted by the electrons. Unlike the nuclei observed in NMR, these electrons are quite anonymous: at medium resolution it is usually impossible to tell whether electrons belong to carbon, nitrogen or oxygen atoms. The electron density is an average over all the protein molecules in the crystal. In flexible parts of the protein, this leads to a superposition of multiple, poorly defined conformations, while the data/parameter ratio only allows refinement of a single conformation. In these parts, the protein structure depends largely on the interpretation of the crystallographer.

Nowadays, protein structures are mostly deposited together with the structure factors (the raw data: for each diffraction spot, its three indices, its intensity and the reliability of that intensity). This allows anyone to recalculate and reinterpret the electron density. To make this even easier, the Electron Density Server (EDS) at Uppsala University provides electron density maps for many protein structures.
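The statement that anyone can recalculate the electron density from the deposited data can be made concrete with a toy example. This is our own one-dimensional illustration, not how the EDS or any crystallographic package computes maps. Two hypothetical Gaussian “atoms” in a cell of length 1 give structure factors F(h), and a Fourier synthesis over those F(h) returns peaks at the atom positions. The example also shows why phases matter. The experiment measures only the amplitudes |F| (through the intensities). A synthesis that throws the phases away always puts its highest peak at the origin, wherever the atoms are. That is why a map is calculated from the experimental data together with a model that supplies the phases. The function names and all numbers are assumptions chosen for illustration.

```python
import cmath
import math

# Toy 1D crystal: a unit cell of length 1 (fractional coordinates).
# The atoms, their "electron counts" and the Gaussian width are hypothetical values.


def structure_factors(atoms, width, hmax):
    """Return {h: F(h)} for h = -hmax..hmax.

    atoms: list of (fractional position x, electron count).
    Each atom is a Gaussian of standard deviation `width`; its Fourier
    transform (the "form factor") is exp(-2 pi^2 width^2 h^2).
    """
    factors = {}
    for h in range(-hmax, hmax + 1):
        form = math.exp(-2 * math.pi ** 2 * width ** 2 * h ** 2)
        factors[h] = sum(n * form * cmath.exp(2j * math.pi * h * x) for x, n in atoms)
    return factors


def fourier_synthesis(factors, npoints=200):
    """rho(x) = sum over h of F(h) * exp(-2 pi i h x), sampled on a regular grid."""
    grid = [i / npoints for i in range(npoints)]
    rho = [
        sum(f * cmath.exp(-2j * math.pi * h * x) for h, f in factors.items()).real
        for x in grid
    ]
    return grid, rho


atoms = [(0.25, 8), (0.60, 6)]  # an "oxygen" at x = 0.25 and a "carbon" at x = 0.60
F = structure_factors(atoms, width=0.02, hmax=20)

# With amplitudes and phases, the highest peak sits on the heavier atom.
grid, rho = fourier_synthesis(F)
peak = grid[rho.index(max(rho))]

# Keep only the amplitudes |F(h)|, as if the phases were unknown: every term is
# then a cosine with a positive coefficient, so the maximum is always at x = 0.
_, rho_no_phases = fourier_synthesis({h: abs(f) for h, f in F.items()})
peak_no_phases = grid[rho_no_phases.index(max(rho_no_phases))]

print(f"peak with phases: x = {peak:.3f}; peak without phases: x = {peak_no_phases:.3f}")
# prints: peak with phases: x = 0.250; peak without phases: x = 0.000
```

Lowering `hmax` mimics a lower resolution: fewer terms enter the synthesis, and the peaks become broader and harder to tell apart.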
Correcting erroneous structures might require the eyes of an expert. Before spending weeks on modeling, however, it is easy enough to check whether the protein region of interest received the attention it deserves.

Gathering the data

PDB entry 1NDE contains the structure of the estrogen receptor in complex with 4-(2-{[4-{[3-(4-chlorophenyl)propyl]sulfanyl}-6-(1-piperazinyl)-1,3,5-triazin-2-yl]amino}ethyl)phenol. We’ll use this entry to spot some severe errors.

[Figure: 2D chemical structure of the ligand]

Before we start our model editor, we will first collect the necessary data: the PDB model and the electron density. Both can easily be downloaded from the EDS.

Log in to the CMBI server as described in “How to run programs from your course account directory”, and cd to bioinf4/coot:
>cd bioinf4/coot
Start your web browser, e.g. Firefox:
>firefox
Surf to the website http://eds.bmc.uu.se, enter the PDB code “1NDE” and press Submit. A screen appears with some information about the structure, such as the resolution, the R value and the completeness of the data. Take a moment to look at these numbers. In the bar on the left, you can view some plots and download data.

We will first download the coordinates of the protein model. To do so, click on the Coordinates link. A new window displays the contents of the file. As some of you might already recognize, this is a normal PDB file, exactly the same file you would get from the PDB website. Click File, Save Page As and save the file in the directory you will work in (choose Browse for other folders, click on your PC number, e.g. ws200062, then bioinf4 and coot, and then save). The default file name is pdb1nde.ent; you can leave it as it is.

Before we close the file, we will have a look inside it to see what information its header contains. You can also see this information by opening the file in a normal text editor, for example nedit.
To start nedit, type in a shell in the coot directory:
>nedit pdb1nde.ent
Look through the header, up to the point where only ATOM lines follow. You should come across information about:
- the organism the protein comes from;
- the authors of the structure and their publication (it is always worthwhile to read the paper before you spend weeks of modeling based on this structure!);
- the resolution of the structure;
- links to SwissProt;
- experimental conditions, the software used and refinement parameters;
- properties that are unusual or should be interpreted in a special way;
- and much more.

A. Look at a line starting with ATOM. What do you think each column means?

Close this window to return to the EDS page. Now we will download the electron density map. Click on Maps. In the window that appears, set the map format to CCP4. Leave the type as it is (2mFo-DFc) and click Generate map. Within a few seconds, a link called 1nde.ccp4.gz appears. Download this file by right-clicking on the link and choosing Save Link As, followed by Save. Save it in the same directory as the coordinate file and leave the file name as it is. This file contains the “best” electron density, calculated from the experimental data and the model.

We will now view the data in Coot. To do so, close your web browser and go back to your shell, in the directory where you saved the files. First, decompress the downloaded map:
>gzip -d 1nde.ccp4.gz
Then change the extension of the map file from .ccp4 to .map:
>mv 1nde.ccp4 1nde.map

Working with Coot

We’ll use the program Coot to look at the protein model and the electron density. Coot is an excellent freeware tool for crystallographic model building, model completion and validation. Structure representation has not been a development focus, but it will do.
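The renamed 1nde.map is a binary CCP4 map. Coot sets contour levels in sigma, i.e. multiples of the map's RMS deviation from its mean density. The sketch below shows where the relevant numbers sit in the file. It follows our reading of the CCP4 map format, so treat the layout as an assumption:

- a 1024-byte header of 4-byte words;
- the grid size and data mode in the first four words;
- the cell at bytes 40-63;
- the space group number at byte 88 and the length of the symmetry records at byte 92;
- the label "MAP " at byte 208, a machine stamp at byte 212 and the map RMS at byte 216;
- after the header, the symmetry records and then the density values.

Only mode 2 (32-bit floats) is handled, and the function names are our own.

```python
import math
import struct


def read_ccp4_header(data: bytes) -> dict:
    """Read a few fields from the 1024-byte header of a CCP4 map (sketch)."""
    if len(data) < 1024 or data[208:212] != b"MAP ":
        raise ValueError("not a CCP4 map")
    # Machine stamp: 0x44 0x41 for little-endian files, 0x11 0x11 for big-endian.
    endian = ">" if data[212] == 0x11 else "<"
    nc, nr, ns, mode = struct.unpack(endian + "4i", data[0:16])
    cell = struct.unpack(endian + "6f", data[40:64])  # a, b, c (Å), alpha, beta, gamma (°)
    space_group, nsymbt = struct.unpack(endian + "2i", data[88:96])
    (rms,) = struct.unpack(endian + "f", data[216:220])  # RMS deviation from the mean
    return {"grid": (nc, nr, ns), "mode": mode, "cell": cell,
            "space_group": space_group, "nsymbt": nsymbt, "rms": rms, "endian": endian}


def read_density(data: bytes, header: dict) -> tuple:
    """Return the density values; this sketch only handles mode 2 (32-bit floats)."""
    if header["mode"] != 2:
        raise ValueError("only mode 2 maps are handled in this sketch")
    start = 1024 + header["nsymbt"]  # skip the header and the symmetry records
    nc, nr, ns = header["grid"]
    n = nc * nr * ns
    return struct.unpack(f"{header['endian']}{n}f", data[start:start + 4 * n])


def sigma_level(values, n_sigma: float) -> float:
    """Absolute density at a contour level of n_sigma: mean + n_sigma * RMS deviation."""
    mean = sum(values) / len(values)
    rms = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean + n_sigma * rms
```

For the file in this practical you could read the bytes with `open("1nde.map", "rb").read()`, pass them to `read_ccp4_header`, and then call `sigma_level(read_density(data, header), 1.0)`. That should give the absolute density of a 1 sigma contour for the part of the map stored in the file. We have not checked this against the values Coot displays.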
Before starting Coot, make sure you are using the T-shell with the -l option, as described in “How to run programs from your course account directory”. Start Coot using the command:
>coot
Click Close to dismiss the Tip screen. Then enlarge the Coot window somewhat by dragging its edges. Click File, Open Coordinates… and choose “pdb1nde.ent”. Next, open the electron density file by choosing Open Map… from the File menu and selecting the map file (1nde.map). As you will see, only part of the electron density map is shown; this keeps the program responsive.

Move the protein around for a few seconds to get familiar with the program. Find out what clicking (on an atom) and dragging with each of the 3 mouse buttons does, and what the scroll wheel does. The CTRL key changes the behavior of the mouse. Beware: middle-clicking on an atom centers it, but does not select it. Selecting is done using the Go To Atom… window, as described later on.

While rotating the protein, you will see that there is a clipping plane in front of it. This is why amino acids disappear when they come too close to the screen. You can adjust the clipping by dragging with the mouse. CTRL + right-dragging horizontally moves the protein perpendicular to the screen. Dragging vertically adjusts the thickness of the visible slab.

See what the scroll wheel does. The sigma value is shown while you turn the scroll wheel. Contour levels between 1 and 1.5 sigma are usually best for looking at an electron density map.

We will now inspect some specific parts of the protein, starting with glutamine 267. You can find it in two ways:
1. By pressing the space bar, you can walk through all residues one by one. Pressing Shift + space bar goes in the opposite direction. Holding the space bar (with or without Shift) moves you rapidly through all residues.
2. Go to the Draw menu and choose Go To Atom….
Now type the residue number (in this case 267) in the Residue Number box and fill in CA (the C-alpha atom) in the Atom Name box. Alternatively, pick the residue from the sequence tree box (click on the + sign first). Then press Apply.

In both cases, the chosen residue is selected and centered on the screen.

Exploring the crystal

Go to glutamine 267. Use the scroll wheel to lower the contour level of the electron density a bit (to about 1 sigma). You will see that the side chain has very little electron density. The same is true for Arg 329, Lys 353 and Gln 450.

B. Look at these side chains. Can you think of a reason why there is so little electron density?

Go to tyrosine 411. Near this residue, you will see a lot of electron density where there are no atoms in the model.

C. Can you imagine what that is?

To get an idea of the packing of the protein molecules in the crystal, and the space between them, we will now expand the view beyond one unit cell. Click Reset View twice. Open the Display Manager by clicking on Display Manager, and click on the Display button next to 1nde.map to hide the electron density. Click on Draw and then on Cell & Symmetry…. Click Symmetry by Molecule, select Display as CAs and press OK. Set the master switch “Show Symmetry Atoms?” to Yes, set the radius to 40 Å and click OK.

You can now see part of the crystal. To keep an overview, we only look at the Cα traces. On the computers used for this practical, 40 Å is the largest radius that can be displayed at reasonable speed. On our 2 laptops, you can view 70 Å of the crystal (or even 100 Å if you’re patient) to get an idea of what a crystal looks like at the molecular scale.

Zoom out until you see the whole visible part of the crystal, and then rotate it. There are 3 orientations that look straight down a symmetry axis (6 if you count looking in the opposite direction). You’ll see solvent channels that run throughout the whole crystal.
These channels provide access for many molecules, such as:
- heavy metals (to solve new protein structures);
- compounds (exchanged to study their binding modes);
- substrates (to show that enzymes can still be active in the crystal);
- reductants/oxidants (to change the redox state of some electron carriers);
- …
It may take seconds, hours or days to soak such molecules into a crystal.

D. In one of the 3 directions, the symmetry is higher than the 2-fold symmetry in the other two directions. How high?

In the Draw > Cell & Symmetry… window, set the master switch “Show Symmetry Atoms?” back to No and press Apply. Unhide the electron density in the Display Manager (click once again on the Display button next to 1nde.map).

Exploring the open space 1

In the open space between the proteins in the crystal that we just saw, a lot of solvent is present. At some places, a water molecule can make good hydrogen bonds with the protein. Such an ordered water molecule may be visible as a small spherical density contour near hydrogen bond donors or acceptors on the protein. However, especially in low-resolution structures like this one, it is difficult to tell what such a density really is. It could be water, ammonia, Na+, Cl- or another additive used for crystallization, or plain noise in the electron density map.

In this model, many waters are included. Let’s check how certain we can be about their positions. Click on Measures, then Environment Distances, then Show Residue Environment, and press OK. The dashed lines show short distances between atoms, including “clashes”: atoms that are closer together than the sum of their van der Waals radii. The pink lines show probable errors; the yellow ones could be hydrogen bonds. Select water 1 (HOH 1) and move through the first 10 waters by pressing the space bar (Shift + space bar to go backwards).

E. Which waters make sense chemically (hydrogen bonds/clashes) and/or according to the electron density?

Exploring the model

Go to residue 411.

F. What has happened to residues 412-415?
Or even 412-418? Or to residues 483-485?

Go to glutamine 493.

G. Can you find an alternative location for Gln 493?

Click Calculate, Model/Fit/Refine… and then Real Space Refine Zone. Click on Gln 493 twice, and you will see that its conformation is minimized (don’t accept it yet). Now drag the Gln 493 side chain to its new home. When you release it, you will see that it fits in that space very well.

Try to find the ligand. If you can’t find it, use Go To Atom… as before, with residue number 101 and atom name C18. You must clear the Chain box, or just use the tree in the box at the bottom left.

H. Looking at the chemical structure, what must be wrong? Do you have an idea how this could have happened? (See the 2D drawing of the compound above.)

Now we will load a parameter file so that Coot understands the chemical properties of the ligand. Click File, Import CIF dictionary…, select 1NDE_restraints.cif (in the coot directory) and press OK. Click Calculate, Model/Fit/Refine… and then Real Space Refine Zone. Now click twice on a ligand atom. Before accepting the refined conformation, you can try another position or conformer. Drag an atom of your compound (left mouse button for the entire molecule, CTRL + left for that atom only). When you release the atom, you can see whether the ligand returns to the same conformation or finds another local minimum. Choose the best conformation according to the chemistry and the electron density, and look at the changes before accepting it.

I. After accepting, do you think the 2 waters in the ligand density should still be there? (Scroll to a contour level of 4 or 5 sigma for an even better view.)

Go to serine 311. Click Calculate and then Model/Fit/Refine… to bring up the menu again. Then click Rotamers… and click on the serine in the structure. A window appears containing all (3) plausible conformations. Click on a conformation to preview it in the structure. Look at the electron density and the surrounding atoms.

J. Which conformation do you prefer, and why?

Select the rotamer with the highest percentage (its frequency of occurrence in other high-resolution protein structures) and press Accept. As you can see, the side chain conformation has changed to the selected one. We will now show all distances to other atoms surrounding the side chain. Click on Measures, then Environment Distances, and make sure Show Residue Environment is selected.

K. Can you think of a reason why the crystallographer chose the original conformation?

Part two

Exploring a high-resolution crystal

We will now look at a well-refined structure with a much higher resolution: 1CZ9. It contains a domain of avian sarcoma virus integrase. Download the coordinates and the map file from the EDS as you did before, then decompress and rename the map file. First, open the PDB file (.ent) in a text editor (or view it on the website again) and find out the resolution of the structure. Don’t close the file yet; we will look into it again in a moment.

Open the coordinate file in Coot; first restart Coot to remove all old models. Browse through all amino acids using the space bar.

L. You will come across some strange-looking amino acids (positions 70, 74, 81, 114, 129, 136, 165 and 177). What is happening there? In which part of the protein do you find these amino acids? Look back in the PDB file and see how the atoms in each of the 2 conformations are annotated (look especially at the occupancy column, but disregard the ANISOU lines).

Now open the map file. Also open the Display Manager and hide the coordinate file (pdb1cz9.ent). Browse through the visible part of the electron density. Which amino acids do you recognize from the electron density alone? You might want to scroll the contour level to 4 sigma to get an even better view. Unhide the coordinate file and see whether you were right.

Go to His 93 and turn on the environment distances (Measures > Environment Distances, as before).

M. Do you agree with this conformation?
Also with regard to the hydrogen bonds?

Exploring the open space 2

Go to citrate 399.

N. Does the citrate fit well in its electron density? Does it also fit chemically (what is the charge of the citrate and of the surrounding part of the molecule)?

Go to sulphate 400.

O. How well does it fit in its electron density?

If you look in the PDB file, you will see in REMARK 280 that the protein was crystallized in a buffer containing 10% isopropanol (see both images below).

P. Could it be isopropanol that sits in the electron density instead of the sulphate? Why (also look at the chemistry)?

Part 3

Please read this text and, for each of the 12 points mentioned, ask yourself whether you understand it. If not, ask someone to explain it to you.

QUESTIONS YOU SHOULD ASK ABOUT CRYSTALLOGRAPHIC MODELS
Copyright 2007 Gale Rhodes. Adapted by permission of the author.

INTRODUCTION
Molecular modeling programs and fine graphics computers are becoming common, making it possible for many researchers and students to explore the wealth of structural information that comes from x-ray crystallography. Many students, teachers, and researchers in biochemistry and molecular biology use crystallographic models to help them understand structure-function relationships. Despite the best educational efforts of crystallographers, many users still treat molecular structures as objects that have been seen directly, rather than as models resulting from a demanding interpretative process. Such users are often unaware of the strengths and weaknesses of crystallographic models.

REVEALING THE ESOTERICA OF CRYSTALLOGRAPHY
According to the American Heritage Dictionary, esoterica are mysteries of a special type: "What is esoteric is mysterious because it is known and understood by only a small, select group, as by a circle of initiates or the members of a profession."
Following is an attempt to disseminate widely some of the esoterica of x-ray crystallography, and thus to enlarge the circle of those who might discern more clearly the elusive truths that lie behind each crystallographic model. I find in conversations with non-crystallographers that a significant number of them are surprised to learn the following facts about crystallographic models (some are annotated with the gist of their response to dawning awareness):

1. that the structure obtained is not of one molecule, but of the average of many molecules ("Oh yes, that's a basic point of all molecular science, I guess, but sometimes I don't stop to think about it.");
2. that the model is obtained from molecules in the solid state, rather than in solution ("Oh -- I guess that's what crystallography means...but it never sank in that those pictures are not of the molecule in solution"); but
3. that many macromolecules are demonstrably functional in the crystalline state ("Do you mean that enzyme molecules can sit in a crystal and still be active???"); and in fact,
4. that crystallographers go to great lengths to demonstrate that the crystalline substance is still functional, and that it is consistent with what is known about the molecule in solution ("Well, that's a comfort...");
5. that macromolecular crystals contain a large amount of water, some ordered and thus detectable, and some disordered ("Oh, so in a sense, the molecules are still in solution...? That would help to explain how they might still be active.");
6. that in many published models, the crystallographer has been unable to locate all of the amino acid residues ("What? They can't find some parts of the molecule at all???");
7. that in some published models, there is unexplained electron density, to which no known parts of the protein or associated cofactors can be assigned ("You mean like when I reassembled my carburetor and had parts left over?");
8. that some macromolecules in the crystalline state contain distortions due to crystal packing ("Well, I'm not surprised -- but why isn't it more common, and how can you detect it??");
9. that, despite being in the solid state, macromolecules are still in motion ("Now wait a minute, I thought you said they were sitting pretty in the crystal..."), and that crystallographic study provides some suggestions about the relative mobility of various parts of the molecule ("Hey, that might be useful! Can I view that information in the form of different colors on a graphics model??");
10. that the resolution of the model is not constant throughout, because i) different portions of macromolecules in crystals possess different ranges and types of motion, and ii) some portions adopt different ordered conformations in different unit cells ("You mean that all unit cells are not identical???"); and for this and other reasons,
11. that there is some tolerance or uncertainty in the atom positions, usually expressed in a statistical way for the molecule as a whole, and that this tolerance, in part, reflects the quality of the model ("Whattaya mean, quality -- are some models better than others?"); and finally,
12. that you do not have to be a crystallographer to assess, at least roughly, the quality of a model from data in the original publication of a crystallographic structure ("Oh, I can never make heads or tails of the experimental section in a new structure paper -- but I love the stereo pictures!").

TOWARDS BETTER-INFORMED USE OF MODELS
When we study a striking computer display of an enzyme's active site or a protein/DNA complex, we are able to make discerning use of what we see only if we are fully aware of the strengths and limitations of crystallographic models. The facts listed above suggest a series of questions that protein scientists should ask of all models before using them in attempts to explain their own observations.
Crystallographers, in turn, should not assume that other researchers are aware of these points of common crystallographic knowledge, and should make a special effort to enlarge the proportion of users who can extract the most from the fruits of structure determination. And by the way, what questions should we ask about macromolecular structures derived from NMR data?

Useful resources:
- Crystallography Made Crystal Clear: A Guide for Users of Macromolecular Models, 2nd Edition, Gale Rhodes, San Diego: Academic Press, 2000 (ISBN 0-12-587072-8).
- A Glossary of Terms from Crystallography, NMR, and Homology Modeling, by Gale Rhodes: http://www.usm.maine.edu/~rhodes/ModQual/index.html
- Model validation, a very useful guide by an expert in the field, Gerard Kleywegt: http://xray.bmc.uu.se/gerard/embo2001/modval/index.html
- The Uppsala Electron Density Server, to verify that the important bits of the crystal structure are measured, not modeled: http://eds.bmc.uu.se/eds/

This is the end of the practical.

Answers

A
The ATOM line contains information for each atom in the structure. The example line below shows which information is in which column:

ATOM   1582  O   LYS A 471     110.193   4.757-116.230  1.00 64.84           O

Field                 Value in the example
Atom number           1582
Atom name             O
Residue name          LYS
Chain identifier      A
Residue number        471
X coordinate (Å)      110.193
Y coordinate (Å)      4.757
Z coordinate (Å)      -116.230
Occupancy             1.00
Temperature factor    64.84
Atom type (element)   O

Occupancy is the fraction of the molecules in the crystal in which the atom has this position. Especially in high-resolution structures, you will see that some side chains have, for example, 2 conformations. Sometimes the second conformation is not present in the model. Sometimes loops or side chains have occupancy 0.00; this means that their position was modeled, not measured. The total occupancy of an atom over all its conformations always lies between 0 and 1.
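The PDB format is defined by fixed column positions, not by spaces. In the example line above, the Y and Z coordinates run together (4.757-116.230), so splitting the line on whitespace would fail. The sketch below reads ATOM/HETATM records by column, assuming the standard PDB column layout. It also adds up the occupancies of the alternate conformations of each atom; these are labelled by the alternate location character in column 17, and their sum should not exceed 1. Finally, it converts a B-factor into an RMS displacement using the standard isotropic relation B = 8π²⟨u²⟩. All function and class names are our own.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    alt_loc: str  # alternate location indicator ("" if there is only one conformation)
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record by its fixed columns (1-based columns in comments)."""
    return Atom(
        serial=int(line[6:11]),          # 7-11
        name=line[12:16].strip(),        # 13-16
        alt_loc=line[16].strip(),        # 17
        res_name=line[17:20].strip(),    # 18-20
        chain=line[21].strip(),          # 22
        res_seq=int(line[22:26]),        # 23-26
        x=float(line[30:38]),            # 31-38
        y=float(line[38:46]),            # 39-46
        z=float(line[46:54]),            # 47-54
        occupancy=float(line[54:60]),    # 55-60
        b_factor=float(line[60:66]),     # 61-66
        element=line[76:78].strip(),     # 77-78
    )


def read_atoms(lines):
    """Parse all ATOM and HETATM records; header and ANISOU lines are skipped."""
    return [parse_atom_line(l) for l in lines if l.startswith(("ATOM  ", "HETATM"))]


def total_occupancy(atoms):
    """Sum the occupancy of each atom (chain, residue number, atom name) over its conformations."""
    totals = defaultdict(float)
    for a in atoms:
        totals[(a.chain, a.res_seq, a.name)] += a.occupancy
    return dict(totals)


def rms_displacement(b_factor: float) -> float:
    """Isotropic B = 8 pi^2 <u^2>, with <u^2> the mean square displacement along one
    direction, so the RMS displacement in Å is sqrt(B / (8 pi^2))."""
    return math.sqrt(b_factor / (8 * math.pi ** 2))
```

For example, you could open pdb1cz9.ent, pass the file object to `read_atoms` and look for keys of `total_occupancy` whose value exceeds 1.0. That is a quick way to find the double conformations of question L. For the example line above (B = 64.84), `rms_displacement` gives about 0.91 Å.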
The temperature factor, or B-factor, is a measure of the spread of the atom around its position; it is more or less a mean square displacement. Factors that influence this value include the temperature (data are usually collected at 100 K), crystal imperfections, flexibility and the experimental setup. In poorly refined structures, it may be an extra degree of freedom (for each atom!) that masks mistakes made elsewhere. Beware of low-resolution structures with much variation in the B-factors.

B
The side chains are long, floppy chains of atoms. There are no hydrogen bonds to stabilize them, and no van der Waals contacts with ordered molecules, only with disordered solvent. Each side chain can therefore adopt many different conformations. Their average electron density may be similar to that of the solvent, so it won’t show up in the contour.

C
Remember: we are looking at only one protein molecule from a protein crystal. Do you know the answer now? The answer is simpler than it seems: this is the same protein molecule, but in the next unit cell.

D
The space group of the crystal is P6122. This means that there is a 6-fold rotational symmetry axis (in fact a screw axis) and two 2-fold axes perpendicular to it. Below you see 6 possible unit cells that represent the whole crystal. The dotted hexagon highlights the 6-fold symmetry; the triangles mark identical parts of the crystal.

E
Water  Electron density  Chemically OK?
1      Yes               No, too many hydrophobic clashes.
2      Yes               There is an error in a ligand, which we will look at later on.
3      Yes               Yes.
4      Yes               There is a hydrogen bond, but the water should be moved slightly away from the hydrophobic clash, or a different conformation should be chosen for the methionine.
5      Poor              There is only one H-bond, in a hydrophobic environment. The water is probably not there: the carbonyl of Thr 290 should make a proper H-bond to the nitrogen of Met 294 to create an ideal alpha-helix.
6      Not enough        No H-bond, vdW clashes. This water models residue 415, which has been omitted from the model because the density was so poor.
7      Yes               Again the problem with the ligand.
8      Yes               Not a nice H-bond. Let’s assume the histidine is slightly mobile.
9      Yes               Yes.
10     No                Yes.

F
These residues are disordered: they have a different position in every unit cell. Averaged over the crystal, no density is left, so the crystallographer has not included these atoms in the model.

G
Next to the density occupied by the glutamine, there is a big density with a water (38) in it. The glutamine would fit in there as well. Placing the glutamine in its current position is just the interpretation of the crystallographer. It would be more correct to model two conformations, each at half occupancy, but the data/parameter ratio would suffer, increasing the risk of over-refinement. At medium resolution or worse (>2.2 Å), explicit double conformations are rare.

H
In the center of the ligand, a triangle is formed by 2 carbons and the sulfur atom. Coot draws bonds based on distance criteria, and these atoms are far too close together. The bond between the sulfur and the carbon that has 3 visible bonds does not really exist. It is inexplicable how this error could have been made, especially since the binding mode of the compound is the sole focus of the paper.

I
No. The space is nicely filled by the ligand in this conformation, and no density is left to explain the waters.

J
This serine can fit the electron density in 2 ways (conformers 1 and 2). Conformer 3 is very unlikely, because there is no density for the oxygen atom at that position. So there is a good chance that the serine adopts one of these 2 conformations, but it may adopt a different one in different molecules of the crystal. What you see is the average over the whole crystal, not one structure. At this resolution, we can’t even be absolutely sure that two separate conformations exist. In such a case, the conformation that makes the most sense chemically is chosen.

K
In the conformation we introduced, the serine fits the electron density slightly worse.
The hydrogen bond it may form with the carbonyl oxygen of His 308 is shared with 2 other hydrogen bond donors: its own backbone NH and that of the next residue. Two donors are already plenty for a carbonyl.

L
These amino acids have double conformations. As you can imagine, this happens more often on the outside of the protein, where side chains point into the surrounding disordered solvent, than in the tightly packed interior.

M
Yes, this is a well-positioned histidine. If threonine 91 did not make a hydrogen bond with the imidazole ring, the ring could have been in the flipped (mirror) conformation. But the hydrogen bonds rule out that conformation. In this case, a well-ordered histidine measured at high resolution, the nitrogens even show higher electron density than the carbons. That is a rare sight in protein structures.

N
Yes, it fits very well in the electron density. Chemically, the negatively charged citrate also fits well between the positively charged arginines (did you see the arginine from the neighboring molecule in the crystal?) and at the positive end of an alpha-helix.

O
It does not fit so nicely. Only the sulfur and 2 oxygens sit in the electron density, even though the molecule is rigid. Sulphates are often partly disordered, but usually there are positively charged residues nearby.

P
Yes, isopropanol seems to fit much better. The 3 carbons and the oxygen would fit quite well in the electron density, and the oxygen could also make the hydrogen bond to Val 90. Because of the hydrophobic environment, the hydrophobic isopropanol would be much more comfortable at this position than the charged sulphate. In such a case it is wise to read the accompanying paper. The authors refined 6 crystal structures simultaneously, and the best one was crystallized in 2 M ammonium sulphate. So maybe the sulphate was just inherited from that crystal, as the authors probably used that model to solve this crystal structure. Maybe it really is a sulphate after all.
If it’s important to your model, it’s worth recalculating the structure.
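Several answers above (E, K, M and P) judge a water, side chain or ligand by its hydrogen bonds and clashes. The Python code below is a rough sketch of that reasoning; it is not the set of criteria Coot uses for its environment distances. It classifies a contact from the two elements and their distance:

- two N/O atoms between 2.5 and 3.3 Å apart are flagged as a possible hydrogen bond;
- atoms closer than the sum of their van der Waals radii minus 0.4 Å are flagged as a clash.

The radii (Bondi values), both cut-offs, the function names and the example water are our own assumptions.

```python
import math

# Approximate van der Waals radii in Å (Bondi). The cut-offs below are assumptions.
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}
POLAR = {"N", "O"}
HBOND_RANGE = (2.5, 3.3)  # Å, donor-acceptor distance for a possible hydrogen bond
CLASH_TOLERANCE = 0.4     # Å below the vdW sum before a contact counts as a clash


def classify_contact(elem1, xyz1, elem2, xyz2):
    """Return (distance, label) for one pair of atoms."""
    d = math.dist(xyz1, xyz2)
    if elem1 in POLAR and elem2 in POLAR and HBOND_RANGE[0] <= d <= HBOND_RANGE[1]:
        return d, "possible H-bond"
    if d < VDW_RADIUS[elem1] + VDW_RADIUS[elem2] - CLASH_TOLERANCE:
        return d, "clash"
    return d, "ok"


def environment(center, neighbours, cutoff=4.0):
    """List the labelled contacts of `center` with all neighbours within `cutoff` Å.

    center and each neighbour are (label, element, (x, y, z)) tuples.
    """
    _, elem0, xyz0 = center
    contacts = []
    for label, elem, xyz in neighbours:
        d, kind = classify_contact(elem0, xyz0, elem, xyz)
        if d <= cutoff:
            contacts.append((label, round(d, 2), kind))
    return sorted(contacts, key=lambda c: c[1])


# Hypothetical water with three neighbouring protein atoms.
water = ("HOH 1 O", "O", (0.0, 0.0, 0.0))
neighbours = [
    ("Ser OG", "O", (2.8, 0.0, 0.0)),
    ("Leu CD1", "C", (0.0, 2.6, 0.0)),
    ("Met SD", "S", (0.0, 0.0, 4.5)),
]
for contact in environment(water, neighbours):
    print(contact)
```

Here the water makes one possible hydrogen bond (to the serine) and one clash (with the leucine carbon); the methionine sulphur is beyond the 4 Å cut-off. A water whose only contacts are clashes with carbons, like water 1 in answer E, would show up as a list of clash lines and no hydrogen bonds.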