Structural Biology Practical
MolBio / GGNB A57
December 2012
Tim Grüne

Tutors: Caroline Behrens, Inessa De, Kevin Pröpper, Tales Rocha de Moura
Course Dates: Monday – Friday, 1pm – 5pm

1 Data Integration with HKL2000

1.1 Getting Started

The major part of this practical is computer based, and the computing environment is completely Linux-based. This section covers a few basics to help you get familiar with it.

1.2 Logging in and setting up our files

There are 5 accounts set up for this practical:

usernames: mb01, mb02, mb03, mb04, mb05
passwords: mb01, mb02, mb03, mb04, mb05

Each group should pick one unique username and password and use it on both days. This prevents mangling of the data.

The computers have their names printed on them. The ones suited for this course are

• ganymede
• klio
• medusa
• stheno
• urania

All computers have the same setup and your files are accessible from all of them. It is therefore not important that you stick to the same computer all the time.

The computer network in the Sheldrick group is separated from the internet. In order to access the internet, different computers must be used (the tutors will tell you which ones they are). The usernames on those computers are

usernames: pg1, pg2, pg3, pg4, pg5

and the password for all these usernames is ohMou9Oo

Passwords are case sensitive.

After logging in, your desktop looks a little desolate. You are going to need a terminal in which you can type commands. To get your first terminal, press Alt-F2, which opens a small window that allows you to type in commands. Type konsole to get a terminal.

From now on, the terminal will appear in your control menu (the blue button at the bottom of the screen with the “K”).

(The screenshots of this tutorial stem from the GGNB practical, so please don’t get confused when some directories in the pictures read ggnb instead of mb.)
In order to keep track of the process, you should keep the different parts of this course in separate directories. For the first part, create a directory integration using the Linux command mkdir by typing in the terminal window

#> mkdir integration
#> cd integration

The first command creates the directory, the second one changes into that directory.

1.3 Data Integration with HKL2000

Now you can start the integration program HKL2000 by typing this command in the same terminal. The first window that appears asks for the detector type. Our data were collected on a Mar 345 image plate.

Figure: Selecting the correct detector.

Once you have selected the Mar345 detector and clicked OK, the main window of HKL2000 appears. Make sure that the New Output Data Dir lists the directory /net/home/mbXX/integration, where ’XX’ corresponds to the number in your username.

Figure: Correct output directory?

Next you have to tell HKL2000 where the data frames are (New Raw Data Dir), because they are not in the directory you just created. In the subwindow Directory Tree, double-click on the net-folder so that you can see ganymede and home. Double-click your way through to

net->ganymede->ggnb-I->frames

Thereafter click on the >> below New Raw Data Dir (not the one below New Output Data Dir!).

When the New Raw Data Dir points to the correct directory /net/ganymede/ggnb-I/frames, click on the Load Data Sets-button to see the frames that HKL2000 found in that directory.

Figure: Loading data sets.

After OK, the HKL2000 window looks like the next picture. Note that the bottom fields are now filled in as far as HKL2000 could extract the values from the file header.

Figure: Data set information.
When you click on the Display button, HKL2000 shows the first frame. (If you do not see anything, move the main window aside. Sometimes HKL2000 opens the new window behind the main window.)

Figure: Frame number 1.

1.3.1 Indexing

Next we must ask HKL2000 to index our data, i.e.

1. find an (approximate) unit cell consistent with the pattern in the frame
2. assign the Miller indices to the reflections

First select the Index-tab at the top of the main window. HKL2000 must find the reflections on the image. This is what the Peak Search button in the main window is for.

Click the Index button. HKL2000 opens a new window which offers a choice of Bravais lattices. The green ones fit well with the diffraction pattern, the red ones fit only poorly. Furthermore, the Bravais lattices are ordered according to their degree of symmetry. Pick the top green one, primitive tetragonal. The peaks in the frame window change colour.

Once an initial cell has been found, the unit cell parameters and the experimental settings (detector distance, . . . ) can be refined, i.e., their values are improved based on the collected data. Hit the Refine button in the main window 3-4 times and check that the numbers in that window no longer change much.

Now click Fit All in order to include all parameters, also select the button Mosaicity, and click on the Refine button again several times until the numbers stabilise.

The colour on the frame display has changed again. The colour indicates whether a reflection is a full reflection, i.e. whether the complete reflection has been recorded on that image, or whether part of it is on one of the adjacent frames.

Question: Why are the spots on the circles not enclosed? Is HKL2000 making a mistake?
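Indexing assigns Miller indices (h, k, l) to every reflection once a unit cell is known. As a minimal sketch of what such an assignment implies, the snippet below computes the resolution (d-spacing) of a reflection in a tetragonal cell from the standard relation 1/d² = (h² + k²)/a² + l²/c². The function name and the cell dimensions are made up for illustration; they are not the values HKL2000 finds for the practical's data set.

```python
import math


def tetragonal_d_spacing(h, k, l, a, c):
    """Return the d-spacing in Å of reflection (h, k, l).

    Valid for a tetragonal cell only (a = b, all angles 90 degrees).
    """
    if (h, k, l) == (0, 0, 0):
        raise ValueError("(0, 0, 0) is not a reflection")
    inv_d_squared = (h * h + k * k) / (a * a) + (l * l) / (c * c)
    return 1.0 / math.sqrt(inv_d_squared)


# Hypothetical cell dimensions in Å, chosen for illustration only.
A_CELL = 58.0
C_CELL = 150.0

for hkl in [(1, 0, 0), (0, 0, 2), (10, 5, 20)]:
    d = tetragonal_d_spacing(*hkl, a=A_CELL, c=C_CELL)
    print(hkl, "d = %.2f Å" % d)
```

Reflections with large indices have small d-spacings, i.e. they lie at high resolution and appear far from the centre of the frame.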
1.3.2 Box- and Spot-Size for Integration

You can give HKL2000 an idea of how big the spots are and how much area the program should use to determine the background around each spot. Both are important settings for a proper data integration.

Click the Zoom window-button in the Frame Display. This opens a third window. Middle-click in the Frame Display in an area with spots and adjust the brightness (in the Frame Display) so that you can clearly see the spots.

The Int. box-button in the Zoom window shows the current settings for the background area (square) and the spot size (circle). There are actually two circles; the area in between the two is the “transition” between spot and background. You can see that some of the boxes overlap with neighbouring boxes, and the circles seem too small to encompass the spots, at least the larger ones. Click Zoom in twice for a good view.

First click on Box size in the main window. A value of 20-25 seems reasonable for this data set. You must click Refine in the main window to see the effect of the change.

Similarly, increase the Spot size so that the spots fit into the circles. It does not matter if the circles are too big, but it does if they are too small. A Spot size of about 0.75 seems a good choice.

Now click the Refine-button in the main window two to three more times to adjust to the new settings.

Before advancing to the integration step, you can tell the program where the shadow of the beam stop is so that it excludes this area during integration. With weak data this can be important to improve the data quality, but it is also good practice for high resolution data. Please ask the tutors to show how to carry out this step.

Then click the Integrate-button to start the integration, and lean back for a few minutes.

1.3.3 Integration

Integration starts automatically. The Integration-tab of the main window shows the progress of integration.
The program now further improves the experimental parameters. The bottom right window shows the variation of some of the parameters. They usually fluctuate a little.

With most decent data sets, there is little to adjust during or after integration with HKL2000.

2 Data Preparation for Rest of Practical

Before you can continue, you have to get a copy of the files required for the rest of this practical. Find the terminal window so that you can type commands and type

#> cd

The command cd without an argument takes you directly into your home-directory /net/home/mbXX. Create a directory for the practical, change into it and copy the required files:

#> mkdir practical
#> cd practical
#> cp -rv /net/ganymede/molbio/ex* .

(The period at the end of the last line is part of the command. It means “here” in the UNIX world.)

3 Electron Density, Model, and Secondary Structure

This section introduces the program coot, a graphical model building interface. It can display PDB–files and electron density maps.

The data are in the directory practical/ex2, so find a terminal and change into this directory:

#> cd ~/practical/ex2

(The “~” is a short-cut for your home-directory.)

1. Open a terminal and have a look at the file exercise2.pdb with the text editor kwrite by typing

   #> kwrite exercise2.pdb

   As you can see, PDB–files are plain text files. The information in the “header” of the file (i.e., the lines not starting with ATOM) tells you a few things about which program created the file and about the refinement statistics. Because this file contains data which were not deposited, there is no author information or other information about the molecule itself.

2. Start the program coot from the terminal.

3. Create an electron density map by loading the file exercise2.mtz:

   File -> Auto Open MTZ

   At first nothing seems to happen.
   This is because we are looking at the origin of the coordinate system. In this particular case, there is no density at the origin (solvent region). Move around by holding the Ctrl-key and the left mouse button at the same time and moving the mouse while you keep holding the Ctrl-key!

   The default diameter of the map is 10Å. In order to see more, select the Edit->Map Parameters... entry and increase it to 15Å.

   WARNING: If you set this value too high, you may overload the graphics card of the computer and make your computer really slow!

   This shows electron density in three different colours. The blue colour displays the main map which we want to explain with a model. The red and green parts show differences between our model and our data. Red indicates “too much model” and green indicates “not enough model”. We will come back to what this means shortly.

4. To rotate the map, hold down the left mouse button and move the mouse. To move (translate) the map, hold the Ctrl–key plus the left mouse button while you move the mouse. The yellow cube indicates the origin (0,0,0), while the small pink box indicates the centre of rotation.

   To zoom in or out, hold down the right mouse button while you move the mouse left or right or up or down.

   To change the level of detail of the map (the sigma–level), use the scroll wheel. With a lower sigma level one sees more, but at about 1σ or less, the noise overcomes the meaningful data.

   Can you already make out features, e.g. secondary structure elements, or side chains? What is the rule of the “Christmas tree”?

5. Now load the PDB-file exercise2.pdb.

   File -> Open Coordinates...

   The electron density becomes much easier to understand.

6. This is a low resolution structure. The data (and hence the map) have a resolution of 3.4Å, which is rather poor for a protein crystal structure. At this resolution, the atomic positions have to be considered with care. Instead look at the secondary structure of the protein.
   To see it better, open the Display Manager and select C-alphas for the molecule exercise2.pdb. You can also switch off the map by clicking the Display–button next to the two map entries in the Display Manager.

   How many α–helices and how many β–strands does this protein consist of?

7. Now use the centre mouse button to centre on the C–terminus of the model, redisplay the electron density map and look around a little. How do you judge the quality of this model?

4 Model Building

In this exercise you are going to look at the model and density of Thermolysin, a heat-stable metalloproteinase produced by Bacillus stearothermophilus that hydrolyses peptide bonds on the amino side of bulky hydrophobic residues such as Leu, Ile, Val, and Phe.

You are going to use coot to build some missing residues and correct the placement of a few side chains.

4.1 Major Corrections

1. Start coot again from a terminal.

2. Auto Load the electron density map of exercise3.mtz. These data have a resolution of 1.7Å. You should be able to see a difference compared with the previous exercise. Find some aromatic side chains. They have holes now.

3. Open Coordinates from exercise3.pdb.

4. Centre on residue Thr 278:

   Draw -> Go To Atom ...

   and scroll down until you find Thr 278. Double-click on it to centre.

   Downstream of Thr 278 starts a stretch of blue density which overlaps with some green (difference) density: the model is missing atoms to explain the density and you have to add them.

5. First tell the program which map you want to build against:

   Calculate -> Model/Fit/Refine... -> Select Map... -> OK

6. Now add a new peptide to residue Thr 278: click Add Terminal Residue... and then on any atom of residue Thr 278. If the peptide fits more or less in density, accept.

7. Residue 279 is now an Alanine. But according to the sequence it should be a different residue. Try to guess which one.

8.
   Coot lets you Mutate & Auto Fit ... the residue.

9. Fill the whole gap by alternately adding a terminal residue and mutating it to the correct type.

4.2 Minor Corrections: Rotamers

1. Centre on Trp55. Remember that “red” indicates “too many atoms” and “green” means “not enough atoms”: the Tryptophan is looking the wrong way. To correct it, click on Rotamers ... and then on some atom of Trp55. Select and accept the rotamer that best fits the density.

2. To improve the fit, Edit Chi Angles and move the Tryptophan into the density as well as possible.

3. To finish off, do a Real Space Refine Fit on Trp55: click the button and then two atoms of Trp55.

4. What you did may have violated stereo-chemical restraints (ideal bond lengths, angles, etc.). To correct for this, click the Regularise Zone–button. Click once on a residue about two residues before Trp55 and once on a residue about two residues after Trp55.

5. Look at Met120. There is quite a bit of red and green density around that residue. Do you have an idea to explain the density?

5 The Effect of Low and High Resolution

Now you are going to examine two maps with different resolution ranges to learn the importance of both high and low resolution data. You are going to look at data from Tendamistat, a bacterial inhibitor of mammalian α–amylases (digestive enzymes that break down starch).

1. Load the low–resolution map of Tendamistat, exercise4a.mtz, and look around. It is difficult to recognise side chains, as with the first map you have seen.

2. Load the file exercise4.pdb. Now it is easier to see that the model fits the density, so that the density makes sense.

3. Go to residue Asn272. Some residues are missing there. Can you make out their types?

4. Load the map from exercise4b.mtz. This map was calculated with data between 1Å and 2Å only, i.e., all reflections at a resolution lower than 2Å were omitted. It looks very noisy, but it shows peaks around the atoms.

5. Does the second map help to guess the right residues?

6.
   Load a map from all data, which covers data between 1.0Å and 20Å, exercise4c.mtz.

6 Model Analysis and Validation

6.1 Secondary Structure

(a) Change to directory ex5.

(b) Load exercise5.pdb.

(c) Confirm the topology drawing.

(d) To help you check, activate the “environment distances” for the centred atom:

    Measures -> Environment Distances

    Click on “Show Environment Distances” and limit the distances to 2.5–3.3Å.

(e) For a hydrogen bond between an Oxygen and a Nitrogen atom, their distance ought to be roughly 2.7–3.2Å. Confirm the beginnings and the ends of the α–helices by checking the hydrogen bonds between the O of residue n and the N of residue n + 4.

(f) Look at the same file with the pymol viewer program. From a terminal/console, type

    #> pymol exercise5.pdb

    In the top right part of the graphical window there are two menu entries, one saying all and the other saying exercise5. The five boxes to their right stand for

    • Action
    • Show
    • Hide
    • Label
    • Colour

    Click S->cartoon and H->lines to remove the chicken wire model and show the molecule as cartoon. For a nicer view, click C->by ss->Helix Sheet Loop

(g) You see, the program detects fewer β–strands. For a publication one would have to check at least the beginning and end of each secondary structure element and compare with at least one other program.

(h) Next, select A->preset->b factor putty. This colours the atoms according to their B–value. The diameter of the tube also corresponds to the B-factor.

(i) Can you explain which regions are blue (low B-factor) and which are green (medium B-factor) or even red (high B-factor)?

6.2 Structure Validation: Ramachandran Plots et al.

Back in coot, you are going to meet some of its tools for judging the quality of a structure.

(a) Close pymol by selecting File -> Quit and go back to the coot window.

(b) Load the file exercise5b.pdb.
    It is the structure of ketosteroid isomerase.

(c) Click

    Validate -> Ramachandran Plot -> 0 exercise5b.pdb

    This should open an interactive window showing the Ramachandran plot of this structure.

    Figure: Ramachandran plot (legend: General, Proline, Glycine, Outlier).

    Depending on where you place the mouse pointer, the plot changes to the Ramachandran plot of the corresponding class: Prolines are much more restricted, while Glycines are less restricted than general peptides.

(d) Click on the outlier marked red. Coot focuses the main window on that residue.

(e) Load the corresponding map from exercise5b.mtz.

(f) From the Model/Fit/Refine–window, select Edit Backbone Torsion and click on Asn93.

(g) Play around with the Φ and Ψ angles and try to improve the fit to the Ramachandran plot. Since there is nearly no density there to judge, we should at least make sure the loop fits the Ramachandran plot.

(h) Now let coot try to refine the part: from the Calculate -> Model/Fit/Refine... menu, select Real Space Refine Zone and select the residues 90–96. Is the suggestion acceptable? When you accept, look at the Ramachandran plot!

(i) Now have a look at the B–factor plot of the Cα main chain atoms of this structure.

    Figure: B−factors of Cα (x-axis: residue number; y-axis: temperature factor in Å²).

    The largest peak is around residue 95, just around where you fixed the Ramachandran plot before. Look at the Cα trace of the protein. Can you imagine why the B–factors are high in this region?

7 Finding the active site

This time you are going to look at the structure of the protease thermolysin. Structures of thermolysin usually retain a Val–Lys peptide in the active site. You are going to locate the active site and fit the peptide.

(a) Load the file exercise6.pdb and the map from exercise6.mtz. You may already notice a major “green” area, indicating that something is missing there.

(b) Load the file val-lys.pdb.
    It contains a Valine–Lysine di-peptide.

(c) Instead of trying to find the ligand yourself, you can ask coot to do it for you:

    Calculate -> Other Modelling Tools -> Find Ligands

(d) Leave the default values except for three parts:

    i. Select the di-peptide as the ligand to search for.
    ii. Select the ’flexible’ search. This lets the program take into account that the peptide can rotate about certain bonds, e.g. the C − N bond between the two residues.
    iii. Set the σ–level for the search to a value slightly lower than what you use for looking at the map, e.g. around 1.3.

(e) The peptide should end up within the extra density, but the fit is far from good. Manual adjustment is obviously necessary. Since the Lysine looks more obvious, we will start with the C–terminal residue.

(f) Move the Cα atom of the Lysine to where it belongs:

    Calculate -> Model/Fit/Refine -> Translate/Rotate Zone

    and click once on some atom of the Valine and once on some atom of the Lysine.

(g) What do you notice about the density near the Oxygen of the Lysine?

(h) Add the second terminal Oxygen:

    Calculate -> Other Modelling Tools ... -> Add OXT to Residue ... -> Fitted Ligand #0

(i) Flip the peptide between the two residues: on the Model/Fit/Refine menu, select Flip Peptide and click on an atom of the Valine.

(j) Now try to move and rotate the Valine alone into density, just coarsely.

(k) When both residues are roughly in density, the stereochemical aspects are most likely awful. Use the Real Space Refine Zone option for coot to correct the fit and finally do a Regularise Zone to “polish” it.

(l) If the side chain of the Lysine is still not where it should be, you can Edit Chi Angles to make it fit better.

(m) In order to put the protein and its ligand into a refinement program, they must be within one PDB–file, i.e., the two structures must now be merged:

    Calculate -> Merge Molecules

    and select the fitted ligand and the molecule exercise6.pdb.
    Now we can File -> Save Coordinates our modified molecule in order to hand it over to a refinement program, which will make further improvements and calculate new phases for us.

8 Playing with Pymol

If there is time left, you can open pymol again with any of the above PDB-files and play around with its options to make nice pictures. For example, you can calculate the electrostatic surface potential with

A -> generate -> vacuum electrostatics -> protein contact potential

You can see strongly charged patches and less strongly charged parts on the surface. Next generate the symmetry related molecules in the crystal:

A->generate ->symmetry mates -> within 12A

With the file exercise5.pdb you can observe that the contacts are actually at regions where the electrostatic potential is comparatively weak. This supports the notion that protein-protein contacts are usually controlled by van-der-Waals (hydrophobic) interaction and not electrostatic forces.
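If you would like to look at the numbers behind the B-factor putty colouring and the Cα B-factor plot from Section 6, they can be read straight from the PDB file, because PDB files are plain text with fixed columns. The following is a minimal sketch in plain Python, not part of the course material. The function names are hypothetical, the demo records are made up, and alternate conformations are not treated specially.

```python
def ca_b_factors(pdb_lines):
    """Yield (chain, residue_number, b_factor) for every C-alpha atom.

    Uses the fixed columns of PDB ATOM records: atom name in columns 13-16,
    chain in column 22, residue number in columns 23-26 and the B-factor
    in columns 61-66.
    """
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            yield line[21], int(line[22:26]), float(line[60:66])


def demo_atom_line(serial, atom_name, res_name, chain, res_seq, b_factor):
    """Hypothetical helper: build a fixed-column ATOM record for the demo.

    Coordinates are set to 0.0 and the occupancy to 1.0.
    """
    return (
        "ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}    "
        "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
    ).format(serial, atom_name, res_name, chain, res_seq,
             0.0, 0.0, 0.0, 1.0, b_factor)


# Made-up records for illustration only.
demo_lines = [
    demo_atom_line(1, " N  ", "THR", "A", 278, 30.00),
    demo_atom_line(2, " CA ", "THR", "A", 278, 35.50),
    demo_atom_line(3, " CA ", "ALA", "A", 279, 62.25),
]

values = list(ca_b_factors(demo_lines))
chain, res_seq, b_max = max(values, key=lambda item: item[2])
print("highest C-alpha B-factor: chain %s residue %d, B = %.2f" % (chain, res_seq, b_max))
```

To apply it to a real file, open the file and pass the file object instead of demo_lines, for example `with open("exercise5b.pdb") as handle: values = list(ca_b_factors(handle))`.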