Boot Camp January 2014

Exploring the Structure of Fluorescent Proteins

Based on queries performed at www.rcsb.org in Dec. 2013, the PDB entries that most closely match each of the Fluorescent Proteins (FPs) are:

1. mCherry – PDB ID 2h5q
2. mOrange – PDB ID 2h5o
3. mCitrine – PDB ID 3dq7
4. mCerulean – PDB ID 2wso
5. msfGFP – PDB ID 2b3p

In the following sections, all instructions are given using the PDB entry 1gfl. You can substitute the PDB entry that is the closest match to the FP assigned to your group.

1. Download the coordinate and structure factor files from the PDB.

Open the web page www.rcsb.org, type the PDB ID of interest (e.g. 1gfl) in the top search box, and click on search. This should open the structure summary page for the PDB entry 1gfl. From the top right corner of the page, download the coordinates and the structure factor data to your local computer, either as text files or as compressed files that you then uncompress.

2. Compute the electron density map

For this we will use the sf-tool server at RCSB PDB (http://sf-tool.rcsb.org/). Load the coordinate and structure factor files as follows:

Now select the option for checking the model (coordinates) against the structure factor file as shown below and click on the Run button.

Once the calculation is complete, the report page will show the statistics calculated by the tool and the electron density map. From the mmCIF file link you can download a residue-by-residue report on how well the coordinates of each residue match the map (measured by a parameter called the Real space R factor), as well as the map itself. Residues/ligands with poor matches are summarized in the TABLE. Download the 2Fo-Fc map from above and save it on your computer for visualization using Chimera.

3. Visualize the coordinates and corresponding electron density maps

Load the coordinates of the PDB entry (1gfl) into Chimera using Menu File… Fetch by ID… then type the PDB ID in the box and click on Fetch.
Make sure that the radio button next to the PDB database is selected.

Once the file is loaded, hide the ribbons and show the coordinates in the all-atom view by clicking on Menu Action… Ribbons… Hide and Action… Atoms/Bonds… Show.

Load the electron density map by clicking on Menu File… Open… and selecting the map file name. When the map is loaded into the structure display window, a new Volume Viewer window opens. In the Volume Viewer window, move the vertical marker in the histogram of data values to select a suitable contour level. The ideal contour level is between 1 and 1.5 sigma. Since Chimera does not normalize the maps, you have to determine the contour level for each map either visually or by moving the slider to approximate the Level value listed here:

PDB entry   1.0 sigma level   1.5 sigma level   Chromophore residue #
1gfl        0.28893           0.43339           S65, Y66, G67
2b3p        0.43079           0.64618           (CRO)66
2h5o        0.44597           0.66895           (CRO)66
2h5q        0.37432           0.56147           (CH6)66
2wso        0.46722           0.70083           (CRF)66
3dq7        0.39649           0.59473           (CR2)66

Change the Contour step to 1 and the style to mesh. You may choose to change the color of the map by clicking on the Color box and selecting a color of your choice.

Focus on the chromophore in chain A of 1gfl to see how well the coordinate model fits the electron density map. Click on Menu Favorites… Sequence… Chain A… Show to launch a new window with the sequence of residues in chain A. Use Shift-drag to select residues 65-67 (representing the chromophore).

Q: Do you think that the chromophore in chain A of this structure agrees well with the electron density map? What do you think about the agreement between map and model for the residues neighboring the chromophore?

Focus on another region closer to the surface of the protein structure you are exploring.

Q: What do you think about the fit between the coordinate model and the electron density map for the residues you explore?

4.
Visualize and compare the structures of all of the above FPs

Load the structure of 1gfl (using the fetch option as described above). Now, one by one, load the PDB entries that closely match the Fluorescent Proteins assigned to the various teams as follows: Menu File… Fetch by ID… and type the PDB ID (e.g. 2h5o) in the box.

Superpose two structures by clicking on Menu Tools… Structure Comparison… Matchmaker. This brings up the structure alignment window:

On the left side of the new window, under Reference Structure, highlight 1gfl by clicking it once, then select the other PDB ID (e.g. 2h5o) in the right-hand section (structure to match). Now press Apply or OK. Repeat this for each of the other entries so that every structure is matched to 1gfl.

View the 6 superposed structures.

Q: How well does the overall beta-barrel in these structures overlap?

Q: How do the chromophores in these structures overlap in this superposition?

View the structure-based sequence alignment of these structures by clicking on Menu Tools… Structure Comparison… Match -> Align. A new window called Create Alignment From Superposition will open. The superposed chains should already be selected; if not, click on them to select them, then click OK to compute the sequence alignment.

Q: Based on the sequence alignment, can you identify an absolutely conserved Arg and an absolutely conserved Glu, both located in the core of the beta barrel? (Hint: Select each residue from the sequence alignment by click-shift-drag and display the side chains using Menu Action… Atoms/Bonds… Show.)

Q: Comment on the relationship of these conserved residues and the chromophores in these proteins.

5. Explore the sequence and structural neighbors of the PDB entry (that you are exploring) and find the most distant neighbor.

Open the structure summary page for the PDB entry that you are exploring by typing the PDB ID in the top search box on the RCSB PDB website (www.rcsb.org). Click on the Sequence Similarity tab.
Open any of the clusters to see what types of proteins are closely related to the GFP protein by sequence.

Q: What types of proteins do you see here? Name any three from the 30% cluster.

Note that there is a dramatic difference in the number of chains in the 30% cluster compared to the 40% cluster. All proteins in the 40% cluster are clearly related in sequence and structure. Those in the 30% cluster may or may not have the same structure and function. (To learn more about this, read the article by Sander and Schneider, 1991, Proteins: Structure, Function and Genetics 9:56-68.)

For structure-based comparisons in the PDB, the top-ranking protein in the 40% sequence cluster is considered to be a representative of the PDB entry that you are working with (e.g. 1gfl). Click on the 3D Similarity tab to see the structures in the PDB that are structurally similar to the PDB entry that you are exploring.

Click on the structure comparison results to see the PDB entries that are structurally related to the PDB entry you are exploring (e.g. 1gfl). Click on the title of the rmsd column to sort by the rmsd values of the matched structures. The structure with the highest rmsd value is probably the most distant relative.

Q: What is the most distant relative of the PDB entry you are exploring? List the PDB ID/Chain ID and the name of the protein.

6. What other protein(s) may be related to your protein of interest? Draw a phylogenetic tree to describe its/their relationship with the Fluorescent Proteins being studied here.

Go to the PDB entry of any one protein that you identified above as a distant structural relative of your protein of interest and get the FASTA sequence of the relevant chain ID. Run a PSI-Blast on this sequence (at http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome) to identify homologous proteins/domains.
Paste the FASTA sequence into the query box on that page and select the options shown below:

Remember to change the Database to UniProtKB/SwissProt and select the algorithm PSI-BLAST. In the algorithm parameters, retrieve as many sequences as possible by setting Max target sequences to 20000. Click on the Blast button to start the search.

In the Results page, look for proteins that are not orthologs (the same protein as your query but from different organisms) but are related to the query sequence. Repeat the PSI-Blast (up to 3 times) to see if other proteins show up in the results.

After the 3rd iteration, look for sequences that have a sufficiently low E value but low sequence identity.

Q: What protein(s)/domains did you find using this search? Save the FASTA format sequence of this protein/domain.

To make a phylogenetic tree, go to the website http://www.cbrg.ethz.ch/services/PhylogeneticTree and paste the sequences of all the fluorescent proteins that you and the other groups have been working with, together with the sequences of the farthest structural/sequence relatives (identified in 5 and 6 above). For each sequence, include the tags <E><SEQ> and </SEQ></E> at the beginning and end of the sequence respectively. Remember to list tags for all the sequences (in the order that you upload the sequences) and select both the Distance and Parsimony modes of calculation. All other options on the page can be left at their default values. Click on the submit button to generate the phylogeny tree as follows:

Q: Comment on the evolutionary relationship of these proteins.
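
Wrapping many sequences in the <E><SEQ> and </SEQ></E> tags by hand is error-prone, so the tagging step for the phylogenetic tree server could be sketched in Python as below. This is only a minimal sketch: the function names parse_fasta and wrap_sequences are made up for illustration, the one-entry-per-line layout is an assumption about what the server accepts, and any sequences you feed it should be the real FASTA records you saved in the earlier steps.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into a list of (label, sequence) pairs."""
    records = []
    label, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:].strip(), []
        elif label is not None:
            chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


def wrap_sequences(records):
    """Return (tagged_text, labels) for a list of (label, sequence) pairs.

    Each sequence is wrapped as <E><SEQ>...</SEQ></E>, one entry per line
    (assumed layout). The labels are returned in the same order so they
    can be listed on the form in the order the sequences are pasted.
    """
    tagged, labels = [], []
    for label, seq in records:
        # Remove whitespace left over from FASTA line wrapping.
        cleaned = "".join(seq.split()).upper()
        if not cleaned.isalpha():
            raise ValueError(f"{label}: sequence contains non-letter characters")
        tagged.append(f"<E><SEQ>{cleaned}</SEQ></E>")
        labels.append(label)
    return "\n".join(tagged), labels
```

To use it, read the saved FASTA file(s) into one string, pass it through parse_fasta and then wrap_sequences, paste the tagged text into the sequence box, and enter the labels in the same order.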