Bioinformatics II: Structural similarities between different proteins: VAST and Deep View
Created by: Matthew Cordes 11/08/05
Lesson Date: 11/09/05 – W

Last week we learned how to use BLAST to compare an amino-acid sequence to other sequences in online databases and see whether any of them are significantly similar, including any that have a solved structure. We also learned how to run Conserved Domain Database searches, which compare a sequence to a database of sequence patterns/profiles shared by all members of a particular protein domain family. Both approaches infer relationships from amino-acid sequence data alone.

An important feature of proteins, however, is that their sequences often do not stay the same, or even very similar, over evolutionary time. Related proteins may accumulate sequence differences through mutation until it is no longer possible to tell that they are related by comparing their sequences! Yet they may retain similarities in structure and function despite having lost all detectable sequence similarity. Such similarities can be revealed by searching structure databases with structural alignments/comparisons.

This approach is very powerful, but it has two major problems:
1) You have to know the structure of your protein, which is much harder to determine than its sequence!
2) Because it is harder to determine a protein's structure than its sequence, structure databases like the PDB contain many fewer entries than sequence databases. Thus, if no one happens to have solved a structure similar to yours, you won't get any hits.

There are many web-based tools for structural similarity searches. The NCBI site emphasizes a program called VAST; another commonly used one is DALI. Here we will run a VAST search of the PDB for similar structures, using the copper-binding protein azurin as the query structure.
We will then compare the hits from this search to the hits from a BLAST search that uses the azurin sequence as the query. We will see that there are proteins of known structure that resemble azurin in structure but not in sequence. Finally, we will use Deep View to superimpose two of these structures and see how similar or different they are.

1. Setting up a VAST search for structures similar to azurin, a copper-binding protein from the bacterium Pseudomonas aeruginosa.

Copy the PDB file 1AZU from the instructor_deposit directory.
Go to http://www.ncbi.nlm.nih.gov/
Click on Structure (top menu bar).
Click on VAST SEARCH along the left side of the window.
Click on Choose file and select the PDB file 1AZU from your directory.
Leave the box labelled "medium-redundancy subset of the PDB" checked. This reduces search time by eliminating multiple hits to different protein structures that have the same or nearly the same amino-acid sequence.
Click on submit to start a VAST search for structures similar to 1AZU.
VAST will first try to cut your protein into domains and then search with each domain. A window will pop up with a summary of the domains it finds; in the case of 1AZU there is only one domain.
Click on START at the bottom of the page to run the actual VAST search. VAST will now try to align your structure to each structure in the database, using the positions of alpha-helices and beta-strands as a guide. This will take a few minutes. Leave the browser window open and go on to the next part.

2. Setting up a BLAST search to look for sequences in the PDB similar to azurin.

Open a second browser window and go to http://www.ncbi.nlm.nih.gov/ again.
Click on BLAST (top menu bar).
Click on protein-protein BLAST.
Enter the accession code 1AZU in the query box.
Under the choose database option, limit the search to the PDB.
Leave the conserved domain search box checked (same as last week).
Click BLAST! Note that the CD search finds a copper-binding domain.
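The file you upload in step 1 is plain text: each ATOM or HETATM record stores one atom's name, residue, chain, residue number and x, y, z coordinates (in Å) in fixed columns. The Python sketch below is only an illustration, not anything VAST or Deep View actually runs. It reads those records and keeps just the alpha-carbons, i.e. the "C-alpha trace" you will display later in Deep View. The function names are hypothetical, the demo coordinates are made up, and a real structure file may need handling (alternate locations, multiple models) that this sketch ignores.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    record: str    # "ATOM" (standard residue) or "HETATM" (e.g. the Cu ion)
    name: str      # atom name, e.g. "CA" or "CU"
    res_name: str  # residue name, e.g. "HIS"
    chain: str     # one-letter chain identifier
    res_seq: int   # residue number
    x: float       # coordinates in angstroms
    y: float
    z: float


def parse_pdb_atoms(text: str) -> list[Atom]:
    """Read ATOM/HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, HELIX, SHEET, etc.
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


def ca_trace(atoms: list[Atom]) -> list[Atom]:
    """Keep only the alpha-carbons of standard residues."""
    return [a for a in atoms if a.record == "ATOM" and a.name == "CA"]


def format_atom_line(record, serial, name, res_name, chain, res_seq, x, y, z):
    """Hypothetical demo helper: write one record in the same column layout."""
    return (f"{record:<6}{serial:>5} {name:<4} {res_name:>3} {chain}"
            f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")


# Made-up coordinates for illustration only (not taken from 1AZU).
demo = "\n".join([
    "HEADER    DEMO",
    format_atom_line("ATOM", 1, " N", "HIS", "A", 46, 0.0, 0.0, 0.0),
    format_atom_line("ATOM", 2, " CA", "HIS", "A", 46, 1.458, 0.0, 0.0),
    format_atom_line("HETATM", 3, "CU", "CU", "A", 129, 5.0, 5.0, 5.0),
])
atoms = parse_pdb_atoms(demo)
print([(a.res_name, a.res_seq, a.name) for a in ca_trace(atoms)])  # → [('HIS', 46, 'CA')]
```

Slicing by column rather than splitting on whitespace matters because adjacent PDB fields can run together when numbers are large.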
FORMAT your BLAST results. Leave the results window open and return to your VAST search window. With luck, the VAST search will be finished by now.

3. Looking at the VAST search results and comparing them to the BLAST results.

Click on the words "entire chain" to view the VAST search results. The formatted results list the hits from best to worst, by default ranked by how much of the structure (how many residues) the program was able to reasonably align. The red bars next to each hit show the regions of the sequence that aligned structurally to the query. Note that for very few of the hits is VAST able to detect similarity across more than about 75% of the sequence.
At the top of the page, under List, change the hit sorting to sort by VAST P-value. The P-value is the probability that a structural similarity at least this good would have occurred by pure chance. A low P-value therefore means the similarity is unlikely to be a coincidence, while a high P-value means it may be insignificant. Click List to update the sorting.
Make a note of the PDB codes of some of the better hits in the VAST search. Move the cursor over the PDB codes to see annotations of the hits. Do most of the best hits appear to be copper-binding proteins?
Now go to your BLAST results page and check a few of the VAST hits to see whether they also show up as hits in the BLAST search. The best way to do this is to use the Find... function under the Edit menu in Internet Explorer and type the PDB code into the search box. Do all the VAST hits also show up as BLAST hits with E-values less than 0.001?

4. Deep View superposition of 1AZU and 1A8Z: a first-hand look at structural similarities.

Now we're going to look at a structure superposition corresponding to a VAST hit. Go back to the VAST results page and click on 1A8Z. This will take you to the Entrez MMDB structure summary page, from which you can link to the PDB and download the coordinates for 1A8Z. Either do this yourself, or simply copy the 1A8Z PDB file from the instructor_deposit folder into your own folder.
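The cross-check at the end of step 3 amounts to a set comparison: sort the VAST hits by P-value, then ask which of them have no BLAST hit with an E-value below 0.001. The minimal Python sketch below assumes you have copied the hits out of the two results pages by hand. The function names, the "HIT" codes, and all of the numbers are invented placeholders, not real search results.

```python
def sort_by_p_value(vast_hits):
    """Sort (pdb_code, p_value) pairs from most to least significant."""
    return sorted(vast_hits, key=lambda hit: hit[1])


def missing_from_blast(vast_hits, blast_hits, e_cutoff=1e-3):
    """Return VAST hits, in P-value order, with no BLAST hit below e_cutoff.

    vast_hits:  list of (pdb_code, p_value) pairs
    blast_hits: dict mapping pdb_code -> best BLAST E-value for that entry
    """
    # PDB codes are case-insensitive, so compare them in upper case.
    significant = {code.upper() for code, e in blast_hits.items() if e < e_cutoff}
    return [code for code, _ in sort_by_p_value(vast_hits)
            if code.upper() not in significant]


# Placeholder hits, made up for illustration.
vast = [("HIT3", 1e-4), ("HIT1", 1e-9), ("HIT2", 1e-6)]
blast = {"HIT1": 1e-40, "HIT2": 0.5}
print(missing_from_blast(vast, blast))  # → ['HIT2', 'HIT3']
```

Any code this returns is a structural hit that sequence comparison alone would not have flagged as significant.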
Open Deep View (Swiss-PdbViewer) and load both structures. Display them as C-alpha traces only, with no side chains shown. Color the two traces different colors.
Under Fit, choose Magic fit. In the selection box, choose CA atoms only, and make sure the layers involved are 1AZU and 1A8Z. Click OK and note what happens. The two structures should now be at least partly superimposed.
Under Fit, choose Improve fit. Notice that the fit gets better. Run Improve fit once more.
Now look in the Control panel window for each layer: only some of the residues are selected (red). These are the residues the program was able to superimpose easily in the two structures.
Turn off the show checkmarks for all residues in each layer by shift-clicking at the top of the show column.
Without changing the residues selected, shift-click at the top of the ribbon column to show the ribbon for all residues.
Use the Control panel to color the ribbons for the two structures differently.
Under Preferences/Ribbon, click render as solid ribbon.
Under the Display menu, select use OpenGL rendering. You should now see nice ribbon diagrams of the superposition. Note that some parts overlay well and others don't.
Click on ribbon in the Control panel window in both layers to show the ribbon only for the aligned portion of the structures. What kind of structural element do the proteins seem to have in common?
Now select and display the Cu atom in each structure (it's at the bottom of the atom list in the Control panel). Is it bound in the same position in the two structures?
Now select only the Cu atom in both layers and use the Neighbors of selected aa... function under the Select menu to display only groups within 4.0 Å of the Cu atom. Use the label function in the toolbar to identify these residues. Are they the same in both structures?
Under the Window menu, choose Sequence alignment. This will display an alignment of the two sequences based on the superposition of the structures.
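The idea behind Magic fit and Improve fit can be approximated, in spirit, by least-squares superposition. The Python/NumPy sketch below uses the Kabsch algorithm to find the rotation and translation that best overlay paired CA coordinates, then refits using only the pairs that end up closer than a cutoff. This loosely resembles how only some residues stay selected after Improve fit. It is an assumed, generic method and not Deep View's own algorithm: in particular, it assumes the residue pairing is already known, whereas Magic fit works out the correspondence itself. The 3.0 Å cutoff, the function names, and the toy coordinates are illustrative choices. A final helper mirrors the 4.0 Å neighbor search around the Cu atom, but returns atoms rather than whole residues.

```python
import numpy as np


def kabsch(P, Q):
    """Rotation R and translation t minimizing sum ||R p_i + t - q_i||^2.

    P, Q: (N, 3) arrays of paired coordinates (row i of P matches row i of Q).
    """
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)                    # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0  # avoid a mirror image
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


def rmsd(A, B):
    """Root-mean-square distance between paired rows of A and B."""
    return float(np.sqrt(((A - B) ** 2).sum(axis=1).mean()))


def iterative_fit(P, Q, cutoff=3.0, rounds=5):
    """Fit all pairs, then refit on pairs closer than `cutoff` angstroms."""
    keep = np.ones(len(P), dtype=bool)
    for _ in range(rounds):
        R, t = kabsch(P[keep], Q[keep])
        dist = np.linalg.norm(P @ R.T + t - Q, axis=1)
        new_keep = dist < cutoff
        if new_keep.sum() < 3 or np.array_equal(new_keep, keep):
            break  # too few pairs left, or the selection has stopped changing
        keep = new_keep
    R, t = kabsch(P[keep], Q[keep])  # final fit on the retained pairs
    return R, t, keep


def neighbors_within(center, coords, cutoff=4.0):
    """Indices of rows in `coords` within `cutoff` angstroms of `center`."""
    return np.flatnonzero(np.linalg.norm(coords - center, axis=1) <= cutoff)


# Toy data: 8 cube corners plus a center point (made up, not real proteins).
Q = 10.0 * np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
                    + [[0, 0, 0]], dtype=float)
a = np.pi / 5
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0, 0.0, 1.0]])
P = Q @ Rz.T              # the same points, rotated
P[8] += [0.0, 0.0, 9.0]   # one pair that no longer matches
R, t, keep = iterative_fit(P, Q)
print(keep.sum(), round(rmsd(P[keep] @ R.T + t, Q[keep]), 6))  # → 8 0.0

cu = np.zeros(3)
atoms = np.array([[2.0, 0.0, 0.0], [0.0, 3.9, 0.0], [5.0, 0.0, 0.0]])
print(neighbors_within(cu, atoms))  # → [0 1]
```

In the toy data the center point has been moved 9 Å, so the first fit over all nine pairs leaves it 8 Å off while the eight cube corners are only 1 Å off. The refit drops the center point, and the corners then overlay exactly, much as the well-matched parts of 1AZU and 1A8Z overlay while other regions do not.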
Click the little page icon in the far left corner of the alignment window. A new window should pop up with a more convenient view of the alignment. The aligned region of the two sequences is about 145 amino acids long, and the asterisks indicate which amino acids are identical. How many are identical? Approximately what percentage of the amino acids in the alignment are identical?

What this exercise should tell you is that there can be similarities and relationships between the structures and functions of proteins that aren't evident from simple comparisons of their amino-acid sequences in a BLAST search!
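Counting the asterisks and converting to a percentage is simple arithmetic: percent identity = (identical positions / alignment length) × 100. For example, 29 identities in a 145-position alignment (a made-up number) would give 29/145 = 20%. The Python sketch below does the same count from two aligned sequences written with '-' for gaps. The function name and toy sequences are hypothetical, and dividing by the full alignment length is only one of several conventions programs use, so other tools may report slightly different values.

```python
def identity_stats(seq_a: str, seq_b: str, gap: str = "-"):
    """Return (identical count, asterisk line, percent identity).

    Percent identity = identical columns / total alignment length * 100,
    with gapped columns counted in the length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Mark a column with '*' when both residues match and are not gaps.
    marks = "".join("*" if a == b and a != gap else " "
                    for a, b in zip(seq_a, seq_b))
    identical = marks.count("*")
    return identical, marks, 100.0 * identical / len(seq_a)


# Toy aligned sequences (not azurin or 1A8Z).
n, marks, pct = identity_stats("ACDEFG-HIK", "ACDQFGLHIR")
print(n, f"{pct:.1f}%")  # → 7 70.0%
```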