
Bioinformatics II:
Structural similarities between different proteins:
VAST and Deep View
Created by: Matthew Cordes 11/08/05
Lesson Date: 11/09/05 – W
Last week we learned how to use BLAST to compare an amino-acid sequence to other sequences in online
databases to see if any of them are significantly similar, including any that have a solved structure. We also
learned how to do Conserved Domain Database searches that compare a sequence to a database of
sequence patterns/profiles that are common to all members of a particular protein domain family. Both of
these involve inferring relationships using amino-acid sequence data alone.
An important feature of proteins, however, is that their sequences often do not remain the same or even
very similar across time. Related proteins may acquire differences in sequence through mutation until it is
no longer possible to see that they are related by comparing the sequences! However, they may retain
similarities in structure and function despite having lost all similarity in sequence. Such similarities may be
revealed by structural database similarity searches involving structural alignments/comparisons. This
approach is very powerful, but it has two major limitations: 1) you have to know the structure of your
protein, which is much harder to determine than its sequence! 2) because structures are harder to obtain
than sequences, structure databases like the PDB contain far fewer entries than sequence databases. Thus,
if no one happens to have solved a protein structure similar to yours, you won't get any hits.
There are many web-based tools for structural similarity searches. The NCBI site emphasizes a software
tool called VAST. Another commonly used one is DALI. Here we will conduct a VAST search of the
PDB for similar structures using the copper-binding protein azurin as a query structure. We will then
compare the hits in this search to the hits from a BLAST search using the sequence of azurin as a query
sequence. We will see that there are proteins of known structure which are similar to azurin in structure but
not in sequence. We will then use Deep View to superimpose two of these structures and see how
similar/different they are.
1. Setting up a VAST search for structures similar to a copper-binding protein called azurin from the
bacterium Pseudomonas aeruginosa.
 • Copy the PDB file 1AZU from the instructor_deposit directory.
 • Go to http://www.ncbi.nlm.nih.gov/
 • Click on Structure (top menu bar).
 • Click on VAST SEARCH along the left side of the window.
 • Click on Choose file and select the PDB file 1AZU from your directory.
 • Leave the box labelled "medium-redundancy subset of the PDB" checked. This will reduce search time
by eliminating multiple hits to different protein structures that have the same or nearly the same
amino-acid sequence.
 • Click on submit to initiate a VAST search for structures similar to 1AZU.
 • First VAST will try to cut your protein into domains before it searches with each domain. A window
will pop up with a summary of the domains it finds. In the case of 1AZU there is only one domain.
 • Click on START at the bottom of the page to do the actual VAST search. VAST will now try to align
your structure to each structure in the database using the positions of alpha-helices and beta-sheets as a
guide.
 • This will take a few minutes. Leave the browser window open and go on to the next part.
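The PDB file submitted as the VAST query is a plain-text file with fixed-width columns. As a rough illustration of the kind of information a structure tool reads from it, the sketch below extracts the C-alpha atoms from ATOM records. The function name `parse_ca_atoms` and the sample records are hypothetical: the coordinates are made up and are not taken from 1AZU. The column positions assume the standard PDB ATOM record layout.

```python
def parse_ca_atoms(pdb_text):
    """Return (chain, residue_number, residue_name, (x, y, z)) for each C-alpha atom.

    Assumes standard fixed-width PDB ATOM records; HETATM records
    (for example a bound metal ion) are skipped.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":      # atom name, columns 13-16
            continue
        res_name = line[17:20].strip()       # residue name, columns 18-20
        chain = line[21]                     # chain identifier, column 22
        res_num = int(line[22:26])           # residue number, columns 23-26
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((chain, res_num, res_name, xyz))
    return atoms


# Made-up records for illustration only (not real 1AZU coordinates).
SAMPLE_PDB = """\
ATOM      1  N   ALA A   1      11.104   6.134  -6.504
ATOM      2  CA  ALA A   1      11.639   6.071  -5.147
ATOM      3  C   ALA A   1      13.000   5.500  -5.000
ATOM      4  N   GLU A   2      14.000   5.000  -4.000
ATOM      5  CA  GLU A   2      15.250   4.500  -3.500
HETATM    6 CU    CU A 129      20.000  10.000   5.000
"""

for chain, res_num, res_name, xyz in parse_ca_atoms(SAMPLE_PDB):
    print(chain, res_num, res_name, xyz)
```

To try it on a real file, read the text with `open("1AZU.pdb").read()` (the file name is an assumption about how you saved the copy) and pass it to `parse_ca_atoms`.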
2. Setting up a BLAST search to look for sequences in the PDB similar to azurin.
 • Open a second browser window and go to http://www.ncbi.nlm.nih.gov/ again.
 • Click on BLAST (top menu bar).
 • Click on protein-protein BLAST.
 • Enter the accession code 1AZU in the query box.
 • Limit the search to the PDB under the choose database option.
 • Leave the conserved domain search box checked (same as last week).
 • Click BLAST!
 • Note that the CD search finds a copper-binding domain.
 • FORMAT your BLAST results.
 • Leave the results window open and return to your VAST search window. Hopefully the search will be
done now.
3. Looking at the VAST search results and comparing them to the BLAST results.
 • Click on the words "entire chain" to view the VAST search results.
 • The formatted results list the hits from best to worst, sorted by default according to how much of
the structure (how many residues) the program was able to reasonably align. The red bars next to each
hit show the regions of the sequence that aligned structurally to the query. Note that for very few of the
hits is VAST able to see similarity across more than about 75% of the sequence.
 • At the top of the page, under List, change the hit sorting to sort by VAST P-value. The P-value is the
probability that the detected structural similarity would have occurred by pure chance, so the smaller
the P-value, the more significant the hit. Click List to update the sorting.
 • Make a note of a few of the PDB codes for some of the better hits in the VAST search. Move the
cursor over the PDB codes to see annotations of the hits. Do most of the best hits appear to be
copper-binding proteins?
 • Now look in your BLAST results page and test a few of the VAST hits to see if they also show up as
hits in the BLAST search. The best way to do this is to use the Find... function under the Edit menu in
Internet Explorer, and type the PDB code into the search box. Do all the VAST hits also show up as
BLAST hits with E-values less than 0.001?
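The manual check in the last step can also be written down as a small comparison. This is only a sketch: `compare_hits` is a hypothetical helper, and the PDB codes and E-values below are placeholders, not real search output. Fill in the codes and E-values you noted from your own VAST and BLAST result pages.

```python
def compare_hits(vast_hits, blast_evalues, cutoff=0.001):
    """Split VAST hits into those BLAST also reports below the E-value cutoff
    and those it misses (absent from BLAST, or E-value at or above the cutoff)."""
    normalized = {code.upper(): e for code, e in blast_evalues.items()}
    found, missed = [], []
    for code in vast_hits:
        evalue = normalized.get(code.upper())
        if evalue is not None and evalue < cutoff:
            found.append(code)
        else:
            missed.append(code)
    return found, missed


# Placeholder data for illustration only.
vast_hits = ["AAAA", "BBBB", "CCCC"]
blast_evalues = {"aaaa": 1e-30, "BBBB": 0.5}

found, missed = compare_hits(vast_hits, blast_evalues)
print("Also significant in BLAST:", found)
print("Structural hits BLAST misses:", missed)
```

Any code that ends up in the second list is a protein that resembles the query in structure without a significant sequence match.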
4. Deep View superposition of 1AZU and 1A8Z. Having a first-hand look at structural similarities.
 • Now we're going to look at a structure superposition corresponding to a VAST hit. Go back to the
VAST results page and click on 1A8Z. This will link you to the Entrez MMDB structure summary
page. From there, you can link to the PDB and download the coordinates for 1A8Z. Either do this
yourself, or simply copy the 1A8Z PDB file out of the instructor deposit folder and into your own
folder.
 • Open Deep View (Swiss PDB Viewer) and load both structures.
 • Show them as C-alpha traces only, with no side chains displayed. Color the traces different colors.
 • Under Fit, do Magic fit. In the selection box, choose CA atoms only, and make sure the layers
involved say 1AZU and 1A8Z. Click OK and note what happens. The two structures should now be at
least partly superimposed.
 • Under Fit, choose Improve fit. Notice that the fit gets better. Do the Improve fit once more.
 • Now look in the Control panel window for each layer: only some of the residues are selected (red).
These are the ones the program was able to easily superimpose in the two structures. Turn the show
checkmarks off for all residues in each layer by shift-clicking at the top of the show column.
 • Without changing the residues selected, shift-click at the top of the ribbon column to show the ribbon
for all residues. Use the control panel to color the ribbons for the two structures differently. Under
preferences/ribbon, click render as solid ribbon. Under the display menu, select use OpenGL
rendering. You should now see nice ribbon diagrams for the superposition. Note that some parts
overlay well and others don't.
 • Click on ribbon in the control panel window in both layers, to show the ribbon for just the aligned
portion of the structures. What kind of structural element do the proteins seem to share in common?
 • Now select and display the Cu atom in each structure (it's at the bottom of the atom list in the control
panel). Is it bound in the same position in the two structures?
 • Now select only the Cu atom in both layers and use the Neighbors of selected aa... function under the
Select menu, to display only groups that are within 4.0 Å of the Cu atom. Use the label function in
the toolbar to identify what these residues are. Are they the same in both structures?
 • Under the Window menu, choose Sequence alignment. This will display an alignment of the two
sequences based on the superposition of the structures.
 • Click the little page icon in the far left corner of the alignment window. A new window should pop up
with a more convenient view of the alignment. The region of the alignment between the two sequences
is about 145 amino acids long. The asterisks indicate which amino acids are identical. How many are
identical? Approximately what percentage of the amino acids in the alignment are identical?
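Counting asterisks by eye works, but the same arithmetic can be sketched in a few lines. The helper `percent_identity` is hypothetical and the two aligned strings are invented for illustration; they are not the azurin alignment. The sketch assumes gaps are written as `-` and counts only columns where both sequences have a residue, which is one of several possible conventions for the denominator.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Return (identical, aligned, percent) for two aligned sequences of equal length.

    Columns containing a gap in either sequence are ignored (an assumption;
    other conventions divide by the full alignment length instead).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            identical += 1
    percent = 100.0 * identical / aligned if aligned else 0.0
    return identical, aligned, percent


# Invented aligned sequences, for illustration only.
identical, aligned, percent = percent_identity("ACDE-FGH", "ACNEWF-H")
print(f"{identical} of {aligned} aligned positions identical ({percent:.1f}%)")
```

Paste the two rows of your own Deep View alignment into the call to compare the result with your hand count.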
What this exercise should tell you is that there can be similarities and relationships between the structures
and functions of proteins that aren't evident from simple comparisons of the amino acid sequences in a
BLAST search!
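The structural side of this comparison can be given numbers too. The sketch below is a hedged illustration, not what Deep View actually runs. It computes the root-mean-square deviation (RMSD) between matched C-alpha positions that are assumed to be already superimposed (it does not perform the fit itself), and it lists atoms within 4.0 Å of a chosen center such as the Cu atom. The function names, labels, and coordinates are all hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the points are already paired and superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def neighbors_within(center, atoms, cutoff=4.0):
    """Return labels of atoms whose distance from center is at most cutoff (in Å)."""
    return [label for label, xyz in atoms if math.dist(center, xyz) <= cutoff]


# Made-up coordinates for illustration only.
trace_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
trace_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print("RMSD:", rmsd(trace_a, trace_b))

copper = (0.0, 0.0, 0.0)
atoms = [
    ("residue A", (2.0, 0.0, 0.0)),
    ("residue B", (0.0, 3.0, 0.0)),
    ("residue C", (0.0, 0.0, 5.0)),
    ("residue D", (3.0, 3.0, 0.0)),
]
print("Within 4.0 Å:", neighbors_within(copper, atoms))
```

A low RMSD over many matched C-alpha atoms, together with a low percent identity, is the situation this exercise is meant to illustrate.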