Protein Data Bank

The Protein Data Bank (PDB) is a great source of information about proteins. In addition to performing BLAST matches, it provides links to published literature and lets the user view the structure of a protein. It offers many ways to search for matches; the most useful when annotating is the search by sequence.

This is the front page of the PDB website. To search, click on the search tab in the upper left-hand corner.

PDB gives many search options. If we’re just beginning to investigate a protein, it’s best to start with “sequence”. This page accepts protein sequences written with either the 1- or 3-letter amino acid abbreviations. Don’t paste a nucleotide sequence, as it is not accepted. Select the “Use sequence” radio button, paste your sequence, and hit search.

When the results page comes up, you will see multiple results. As an example, the first result from the search above is below. There’s a lot of information given with each result, so we’ll sort through it line by line.

The combination of letters and numbers in the left-hand corner is the PDB ID. You can search using this ID later to find this entry again. The two icons furthest to the left in the corner (the blue arrow and the list) allow you to download or view, respectively, a printout of the protein match. This printout gives a great deal of useful information in a text-only format that makes copying and pasting easy. The eye icon lets you view the structure of the protein match in Jmol. The question mark leads to a help page. The purple link at the top of the page leads to the PDB structural report for this protein.

One feature of the main results page is the color-coded BLAST results. Above the results is the numerical BLAST output. Length is the number of amino acids in the aligned sequence.
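The numerical output can be made concrete with a short Python sketch. It assumes an alignment is already available as two equal-length strings with `-` marking gaps; the function name `alignment_stats` and the example fragments are hypothetical and are not something PDB provides. It counts only the length, the identical positions, and the gaps. Conservative substitutions and the score itself are left out because they depend on a substitution matrix.

```python
def alignment_stats(query: str, subject: str, gap_char: str = "-") -> dict:
    """Summarize a pairwise alignment given as two equal-length gapped strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")

    identities = 0
    gaps = 0
    for q, s in zip(query.upper(), subject.upper()):
        if q == gap_char or s == gap_char:
            gaps += 1          # residue present in one sequence only
        elif q == s:
            identities += 1    # same amino acid in both sequences

    length = len(query)
    # Everything that is neither a gap nor an identity. This lumps together
    # conservative substitutions and true mismatches, because telling them
    # apart requires a substitution matrix.
    substitutions = length - identities - gaps

    return {
        "length": length,
        "identities": identities,
        "substitutions": substitutions,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / length if length else 0.0,
    }


# Hypothetical aligned fragments, invented for illustration only.
stats = alignment_stats("MKT-AYIAKQR", "MKTLAYLAK-R")
print(stats)
```

For these two fragments the alignment has 11 columns, of which 8 are identities, 2 are gaps, and 1 is a substitution.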
The score is calculated from three kinds of aligned positions: identities, in which the amino acid is the same in both sequences; positives, in which the amino acids differ but the algorithm assigns the substitution a positive value; and mismatches, to which it assigns a negative value. The E-value is also given; the lower the E-value, the less likely it is that the match arose by chance, and so the better the alignment of the two proteins. Gaps indicate regions in which one protein contains amino acids not found in the other protein.

PDB color-codes the results by these criteria. Identities are marked black, with the amino acid abbreviation repeated in the row between the two sequences. Positives are marked yellow, with a + between the sequences. Non-matching amino acids are marked in red. The amino acids marked grey and blue are slightly more confusing: they indicate amino acids that are present in one protein sequence but not the other (gaps).

Once you’ve looked at the BLAST results, clicking on the grey text to bring up the structural report is a good next step. You will then see a page with many blue links, each of which leads to a different page. For example, you could click on hydrolase activity under molecular function to search for all proteins that display hydrolase activity.

This page gives a great deal of data about the protein. It also supplies many useful tools. On the right-hand side of the page, you see a colorful protein with links under it. Clicking on each of those links opens the protein in a different rendering agent. Each rendering agent allows you to rotate the protein and zoom in and out. There are also different viewing modes you can take advantage of, such as space fill and ribbon.

Another useful link is the EC number, a classification number determined by the type of reaction an enzyme catalyzes. Although the names of two proteins may be different, if the EC number is the same, the proteins catalyze the same type of reaction. The link to the PubMed abstract is also very useful.
When clicked, this link opens the abstract of the paper that contains the information about this protein. In addition to the abstract, the paper lists keywords that pertain to this protein. By typing one or more of those keywords into the search box, you can search for other proteins whose abstracts list those keywords, such as carboxylic ester hydrolases.

If you need to find the protein again, simply write down the PDB ID. Then, when you’re ready to look at the protein again, type the ID into the search box at the top of the page. You are then directed back to the structural report for the protein.

Another useful feature of PDB is that, for the duration of your session, it keeps your queries and search results in memory. They can be easily accessed from the upper left corner of the page.

You may want to place your results in a spreadsheet. In this case, PDB lets you export the information you want in Excel format. To do this, go to the toolbar on the left, click on Tabulate, and then on Custom Report. The picture below shows some of the options available for data that can be included in the Excel report. Select as few or as many as you would like, then scroll down and click create report. For this example, I’ve highlighted Sequence and Keywords. Notice that selecting the bolded category on the left-hand side of the page selects all of the information directly to the right of that heading. Once you click create report, you are taken to a page that shows the attributes you selected. Click on the green Excel logo in the upper right corner to download the Excel file. The report can also be downloaded as a CSV file.
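
If you download the report as a CSV file, it can be read with a few lines of Python. The sketch below is only an illustration: the column headers (`PDB ID`, `Sequence`, `Keywords`), the placeholder row, and the helper name `load_report` are assumptions made for this example, and the headers in a real export may differ, so check the first line of your own file.

```python
import csv
import io

# Hypothetical stand-in for a downloaded report. The column names and the
# single row are invented for illustration and may not match a real export.
SAMPLE_REPORT = """PDB ID,Sequence,Keywords
XXXX,MKTAYIAKQR,HYDROLASE
"""


def load_report(handle):
    """Return the report rows as a list of dicts keyed by column header."""
    return list(csv.DictReader(handle))


rows = load_report(io.StringIO(SAMPLE_REPORT))
for row in rows:
    # Print the ID, the sequence length, and the keywords for each entry.
    print(row["PDB ID"], len(row["Sequence"]), row["Keywords"])
```

To read a real download instead of the sample text, open the file with `open("report.csv", newline="")` (the file name here is also a placeholder) and pass the resulting handle to `load_report`.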