What distribution of peptides results from digesting proteins with trypsin?

This tour guides you through a computational experiment that you can perform within BioBIKE. To get to BioBIKE, go to http://ixion.csbc.vcu.edu:8003/biologin and enter a login name (letters only, no spaces). No password is necessary.

This demonstration is best viewed as a slide show, enabling you to simulate a session and making changes in cursor position more obvious. To do this, click Slide Show on the top tool bar, then View show. Click anywhere to go on to the next slide.

What do you get when you cut all the proteins of an organism with trypsin, a proteolytic enzyme that cuts proteins after lysine (K) and arginine (R)? You get a mess of peptides, of course, but how small? How many of each size class? We can answer the question through a computational experiment: teaching BioBIKE to be trypsin and applying the enzyme to an organism's worth of proteins. Let's learn how to do this with one protein sequence and then apply the lesson to all the proteins of an organism.

First get a sequence of a protein. Perhaps a protein from our favorite cyanobacterium, ss120. I'll choose p-pro0047. Type it in the box and press Enter. Click “Execute” in the Action Menu to evaluate the expression and see the sequence in the lower frame.

Ok. There it is, in all its glory. Or rather 200aa of its glory. But that is neither here nor there. We want to cut the sequence (or split it) after "K" and "R". Select “Split” to begin thinking like an enzyme.

It looks like BioBIKE would appreciate a string here. Fair enough, let’s give it one: a sequence is a string. My strategy is to cut and paste the SEQUENCE-OF command into the argument of SPLIT. Click in the input box of SPLIT. Over in our sequence-calling node, select “Cut” to cut this function, and paste it into the SPLIT function. Done and done.

Let’s give SPLIT a little guidance by telling it where to cut the string. Trypsin cuts after every "K" and "R".
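For readers who want to see the cutting rule outside BioBIKE, here is a minimal Python sketch of splitting a string after chosen letters. It is only an analogue of what the tour asks SPLIT to do, not BioBIKE's implementation; the function name `split_after` and the toy sequence are made up for illustration.

```python
def split_after(sequence, letters):
    """Cut `sequence` after every character found in `letters`.

    Hypothetical helper: mimics the tour's rule (cut after K and R),
    keeping the cut letter at the end of each fragment.
    """
    fragments = []
    current = ""
    for residue in sequence:
        current += residue
        if residue in letters:
            # The cut falls right after this residue.
            fragments.append(current)
            current = ""
    if current:
        # Whatever follows the last cut site is the final fragment.
        fragments.append(current)
    return fragments


toy = "MKTAYRAKQRLLE"  # made-up sequence, not p-Pro0047

print(split_after(toy, "K"))   # cut after K only
print(split_after(toy, "KR"))  # cut after K and R
```

Comparing the two printed lists by eye against the toy sequence is the same kind of check the tour asks for: every fragment except possibly the last should end in one of the cut letters.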
Let's try "K" first. Enter "K" and execute your SPLIT command to see what it does.

The result of splitting the sequence after every K appears in the lower frame. Take a look at it. Is it right? How can you be sure?

But trypsin cuts at both "K" and "R". You can tell SPLIT to split at either letter by supplying a list of letters. Erase "K" by clicking the red x. Bring down the LIST function into the input box of EVERY. One list item comes automatically, but we need two, one for each letter. Type both letters into the input boxes.

“K” and “R” are now specified. Robot trypsin will cut after every lysine and arginine. Hit Execute to see what this looks like. Look carefully at the results. Are these fragments what you expect? Should there be others? Are there any mistakes?

Seems pretty variable… Our goal is to get a distribution of lengths of fragments produced by trypsin. Strategy: wrap LENGTHS-OF around the current function. Choose “Surround with” and click on LENGTHS-OF to evaluate the length of each fragment. Don’t forget we want the plural form, LENGTHS-OF, and not the entry directly above it in the menu. Hit “Execute.”

And there they are: the length of each fragment of the protein p-Pro0047 as digested by trypsin. Do the numbers make sense? Check by comparing these numbers with the peptides of the previous result. Do the lengths agree?

Now that we've taught BioBIKE how to digest a single protein, let’s do all the proteins in an organism. Erase p-pro0047, click the input box, and choose PROTEINS-OF from the GENOME menu. Enter ss120 into the input box of PROTEINS-OF, and press Enter. Then execute LENGTHS-OF again (making sure to choose Execute from the action menu of LENGTHS-OF) to display the lengths of the fragments produced by digesting all proteins in ss120 by trypsin.

Are these numbers correct? Check by… we seem to have skipped a step! We haven't produced any segments to compare the numbers to. OK, do it now, by executing the SPLIT function, using Execute from SPLIT's action menu.
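The single-protein length check described above ("Do the lengths agree?") can also be automated. The following Python sketch is an illustration under stated assumptions, not BioBIKE code: `tryptic_fragments`, `lengths_agree`, and the toy sequence are hypothetical names invented here.

```python
def tryptic_fragments(sequence):
    """Cut after every K and R (the tour's rule). Hypothetical helper."""
    fragments = []
    current = ""
    for residue in sequence:
        current += residue
        if residue in "KR":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


def lengths_agree(sequence, fragments, lengths):
    """Return True if the reported lengths are consistent with the digest.

    Three independent checks: each length matches its fragment, the
    fragments rejoin into the original sequence, and the lengths add
    up to the length of the sequence.
    """
    return (
        lengths == [len(fragment) for fragment in fragments]
        and "".join(fragments) == sequence
        and sum(lengths) == len(sequence)
    )


toy = "MKTAYRAKQRLLE"  # made-up sequence, not p-Pro0047
fragments = tryptic_fragments(toy)
lengths = [len(fragment) for fragment in fragments]

print(fragments)
print(lengths)
print(lengths_agree(toy, fragments, lengths))
```

A check like this only confirms internal consistency; it cannot tell you whether the cutting rule itself is the one you intended, which is why the tour also asks you to look at the fragments.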
Now you can check, comparing the lengths of these sequences with the numbers in the previous result. This time the numbers and lengths don't agree at all! (31 31 75…) isn't (2 17 5…). Why?

One clue is the double parentheses preceding the list. If you scroll through your results, you'll find that the result is in the form ((…)(…)…). BioBIKE is actually returning the number of elements in each sublist. If those extra parentheses just went away, the function should work. We can do this with the SIMPLIFY-LIST function, which goes around your SPLIT node and combines all the sublists into a single list. Surround SPLIT and wrap SIMPLIFY-LIST around it.

Execute SIMPLIFY-LIST, just to make sure it works before executing LENGTHS-OF. No more ((double parentheses))! But does this do any good? Execute the entire function. Again, there they are. Now do the numbers make sense?

Now that it seems to work, we want to package the procedure into a function so that we never have to think about it again. First we need to get more screen space. Select “collapse” from the SIMPLIFY-LIST node. Bring DEFINE-FUNCTION down into the workspace. It’s up in the DEFINITION menu. Give it a catchy name. Then describe the function so you (and perhaps others) can get a basic idea of what it does from a helpful summary. Give a descriptive name to the argument: the information the function acts on. TRYPSIN-DIGEST-OF will act on proteins.

The body of the function is simply the procedure we perfected using proteins of ss120 as the test case. Copy that function and paste it into the body, using the menu obtained by clicking the box's green action icon. Now that it's there, expand it to see what we got.

Whoops! We left it working on a specific case: all proteins of ss120. We need to make it work on the general case, whatever the user provides as the argument, proteins. Delete the PROTEINS-OF node and click on the input box of SEQUENCE-OF…
…to type “proteins,” which is the variable we asked TRYPSIN-DIGEST-OF to look for in the first place. Executing DEFINE-FUNCTION adds our function to BioBIKE.

Check it out. No, seriously. Check it out. If the function is correct, it should replace what we had before and give exactly the same result. Start off by deleting what we had before. Click on the input box of LENGTHS-OF, and put our new function inside it. Notice that you can get the function from the new FUNCTIONS button. The function you made is now part of the language.

Click on the input box of TRYPSIN-DIGEST-OF, and put PROTEINS-OF (from the Genes-Protein menu) inside it. Fill its input box with ss120. This is exactly what we did before (I hope), except that the complicated SIMPLIFY-LIST function is now encapsulated in TRYPSIN-DIGEST-OF. Execute it (the moment of truth). Are these results the same as before?

Now that you've confirmed the results, we can trust the new function (more or less). So let's get rid of it. Much less cluttered!

Now the problem is to analyze all those numbers. Strategy: count how many there are in each size class, then hand the results over to Excel to make a graph. The BIN-DATA-OF function will do the counting, according to classes that I define. The function should act on the results I just obtained. Click on the input box of BIN-DATA-OF. I could give those results a name, using DEFINE, but in this case I'll refer to the result using the PREVIOUS-RESULT function.

Sometimes it's desirable to make bins that combine different classes, but this time I'll put each size class in a separate bin. So the bin width is set to 1, and I make a guess that I'm not going to find a fragment bigger than 500 amino acids. After typing in these values, I execute the function. The results tell me that there are 0 fragments of length 0 up to 1, 7958 fragments of length 1 up to 2, and so forth.

To save the result to a file for import into Excel, I use the WRITE function.
The material to write is the previous result, which I can copy and paste into the appropriate input box. The file name is whatever I wish, so long as the name is in quotes. It is also desirable to give it a .txt extension, so that it can be automatically opened by Notepad or similar. Finally, Excel likes tab-delimited files, so I choose that format and execute the resulting function.

Executing the function writes the binned data to my personal file space on the BioBIKE server. To see this file and to download it to my own computer, I go to the BioBIKE Files menu. This gets me to the directory of my personal file space. Clicking on the file gets me a view of the file. I can use the usual browser controls to download the file to my own computer.

Now, in Excel, you can make an x-y scatter plot to visualize the distribution of trypsin fragments resulting from the digestion of all proteins of Prochlorococcus ss120.

What distribution of peptides results from digesting proteins with trypsin?

In this tour, you've seen:
- How to simulate a digestion of a protein by trypsin
- How to digest all proteins of an organism at once
- How to package a useful procedure into a new function
- How to format numeric results for a histogram plot

and a very important general lesson:
- The importance of checking every result to avoid being fooled by the computer.
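As a recap, the whole procedure (digest every protein, flatten the per-protein sublists, bin the lengths with a width of 1, and write a tab-delimited file) can be sketched in Python. This is an illustrative analogue under stated assumptions, not BioBIKE's implementation: the function names only echo the tour's, and the toy proteins, the maximum bin of 10, and the output file name are made up.

```python
import csv
import os
import tempfile
from collections import Counter


def split_after_k_or_r(sequence):
    """Cut one sequence after every K and R (the tour's rule)."""
    fragments = []
    current = ""
    for residue in sequence:
        current += residue
        if residue in "KR":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


def trypsin_digest_of(proteins):
    """Digest every sequence and return ONE flat list of fragments."""
    fragments = []
    for sequence in proteins:
        # extend (not append) keeps the list flat, playing the role
        # that SIMPLIFY-LIST plays in the tour.
        fragments.extend(split_after_k_or_r(sequence))
    return fragments


def bin_lengths(lengths, max_length):
    """Count lengths in bins of width 1, from 0 through max_length.

    Lengths above max_length are ignored here, so a too-small guess
    silently drops data: one more result worth checking.
    """
    counts = Counter(lengths)
    return [(size, counts[size]) for size in range(max_length + 1)]


def write_tab_delimited(rows, path):
    """Write rows as tab-separated text, one row per line."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)


toy_proteins = ["MKTAYRAKQRLLE", "GGKR", "AAAA"]  # made-up sequences

# The pitfall from the tour: taking lengths of a nested list counts
# fragments per protein instead of residues per fragment.
nested = [split_after_k_or_r(protein) for protein in toy_proteins]
fragments_per_protein = [len(sublist) for sublist in nested]

fragments = trypsin_digest_of(toy_proteins)
lengths = [len(fragment) for fragment in fragments]
table = bin_lengths(lengths, max_length=10)

out_path = os.path.join(tempfile.gettempdir(), "toy-digest.txt")
write_tab_delimited(table, out_path)

print(fragments_per_protein)
print(lengths)
```

Printing both lists side by side shows why the numbers disagreed in the tour: the first list has one entry per protein, the second has one entry per fragment.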