Bioinformatics Assignment (BLUE is the same assignment done on the Biology Workbench)

1. Translation of DNA into Amino Acid sequence
Log on to the ISU Bioinformatics Portal (contact Mike Thomas, mthomas@isu.edu).
Username: #####
Password: #####
On the left side you will see a listing of nearly 20 software programs.
Click on the file labeled EMBOSS, click on Nucleic, click on Translation, click on transeq.
Under the input Section, enter your DNA sequence in the box marked Actual data.
In the advanced Section, under Frame(s) to translate, choose All six frames.
Scroll back up to the top and change the default email address to your own (otherwise I will delete any emails sent to me).
To begin the search, click on Submit transeq at the top of the page.
When your results are returned, you will see Results: at the top of the page; click on outseq.out.
You will see 6 different results, one for each reading frame.
If a sequence has an asterisk (*), that means there is a stop codon.
A sequence with an * is meaningless, and you should immediately disregard it as an option.
Copy the remaining sequences and proceed to the next step.

1. Translation of DNA into Amino Acid sequence (Biology Workbench)
Go to workbench.sdsc.edu, and register/log in.
Click on Session Tools, and then select “Start New Session”.
Give your session a name such as “Unknown Id” and click on the “Start New Session” button.
Click on Nucleic Tools, and then select “Add New Nucleic Sequence” and “Run”.
Put a description of your sequence (such as Unknown Sequence 1) into the “Label:” box.
Copy and paste your sequence into the “Sequence:” box.
Scroll down the page and click on “Save”; you will end up back in the Nucleic Tools window.
Select your sequence by clicking in the check box.
Scroll to the bottom of the available tools and select “SIXFRAME” and then “Run”.
You will be taken to the options page; click on the “Submit” button.
The next screen shows you the amino acid translations of your nucleotide sequence in all six reading frames.
Select the sequences that have no * (an * indicates a stop codon) by clicking on their check boxes.
Then click on “Import Sequence(s)” at the bottom of the page.
Your translated sequence(s) are now in the Protein Tools section of the Workbench.

2. BLAST Protein search
Log on to the following web address: http://www.ncbi.nlm.nih.gov/BLAST/
Under Protein, click on Protein-protein BLAST (blastp).
Enter your amino acid sequence in the box labeled Search.
Click on BLAST!; on the next screen click on Format!
Searching all the available databases may take a few seconds, so please wait patiently.
Your file will be returned with matches to known amino acid sequences.
As you scroll down the screen you will see Distribution of Blast Hits on the Query Sequence; keep scrolling down until you find Related Structures.
Under Sequences producing significant alignments:, each sequence has a score (bits), which tells you how closely aligned the 2 sequences are.
Clicking on the Score will take you down to the match.
Each match appears as follows:
Query: 1
Sbjct: 300 (or some other number)
Query is the sequence you submitted for analysis.
Sbjct is the match that was found.
The middle line is the computer’s attempt to align them for you.
Find 10 different species (if available, use human as one species) that share similarity to the submitted sequence.
Copy and paste each Sbjct sequence into a new file to be used for phylogenetic analysis.
Each species must have a ONE word name; if the name consists of more than one word, please use a one-word abbreviation of 8 characters or less.
Do NOT use the Latin name of genus and species; find the common name.
The format must be as follows for this sequence to be used in the next step:
>species name(return)
Amino acid sequence(return)
>species name(return)
Amino acid sequence(return)

>human
ANSNCVMFKLGIRKMRL
>frog
ANSDHYMKLGIKMRL

Write all 10 sequences like this!!

2. BLAST protein search (Biology Workbench)
Select one of your translated sequences by clicking on the check box.
Scroll through the list of tools, and select “BLASTP”, then click on “Run”.
From the list of databases, select “Genpept Full Release” and “Genpept Updates”.
Do not change any other options on this page.
Scroll down the page and click on “Submit”.
In less than a minute, your results will appear on the screen.
As you scroll down the screen you will see “Sequences producing significant alignments:”
Each sequence has a score (bits), which tells you how closely aligned the 2 sequences are.
Clicking on the Score will take you down to the match.
Each match appears as follows:
Query: 1
Sbjct: 300 (or some other number)
Query is the sequence you submitted for analysis.
Sbjct is the match that was found.
The middle line is the computer’s attempt to align them for you.
If the Expect value associated with the first Sbjct is not close to 0, then the reading frame is not correct, and you will need to repeat the above steps with your other translated sequences.
Find 10 different species (if available, use human as one species) that share similarity to the submitted sequence.
Scroll through the table at the top of the screen, and click on the check box for each sequence that you wish to select.
Then click on the “Import Sequences” button.
Select the ten sequences from the BLAST search by clicking on their check boxes.
Select “Edit Protein Sequence(s)” and click on “Run”.
Replace the Genpept:### that follows the > in the “Sequence:” box with the common name of the species from which it comes.
Each species must have a ONE word name; if the name consists of more than one word, please use a one-word abbreviation of 8 characters or less.
Do NOT use the Latin name of genus and species; find the common name.
Be careful to keep the > symbol exactly where it is.
When you have changed all of the Sequence files, click on the “Save” button, which can be found at either the top or the bottom of the screen.

3. Phylogenetic Analysis
Back on the ISU Bioinformatics Portal, on the left side, click on Clustalw, then click on clustalw.
Under number 2, Actual data, paste in your 10 sequences in the format above.
In the next box, labeled Actions, use the drop down menu to choose -tree: calculate NJ tree.
Scroll down; under Multiple Alignments Parameters, the second drop down box asks you to choose between Protein or DNA (-type); choose protein.
Scroll back up to the top of the page, change the default Email to your own, and click on submit clustalw.
If you did everything correctly, you will see under Results: a file named infile.ph
Below this file is a drop down box; choose drawgram.
Click on the box next to this, which reads Run the selected program on infile.ph
A new screen will pop up; under Drawgram options, choose the Tree style named P: Phenogram.
Scroll down to Drawgram Options; next to the box labeled Which plotter or printer will the tree be drawn on, choose P: PCX file format.
Scroll back to the top of the page and click on Run drawgram.
Your results will appear in a file named plotfile.pcx
Save this file to your computer.

3. Phylogenetic Analysis (Biology Workbench)
Select your newly edited sequences, which are now labeled “edited”.
Scroll through the Tools, and select “CLUSTALW”, then click on “Run”.
Do not change the options; click on “Submit”.
In less than a minute, you will see the aligned sequences.
These sequences can be copied and pasted into a Word document, although you will lose the color.
Click on the “Import Alignment” button.
Your aligned sequences are now in the Alignment Tools section.
Select your alignment, scroll to and select “DRAWGRAM”, and click on “Run”.
On the options page change “Exclude positions with gaps:” from no to yes and “Correct for multiple substitutions:” from no to yes.
Click on the “Submit” button.
You can download a PostScript version of your tree.
Click on the “Return” button when you are done.

Do NOT print your assignment; send it to me via email.
Please include the following information in your assignment:
1. The name of your protein
2. An alignment of each of 10 different species with differences shown in bold or a different color (each sequence is compared only to the human, not to each other)
3. A phylogenetic tree showing the evolutionary relatedness of each species
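
For item 2, one way to find the positions to put in bold is to compare each aligned sequence to the human sequence column by column. The following is a minimal Python sketch, not part of the ISU portal or the Biology Workbench. The function names are hypothetical, and the gapped frog sequence was aligned by hand from the example sequences purely for illustration (it is not CLUSTALW output). Residues identical to human are printed as a dot, so whatever remains is what you would highlight.

```python
def mark_differences(reference, other):
    """Return `other` with every residue that matches `reference` replaced by '.'.

    Both strings must come from the same alignment, so they have equal length
    (gaps are written as '-').
    """
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have the same length")
    return "".join("." if r == o else o for r, o in zip(reference, other))


def compare_to_reference(aligned, reference_name="human"):
    """Compare every sequence only to the reference, never to each other."""
    reference = aligned[reference_name]
    return {
        name: mark_differences(reference, seq)
        for name, seq in aligned.items()
        if name != reference_name
    }


if __name__ == "__main__":
    # Hand-made toy alignment (assumption): gaps inserted manually.
    aligned = {
        "human": "ANSNCVMFKLGIRKMRL",
        "frog": "ANSDHYM-KLGI-KMRL",
    }
    print("human".ljust(8), aligned["human"])
    for name, marked in compare_to_reference(aligned).items():
        print(name.ljust(8), marked)
```

Only the letters and gaps that remain visible in the frog line differ from human; those are the positions to show in bold or a different color in your own alignment.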