Introduction to Bioinformatics (236523)
HW 1 – Winter 2016

General Instructions:
Deadline: 18/11/15, 23:55.
Submission is in the published pairs only.
Submission is electronic only, via the course website.
Submit a single .ZIP file named in the following format: <HW#>_<ID1>_<ID2>.zip
For example: HW1_333222111_012345678.zip

Local/Global Alignments

1) For the following two sequences:
GGCTATC
GGATC
Create global and local alignment matrices with the following scoring system:
Match = +2
Mismatch = -1
Indel = -2
Report the resulting alignments and their scores. In the alignment matrices, show all the arrows and numbers as shown in class.

2) How would you change the scoring system in the following scenarios:
a. You do not want to allow any insertions or deletions in a local alignment.
b. The DNA can be phosphorylated, meaning a phosphate group can be added to some bases. Say you want to take the phosphorylation into account when aligning two sequences. How would you change the scoring system?
c. You want to compare DNA and RNA sequences.

3) An RNA molecule can fold into a structure called a "stem-loop", in which complementary base pairing takes place within a single molecule. See example below. Suggest an algorithm that, given a sequence, determines whether the sequence can form a stem-loop.

4) What alignment type (local/global) would you use in each of the following cases, and why (given that BLAST is not available)?
a. You want to compare tomato genes to potato genes. You have the sequences of the tomato genes and the whole potato genome without annotations.
b. You want to find all the binding sites of a DNA-binding protein in a mouse's genome.
c. You are studying a genetic disorder and want to compare the sequence of a gene from an affected individual with the sequence from a healthy individual.

BLAST search

You were given a sequence of unknown origin (available in the attached file Seq1.txt).
Answer the following questions:

5) Use blastn to query the sequence. Describe in your own words the results you get (number of similarities found, similarity scores, coverage of query, etc.). Please provide screenshots of the results (the summary diagram and the top 10-20 results in the table).

6) Which organism was this sequence taken from? How can you tell?

7) Did you find similarity to the sequence in other organisms? What can you learn from this about the sequence?

8) As you can see from the results, there are many hits with an E-value of 0 but %Identity of less than 100%. How can you explain this?

9) Now use the blastx tool to search the sequence. Describe in your own words the results you get (number of similarities found, similarity scores, coverage of query, etc.). Please provide screenshots of the results (the summary diagram and the top 10-20 results in the table).

10) Explain the difference between a query in blastn and a query in blastx (in general).

11) What are the differences between the results you got in questions 5 (blastn) and 9 (blastx)? In your answer, mention the query cover, the E-values and the %Identity. Explain what could be the reason for the differences in the results.

BLAST Parameters

12)
a. What parameter can you change to optimize BLAST (NOT PSI-BLAST) to identify evolutionarily distant homologs? Explain why changing this parameter would help to find the distant homologs.
b. Test your suggestion by running BLAST on the human gene available on the site (gene.txt). Compare the results for a protein from a distant species before and after the change. Please provide screenshots showing the search results.
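
Note on question 1: hand-filled alignment matrices can be sanity-checked with a short script. The following is only a minimal sketch. It assumes that the matrices shown in class follow the standard Needleman-Wunsch (global) and Smith-Waterman (local) recurrences with a linear indel penalty. The function name alignment_scores is hypothetical, and the script reports only the final scores, not the arrows or the alignments themselves.

```python
def alignment_scores(s, t, match=2, mismatch=-1, indel=-2):
    """Return (global_score, local_score) for sequences s and t.

    Assumes the standard Needleman-Wunsch and Smith-Waterman recurrences
    with a linear indel penalty (an assumption, not taken from the course).
    """
    rows, cols = len(s) + 1, len(t) + 1

    # Global matrix: the first row and column accumulate indel penalties.
    g = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        g[i][0] = i * indel
    for j in range(1, cols):
        g[0][j] = j * indel

    # Local matrix: the first row and column stay at 0.
    loc = [[0] * cols for _ in range(rows)]
    best_local = 0

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            # Global cell: best of diagonal, up (gap in t), left (gap in s).
            g[i][j] = max(g[i - 1][j - 1] + sub,
                          g[i - 1][j] + indel,
                          g[i][j - 1] + indel)
            # Local cell: same moves, but never below 0.
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + sub,
                            loc[i - 1][j] + indel,
                            loc[i][j - 1] + indel)
            best_local = max(best_local, loc[i][j])

    # Global score is the bottom-right cell; local score is the matrix maximum.
    return g[-1][-1], best_local


if __name__ == "__main__":
    global_score, local_score = alignment_scores("GGCTATC", "GGATC")
    print("global:", global_score, "local:", local_score)
```

The default arguments mirror the scoring system of question 1 (Match = +2, Mismatch = -1, Indel = -2). Passing other values for match, mismatch or indel shows how a change in the scoring system affects the two scores.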