HW1 (2016)

Introduction to Bioinformatics (236523)
HW 1 – Winter 2016
General Instructions:

Deadline: 18/11/15 23:55.

Submit only in the published pairs.

Submission is electronic only, via the course website.

You should submit a single file in .ZIP format, named according to the following format:
<HW#>_<ID1>_<ID2>.zip
For example: HW1_333222111_012345678.zip
Local/Global Alignments
1) For the following two sequences:
GGCTATC
GGATC
Create global and local alignment matrices with the following scoring system:
Match = +2
Mismatch = -1
Indel = -2
Report the resulting alignments and their scores. In the alignment matrices, show all the
arrows and numbers as shown in class.
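As a way to check hand-filled matrices, here is a minimal sketch of the standard dynamic-programming fill: Needleman-Wunsch for the global case and Smith-Waterman for the local case, using the scoring system above with a linear gap penalty (each indel costs -2). The function names are hypothetical. The traceback records only one optimal path, breaking ties in favor of the diagonal, then a gap in the second sequence, then a gap in the first, so other equally optimal alignments (and the extra arrows drawn in class) may exist.

```python
from typing import List, Tuple

MATCH, MISMATCH, INDEL = 2, -1, -2  # scoring system from question 1


def score(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def align(s: str, t: str, local: bool = False) -> Tuple[int, str, str, List[List[int]]]:
    """Return (score, aligned_s, aligned_t, matrix) for a global or local alignment."""
    n, m = len(s), len(t)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global: the first row and column accumulate indel penalties.
        for i in range(1, n + 1):
            F[i][0] = i * INDEL
        for j in range(1, m + 1):
            F[0][j] = j * INDEL
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cand = [F[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # diagonal arrow
                    F[i - 1][j] + INDEL,                          # vertical arrow
                    F[i][j - 1] + INDEL]                          # horizontal arrow
            if local:
                cand.append(0)  # Smith-Waterman: a cell never drops below zero
            F[i][j] = max(cand)
            if local and F[i][j] > best:
                best, bi, bj = F[i][j], i, j
    # Traceback starts at the bottom-right cell (global) or the maximal cell (local).
    i, j = (bi, bj) if local else (n, m)
    total = best if local else F[n][m]
    a, b = [], []
    while i > 0 or j > 0:
        if local and F[i][j] == 0:
            break  # a local alignment ends where the score returns to zero
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(s[i - 1], t[j - 1]):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + INDEL:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return total, "".join(reversed(a)), "".join(reversed(b)), F
```

For example, `align("GGCTATC", "GGATC")` and `align("GGCTATC", "GGATC", local=True)` return the global and local results; printing the fourth element row by row gives the numbers to compare against your matrix.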
2) How would you change the scoring system in each of the following scenarios?
a. You do not want to allow any insertions or deletions in a local alignment.
b. DNA can be phosphorylated, meaning a phosphate group can be added to
some bases. If you want to take phosphorylation into account when aligning two
sequences, how would you change the scoring system?
c. You want to compare DNA and RNA sequences.
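Scenarios (a) and (c) can be made concrete with a small sketch, assuming the match/mismatch values of question 1. With an infinite indel penalty, a local alignment reduces to the best-scoring contiguous segment along a single diagonal of the matrix, found here with Kadane's algorithm. Treating DNA T and RNA U as the same base is one possible choice for (c). Both function names and the T/U rule are hypothetical illustrations, not a prescribed answer.

```python
def sub_score(a: str, b: str, match: int = 2, mismatch: int = -1) -> int:
    # Hypothetical rule for scenario (c): RNA U is scored as identical to DNA T.
    def norm(x: str) -> str:
        return "T" if x.upper() == "U" else x.upper()
    return match if norm(a) == norm(b) else mismatch


def best_ungapped_local(s: str, t: str) -> int:
    """Best local alignment score when indels are forbidden (scenario (a))."""
    best = 0
    # Each diagonal (fixed offset d = i - j) is one ungapped alignment.
    for d in range(-(len(t) - 1), len(s)):
        run = 0
        i, j = max(d, 0), max(-d, 0)
        while i < len(s) and j < len(t):
            # Kadane's step: restart the segment when its score would go negative.
            run = max(0, run + sub_score(s[i], t[j]))
            best = max(best, run)
            i += 1
            j += 1
    return best
```

This is equivalent to running Smith-Waterman with the indel score set to minus infinity, but it only needs to scan the diagonals.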
3) An RNA molecule can fold into a structure called a "stem-loop", in which
complementary base pairing takes place within a single molecule. See the example below.
Suggest an algorithm that, given a sequence, determines whether this sequence
can form a stem-loop.
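One possible brute-force approach is sketched below, under loudly stated assumptions: only Watson-Crick pairs (A-U, G-C) count, the stem must be perfect (no bulges, mismatches or G-U wobble pairs), and the hypothetical thresholds `min_stem` and `min_loop` define what counts as a stem-loop. For every pair of positions (i, j) it grows a stem inward while the bases pair and enough unpaired loop remains, which takes O(n^2 * L) time for stem length L.

```python
from typing import Optional, Tuple

# Assumption: Watson-Crick pairs only; no G-U wobble, bulges or stem mismatches.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def find_stem_loop(seq: str, min_stem: int = 4,
                   min_loop: int = 3) -> Optional[Tuple[int, int, int]]:
    """Return (start, end, stem_length) of the first stem-loop found, or None.

    The stem pairs seq[start + k] with seq[end - k] for k < stem_length;
    the unpaired loop lies strictly between the innermost pair.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA input as well
    n = len(seq)
    for i in range(n):
        for j in range(n - 1, i, -1):
            k = 0
            # Adding pair k leaves a loop of length j - i - 2k - 1.
            while j - i - 2 * k - 1 >= min_loop and (seq[i + k], seq[j - k]) in PAIRS:
                k += 1
            if k >= min_stem:
                return i, j, k
    return None
```

For instance, `find_stem_loop("GGGGAAAACCCC")` finds a 4-base stem (GGGG paired with CCCC) around the loop AAAA. A more complete treatment would score alternative foldings with dynamic programming instead of this exhaustive search.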
4) Which alignment type (local/global) would you use in each of the following scenarios,
and why (given that BLAST is not available)?
a. You want to compare tomato genes to potato genes; you have the sequences of the
tomato genes and the whole potato genome without annotations.
b. You want to find all the binding sites of a DNA-binding protein in the mouse
genome.
c. You are studying a genetic disorder and want to compare the sequence of a
gene from an affected individual with a sequence of a healthy individual.
BLAST search
You were given a sequence of unknown origin (available in the attached file Seq1.txt). Answer
the following questions:
5) Use blastn to query the sequence. Describe in your own words the results you get
(number of similar sequences found, similarity scores, query coverage, etc.). Please provide
screenshots of the results (the summary diagram and 10-20 of the top results in
the table).
6) Which organism was this sequence taken from? How can you tell?
7) Did you find similar sequences in other organisms? What can you learn from this
about the sequence?
8) As you can see from the results, many hits have an E-value of 0 but an identity of
less than 100%. How can you explain this?
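For reasoning about question 8, it may help to recall the Karlin-Altschul statistics on which BLAST E-values are based. Here m is the query length, n the total length of the database, S the raw alignment score, and K and λ constants that depend on the scoring system and sequence composition; S' is the corresponding bit score. The E-value depends on the score S rather than directly on percent identity, and it decreases exponentially as S grows.

```latex
E = K \, m \, n \, e^{-\lambda S}
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2}
\qquad
E = m \, n \, 2^{-S'}
```

The two forms agree because m n 2^{-S'} = m n e^{-(\lambda S - \ln K)} = K m n e^{-\lambda S}. Values too small for the results page to display are shown as 0.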
9) Now use blastx to search with the sequence. Describe in your own words the results
you get (number of similar sequences found, similarity scores, query coverage, etc.). Please
provide screenshots of the results (the summary diagram and 10-20 of the top
results in the table).
10) Explain the difference between a query in blastn and blastx (in general).
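As general background: blastn compares a nucleotide query with a nucleotide database, whereas blastx translates the nucleotide query in all six reading frames (three on each strand) and compares the translations with a protein database. The sketch below illustrates only that six-frame translation step, assuming the standard genetic code; the function names are hypothetical, and this is not NCBI's implementation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" = stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str, frame: int = 0) -> str:
    """Translate one reading frame; codons with ambiguous bases become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))


def six_frames(seq: str) -> list:
    """Frames +1, +2, +3 on the given strand, then frames on the reverse complement."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    return [translate(seq, f) for f in range(3)] + [translate(rev, f) for f in range(3)]
```

For example, `six_frames("ATGAAATAG")` starts with `"MK*"` for the first forward frame. Because the protein comparison tolerates synonymous codon changes, a blastx search can detect similarity that is weaker at the nucleotide level.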
11) What are the differences between the results you got in questions 5 and 9? In your answer, mention
the query cover, the E-values and the identity percentage. Explain what could be the reason for the
differences in the results.
BLAST Parameters
12)
a. What parameter can you change to optimize BLAST (NOT PSI-BLAST) to identify
evolutionarily distant homologs? Explain why changing this
parameter would help find distant homologs.
b. Test your suggestion by running BLAST on the human gene available on the site
(gene.txt). Compare the results for a protein from a distant species before and after
the change. Please provide screenshots showing the search results.