Bioinformatics Exam - BIOTEC - Biotechnology Center TU Dresden

Exam for Algorithmic Bioinformatics
International Masters in Molecular Bio-Engineering, TU Dresden
Prof. Michael Schroeder
15.1.2008
Questions
Please remove everything from your desk except a pen and the question and answer sheets.
Please place your student ID card clearly visible on your desk.
You have 90 minutes to answer the questions.
All three questions have to be answered.
Please write your name and student ID on the question sheet and on the cover sheet of the answer paper.
Please read all questions before you start answering.
------------------------------------------------------------------------------------------------------------------------
Q1 | Q2 | Q3 | Sum | Percent | Mark
   |    |    |     |         |
Question 1 (Sequence alignment, each sub-question counts 20%)
a) Write down, as pseudo-code, the algorithm for a global alignment of two sequences. Use variables
to denote any penalties and rewards used in your algorithm.
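A minimal Python sketch of a global alignment in the Needleman-Wunsch style, assuming a linear gap score (the same `gap` value is added for every gap position) and a simple match/mismatch scheme. The names `global_alignment`, `match`, `mismatch` and `gap` are hypothetical; penalties are passed as negative scores, so the algorithm always maximises.

```python
def global_alignment(x, y, match, mismatch, gap):
    """Hypothetical helper: global alignment score plus one optimal alignment.

    match/mismatch/gap are scores added per column; pass penalties as negatives.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # first column: x aligned against leading gaps
        F[i][0] = F[i - 1][0] + gap
    for j in range(1, m + 1):  # first row: y aligned against leading gaps
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # align x[i-1] with y[j-1]
                          F[i - 1][j] + gap,    # gap in y
                          F[i][j - 1] + gap)    # gap in x
    # Traceback from the bottom-right cell to recover one optimal alignment.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and x[i - 1] == y[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


score, ax, ay = global_alignment("LARS", "ARTS", match=1, mismatch=-1, gap=-2)
print(score)
print(ax)
print(ay)
```

The traceback prefers the diagonal move on ties, so among several optimal alignments it returns just one of them.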
b) How must you set the above variables so that your algorithm's score is the number of
aligned identical residues, i.e., the length of the longest common subsequence?
c) Fill in the dynamic programming matrix according to your algorithm from a) and your scoring
scheme from b) for the sequences LARS and ARTS. How long is the longest common subsequence?
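A small sketch for b) and c), assuming the scoring match = 1, mismatch = 0, gap = 0. With these values every aligned pair of identical residues adds 1 and nothing else changes the score, so the optimal global score equals the length of the longest common subsequence. The helper names `lcs_matrix` and `print_matrix` are hypothetical.

```python
def lcs_matrix(x, y):
    """Hypothetical helper: DP matrix with match = 1, mismatch = 0, gap = 0."""
    F = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]  # gap 0: border stays 0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            diag = F[i - 1][j - 1] + (1 if x[i - 1] == y[j - 1] else 0)
            F[i][j] = max(diag, F[i - 1][j], F[i][j - 1])
    return F


def print_matrix(x, y, F):
    """Print the matrix with y across the top and x down the side ('-' = empty prefix)."""
    print("   " + " ".join(f"{c:>2}" for c in "-" + y))
    for label, row in zip("-" + x, F):
        print(f"{label:>2} " + " ".join(f"{v:>2}" for v in row))


F = lcs_matrix("LARS", "ARTS")
print_matrix("LARS", "ARTS", F)
print(F[-1][-1])  # length of the longest common subsequence
```

For LARS and ARTS the bottom-right cell is 3, which corresponds to the common subsequence ARS.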
d) Briefly describe the three steps the Blast algorithm carries out and discuss the
advantages and disadvantages of Blast compared with your algorithm.
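Blast's three steps are commonly summarised as (1) building a list of short words from the query, (2) scanning the database for word hits, and (3) extending the hits into longer local alignments. The sketch below is a heavily simplified seed-and-extend illustration of those steps, not NCBI BLAST: it assumes exact word matches only (no neighbourhood words), plain match/mismatch scores instead of a substitution matrix, ungapped extension with a simple X-drop cut-off, and no statistics such as E-values. All function and parameter names are hypothetical.

```python
# Hypothetical, simplified BLAST-like sketch; not the real BLAST implementation.

def query_words(query, k):
    """Step 1: index every word (k-mer) of the query by its start positions."""
    words = {}
    for i in range(len(query) - k + 1):
        words.setdefault(query[i:i + k], []).append(i)
    return words


def scan(words, subject, k):
    """Step 2: scan one database sequence for exact word hits (seeds)."""
    return [(i, j)
            for j in range(len(subject) - k + 1)
            for i in words.get(subject[j:j + k], [])]


def extend(query, subject, i, j, k, match=1, mismatch=-1, x_drop=2):
    """Step 3: ungapped extension of a seed in both directions (X-drop stop)."""
    total = k * match  # the seed itself consists of k matches
    for step in (1, -1):  # +1 extends to the right, -1 to the left
        qi, sj = (i + k, j + k) if step == 1 else (i - 1, j - 1)
        score = best = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            best = max(best, score)
            if best - score > x_drop:  # score fell too far below its best value
                break
            qi, sj = qi + step, sj + step
        total += best
    return total


def search(query, database, k=2, threshold=3):
    """Best extension score per database sequence, kept if >= threshold."""
    words = query_words(query, k)
    results = {}
    for name, subject in database.items():
        scores = [extend(query, subject, i, j, k) for i, j in scan(words, subject, k)]
        if scores and max(scores) >= threshold:
            results[name] = max(scores)
    return results


print(search("LARS", {"s1": "ARTS", "s2": "XXLARSXX"}))
```

Only sequences sharing at least one exact word with the query are ever extended, which is where the speed-up over full dynamic programming comes from; a larger word size `k` produces fewer hits and runs faster, but can miss weaker similarities.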
e) Currently, Blast takes a few seconds to search public sequence databases with millions of
sequences. With the advent of high-throughput sequencing, the number of sequences in public
databases will continue to explode. There are two approaches to nonetheless providing fast sequence
search: first, changing the way the data is organised; second, changing parameters of the Blast
algorithm. Explain the two approaches and discuss their pros and cons.
Question 2 (Multiple sequence alignment, each sub-question counts 20%)
a) Briefly explain how the progressive alignment method works.
b) Apply the progressive alignment method to align the sequences LARS, ARTS, ARE in the given
order. Assume a match score of 1 and gap and mismatch penalties of 0. Show the final multiple
sequence alignment and its score. Show all intermediate steps.
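A minimal sketch of progressive alignment for a) and b), assuming that sequences are added one at a time in the given order (no guide tree), that each new sequence is aligned to the current alignment treated as a profile, and that columns are scored by sum-of-pairs with the scheme from b) (match 1, mismatch 0, gap 0). The names `align_profiles`, `progressive` and `sp_score` are hypothetical.

```python
from itertools import combinations

# Hypothetical helpers illustrating progressive alignment in a fixed order.


def pair_score(x, y):
    """Scheme from b): match = 1, mismatch = 0, gap = 0."""
    return 1 if x == y and x != "-" else 0


def column_score(col_a, col_b):
    """Sum-of-pairs score between one column of each profile."""
    return sum(pair_score(x, y) for x in col_a for y in col_b)


def align_profiles(p, q):
    """Global alignment of two profiles (lists of equal-length rows), gap score 0."""
    cols_p, cols_q = list(zip(*p)), list(zip(*q))
    n, m = len(cols_p), len(cols_q)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # gap score 0: border stays 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + column_score(cols_p[i - 1], cols_q[j - 1]),
                          F[i - 1][j], F[i][j - 1])
    gap_p, gap_q = ("-",) * len(p), ("-",) * len(q)
    out, i, j = [], n, m  # aligned columns, collected from the end backwards
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + column_score(cols_p[i - 1], cols_q[j - 1]):
            out.append(cols_p[i - 1] + cols_q[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j]:
            out.append(cols_p[i - 1] + gap_q)
            i -= 1
        else:
            out.append(gap_p + cols_q[j - 1])
            j -= 1
    out.reverse()
    return ["".join(row) for row in zip(*out)]


def progressive(seqs):
    """Add the sequences one by one, in the given order."""
    msa = [seqs[0]]
    for s in seqs[1:]:
        msa = align_profiles(msa, [s])  # earlier gaps are never removed again
    return msa


def sp_score(msa):
    """Sum-of-pairs score of a finished multiple alignment."""
    return sum(pair_score(x, y) for a, b in combinations(msa, 2) for x, y in zip(a, b))


msa = progressive(["LARS", "ARTS", "ARE"])
for row in msa:
    print(row)
print(sp_score(msa))  # 7
```

With this tie-breaking the result is LAR-S / -ARTS / -AR-E. Other tie-breaks can place E differently, but the sum-of-pairs score stays 7 because E matches nothing in either column.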
c) Briefly explain two alternative methods for generating multiple sequence alignments: first, high-dimensional
dynamic programming and, second, the A* algorithm.
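High-dimensional dynamic programming generalises the pairwise matrix to one axis per sequence. A minimal sketch for three sequences, assuming the sum-of-pairs score with the scheme from b) (match 1, mismatch 0, gap 0); `msa3_score` is a hypothetical name. Each cell has 2^3 - 1 = 7 predecessors, one for every non-empty subset of sequences that contributes a residue to the new column.

```python
from itertools import product

# Hypothetical sketch: exact 3-D dynamic programming for sum-of-pairs scores.


def pair_score(x, y):
    """Scheme from b): match = 1, mismatch = 0, gap = 0."""
    return 1 if x == y and x != "-" else 0


def msa3_score(a, b, c):
    """Optimal sum-of-pairs score of three sequences."""
    F = [[[0] * (len(c) + 1) for _ in range(len(b) + 1)] for _ in range(len(a) + 1)]
    # product() visits cells in lexicographic order, so predecessors come first.
    for i, j, k in product(range(len(a) + 1), range(len(b) + 1), range(len(c) + 1)):
        if i == j == k == 0:
            continue
        best = None
        for di, dj, dk in product((0, 1), repeat=3):
            if (di, dj, dk) == (0, 0, 0) or di > i or dj > j or dk > k:
                continue
            x = a[i - 1] if di else "-"
            y = b[j - 1] if dj else "-"
            z = c[k - 1] if dk else "-"
            s = F[i - di][j - dj][k - dk] + pair_score(x, y) + pair_score(x, z) + pair_score(y, z)
            best = s if best is None else max(best, s)
        F[i][j][k] = best
    return F[len(a)][len(b)][len(c)]


print(msa3_score("LARS", "ARTS", "ARE"))  # 7
```

For k sequences of length n the table has (n + 1)^k cells and each cell looks at 2^k - 1 predecessors, which is why this exact approach quickly becomes infeasible as k grows.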
d) Briefly discuss the advantages and disadvantages of progressive alignment, high-dimensional
dynamic programming, and the A* algorithm for multiple sequence alignments.
e) How can progressive alignment and the A* algorithm be combined to obtain a fast algorithm that
generates globally optimal multiple sequence alignments?
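A sketch of A* on the same lattice of cells used by high-dimensional dynamic programming, under stated assumptions: the score is sum-of-pairs with the scheme from Q2 b), and the heuristic is the sum of optimal pairwise scores for the remaining suffixes, an optimistic estimate that never underestimates what is left. One possible combination for e) is shown through the optional `lower_bound` argument: the score of an alignment found quickly, for example by progressive alignment, lets A* discard every cell whose optimistic total falls below it. All names are hypothetical.

```python
import heapq
from itertools import product

# Hypothetical sketch: A* search for an optimal sum-of-pairs alignment score.


def pair_score(x, y):
    """Scheme from Q2 b): match = 1, mismatch = 0, gap = 0."""
    return 1 if x == y and x != "-" else 0


def suffix_table(a, b):
    """S[i][j] = optimal pairwise score of aligning a[i:] with b[j:]."""
    S = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            S[i][j] = max(S[i + 1][j + 1] + pair_score(a[i], b[j]), S[i + 1][j], S[i][j + 1])
    return S


def astar_msa(seqs, lower_bound=None):
    """Best sum-of-pairs score; lower_bound = score of a known feasible alignment."""
    k = len(seqs)
    pairs = [(p, q) for p in range(k) for q in range(p + 1, k)]
    tables = {pq: suffix_table(seqs[pq[0]], seqs[pq[1]]) for pq in pairs}
    goal = tuple(len(s) for s in seqs)

    def h(node):
        # Optimistic estimate of the remaining score: sum of pairwise optima.
        return sum(tables[p, q][node[p]][node[q]] for p, q in pairs)

    start = (0,) * k
    best_g = {start: 0}
    heap = [(-h(start), start)]  # heapq is a min-heap, so f = g + h is negated
    while heap:
        neg_f, node = heapq.heappop(heap)
        g = best_g[node]
        if -neg_f != g + h(node):
            continue  # stale entry: a better path to this cell was found later
        if node == goal:
            return g
        for move in product((0, 1), repeat=k):  # which sequences advance
            nxt = tuple(i + d for i, d in zip(node, move))
            if not any(move) or any(i > e for i, e in zip(nxt, goal)):
                continue
            col = [seqs[p][node[p]] if move[p] else "-" for p in range(k)]
            new_g = g + sum(pair_score(col[p], col[q]) for p, q in pairs)
            if lower_bound is not None and new_g + h(nxt) < lower_bound:
                continue  # even optimistically this path cannot reach the bound
            if new_g > best_g.get(nxt, -1):
                best_g[nxt] = new_g
                heapq.heappush(heap, (-(new_g + h(nxt)), nxt))
    return None  # only if lower_bound exceeds the optimum


seqs = ["LARS", "ARTS", "ARE"]
print(astar_msa(seqs))
print(astar_msa(seqs, lower_bound=7))
```

For these three sequences a progressive alignment already reaches a sum-of-pairs score of 7, so passing 7 as the bound prunes every cell that cannot lead to an alignment at least that good, while the optimal path is never pruned.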
Question 3 (Dynamic programming, each sub-question counts 25%)
a) Briefly explain when an algorithm is called “dynamic programming”.
b) The formula
F(n) = F(n-1) + F(n-2) for n > 1, with F(0) = 1 and F(1) = 1,
describes the growth of a rabbit population under simplified assumptions. Write down two
programmes in pseudo-code that calculate the rabbit population: the first uses dynamic
programming, the second does not.
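A minimal Python sketch of the two programmes, following the recurrence exactly as stated (F(0) = F(1) = 1). The function names are hypothetical; the dynamic-programming version stores every F(i) in a table and computes each one only once, while the plain recursive version recomputes the same subproblems again and again.

```python
def rabbits_recursive(n):
    """Hypothetical name. Without dynamic programming: plain recursion."""
    if n < 2:
        return 1  # F(0) = F(1) = 1
    return rabbits_recursive(n - 1) + rabbits_recursive(n - 2)


def rabbits_dp(n):
    """Hypothetical name. With dynamic programming: fill a table bottom-up."""
    table = [1] * (n + 1)  # table[0] = table[1] = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]  # reuse stored subresults
    return table[n]


print(rabbits_dp(10), rabbits_recursive(10))  # 89 89
```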
c) Which of the two programmes in b) is faster and why?
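To make the speed difference concrete, the sketch below counts the work each approach does, assuming the same recurrence with F(0) = F(1) = 1. The recursive count satisfies C(n) = C(n-1) + C(n-2) + 1 with C(0) = C(1) = 1, which gives C(n) = 2F(n) - 1 calls, so it grows exponentially like F(n) itself; the table-filling version needs only n - 1 additions. Both function names are hypothetical.

```python
def count_calls(n):
    """Hypothetical helper: (F(n), number of calls) for the plain recursive programme."""
    if n < 2:
        return 1, 1
    a, calls_a = count_calls(n - 1)
    b, calls_b = count_calls(n - 2)
    return a + b, calls_a + calls_b + 1


def dp_additions(n):
    """Hypothetical helper: additions performed when the table is filled bottom-up."""
    return max(0, n - 1)


for n in (10, 20, 25):
    value, calls = count_calls(n)
    print(n, value, calls, dp_additions(n))
```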
d) Give an example of dynamic programming in bioinformatics other than sequence alignment.
Explain how dynamic programming is applied and why it helps in your example.