Exam for Algorithmic Bioinformatics
International Masters in Molecular Bio-Engineering, TU Dresden
Prof. Michael Schroeder
15.1.2008

Questions

Please remove everything from your desk except a pen and the question and answer sheets.
Please place your student ID card clearly visible on your desk.
You have 90 minutes to answer the questions. All three questions have to be answered.
Please mark the question sheet and the cover sheet of the answer paper with your name and student ID.
Please read all questions before you start answering.

------------------------------------------------------------------------------------------------------------------------
Q1    Q2    Q3    Sum    Percent    Mark

Question 1 (Sequence alignment, each sub-question counts 20%)

a) Write down the algorithm as pseudo-code for a global alignment of two sequences. Use variables to denote any penalties and rewards used in your algorithm.
b) How do you have to set the above variables so that your algorithm's score is the number of aligned identical residues, i.e. the length of the longest common subsequence?
c) Fill in the dynamic programming matrix according to your algorithm of a) and your scoring scheme of b) for the sequences LARS and ARTS. How long is the longest common subsequence?
d) Briefly describe the three steps the BLAST algorithm carries out and discuss the advantages and disadvantages of BLAST compared with your algorithm.
e) Currently, BLAST takes a few seconds for a search on public sequence databases with millions of sequences. With the advent of high-throughput sequencing, the number of sequences in public databases will continue to explode. There are two approaches to keep sequence search fast nonetheless: first, changing the way the data is organised; second, changing parameters in the BLAST algorithm. Explain the two approaches and discuss their pros and cons.

Question 2 (Multiple sequence alignment, each sub-question counts 20%)

a) Briefly explain how the progressive alignment method works.
b) Apply the progressive alignment method to align the sequences LARS, ARTS, ARE in the given order. Assume a match score of 1 and gap and mismatch penalties of 0. Show the final multiple sequence alignment and its score. Show all intermediate steps.
c) Briefly explain two alternative methods to generate multiple sequence alignments: first, high-dimensional dynamic programming and second, the A* algorithm.
d) Briefly discuss the advantages and disadvantages of progressive alignment, high-dimensional dynamic programming, and the A* algorithm for multiple sequence alignments.
e) How can progressive alignment and the A* algorithm be combined to have a fast algorithm that generates globally optimal multiple sequence alignments?

Question 3 (Dynamic programming, each sub-question counts 25%)

a) Briefly explain when an algorithm is called "dynamic programming".
b) The formula F(n) = F(n-1) + F(n-2) for n > 1, with F(0) = 1 and F(1) = 1, describes the growth of a rabbit population under simplified assumptions. Write down two programmes in pseudo-code that calculate the rabbit population: the first uses dynamic programming, the second does not.
c) Which of the two programmes in b) is faster and why?
d) Give an example of dynamic programming in bioinformatics other than sequence alignment. Explain how dynamic programming is applied and why it helps in your example.
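
Illustration for Question 3 b): the recurrence F(n) = F(n-1) + F(n-2) with F(0) = F(1) = 1 can be evaluated in two ways. The Python sketch below is one possible reading, not a model answer. The function names `rabbits_dp` and `rabbits_naive` are chosen here for illustration and do not appear in the exam. The first function fills a table so that each value is computed once; the second follows the formula directly and recomputes the same sub-problems many times.

```python
def rabbits_dp(n: int) -> int:
    """Dynamic programming: fill a table of F(0..n) from the bottom up."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Every entry starts at 1, which already covers F(0) = 1 and F(1) = 1.
    table = [1] * (n + 1)
    for i in range(2, n + 1):
        # Each entry reuses the two previously stored results.
        table[i] = table[i - 1] + table[i - 2]
    return table[n]


def rabbits_naive(n: int) -> int:
    """Plain recursion: follows the formula and stores nothing."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:
        return 1
    # Both calls recompute overlapping sub-problems from scratch.
    return rabbits_naive(n - 1) + rabbits_naive(n - 2)


if __name__ == "__main__":
    for n in range(8):
        print(n, rabbits_dp(n), rabbits_naive(n))
```

Both functions return the same values; they differ only in how often each F(i) is computed, which is the point of Question 3 c).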