Module 5: Algorithms in Bioinformatics (Instructor: H. Ali)

Description: The module introduces students to key problems in bioinformatics and to several problem-solving techniques for solving them. In particular, it covers algorithms for comparing biological sequences, constructing evolutionary trees, and finding genes in sequenced genomes. The module emphasizes how solving bioinformatics problems has become a key contributor to our biological knowledge. For example, the biological sciences have a long tradition of discovery by comparison: information about an unknown biological element can be estimated by comparing its attributes to the attributes of known elements. With the current development of bioinformatics algorithms, it is natural to use biological sequences as the attributes when exploring potential similarities between the unknown element and various known ones. The course will present basic algorithmic concepts in computational biology and show how they are connected to molecular biology and biotechnology. For example, students will be introduced to alignment algorithms through a simple introduction to dynamic programming. The module will also include algorithms for gene prediction, clustering, and constructing evolutionary trees.

Homework: As part of the module, students will be asked to apply the problem-solving methods introduced to simple but critical bioinformatics problems, with a focus on how sequence comparison techniques can be used to classify and recognize organisms.

Intended audience: At UNO, the target course for this module will be BIOL 4960 (Advanced Genetics). It would also be applicable to CSCI 3320 (Data Structures).

Module Outline:

I. General introduction to basic problems in Bioinformatics
   A. Very brief introduction to Bioinformatics
   B. Basic Biological (Algorithmic) concepts for Biology (Computer Science) students
   C. Overview of key Bioinformatics problems and associated algorithms
      - Sequence comparison
      - Sequencing and map assembly
      - Gene prediction
      - Phylogenetic trees
II. Sequence comparison and alignment algorithms
   A. Local and global alignment
   B. Multiple sequence alignment
   C. Applications of Sequence Comparison
      - Identification/classification of organisms
      - Gene Prediction
III. Clustering algorithms and evolutionary trees
   A. Brief introduction to clustering algorithms
   B. Using a simple algorithm to construct evolutionary trees
   C. Linking multiple sequence alignment to Phylogeny

Learning Objectives:

1. To gain a good understanding of what the field of Bioinformatics is and how it can play a significant role in solving various problems in the domain of biosciences.
   Students will be asked various questions about what Bioinformatics is and, since it is a new, emerging multi-disciplinary field, how other traditional disciplines contribute to the understanding of its basic concepts.

   A) Which of the following statements best describes the field of bioinformatics?
      a. Applying biological concepts in the design of computer algorithms.
      b. Using mathematical and computational techniques to solve biological problems.
      c. Using computers to speed up what bioscientists do manually.
      d. Using supercomputers to store complex biological data.

   B) Which of the following concepts are related to bioinformatics? (Check all that apply)
      - Genetic algorithms
      - Sequence comparisons
      - Constructing evolutionary trees
      - Swarming techniques
      - Ant algorithms

   C) Bioinformatics is a multi-disciplinary field of study. Which of the following traditional disciplines are necessary for the study of bioinformatics? (Check all that apply)
      - Computer science
      - Biology
      - Information systems
      - Pharmacy
      - Mathematics and statistics
      - Medicine
      - Chemistry

2. To gain general knowledge of the main Bioinformatics algorithms and their main applications, with a particular focus on sequence comparison in the context of analyzing biological data.
   Students will be asked to list the main Bioinformatics problems and to suggest basic algorithms for solving simplified versions of important ones, such as how to measure the similarity between two biological sequences. They will also be asked how sequence comparison techniques can be used to recognize and/or classify various organisms.

   A) Sequence alignment is a key operation for several bioinformatics applications. Which of the following statements are true about bioinformatics applications? (Check all that are true)
      - Sequence alignment is a computationally intensive problem.
      - Sequence alignment is the only method used to compare sequences.
      - Dynamic Programming (DP) is widely used to solve the alignment problem.
      - A solution to the global alignment problem can be achieved in quadratic time complexity.
      - Heuristics have been used to find a near-optimal solution in linear time for the global alignment problem.
      - Pairwise sequence alignment can be easily extended to solve the alignment of multiple sequences.

   B) Genes are important segments of DNA in every genome since they code for proteins. Various species are known to share genes; this can be used to search for genes by finding conserved regions in the genomes of the species. Which of the following statements is true?
      a. Multiple sequence alignment is ideal for finding conserved segments of DNA among various organisms.
      b. Finding genes by aligning genomes works better if the genomes belong to closely related organisms.
      c. Closely related species tend to share conserved DNA regions beyond genes.
      d. Classification of organisms can be obtained by comparing how many DNA segments are conserved among their genomes.

3. To develop a good understanding of how to apply known algorithmic techniques to solve specific Bioinformatics problems, such as how to construct evolutionary trees for a set of related organisms.
   Students will be asked to compare various clustering algorithms and to assess how suitable each one is for constructing evolutionary trees. They will also be asked to demonstrate how trees can be constructed using various algorithmic techniques.

   A) Constructing the tree of life is one of the most outstanding problems in biosciences. Such a tree would depict how various organisms evolved from one another. Which of the following features would the tree of life have? (Check all that apply)
      - The distance between two organisms in the tree would reflect how closely related they are.
      - The tree would provide information about which species are threatened with extinction and hence need protection.
      - The tree would be a binary tree.
      - The path between two nodes would represent the number of evolutionary steps that took place to evolve one organism from another.
      - Primates would be grouped together in the tree.

   B) Constructing the tree of life is an intractable problem. As a result, many heuristics have been developed to find approximate trees. Which of the following statements are consistent with the previous sentence? (Check all that apply)
      - Finding the optimal evolutionary tree is not computationally possible within a reasonable amount of time.
      - Finding such a tree may be easier for a certain group of organisms but may be very hard to obtain for another group.
      - Practically, all approaches used to construct evolutionary trees are heuristics since they may not produce the best possible tree for every given input.
      - Constructing an evolutionary tree would take too much time using a regular computer, but would take a reasonable time using a supercomputer.
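
Illustrative sketch (tree construction by clustering): Objective 3 refers to using clustering algorithms to construct evolutionary trees. The module does not name a specific algorithm, so the following is only a minimal sketch of one simple option, average-linkage clustering (often called UPGMA), under stated assumptions. The function name upgma, the labels A to D, and the distance values are hypothetical and chosen purely for illustration. Branch lengths are omitted, and the result only records the order in which groups were merged.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a tree by average-linkage (UPGMA-style) clustering.

    names: list of organism labels (hypothetical in this sketch).
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns a nested tuple recording the order in which clusters were merged.
    """
    # Map each current cluster (a label or a nested tuple) to its leaf count.
    sizes = {name: 1 for name in names}
    d = dict(dist)  # working copy so the caller's table is not modified

    while len(sizes) > 1:
        # Pick the closest pair among the current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        # The distance from the merged cluster to any other cluster is the
        # size-weighted average of the distances from its two parts.
        for c in sizes:
            if c in (a, b):
                continue
            d[frozenset((merged, c))] = (
                sizes[a] * d[frozenset((a, c))]
                + sizes[b] * d[frozenset((b, c))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)

    return next(iter(sizes))


# Made-up pairwise distances for four hypothetical organisms.
example_dist = {
    frozenset(("A", "B")): 1,
    frozenset(("A", "C")): 6,
    frozenset(("B", "C")): 6,
    frozenset(("A", "D")): 10,
    frozenset(("B", "D")): 10,
    frozenset(("C", "D")): 10,
}
tree = upgma(["A", "B", "C", "D"], example_dist)
print(tree)
```

In this made-up example, A and B are merged first because they are the closest pair, C joins them next, and D joins last. In practice, the distances would typically be derived from sequence comparison, which is how the module links alignment to phylogeny. Like the heuristics discussed in question B above, this procedure is not guaranteed to produce the best possible tree for every input.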
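
Illustrative sketch (global alignment by dynamic programming): Objective 2 states that dynamic programming is widely used for alignment and that global alignment can be solved in quadratic time. The following is a minimal sketch of that idea in the style of the Needleman-Wunsch algorithm, not necessarily the formulation used in the module. The function name global_alignment_score and the scoring values (match = 1, mismatch = -1, gap = -2, with a linear gap penalty) are assumptions made for illustration only.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences s and t.

    The scoring parameters are illustrative assumptions, not values
    prescribed by the module.
    """
    n, m = len(s), len(t)
    # score[i][j] holds the best score for aligning s[:i] with t[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]

    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill the table; each of the n * m cells takes constant time,
    # which gives the quadratic running time.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align s[i-1] with t[j-1]
                score[i - 1][j] + gap,       # gap in t
                score[i][j - 1] + gap,       # gap in s
            )
    return score[n][m]


print(global_alignment_score("ACGT", "AGT"))
```

A higher score indicates greater similarity under the chosen scoring scheme, which is one simple way to measure the similarity between two biological sequences. Recovering the alignment itself, rather than only its score, would require tracing back through the table, which this sketch omits.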