Algorithms in Bioinformatics Module Learning Objectives

Module 5: Algorithms in Bioinformatics (Instructor: H. Ali)
Description: The module focuses on introducing students to key problems in bioinformatics and
how to solve them using several problem-solving techniques. In particular, this module includes
algorithms for comparing biological sequences, constructing evolutionary trees and finding genes
in sequenced genomes. The module emphasizes how solving various bioinformatics problems
has become a key contributor to our biological knowledge. For example, the biological sciences
have a long tradition of discovery by comparison: information about an unknown biological
element can be estimated by comparing its attributes to those of known elements. With current
bioinformatics algorithms, it is natural to use biological sequences as the attributes for
exploring potential similarities between the unknown element and various known ones. The
course will present basic algorithmic concepts in computational biology and show how they are
connected to molecular biology and biotechnology. For example, students will be introduced to
alignment algorithms along with a simple introduction to dynamic programming. The module will
also include algorithms for gene prediction, clustering and constructing evolutionary trees.
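As a rough illustration of the dynamic-programming approach to alignment mentioned above, the following Python sketch computes a global alignment in the style of the Needleman–Wunsch algorithm. The function name `global_alignment` and the scoring scheme (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen for illustration; the module does not prescribe a particular scoring scheme.

```python
def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch style global alignment with a linear gap penalty.

    Illustrative sketch: the scoring values are assumptions, not part of the module.
    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    Time and memory are proportional to len(a) * len(b).
    """
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # match/mismatch
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_alignment("ACGT", "ACT"))  # (2, 'ACGT', 'AC-T')
```

Here the best alignment pairs three matching bases and places one gap opposite the G. Filling the table requires one cell per pair of positions, so the running time grows with the product of the two sequence lengths.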
Homework: As part of the module, students will be asked to apply the problem-solving methods
they have learned to simple but critical Bioinformatics problems, with a focus on how sequence
comparison techniques can be used to classify and recognize organisms.
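One simple way to turn sequence comparison into classification, sketched here under stated assumptions, is nearest-neighbour assignment: align the unknown sequence against each labeled reference and report the label with the best alignment score. The helper names `alignment_score` and `classify`, the scoring values, and the toy reference sequences are all hypothetical; they are not real organism data or a method prescribed by the module.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global (Needleman-Wunsch style) alignment score, keeping only two DP rows.

    Illustrative sketch: scoring values are assumptions.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                          prev[j] + gap,      # a[i-1] against a gap
                          curr[j - 1] + gap)  # b[j-1] against a gap
        prev = curr
    return prev[-1]


def classify(query: str, references: dict[str, str]) -> str:
    """Return the label of the reference sequence that aligns best with the query.

    Ties are broken in favour of the label that appears first in `references`.
    """
    return max(references, key=lambda label: alignment_score(query, references[label]))


# Hypothetical toy references; these are not real organism sequences.
references = {
    "species_A": "ACGTACGT",
    "species_B": "TTTTGGGG",
}
print(classify("ACGTACGA", references))  # species_A
```

Keeping only two rows of the table is enough here because classification needs the score, not the alignment itself.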
Intended audience: At UNO, the target course for this module will be BIOL 4960 (Advanced
Genetics). It would also be very applicable to CSCI 3320 (Data Structures).
Module Outline:
I. General introduction to basic problems in Bioinformatics
A. Very brief introduction to Bioinformatics
B. Basic biological concepts for Computer Science students, and basic algorithmic concepts for Biology students
C. Overview of key Bioinformatics problems and associated algorithms
• Sequence comparison
• Sequencing and map assembly
• Gene prediction
• Phylogenetic trees
II. Sequence comparison and alignment algorithms
A. Local and global alignment
B. Multiple sequence alignment
C. Applications of Sequence Comparison
• Identification/classification of organisms
• Gene Prediction
III. Clustering algorithms and evolutionary trees
A. Brief introduction to clustering algorithms
B. Using a simple algorithm to construct evolutionary trees
C. Linking multiple sequence alignment to Phylogeny
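Item III.B does not name a specific tree-building algorithm. As one possible example, the sketch below uses UPGMA (Unweighted Pair Group Method with Arithmetic Mean), a simple agglomerative clustering method that repeatedly merges the two closest clusters and replaces their distances with size-weighted averages. The function name `upgma`, its input format (a dict keyed by `frozenset` pairs of labels) and the toy distances are assumptions for illustration. The output omits branch lengths, and UPGMA trees are only reliable when evolutionary rates are roughly constant across lineages.

```python
def upgma(names: list[str], dist: dict[frozenset, float]) -> str:
    """UPGMA clustering returning a Newick-style string without branch lengths.

    Illustrative sketch: `dist` maps frozenset({x, y}) to the distance
    between leaves x and y and must contain every pair of leaves.
    """
    size = {n: 1 for n in names}  # active cluster label -> number of leaves
    d = dict(dist)
    while len(size) > 1:
        # Pick the closest pair of active clusters
        x, y = sorted(min(d, key=d.get))
        new = f"({x},{y})"
        # Distance from the merged cluster to each other cluster:
        # average of the old distances, weighted by cluster size
        for z in size:
            if z in (x, y):
                continue
            d[frozenset((new, z))] = (
                size[x] * d[frozenset((x, z))] + size[y] * d[frozenset((y, z))]
            ) / (size[x] + size[y])
        size[new] = size[x] + size[y]
        del size[x], size[y]
        # Drop every entry that still refers to a merged cluster
        d = {k: v for k, v in d.items() if x not in k and y not in k}
    return next(iter(size))


# Hypothetical toy distances between four taxa A-D (not real data).
pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
dist = {frozenset(p): v for p, v in pairs.items()}
print(upgma(["A", "B", "C", "D"], dist))  # (((A,B),C),D)
```

A and B are merged first because they are closest, then C joins that pair, and D, the most distant taxon, is attached last.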
Learning Objectives:
1. To gain a good understanding of what the field of Bioinformatics is and how it can play a
significant role in solving various problems in the domain of biosciences.
Students will be asked various questions about what Bioinformatics is and, since it is a newly
emerging multidisciplinary field, how other traditional disciplines contribute to the
understanding of its basic concepts.
A) Which of the following statements best describes the field of bioinformatics?
a. Applying biological concepts in the design of computer algorithms.
b. Using mathematical and computational techniques to solve biological problems.
c. Using computers to speed up what bioscientists do manually.
d. Using supercomputers to store complex biological data.
B) Which of the following concepts are related to bioinformatics? (Check all that apply)
☐ Genetic algorithms
☐ Sequence comparisons
☐ Constructing evolutionary trees
☐ Swarming techniques
☐ Ant algorithms
C) Bioinformatics is a multi-disciplinary field of study. Which of the following
traditional disciplines are necessary for the study of bioinformatics? (Check all that
apply)
☐ Computer science
☐ Biology
☐ Information systems
☐ Pharmacy
☐ Mathematics and statistics
☐ Medicine
☐ Chemistry
2. To gain general knowledge of the main Bioinformatics algorithms and their main
applications, with a particular focus on sequence comparison in the context of analyzing
biological data.
Students will be asked to list the main Bioinformatics problems and suggest basic
algorithms for simplified versions of important Bioinformatics problems, such as
measuring the similarity between two biological sequences. They will also be asked
how sequence comparison techniques can be used to recognize and/or classify various
organisms.
A) Sequence alignment is a key operation for several bioinformatics applications. Which
of the following statements about sequence alignment are true? (Check all that
are true)
☐ Sequence alignment is a computationally intensive problem.
☐ Sequence alignment is the only method used to compare sequences.
☐ Dynamic Programming (DP) is widely used to solve the alignment problem.
☐ A solution to the global alignment problem can be achieved in quadratic time
complexity.
☐ Heuristics have been used to find a near optimal solution in linear time for the
global alignment problem.
☐ Pairwise sequence alignment can be easily extended to solve the alignment of
multiple sequences.
B) Genes are important segments of DNA in every genome since they code into proteins.
Various species are known to share genes; this can be used to search for genes by finding
conserved regions in the genomes of the species. Which of the following statements is
true?
a. Multiple sequence alignment is ideal for finding conserved segments of DNA
among various organisms.
b. Finding genes by aligning genomes works better if the genomes belong to closely
related organisms.
c. Closely related species tend to share conserved DNA regions beyond genes.
d. Classification of organisms can be obtained by comparing how many DNA
segments are conserved among their genomes.
3. To develop a good understanding of how to apply known algorithmic techniques to solve
specific Bioinformatics problems, such as how to construct evolutionary trees for a set of
related organisms.
Students will be asked to compare various clustering algorithms and assess how suitable each
one is for constructing evolutionary trees. They will also be asked to demonstrate how
trees can be constructed using various algorithmic techniques.
A) Constructing the tree of life is one of the most outstanding problems in biosciences.
Such a tree would depict how various organisms evolved from each other. Which of the
following features would the tree of life have? (Check all that apply)
☐ The distance between two organisms in the tree would reflect how closely related
they are.
☐ The tree would provide information about which species are threatened with
extinction and hence need protection.
☐ The tree would be a binary tree.
☐ The path between two nodes would represent the number of evolutionary steps
that took place to evolve one organism from another.
☐ Primates would be grouped together in the tree.
B) Constructing the tree of life is an intractable problem. As a result, many heuristics
have been developed to find approximated trees. Which of the following statements are
consistent with the previous sentence? (Check all that apply)
☐ Finding the optimal evolutionary tree is not computationally possible within a
reasonable amount of time.
☐ Finding such a tree may be easier for a certain group of organisms but may be
very hard to obtain for another group.
☐ Practically, all approaches used to construct evolutionary trees are heuristics
since they may not produce the best possible tree for every given input.
☐ Constructing an evolutionary tree would take too much time using a regular
computer, but would take a reasonable time using a supercomputer.