91.510 Syllabus

Computer Science 91.510
Computational Methods in Molecular Biology
Spring 2004
Instructor: Georges Grinstein
Office: Olsen 301
Phone: 978-934-3627
Email: [email protected]
Office Hours: 1:30 – 2:20 PM Tu-Th, and by appt.
Course Time and Place: Wed 5:30 – 8:20 PM in Olsen 410
Course web page: www.cs.uml.edu/~grinstei/91.510
Material: This is an advanced course in computer science, focusing on current problems
in genomics. Our emphasis will be analytic: identifying the appropriate combinatorial
algorithm problems and the techniques for solving them. Primary topics will
include DNA sequence assembly, DNA/protein sequence comparison, phylogenetic trees,
RNA and protein folding, microarray analysis, and their applications to human health.
The course will use retinol-binding protein 4 (RBP4) as a model gene/protein. RBP4 is a
member of the lipocalin family. It is a small, abundant carrier protein.
We will study it in a variety of contexts, including:
sequence alignment
gene expression
protein structure
homologs in various species
We will also use the Pol protein of HIV-1 as an example.
Homework and Software: The most important homework will be reading the primary
textbooks, the related chapters in the secondary reference books, and related papers, and
solving the laboratory problems. In each case, public domain or other software will be
made available to run the algorithms on public or similar data sets.
Grading: Grades will be assigned based on the following formula:
Presentations – 10%
Laboratory, problem, and algorithm web page (portfolio/notebook) – 20%
Final Project – 40%
Final Exam – 30%
Extra credit: find a mistake in a database or in an algorithm in some public-domain
software.
Presentation: I will be lecturing on foundational material in computational biology and
algorithms. The following week, to augment this theoretical material, each student
(normally working in pairs) will be required to make a presentation on the modern
algorithms used in currently available software tools for some class of problems. This
presentation will emphasize both what the software does, via a demonstration, and what
the associated algorithmic issues are (with pseudocode). Each
presentation will be available on your web page with links to appropriate systems and
resources, as well as the slides used in the presentation and documented pseudocode. Thus
each class, except for the first few, will consist of an advanced presentation related to the
previous week’s topic, followed by my presentation on the topic at hand.
Web page: You will place all results from the labs, problems, exercises, algorithms
designed (pseudocode) or implemented (code) on your web page. This web page is how I
will be able to evaluate you.
Project: This is your opportunity to study and apply all aspects of the course topics in
depth. You will discover a novel gene (by April 30) and its corresponding phylogenetic tree
(by May 13). The final results will be presented at a class poster session (as a PPT
presentation) and written up as a major report. Electronic versions of both, along
with supplementary information including figures, history, references, datasets, and
custom software, will also be made available on your web page.
Exercises: I will provide practice exercises from both the algorithmic and the
biological viewpoints. Simply place your answers on your web page. These are
required to pass the course and will also count toward your grade.
Notes and guidelines:
1. This is an advanced course in algorithms, focused on applications to molecular
biology. It is targeted to advanced Master's and doctoral students. Computer Science
students should not take this course if they lack a good knowledge of algorithms (or did
poorly in an algorithms course); ideally they will have taken the equivalent of the graduate
algorithms course. Biomedical engineering students (and life science or medical school
students) are expected to have a good knowledge of biology, genetics and biochemistry.
2. I suggest that you form pairs, each with one computational science student and one life
science student, for presentations and projects, so that you can help each other gain a more
balanced view.
3. Please check the WWW page for the course regularly. All course handouts and
materials are available there, along with the latest announcements.
4. Begin developing a professional web page related to the course. Place notes, figures,
and datasets there. Your final project will be placed there.
5. Because a primary goal of the course is to teach professionalism, any academic
dishonesty will be viewed as evidence that this goal has not been achieved, and will be
grounds for receiving a grade of F. (See CS and University procedures and guidelines on
academic dishonesty).
Textbooks: There are two required textbooks for this course. Some of the material we
will cover also appears in the secondary references, listed below in order of priority.
Additional readings will be assigned from papers available on the course web page.
Required Textbooks:
Pevsner, Bioinformatics and Functional Genomics, Wiley-Liss, John
Wiley and Sons, 2003. ISBN 0-471-21004-8
Setubal and Meidanis, Introduction to Computational Molecular Biology, Brooks
Cole Publishing Company, 1997. ISBN 0-534-95262-3
Primary Additional Textbooks (Not Required):
Durbin, Eddy, Krogh, and Mitchison, Biological Sequence Analysis: Probabilistic
Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.
Seventh printing 2002; Paperback ISBN 0-521-62971-3
Kohane, Kho and Butte, Microarrays for an Integrative Genomics, A Bradford
Book, MIT Press, 2003. ISBN 0-262-11271-X
Secondary Textbooks (Algorithms):
Mount, Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor
Laboratory Press, 2001. ISBN 0-87969-597-8
Jiang, Xu, and Zhang, Current Topics in Computational Molecular Biology, A
Bradford Book, MIT Press, 2002. ISBN 0-262-10092-4
Secondary Textbooks (Bioinformatics – Biological Viewpoint):
Krane and Raymer, Fundamental Concepts of Bioinformatics, 2003. Paperback
ISBN 0-8053-4633-3
Campbell and Heyer, Discovering Genomics, Proteomics, and Bioinformatics,
Benjamin Cummings, 2003. Paperback ISBN 0-8053-4722-4
Additional References – Algorithms:
Salzberg, Searls and Kasif, Computational Methods in Molecular Biology,
Elsevier, 2002. ISBN 0-444-50204-1
Waterman, Introduction to Computational Molecular Biology: Maps, Sequences
and Genomes, Chapman & Hall, CRC Press, 1995, CRC reprint 2000. ISBN 0-412-99391-0
Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge Univ. Press,
1997. ISBN 0-521-58519-8
Baldi and Brunak, Bioinformatics: The Machine Learning Approach,
MIT Press, 2001. ISBN 0-262-02506-X
Additional References – Molecular Biology:
Thompson, Hellack, Braver and Durica, Primer of Genetic Analysis: A Problems
Approach, Cambridge University Press, 1997. Reprinted 2000. Paperback ISBN 0-521-47312-8
Clark and Russell, Molecular Biology Made Simple and Fun, Cache River Press,
1997. Paperback ISBN 0-9627422-9-5
Frank-Kamenetskii, Unraveling DNA: The Most Important Molecule of Life,
Perseus Books, 1997. Paperback ISBN 0-201-15884-2
Baldi and Hatfield, DNA Microarrays and Gene Expression: From Experiments
to Data Analysis and Modeling, Cambridge University Press, 2002. ISBN 0-521-80022-6
Knudsen, A Biologist’s Guide to Analysis of Microarray Data, Wiley-Liss, 2002.
ISBN 0-471-22490-1
Warrington, Todd and Wong, Microarrays and Cancer Research, BioTechniques
Press, Eaton Publishing, 2002. Paperback ISBN 1-881299-51-1
Bishop and Rawlins, DNA and Protein Sequence Analysis, Oxford University
Press, 1997.
Baxevanis and Ouellette, Bioinformatics, Wiley, 1998.
Sankoff and Kruskal, Time Warps, String Edits, and Macromolecules, CSLI
Publications, 1999 (reprint).
Watson, Gilman, Witkowski, and Zoller, Recombinant DNA, Scientific American
Press, 1992.
Recordings: the lectures will be recorded and posted at:
Schedule (P = Pevsner; SM = Setubal & Meidanis)
Date      Topic                                              Book Chapter
Jan 28    Introduction to                                    P1, P2, SM1
Feb 4     Pairwise alignment: algorithms and matrices        P3, SM2
Feb 9     BLAST and related programs                         P4, SM3
Feb 19    Advanced database searching                        P5, SM3
Feb 23    Gene expression
Mar 1     Gene expression: microarray data analysis          P7
Mar 8     Protein families &                                 P8, SM3
Mar 22    Protein structure                                  P9, SM8
Mar 29    Multiple sequence alignment                        P10, SM3
Apr 5     Molecular phylogeny:
Apr 12    Molecular phylogeny: making                        P11, SM6
          Genome Analysis: Fragment Assembly; Physical       P12, SM4, SM5, SM7
          Mappings of DNA; Genome Rearrangements;
          Systematics
Apr 21*   Completed genomes: viruses, prokaryotes            P13, P14, P15,
          and fungi                                          SM4, SM5, SM7
Apr 26    Functional analysis of pathways: yeast
May 3     Eukaryotic genomes: from parasites to primates
May 10    Human genome and disease                           P17, P18
          Final Project Presentations
          Final Exam
Send me an email (no more than one page) stating:
1) your Computer Science, Mathematics, Biology and Chemistry backgrounds;
2) your goals and research interests;
3) what you hope to learn from taking this course; and
4) the amount of time you expect to spend on this course.