Computer Science 91.510
Computational Methods in Molecular Biology
Spring 2004

Instructor: Georges Grinstein
Office: Olsen 301
Phone: 978-934-3627
Email: grinstein@cs.uml.edu
Office Hours: 1:30 – 2:20 PM Tu-Th, and by appt.
Course Time and Place: Wed 5:30 – 8:20 PM in Olsen 410
Course web page: www.cs.uml.edu/~grinstei/91.510

Material: This is an advanced course in computer science, focusing on current problems in genomics. Our emphasis will be analytic: identifying the appropriate combinatorial algorithmic problems and the techniques for solving them. Primary topics will include DNA sequence assembly, DNA/protein sequence comparison, phylogenetic trees, RNA and protein folding, microarray analysis, and their applications to human health.

The course will use retinol-binding protein 4 (RBP4) as a model gene/protein. RBP4 is a member of the lipocalin family. It is a small, abundant carrier protein. We will study it in a variety of contexts, including:
    sequence alignment
    gene expression
    protein structure
    phylogeny
    homologs in various species
We will also use the Pol protein of HIV-1 as an example.

Homework and Software: The most important homework will be reading the primary textbooks, related chapters in the secondary reference books, and related papers, and solving the laboratory problems. In each case, public domain or other software will be made available to run the algorithms on public or other similar data sets.

Grading: Grades will be assigned based on the following formula:
    Presentations – 10%
    Laboratory, problem, and algorithm web page (portfolio/notebook) – 20%
    Final Project – 40%
    Final Exam – 30%
    Extra credit: find a mistake in a database or in an algorithm in some public domain software

Presentation: I will be lecturing on foundational material in computational biology and algorithms.
The following week, to augment this theoretical material, each student (presumably as part of a pair) will be required to make a presentation on the modern algorithms in software tools currently available for some class of problems. This presentation will both demonstrate what the software does and discuss the associated algorithmic issues (with pseudocode). Each presentation will be available on your web page with links to appropriate systems and resources, as well as the slides used in the presentation and documented pseudocode. So each class, except for the first few, will consist of an advanced presentation related to last week’s topic, followed by my presentation on the topic at hand.

Web page: You will place all results from the labs, problems, and exercises, and all algorithms designed (pseudocode) or implemented (code), on your web page. This web page is how I will evaluate you.

Project: This is your opportunity to study and apply all aspects of the course topics in depth. You will discover a novel gene (by April 30) and a corresponding phylogenetic tree (by May 13). The final results will be presented at a class poster session (as a PPT presentation) and written up as a major report. Electronic versions of both, along with supplementary information including figures, history, references, datasets, and custom software, will also be made available on your web page.

Exercises: I will be making up some exercises for practice from both the algorithmic and biological viewpoints. Simply place the answers to these on your web pages. These are required to pass the course but will count toward your grade.

Notes and guidelines:

1. This is an advanced course in algorithms, focused on applications to molecular biology. It is aimed at advanced master's and doctoral students.
Computer Science students should not take this course if they lack a good knowledge of, or did badly in, a course in algorithms (ideally the equivalent of the graduate course). Biomedical engineering students (and life science or medical school students) are expected to have a good knowledge of biology, genetics, and biochemistry.

2. I suggest that you form pairs of one computational science student and one life science student for presentations and projects, so as to help each other get a more balanced view.

3. Please check the course web page regularly. All course handouts and materials are available there, along with the latest announcements.

4. Begin developing a professional web page related to the course. Place notes, figures, and datasets there. Your final project will be placed there.

5. Because a primary goal of the course is to teach professionalism, any academic dishonesty will be viewed as evidence that this goal has not been achieved, and will be grounds for receiving a grade of F. (See CS and University procedures and guidelines on academic dishonesty.)

Textbooks: There are two reference textbooks for this course. Some of the material we will cover also appears in the secondary references, which are listed in order of priority. Additional readings will be assigned, with papers available on the course web page.

Required Textbooks:
Pevsner, Bioinformatics and Functional Genomics, Wiley-Liss Publishers, John Wiley and Sons, 2003. ISBN 0-471-21004-8
Setubal and Meidanis, Introduction to Computational Molecular Biology, Brooks Cole Publishing Company, 1997. ISBN 0-534-95262-3

Primary additional textbooks (not required):
Durbin, Eddy, Krogh, and Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998. Seventh printing 2002. Paperback ISBN 0-521-62971-3
Kohane, Kho and Butte, Microarrays for an Integrative Genomics, A Bradford Book, MIT Press, 2003.
ISBN 0-262-11271-X

Secondary Textbooks (Algorithms):
Mount, Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press, 2001. ISBN 0-87969-597-8
Jiang, Xu, and Zhang, Current Topics in Computational Molecular Biology, A Bradford Book, MIT Press, 2002. ISBN 0-262-10092-4

Secondary Textbooks (Bioinformatics – Biological Viewpoint):
Krane and Raymer, Fundamental Concepts of Bioinformatics, 2003. Paperback ISBN 0-8053-4633-3
Campbell and Heyer, Discovering Genomics, Proteomics, and Bioinformatics, Benjamin Cummings, 2003. Paperback ISBN 0-8053-4722-4

Additional References – Algorithms:
Salzberg, Searls and Kasif, Computational Methods in Molecular Biology, Elsevier, 2002. ISBN 0-444-50-204-1
Waterman, Introduction to Computational Molecular Biology: Maps, Sequences and Genomes, Chapman & Hall, CRC Press, 1995, CRC reprint 2000. ISBN 0412-99391-0
Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge University Press, 1997. ISBN 0-521-58519-8
Baldi and Brunak, Bioinformatics: The Machine Learning Approach, MIT Press, 2001. ISBN 0-262-0256-X

Additional References – Molecular Biology:
Thompson, Hellack, Braver and Durica, Primer of Genetic Analysis: A Problems Approach, Cambridge University Press, 1997. Reprinted 2000. Paperback ISBN 0-521-47312-8
Clark and Russell, Molecular Biology Made Simple and Fun, Cache River Press, 1997. Paperback ISBN 0-9627422-9-5
Frank-Kamenetskii, Unraveling DNA: The Most Important Molecule of Life, Perseus Books, 1997. Paperback ISBN 0-201-15884-2
Baldi and Hatfield, DNA Microarrays and Gene Expression: From Experiments to Data Analysis and Modeling, Cambridge University Press, 2002. ISBN 0-521-80022-6
Knudsen, A Biologist’s Guide to Analysis of Microarray Data, Wiley-Liss, 2002. ISBN 0-471-22490-1
Warrington, Todd and Wong, Microarrays and Cancer Research, BioTechniques Press, Eaton Publishing, 2002.
Paperback ISBN 1-881299-51-1

Others:
Bishop and Rawlins, DNA and Protein Sequence Analysis, Oxford University Press, 1997.
Baxevanis and Ouellette, Bioinformatics, Wiley, 1998.
Sankoff and Kruskal, Time Warps, String Edits, and Macromolecules, CSLI Publications, 1999 (reprint).
Watson, Gilman, Witkowski, and Zoller, Recombinant DNA, Scientific American Press, 1992.

Recordings: The lectures will be recorded and posted at: http://weblab.cs.uml.edu/~hgoodell/91-510-CompMethods/

Schedule (P = Pevsner; SM = Setubal & Meidanis)

Week      Lecture                                             Book Chapter                  Topic
Jan 28    Introduction to bioinformatics                      P1, P2, SM1                   Sequence analysis
Feb 4     Pairwise alignment: algorithms and matrices         P3, SM2
Feb 9     BLAST and related programs                          P4, SM3
Feb 19    Advanced database searching                         P5, SM3
Feb 23    Gene expression
Mar 1     Gene expression: microarray data analysis           P7
Mar 8     Protein families & proteomics                       P8, SM3
Mar 22    Protein structure                                   P9, SM8
Mar 29    Multiple sequence alignment                         P10, SM3
Apr 5     Molecular phylogeny: principles
Apr 12    Molecular phylogeny: making trees                   P11
          Functional genomics                                 P6, P11, SM6
          Genome Analysis: Fragment Assembly;                 P12, SM4, SM5, SM7            Genomics
            Physical Mappings of DNA;
            Genome Rearrangements; Systematics
Apr 21*   Completed genomes: viruses, prokaryotes and fungi   P13, P14, P15, SM4, SM5, SM7
Apr 26    Functional analysis of pathways: yeast              P15
May 3     Eukaryotic genomes: from parasites to primates      P16
May 10    Human genome and disease                            P17, P18
          Final Project Presentations
          Final Exam

NOTE: Send me an email (no more than one page) stating:
    1) your Computer Science, Mathematics, Biology, and Chemistry backgrounds;
    2) your goals and research interests;
    3) what you hope to learn from taking this course; and
    4) the amount of time you expect to spend on this course.