BioInformatics/Computational Biology Courses Developed and Taught by Professor B. Mishra over the Last Three Years:

Computational Biology: G22.3033.006

Prerequisites: Mathematical Maturity, Combinatorics, Statistics and Algorithms Design

Syllabus:
1. Genome Structure & Grammar
2. Mapping
3. Sequencing & Resequencing
4. Transcription Maps: Gene Finding, Regulatory Sequences
5. Structural Genomics & Proteomics
6. Functional Genomics: Genetic Networks, Gene Expression Arrays, Clustering Algorithms, Ideas from Learning Theory
7. Population Genomics: SNPs, Linkage Analysis
8. Comparative Genomics
9. DNA Computers

Required Text(s):
Bioinformatics: Sequence and Genome Analysis by David Mount, Cold Spring Harbor Laboratory, ISBN 0879696087.
Introduction to Computational Molecular Biology by Setubal & Meidanis, PWS Publishing Company, ISBN 0-534-95262-3.
Analysis of Human Genetic Linkage by Jurg Ott, The Johns Hopkins University Press, ISBN 0-8018-4257-3.

Recommended Text(s):
Genomics: The Science and Technology Behind the Human Genome Project by Charles R. Cantor and Cassandra L. Smith, John Wiley & Sons, ISBN 0471599085.
Principles of Genome Analysis by S.B. Primrose, Blackwell Science, ISBN 0-86542-946-4.
The Human Genome Project: Deciphering the Blueprint of Heredity, edited by N.G. Cooper, University Science Books, ISBN 935702-29-6.

Special Topics in Math Biology
Computational Genomics: G63.2856.002/G22.3033.006

Description: The genome contained within a human cell is very large and complex. It holds all of the genetic information necessary for the cell's creation and function, encoded in a total of six feet of DNA. The goals of the Human Genome Initiative (HGI), as framed by the National Institutes of Health and the Department of Energy, are to generate a complete map, containing well-defined markers, and to sequence the entire human genome within the next seven years or fewer. The sequencing aspects of this project will have to deal with approximately 3 billion base pairs.
A large number of genes (70,000-100,000) will be identified and characterized in terms of biochemical, developmental, and clinical criteria. Additionally, the development of approaches to characterize message (the RNA transcripts that direct synthesis of specific proteins) globally and quantitatively will also play a major role in virtually every aspect of biological, pharmaceutical and clinical research. The sciences of computational genomics and bioinformatics have grown out of this massive sea of sequence data and the need to establish the functionality of genes largely on the basis of similarities discerned at the level of the DNA code, bypassing the need for extensive biochemical characterization. This emerging subfield relies on some classical and many novel mathematical, statistical and algorithmic ideas that are essential to accomplish this task. This course deals mainly with these mathematical and computational approaches. The course is self-contained, developing the biological, statistical, probabilistic and algorithmic tools and techniques along the way.

Syllabus:
1. Introduction & History.
2. Some Molecular Biology: DNA, Transfer RNA and Protein sequence.
3. Biochemistry.
4. Restriction Maps.
5. DDP (Double Digestion Problem): Complexity and Algorithms.
6. Cloning and Clone Libraries.
7. Physical Genome Maps (Oceans, Islands and Anchors): Lander-Waterman statistics.
8. Sequence Assembly.
9. Alignment of Two and Multiple Sequences.
10. Lander-Waterman Statistics and Applications to Sequence Alignment.
11. RNA Secondary Structure.
12. Optical Mapping and Map-Based Sequence Assembly.
13. Research Problems.

Prerequisites: Mathematical Maturity, Combinatorics, Statistics and Algorithms Design

Required Text(s):
Statistical Genomics: Linkage, Mapping and QTL Analysis by Ben Hui Liu, CRC Press, ISBN 0-8493-3166-8.
Introduction to Computational Molecular Biology by Setubal & Meidanis, PWS Publishing Company, ISBN 0-534-95262-3.
Introduction to Computational Biology: Maps, Sequences and Genomes by Michael Waterman, Chapman and Hall, ISBN 0-412-99391-0.
Analysis of Human Genetic Linkage by Jurg Ott, The Johns Hopkins University Press, ISBN 0-8018-4257-3.

Recommended Text(s):
Principles of Genome Analysis by S.B. Primrose, Blackwell Science, ISBN 0-86542-946-4.
The Human Genome Project: Deciphering the Blueprint of Heredity, edited by N.G. Cooper, University Science Books, ISBN 935702-29-6.

Topics in Computational Biology: G22.3033.011
Comparative and Functional Genomics

Prerequisites: Mathematical Maturity, Statistics and Introductory Genomics

Syllabus:
1. Comparative Genomics: Evolutionary Models, Statistics
2. Phylogeny: Models, Algorithms and Complexity
3. Comparing Genome Structure among the Species: Genomic Rearrangements
4. Comparing Genome Structure within a Species: Gene Duplications, Gene Families, Pseudo-genes, etc.
5. Transcription Maps: Gene Finding, Regulatory Sequences
6. Functional Genomics: Genetic Networks, Gene Expression Arrays, Clustering Algorithms, Ideas from Learning Theory
7. Gene Expression Arrays and Their Effectiveness
8. Combining with Other Data
9. Proteomics
10. Population Genomics: SNPs, Linkage Analysis
11. Cancer Genomics

Required Text(s):
Molecular Cell Biology by Harvey Lodish, Arnold Berk, S. Lawrence Zipursky, Paul Matsudaira, 4th edition with CD-ROM (October 1999), W H Freeman & Co., ISBN 071673706X.
Post-Genome Informatics by Minoru Kanehisa, 1st edition (March 15, 2000), Oxford Univ Press, ISBN 0198503261.
Comparative Genomics - Empirical and Analytical Approaches to Gene Order Dynamics, Map Alignment and the Evolution of Gene Families by David Sankoff, Joseph H. Nadeau (September 1, 2000), Kluwer Academic Pub, ISBN 0792365844.

Course Description: The genes of all cells are composed of DNA.
Proteins serve as structural components as well as enzymes within cells, but the genes contain the blueprints for each protein and the program for controlling the production of proteins. Genes are transcribed to produce complementary molecules of mRNA (messenger RNA), and the mRNA is translated into proteins. There is an almost one-to-one correspondence between genes and proteins. Proteins perform the work of cells, such as energy production, reaction catalysis, intercellular signaling, transcription and translation, cell reproduction, etc. All cells of an organism contain the same DNA. The level of production of each type of protein specifies the state of a cell. This state is determined by spatial and temporal variables such as tissue location and extra-cellular stimuli. The level of production of a protein is determined primarily by the level of transcription of the corresponding gene into mRNA. This picture seems to be at the core of a universal story of life!

With the recent availability of DNA sequence data, proteomics data and the development of tools for whole-genome assays (e.g., gene expression arrays), it has become possible to understand the basic biology of cells, the identification and function of genes, and how a common, universal theme varies over all life.

The greatest hurdles to the effective development and use of new tools for "post-genomic informatics" are problems of mathematics and statistics. There are difficult problems of combinatorial mathematics, statistics, modeling and algorithm design. There are challenging problems of how to elucidate genetic networks based on time-sequenced gene expression data. There are important problems of how to classify cells based on expression patterns and how to develop diagnostic disease classification systems. Because of the high dimensionality of the data, there are many challenging problems of multiplicity and multivariate analysis that must be addressed.
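The multiplicity problem mentioned in the course description above can be illustrated with a small calculation. The following is a minimal sketch, not part of the course materials: the gene count, the significance level, the independence of the tests, and all function names are assumptions chosen for illustration. It shows how many false positives to expect when every gene on an array is tested at a fixed level, and how a Bonferroni-corrected per-test threshold keeps the family-wise error rate below that level.

```python
def expected_false_positives(num_tests: int, alpha: float) -> float:
    """Expected count of false positives when every null hypothesis is true."""
    return num_tests * alpha


def bonferroni_threshold(num_tests: int, alpha: float) -> float:
    """Per-test significance level that bounds the family-wise error by alpha."""
    return alpha / num_tests


def family_wise_error_rate(num_tests: int, per_test_alpha: float) -> float:
    """P(at least one false positive), assuming independent tests, all nulls true."""
    return 1.0 - (1.0 - per_test_alpha) ** num_tests


if __name__ == "__main__":
    num_genes = 10_000  # hypothetical number of genes on an expression array
    alpha = 0.05        # conventional significance level (an assumption)

    corrected = bonferroni_threshold(num_genes, alpha)
    print("expected false positives, uncorrected:",
          expected_false_positives(num_genes, alpha))
    print("family-wise error rate, uncorrected:",
          family_wise_error_rate(num_genes, alpha))
    print("Bonferroni per-test threshold:", corrected)
    print("family-wise error rate, corrected:",
          family_wise_error_rate(num_genes, corrected))
```

Bonferroni correction is only one of several ways to handle multiplicity and is used here because it is the simplest to state; it is conservative when the number of tests is large.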
Topics in Computational Biology: G22.3033.002
Cell Informatics/Systems Biology

Required Text(s):
Computational Analysis of Biochemical Systems: A Practical Guide for Biochemists and Molecular Biologists by Eberhard O. Voit, Cambridge Univ Pr, ISBN 0521785790.
Receptors: Models for Binding, Trafficking, and Signaling by Douglas A. Lauffenburger, Jennifer J. Linderman, Oxford University Press, ISBN 0195106636.

Course Description: Presently, there is no clear way to determine whether the current body of biological facts is sufficient to explain phenomenology. In the biological community, it is not uncommon to assume, without rigorous justification, that certain biological problems have achieved a cognitive finality. In these cases, rigorous mathematical models with automated tools for reasoning, simulation, and computation can be of enormous help in uncovering cognitive flaws, qualitative simplifications or overly generalized assumptions. Some ideal candidates for such study would include: the prion hypothesis, cell cycle machinery (DNA replication and repair, chromosome segregation, cell-cycle period control, spindle pole duplication, etc.), muscle contractility, processes involved in cancer (cell cycle regulation, angiogenesis, DNA repair, apoptosis, cellular senescence, tissue space modeling enzymes, etc.), signal transduction pathways, circadian rhythms (especially the effect of small molecular concentrations on their robustness), and many others.

We believe that the difficulty of biological modeling will become acute as biologists prepare to understand even more complex systems. Fortunately, similar issues have been faced in the past by other disciplines: for instance, designing complex microprocessors involving many millions of transistors, building and controlling configurable robots involving actuators with very many degrees of freedom, implementing hybrid controllers for highway traffic or air traffic, or even reasoning about data traffic on a computer network.
The approaches developed by control theorists analyzing the stability of a system with feedback, physicists studying asymptotic properties of dynamical systems, and computer scientists reasoning about discrete or hybrid (combining discrete events with continuous events) reactive systems have all tried to address some aspects of the same problem in a very concrete manner. We believe that biological processes could be studied in a similar manner, once the appropriate tools are made available.

The goal of this course is to understand, design and create a large-scale computational system centered on the biology of individual cells, populations of cells, intra-cellular processes, and realistic simulation and visualization of these processes at multiple spatio-temporal scales. Such a reasoning system, in the hands of a working biologist, can then be used to gain insight into the underlying biology, design refutable biological experiments, and ultimately discover intervention schemes to suitably modify the biological processes for therapeutic purposes. The course will focus primarily on two biological processes: genome evolution and cell-to-cell communication.