BioInformatics/Computational Biology Courses

Developed and taught by Professor B. Mishra over the last three years:
Computational Biology:
G22.3033.006
Prerequisites:
Mathematical Maturity, Combinatorics, Statistics and Algorithms Design
Syllabus:
1. Genome Structure & Grammar
2. Mapping
3. Sequencing & Resequencing
4. Transcription Maps:
Gene Finding, Regulatory Sequences
5. Structural Genomics & Proteomics
6. Functional Genomics:
Genetic Networks, Gene Expression Arrays,
Clustering Algorithms, Ideas from Learning Theory
7. Population Genomics:
SNPs, Linkage Analysis
8. Comparative Genomics
9. DNA Computers
Required Text(s):
Bioinformatics: Sequence and Genome Analysis by David Mount, Cold Spring Harbor Laboratory, ISBN 0879696087.
Introduction to Computational Molecular Biology by Setubal & Meidanis, PWS Publishing Company, ISBN 0-534-95262-3.
Analysis of Human Genetic Linkage by Jurg Ott, The Johns Hopkins University Press, ISBN 0-8018-4257-3.
Recommended Text(s):
Genomics: The Science and Technology Behind the Human Genome Project by Charles R. Cantor and Cassandra L. Smith, John Wiley & Sons, ISBN 0471599085.
Principles of Genome Analysis by S.B. Primrose, Blackwell Science, ISBN 0-86542-946-4.
The Human Genome Project: Deciphering the Blueprint of Heredity. Edited by N.G. Cooper, University Science Books, ISBN 935702-29-6.
Special Topics in Math Biology
Computational Genomics:
G63.2856.002/G22.3033.006
Description:
The genome contained within a human cell is very large and complex. It holds all of the genetic
information necessary for its creation and function, encoded in a total of six feet of DNA. The goals of the
Human Genome Initiative (HGI), as framed by the National Institutes of Health and the Department of
Energy, are to generate a complete map containing well-defined markers and to sequence the entire
human genome within the next seven years or less. The sequencing aspects of this project will have to deal
with approximately 3 billion base pairs. A large number of genes (70,000-100,000) will be identified and
characterized in terms of biochemical, developmental, and clinical criteria. Additionally, the development
of approaches to characterize messages (RNA transcripts, which direct synthesis of specific proteins)
globally and quantitatively will also play a major role in virtually every aspect of biological,
pharmaceutical and clinical research.
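The "six feet of DNA" figure can be sanity-checked against the 3 billion base pairs quoted above. The following is a rough sketch, not part of the course material. It assumes a rise of about 0.34 nm per base pair (the usual value for B-form DNA) and assumes that the 3 billion figure refers to one haploid copy of the genome, so a typical diploid cell carries twice that.

```python
# Back-of-the-envelope check of the DNA length quoted in the description.
# Assumption (not from the course text): B-form DNA rises about 0.34 nm per base pair.
RISE_PER_BP_NM = 0.34
# "approximately 3 billion base pairs", assumed here to be one haploid copy.
HAPLOID_BP = 3_000_000_000
METERS_PER_FOOT = 0.3048


def dna_length_feet(base_pairs: int) -> float:
    """Stretched-out length, in feet, of a DNA molecule with this many base pairs."""
    meters = base_pairs * RISE_PER_BP_NM * 1e-9
    return meters / METERS_PER_FOOT


haploid_feet = dna_length_feet(HAPLOID_BP)
diploid_feet = dna_length_feet(2 * HAPLOID_BP)
print(f"haploid: {haploid_feet:.1f} ft, diploid: {diploid_feet:.1f} ft")
# prints "haploid: 3.3 ft, diploid: 6.7 ft"
```

Under these assumptions, the six-feet figure corresponds roughly to the two genome copies in a diploid cell.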
The sciences of computational genomics and bio-informatics have grown out of this massive sea of
sequence data and the need to establish the functionality of genes largely from similarities discerned at the
level of the DNA code, bypassing the need for extensive biochemical characterization.
This emerging subfield relies on some classical and many novel mathematical, statistical and algorithmic
ideas that are essential to accomplish this task. This course deals mainly with these mathematical and
computational approaches. The course is self-contained, developing the biological, statistical, probabilistic
and algorithmic tools and techniques along the way.
Syllabus:
1. Introduction & History.
2. Some Molecular Biology: DNA, Transfer RNA and Protein sequence.
3. Biochemistry.
4. Restriction Maps.
5. DDP (Double Digestion Problem): Complexity and Algorithms.
6. Cloning and Clone Libraries.
7. Physical Genome Maps (Oceans, Islands and Anchors): Lander-Waterman statistics.
8. Sequence Assembly.
9. Alignment of Two and Multiple Sequences.
10. Lander-Waterman Statistics and Applications to Sequence Alignment.
11. RNA Secondary Structure.
12. Optical Mapping and Map-Based Sequence Assembly.
13. Research Problems.
Prerequisites:
Mathematical Maturity, Combinatorics, Statistics and Algorithms Design
Required Text(s):
Statistical Genomics: Linkage, Mapping and QTL Analysis. By Ben Hui Liu, CRC Press, ISBN 0-8493-3166-8.
Introduction to Computational Molecular Biology. By Setubal & Meidanis, PWS Publishing Company, ISBN 0-534-95262-3.
Introduction to Computational Biology: Maps, Sequences and Genomes. By Michael Waterman, Chapman and Hall, ISBN 0-412-99391-0.
Analysis of Human Genetic Linkage. By Jurg Ott, The Johns Hopkins University Press, ISBN 0-8018-4257-3.
Recommended Text(s):
Principles of Genome Analysis. By S.B. Primrose, Blackwell Science, ISBN 0-86542-946-4.
The Human Genome Project: Deciphering the Blueprint of Heredity. Edited by N.G. Cooper, University Science Books, ISBN 935702-29-6.
Topics in Computational Biology:
G22.3033.011
Comparative and Functional Genomics
Prerequisites:
Mathematical Maturity, Statistics and Introductory Genomics
Syllabus:
1. Comparative Genomics:
Evolutionary Models, Statistics
2. Phylogeny:
Models, Algorithms and Complexity
3. Comparing Genome Structure among the Species:
Genomic Rearrangements
4. Comparing Genome Structure within a Species:
Gene Duplications, Gene Families, Pseudo-genes, etc.
5. Transcription Maps:
Gene Finding, Regulatory Sequences
6. Functional Genomics:
Genetic Networks, Gene Expression Arrays,
Clustering Algorithms, Ideas from Learning Theory
7. Gene Expression Arrays and Their Effectiveness:
8. Combining with Other Data:
9. Proteomics:
10. Population Genomics:
SNPs, Linkage Analysis
11. Cancer Genomics:
Required Text(s):



Molecular Cell Biology
by Harvey Lodish, Arnold Berk, S. Lawrence Zipursky, Paul Matsudaira
Hardcover 4th Bk&CD-Rom (Windows) edition (October 1999) W H Freeman & Co.;
ISBN: 071673706X
Post-Genome Informatics
by Minoru Kanehisa
Paperback - 148 pages 1st edition (March 15, 2000) Oxford Univ Press; ISBN:
0198503261
Comparative Genomics - Empirical and Analytical Approaches to Gene Order
Dynamics, Map Alignment and the Evolution of Gene Families
by David Sankoff, Joseph H. Nadeau
Paperback - 568 pages (September 1, 2000) Kluwer Academic Pub; ISBN: 0792365844
Course Description:
The genes of all cells are composed of DNA. Proteins serve as structural components as well as enzymes
within cells, but the genes contain the blueprints for each protein and the program for controlling the
production of proteins. Genes are transcribed to produce complementary molecules of mRNA (messenger
RNA), and the mRNA is translated into proteins. There is (almost) a one-to-one correspondence between
genes and proteins. Proteins perform the work of cells, such as energy production, reaction catalysis, intercellular signaling, transcription and translation, cell reproduction, etc. All cells of an organism contain the
same DNA. The level of production of each type of protein specifies the state of a cell. This
state is determined by spatial and temporal variables such as tissue location and extra-cellular stimuli.
The level of production of a protein is determined primarily by the level of transcription of the corresponding gene
into mRNA. This picture seems to be at the core of a universal story of life!
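The transcription and translation steps described above can be illustrated with a toy example. This is only a sketch under stated assumptions: the input is taken to be the template DNA strand written 3' to 5', so the mRNA is its base-by-base complement. The function names and the input sequence are made up for illustration, and the codon table is deliberately partial (4 of the 64 codons of the standard genetic code).

```python
# Minimal sketch of transcription followed by translation.
# Assumption: the input is the template DNA strand, written 3'->5',
# so the mRNA (read 5'->3') is its base-by-base complement.
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Deliberately partial codon table: just enough for the example below.
# "*" marks a stop codon.
CODON_TABLE = {"AUG": "M", "CCA": "P", "GUA": "V", "UAA": "*"}


def transcribe(template_dna: str) -> str:
    """Return the mRNA complementary to a template DNA strand."""
    return "".join(DNA_TO_MRNA[base] for base in template_dna.upper())


def translate(mrna: str) -> str:
    """Read the mRNA codon by codon and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


mrna = transcribe("TACGGTCATATT")  # hypothetical 12-base template
print(mrna)             # AUGCCAGUAUAA
print(translate(mrna))  # MPV
```

A real translation routine would need the full 64-codon table and would have to locate the start codon and reading frame, which this sketch takes as given.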
With the recent availability of DNA sequence data and proteomics data, and the development of tools for whole-genome assays (e.g., gene expression arrays), it has become possible to understand the basic biology of
cells, the identification and function of genes, and how a common/universal theme varies over all life.
The greatest hurdles to the effective development and use of new tools for the "post-genomic informatics"
are problems of mathematics and statistics. There are difficult problems of combinatorial mathematics,
statistics, modeling and algorithm design.
There are challenging problems of how to elucidate genetic networks based on time-sequenced gene
expression data. There are important problems of how to classify cells based on expression pattern and how
to develop diagnostic disease classification systems. Because of the high dimensionality of the data, there
are many challenging problems of multiplicity and multivariate analysis that must be addressed.
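The multiplicity problem can be made concrete with a small sketch. The description does not name a particular correction. The example below uses two standard ones chosen here for illustration: the Bonferroni correction and the Benjamini-Hochberg step-up procedure. The p-values are invented, standing in for one test per gene on an expression array.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject test i if its p-value is at most alpha / m (controls family-wise error)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(pvalues)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Invented p-values, one per hypothetical gene.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(pvals)))          # 1
print(sum(benjamini_hochberg(pvals)))  # 2
```

With thousands of genes on an array, an uncorrected 0.05 threshold would flag many genes by chance alone. The two procedures trade off differently between missed findings and false positives.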
Topics in Computational Biology:
G22.3033.002
Cell Informatics/Systems Biology
Required Text(s):


Computational Analysis of Biochemical Systems: A Practical Guide for Biochemists and
Molecular Biologists
by Eberhard O. Voit
Cambridge Univ Pr; ISBN: 0521785790.
Receptors: Models for Binding, Trafficking, and Signaling
by Douglas A. Lauffenburger, Jennifer J. Linderman
Oxford University Press; ISBN: 0195106636.
Course Description:
Presently, there is no clear way to determine if the current body of biological facts is sufficient to explain
phenomenology. In the biological community, it is not uncommon to assume certain biological problems to
have achieved a cognitive finality without rigorous justification. In these particular cases, rigorous
mathematical models with automated tools for reasoning, simulation, and computation can be of enormous
help to uncover cognitive flaws, qualitative simplification or overly generalized assumptions. Some ideal
candidates for such study would include: prion hypothesis, cell cycle machinery (DNA replication and
repair, chromosome segregation, cell-cycle period control, spindle pole duplication, etc.), muscle
contractility, processes involved in cancer (cell cycle regulation, angiogenesis, DNA repair, apoptosis,
cellular senescence, tissue space modeling enzymes, etc.), signal transduction pathways, circadian rhythms
(especially the effect of small molecular concentration on its robustness), and many others. We believe that
the difficulty of biological modeling will become acute as biologists prepare to understand even more
complex systems.
Fortunately, similar issues have been faced in the past by other disciplines: for instance, designing complex
microprocessors involving many millions of transistors, building and controlling configurable robots
involving very high degree-of-freedom actuators, implementing hybrid controllers for highway traffic or
air traffic, or even reasoning about data traffic on a computer network. The approaches developed by
control theorists analyzing the stability of a system with feedback, physicists studying asymptotic properties of
dynamical systems, and computer scientists reasoning about discrete or hybrid (combining discrete events
with continuous events) reactive systems have all tried to address some aspects of the same problem in a
very concrete manner. We believe that biological processes could be studied in a similar manner, once the
appropriate tools are made available.
The goal of this course is to understand, design and create a large-scale computational system centered on
the biology of individual cells, populations of cells, intra-cellular processes, and realistic simulation and
visualization of these processes at multiple spatio-temporal scales. Such a reasoning system, in the hands of
a working biologist, can then be used to gain insight into the underlying biology, design refutable
biological experiments, and ultimately, discover intervention schemes to suitably modify the biological
processes for therapeutic purposes. The course will focus primarily on two biological processes: genome evolution and cell-to-cell communication.