BISC 478

BISC 478: Computational Genome Analysis
Lecture Time: 9:00 – 10:50am TTh, Room RRI 301
Computer Lab/Discussion Time: 3-4pm Tue, Room RRI 301
Instructors
Professor Ting Chen, 213-740-2415 (O), tingchen@usc.edu, T: 1-3pm
Professor Andrew Smith, 213-821-4142 (O), andrews@usc.edu, T: 1-3pm
Professor Frank Alber, 213-740-0778 (O), alber@usc.edu, T: 1-3pm
Units: 4
Description: This course provides an introduction to the computational side of molecular biology, with
an emphasis on genome analysis. With the development of new biotechnologies, enormous amounts of
biological data, including molecular sequences, networks, pathways, structures, and genetic
polymorphisms, have been generated and accumulated from a wide range of scientific studies. These
data have significantly improved our understanding of life, the environment, and human health.
This course introduces students to the basics of computational and statistical thinking in the context
of the biological and biomedical sciences through data analysis.
Goals:
- To develop basic analytical skills in computation and statistics within the context of molecular
  biology,
- To prepare students for future academic research and training in broad areas of the life sciences,
  biotechnology, and health care, and
- To prepare students for future careers in the life sciences and health care industries.
Book:
Computational Genome Analysis: An Introduction. (Deonier, Tavare, Waterman, Springer 2005)
Course Contents: The course content includes introduction to probability and statistics, probability and
statistics of biological sequences, genome rearrangement, sequence alignment, BLAST and FASTA,
sequence assembly, sequence motifs, gene expression analysis, phylogenetic analysis, genetic variations
in populations, and comparative genomics.
Computer Lab and Discussion: Students will participate in the computer lab to learn the R statistical
computing environment and to use R to solve biological problems.
Grade: The final grade will be based on two midterms and one final exam (26% each),
homework (15%), and course attendance (7%).
Course Schedule
Wk. 1
Introduction to Genomes (Ch1, all)
Words/ An Introduction to Probability, 1 (Ch2; 2.1-2.3.3)
Wk. 2
Words/ An Introduction to Probability, 2 (Ch2; 2.3.4-2.5)
Words/ An Introduction to Probability, 3 (Ch2; 2.6-2.9)
Wk. 3
Words/ An Introduction to Statistics, 1 (Ch3; 3.1-3.2)
Words/ An Introduction to Statistics, 2 (Ch3; 3.3-3.4.1)
Wk. 4
Words/ An Introduction to Statistics, 3 (Ch3; 3.4.2-3.6)
Physical Mapping-1 (Ch4; 4.1-4.4)
Wk. 5
Mini-review
Examination I
Wk. 6
Genome Rearrangements (Ch5; 5.1-5.2)
Genome Rearrangements (Ch5; 5.3-5.4)
Wk. 7
Sequence Alignments (Ch6; 6.1-6.4)
Sequence Alignments (Ch6; 6.5-6.8)
Wk. 8
FASTA and BLAST (Ch7; 7.1-7.2, 7.5)
FASTA and BLAST (Ch7; 7.3-7.4)
Wk. 9
Sequence Assembly (Ch8; 8.1-8.3)
Sequence Assembly (Ch8; 8.3-8.4)
Wk. 10
Spring Break
Spring Break
Wk. 11
Mini-review
Examination II
Wk. 12
Clustering (Ch10; 10.1-10.3)
Clustering (Ch10; 10.4-10.5)
Wk. 13
Gene Expression (Ch11; 11.1-11.3)
Gene Expression (Ch11; 11.4)
Wk. 14
Gene Expression (Ch11; 11.5)
Phylogenetics (Ch12; 12.1-12.3)
Wk. 15
Phylogenetics (Ch12; 12.4-12.5)
Genetic Variation (Ch13; 13.1-13.3)
Wk. 16
Genetic Variation (Ch13; 13.4-13.6)
Mini-review
FINAL EXAM
Topics
Introduction to Genomes: We will review the basics of genome structure and the mechanisms of gene
expression. (Chap 1)
Introduction to Probability I: We will introduce the basic concepts of discrete random variables,
probability distributions, independence, expected values, variances, binomial distributions, conditional
probability, the Markov property, and Markov chains in the context of words in biological
sequences. (Chap 2)
Introduction to Probability II: We will further study the distribution of words in biological sequences,
the Poisson approximation to the binomial distribution, the Poisson process, continuous random variables,
the Central Limit Theorem, and confidence intervals, with applications to motif finding in biological
sequence analysis. (Chap 3)
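As a rough illustration of the Poisson approximation named above, the sketch below compares binomial and Poisson probabilities for the number of occurrences of a fixed 6-letter word. The sequence length `n`, the word length, and the assumption of independent, equally likely bases are hypothetical choices made for this example; the sketch also ignores the dependence between overlapping word positions.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical setup: 10,000 start positions, each matching a given
# 6-letter word with probability (1/4)^6 under independent uniform bases.
n = 10_000
p = 0.25**6
lam = n * p  # Poisson rate that matches the binomial mean

for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))
```

With large `n` and small `p`, the two columns should agree closely, which is the regime in which the approximation is normally applied.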
Genome Rearrangements: We will introduce the concept of conserved synteny, various types of
genome rearrangements, and the estimation of reversal distances between genome sequences. (Chap 5)
Sequence Alignment: We will discuss dynamic programming, global alignments and local alignments.
(Chap 6)
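To make the dynamic programming idea concrete, here is a minimal sketch of a global alignment score computed with a Needleman-Wunsch style recurrence. The scoring values (`match=1`, `mismatch=-1`, `gap=-2`) and the function name are illustrative assumptions, not values taken from the course or the textbook.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b (linear gap cost)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per letter.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell takes the best of: align two letters, gap in b, gap in a.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    return score[-1][-1]

print(global_alignment_score("ACGT", "AGT"))
```

A local alignment variant would additionally clamp each cell at zero and report the maximum over the whole table rather than the bottom-right cell.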
BLAST and FASTA: We will introduce word count statistics, the computational framework of
BLAST and FASTA, the BLOSUM scoring matrices, and the statistics of BLAST. (Chap 7)
Sequence Assembly: We will introduce the principles of DNA sequencing, the probability and statistics of
genome sequencing, the overlap-layout-consensus assembly algorithm, and the de Bruijn graph assembly
algorithm. (Chap 8)
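The sketch below shows one common way to build a de Bruijn graph from reads: every k-mer contributes an edge from its (k-1)-letter prefix to its (k-1)-letter suffix. The function name, the toy read, and the choice of `k` are hypothetical; a real assembler would also handle sequencing errors, reverse complements, and path finding, none of which is attempted here.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k along the read; short reads yield nothing.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # prefix -> suffix edge
    return dict(graph)

print(de_bruijn_graph(["ACGT"], 3))
```

Repeated k-mers are kept as repeated edges, so edge multiplicity reflects how often a k-mer was observed.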
Signals in DNA: We will discuss how to model DNA motifs for transcription factor (TF) binding sites,
and how to identify these DNA motifs. (Chap 9)
Similarity, Distance and Clustering: We will introduce how to compute distance and similarity for
various biological measures, and basic algorithms for hierarchical clustering. (Chap 10)
Gene Expression Analysis: We will discuss the technology for profiling gene expression levels, and
the analysis of gene expression data, including clustering and principal component analysis. (Chap 11)
Phylogenetic Trees: We will discuss biological trees, parsimony methods and distance methods,
stochastic models for base substitutions, distance estimation, and maximum likelihood methods for tree
construction. (Chap 12)
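As one example of distance estimation under a stochastic substitution model, the sketch below applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), to the observed proportion p of differing sites. The syllabus does not name a specific model, so the choice of Jukes-Cantor is an assumption made for illustration, as are the function names and the toy sequences.

```python
import math

def p_distance(a, b):
    """Proportion of sites that differ between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jukes_cantor_distance(p):
    """Expected substitutions per site under the Jukes-Cantor model (assumed here)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)

p = p_distance("ACGTACGT", "ACGTACGA")  # toy sequences, one difference in eight sites
print(p, jukes_cantor_distance(p))
```

The corrected distance is always at least as large as the raw proportion, because it accounts for multiple substitutions at the same site.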
Genetic Variation in Populations: We will discuss genetic variation in humans, population structure,
recombination, linkage disequilibrium (LD), the Wright-Fisher model for gene frequencies, and
the coalescent model. (Chap 13)