BISC 478: Computational Genome Analysis

Lecture Time: 9:00-10:50am TTh, Room RRI 301
Computer Lab/Discussion Time: 3-4pm Tue, Room RRI 301

Instructors:
Professor Ting Chen      213-740-2415 (O)   tingchen@usc.edu   T: 1-3pm
Professor Andrew Smith   213-821-4142 (O)   andrews@usc.edu    T: 1-3pm
Professor Frank Alber    213-740-0778 (O)   alber@usc.edu      T: 1-3pm

Units: 4

Description: This course provides an introduction to the computational side of molecular biology, with an emphasis on genome analysis. With the development of new biotechnologies, enormous amounts of biological data, including molecular sequences, networks, pathways, structures, and genetic polymorphisms, have been generated and accumulated from a wide range of scientific studies. These data have significantly improved our understanding of life, the environment, and human health. This course introduces students to the basics of computational and statistical thinking within the context of the biological and biomedical sciences through data analysis.

Goals:
- To develop basic analytical skills in computation and statistics within the context of molecular biology.
- To prepare students for future academic research and training in broad areas of the life sciences, biotechnology, and health care.
- To prepare students for future careers in the life sciences and health care related industries.

Book: Computational Genome Analysis: An Introduction. (Deonier, Tavaré, Waterman, Springer 2005)

Course Contents: The course covers an introduction to probability and statistics, probability and statistics of biological sequences, genome rearrangement, sequence alignment, BLAST and FASTA, sequence assembly, sequence motifs, gene expression analysis, phylogenetic analysis, genetic variation in populations, and comparative genomics.

Computer Lab and Discussion: Students will participate in the computer lab to learn the R statistical computing environment and to use R to solve biological problems.

Grade: The final grade will be based on two midterms and one final exam (26% each), homework (15%), and course attendance (7%).

Course Schedule

Wk. 1    Introduction to Genomes (Ch1, all)
         Words / An Introduction to Probability, 1 (Ch2; 2.1-2.3.3)
Wk. 2    Words / An Introduction to Probability, 2 (Ch2; 2.3.4-2.5)
         Words / An Introduction to Probability, 3 (Ch2; 2.6-2.9)
Wk. 3    Words / An Introduction to Statistics, 1 (Ch3; 3.1-3.2)
         Words / An Introduction to Statistics, 2 (Ch3; 3.3-3.4.1)
Wk. 4    Words / An Introduction to Statistics, 3 (Ch3; 3.4.2-3.6)
         Physical Mapping-1 (Ch4; 4.1-4.4)
Wk. 5    Mini-review
         Examination I
Wk. 6    Genome Rearrangements (Ch5; 5.1-5.2)
         Genome Rearrangements (Ch5; 5.3-5.4)
Wk. 7    Sequence Alignments (Ch6; 6.1-6.4)
         Sequence Alignments (Ch6; 6.5-6.8)
Wk. 8    FASTA and BLAST (Ch7; 7.1-7.2, 7.5)
         FASTA and BLAST (Ch7; 7.3-7.4)
Wk. 9    Sequence Assembly (Ch8; 8.1-8.3)
         Sequence Assembly (Ch8; 8.3-8.4)
Wk. 10   Spring Break
         Spring Break
Wk. 11   Mini-review
         Examination II
Wk. 12   Clustering (Ch10; 10.1-10.3)
         Clustering (Ch10; 10.4-10.5)
Wk. 13   Gene Expression (Ch11; 11.1-11.3)
         Gene Expression (Ch11; 11.4)
Wk. 14   Gene Expression (Ch11; 11.5)
         Phylogenetics (Ch12; 12.1-3)
Wk. 15   Phylogenetics (Ch12; 12.4-13.5)
         Genetic Variation (Ch13; 13.1-13.3)
Wk. 16   Genetic Variation (Ch13; 13.4-13.6)
         Mini-review
FINAL EXAM

Topics

Introduction to Genomes: We will review the basics of genome structure and gene expression mechanisms. (Chap 1)

Introduction to Probability I: We will introduce the basic concepts of discrete random variables, probability distributions, independence, expected values, variances, binomial distributions, conditional probability, the Markov property, and Markov chains in the context of words in biological sequences.
(Chap 2)

Introduction to Probability II: We will further study the distribution of words in biological sequences, the Poisson approximation to the binomial distribution, the Poisson process, continuous random variables, the Central Limit Theorem, and confidence intervals, with applications to motif finding in biological sequence analysis. (Chap 3)

Genome Rearrangements: We will introduce the concept of conserved synteny, the various types of genome rearrangements, and the estimation of reversal distances between genome sequences. (Chap 5)

Sequence Alignment: We will discuss dynamic programming, global alignments, and local alignments. (Chap 6)

BLAST and FASTA: We will introduce word count statistics, the computational framework of BLAST and FASTA, the BLOSUM scoring function, and the statistics of BLAST. (Chap 7)

Sequence Assembly: We will introduce the principles of DNA sequencing, the probability and statistics of genome sequencing, the overlap-layout-consensus assembly algorithm, and the de Bruijn graph assembly algorithm. (Chap 8)

Signals in DNA: We will discuss how to model DNA motifs for transcription factor (TF) binding sites and how to identify these motifs. (Chap 9)

Similarity, Distance and Clustering: We will introduce how to compute distance and similarity for various biological measures, and a basic algorithm for hierarchical clustering. (Chap 10)

Gene Expression Analysis: We will discuss the technology for profiling gene expression levels and the analysis of gene expression data, including clustering and principal component analysis. (Chap 11)

Phylogenetic Trees: We will discuss biological trees, parsimony methods and distance methods, stochastic models for base substitutions, distance estimation, and maximum likelihood methods for tree construction.
(Chap 12)

Genetic Variation in Populations: We will discuss genetic variation in humans, population structure, recombination, linkage disequilibrium (LD), the Wright-Fisher model for gene frequencies, and the coalescent model. (Chap 13)
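
As an illustration of the kind of computation the last topic involves, the following is a minimal sketch of allele-frequency drift under a haploid Wright-Fisher model. It is written in Python for brevity, whereas the computer lab uses R. The function name `wright_fisher` and its parameters (`pop_size`, `init_freq`, `generations`, `seed`) are hypothetical and chosen for this example; they are not taken from the textbook or the lab materials.

```python
import random


def wright_fisher(pop_size, init_freq, generations, seed=None):
    """Return the frequency of allele A in each generation.

    Assumes a haploid population of constant size pop_size, no selection,
    no mutation, and non-overlapping generations.
    """
    rng = random.Random(seed)  # local generator, so runs are reproducible
    freq = init_freq
    trajectory = [freq]
    for _ in range(generations):
        # Each offspring independently inherits allele A with probability
        # equal to its frequency in the parental generation, so the count
        # of A alleles is a binomial draw.
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
        freq = count / pop_size
        trajectory.append(freq)
    return trajectory


if __name__ == "__main__":
    traj = wright_fisher(pop_size=100, init_freq=0.5, generations=50, seed=1)
    print("final frequency of allele A:", traj[-1])
```

Running the sketch with different seeds shows that the frequency wanders at random and, once it reaches 0 or 1, stays there; smaller values of `pop_size` typically reach one of these two states in fewer generations.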