Differential Expression, Supervised and Unsupervised Clustering Methods
STAT 675 (Rice University) or GS010103 (GSBS)

Data sets:
Choose one of the following tab-delimited data sets, which are available at http://odin.mdacc.tmc.edu/~kdo/TeachBioinf/AdvStatGE-Prot.htm
(1) nci60.tsv
    Also see: http://bioinf.ucd.ie/people/aedin/R/pages/made4/html/NCI60.html
(2) golub3731new.tsv
    Also see: http://www.mssm.edu/faculty/yongchao-ge/multtest/multtest-manual.pdf
(3) alon2000unique.tsv
    Also see: http://microarray.princeton.edu/oncology/affydata/index.html

Questions:
1/ Read the papers in the reference list below.
2/ Perform a preliminary filtering step to reduce the chosen data set at a specified FDR (the reduced data set can be around 300 genes), using SAM, beta-uniform mixtures (BUM), or some other method.
3/ On the reduced data set, perform and compare the following clustering methods: hierarchical clustering, classification and regression trees, principal components analysis, and gene shaving (supervised and unsupervised). Discuss the merits and demerits of each method, and compare the final clusters obtained through the different methods.
4/ Compute the posterior probability of differential expression for each of the “genes” in the final clusters. Discuss your analysis results and compare them with the results in the published papers below.
5/ Present the analysis results in a report and an oral presentation.

Notes:
Specific notes on how to use the GeneClust code (implementing gene shaving) can be found on the Department of Biostatistics website at http://odin.mdacc.tmc.edu/~kdo/TeachBioinf/AdvStatGE-Prot.htm

References:
Alon et al (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. PNAS 96: 6745-6750.
Ben-Dor et al (2000). Tissue classification with gene expression profiles. Journal of Computational Biology 7: 559-584.
Do et al (2003). GeneClust. In the book The Analysis of Gene Expression Data – Methods and Software (edited by Parmigiani, Garrett, Irizarry, and Zeger).
Golub et al (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286: 531-537.
Hastie et al (2000). ‘Gene shaving’ as a method for identifying distinct sets of genes with similar expression patterns. Genome Biology 1: research0003.1-0003.21.
Scherf et al (2000). A gene expression database for the molecular pharmacology of cancer. Nature Genetics 24: 236-244.
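
Example (illustrative only):
For the filtering step in Question 2, "some other method" could be as simple as the Benjamini-Hochberg step-up procedure applied to per-gene p-values. The sketch below is neither SAM nor BUM, and it is not part of the GeneClust code. The function name benjamini_hochberg and the toy p-values are assumptions made for illustration; in practice the p-values would come from whatever per-gene test you choose (for example, a two-sample t-test between tumor and normal samples).

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the (sorted) indices of genes kept by the BH step-up rule.

    p_values: one p-value per gene, in the original gene order.
    fdr: target false discovery rate.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Gene indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * fdr.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff = rank

    # Keep every gene ranked at or below the cutoff.
    return sorted(order[:cutoff])


# Toy p-values for ten hypothetical genes (not taken from any data set).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(toy_p, fdr=0.05)
print(selected)  # [0, 1]
```

Raising or lowering the fdr argument is one way to tune the size of the reduced data set toward the roughly 300 genes suggested in Question 2.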