Description of a project on detecting differentially expressed genes

Differential Expression, supervised and unsupervised clustering methods
STAT 675 (Rice University) or GS010103 (GSBS)
Data sets: Choose one of the following tab-delimited data sets, which are available at
http://odin.mdacc.tmc.edu/~kdo/TeachBioinf/AdvStatGE-Prot.htm
(1) nci60.tsv
Also see: http://bioinf.ucd.ie/people/aedin/R/pages/made4/html/NCI60.html
(2) golub3731new.tsv
Also see http://www.mssm.edu/faculty/yongchao-ge/multtest/multtest-manual.pdf
(3) alon2000unique.tsv
Also see http://microarray.princeton.edu/oncology/affydata/index.html
Questions:
1/ Read the papers in the reference list below.
2/ Apply a preliminary filter to reduce the chosen data set at a specified false discovery
rate (FDR), using SAM, beta-uniform mixtures (BUM), or some other method. The reduced data
set can be around 300 genes.
3/ On the reduced data set, perform and compare the following clustering methods: hierarchical
clustering, classification and regression trees, principal components analysis, and gene shaving
(supervised and unsupervised). Discuss the merits and demerits of each clustering method,
and compare the final clusters obtained through the different methods.
4/ Compute posterior probabilities of differential expression associated with each of the “genes”
in the final clusters. Discuss your analysis results and compare them with the results in the
published papers below.
5/ Present the analysis results in a report and an oral presentation.
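Illustration for question 2: the sketch below shows one way to select genes at a specified FDR once per-gene p-values are available. It uses the Benjamini-Hochberg step-up rule as a simple stand-in for "some other method"; it is not SAM or BUM. The function name benjamini_hochberg and the example p-values are hypothetical and are not taken from any of the data sets above.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return the indices of genes selected by the Benjamini-Hochberg
    step-up rule at the given FDR level (illustrative helper)."""
    m = len(pvalues)
    if m == 0:
        return []
    # Rank gene indices by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * fdr / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * fdr / m:
            cutoff_rank = rank
    # Step-up rule: select every gene ranked at or below the cutoff.
    return sorted(order[:cutoff_rank])


# Hypothetical per-gene p-values, for illustration only.
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042,
                   0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(example_pvalues, fdr=0.05)
print(selected)  # prints [0, 1]
```

In practice the p-values would come from a per-gene test comparing the sample classes, and the FDR level could be adjusted until the reduced data set is near the suggested size of about 300 genes.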
Notes:
Specific notes on how to use the GeneClust code (implementing gene shaving) on the
Department of Biostatistics website can be found at
http://odin.mdacc.tmc.edu/~kdo/TeachBioinf/AdvStatGE-Prot.htm
References:
Alon et al (1999). Broad patterns of gene expression revealed by clustering analysis of
tumor and normal colon tissues probed by oligonucleotide arrays. PNAS 96: 6745-6750.
Ben-Dor et al (2000). Tissue classification with gene expression profiles. Journal of
Computational Biology 7: 559-584.
Do et al (2003). GeneClust. In The Analysis of Gene Expression Data: Methods and
Software (edited by Parmigiani, Garrett, Irizarry, and Zeger).
Golub et al (1999). Molecular classification of cancer: class discovery and class
prediction by gene expression monitoring. Science 286: 531-537.
Hastie et al (2000). ‘Gene shaving’ as a method for identifying distinct sets of genes
with similar expression patterns. Genome Biology, 1: research0003.1-0003.21.
Scherf et al (2000). A gene expression database for the molecular pharmacology of
cancer. Nature Genetics 24: 236-244.