Masha Kazakov, Michal Rabani, 2/5/04

Gene Expression: Clustering

Abstract

Microarray technology is rapidly becoming a standard technique in research laboratories across the world. It allows simultaneous profiling of the expression levels of tens of thousands of genes, and potentially of whole genomes, in a single experiment. This gives scientists an opportunity to examine the transcriptional profile of biological systems and processes in an unbiased fashion. Such a volume of information cannot be analyzed without computational methods, so a major computational task is to understand the structure of the data that this technology produces.

Gene clustering is a tool for arranging genes according to similarity in their expression patterns. Classifying genes into clusters can lead to interesting biological insights. Patterns seen in genome-wide expression experiments can give indications about unknown regulatory elements. Moreover, genes with similar functions tend to cluster together, so clustering genes of known function with poorly characterized genes may provide a simple means of gaining insight into the functions of the uncharacterized genes [1]. Such patterns can also indicate the status of cellular processes and provide information about unknown biological pathways [2, 3]. In addition, cluster analysis is used for data reduction and visualization [1].

We will focus on one of many clustering methods, hierarchical clustering, which is commonly used. Here relationships among genes are represented by a tree whose branch lengths reflect the degree of similarity between the objects, as assessed by a pairwise similarity function [4]. This method is useful for representing varying degrees of similarity, as well as more distant relationships among groups of closely related genes.
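
As a rough illustration of how such a tree can be built from a pairwise similarity function, the following minimal sketch implements average-linkage agglomerative clustering in plain Python. It assumes the Pearson correlation coefficient as the similarity measure, which is one common choice and not necessarily the exact metric used in the cited studies. The gene names and expression values are hypothetical toy data, and the function names `pearson`, `average_similarity` and `average_linkage` are introduced here for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_similarity(members_a, members_b, sim):
    """Mean pairwise similarity between the genes of two clusters."""
    total = sum(sim[frozenset((g, h))] for g in members_a for h in members_b)
    return total / (len(members_a) * len(members_b))


def average_linkage(profiles):
    """Build a tree by repeatedly merging the two most similar clusters.

    profiles: dict mapping gene name -> list of expression values.
    Returns (tree, merges): the tree as nested tuples of gene names, and a
    log of (members_a, members_b, similarity) for every merge, in order.
    """
    # Similarity of every gene pair, computed once up front.
    sim = {frozenset(pair): pearson(profiles[pair[0]], profiles[pair[1]])
           for pair in combinations(profiles, 2)}
    # Each cluster is (subtree, member names); start with one per gene.
    clusters = [(name, [name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_similarity(
                clusters[ij[0]][1], clusters[ij[1]][1], sim),
        )
        (tree_a, mem_a), (tree_b, mem_b) = clusters[i], clusters[j]
        merges.append((mem_a, mem_b, average_similarity(mem_a, mem_b, sim)))
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [((tree_a, tree_b), mem_a + mem_b)]
    return clusters[0][0], merges


# Hypothetical toy data: two rising profiles and two falling profiles.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [4.2, 3.0, 2.1, 0.9],
}
tree, merges = average_linkage(toy)
print(tree)
for members_a, members_b, score in merges:
    print(members_a, "+", members_b, "similarity", round(score, 3))
```

The similarity recorded at each merge plays the role of the branch length in the tree: merges made at high similarity join closely related genes, while the final merges join more distant groups. This sketch recomputes cluster averages on every pass, which is adequate for a handful of genes but not for genome-scale data.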
To illustrate the method and its power in analyzing biological data, we will review two experiments in which a pairwise average-linkage clustering algorithm (a form of hierarchical clustering) was applied to gene expression data collected from yeast cells [2, 3]. The first is a genome-wide experiment in Saccharomyces cerevisiae, designed to identify and classify genes whose regulation is cell-cycle dependent [2]. It illustrates how an understanding of cellular processes can be extracted from a set of microarray experiments followed by gene clustering. Furthermore, it shows how new regulatory elements can be discovered using clustering methods. The second experiment deals with the adaptation of Saccharomyces cerevisiae to environmental changes. This experiment demonstrates how clustering enables us to find the relevant genes and characterize biological pathways [3].

1. Eisen M. B., Spellman P. T., Brown P. O., Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA; 95:14863-14868 (1998).
2. Spellman P. T. et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell; 9:3273-3297 (1998).
3. Gasch A. P., Spellman P. T., Kao C. M., Carmel-Harel O., Eisen M. B., Storz G., Botstein D., Brown P. O. Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell; 11(12):4241-4257 (2000).
4. Shannon W., Culverhouse R., Duncan J. Analyzing microarray data using cluster analysis. Pharmacogenomics; 4(1):41-51 (2003). Review.
5. Kaminski N., Friedman N. Practical Approaches to Analyzing Results of Microarray Experiments. American Journal of Respiratory Cell and Molecular Biology; 27:125-132 (2002). Review.