Gene Expression: Clustering

Masha Kazakov, Michal Rabani, 2/5/04
Abstract
Microarray technology is rapidly becoming a standard technique in research laboratories all over
the world. It allows simultaneous profiling of the expression levels of tens of thousands of genes,
and potentially of whole genomes, in a single experiment. This unique power gives scientists an
opportunity to look at the transcriptional profile of biological systems and processes in an
unbiased fashion. The resulting volume of data is too large to analyze without computational
methods. A major computational task is therefore to understand the structure of the data that
arises from this technology.
Gene clustering is a tool for arranging genes according to similarity in their expression patterns.
Classifying genes into clusters can lead to interesting biological insights. Patterns seen in
genome-wide expression experiments can give indications about unknown regulatory elements. Moreover,
genes with similar functions tend to cluster together. Thus, clustering genes of known function with
poorly characterized genes may provide a simple means of gaining insight into the functions of these
uncharacterized genes [1]. Patterns seen in genome-wide expression data can also give indications
about the status of cellular processes and information about unknown biological pathways [2, 3]. In
addition, cluster analysis is used for data reduction and visualization [1].
We will focus on one commonly used clustering method: hierarchical clustering. Here, relationships
among genes are represented by a tree whose branch lengths reflect the degree of similarity between
the genes, as assessed by a pairwise similarity function [4]. This method is useful for representing
varying degrees of similarity, including more distant relationships among groups of closely related
genes.
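
The tree-building idea described above can be sketched in a few lines of Python. This is a minimal
illustration, not the implementation used in the cited studies. It assumes Pearson correlation as
the pairwise similarity function and average linkage as the rule for comparing groups of genes.
The function names (pearson, average_linkage), the gene names and the expression values are all
hypothetical and chosen only for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    profiles maps a gene name to its list of expression values.
    Returns (tree, merges): tree is a nested tuple of gene names, and
    merges lists (left, right, similarity) for every join, in order.
    """
    names = list(profiles)
    # Gene-to-gene similarities, computed once up front.
    sim = {(a, b): pearson(profiles[a], profiles[b])
           for a in names for b in names if a != b}
    # Each cluster is (subtree, member gene names); start with one per gene.
    clusters = [(name, [name]) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][1], clusters[j][1]
                # Average linkage: mean similarity over all cross-cluster pairs.
                s = sum(sim[(a, b)] for a in mi for b in mj) / (len(mi) * len(mj))
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        (ti, mi), (tj, mj) = clusters[i], clusters[j]
        merges.append((ti, tj, s))
        # Replace the two most similar clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((ti, tj), mi + mj))
    return clusters[0][0], merges


# Made-up profiles: two genes rising over time, two genes falling.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.9, 3.2, 1.8, 1.1],
}
tree, merges = average_linkage(profiles)
for left, right, similarity in merges:
    print(left, right, round(similarity, 3))
```

Each recorded similarity plays the role of a branch length: joins made at high similarity connect
closely related genes, while the final join, made at low similarity, connects distantly related
groups. The sketch recomputes all cluster averages at every step, which is fine for a handful of
genes but too slow for a genome-wide data set.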
To illustrate the method and its power in analyzing biological data, we will review two
experiments in which a pairwise average-linkage clustering algorithm (a form of hierarchical
clustering) was applied to gene expression data collected from yeast cells [2, 3].
The first is a genome-wide experiment in Saccharomyces cerevisiae, designed to identify genes
whose regulation is cell-cycle dependent and to classify them [2]. It illustrates how an
understanding of cellular processes can be extracted from a set of microarray experiments followed
by gene clustering. Furthermore, it shows how new regulatory elements can be discovered using
clustering methods.
The second experiment deals with the adaptation of Saccharomyces cerevisiae to environmental
changes. This experiment demonstrates how clustering enables us to find the relevant genes and
characterize biological pathways [3].
1. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide
expression patterns. Proc. Natl. Acad. Sci. USA; 95:14863-14868 (1998).
2. Spellman PT, et al. Comprehensive identification of cell cycle-regulated genes of the yeast
Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell; 9:3273-3297 (1998).
3. Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO.
Genomic expression programs in the response of yeast cells to environmental changes.
Mol. Biol. Cell; 11(12):4241-4257 (2000).
4. Shannon W, Culverhouse R, Duncan J. Analyzing microarray data using cluster analysis.
Pharmacogenomics; 4(1):41-51 (2003). Review.
5. Kaminski N, Friedman N. Practical approaches to analyzing results of microarray experiments.
American Journal of Respiratory Cell and Molecular Biology; 27:125-132 (2002). Review.