Gene Cluster Analysis with MeV

Luca Zammataro
luca.zammataro@iit.it

Aim of practice:
1. Analyze gene expression datasets by means of cluster analysis algorithms.
2. Identify common trends among regulated genes in a particular experimental condition.

Background

The fluorescence ratios on a microarray must be analyzed optically to determine whether a gene has been repressed or induced. An image scanner quantifies the fluorescence of both colors (red and green) at each spot. Recall that red represents the control mRNA and green represents the experimental mRNA. Each spot is then assigned a green:red ratio, which tells us whether that particular gene was produced in greater quantities (induced) or in smaller quantities (repressed) relative to the baseline level of expression. This expression ratio is then divided out to give a decimal value and converted to a logarithmic (base 2) scale.

The resulting data can be converted into images in order to assess repression or induction quickly by eye. In these images, green and red represent changes in expression, not the initial fluorescence from mRNA hybridization. On a typical color scale, a black spot indicates that equal amounts of red and green fluorescence were observed on the original spot, giving an expression ratio of 1:1 and therefore a logarithmic value of 0.

In this example we will use the Pearson correlation coefficient to cluster our genes. The Pearson correlation is a similarity metric whose values range from -1 (perfect anticorrelation) to +1 (perfect correlation). High correlation values thus indicate strong correspondence between expression profiles. However, the hierarchical clustering algorithm requires a distance matrix, in which high values indicate strong differences between two objects (expression profiles). Pearson's correlation can be transformed into a distance metric by subtracting it from 1:
distPearson = 1 - corPearson

The Pearson distance ranges from 0 (perfect correlation) to 2 (perfect anticorrelation). By retracing the order in which the genes were progressively joined into clusters, and knowing the correlation value at each step, you can map out which genes are closely related and which are only distantly related. This is best represented graphically, as shown by the following hypothetical diagram.

MeV (MultiExperiment Viewer)

MeV is a desktop application for the analysis, visualization and data mining of large-scale genomic data. It is a versatile microarray tool, incorporating sophisticated algorithms for clustering, visualization, classification, statistical analysis and biological theme discovery.
http://www.tm4.org/

1. Start by loading an expression data set in which all data are normalized and background-subtracted. All values must be expressed as log2. (The tutorial file is “GSE5099-GPL97-Complete_Cleaned_median.txt”, a matrix of adjusted values from the Eset.txt and 250TopList.txt files derived from GEO2R.) Always keep in mind the groups that you created during the statistical evaluation with GEO2R; in this example they are G0, G1, G2, G3 and G4.
2. After loading, the file appears as a spreadsheet. Follow the on-screen guideline: click the first value, which marks the starting point of the matrix you want to process, then click “load”.
3. From the “Display” menu, set the “Color Scheme” to -1, 0, 1 to rescale all values onto a green/red range in which the lowest values are green and the highest are red. The midpoint value (zero) is black.
4. Now choose “K-means/Median Clustering” from the clustering algorithm menu.
5. The KMC dialog lets you choose the distance metric. Choose Pearson Correlation, uncheck the “Sample Tree” checkbox, check the box for the construction of “Hierarchical Trees” for this exercise, and then click OK.
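Step 5 selects Pearson correlation as the distance metric. As a rough illustration of what that metric measures, the Pearson distance from the Background section can be sketched in plain Python. The function names and the three expression profiles are made up for this example, and this is not MeV's own implementation.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def pearson_distance(x, y):
    """distPearson = 1 - corPearson, ranging from 0 to 2."""
    return 1.0 - pearson_correlation(x, y)


# Hypothetical log2 expression values across five conditions (G0..G4).
gene_a = [-1.0, -0.5, 0.0, 0.5, 1.0]
gene_b = [-2.0, -1.0, 0.0, 1.0, 2.0]  # same trend as gene_a, larger amplitude
gene_c = [1.0, 0.5, 0.0, -0.5, -1.0]  # opposite trend to gene_a

print(pearson_distance(gene_a, gene_b))  # close to 0: perfectly correlated
print(pearson_distance(gene_a, gene_c))  # close to 2: perfectly anticorrelated
```

Note that gene_a and gene_b end up at distance 0 even though their amplitudes differ: the Pearson distance compares the shape of the profiles, not their magnitude, which is why it suits the search for common trends among genes.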
Finally, you can browse the clusters, studying differences or common behaviours among the selected genes across the various experimental conditions offered by the microarray. Save the analysis using the menu.

Questions:
1. Identify clusters in which all genes are upregulated and clusters in which all genes are downregulated in a particular condition.
   a.
2. With “biological function” selected from the “Display” menu, identify clusters that contain at least two genes with similar functions among the following:
   a. immune response
   b. skeletal system development
   c. regulation of cell growth
   d. cell adhesion
   e. signal transduction
   f. apoptosis
   g. transcription/regulation of transcription
   h. transport (example):
      i. cluster 1: SLC30A4, SLC16A6
      ii. cluster 4: TNPO1, SLC41A2
      iii. cluster 5: SLC41A2, SLC38A5..
   Complete the exercise…
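For question 1, the check that every gene in a cluster moves in the same direction in one condition could be sketched as follows, assuming the cluster has been exported as log2 values. The cluster contents, gene labels and function name below are invented placeholders, not values from the tutorial file.

```python
def cluster_direction(cluster, condition_index):
    """Classify a cluster as 'up', 'down' or 'mixed' in one condition.

    cluster maps a gene label to its list of log2 expression values;
    log2 > 0 is read as upregulated and log2 < 0 as downregulated.
    """
    if not cluster:
        raise ValueError("cluster is empty")
    values = [profile[condition_index] for profile in cluster.values()]
    if all(v > 0 for v in values):
        return "up"
    if all(v < 0 for v in values):
        return "down"
    return "mixed"


# Hypothetical clusters; columns stand for conditions G0..G4.
clusters = {
    "cluster A": {"gene1": [-0.8, 0.1, 0.6, 0.9, 1.2],
                  "gene2": [-0.5, 0.3, 0.4, 1.1, 0.7]},
    "cluster B": {"gene3": [0.9, 0.2, -0.3, -0.6, -1.0],
                  "gene4": [0.4, -0.1, -0.2, -0.9, 0.3]},
}

for name, cluster in clusters.items():
    # Index 4 corresponds to the last condition (G4) in this made-up layout.
    print(name, cluster_direction(cluster, 4))
```

A stricter reading of "upregulated" would compare against a fold-change threshold instead of 0; the zero cutoff is an assumption made for this sketch.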