Gene Cluster Analysis with MeV.

Luca Zammataro
luca.zammataro@iit.it
Aim of practice:
1. Analyze gene expression datasets by means of cluster analysis algorithms
2. Identify common trends among regulated genes in a particular experimental
condition.
Background
The ratios of fluorescence on a microarray must be optically analyzed to determine whether
there has been repression or induction. The fluorescence of both colors (red and green) at each
spot is quantified by an image scanner. Recall that red represents the control mRNA, and green
represents the experimental mRNA. Each spot is then given a ratio of green:red, which tells us
whether that particular gene was produced in greater quantities (induced) or produced in
smaller quantities (repressed) in comparison to the baseline amount of expression. The
ratio is then expressed as a decimal value and converted to a logarithmic (base 2) scale.
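As a minimal sketch of this conversion, the Python snippet below computes the log2 ratio for a few spots. The gene names and intensity values are hypothetical, and the function name `log2_ratio` is only an illustration, not part of MeV.

```python
import math

def log2_ratio(experimental, control):
    """Return the base-2 logarithm of the experimental/control intensity ratio."""
    if experimental <= 0 or control <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(experimental / control)

# Hypothetical spot intensities: (experimental, control)
spots = {
    "gene_A": (2000.0, 500.0),   # ratio 4:1
    "gene_B": (500.0, 500.0),    # ratio 1:1
    "gene_C": (250.0, 1000.0),   # ratio 1:4
}

log_ratios = {gene: log2_ratio(e, c) for gene, (e, c) in spots.items()}

for gene, value in log_ratios.items():
    state = "induced" if value > 0 else "repressed" if value < 0 else "unchanged"
    print(f"{gene}: log2 ratio = {value:+.1f} ({state})")
```

On the logarithmic scale, a fourfold induction (+2) and a fourfold repression (-2) are symmetric around zero, which is why the log2 transform is convenient for clustering.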
The resulting data can be converted into images in order to quickly assess repression or
induction visually. Green and Red represent the changes in expression and not the initial
fluorescence from mRNA hybridization. A typical scale would look something like this, where
a black spot on the array would indicate that equal amounts of red and green fluorescence were
observed on the original spot, thereby giving an equal expression ratio of 1:1, and subsequently a
logarithmic value of 0.
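The color scale can be sketched in a few lines of Python. This is only an illustration under stated assumptions: lower (negative) log ratios are drawn in green and higher (positive) ones in red, as in the color scheme set later in the practice, and values are clamped to a hypothetical limit of ±1. The function name `log_ratio_to_rgb` is invented for this example.

```python
def log_ratio_to_rgb(value, limit=1.0):
    """Map a log2 ratio to an (R, G, B) tuple.

    Assumed convention: negative values are green, zero is black,
    positive values are red. Values beyond +/-limit are clamped.
    """
    clamped = max(-limit, min(limit, value))
    intensity = round(255 * abs(clamped) / limit)
    if clamped > 0:
        return (intensity, 0, 0)   # shades of red
    if clamped < 0:
        return (0, intensity, 0)   # shades of green
    return (0, 0, 0)               # black: equal expression (ratio 1:1)

for value in (-2.0, -0.5, 0.0, 0.5, 1.0):
    print(value, log_ratio_to_rgb(value))
```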
In this example we will use the Pearson correlation coefficient to cluster our genes. The
Pearson correlation is a similarity metric, whose values vary from -1 (perfect anticorrelation) to
+1 (perfect correlation). High correlation values thus indicate strong correspondences between
expression profiles. However, the hierarchical clustering algorithm requires a distance matrix,
where high values indicate strong differences between two objects (expression profiles).
Pearson's correlation can be transformed into a distance metric by subtracting it from 1:
distPearson = 1 – corPearson
The Pearson distance varies from 0 (perfect correlation) to 2 (perfect anti-correlation).
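The transformation can be checked with a short Python sketch. The expression profiles below are hypothetical log2 values across four conditions, and the helper names are illustrative only.

```python
import math

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)

def pearson_distance(x, y):
    """distPearson = 1 - corPearson."""
    return 1.0 - pearson_correlation(x, y)

# Hypothetical log2 expression profiles across four conditions
profile = [-1.0, 0.0, 1.0, 2.0]
same_trend = [-2.0, 0.0, 2.0, 4.0]   # same shape, twice the amplitude
opposite = [1.0, 0.0, -1.0, -2.0]    # mirrored trend

print(pearson_distance(profile, same_trend))  # close to 0
print(pearson_distance(profile, opposite))    # close to 2
```

Note that the two profiles with the same trend have a distance near 0 even though their amplitudes differ: the Pearson distance compares the shape of the profiles, not their absolute levels.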
By retracing the order in which the genes were progressively joined into clusters and by knowing
the correlation value of each step, you can map out which genes are related to each other closely
and which genes are related only distantly. This is best represented graphically, as shown by the
following hypothetical diagram.
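The idea of retracing the joining order can be sketched in Python. The snippet below is a simplified illustration, not MeV's implementation: it assumes average linkage (the text does not say which linkage rule is used), works on four hypothetical genes, and records each merge together with its distance.

```python
import math
from itertools import combinations

def pearson_distance(x, y):
    """1 - Pearson correlation between two equally long profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)

def average_linkage(profiles):
    """Join the two closest clusters repeatedly; return the list of merges.

    Each merge is (members_a, members_b, distance), in joining order.
    """
    def cluster_distance(a, b):
        # Average linkage: mean distance over all cross-cluster gene pairs
        pairs = [(i, j) for i in a for j in b]
        return sum(pearson_distance(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((a, b, cluster_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical log2 profiles: g1/g2 rise across conditions, g3/g4 fall
profiles = {
    "g1": [-1.0, 0.0, 1.0, 2.0],
    "g2": [-0.9, 0.1, 1.2, 1.8],
    "g3": [2.0, 1.0, 0.0, -1.0],
    "g4": [1.8, 1.1, -0.1, -0.9],
}

merges = average_linkage(profiles)
for a, b, dist in merges:
    print(f"join {a} + {b} at distance {dist:.3f}")
```

Genes joined early at small distances are closely related; the last join, at a distance close to 2, links the two groups with opposite trends. Drawing the joins from bottom to top yields the tree (dendrogram).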
MeV (MultiExperiment Viewer)
MeV is a desktop application for the analysis, visualization and data-mining of large-scale
genomic data. It is a versatile microarray tool, incorporating sophisticated algorithms for
clustering, visualization, classification, statistical analysis and biological theme discovery.
http://www.tm4.org/
1. We will start by uploading an expression data set in which all data are normalized and
background-subtracted. All data must be expressed as log2 values. (The tutorial
file is “GSE5099-GPL97-Complete_Cleaned_median.txt”, which represents a matrix of
adjusted values from the Eset.txt file and the 250TopList.txt file derived from GEO2R.)
Always take into consideration the groups that you created during the statistical
evaluation using GEO2R, as in the case of this example:
G0
G1
G2
G3
G4
2. After uploading, the file appears as a spreadsheet. Follow the guideline: click on the
first value representing the starting point of the matrix you want to process, then click
on “load”.
3. From the “Display” menu, set the “Color Scheme” to -1, 0, 1 to rescale all values to a
green/red range in which the lower values correspond to green and the upper values to red.
The midpoint value (zero) is black.
4. Now choose “K-means/Median Clustering” from the Clustering algorithm menu:
5. The KMC menu lets you choose the distance metric. For our exercise, choose
Pearson Correlation, uncheck the “Sample Tree” checkbox, check the box for the
construction of “Hierarchical Trees”, and then click on OK.
Finally, you can browse the clusters, studying differences or common behaviours among selected
genes across the various experimental conditions offered by the microarray. Save the analysis using
the menu.
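To see what a K-means run with a Pearson-based distance does to the genes, the steps above can be sketched in Python. This is a simplified illustration and not MeV's KMC implementation: the initial centroids are taken from two hypothetical seed genes (an assumption made to keep the example reproducible), and the profiles are invented.

```python
import math

def pearson_distance(x, y):
    """1 - Pearson correlation between two equally long profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)

def mean_profile(rows):
    """Column-wise mean of a list of profiles."""
    return [sum(col) / len(col) for col in zip(*rows)]

def kmeans_pearson(profiles, seeds, max_iter=100):
    """Assign each gene to the closest centroid, then update centroids, until stable."""
    centroids = [list(profiles[name]) for name in seeds]  # assumed initialization
    assignment = {}
    for _ in range(max_iter):
        new_assignment = {
            gene: min(range(len(centroids)),
                      key=lambda k: pearson_distance(values, centroids[k]))
            for gene, values in profiles.items()
        }
        if new_assignment == assignment:
            break  # no gene changed cluster: converged
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [profiles[g] for g, c in assignment.items() if c == k]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[k] = mean_profile(members)
    return assignment

# Hypothetical log2 profiles: g1/g2 rise across conditions, g3/g4 fall
profiles = {
    "g1": [-1.0, 0.0, 1.0, 2.0],
    "g2": [-0.9, 0.1, 1.2, 1.8],
    "g3": [2.0, 1.0, 0.0, -1.0],
    "g4": [1.8, 1.1, -0.1, -0.9],
}

assignment = kmeans_pearson(profiles, seeds=("g1", "g3"))
print(assignment)
```

Genes with the same trend across conditions end up in the same cluster, which is what you inspect when browsing the clusters in MeV.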
Questions:
1. Identify clusters in which all genes are upregulated and clusters in which all genes are
downregulated in a particular condition.
a.
2. Choosing “biological function” from the “Display” menu, identify clusters
containing at least two genes with similar functions among these:
a. Transport (e.g.)
i. cluster 1: SLC30A4, SLC16A6
ii. cluster 4: TNPO1, SLC41A2
iii. cluster 5: SLC41A2; SLC38A5..
Complete the exercise…
b. immune response
c. skeletal system development
d. regulation of cell growth
e. cell adhesion
f. signal transduction
g. apoptosis
h. Transcription/regulation of transcription