34 Overview of the Tools for Microarray Analysis

Transcription Profiling, DNA Chips, and Differential Display
Fig. 1. Polyacrylamide gel electrophoresis is used in differential display to identify fragments of regulated genes. Double-stranded complementary DNA is prepared from populations of RNA, then fragmented with restriction endonucleases. Linkers are attached to the ends of the fragments, then the
fragments are amplified using the polymerase chain reaction (PCR). The PCR products are separated
by polyacrylamide gel electrophoresis. Differentially expressed genes will yield restriction fragments
of different intensity in the treated vs control samples. The primary treated and control samples give
a complex mixture of unaffected and differentially expressed fragments. For this reason, fragments
are often parsed or selectively amplified using one or more degenerate nucleotides at the 3'-position
of one or both PCR primers to deconvolute the mixture of amplified species. Addition of a single
degenerate nucleotide in one of the PCR primers results in a partial deconvolution of the sample,
allowing differentially expressed fragments to be separated and excised more easily.
Fig. 2. Affymetrix uses several oligonucleotides to get broad coverage of individual genes. Each gene
is represented by oligos designed to be a perfect match to the target sequence, as well as oligos
designed to contain a single mismatch. Pairs of perfect match and single nucleotide mismatch oligonucleotides are designed to different regions of each gene. Typically 16 or 20 oligo pairs are arrayed
for each gene represented on the GeneChip®. The oligonucleotides are typically 25 bases in length.
Fig. 3. False color images demonstrate signal intensity on an Affymetrix GeneChip®. Signal intensity at each element is indicated by color, with lighter, hotter colors representing greater signal
intensities. Many single-channel oligonucleotide arrays, including Affymetrix arrays, include several oligonucleotides for each gene. For this reason, there tend to be clusters of elements with
similar signal intensity, representing all the elements that comprise a single gene. In the expanded
region of the array, individual perfect match/mismatch paired elements can be visualized. Perfect
match/mismatch pairs for which the signal intensity of the perfect match oligo is greater than for
the mismatch oligo (high average difference pairs) are used when comparing arrays to make calls
regarding differential expression.
Fig. 4. Experiments using cDNA microarrays require two-channel hybridizations. Two-channel, or
cDNA, arrays contain cloned or PCR-amplified fragments of genes. Two separate populations of RNA
are labeled with different fluorescent dyes. The dye-labeled samples are applied to the array, and
differential hybridization is measured by recording fluorescence in both channels at each element.
Single-channel signal intensity at any element on a cDNA array may not accurately reflect the
amount of the message present in the original biological sample relative to any other message.
Instead, the ratio of Cy5- vs Cy3-labeled cDNA probe at each element contains information regarding differential expression.
Fig. 5. An expression histogram from a two-channel hybridization reveals how well the channels are
balanced. An expression histogram is a convenient way to visualize differences in input material and/
or labeling efficiency for a two-channel hybridization. After balancing the array data using a balance
coefficient, the two lines should largely overlap. Balance coefficients can be determined in a number
of ways. The simplest way to calculate a balance coefficient relies on a ratio of total average signal in
both channels.
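As a minimal sketch of the simplest approach named in the Fig. 5 caption, the Python snippet below takes the balance coefficient to be the ratio of the average signal in one channel to the average signal in the other, then rescales the second channel. The function names and the example intensities are hypothetical, and the choice to scale Cy5 toward Cy3 (rather than the reverse) is an assumption.

```python
from statistics import mean

def balance_coefficient(cy3, cy5):
    """Ratio of mean Cy3 signal to mean Cy5 signal (direction is an assumption)."""
    return mean(cy3) / mean(cy5)

def balance(cy5, coefficient):
    """Scale every Cy5 intensity by the balance coefficient."""
    return [value * coefficient for value in cy5]

# Hypothetical intensities, for illustration only.
cy3 = [500.0, 1000.0, 1500.0]
cy5 = [400.0, 800.0, 1200.0]

k = balance_coefficient(cy3, cy5)
balanced_cy5 = balance(cy5, k)
print(k, balanced_cy5)
```

After balancing, the mean signal is the same in both channels, which is why the two histogram lines should largely overlap.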
Fig. 6. False color images can be used to demonstrate signal intensity in both channels of a two-channel array hybridization. Signal intensity at each element in either channel is indicated by
color, with lighter, hotter colors representing greater signal intensities. After balancing, elements
that give greater intensity in one channel than in the other are considered to represent differentially
expressed genes.
Fig. 7. Unacceptable hybridization on a two-channel array is determined by careful examination of
several parameters. In some cases, total average signal intensity, total average background, or both
are very different between the two channels of a two-channel hybridization. If the background is
uniform in each channel or if a gradient is present in both channels, it can often be corrected. If, as in
the example shown, there is a gradient effect in one channel and not the other, it may be very difficult
to achieve reliable expression data from the microarray.
Fig. 8. Imperfections and impurities that affect hybridization and therefore expression data may occur
on microarrays. (A) Some microarrays may contain imperfections that occur during the fabrication
process. (B) Elements can also be affected by impurities introduced at the time of the hybridization. The
comet-like imperfection seen here is most likely due to a dust particle. (C) It is uncertain what caused
the impurity on the lower portion of this array. It may be an artifact from the scanning process. Regardless of the cause, the affected elements may need to be discarded from further consideration.
Fig. 9. The TIGR Spotfinder tool allows the user to adjust the grid to be sure that an entire spot is
included in the reference field. It is not uncommon for spots on a microarray to be off center. Spot
finding software that uses a static grid may often cut off portions of spots, resulting in a loss of useful
information. Additionally, badly off-center spots may result in a portion of one spot appearing in the
grid of an adjacent spot, resulting in false-positive gene expression values.
Fig. 10. Scatter plots can be generated from a two-channel hybridization or from any two single-channel hybridizations, after data is balanced with respect to signal intensity. Signal intensity for each
element from two oligo arrays, or from both channels of a cDNA array, is graphed on a logarithmic scale.
Genes lying on the slanted line with a slope of one are not regulated. Those elements that demonstrate
differences in signal intensity in the two channels (or on two different oligo arrays) after balancing may
be differentially expressed. The differential expression value for the element marked in (A) may be
reliable, even though it represents only a 1.6-fold difference in the signal intensities of the two samples.
A similar differential expression value in (B) would almost certainly be meaningless.
Fig. 11. Confidence limits may be placed on microarray data. The lines parallel to the diagonal
represent a reasonable cut-off based upon signal strength and differential expression. Elements
with lower signal intensity must be highly differentially expressed to be considered reliable, owing
to the increased error at the lower edge of the signal range. RNA quality, time since fabrication of
the array, hybridization conditions and numerous other factors can all affect the final outcome of a
microarray experiment. Reproducibility and reliability of differential expression values are affected
by these variables (see Fig. 9). Therefore, confidence limits should be adjusted to reflect better or
worse hybridizations.
Fig. 12. The application of color gradients to differential expression values aids in rapid visualization
of differential expression. The use of color gradients also permits the identification of genes that are
differentially regulated by multiple treatments, and may also help identify experimental outliers. In
the example shown, data from several microarrays hybridized with RNA from cells treated with vehicle (veh) and one of nine compounds (Tx1–9) is used. This approach can be used with a simple
spreadsheet. However, many software tools exist that facilitate this type of data visualization.
Fig. 13. Heat maps aid in the visualization of transcription profiling data. A heat map, such as this
one produced by Spotfire™, may be used to visually identify broad patterns of gene expression.
Typically, genes above and below user-defined thresholds are colored red (induction) and green
(repression), respectively. Individual microarray experiments are organized on the X-axis, while
individual genes are organized along the Y-axis. Several large groups of similar microarrays may be
observed in this example. One might reasonably infer that similar gene expression patterns suggest
similar mechanisms of action for the compounds used in these studies.
Fig. 14. Clustering of microarray data can be performed using a number of different techniques.
Euclidean distance in n-dimensional space applies a variable derived from each microarray (a differential expression value or signal intensity) to every element present on the microarray. Each point on
this graph represents an element on the microarray utilized. Elements in the same neighborhood are
circled, representing six gene clusters.
Fig. 15. The European Bioinformatics Institute’s Expression Profiler tool allows users to cluster
microarray data from multiple experiments. Genes whose expression patterns most closely match
one another across many experiments cluster more closely than genes whose expression patterns
are not closely matched. A dendrogram is generated for each element, with branch points representing clusters.
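To illustrate how genes with closely matched expression patterns end up joined low in a dendrogram, here is a minimal sketch of agglomerative clustering. It is not the Expression Profiler implementation: single linkage with Euclidean distance is an assumed choice, and the gene names and expression values are hypothetical.

```python
from itertools import combinations
from math import dist

def single_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order they occur.

    profiles maps a gene name to its expression values across experiments.
    Each merge is (cluster_a, cluster_b, distance), where clusters are
    tuples of gene names. The merge order defines the dendrogram.
    """
    def linkage(pair):
        # Single linkage: distance between the closest members of two clusters.
        a, b = pair
        return min(dist(profiles[x], profiles[y]) for x in a for y in b)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical differential expression values across three experiments.
profiles = {
    "g1": (1.0, 2.0, 3.0),
    "g2": (1.0, 2.0, 4.0),
    "g3": (-3.0, -2.0, -1.0),
    "g4": (-3.0, -2.0, -3.0),
}

for cluster_a, cluster_b, distance in single_linkage(profiles):
    print(cluster_a, cluster_b, round(distance, 2))
```

Each merge corresponds to one branch point; the two induced genes (g1, g2) and the two repressed genes (g3, g4) pair up before the two groups are joined.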
Fig. 16. Self-organizing maps can be used to cluster microarray data into groups of similarly regulated
genes. Millennium Pharmaceuticals and others provide tools that enable scientists to generate self-organizing maps from their TxP data.
Fig. 17. Partek Pro 2000 offers scientists multiple data analysis options. Principal component analysis, multidimensional scaling, and inferential analysis can all be performed by tools such as Partek
Pro. These higher-order statistical analyses of microarray results are vital to avoid false-positives and false-negatives. The costs of following up on false results can be staggering. As transcription profiling finds application in medical diagnostics, false results may even endanger patients.
Fig. 18. Correlation analysis can be used to identify genes whose expression correlates with a specific
phenotype of interest. In the experiment shown, 12 rats were randomly assigned to one of three diet
groups (A, B, or C) for 4 wk. Fasted serum triglycerides were measured in individual rats prior to
sacrifice. Liver RNA was used to prepare probes for hybridization onto microarrays. The triglyceride
phenotype was used to calculate a balanced differential triglyceride (BDTG) value, using the same
equation that is used to determine differential expression values for the array elements. The BDTG value was
then compared to each element. In this example, the expression of gene 1 correlates well with the
measured phenotype.
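The comparison of a phenotype value with each array element can be sketched as a correlation calculation. The caption does not name the statistic, so the Pearson coefficient used here is an assumption, and the BDTG and expression values are hypothetical, not data from the experiment.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical values: one BDTG value and one differential expression
# value per animal, for illustration only.
bdtg = [1.0, 2.0, 3.0, 4.0]
expression = {
    "gene 1": [2.0, 4.0, 6.0, 8.0],   # tracks the phenotype
    "gene 2": [4.0, 3.0, 2.0, 1.0],   # moves opposite to the phenotype
    "gene 3": [1.0, -1.0, -1.0, 1.0], # unrelated to the phenotype
}

# Rank genes by the strength of their correlation with the phenotype.
ranked = sorted(expression, key=lambda g: abs(pearson(bdtg, expression[g])), reverse=True)
for gene in ranked:
    print(gene, round(pearson(bdtg, expression[gene]), 2))
```

Genes with a coefficient near +1 or -1 would be candidates for follow-up; a coefficient near 0 suggests no linear relationship with the phenotype.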
Table 1
Differential Expression

Gene ID   Control (Cy3)   Treated (Cy5)   bCy5   Ratio   Log ratio   BDE     fBDE
a         500             800             1000   2.00     0.30        2.00    1.00
b         500             200              250   0.50    -0.30       -2.00   -1.00
c         500             100              125   0.25    -0.60       -4.00   -3.00
d         500              80              100   0.20    -0.70       -5.00   -4.00
e         500            2000             2500   5.00     0.70        5.00    4.00
f         500             404              505   1.01     0.00        1.01    0.01
g         505             400              500   0.99     0.00       -1.01   -0.01
h         250             600              750   3.00     0.48        3.00    2.00
i         2500           3800             4750   1.90     0.28        1.90    0.90
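The derived columns of Table 1 can be reproduced with a few lines of Python. The source does not state the formulas, so the rules below are inferred from the table values and should be read as assumptions: bCy5 is Cy5 multiplied by a balance coefficient of 1.25 (the bCy5/Cy5 quotient in every row), Ratio is bCy5/Cy3, Log ratio is its base-10 logarithm, BDE is the ratio when it is at least 1 and the negative reciprocal otherwise, and fBDE shifts BDE by one unit toward zero. The function name is hypothetical.

```python
from math import log10

# Assumption: bCy5 / Cy5 equals 1.25 in every row of Table 1.
BALANCE_COEFFICIENT = 1.25

def derive_columns(cy3, cy5, balance=BALANCE_COEFFICIENT):
    """Return (bCy5, ratio, log ratio, BDE, fBDE) for one array element."""
    b_cy5 = cy5 * balance                       # balanced Cy5 signal
    ratio = b_cy5 / cy3                         # treated vs control
    log_ratio = log10(ratio)
    # Express repression as a negative fold change instead of a fraction.
    bde = ratio if ratio >= 1 else -1 / ratio
    # Shift by one so that "no change" maps to roughly zero.
    fbde = bde - 1 if bde > 0 else bde + 1
    return b_cy5, ratio, log_ratio, bde, fbde

# Control (Cy3) and Treated (Cy5) intensities from Table 1.
table_1 = {
    "a": (500, 800), "b": (500, 200), "c": (500, 100),
    "d": (500, 80), "e": (500, 2000), "f": (500, 404),
    "g": (505, 400), "h": (250, 600), "i": (2500, 3800),
}

for gene, (cy3, cy5) in table_1.items():
    print(gene, *(f"{value:.2f}" for value in derive_columns(cy3, cy5)))
```

Under these inferred rules, a twofold induction and a twofold repression (genes a and b) receive values of equal magnitude and opposite sign, which a raw ratio (2.00 vs 0.50) does not provide.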
Table 2
Clustering Expression Data

ID        Array 1   Array 2      ID        Array 1   Array 2
Gene 1    1200       700         Gene 9    630       1140
Gene 2    1050       750         Gene 10   600        470
Gene 3     990       730         Gene 11   540        140
Gene 4     700       500         Gene 12   420        480
Gene 5     690      1200         Gene 13   380        490
Gene 6     660       120         Gene 14   260        220
Gene 7     650       450         Gene 15   200        180
Gene 8     640      1090         Gene 16   190        250
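As a small illustration of distance-based grouping, the sketch below treats each gene in Table 2 as a point whose coordinates are its Array 1 and Array 2 values and finds each gene's nearest neighbor by Euclidean distance. The helper name is hypothetical, and nearest-neighbor lookup is only a first step toward clustering, not a full clustering method.

```python
from math import dist

# (Array 1, Array 2) values from Table 2.
table_2 = {
    "Gene 1": (1200, 700), "Gene 2": (1050, 750), "Gene 3": (990, 730),
    "Gene 4": (700, 500), "Gene 5": (690, 1200), "Gene 6": (660, 120),
    "Gene 7": (650, 450), "Gene 8": (640, 1090), "Gene 9": (630, 1140),
    "Gene 10": (600, 470), "Gene 11": (540, 140), "Gene 12": (420, 480),
    "Gene 13": (380, 490), "Gene 14": (260, 220), "Gene 15": (200, 180),
    "Gene 16": (190, 250),
}

def nearest_neighbor(gene, data):
    """Return the other gene closest to `gene` in Euclidean distance."""
    others = (name for name in data if name != gene)
    return min(others, key=lambda name: dist(data[gene], data[name]))

for gene in table_2:
    neighbor = nearest_neighbor(gene, table_2)
    print(gene, "->", neighbor, round(dist(table_2[gene], table_2[neighbor]), 1))
```

With more than two arrays, the same `dist` call works unchanged on longer tuples, which corresponds to Euclidean distance in n-dimensional space.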