MICROARRAYS

NEIL WINEGARDEN
University Health Network
Microarray Centre, Toronto
Ontario, Canada
INTRODUCTION
Microarrays allow for the simultaneous, parallel interrogation of multiple biological analytes. Originally, microarrays were devised as a method by which gene expression
could be measured in a massively parallel manner (all the
genes in the genome at once). Recent advances, however,
have demonstrated that microarrays can be used to interrogate epigenetic phenomena, promoter binding, protein
expression, and protein binding, among other processes.
The overall process relies on the manufacture of a
highly ordered array of biological molecules, which are
typically known entities. The features of this array behave
as probes, which react with and bind to the unknown but
complementary material present in a biological sample.
Here we focus specifically on gene expression (deoxyribonucleic acid, DNA) microarrays, which can be used to
assay the activity of thousands of genes at a time.
In 1993, Affymetrix published a novel method of using
light-directed synthesis to build oligonucleotide arrays that
could be used for a variety of biological applications (1).
Shortly thereafter, a group led by Patrick Brown and Ron
Davis at Stanford University demonstrated that robotically printed cDNA arrays could be used to assay gene
expression (2). Now, more than a decade after this initial
work was made public, both types of DNA array are commonly found in genomics laboratories.
BASIC PRINCIPLES
A DNA microarray contains a highly ordered arrangement
(array) of several discrete probe molecules. Generally, the
identity of these probes, be they cDNA or oligonucleotides, is
either known or can be determined readily. The probes are
deposited by some means (see the section Fabrication of
Microarrays) onto a solid-support substrate such as glass or
silicon. DNA microarrays take advantage of a basic characteristic of DNA, namely, the ability of one strand of DNA
to find its complementary strand in solution and bind
(hybridize) to it. This hybridization event is highly specific,
following standard Watson–Crick base pairing rules (Fig. 1).
Figure 1. Watson–Crick base pairing interactions. During
hybridization, specific base-pairing interactions occur by which
thymine (T) binds specifically to adenine (A) and cytosine (C)
binds specifically to guanine (G). The binding of these bases to
one another is mediated by hydrogen bonding as shown. The GC
base pairs are stronger by virtue of the three hydrogen bonds
formed compared to only two for AT.
Gene Expression
With some exceptions, the genetic makeup of every cell in
an organism is the same. Each cell has the same complement of genes, which comprise the organism’s genome. The
subset of genes that are active in a particular cell dictates
that cell’s function. When we say a gene is active or
expressed, we mean that particular gene is being transcribed. Transcription is the process by which ribonucleic
acid (RNA) polymerase II (an enzymatic complex) reads a
gene and creates a complementary copy of messenger RNA
(mRNA). The more a gene is transcribed, the more copies of
mRNA will be present in a cell. Thus genes that are highly
active in the cell will be represented by multiple copies of
mRNA, whereas genes that are inactive in the cell will have
very few or no copies of mRNA in the cell. Microarrays
function to measure the amount of mRNA present in the
cells of a biological sample such as a tumor biopsy. The
activity of the genes is inferred from this measure.
Encyclopedia of Medical Devices and Instrumentation, Second Edition, edited by John G. Webster
Copyright © 2006 John Wiley & Sons, Inc.
Gene Structure
In higher eukaryotes, somatic cells (diploid) have
two copies of every gene: one maternally and the other
paternally derived. In the context of the diploid cell, each
copy is termed an allele. In the case where both inherited
alleles are the same for a given gene, that gene is said to be
homozygous. If the two alleles are different, then the gene
is heterozygous. Alleles may be either dominant (phenotypically manifested regardless of what the other allele is)
or recessive (phenotypically manifested only in the absence
of a dominant allele). In the case of a heterozygous gene,
the dominant allele will be phenotypically manifested and
the recessive allele will not. If the two alleles are different but
both dominant, they are termed codominant and both alleles
will elicit a phenotype. The gene is composed of DNA,
which is double stranded. One strand is the sense strand,
the strand that encodes the information that will
ultimately be represented in mRNA. The other strand is said
to be antisense and is the strand of DNA that is actually
read by the RNA polymerase to generate the mRNA. DNA
has directionality: a gene is transcribed starting at the 3′
end of the antisense strand of the DNA and is read toward
the 5′ end. The resultant mRNA is made from the 5′ to the 3′
end.
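The strand relationships described above can be illustrated with a minimal Python sketch. The function names and the example sequence are hypothetical and chosen only for illustration; all sequences are written as plain strings in the 5′ to 3′ direction.

```python
# Illustrative sketch only: function names and the example sequence are
# hypothetical. All sequences are written as strings in the 5'->3' direction.

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def antisense_strand(sense: str) -> str:
    """Return the antisense (template) strand, written 5'->3'.

    The antisense strand is the reverse complement of the sense strand.
    """
    return "".join(DNA_COMPLEMENT[base] for base in reversed(sense))


def transcribe(antisense: str) -> str:
    """Transcribe an antisense strand (given 5'->3') into mRNA (5'->3').

    The polymerase reads the template from its 3' end toward its 5' end,
    so the string is walked backwards and each base is paired with its
    RNA complement.
    """
    return "".join(RNA_PAIRING[base] for base in reversed(antisense))


sense = "ATGGCCTTA"                  # hypothetical sense strand
template = antisense_strand(sense)   # reverse complement of the sense strand
mrna = transcribe(template)          # same as the sense strand with U for T
print(template, mrna)
```

The mRNA produced this way carries the same information as the sense strand, with uracil in place of thymine, which is why the sense strand is described as the one that encodes the information.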
Genes are regulated by specific sequences of DNA that
lie outside the coding region of the gene. The first such
sequence is the promoter. Promoters bind the transcriptional machinery (RNA polymerase II) that performs transcription. Promoters are found 5′ (upstream) of the gene
and are proximal to the transcription start site. An additional class of regulatory sequence called an enhancer may
be associated with the gene. Enhancers may lie upstream,
downstream, or internal (usually in noncoding regions
termed introns) to the gene (3). Specific transcription
factors bind enhancers and promote recruitment or activation of the basal transcriptional machinery. It is the coordinated function of the promoter and enhancer, with the
transcription factors that bind them, that controls whether a gene
is active within the cell. Thus, genes are regulated,
and can be turned on, off, or modulated up or down by the
regulatory mechanisms of the cell.
RNA Isolation
Ribonucleic acid must be isolated from cells in order to
prepare the material for hybridization to the array. A cell
contains three major species of RNA: mRNA, transfer RNA
(tRNA), and ribosomal RNA (rRNA). Together they are
referred to as total RNA. For the purpose of gene expression
experiments with microarrays, the mRNA is the species we
are interested in, and it represents roughly 1% of total RNA. In
order to isolate total RNA from cells, one of two main
modalities is used: solution- or solid-phase extraction. In
solution-phase methods, cells are lysed in the presence of
isothiocyanate in order to inactivate any RNases (naturally
occurring enzymes that nonspecifically degrade RNA). The
lysate is then extracted with an acidified phenol:chloroform:isoamyl alcohol solution. The RNA selectively
partitions to the aqueous phase of this mixture away from
proteins and DNA. The aqueous phase is removed and RNA
is precipitated out of solution using isopropyl alcohol at
high salt concentrations. Solid-phase methods make use of
the variable binding activity of RNA to a silica matrix at
high and low salt conditions. Cells are again lysed in the
presence of isothiocyanate. The high concentration of isothiocyanate used in this methodology not only inactivates
the RNases, it also selectively precipitates proteins out of
solution. The lysate is applied to a column containing a
silica filter at the bottom. The lysate is pulled through the
column via vacuum or centrifugation, thereby removing
the proteins and cellular debris. In this method, DNA may
also bind to the column, and as such contaminating DNA is
removed by the application of DNase. The column is
washed to remove any further contaminants, and then
the RNA is eluted from the filter using water.
mRNA Structure
In eukaryotic cells, mRNA has a unique feature that allows
researchers to either purify it away from the rest of the
RNA or to direct enzymes to it specifically while avoiding
the other RNA species. This feature is the polyA tail. The
polyA tail is a long stretch of adenine nucleotides found at
the 3′ end of mRNA, which is added post-transcriptionally.
Such stretches of adenine nucleotides do not typically occur
naturally in genes or other RNA species. The polyA tail will
hybridize to an artificially generated oligonucleotide made
up of a series of deoxythymidine nucleotides (oligo-dT). If the
oligo-dT is coupled to a support matrix (e.g., beads) the
mRNA can be pulled out of solution thereby purifying it
away from the rest of the total RNA. While some researchers prefer to include this step in their process, it is generally not a requirement for microarray analysis. Rather
than purify the mRNA, the oligo-dT can be used as a primer
for creating an enzymatically labeled complement of the
mRNA.
Labeling
In order to render the RNA visible to a detection system, it
is necessary to label it in some manner. While some
laboratories choose a direct methodology of chemically
labeling the mRNA itself, it is most common to work via
a cDNA or cRNA intermediate that is labeled enzymatically.
The simplest methodology involves creating labeled
cDNA. In this technique, the RNA is reverse-transcribed
(DNA is made from an RNA template) by an enzyme named
reverse transcriptase (RT) (for sample protocols, see Ref.
4). Reverse transcriptase requires a small oligonucleotide
primer that binds to the RNA, creating a short double-stranded region (an RNA:DNA hybrid). In order to ensure
that the RT enzyme reads only the mRNA, the polyA tail of
mRNA is exploited by using a primer made of a stretch of
several (usually 20–25) thymine residues. The resultant
DNA is the complement of the RNA and it is thus referred
to as complementary DNA (cDNA). The RT reaction
requires that free nucleotides (each of A, C, G, and T)
are present to create the DNA. If one of these nucleotides
is chemically modified with some detectable molecule (such
as a fluorophore), then it will be incorporated into the
cDNA strand, and that cDNA will be detectable with a
fluorescent reader. Alternatively, it is possible to use a
reactive molecule (such as amino-allyl) in place of a fluorescent molecule. After incorporation into the DNA, the
DNA is then coupled to a reactive form of a fluorophore
(usually a reactive ester). This latter implementation of the
method has an advantage in that the amino-allyl modifier
is a much smaller chemical group that is incorporated
much more efficiently into DNA than a bulky fluorescent
moiety.
Often the amount of RNA available is limiting and
cannot be detected by standard means. In this case, it is
generally necessary to amplify the amount of material
present. A typical microarray experiment usually requires
5–10 μg of total RNA in order to be able to obtain useful
data. When researchers are working with diminishingly
small samples, such as from a needle biopsy or a fine needle
aspirate, it is often not possible to obtain this amount of
total RNA. To overcome this limitation, various amplification strategies have been adopted. The most popular
method of amplification is based on the protocols of Dr.
James Eberwine from the University of Pennsylvania (5).
In this technique, RNA is converted into cDNA using the
same method described above with two key differences: (1)
there is no labeled nucleotide incorporated and (2) the
oligo-dT primer has another short sequence of DNA
appended to it that represents a T7 promoter region.
The T7 promoter is a bacteriophage-derived sequence that
initiates transcription by T7 polymerase. After the cDNA is
created, a second strand is generated, creating a double-stranded artificial gene with a T7 promoter on one end.
This artificial gene is then transcribed by the addition of T7
polymerase, which is allowed to make numerous transcripts of the gene. The transcripts that are obtained can
either be labeled directly or, in turn, be converted into
labeled cDNA using the standard methodologies described
above. The resultant RNA is the complement
of the original mRNA sequence, so it is said to be cRNA
(complementary RNA).
The Affymetrix GeneChips utilize an amplification system based on T7 transcription as described above. During
the production of cRNA, biotin-modified nucleotides are
incorporated. Posthybridization (see the section on Hybridization), the arrays are stained with a streptavidin-bound
fluorophore. Streptavidin is a protein that specifically and
tightly binds to biotin molecules, allowing the fluorophore
to be attached to the cRNA.
A clean-up step is required to remove any free, unbound
detection molecules. This step helps to ensure that background signal is kept to a minimum. There are two main
methods by which such purification is performed: one is
based on standard nucleic acid purification systems, similar to the RNA isolation method described earlier, and the
other is based on size exclusion. For the first method, a
nucleic acid purification column is utilized. The cRNA or
cDNA binds to the silica filter, but the less charged free
nucleotides flow through. After a series of washes, the
cRNA or cDNA is eluted from the column. The second
methodology utilizes a membrane filter (usually incorporated into a column) that has a defined pore size. The large
cRNA and cDNA molecules are retained on the membrane,
whereas the small free nucleotides flow through. The
column is then inverted and the cDNA or cRNA is then
eluted off the column by flowing wash buffer in the opposite
direction. This purified labeled material is then ready for
hybridization to the array.
Hybridization
Microarray technology relies on the natural ability of
single-stranded nucleic acids to find and specifically bind
complementary sequences. Purified labeled material is
exposed to the spotted microarray and the pool of labeled
material ‘‘self-assembles’’ onto the array, with each individual nucleic acid (cDNA or cRNA) species hybridizing to a
specific spot on the array containing its complement. The
specificity of this interaction needs to be controlled, as
there may be several similar and related sequences present
on the array. The control of hybridization specificity is
accomplished through the adjustment of the hybridization
stringency. Highly stringent conditions promote exact
matches, whereas low stringency will allow some related
but nonexact matches to occur. In a microarray experiment, stringency is typically controlled by two factors: the
concentration of salt in the hybridization solution and the
temperature at which hybridization is allowed to occur.
High salt concentrations tend to lead to lower stringency of hybridization. Both strands of nucleic acid
involved in the hybridization event contain a net negative
charge. As such, there is a small repulsion between these
two strands, which needs to be overcome to bring the
labeled nucleic acid into proximity of the arrayed probe.
The salt ions cluster around the nucleic acid strands creating a mask and shielding the electrostatic forces. Higher
salt concentrations have a greater masking effect, thus
allowing hybridization to occur more easily. If salt concentrations are high enough, the repulsion effects are completely masked and even strands of DNA that have low
degrees of homology may bind to one another.
Temperature is another important factor. Every double-stranded nucleic acid has a specific temperature at which
the two strands will ‘‘melt’’ or separate. The temperature at
which exactly 50% of a population of pure double-stranded
material separates is termed the melting temperature
(Tm). The Tm of a nucleic acid is controlled partially by
the length of the strand and partially by the percentage of
G and C residues (termed the GC content). The G and C
residues bind to one another as a Watson–Crick base pair.
This pairing interaction is the result of three hydrogen
bonds forming. The other potential base pair in a DNA
hybrid, A:T, only has two such hydrogen bonds and thus
the greater the GC content of the nucleic acid, the more
stable the hybrid. At very low temperatures, base-pair
interactions other than the standard Watson–Crick pairs can also occur, causing
noncomplementary sequences or sequences that are
<100% matched to form hybrids. It is necessary therefore
to find a temperature that will prevent or melt nonspecific
hybrids, but allow the specific interactions to occur. For a
microarray, this presents a challenge as there are thousands of specific interactions that must be accommodated.
In the case of oligonucleotide arrays, the design of the
oligonucleotides to be spotted takes this issue into account
and probes are designed that tend to fall within a narrow
window of potential melting temperatures. cDNA arrays
are more difficult because the sequences spotted vary
greatly in both GC content and length. In such cases,
hybridization conditions that represent somewhat of a
‘‘compromise’’ are often necessary.
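The dependence of hybrid stability on length and GC content can be made concrete with a short Python sketch. It uses the Wallace rule (2 °C per A or T, 4 °C per G or C), a common rule of thumb for short oligonucleotides that is not taken from this article and that ignores salt concentration. The function names, candidate sequences, and the 50–60 °C window are all hypothetical.

```python
# Rough illustration only. The Wallace rule is a rule of thumb for short
# oligonucleotides; it is an assumption here, not the article's method,
# and it ignores salt concentration and nearest-neighbor effects.

def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Estimate the melting temperature (deg C) as 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def probes_in_window(probes, low, high):
    """Keep only probes whose estimated Tm lies within [low, high]."""
    return [p for p in probes if low <= wallace_tm(p) <= high]


# Three hypothetical 18-mers with low, intermediate, and high GC content.
candidates = ["AT" * 9, "ATGC" * 4 + "AT", "GC" * 9]
for probe in candidates:
    print(probe, round(gc_content(probe), 2), wallace_tm(probe))

selected = probes_in_window(candidates, 50, 60)
print(selected)
```

Although all three candidates have the same length, their estimated melting temperatures differ widely, which is why oligonucleotide probe sets are designed to fall within a narrow Tm window.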
Hybridization kinetics can generally be modeled as
shown in Eq. 1 (6). The change in the amount of hybridization product LS over time T is a function of the decrease in
the concentration of labeled target L and free spotted DNA
S over time. To simplify the equation, the rate of hybridization is equal to some rate constant k multiplied by the
product of the concentrations of L and S. Thus hybridization rate is a direct function of the concentrations of the
labeled target molecule and the DNA probe in the spot.
$$
\frac{d[LS]}{dT} = -\frac{d[L]}{dT} = -\frac{d[S]}{dT} = k\,[L][S] \tag{1}
$$
In the case of an oligonucleotide microarray, it is often the
case that the number of spotted DNA molecules is in great
excess of the number of target molecules. As such, the
concentration of the spotted DNA probe remains fairly
constant and can be considered part of the constant k.
Thus the equation for hybridization can be simplified as
shown in Eq. 2 (6), where the rate of hybridization is
typically driven by the concentration of the labeled target
molecules alone.
$$
\frac{d[LS]}{dT} = k'\,[L] \tag{2}
$$
In the case of two-color oligonucleotide arrays, the two
labeled samples compete for hybridization to the probe, which
remains in excess, and thus hybridization is simply a
reflection of the concentrations of each of the two labeled
targets L1 and L2 [Eq. 3 (6)].
$$
\frac{d[L_1S]}{d[L_2S]} = \frac{k'_1\,[L_1]}{k'_2\,[L_2]} \tag{3}
$$
The situation becomes somewhat more complex when the
probe molecules are not in excess of the target molecules.
This is often the case with cDNA arrays. In these cases, the
concentration of the spotted probe does change significantly as hybridization occurs, and thus each of the labeled
targets L1 and L2 hybridizes in a manner described by Eqs. 4
and 5 (7).
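$$
\frac{d[L_1S]}{dT} = k_1[S][L_1] = k_1\left([S_0] - [L_1S] - [L_2S]\right)\left([L_1^0] - [L_1S]\right) \tag{4}
$$
$$
\frac{d[L_2S]}{dT} = k_2[S][L_2] = k_2\left([S_0] - [L_2S] - [L_1S]\right)\left([L_2^0] - [L_2S]\right) \tag{5}
$$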
In such a case, the rate of hybridization is affected by the
change in the concentration of the spotted probe from its
initial concentration S0: the free probe concentration falls as the probe
molecules are bound by either L1 or L2.
When looking at differential hybridization between the
two targets, we can represent the kinetics as shown in
Eq. 6 (7).
$$
\frac{d[L_1S]}{d[L_2S]} = \frac{k_1\left([L_1^0] - [L_1S]\right)}{k_2\left([L_2^0] - [L_2S]\right)} \tag{6}
$$
If one assumes that the two fluorescent molecules
used in a two-color experiment behave similarly, and that
the rate of hybridization of the two labeled targets is the
same, we can say k1 = k2. It has been demonstrated that,
under ideal conditions and when the hybridization reaction
is allowed to continue to equilibrium, the ratio of the
concentrations of each possible hybrid L1S and L2S is
equivalent to the ratio of the original concentrations of
the two targets L1 and L2 [Eq. 7 (7)]. This point is important
because it is the basis on which microarrays work: it supports the assumption
that the ratios read from the scans during data analysis are
reflective of an actual biological condition.
$$
\frac{[L_1S]}{[L_2S]} = \frac{[L_1^0]}{[L_2^0]} \tag{7}
$$
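The competitive kinetics of Eqs. 4 and 5 can be explored numerically. The following Python sketch integrates both equations with a simple forward Euler scheme and compares the final hybrid ratio with the ratio of the initial target concentrations. The function name, concentrations, rate constants, step size, and number of steps are all hypothetical values in arbitrary units, chosen so that the targets exceed the probe.

```python
# Minimal numerical sketch of Eqs. 4 and 5 using forward Euler integration.
# All parameter values are hypothetical and in arbitrary units.

def simulate(s0, l1_0, l2_0, k1, k2, dt=1e-3, steps=20000):
    """Return the final hybrid concentrations ([L1S], [L2S])."""
    l1s = 0.0
    l2s = 0.0
    for _ in range(steps):
        free_probe = s0 - l1s - l2s                # [S] = [S0] - [L1S] - [L2S]
        d1 = k1 * free_probe * (l1_0 - l1s)        # Eq. 4
        d2 = k2 * free_probe * (l2_0 - l2s)        # Eq. 5
        l1s += d1 * dt
        l2s += d2 * dt
    return l1s, l2s


# Probe is limiting: total target (2.5) exceeds probe (1.0).
l1s, l2s = simulate(s0=1.0, l1_0=2.0, l2_0=0.5, k1=1.0, k2=1.0)
print("hybrid ratio :", l1s / l2s)
print("target ratio :", 2.0 / 0.5)
```

With k1 = k2 the two ratios agree even though the probe becomes saturated, which is the behavior summarized by Eq. 7. Setting k1 and k2 to different values in this sketch shows how unequal behavior of the two labels would distort the measured ratio.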
The goal of microarray hybridization is to produce a
result for which the signal obtained from specific hybridization is very strong when compared to any background
signal that may be obtained by a nonspecific adsorption of
labeled material to the substrate, or nonspecific binding to
spotted elements. To reach this goal, it is common to use
certain nonspecific blocking reagents in the hybridization
solution. Frequently, nucleic acids from sources known not
to contain any sequences that will interfere with specific
hybridization are used. For example, in a hybridization of a
human sample to an array, one might use yeast tRNA and
salmon sperm DNA as competitors to bind any regions of
the substrate or probes that have a generic nucleic acid
binding capacity. These nucleic acids are nonlabeled and
will therefore not contribute any signal when the array is
scanned.
Washing
Unlike traditional northern blots, the majority of the
stringency of a microarray assay is accomplished at the
hybridization step. The washing step of a microarray
experiment is a critical operation, but is important more
as a means to remove unbound material in order to reduce
background signal than it is to control the specificity of the
signal obtained.
Wash buffers generally contain two components: a salt
solution and a detergent. The salt solution, frequently
sodium chloride–sodium citrate (SSC), is set to a concentration that supports the maintenance of the hybridized
molecules. This concentration most frequently falls in the
1× to 2× range, with some labs using
a concentration as low as 0.1× (1× SSC contains 0.15 M NaCl
and 0.015 M Na-citrate).
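The arithmetic behind these working concentrations is simple, as the following Python sketch shows. The 1× composition is taken from the text; the function names, the 20× stock strength, and the 500 mL volume are assumptions made for illustration.

```python
# Sketch of SSC dilution arithmetic. The 1x composition comes from the text;
# the 20x stock and the volumes below are hypothetical examples.

SSC_1X = {"NaCl": 0.15, "Na-citrate": 0.015}  # mol/L


def ssc_molarity(fold: float) -> dict:
    """Molar concentration of each salt in an SSC solution of given strength."""
    return {salt: fold * molar for salt, molar in SSC_1X.items()}


def stock_volume_ml(stock_fold: float, final_fold: float, final_volume_ml: float) -> float:
    """Volume of stock needed, from C1 * V1 = C2 * V2."""
    return final_fold * final_volume_ml / stock_fold


print(ssc_molarity(2))                  # salt molarities in a 2x wash
print(ssc_molarity(0.1))                # salt molarities in a 0.1x wash
print(stock_volume_ml(20, 0.1, 500))    # mL of 20x stock for 500 mL of 0.1x
```

Lower-fold SSC means less salt and therefore a more stringent wash, consistent with the relationship between salt concentration and stringency described for hybridization.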
The detergents used in wash buffers help to remove
the unbound fluorescent molecules that would normally
stick to the surface of the slide. The detergent acts as a
surfactant and helps to isolate and remove the unbound
fluorescent material. Typically, an anionic detergent
such as sodium dodecyl sulfate (SDS) is used for this
purpose.
The temperature for the washes varies depending on the
stringency of the wash solution being used. As with hybridization, the combination of temperature and salt concentration determines the overall stringency of the washes.
After washing the microarrays, it is generally necessary
to perform a rinse. The rinse is typically a solution similar
to the wash solutions without the detergent. If detergent
remains on the slide after drying, it may fluoresce, particularly if labeled material has been trapped in
detergent micelles.
Scanning
It is necessary to use an imaging device to detect the
fluorescent labels present on the hybridized microarray.
In general, the imaging device must contain an excitation
light source, an emission filter, and a light gathering
device.
During scanning, the labeled material, be it fluorescent
or some other form of detectable molecule, is imaged and
the resultant data is converted to a digital image. The
optimal resolution at which the image is scanned is dependent on the size of the features and on their interspot
spacing. A general rule of thumb is that the resolution of
the image should be such that the pixels represent one-tenth of the diameter of the spot. For spotted arrays, for
example, the features tend to be on the order of 100 μm in
diameter and thus 10 μm resolution is frequently used.
Affymetrix’s technology, however, can generate features
that are 11 μm square; in this case, a much higher resolution of down to 1 μm is required.
Most commonly, the image that is generated is a 16-bit
grayscale TIFF (Tagged Image File Format) image
(Fig. 2). The 16-bit depth of the image provides a total
of 65,536 gray levels, giving a dynamic range of nearly
five orders of magnitude. The TIFF format is important because it is a universally accepted format that is
lossless; that is, even with compression, this format
retains all image information. The images can then
be imported into the appropriate image quantification
software.
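The scanning parameters discussed above follow from simple arithmetic, sketched here in Python. The function names are hypothetical, and the ten-pixels-per-feature default simply encodes the rule of thumb given in the text.

```python
import math

# Sketch of the scan-resolution rule of thumb and of the dynamic range
# implied by an image bit depth. Function names are hypothetical.

def pixel_size_um(feature_diameter_um: float, pixels_across: int = 10) -> float:
    """Pixel size such that `pixels_across` pixels span one feature."""
    return feature_diameter_um / pixels_across


def gray_levels(bit_depth: int) -> int:
    """Number of distinct intensity values for a given bit depth."""
    return 2 ** bit_depth


def orders_of_magnitude(bit_depth: int) -> float:
    """Dynamic range expressed as log10 of the number of gray levels."""
    return math.log10(gray_levels(bit_depth))


print(pixel_size_um(100))                 # spotted array, 100 um features
print(pixel_size_um(11))                  # 11 um square features
print(gray_levels(16))                    # levels in a 16-bit image
print(round(orders_of_magnitude(16), 2))  # dynamic range in decades
```

For a 16-bit image the calculation gives about 4.8 decades of dynamic range.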
Image Quantification
After scanning, it is necessary to extract data from the
images. Image quantification generally starts with
segmentation. Segmentation is the process by which
pixels that represent the signal are isolated from those
that represent background. During segmentation, the
discrete areas of the image that represent the spotted
DNA material are identified and digitally isolated from
the remainder of the image. The intensities of all of the
pixels in the individual spot are averaged to determine
the overall spot intensity. This spot intensity is proportional to the amount of material hybridized to that
region, with higher intensities resulting from increased
numbers of hybridized molecules. Each spot, for each
channel (in the case of two-color microarrays), is
quantified, and the resultant data are tabulated. Other
data may also be extracted at this stage. It is common to
also obtain intensity data for the area outside of the
individual spots. This value represents the background
of the image and indicates the amount of signal that would
have been obtained regardless of a specific hybridization
event. It is common, although not universal, to subtract
the background values from the signal intensities of the
spots.
There are several means by which segmentation can be
carried out. In the most basic setup, a fixed shape (usually a
circle) is placed over each spot. The entire complement of
pixels lying within the circle is used to determine the
average intensity. Pixels lying outside of one of these
circles are deemed to be background signal. More advanced
segmentation algorithms attempt to account for the fact
that most of the spotted features on a microarray are in fact
not perfectly uniform. Spots may deviate from a true
circular shape, or may have regions within the circle in
which DNA was not attached (creating a spot that is
reminiscent of a doughnut). In addition, it is not uncommon
for each of the spots to have some degree of variance in
their diameter. The more advanced methods utilize various
algorithms and statistics to determine which pixels actually represent signal and which are more representative of
background.
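The most basic, fixed-circle approach can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the algorithm of any particular quantification package: the image is a small synthetic grid holding a single spot, the spot center and radius are assumed to be known, and every pixel outside the circle is treated as background.

```python
from statistics import mean

# Minimal sketch of fixed-circle segmentation for one spot. The image,
# spot position, and radius are synthetic, hypothetical values.

def fixed_circle_segmentation(image, center_row, center_col, radius):
    """Return (mean spot intensity, mean background, background-subtracted)."""
    signal = []
    background = []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            inside = (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2
            if inside:
                signal.append(value)       # pixel lies within the circle
            else:
                background.append(value)   # pixel lies outside the circle
    spot = mean(signal)
    bg = mean(background)
    return spot, bg, spot - bg


# Synthetic 7 x 7 image: background of 100 with a bright spot of 1000
# occupying a circle of radius 2 pixels centered at (3, 3).
image = [
    [1000 if (r - 3) ** 2 + (c - 3) ** 2 <= 4 else 100 for c in range(7)]
    for r in range(7)
]
spot, bg, net = fixed_circle_segmentation(image, 3, 3, 2)
print(spot, bg, net)
```

If the circle were larger than the real spot, or the spot were doughnut shaped, background pixels would be averaged into the signal, which is the weakness the more advanced segmentation methods try to address.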
Image quantitation software then processes the entire
image and produces a table of results that represents the
signal, and the background for each feature on the array.
These packages may also export various other data, which
can be used in quality control analysis such as standard
deviations, coefficients of variation, circularity, or uniformity of the spot, and so on. This data table can then be
processed as part of the data analysis.
Figure 2. Arrays imaged on a microarray scanner are presented as 16-bit grayscale TIFF images. The picture shown represents a small subsection of a larger array. Each spot is 100 μm in diameter and the spot-to-spot spacing is 200 μm in this image. The image was scanned at 10-μm resolution.
Data Analysis
An exhaustive description of the process of DNA microarray data analysis is far beyond the scope of this article
(for an excellent review see Ref. 8). The exact process
followed depends greatly on the experimental design and
the question being addressed. There are, however, some
basic principles that tend to be fairly common in dealing
with microarray data: statistical analysis of data, supervised and/or nonsupervised data mining, data visualization, and validation are all key components.
Statistical analysis of microarray data comes into play
in two main areas. The first is to determine which spots are
reliable and provide sufficient data. Spots that have a high
degree of variance across replicates, for example, are likely
not able to provide reliable data. These hypervariable
genes or signals need to be filtered from the data so as
to not skew the results of data mining. Statistics may also
play a role in supervised data analysis.
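One simple way to flag unreliable spots is to compute the coefficient of variation (standard deviation divided by mean) across replicates and discard spots above a cutoff. The Python sketch below illustrates the idea; the gene names, intensities, and the 0.3 cutoff are hypothetical, and other reliability criteria are equally possible.

```python
from statistics import mean, stdev

# Sketch of a replicate-variance filter. Gene names, intensities, and the
# cutoff are hypothetical illustration values.

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def filter_reliable(replicates, max_cv=0.3):
    """Keep only spots whose replicate CV does not exceed max_cv."""
    return {
        gene: values
        for gene, values in replicates.items()
        if coefficient_of_variation(values) <= max_cv
    }


replicates = {
    "geneA": [1000, 1100, 950],   # consistent across replicates
    "geneB": [200, 900, 50],      # hypervariable
    "geneC": [500, 520, 480],     # consistent across replicates
}
reliable = filter_reliable(replicates)
print(sorted(reliable))
```

In this example the hypervariable spot is removed before any data mining is attempted, so it cannot skew the downstream results.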
There are two major categories of data mining: supervised and nonsupervised. Supervised data mining utilizes
algorithms in which the user imparts restrictions on how
the data is grouped. For example, in an experiment where a
cohort of patients was tested in which one group
was healthy and the other group was afflicted with a
particular disease, one would indicate to the algorithm
which arrays were from the healthy patients and which
were from the patients with disease. The algorithm then
tests the data to find genes that are markers for the
disease. Specifically, each gene is tested to see if the
expression levels for that gene are statistically significantly different in each of the two patient groups. The goal
is to find a series of genes that can act as markers that are
diagnostic of the disease.
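A per-gene test of this kind can be sketched with Welch's t statistic, which is one of several possible choices and is an assumption here rather than a method named in the text. The Python example computes only the statistic and ranks genes by its magnitude; converting it to a significance level and correcting for the thousands of genes tested would be needed in practice. Sample identifiers, gene names, and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Sketch of supervised marker ranking using Welch's t statistic.
# All identifiers and log-scale expression values are hypothetical.

def welch_t(group_a, group_b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    va, vb = variance(group_a), variance(group_b)
    standard_error = sqrt(va / len(group_a) + vb / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def rank_markers(expression, healthy_ids, disease_ids):
    """Return (gene, t) pairs sorted by decreasing absolute t."""
    scores = {}
    for gene, by_sample in expression.items():
        healthy = [by_sample[s] for s in healthy_ids]
        disease = [by_sample[s] for s in disease_ids]
        scores[gene] = welch_t(disease, healthy)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


expression = {
    "geneX": {"h1": 5.0, "h2": 5.2, "h3": 4.9, "d1": 8.1, "d2": 7.9, "d3": 8.3},
    "geneY": {"h1": 6.0, "h2": 6.5, "h3": 5.5, "d1": 6.1, "d2": 5.8, "d3": 6.3},
}
ranked = rank_markers(expression, ["h1", "h2", "h3"], ["d1", "d2", "d3"])
for gene, t in ranked:
    print(gene, round(t, 2))
```

The user supplies the group labels, which is what makes the analysis supervised; the algorithm only asks which genes separate the labeled groups.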
In nonsupervised clustering, the algorithm is not given
any indication as to how the individual samples are
related. In true nonsupervised clustering, the algorithm
is not even told how many groups exist. The data are
analyzed and the samples are grouped based on similarity
metrics. The classical methods of nonsupervised clustering
include hierarchical clustering and principal components
analysis (PCA). The algorithms generally display the data
via some visualization pattern such as the canonical
‘‘plaid’’ expression patterns seen from hierarchical clustering. The researcher then overlays the grouping information onto the patterns provided to see if the individual
groups naturally separate from one another. In other cases,
this methodology may be used to determine how
many groups there truly are, as the researcher may not
have this information a priori. In such cases, the groups
can then be further examined to see if there are differences
in treatment response, survival, or any other characteristic
desired. Generally, after this technique is performed one
will attempt to look for clusters of genes in the patterns
that distinguish between the different groups and again
use these genes as markers.
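Hierarchical clustering can be sketched compactly in Python. The example below uses one minus the Pearson correlation as the distance between samples and average linkage to compare clusters; both are common choices but are assumptions here, as are the sample names and expression profiles. The function returns the sequence of merges, which is the information a dendrogram displays, and it is never told how many groups exist.

```python
from itertools import combinations
from math import sqrt

# Sketch of agglomerative (hierarchical) clustering of samples.
# Distance metric, linkage rule, and data are illustrative assumptions.
# Profiles must not be constant, or the correlation is undefined.

def pearson_distance(x, y):
    """One minus the Pearson correlation of two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    dists = [
        pearson_distance(profiles[i], profiles[j])
        for i in cluster_a
        for j in cluster_b
    ]
    return sum(dists) / len(dists)


def hierarchical_merges(profiles):
    """Merge the two closest clusters repeatedly; return merges in order."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], profiles),
        )
        merges.append((a, b, average_linkage(a, b, profiles)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


profiles = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [1.1, 2.1, 2.9, 4.2],
    "s3": [4.0, 3.0, 2.0, 1.0],
    "s4": [4.2, 2.9, 2.2, 0.9],
}
merges = hierarchical_merges(profiles)
for a, b, distance in merges:
    print(a, b, round(distance, 3))
```

The large distance at the final merge, compared with the small distances at the earlier ones, is the kind of evidence a researcher would use to decide that two groups are present.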
Regardless of the methodology utilized, it is extremely
important to validate the data. Cross-validation strategies
vary, but in their most basic form, one obtains a
cohort of patients to profile. A subset of this cohort is used
to look for potential markers. Once the markers have been
identified, the remaining patients are tested and only the
identified markers are used to try and group the patients. If
the markers are able to stratify the patients into their
appropriate groups, then the markers are considered to be
viable and may provide beneficial diagnostic ability. On
occasion, however, the validation set is not properly
grouped. In such cases, the markers are only useful for
the narrow set of patients used in the initial tests and more
testing is required to find a viable set of markers.
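The basic validation scheme described above can be sketched as follows. The Python example assumes the markers have already been selected from the training subset and uses a nearest-centroid rule to assign each validation patient to a group; the classifier, gene name, and expression values are hypothetical choices made for illustration.

```python
from statistics import mean

# Sketch of hold-out validation with a nearest-centroid classifier.
# The classifier choice, marker name, and data are hypothetical.

def centroids(train, markers):
    """Mean marker profile per group. train is a list of (label, profile)."""
    by_label = {}
    for label, profile in train:
        by_label.setdefault(label, []).append([profile[g] for g in markers])
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def classify(profile, cents, markers):
    """Assign a profile to the group with the nearest centroid."""
    vector = [profile[g] for g in markers]

    def squared_distance(label):
        return sum((v - c) ** 2 for v, c in zip(vector, cents[label]))

    return min(cents, key=squared_distance)


def validation_accuracy(train, validation, markers):
    """Fraction of validation patients placed in their known group."""
    cents = centroids(train, markers)
    hits = sum(classify(p, cents, markers) == label for label, p in validation)
    return hits / len(validation)


markers = ["geneX"]
train = [
    ("healthy", {"geneX": 5.0}), ("healthy", {"geneX": 5.2}),
    ("disease", {"geneX": 8.1}), ("disease", {"geneX": 7.9}),
]
validation = [("healthy", {"geneX": 5.1}), ("disease", {"geneX": 8.3})]
accuracy = validation_accuracy(train, validation, markers)
print(accuracy)
```

The key point is that the validation patients play no part in choosing the markers or building the centroids, so a high accuracy on them indicates that the markers generalize beyond the initial test set.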
FABRICATION OF MICROARRAYS
There are two main methodologies for manufacturing
microarrays, which differ in the means by which the probe
material spotted onto the arrays is prepared. In one methodology, the DNA to be spotted is generated in situ using
either standard or modified phosphoramidite chemistry.
(Phosphoramidites are reactive forms of each of the
nucleotides that make up DNA. Phosphoramidite chemistry is a well-defined process by which moderate length
stretches of DNA can be created with any specific
sequence.) This method is used by Affymetrix and Agilent,
the two largest commercial suppliers of microarrays,
although the two companies use different approaches to the
in situ synthesis.
Other groups use ex situ synthesis, whereby the DNA
material is either prepared as PCR products (cDNA) or
oligonucleotides manufactured using standard phosphoramidite synthesis. Once this material is prepared it is
spotted onto the array substrate using either contact or
noncontact printing methodologies. Amersham (now GE
Healthcare) and Applied Biosystems use this methodology
to make microarrays as do almost all of the ‘‘homebrew’’
laboratories that make microarrays in house.
Fabrication of DNA Arrays In Situ
There are two main approaches to the generation of microarrays by in situ synthesis of DNA: photolithography and
inkjetting. Affymetrix, the industry leader, uses a proprietary photolithography process to mask off areas of the
array, protecting some areas and leaving others available
for the DNA synthesis reaction to occur (1). This is a
multistep process requiring several masks to be made per array.
Each synthesis reaction is performed sequentially. For each nucleotide position, there are four possible
masks (one for each of A, G, C, and T). Thus, an array
composed of 25-mer oligonucleotides would require up to 100
masks to complete the process (typically 70 are required
for an array due to the sequences used). Affymetrix uses a
modified phosphoramidite chemistry for synthesis of the
oligonucleotide chains; whereas standard phosphoramidite chemistry uses acid labile protection groups, the Affymetrix technology utilizes groups that can be removed by
ultraviolet (UV) light. The Affymetrix technology allows for
extremely high density arrays of hundreds of thousands of
features to be prepared on very small substrates of <1 cm².
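The mask arithmetic above can be made concrete with a short sketch. The upper bound follows directly from the text (four masks per nucleotide position). The second function is a simplified model of why fewer masks may suffice: it assumes, purely for illustration, that bases are offered in a repeating A, C, G, T cycle and that a step needs a mask only if at least one probe is extended at that step. This schedule is an assumption, not a description of the proprietary Affymetrix process.

```python
def max_masks(probe_length, n_bases=4):
    """Upper bound: one mask per base at every nucleotide position."""
    return probe_length * n_bases


def masks_needed(probes, base_order="ACGT"):
    """Count masks under a hypothetical repeating deposition cycle.

    At each step one base is offered; every probe whose next required base
    matches is extended by one nucleotide. A step that extends no probe
    needs no mask and is skipped.
    """
    for probe in probes:
        if set(probe) - set(base_order):
            raise ValueError(f"unexpected base in probe {probe!r}")
    progress = [0] * len(probes)  # bases synthesized so far, per probe
    masks = 0
    step = 0
    while any(done < len(p) for done, p in zip(progress, probes)):
        base = base_order[step % len(base_order)]
        step += 1
        extended = False
        for i, probe in enumerate(probes):
            if progress[i] < len(probe) and probe[progress[i]] == base:
                progress[i] += 1
                extended = True
        if extended:
            masks += 1
    return masks
```

For 25-mers, `max_masks(25)` gives the 100 masks quoted above; how far below that bound a real design falls depends on the probe sequences on the array.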
Other groups have developed technologies that allow
them to get around the need for multiple masks to be
made for each array design. The pioneer in this area
was Nimblegen, which uses digital light processor (DLP)
micromirrors to create the masks (9). Each of these DLP
units (used typically in AV projectors and large screen
televisions) comprises thousands of tiny (10 μm²) micromirrors. The micromirrors can be individually addressed
and the angle of the mirrors changed to allow light to pass
through. In the “open state”, the micromirror directs light
onto the surface of the microarray, allowing DNA synthesis
to occur. In the “closed state”, the micromirror reflects light
away from the surface, preventing DNA synthesis. A
computer controls the mirrors and thus each DLP unit
has a near infinite number of combinations that can each
be controlled, and as such, a single unit can create any
pattern desired on the array. Nimblegen uses the same
chemistry as Affymetrix, using light activated deprotection
of the phosphoramidites. A somewhat newer entry into this
area is Xeotron (now part of Invitrogen). Xeotron also uses
micromirror DLPs to address the masks; however, it has
also incorporated small microfluidic channels on its
chips. Each feature is placed in a microscopic well on the
chip. Rather than using the modified phosphoramidite
chemistry of Affymetrix and Nimblegen, Xeotron uses
standard chemistry, but has instead employed a caged acid
that can be freed by light (10,11). As such, the acid that
controls deprotection of the nascent oligonucleotide can be
directed to specific locations by light. The Nimblegen and
Xeotron technologies have the advantage of being highly
amenable to custom array generation, whereas the Affymetrix technology is particularly well suited to mass production of a standard array. Each of these approaches has
found customers in the marketplace.
The other main approach to in situ synthesis of the oligonucleotides involves ink-jet spotting. Agilent uses this technology
(developed by Rosetta Inpharmatics), in which each of the
reactive phosphoramidites (A, G, C, and T) is loaded into
a separate “ink-cartridge” to allow for control of which
nucleotide is added to each spot during the synthesis stage
(12,13). This methodology eliminates the need for masks,
but does require very high precision robotics, as the print
head must return to the same spot many times, to within
micron accuracy, during the course of synthesis. This
technology combines the strengths of the others
mentioned in that it is relatively easy to customize the
design of arrays, and yet mass production of arrays is
possible using a large robotic system.
Fabrication of DNA Arrays Ex Situ
Some of the commercial vendors and nearly all of the
“homebrew” microarray centers use an approach of
spotting DNA that was prepared ex situ. In the case of
cDNA arrays, the spotted material is prepared by polymerase chain reaction (PCR), whereas oligonucleotide
arrays are generated using oligos created via high throughput oligo synthesis. The DNA material is purified and
placed into a specific spotting buffer that is compatible
with the substrates being used.
The DNA is typically aliquoted out into multiwell plates
(96, 384, or 1536 wells/plate) to facilitate transfer by the
arraying robot. The buffer that the DNA is placed in has
several functions. First, the buffer stabilizes the DNA to
prevent it from degradation. Second, the buffer must
provide an appropriate surface tension to ensure that
the spots that are placed on the substrate are of a controllable size and uniform in shape. Of similar importance,
however, is that the buffer must provide conditions that are
compatible with the attachment chemistry that is going to
be utilized.
The DNA may either be coupled to the slide through
rather simple electrostatic interactions or via a specific
coupling reaction. Electrostatic interactions are mediated
by using a uniform positively charged substrate that
attracts the negatively charged DNA. Often the substrates
used are silylated to provide reactive amine groups on the
surface. Alternatively, one may coat the slides with a
chemical such as poly-L-lysine, which simply adsorbs onto
the substrate and provides a net positive charge. This type
of interaction is mass based. As such, there is a maximum
mass of DNA that can bind to any one spot on the substrate.
Longer DNAs will be represented by fewer copies than
shorter DNAs. To overcome this, it is possible to use more
specific interactions by using modifiers on the DNA that
will react with certain groups on the slide. The two most
common such modalities involve aldehyde or epoxide chemistry. In this method, the DNA is modified with a primary
amine group. The substrate has reactive aldehydes or
epoxides that will react specifically with the primary amine
to form a covalent bond (Fig. 3). This type of interaction is
molarity based, and as such, with the exception of steric
effects, the number of DNAs that bind per spot is relatively
equivalent regardless of length.
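The difference between mass-based and molarity-based attachment can be illustrated with a short calculation. This is a sketch under stated assumptions: the average mass of about 650 g/mol per base pair is a common approximation for double-stranded DNA, and the spot capacity and number of reactive sites used below are hypothetical values chosen only to show the trend.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
BP_MASS_G_PER_MOL = 650.0     # approximate average mass of one DNA base pair


def copies_mass_limited(length_bp, capacity_ng):
    """Copies bound when the spot saturates at a fixed mass of DNA."""
    grams = capacity_ng * 1e-9
    moles = grams / (length_bp * BP_MASS_G_PER_MOL)
    return moles * AVOGADRO


def copies_site_limited(length_bp, reactive_sites):
    """Copies bound when each reactive site captures one amine-modified DNA.

    The length is accepted only to make the contrast explicit: ignoring
    steric effects, the result does not depend on it.
    """
    return reactive_sites


if __name__ == "__main__":
    for length in (500, 1000, 2000):
        by_mass = copies_mass_limited(length, capacity_ng=0.1)   # hypothetical capacity
        by_site = copies_site_limited(length, reactive_sites=1e8)  # hypothetical site count
        print(f"{length:>5} bp  mass-limited: {by_mass:.2e}  site-limited: {by_site:.2e}")
```

Under the mass-limited model, a fourfold increase in length gives a fourfold decrease in the number of copies per spot, whereas the site-limited model gives the same number of copies at every length.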
Figure 3. Covalent attachment of amino-modified DNAs to aldehyde (a) or epoxide (b)
slides is possible. An amino-modified DNA
reacts with an aldehyde surface by a Schiff’s
base reaction. The resultant Schiff base must
be reduced with an agent such as sodium
borohydride (NaBH4) to prevent reversal of the
reaction.
EQUIPMENT
The manufacture of microarrays, and their subsequent use,
requires some very specialized equipment. Generally, a
facility that produces microarrays will require some
advanced robotics for fabrication. A laboratory that uses
arrays will require scanning devices to read the arrays.
Due to the relatively high costs of these pieces of equipment
it is common for many people to rely on core facilities for
some or all of the process.
Arraying Robots
Ex situ prepared DNAs are spotted onto the microarray
substrates via robotics (Fig. 4). Robotics are required to
accurately position the printing devices over the slides to
create the arrays. The majority of systems utilize pins and
direct contact to deposit the DNA material. In this system,
a printhead with several spotting pins in a defined arrangement is used to dip into the multiwell plates and pick up the
material to be spotted. The typical operation sequence of an
arrayer robot may include:
1. Dipping the printing applicators (pins) into a source
plate to pick up DNA samples. Each applicator picks
up a separate DNA sample from an individual well in
the plate. Typically 32–48 pins are used at one time.
2. Movement to a blot-station to preprint from the pins.
This step removes excess solution from the pins to
ensure that the spots that are printed onto the arrays
are uniform in size and do not run into one another
causing contamination.
3. Movement to the slide platform. The print head then
moves over the slide platform taking position over
the first slide.
4. Printing onto the arrays. The print head moves down
bringing the pins in contact with the slide. The DNA
solution held in the pins by capillary action is spotted
onto the slide. The printhead then moves to the next
slide position and again spots onto the slide. This
process is repeated until all of the slides on the
platform have been printed.
5. Washing the pins. The print head then moves the
pins to a wash station. Although there are many
configurations possible, the basic principle is to use
water or some other solution to remove the excess
liquid from the pins and then to dry the pins (under
vacuum or stream of air). This process may be
repeated several times to make sure there is no
carryover.
6. Loading the next sample. The print head returns to
the source plate to pick up the next set of samples.
In a typical high throughput system, such as those
offered by Bio-Rad, BioRobotics, GeneMachines, Genetix,
and Telechem International, 48 pins are used at one time.
The entire operation sequence described above may take
3–4 min to complete for 100 arrays. Often arrays may
contain 20,000–40,000 spots. As such, a typical print run
may require 600 or more cycles through the operation
sequence, which can take as long as 30 h or more to
complete. For example, 30,000 spots printed 48 at a time
require 625 cycles, or about 31 h at 3 min per cycle.
Figure 4. A microarraying robot. The robotic arrayer prints DNA
onto glass slides with very high precision. Robots such as this have
extremely high accuracy, on the order of 10 μm or less.
Hybridization and Fluidics Stations
Certain array platforms require that a specific hybridization and/or fluidics station be utilized. In the case of spotted
arrays (home-brew in particular), this is usually an option
and often a case of personal preference. In these cases, a
hybridization station may be utilized to improve mixing of
the hybridization solution over the array. The rate of
diffusion of a labeled nucleic acid in solution is actually
very low, and as such, some researchers prefer to use an
automated station that performs mixing of the solution.
In the case of Affymetrix GeneChip technology, a specific hybridization and fluidics station are required. The
hybridization station is simply a rotating incubator in
which the chips are placed. A bubble that is introduced
into the sealed array cartridge moves around during
rotation creating a mixing effect. The fluidics station is a
more advanced system that is required to introduce the
various labeling components and wash solutions required.
This station allows the user to keep the cartridge sealed
without having to attempt to pipette solutions in and out.
Scanners
While some microarray imagers such as the PerkinElmer
ScanArray and GeneFocus DNAScope are confocal scanners, this is not a strict requirement. Confocal imaging
serves to eliminate extraneous signals, but reduces the
light gathering ability of the device. There are >10,000
commercial microarray scanners in the field capable of
reading standard glass microarrays. The leading scanner
makers include Agilent, Axon, Bio-Rad, GeneFocus, PerkinElmer, and other vendors. The laser scanner uses one or
more lasers with wavelengths appropriate to the fluorophores being used. The most commonly used fluorophores
for microarrays are cyanine 3 and cyanine 5 (or fluors with
equivalent spectra). Cyanine 3 has an absorbance maximum of 550 nm and an emission maximum of 570 nm. There
are two main lasers used in scanners to excite this fluorophore:
“Gre-Ne” (green helium–neon) gas lasers and Nd:YAG (neodymium
doped yttrium aluminum garnet) frequency doubled solid-state diode lasers. Cyanine 5 has an absorbance maximum
of 650 nm and an emission maximum of 670 nm. There are
two main lasers used in scanners to excite this fluorophore:
standard He–Ne gas lasers and red diode lasers. Table 1
shows some of the characteristics of these two dyes, along
with two other popular dyes, Alexa 555 and Alexa 647,
which have spectra that are very similar to those of Cy3
and Cy5 respectively (Fig. 5).
Cyanine 3 and 5 have some important features that
make these dyes particularly suitable for use in microarray
analysis. The spectra of these dyes have little overlap and
can generally be separated from one another with little to
no cross-talk. In addition, these fluors have an
unusual property in that they are brighter when dry than
when wet. Most fluorophores have the opposite behavior,
which is impractical for microarrays because the scanners
generally cannot handle wet preparations.
The other major class of microarray imager is a CCD
(charge coupled device) based system. In general, these
imagers use a white light source to excite the fluorophores.
The fluorescent light that is emitted is captured by the
CCD and converted into a digital image. Rather than
scanning the slide, a CCD based imager tiles together
several sections of the slide to create an image of the entire
surface. This tiling can create a stitching effect whereby
the “seams” of the images may not be completely smooth.
This problem can be overcome with advanced lighting
systems and software.
Table 1. Key Characteristics of the Most Commonly Used Fluorophores for Microarray Analysis
Fluorophore     Excitation Max, nm   Emission Max, nm   Molar Extinction Coefficient, M⁻¹ cm⁻¹   Molecular Weight, g/mol
Cy3             550                  570                150,000                                  766
Cy5             649                  670                250,000                                  792
Alexa555        555                  565                150,000                                  1,250
Alexa647        650                  668                239,000                                  1,250
Phycoerythrin   566                  575                19,600,000                               240,000
Figure 5. Representative spectra of the fluors commonly used in spotted microarray experiments.
Alexa Fluor 555 and Cy3 are excited by green wavelengths of light whereas Alexa Fluor 647 and Cy5
are excited by red wavelengths of light. One green excited and one red excited fluor may be used at
the same time as there is little overlap in their excitation spectra.
Affymetrix arrays use a different labeling chemistry for
detection, relying on the fluorescent
protein phycoerythrin. Phycoerythrin is a naturally
occurring pigment protein from light harvesting algae that
absorbs strongly at 566 nm and has an emission peak at
575 nm. It is a very bright fluorophore, having a molar
extinction coefficient that is roughly 80 times as high as that of the standard Cy3 and Cy5 molecules. The limitation of this molecule is that it is also roughly 200 times larger, so far fewer
molecules can be incorporated per sequence.
As such, this molecule can only be applied to
the DNA posthybridization; otherwise it could create
steric interference.
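The relative brightness and size quoted for phycoerythrin can be checked against the values listed in Table 1. The sketch below simply divides the tabulated numbers; the dictionary layout and function name are illustrative choices, and the figures are taken from the table as printed rather than from an independent source.

```python
# Values as listed in Table 1: molar extinction coefficient and molecular weight.
FLUOROPHORES = {
    "Cy3": {"extinction": 150_000, "mol_weight": 766},
    "Cy5": {"extinction": 250_000, "mol_weight": 792},
    "Alexa555": {"extinction": 150_000, "mol_weight": 1_250},
    "Alexa647": {"extinction": 239_000, "mol_weight": 1_250},
    "Phycoerythrin": {"extinction": 19_600_000, "mol_weight": 240_000},
}


def ratio_to(reference, prop, dye="Phycoerythrin"):
    """Ratio of one tabulated property of `dye` to the same property of `reference`."""
    return FLUOROPHORES[dye][prop] / FLUOROPHORES[reference][prop]


if __name__ == "__main__":
    for ref in ("Cy3", "Cy5", "Alexa555", "Alexa647"):
        print(
            f"vs {ref:<9} extinction x{ratio_to(ref, 'extinction'):6.1f}"
            f"   size x{ratio_to(ref, 'mol_weight'):6.1f}"
        )
```

With the tabulated values, the extinction ratio is closest to 80 when Cy5 is the reference, and the size ratio is closest to 200 when the Alexa dyes are the reference; against Cy3 both ratios are larger, so the figures in the text are best read as order-of-magnitude statements.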
MICROARRAYS AS MEDICAL DEVICES
To date, microarrays have mostly found use in basic
research applications, and have yet to make a strong
impact on the diagnostic market. [During the preparation
of this text, Roche received FDA clearance for the first ever
array based diagnostic chip. The AmpliChip CYP450 based
on the Affymetrix platform was approved in January
of 2005 (see http://www.roche.com/med-cor-2005-01-12).]
Microarrays have indeed been used to study many diseases
including various cancers, cardiovascular disease, inflammatory disease, psychiatric disorders and infectious disease. This basic research will ultimately lead to the
identification of potential therapeutic markers for drugs
or for diagnostics. The potential of microarrays extends
beyond target discovery, however, and will eventually
impact on the way that medical care is performed.
Target Discovery
The use of microarrays in basic research laboratories
has often focused on target discovery. In these applications, microarrays are used to profile a particular
disease where disease tissues are compared to healthy
tissues either from the same patient or from a separate
test population. In such experiments, the goal is to find
genes that are differentially regulated (either up or down)
in the disease state compared to a healthy tissue. Such
genes are thought to be involved in the disease state or in
the cellular response to the disease. As such, these genes
are potential diagnostic markers and may also represent
drug targets.
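A comparison of this kind can be sketched as a fold-change screen. The example below is a minimal illustration only: the input layout (a dictionary of gene name to a list of positive signal intensities), the use of mean intensities, and the twofold cutoff are all assumptions, and a real analysis would add normalization and statistical testing.

```python
from math import log2
from statistics import mean


def differential_genes(disease, healthy, min_fold=2.0):
    """Flag genes whose mean signal differs by at least `min_fold` between tissues.

    `disease` and `healthy` map gene names to lists of positive intensities.
    Returns a dict of gene -> (log2 ratio, "up" or "down").
    """
    cutoff = log2(min_fold)
    hits = {}
    for gene in disease.keys() & healthy.keys():
        ratio = log2(mean(disease[gene]) / mean(healthy[gene]))
        if ratio >= cutoff:
            hits[gene] = (ratio, "up")
        elif ratio <= -cutoff:
            hits[gene] = (ratio, "down")
    return hits
```

Genes returned with the label "up" or "down" correspond to the differentially regulated genes described above and would be candidates for follow-up as markers or drug targets.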
Drug/Lead Discovery
Microarrays can also be used once the target has been
identified. It is possible to use microarrays to screen potential therapeutic compounds, for example, to determine
which candidates reverse the pattern of gene expression
that is indicative of disease. Microarrays have been even
more effective in looking at toxicity of lead compounds. One
of the leading contributors to failure of a pharmaceutical
compound is toxicity or off-target effects. Microarrays have
proven useful in screening for the up-regulation of toxicity-related
genes. In addition, it is possible to determine if the
compound creates other effects that, while not toxic per se,
could cause undesirable side effects from nonspecific interactions. Often toxicity is tested in model organisms such as rats or dogs. Several toxicity specific arrays
have been developed that allow for profiling of genes in
these model systems rather than human cells.
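The idea of screening candidates for reversal of a disease expression pattern can be sketched with a simple score. This is an illustrative assumption rather than an established method: the disease signature is taken to be a set of log2 ratios (disease versus healthy), each candidate's response is a set of log2 ratios (treated versus untreated diseased cells), and a gene counts as reversed when the two have opposite signs.

```python
def reversal_score(disease_signature, treated_response):
    """Fraction of signature genes whose treatment response opposes the disease change.

    Both arguments map gene names to log2 ratios. Genes missing from the
    response, or unchanged in the signature, are ignored.
    """
    shared = [
        g for g in disease_signature
        if g in treated_response and disease_signature[g] != 0
    ]
    if not shared:
        raise ValueError("no informative genes shared between the two profiles")
    reversed_count = sum(
        1 for g in shared if disease_signature[g] * treated_response[g] < 0
    )
    return reversed_count / len(shared)


def rank_candidates(disease_signature, responses):
    """Order candidate compounds from most to least reversal of the signature."""
    return sorted(
        responses,
        key=lambda name: reversal_score(disease_signature, responses[name]),
        reverse=True,
    )
```

A score of 1.0 means every informative signature gene moved in the opposite direction after treatment; the same profiles could also be screened against a list of toxicity-related genes to flag unwanted up-regulation.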
Diagnostics and Prognostics
One of the more promising areas for microarrays to
have direct impact as a medical device is in the area of
diagnostics and prognostics. As mentioned under target
discovery, basic research has often sought a
panel of genes that can be used as a molecular fingerprint
of a disease. There are numerous publications in
which researchers have attempted to use molecular
profiles to correlate to patient outcome, disease
state, tumor type, or any of several other factors. DNA
microarrays are particularly well suited to this type of
analysis. Many complex diseases are multifactorial;
rather than a single prognostic or diagnostic marker being
present, it may be necessary to look at several genes at one
time. Microarrays allow for identification of a panel of
genes, which when looked at together may provide diagnostic or prognostic power. Although it has not become
common practice yet, there are examples of microarrays
being used to prescreen patients on the basis of a molecular
profile (14).
Other attempts are being made at using microarrays to
study infectious disease. Often a patient may present
with a set of symptoms that could be indicative of several
different infectious agents. It is possible to prepare a
microarray that would identify the agent as well as to
subtype the bacterium or virus on the basis of pathogenicity. This particular application may prove very useful in
identifying not only the infectious agent, but also the best
course of treatment.
Pharmacogenomics and Theranostics
A concept that is gaining in popularity is pharmacogenomics or theranostics (15). Both of these terms refer to
the idea of tailoring a patient’s treatment or therapy on the
basis of their genetic makeup. Many pharmaceuticals on
the market have known, potentially serious side
effects in a subset of patients. In addition, there are typically at least some patients that are nonresponders to a
particular treatment. These effects are often the
result of the patient’s genetic makeup. Most of the work
in this area has focused on genotyping: looking at certain
variable regions of DNA and determining which variants
are present in people who have negative reactions or in
people who respond well to a treatment. It is hoped that in
the near future it will be possible to screen a patient and
determine which of a panel of drugs will be most beneficial.
Perhaps even more important, it will be possible to prevent
serious negative outcomes by avoiding treatment of a
patient who would have a poor reaction to a drug. Theranostics also involves monitoring a patient through a course of
treatment. It is possible that a patient can be screened
during treatment to ensure that the therapy is working as
expected. If a change occurs, the physician would be able to
alter the therapy to ensure that the disease is treated in the
most effective way possible.
SUMMARY
Microarrays provide a means to screen hundreds to thousands of biological analytes in parallel. These analytes can
be DNA, RNA, or protein. DNA microarrays allow for rapid
profiling of gene expression. While there are a few competing platforms that can be utilized, the basic principles are
the same: RNA from a biological sample is extracted,
labeled and applied to an array of DNA probes. Signals
generated from the array indicate which genes are active
and which are not. The ability to screen multiple tissues or
patients makes microarrays particularly well suited to
uncovering the complex gene networks involved in disease.
While typically used in basic research applications for
target or marker discovery, the future will most likely
see microarrays used in diagnostic applications and for
tailoring medical treatment.
BIBLIOGRAPHY
1. Fodor SP, Rava RP, Huang XC, Pease AC, Holmes CP, Adams
CL. Multiplexed biochemical assays with biological chips.
Nature (London) 1993;364:555–556.
2. Schena M, Shalon D, Davis RW, Brown PO. Quantitative
monitoring of gene expression patterns with a complementary DNA microarray. Science 1995;270:467–470.
3. Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA,
Kampa D, Piccolboni A, Sementchenko V, Cheng J,
Williams AJ, Wheeler R, Wong B, Drenkow J, Yamanaka
M, Patel S, Brubaker S, Tammana H, Helt G, Struhl K,
Gingeras TR. Unbiased mapping of transcription factor
binding sites along human chromosomes 21 and 22 points
to widespread regulation of noncoding RNAs. Cell 2004;116:
499–509.
4. Hegde P, Qi R, Abernathy K, Gay C, Dharap S, Gaspard R,
Hughes JE, Snesrud E, Lee N, Quackenbush J. A concise
guide to cDNA microarray analysis. Biotechniques 2000;29:
548–550, 552–554, 556 passim.
5. Van Gelder RN, von Zastrow ME, Yool A, Dement WC,
Barchas JD, Eberwine JH. Amplified RNA synthesized from
limited quantities of heterogeneous cDNA. Proc Natl Acad
Sci USA 1990;87:1663–1667.
6. Schena M. Microarray analysis. Hoboken: John Wiley &
Sons; 2003.
7. Wang Y, Wang X, Guo SW, Ghosh S. Conditions to ensure
competitive hybridization in two-color microarray: A theoretical and experimental analysis. Biotechniques 2002;32:
1342–1346.
8. Quackenbush J. Computational analysis of microarray data.
Nature Rev Genet 2001;2:418.
9. Singh-Gasson S, Green RD, Yue Y, Nelson C, Blattner
F, Sussman MR, Cerrina F. Maskless fabrication of light-directed oligonucleotide microarrays using a digital micromirror array. Nat Biotechnol 1999;17:974–978.
10. Gao X, LeProust E, Zhang H, Srivannavit O, Gulari E, Yu P,
Nishiguchi C, Xiang Q, Zhou X. A flexible light-directed
DNA chip synthesis gated by deprotection using solution
photogenerated acids. Nucleic Acids Res 2001;29:4744–
4750.
11. LeProust E, Pellois JP, Yu P, Zhang H, Gao X, Srivannavit O,
Gulari E, Zhou X. Digital light-directed synthesis. A
microarray platform that permits rapid reaction optimization on a combinatorial basis. J Comb Chem 2000;2:349–
354.
12. Hughes TR, Mao M, Jones AR, Burchard J, Marton MJ,
Shannon KW, Lefkowitz SM, Ziman M, Schelter JM, Meyer
MR, Kobayashi S, Davis C, Dai H, He YD, Stephaniants SB,
Cavet G, Walker WL, West A, Coffey E, Shoemaker DD,
Stoughton R, Blanchard AP, Friend SH, Linsley PS. Expression
profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol 2001;19:342–347.
13. Hughes TR, Shoemaker DD. DNA microarrays for expression
profiling. Curr Opin Chem Biol 2001;5:21–25.
14. Schubert CM. Microarray to be used as routine clinical
screen. Nat Med 2003;9:9.
15. Picard FJ, Bergeron MG. Rapid molecular theranostics in
infectious diseases. Drug Discov Today 2002;7:1092–1101.
See also DNA SEQUENCE; MICROBIOREACTORS; POLYMERASE CHAIN REACTION.