MICROARRAYS

NEIL WINEGARDEN
University Health Network Microarray Centre, Toronto, Ontario, Canada

Encyclopedia of Medical Devices and Instrumentation, Second Edition, edited by John G. Webster. Copyright © 2006 John Wiley & Sons, Inc.

INTRODUCTION

Microarrays allow the simultaneous, parallel interrogation of many biological analytes. Microarrays were originally devised to measure gene expression in a massively parallel manner (all the genes in the genome at once). Recent advances, however, have shown that microarrays can also be used to interrogate epigenetic phenomena, promoter binding, protein expression, and protein binding, among other processes. The overall approach relies on manufacturing a highly ordered array of biological molecules, which are typically known entities. The features of this array act as probes that react with, and bind to, unknown but complementary material present in a biological sample. Here we focus specifically on gene expression microarrays built from deoxyribonucleic acid (DNA), which can be used to assay the activity of thousands of genes at a time.

In 1993, Affymetrix published a novel method that used light-directed synthesis to build oligonucleotide arrays for a variety of biological applications (1). Shortly thereafter, a group led by Patrick Brown and Ron Davis at Stanford University demonstrated that robotically printed cDNA arrays could be used to assay gene expression (2). Now, more than a decade after this initial work was made public, both types of DNA array are commonly found in genomics laboratories.

BASIC PRINCIPLES

A DNA microarray contains a highly ordered arrangement (array) of many discrete probe molecules. Generally, the identity of these probes, be they cDNA or oligonucleotides, is either known or can be readily determined. The probes are deposited by some means (see the section Fabrication of Microarrays) onto a solid-support substrate such as glass or silicon. DNA microarrays take advantage of a basic property of DNA, namely, the ability of one strand of DNA to find its complementary strand in solution and bind (hybridize) to it. This hybridization event is highly specific, following standard Watson–Crick base-pairing rules (Fig. 1).

Figure 1. Watson–Crick base-pairing interactions. During hybridization, specific base-pairing interactions occur in which thymine (T) binds specifically to adenine (A) and cytosine (C) binds specifically to guanine (G). The binding of these bases to one another is mediated by hydrogen bonding, as shown. GC base pairs are stronger by virtue of the three hydrogen bonds they form, compared with only two for AT pairs.

Gene Expression

With some exceptions, the genetic makeup of every cell in an organism is the same. Each cell has the same complement of genes, which together make up the organism's genome. The subset of genes that are active in a particular cell dictates that cell's function. When we say a gene is active, or expressed, we mean that the gene is being transcribed. Transcription is the process by which ribonucleic acid (RNA) polymerase II (an enzymatic complex) reads a gene and creates a complementary copy of messenger RNA (mRNA). The more a gene is transcribed, the more copies of its mRNA are present in the cell. Thus genes that are highly active in the cell are represented by many copies of mRNA, whereas genes that are inactive have very few or no copies. Microarrays measure the amount of mRNA present in the cells of a biological sample, such as a tumor biopsy, and the activity of the genes is inferred from this measurement.

Gene Structure

In higher eukaryotes, somatic cells (diploid) have two copies of every gene: one maternally and the other paternally derived.
In the context of the diploid cell, each copy is termed an allele. When both inherited alleles of a given gene are the same, that gene is said to be homozygous; if the two alleles differ, the gene is heterozygous. Alleles may be either dominant (phenotypically manifested regardless of the other allele) or recessive (phenotypically manifested only in the absence of a dominant allele). In a heterozygous gene, the dominant allele is phenotypically manifested and the recessive allele is not. If the two alleles are different but both dominant, they are termed codominant, and both alleles elicit a phenotype.

A gene is composed of DNA, which is double stranded. One strand, the sense strand, carries the information that will ultimately be represented in the mRNA. The other strand, the antisense strand, is the one actually read by RNA polymerase to generate the mRNA. DNA has directionality: a gene is transcribed starting at the 3′ end of the antisense strand and read toward its 5′ end, so the resulting mRNA is synthesized from its 5′ end to its 3′ end.

Genes are regulated by specific DNA sequences that lie outside the coding region. The first such sequence is the promoter. Promoters bind the transcriptional machinery (RNA polymerase II) that performs transcription; they are found 5′ (upstream) of the gene, close to the transcription start site. An additional class of regulatory sequence, called an enhancer, may also be associated with the gene. Enhancers may lie upstream, downstream, or within the gene (usually in noncoding regions termed introns) (3). Specific transcription factors bind enhancers and promote recruitment or activation of the basal transcriptional machinery. It is the coordinated function of the promoter and enhancer, together with the transcription factors that bind them, that controls whether a gene is active within the cell.
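The strand relationships described in this section (Watson–Crick complementarity, antiparallel strands, and a template read 3′ to 5′ to build an mRNA 5′ to 3′) can be made concrete with a minimal sketch. It works on plain strings written 5′ to 3′; the function names and the short "gene" are hypothetical, chosen only for illustration, and real transcription involves far more than base substitution.

```python
# Minimal sketch: complementarity, antiparallel strands, and transcription.
# All sequences are strings written 5'->3'; the "gene" below is made up.

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Base in the DNA template -> base placed opposite it in the growing RNA.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def antisense_of(sense: str) -> str:
    """Return the antisense (template) strand, written 5'->3'.

    Each base is replaced by its Watson-Crick partner, and the result is
    reversed because the two strands run in opposite directions.
    """
    return "".join(DNA_COMPLEMENT[base] for base in reversed(sense))


def transcribe(template: str) -> str:
    """Build mRNA 5'->3' by reading the template from its 3' end to its 5' end."""
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template))


sense = "ATGGCCTTAG"            # hypothetical sense strand
template = antisense_of(sense)
mrna = transcribe(template)
print(template)  # CTAAGGCCAT
print(mrna)      # AUGGCCUUAG
```

The mRNA comes out identical to the sense strand except that U replaces T, which is why the sense strand is said to carry the information that ends up in the mRNA.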
Genes are thus regulated: the regulatory mechanisms of the cell can turn them on or off, or modulate their activity up or down.

RNA Isolation

RNA must be isolated from cells in order to prepare material for hybridization to the array. A cell contains three major species of RNA: mRNA, transfer RNA (tRNA), and ribosomal RNA (rRNA); together they are referred to as total RNA. For gene expression experiments with microarrays, mRNA is the species of interest, yet it represents only about 1% of total RNA. Total RNA is isolated from cells by one of two main approaches: solution-phase or solid-phase extraction.

In solution-phase methods, cells are lysed in the presence of guanidinium isothiocyanate to inactivate RNases (naturally occurring enzymes that nonspecifically degrade RNA). The lysate is then extracted with an acidified phenol:chloroform:isoamyl alcohol solution. The RNA selectively partitions into the aqueous phase of this mixture, away from proteins and DNA. The aqueous phase is removed, and the RNA is precipitated out of solution with isopropyl alcohol at high salt concentration.

Solid-phase methods exploit the fact that RNA binds to a silica matrix under high-salt conditions and is released under low-salt conditions. Cells are again lysed in the presence of isothiocyanate. The high concentration of isothiocyanate used in this method not only inactivates RNases but also selectively precipitates proteins out of solution. The lysate is applied to a column containing a silica filter at the bottom and is pulled through by vacuum or centrifugation, removing proteins and cellular debris. Because DNA may also bind to the column, contaminating DNA is removed by treatment with DNase. The column is washed to remove any remaining contaminants, and the RNA is then eluted from the filter with water.
mRNA Structure

In eukaryotic cells, mRNA has a unique feature that allows researchers either to purify it away from the rest of the RNA or to direct enzymes to it specifically while avoiding the other RNA species. This feature is the polyA tail, a long stretch of adenine nucleotides added post-transcriptionally to the 3′ end of the mRNA. Such stretches of adenine nucleotides do not typically occur in genes or in other RNA species. The polyA tail hybridizes to an artificially synthesized oligonucleotide made up of a series of deoxythymidine nucleotides (oligo-dT). If the oligo-dT is coupled to a support matrix (e.g., beads), the mRNA can be pulled out of solution, purifying it away from the rest of the total RNA. Although some researchers prefer to include this step, it is generally not required for microarray analysis. Rather than purifying the mRNA, one can use the oligo-dT as a primer to create an enzymatically labeled complement of the mRNA.

Labeling

To render the RNA visible to a detection system, it must be labeled in some manner. Although some laboratories label the mRNA itself chemically (direct labeling), it is most common to work through a cDNA or cRNA intermediate that is labeled enzymatically. The simplest method creates labeled cDNA. In this technique, the RNA is reverse transcribed (DNA is made from an RNA template) by the enzyme reverse transcriptase (RT) (for sample protocols, see Ref. 4). Reverse transcriptase requires a short oligonucleotide primer that binds to the RNA, creating a short double-stranded region (an RNA:DNA hybrid). To ensure that RT copies only the mRNA, the polyA tail is exploited by using a primer made of a stretch of thymine residues (usually 20–25). The resulting DNA is the complement of the RNA and is therefore called complementary DNA (cDNA).
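Why oligo-dT directs reverse transcriptase to mRNA, and why the product is the complement of the mRNA, can be shown with a small string model. This is a sketch under assumptions: the sequences are invented, the polyA tail is modeled simply as a terminal run of A residues, the 20-nt primer length is one value from the 20–25 range given above, and the function names are hypothetical.

```python
# Minimal sketch of oligo-dT-primed reverse transcription, on strings only.
# Assumptions: the sequences are invented, the polyA tail is modeled as a
# terminal run of A's, and the 20-nt primer is one value from the 20-25 range.

RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def polya_tail_length(rna: str) -> int:
    """Number of consecutive A residues at the 3' end of an RNA (5'->3')."""
    return len(rna) - len(rna.rstrip("A"))


def oligo_dt_can_prime(rna: str, primer_len: int = 20) -> bool:
    """Oligo-dT can anneal only where a polyA stretch at least as long exists."""
    return polya_tail_length(rna) >= primer_len


def reverse_transcribe(mrna: str) -> str:
    """First-strand cDNA (5'->3'): the reverse complement of the mRNA.

    RT extends the primer annealed to the 3' polyA tail toward the mRNA's
    5' end, so the cDNA begins with the oligo-dT stretch.
    """
    return "".join(RNA_TO_CDNA[base] for base in reversed(mrna))


mrna = "AUGGCCUUAG" + "A" * 30          # hypothetical coding region + polyA tail
trna_like = "GCGGAUUUAGCUCAGUUGGG"      # hypothetical RNA with no polyA tail

print(oligo_dt_can_prime(mrna))        # True
print(oligo_dt_can_prime(trna_like))   # False
print(reverse_transcribe(mrna))        # 30 T's, then CTAAGGCCAT
```

Only the tailed transcript is primed, which is how oligo-dT steers RT toward mRNA while ignoring tRNA and rRNA; in a labeling reaction, the modified nucleotide would be incorporated during this extension step.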
The RT reaction requires free nucleotides (each of A, C, G, and T) to build the DNA. If one of these nucleotides is chemically modified with a detectable molecule (such as a fluorophore), it will be incorporated into the cDNA strand, and the cDNA can then be detected with a fluorescence reader. Alternatively, a reactive group (such as amino-allyl) can be used in place of the fluorescent molecule; after incorporation, the DNA is coupled to a reactive form of a fluorophore (usually a reactive ester). This two-step implementation has the advantage that the amino-allyl modification is a much smaller chemical group and is incorporated into DNA much more efficiently than a bulky fluorescent moiety.

Often the amount of RNA available is limiting and too small to detect by standard means; in such cases the material must be amplified. A typical microarray experiment requires 5–10 µg of total RNA to obtain useful data. When researchers work with very small samples, such as a needle biopsy or a fine-needle aspirate, it is often not possible to obtain this much total RNA. To overcome this limitation, various amplification strategies have been adopted. The most popular is based on the protocols of Dr. James Eberwine of the University of Pennsylvania (5). In this technique, RNA is converted into cDNA as described above, with two key differences: (1) no labeled nucleotide is incorporated, and (2) the oligo-dT primer has an additional short DNA sequence appended to it that encodes a T7 promoter. The T7 promoter is a bacteriophage-derived sequence from which T7 RNA polymerase initiates transcription. After the first cDNA strand is made, a second strand is synthesized, creating a double-stranded artificial gene with a T7 promoter at one end.
This artificial gene is then transcribed by adding T7 RNA polymerase, which makes numerous transcripts from each template. The resulting RNA is complementary (antisense) to the original mRNA, so it is called complementary RNA (cRNA). These transcripts can either be labeled directly or, in turn, converted into labeled cDNA using the standard methods described above.

The Affymetrix GeneChip system uses an amplification scheme based on T7 transcription as described above. During production of the cRNA, biotin-modified nucleotides are incorporated. After hybridization (see the section Hybridization), the arrays are stained with a streptavidin-conjugated fluorophore. Streptavidin is a protein that binds biotin specifically and very tightly, so the fluorophore becomes attached to the cRNA.

A clean-up step is required to remove any free, unincorporated detection molecules; this keeps background signal to a minimum. There are two main ways to perform this purification: one is based on standard nucleic acid purification systems, similar to the RNA isolation method described earlier, and the other is based on size exclusion. In the first method, a nucleic acid purification column is used: the cRNA or cDNA binds to the silica filter, while the small free nucleotides flow through. After a series of washes, the cRNA or cDNA is eluted from the column. The second method uses a membrane filter (usually incorporated into a column) with a defined pore size. The large cRNA and cDNA molecules are retained on the membrane, whereas the small free nucleotides pass through. The column is then inverted, and the cDNA or cRNA is eluted by flowing buffer through the membrane in the opposite direction. The purified labeled material is then ready for hybridization to the array.
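The strand bookkeeping in T7-based (Eberwine-style) amplification is easy to lose track of. The toy model below, on plain 5′-to-3′ strings, shows why the product is antisense cRNA. It rests on stated assumptions: the T7 promoter carried on the primer is not modeled, `copies_per_template` is an arbitrary number rather than a real yield, and the function names are hypothetical.

```python
# Toy model of one round of T7-based (Eberwine-style) amplification.
# Assumptions: sequences are 5'->3' strings; the T7 promoter carried on the
# primer is not modeled; copies_per_template is arbitrary, not a real yield.

DNA_COMPLEMENT = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}
RNA_COMPLEMENT = {"A": "U", "T": "A", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str, table: dict) -> str:
    """Antiparallel partner of seq, written 5'->3', using the given pairing table."""
    return "".join(table[base] for base in reversed(seq))


def eberwine_round(mrna: str, copies_per_template: int) -> list:
    first_strand = reverse_complement(mrna, DNA_COMPLEMENT)           # cDNA primed by oligo-dT(-T7)
    second_strand = reverse_complement(first_strand, DNA_COMPLEMENT)  # sense-strand DNA
    # T7 RNA polymerase copies the sense strand as its template, so every
    # transcript is antisense to the original mRNA: this is the cRNA.
    crna = reverse_complement(second_strand, RNA_COMPLEMENT)
    return [crna] * copies_per_template


mrna = "AUGGCCUUAG" + "A" * 5
products = eberwine_round(mrna, copies_per_template=100)
crna = products[0]
print(len(products), crna)                               # 100 UUUUUCUAAGGCCAU
print(reverse_complement(crna, RNA_COMPLEMENT) == mrna)  # True: cRNA pairs with the mRNA
```

Because each template is copied many times without the copies themselves being re-copied within a round, this scheme amplifies linearly rather than exponentially; the cRNA begins with the U-rich complement of the polyA tail, reflecting where priming occurred.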
Hybridization

Microarray technology relies on the natural ability of single-stranded nucleic acids to find and specifically bind complementary sequences. Purified labeled material is applied to the spotted microarray, and the pool of labeled material “self-assembles” onto the array, with each nucleic acid species (cDNA or cRNA) hybridizing to the specific spot that contains its complement. The specificity of this interaction must be controlled, because several similar and related sequences may be present on the array. This control is achieved by adjusting the hybridization stringency: highly stringent conditions favor exact matches, whereas low stringency allows some related but inexact matches to form. In a microarray experiment, stringency is typically controlled by two factors: the salt concentration of the hybridization solution and the temperature at which hybridization takes place.

High salt concentrations lower the stringency of hybridization. Both strands of nucleic acid involved in a hybridization event carry a net negative charge, so there is an electrostatic repulsion between them that must be overcome to bring the labeled nucleic acid close to the arrayed probe. Salt ions cluster around the nucleic acid strands, masking (shielding) these electrostatic forces. Higher salt concentrations have a greater masking effect and thus allow hybridization to occur more easily. If the salt concentration is high enough, the repulsion is completely masked, and even strands of DNA with low sequence homology may bind to one another.

Temperature is the other important factor. Every double-stranded nucleic acid has a characteristic temperature at which its two strands “melt,” or separate.
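Both stringency factors can be explored with a widely quoted empirical estimate of a duplex's melting temperature, which rises with salt concentration, GC content, and length. This is only an approximation under stated assumptions: the constants (81.5, 16.6, 0.41, 600) come from one common form of the formula and vary between sources; the Na⁺ contribution of SSC assumes trisodium citrate; and the probe sequences and function name are hypothetical.

```python
import math

# 1x SSC = 0.15 M NaCl + 0.015 M trisodium citrate, i.e. 0.15 + 3 * 0.015 M Na+.
NA_PER_1X_SSC = 0.15 + 3 * 0.015


def approx_tm(seq: str, na_molar: float) -> float:
    """Rough melting temperature (deg C) of a DNA duplex formed by `seq`.

    Empirical approximation: Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 600/N.
    Usually quoted for oligonucleotides of roughly 14-70 nt; constants differ
    between sources, and nearest-neighbor models are more accurate.
    """
    seq = seq.upper()
    n = len(seq)
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / n


probe = "ACGT" * 12 + "AC"     # hypothetical 50-mer, 50% GC
at_rich = "AATT" * 12 + "AC"   # hypothetical 50-mer, 2% GC

high_salt = 2 * NA_PER_1X_SSC    # 2x SSC: strong charge shielding, lower stringency
low_salt = 0.1 * NA_PER_1X_SSC   # 0.1x SSC: weak shielding, higher stringency
print(f"{approx_tm(probe, high_salt):.1f}")   # 83.2
print(f"{approx_tm(probe, low_salt):.1f}")    # 61.6
print(approx_tm(at_rich, high_salt) < approx_tm(probe, high_salt))  # True: lower GC, lower Tm
```

A higher Tm at high salt means that, at a fixed hybridization temperature, more duplexes, including partially mismatched ones, remain stable; this is the sense in which high salt lowers stringency.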
The temperature at which exactly 50% of a population of pure double-stranded molecules has separated is termed the melting temperature (Tm). The Tm of a nucleic acid duplex is determined partly by its length and partly by the percentage of G and C residues (the GC content). A G:C Watson–Crick base pair is held together by three hydrogen bonds, whereas the other base pair in a DNA hybrid, A:T, has only two; thus, the greater the GC content of a sequence, the more stable the hybrid. At very low temperatures, non-Watson–Crick (mismatched) base-pairing interactions can also occur, allowing noncomplementary sequences, or sequences that are less than 100% matched, to form hybrids. It is therefore necessary to find a temperature that prevents or melts nonspecific hybrids while still allowing the specific interactions to form. For a microarray this is a challenge, because thousands of different specific interactions must be accommodated at once. For oligonucleotide arrays, probe design takes this issue into account: the probes are designed to fall within a narrow window of melting temperatures. cDNA arrays are more difficult because the spotted sequences vary greatly in both GC content and length; in such cases, hybridization conditions that represent something of a “compromise” are often necessary.

Hybridization kinetics can generally be modeled as shown in Eq. 1 (6). The rate of formation of the hybridization product LS equals the rate at which the concentrations of the labeled target L and the free spotted DNA S decrease. This rate is equal to a rate constant k multiplied by the product of the concentrations of L and S. Thus the hybridization rate is a direct function of the concentrations of both the labeled target molecule and the DNA probe in the spot.
$$\frac{d[LS]}{dt} = -\frac{d[L]}{dt} = -\frac{d[S]}{dt} = k[L][S] \tag{1}$$

In the case of an oligonucleotide microarray, the number of spotted DNA molecules is often in great excess over the number of target molecules. The concentration of the spotted probe therefore remains nearly constant and can be absorbed into the rate constant. The hybridization equation can then be simplified as shown in Eq. 2 (6), where the rate of hybridization is driven by the concentration of the labeled target molecules alone:

$$\frac{d[LS]}{dt} = k'[L] \tag{2}$$

In two-color oligonucleotide arrays, the two labeled samples compete for hybridization to a probe that remains in excess, so hybridization simply reflects the concentrations of the two labeled targets L1 and L2 [Eq. 3 (6)]:

$$\frac{d[L_1S]}{d[L_2S]} = \frac{k'_1[L_1]}{k'_2[L_2]} \tag{3}$$

The situation is somewhat more complex when the probe molecules are not in excess of the target molecules, as is often the case with cDNA arrays. Here the concentration of free spotted probe changes significantly as hybridization proceeds, and the labeled targets L1 and L2 hybridize as described by Eqs. 4 and 5 (7):

$$\frac{d[L_1S]}{dt} = k_1[S][L_1] = k_1\left([S]_0 - [L_1S] - [L_2S]\right)\left([L_1]_0 - [L_1S]\right) \tag{4}$$

$$\frac{d[L_2S]}{dt} = k_2[S][L_2] = k_2\left([S]_0 - [L_1S] - [L_2S]\right)\left([L_2]_0 - [L_2S]\right) \tag{5}$$

In this case, the rate of hybridization depends on how far the free probe concentration [S] has fallen from its initial value [S]_0 as probe molecules are bound by either L1 or L2. Looking at differential hybridization between the two targets, the kinetics can be represented as in Eq. 6 (7):

$$\frac{d[L_1S]}{d[L_2S]} = \frac{k_1\left([L_1]_0 - [L_1S]\right)}{k_2\left([L_2]_0 - [L_2S]\right)} \tag{6}$$

If we assume that the two fluorescent molecules used in a two-color experiment behave similarly, and that the two labeled targets hybridize at the same rate, then k1 = k2.
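Equations 4 and 5 can also be integrated numerically to see what they predict for the final ratio of hybrids. The sketch below uses forward Euler with arbitrary concentrations and rate constants (all values are illustrative assumptions, not measured parameters) and, like Eqs. 4 and 5, treats binding as irreversible; the function name is hypothetical.

```python
def simulate_two_color(l1_0, l2_0, s_0, k1, k2, dt=1e-3, steps=50_000):
    """Forward-Euler integration of Eqs. 4 and 5 (binding only, no dissociation).

    Returns the final concentrations of the two hybrids, [L1S] and [L2S].
    """
    l1s = l2s = 0.0
    for _ in range(steps):
        s_free = s_0 - l1s - l2s                 # [S] = [S]0 - [L1S] - [L2S]
        rate1 = k1 * s_free * (l1_0 - l1s)       # Eq. 4
        rate2 = k2 * s_free * (l2_0 - l2s)       # Eq. 5
        l1s += rate1 * dt
        l2s += rate2 * dt
    return l1s, l2s


# Arbitrary units: probe-limited spot, sample 1 three times as abundant as sample 2.
l1s, l2s = simulate_two_color(l1_0=3.0, l2_0=1.0, s_0=1.0, k1=1.0, k2=1.0)
print(round(l1s / l2s, 6))     # 3.0 -> matches [L1]0 / [L2]0

l1s_b, l2s_b = simulate_two_color(l1_0=3.0, l2_0=1.0, s_0=1.0, k1=2.0, k2=1.0)
print(l1s_b / l2s_b > 3.0)     # True -> unequal rate constants bias the ratio
```

With k1 = k2, Eq. 6 integrates to ([L1]0 − [L1S])/[L1]0 = ([L2]0 − [L2S])/[L2]0, so in this irreversible model the hybrid ratio equals the input ratio at every time point, even though the spot is saturated; with unequal rate constants, the faster-hybridizing target is overrepresented.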
It has been demonstrated that under ideal conditions, when the hybridization reaction is allowed to proceed to equilibrium, the ratio of the concentrations of the two hybrids L1S and L2S equals the ratio of the initial concentrations of the two targets L1 and L2 [Eq. 7 (7)]. This point is important because it underpins the use of microarrays: it justifies the assumption that the ratios read from the scans during data analysis reflect an actual biological condition.

$$\frac{[L_1S]}{[L_2S]} = \frac{[L_1]_0}{[L_2]_0} \tag{7}$$

The goal of microarray hybridization is to produce a signal from specific hybridization that is very strong compared with any background signal arising from nonspecific adsorption of labeled material to the substrate or nonspecific binding to spotted elements. To reach this goal, nonspecific blocking reagents are commonly added to the hybridization solution. Frequently these are nucleic acids from sources known not to contain sequences that would interfere with specific hybridization. For example, when hybridizing a human sample to an array, one might use yeast tRNA and salmon sperm DNA as competitors to occupy any regions of the substrate or probes that have a generic nucleic acid binding capacity. These nucleic acids are unlabeled and therefore contribute no signal when the array is scanned.

Washing

Unlike traditional Northern blots, in a microarray assay most of the stringency is imposed at the hybridization step. The washing step is still critical, but it serves mainly to remove unbound material and so reduce background signal, rather than to control the specificity of the signal. Wash buffers generally contain two components: a salt solution and a detergent. The salt solution, frequently saline sodium citrate (SSC), is set to a concentration that keeps the hybridized molecules stably bound.
This concentration most frequently falls in the 1× to 2× range, with some labs using concentrations as low as 0.1× (1× SSC contains 0.15 M NaCl and 0.015 M sodium citrate). The detergent in the wash buffer acts as a surfactant, helping to isolate and remove unbound fluorescent molecules that would otherwise stick to the surface of the slide. Typically an anionic detergent such as sodium dodecyl sulfate (SDS) is used for this purpose. The wash temperature varies with the stringency of the wash solution being used; as with hybridization, the combination of temperature and salt concentration determines the overall stringency of the washes. After the microarrays are washed, it is generally necessary to rinse them. The rinse solution is typically similar to the wash solution but without detergent. If detergent remains on the slide after drying, it may fluoresce, particularly if labeled material has been trapped in detergent micelles.

Scanning

An imaging device is needed to detect the fluorescent labels on the hybridized microarray. In general, the imaging device must contain an excitation light source, an emission filter, and a light-gathering device. During scanning, the labeled material, whether fluorescent or some other detectable molecule, is imaged, and the resulting data are converted into a digital image. The optimal scanning resolution depends on the size of the features and on their spacing. A general rule of thumb is that each pixel should represent about one-tenth of the spot diameter. For spotted arrays, for example, features tend to be on the order of 100 µm in diameter, so a 10 µm resolution is frequently used.
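The one-tenth rule of thumb can be turned into a quick calculation of pixel size, of how many pixels are averaged per spot, and of raw image size. This is a back-of-the-envelope sketch: the function names are hypothetical, spots are assumed to be perfect circles, and the 20 mm × 60 mm scan area is an assumed example rather than a value from the text.

```python
import math


def pixel_size_um(spot_diameter_um: float, pixels_per_diameter: int = 10) -> float:
    """Rule of thumb: one pixel should span about one-tenth of the spot diameter."""
    return spot_diameter_um / pixels_per_diameter


def pixels_inside_spot(spot_diameter_um: float, pixel_um: float) -> float:
    """Approximate number of pixels that fall within a circular spot."""
    radius_px = spot_diameter_um / (2 * pixel_um)
    return math.pi * radius_px ** 2


def image_bytes(width_mm: float, height_mm: float, pixel_um: float, bytes_per_pixel: int = 2) -> int:
    """Uncompressed size of one scanned channel (2 bytes per pixel for a 16-bit image)."""
    width_px = round(width_mm * 1000 / pixel_um)
    height_px = round(height_mm * 1000 / pixel_um)
    return width_px * height_px * bytes_per_pixel


px = pixel_size_um(100)                      # spotted array, 100 um spots
print(px)                                    # 10.0
print(round(pixels_inside_spot(100, px)))    # 79 pixels averaged per spot
print(image_bytes(20, 60, px))               # 24000000 (hypothetical 20 x 60 mm scan area)
```

Roughly 80 pixels per spot is enough to average out pixel noise and to let segmentation distinguish spot edges, which is one way to read why finer features demand proportionally finer pixels.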
Affymetrix technology, however, can generate features that are 11 µm square; in this case a much higher resolution, down to 1 µm, is required. Most commonly, the image generated is a 16-bit grayscale TIFF (Tagged Image File Format) image (Fig. 2). A 16-bit depth provides 65,536 gray levels, a dynamic range of nearly five orders of magnitude. The TIFF format is important because it is a universally accepted, lossless format: even when compressed, it retains all image information. The images can then be imported into appropriate image quantification software.

Image Quantification

After scanning, data must be extracted from the images. Image quantification generally starts with segmentation, the process by which pixels that represent signal are separated from those that represent background. During segmentation, the discrete areas of the image that correspond to spotted DNA are identified and digitally isolated from the rest of the image. The intensities of all the pixels in an individual spot are averaged to determine the overall spot intensity. This spot intensity is proportional to the amount of material hybridized to that region, with higher intensities resulting from larger numbers of hybridized molecules. Each spot, in each channel (for two-color microarrays), is quantified, and the resulting data are tabulated.

Other data may also be extracted at this stage. It is common to obtain intensity data for the area outside the individual spots as well. This value represents the background of the image and indicates how much signal would have been obtained regardless of any specific hybridization event. It is common, though not universal, to subtract the background values from the spot signal intensities. There are several ways in which segmentation can be carried out.
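The averaging and background subtraction described above can be sketched with the simplest segmentation, a fixed circle per spot. Taking background from a ring around the spot is one common choice but an assumption here, since the text only specifies the area outside the spots; the image is synthetic and the function name is hypothetical.

```python
import math


def quantify_spot(image, cx, cy, radius, bg_inner, bg_outer):
    """Fixed-circle quantification of one spot.

    image: 2-D list of pixel intensities (list of rows).
    Pixels within `radius` of (cx, cy) are treated as spot signal; pixels in the
    ring bg_inner <= r < bg_outer are treated as local background.
    Returns (mean spot intensity, mean background, background-subtracted signal).
    """
    spot, background = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            r = math.hypot(x - cx, y - cy)
            if r <= radius:
                spot.append(value)
            elif bg_inner <= r < bg_outer:
                background.append(value)
    mean_spot = sum(spot) / len(spot)
    mean_bg = sum(background) / len(background)
    return mean_spot, mean_bg, mean_spot - mean_bg


# Synthetic 21 x 21 image: a bright disk of radius 4 on a uniform background of 100.
size, center = 21, 10
image = [[1100 if math.hypot(x - center, y - center) <= 4 else 100 for x in range(size)]
         for y in range(size)]
print(quantify_spot(image, center, center, radius=4, bg_inner=6, bg_outer=9))
# (1100.0, 100.0, 1000.0)
```

Leaving a gap between the spot radius and the background ring keeps slightly oversized or smeared spots from inflating the background estimate.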
In the most basic approach, a fixed shape (usually a circle) is placed over each spot. All of the pixels lying within the circle are used to determine the average intensity, and pixels lying outside the circles are deemed to be background. More advanced segmentation algorithms account for the fact that most spotted features on a microarray are not perfectly uniform. Spots may deviate from a true circular shape, or may have regions within the circle where no DNA was attached (producing a spot reminiscent of a doughnut). In addition, spot diameters commonly vary somewhat from spot to spot. The more advanced methods use various algorithms and statistics to determine which pixels actually represent signal and which are more representative of background. The image quantification software then processes the entire image and produces a table of results giving the signal and the background for each feature on the array. These packages may also export other data useful for quality control, such as standard deviations, coefficients of variation, and spot circularity or uniformity. This data table can then be processed as part of the data analysis.

Figure 2. Arrays imaged on a microarray scanner are presented as 16-bit grayscale TIFF images. The picture shown represents a small subsection of a larger array. Each spot is 100 µm in diameter, and the spot-to-spot spacing is 200 µm. The image was scanned at 10 µm resolution.

Data Analysis

An exhaustive description of DNA microarray data analysis is far beyond the scope of this article (for an excellent review, see Ref. 8). The exact process followed depends greatly on the experimental design and the question being addressed.
There are, however, some basic principles common to most microarray data analyses: statistical analysis of the data, supervised and/or nonsupervised data mining, data visualization, and validation are all key components.

Statistical analysis of microarray data comes into play in two main areas. The first is determining which spots are reliable and provide sufficient data. Spots that show a high degree of variance across replicates, for example, are unlikely to provide reliable data. These hypervariable genes or signals need to be filtered out so that they do not skew the results of data mining. The second is supervised data analysis.

There are two major categories of data mining: supervised and nonsupervised. Supervised data mining uses algorithms in which the user imposes constraints on how the data are grouped. For example, in an experiment testing a cohort of patients in which one group is healthy and the other is afflicted with a particular disease, one would indicate to the algorithm which arrays came from healthy patients and which from patients with the disease. The algorithm then searches the data for genes that are markers for the disease: each gene is tested to see whether its expression levels differ significantly between the two patient groups. The goal is to find a set of genes that can act as diagnostic markers for the disease.

In nonsupervised clustering, the algorithm is given no indication of how the individual samples are related. In true nonsupervised clustering, the algorithm is not even told how many groups exist. The data are analyzed, and the samples are grouped according to similarity metrics. The classical methods of nonsupervised clustering include hierarchical clustering and principal components analysis (PCA).
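Hierarchical clustering, one of the classical nonsupervised methods named above, can be sketched in its simplest agglomerative form: start with each sample in its own cluster and repeatedly merge the two closest clusters. Using average linkage with a 1 − Pearson correlation distance is a common choice for expression data but an assumption here; the expression profiles are invented, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b)


def average_linkage(profiles, n_clusters=1):
    """Agglomerative hierarchical clustering, average linkage, distance = 1 - r.

    Returns (clusters, merges): the clusters left when n_clusters remain, and
    the merge history (the dendrogram) built up to that point.
    """
    names = list(profiles)
    dist = {frozenset(pair): 1 - pearson(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(names, 2)}
    clusters = [[name] for name in names]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(p, q) for p in clusters[i] for q in clusters[j]]
            d = sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], round(d, 3)))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]   # j > i, so index i is unaffected
    return clusters, merges


profiles = {  # hypothetical log-ratios for four genes in four samples
    "sample1": [2.0, 1.8, -1.0, -1.2],
    "sample2": [1.9, 2.1, -0.8, -1.1],
    "sample3": [-1.5, -1.7, 1.2, 1.0],
    "sample4": [-1.4, -1.9, 0.9, 1.3],
}
clusters, merges = average_linkage(profiles, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
# [['sample1', 'sample2'], ['sample3', 'sample4']]
```

Stopping at two clusters is only for display; running to `n_clusters=1` builds the full merge history, which is the tree used to reorder rows and columns in the familiar clustered heat maps. Note that no sample labels were supplied, so any agreement between these clusters and known groups is discovered rather than imposed.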
These algorithms generally display the data as some kind of visual pattern, such as the canonical “plaid” expression patterns produced by hierarchical clustering. The researcher then overlays the known grouping information onto these patterns to see whether the groups separate naturally from one another. In other cases, the method may be used to determine how many groups there truly are, because the researcher may not know this a priori. The groups can then be examined further for differences in treatment response, survival, or any other characteristic of interest. After this analysis, one generally looks for clusters of genes in the patterns that distinguish the groups, and again uses these genes as markers.

Regardless of the method used, it is extremely important to validate the data. Cross-validation strategies vary, but in their most basic form one obtains a cohort of patients to profile. A subset of this cohort is used to search for potential markers. Once markers have been identified, the remaining patients are tested, and only the identified markers are used to try to group them. If the markers stratify these patients into their appropriate groups, the markers are considered viable and may provide useful diagnostic ability. On occasion, however, the validation set is not grouped correctly. In such cases, the markers are useful only for the narrow set of patients used in the initial tests, and more work is required to find a viable set of markers.

FABRICATION OF MICROARRAYS

There are two main methods for manufacturing microarrays, which differ in how the probe material on the array is prepared. In the first, the DNA is synthesized in situ, directly on the array, using either standard or modified phosphoramidite chemistry.
(Phosphoramidites are reactive forms of the nucleotides that make up DNA. Phosphoramidite chemistry is a well-defined process by which moderate-length stretches of DNA can be synthesized with any specific sequence.) This method is used by Affymetrix and Agilent, the two largest commercial suppliers of microarrays, although the two companies take different approaches to in situ synthesis. Other groups use ex situ synthesis, in which the DNA is prepared beforehand, either as PCR products (cDNA) or as oligonucleotides made by standard phosphoramidite synthesis, and is then spotted onto the array substrate using contact or noncontact printing. Amersham (now GE Healthcare) and Applied Biosystems use this approach to make microarrays, as do almost all of the “homebrew” laboratories that make microarrays in house.

Fabrication of DNA Arrays In Situ

There are two main approaches to generating microarrays by in situ DNA synthesis: photolithography and ink-jet printing. Affymetrix, the industry leader, uses a proprietary photolithography process in which masks cover some areas of the array, protecting them, while leaving others exposed to the DNA synthesis reaction (1). This is a multistep process requiring several masks to be made for each array. Each synthesis reaction is performed sequentially, and for each nucleotide position there are four possible masks (one for each of A, G, C, and T). Thus an array composed of 25-mer oligonucleotides could require up to 100 masks (typically about 70 are required, depending on the sequences used). Affymetrix uses a modified phosphoramidite chemistry to synthesize the oligonucleotide chains: whereas standard phosphoramidite chemistry uses acid-labile protecting groups, the Affymetrix technology uses protecting groups that can be removed by ultraviolet (UV) light.
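One way to see why an array of 25-mers can need anywhere from 25 to 100 masks is to assume that the four bases are offered in a fixed repeating order (A, C, G, T), one base and one mask per synthesis cycle, with each probe gaining its next base only when that base comes up. This is an assumption for illustration, not a description of Affymetrix's actual synthesis schedule; the function names and the random probe set are hypothetical.

```python
import random


def synthesis_cycles(probe: str, order: str = "ACGT") -> int:
    """Cycles (one mask each) needed to build `probe` when bases are offered
    in a fixed, repeating order; the probe's next base is added only when
    that base comes up."""
    cycle = 0
    for base in probe:
        while order[cycle % len(order)] != base:
            cycle += 1          # this cycle offers a base the probe does not need yet
        cycle += 1              # this cycle adds the needed base
    return cycle


def masks_for_array(probes, order: str = "ACGT") -> int:
    """All features share one cycle schedule, so the slowest probe sets the mask count."""
    return max(synthesis_cycles(p, order) for p in probes)


random.seed(0)
probes = ["".join(random.choice("ACGT") for _ in range(25)) for _ in range(1000)]  # hypothetical

print(synthesis_cycles("T" * 25))          # 100: worst case for a 25-mer
print(synthesis_cycles("ACGT" * 6 + "A"))  # 25: best case
print(masks_for_array(probes))             # somewhere between 25 and 100
```

Under this model, the mask count depends on the particular sequences on the array, which is consistent with the text's point that fewer than the worst-case 100 masks are usually needed.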
The Affymetrix technology allows extremely high-density arrays of hundreds of thousands of features to be prepared on very small substrates of <1 cm². Other groups have developed technologies that avoid the need to make multiple masks for each array design. The pioneer in this area was Nimblegen, which uses digital light processor (DLP) micromirrors to create the masks (9). Each DLP unit (typically used in AV projectors and large-screen televisions) comprises thousands of tiny (10 µm²) micromirrors. The micromirrors can be addressed individually, and the angle of each mirror can be changed to control where light is directed. In the “open state”, the micromirror directs light onto the surface of the microarray, allowing DNA synthesis to occur. In the “closed state”, the micromirror reflects light away from the surface, preventing DNA synthesis. Because a computer controls every mirror, each DLP unit can produce an enormous number of on/off combinations. A single unit can therefore create any pattern desired on the array. Nimblegen uses the same chemistry as Affymetrix, relying on light-activated deprotection of the phosphoramidites. A somewhat newer entry into this area is Xeotron (now part of Invitrogen). Xeotron also uses DLP micromirrors to generate the masks, but it has also incorporated small microfluidic channels on its chips, with each feature placed in a microscopic well. Rather than using the modified phosphoramidite chemistry of Affymetrix and Nimblegen, Xeotron uses standard chemistry but employs a caged acid that can be freed by light (10,11). As a result, the acid that controls deprotection of the nascent oligonucleotide can be directed to specific locations by light.
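With a DLP system, the physical mask is replaced by a pattern of mirror states that software recomputes for every synthesis step. This is a minimal sketch under two assumptions. First, probes sit on a small rectangular grid with one mirror per feature; real devices devote many mirrors to each feature, which is ignored here. Second, the hypothetical `mirror_pattern` helper opens a mirror (True) exactly when its feature's next base matches the nucleotide being coupled.

```python
from typing import List

Grid = List[List[str]]  # probe sequence at each (row, column) feature


def mirror_pattern(grid: Grid, progress: List[List[int]], base: str) -> List[List[bool]]:
    """Open (True) the mirrors over features whose next base is `base`.

    progress[r][c] is the number of bases already synthesized at that feature.
    """
    pattern = []
    for r, row in enumerate(grid):
        pattern_row = []
        for c, seq in enumerate(row):
            done = progress[r][c]
            # Light reaches the feature only if it still needs `base` next.
            pattern_row.append(done < len(seq) and seq[done] == base)
        pattern.append(pattern_row)
    return pattern


def render(pattern: List[List[bool]]) -> str:
    """Text picture of a pattern: '#' = open (light on), '.' = closed."""
    return "\n".join("".join("#" if on else "." for on in row) for row in pattern)


if __name__ == "__main__":
    grid = [["AC", "CG"], ["GA", "AT"]]
    progress = [[0, 0], [0, 0]]
    print(render(mirror_pattern(grid, progress, "A")))
```

For the 2 × 2 example, the A step opens the mirrors over the top-left and bottom-right features only. Changing the design means changing the input data rather than fabricating new masks.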
The Nimblegen and Xeotron technologies are highly amenable to custom array generation, whereas the Affymetrix technology is particularly well suited to mass production of a standard array. Each of these approaches has found customers in the marketplace. A third approach to in situ synthesis of oligonucleotides involves ink-jet spotting. Agilent uses this technology (developed by Rosetta Inpharmatics). Each of the reactive phosphoramidites (A, G, C, and T) is loaded into a separate “ink cartridge”, which controls which nucleotide is added to each spot during the synthesis stage (12,13). This methodology eliminates the need for masks but requires very high-precision robotics. The print head must return to the same spot many times, with micron accuracy, during the course of synthesis. This technology draws on the strengths of the other approaches: array designs are relatively easy to customize, yet mass production of arrays is possible using a large robotic system.

Fabrication of DNA Arrays Ex Situ

Some commercial vendors and nearly all of the “homebrew” microarray centers spot DNA that was prepared ex situ. For cDNA arrays, the spotted material is prepared by polymerase chain reaction (PCR), whereas oligonucleotide arrays use oligos made by high-throughput oligo synthesis. The DNA is purified and placed in a spotting buffer that is compatible with the substrates being used. The DNA is typically aliquoted into multiwell plates (96, 384, or 1536 wells/plate) to facilitate transfer by the arraying robot. The spotting buffer has several functions. First, it stabilizes the DNA to prevent its degradation.
Second, the buffer must provide an appropriate surface tension so that the spots placed on the substrate are of a controllable size and uniform in shape. Equally important, the buffer must provide conditions that are compatible with the attachment chemistry to be used. The DNA may be coupled to the slide either through rather simple electrostatic interactions or by a specific coupling reaction. Electrostatic interactions are mediated by a uniformly positively charged substrate that attracts the negatively charged DNA. The substrates are often silylated to provide reactive amine groups on the surface. Alternatively, the slides may be coated with a chemical such as poly-L-lysine, which simply adsorbs onto the substrate and provides a net positive charge. This type of interaction is mass based: only a maximum mass of DNA can bind to any one spot on the substrate. Longer DNAs are therefore represented by fewer copies than shorter DNAs. For example, at a fixed mass per spot, a 1000-bp product is represented by roughly half as many copies as a 500-bp product.

More specific interactions overcome this limitation, using modifiers on the DNA that react with certain groups on the slide. The two most common such modalities involve aldehyde or epoxide chemistry. In this method, the DNA is modified with a primary amine group. The substrate carries reactive aldehydes or epoxides that react specifically with the primary amine to form a covalent bond (Fig. 3). This type of interaction is molarity based. With the exception of steric effects, the number of DNA molecules that bind per spot is therefore roughly the same regardless of length.

EQUIPMENT

The manufacture of microarrays and their subsequent use require some very specialized equipment. Generally, a facility that produces microarrays will require advanced robotics for fabrication, and a laboratory that uses arrays will require scanning devices to read them.
Because of the relatively high cost of this equipment, many people rely on core facilities for some or all of the process.

Figure 3. Covalent attachment of amino-modified DNAs to aldehyde (a) or epoxide (b) slides is possible. An amino-modified DNA reacts with an aldehyde surface by a Schiff base reaction. The resultant Schiff base must be reduced with an agent such as sodium borohydride (NaBH4) to prevent reversal of the reaction.

Arraying Robots

Ex situ prepared DNAs are spotted onto the microarray substrates by robotic arrayers (Fig. 4). Robotics are required to position the printing devices accurately over the slides to create the arrays. Most systems use pins and direct contact to deposit the DNA. In these systems, a print head with several spotting pins in a defined arrangement dips into the multiwell plates and picks up the material to be spotted. The typical operation sequence of an arraying robot may include:

1. Dipping the printing applicators (pins) into a source plate to pick up DNA samples. Each applicator picks up a separate DNA sample from an individual well in the plate. Typically 32–48 pins are used at one time.
2. Moving to a blot station to preprint from the pins. This step removes excess solution from the pins. The spots printed onto the arrays are then uniform in size and do not run into one another and cause contamination.
3. Moving to the slide platform. The print head moves over the slide platform and takes position over the first slide.
4. Printing onto the arrays. The print head moves down, bringing the pins into contact with the slide.
The DNA solution held in the pins by capillary action is spotted onto the slide. The print head then moves to the next slide position and spots onto that slide. This process is repeated until all of the slides on the platform have been printed.
5. Washing the pins. The print head moves the pins to a wash station. Many configurations are possible, but the basic principle is to remove the excess liquid from the pins with water or another solution. The pins are then dried under vacuum or a stream of air. This process may be repeated several times to ensure that there is no carryover.
6. Loading the next sample. The print head returns to the source plate to pick up the next set of samples.

Figure 4. A microarraying robot. The robotic arrayer prints DNA onto glass slides with very high precision. Robots such as this have extremely high accuracy, on the order of 10 µm or less.

Typical high-throughput systems, such as those offered by Bio-Rad, BioRobotics, GeneMachines, Genetix, and Telechem International, use 48 pins at one time. The entire operation sequence described above may take 3–4 min to complete for 100 arrays. Arrays often contain 20,000–40,000 spots. A typical print run may therefore require 600 or more cycles through the operation sequence and can take 30 h or more to complete. For example, 30,000 spots printed 48 at a time require 625 cycles, and 600 cycles at 3 min each take 30 h.

Hybridization and Fluidics Stations

Certain array platforms require a specific hybridization and/or fluidics station. For spotted arrays (homebrew arrays in particular), such a station is usually optional and often a matter of personal preference. In these cases, a hybridization station may be used to improve mixing of the hybridization solution over the array. The rate of diffusion of a labeled nucleic acid in solution is actually very low, so some researchers prefer an automated station that mixes the solution. Affymetrix GeneChip technology requires a specific hybridization station and fluidics station. The hybridization station is simply a rotating incubator in which the chips are placed. A bubble introduced into the sealed array cartridge moves around during rotation, creating a mixing effect. The fluidics station is a more advanced system that introduces the various labeling components and wash solutions required. It lets the user keep the cartridge sealed rather than pipetting solutions in and out.

Scanners

Some microarray imagers, such as the PerkinElmer ScanArray and GeneFocus DNAScope, are confocal scanners, but confocal optics are not a strict requirement. Confocal imaging eliminates extraneous signals but reduces the light-gathering ability of the device. There are >10,000 commercial microarray scanners in the field capable of reading standard glass microarrays. The leading scanner makers include Agilent, Axon, Bio-Rad, GeneFocus, and PerkinElmer, among other vendors. A laser scanner uses one or more lasers with wavelengths appropriate to the fluorophores being used. The most commonly used fluorophores for microarrays are cyanine 3 and cyanine 5 (or fluors with equivalent spectra). Cyanine 3 has an absorbance maximum of 550 nm and an emission maximum of 570 nm. Two main types of laser are used in scanners to excite this fluorophore: green He–Ne (helium–neon) gas lasers and frequency-doubled, diode-pumped Nd:YAG (neodymium-doped yttrium aluminum garnet) solid-state lasers. Cyanine 5 has an absorbance maximum of 650 nm and an emission maximum of 670 nm. Two main types of laser are used to excite this fluorophore: standard (red) He–Ne gas lasers and red diode lasers. Table 1 shows some of the characteristics of these two dyes, along with two other popular dyes, Alexa 555 and Alexa 647, whose spectra are very similar to those of Cy3 and Cy5, respectively (Fig. 5).
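Choosing excitation sources amounts to matching each dye's absorbance maximum to an available laser line. The sketch below uses the Cy3 and Cy5 absorbance maxima quoted above. It assumes typical nominal lines for the laser types named: 532 nm for frequency-doubled Nd:YAG, 543.5 nm for green He–Ne, 632.8 nm for red He–Ne, and 635 nm for a red diode. Actual scanner lasers vary by model. The rule "pick the closest line" is only an illustrative simplification, since both green lines excite Cy3 adequately. The code also checks that a two-color scan uses two different lasers. It does not model emission filters or cross-talk, and the function names are hypothetical.

```python
from typing import Dict

# Absorbance maxima (nm) quoted in the text.
DYE_ABSORBANCE_NM: Dict[str, float] = {"Cy3": 550.0, "Cy5": 650.0}

# Assumed nominal lines (nm) for the laser types named in the text.
LASER_LINES_NM: Dict[str, float] = {
    "frequency-doubled Nd:YAG": 532.0,
    "green He-Ne": 543.5,
    "red He-Ne": 632.8,
    "red diode": 635.0,
}


def closest_laser(absorbance_nm: float, lasers: Dict[str, float]) -> str:
    """Return the laser whose line lies closest to the dye's absorbance maximum."""
    return min(lasers, key=lambda name: abs(lasers[name] - absorbance_nm))


def assign_lasers(dyes: Dict[str, float], lasers: Dict[str, float]) -> Dict[str, str]:
    """Map each dye to its closest laser; a two-color scan needs distinct lasers."""
    assignment = {dye: closest_laser(peak, lasers) for dye, peak in dyes.items()}
    if len(set(assignment.values())) != len(assignment):
        raise ValueError("two dyes would share one excitation laser")
    return assignment


if __name__ == "__main__":
    for dye, laser in assign_lasers(DYE_ABSORBANCE_NM, LASER_LINES_NM).items():
        print(f"{dye}: {laser} ({LASER_LINES_NM[laser]} nm)")
```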
Cyanine 3 and 5 have some important features that make them particularly suitable for microarray analysis. Their spectra have little overlap, so the two signals can generally be separated with little or no cross-talk. In addition, these fluors have an unusual property: they are brighter when dry than when wet. Most fluorophores behave the opposite way, which is impractical for microarrays because scanners generally cannot handle wet preparations.

Table 1. Key Characteristics of the Most Commonly Used Fluorophores for Microarray Analysis

Fluorophore     Excitation max (nm)   Emission max (nm)   Molar extinction coefficient (M⁻¹ cm⁻¹)   Molecular weight (Da)
Cy3             550                   570                 150,000                                   766
Cy5             649                   670                 250,000                                   792
Alexa555        555                   565                 150,000                                   1,250
Alexa647        650                   668                 239,000                                   1,250
Phycoerythrin   566                   575                 19,600,000                                240,000

[Figure 5 panels: excitation/absorption (solid lines) and emission/fluorescence (dashed lines) versus wavelength (nm) for Alexa Fluor 555/Cy3 and for Alexa Fluor 647/Cy5.]

Figure 5. Representative spectra of the fluors commonly used in spotted microarray experiments. Alexa Fluor 555 and Cy3 are excited by green wavelengths of light, whereas Alexa Fluor 647 and Cy5 are excited by red wavelengths of light. One green-excited and one red-excited fluor may be used at the same time, because there is little overlap in their excitation spectra.

The other major class of microarray imager is a CCD (charge-coupled device)-based system. In general, these imagers use a white light source to excite the fluorophores. The CCD captures the emitted fluorescent light and converts it into a digital image. Rather than scanning the slide, a CCD-based imager tiles together several sections of the slide to create an image of the entire surface. This tiling can create a stitching effect in which the “seams” between images are not completely smooth.
This problem can be overcome with advanced lighting systems and software. Affymetrix arrays use a different labeling chemistry for detection, relying on the naturally occurring fluorescent protein phycoerythrin. Phycoerythrin is a pigment protein from light-harvesting algae that absorbs strongly at 566 nm and has an emission peak at 575 nm. It is a very bright fluorophore, with a molar extinction coefficient roughly 80–130 times those of the standard Cy3 and Cy5 molecules (Table 1). Its limitation is that it is also roughly 200–300 times larger than these dyes, so far fewer molecules can be incorporated per sequence. The protein is therefore applied to the DNA only after hybridization, because it would otherwise create steric interference.

MICROARRAYS AS MEDICAL DEVICES

To date, microarrays have been used mostly in basic research and have yet to make a strong impact on the diagnostic market. [During the preparation of this text, Roche received FDA clearance for the first-ever array-based diagnostic chip. The AmpliChip CYP450, based on the Affymetrix platform, was approved in January 2005 (see http://www.roche.com/med-cor-2005-01-12).] Microarrays have been used to study many diseases, including various cancers, cardiovascular disease, inflammatory disease, psychiatric disorders, and infectious disease. This basic research will ultimately lead to the identification of potential drug targets and diagnostic markers. The potential of microarrays extends beyond target discovery, however, and microarrays will eventually change the way medical care is practiced.

Target Discovery

The use of microarrays in basic research laboratories has often focused on target discovery. In these applications, microarrays are used to profile a particular disease by comparing diseased tissues with healthy tissues, taken either from the same patient or from a separate test population.
In such experiments, the goal is to find genes that are differentially regulated (either up or down) in the disease state compared with healthy tissue. Such genes are thought to be involved in the disease state or in the cellular response to the disease. These genes are therefore potential diagnostic markers and may also represent drug targets.

Drug/Lead Discovery

Microarrays can also be used once a target has been identified. For example, they can be used to screen potential therapeutic compounds to determine which candidates reverse the pattern of gene expression that is indicative of disease. Microarrays have been even more effective in examining the toxicity of lead compounds. Toxicity and off-target effects are among the leading causes of failure of pharmaceutical compounds. Microarrays have proven useful in screening for up-regulation of toxicity-related genes. In addition, it is possible to determine whether the compound creates other effects that, while not toxic per se, could cause undesirable side effects through nonspecific interactions. Toxicity is often tested in model organisms such as rats or dogs. Several toxicity-specific arrays have been developed for profiling genes in these model systems rather than in human cells.

Diagnostics and Prognostics

One of the more promising areas in which microarrays could have a direct impact as a medical device is diagnostics and prognostics. As mentioned under Target Discovery, basic research has often sought a panel of genes that can be used as a molecular fingerprint of a disease. In numerous publications, researchers have attempted to correlate molecular profiles with patient outcome, disease state, tumor type, or any of several other factors. DNA microarrays are particularly well suited to this type of analysis.
Many complex diseases are multifactorial. Rather than a single prognostic or diagnostic marker being present, it may be necessary to look at several genes at once. Microarrays allow identification of a panel of genes that, considered together, may provide diagnostic or prognostic power. Although it has not yet become common practice, there are examples of microarrays being used to prescreen patients on the basis of a molecular profile (14). Microarrays are also being used to study infectious disease. A patient often presents with a set of symptoms that could indicate any of several infectious agents. A microarray can be prepared that identifies the agent and also subtypes the bacterium or virus on the basis of pathogenicity. This application may prove very useful in identifying not only the infectious agent but also the best course of treatment.

Pharmacogenomics and Theranostics

A concept that is gaining popularity is pharmacogenomics or theranostics (15). Both terms refer to the idea of tailoring a patient’s treatment or therapy on the basis of his or her genetic makeup. Many pharmaceuticals on the market have known, potentially serious side effects in a subset of patients. In addition, there are typically at least some patients who do not respond to a particular treatment. These effects are often the result of the patient’s genetic makeup. Most of the work in this area has focused on genotyping. This means examining certain variable regions of DNA to determine which variants are present in people who react badly to a treatment or who respond well to it. It is hoped that in the near future it will be possible to screen a patient and determine which of a panel of drugs will be most beneficial. Perhaps even more important, serious negative outcomes could be prevented by not treating patients who would react poorly to a drug.
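The marker-panel idea above can be sketched in a few lines, together with the validation procedure described earlier (find markers on one subset of a cohort, then test them on the remaining patients). This is a minimal illustration under stated assumptions, not a published or clinical method:

- Expression values are on a log scale.
- Genes are ranked by the absolute Welch t-statistic, computed on the training samples only, so both up- and down-regulated genes can be chosen.
- Each group has at least two training samples.
- Held-out samples are assigned to the nearest group centroid using only the selected markers.

All function names and the toy data are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], str]  # (log-expression per gene, group label)


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch t-statistic for the difference in means of groups a and b."""
    va, vb = statistics.variance(a), statistics.variance(b)  # needs >= 2 values each
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def select_markers(train: List[Sample], g1: str, g2: str, k: int) -> List[int]:
    """Indices of the k genes most strongly up- or down-regulated between groups."""
    scores = []
    for j in range(len(train[0][0])):
        a = [x[j] for x, y in train if y == g1]
        b = [x[j] for x, y in train if y == g2]
        scores.append((abs(welch_t(a, b)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def centroids(train: List[Sample], markers: List[int]) -> Dict[str, List[float]]:
    """Mean marker profile of each group in the training set."""
    groups = {y for _, y in train}
    return {g: [statistics.mean(x[j] for x, y in train if y == g) for j in markers]
            for g in groups}


def classify(x: List[float], markers: List[int], cents: Dict[str, List[float]]) -> str:
    """Assign a sample to the group with the nearest centroid (squared distance)."""
    profile = [x[j] for j in markers]
    return min(cents, key=lambda g: sum((p - c) ** 2 for p, c in zip(profile, cents[g])))


def holdout_accuracy(train: List[Sample], test: List[Sample], g1: str, g2: str, k: int) -> float:
    """Select markers on `train` only, then score them on the held-out `test` set."""
    markers = select_markers(train, g1, g2, k)
    cents = centroids(train, markers)
    return sum(classify(x, markers, cents) == y for x, y in test) / len(test)


if __name__ == "__main__":
    # Toy data: gene 0 up in tumor, gene 1 uninformative, gene 2 down in tumor.
    train = [([5.0, 2.0, 1.0], "tumor"), ([5.5, 3.0, 1.2], "tumor"),
             ([6.0, 2.5, 0.8], "tumor"), ([1.0, 2.2, 4.0], "normal"),
             ([1.5, 2.8, 4.5], "normal"), ([0.5, 2.4, 5.0], "normal")]
    test = [([5.2, 2.6, 1.1], "tumor"), ([0.8, 2.5, 4.7], "normal")]
    print("markers:", select_markers(train, "tumor", "normal", 2))
    print("held-out accuracy:", holdout_accuracy(train, test, "tumor", "normal", 2))
```

Markers are chosen without looking at the validation patients. Selecting them on the whole cohort before splitting would leak information from the validation set and overstate how well the panel generalizes.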
Theranostics also involves monitoring a patient through a course of treatment. A patient can be screened during treatment to ensure that the therapy is working as expected. If a change occurs, the physician can alter the therapy so that the disease is treated as effectively as possible.

SUMMARY

Microarrays provide a means to screen hundreds to thousands of biological analytes in parallel. These analytes can be DNA, RNA, or protein. DNA microarrays allow rapid profiling of gene expression. A few competing platforms can be used, but the basic principles are the same: RNA is extracted from a biological sample, labeled, and applied to an array of DNA probes. Signals generated from the array indicate which genes are active and which are not. The ability to screen multiple tissues or patients makes microarrays particularly well suited to uncovering the complex gene networks involved in disease. Microarrays are typically used in basic research for target or marker discovery, but in the future they will most likely be used in diagnostic applications and for tailoring medical treatment.

BIBLIOGRAPHY

1. Fodor SP, Rava RP, Huang XC, Pease AC, Holmes CP, Adams CL. Multiplexed biochemical assays with biological chips. Nature (London) 1993;364:555–556.
2. Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995;270:467–470.
3. Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ, Wheeler R, Wong B, Drenkow J, Yamanaka M, Patel S, Brubaker S, Tammana H, Helt G, Struhl K, Gingeras TR. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 2004;116:499–509.
4. Hegde P, Qi R, Abernathy K, Gay C, Dharap S, Gaspard R, Hughes JE, Snesrud E, Lee N, Quackenbush J. A concise guide to cDNA microarray analysis. Biotechniques 2000;29:548–550, 552–554, 556 passim.
5. Van Gelder RN, von Zastrow ME, Yool A, Dement WC, Barchas JD, Eberwine JH. Amplified RNA synthesized from limited quantities of heterogeneous cDNA. Proc Natl Acad Sci USA 1990;87:1663–1667.
6. Schena M. Microarray analysis. Hoboken: John Wiley & Sons; 2003.
7. Wang Y, Wang X, Guo SW, Ghosh S. Conditions to ensure competitive hybridization in two-color microarray: A theoretical and experimental analysis. Biotechniques 2002;32:1342–1346.
8. Quackenbush J. Computational analysis of microarray data. Nature Rev Genet 2001;2:418.
9. Singh-Gasson S, Green RD, Yue Y, Nelson C, Blattner F, Sussman MR, Cerrina F. Maskless fabrication of light-directed oligonucleotide microarrays using a digital micromirror array. Nat Biotechnol 1999;17:974–978.
10. Gao X, LeProust E, Zhang H, Srivannavit O, Gulari E, Yu P, Nishiguchi C, Xiang Q, Zhou X. A flexible light-directed DNA chip synthesis gated by deprotection using solution photogenerated acids. Nucleic Acids Res 2001;29:4744–4750.
11. LeProust E, Pellois JP, Yu P, Zhang H, Gao X, Srivannavit O, Gulari E, Zhou X. Digital light-directed synthesis. A microarray platform that permits rapid reaction optimization on a combinatorial basis. J Comb Chem 2000;2:349–354.
12. Hughes TR, Mao M, Jones AR, Burchard J, Marton MJ, Shannon KW, Lefkowitz SM, Ziman M, Schelter JM, Meyer MR, Kobayashi S, Davis C, Dai H, He YD, Stephaniants SB, Cavet G, Walker WL, West A, Coffey E, Shoemaker DD, Stoughton R, Blanchard AP, Friend SH, Linsley PS. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol 2001;19:342–347.
13. Hughes TR, Shoemaker DD. DNA microarrays for expression profiling. Curr Opin Chem Biol 2001;5:21–25.
14. Schubert CM. Microarray to be used as routine clinical screen. Nat Med 2003;9:9.
15. Picard FJ, Bergeron MG. Rapid molecular theranostics in infectious diseases. Drug Discov Today 2002;7:1092–1101.

See also DNA SEQUENCE; MICROBIOREACTORS; POLYMERASE CHAIN REACTION.