ISSUES RELATED TO EXPERIMENTAL DESIGN AND NORMALIZATION
RE: Mouse and Human PancChips

Elisabetta Manduchi (1) and Peter White (2)
(1) Computational Biology and Informatics Laboratory, Center for Bioinformatics, and (2) Functional Genomics Core, Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104

Introduction

This document informally describes some of the issues that need to be evaluated when designing a microarray experiment, with a specific focus on the human and mouse PancChips. There is no hard and fast rule for the design of any particular study. Decisions typically must be made on a case-by-case basis, taking into account the concerns listed below and the questions and samples of interest.

1. Direct-Comparison vs. Reference Design

When using 2-channel arrays to compare different conditions, there are different options regarding what to hybridize in each channel of each array. For example, when comparing condition A with condition B, the following are two possibilities:

a. Carry out a series of direct comparisons, say n arrays, with a sample of type A in one channel and one of type B in the other channel in each array (possibly with dye-swaps).

b. Select a common reference sample of type C and carry out a series of hybridizations with a sample of type A in one channel and the common reference in the other, and a series of hybridizations with a sample of type B in one channel and the common reference in the other. A and B would then be compared by comparing the ratios A/C and B/C.

Option (b) introduces more variability, because each comparison of A with B combines measurements from two arrays rather than one. If the only comparison of interest is between the two conditions A and B, then design (a) is preferable. However, if one plans to extend a study to involve additional conditions to be compared with each other and with A and B, then option (b) would be preferable to a loop design. For additional background on this issue and pointers to further references, the reader is referred to [1].
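
The comparison made under option (b) can be sketched on the log scale. The following is a minimal illustration only, not part of any PancChip analysis pipeline: the function names and the intensities are hypothetical. It shows that log2(A/B) is recovered as log2(A/C) - log2(B/C), and that, under the assumption of two independent arrays with the same variance per log ratio, the indirect estimate carries twice the variance of a single measurement.

```python
import math

def indirect_log_ratio(m_ac, m_bc):
    """Estimate log2(A/B) from reference-design log ratios.

    m_ac = log2(A/C) and m_bc = log2(B/C); the reference C cancels out.
    """
    return m_ac - m_bc

def indirect_variance(var_per_array):
    """Variance of the indirect estimate.

    Assumption of this sketch: the two arrays are independent and each
    log ratio has the same variance, so the variances simply add.
    """
    return 2.0 * var_per_array

# Hypothetical intensities for a single spot (illustrative values only).
a, b, c = 800.0, 200.0, 400.0
m_ac = math.log2(a / c)
m_bc = math.log2(b / c)
m_ab = indirect_log_ratio(m_ac, m_bc)  # equals log2(a / b)
```

In a direct design the same quantity would be measured on one array, which is why design (a) is preferable when A versus B is the only comparison of interest.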
In the case of a design using a common reference, care must be taken as to what to use for such a reference. To minimize variability, the reference should come from the same pool of RNA, rather than represent different biological replicates of the same type. The question then arises whether this should be a pool of RNA prior to labeling, so that the labeling of the reference is done separately for each hybridization, or whether one should start with the same pool of labeled RNA. If the experiments are done on the same day, the second option would be preferable, as it would introduce less variability. However, if the hybridizations are done over the course of several days, a concern is that the labeled RNA might not be stable for an extended period of time.

Other important considerations regarding the choice of a common reference include:

i. Good “expression coverage”: by this we mean that it would be useful to have a reference for which a high proportion of the genes represented on the chip are expressed, to avoid spots with zero denominators in the A/C and B/C ratios, which would force discarding those spots from the analyses.

ii. If there are no suitable controls to be used for normalization and one is forced to use all spots to compute the normalization function, then the reference should be chosen so that the hypotheses required for the applicability of such a normalization are satisfied. If an intensity-dependent normalization is used, these hypotheses should be satisfied at all intensities. For example, in the case of global lowess normalization, the underlying assumption is that changes over the spots represented on the chip are roughly symmetric in the two samples (i.e. the sample of interest and the reference sample) at all intensities, or that few genes change. If the method used is print-tip lowess normalization, this hypothesis should be satisfied for the set of spots of every print-tip group.
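
Consideration (i) can be checked on pilot data. The sketch below is only one possible way to quantify “expression coverage”: the function name, the background threshold, and the intensities are hypothetical, and the choice of threshold is an assumption that would have to be set from the actual background estimates of the scanner and image analysis software in use.

```python
def expression_coverage(ref_intensities, background):
    """Fraction of spots whose reference-channel intensity exceeds background.

    Spots at or below the (assumed) background threshold would give
    unusable denominators in the A/C and B/C ratios.
    """
    if not ref_intensities:
        raise ValueError("no spots supplied")
    expressed = sum(1 for x in ref_intensities if x > background)
    return expressed / len(ref_intensities)

# Hypothetical reference-channel intensities for eight spots.
ref = [1200.0, 35.0, 560.0, 0.0, 88.0, 4100.0, 15.0, 730.0]
coverage = expression_coverage(ref, background=50.0)  # 5 of 8 spots
```

A candidate reference with low coverage would force many spots to be discarded, whatever its other merits.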
One reference whose use we have considered is Stratagene’s Universal Reference RNA. This is pooled total RNA derived from a number of different cell lines, and as such it offers broad gene coverage on many microarrays. Its use as a common reference therefore fulfills consideration (i), but raises significant issues in terms of consideration (ii). When carrying out a study with the PancChip using samples of pancreatic origin, there will clearly be significant differences in gene expression if these samples are compared with Universal Reference RNA. This situation would require an alternative method of normalization, utilizing appropriate controls rather than all genes on the chip (see section 3) and/or utilizing dye-swap experimental designs.

2. Replication issues.

When comparing conditions, it is crucial to get a sense of the variability within each condition, especially in studies that look for differentially expressed genes. Typically, if the question is a comparison between two populations A and B, it is important to get a sense of the biological variability in addition to the technical variability. If the only kind of replication performed is technical, for example n hybridizations involving the same mouse of type A versus n hybridizations involving the same mouse of type B, then one would be able to make inferences about differences between these two particular mice, but not necessarily between the populations themselves. If one wants to make inferences about the populations, then true biological replicates should be used, e.g. different mice from each population. The number of replicates per condition should also be sufficiently large. For studies on the PancChip we recommend a minimum of five biological replicates. Using only 2 or 3 replicates is too few and will seriously limit the statistical analyses that can be done to this end.
When possible, it is also usually a good idea to have a dye-swap technical replicate for each of the biological replicates. This can expand the range of normalization methods applicable to the assays at hand.

3. Normalization.

In microarray jargon, “normalization” indicates the attempt to identify and remove systematic sources of variation in the measured intensities due to separate reverse transcription and labeling, different scanning parameters, print-tip differences, spatial effects, different dye labeling efficiency, the quality of the microarray printing, the quality of the mRNA used to synthesize cDNA, etc. Normalization is necessary in order to put the data on an equal footing before making intensity comparisons within or between slides. Reference [2] discusses some normalization methods that can be used with 2-channel data, in particular normalization of the M values, where M = log2(Cy5) - log2(Cy3) for each given spot. These methods include normalization by a global constant, intensity-dependent lowess normalization, print-tip lowess normalization, etc.

Normalization involves two choices:

(i) the choice of which normalization function to use (e.g. lowess curves, etc.) to normalize the values at each spot;

(ii) the choice of the spots to be used in order to compute such a normalization function. These could be all genes on the array or a suitable subset of these genes, such as a set of appropriate controls.

Whether or not a given method (with choices as in (i) and (ii)) is applicable will depend on whether or not the samples hybridized to the two channels satisfy certain assumptions relative to the spots used in (ii). For example, if all spots on the array are used in (ii) to build a global lowess curve to normalize the M values, then the assumption is that changes over the spots represented on the chip are roughly symmetric in the two samples at all intensities, or that few genes change.
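
To make the two choices concrete, the sketch below illustrates the simplest case mentioned above, normalization by a global constant, using all spots for choice (ii). It is a minimal illustration and not the procedure used for any particular PancChip study: the function names and intensities are hypothetical, and taking the median of M as the constant is one common choice, assumed here for the example.

```python
import math
from statistics import median

def m_values(cy5, cy3):
    """M = log2(Cy5) - log2(Cy3) for each spot, as defined in the text."""
    return [math.log2(r) - math.log2(g) for r, g in zip(cy5, cy3)]

def normalize_global(m, control_idx=None):
    """Subtract a single constant from every M value.

    The constant is the median M over the spots chosen for choice (ii):
    all spots by default, or the (hypothetical) control indices supplied.
    """
    chosen = m if control_idx is None else [m[i] for i in control_idx]
    c = median(chosen)
    return [x - c for x in m]

# Hypothetical background-corrected intensities for five spots.
cy5 = [400.0, 1600.0, 800.0, 3200.0, 200.0]
cy3 = [200.0, 800.0, 200.0, 1600.0, 100.0]
m = m_values(cy5, cy3)        # roughly [1, 1, 2, 1, 1]
m_norm = normalize_global(m)  # roughly [0, 0, 1, 0, 0]
```

Here most spots share a dye bias of about one log2 unit, which the constant removes; the correction is only valid if the assumption that few genes change, or that changes are symmetric, holds for the spots used to compute it.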
If the method used in (i) is print-tip lowess normalization, the same symmetry assumption should be satisfied for the set of spots of every print-tip group. Thus, if one hybridizes to a PancChip two samples like pancreas and brain (or even Universal Reference RNA), these hypotheses are unlikely to be satisfied. Even for pancreas and liver there is some concern. One solution would be to use for (ii) not all genes on the array, but a suitable subset (appropriate control genes) for which such hypotheses would be satisfied. One such set of controls is the MicroArray Sample Pool (MSP) controls, also suggested in [2]. However, there are some technical difficulties and concerns regarding inserting these on the PancChip.

An alternative set of controls is spiked controls, which have been used by some researchers to this end. We have experimented with Stratagene’s SpotReport®-10 array validation system. To help address the concerns that arise in the evaluation of microarray hybridization data, the SpotReport system provides positive and negative controls that are printed onto the PancChip along with our set of test genes. The kit also provides 10 exogenous Arabidopsis thaliana mRNA Spikes that can be added to the labeling reaction along with the experimental RNA. Our primary use of these Spikes has been to provide our labeling reactions with an internal quality control, allowing us to rapidly identify problems with mRNA quality along with any potential issues arising from an error in the labeling procedure.

When comparing samples for which normalization utilizing all genes on the array in step (ii) above would be inappropriate, these mRNA Spikes may provide an alternative set for normalization. By spiking in each of the 10 mRNAs at varying concentrations, it is possible to determine the expected dye ratios and to normalize the signal intensities for the differences in dye incorporation and quantum yield.
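
One way such a spike-based, intensity-dependent correction could be computed is sketched below. This is an illustration under stated assumptions, not the SpotReport protocol or our production method: the function names and data are hypothetical; A denotes the average log2 intensity of a spot, the usual intensity axis in M-versus-A plots; each spike is assumed to be added in equal amounts to both channels, so its expected M is 0; and piecewise-linear interpolation between spikes is used as a simple stand-in for a lowess fit.

```python
from bisect import bisect_right

def spike_bias_curve(spike_a, spike_m, expected_m=0.0):
    """Sorted (A, bias) points, where bias = observed M - expected M."""
    return sorted((a, m - expected_m) for a, m in zip(spike_a, spike_m))

def interpolate_bias(curve, a):
    """Piecewise-linear bias at intensity a; held flat outside the spike range."""
    xs = [p[0] for p in curve]
    if a <= xs[0]:
        return curve[0][1]
    if a >= xs[-1]:
        return curve[-1][1]
    i = bisect_right(xs, a)  # guarantees xs[i - 1] <= a < xs[i]
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (a - x0) / (x1 - x0)

def normalize_with_spikes(gene_a, gene_m, curve):
    """Subtract the spike-derived bias from each gene's M value."""
    return [m - interpolate_bias(curve, a) for a, m in zip(gene_a, gene_m)]

# Hypothetical spike measurements (A, M) and three genes of interest.
curve = spike_bias_curve([6.0, 8.0, 10.0, 12.0], [0.4, 0.2, 0.0, -0.2])
genes_norm = normalize_with_spikes([7.0, 10.0, 13.0], [1.3, 0.5, -0.2], curve)
```

Because the bias is held flat outside the range covered by the spikes, the correction is only trustworthy where the spikes span the intensities of the genes of interest.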
To achieve complete coverage of the dynamic range of intensities, we typically spike in a doubling range from 2 pg to 2000 pg utilizing all ten mRNA Spikes. The expression data for the genes of interest can then be normalized in an intensity-dependent manner based on the expression values for the A. thaliana Spikes. Our main concern with this type of control has been the observation that the Spikes are not useful when the quality or quantity of the RNA in the two samples of interest differs. As such, these Spikes should be used for normalization only when the samples being compared are not closely matched (so that normalization using all genes is inappropriate), and only after taking due care to ensure, using an Agilent Bioanalyzer, that the samples are of the same purity and quantity.

One more option to keep in mind is to employ pairs of technical replicates done in dye-swap and to utilize the paired-slides normalization described in [3]. Because of the assumptions underlying this normalization method (see [3]), it is best to control image acquisition settings (e.g. maintain the same settings for Cy3 over the replicates and the same settings for Cy5). When no other normalization method is applicable because its assumptions are not satisfied, paired-slides normalization might still be applicable and might offer a valuable alternative. Moreover, even when other normalization methods are applicable, they might be combined with paired-slides normalization, if the latter is applicable too.

4. Conclusion.

It is not possible to produce a set of hard and fast rules, or a simple recipe, for microarray design and analysis, because many of the factors involved depend heavily on the fundamental questions that the researcher hopes to answer. However, it is our hope that the recommendations, issues and discussion contained in this manuscript will help guide the potential user of the PancChip in making wise and appropriate choices during the microarray process, from initial experimental design to final data analysis.
REFERENCES

1. Yang Y.H., Speed T. (2002), Nature Reviews Genetics 3: 579-588.
2. Yang Y.H., Dudoit S., Luu P., Lin D.M., Peng V., Ngai J., Speed T.P. (2002), Nucleic Acids Res 30: e15.
3. Yang Y.H., Dudoit S., Luu P., Speed T.P. [http://www.stat.berkeley.edu/users/terry/zarray/TechReport/589.pdf]