RE-INTERPRETING DNA MICROARRAY DATA

Sungchul Ji, Ph.D.
Department of Pharmacology and Toxicology
Rutgers University
Piscataway, N.J. 08855
sji@rci.rutgers.edu

The Cell as the Smallest DNA-Based Molecular Computer (S. Ji, BioSystems 52, 123-133, 1999)

[Diagram: Cells, Brains, Computers, and DNA connected by relations numbered 1 to 5.]
Ontogeny = 1, 2
Epistemology = 3, 4, and 5

The Complementary (+/-) Relations among the Various DNA and RNA Molecules Involved in Microarray Experiments

(+) DNA --RP--> (-) mRNA --RT--> (+) DNA --H--> (-) DNA
   |
   |  DNA polymerase or synthesizer
   v
(-) DNA (used to fabricate DNA microarrays)

RP = RNA polymerase
RT = Reverse transcriptase
H = Hybridizes to; no enzymes needed

DNA Microarrays

• There are two kinds of DNA microarrays: cDNA or EST microarrays, and gene chips. cDNA/EST microarrays are discussed first.
• One microarray can measure 10^4 mRNA levels simultaneously.
• mRNA levels in the cell are determined by mRNA synthesis (V_syn) and mRNA hydrolysis (V_hyd), because the rate of change in mRNA levels inside the cell is always: dR/dt = V_syn - V_hyd
• Only when certain kinetic conditions are met (to be discussed later) can the DNA microarray technique measure rates of gene expression [1].
• Each square can recognize one kind of mRNA molecule.

How to Make Oligonucleotide Gene Chips [1]

Preparation of Oligonucleotide Arrays (or Gene Chips) [2]

1) “Gene chips” is a phrase coined by a commercial organization, Affymetrix, Inc., in California. Unlike DNA microarrays, which are produced by individual scientists and by biotechnology companies, gene chips are an exclusive product of Affymetrix.
2) The technical base for producing gene chips originated from the computer chip industry and the DNA synthesis laboratory. The key steps involved in producing gene chips are schematically shown in the next slide: The surface of the glass base is treated with chemicals so that it can bind single nucleotides (A, T, G, or C).
In a step-wise process, each small area of the chip (also 10,000 squares, each 100 micron x 100 micron) serves as the site for the synthesis of short cDNA fragments (oligonucleotides on the order of 10 to 20 nucleotides each), one nucleotide at a time. A first mask covers most of the glass surface of the chip except for the sites destined to contain an oligonucleotide that begins with A; these exposed sites are then coupled to adenosine (A). The next mask (after some chemistry) allows other sites to couple C; a third mask allows the coupling of T; and a final mask allows the coupling of G. The combination of all four masks completes the attachment of the first nucleotide to the glass surface. A second nucleotide is then added to each site with four other masks. Thus, for a chip bearing 10-base-long oligonucleotides, it is necessary to use 40 masks in all (4 masks x 10 positions) to direct each nucleotide to its proper area.
3) Affymetrix can synthesize 10,000 to 1,000,000 different oligonucleotides on each chip!

How DNA Microarray Experiments are Done

1. Isolate mRNA from broken cells.
2. Synthesize fluorescently labeled cDNA from mRNA using reverse transcriptase and fluorescent nucleotides.
3. Prepare a microarray either with ESTs or with oligonucleotides (synthesized right on the microarray surface by Affymetrix, Inc.).
4. Pour the fluorescently labeled cDNA preparations over the microarray surface to effect hybridization. Wash off excess debris.
5. Measure fluorescently labeled cDNA using a computer-assisted microscope.
6. The final result is a table of numbers, each number registering the fluorescence intensity, which is in turn proportional to the concentration of cDNA (and ultimately mRNA) located at row x and column y, the row indicating the identity of the gene and the column the condition under which the mRNA level is measured.
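The result table described in the last step can be pictured as a small gene-by-condition matrix. The following is a minimal sketch only: the gene names, condition labels, intensity values, and the helper names `intensity_at` and `gene_profile` are hypothetical placeholders, not data or software from any actual experiment.

```python
# Hypothetical row labels (gene identity), column labels (conditions),
# and fluorescence intensities, for illustration only.
genes = ["gene_1", "gene_2", "gene_3"]
conditions = ["0 h", "1 h", "2 h"]
intensities = [
    [120.0, 180.0, 240.0],  # gene_1
    [300.0, 150.0, 75.0],   # gene_2
    [90.0, 90.0, 90.0],     # gene_3
]


def intensity_at(gene: str, condition: str) -> float:
    """Look up the intensity at row x (gene) and column y (condition)."""
    x = genes.index(gene)
    y = conditions.index(condition)
    return intensities[x][y]


def gene_profile(gene: str) -> dict:
    """Return one row of the table as a condition -> intensity mapping."""
    row = intensities[genes.index(gene)]
    return dict(zip(conditions, row))


if __name__ == "__main__":
    print(intensity_at("gene_2", "1 h"))
    print(gene_profile("gene_1"))
```

Each row read across its columns is the kind of profile that the cluster analysis later in these slides displays as a row of colored boxes.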
Covalent and Noncovalent Interactions in Microarray Experiments

[Figure: nucleotide sequences labeled 1) to 6), beginning with CTAATGT (Original DNA), connected by Steps 1 to 3.]
Step 1: Transcription inside the cell
Step 2: Reverse transcription inside the test tube
Step 3: Hybridization on the microarray surface

Probably millions of cDNA molecules are adsorbed on each square of a DNA microarray. If the mRNA formed inside the cell were stable, the amount of mRNA formed during Step 1 could be estimated from the amount of cDNA bound to the microarray surface in Step 3. But mRNA molecules inside the cell are unstable, because they are rapidly hydrolyzed into ribonucleotides by various ribonucleases. Therefore, it is impossible to estimate how many mRNA molecules are formed in Step 1 by measuring how many molecules of cDNA are bound to the microarray surface in Step 3 (more on this later).

[Diagram: Genes, RNA, Ribonucleotides, Proteins, Amino Acids, Gradients, Input, and Output linked by steps numbered 1 to 10.]
Figure 1. The molecular model of the cell known as the Bhopalator [S. Ji, J. theoret. Biol. 116:399-426 (1985)].

Cluster Analysis

The changes in mRNA levels of human fibroblasts (cells of connective tissues that synthesize and secrete fibrillar procollagen, fibronectin, and collagenase) measured with DNA microarrays over a period of 24 hours. Green represents a decrease in mRNA levels (or “genes”, a term that is, strictly speaking, incorrect; see below), black no change, and red an increase. Each mRNA molecule is represented by a single row of colored boxes, and each measuring time point by a single column. Notice that the mRNA molecules belonging to cluster A started to decrease around 8 hours after the beginning of the experiment. The mRNA molecules belonging to cluster E began to increase around 5 hours after the beginning of the experiment.

How to Interpret DNA Microarray Data (I)

What we measure with DNA microarrays are changes in fluorescence intensities. The changes in fluorescence intensities can be divided into two categories: artifactual and non-artifactual.
The present state of development of the microarray technique is such that artifactual fluorescence intensity changes account for about 50%! This is why it is common practice to use the notion of “fold changes”, referring to fluorescence intensity changes that are greater than 100% (which would be a one-fold change). Only the non-artifactual fluorescence intensities can be related to mRNA levels.

mRNA levels measured with DNA microarrays can be divided into two categories: steady state and non-steady state. The difference between these two categories can be represented mathematically as follows, where R is an mRNA level and t is time:

Steady state: dR/dt = 0    (1)
Non-steady state: dR/dt ≠ 0

The steady-state mRNA levels divide into two categories: dynamic and equilibrium. The intracellular levels of mRNA molecules are always determined by two terms, the source term (i.e., the rate of mRNA synthesis, denoted by V_syn) and the sink term (i.e., the rate of mRNA hydrolysis into smaller fragments, denoted by V_hyd):

dR/dt = V_syn - V_hyd    (2)

There are two ways of making Eq. (2) equal zero: when V_syn and V_hyd are equal but nonzero, and when V_syn and V_hyd are both zero:

Dynamic steady state: V_syn = V_hyd    (3)
Equilibrium steady state: V_syn = V_hyd = 0    (4)

How to Interpret DNA Microarray Data (II)

The non-steady-state mRNA levels divide into two categories:

On-the-way-up: dR/dt > 0    (5)
On-the-way-down: dR/dt < 0    (6)

It is probably safe to assume that V_syn is always independent of R (i.e., gene expression is turned on or off by factors other than the intracellular levels of the corresponding mRNA).
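The categories defined by Equations (1) to (6) can be sketched as a small classifier. This is a minimal illustration, not part of the original analysis: the function name `classify_mrna_state` and the tolerance `tol` used to decide when a rate counts as zero are assumptions introduced here.

```python
def classify_mrna_state(v_syn: float, v_hyd: float, tol: float = 1e-9) -> str:
    """Classify an mRNA level from its synthesis and hydrolysis rates.

    v_syn and v_hyd must be in the same units. `tol` is an assumed
    numerical tolerance for treating a rate (or their difference) as zero.
    """
    if v_syn < 0 or v_hyd < 0:
        raise ValueError("rates must be non-negative")

    dr_dt = v_syn - v_hyd  # Eq. (2)

    if abs(dr_dt) <= tol:  # Eq. (1): steady state
        if v_syn <= tol and v_hyd <= tol:
            return "equilibrium steady state"  # Eq. (4)
        return "dynamic steady state"  # Eq. (3)
    if dr_dt > 0:
        return "on-the-way-up"  # Eq. (5)
    return "on-the-way-down"  # Eq. (6)


if __name__ == "__main__":
    for rates in [(0.0, 0.0), (2.0, 2.0), (3.0, 1.0), (1.0, 3.0)]:
        print(rates, "->", classify_mrna_state(*rates))
```

Note that a single fluorescence reading gives R only; assigning a category requires either both rates or repeated measurements of R over time.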
Unlike V_syn, V_hyd may often depend on R, leading to the conclusion that there are at least two categories of dynamic steady states:

Zero-order dynamic steady state: V_hyd = k R^0 = k    (7)
First-order dynamic steady state: V_hyd = k R^1 = kR    (8)

These results can be summarized as follows:

Combining Equations (3) and (8) leads to the following useful relation:

V_syn = V_hyd = kR    (9)

Equation (9) states that, under the conditions of the first-order dynamic steady state, the mRNA levels, R, measured with DNA microarrays are directly proportional to the rates of expression of their corresponding genes, V_syn, or equivalently to the rates of transcript degradation, V_hyd, since R = V_syn/k. An important corollary of Equation (9) is that, under all other conditions, there is no direct proportionality between mRNA levels and the rates of expression of their corresponding genes.

Six Categories of Microarray Fluorescence Intensities
(Only non-artifactual fluorescence intensities, i.e., A through D, measure mRNA levels)

Microarray Fluorescence Intensity
    Non-Artifactual
        Steady State
            Dynamic Steady State (A)
                Zero-Order (A_0): V_hyd = k
                1st Order (A_1): V_hyd = kR
            Equilibrium Steady State (B): V_syn = V_hyd = 0
        Non-Steady State
            On-the-Way-Up (C): V_syn - V_hyd > 0
            On-the-Way-Down (D): V_syn - V_hyd < 0
    Artifactual (E)

[Figure: four panels with axes labeled TL and TR: a) YBL091C-A, b) YNL162W, c) YLR084C, d) YHR029C.]

Table 1. The frequency distributions of the 8 modules of RNA metabolism (defined in the legend to Figure 1) as functions of the 5 time periods following the glucose-galactose shift. If the angles are homogeneously distributed over 360°, the expected distributions can be calculated as shown in the 7th row.
The p-values for the difference between the observed and the expected distributions are given in the last row. The differences are all significant, except for Mechanism 5.

Segment / Mech        1              2              3              4              5              6              7              8              Total
1                     0              142            234            3470           96             1732           12             39             5725
2                     14             18             3              37             5              3729           617            1302           5725
3                     340            1914           52             638            314            1471           28             968            5725
4                     477            4237           21             151            61             143            19             616            5725
5                     12             1151           238            4213           38             56             4              13             5725
Total, Observed (%)   843 (2.94)     7462 (26.07)   548 (1.91)     8509 (29.73)   514 (1.80)     7131 (24.91)   680 (2.38)     2938 (10.26)   28625
Total, Expected (%)   477 (1.67)     6678 (23.33)   477 (1.67)     6678 (23.33)   477 (1.67)     6678 (23.33)   477 (1.67)     6678 (23.33)   28625
p-value               0              0              0.0011         0              0.0919         0.0000         0              0

[Figure: three panels comparing Glycolysis and Oxphos transcripts against time (min): a) Time - v_S plots (v_S, molecules/cell/min); b) Average mRNA Levels (mRNA, molecules/cell); c) Degradation/Transcription (D/T) Ratios vs Time.]

Table 2. The five mechanisms (or modules) of the changes (Δ) in transcript abundances (X) in cells due to transcript synthesis (ΔX_S) and transcript degradation (ΔX_D): ΔX = ΔX_S - ΔX_D

ΔX    ΔX_S    ΔX_D            ΔX_S/ΔX_D                     Module (or Mechanism)
+     +       ΔX_D < ΔX_S     < 1                           A
+     0       ΔX_D < 0        (Physically Not Allowed)      (Physically Not Allowed)
0     +       ΔX_D = ΔX_S     = 1                           B
0     0       ΔX_D = 0        (Mathematically Not Allowed)  C
-     +       ΔX_D > ΔX_S     > 1                           D
-     0       ΔX_D > 0        0                             E

References:
[1] Ji, S. (2004). Molecular Information Theory: Solving the Mysteries of DNA. In: Modeling in Molecular Biology (G. Ciobanu, ed.), Elsevier (in press).
[2] Watson, S. J., and Akil, H. (1999). Gene Chips and Arrays Revealed: A Primer on Their Power and Their Uses. Biol. Psychiatry 45:533-543.