Re-interpreting DNA Microarray Data

Sungchul Ji, Ph.D.
Department of Pharmacology and Toxicology
Rutgers University
Piscataway, N.J. 08855
sji@rci.rutgers.edu
The Cell as the Smallest DNA-Based Molecular Computer
(S. Ji, BioSystems 52, 123-133, 1999)
[Figure: diagram linking Cells, Brains, Computers, and DNA by relations numbered 1 to 5.]
Ontogeny = 1, 2
Epistemology = 3, 4, and 5
The Complementary (+/-) Relations among the various DNA
and RNA molecules involved in Microarray Experiments
           RP                 RT                 H
(+) DNA ------> (-) mRNA ------> (+) DNA ------> (-) DNA
   |
   |  DNA polymerase
   |  or synthesizer
   \/
(-) DNA
(Used to fabricate DNA microarrays)

RP = RNA polymerase
RT = Reverse transcriptase
H = Hybridizes to; no enzymes needed
DNA Microarrays
• There are two kinds of DNA microarrays: cDNA (or EST) microarrays and Gene Chips. cDNA/EST microarrays are discussed first.
• One microarray can measure 10^4 mRNA levels simultaneously.
• mRNA levels in the cell are determined by mRNA synthesis (V_syn) and mRNA hydrolysis (V_hyd), because the rate of change in mRNA levels inside the cell is always:
      dR/dt = V_syn - V_hyd
• Only when certain kinetic conditions are met (to be discussed later) can the DNA microarray technique measure rates of gene expression [1].
• Each square can recognize one kind of mRNA molecule.
How to Make Oligonucleotide Gene Chips [1]
Preparation of Oligonucleotide Arrays (or Gene Chips) [2]
1) “Gene chips” is a phrase coined by a commercial organization, Affymetrix, Inc., in California. Unlike DNA microarrays, which are produced by individual scientists and by biotechnology companies, gene chips are an exclusive product of Affymetrix.
2) The technical basis for producing gene chips originated in the computer chip industry and the DNA synthesis lab. The key steps involved in producing gene chips are schematically shown in the next slide:
   • The surface of the glass base is treated with chemicals so that it can bind single nucleotides (A, T, G, or C).
   • In a step-wise process, each small area of the chip (also 10,000 squares, each 100 microns x 100 microns) serves as the site for the synthesis of short cDNA fragments (oligonucleotides on the order of 10 to 20 nucleotides each), one nucleotide at a time.
   • A first mask covers most of the glass surface of the chip except for the sites destined to contain an oligonucleotide that begins with A; these exposed sites are then coupled to adenosine (A). The next mask (after some chemistry) allows other sites to couple C; a third mask allows the coupling of T, and a final mask allows the coupling of G. The combination of all four masks completes the attachment of the first nucleotide to the glass surface.
   • A second nucleotide is then added to each site with four other masks.
   • Thus, for a chip bearing 10-base-long oligonucleotides, it is necessary to use 40 masks in all to direct each nucleotide to its proper area.
3) Affymetrix can synthesize 10,000 to 1,000,000 different oligonucleotides on each chip!
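The mask arithmetic described above (four masks per nucleotide position) can be sketched in a few lines of Python. This is only an illustration: the site names and the three 10-base sequences are made up, and the A, C, T, G coupling order simply follows the description of the first round of masks.

```python
BASES = "ACTG"  # coupling order from the description above: A, C, T, then G


def mask_schedule(oligos):
    """Return one (position, base, exposed_sites) entry per mask.

    `oligos` maps a site label to the sequence to be built at that site.
    All sequences are assumed to have the same length.
    """
    length = len(next(iter(oligos.values())))
    schedule = []
    for position in range(length):
        for base in BASES:
            # A mask exposes only the sites whose next nucleotide is `base`.
            exposed = sorted(site for site, seq in oligos.items()
                             if seq[position] == base)
            schedule.append((position, base, exposed))
    return schedule


def synthesize(oligos):
    """Apply every mask in order and return what was built at each site."""
    built = {site: "" for site in oligos}
    for _position, base, exposed in mask_schedule(oligos):
        for site in exposed:
            built[site] += base
    return built


# Hypothetical three-site chip with 10-base oligonucleotides.
targets = {"site1": "CTAATGTACG", "site2": "AAGCTTGCAT", "site3": "GGATCCATTA"}
schedule = mask_schedule(targets)
print(len(schedule))                   # 4 masks per position x 10 positions
print(synthesize(targets) == targets)  # every site ends up with its sequence
```

The number of masks depends only on the oligonucleotide length, not on the number of sites, which is why the same 40 masks could serve 10,000 squares or more.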
How DNA Microarray Experiments are Done
1. Isolate mRNA from broken cells.
2. Synthesize fluorescently labeled cDNA from mRNA using reverse transcriptase and fluorescent nucleotides.
3. Prepare a microarray with either ESTs or oligonucleotides (synthesized right on the microarray surface by Affymetrix, Inc.).
4. Pour the fluorescently labeled cDNA preparations over the microarray surface to effect hybridization. Wash off excess debris.
5. Measure fluorescently labeled cDNA using a computer-assisted microscope.
6. The final result is a table of numbers, each number registering the fluorescence intensity, which is in turn proportional to the concentration of cDNA (and ultimately mRNA), located at row x and column y, with x indicating the identity of the gene and y the conditions under which the mRNA levels are measured.
Covalent and Noncovalent Interactions in Microarray Experiments
[Figure: the original DNA, CTAATGT, with Steps 1 to 3 marked.]
1) Transcription inside the cell.
2) Reverse transcription inside the test tube.
3) Hybridization on the microarray surface.
4) Probably millions of cDNA molecules are adsorbed on each square of a DNA microarray.
5) If the mRNA formed inside the cell were stable, then the amount of mRNA formed during Step 1 could be estimated from the amount of cDNA bound to the microarray surface in Step 3.
6) But mRNA molecules inside the cell are unstable, because they are rapidly hydrolyzed into ribonucleotides by various ribonucleases. Therefore, it is impossible to estimate how many mRNA molecules are formed in Step 1 by measuring how many molecules of cDNA are bound to the microarray surface in Step 3 (more on this later).
[Figure: diagram with Input and Output, the components Genes, RNA, Ribonucleotides, Proteins, Amino Acids, and Gradients, and steps numbered 1 to 10.]
Figure 1. The molecular model of the cell known as the Bhopalator
[S. Ji, J. theoret. Biol. 116:399-426 (1985)].
Cluster Analysis

The changes in mRNA levels of human
fibroblasts (cells of connective tissues that
synthesize and secrete fibrillar procollagen,
fibronectin, and collagenase) measured with
DNA microarrays over a time period of 24
hours.

Green represents a decrease in mRNA levels
(or “genes”, which term is strictly speaking
incorrect; see below), black no change, and
red an increase.

Each mRNA molecule is represented by a
single row of colored boxes, and each
measuring time point is represented by a
single column.

Notice that the mRNA molecules belonging
to cluster A started to decrease around 8
hours after the beginning of the experiment.

The mRNA molecules belonging to cluster E
began to increase at around 5 hours after the
beginning of the experiment.
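The green/black/red coding described above can be sketched as a small Python function. The ratios and the width of the "no change" band are assumptions made for illustration; the slides do not state the cut-off used for the actual cluster image.

```python
import math


def color_for(ratio, no_change_band=0.5):
    """Map an after/before mRNA ratio to the colour used in the cluster image.

    `no_change_band` is a hypothetical tolerance on |log2(ratio)|; it is not
    taken from the slides.
    """
    log_ratio = math.log2(ratio)
    if log_ratio > no_change_band:
        return "red"      # increase in mRNA level
    if log_ratio < -no_change_band:
        return "green"    # decrease in mRNA level
    return "black"        # no change


# One row per mRNA, one column per time point (made-up ratios).
rows = {
    "mRNA_1": [1.0, 0.9, 0.4, 0.2],
    "mRNA_2": [1.0, 1.1, 2.5, 4.0],
}
for name, ratios in rows.items():
    print(name, [color_for(r) for r in ratios])
```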
How to Interpret DNA Microarray Data (I)

What we measure with DNA microarrays are changes in fluorescence intensities.

The changes in fluorescence intensities can be divided into two categories – artifactual and non-artifactual. The
present state of the development of the microarray technique is such that artifactual fluorescence intensity
changes account for about 50%! This is why it is a common practice to use the notion of “fold changes”
referring to fluorescence intensity changes that are greater than 100% (which would be one-fold change).

Only the non-artifactual fluorescence intensities can be related to mRNA levels.
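A minimal sketch of the fold-change convention mentioned above: a change is kept only when the intensity changes by more than 100%, that is, when it more than doubles. Treating a more-than-halving the same way, and the intensity values themselves, are assumptions of this sketch rather than something stated on the slide.

```python
def fold_ratio(before, after):
    """Return the larger of after/before and before/after (always >= 1)."""
    ratio = after / before
    return ratio if ratio >= 1.0 else 1.0 / ratio


def passes_fold_filter(before, after, min_ratio=2.0):
    """True when the intensity changed by more than a factor of `min_ratio`."""
    return fold_ratio(before, after) > min_ratio


print(passes_fold_filter(1000.0, 1400.0))  # 40% increase, within the ~50% artifact range
print(passes_fold_filter(1000.0, 2500.0))  # 150% increase
print(passes_fold_filter(1000.0, 400.0))   # 2.5-fold decrease
```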

mRNA levels measured with DNA microarrays can be divided into two categories – steady state and non-steady state. The difference between these two categories of mRNA levels can be represented mathematically as follows, where R is an mRNA level and t is time:
      Steady state:      dR/dt = 0      (1)
      Non-steady state:  dR/dt ≠ 0

The steady-state mRNA levels divide into two categories – dynamic and equilibrium.
The intracellular levels of mRNA molecules are always determined by two terms – the source term (i.e., the rate of mRNA synthesis, denoted by V_syn) and the sink term (i.e., the rate of mRNA hydrolysis into smaller fragments, denoted by V_hyd):
      dR/dt = V_syn - V_hyd      (2)
There are two ways of making Eq. (2) equal to zero: when V_syn and V_hyd are equal, and when V_syn and V_hyd are both zero:
      Dynamic steady state:      V_syn = V_hyd        (3)
      Equilibrium steady state:  V_syn = V_hyd = 0    (4)
How to Interpret DNA Microarray Data (II)

The non-steady state mRNA levels divide into two categories:
      On-the-way-up:    dR/dt > 0      (5)
      On-the-way-down:  dR/dt < 0      (6)

It is probably safe to assume that V_syn is always independent of R (i.e., gene expression is turned on or off by factors other than the intracellular levels of the corresponding mRNA). But V_hyd may often depend on R, leading to the conclusion that there are at least two categories of dynamic steady states:
      Zero-order dynamic steady state:   V_hyd = k R^0 = k      (7)
      First-order dynamic steady state:  V_hyd = k R^1 = kR     (8)

These results are summarized under "Six Categories of Microarray Fluorescence Intensities" below.

Combining Equations (3) and (8) leads to the following useful relation:
      V_syn = V_hyd = kR      (9)
Equation (9) states that, under the conditions of the first-order dynamic steady state, the mRNA levels, R, measured with DNA microarrays are directly proportional to the rates of expression of their corresponding genes, V_syn (or, equivalently, to the rates of transcript degradation, V_hyd).

An important corollary of Equation (9) is that, under all other conditions, there is no direct proportionality relation between mRNA levels and the rates of expression of their corresponding genes.
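Equation (9) can be checked numerically with a short sketch that integrates dR/dt = V_syn - kR until R stops changing. The rate values, the time step, and the forward Euler scheme are illustrative choices, not part of the original analysis; units are arbitrary.

```python
def simulate(v_syn, k, r0=0.0, dt=0.01, steps=5000):
    """Integrate dR/dt = v_syn - k*R with the forward Euler method."""
    r = r0
    for _ in range(steps):
        r += dt * (v_syn - k * r)
    return r


# Same rate constant k, doubled synthesis rate: the steady-state level doubles.
r_a = simulate(v_syn=2.0, k=0.5)   # approaches 2.0 / 0.5 = 4
r_b = simulate(v_syn=4.0, k=0.5)   # approaches 4.0 / 0.5 = 8

# A different k can hide a difference in synthesis rate behind the same level.
r_c = simulate(v_syn=8.0, k=1.0)   # approaches 8.0 / 1.0 = 8

print(round(r_a, 3), round(r_b, 3), round(r_c, 3))
```

With k held fixed, R tracks V_syn as Equation (9) states. The second and third cases end at the same level although their synthesis rates differ by a factor of two, so a measured level by itself does not fix V_syn unless k is known.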
Six Categories of Microarray Fluorescence Intensities
(Only non-artifactual fluorescence intensities, i.e., A through D, measure mRNA levels)
Microarray Fluorescence Intensity
    Non-Artifactual
        Steady State
            Dynamic Steady State (A)
                Zero-Order: V_hyd = k (A_0)
                1st Order: V_hyd = kR (A_1)
            Equilibrium Steady State: V_syn = V_hyd = 0 (B)
        Non-Steady State
            On-the-Way-Up: V_syn - V_hyd > 0 (C)
            On-the-Way-Down: V_syn - V_hyd < 0 (D)
    Artifactual (E)
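As a rough illustration of how the non-artifactual categories might be told apart from data, the sketch below estimates the sign of dR/dt from a series of mRNA levels. The tolerance and the example levels are hypothetical. The function cannot separate category A from category B, because that distinction depends on V_syn and V_hyd, which the levels alone do not reveal.

```python
def classify_levels(times, levels, tolerance=0.1):
    """Label a series of non-artifactual mRNA levels by the sign of dR/dt.

    dR/dt is estimated from the first and last points only. `tolerance` is a
    hypothetical cut-off (level units per time unit) for "no change".
    """
    slope = (levels[-1] - levels[0]) / (times[-1] - times[0])
    if abs(slope) <= tolerance:
        return "steady state (A or B)"
    if slope > 0:
        return "on-the-way-up (C)"
    return "on-the-way-down (D)"


hours = [0, 1, 2, 3]
print(classify_levels(hours, [10.0, 10.1, 9.9, 10.0]))
print(classify_levels(hours, [10.0, 14.0, 19.0, 25.0]))
print(classify_levels(hours, [25.0, 19.0, 14.0, 10.0]))
```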
[Figure: four panels, a to d, for the yeast transcripts YBL091C-A, YNL162W, YLR084C, and YHR029C, each with axes labeled TR and TL and with points labeled 1 and 6.]
Table 1. The frequency distributions of the 8 modules of RNA metabolism (defined in the legend to Figure 1) as functions of the 5 time periods following the glucose-galactose shift. If the angles are homogeneously distributed over 360°, the expected distributions can be calculated as shown in the 7th row. The p-values for the difference between the observed and the expected distributions are given in the last row. The differences are all significant, except for Mechanism 5.

Segment                Mech 1   Mech 2   Mech 3   Mech 4   Mech 5   Mech 6   Mech 7   Mech 8    Total
1                           0      142      234     3470       96     1732       12       39     5725
2                          14       18        3       37        5     3729      617     1302     5725
3                         340     1914       52      638      314     1471       28      968     5725
4                         477     4237       21      151       61      143       19      616     5725
5                          12     1151      238     4213       38       56        4       13     5725
Total, observed           843     7462      548     8509      514     7131      680     2938    28625
  (%)                  (2.94)  (26.07)   (1.91)  (29.73)   (1.80)  (24.91)   (2.38)  (10.26)
Total, expected           477     6678      477     6678      477     6678      477     6678    28625
  (%)                  (1.67)  (23.33)   (1.67)  (23.33)   (1.67)  (23.33)   (1.67)  (23.33)
p-value                     0        0   0.0011        0   0.0919   0.0000        0        0
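The expected counts in Table 1 can be approximately reproduced with the sketch below. The angular widths are an assumption: they are inferred from the expected percentages (1.67% of 360° is 6°, and 23.33% of 360° is 84°) and are not stated in the caption.

```python
# Angular width (degrees) assumed for each of the 8 mechanisms; inferred from
# the expected percentages in Table 1, not given explicitly in the caption.
widths = [6, 84, 6, 84, 6, 84, 6, 84]

# Observed totals per mechanism, copied from Table 1.
observed = [843, 7462, 548, 8509, 514, 7131, 680, 2938]

total = sum(observed)
# Under a homogeneous distribution of angles, each mechanism receives a share
# of the total proportional to its angular width.
expected = [total * w / 360 for w in widths]

for mech, (obs, exp) in enumerate(zip(observed, expected), start=1):
    print(f"Mechanism {mech}: observed {obs}, expected {exp:.1f}")
```

The narrow segments come out at about 477, as in the table. The wide segments come out at about 6679, one count above the tabulated 6678, which is the value obtained by multiplying the total by the rounded percentage 23.33%.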
[Figure: three panels comparing glycolysis and oxphos transcripts. (a) Time - v_S plots (v_S, molecules/cell/min, against time, min). (b) Average mRNA levels (mRNA, molecules/cell, against time, min). (c) Degradation/Transcription (D/T) ratios against time (min).]
Table 2. The five mechanisms (or modules) of the changes (Δ) in transcript abundances (X) in cells due to transcript synthesis (ΔX_S) and transcript degradation (ΔX_D): ΔX = ΔX_S - ΔX_D

ΔX    ΔX_S   ΔX_D             ΔX_S/ΔX_D                      Module (or Mechanism)
+     +      ΔX_D < ΔX_S      < 1                            A
+     0      ΔX_D < 0         (Physically Not Allowed)       (Physically Not Allowed)
0     +      ΔX_D = ΔX_S      = 1                            B
0     0      ΔX_D = 0         (Mathematically Not Allowed)   C
-     +      ΔX_D > ΔX_S      > 1                            D
-     0      ΔX_D > 0         0                              E
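The module assignment in Table 2 can be written as a small decision function. This is a sketch under two assumptions that go beyond the table: negative synthesis is rejected along with negative degradation, and the inputs are exact numbers, so that the "= 0" tests are meaningful.

```python
def module(delta_xs, delta_xd):
    """Return the Table 2 module (A to E) for a synthesis/degradation pair.

    delta_xs: amount of transcript synthesized in the interval (>= 0).
    delta_xd: amount of transcript degraded in the interval (>= 0).
    """
    if delta_xs < 0 or delta_xd < 0:
        raise ValueError("negative synthesis or degradation is not physically allowed")
    delta_x = delta_xs - delta_xd
    if delta_x > 0:
        return "A"                           # level rises
    if delta_x == 0:
        return "B" if delta_xs > 0 else "C"  # turnover without change, or no turnover
    return "D" if delta_xs > 0 else "E"      # level falls, with or without synthesis


for pair in [(5, 2), (3, 3), (0, 0), (2, 5), (0, 4)]:
    print(pair, module(*pair))
```

The function uses only the signs of ΔX and ΔX_S, so it does not depend on how the ratio column of the table is read.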
References:
[1] Ji, S. (2004). Molecular Information Theory: Solving the Mysteries of DNA. In:
Modeling in Molecular Biology (G. Ciobanu, ed.), Elsevier (in press).
[2] Watson, S. J., and Akil, H. (1999). Gene Chips and Arrays Revealed: A Primer on
Their Power and Their Uses. Biol. Psychiatry 45:533-543.