Analysis of time-course
gene expression data
Shyamal D. Peddada
Biostatistics Branch
National Institute of Environmental
Health Sciences (NIH)
Research Triangle Park, NC
Outline of the talk
• Some objectives for performing “long series” time-course experiments
A. Single cell-cycle experiment
   – A nonlinear regression model
   – Phase angle of a cell-cycle gene
   – Inference
   – Open research problems
B. Multiple cell-cycle experiments
   – “Coherence” between multiple cell-cycle experiments
   – Illustration
   – Open research problems
Objectives
Some genes play an important role during the cell division cycle. They are known as “cell-cycle genes”.
Objectives: Investigate various characteristics of
cell-cycle and/or circadian genes such as:
– Amplitude of initial expression
– Period
– Phase angle of expression (angle of maximum
expression for a cell cycle gene)
Phases in cell division cycle
A brief description
• G1 phase ("Gap 1"): For many cells, this phase is the major period of cell growth during their lifespan.
• S ("Synthesis") phase: DNA replication occurs.
• G2 phase ("Gap 2"): Cells prepare for M phase. The G2 checkpoint prevents cells from entering mitosis when DNA has been damaged since the last division, providing an opportunity for DNA repair and stopping the proliferation of damaged cells.
• M ("Mitosis") phase: Nuclear division (chromosomes separate) and cytoplasmic division (cytokinesis) occur. Mitosis is further divided into 4 phases.
Single, long series experiment …
Whitfield et al.
(Molecular Biology of the Cell, 2002)
The basic design is as follows:
• Experimental units: human cancer cells (HeLa)
• Microarray platform: cDNA chips with approximately 43,000 probes (i.e., roughly 29,000 genes)
• 3 different patterns of time points (i.e., 3 different experiments)
One of the goals of these experiments was to identify periodically expressed genes.
Experiment 1 (26 time points)
HeLa cancer cells arrested in S phase using a double thymidine block.
Sampling times after arrest (hrs):
– 0 1 2 3 4 5 6 7 8 9 10 11 12 14 15 16 18 20 22 24 26 28 32 36 40 44
Experiment 2 (47 time points)
HeLa cancer cells arrested in S phase using a double thymidine block.
Sampling times after arrest (hrs):
– every hour from 0 to 46
Experiment 3 (19 time points)
HeLa cancer cells arrested in M phase using thymidine followed by nocodazole.
Sampling times after arrest (hrs):
– 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36
Phase marker genes (Whitfield et al., 2002):
Cell-cycle phase   Genes
----------------   ---------------------------
G1/S               CCNE1, CDC6, PCNA, E2F1
S                  RFC4, RRM2
G2                 CDC2, TOP2A, CCNA2, CCNF
G2/M               STK15, CCNB1, PLK, BUB1
M/G1               VEGFC, PTTG1, CDKN3, RAD21
Questions
• Can we describe the expression of a cell-cycle gene as a function of time?
• Can we determine the phase angle of a given cell-cycle gene, i.e., can we quantify the previous table in terms of angles on a circle?
• What is the period of expression of a given gene?
• Can we test the hypothesis that all cell-cycle genes share the same period?
• Etc.
[Figure: Profile of PCNA based on Experiment 2 data]
Some important observations
1. Gene expression has a sinusoidal shape.
2. The expression of a given gene is an average of mRNA levels across a large number of cells.
3. The duration of the cell cycle varies stochastically across cells.
4. Initially the cells are synchronized, but over time they fall out of synchrony.
5. The expression of a cell-cycle gene is therefore expected to “decrease/decay” over time, because of items 2 and 4 above.
Random Periods Model
(PNAS, 2004)
$$f(t) = a + bt + \frac{K}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos\left( \frac{2\pi t}{T \exp(\sigma z)} + \phi \right) \exp\left( -\frac{z^2}{2} \right) dz$$
• a and b: background drift parameters
• K: the initial amplitude
• T: the average period
• σ: the attenuation parameter
• φ: the phase angle
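The integral averages the cosine over cell-specific periods T·exp(σz), with z weighted by a standard normal density, so σ controls how quickly the averaged signal loses synchrony (observations 3 to 5 above). Below is a minimal numerical sketch, assuming a trapezoidal rule on the truncated range z ∈ [−8, 8]; the function name `rpm_mean` and the quadrature choice are ours, not part of the published model.

```python
import math


def rpm_mean(t: float, a: float, b: float, K: float, T: float,
             sigma: float, phi: float, z_max: float = 8.0, n: int = 2001) -> float:
    """Evaluate the random periods model f(t) numerically (hypothetical helper).

    The z-integral is truncated to [-z_max, z_max] and approximated with the
    trapezoidal rule on n equally spaced points.
    """
    h = 2.0 * z_max / (n - 1)
    total = 0.0
    for i in range(n):
        z = -z_max + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0     # trapezoid end-point weights
        period = T * math.exp(sigma * z)             # log-normal cell-specific period
        total += weight * math.cos(2.0 * math.pi * t / period + phi) * math.exp(-0.5 * z * z)
    return a + b * t + K / math.sqrt(2.0 * math.pi) * total * h


if __name__ == "__main__":
    # No attenuation: a plain cosine with period T.
    print(rpm_mean(6.0, a=0.0, b=0.0, K=1.0, T=24.0, sigma=0.0, phi=0.0))
    # With sigma > 0 the averaged oscillation is damped after many cycles.
    print(rpm_mean(240.0, a=0.0, b=0.0, K=1.0, T=24.0, sigma=0.2, phi=0.0))
```

With σ = 0 every cell has period T and f(t) reduces to a + bt + K cos(2πt/T + φ), which is a convenient check of the quadrature. At t = 0 the value a + K cos φ does not depend on σ.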
[Figure: Fitted curves for some phase marker genes]
Phase marker genes (Whitfield et al., 2002):
Phase   Genes                        Phase angles (radians)
------  ---------------------------  ----------------------
G1/S    CCNE1, CDC6, PCNA, E2F1      0.56, 5.96, 5.87, 5.83
S       RFC4, RRM2                   5.47, 5.36
G2      CDC2, TOP2A, CCNA2, CCNF     4.24, 3.74, 3.55, 3.25
G2/M    STK15, CCNB1, PLK, BUB1      3.06, 2.67, 2.61, 2.51
M/G1    VEGFC, PTTG1, CDKN3, RAD21   2.66, 2.40, 2.25, 1.81
A hypothesis of biological interest
Do all cell-cycle genes share the same T and the same σ, while the other 4 parameters are gene specific? That is,
$$H_0:\; T_g = T,\quad \sigma_g = \sigma \quad \text{for all genes } g.$$
An Important Feature
• Correlated data
  – Temporal correlation within genes
  – Gene-to-gene correlations
Test Statistic
• Wald statistic for heteroscedastic linear and nonlinear models
– Zhang, Peddada and Rogol (2000)
– Shao (1992)
– Wu (1986)
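For a linear hypothesis H0: Rθ = r about a parameter vector θ (for H0 above, R could stack the contrasts T_g − T_{g+1} and σ_g − σ_{g+1}), the Wald statistic has the generic form W = (Rθ̂ − r)′(R V̂ R′)⁻¹(Rθ̂ − r), where V̂ estimates the covariance of θ̂. The sketch below only evaluates this quadratic form. It assumes V̂ is supplied and does not reproduce the heteroscedasticity-robust covariance estimators of the cited papers; `wald_statistic` and `_solve` are hypothetical names.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def _solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [float(b[i])] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("R V R' is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def wald_statistic(theta_hat: Sequence[float], cov: Matrix,
                   R: Matrix, r: Sequence[float]) -> float:
    """W = (R theta_hat - r)' (R cov R')^{-1} (R theta_hat - r)."""
    p, q = len(theta_hat), len(R)
    d = [sum(R[i][j] * theta_hat[j] for j in range(p)) - r[i] for i in range(q)]
    rv = [[sum(R[i][k] * cov[k][j] for k in range(p)) for j in range(p)] for i in range(q)]
    rvr = [[sum(rv[i][k] * R[j][k] for k in range(p)) for j in range(q)] for i in range(q)]
    return sum(di * xi for di, xi in zip(d, _solve(rvr, d)))


# Illustrative (made-up) numbers: equal periods for two genes, H0: T_1 - T_2 = 0.
W = wald_statistic([24.0, 22.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, -1.0]], [0.0])
```

Because the data are correlated, W is not referred to its asymptotic χ² distribution here; its null distribution is obtained by resampling.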
The Null Distribution
• Due to the underlying correlation structure:
  – The asymptotic χ² approximation is not appropriate.
  – Instead, use the moving-blocks bootstrap technique (Künsch, 1989) on the residuals of the nonlinear model.
Moving-blocks Bootstrap
• Step 1: Fit the null model to the data and compute the residuals.
• Step 2: Draw a simple random sample (with replacement) from all possible blocks of consecutive residuals of a specific size, and concatenate the sampled blocks into a residual series of the original length.
• Step 3: Add these residuals to the fitted curve under the null hypothesis to obtain the bootstrap data set.
• Step 4: Using the bootstrap data, fit the model under the alternative hypothesis and compute the Wald statistic.
• Step 5: Repeat Steps 2 to 4 a large number of times.
• Step 6: The bootstrap p-value is the proportion of the above Wald statistics that exceed the Wald statistic computed from the actual data.
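Steps 2, 3 and 6 can be sketched as follows. This is a minimal illustration that assumes a single residual series, overlapping blocks (all n − l + 1 start positions are allowed), and a user-supplied function for Step 4; `wald_from_data` is a hypothetical placeholder for "refit the alternative model and return its Wald statistic", not a real API.

```python
import random
from typing import Callable, List, Sequence


def moving_block_resample(residuals: Sequence[float], block_len: int,
                          rng: random.Random) -> List[float]:
    """Step 2: concatenate blocks of consecutive residuals drawn with replacement."""
    n = len(residuals)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(residuals)")
    n_blocks = n - block_len + 1          # every overlapping block of length block_len
    out: List[float] = []
    while len(out) < n:
        start = rng.randrange(n_blocks)   # simple random sampling with replacement
        out.extend(residuals[start:start + block_len])
    return out[:n]                        # trim to the original series length


def moving_blocks_pvalue(fitted_null: Sequence[float], residuals: Sequence[float],
                         wald_from_data: Callable[[List[float]], float],
                         observed: float, block_len: int,
                         n_boot: int = 1000, seed: int = 0) -> float:
    """Steps 3 to 6: share of bootstrap Wald statistics exceeding the observed one."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_boot):
        e_star = moving_block_resample(residuals, block_len, rng)
        y_star = [f + e for f, e in zip(fitted_null, e_star)]  # Step 3
        if wald_from_data(y_star) > observed:                    # Step 4 (user-supplied), Step 6
            exceed += 1
    return exceed / n_boot
```

The block length trades bias against variance and has to be chosen by the analyst; the slides only say "a specific size".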
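Analysis of experiment 2
• The bootstrap p-value for testing
$$H_0:\; T_g = T,\quad \sigma_g = \sigma \quad \text{for all genes } g$$
using the Experiment 2 data of Whitfield et al. (2002) is 0.12.
• The data therefore give no evidence against a common period T and attenuation σ across genes, which supports the biological plausibility of the model.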
Multiple experiments
Statistical inferences on the phase angle
• Some questions of interest
  – How can we evaluate or combine results from multiple cell division cycle experiments?
  – Are the results “consistent” across experiments?
  – How can we evaluate this? What could be a possible criterion?
Data
$\hat\phi_{g,i}$: RPM estimate of the phase angle of cell-cycle gene g from the i-th experiment.
Representation using a circle
Consider 4 cell-cycle genes A, B, C, D. The vertical line in the circle denotes the reference line, and angles are measured counter-clockwise from it. Thus the sequential order of expression in this example is A, B, D, C.
[Figure: genes A, B, D, C on a circle, in counter-clockwise order from the reference line]
“Coherence” in multiple cell-cycle experiments
• A group of cell-cycle genes is said to be coherent across experiments if the sequential order of their phase angles is preserved across experiments.
[Figure: phase angles of genes A, B, C, D in Exp 1, Exp 2 and Exp 3]
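A minimal sketch of the coherence definition for error-free angles: genes are ordered by their angle measured counter-clockwise from the reference line, and two experiments agree if one order is a cyclic shift of the other. Treating cyclic shifts as "the same sequential order" is our assumption, motivated by the circles being rotated relative to one another; all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def sequential_order(phases: Dict[str, float]) -> List[str]:
    """Genes sorted by phase angle, measured counter-clockwise from the reference line."""
    return sorted(phases, key=lambda g: phases[g] % (2.0 * math.pi))


def same_cyclic_order(order_1: Sequence[str], order_2: Sequence[str]) -> bool:
    """True if order_2 is a cyclic shift of order_1 (same genes, same circular sequence)."""
    if sorted(order_1) != sorted(order_2):
        return False
    n = len(order_1)
    if n == 0:
        return True
    k = list(order_2).index(order_1[0])
    return all(order_1[i] == order_2[(k + i) % n] for i in range(n))


def is_coherent(experiments: Sequence[Dict[str, float]]) -> bool:
    """Perfect coherence: every experiment shows the first experiment's circular order."""
    reference = sequential_order(experiments[0])
    return all(same_cyclic_order(reference, sequential_order(e)) for e in experiments[1:])


exp_1 = {"A": 0.5, "B": 1.5, "D": 3.0, "C": 4.0}  # illustrative angles (radians)
print(sequential_order(exp_1))  # ['A', 'B', 'D', 'C']
```

Rotating every angle of `exp_1` by the same amount leaves `is_coherent` true, while swapping the angles of B and D makes it false.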
Geometric Representation
• We represent phase angles from multiple cell-cycle experiments using concentric circles.
• Each circle represents an experiment.
• The same gene in a pair of experiments is connected by a line segment.
  – A figure with non-intersecting lines indicates perfect coherence.
  – If there is no coherence at all, there will be many intersecting lines.
[Figure: Example: Perfectly coherent]
[Figure: Example: Perfectly coherent]
[Figure: Example: No coherence]
Estimated Phase Angles
• Due to statistical errors in estimation, the estimated phase angles from multiple cell-cycle experiments need not preserve the sequential order, even when the true phase angles are in sequential order.
How to evaluate coherence?
Some background on regression for circular data
[Figure: estimated phase angles $\hat\phi_{1,1}, \hat\phi_{2,1}, \hat\phi_{3,1}$ (Experiment A) and $\hat\phi_{1,2}, \hat\phi_{2,2}, \hat\phi_{3,2}$ (Experiment B)]
Question: Can we determine a rotation matrix A such that rotating the circle representing Experiment A yields the circle representing Experiment B?
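Angle of rotation for a rigid body
• Yes, by solving the following minimization problem:
$$\min_{A \in \mathcal{S}} \sum_{g=1}^{n} \left\| \hat{\boldsymbol\phi}_{g,2} - A\,\hat{\boldsymbol\phi}_{g,1} \right\|^2, \qquad A = \begin{pmatrix} \cos\hat\nu_{v|u} & \sin\hat\nu_{v|u} \\ -\sin\hat\nu_{v|u} & \cos\hat\nu_{v|u} \end{pmatrix},$$
where $\hat{\boldsymbol\phi}_{g,i} = (\cos\hat\phi_{g,i}, \sin\hat\phi_{g,i})'$ is the phase angle of gene g in experiment i written as a unit vector, $\mathcal{S}$ is the set of 2 × 2 rotation matrices, and $\hat\nu_{v|u}$ is the estimated angle of rotation from experiment u to experiment v (here u = 1, v = 2).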
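For unit vectors each term expands to ‖x̂_{g,2} − A x̂_{g,1}‖² = 2 − 2 cos(φ̂_{g,2} − φ̂_{g,1} + ν), because the matrix above turns the angle φ̂_{g,1} into φ̂_{g,1} − ν. The minimizer is therefore ν̂ = −atan2(Σ sin(φ̂_{g,2} − φ̂_{g,1}), Σ cos(φ̂_{g,2} − φ̂_{g,1})) modulo 2π. The sketch below, with hypothetical function names, implements this closed form next to the objective so the two can be checked against each other.

```python
import math
from typing import List, Sequence


def rotation_matrix(nu: float) -> List[List[float]]:
    """Rotation matrix in the form used above: [[cos, sin], [-sin, cos]]."""
    return [[math.cos(nu), math.sin(nu)], [-math.sin(nu), math.cos(nu)]]


def objective(phi_1: Sequence[float], phi_2: Sequence[float], nu: float) -> float:
    """Sum over genes of ||x_{g,2} - A(nu) x_{g,1}||^2 for unit vectors x."""
    A = rotation_matrix(nu)
    total = 0.0
    for a, b in zip(phi_1, phi_2):
        x = (math.cos(a), math.sin(a))
        y = (math.cos(b), math.sin(b))
        ax = (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])
        total += (y[0] - ax[0]) ** 2 + (y[1] - ax[1]) ** 2
    return total


def fit_rotation(phi_1: Sequence[float], phi_2: Sequence[float]) -> float:
    """Closed-form least-squares rotation angle, reported in [0, 2*pi)."""
    s = sum(math.sin(b - a) for a, b in zip(phi_1, phi_2))
    c = sum(math.cos(b - a) for a, b in zip(phi_1, phi_2))
    return (-math.atan2(s, c)) % (2.0 * math.pi)
```

If experiment 2 is an exact copy of experiment 1 rotated by this matrix, `fit_rotation` recovers the angle and `objective` is zero there.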
Determination of Coherence
Across “k” Experiments
The Basic Idea
• Consider a rigid body rotating in a plane, and suppose the body is perfectly rigid, with no deformations.
• Let $A_{i,i+1}$ denote the 2 × 2 rotation matrix from experiment i to experiment i+1 (with k+1 ≡ 1). Then
$$A_{12}A_{23}A_{34}\cdots A_{k-1,k} = A_{1k}.$$
Alternatively,
$$A_{12}A_{23}A_{34}\cdots A_{k-1,k}A_{1k}' = I, \quad\text{i.e.,}\quad A_{12}A_{23}A_{34}\cdots A_{k-1,k}A_{k1} = I.$$
• Equivalently, if
$$A_{i,i+1} = \begin{pmatrix} \cos\hat\nu_{i+1|i} & \sin\hat\nu_{i+1|i} \\ -\sin\hat\nu_{i+1|i} & \cos\hat\nu_{i+1|i} \end{pmatrix},$$
then under perfect rigid-body motion we should have
$$\cos\left( \sum_{i=1}^{k} \hat\nu_{i+1|i} \right) = 1.$$
Problem!
• In the present context we do NOT necessarily have a rigid body!
  – Not all experiments are performed with the same precision.
  – The time axis may not be the same across experiments.
  – The number of time points may differ across experiments.
  – Etc.
[Figure: Example: Not a rigid motion but perfectly coherent]
Consequence
• A rotation matrix A alone may not be enough to bring two circles into congruence!
• An additional “association/scaling” parameter may be needed, as seen in the previous figure!
Circular-Circular regression model for a pair of experiments
(Downs and Mardia, 2002)
• For g = 1, 2, ..., G, let $(\hat\phi_{g,1}, \hat\phi_{g,2})$ denote a pair of angular variables.
• Suppose $\hat\phi_{g,2} \mid \hat\phi_{g,1}$ is von Mises distributed with mean direction $\mu_g$ and concentration parameter $\kappa$.
Circular-Circular Regression Model
(Downs and Mardia, 2002)
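The regression model is given by the link function
$$\tan\left( \frac{\mu_g - \beta_{2|1}}{2} \right) = \omega_{2|1} \tan\left( \frac{\hat\phi_{g,1} - \alpha_{2|1}}{2} \right),$$
where
• $\nu_{2|1} = \beta_{2|1} - \alpha_{2|1}$ is the angle of rotation,
• $\omega_{2|1}$ is the “association parameter”,
• $0 \le \omega_{2|1} \le 1$ and $-\pi \le \nu_{2|1} \le \pi$.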
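Solving the link function for μ_g gives μ_g = β_{2|1} + 2 arctan{ω_{2|1} tan((φ̂_{g,1} − α_{2|1})/2)}. The sketch below evaluates it with `atan2`, which yields the same angle modulo 2π but avoids the singularity of tan at φ̂_{g,1} − α_{2|1} = π. With ω_{2|1} = 1 the map is a pure rotation by ν_{2|1} = β_{2|1} − α_{2|1}; with ω_{2|1} = 0 every gene is mapped to β_{2|1}. For fixed κ, the von Mises log-likelihood in the location parameters is proportional to Σ_g cos(φ̂_{g,2} − μ_g), so that sum is included as a fitting criterion; the function names are hypothetical and no optimizer is included.

```python
import math
from typing import Sequence


def dm_mean_direction(phi_1: float, alpha: float, beta: float, omega: float) -> float:
    """Mean direction mu solving tan((mu - beta)/2) = omega * tan((phi_1 - alpha)/2).

    atan2 picks the branch continuously, so phi_1 - alpha = pi is handled.
    The result is reported in [0, 2*pi).
    """
    half = (phi_1 - alpha) / 2.0
    return (beta + 2.0 * math.atan2(omega * math.sin(half), math.cos(half))) % (2.0 * math.pi)


def location_criterion(phi_1: Sequence[float], phi_2: Sequence[float],
                       alpha: float, beta: float, omega: float) -> float:
    """Sum of cos(phi_2 - mu); for fixed kappa, larger means higher von Mises likelihood."""
    return sum(math.cos(b - dm_mean_direction(a, alpha, beta, omega))
               for a, b in zip(phi_1, phi_2))
```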
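Back to the toy examples
$(\hat\omega_{B|A}, \hat\omega_{C|B}, \hat\omega_{A|C}) = (1, 1, 1)$, $\quad |\hat\nu_{B|A} + \hat\nu_{C|B} + \hat\nu_{A|C}| = 0$
$(\hat\omega_{B|A}, \hat\omega_{C|B}, \hat\omega_{A|C}) = (0.64, 0.34, 0.20)$, $\quad |\hat\nu_{B|A} + \hat\nu_{C|B} + \hat\nu_{A|C}| = 0$
$(\hat\omega_{B|A}, \hat\omega_{C|B}, \hat\omega_{A|C}) = (0, 0, 0)$, $\quad |\hat\nu_{B|A} + \hat\nu_{C|B} + \hat\nu_{A|C}| = 2.2$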
Determination Of Coherence
• Suppose we have K experiments, labeled 1, 2, 3, …, K. Let $\hat\nu_{i|j}$ denote the angle of rotation for the regression of experiment i on experiment j for a group of G genes.
• Compute $\left| \sum_{i=1}^{K} \hat\nu_{i+1|i} \right|$, where K + 1 ≡ 1.
• We expect $\left| \sum_{i=1}^{K} \hat\nu_{i+1|i} \right|$ under no coherence to be “stochastically” larger than $\left| \sum_{i=1}^{K} \hat\nu_{i+1|i} \right|$ under coherence.
[Figure: Comparison of cumulative distribution functions. Blue line: coherence; pink line: no coherence]
• For the given data, compute $c = \left| \sum_{i=1}^{K} \hat\nu_{i+1|i} \right|$.
• Generate the bootstrap distribution of $\left| \sum_{i=1}^{K} \hat\nu_{i+1|i} \right|$ under the null hypothesis of no coherence.
Bootstrap P-value For Coherence
Let $\hat\nu^{*}_{i+1|i}$ denote the angle of rotation computed from the bootstrap sample. Then the P-value is
$$P\left( \left| \sum_{i=1}^{K} \hat\nu^{*}_{i+1|i} \right| \le c \right).$$
Illustration: Whitfield et al. data
• There are 3 experiments. The phase angle of each gene was estimated using the Liu et al. (2004) model.
• A total of 47 cell-cycle genes common to the three experiments were selected.
Estimates
• The estimated values of interest are
$(\hat\omega_{2|1}, \hat\omega_{3|2}, \hat\omega_{1|3}) = (0.67, 0.70, 0.64)$,
$(\hat\nu_{B|A}, \hat\nu_{C|B}, \hat\nu_{A|C}) = (0.5, -3.03, 2.59)$
• Note that
$|\hat\nu_{2|1} + \hat\nu_{3|2} + \hat\nu_{1|3}| = 0.06$ radians
$P(|\hat\nu^{*}_{2|1} + \hat\nu^{*}_{3|2} + \hat\nu^{*}_{1|3}| \le 0.06) = 0.029$
Conclusion
• Since the bootstrap P-value (0.029) is below 0.05, we reject the null hypothesis of no coherence and conclude that the three experiments are coherent.
Accession Gene Symbol    Phase (rad)          Res (rad)                  Dispersion (rad)
                         A      B      C      B - B|A  C - C|B  A - A|C  Cir_dist
---------------------------------------------------------------------------------
AA135809  EST            0.882  0.040  3.399  -0.29    0.66     -0.10    0.04
W93120    EST            0.260  0.427  2.580  0.52     -0.58    0.53     0.21
T54121    CCNE1*         1.191  0.559  2.661  0.02     -0.65    1.35     0.33
AA131908  FLJ10540       3.534  2.220  6.186  -0.65    0.65     -0.08    0.25
AA088457  EST            2.613  2.373  5.700  0.66     -0.02    -0.68    0.08
AA464019  E2-EPF         3.478  2.464  5.798  -0.33    -0.02    0.12     0.07
AA430092  BUB1           3.566  2.510  6.132  -0.41    0.26     -0.01    0.11
AA425404  FLJ10156       3.508  2.519  6.241  -0.32    0.36     -0.14    0.12
H73329    C20orf1        3.494  2.594  5.873  -0.22    -0.09    0.08     0.04
AA629262  PLK            3.314  2.613  5.888  0.05     -0.10    -0.11    0.02
AA157499  MAPK13         3.390  2.615  5.784  -0.05    -0.20    0.04     0.02
AA282935  MPHOSPH1       3.826  2.667  6.233  -0.64    0.19     0.18     0.12
AA053556  MKI67          3.600  2.731  5.665  -0.24    -0.44    0.33     0.06
AA279990  TACC3          3.804  2.810  0.275  -0.46    0.37     -0.05    0.13
AA402431  CENPE          3.556  2.892  5.939  -0.01    -0.33    0.10     0.01
R11407    STK15          3.484  2.940  5.869  0.14     -0.44    0.08     0.01
AA598776  CDC20          3.355  2.957  5.854  0.34     -0.47    -0.04    0.00
AA262211  KIAA0008       3.457  2.989  5.918  0.23     -0.44    0.02     0.00
AA421171  NUF2R          3.785  3.000  5.679  -0.24    -0.69    0.50     0.10
AA010065  CKS2           3.341  3.030  5.826  0.43     -0.57    -0.04    0.02
AA292964  CKS2           3.312  3.037  5.980  0.48     -0.42    -0.17    0.01
AA430511  FLJ14642       4.170  3.244  1.653  -0.57    1.35     -0.74    0.70
AA430511  FLJ14642       4.170  3.244  1.474  -0.57    1.17     -0.57    0.55
AA676797  CCNF           4.024  3.249  1.170  -0.35    0.86     -0.46    0.36
AA458994  PMSCL1         0.841  3.387  0.298  -0.15    -0.13    0.12     0.01
AA235662  FLJ14642       3.653  3.396  1.278  0.35     0.85     -0.92    0.51
N63744    FLJ10468       3.864  3.511  0.637  0.15     0.11     -0.23    0.07
AA620485  ANKT           3.709  3.531  0.923  0.40     0.38     -0.59    0.24
AA608568  CCNA2          3.857  3.541  6.133  0.19     -0.70    0.28     0.05
R96941    C20orf129      3.751  3.546  0.667  0.36     0.11     -0.36    0.11
AA504625  KNSL1          4.107  3.551  0.410  -0.17    -0.15    0.17     0.00
AI053446  EST            4.348  3.612  1.256  -0.45    0.65     -0.21    0.21
R22949    EST            4.164  3.631  0.161  -0.17    -0.46    0.39     0.02
AA452513  KNSL5          3.915  3.730  0.192  0.29     -0.50    0.12     0.03
T66935    DKFZp762E1312  4.193  3.884  0.800  0.04     -0.01    -0.01    0.03
AA099033  USP1*          5.000  4.760  2.876  -0.12    1.43     -1.45    0.71
AA485454  EST            4.886  5.086  0.891  0.33     -0.79    0.61     0.24
AA485454  EST            4.275  5.086  0.891  1.12     -0.79    0.00     0.44
AA485454  EST            4.886  5.235  0.891  0.48     -0.90    0.61     0.32
AA485454  EST*           4.275  5.235  0.891  1.27     -0.90    0.00     0.55
AA620553  FEN1           5.897  5.510  3.028  -0.21    1.02     -0.79    0.23
AA425120  CHAF1B         5.697  5.714  1.685  0.16     -0.49    0.76     0.16
N57722    MCM6           0.047  5.817  2.568  -0.23    0.31     0.34     0.00
AA450264  PCNA           0.195  5.858  2.438  -0.29    0.14     0.67     0.02
H51719    ORC1L          5.906  5.917  2.889  0.19     0.54     -0.57    0.14
H59203    CDC6           0.551  5.968  2.723  -0.43    0.33     0.61     0.04
R06900    RAMP           0.243  6.049  2.889  -0.13    0.42     0.06     0.00
Statistical inferences on the phase angle: some open problems
• Estimation subject to inequality constraints
  – It is reasonable to hypothesize that, for a normal cell division cycle, the p phase marker genes must be expressed in an order around the unit circle. Thus they must satisfy
$$0 \le \phi_1 \le \phi_2 \le \cdots \le \phi_p \le 2\pi.$$
Open problems: data from a single experiment
• How do we estimate the phase angles subject to the simple order restriction
$$0 \le \phi_1 \le \phi_2 \le \cdots \le \phi_p \le 2\pi?$$
• More generally, how do we estimate the phase angles subject to the isotropic simple order restriction
$$\phi_1 \le \phi_2 \le \cdots \le \phi_p?$$
• How do we test the above hypothesis? What are the null and alternative hypotheses?
Open problems: data from multiple experiments
• How do we estimate the phase angles from multiple experiments under the order restriction on the phase angles of cell-cycle genes?
• What are the statistical errors associated with such an estimator?
• How do we construct confidence intervals and test hypotheses?
Acknowledgments
• Delong Liu (former postdoc at NIEHS)
• David Umbach (NIEHS)
• Leping Li (NIEHS)
• Clare Weinberg (NIEHS)
• Pat Crockett (Constella Group)
• Cristina Rueda (Univ. of Valladolid, Spain)
• Miguel Fernandez (Univ. of Valladolid, Spain)