Wavelet-based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis Jeffrey S. Morris

M.D. Anderson Cancer Center
Joint work with Marina Vannucci, Philip J. Brown,
and Raymond J. Carroll
1. Introduction
In this work, we have developed a new method to model hierarchical
functional data, which is a special type of correlated longitudinal data.
• Modeling of the longitudinal profiles is done nonparametrically, with
regularization done via wavelet shrinkage.
• We model all the data simultaneously using a single hierarchical
model, an approach that naturally adjusts for imbalance in the design
and accounts for the correlation structure among the individual profiles.
• Our method yields estimates and posterior samples for mean as well
as individual-level profiles, plus interprofile covariance parameters.
• This work was motivated by and applied to a data set from a colon
carcinogenesis study.
2. Biological Background of Application:
Cellular Structure of the Colon
• Colon cells grow within crypts,
fingerlike structures that grow into the
wall of the colon.
• Individual colon cells are generated
from stem cells at the bottom of the
crypt, and move up the crypt as they
mature.
• This means a cell’s relative depth
within the crypt is related to its age
and stage in the cell cycle, and as a
result it is important to model
biological processes in the colon as
functions of relative cell position.
• Define t = relative cell position,
t ∈ (0,1)
0 = bottom of crypt,
1 = top of crypt
3. The Colon Carcinogenesis Study
Previous studies have demonstrated that diets high in fish oil fats are
protective against colon cancer compared with corn oil diets, but the
biological mechanisms behind this effect are not known.
One hypothesis is that the body is not effectively repairing damage to cells’
DNA caused by exposure to a carcinogen. This hypothesis was investigated
in the following rat colon carcinogenesis experiment.
The Data:
•Rats were fed one of two diets (fish/corn oil), exposed to carcinogen, then
euthanized at one of 5 times after exposure (0,3,6,9,12 hr). Their colons were
removed and stained to quantify the levels of a DNA repair enzyme.
– 3 rats in each of the 10 diet × time groups.
– 25 crypts sampled from each rat
– Response quantified on equally spaced grid (~300) along left side of crypt
• Hierarchical functional data: Longitudinal profile for each crypt, with crypts
nested within rats, and rats further nested within treatment.
4. The Colon Carcinogenesis Study
Some questions of interest from these data:
1. Do fish oil rats have more DNA repair enzyme than corn oil rats? Does this relationship depend on time since carcinogen exposure and/or depth within the crypt?
2. Does DNA repair enzyme activity vary w.r.t. depth within crypt?
3. Where is the majority of variability in DNA repair expression: between rats, between crypts, or within crypts?
4. Are there any rats and/or crypts with unusual DNA repair profiles?
We set out to answer questions like these using a single unified model for all
the data.
5. Hierarchical Functional Model
• Following is a model we can use to think about our data:
MODEL (1):
\[ Y_{abc} = g_{abc}(t) + e_{abc} \]
\[ g_{abc}(t) = g_{ab}(t) + \eta_{abc}(t) \]
\[ g_{ab}(t) = g_{a}(t) + \xi_{ab}(t) \]
• Y_abc = response vector for crypt c of rat b in treatment group a, on grid t
• g_·(t) = true crypt-, rat-, or treatment-level profiles
• e_abc = measurement error (iid normals)
• η_abc(t), ξ_ab(t) = crypt/rat level error processes; mean 0 with covariances Σ_1(t1,t2) and Σ_2(t1,t2)
• If we were willing to place parametric assumptions on the g functions, model (1) would simplify to a standard mixed model.
[Figure: 6 of the 750 observed crypt profiles of DNA repair enzyme. Axes: t and MGT.]
• For our data, we need a nonparametric method that can handle spatially heterogeneous functions.
• In the single-function setting, wavelet regression is effective for modeling functions like these.
6. Overview of Wavelets
Wavelets are orthonormal basis functions that can be used to parsimoniously represent other functions.
[Figure: Daubechies Basis Function]
Useful Properties of Wavelets
1. Parsimonious representation: Many functions can be represented almost perfectly with just ~5% of the wavelet coefficients.
2. Fast Computation: O(n) algorithm (DWT) available to compute set of n wavelet coefficients from n data points (IDWT to do inverse transformation).
3. Linear Transformation: Switching between data and wavelet spaces involves linear transformation, which preserves hierarchical relationships between profiles in our model.
4. Whitening property: Wavelet coeffs. much less correlated than original data.
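The first three properties can be checked numerically on a toy example. The following is a minimal sketch using the Haar wavelet, chosen only because it is short to implement; it is not the Daubechies basis shown on the slide, and the names `haar_dwt` and `haar_idwt` are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """Orthonormal Haar DWT of a sequence whose length is a power of two.

    Output layout: [scaling coefficient, details from coarsest to finest scale].
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = [float(v) for v in x]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Finer details are computed first, so coarser ones are prepended.
        details = [(a - b) / SQRT2 for a, b in pairs] + details
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx + details

def haar_idwt(coeffs):
    """Inverse of haar_dwt."""
    approx = [coeffs[0]]
    pos = 1
    while len(approx) < len(coeffs):
        det = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        nxt = []
        for a, d in zip(approx, det):
            nxt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        approx = nxt
    return approx

# Toy profile: piecewise constant with a single jump.
x = [2.0] * 8 + [6.0] * 8
d = haar_dwt(x)

# 1. Parsimony: only a few coefficients are nonzero.
n_nonzero = sum(1 for v in d if abs(v) > 1e-9)

# 2. DWT followed by IDWT recovers the data.
max_recon_err = max(abs(a - b) for a, b in zip(haar_idwt(d), x))

# 3. Linearity: DWT(x + y) equals DWT(x) + DWT(y).
y = [float(i % 3) for i in range(16)]
lhs = haar_dwt([a + b for a, b in zip(x, y)])
rhs = [a + b for a, b in zip(haar_dwt(x), haar_dwt(y))]
max_lin_err = max(abs(a - b) for a, b in zip(lhs, rhs))

print(n_nonzero, max_recon_err, max_lin_err)
```

Linearity is the property the hierarchical model relies on: if a crypt profile is a rat profile plus a deviation in data space, the same additive relationship holds between their wavelet coefficients.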
7. Overview of Method
Implementation of our method fundamentally involves three steps:
1. Perform a DWT on the response vector for each crypt, Y_abc, to obtain the corresponding set of empirical wavelet coefficients, d_abc. The wavelet coefficients are double-indexed by j, corresponding to the scale, and k, the location.
2. Fit a Bayesian multilevel hierarchical model to these empirical wavelet coefficients, yielding posterior samples of the true wavelet coefficients which correspond to the treatment, rat, and crypt-level profiles, plus the covariance parameters at these levels.
- This model incorporates a shrinkage prior at the top level of the hierarchy that uses nonlinear shrinkage to perform smoothing, or regularization, in the nonparametric estimation of the profiles.
3. Posterior samples of any treatment, rat, or crypt-level profiles of interest can be obtained by applying IDWTs to the corresponding sets of posterior samples of the true wavelet coefficients. These samples can be used for estimation or Bayesian inference.
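To make the transform, shrink, invert pattern concrete, here is a heavily simplified sketch for a single noisy profile. It is not the hierarchical MCMC fit: it assumes a Haar wavelet, a known noise standard deviation `sigma`, and fixed hypothetical hyperparameters `p` and `tau2`, and it replaces posterior sampling with the closed-form posterior mean under a point-mass-plus-normal prior. All function and variable names are illustrative.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """Orthonormal Haar DWT; length of x must be a power of two."""
    approx = [float(v) for v in x]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / SQRT2 for a, b in pairs] + details
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx + details

def haar_idwt(coeffs):
    """Inverse of haar_dwt."""
    approx = [coeffs[0]]
    pos = 1
    while len(approx) < len(coeffs):
        det = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        nxt = []
        for a, d in zip(approx, det):
            nxt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        approx = nxt
    return approx

def shrink_factor(z, t2, p):
    """Posterior-mean shrinkage factor: T^2/(1+T^2) * Pr(nonzero | data)."""
    log_odds = (math.log(p / (1.0 - p)) - 0.5 * math.log(1.0 + t2)
                + 0.5 * z * z * t2 / (1.0 + t2))
    if log_odds >= 0:
        prob = 1.0 / (1.0 + math.exp(-log_odds))
    else:
        e = math.exp(log_odds)
        prob = e / (1.0 + e)
    return t2 / (1.0 + t2) * prob

def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

rng = random.Random(0)
truth = [0.0] * 16 + [4.0] * 16 + [1.0] * 32   # toy profile with two jumps
sigma = 1.0                                    # assumed known noise sd
noisy = [v + rng.gauss(0.0, sigma) for v in truth]

# Step 1: DWT of the observed profile.
d = haar_dwt(noisy)

# Step 2 (stand-in for the Bayesian fit): shrink each detail coefficient.
# The transform is orthonormal, so every coefficient has variance sigma^2.
p, tau2 = 0.1, 25.0                            # hypothetical hyperparameters
t2 = tau2 / sigma ** 2
shrunk = [d[0]] + [c * shrink_factor(c / sigma, t2, p) for c in d[1:]]

# Step 3: IDWT back to data space.
fit = haar_idwt(shrunk)

mse_noisy = mse(noisy, truth)
mse_fit = mse(fit, truth)
print(mse_noisy, mse_fit)
```

The scaling coefficient `d[0]` is left unshrunk here, a common convention in wavelet regression that this sketch adopts as an assumption.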
8. Wavelet Space Model
• Assuming the wavelet coefficients are independent, we specify a (scalar) hierarchical model for the empirical wavelet coefficients.
MODEL (2):
\[ d_{abc}^{j,k} \sim N(\theta_{abc}^{j,k}, \sigma_e^2) \]
\[ \theta_{abc}^{j,k} \sim N(\theta_{ab}^{j,k}, \sigma_{1,j}^2) \]
\[ \theta_{ab}^{j,k} \sim N(\theta_{a}^{j,k}, \sigma_{2,j}^2) \]
• θ_abc^{j,k}, θ_ab^{j,k}, θ_a^{j,k}: true wavelet coefficients at crypt, rat, trt levels
• σ²_{2,j}, σ²_{1,j}, σ²_e: variance components between rats, between crypts, and within crypts at wavelet scale j
Notes on Wavelet Space Model
1. All quantities in data space model (1) obtainable using IDWT on quantities in wavelet space model (2).
2. Although independence assumed in wavelet space, model (2) accommodates varying degrees of autocorrelation in data space.
3. Measurement error variance in wavelet space, σ²_e, the same as in data space. Plug-in estimate used, as in Donoho & Johnstone (1994).
4. For Bayesian inference, vague proper priors placed on variance components, and shrinkage prior placed on top level coeffs. θ_a^{j,k}.
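Independent wavelet coefficients with scale-specific variances can still induce autocorrelation in data space. The sketch below illustrates this with a Haar basis on 8 grid points, used only for brevity. The variance values are made up, and `data_space_cov` is an illustrative helper that computes the covariance of the inverse-transformed vector.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_idwt(coeffs):
    """Inverse orthonormal Haar DWT.

    Input layout: [scaling coefficient, details from coarsest to finest scale].
    """
    approx = [coeffs[0]]
    pos = 1
    while len(approx) < len(coeffs):
        det = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        nxt = []
        for a, d in zip(approx, det):
            nxt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        approx = nxt
    return approx

def data_space_cov(coef_vars):
    """Covariance of y = IDWT(d) when the entries of d are independent
    with variances coef_vars."""
    n = len(coef_vars)
    # basis[i] is the i-th basis vector: the IDWT of the i-th unit vector.
    basis = []
    for i in range(n):
        unit = [0.0] * n
        unit[i] = 1.0
        basis.append(haar_idwt(unit))
    return [[sum(coef_vars[i] * basis[i][s] * basis[i][t] for i in range(n))
             for t in range(n)] for s in range(n)]

# Same variance at every scale: uncorrelated (white) in data space.
cov_flat = data_space_cov([1.0] * 8)

# Larger variance at coarse scales: neighbouring grid points become correlated.
# Layout: [scaling, 1 coarsest detail, 2 mid-scale details, 4 finest details]
cov_smooth = data_space_cov([4.0, 4.0, 2.0, 2.0, 0.5, 0.5, 0.5, 0.5])
corr_01 = cov_smooth[0][1] / math.sqrt(cov_smooth[0][0] * cov_smooth[1][1])

print(cov_flat[0][1], cov_smooth[0][1], corr_01)
```

Letting the variance components depend on the scale j is what gives the wavelet-space model this flexibility; with a single shared variance the data-space covariance would be diagonal.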
9. Shrinkage Prior
One property of the wavelet transform is that it tends to distribute any noise
equally among all the wavelet coefficients. Given the parsimony property, this
means that a majority of the coefficients will be small, and consist almost
entirely of noise, while a small proportion will be large and contain primarily
signal.
In order to perform regularization, we would like to shrink the wavelet
coefficients towards zero in a nonlinear fashion so that, if it is large, we leave
it alone since we think it is signal. The smaller it is, the more we would like to
shrink it, since we then think it is noise. This type of nonlinear shrinkage
results in denoised, or regularized, estimators.
This nonlinear shrinkage is achieved in our model via a prior on θ_a^{j,k} that is a mixture of a point mass at zero and a normal, which we call a shrinkage prior.
Shrinkage Prior:
\[ \theta_{a}^{j,k} \sim N(0, \gamma_{a}^{j,k} \tau_j^2) \]
\[ \gamma_{a}^{j,k} \sim \mathrm{Bernoulli}(p_j) \]
p_j = expected proportion of ‘nonzero’ wavelet coefficients at wavelet scale j.
τ_j² = variance of ‘nonzero’ wavelet coefficients at wavelet scale j.
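Drawing from a point-mass-plus-normal prior shows the role of the two hyperparameters. This is a small sketch with hypothetical values for `p_j` and `tau2_j`; the function name `sample_shrinkage_prior` is illustrative.

```python
import random

def sample_shrinkage_prior(p_j, tau2_j, rng):
    """One draw (gamma, theta): gamma ~ Bernoulli(p_j), and theta is exactly 0
    when gamma = 0 or N(0, tau2_j) when gamma = 1."""
    gamma = 1 if rng.random() < p_j else 0
    theta = rng.gauss(0.0, tau2_j ** 0.5) if gamma else 0.0
    return gamma, theta

rng = random.Random(1)
p_j, tau2_j = 0.1, 4.0            # hypothetical hyperparameter values
draws = [sample_shrinkage_prior(p_j, tau2_j, rng) for _ in range(20000)]

# Proportion of 'nonzero' coefficients should be close to p_j.
frac_nonzero = sum(g for g, _ in draws) / len(draws)

# Variance of the 'nonzero' coefficients should be close to tau2_j.
nonzero = [t for g, t in draws if g]
var_nonzero = sum(t * t for t in nonzero) / len(nonzero)

print(frac_nonzero, var_nonzero)
```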
10. Fitting the Model
• The shrinkage prior requires us to use an MCMC to fit model (2).
– We initially integrate out the random effects to improve convergence.
STEPS:
1. Sample mean-level wavelet coefficients θ_a^{j,k} from (θ_a^{j,k} | d_a^{j,k}, Ω_j) for each j, k, a
- Closed form mixture of point mass at 0 and Normal
2. Sample set of variance components Ω_j from (Ω_j | d^{j,k}, θ_a^{j,k}) for each j
- Metropolis step used
• If rat/crypt level profiles are desired, we proceed to sample random effects:
1. Sample from (θ_ab^{j,k} | θ_a^{j,k}, Ω_j, d_ab^{j,k}) for each a, b, j, k
2. Sample from (θ_abc^{j,k} | θ_ab^{j,k}, Ω_j, d_abc^{j,k}) for each a, b, c, j, k
• Inverse Discrete Wavelet Transforms are applied to the sets of wavelet coefficients to get the corresponding profiles
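The first step, drawing a mean-level coefficient from a mixture of a point mass at 0 and a normal, can be sketched as follows. This assumes the conditional posterior implied by a normal likelihood for the unshrunken estimate combined with a point-mass-plus-normal prior. The inputs `theta_mle` and `var_mle` (the estimate and its variance given the variance components) are treated as given, and all names are illustrative. The Metropolis step for the variance components is not shown.

```python
import math
import random

def posterior_prob_nonzero(z, t2, p):
    """Pr(coefficient is nonzero | data) = O / (O + 1), where the posterior
    odds O are the prior odds p/(1-p) times the Bayes factor."""
    log_odds = (math.log(p / (1.0 - p)) - 0.5 * math.log(1.0 + t2)
                + 0.5 * z * z * t2 / (1.0 + t2))
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)

def draw_theta(theta_mle, var_mle, p, tau2, rng):
    """One draw from the conditional posterior of a mean-level coefficient."""
    t2 = tau2 / var_mle                      # prior variance relative to noise
    z = theta_mle / math.sqrt(var_mle)       # standardized estimate
    if rng.random() >= posterior_prob_nonzero(z, t2, p):
        return 0.0                           # point-mass component
    shrink = t2 / (1.0 + t2)                 # linear shrinkage factor
    return rng.gauss(shrink * theta_mle, math.sqrt(shrink * var_mle))

rng = random.Random(2)
p, tau2 = 0.1, 25.0                          # hypothetical hyperparameters

# A small estimate is usually set exactly to zero ...
small = [draw_theta(0.5, 1.0, p, tau2, rng) for _ in range(5000)]
frac_zero_small = sum(1 for v in small if v == 0.0) / len(small)

# ... while a large one is kept and only mildly (linearly) shrunk.
large = [draw_theta(6.0, 1.0, p, tau2, rng) for _ in range(5000)]
mean_large = sum(large) / len(large)

print(frac_zero_small, mean_large)
```

Averaging such draws over MCMC iterations approximates the posterior mean, which is the nonlinear shrinkage estimator.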
11. Application: Results
[Figure: Estimated diet/time profiles of DNA Repair Enzyme with 90% pointwise posterior credible bounds. Panels: Fish Oil and Corn Oil at 0 hr, 3 hr, 6 hr, 9 hr, and 12 hr. Axes: Relative Cell Position and MGTlev.]
Key Results:
• More DNA Repair at top of crypts, which is where tumors form.
• No diet/time effect evident through 9h
• Strange results at 12h
- Fish>Corn at top
- Caused by outlier?
12. Application: Results
[Figure: Estimated rat profiles for all rats at 12 hour time point, with 90% posterior credible intervals. The dashed lines are naïve estimates obtained by simply averaging the data over crypts. Panels: Fish Oil, Corn Oil.]
• Effect consistent across rats: 12h effect not driven by outlier
• Relative Variability: 79% between crypts, 20% between rats, 1% within crypts. Implies importance of sampling lots of crypts
• Variance components at wavelet resolution levels reveal insights about nature of profiles:
- Rat profiles more smooth
- Crypt profiles characterized by peaks of width ~20 pixels (peaks = individual cells?)
13. Sensitivity Analysis
• Uninformative priors used everywhere, except for the shrinkage hyperparameters p and T².
• Sensitivity analysis reveals that this prior affects the smoothness of the underlying estimator, but not the substantive results.
• Thus, these parameters are essentially smoothing parameters, or more accurately, regularization parameters.
[Figure: Estimated profiles at 12 hour time point, with 90% posterior credible intervals, using different values for the shrinkage hyperparameters p and T².]
14. Conclusions
•New method to fit hierarchical longitudinal data, yielding:
• Nonparametric regularized estimates of profiles at mean, individual and
subsampling levels
• Covariance parameter estimates, from which correlation matrices at
various hierarchical levels can be constructed.
• Posterior samples from MCMC enable various Bayesian inferences to be
done.
•All done with unified, ‘nonparametric’ model that appropriately adjusts for
imbalance & correlation.
• Wavelets allow use of simpler covariance structures, work well for spatially
heterogeneous functions, and give multiresolution analysis.
• Our method can be generalized to more complex covariance structures,
effectively yielding a ‘nonparametric’ functional mixed model.
15. How does the nonlinear shrinkage work?
Consider the shrinkage estimator for one of the wavelet coefficients, which can be seen to be a product of the (unshrunken) MLE and a shrinkage factor h. This shrinkage factor depends on the hyperparameters (p_j and τ_j²) and Z, which measures the distance between the MLE and zero in units of its standard deviation.
\[ \hat{\theta}_{a,\mathrm{SHRINK}}^{j,k} = E(\theta_{a}^{j,k} \mid d_{a}^{j,k}, \Omega_j) = \underbrace{\hat{\theta}_{a,\mathrm{MLE}}^{j,k}}_{\text{MLE}} \cdot \underbrace{h(Z, T_j^2, p_j)}_{\text{Shrinkage Function}} \]
\[ Z = \hat{\theta}_{a,\mathrm{MLE}}^{j,k} \Big/ \sqrt{\mathrm{Var}(\hat{\theta}_{a,\mathrm{MLE}}^{j,k})} \]
\[ T_j^2 = \tau_j^2 \big/ \mathrm{Var}(\hat{\theta}_{a,\mathrm{MLE}}^{j,k}) \]
This shrinkage factor has two parts: one that performs linear shrinkage, which applies equally no matter what the magnitude of the data, and one that performs nonlinear shrinkage, which shrinks more for data closer to zero. Note that the nonlinear shrinkage involves the posterior odds that the coefficient θ_a^{j,k} is nonzero.
\[ h(Z, T_j^2, p_j) = \underbrace{\frac{T_j^2}{T_j^2 + 1}}_{\text{Linear Shrinkage}} \cdot \underbrace{\Pr(\gamma_{a}^{j,k} = 1 \mid d_{a}^{j,k})}_{\text{Nonlinear Shrinkage}} \]
\[ \Pr(\gamma_{a}^{j,k} = 1 \mid d_{a}^{j,k}) = \frac{O}{O + 1}, \qquad O = \text{Posterior Odds} \]
\[ \underbrace{O}_{\text{Posterior Odds}} = \underbrace{\frac{p_j}{1 - p_j}}_{\text{Prior Odds}} \cdot \underbrace{(1 + T_j^2)^{-1/2} \exp\left\{ \frac{Z^2}{2} \cdot \frac{T_j^2}{T_j^2 + 1} \right\}}_{\text{Bayes Factor}} \]
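The shrinkage factor is easy to evaluate numerically. The sketch below codes h as the linear part T²/(T²+1) times O/(O+1), with O the prior odds times the Bayes factor, and evaluates it at a few values of Z. The function names and the chosen values of Z, T², and p are illustrative.

```python
import math

def posterior_odds(z, t2, p):
    """Prior odds p/(1-p) times the Bayes factor
    (1 + T^2)^(-1/2) * exp{ (Z^2 / 2) * T^2 / (T^2 + 1) }.
    Direct evaluation is fine for moderate |Z|; very large |Z| would
    call for working on the log scale."""
    prior_odds = p / (1.0 - p)
    bayes_factor = (1.0 + t2) ** -0.5 * math.exp(0.5 * z * z * t2 / (t2 + 1.0))
    return prior_odds * bayes_factor

def shrink_factor(z, t2, p):
    """h(Z, T^2, p) = linear shrinkage * posterior probability of 'nonzero'."""
    o = posterior_odds(z, t2, p)
    return t2 / (t2 + 1.0) * o / (o + 1.0)

zs = [-4.0, -2.0, 0.0, 2.0, 4.0]
t2 = 20.0
curve_p50 = [shrink_factor(z, t2, 0.50) for z in zs]
curve_p10 = [shrink_factor(z, t2, 0.10) for z in zs]

for z, h50, h10 in zip(zs, curve_p50, curve_p10):
    print(f"Z={z:+.0f}  h(p=0.50)={h50:.3f}  h(p=0.10)={h10:.3f}")
```

The factor is symmetric in Z, smallest at Z = 0, and bounded above by the linear shrinkage term T²/(T²+1); a smaller p lowers the whole curve.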
16. How does the nonlinear shrinkage work?
[Figure: Shrinkage as function of Z = size of coefficient for p=0.50 and p=0.10, and various choices of T². Panels (a) and (b); curves for T² = 2, 5, 20, 100, 1000. Axes: Z from -4 to 4 and h(Z, p, T) from 0.0 to 1.0.]
Notes about shrinkage curves
1. Coefficients close to zero are shrunken the most.
2. Making p_j smaller causes more shrinkage of coefficients, and thus more smoothing of features at wavelet scale j.
3. Making T_j² too small results in much shrinkage, even for large coefficients.
4. Making T_j² too large causes a Lindley’s paradox-type effect, which is manifested in steep shrinkage curves.