Semiparametric Methods for Colonic Crypt Signaling

Semiparametric Methods for Colonic
Crypt Signaling
Raymond J. Carroll
Department of Statistics
Faculty of Nutrition and Toxicology
Texas A&M University
• Problem: Modeling correlations among
functions and apply to a colon carcinogenesis
• Biological background
• Semiparametric framework
• Nonparametric Methods
• Asymptotic summary
• Analysis
• Summary
Biologist Who Did The Work
Meeyoung Hong
Postdoc in the lab of
Joanne Lupton and
Nancy Turner
Data collection
represents a year of
Basic Background
• Apoptosis: Programmed cell death
• Cell Proliferation: Effectively the opposite
• p27: Differences in this marker are thought to
stimulate and be predictive of apoptosis and cell
• Our experiment: understand some of the
structure of p27 in the colon when animals are
exposed to a carcinogen
Data Collection
• Structure of Colon
• Note the finger-like
• These are colonic
• We measure
expression of cells
within colonic crypts
Data Collection
• p27 expression:
Measured by staining
• Brighter intensity = higher
• Done on a cell by cell
basis within selected
colonic crypts
• Very time intensive
Data Collection
• Animals sacrificed at 4 times: 0 = control,
12hr, 24hr and 48hr after exposure
• Rats: 12 at each time period
• Crypts: 20 are selected
• Cells: all cells collected, about 30 per crypt
• p27: measured on each cell, with logarithmic
Nominal Cell Position
• X = nominal cell
• Differentiated
cells: at top, X =
• Proliferating
cells: in middle,
• Stem cells: at
bottom, X=0
Standard Model
• Hierarchical structure: cells within crypts
within rats within times
Yrc (x) = η(x)+γ r (x)+θrc (x)+ε rc (x)
η(x)+γ r (x) = rat-level function
θrc (x) = crypt-level functions,
typically assumed independent
Standard Model
• Hierarchical structure: cell locations of cells
within crypts within rats within times
• In our experiment, the residuals from fits at the
crypt level are essentially white noise
• However, we also measured the location of the
colonic crypts
Crypt Locations at 24 Hours, Nominal
Scale: 1000’s
of microns
at between
Standard Model
• Hypothesis: it is biologically plausible that the
nearer the crypts to one another, the greater the
relationship of overall p27 expression.
• Expectation: The effect of the carcinogen
might well alter the relationship over time
• Technically: Functional data where the
functions are themselves correlated
Basic Model
• Two-Levels: Rat and crypt level functions
• Rat-Level: Modeled either nonparametrically or
• Semiparametrically: low-order regression
• Nonparametrically: some version of a kernel
Basic Model
• Crypt-Level: A regression spline, with few
knots, in a parametric mixed-model
θrc (x) =  rc,I + x  rc,L + C(x) rc,S ;
C(x)  spline basis functions
 p
cov( rc )   S  
 0
0 
2 
 p I 
Linear spline:
In practice, we used
a quadratic spline.
The covariance
matrix of the
quadratic part used
the PourahmadiDaniels construction
David Ruppert
Please buy our book! Or steal it 
Matt Wand
Basic Model
• Crypt-Level: We modeled the functions across
crypts semiparametrically as regression
• The covariance matrix of the parameters of the
spline modeled as separable
• Matern family used to model correlations
Matern correlation : (d, , m )
 2d m 
 2d m 
(d, , m )  m 1
 K m 
2  (m )   
K m  Modified Bessel function
Basic Model
• Crypt-Level: regression spline, few knots
• Separable covariance structure
θrc (x) =  rc,I +  rc,L x + C(x) rc,S ;
 d(i, j), , m 
cov( ri ,  rj )  
   S
  d(i, j), , m
d(i, j)  dis tan ce between crypts ( i, j)
Theory for Parametric Version
• The semiparametric approach we used
fits formally into a new general theory
• Recall that we fit the marginal function
at the rat level “nonparametrically”.
• At the crypt level, we used a
parametric mixed-model representation
of low-order regression splines
Xihong Lin
General Formulation
• Yij = Response
• Xij, Zij = cell and crypt locations
• Likelihood (or criterion function)
 Yi ,Zi ,β , θ(X i1 ),...,θ( X im )
• The key is that the function θ() is evaluated
multiple times for each rat
• This distinguishes it from standard
semiparametric models
• The goal is to estimate θ() and β efficiently
General Formulation: Overview
• Likelihood (or criterion function)
 Yi ,Zi ,β , θ(X i1 ),...,θ( X im )
• For iid versions of these problems, we have
developed constructive kernel-based methods
of estimation with
• Asymptotic expansions and inference available
• If the criterion function is a likelihood function,
then the methods are semiparametric efficient.
• Methods avoid solving integral equations
General Formulation: Overview
• We also show
• The equivalence of profiling and backfitting
• Pseudo-likelihood calculations
• The results were submitted 7 months ago, and
we confidently expect 1st reviews within the
next year or two , depending on when the
referees open their mail
General Formulation: Overview
• In the application, modifications necessary both
for theoretical and practical purposes
• Techniques described in the talk of Tanya
Apanasovich can be used here as well to cut
down on the computational burden due to the
correlated functions.
Nonparametric Fits
• Equal Spacing: Assume cell locations are
equally spaced
• Define V(x1 ,x2 ,Δ) = covariance between cryptlevel functions that are D apart, one at location
x1 and the other at location x2
• Assume separable covariance structure
V(x1 ,x2 ,Δ) =G(x1 ,x2 ) ρ(Δ)
Nonparametric Fits
• Discrete Version: Pretend D, x1 and x2 take on
a small discrete set of values (we actually use a
kernel-version of this idea)
• Form the sample covariance matrix per rat at D,
x1 and x2 , then average across rats.
• Call this estimate
V̂(x1 ,x2 ,Δ)
Nonparametric Fits
• Separability: Now use the separability to get a
rough estimate of the correlation surface.
V(x1 ,x 2 ,Δ) = G(x1 ,x 2 ) ρ(Δ)
 V̂(x ,x ,Δ)
ρ(Δ) =
x1 ,x 2
 V̂(x ,x ,0)
x1 ,x 2
Nonparametric Fits
• The estimate
is not a proper
correlation function
• We fixed it up using a trick due to Peter Hall
(1994, Annals), thus forming ρ̂(Δ) , a real
correlation function
• Basic idea is to do a Fourier transform, force it
to be non-negative, then invert
• Slower rates of convergence than the parametric
fit, more variability, etc.
• Asymptotics worked out (non-trivial)
Semiparametric Method Details
• In the example, a cubic polynomial actually
suffices for the rat-level functions
• Maximize in the covariance structure of the
crypt-level splines, including Matern-order
• The covariance structure includes
Matern order m (gridded)
Matern parameter 
Spline smoothing parameter
Quadratic part’s covariance matrix (PourahmadiDaniels)
• AR(1) for residuals
Results: Part I
• Matern Order: Matern order of 0.5 is the classic
autoregressive model
• Our Finding: For all times, an order of about
0.15 was the maximizer
• Simulations: Can distinguish from
• Different Matern orders lead to different
interpretations about the extent of the
Marginal over time Spline Fits
• time
• Location
24-Hour Fits: The Matern order matters
Nonparametric and Semiparametric
Fits at 24 Hours
Comparison of Fits
• The semiparametric and nonparametric fits are
roughly similar
• At 200 microns, quite far apart, the correlations
in the functions are approximately 0.40 for both
• Surprising degree of correlation
• We have studied the problem of crypt-signaling
in colon carcinogenesis experiments
• Technically, this is a problem of hierarchical
functional data where the functions are not
independent in the standard manner
• We developed (efficient, constructive)
semiparametric and nonparametric methods,
with asymptotic theory
• The correlations we see in the functions are
surprisingly large.
Statistical Collaborators
Yehua Li
Naisyin Wang
for this