Summary of “Modeling T-cell activation using gene expression profiling and
state-space models” by Rangel, et al.
Keala Chan
7/9/04
SoCalBSI
California State University at Los Angeles
INTRODUCTION
This paper concerns microarray data analysis, in particular the use of microarray
data to model the gene pathways involved in T-cell activation. By fitting state-space
models to microarray data, the authors show that this class of dynamic Bayesian networks
effectively represents a well-established model of T-cell activation.
BACKGROUND
T-cell Activation
The activation of T-lymphocytes is the central event in the generation of an immune
response. The T-cell is activated by the interaction between the T-cell receptor (TCR) complex
and an antigenic peptide on the surface of an antigen-presenting cell (1362). This event triggers
a network of signaling molecules that initiate a number of gene transcription events in the
nucleus. The paper reverse-engineers this well-known model of T-cell activation.
State-Space Models
State-space models (SSMs) are a class of dynamic Bayesian networks in which observed
measurements depend on a Markov process of hidden state variables. Thus, state-space
models are considered well suited to modeling complicated gene-gene interactions, since the
hidden state variables can represent variables not explicitly measured in an experiment, such
as genes not included in the microarray, levels of regulatory proteins, and effects of protein
degradation. The main goal when using the state-space model for gene expression
modeling is to determine the matrices most likely to have generated the sequence of observation
vectors. These transition matrices determine the gene-gene interaction network, which can be
depicted graphically for clarity. Note that finding the transition matrices most likely to have
generated a sequence of observations is also a common application of Hidden Markov Models.
A linear state-space model represents a sequence of p-dimensional observation vectors
{y_1, … , y_T}, assuming that each y_t was generated from a K-dimensional hidden-state variable x_t,
in which {x_1, … , x_T} is a first-order Markov process (footnote 1). Thus, the model is described by
    x_{t+1} = A x_t + w_t        (1)
    y_t = C x_t + v_t            (2)
where A represents transitions between the hidden states, C is the state-to-observation matrix,
and w_t and v_t are sequences of uncorrelated white noise (1362). Further, the observations can be
divided into input variables and response variables. For distinct inputs to the state (1) and
observation (2) equations, the SSM becomes
    x_{t+1} = A x_t + B h_t + w_t        (3)
    y_t = C x_t + D u_t + v_t            (4)
where h_t and u_t are inputs to the state and observation equations, B is the input-to-state matrix, and
D is the input-to-observation matrix.
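To make equations (3) and (4) concrete, the following is a minimal simulation sketch rather than anything taken from the paper. The function name simulate_ssm, the noise_std parameter, and the choice of Gaussian white noise are assumptions made only for illustration.

```python
import numpy as np

def simulate_ssm(A, B, C, D, h, u, x0, noise_std=0.1, seed=0):
    """Simulate x_{t+1} = A x_t + B h_t + w_t and y_t = C x_t + D u_t + v_t.

    h has shape (T, input_dim_state) and u has shape (T, input_dim_obs).
    Noise is drawn as independent Gaussians (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    T = h.shape[0]
    K, p = A.shape[0], C.shape[0]
    x = np.zeros((T + 1, K))   # hidden states x_0 .. x_T
    y = np.zeros((T, p))       # observations y_0 .. y_{T-1}
    x[0] = x0
    for t in range(T):
        v = noise_std * rng.standard_normal(p)   # observation noise v_t
        w = noise_std * rng.standard_normal(K)   # state noise w_t
        y[t] = C @ x[t] + D @ u[t] + v           # observation equation (4)
        x[t + 1] = A @ x[t] + B @ h[t] + w       # state equation (3)
    return x, y

# Small illustrative run: 2 hidden states, 3 observed variables, 4 time steps.
A = 0.5 * np.eye(2)
B = np.ones((2, 1))
C = np.ones((3, 2))
D = np.zeros((3, 1))
h = np.ones((4, 1))
u = np.zeros((4, 1))
x, y = simulate_ssm(A, B, C, D, h, u, x0=np.zeros(2), noise_std=0.0)
```

Setting noise_std to zero, as in the run above, gives a deterministic trajectory that is easy to check by hand against the two equations.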
For the gene expression model in particular, the (suitably normalized) fluorescent
intensities measured for each of p genes at time step t are kept in the vector g_t, while the hidden
variables remain in the vector x_t. Gene expression is now more specifically modeled by
    x_{t+1} = A x_t + B g_t + w_t        (5)
    g_t = C x_t + D g_{t-1} + v_t.       (6)
Thus, the hidden state x_{t+1} now depends on the previous state x_t (according to the Markov
process) as well as the previous observed gene expressions g_t, and the observation g_t at time t
depends on the hidden variables x_t as well as the previous observed gene expression g_{t-1}. In
addition, the matrix D captures direct gene-gene interactions at
consecutive time points, the matrix B captures the influence of gene expression on the next
hidden state, and the matrix C shows the influence of hidden variables on gene expression at
each time point. Finally, it is the matrix CB+D that captures both direct gene-gene interactions
and the gene-gene interactions through the hidden states over time, in other words, all the
important information related to gene-gene interaction over one time step (1363).
Footnote 1: A first-order Markov process is one in which the next state of the system depends only on the previous state.
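The role of CB+D can be seen by writing the observation equation (6) at time t+1 and substituting the state equation (5) into it. The following is a sketch of that substitution, using only the symbols defined above; it is a reader's derivation, not one quoted from the paper.

```latex
\begin{aligned}
g_{t+1} &= C x_{t+1} + D g_t + v_{t+1} \\
        &= C \left( A x_t + B g_t + w_t \right) + D g_t + v_{t+1} \\
        &= C A x_t + (C B + D)\, g_t + \left( C w_t + v_{t+1} \right)
\end{aligned}
```

The coefficient of g_t is CB+D: the term D is the direct gene-to-gene effect, and CB is the effect routed through the hidden state.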
METHOD
The microarrays were made by spotting PCR products on glass slides. The genes chosen
for the microarrays were all determined to be modulated in response to T-cell activation. Two
replicated experiments were hybridized on two sets of arrays. Genes whose expression values at
all time points were below a specified value were discarded, as were genes that displayed very
poor reproducibility between the two experiments.
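The two filtering criteria can be sketched as follows. The summary gives neither the expression cutoff nor the measure of reproducibility, so the function name filter_genes, the thresholds min_expr and min_corr, and the use of a per-gene Pearson correlation between the two replicates are all assumptions made for illustration.

```python
import numpy as np

def filter_genes(rep1, rep2, min_expr=1.0, min_corr=0.5):
    """Return a boolean mask of genes to keep.

    rep1, rep2: arrays of shape (genes, time points), one per replicate.
    A gene is dropped if its expression is below min_expr at every time
    point, or if its two replicate time courses correlate poorly.
    Both thresholds are hypothetical.
    """
    rep1 = np.asarray(rep1, dtype=float)
    rep2 = np.asarray(rep2, dtype=float)
    # Keep genes that reach the cutoff at one or more time points.
    expressed = (np.maximum(rep1, rep2) >= min_expr).any(axis=1)
    # Per-gene Pearson correlation between the two replicates.
    a = rep1 - rep1.mean(axis=1, keepdims=True)
    b = rep2 - rep2.mean(axis=1, keepdims=True)
    denom = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    corr = np.divide((a * b).sum(axis=1), denom,
                     out=np.zeros(len(denom)), where=denom > 0)
    reproducible = corr >= min_corr
    return expressed & reproducible

rep1 = [[5.0, 6.0, 7.0, 8.0], [0.1, 0.2, 0.1, 0.2], [5.0, 6.0, 7.0, 8.0]]
rep2 = [[5.1, 6.2, 6.9, 8.1], [0.1, 0.1, 0.2, 0.2], [8.0, 7.0, 6.0, 5.0]]
mask = filter_genes(rep1, rep2)
```

In this toy input the second gene fails the expression cutoff and the third fails the reproducibility check, so only the first gene is kept.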
The paper specifies the algorithms used to determine the optimal number of hidden states
and to estimate the transition matrices A, B, C, and D. The latter algorithm is described in a
previous paper by the authors. Because both algorithms belong to the field of statistical modeling
rather than bioinformatics, they are not described here.
RESULTS
After data pre-processing, 39 out of the 58 genes remained with significant interactions.
The structural parameters A, B, C, and D and corresponding confidence intervals are estimated
using the “EM algorithm” described in the authors’ previous paper. Then, a connectivity matrix
for CB+D is constructed by assigning zero to elements for which zero is within the confidence
interval, and assigning one to elements otherwise. Finally, a directed graph (Figure 1) is drawn
based on this connectivity matrix. In the graph, arrows are drawn from a gene expression
variable at a given time t to the gene variable whose expression it influences at the next time
point t+1. Note that the non-zero entries in CB+D can be positive or negative, indicating up- or
down-regulation; in Figure 1, up- and down-regulation are represented by solid and dotted arrows,
respectively.
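The thresholding step described above can be sketched in a few lines. The function names connectivity_matrix and signed_edges and the toy numbers are hypothetical. The reading of entry [i, j] as "gene j at time t influences gene i at time t+1" follows from CB+D multiplying the expression vector at time t.

```python
import numpy as np

def connectivity_matrix(lower, upper):
    """Return 1 where the confidence interval [lower, upper] excludes zero, else 0."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return ((lower > 0) | (upper < 0)).astype(int)

def signed_edges(estimate, lower, upper):
    """List (source gene, target gene, sign) for every non-zero connectivity entry.

    Entry [i, j] of the estimate of CB+D is read as gene j at time t
    acting on gene i at time t+1; '+' marks up-regulation, '-' down-regulation.
    """
    estimate = np.asarray(estimate, dtype=float)
    conn = connectivity_matrix(lower, upper)
    edges = []
    for i, j in np.argwhere(conn == 1):
        sign = "+" if estimate[i, j] > 0 else "-"
        edges.append((int(j), int(i), sign))
    return edges

# Toy 2-gene example with made-up interval bounds.
estimate = [[0.25, 0.05], [-0.30, -0.05]]
lower = [[0.10, -0.20], [-0.50, -0.30]]
upper = [[0.40, 0.30], [-0.10, 0.20]]
conn = connectivity_matrix(lower, upper)
edges = signed_edges(estimate, lower, upper)
```

Each tuple in edges corresponds to one arrow in a graph like Figure 1, with the sign deciding between a solid and a dotted arrow.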
The authors discuss many functional groupings evident in the graphical representation
that correspond to known functions in T-cell activation. For instance, FYB (gene 1) is an
important adaptor molecule and cells defective in this molecule have severely impaired
proliferation and migratory response (Burack et al., 2002); the model links FYB to three
interleukin genes: IL2 (gene 7), IL4 (gene 5), and IL3 (gene 2). The cytokines associated with these
three interleukin genes are well known to be proliferation signals in T-cells. In another example,
the model shows the gene SMN1 (gene 19) negatively influencing the expression of JunB (gene
13), a pro-apoptotic gene (Weitzman, 2001). This is consistent with the experimental finding that SMN1
inhibits the onset of apoptosis. The graph suggests many more specific connections that are well
supported by the literature. In addition, labeling the genes by functional categories yields some
interesting groupings in the diagram. For example, FYB (gene 1) and five of its connected genes
are directly related to the inflammation response.
CONCLUSIONS
Reverse engineering of T-cell activation pathways using state-space models confirms
many known interactions, making the SSM a powerful tool in predicting gene pathways.
Moreover, interactions suggested by the model that are not supported in the literature represent
novel hypotheses (1370) and thus present opportunities for novel discoveries through
experimental confirmation. The authors suggest improvements to the modeling procedure such
as more replicates in the data set and additional time points. They also note that the hidden
variables are in general not identifiable, in that a one-to-one correspondence between hidden
variables and specific genes does not exist. Instead, hidden variables likely represent a
combination of complex events. However, it is precisely the inclusion of hidden variables in the
model that makes the SSM a realistic biological model.
The paper adequately shows the accuracy of the SSM in modeling the T-cell pathway
network; the long list of matches with the literature is very convincing as a feat of reverse
engineering. It is not specified how such a model would best be used, however, other than to
suggest hypotheses for immediate clinical research, and perhaps this is exciting enough. But
because the paper does not explain or prove why the SSM should be accurate in modeling a
biological system in general, the SSM cannot be used independently of experimental evidence.
Figure 1