isbra 2012 - UIC - Computer Science

Models and Algorithmic Tools for Computational
Processes in Cellular Biology
Bhaskar DasGupta
Department of Computer Science
University of Illinois at Chicago
Chicago, IL 60607-7053
dasgupta@cs.uic.edu
What is “systems biology”
in one sentence ?
study to unravel and conceptualize
dynamic processes, feedback control loops
and signal processing mechanisms
underlying life
ISBRA 2012
Cellular Networks
• A single cell by itself is
complex enough
• Various technologies have
facilitated the monitoring of
expression of genes and
activities of proteins
• Difficult to find the causal
relations and overall
structure of the network
http://www.nyas.org/ebriefreps/ebrief/000534/images/mendes2.gif
Cellular Networks
Genes and gene products interact on several levels, e.g.:
• Genes regulate each other’s expression as part of gene regulatory networks
– transcription factors can activate or inhibit the transcription of genes
to give mRNAs
– these transcription factors are themselves products of genes
• Protein-protein interaction networks
– proteins can participate in diverse post-translational interactions that
lead to modified protein functions or to formation of protein complexes
that have new roles
• Different levels of interactions are integrated
– e.g., presence of an external signal triggers a cascade of interactions
that involves biochemical reactions, protein-protein interactions and
transcriptional regulation
Cellular networks
• cellular interaction maps only represent a network of
possibilities, and not all edges are present and active in vivo in
a given condition or in a given cellular location
• only an integration of time-dependent interaction and activity
information will be able to give the correct dynamical picture
of a cellular network
Modeling problem
• interaction data produced by the biologist in the form of a
diagram (e.g., some type of labeled digraph)
• wish to pose questions about the behavior (dynamics) of such a
network
– essential to provide a precise mathematical formulation of
its dynamics, and specifically how the state of each node
depends on the state of the nodes interacting with it
Models
• discrete, continuous and hybrid models
• their inter-relationships, powers and limitations
• computational complexity and algorithmic issues
• biological implications and validations
• fascinating interplay between several areas such as:
– biology
– control theory
– discrete mathematics and computer science
System dynamics
• state variables
– continuous
– discrete (e.g., small number of quantitative states)
• time variables
– continuous (e.g., partial differential equation, delay equations)
– discrete (difference equations, quantized descriptions of continuous
variables)
• deterministic or probabilistic nature of the model
• hybrid models
– combines continuous and discrete time-scales and/or
– combines continuous and discrete state variables
Continuous-state dynamics
Differential equation
(continuous-time)
Difference equation
(discrete-time)
Examples of other models
Boolean: x1, x2, x3 ∈ {0, 1}
[Figure: example models labeled Boolean, Boolean feedforward, and Signal Transduction]
Reverse engineering of models
Given
– partial knowledge about the process/network
– access to suitable biological experiments
How to gain more knowledge about the model ?
– effective use of resources (time, cost)
Reverse engineering
Process of backward
reasoning, requiring
careful observation of
inputs and outputs, to
elucidate the structure of
the system
http://www.computerworld.com/computerworld/records/images/story/46Reverse-engineering.gif
Ingredients for reverse engineering
• Mathematical model to be reverse engineered
– e.g., differential equation model
• Biological experiments available, e.g.,
– perturbation experiments
– gene expression measurements
Many reverse engineering approaches are possible
I will discuss two types of approaches:
– “hitting set” based combinatorial approaches
– modular response analysis (MRA) approach
Reverse Engineering of Networks Via
Modular Response Analysis Method
Ingredients for reverse engineering via
modular response analysis approach
• Mathematical models
– differential equation model
• Biological experiments available
– perturbation experiments
Differential Equation Model
state variables evolve by (unknown) ordinary differential equations
\[
\frac{dx}{dt} = f(x, p) \;\equiv\;
\begin{cases}
\dfrac{dx_1}{dt} = f_1(x_1, x_2, \ldots, x_n, p_1, p_2, \ldots, p_m) \\
\quad\vdots \\
\dfrac{dx_n}{dt} = f_n(x_1, x_2, \ldots, x_n, p_1, p_2, \ldots, p_m)
\end{cases}
\]
• x = (x_1(t), ..., x_n(t)): state variables over time t, measurable (e.g., activity levels of proteins)
• p = (p_1, ..., p_m): parameters that can be manipulated
• f(x*, p*) = 0, where p* is the “wild-type” (i.e., normal) condition of p and x* is the corresponding steady-state condition
settings for modular response analysis method
– do not know f
– but, prior information of the following type is available
• whether or not parameter p_j directly affects variable x_i
(i.e., whether or not $\partial f_i / \partial p_j \equiv 0$)
Kholodenko, Kiyatkin, Bruggeman, Sontag, Westerhoff and Hoek, PNAS, 2002
Experimental protocols
(perturbation experiments)
• perturb one parameter, say p_j
• for perturbed p, measure steady state vector x = ξ(p)
– let the system relax to steady state
– measure x_i (western blots, microarrays, etc.)
• estimate n “sensitivities”:
\[
b_{ij} = \frac{\partial \xi_i}{\partial p_j}(p^*) \approx \frac{1}{p_j - p_j^*}\,\bigl(\xi_i(p^* + p_j e_j) - \xi_i(p^*)\bigr) \quad \text{for } i = 1, 2, \ldots, n
\]
where e_j is the jth canonical basis vector
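The protocol above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `steady_state` is a hypothetical black box standing in for "perturb, let the system relax, measure", `toy_steady_state` is an invented map used purely to exercise the code, and the explicit step size `delta` (the amount added to one parameter) is an assumption about how the finite difference is taken.

```python
from typing import Callable, List, Sequence


def estimate_sensitivities(
    steady_state: Callable[[Sequence[float]], Sequence[float]],
    p_star: Sequence[float],
    delta: float = 1e-3,
) -> List[List[float]]:
    """Return B with B[i][j] approximating d(xi_i)/d(p_j) at p_star.

    Uses one forward-difference perturbation experiment per parameter.
    """
    x_star = list(steady_state(p_star))  # wild-type steady state
    n, m = len(x_star), len(p_star)
    B = [[0.0] * m for _ in range(n)]
    for j in range(m):
        p = list(p_star)
        p[j] += delta                    # perturb only parameter j
        x_pert = steady_state(p)         # "relax to steady state, then measure"
        for i in range(n):
            B[i][j] = (x_pert[i] - x_star[i]) / delta
    return B


def toy_steady_state(p: Sequence[float]) -> List[float]:
    # Hypothetical steady-state map xi(p), for illustration only.
    return [p[0] + 2.0 * p[1], 3.0 * p[0]]


B = estimate_sensitivities(toy_steady_state, [1.0, 1.0], delta=0.5)
print(B)
```

In a real setting each call to `steady_state` corresponds to one laboratory experiment, which is why the number of perturbations matters.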
Modeling Goal
Modeling goal can be at different levels:
1. Topology of connections only
2. Direction of the relationship
3. Information about stimulatory or inhibitory effects
4. Strength of relationship
[Figure: example network on nodes A, B, C, D with edges labeled by sign (+/-) and strength (9.3, 1.2, 4.8, 2.1, 5.3)]
Goal of MRA approach
Obtain information about the sign of $\partial f_i / \partial x_j (x, p)$
e.g., if $\partial f_i / \partial x_j > 0$, then x_j has a positive (catalytic) effect on the
formation of x_i
In a nutshell
after some combinatorics and linear algebra
one can quantify the additional prior knowledge
necessary to reach the goal
Kholodenko, Kiyatkin, Bruggeman, Sontag, Westerhoff and Hoek, PNAS, 2002
Berman, DasGupta and Sontag, Discrete Applied Math, 2007
Berman, DasGupta and Sontag, Annals of NYAS, 2007
But, assuming (near)-sufficient prior information
• how to determine a minimum or near-minimum number of
perturbation experiments that will work?
This now becomes an algorithmic/complexity issue...
After some effort, one can see that
designing minimal sets of experiments
leads to
the set multi-cover problem
In our biological application context,
our set-multicover algorithm provides a set of suggested
experiments such that
# of experiments ≈ minimum possible
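To make the set multi-cover formulation concrete, here is a minimal sketch. The slides do not spell out the algorithm (the cited work uses randomized approximation), so a plain greedy heuristic is swapped in for illustration. The experiment names `E1` to `E4`, the toy universe and the coverage requirement `r` are all hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set


def greedy_set_multicover(
    universe: Iterable[Hashable],
    sets: Dict[str, Set[Hashable]],
    r: int,
) -> List[str]:
    """Pick sets (each used at most once) so every element is covered r times.

    Greedy rule: repeatedly take the set covering the most elements
    that still have unmet demand. This is a heuristic, not an exact solver.
    """
    demand = {e: r for e in universe}
    remaining = dict(sets)
    chosen: List[str] = []
    while any(d > 0 for d in demand.values()):
        def gain(name: str) -> int:
            return sum(1 for e in remaining[name] if demand.get(e, 0) > 0)

        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(remaining), key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("demands cannot be met with the given sets")
        for e in remaining.pop(best):
            if demand.get(e, 0) > 0:
                demand[e] -= 1
        chosen.append(best)
    return chosen


# Hypothetical instance: each "experiment" covers some unknowns,
# and every unknown must be covered twice.
experiments = {"E1": {1, 2}, "E2": {2, 3}, "E3": {1, 3}, "E4": {1, 2, 3}}
picked = greedy_set_multicover({1, 2, 3}, experiments, r=2)
print(picked)
```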
Overall high-level picture
[Flowchart: Modular Response Analysis for differential equations model → linear algebraic formulation → combinatorial formulation → combinatorial algorithms (randomized) → selection of appropriate perturbation experiments]
Experimental validation of MRA Method
See the paper:
S. D. M. Santos, P. J. Verveer, P. I. H. Bastiaens,
Growth factor-induced MAPK network topology shapes Erk response
determining PC-12 cell fate
Nature Cell Biology 9, 324 - 330 (2007)
• MAPK pathway involving proteins Raf, Mek and Erk is activated
through the receptor tyrosine kinases TrkA and epidermal growth factor
receptor (EGFR) by two different stimuli, NGF (nerve growth factor) or EGF
(epidermal growth factor)
• MRA method was applied to determine the MAPK network
architecture in the context of NGF and EGF stimulations
Reverse Engineering of Networks Via
Hitting-set based (combinatorial) Method
“Hitting set” based combinatorial approaches
[Diagram: steady-state profiles of perturbations of the network → hitting set → topology of interconnection network; introducing redundancy replaces hitting set by multi-hitting set]
[Diagram: expression data representing state-transition measurements for wild-type and perturbation data → hitting set → topology of interconnection network; introducing redundancy replaces hitting set by multi-hitting set]
Basic idea behind the hitting-set based approaches
• x5 changes, and so do x1, x3, x4
– at least one of {x1, x3, x4} must influence x5
• build dependency information over all successive time steps
– which variables influence x5? candidate sets: {x1,x2}, {x1}, {x2,x3,x4}, {x1}, {x1,x3,x4}, {x1,x3}
• minimal dependency (hitting set problem)
Why construct “minimal” dependency ?
Occam's razor
entia non sunt multiplicanda praeter necessitatem
(entities must not be multiplied beyond necessity)
However, biological networks may be redundant:
e.g.
– G. Tononi, O. Sporns, G. M. Edelman, PNAS, 1999
– R. Albert et al., Physical Review E, 2011
How can we introduce redundancy if necessary ?
How can we introduce redundancy if necessary ?
First idea: add random extra dependencies (edges)
not good, these edges may not be supported by given
data
Better idea: modify hitting set to “multi-hitting set”
{x1,x3,x4}
previously: select at least 1
now: select at least 2
(in general, some r)
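The step from "select at least 1" to "select at least r" can be illustrated with a small greedy sketch. It is not the published method: greedy selection is a stand-in heuristic, and capping the requirement at the size of each set (so that a singleton such as {x1} stays satisfiable when r = 2) is an assumption made here. The candidate sets are the ones shown earlier for x5.

```python
from typing import List, Sequence, Set


def greedy_multi_hitting_set(sets: Sequence[Set[str]], r: int = 1) -> List[str]:
    """Choose variables so that each set S contains at least min(r, |S|) of them."""
    need = [min(r, len(s)) for s in sets]       # unmet requirement per set
    candidates = sorted(set().union(*sets))     # deterministic tie-breaking
    chosen: List[str] = []
    while any(n > 0 for n in need):
        def gain(v: str) -> int:
            # number of still-unsatisfied sets that v would hit
            return sum(1 for s, n in zip(sets, need) if n > 0 and v in s)

        best = max((v for v in candidates if v not in chosen), key=gain)
        for k, s in enumerate(sets):
            if need[k] > 0 and best in s:
                need[k] -= 1
        chosen.append(best)
    return chosen


dependency_sets = [
    {"x1", "x2"}, {"x1"}, {"x2", "x3", "x4"},
    {"x1"}, {"x1", "x3", "x4"}, {"x1", "x3"},
]
minimal = greedy_multi_hitting_set(dependency_sets, r=1)
redundant = greedy_multi_hitting_set(dependency_sets, r=2)
print(minimal, redundant)
```

With r = 1 the sketch proposes two regulators for x5; with r = 2 it adds a third, and every extra edge is still supported by at least one observed set.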
Evaluation of performance of reverse engineering Methods
Reverse-engineering problems are ill-posed, i.e., their solution is not unique
– existence of measurement error
– not all molecular species involved in a given analyzed phenomenon are
included in the construction of a network
• i.e., existence of hidden variables
Two possible ways for evaluation:
Experimental testing of predictions:
after a model has been inferred, newly found interactions or predictions can
be tested experimentally
Benchmarking testing:
measure how “accurate” the method of our interest is in recovering a known
(“gold standard”) network
Evaluation of performance of reverse engineering Methods
Metrics for accuracy for benchmark testing
Measurements:
– correct interactions inferred (true positives, TP)
– incorrect interactions inferred (false positives, FP)
– correct non-interactions inferred (true negatives, TN)
– incorrect non-interactions inferred (false negatives, FN)
Metrics:
– recall or true positive rate: $\mathrm{TPR} = \dfrac{TP}{TP + FN}$
– false positive rate: $\mathrm{FPR} = \dfrac{FP}{FP + TN}$
– accuracy: $\mathrm{ACC} = \dfrac{TP + TN}{\text{total possible interactions}}$
– precision or positive predictive value: $\mathrm{PPV} = \dfrac{TP}{FP + TP}$
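As a worked example, the four metrics can be computed from an inferred edge set and a gold-standard edge set. The function name, the three-node toy networks and the convention that both edge sets are subsets of a given list of candidate interactions are assumptions made for this sketch.

```python
from itertools import permutations
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def benchmark_metrics(inferred: Set[Edge], gold: Set[Edge],
                      candidates: Set[Edge]) -> Dict[str, float]:
    """Compute TPR, FPR, ACC and PPV; inferred and gold must be subsets of candidates."""
    tp = len(inferred & gold)             # correct interactions inferred
    fp = len(inferred - gold)             # incorrect interactions inferred
    fn = len(gold - inferred)             # interactions that were missed
    tn = len(candidates) - tp - fp - fn   # correct non-interactions

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),
        "FPR": ratio(fp, fp + tn),
        "ACC": ratio(tp + tn, len(candidates)),
        "PPV": ratio(tp, fp + tp),
    }


# Hypothetical 3-node example: all ordered pairs without self-loops.
all_pairs = set(permutations(["a", "b", "c"], 2))
gold_net = {("a", "b"), ("b", "c")}
inferred_net = {("a", "b"), ("a", "c")}
metrics = benchmark_metrics(inferred_net, gold_net, all_pairs)
print(metrics)
```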
Two published methods based on hitting set approach
(A) Ideker, Thorsson, Karp, PSB (2000)
First step (network inference):
estimate a set of Boolean networks consistent with an observed set of steady-state gene expression profiles, each generated from a different perturbation to
the genetic network
Second step (optimization):
use an entropy-based approach to select an additional perturbation experiment
to perform a model selection from the set of predicted Boolean networks
(B) Jarrah, Laubenbacher, Stigler, Stillman, Adv. in Applied Mathematics (2007)
Attempts to infer the most likely causal relationships among network elements from
gene expression data
For other published results, see, for example:
Krupa, Journal of Theoretical Biology (2002)
Comparative analysis (via benchmark testing) of two approaches by
(A) Ideker, Thorsson, Karp
(B) Jarrah, Laubenbacher, Stigler, Stillman
Two gold standard networks:
a. Segment polarity network of Drosophila melanogaster (fruit fly):
– last step in the hierarchical cascade of gene families initiating the segmented body of the fruit fly
– genes of this network include engrailed (en), wingless (wg), hedgehog (hh), patched (ptc), cubitus interruptus (ci) and sloppy paired (slp), coding for the corresponding proteins
– 1 para-segment of 4 cells
– 60 nodes: variables are expression levels of segment polarity genes/proteins
– Boolean model from (Albert and Othmer, Journal of Theoretical Biology, 2003)
DasGupta, Vera-Licona, Sontag, 2011
b. In Silico network: gene regulatory network with external perturbations
– 13 species: 10 genes plus 3 different environmental perturbations
– perturbations affect the transcription rate of the gene on which they act directly (through inhibition or activation) and their effect is propagated throughout the network by the interactions between the genes
– generated using the software package in (Mendes, Trends Biochem. Sci, 1997)
DasGupta, Vera-Licona, Sontag, 2011
generated time courses for both networks (a) and (b)
For method (A) we considered both greedy and linear programming
based approximations to the hitting set problem as well as redundancy
values R=1, 2
For method (B), input data must be discrete
used three discretization methods:
• graph-theoretic based approach “D” from (Dimitrova, Garcia-Puente,
Jarrah, Laubenbacher, Stigler, Stillman, Vera-Licona, 2010)
• quantile “Q” discretization (method in which each variable state
receives an equal number of data values)
• interval “I” discretization (select thresholds for the different discrete
values).
DasGupta, Vera-Licona, Sontag, 2011
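The quantile and interval discretizations can be sketched as follows (the graph-theoretic method "D" is not reproduced here). The function names, the toy time course and the choice to split ties by position are assumptions of this sketch, not details taken from the comparison study.

```python
from bisect import bisect_right
from typing import List, Sequence


def quantile_discretize(values: Sequence[float], k: int) -> List[int]:
    """Assign states 0..k-1 so each state receives (nearly) equal numbers of values.

    Ties are split by position, which is an arbitrary choice for this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    states = [0] * len(values)
    for rank, i in enumerate(order):
        states[i] = rank * k // len(values)
    return states


def interval_discretize(values: Sequence[float],
                        thresholds: Sequence[float]) -> List[int]:
    """Assign each value the number of thresholds it meets or exceeds."""
    cuts = sorted(thresholds)
    return [bisect_right(cuts, v) for v in values]


# Hypothetical expression time course for a single gene.
series = [0.1, 0.9, 0.4, 0.7, 0.2, 0.8]
q_states = quantile_discretize(series, k=3)
i_states = interval_discretize(series, thresholds=[0.5])
print(q_states, i_states)
```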
Summary of Comparison
network (b):
• method (B) was better than method (A) in ROC space
• method (A) achieved a performance no better than random guessing
network (a):
• method (B) could not obtain any results after running over 12 hours
• method (A) was able to compute results in less than 1 minute
• method (A) improved slightly when small redundancy was introduced
DasGupta, Vera-Licona, Sontag, 2011
implementation of method (B):
http://polymath.vbi.vt.edu/polynome/
implementation of method (A)
done by (DasGupta, Vera-Licona, Sontag, 2011) at
http://sts.bioengr.uic.edu/causal/
DasGupta, Vera-Licona, Sontag, 2011
Direct Synthesization of
Signal Transduction Networks
Only from known interactions and information
No new experiments needed
Overall Goal
[Diagram: inputs are direct interactions (A → B, A ┤ B), double-causal interactions (A → (B → C), A → (B ┤ C)) and additional information; the method (algorithms, software) is FAST and outputs a network of minimal complexity that is biologically relevant]
Nature of experimental evidence
• biochemical
– direct interaction, e.g.,
• binding of two proteins
• a transcription factor activating the transcription of a gene
• a chemical reaction with a single reactant and single product
• pharmacological
– indirect causal effects most probably resulting from a chain of
interactions and reactions, e.g.,
• binding of a chemical to a receptor protein starts a cascade of protein-protein interactions and chemical reactions that ultimately results in the
transcription of a gene
• genetic evidence of differential responses to a stimulus
– can be direct, but most often indirect (double-causal)
We describe a method for synthesizing
double-causal (path-level) information into
a consistent network
Direct interactions
• A promotes B: A → B
• A inhibits B: A ┤ B
Illustration of double-causal interaction
• C promotes the process of A promoting B: C → (A → B)
[Figure: the double-causal interaction among A, B and C drawn with a pseudo-vertex]
“Critical” edge
(known direct interaction, part of input)
Main computational step for network synthesis
• Pseudo-vertex collapse (PVC)
– easy
• Binary transitive reduction (BTR)
– hard
– need heuristics
Pseudo-vertex collapse (PVC)
Intuitively, PVC is useful for reducing the pseudo-vertex set to
the minimal set that maintains the graph consistent with all
indirect experimental observations.
[Figure: two pseudo-vertices u and v with in(u) = in(v) and out(u) = out(v) are collapsed into a new pseudo-vertex uv]
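A minimal sketch of the collapse rule follows. It assumes a representation that the slides do not specify: edges are triples (source, target, sign) with sign +1 for "promotes" and -1 for "inhibits", vertex names are strings, and in(v) and out(v) are taken to be the sets of signed neighbours. The vertex names in the example are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, int]  # (source, target, sign), sign in {+1, -1}


def pseudo_vertex_collapse(edges: Set[Edge], pseudo: Set[str]):
    """Merge pseudo-vertices with identical signed in- and out-neighbourhoods.

    Repeats until no two pseudo-vertices share the same signature, since a
    merge can make other pseudo-vertices identical in turn.
    """
    edges, pseudo = set(edges), set(pseudo)
    while True:
        groups: Dict[tuple, List[str]] = {}
        for v in sorted(pseudo):
            ins = frozenset((a, s) for a, b, s in edges if b == v)
            outs = frozenset((b, s) for a, b, s in edges if a == v)
            groups.setdefault((ins, outs), []).append(v)
        merge: Dict[str, str] = {}
        for grp in groups.values():
            if len(grp) > 1:
                new_name = "".join(grp)  # e.g. u and v become "uv"
                for v in grp:
                    merge[v] = new_name
        if not merge:
            return edges, pseudo
        edges = {(merge.get(a, a), merge.get(b, b), s) for a, b, s in edges}
        pseudo = {merge.get(v, v) for v in pseudo}


# Hypothetical input: p1 and p2 both sit between A and B with the same signs.
edges_in = {("A", "p1", 1), ("A", "p2", 1), ("p1", "B", 1), ("p2", "B", 1)}
edges_out, pseudo_out = pseudo_vertex_collapse(edges_in, {"p1", "p2"})
print(sorted(edges_out), pseudo_out)
```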
Illustration of Binary Transitive Reduction (BTR)
[Figure: two candidate edges for removal]
• remove? no: critical edge
• remove? yes: alternate path
Intuitively, the BTR problem is useful for
determining the sparsest graph consistent
with a set of experimental observations
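The remove/keep decision can be sketched as a simple greedy pass. This is not the heuristic used in the talk's software. It assumes edges are triples (source, target, sign) with sign +1 or -1, that an "alternate path" must have the same parity (product of signs) as the removed edge, and that critical edges are never removed. The example graph is hypothetical.

```python
from collections import deque
from typing import Set, Tuple

Edge = Tuple[str, str, int]  # (source, target, sign), sign in {+1, -1}


def reachable_with_parity(edges: Set[Edge], src: str, dst: str, parity: int) -> bool:
    """Is there a path of at least one edge from src to dst whose sign product is parity?"""
    seen = set()
    queue = deque([(src, 1)])
    while queue:
        node, par = queue.popleft()
        for a, b, s in edges:
            if a != node:
                continue
            state = (b, par * s)
            if state == (dst, parity):
                return True
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


def greedy_btr(edges: Set[Edge], critical: Set[Edge]) -> Set[Edge]:
    """Drop each non-critical edge that an alternate same-parity path makes redundant."""
    kept = set(edges)
    for e in sorted(edges):
        if e in critical:
            continue                      # known direct interaction: always keep
        rest = kept - {e}
        if reachable_with_parity(rest, e[0], e[1], e[2]):
            kept = rest                   # alternate path exists: safe to remove
    return kept


graph = {("A", "B", 1), ("B", "C", 1), ("A", "C", 1), ("A", "D", -1), ("D", "C", -1)}
reduced = greedy_btr(graph, critical={("A", "B", 1)})
print(sorted(reduced))
```

The order in which edges are examined affects the result, which is one reason the exact problem is hard and heuristics are needed.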
Some biologists did look at very simplified or somewhat different
versions of BTR, e.g.:
• A. Wagner, Estimating Coarse Gene Network Structure from Large-Scale Gene Perturbation Data,
Genome Research, 12, pp. 309-315, 2002
– too special (reachability only), no efficient algorithms reported
• T. Chen, V. Filkov and S. Skiena, Identifying Gene Regulatory Networks from Experimental Data,
Third Annual International Conference on Computational Molecular Biology, pp. 94-103, 1999
– “excess edge deletion” problem, biologically too restrictive version
See the following excellent surveys for more comprehensive information
about biological network inference and modeling:
• V. Filkov, Identifying Gene Regulatory Networks from Gene Expression Data, in Handbook of
Computational Molecular Biology (edited by S. Aluru), Chapman & Hall/CRC Press, 2005
• H. de Jong, Modelling and Simulation of Genetic Regulatory Systems: A Literature Review, Journal of
Computational Biology, Volume 9, Number 1, pp. 67-103, 2002
High level description of the network synthesis process
[Flowchart: synthesize direct interactions → optimize (BTR) → synthesize double-causal interactions → optimize (PVC, BTR), with interaction with biologists]
Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007
All the steps in the network synthesis procedure
except the steps that involve BTR can be done
easily
Thus, it behooves us to look at BTR more closely
But, before that, biological validation of the network
synthesis approach is desirable
Need a network that uses double-causal experimental
evidence
Plant signal transduction network
consistent guard cell signal transduction network for ABA-induced stomatal closure
– manually curated
– described in S. Li, S. M. Assmann and R. Albert, Predicting Essential Components
of Signal Transduction Networks: A Dynamic Model of Guard Cell Abscisic Acid
Signaling, PLoS Biology, 4(10), October 2006
– list of experimentally observed causal relationships collected by Li et al. and
published as Table S1. This table contains
• around 140 interactions and causal inferences, both of type “A promotes B” and
“C promotes process (A promotes B)”
– We augment this list with critical edges drawn from biophysical/biochemical
knowledge on enzymatic reactions and ion flows and with simplifying hypotheses
made by Li et al., both described in Text S1
We also formalized an additional rule specific to the
context of this network (and implicitly assumed by
Li et al.) regarding enzyme-catalyzed reactions
Regulatory interactions between ABA signal transduction pathway
components
Regulatory interactions between ABA signal transduction pathway
components (continued)
ERA1 ┤(ABA → CalM)
NO → GC not critical and not enzymatic
Some nodes in the network
• GCR1: putative G protein coupled receptor
• OST1: protein
• NO: nitric oxide
• ABH1: RNA cap-binding protein
• RAC1: small GTPase protein
• …
(left) Guard cell signal transduction network for ABA-induced stomatal closure manually
curated by Li, Assmann and Albert [source: PLoS Biology, 4 (10), 2006].
(right) our automated network synthesis procedure produced a reduced (fewer
edges) network while preserving all observed pathways
Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007
Summary of comparison of the two networks
• The network of Li et al. has 54 vertices and 92 edges;
our network has 57 vertices but 84 edges
• Both networks have identical strongly connected components
• All the paths present in the Li et al.’s reconstruction are present
in our network as well
• The two networks have 71 common edges
• It took a few seconds to synthesize our network
Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007
Summary of comparison of the two networks (continued)
Thus the two networks are highly similar but diverge on a few
edges.
None of these discrepancies is due to algorithmic deficiencies;
all stem from human decisions.
Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007
Software is available at:
http://www.cs.uic.edu/~dasgupta/network-synthesis/
• runs on any machine with MS Windows (Win32)
– click, save the executable and run
Data sources for this type of network synthesis
Signal transduction pathway repositories such as
• TRANSPATH (http://www.generegulation.com/pub/databases.html#transpath)
• protein interaction databases such as the Search Tool for the
Retrieval of Interacting Proteins (http://string.embl.de)
contain up to thousands of interactions, a large number of which
are not supported by direct physical evidence.
NET-SYNTHESIS can be used to filter redundant information
while keeping all direct interactions
Transitive reduction step used a heuristic
How good is the heuristic in general?
Performance of our BTR algorithm on
“random” signal transduction networks
But, what is a random biological network?
Biological networks are scale-free: e.g.,
N. Guelzim, S. Bottani, P. Bourgine, and F. Kepes, Topological and causal
structure of the yeast transcriptional regulatory network, Nature Genetics
31, 60–63, 2002
Biological networks are NOT scale-free: e.g., :
R. Khanin and E. Wit, How Scale-Free Are Biological Networks ?, Journal of
Computational Biology, 13 (3), 810 -818, 2006
So, we decided to look at the literature ourselves and decide on a
reasonable model for random signal transduction networks
According to us, random signal transduction networks satisfy:
• distribution of in-degree of the network is exponential:
$\Pr[\text{in-degree} = x] = L e^{-Lx}$, $1/3 \le L \le 1/2$
maximum in-degree is 12
• distribution of out-degree is governed by a power-law:
$x \ge 1$: $\Pr[\text{out-degree} = x] = c x^{-c}$;
$\Pr[\text{out-degree} = 0] \ge c$, $2 < c < 3$
maximum out-degree is 200
• ratio of excitatory to inhibitory edges between 2 and 4
random graphs with prescribed degree distributions are generated using the
procedure described in:
M. E. J. Newman, S. H. Strogatz and D. J. Watts. Random graphs with arbitrary degree
distributions and their applications, Physical Review E, 64 (2), 026118-026134, 2001
What percentage of edges should be
Critical (known direct interaction)?
No known accurate estimates:
• curated network of Ma'ayan et al. (Science, 2005)
– expected to have close to 100% critical edges as they specifically focused on
collecting direct interactions only
• Protein interaction networks are expected to be mostly critical
– Giot et al., Science, 2003
– Han et al., Nature, 2004
– Li et al., Science, 2004
• Genetic interactions (e.g., synthetic lethal interactions)
– represent compensatory relationships
– only a minority are direct interactions.
• Reverse engineering approaches:
– lead to networks whose interactions are close to 0% critical
We tried a few small and large values, such as
1%, 2% and 50%, for the percentage of edges
that are critical to catch qualitatively all
regions of dynamics of the network that are of
interest
Tested on about 550 random networks
– # of vertices in the range of about 100 to 1000
– running time for individual networks
• seconds to at most a minute
Verify the robustness of performance of our BTR algorithm
– perturb the networks in ways that do not change the optimal
solution of the original graph
The solution quality almost never changes under these perturbations
Performance of our implemented algorithm for BTR on random
networks
[Histogram: x-axis is % additional edges = ((|E'| / OPT) - 1) × 100, ranging from 0 to 22; y-axis is frequency of occurrence]
On average, we use about 5.5% more edges than
the optimum
Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007
Other applications of NET-SYNTHESIS
Synthesizing a Network for T Cell Survival and Death in LGL Leukemia
Background
• Large Granular Lymphocytes (LGL)
– medium to large size cells with eccentric nuclei and abundant cytoplasm
– comprise 10%~15% of the total peripheral blood mononuclear cells
– two major lineages
• CD3- natural-killer (NK) cell lineage: ~85% of LGL cells
• CD3+ lineage: ~15% of LGL
Kachalo, Zhang, Sontag, Albert, DasGupta, 2008
LGL leukemia
disordered clonal expansion of LGL and their
invasions in the marrow, spleen and liver
Background (continued)
Ras:
– small GTPase essential for controlling multiple essential signaling
pathways
– its deregulation is frequently seen in human cancers
Activation of H-Ras requires its farnesylation, which can be blocked by
farnesyltransferase inhibitors (FTIs)
This makes FTIs candidate drugs for anti-cancer therapies, and
several FTIs have entered early phase clinical trials
This observation, together with the finding that Ras is constitutively
activated in leukemic LGL cells, leads to the hypothesis that Ras
plays an important role in LGL leukemia, and may
function through influencing the Fas/FasL pathway.
we constructed the cell-survival/cell-death regulation-related signaling
network, with special interest in the effect of Ras on the apoptosis response
through the Fas/FasL pathway
Goal: initiate understanding of the interactions between Ras pathway and
Fas/FasL pathways, two of the major pathways that regulate cell
survival/death decision.
Currently, there is no standard therapy for LGL leukemia.
Understanding the mechanism of this disease is crucial for
drug/therapy development
Proteins that modulate the Ras-apoptosis response can potentially serve as
future reference for drug design and therapeutic-target-molecule
search, and this may not be restricted to LGL leukemia
Kachalo, Zhang, Sontag, Albert, DasGupta, 2008
Synthesizing a Network for T Cell Survival and Death in Large Granular Lymphocyte
Leukemia
• Synthesized a cell-survival/cell-death regulation-related signaling network
from the TRANSPATH 6.0 database, with additional information manually
curated from literature search
• 359 vertices of this network represent proteins/protein families and mRNAs
participating in pro-survival and Fas-induced apoptosis pathways
• 1295 edges represent regulatory relationships between nodes, including
protein interactions, catalytic reactions, transcriptional regulation (no
double-causal interactions were known)
• Performing BTR with NET-SYNTHESIS reduced the total edge-number to
873
Kachalo, Zhang, Sontag, Albert, DasGupta, 2008
To focus on pathways that involve the 33 known T-LGL
deregulated proteins, we designated vertices that
correspond to proteins with no evidence of being changed
during T-LGL as pseudo-vertices and deleted the label
“Y” for those edges both of whose endpoints were pseudo-vertices
Recursively performing “Reduction (faster)” BTR and
“Collapse degree-2 pseudonodes” of NET-SYNTHESIS
until no edge/node could be further removed simplified the
network to 267 nodes and 751 edges.
Kachalo, Zhang, Sontag, Albert, DasGupta, 2008
For further results, see
R. Zhang, M. V. Shah, J. Yang, S. B. Nyland, X. Liu, J. K. Yun,
R. Albert, and T. P. Loughran,
Network Model of Survival Signaling in LGL Leukemia
PNAS, 2008
Binary transitive reduction raises two further interesting
questions:
– how redundant are biological networks ?
• what is redundancy and how to measure it ?
– percentage of edges removed by binary transitive reduction
(Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)
– are redundancy and dynamical properties correlated ?
Feedback loops
and
dynamics of biological networks
analyzing behaviors of feedback loops is a long-standing topic in the
context of regulation, metabolism, and development
– e.g., see classical reference works such as
J. Monod and F. Jacob, General conclusions: teleonomic mechanisms in
cellular metabolism, growth, and differentiation, Cold Spring Harbor
Symp. Quant. Biol., 26, 389-401, 1961
Monotone dynamical system
Monotone systems are “simpler behaved” systems:
• pathological behavior (“chaos”) is ruled out
• even though they may have arbitrarily large dimensionality,
monotone systems behave in many ways like one-dimensional
systems
– e. g. , in monotone systems
• bounded trajectories generically converge to steady states
• there are no stable oscillatory behaviors
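Associated Signal Transduction Network
vertices v_1, …, v_i, …, v_j, …, v_k, …, v_n
• $\dfrac{\partial f_k}{\partial x_i}(x) > 0$: edge from v_i to v_k labeled +
• $\dfrac{\partial f_k}{\partial x_j}(x) < 0$: edge from v_j to v_k labeled -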
[Figure: two example networks with edges labeled + or -, one sign-inconsistent and one sign-consistent]
parity: product of signs
sign-consistent: every undirected path between two nodes has the
same parity
(check undirected paths 1–4 and 1–2–3–4)
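Sign-consistency can be checked without enumerating paths: the network is sign-consistent exactly when its vertices can be split into two groups so that + edges stay within a group and - edges cross between groups. The sketch below does this with a breadth-first labeling. The edge signs in the two examples are hypothetical, chosen only so that paths 1–4 and 1–2–3–4 disagree in the first case and agree in the second.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple

SignedEdge = Tuple[int, int, int]  # (u, v, sign), sign in {+1, -1}, undirected


def is_sign_consistent(edges: Iterable[SignedEdge]) -> bool:
    """True iff all undirected paths between any two nodes have the same parity."""
    adj: Dict[int, List[Tuple[int, int]]] = {}
    for u, v, s in edges:
        adj.setdefault(u, []).append((v, s))
        adj.setdefault(v, []).append((u, s))
    label: Dict[int, int] = {}
    for root in adj:
        if root in label:
            continue
        label[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                # + edge: same label; - edge: opposite label
                want = label[u] if s > 0 else 1 - label[u]
                if v not in label:
                    label[v] = want
                    queue.append(v)
                elif label[v] != want:
                    return False  # two paths with different parity meet here
    return True


inconsistent = [(1, 2, +1), (2, 3, +1), (3, 4, +1), (1, 4, -1)]
consistent = [(1, 2, +1), (2, 3, -1), (3, 4, +1), (1, 4, -1)]
print(is_sign_consistent(inconsistent), is_sign_consistent(consistent))
```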
sign-consistent networks are monotone systems
This allows us to define the
“degree of monotonicity” M
of a differential equation system
in the following way:
minimum percentage of edges we need to delete
to make the associated signal transduction network
sign-consistent
(Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)
Undirected Labeling Problem (ULP)
needed to compute degree of monotonicity M
Given: undirected graph G=(V,E)
edge labeling function h: E → {0,1}
Valid solution: a vertex labeling function f: V → {0,1}
Definition: an edge {u,v} ∈ E is consistent if
h(u,v) = f(u) + f(v) (mod 2)
Goal: maximize number of consistent edges
Bad news: NP-hard and even MAX-SNP-hard.
DasGupta, Enciso, Sontag, Zhang, 2007
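For very small graphs, ULP can be solved by exhaustive search, which also gives the degree of monotonicity M directly as the percentage of edges left inconsistent by the best labeling. The sketch below is exponential in |V| and is meant only to make the definitions concrete. The function names are invented, and encoding an inhibitory edge as h = 1 and an excitatory edge as h = 0 is an assumption.

```python
from itertools import product
from typing import Dict, Hashable, List, Sequence, Tuple

LabeledEdge = Tuple[Hashable, Hashable, int]  # (u, v, h) with h in {0, 1}


def ulp_brute_force(vertices: Sequence[Hashable],
                    edges: Sequence[LabeledEdge]) -> Tuple[int, Dict[Hashable, int]]:
    """Try all 2^|V| vertex labelings and return the best (count, labeling)."""
    vertices = list(vertices)
    best_count, best_f = -1, {}
    for bits in product((0, 1), repeat=len(vertices)):
        f = dict(zip(vertices, bits))
        # an edge is consistent if h(u,v) = f(u) + f(v) (mod 2)
        count = sum(1 for u, v, h in edges if (f[u] + f[v]) % 2 == h)
        if count > best_count:
            best_count, best_f = count, f
    return best_count, best_f


def degree_of_monotonicity(vertices: Sequence[Hashable],
                           edges: Sequence[LabeledEdge]) -> float:
    """Minimum percentage of edges to delete to reach sign-consistency."""
    best_count, _ = ulp_brute_force(vertices, edges)
    return 100.0 * (len(edges) - best_count) / len(edges)


# Hypothetical 4-cycle with a single h = 1 edge: one edge must go.
cycle: List[LabeledEdge] = [(1, 2, 0), (2, 3, 0), (3, 4, 0), (1, 4, 1)]
best, labeling = ulp_brute_force([1, 2, 3, 4], cycle)
M = degree_of_monotonicity([1, 2, 3, 4], cycle)
print(best, M)
```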
Algorithm for ULP
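• Solve the following vector program via semidefinite programming methods:
maximize $\frac{1}{2} \sum_{h(u,v)=1} (1 - x_u \cdot x_v) + \frac{1}{2} \sum_{h(u,v)=0} (1 + x_u \cdot x_v)$
subject to:
for each v ∈ V, $x_v \cdot x_v = 1$
for each v ∈ V, $x_v \in \mathbb{R}^{|V|}$
• Select a uniformly random vector r in the |V|-dimensional unit sphere
• Label each vertex v as
0 if $r \cdot x_v \ge 0$
1 otherwise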
It can be easily implemented in MATLAB
DasGupta, Enciso, Sontag, Zhang, 2007
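The rounding step can be sketched on its own, assuming the unit vectors have already been produced by some semidefinite programming solver (that solver call is not shown). The vectors in the example are hand-picked stand-ins rather than the output of a real solver, and the function name is invented. A vector of independent Gaussians is used for r because only its direction matters and that direction is uniformly distributed.

```python
import random
from typing import Dict, Hashable, Sequence


def round_by_random_hyperplane(
    vectors: Dict[Hashable, Sequence[float]],
    rng: random.Random,
) -> Dict[Hashable, int]:
    """Label v as 0 if r . x_v >= 0 and 1 otherwise, for a random direction r."""
    dim = len(next(iter(vectors.values())))
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random direction
    return {
        v: 0 if sum(ri * xi for ri, xi in zip(r, x)) >= 0 else 1
        for v, x in vectors.items()
    }


# Hand-picked unit vectors: u and w coincide, v points the opposite way.
toy_vectors = {"u": (1.0, 0.0), "v": (-1.0, 0.0), "w": (1.0, 0.0)}
labels = round_by_random_hyperplane(toy_vectors, random.Random(0))
print(labels)
```

Vertices whose vectors nearly coincide almost always receive the same label, and vertices with nearly opposite vectors almost always receive different labels, which is what ties the rounding to the consistency condition h(u,v) = f(u) + f(v) (mod 2).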
We have two measurable properties:
– (topological) redundancy R
• percentage of edges removed by binary transitive reduction
– (dynamical) monotonicity M
• minimum percentage of edges we need to delete to make the
associated signal transduction network consistent
M is negatively correlated to R
(Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)
Some other conclusions from
(Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)
• the redundancy measure R is statistically significant
• transcriptional networks are less redundant than signaling networks
• redundancy of C. elegans metabolic network is largely due to currency
metabolites
• calculation of redundancy values and minimal networks provides a way to
gain insight into predicted orientation of protein-protein interaction (PPI)
networks
Future Research Questions
in the context of parallel and distributed computing
• Synchronization:
– no “global clocks” are known to exist for cellular processes (ignoring
circadian rhythms and some other global timing mechanisms in higher
organisms)
• Spatial effects:
– localization (nuclear, cytoplasmic, membrane-bound) in cells
• akin to geographical location affecting communication speeds and
coordination in distributed computing
List of some relevant references
R. Albert, B. DasGupta, et al. A New Computationally Efficient Measure of Topological Redundancy of Biological and Social Networks,
Physical Review E, 84 (3), 036117, 2011.
B. DasGupta, P. Vera-Licona, E. Sontag. Reverse Engineering of Molecular Networks from a Common Combinatorial Approach, in
Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications, John Wiley & Sons, Inc., 2011.
R. Albert, B. DasGupta, E. Sontag. Inference of signal transduction networks from double causal evidence, in Methods in Molecular Biology:
Topics in Computational Biology, D. Fenyo (editor), Springer , 2010.
P. Berman, B. DasGupta, M. Karpinski. Approximating Transitive Reduction Problems for Directed Networks, 11th Algorithms and Data
Structures Symposium, 2009.
R. Albert, B. DasGupta, R. Dondi, E. Sontag. Inferring (Biological) Signal Transduction Networks via Transitive Reductions of Directed
Graphs, Algorithmica, 51 (2), 129-159, 2008.
S. Kachalo, R. Zhang, E. Sontag, R. Albert, B. DasGupta. NET-SYNTHESIS: A software for synthesis, inference and simplification of signal
transduction networks, Bioinformatics, 24 (2), 293-295, 2008.
P. Berman, B. DasGupta, E. Sontag. Algorithmic Issues in Reverse Engineering of Protein and Gene Networks via the Modular Response
Analysis Method, Annals of the New York Academy of Sciences, 2007.
R. Albert, B. DasGupta, et al. A Novel Method for Signal Transduction Network Inference from Indirect Experimental Evidence, Journal of
Computational Biology, 14 (7), 927-949, 2007.
B. DasGupta, G. A. Enciso, E. Sontag, Y. Zhang. Algorithmic and Complexity Results for Decompositions of Biological Networks into
Monotone Subsystems, Biosystems, 90 (1), 161-178, 2007.
P. Berman, B. DasGupta, E. Sontag. Computational Complexities of Combinatorial Problems With Applications to Reverse Engineering of
Biological Networks, in Advances in Computational Intelligence: Theory and Applications, F.-Y. Wang and D. Liu (editors), Series in
Intelligent Control and Intelligent Automation, World Scientific publishers, 303-316, 2007.
P. Berman, B. DasGupta, E. Sontag. Randomized Approximation Algorithms for Set Multicover Problems with Applications to Reverse
Engineering of Protein and Gene Networks, Discrete Applied Mathematics, 155 (6-7), 733-749, 2007.
Acknowledgments
Thanks to research collaborators for these projects
R. Albert (Penn State)
G. Enciso (UC Irvine)
R. Hegde (UIC)
P. Pal
P. Vera-Licona (INRIA)
R. Zhang (Penn State)
P. Berman (Penn State)
A. Gitter (CMU)
S. Kachalo (UIC)
G. S. Sivanathan (UIC)
K. Westbrooks (GSU)
Y. Zhang (UIC)
R. Dondi (U. of Bergamo)
G. Gürsoy (UIC)
M. Karpinski (Bonn)
E. Sontag (Rutgers)
A. Zelikovsky (GSU)
Thanks to National Science Foundation (NSF) for funding:
DBI-1062328
IIS-0610244
IIS-1064681
CCR-9800086
IIS-0346973
CNS-0206795
DBI-0543365
CCF-0208749
Thanks to generous support from DIMACS (Rutgers) during my
Sabbatical leave through their special focus on computational and
mathematical epidemiology
Thank you for your attention!
Questions?