Predicting Gene Expression using Logic
Modeling and Optimization
Abhimanyu Krishna
New Challenges in the European Area: Young Scientist’s 1st International Baku Forum
What is Gene Expression? -> Regulation? -> Gene Regulatory Network?
Introduction:
[Figure: input stimuli acting on a gene regulatory network; node labels: R, p, A, B, C, TRB, TRC]
Gene Regulatory Network reconstruction
Objective: how to contextualize literature to our experimental conditions
[Figure: literature-based gene regulatory network + experimental expression data; missing expression values in grey]
“Predicting missing expression values in gene regulatory networks using a
discrete logic modeling optimization guided by network stable states”
Introduction:
Biological processes represented as transitions in a landscape
Networks of interactions
[Figure: landscape with two stable states separated by an unstable transient state]
Why are these predictions not trivial?
Noisy network reconstruction process
Problem:
Inconsistency between network and
experimental expression data
Solution:
Contextualize the Network using
experimental expression data
Why is this an optimization problem?
Local consistency
Edge removal
Global consistency
Which property are we going to use in the optimization?
Network stability
[Figure: landscape with two stable states separated by an unstable transient state]
Iterative network pruning
Objective function:
$$S_n = 1 - \frac{h_{\varphi\alpha 1} + h_{\varphi\alpha 2}}{2}, \quad \text{with} \quad h_{\varphi\alpha} = \frac{1}{N} \sum_{i=1}^{N} \left| \sigma_i^{\varphi} - \sigma_i^{\alpha} \right|$$
This score S uses the normalized Hamming distance (h) to compare N Boolean gene expression values (σ) between all calculated steady states (α) of a pruned network and the two known phenotypes (φ1 and φ2) defined by the expression data, in order to identify the two best-matching phenotype/steady-state couples (φα1 and φα2).
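The score can be illustrated with a minimal sketch. The function names `hamming` and `score` are hypothetical, and the sketch assumes that each phenotype is paired independently with its closest steady state and that no expression values are missing; the original method may handle either point differently.

```python
def hamming(phenotype, state):
    """Normalized Hamming distance between two equal-length Boolean vectors."""
    if len(phenotype) != len(state):
        raise ValueError("vectors must have the same length")
    return sum(p != s for p, s in zip(phenotype, state)) / len(phenotype)


def score(phi1, phi2, steady_states):
    """S = 1 - (h1 + h2) / 2, where h1 and h2 are the distances from each
    phenotype to its best-matching steady state (assumption: the two
    matches are chosen independently of each other)."""
    if not steady_states:
        raise ValueError("at least one steady state is required")
    h1 = min(hamming(phi1, alpha) for alpha in steady_states)
    h2 = min(hamming(phi2, alpha) for alpha in steady_states)
    return 1 - (h1 + h2) / 2


# Illustrative data only: two phenotypes and three steady states of a pruned network.
phi1 = (1, 0, 1, 0)
phi2 = (0, 1, 0, 1)
states = [(1, 0, 1, 1), (0, 1, 0, 1), (1, 1, 1, 1)]
print(score(phi1, phi2, states))  # 0.875: h1 = 1/4, h2 = 0
```

A score of 1 means both phenotypes coincide exactly with steady states of the pruned network.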
However, the contributions of individual interactions to network stability are not linearly independent.
The evaluation of one specific link depends strongly on the links already removed or, in other words, on the order of removal.
We capture interdependencies between variables by considering sequentially both the probability distribution of positive circuits and that of separate edges.
Positive circuits are a necessary condition for multiple fixed points:
Thomas R, Thieffry D, Kaufman M: Dynamical behavior of biological regulatory networks. 1. Biological role of feedback loops and practical use of the concept of the loop-characteristic state. Bulletin of Mathematical Biology 1995, 57:247-276.
[Figure: two positive circuits and one negative circuit]
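As a small illustration of the circuit terminology, the sketch below enumerates the elementary circuits of a signed directed graph and classifies each by the product of its edge signs: a circuit is positive when it contains an even number of inhibitions. The edge encoding and the function names `simple_circuits` and `circuit_sign` are assumptions made for this example, and the exhaustive search is only meant for small networks.

```python
def simple_circuits(edges):
    """Yield every elementary circuit once, as a list of nodes.
    edges: dict mapping (source, target) -> +1 (activation) or -1 (inhibition).
    Each circuit is reported starting from its smallest node."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)
    nodes = sorted({node for edge in edges for node in edge})
    for start in nodes:
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            for nxt in successors.get(node, []):
                if nxt == start:
                    yield path
                elif nxt > start and nxt not in path:
                    stack.append((nxt, path + [nxt]))


def circuit_sign(circuit, edges):
    """Product of the edge signs along the circuit: +1 positive, -1 negative."""
    sign = 1
    for source, target in zip(circuit, circuit[1:] + circuit[:1]):
        sign *= edges[(source, target)]
    return sign


# Illustrative network: A and B inhibit each other; B activates C; C inhibits B.
edges = {("A", "B"): -1, ("B", "A"): -1, ("B", "C"): 1, ("C", "B"): -1}
for circuit in simple_circuits(edges):
    kind = "positive" if circuit_sign(circuit, edges) > 0 else "negative"
    print(circuit, kind)
```

In this toy network the mutual inhibition between A and B is a positive circuit, while the B/C loop is negative.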
Iterative network pruning
[Figure sequence: the pruning step shown in turn for Positive Circuit 1, Positive Circuit 2, and Positive Circuit 3, each scored with the same objective function S_n]
Biological scope targeted by this approach: transitions between long-term expression patterns or stable states
Example: epithelial-mesenchymal transition
[Figure: epithelial and mesenchymal phenotypes]
Computing attractors in a discrete dynamical system (Boolean)
Based on logic functions and the assumption of only two possible gene states: active (ON or 1) and inactive (OFF or 0).
Logic functions: the state of node x_i at time t+1 depends on the state of its regulators at time t.
Updating scheme: synchronous
Types of attractors: fixed points and limit cycles
[Figure: a fixed point and a limit cycle]
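The synchronous scheme can be sketched for a very small network as follows. The rule encoding (one Python function per gene) and the name `attractors` are assumptions made for this example; the sketch enumerates all 2^N states, so it is only practical for a handful of genes and is not the attractor search used by the authors.

```python
from itertools import product


def attractors(rules):
    """Find all attractors of a Boolean network under synchronous updating.
    rules: dict mapping gene name -> function(state_dict) -> truthy/falsy.
    Returns (names, fixed_points, limit_cycles)."""
    names = sorted(rules)

    def step(state):
        current = dict(zip(names, state))
        # Every gene is updated at once from the state at time t.
        return tuple(int(bool(rules[name](current))) for name in names)

    fixed_points, limit_cycles = set(), set()
    for start in product((0, 1), repeat=len(names)):
        seen = []
        state = start
        while state not in seen:
            seen.append(state)
            state = step(state)
        cycle = seen[seen.index(state):]
        if len(cycle) == 1:
            fixed_points.add(cycle[0])
        else:
            # Rotate so each limit cycle is stored once, starting at its smallest state.
            k = cycle.index(min(cycle))
            limit_cycles.add(tuple(cycle[k:] + cycle[:k]))
    return names, fixed_points, limit_cycles


# Illustrative two-gene network in which A and B inhibit each other.
rules = {"A": lambda s: not s["B"], "B": lambda s: not s["A"]}
names, fixed_points, limit_cycles = attractors(rules)
print(names)         # ['A', 'B']
print(fixed_points)  # the two fixed points (0, 1) and (1, 0)
print(limit_cycles)  # one limit cycle: (0, 0) -> (1, 1) -> (0, 0)
```

The limit cycle between (0, 0) and (1, 1) arises because both genes are updated simultaneously.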
Consistency between expression data and network stable states
Iterative network pruning
Network topology optimized using an Estimation of Distribution Algorithm (EDA)
Toy example: optimization of the objective function h(x)
h(x) = x1 + x2 + x3 + x4 + x5 + x6, with xi = 0 or 1
EDA: toy example
[Figure sequence: initial population, top 10 solutions, next population; values shown under the top 10 solutions, one per variable: 0.7 0.7 0.6 0.6 0.8 0.7]
STOP CRITERIA
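The toy example can be sketched as a univariate EDA. This assumes the per-variable values on the slide are the frequencies of 1 among the top 10 solutions, and that the stop criteria are either reaching the maximum of h(x) or a fixed number of iterations; the slides do not state either point. The names `estimate_marginals` and `run_eda`, the population size and the seed are hypothetical.

```python
import random


def h(x):
    """Toy objective: number of variables set to 1."""
    return sum(x)


def estimate_marginals(selected):
    """Frequency of 1 at each position among the selected solutions."""
    return [sum(column) / len(selected) for column in zip(*selected)]


def run_eda(n_vars=6, pop_size=30, top_k=10, max_iter=50, seed=0):
    rng = random.Random(seed)
    probs = [0.5] * n_vars  # initial population: unbiased sampling
    best = None
    for _ in range(max_iter):
        population = [
            tuple(int(rng.random() < p) for p in probs) for _ in range(pop_size)
        ]
        population.sort(key=h, reverse=True)
        top = population[:top_k]
        if best is None or h(top[0]) > h(best):
            best = top[0]
        if h(best) == n_vars:  # assumed stop criterion: optimum reached
            break
        probs = estimate_marginals(top)  # the next population is sampled from these
    return best, probs


best, probs = run_eda()
print(best, probs)
```

Sampling each variable independently ignores interactions between variables, which is sufficient for this additive toy objective.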
Algorithm:
Predictions based on the consensus among the family of alternative solutions
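One way to read "consensus" is a vote across the alternative solutions. The sketch below is an assumption about how such a consensus could be computed, not the rule used by the authors; the name `consensus_prediction` and the `min_agreement` threshold are hypothetical.

```python
from collections import Counter


def consensus_prediction(solutions, min_agreement=1.0):
    """solutions: list of dicts mapping gene -> predicted Boolean value (0 or 1),
    one dict per alternative solution.
    Returns gene -> value for the genes on which at least `min_agreement`
    (a fraction between 0 and 1) of the solutions agree."""
    genes = set().union(*solutions) if solutions else set()
    predictions = {}
    for gene in sorted(genes):
        votes = Counter(s[gene] for s in solutions if gene in s)
        value, count = votes.most_common(1)[0]
        if count / sum(votes.values()) >= min_agreement:
            predictions[gene] = value
    return predictions


# Illustrative data only: three alternative solutions for two unmeasured genes.
solutions = [{"G1": 1, "G2": 0}, {"G1": 1, "G2": 1}, {"G1": 1, "G2": 0}]
print(consensus_prediction(solutions))                     # {'G1': 1}
print(consensus_prediction(solutions, min_agreement=0.6))  # {'G1': 1, 'G2': 0}
```

With the default threshold a gene is predicted only when every solution agrees on its value.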
Availability:
Paper: http://nar.oxfordjournals.org/content/early/2012/08/30/nar.gks785.full
Software: http://maia.uni.lu/demo/
Thank you!
Questions?
Isaac Crespo, Computational Biology Unit (LCSB)
Abhimanyu Krishna, Bioinformatic core (LCSB)
Antony Le Béchec, Life sciences research unit (LSRU)
Vital-IT (SIB)
Antonio del Sol, Head of Computational Biology Unit (LCSB)