Predicting Gene Expression using Logic Modeling and Optimization
Abhimanyu Krishna
New Challenges in the European Area: Young Scientist’s 1st International Baku Forum

Introduction
What is gene expression? -> Regulation? -> Gene regulatory network?
[Figure: input stimuli propagating through a small regulatory network (node labels R, A, B, C, TRB, TRC)]

Gene regulatory network reconstruction
Objective: how to contextualize literature-based knowledge to our experimental conditions.
[Figure: literature-based gene regulatory network + experimental expression data; missing expression values in grey]

“Predicting missing expression values in gene regulatory networks using a discrete logic modeling optimization guided by network stable states”

Introduction: biological processes can be represented as transitions between the states of a network of interactions, pictured as a landscape: stable state -> unstable transient state -> stable state.

Why are these predictions not trivial? Because the network reconstruction process is noisy.

Problem: inconsistency between the network and the experimental expression data.
Solution: contextualize the network using the experimental expression data.

Why is this an optimization problem?
Local consistency
Edge removal -> local consistency
Global consistency vs. local consistency

Which property are we going to use in the optimization? Network stability.

Iterative network pruning
Objective function:
$S_n = 1 - \frac{h_{\varphi_1\alpha_1} + h_{\varphi_2\alpha_2}}{2}$, with $h_{\varphi\alpha} = \frac{1}{N}\sum_{i=1}^{N}\left|\sigma_i^{\varphi} - \sigma_i^{\alpha}\right|$
This score $S_n$ uses the normalized Hamming distance (h) to compare N Boolean gene expression values (σ) between all calculated steady states (α) of a pruned network and the two known phenotypes (φ1 and φ2) defined by the expression data, in order to identify the two best-matching phenotype/steady state couples (φ1α1 and φ2α2).

However, the contributions of individual interactions to network stability are not independent of one another: the evaluation of a specific link depends strongly on the links already removed or, in other words, on the order of removal. We therefore capture interdependencies between variables by considering, sequentially, the probability distributions of both positive circuits and individual edges.
Positive circuits are a necessary condition for having several fixed points (Thomas R, Thieffry D, Kaufman M: Dynamical behavior of biological regulatory networks. I. Biological role of feedback loops and practical use of the concept of the loop-characteristic state. Bulletin of Mathematical Biology 1995, 57:247-276).
[Figure: two positive circuits and one negative circuit]

Iterative network pruning: positive circuit 1
Iterative network pruning: positive circuit 2
Iterative network pruning: positive circuit 3

Biological scope targeted by this approach: transitions between long-term expression patterns, or stable states.
Example: the epithelial-mesenchymal transition (epithelial state -> mesenchymal state).

Computing attractors in a discrete (Boolean) dynamical system
Based on logic functions and the assumption of only two possible gene states: active (ON or 1) and inactive (OFF or 0).
Types of attractors: fixed points and limit cycles [Figure: example fixed point].
Logic functions: the state of node $x_i$ at time t+1 depends on the state of its regulators at time t.
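The two computational ingredients above, attractor computation and the score $S_n$, can be illustrated with a short Python sketch. It is not the authors' implementation: the rule encoding, all function names and the three-gene toy network are hypothetical. It assumes the synchronous updating scheme used in this work, that "steady states" means fixed points, and that the two phenotypes must be matched to two distinct fixed points (a network with fewer than two fixed points scores 0 here). Attractors are found by brute force over all $2^N$ states, which is only practical for small networks.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]
Rule = Callable[[State], int]  # hypothetical encoding: one logic function per node


def sync_step(state: State, rules: Sequence[Rule]) -> State:
    """Synchronous update: every node at t+1 is computed from the whole state at t."""
    return tuple(rule(state) for rule in rules)


def attractors(rules: Sequence[Rule]) -> List[Tuple[State, ...]]:
    """Enumerate all attractors by iterating from each of the 2**N initial states."""
    found = set()
    for start in product((0, 1), repeat=len(rules)):
        seen = set()
        state = start
        while state not in seen:  # the first repeated state lies on the attractor
            seen.add(state)
            state = sync_step(state, rules)
        cycle = [state]
        nxt = sync_step(state, rules)
        while nxt != state:  # walk once around the attractor
            cycle.append(nxt)
            nxt = sync_step(nxt, rules)
        k = cycle.index(min(cycle))  # rotate so each cycle has one canonical form
        found.add(tuple(cycle[k:] + cycle[:k]))
    return sorted(found)


def fixed_points(rules: Sequence[Rule]) -> List[State]:
    """Attractors of length 1 are fixed points; longer ones are limit cycles."""
    return [a[0] for a in attractors(rules) if len(a) == 1]


def hamming(phenotype: State, state: State) -> float:
    """Normalized Hamming distance h_{phi alpha} over the N genes."""
    return sum(abs(p - s) for p, s in zip(phenotype, state)) / len(phenotype)


def score(phi1: State, phi2: State, steady_states: Sequence[State]) -> float:
    """S_n = 1 - (h_{phi1 alpha1} + h_{phi2 alpha2}) / 2 for the best pair of
    distinct steady states; 0.0 if there are fewer than two (assumption)."""
    pairs = [(a1, a2) for a1 in steady_states for a2 in steady_states if a1 != a2]
    if not pairs:
        return 0.0
    best = min((hamming(phi1, a1) + hamming(phi2, a2)) / 2 for a1, a2 in pairs)
    return 1.0 - best


# Hypothetical toy network: A and B repress each other, A activates C.
rules = [
    lambda s: 1 - s[1],  # A(t+1) = NOT B(t)
    lambda s: 1 - s[0],  # B(t+1) = NOT A(t)
    lambda s: s[0],      # C(t+1) = A(t)
]
print(attractors(rules))  # [((0, 0, 1), (1, 1, 0)), ((0, 1, 0),), ((1, 0, 1),)]
print(score((1, 0, 1), (0, 1, 1), fixed_points(rules)))  # 1 - (0 + 1/3) / 2, about 0.833
```

In the toy network, A and B repress each other, forming a positive circuit (an even number of negative interactions), and the network indeed has two fixed points plus one limit cycle, consistent with the necessary condition cited above.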
Updating scheme: synchronous [Figure: example limit cycle].

Consistency between expression data and network stable states

Iterative network pruning
The network topology is optimized using an Estimation of Distribution Algorithm (EDA).
Toy example: optimization of the objective function $h(x) = x_1 + x_2 + x_3 + x_4 + x_5 + x_6$, with $x_i \in \{0, 1\}$.

EDA: toy example
Initial population -> top 10 solutions -> next population.
For each variable $x_1$ to $x_6$, the fraction of the top 10 solutions with $x_i = 1$ is computed (0.7, 0.7, 0.6, 0.6, 0.8, 0.7) and used to generate the next population.
The cycle is repeated until the stop criterion is met.
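The EDA toy example above can be sketched as a univariate marginal distribution algorithm (UMDA), one of the simplest EDAs: sample a population, keep the top 10 solutions, re-estimate one probability per variable, and resample. This is a hypothetical illustration, not the optimizer used in the paper; the population size of 50, the initial probabilities of 0.5, the generation limit and the convergence-based stop criterion are assumptions, while the objective $h(x)$ and the top-10 selection come from the slides.

```python
import random
from typing import List, Sequence, Tuple

Solution = List[int]


def h(x: Sequence[int]) -> int:
    """Toy objective from the slides: h(x) = x1 + x2 + ... + x6, to be maximized."""
    return sum(x)


def sample(probs: Sequence[float], size: int, rng: random.Random) -> List[Solution]:
    """Draw `size` binary vectors, setting xi = 1 with probability probs[i]."""
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(size)]


def marginals(solutions: Sequence[Solution]) -> List[float]:
    """Fraction of the selected solutions with xi = 1, for each variable i."""
    return [sum(col) / len(solutions) for col in zip(*solutions)]


def umda(n_vars: int = 6, pop_size: int = 50, n_top: int = 10,
         max_gens: int = 50, seed: int = 0) -> Tuple[Solution, List[float], int]:
    rng = random.Random(seed)
    probs = [0.5] * n_vars  # initial population: unbiased random bits (assumption)
    best: Solution = []
    gens = 0
    while gens < max_gens:
        gens += 1
        population = sample(probs, pop_size, rng)
        top = sorted(population, key=h, reverse=True)[:n_top]  # top 10 solutions
        if not best or h(top[0]) > h(best):
            best = top[0]
        probs = marginals(top)  # distribution used to sample the next population
        # Stop criterion (assumption): the model has collapsed onto a single vector.
        if all(p in (0.0, 1.0) for p in probs):
            break
    return best, probs, gens


best, probs, gens = umda()
print(best, h(best), gens)
```

Because UMDA models each variable independently, it cannot represent the interdependencies between edges discussed earlier; the approach described here addresses this by also considering the probability distribution of positive circuits. In the pruning problem, the binary variables would presumably encode which edges are removed, with $S_n$ taking the place of $h(x)$.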
Algorithm: [Figure: overview of the algorithm]

Predictions based on the consensus among the family of alternative solutions

Availability:
Paper: http://nar.oxfordjournals.org/content/early/2012/08/30/nar.gks785.full
Software: http://maia.uni.lu/demo/

Thank you! Questions?
Isaac Crespo, Computational Biology Unit (LCSB)
Abhimanyu Krishna, Bioinformatics Core (LCSB)
Antony Le Béchec, Life Sciences Research Unit (LSRU); Vital-IT (SIB)
Antonio del Sol, Head of the Computational Biology Unit (LCSB)