Gene Regulatory Networks - the Boolean Approach
Andrey Zhdanov
Based on the papers by Tatsuya Akutsu et al. and others

Gene Expressions Revisited

One of the major subjects of study in cell biology is the behaviour of proteins, the “workhorses” of a cell.
[Figure: myoglobin molecule]

We are interested in analysing protein expression levels: the amounts of different proteins synthesized by the cell.

The “blueprints” for all possible proteins that can be synthesized by a cell, the genes, are stored in the cell's nucleus. Only a small fraction of all possible proteins is synthesized in each cell.

Proteins are synthesized from genes by the processes of transcription and translation.

We estimate protein expression levels indirectly, by measuring gene expression levels (the amounts of mRNA produced for a given gene) with DNA chips.

This approach makes a number of assumptions:
• Genes exist and are easily identifiable
• Each protein is encoded by a single gene
• Protein expression (amount of protein produced) is determined by the corresponding gene expression (amount of mRNA produced)
These assumptions do not always hold (but we use them anyway :-)

Gene Regulatory Networks

We want to use protein (or gene) expression measurements to understand the mechanisms regulating the production of proteins. Note that there is a certain circularity to our logic, since we made assumptions about these very same mechanisms in order to measure protein expression.

In the talks by Shahar and Leon we saw the “regulatory network” approach to modelling the protein expression mechanisms. In his talk Oded introduced tools for time series analysis that can be applied to our problem.
We are looking for a formal model of the protein expression control mechanism that can serve as a framework for a rigorous treatment of the problem.

To that end we assume that the production rate of a given protein at any given time is regulated only by the amounts of other proteins within the cell at that time.

Example:
[Figure: regulatory diagram of Proteins A, B, C and D, including an "inhibits" relation, next to a plot of each protein's expression level against time]

Treating the gene expressions as real-valued functions of a continuous time variable leads to a system of differential equations as the model for the gene regulatory network:

$$\frac{dX_i}{dt} = f_i(X_1, \ldots, X_n), \qquad i = 1, \ldots, n$$

Boolean Regulatory Networks

To facilitate the treatment of the problem we further simplify our model to the Boolean Regulatory Network. We assume:
1. Discrete time and a synchronous update model
2. Each gene's expression level is binary

More formally, a boolean network $G(V, F)$ consists of a set of nodes representing genes, $V = \{v_1, \ldots, v_n\}$, and a list of boolean functions $F = (f_1, \ldots, f_n)$, where $f_i(v_{i_1}, \ldots, v_{i_k})$ computes a boolean function of the nodes $v_{i_1}, \ldots, v_{i_k}$ and assigns the output to $v_i$.

The state of the network at time t is defined by an assignment of 0s and 1s to the node variables. The state of each node $v_i$ at time t+1 is calculated from the states of the nodes $v_{i_1}, \ldots, v_{i_k}$ at time t according to $f_i(v_{i_1}, \ldots, v_{i_k})$.

A boolean regulatory network can be visualized by means of a wiring diagram:
[Figure: wiring diagram]

Since the network's state at t+1 is completely determined by its state at t, we can treat the gene expression time series as an unordered set of input/output pairs.
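The synchronous update rule and the reduction of a time series to input/output pairs can be illustrated with a minimal Python sketch. The 3-node network, the names `step`, `trajectory`, `io_pairs` and `RULES`, and the initial state are all hypothetical choices made for illustration; none of them come from the slides or from the papers.

```python
from typing import Callable, List, Sequence, Set, Tuple

State = Tuple[int, ...]  # one 0/1 value per node

# A node's rule: (indices of its input nodes, boolean function of those inputs).
Rule = Tuple[Tuple[int, ...], Callable[..., int]]


def step(state: State, rules: Sequence[Rule]) -> State:
    """Synchronous update: every node reads the state at time t,
    and all nodes take their new values together at time t+1."""
    return tuple(int(f(*(state[j] for j in inputs))) for inputs, f in rules)


def trajectory(initial: State, rules: Sequence[Rule], steps: int) -> List[State]:
    """Time series of network states starting from `initial`."""
    states = [initial]
    for _ in range(steps):
        states.append(step(states[-1], rules))
    return states


def io_pairs(series: Sequence[State]) -> Set[Tuple[State, State]]:
    """Unordered set of (state at t, state at t+1) pairs from a time series."""
    return set(zip(series, series[1:]))


# Hypothetical 3-node network (an assumption for illustration only):
#   v0(t+1) = v1(t) AND v2(t)
#   v1(t+1) = NOT v0(t)
#   v2(t+1) = v0(t) OR v1(t)
RULES: List[Rule] = [
    ((1, 2), lambda a, b: a and b),
    ((0,), lambda a: not a),
    ((0, 1), lambda a, b: a or b),
]

series = trajectory((1, 0, 0), RULES, steps=4)
pairs = io_pairs(series)
for before, after in sorted(pairs):
    print(before, "->", after)
```

Because the next state depends only on the current one, the order in which the pairs were observed carries no extra information, which is why `io_pairs` returns a plain set.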
We say that the network is consistent with a set of input/output pairs if, for each pair, setting the network to the input state at time t causes it to reach the output state at time t+1.

We can now start formulating some of the fundamental problems for our model.

CONSISTENCY: Given the number of nodes and a set of input/output pairs, decide whether there is a boolean network consistent with the pairs.

COUNTING: Given the number of nodes and a set of input/output pairs, count the number of boolean networks consistent with the pairs.

ENUMERATION: Given the number of nodes and a set of input/output pairs, output all the boolean networks consistent with the pairs.

IDENTIFICATION: Given the number of nodes and a set of input/output pairs, decide whether there is a unique boolean network consistent with the pairs, and output it if there is.

The four problems presented above are closely related. We address them in a straightforward manner, by constructing all possible boolean networks and checking them against all the input/output pairs. To make this task computationally feasible we need yet another assumption: that the network's indegree is bounded by some constant K.

Some of the results:
The complexity of the brute-force algorithm for the CONSISTENCY problem is $O(2^{2^K} \cdot n^{K+1} \cdot m)$, where n is the number of nodes (genes) and m is the number of input/output pairs. The results for the other problems are similar.

Another theoretical result concerns the number of input/output pairs required to uniquely identify a boolean network. Again, to facilitate calculations, we make an unrealistic assumption: that the input/output pairs are drawn at random from a uniform distribution.
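The brute-force treatment of CONSISTENCY described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the algorithm as given in the papers: functions are represented as explicit truth tables, every node is given exactly K inputs (a function of fewer inputs is covered by a table that ignores the extra ones, provided K is at most n), and the names `consistent_rules`, `find_consistent_network` and `PAIRS` are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, List, Optional, Sequence, Tuple

State = Tuple[int, ...]
Pair = Tuple[State, State]
# A node rule: (input node indices, truth table mapping input bits -> output bit).
Rule = Tuple[Tuple[int, ...], Dict[Tuple[int, ...], int]]


def consistent_rules(node: int, n: int, k: int, pairs: Sequence[Pair]) -> List[Rule]:
    """All (inputs, truth table) choices with exactly k inputs that
    reproduce the output of `node` on every input/output pair."""
    found: List[Rule] = []
    rows = list(product((0, 1), repeat=k))                  # 2^k truth-table rows
    for inputs in combinations(range(n), k):                # candidate input sets
        for outputs in product((0, 1), repeat=len(rows)):   # 2^(2^k) functions
            table = dict(zip(rows, outputs))
            if all(table[tuple(s[j] for j in inputs)] == t[node] for s, t in pairs):
                found.append((inputs, table))
    return found


def find_consistent_network(n: int, k: int, pairs: Sequence[Pair]) -> Optional[List[Rule]]:
    """Return one rule per node if a consistent network exists, else None.
    Nodes can be treated independently: each node's next value depends
    only on its own function and inputs."""
    network: List[Rule] = []
    for node in range(n):
        rules = consistent_rules(node, n, k, pairs)
        if not rules:
            return None
        network.append(rules[0])
    return network


# Hypothetical input/output pairs for a 3-node network (illustration only).
PAIRS: List[Pair] = [
    ((1, 0, 0), (0, 0, 1)),
    ((0, 0, 1), (0, 1, 0)),
    ((0, 1, 0), (0, 1, 1)),
    ((0, 1, 1), (1, 1, 1)),
]

network = find_consistent_network(n=3, k=2, pairs=PAIRS)
print("consistent" if network is not None else "no consistent network")
```

The nested loops make the cost visible: for each of the n nodes, every input set and every one of the 2^(2^K) truth tables is checked against all m pairs. Counting the entries returned by `consistent_rules` for each node would count representations rather than distinct functions, since two tables over different input sets can describe the same function.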
Theorem:
If $O(2^{2K} \cdot (2K + \alpha) \cdot \log n)$ input/output pairs are drawn from a uniform distribution, the probability that more than one boolean network is consistent with them is at most $\frac{1}{n^{\alpha}}$, where $\alpha$ is a fixed constant.

Conclusions:
Boolean gene expression networks represent a relatively simple model of the gene expression control mechanisms of the cell. However, despite many (often unrealistic) simplifying assumptions, this model has not yet yielded any interesting theoretical results, which indicates the intrinsic difficulty of modeling gene expression mechanisms.