Gene Regulatory Networks

Gene Regulatory Networks - the
Boolean Approach
Andrey Zhdanov
Based on the papers by Tatsuya Akutsu et al.
and others
Gene Expressions Revisited
One of the major subjects of study in cell
biology is the behaviour of proteins – the
“workhorses” of a cell.
[Figure: myoglobin molecule]
Gene Expressions Revisited
We are interested in analysing protein
expression levels – amounts of different
proteins synthesized by the cell.
Gene Expressions Revisited
The “blueprints” for all possible proteins that
can be synthesized by a cell – genes – are
stored in the cell's nucleus.
Only a small fraction of all possible proteins is
synthesized in each cell.
Gene Expressions Revisited
Proteins are synthesized from genes by the
processes of transcription and translation.
Gene Expressions Revisited
We estimate protein expression levels
indirectly – by measuring gene expression
levels (amounts of mRNA produced for a
certain gene) with DNA chips.
Gene Expressions Revisited
This approach makes a number of
assumptions:
• Genes exist and are easily identifiable
• Each protein is encoded by a single gene
• Protein expression (amount of protein
produced) is determined by the
corresponding gene expression (amount of
mRNA produced)
These assumptions do not always hold, but
we use them anyway :-)
Gene Regulatory Networks
We want to use protein (or gene) expression
measurements to understand the mechanisms
regulating proteins' production.
Note that there is a certain circularity to our logic
since we made certain assumptions about
these very same mechanisms in order to
measure protein expressions.
Gene Regulatory Networks
In the talks by Shahar and Leon we have
seen the “regulatory network” approach to
modelling the protein expression mechanisms.
In his talk, Oded introduced tools for time
series analysis that can be applied to our
problem.
Gene Regulatory Networks
We are looking for a formal model of the
protein expression control mechanism that
can serve as a framework for a rigorous
treatment of the problem.
To that end we assume that the production rate of
a given protein at any time is regulated
only by the amounts of the other proteins within the
cell at that time.
Gene Regulatory Networks
Example:
[Figure: a regulatory diagram over Proteins A, B, C and D, including the
interaction "Protein B inhibits Protein D", and a plot of expression level
against time for each of the four proteins.]
Gene Regulatory Networks
Treating the gene expressions as real-valued
functions of a continuous time variable leads to
a system of differential equations as the
model for the gene regulatory network.
$$\left\{ \frac{dX_i}{dt} = f_i(X_1, \dots, X_n) \right\}_{i=1}^{n}$$
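As a rough illustration of the continuous model, the sketch below integrates a system of the form dX_i/dt = f_i(X_1, ..., X_n) with the forward Euler method. The two-protein system, its rate constants, and the names `euler_simulate` and `toy_system` are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from typing import Callable, List, Sequence

# Each f_i maps the full state (X_1, ..., X_n) to the rate dX_i/dt.
RateFn = Callable[[Sequence[float]], float]


def euler_simulate(fs: List[RateFn], x0: Sequence[float],
                   dt: float, steps: int) -> List[List[float]]:
    """Integrate dX_i/dt = f_i(X) with forward Euler and return the trajectory."""
    x = list(x0)
    trajectory = [x]
    for _ in range(steps):
        rates = [f(x) for f in fs]  # evaluate every rate at the current state
        x = [xi + dt * r for xi, r in zip(x, rates)]
        trajectory.append(x)
    return trajectory


# Hypothetical toy system (an assumption, not taken from the slides):
# protein 0 is produced at a constant rate and decays;
# protein 1 is repressed by protein 0 and decays.
toy_system: List[RateFn] = [
    lambda x: 1.0 - 0.5 * x[0],
    lambda x: 1.0 / (1.0 + x[0]) - 0.5 * x[1],
]

trajectory = euler_simulate(toy_system, x0=[0.0, 0.0], dt=0.01, steps=5000)
final_state = trajectory[-1]  # approaches the steady state (2, 2/3)
```

Forward Euler is used here only because it is short; any standard ODE solver could take its place.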
Boolean Regulatory Networks
To facilitate the treatment of the problem we
further simplify our model to the Boolean
Regulatory Network. We assume:
1. Discrete time and a synchronous update model
2. Each gene's expression level is binary
Boolean Regulatory Networks
More formally, a Boolean network $G(V, F)$
consists of a set of nodes representing genes
$V = \{v_1, \dots, v_n\}$
and a list of Boolean functions
$F = (f_1, \dots, f_n)$,
where $f_i(v_{i_1}, \dots, v_{i_k})$ computes a Boolean function
of the nodes $v_{i_1}, \dots, v_{i_k}$ and assigns the output to $v_i$.
Boolean Regulatory Networks
The state of the network at time t is defined by
an assignment of 0s and 1s to the node variables.
The state of each node $v_i$ at time t+1 is
calculated from the states of the nodes $v_{i_1}, \dots, v_{i_k}$
at time t according to $f_i(v_{i_1}, \dots, v_{i_k})$.
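The synchronous update rule can be sketched as follows. This is a minimal illustration, assuming each node is stored as a pair (input node indices, Boolean function); the three-node network and the names `step` and `example_network` are hypothetical.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]
# Node i is described by (indices of its input nodes, Boolean function of those inputs).
Node = Tuple[Tuple[int, ...], Callable[..., int]]


def step(network: List[Node], state: State) -> State:
    """Compute the state at t+1 from the state at t, updating all nodes at once."""
    return tuple(f(*(state[j] for j in inputs)) for inputs, f in network)


# Hypothetical 3-node network (an assumption for illustration):
#   v0(t+1) = v1(t) AND v2(t)
#   v1(t+1) = NOT v0(t)
#   v2(t+1) = v0(t) OR v1(t)
example_network: List[Node] = [
    ((1, 2), lambda a, b: a & b),
    ((0,), lambda a: 1 - a),
    ((0, 1), lambda a, b: a | b),
]

state: State = (1, 0, 0)
trajectory = [state]
for _ in range(3):
    state = step(example_network, state)
    trajectory.append(state)
# trajectory is [(1, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
```

Every node reads the state at time t, so the order in which nodes are evaluated does not matter.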
Boolean Regulatory Networks
A Boolean regulatory network can be visualized
by means of a wiring diagram:
[Figure: wiring diagram]
Boolean Regulatory Networks
Since the network’s state at t+1 is completely
determined by its state at t, we can treat the
gene expression time series as an unordered
set of input/output pairs.
We say that the network is consistent with a
set of input/output pairs if, for each pair, setting
the network to the input state at time t causes
it to reach the output state at time t+1.
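A minimal sketch of these two ideas: turning a time series into input/output pairs, and checking a candidate network against them. The node representation (input indices plus a Boolean function) and all names here are assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]
Pair = Tuple[State, State]
Node = Tuple[Tuple[int, ...], Callable[..., int]]


def step(network: List[Node], state: State) -> State:
    """Synchronous update: every node reads the state at time t."""
    return tuple(f(*(state[j] for j in inputs)) for inputs, f in network)


def pairs_from_series(series: Sequence[State]) -> List[Pair]:
    """Each consecutive pair of observations (t, t+1) is one input/output pair."""
    return list(zip(series, series[1:]))


def is_consistent(network: List[Node], pairs: Sequence[Pair]) -> bool:
    """True if the network maps every input state to its observed output state."""
    return all(step(network, inp) == out for inp, out in pairs)


# Hypothetical two-gene time series and candidate networks.
series = [(1, 0), (0, 1), (1, 0)]
pairs = pairs_from_series(series)

swap = [((1,), lambda a: a), ((0,), lambda a: a)]          # v0 <- v1, v1 <- v0
negate = [((0,), lambda a: 1 - a), ((1,), lambda a: 1 - a)]  # v_i <- NOT v_i
identity = [((0,), lambda a: a), ((1,), lambda a: a)]      # v_i <- v_i

swap_ok = is_consistent(swap, pairs)          # True
negate_ok = is_consistent(negate, pairs)      # True
identity_ok = is_consistent(identity, pairs)  # False
```

Both `swap` and `negate` fit this short series, which shows how a small number of pairs can leave several networks consistent with the data.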
Boolean Regulatory Networks
We can now start formulating some of the
fundamental problems for our model.
CONSISTENCY: Given the number of nodes
and a set of input/output pairs, decide whether
there is a Boolean network consistent with the
pairs.
Boolean Regulatory Networks
COUNTING: Given the number of nodes
and a set of input/output pairs, count the number
of Boolean networks consistent with the
pairs.
Boolean Regulatory Networks
ENUMERATION: Given the number of nodes
and a set of input/output pairs, output all the
Boolean networks consistent with the pairs.
Boolean Regulatory Networks
IDENTIFICATION: Given the number of nodes
and a set of input/output pairs, decide whether
there is a unique Boolean network consistent
with the pairs and, if so, output it.
Boolean Regulatory Networks
The four problems presented above are
closely related. We address them in a
straightforward manner by constructing all
possible Boolean networks and checking them
against all the input/output pairs.
To make this task computationally feasible we
need yet another assumption: we assume
that the indegree of every node is bounded by
some constant K.
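The brute-force idea can be sketched per node: try every set of K input nodes and every Boolean function of K inputs, and keep those that reproduce the node's observed outputs. This is an illustrative sketch, not necessarily the exact procedure of the cited papers. It assumes exactly K inputs per candidate (a function of fewer inputs then appears as a K-input table that ignores some inputs, so the same function can be listed more than once) and n >= K. All names are hypothetical.

```python
from itertools import combinations, product
from typing import List, Tuple

State = Tuple[int, ...]
Pair = Tuple[State, State]
# A candidate rule for one node: (input node indices, truth table), where the
# table lists outputs in the order of itertools.product([0, 1], repeat=K).
Rule = Tuple[Tuple[int, ...], Tuple[int, ...]]


def consistent_rules(node: int, n: int, pairs: List[Pair], K: int) -> List[Rule]:
    """List every (input set, truth table) that matches the node on all pairs."""
    rows = list(product([0, 1], repeat=K))
    rules = []
    for inputs in combinations(range(n), K):
        for table in product([0, 1], repeat=2 ** K):
            lookup = dict(zip(rows, table))
            if all(lookup[tuple(inp[j] for j in inputs)] == out[node]
                   for inp, out in pairs):
                rules.append((inputs, table))
    return rules


def consistency(n: int, pairs: List[Pair], K: int) -> bool:
    """Nodes are updated independently, so a consistent network exists
    exactly when every node has at least one consistent rule."""
    return all(len(consistent_rules(i, n, pairs, K)) > 0 for i in range(n))


# Hypothetical target network used only to generate example data:
# v0 <- v1 AND v2, v1 <- NOT v0, v2 <- v0 OR v1.
def target(s: State) -> State:
    return (s[1] & s[2], 1 - s[0], s[0] | s[1])


all_pairs = [(s, target(s)) for s in product([0, 1], repeat=3)]
counts = [len(consistent_rules(i, 3, all_pairs, 2)) for i in range(3)]

contradictory = [((0, 0, 0), (0, 0, 0)), ((0, 0, 0), (1, 0, 0))]
```

With all eight pairs, node v1 has two candidates because NOT v0 can be written as a two-input table over (v0, v1) or over (v0, v2). Counting distinct networks would therefore need these duplicates merged.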
Boolean Regulatory Networks
Some of the results:
The complexity of the brute-force algorithm for
the CONSISTENCY problem is
$O(2^{2^K} \cdot n^{K+1} \cdot m)$,
where n is the number of nodes (genes) and m
is the number of input/output pairs.
The results for the other problems are similar.
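To get a feel for this bound, the sketch below evaluates it for a few hypothetical sizes, reading it as 2^(2^K) · n^(K+1) · m. It ignores constant factors, and the values of n, m and K are made up for illustration.

```python
def brute_force_cost(n: int, m: int, K: int) -> int:
    """Rough operation count 2^(2^K) * n^(K+1) * m, ignoring constant factors."""
    functions_per_input_set = 2 ** (2 ** K)  # Boolean functions of K inputs
    input_sets_upper_bound = n ** K          # upper bound on choices of K inputs
    # One search per node (factor n), each candidate checked on all m pairs.
    return functions_per_input_set * input_sets_upper_bound * n * m


cost_small = brute_force_cost(n=100, m=50, K=2)  # 800,000,000
cost_large = brute_force_cost(n=100, m=50, K=4)  # about 3.3e16
```

The jump from K=2 to K=4 shows why the bound on indegree matters: the cost is polynomial in n for fixed K but grows doubly exponentially in K.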
Boolean Regulatory Networks
Another theoretical result concerns the
number of input/output pairs required to
uniquely identify a boolean network.
Again, to facilitate calculations, we make an
unrealistic assumption: we assume that the
input/output pairs are randomly drawn from a
uniform distribution.
Boolean Regulatory Networks
Theorem: If $O(2^{2K} \cdot (2K + \alpha) \cdot \log n)$ input/output
pairs are drawn from a uniform
distribution, the probability that there is more
than one Boolean network consistent with
them is at most $1/n^{\alpha}$.
Boolean Regulatory Networks
Conclusions:
Boolean gene expression networks represent
a relatively simple model of the gene
expression control mechanisms of the cell.
However, despite many (often unrealistic)
simplifying assumptions, this model has not
yielded any interesting theoretical results yet,
which indicates the intrinsic difficulty of
modeling gene expression mechanisms.