Juraj's notes

Application of AI to molecular biology
Notes for 03/09/2006
Representing gene networks:
- We need to represent the relations among several thousand genes.
- There are several different types of relations (protein 1 causes protein 2 to become active;
protein 1 and protein 2 together cause the activation of protein 3; etc.).
Do versus observe:
Assume the following example: rain and a sprinkler can each make the ground wet.
Then:
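P(rain | wet) > 0 (observation: seeing wet ground is evidence for rain), while
P(rain | do(wet)) = P(rain) (action: making the ground wet ourselves tells us nothing about rain).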
Same with genes:
Thus if I observe g3 to be high (active, …), it might have been the result of g1 and g2.
However, if I MAKE g3 high (artificially, …), this might have no effect on, or relation
with, g1 and g2.
Thus doing something is a different thing than observing it.
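The difference between observing and doing can be checked by enumeration. Below is a minimal sketch in Python, assuming the rain/sprinkler model with wet = rain or sprinkler. The prior probabilities 0.3 and 0.5 and all function names are illustrative assumptions, not values from the lecture.

```python
from itertools import product

# Assumed priors for the exogenous variables (illustrative values only).
P_RAIN = 0.3
P_SPRINKLER = 0.5

def worlds(do_wet=None):
    """Yield (rain, wet, probability) for every setting of rain and sprinkler.

    With do_wet set, the function for wet is replaced by that constant,
    which cuts the links from rain and sprinkler into wet.
    """
    for rain, sprinkler in product([0, 1], repeat=2):
        prob = (P_RAIN if rain else 1 - P_RAIN) * (
            P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
        wet = int(rain or sprinkler) if do_wet is None else do_wet
        yield rain, wet, prob

def p_rain_given_wet(do_wet=None):
    """P(rain = 1 | wet = 1), either observed or under do(wet = 1)."""
    rows = [(r, p) for r, w, p in worlds(do_wet) if w == 1]
    total = sum(p for _, p in rows)
    return sum(p for r, p in rows if r == 1) / total

observed = p_rain_given_wet()            # seeing wet ground raises belief in rain
intervened = p_rain_given_wet(do_wet=1)  # forcing wet leaves rain at its prior
print(observed, intervened)
```

Under these assumed priors the observed value is larger than the prior of rain, while the intervened value equals the prior.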
Another example:
In a group of people, 50% were given treatment and 50% were not. In both groups, 50%
of the people recovered, while the other 50% did not (e.g. died).
Let:
X = 1  treatment was given
X = 0  treatment was not given
Y = 1  person died
Y = 0  person recovered
Clearly, P(X = 1, Y = 1) = P(X = 0, Y = 1) = P(X = 1, Y = 0) = P(X = 0, Y = 0) = 0.25.
We have 2 queries:
Q1: What is the probability that Joe’s death occurred due to the treatment?
Q2: What is the probability that Joe, who died after treatment, would have lived had he
not been treated?
To answer these, we will use probabilistic causal models.
Probabilistic causal models:
- directed acyclic graph (DAG) of variables
- variables:
  - exogenous: they have no parents, and the probabilities for those variables are
    given
  - other: they have parents, and their probabilities are not given directly; their values
    are given as a function of their parents (e.g. wet = f(rain, sprinkler) in the first example)
- the non-exogenous variables are thus deterministic; the only non-determinism comes from
the exogenous variables.
Back to example:
V1 – variable that decides treatment
V2 – variable that decides if one will die or not
Model 1:
V1   V2   X   Y   Probability
0    0    0   0   0.25
0    1    0   1   0.25
1    0    1   0   0.25
1    1    1   1   0.25
(From the table: X = V1 and Y = V2.)
Q1: We have V1 = V2 = 1. Thus P(Q1) = 0.25.
Q2: Again, V1 = V2 = 1.
P(Y = 0 | do(X = 0)) = 0.
P(Y = 0 | X = 0) = 0.
Model 2:
V1 - variable that decides treatment
V2 - a genetic factor which, if present, kills people who take this treatment or, if absent,
kills people who do not take the treatment.
X = V1.
Y = X*V2+(1-X)*(1-V2)
We obtain the same probability distribution over X and Y as in Model 1 (each combination of
X and Y has probability 0.25), although Y now depends on X:
V1   V2   X   Y   Probability
0    0    0   1   0.25
0    1    0   0   0.25
1    0    1   0   0.25
1    1    1   1   0.25
However, the answers to the queries may change:
Again, V1 = V2 = 1.
P(Y = 0 | do(X = 0)) = 1.
Thus if I do something, the model's function for that variable no longer applies, e.g. I can
make X = 0 even if V1 = 1. (The action comes from outside the model.)
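The contrast between the two models can be checked by enumeration. The sketch below assumes that Model 1 is X = V1, Y = V2 (read off its table) and that V1 and V2 are independent fair coins, so each setting has probability 0.25. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Each model maps (v1, v2) to (x, y). forced_x=None means X follows the model;
# otherwise X is set from outside, i.e. do(X = forced_x).
def model_1(v1, v2, forced_x=None):
    x = v1 if forced_x is None else forced_x
    y = v2  # assumed from the Model 1 table: death does not depend on treatment
    return x, y

def model_2(v1, v2, forced_x=None):
    x = v1 if forced_x is None else forced_x
    y = x * v2 + (1 - x) * (1 - v2)
    return x, y

def joint(model):
    """Observational distribution over (X, Y)."""
    dist = Counter()
    for v1, v2 in product([0, 1], repeat=2):
        dist[model(v1, v2)] += 0.25
    return dict(dist)

def would_have_lived(model):
    """Q2 for Joe: he was treated and died (X = 1, Y = 1); apply do(X = 0)."""
    # Keep only the exogenous settings consistent with the observation.
    # They are equally likely, so counting is enough here.
    consistent = [(v1, v2) for v1, v2 in product([0, 1], repeat=2)
                  if model(v1, v2) == (1, 1)]
    lived = [model(v1, v2, forced_x=0)[1] == 0 for v1, v2 in consistent]
    return sum(lived) / len(lived)

print(joint(model_1) == joint(model_2))  # same observational distribution
print(would_have_lived(model_1), would_have_lived(model_2))
```

Both models produce the same distribution over X and Y, so no amount of observation alone could tell them apart; only the counterfactual query separates them.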
Another example:
U – court ordered execution
C – captain gives signal
V – rifleman A is nervous
A – rifleman A shoots
B – rifleman B shoots
D – prisoner dies
Functions:
A = C or V
B = C
D = A or B
C = U
(U and V are exogenous, with P(U = 1) = p and P(V = 1) = q.)
Query Q:
The prisoner died. What is the probability that the prisoner would not have died had A
decided not to shoot? (How responsible is A for the death of the prisoner?)
U   V   A   B   C   D   Prob.
0   0   0   0   0   0   (1-p)(1-q)
0   1   1   0   0   1   (1-p)q
1   0   1   1   1   1   p(1-q)
1   1   1   1   1   1   pq
Only the last three rows are relevant (the prisoner died). Conditioning on D = 1, we calculate
their new probabilities:
Row 2: (1-p)q / (1 - (1-p)(1-q))
Row 3: p(1-q) / (1 - (1-p)(1-q))
Row 4: pq / (1 - (1-p)(1-q))
Now we set the value of A to zero (rifleman A decides not to shoot). Then we obtain the
following new values for A, B, C and D.
A   B   C   D
0   0   0   0   (row 2)
0   1   1   1   (row 3)
0   1   1   1   (row 4)
The only row of this table in which the prisoner survives (D = 0) is the first one, thus the
probability of our query is the probability of that row, which is (1-p)q / (1 - (1-p)(1-q)).
Thus P(Q) = (1-p)q / (1 - (1-p)(1-q)).
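This result can be checked numerically. The sketch below uses exact fractions and assumes that U and V are independent with P(U = 1) = p and P(V = 1) = q (as the Prob. column suggests), and that the captain's signal follows the court order (C = U), which is what the table rows show. The values p = 1/2 and q = 1/4 are arbitrary, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def prisoner_dies(u, v, force_a=None):
    """Evaluate the firing-squad functions; force_a implements do(A = force_a)."""
    c = u                                    # captain signals iff the court orders
    a = int(c or v) if force_a is None else force_a
    b = c
    return int(a or b)                       # D = A or B

def prob_alive_without_a(p, q):
    """P(prisoner would be alive had A not shot | prisoner died)."""
    died = Fraction(0)    # probability mass of the worlds where D = 1
    saved = Fraction(0)   # of those, the worlds where do(A = 0) gives D = 0
    for u, v in product([0, 1], repeat=2):
        weight = (p if u else 1 - p) * (q if v else 1 - q)
        if prisoner_dies(u, v) == 1:
            died += weight
            if prisoner_dies(u, v, force_a=0) == 0:
                saved += weight
    return saved / died

p, q = Fraction(1, 2), Fraction(1, 4)
answer = prob_alive_without_a(p, q)
closed_form = (1 - p) * q / (1 - (1 - p) * (1 - q))
print(answer, closed_form)
```

For these values both the enumeration and the closed form give 1/5.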
General algorithm:
1. Make the joint probability distribution by enumerating the exogenous variables
2. Take the observation into account
3. Re-compute the joint probability distribution
4. Based on the query, change the model (take into account ‘do’)
5. Re-compute the joint probability distribution with respect to the new model
6. Answer the query, e.g. find P(X | do(Y)).
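The steps above can be sketched as one generic routine. This is an illustration under stated assumptions, not a definitive implementation: all variables are binary, exogenous variables are independent, the functions are listed in topological order, and interventions apply only to non-exogenous variables. The names evaluate and counterfactual are hypothetical.

```python
from itertools import product

def evaluate(exo_values, functions, do=None):
    """Compute every variable from the exogenous ones; `do` overrides functions."""
    values = dict(exo_values)
    for name, fn in functions.items():       # assumed to be in topological order
        values[name] = do[name] if do and name in do else fn(values)
    return values

def counterfactual(exo_priors, functions, evidence, do, query):
    """P(query holds under `do` | evidence observed in the unmodified model)."""
    names = list(exo_priors)
    matched = 0.0   # total weight of the worlds consistent with the evidence
    hits = 0.0      # weight of those worlds where the query holds after `do`
    for bits in product([0, 1], repeat=len(names)):
        exo = dict(zip(names, bits))
        # Step 1: weight of this setting of the exogenous variables.
        weight = 1.0
        for n in names:
            weight *= exo_priors[n] if exo[n] else 1 - exo_priors[n]
        # Steps 2-3: keep only the worlds that agree with the observation.
        actual = evaluate(exo, functions)
        if any(actual[k] != v for k, v in evidence.items()):
            continue
        matched += weight
        # Steps 4-5: re-evaluate the same world in the modified model.
        altered = evaluate(exo, functions, do)
        # Step 6: accumulate the answer to the query.
        if query(altered):
            hits += weight
    return hits / matched   # raises ZeroDivisionError if the evidence is impossible

# Usage on Model 2 of the treatment example (Joe was treated and died).
priors = {"V1": 0.5, "V2": 0.5}
model_2 = {
    "X": lambda v: v["V1"],
    "Y": lambda v: v["X"] * v["V2"] + (1 - v["X"]) * (1 - v["V2"]),
}
result = counterfactual(priors, model_2, evidence={"X": 1, "Y": 1},
                        do={"X": 0}, query=lambda v: v["Y"] == 0)
print(result)
```

Enumeration is exponential in the number of exogenous variables, so this only suits small models such as the ones in these notes.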