Synthesis for
Systems Biology
Ras Bodík, Ali Sinan Köksal, Evan Pu, Saurabh Srivastava UC Berkeley
Jasmin Fisher Microsoft Research
Nir Piterman University of Leicester
2
Executable biology pushes our boundaries
Maximally non-deterministic systems
cells exhibit races ⇒ model must preserve all observed non-determinism
Needs new synthesis algorithms
from 2QBF to 3QBF
Incomplete specs
sparse wet lab experiments ⇒ unknown behavior
Needs analysis of ambiguity
are there alternative explanations of observed phenomena?
4
Other lessons and results
Design your own tools
To enable synthesis, design a domain language.
Then build a lightweight synthesizer.
Synthesized a C. elegans VPC model
We failed to write this model manually; others took months.
Beyond synthesis
Showed that the available experiments are unambiguous.
Synthesized a new alternative model that differs internally.
5
Systems biology
6
Understanding Diseases
“Cancer is fundamentally a disease of failure of
regulation of tissue growth. In order for a normal cell
to transform into a cancer cell, the genes which
regulate cell growth and differentiation must be
altered.” – Wikipedia
To understand cancer, investigate cell differentiation
7
How Are Cells Differentiated?
Two ways of differentiation:
– A single cell divides into cells of different type.
– Multiple identical cells differentiate by communicating.
To understand cell differentiation,
investigate cell communication.
8
Studying Differentiation on Worms
Cell differentiation in worms: similar to that in humans, but
much simpler.
identical precursor cells → differentiated vulval cells
9
The Research Goal
What is the cell’s “algorithm” for robustly
deciding cell fates through communication?
10
Mutation experiments are visually observable
Biologists mutate cell
genes and observe the
outcome of differentiation.
sqv mutants of Caenorhabditis
elegans are defective in vulval
epithelial invagination
[Herman et al. 1999]
11
The results from wet-lab experiments
12
Mutation experiments give partial knowledge
From gene mutation experiments, biologists infer a
protein interaction.
“In this assay, depletion of lst-2, lst-3, lst-4, or dpy-23,
as well as ark-1, caused ectopic vulval induction,
suggesting that they function as negative
regulators of the EGFR-MAPK pathway.”
[Yoo et al. 2004]
13
Making Sense of Experiments
14
Executable Systems biology
15
Executable Biology
Computational models are needed to tackle the
combinatorial complexity of cell communication.
Verification of models can show their inconsistency
with experimental data.
New interactions can be discovered. [Fisher et al. 2007]
16
Semantics of models
Time and protein concentrations are discrete:
discrete is sufficient to show interesting behavior
Cells are concurrent communicating automata
bounded asynchrony (cells progress at ~same rate)
Note: timing is modeled with state progression
17
Cells as a Reactive Modules (RM) program
atom Vul controls Vul
  reads go, Vul, IS, Muv_state, v_Vul
  awaits go, v_Vul, lst_state
init
  [] (true) & v_Vul' = ko -> Vul' := off0;
  [] (true) & v_Vul' ~= ko -> Vul' := Evaluate0;
update
  [] (~go & go') & Vul = Evaluate0 & Muv_state = ON & IS ~= high -> Vul' := off1;
  [] (~go & go') & Vul = Evaluate0 & IS = high -> Vul' := let23;
  [] (~go & go') & Vul = Evaluate0 & Muv_state = OFF & IS ~= high -> Vul' := Evaluate1;
  [] (~go & go') & Vul = off1 & IS = med -> Vul' := Before_Partial_On;
  [] (~go & go') & Vul = off1 & IS = high -> Vul' := let23;
  [] (~go & go') & Vul = off1 & IS ~= high & IS ~= med -> Vul' := off2;
  [] (~go & go') & Vul = Evaluate1 -> Vul' := let23;
  [] (~go & go') & Vul = Before_Partial_On -> Vul' := let23;
  [] (~go & go') & Vul = let23 & lst_state' = OFF -> Vul' := sem5;
  [] (~go & go') & Vul = sem5 & lst_state' = OFF -> Vul' := let60;
  [] (~go & go') & Vul = let60 & lst_state' = OFF -> Vul' := mpk1;
  [] (~go & go') & Vul = let23 & lst_state' = ON -> Vul' := Vul_counteracted;
  [] (~go & go') & Vul = sem5 & lst_state' = ON -> Vul' := Vul_counteracted;
  [] (~go & go') & Vul = let60 & lst_state' = ON -> Vul' := Vul_counteracted
18
RM models: laborious to develop and update
Months of tweaking to get the timing right
hard to understand
hard to debug
RM is too expressive (eg, has clairvoyance)
it’s tempting to encode constructs that have no clear
biological explanations (strange abstractions)
Summary: modeling in executable biology is laborious
if only we could automate model development
19
Synthesis and Analysis of Biology Models
20
Our contribution
Automatically infer cell models (synthesis)
– obtain executable models faster
Enumerate alternative models (“distinct” synthesis)
– find alternative explanations of observed phenomena
Ask for more specifications (disambiguation)
– suggest experiments to disambiguate between models
21
Lessons: Build your tools!
Executable biology selects methods based on
availability of tools, eg model checkers.
We did the same for synthesis of models. It failed.
We argue here to build our own lightweight tools,
including the modeling language and its synthesizer.
We show how to DIY.
22
The language
23
Motivation for a high-level language (HLL)
HLL ⇒ smaller programs
⇒ smaller search space
⇒ faster synthesis
HLL ⇒ programs are biological diagrams
⇒ easier to read by biologists
24
Four levels of the language
[Diagram labels: schedule; concentration; update function 𝐿 × 𝐿^𝑘 → 𝐿]
Top-level semantics
The program 𝑃 ∷ 𝑀 → 𝑆 → 𝐹
Inputs:
mutation (𝑀)
changes behavior of proteins
schedule (𝑆)
bounded length, controls cell interleaving
Output:
fates of cells (𝐹)
26
Correctness
Top level program 𝑃 ∷ 𝑀 → 𝑆 → 𝐹
Specification (experiments): 𝐸 ⊆ 𝑀 × 𝐹
Correctness: Correct(𝑃, 𝐸)
i. demonic scheduler cannot produce unobserved fate
∀𝑚 ∈ 𝜋_𝑚(𝐸) . ¬∃𝑠 . (𝑚, 𝑃(𝑚, 𝑠)) ∉ 𝐸
ii. angelic scheduler can produce each observed fate
∀(𝑚, 𝑓) ∈ 𝐸 . ∃𝑠 . 𝑃(𝑚, 𝑠) = 𝑓
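The two correctness conditions can be checked by brute force when the schedule set is small. The following is a minimal sketch, not the authors' implementation: `is_correct`, `toy_program`, and the toy experiment set are hypothetical names and data introduced only for illustration, and the program is assumed to be a plain function from (mutation, schedule) to a fate.

```python
from itertools import product

def is_correct(program, experiments, schedules):
    """Check demonic and angelic correctness of `program` against `experiments`.

    program:     function (mutation, schedule) -> fate
    experiments: set of observed (mutation, fate) pairs, i.e. E
    schedules:   finite iterable of all schedules to consider
    """
    schedules = list(schedules)
    mutations = {m for (m, _) in experiments}          # projection pi_m(E)
    # (i) demonic: no schedule may produce an unobserved fate
    demonic_ok = all((m, program(m, s)) in experiments
                     for m in mutations for s in schedules)
    # (ii) angelic: every observed fate is reachable under some schedule
    angelic_ok = all(any(program(m, s) == f for s in schedules)
                     for (m, f) in experiments)
    return demonic_ok and angelic_ok

# Hypothetical toy program: the fate depends on which cell moves first.
def toy_program(mutation, schedule):
    if mutation == "ko":
        return "off"
    return "A" if schedule[0] == 0 else "B"

E = {("wt", "A"), ("wt", "B"), ("ko", "off")}
all_schedules = list(product([0, 1], repeat=2))
print(is_correct(toy_program, E, all_schedules))
```

Dropping ("wt", "B") from E breaks the demonic condition (the program can still produce fate B), while adding ("ko", "A") breaks the angelic one (no schedule produces it).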
27
Level 2: Program is composed from cells
Cells advance according to the schedule
Cells communicate by reading each others’ state
state: the set of concentrations of the cell's proteins
Schedule: { (0,1,1,0,0,1), (1,0,0,1,1,0), … , (1,1,0,0,0,0)}
The first step executes cells 2, 3, and 6.
Bounded asynchrony: [Fisher et al.]
schedule can be partitioned into macrosteps,
in each macrostep, each cell makes one step
Our schedules contain exactly 𝑘 macrosteps
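As an illustration of the macrostep condition, here is a minimal sketch that checks whether a schedule obeys bounded asynchrony and counts its macrosteps. It assumes the 0/1-tuple representation of schedule steps used in the example above; the function name `count_macrosteps` is hypothetical.

```python
def count_macrosteps(schedule, n_cells):
    """Return the number of macrosteps if `schedule` obeys bounded asynchrony, else None."""
    stepped = [0] * n_cells      # steps taken by each cell in the current macrostep
    macrosteps = 0
    for step in schedule:
        if len(step) != n_cells:
            return None
        for cell, runs in enumerate(step):
            stepped[cell] += runs
            if stepped[cell] > 1:     # a cell ran twice before the others caught up
                return None
        if all(count == 1 for count in stepped):
            macrosteps += 1           # every cell has stepped once: macrostep complete
            stepped = [0] * n_cells
    # the schedule must end exactly on a macrostep boundary
    return macrosteps if not any(stepped) else None

# The two steps from the slide form one macrostep over six cells.
print(count_macrosteps([(0, 1, 1, 0, 0, 1), (1, 0, 0, 1, 1, 0)], 6))
```

A schedule is accepted for a given 𝑘 when the returned count equals 𝑘.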
28
Level 3: In cells are proteins
Each cell is composed from proteins.
– protein state: discretized protein concentration
– proteins read states of other proteins (possibly in other cells)
– they update their own concentration next step
Synchronous execution:
– when a cell is scheduled, all of its proteins take one step
– ie, they update their concentration level
[similar to Synchronous/Reactive (SR) model, Edwards and Lee, 2002]
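Synchronous execution means that all proteins of a scheduled cell read the same pre-step snapshot. The following is a minimal sketch under assumed data structures (dictionaries of levels and of update functions); `step_cell` and the two-protein example are hypothetical and not part of the authors' tool.

```python
def step_cell(cell_state, update_functions, environment):
    """Advance one scheduled cell: every protein takes one step simultaneously.

    cell_state:       dict protein name -> discrete concentration level
    update_functions: dict protein name -> function(own_level, visible_levels) -> new level
    environment:      dict of levels readable from other cells
    """
    visible = {**environment, **cell_state}   # snapshot taken before any update
    return {name: update_functions[name](level, visible)
            for name, level in cell_state.items()}

# Hypothetical example: two proteins that copy each other's level.
state = {"a": 0, "b": 1}
functions = {
    "a": lambda own, visible: visible["b"],
    "b": lambda own, visible: visible["a"],
}
print(step_cell(state, functions, {}))
```

Because both proteins read the snapshot, the levels swap; a sequential update would instead make them equal.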
29
Level 4: In proteins are update functions
Protein state 𝑙 ∈ 𝐿, discretized concentrations
Protein update function 𝐿 × 𝐿^𝑘 → 𝐿
reads concentrations of attached proteins and updates own
Note: these update functions are what we synthesize
i.e., in our partial models we leave some update
functions unspecified
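To give a feel for the size of the search space, an update function 𝐿 × 𝐿^𝑘 → 𝐿 can be viewed as a lookup table with |𝐿|^(𝑘+1) rows, so there are |𝐿|^(|𝐿|^(𝑘+1)) candidate tables for one unspecified function. The sketch below enumerates them naively. It is an illustration only: the helper names are hypothetical, and the actual DSL describes update functions as state machines rather than raw tables.

```python
from itertools import product

def candidate_update_functions(levels, k):
    """Enumerate every function L x L^k -> L as a lookup table (dict)."""
    inputs = list(product(levels, repeat=k + 1))   # (own level, k attached levels)
    for outputs in product(levels, repeat=len(inputs)):
        yield dict(zip(inputs, outputs))

def count_candidates(n_levels, k):
    """Number of distinct lookup tables: |L| ** (|L| ** (k + 1))."""
    return n_levels ** (n_levels ** (k + 1))

# Two levels and one attached protein already give 16 candidates.
print(len(list(candidate_update_functions([0, 1], 1))))
# Three levels and two attached proteins give 3 ** 27 candidates.
print(count_candidates(3, 2))
```

The rapid growth is one reason explicit enumeration does not scale and a symbolic treatment of the holes is attractive.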
30
The output fate
The fate of the program is computed with a fate
function from the state of each cell
𝑓 = (fate(𝜎_1), … , fate(𝜎_𝑛)),
where 𝜎_𝑖 is the state of cell 𝑖.
31
Example
Assume a network of police cameras. When a gunshot
happens, we want at least one nearby camera to take
a picture. Synthesize a protocol for deciding which
camera takes a picture. OK if multiple cameras do.
Two types of communications:
- sound from gunshot (“base station”) to cameras
- radio transmission between camera nodes announcing
“I took a picture, you don’t have to, save your battery”
Nodes should decide who is closest on the basis of
sound signal strength. No triangulation.
32
Example
33
Incomplete specification
Columns: signal from BS | take picture? | signal from BS | take picture? | cameras managed to communicate?
Entries (in extraction order): H Y H N Y N Y Y Y H Y L N Y H Y H Y N
34
Synthesized update functions for base receiver, delay node
35
Synthesis
36
Synthesis
Input to synthesizer:
– specification: 𝐸 ⊆ 𝑀 × 𝐹
– partial program (sketch): 𝑃? ∷ 𝐻 → 𝑀 → 𝑆 → 𝐹
– “biological” invariants (see next slide)
Output:
– completion ℎ, which completes 𝑃? into a correct 𝑃_ℎ
The synthesis problem:
∃ℎ . Correct(𝑃_ℎ, 𝐸)
a 3QBF problem (unlike ordinary 2QBF synthesis):
∃ℎ ( ∀𝑚 ∈ 𝜋_𝑚(𝐸) . ¬∃𝑠 . (𝑚, 𝑃_ℎ(𝑚, 𝑠)) ∉ 𝐸 ∧ ∀(𝑚, 𝑓) ∈ 𝐸 . ∃𝑠 . 𝑃_ℎ(𝑚, 𝑠) = 𝑓 )
37
Enforcing Biological Invariants
Synthesized models must satisfy biological invariants.
Biologist’s invariants specify whether one protein
activates or inhibits another.
Asserted as monotonicity constraints on state transitions
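One possible reading of these monotonicity constraints is sketched below: raising an activating input must never lower the output, and raising an inhibiting input must never raise it. This interpretation, the function `respects_invariant`, and the example update function are assumptions made for illustration; the slides do not spell out the exact constraint.

```python
from itertools import product

def respects_invariant(update, levels, n_inputs, index, kind):
    """Check monotonicity of `update` in argument `index`.

    update: function taking a tuple of `n_inputs` levels and returning a level
    kind:   "activating" (non-decreasing) or "inhibiting" (non-increasing)
    """
    ordered = sorted(levels)
    for args in product(ordered, repeat=n_inputs):
        for higher in ordered:
            if higher <= args[index]:
                continue
            # same inputs, except that argument `index` is raised
            raised = args[:index] + (higher,) + args[index + 1:]
            before, after = update(args), update(raised)
            if kind == "activating" and after < before:
                return False
            if kind == "inhibiting" and after > before:
                return False
    return True

# Hypothetical update: output follows input 0 and is suppressed by input 1.
def example_update(args):
    base, lateral = args
    return max(0, base - lateral)

print(respects_invariant(example_update, [0, 1, 2], 2, 0, "activating"))
print(respects_invariant(example_update, [0, 1, 2], 2, 1, "inhibiting"))
```

In a solver-based synthesizer the same condition would be asserted as a constraint on the unknown function rather than checked after the fact.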
39
The synthesizer
40
Architecture of synthesizer (3.5 KLOC)
DSL embedded in Scala
just defining classes for Cells, Proteins gives nice syntax
evaluate the Scala program
result is an abstract syntax graph (ASG)
interpreter for ASG in Scala
given ASG and (m, s), run the program to get the fate
compiler from ASG to a Z3 formula 𝜙
𝜙 is used by the algorithms for verification, synthesis, and ambiguity analysis
41
Example of the embedded DSL
class BaseReceiver extends Node("BaseReceiver") {
  val base            = input("off", "low", "high")
  val lateralReceiver = input("off", "on")
  val out             = output("off", "on")

  // update functions implemented as a (more general) FSM
  val stateful = logic(new StatefulLogic {
    val off = state("off")        // two observable states
    val on  = state("on")
    output(out)                   // link these states to output port
    init(off)                     // "off" is the start state
    nbStates(5)                   // this state machine will have five hidden states
    activating(base)              // biological invariants on inputs
    inhibiting(lateralReceiver)
  })
  register(stateful)              // necessitated by the DSL
}
42
How to deal with 3QBF synthesis problem
Domain sizes:
holes 𝐻: large, 𝑂(10^53), treated symbolically
schedules 𝑆: large, 𝑂(10^26), treated symbolically
mutations 𝑀: small, 𝑂(10^2), by on-demand enumeration
43
Algorithms
45
Synthesis Approach: CEGIS
assume we care only about the classical demonic correctness
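CEGIS loop:
– start from an initial input set of (schedule, experiment) pairs
– synthesize: SAT ⇒ candidate model; UNSAT ⇒ no model fits the inputs
– verify: SAT ⇒ add counterexample (schedule, experiment) to the input set and synthesize again; UNSAT ⇒ the candidate is correct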
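The counterexample-guided loop can be sketched over finite sets as follows. This is an illustration, not the authors' synthesizer: both the synthesis and verification queries, which the tool poses to a solver, are replaced here by brute-force enumeration, and the names `cegis`, `is_ok`, and the threshold toy problem are hypothetical.

```python
def cegis(candidates, inputs, is_ok):
    """Counterexample-guided inductive synthesis over finite sets.

    candidates: iterable of candidate models (hole completions)
    inputs:     list of all inputs, e.g. (schedule, experiment) pairs
    is_ok:      function(candidate, input) -> bool
    Returns a candidate correct on all inputs, or None if none exists.
    """
    candidates = list(candidates)
    examples = []                                   # counterexamples collected so far
    while True:
        # synthesize: find any candidate consistent with the collected examples
        candidate = next((c for c in candidates
                          if all(is_ok(c, x) for x in examples)), None)
        if candidate is None:
            return None                             # synthesizer UNSAT: no model exists
        # verify: search for an input on which the candidate fails
        counterexample = next((x for x in inputs if not is_ok(candidate, x)), None)
        if counterexample is None:
            return candidate                        # verifier UNSAT: candidate is correct
        examples.append(counterexample)

# Hypothetical toy problem: find a threshold c such that (x >= c) matches each label.
toy_inputs = [(x, x >= 3) for x in range(6)]

def is_ok(threshold, labelled):
    x, expected = labelled
    return (x >= threshold) == expected

print(cegis(range(6), toy_inputs, is_ok))
```

Each counterexample rules out at least the candidate that produced it, so the loop terminates on finite candidate sets; the benefit is that the synthesizer typically sees only a small subset of the inputs.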
46
Synthesis algorithm
verifier of demonic schedules:
∃𝑚, 𝑠 . (𝑚, 𝑃(𝑚, 𝑠)) ∉ 𝐸
counterexample: (𝑚_𝑖, 𝑠_𝑖)
verifier of angelic schedules:
∃(𝑚, 𝑓) ∈ 𝐸 . ¬∃𝑠 . 𝑃(𝑚, 𝑠) = 𝑓
counterexample: (𝑚_𝑖, 𝑓_𝑖)
synthesizer, constrained by the collected counterexamples:
∃ℎ . (𝑚_1, 𝑃_ℎ(𝑚_1, 𝑠_1)) ∈ 𝐸 ∧ ⋯ ∧ (𝑚_𝑙, 𝑃_ℎ(𝑚_𝑙, 𝑠_𝑙)) ∈ 𝐸
∧ (∃𝑠 . 𝑃_ℎ(𝑚_1, 𝑠) = 𝑓_1) ∧ ⋯ ∧ (∃𝑠 . 𝑃_ℎ(𝑚_𝑘, 𝑠) = 𝑓_𝑘)
full problem:
∃ℎ ( ∀𝑚 ∈ 𝜋_𝑚(𝐸) . ¬∃𝑠 . (𝑚, 𝑃_ℎ(𝑚, 𝑠)) ∉ 𝐸 ∧ ∀(𝑚, 𝑓) ∈ 𝐸 . ∃𝑠 . 𝑃_ℎ(𝑚, 𝑠) = 𝑓 )
47
Three communicating solvers
[Diagram labels: 3QBF; 2QBF; SAT (blasts (m,f), turns to SAT)]
48
Supporting tools
49
Supporting tools
Work would not be productive without these tools
– execution visualizer
– causal tracer
– automaton minimizer
We still need ideas on how to construct those quickly
50
Visualizing the Synthesized Model
activated connections are colored
step through execution
51
Results
52
Results (1): Automatic model inference
Synthesized a model of VPC in C. elegans
- the model expressed in our bio-inspired language
- we believe it’s more readable than in RM
Prior to synthesis
– we failed to manually fix a bug in an equivalent model
– collaborators took several months to make this model
53
Results (2): Are experiments complete?
We concluded that the set of experiments is complete
– this means there exists no alternative model that
behaves differently on experiments not yet performed
– this is under the assumption described in the sketch
provided by biologists, which encodes their knowledge
about C. elegans
Working on identifying a minimal set of experiments
– if we want to validate these experiments, do we need to
repeat all of them?
54
Results (3)
No behaviorally distinct models.
But we synthesized a model that differs internally.
cell behavior due to a different protein interaction
These models can’t be distinguished via mutation and
fate observation (models have same fates, after all).
Hence one must “instrument”
the cell by tagging proteins
with fluorescent genes.
Here, our synthesis identifies
which genes to instrument
(the fewer the better).
55
Summary: Executable biology’s challenges
Infer models that can replay all observed behavior
… or else they don’t faithfully model cell phenomena.
This semantics leads to a 3QBF synthesis problem.
Analyze the space of plausible models
Are specs ambiguous, minimal?
Which experiments to perform to rule out a model?
56