Formal Methods in Systems Biology

Special Topics in Computational
Biology:
Formal Methods in Systems
Biology
Spring, 2008
Chris Langmead
Department of Computer Science
Carnegie Mellon University
James Faeder
Department of Computational Biology
University of Pittsburgh School of Medicine
General Info
• Course Numbers:
– CMU 15-872(A)
– CMU 02-730
– Pitt CMPBIO 2045(Arts & Sciences)
– Pitt MSCBIO 2045 (School of Medicine)
• Location: Newell-Simon Hall (NSH) 3002 - OK?
• Time: Tu, Th 1:30-2:50 PM
• Instructors
– Chris Langmead (cjl@cs.cmu.edu)
– Jim Faeder (faeder@pitt.edu)
• Office Hours: By appointment (please email)
• Course Wiki:
http://bionetgen.org/index.php/Formal_Methods_in_Systems_Biology (email Jim for account)
Course Format: An Informal
Course about Formal Methods
• Introductory lectures (two weeks)
• Students will read and present research
papers
– Sign up for open dates on the wiki (25 - projects)
• Students will design and complete a course
project on a subject of special interest
• Grading is based on completion of work
• Flexibility depending on course enrollment
– Journal club
– Focused project
– Review article
Encouragement
• Opportunity to learn about new areas and
methods that will be of direct interest in your
research.
• (True for the “instructors” as well)
• We will operate as a multi-disciplinary team
– Computer Scientists, Physicists, Chemists,
Engineers, Mathematicians, …, Biologists
– Good communication essential
Products of the Course
• Comprehensive bibliography in wiki format
• Research projects leading to publishable
results in the field
• Review article (?)
• Improved organization and presentation skills
• Participation on a multi-disciplinary team
Introductions
• Your name
• Your university, department, research
area(s) and research advisor
• Your educational background
– Computer Science, Math, Physics, etc.
• Goals in taking the course
Outline of Today’s Lecture
• Definition of terms
• Goals
• Examples of Successful Abstractions
– Flux Balance Analysis
– Mass Action Kinetics
• Brief survey of topics
Importance of Symbols
• Invention of symbol for zero and decimal
system for writing numbers “among the
greatest human inventions.”
• 3 known independent inventions
• In each case, development took
centuries
• Major impact on trade, culture, and
philosophy.
• Celebration of zero dot in Sanskrit
poetry
“The dot on her forehead / Increases her beauty tenfold, /
Just as a zero dot [sunyabindu] / Increases a number tenfold.”
- Biharilal
Key Definitions - Formal Methods
• In computer science and software engineering, formal methods
are mathematically-based techniques for the specification,
development and verification of software and hardware systems.
• The use of formal methods for software and hardware design is
motivated by the expectation that, as in other engineering
disciplines, performing appropriate mathematical analyses can
contribute to the reliability and robustness of a design.
• However, the high cost of using formal methods means that they
are usually only used in the development of high-integrity
systems, where safety or security is important.
- WIKIPEDIA
Expanded View of Formal Methods
• Formal abstractions that may be used to
model system of interest
• In addition to systems that can be
formally analyzed, we will consider
representations that can only be fully
explored by simulations.
Key Definitions - Systems Biology
• Systems biology is a relatively new biological study field that
focuses on the systematic study of complex interactions in
biological systems, thus using a new perspective (integration
instead of reduction) to study them.
• Particularly from 2000 onwards, the term is used widely in the
biosciences, and in a variety of contexts.
• Because the scientific method has been used primarily toward
reductionism, one of the goals of systems biology is to discover
new emergent properties that may arise from the systemic view
used by this discipline in order to understand better the entirety
of processes that happen in a biological system.
- WIKIPEDIA
Origin of Systems Biology
• Completion of genome projects is major
inspiration
• Provided “parts list” for the cell
• Next obvious step is to ask how the parts
work together to carry out function
Vision for Role of Computer
Science in Systems Biology
• “Computer science could provide the
abstraction[s] needed for consolidating
knowledge of biomolecular systems”
• “...the abstractions, tools and methods
used to specify and study computer
systems should illuminate our
accumulated knowledge about
biomolecular systems.”
Regev and Shapiro, “Cells as Computation,” Nature (2002).
Abstract Representations in
Biology
• DNA sequence represented by strings
with 4 letter alphabet (ATGC)
• Protein sequence and structure
– Strings with 20 letter alphabet
– Set of 3D atomic coordinates (PDB file)
The KaiC hexamer, a
Circadian clock protein. From
pdb.org.
(Some) Desirable Properties of an
Abstract Representation
1. Relevant / accurate
2. Computable
3. Understandable
4. Extensible
5. Scalable
Modular
Hierarchical
1-4 from Regev and Shapiro, “Cells as Computation,” Nature (2002).
An Irony
• CS community aims to provide powerful
abstract representations to improve
understanding of systems.
• Manner of reporting results - technical reports
in conference proceedings - presents major
barrier to wider adoption by science and
engineering communities.
• There is a need for better communication
among disciplines!
Sometimes formalism creates a
barrier
Example: Red blood cell model
Agenda
• We are looking for useful abstractions
that can improve our understanding of
how biological systems behave
Goals
• Language(s) for constructing whole-cell
models (comprehensive, system-wide)
• Formal analysis (reasoning) of such models
• Simulation of models on distributed systems
• Combination of analysis and simulation to
predict behavior of models
– genotype → phenotype
Challenges
• Accuracy
– Missing interactions
• Computability
– Requirement to perform simulations for many
properties of interest
– Poor scaling of simulations
• Understanding
– Problem of network visualization
• Extensibility
– Missing biophysics
• Scalability
– Need to compute behavior on multiple scales, e.g.
tissue → cell → cytoplasm → nucleus
Mathematical vs. Computational
Models
Consider an elementary chemical reaction
r1: A + B -> C
Mathematical
$$\frac{d[A]}{dt} = -k[A][B]$$
Computational
module A
A : [0..N] init N;
[r1] (A > 0) -> k*A*B : (A' = A - 1);
…
endmodule
How important is this distinction?
Fisher & Henzinger, Nat. Biotechnol. (2007).
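One way to make the distinction concrete is to run both views of r1 side by side. The following is a minimal Python sketch, not taken from the slides or from Fisher & Henzinger: `simulate_ode` integrates the rate equation with forward Euler, and `simulate_ssa` executes the reaction as discrete firings, in the spirit of the guarded command above. The function names, the step size, and the choice of a Gillespie-style waiting time are all assumptions made for illustration.

```python
import random


def simulate_ode(a0, b0, k, t_end, dt=1e-3):
    """Mathematical view: forward-Euler integration of d[A]/dt = -k[A][B]."""
    a, b, c = float(a0), float(b0), 0.0
    for _ in range(int(round(t_end / dt))):
        rate = k * a * b              # mass-action rate of A + B -> C
        a -= rate * dt
        b -= rate * dt
        c += rate * dt
    return a, b, c


def simulate_ssa(a0, b0, k, t_end, rng):
    """Computational view: each firing of r1 removes one A and one B."""
    a, b, c, t = a0, b0, 0, 0.0
    while a > 0 and b > 0:            # guard: the reaction is enabled
        propensity = k * a * b
        t += rng.expovariate(propensity)   # random waiting time to next firing
        if t > t_end:
            break
        a, b, c = a - 1, b - 1, c + 1
    return a, b, c


if __name__ == "__main__":
    print(simulate_ode(100, 100, 0.01, 1.0))
    print(simulate_ssa(100, 100, 0.01, 1.0, random.Random(0)))
```

The continuous version returns real-valued concentrations, while the discrete version returns integer copy numbers that differ from run to run; both conserve A + C.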
Tension between Accuracy and
Computability
• Application of formal methods requires that elements
of representation be relatively simple.
• For example, a representation that includes all
analytical functions in mathematics might not be
useful - impossible to make predictions.
• In general, increasing the complexity of the
representation limits ability for analysis.
• Representations are sometimes chosen for
amenability to analysis rather than realism - e.g.
boolean networks.
• Computational (“executable”) models tend to make
restrictions explicit.
Some successful abstractions in
systems biology
• Flux Balance Analysis
– Genome-wide models of metabolism
• Mass Action Kinetics
– Cell-cycle model
– Growth factor signaling model
Network Reconstruction (2D
Annotation)
B. O. Palsson, Nature Biotechnology 22, 1218 - 1219 (2004)
Network Reconstruction (cont.)
• Wiring diagram for the components in a cell
• Elements are
– Molecular Components (Species)
– Interactions (Reactions)
• Additional detail can be added.
• Genome-wide reconstructions for metabolism
are available for many model organisms
(including Homo sapiens!)
• “All such interactions are ultimately represented by a
genome-scale stoichiometric matrix—a two-dimensional genome annotation.”
B. O. Palsson, Nature Biotechnology 22, 1218 - 1219 (2004)
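As a rough illustration of how a reaction list becomes a stoichiometric matrix, here is a small Python sketch. The helper name `stoichiometric_matrix`, the species names, and the two-reaction example network are hypothetical and chosen only to keep the example short; real reconstructions have thousands of rows and columns.

```python
def stoichiometric_matrix(species, reactions):
    """Build S as a list of rows: S[j][i] is the net change in species j
    each time reaction i fires (negative = consumed, positive = produced)."""
    index = {name: j for j, name in enumerate(species)}
    S = [[0] * len(reactions) for _ in species]
    for i, (reactants, products) in enumerate(reactions):
        for name, n in reactants.items():
            S[index[name]][i] -= n
        for name, n in products.items():
            S[index[name]][i] += n
    return S


# Hypothetical toy network: each reaction is (reactants, products).
species = ["s1", "s2", "s3"]
reactions = [
    ({"s1": 1, "s2": 1}, {"s3": 1}),   # s1 + s2 -> s3
    ({"s3": 1}, {"s1": 1, "s2": 1}),   # s3 -> s1 + s2 (assumed reverse step)
]

S = stoichiometric_matrix(species, reactions)
for name, row in zip(species, S):
    print(name, row)
```

Rows are species and columns are reactions, so adding a reaction to the reconstruction only appends a column.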
Overview of Flux Balance Analysis
• Genome-wide reconstruction of metabolic
network
$$r_i:\; s_1 + s_2 \xrightarrow{v_i} s_3$$
$$S_{1i} = S_{2i} = -1;\quad S_{3i} = 1;\quad S_{ji} = 0,\; j \notin \{1, 2, 3\}$$
• Assume steady state
$S \cdot v = b$, where $b_i$ are known transport fluxes.
• Assume optimal growth (biomass production)
maximize $f(v) = c \cdot v = v_{\text{out}}$
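A small worked example may help fix the idea. The four-reaction network below is hypothetical and not from the slides, and it assumes that transport reactions are written as columns of S, so the steady-state condition takes the form S · v = 0. The uptake bound of 10 is likewise an arbitrary illustrative value.

```latex
\begin{aligned}
&\text{Metabolites: } A,\ B \qquad
 \text{Fluxes: } v = (v_{\text{in}},\ v_1,\ v_2,\ v_{\text{out}})^{T} \\
&v_{\text{in}}: \varnothing \to A, \quad
 v_1: A \to B, \quad
 v_2: A \to \varnothing, \quad
 v_{\text{out}}: B \to \varnothing \\[4pt]
&S = \begin{pmatrix} 1 & -1 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix},
 \qquad S \cdot v = 0 \\[4pt]
&\Rightarrow\ v_{\text{in}} = v_1 + v_2, \qquad v_1 = v_{\text{out}} \\[4pt]
&\text{maximize } v_{\text{out}} \ \text{ subject to } \
 0 \le v_{\text{in}} \le 10,\ \ v_1, v_2, v_{\text{out}} \ge 0 \\[4pt]
&v_{\text{out}} = v_{\text{in}} - v_2 \le 10
 \ \Rightarrow\ v^{*} = (10,\ 10,\ 0,\ 10)^{T}
\end{aligned}
```

The optimum routes all uptake to the output flux and sets the competing export flux to zero; in a genome-scale model the same problem is handed to a linear programming solver.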
Genome-Wide Reconstruction of
Haemophilus influenzae
Edwards, J. S. et al. J. Biol. Chem. 1999;274:17410-17416
Single and double deletion in the
central metabolic pathways of H. influenzae
Edwards, J. S. et al. J. Biol. Chem. 1999;274:17410-17416
What Accounts for Success?
• Knowledge Base
– Metabolic chemistry known from >50 years of
biochemistry and genome sequence
• Simple Abstraction
– Biochemistry reduced to list of reaction
stoichiometries
• Powerful Computation Method
– Highly optimized solvers for Linear Programming
problem
• Extensibility
– Non-optimal growth in mutants
– Constraints arising from molecular crowding
Cellular Signal Transduction
[Figure: signaling at the plasma membrane. Labels: ligand, receptor, ligand-receptor binding, aggregation, transphosphorylation, signaling complex, kinase, adaptor, SH2 domain, SH3 domain]
Mass Action Kinetics
$$R + L \underset{k_d}{\overset{k_a}{\rightleftharpoons}} RL$$
Differential Equations
$$\frac{d[R]}{dt} = -k_a [R][L] + k_d [RL]$$
$$\frac{d[L]}{dt} = -k_a [R][L] + k_d [RL]$$
$$\frac{d[RL]}{dt} = k_a [R][L] - k_d [RL]$$
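These three equations can be integrated numerically in a few lines. The sketch below uses a hand-written fourth-order Runge-Kutta step so that it needs no external libraries; the function names, rate constants, and initial concentrations are assumptions chosen for illustration, not values from any published model.

```python
def derivatives(state, ka, kd):
    """Right-hand side of the mass-action ODEs for R + L <-> RL."""
    R, L, RL = state
    net = ka * R * L - kd * RL        # net forward (binding) rate
    return (-net, -net, net)


def rk4_step(state, dt, ka, kd):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    def shifted(s, d, h):
        return tuple(x + h * dx for x, dx in zip(s, d))

    k1 = derivatives(state, ka, kd)
    k2 = derivatives(shifted(state, k1, dt / 2), ka, kd)
    k3 = derivatives(shifted(state, k2, dt / 2), ka, kd)
    k4 = derivatives(shifted(state, k3, dt), ka, kd)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(state, ka, kd, t_end, dt=0.01):
    """Integrate from t = 0 to t_end with a fixed step size."""
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, ka, kd)
    return state


if __name__ == "__main__":
    # Assumed values: [R]0 = 1, [L]0 = 2, [RL]0 = 0, ka = 1, kd = 0.5.
    R, L, RL = integrate((1.0, 2.0, 0.0), ka=1.0, kd=0.5, t_end=50.0)
    print(R, L, RL)
```

At long times the binding and unbinding rates balance, and the totals [R] + [RL] and [L] + [RL] stay fixed throughout, which gives a quick sanity check on any implementation.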
Reaction Network Model of
Signaling
Kholodenko et al., J. Biol. Chem. 274, 30169 (1999)
Comparing Model and Experiment
Experimental Data
Simulation Results
Benefits of Mass Action Kinetic
Modeling
• Large knowledge base of signaling
biochemistry
• Models dynamical behavior
• Computational Methods Well Established
– ODE solvers for continuous systems
• Nonlinear Dynamics Theory
• Extensibility
– Stochastic Simulation Algorithm for discrete
systems
– Spatially-resolved models can be built on same
mass action equations
Limitations of Mass Action Kinetic
Modeling
• Rapidly expanding knowledge base
– Many components and interactions unknown
• Lack of precision
– ad hoc assumptions to limit combinatorial
explosion (next lecture)
• Large sets of nonlinear ODEs are difficult to
simulate or analyze
• No comprehensive models yet
Map of Signaling Initiated by a
Single Family of Receptors
Oda and Kitano (2006) Mol. Syst. Biol.
Analysis is limited to
simple graph theoretic
measures and
qualitative discussions
of architecture.
(Partial) List of Topics
• Boolean Networks
• Petri Nets
• Statecharts
• Process Algebras
• Agent-Based Modeling
• Hybrid Systems
• Model Checking
• Simulation Algorithms
Brief Overview of Two Useful
Abstractions
• Boolean Networks
• Petri Nets
• Statecharts
• Process Algebras
• Agent-Based Modeling
• Hybrid Systems
• Model Checking
• Simulation Algorithms
Boolean Networks
BN model of cell cycle in budding yeast
G1
Li, F., et al. PNAS 101, 4781–4786 (2004).
Update:
$$b(t+1) = F\big(a_1(t),\, a_2(t),\, a_3(t),\, a_4(t)\big)$$
where F is a Boolean function of the inputs to node b
Blue arrows form
stable basin of
attraction
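Basins of attraction can be found by brute force when the network is small. The Python sketch below assumes a threshold-style update rule of the kind used by Li et al. (a node switches on when its summed input is positive, off when negative, and keeps its value when the input is zero). The three-node weight matrix is a made-up toy, not the yeast cell-cycle network, and the function names are hypothetical.

```python
from itertools import product


def update(state, weights):
    """Synchronous threshold update. weights[i][j] is the effect of node j
    on node i (+1 activation, -1 inhibition, 0 no edge)."""
    new = []
    for i, current in enumerate(state):
        total = sum(weights[i][j] * state[j] for j in range(len(state)))
        new.append(1 if total > 0 else 0 if total < 0 else current)
    return tuple(new)


def basins(weights):
    """Map each attractor (fixed point or cycle, stored as a frozenset of
    states) to the number of initial states that flow into it."""
    sizes = {}
    for start in product((0, 1), repeat=len(weights)):
        seen = []
        state = start
        while state not in seen:      # follow the trajectory until it repeats
            seen.append(state)
            state = update(state, weights)
        attractor = frozenset(seen[seen.index(state):])
        sizes[attractor] = sizes.get(attractor, 0) + 1
    return sizes


# Hypothetical toy network with three nodes.
WEIGHTS = [
    [0, 0, -1],   # node 0 is inhibited by node 2
    [1, 0, 0],    # node 1 is activated by node 0
    [0, 1, -1],   # node 2 is activated by node 1 and inhibits itself
]

if __name__ == "__main__":
    for attractor, size in basins(WEIGHTS).items():
        print(sorted(attractor), size)
```

Exhaustive enumeration visits all 2^n states, so it is only practical for small n; it is meant to show what a basin is, not how to analyze large networks.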
Balance Sheet for BNs
Pro
• Models may be
constructed on basis of
scant data*
• Fast computation
• Strong analysis tools (?)
• Good for reasoning
about stability and
robustness
Con
• Two levels may not be
enough
• Lack of compositionality
• Not hierarchical, but
may be embedded in
more complex models.
*Li S, Assmann SM, Albert R (2006) Predicting Essential
Components of Signal Transduction Networks: A Dynamic Model of
Guard Cell Abscisic Acid Signaling. PLoS Biol 4(10): e312
Petri Nets
[Figure: example Petri net. Labels: tokens, places, transitions]
Chaouiya, C. Petri net modelling of biological networks. Brief. Bioinform.
8, 210–219 (2007).
Petri Nets: Time Evolution
[Figure: tokens moving between places as transitions fire]
Petri Nets Generalize Network
Reconstruction
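[Figure: Petri net fragment with places p3 and p4 and transition t2]
The Petri net incidence matrix C corresponds to the stoichiometric matrix S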
Chaouiya, C. Brief. Bioinform. 8, 210–219 (2007).
Some useful formal properties of
PNs
• P-invariants ($C^{T} \cdot x = 0$) ~ Mass Conservation
• T-invariants ($C \cdot y = 0$) ~ Loops / Elementary Modes
• Reachability - whether a state can be
reached
• Liveness - whether a transition can be fired
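The properties above can be checked directly on a small net. The following Python sketch encodes receptor-ligand binding as a Petri net with places R, L, RL and transitions `bind` and `unbind`. All names are hypothetical, and the breadth-first reachability search assumes the net is bounded; on an unbounded net it would not terminate. Liveness is not checked here.

```python
from collections import deque

PLACES = ["R", "L", "RL"]
# Each transition: (tokens consumed per place, tokens produced per place).
TRANSITIONS = {
    "bind": ({"R": 1, "L": 1}, {"RL": 1}),
    "unbind": ({"RL": 1}, {"R": 1, "L": 1}),
}


def incidence_matrix():
    """C[p][t] = tokens produced minus tokens consumed in place p by transition t."""
    return [
        [prod.get(p, 0) - cons.get(p, 0) for cons, prod in TRANSITIONS.values()]
        for p in PLACES
    ]


def enabled(marking, name):
    """A transition is enabled when every input place holds enough tokens."""
    consumed, _ = TRANSITIONS[name]
    return all(marking[PLACES.index(p)] >= n for p, n in consumed.items())


def fire(marking, name):
    """Return the marking reached by firing one enabled transition."""
    consumed, produced = TRANSITIONS[name]
    new = list(marking)
    for p, n in consumed.items():
        new[PLACES.index(p)] -= n
    for p, n in produced.items():
        new[PLACES.index(p)] += n
    return tuple(new)


def reachable(initial):
    """Breadth-first search over markings reachable from the initial one."""
    seen, queue = {initial}, deque([initial])
    while queue:
        marking = queue.popleft()
        for name in TRANSITIONS:
            if enabled(marking, name):
                nxt = fire(marking, name)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


def is_p_invariant(x, C):
    """Check C^T x = 0: the weighted token count x is conserved."""
    return all(sum(C[p][t] * x[p] for p in range(len(C))) == 0
               for t in range(len(C[0])))


def is_t_invariant(y, C):
    """Check C y = 0: firing each transition y[t] times restores the marking."""
    return all(sum(C[p][t] * y[t] for t in range(len(y))) == 0
               for p in range(len(C)))
```

For this net, x = (1, 0, 1) is a P-invariant (total receptor is conserved) and y = (1, 1) is a T-invariant (one binding followed by one unbinding is a loop).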
Overview of PNs
• PNs are graphs, and provide tight connection
between visualization and modeling
• PN formalism is isomorphic to network
reconstruction formalism (reaction networks)
• Many extensions are possible to overcome
limitations
– Colored Petri Nets, Hierarchical CPNs, Multi-level
PN, Stochastic PNs, etc.
• Extensions provide further modeling
capabilities at the expense of analysis.
Concluding Remarks
• Goal of course is to explore various
representations from CS literature that
can be used to model biomolecular
systems.
• What opportunities do these
representations offer in terms of
analysis, simulation, understanding, and
scalability?