Optimality models in biology nanocourse @ Harvard

Optimality models in biology
Ron Milo & Michael Brenner
May 2009
Why are biological systems built the way
they are?
• In biology we usually ask (and answer) questions
about:
– What are the processes?
– How do they function?
– Who are the molecular players?
– When and where are they expressed?
• Can we approach “why” questions?
Optimality model analysis is useful in a
wide range of biological fields
We learn through case studies:
• Foraging strategy
• Gene expression
• Spore shapes
• Metabolic networks
http://openwetware.org/wiki/Optimality_In_Biology
Google: “optimality in biology”
Principles of minimality and maximality explain
many physical phenomena
• At the heart of many fields of physics
– “Minimal action” governs classical mechanics
(Lagrangian formulation)
– Maximal entropy in thermodynamics
– Geometrical optics can be derived from Fermat’s
principle of least time
– Area minimization in soap bubbles due to surface
tension
Strong predictive power: geometrical optics
laws are derived from Fermat’s principle
Fermat’s principle:
A light ray traveling from one fixed point to another will follow
a path such that the time required is an extreme point – either a
maximum or a minimum.
Rules for reflection and refraction
[Figure: reflection and refraction of light; refraction illustrated as a path crossing from “sand” into “water”]
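Refraction can be recovered numerically from the least-time principle. A minimal sketch, assuming a flat interface between a “sand” medium (speed v1) and a “water” medium (speed v2) and illustrative geometry and speeds: the crossing point x is found by ternary search on the total travel time, and at the optimum the ratios sin θ1 / v1 and sin θ2 / v2 agree, which is Snell’s law.

```python
import math

def travel_time(x, h1, h2, d, v1, v2):
    """Time along A=(0, h1) -> (x, 0) -> B=(d, -h2); speed v1 above the interface, v2 below."""
    return math.hypot(x, h1) / v1 + math.hypot(d - x, h2) / v2

def fastest_crossing(h1, h2, d, v1, v2, iters=200):
    """Ternary search for the crossing point in [0, d] (travel time is convex in x)."""
    lo, hi = 0.0, d
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if travel_time(m1, h1, h2, d, v1, v2) < travel_time(m2, h1, h2, d, v1, v2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# Hypothetical setup: twice as fast on "sand" (v1) as in "water" (v2).
h1, h2, d, v1, v2 = 1.0, 1.0, 2.0, 1.0, 0.5
x = fastest_crossing(h1, h2, d, v1, v2)
sin1 = x / math.hypot(x, h1)            # sine of the angle of incidence
sin2 = (d - x) / math.hypot(d - x, h2)  # sine of the angle of refraction
print(f"x = {x:.4f}, sin1/v1 = {sin1 / v1:.6f}, sin2/v2 = {sin2 / v2:.6f}")
```

The fastest path crosses the interface past the midpoint, spending more distance in the faster medium.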
Optimization model example:
Which rectangle has maximum
area for given perimeter?
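For a rectangle of width w and fixed perimeter P, the height is P/2 − w, so the area is A(w) = w(P/2 − w). Setting dA/dw = P/2 − 2w = 0 gives w = P/4: the optimal rectangle is a square. A brute-force check of this worked answer (P = 20 is an arbitrary illustrative choice):

```python
def area(width, perimeter):
    """Area of a rectangle with the given width and fixed perimeter."""
    height = perimeter / 2 - width
    return width * height

def best_width(perimeter, steps=10_000):
    """Brute-force search over widths strictly between 0 and perimeter/2."""
    widths = [perimeter / 2 * i / steps for i in range(1, steps)]
    return max(widths, key=lambda w: area(w, perimeter))

P = 20.0
w = best_width(P)
print(w, P / 2 - w, area(w, P))  # a 5 x 5 square with area 25
```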
Evolutionary optimization model construction
1. Ask an explicit biological question
2. A range or space of alternatives is defined
3. An assumption on what is being maximized (fitness proxy)
4. Convert alternatives to fitness payoffs, including
constraints and tradeoffs
5. Find the optimal solution and test it against observations
6. Suggest experiments and make falsifiable predictions
Evolutionary optimization model construction
1. Ask an explicit biological question
– “Why is the sex ratio often unity?”
– The question is assumed to have an adaptive answer
2. A range or space of alternatives is defined
– Any ratio of males to females
3. An assumption on what is being maximized (fitness proxy)
– Expected lifetime number of surviving offspring
– For an allele, this can include copies of the same allele carried in relatives
– Indirect measures are often used: minimal energy, maximal food, etc.
4. Convert alternatives to fitness payoffs, including
constraints and tradeoffs
– “For fixed resources, more sons means fewer daughters”
5. Find the optimal solution and test it against observations
6. Suggest experiments and make falsifiable predictions
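The sex-ratio question can be made concrete with Fisher’s classic argument. A minimal sketch, assuming the standard Shaw–Mohler fitness expression (not given on the slide): fitness is counted in grandchildren, and because every offspring has exactly one father and one mother, a son is worth (1 − p)/p daughters when a fraction p of the population’s offspring are sons. A rare mutant mother producing a fraction m of sons can invade unless p = 1/2.

```python
def mutant_fitness(m, p):
    """Expected grandchildren (up to a constant) of a mother producing a fraction m of sons,
    in a population whose mothers produce a fraction p of sons (Shaw-Mohler form).
    Fixed resources: more sons (m) means fewer daughters (1 - m)."""
    return (1 - m) + m * (1 - p) / p

def best_response(p):
    """Sex ratio (0 = all daughters, 1 = all sons) that beats the resident p; None if all tie."""
    gain = mutant_fitness(1, p) - mutant_fitness(0, p)
    if abs(gain) < 1e-12:
        return None  # every mutant does equally well: the resident ratio cannot be invaded
    return 1 if gain > 0 else 0

for p in (0.2, 0.5, 0.8):
    print(p, best_response(p))
```

Whenever one sex is rarer, producing more of it pays, which pushes the population back toward a 1:1 ratio.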
This is the type of “why” answer we seek, not “why” in a theological sense
Optimality analysis helps to sharpen our
understanding
• “Optimization models help us to test our insight into the
biological constraints that influence the outcome of
evolution. They serve to improve our understanding
about adaptations, rather than to demonstrate that
natural selection produces optimal solutions.”
Parker & Maynard Smith, Nature (1990)
Optimality analysis helps to sharpen our
understanding – even though evolution is a
tinkerer
• “It [natural selection] works like a tinkerer - a tinkerer
who does not know exactly what he is going to produce
but uses whatever he finds around him whether it be
pieces of string, fragments of wood, or old cardboards; in
short it works like a tinkerer who uses everything at his
disposal to produce some kind of workable object.”
Jacob, Science (1977)
See also: Alon, Biological networks - the
tinkerer as an engineer, Science (2003)
Clarification – not everything is optimal
• Not everything in biology is claimed to be optimal –
optimality is a model assumption, not a law of nature
• Phylogeny and development have major effects (frozen
accidents)
• Random drift is often a dominant force (alleles can become
fixed in a population in spite of natural selection)
• Drift is especially pronounced in small populations
• If the “optimal” solution has only a small advantage, the multiplicity of
“good enough” solutions will prevail
• Selective pressure may act only during some
periods of time
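How much population size matters for a small advantage can be seen from Kimura’s diffusion approximation for the fixation probability of a single new mutant. This formula is a standard population-genetics result rather than something on the slide; the sketch assumes additive selection, a diploid population with effective size equal to N, and a hypothetical advantage s = 0.001. For N = 10 the advantage barely changes the fixation probability relative to a neutral allele, while for N = 10,000 it raises it many-fold.

```python
import math

def fixation_probability(n, s):
    """Kimura's diffusion approximation: fixation probability of one new additive mutant
    (initial frequency 1/(2N)) with selective advantage s in a diploid population of size N."""
    if s == 0:
        return 1 / (2 * n)  # neutral case: fixation purely by drift
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * n * s))

S = 0.001  # hypothetical small advantage of the "optimal" allele
for n in (10, 10_000):
    u = fixation_probability(n, S)
    print(f"N = {n}: P(fix) = {u:.3g}, {u * 2 * n:.3g} times the neutral value")
```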
Foraging strategy of honeybees – why are
honeycrops filled only partially?
Question (1)
A full crop requires approximately 55 flower visits, but bees often
carry much less back to the hive
Maximizing the rate of energy extraction predicts that
incomplete loads should be gathered only if the patch is
depleting
Schmid-Hempel, Kacelnik & Houston, Behav. Ecol. Sociobiol. (1985)
Foraging strategy of honeybees – why are
honeycrops filled only partially?
[Figure: a foraging cycle (hive → N flowers visited → hive); slopes depend on load (metabolic flight loss)]
Alternatives (2)
As a function of the number of flower visits (N):
measured gross energetic gain (G)
measured total energetic expenditure or loss (L)
measured total time (T) per foraging cycle
Constraints and conversion (4)
Optimality models can differentiate among
fitness criteria
Gross energetic gain (G)
Total energetic expenditure or loss (L)
Total time (T) per foraging cycle
Fitness (3)
optimization criterion:
- net energy gain per unit time = (G - L) / T
- net energy gain per unit energy expended = (G - L) / L
Test with observations (5)
Prediction (6)
A worker's condition deteriorates as a function
of the amount of flight performed
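The two candidate currencies can be compared on a toy version of the foraging cycle. A minimal sketch in which every functional form and parameter value is a hypothetical stand-in for the measured G, L and T curves: each flower visit yields one unit of energy and takes one unit of time, the round-trip flight takes ten, and the metabolic cost rate rises with load (a crude, assumed version of “slopes depend on load”). With these made-up numbers, net rate favors a full crop while efficiency favors a much smaller load, so the two criteria make distinguishable predictions.

```python
# Hypothetical parameters; the study measured G, L and T directly instead.
FLIGHT_TIME = 10.0    # round-trip flight time between hive and patch
VISIT_TIME = 1.0      # time per flower visit
GAIN_PER_VISIT = 1.0  # energy collected per flower visit
K_COST = 0.02         # metabolic cost per unit time with an empty crop
LOAD_COST = 0.05      # fractional rise in cost rate per flower's worth of load
FULL_CROP = 55        # approximate number of visits needed to fill the crop

def total_time(n):
    """T: total time per foraging cycle with n flower visits."""
    return FLIGHT_TIME + VISIT_TIME * n

def gain(n):
    """G: gross energetic gain."""
    return GAIN_PER_VISIT * n

def loss(n):
    """L: energy spent; crude assumption that the whole cycle is paid at the final-load cost rate."""
    return K_COST * total_time(n) * (1 + LOAD_COST * n)

def rate(n):
    """Net energy gain per unit time, (G - L) / T."""
    return (gain(n) - loss(n)) / total_time(n)

def efficiency(n):
    """Net energy gain per unit energy expended, (G - L) / L."""
    return (gain(n) - loss(n)) / loss(n)

visits = range(1, FULL_CROP + 1)
n_rate = max(visits, key=rate)
n_eff = max(visits, key=efficiency)
print("rate-maximizing load:", n_rate, "visits; efficiency-maximizing load:", n_eff, "visits")
```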
What can we gain from an optimality model?
• Testing understanding of constraints and tradeoffs
• Testing understanding of the fitness function
• Suggestions for new experiments and quantitative
questions
• …
Interpreting an optimality model
• “The final step in the optimality approach is to test the
predictions against the observations. If they fit, then the
model may really reflect the forces that have molded
the adaptation. If they do not, we may have misidentified
the strategy set, or the optimization criterion, or the
payoffs; or the phenomenon we have chosen may not
any longer be adaptive…“
Parker & Maynard Smith, Nature (1990)
Interpreting an optimality model
• “…by reworking our assumptions, we modify our model
and revise and retest the predictions. This has been
criticized as being an iterative procedure leading
inevitably to a fit. But this is how science works; theories
can only be discarded when they are disproven or found
to be unrealistic.”
Parker & Maynard Smith, Nature (1990)
Why are optimality models at the molecular
level maturing now?
• After answering the “who” and “how” questions
• Requires quantitative tools and information that are only
recently becoming available in biology
• We are beginning to design and build biological systems
The number you need, with reference in one minute
BioNumbers – Useful biological numbers database
Wiki-like, users edit and comment
Over 3500 properties & 5000 users/month
www.BioNumbers.org
Warm-up: trying to beat nature at design
How can 5-carbon sugars be transformed into 6-carbon sugars?
(e.g. from cell wall or nucleic acids → glycolysis)
6 × C5 → 5 × C6
The pentose phosphate pathway
defined as a game
• Goal: turn 6 pentoses into 5 hexoses
• Rules:
– Transfer 2 or 3 carbons between two molecules
– Never leave a molecule with 1 or 2 carbons
• Optimization function:
– Minimize the number of steps (simplicity)
– Among equally long solutions, prefer the one whose
molecules contain the fewest carbons
[Figure: 5 5 5 5 5 5 → ? → 6 6 6 6 6]
E. Meléndez-Hevia et al. (Journal of Theoretical Biology 1994)
Seriously: take 5 minutes and six 5-carbon pieces and try it out
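The game can also be solved by brute force. A minimal breadth-first-search sketch (our own illustration, not the procedure of Meléndez-Hevia et al.): a state is a sorted tuple of carbon counts, a move transfers 2 or 3 carbons from one molecule to another, and we assume that a molecule emptied to 0 carbons simply disappears (this is how 6 molecules become 5). The tie-breaking rule on carbon numbers is not implemented, so the path printed is one of possibly several shortest solutions.

```python
from collections import deque

def moves(state):
    """All states reachable in one step: move 2 or 3 carbons from one molecule to another."""
    mols = list(state)
    for i, donor in enumerate(mols):
        for j, acceptor in enumerate(mols):
            if i == j:
                continue
            for t in (2, 3):
                left = donor - t
                if left < 0 or left in (1, 2):
                    continue  # rule: never leave a molecule with 1-2 carbons
                new = mols.copy()
                new[i], new[j] = left, acceptor + t
                # assumption: a molecule emptied to 0 carbons disappears
                yield tuple(sorted(c for c in new if c > 0))

def solve(start, goal):
    """Breadth-first search: the first time the goal is dequeued, it was reached in the fewest steps."""
    start, goal = tuple(sorted(start)), tuple(sorted(goal))
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in moves(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

path = solve([5] * 6, [6] * 5)
print(len(path) - 1, "steps")
for state in path:
    print(state)
```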
Solution to Pentose Phosphate
game in 7 steps
• Corresponds to the natural pathway
• Doesn't explain why the rules exist
• Supports the idea of simplicity
Are there simplifying principles to the structure of
the central carbohydrate metabolism network?
[Figure: pentose phosphate pathway (PPP)]
Cost-benefit methodology
Benefits B and costs C of adopting
strategy x.
E.g., a foraging lapwing:
B(x) is the calorific value of prey
items obtained after each move of
distance x
C(x) is the energetic cost of
moving distance x.
Indirect fitness function:
net energy gain per move,
E(x) = B(x) - C(x).
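At an interior optimum, dE/dx = 0, so B'(x*) = C'(x*): the marginal benefit of moving further equals the marginal cost. A minimal sketch with hypothetical forms, a saturating benefit B(x) = B_MAX(1 − e^(−x/λ)) and a linear cost C(x) = c·x, for which the optimum is x* = λ ln(B_MAX / (cλ)); none of the numbers are lapwing data.

```python
import math

B_MAX, LENGTH_SCALE, COST_PER_UNIT = 10.0, 2.0, 1.0  # hypothetical parameters

def benefit(x):
    """Hypothetical diminishing-returns benefit B(x) of a move of distance x."""
    return B_MAX * (1 - math.exp(-x / LENGTH_SCALE))

def cost(x):
    """Hypothetical linear energetic cost C(x) of moving distance x."""
    return COST_PER_UNIT * x

def net_gain(x):
    """Indirect fitness function E(x) = B(x) - C(x)."""
    return benefit(x) - cost(x)

xs = [i * 0.001 for i in range(10_001)]  # candidate move distances from 0 to 10
x_best = max(xs, key=net_gain)
x_analytic = LENGTH_SCALE * math.log(B_MAX / (COST_PER_UNIT * LENGTH_SCALE))  # B'(x) = C'(x)
print(x_best, x_analytic)
```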
Cost-benefit analysis case study:
Optimality and evolutionary tuning of
the expression level of a protein
Study by Erez Dekel
Uri Alon’s group
Weizmann Institute of Science
Different proteins are found in the cell at
different numbers
What determines the expression
level of a protein?
Evolutionary theory suggests maximization
of a fitness function
Fitness functions have seldom been
experimentally measured
• Can we measure the fitness function?
• Can we find a deterministic theory to predict
an optimum in a given environment?
(why 60000 copies per cell?)
The lac operon of E. coli is an ideal model
system
• Well studied, with detailed knowledge of biochemical
parameters.
• Excellent tools:
– IPTG: induces the lac operon, but cells cannot grow on it
– ONPG: measures protein activity
• In exponentially growing bacteria, the growth rate can
serve as the fitness function.
Model system: The lac operon of
E. coli, a well-characterized gene system
[Figure: the lac Z Y A genes produce LacZ (Z), which lets the cell use lactose for growth]
An experimental study of fitness and
optimization
1. Measure the cost and benefit of the lac
proteins in wild-type E. coli
2. Find the predicted optimal expression
as a function of the environment
3. Perform laboratory evolution experiments
in different environments and monitor
the evolution of the protein expression
level
Growth rate is the sum of the cost
and benefit of protein production
g = g0 - C(Z) + B(Z, L)
(cost term: C(Z); benefit term: B(Z, L))
Cost: reduction in growth due to the burden
of producing protein
Benefit: increase in growth rate due to the
action of the protein (lactose utilization).
Cost function can be measured by
producing proteins without benefit
Use the inducer IPTG to produce LacZ; IPTG
cannot be metabolized, hence there is no benefit
g = g0 - C(Z) + B(Z, L), with B = 0 here
Decoupling cost and benefit:
all parameters are measured (no free parameters)
Cost of full LacZ production is about 4.4%
(i.e. cells grow 4.4% slower)
[Figure: cost (relative growth rate reduction) vs. relative lac expression (Z/ZWT), M9 + glycerol, 37°C]
See also: Koch, J. Mol. Evol. 1983; Lenski, Mol. Biol. Evol. 1989; Dong, 1995
Benefit is measured by growth at various
levels of lactose with full lac expression
g = g0 - C(Zmax) + B(Zmax, L)
Constant cost; benefit depends
on the concentration
of the sugar lactose
Benefit of full LacZ production
at saturating lactose is 15%
[Figure: benefit (relative growth rate difference) vs. external lactose L (mM), log scale from 10^-4 to 10^2; model B(Z,L)=B0[ZLin], benefit saturating at about 15%]
Red curve: model of lactose transport with experimentally measured
parameters (Models: Kremling, 2001; Mackey, 2004)
Balance of cost and benefit predicts
optimal expression level
The calibrated fitness landscape
Optimum level at
low lactose is lower
than wild-type
Optimal expression level is higher
at high lactose concentrations
Wild-type protein level is predicted to
be optimal at a lactose level of 0.6 mM
Predicted optimal protein level during
evolution in a constant lactose environment
[Figure: optimal LacZ level (relative to wild-type)]
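How a calibrated g = g0 − C(Z) + B(Z, L) yields a predicted optimum can be sketched in a few lines. The functional forms here (a cost that diverges at a maximal expression capacity M, and a benefit linear in Z and saturating in lactose) and all parameter values are assumptions for illustration, chosen only to be of the same order as the measured ~4.4% cost and ~15% benefit; this is not the fitted model of Dekel & Alon. Setting dB/dZ = dC/dZ gives a closed-form optimum, which a brute-force grid search confirms; the optimum rises with lactose.

```python
import math

# Hypothetical parameters (illustrative only).
ETA0 = 0.02   # cost scale
M = 1.8       # expression level (units of Z_WT) at which the cost diverges
DELTA = 0.15  # benefit at full expression and saturating lactose
K = 0.4       # lactose half-saturation constant (mM)

def cost(z):
    """Assumed nonlinear cost C(Z): relative growth reduction, diverging as Z -> M."""
    return ETA0 * z / (1 - z / M)

def benefit(z, lac):
    """Assumed benefit B(Z, L): linear in Z, saturating in external lactose L (mM)."""
    return DELTA * z * lac / (K + lac)

def relative_growth(z, lac):
    """(g - g0) relative to g0: -C(Z) + B(Z, L)."""
    return benefit(z, lac) - cost(z)

def optimal_z(lac):
    """Closed form from dB/dZ = dC/dZ, i.e. DELTA * s = ETA0 / (1 - Z/M)**2."""
    s = lac / (K + lac)
    if DELTA * s <= ETA0:
        return 0.0  # benefit never outweighs the marginal cost: do not express
    return M * (1 - math.sqrt(ETA0 / (DELTA * s)))

def optimal_z_grid(lac, step=1e-4):
    """Brute-force check over 0 <= Z < M."""
    zs = [i * step for i in range(int(M / step))]
    return max(zs, key=lambda z: relative_growth(z, lac))

for lac in (0.0, 0.1, 0.5, 1.0, 5.0):
    print(lac, round(optimal_z(lac), 3), round(optimal_z_grid(lac), 3))
```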
Experimental evolution using serial
dilution
Day 1 → Day 2 → Day 3 → ...
Dilution rate 1:100
Number of generations per day is log2(100) ≈ 6.6
See also: Lenski PNAS 2003; Palsson Nature 2002
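The generation count follows because a 1:100 dilution must be made up by log2(100) doublings before the next transfer. A minimal sketch, assuming the culture regrows to the same density between daily transfers; the helper names are illustrative.

```python
import math

def generations_per_transfer(dilution_factor):
    """Doublings needed to regrow after diluting by the given factor (100 for 1:100)."""
    return math.log2(dilution_factor)

def days_for(generations, dilution_factor=100, transfers_per_day=1):
    """Days of serial transfer needed to accumulate a given number of generations."""
    return generations / (transfers_per_day * generations_per_transfer(dilution_factor))

print(round(generations_per_transfer(100), 2))  # 6.64
print(round(days_for(500)))                     # 75 days for 500 generations
```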
Evolutionary experiment on seven lactose
levels in parallel
• Minimal medium + IPTG and 0.1% glycerol
• Lactose concentrations:
0, 0.1, 0.2, 0.5, 1, 2, 5 mM
• LacZ activity measured every 20 generations
(ONPG assay)
• Protein level measured by quantitative electrophoresis
Will LacZ expression level evolve
towards optimal predicted level?
LacZ activity and protein level adapt to the
environment within several hundred generations
[Figure: LacZ activity vs. generations (0–500) in 5 mM, 1 mM, 0.5 mM, 0.1 mM and no lactose]
Wild-type levels do not change at 0.5 mM lactose
Dekel & Alon, Nature (2005)
Adapted LacZ protein levels match
predicted optima
[Figure: normalized LacZ activity vs. lactose L (mM), 0–6 mM]
lacZ level measured after more than 550 generations
Dekel & Alon, Nature (2005)
Arising questions:
What is the molecular basis and dynamics of the
adaptations?
What is the source of the nonlinear cost of protein production?
Insights from optimality study
• Fitness function of lac protein expression was
experimentally determined.
• The fitness function predicts an optimal expression level
in each lactose environment
• Cells can tune protein levels accurately to reach
optimal values within a few hundred generations
• Creates new quantitative research questions
Non-optimality, or apparent non-optimality, in biology
(in our own bodies)
• Placement of the windpipe in front of the esophagus (food can
go down the wrong tube)
• Anatomy of the human eye: rods and cones are
located behind the neurons rather than in front of them as in the
octopus eye, which makes a blind spot necessary
[Figure: vertebrate eye vs. octopus eye]
Criticism of adaptationist arguments in biology
• Organisms are not decomposable
• Loose criteria for acceptance
• Rejection of one story leads to another
Gould and Lewontin, Proc. R. Soc. Lond. B (1979)
Criticism of adaptationist arguments in biology
• Organisms are not decomposable:
– “organisms as integrated wholes, fundamentally not decomposable
into independent and separately optimized parts…constrained by
phyletic heritage, pathways of development, and general
architecture”
-> Understanding the constraints is at the heart of the
optimality model
-> In some systems, complexity might indeed not be
decomposable
Gould and Lewontin, Proc. R. Soc. Lond. B (1979)
Criticism of adaptationist arguments in biology
• Loose criteria for acceptance:
– “The criteria for acceptance of a story are so loose that many pass
without proper confirmation. Often, evolutionists use consistency
with natural selection as the sole criterion and consider their work
done when they concoct a plausible story.”
-> Optimality models define a quantitative test for agreement
with observations
-> Population genetics can define criteria
-> Falsifiable predictions are required
Gould and Lewontin, Proc. R. Soc. Lond. B (1979)
Criticism of adaptationist arguments in biology
• Rejection of one story leads to another:
– “The rejection of one adaptive story usually leads to its
replacement by another, rather than to a suspicion that a different
kind of explanation might be required. Since the range of adaptive
stories is as wide as our minds are fertile, new stories can always
be postulated. And if a story is not immediately available, one can
always plead temporary ignorance and trust that it will be
forthcoming.”
-> Science proceeds by rejection of theories and replacement
with new ones.
-> Over-fitting and fine-tuning should be avoided
-> Adequacy and predictive power are assessed by the community
Gould and Lewontin, Proc. R. Soc. Lond. B (1979)
Drift-selection balance
(Michael Brenner)
Fungal spore shape
Understanding
network
structure:
optimality in
metabolism
Are there simplifying principles to the structure of the
central carbohydrate metabolism network?
But what do you mean by
simplifying principles?
http://www.nytimes.com/2007/12/09/magazine/09lefthandturn.html?ex=1354856400&en=c9a577b0fac3b645&ei=5090&partner=rssuserland&emc=rss
Searching for design
principles in networks
Analogy:
• Many stations are
connected by
“shortest paths”
• But not all
• Finding sets of
shortest paths relates to
function
(modules = lines,
hub = downtown)
We developed a method to find the shortest path from A to B
All possible reaction types are explored
aldehyde dehydrogenase (CoA):
pyruvate ↔ acetyl-CoA + CO2
isomerase (keto to enol):
pyruvate ↔ enolpyruvate
kinase (carboxyl):
pyruvate ↔ pyruvate-P
EC classes define 27 possible enzymatic
reaction families
Optimization function finds minimal number of
steps between any two metabolites
Fitness (3)
• The shortest path can be
found efficiently using a
customized breadth-first search (BFS)
Are all pairs of metabolites connected by
shortest possible paths? (as allowed by
biochemistry classes)
• Some pairs are connected by the shortest possible paths
• Other pairs could be connected in fewer steps via shortcuts
• Cluster together pairs that are connected by shortest paths
• Define these clusters as optimality modules
Optimality modules are defined to contain
shortest paths
[Figure: metabolites A–F; possible EC reactions (biochemistry), existing reactions (in organism), and the resulting optimality modules]
Only metabolites connected by shortest possible paths are contained in an optimality module
Example: a possible shortcut in glycolysis
breaks it into modules
[Figure: glycolysis from GLU via DHAP, GAP, BPG, 3PG and 2PG to PYR, with an EC 1.2 shortcut from GAP directly to 3PG]
GAP → 3PG (EC 1.2) is biochemically feasible (it exists in
plants), but is not part of E. coli
central metabolism
Therefore glycolysis is not as
short as possible and breaks
down into optimality modules
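The module test can be sketched on a toy graph: compute BFS distances once over the reactions that exist in the organism and once over all biochemically possible reactions, and flag pairs whose existing path is longer than the possible one. This is a simplified illustration, not the authors’ implementation; it assumes all reactions are reversible, uses lower glycolysis as it exists in E. coli (including PEP, which the sketch above omits), and adds the feasible GAP → 3PG (EC 1.2) shortcut as the only extra “possible” reaction.

```python
from collections import deque
from itertools import combinations

def make_graph(edges):
    """Undirected adjacency sets; reactions are treated as reversible (an assumption)."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def bfs_distances(graph, source):
    """Breadth-first search: minimal number of reaction steps from source to each metabolite."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Toy lower glycolysis as it exists in E. coli, plus the feasible but absent EC 1.2 shortcut.
EXISTING = [("DHAP", "GAP"), ("GAP", "BPG"), ("BPG", "3PG"),
            ("3PG", "2PG"), ("2PG", "PEP"), ("PEP", "PYR")]
POSSIBLE = EXISTING + [("GAP", "3PG")]

def non_optimal_pairs(existing_edges, possible_edges):
    """Pairs whose existing path is longer than the shortest biochemically possible path."""
    existing, possible = make_graph(existing_edges), make_graph(possible_edges)
    result = []
    for a, b in combinations(sorted(existing), 2):
        d_exist = bfs_distances(existing, a)[b]
        d_poss = bfs_distances(possible, a)[b]
        if d_exist > d_poss:
            result.append((a, b, d_exist, d_poss))
    return result

for pair in non_optimal_pairs(EXISTING, POSSIBLE):
    print(pair)
```

Every flagged pair spans the GAP to 3PG segment, so in this toy graph {DHAP, GAP} and {3PG, 2PG, PEP, PYR} fall into separate optimality modules.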
Central carbon
metabolism network
breaks down into
optimality modules
Noor et al., under review
Biomass precursors
are key metabolites
• Design principle: every pair of
consecutive precursors is
connected by the minimal number
of enzymatic steps
Central carbon metabolism is a
minimal walk between the 13
biomass precursors
“Make things as simple as possible but not simpler”
A two-phase optimality model structure
• Question: why is the metabolic network built the way it is?
• Phase 1:
– Optimality model analysis for pairs of metabolites
– Alternatives space: all ways to connect the two
– Fitness function: minimal number of steps
– Constraint: EC classes
– Result: some pairs are optimally connected and some are not
• Draw groups – optimality modules
• Phase 2:
– Optimality model analysis for consecutive precursor metabolites
– Constraint: pass through precursor metabolites
– Result: every pair is connected via the minimal number of steps
– Predictions: other metabolic networks, different required precursors
• We gained insight into the constraints
Can carbon fixation
metabolism be “enhanced”?
Carbon is
assimilated into
plants by the
Calvin cycle
Can we find “better” ways to
achieve carbon fixation?
We systematically explore all possible
synthetic carbon fixation pathways
Summary - optimality models in biology as
a useful research tool
• Optimality models test and sharpen our understanding
(constraints, tradeoffs, fitness function)
• A defined structure that ensures rigor
• They suggest new experiments and quantitative questions
• At the molecular level, the approach is maturing thanks to available
quantitative information and the ability to design and test
predictions
References and recommended reading
• Cornish-Bowden, The Pursuit of Perfection: Aspects of Biochemical
Evolution, Oxford University Press, 2004
• Stearns, The Evolution of Life Histories, Oxford University Press, 1992
• Gould and Lewontin, The Spandrels of San Marco and the
Panglossian Paradigm: A Critique of the Adaptationist Programme,
Proc. R. Soc. Lond. B, 1979
• Jacob, Evolution and Tinkering, Science, 1977
• Alon, Biological networks: the tinkerer as an engineer, Science, 2003
• Parker and Maynard Smith, Optimality theory in evolutionary biology,
Nature, 1990
• http://openwetware.org/wiki/Optimality_In_Biology
Google: “optimality in biology”