Optimality models in biology
Ron Milo & Michael Brenner
May 2009

Why are biological systems built the way they are?
• In biology we usually ask (and answer) questions about:
  – what are the processes?
  – how are they functioning?
  – who are the molecular players?
  – when and where are they expressed?
• Can we approach “why” questions?

Optimality model analysis is useful in a wide range of biological fields
We learn through case studies:
• Foraging strategy
• Gene expression
• Spore shapes
• Metabolic network
http://openwetware.org/wiki/Optimality_In_Biology
Google: “optimality in biology”

Principles of minimality and maximality explain many physical phenomena
• At the heart of many fields of physics:
  – “Minimal action” governs classical mechanics (Lagrangian formulation)
  – Maximal entropy in thermodynamics
  – Geometrical optics can be derived from Fermat’s principle of minimal time
  – Area minimization in soap bubbles due to surface tension

Strong predictive power: the laws of geometrical optics are derived from Fermat’s principle
Fermat’s principle: a light ray traveling from one fixed point to another follows a path for which the travel time is an extremum, either a maximum or a minimum.
Rules for reflection and refraction
[Figure: “Sand” and “Water”]

Optimization model example: which rectangle has maximum area for a given perimeter?

Evolutionary optimization model construction
1. Ask an explicit biological question
2. A range or space of alternatives is defined
3. An assumption on what is being maximized, fitness proxy
4. Convert alternatives to fitness payoffs, includes constraints and tradeoffs
5. Find optimal solution, test against observations
6. Suggest experiments and make falsifiable predictions

Evolutionary optimization model construction
1. Ask an explicit biological question
   – “Why is the sex ratio often unity?”
   – The question is assumed to have an adaptive answer
2. A range or space of alternatives is defined
   – Any ratio of males to females
3. An assumption on what is being maximized, fitness proxy
   – Expected lifetime number of surviving offspring
   – For an allele, this can include the same allele carried in relatives
   – Indirect measures are often used: minimal energy, maximal food etc.
4. Convert alternatives to fitness payoffs, includes constraints and tradeoffs
   – “For fixed resources, more sons means fewer daughters”
5. Find optimal solution, test against observations
6. Suggest experiments and make falsifiable predictions
This is the type of “why” answer we are after, not “why” in a theological sense

Optimality analysis helps to sharpen our understanding
• “Optimization models help us to test our insight into the biological constraints that influence the outcome of evolution. They serve to improve our understanding about adaptations, rather than to demonstrate that natural selection produces optimal solutions.” Parker & Maynard-Smith, Nature (1990)

Optimality analysis helps to sharpen our understanding, even though evolution is a tinkerer
• “It [natural selection] works like a tinkerer - a tinkerer who does not know exactly what he is going to produce but uses whatever he finds around him whether it be pieces of string, fragments of wood, or old cardboards; in short it works like a tinkerer who uses everything at his disposal to produce some kind of workable object.
” Jacob, Science (1977)
See also: Alon, Biological networks - the tinkerer as an engineer, Science (2003)

Clarification: not everything is optimal
• Not everything in biology is claimed to be optimal; optimality is a model assumption, not a law of nature
• Phylogeny and development have major effects (frozen accidents)
• Random drift is often a dominant force (alleles can become fixed in a population in spite of natural selection)
• Drift is especially pronounced in small populations
• If the “optimal” solution offers only a small advantage, the multiplicity of “good enough” solutions will prevail
• Evolutionary selective pressure may act only during some periods of time

Foraging strategy of honeybees: why are honeycrops filled only partially?
Question (1)
• A full crop corresponds to approximately 55 flower visits, but bees often carry much less to the hive
• Maximization of the rate of energy extraction predicts that incomplete loads should be gathered only if the patch is depleting
Schmid-Hempel, Kacelnik & Houston, Behav. Ecol. Sociobiol. (1985)

Foraging strategy of honeybees: why are honeycrops filled only partially?
[Figure: foraging cycle from the hive through N flower visits and back to the hive, plotted against time; slopes depend on load (metabolic flight loss)]
Alternatives (2)
• As a function of the number of flower visits (N):
  – measured gross energetic gain (G)
  – measured total energetic expenditure or loss (L)
  – measured total time (T) per foraging cycle
Constraints and conversion (4)

Optimality models can differentiate among fitness criteria
• Gross energetic gain (G)
• Total energetic expenditure or loss (L)
• Total time (T) per foraging cycle
Fitness (3), optimization criterion:
  – net energy gain per unit time = (G - L) / T
  – net energy gain per unit energy expended = (G - L) / L
Test with observations (5)
Prediction (6)
• A worker's condition deteriorates as a function of the amount of flight performed

What can we gain from an optimality model?
• Testing understanding of constraints and tradeoffs
• Testing understanding of fitness function
• Suggestions for new experiments and quantitative questions
• …

Interpreting an optimality model
• “The final step in the optimality approach is to test the predictions against the observations. If they fit, then the model may really reflect the forces that have molded the adaptation. If they do not, we may have misidentified the strategy set, or the optimization criterion, or the payoffs; or the phenomenon we have chosen may not any longer be adaptive…” Parker & Maynard-Smith, Nature (1990)
• “…by reworking our assumptions, we modify our model and revise and retest the predictions.
This has been criticized as being an iterative procedure leading inevitably to a fit. But this is how science works; theories can only be discarded when they are disproven or found to be unrealistic.”
Parker & Maynard-Smith, Nature (1990)

Why are optimality models at the molecular level maturing now?
• They come after answering the who and how questions
• They require quantitative tools and information that are only recently becoming available in biology
• We are beginning to design and build biological systems

The number you need, with reference, in one minute
BioNumbers: useful biological numbers database
• Wiki-like, users edit and comment
• Over 3500 properties & 5000 users/month
www.BioNumbers.org

Warm-up: trying to beat nature at design
How to transform 5-carbon sugars into 6-carbon sugars? (e.g. from cell wall or nucleic acids to glycolysis)
6 × C5 → 5 × C6

The pentose phosphate pathway defined as a game
• Goal: turn 6 pentoses into 5 hexoses
• Rules:
  – Transfer 2-3 carbons between two molecules
  – Never leave a molecule with 1-2 carbons
• Optimization function: minimize the number of steps (simplicity)
  – Among equally long solutions, prefer the one using the least number of carbons in molecules
[Figure: 5 5 5 5 5 5 → ? → 6 6 6 6 6]
E. Meléndez-Hevia et al., Journal of Theoretical Biology (1994)
Seriously, take 5 minutes and six 5-carbon molecules and try it out

Solution to the pentose phosphate game in 7 steps
• Corresponds to the natural pathway
• Doesn't explain why the rules exist
• Supports the idea of simplicity
Are there simplifying principles to the structure of the central carbohydrate metabolism network?

Cost-benefit methodology
Benefits B and costs C of adopting strategy x.
E.g. a foraging lapwing:
• B(x) is the calorific value of prey items obtained after each move of distance x
• C(x) is the energetic cost of moving distance x
Indirect fitness function: net energy gain per move, E(x) = B(x) - C(x).
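
A minimal sketch of the cost-benefit recipe E(x) = B(x) - C(x). The slides do not give functional forms for the lapwing, so the saturating benefit, the linear cost and every parameter value below are hypothetical choices made only to show how an optimum is located:

```python
import math

def benefit(x, b_max=10.0, k=2.0):
    """Hypothetical saturating benefit B(x): prey calories gained per move of length x."""
    return b_max * x / (k + x)

def cost(x, c=0.5):
    """Hypothetical linear cost C(x): energy spent moving distance x."""
    return c * x

def net_gain(x):
    """Indirect fitness function E(x) = B(x) - C(x)."""
    return benefit(x) - cost(x)

def best_move(x_max=20.0, steps=20000):
    """Grid search for the move length that maximizes E(x) on [0, x_max]."""
    candidates = [x_max * i / steps for i in range(steps + 1)]
    return max(candidates, key=net_gain)

x_opt = best_move()
# For these assumed forms, dE/dx = 0 gives x* = sqrt(b_max * k / c) - k.
x_analytic = math.sqrt(10.0 * 2.0 / 0.5) - 2.0
print(f"grid optimum x = {x_opt:.3f}, analytic x* = {x_analytic:.3f}")
```

With a different assumed B(x) or C(x) only the two small functions change; the search for the maximum of E(x) stays the same.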
Cost-benefit analysis case study: optimality and evolutionary tuning of the expression level of a protein
Study by Erez Dekel, Uri Alon’s group, Weizmann Institute of Science

Different proteins are found in the cell at different numbers
What determines the expression level of a protein?

Evolutionary theory suggests maximization of a fitness function

Fitness functions have seldom been experimentally measured
• Can we measure a fitness function?
• Can we find a deterministic theory to predict an optimum in a given environment? (Why 60000 copies per cell?)

The lac operon of E. coli is an ideal model system
• Well studied, with detailed knowledge of biochemical parameters
• Excellent tools:
  – IPTG: induces the lac operon, but cells cannot grow on it
  – ONPG: measures protein activity
• In exponentially growing bacteria, the growth rate can serve as the fitness function

Model system: the lac operon of E. coli, a well-characterized gene system
[Figure: lactose, LacZ (Z) and growth; genes lacZ, lacY, lacA]

An experimental study of fitness and optimization
1. Measure the cost and benefit of the lac proteins in wild-type E. coli
2. Find the predicted optimal expression as a function of the environment
3. Perform laboratory evolution experiments in different environments and monitor the evolution of the protein expression level

Growth rate is the sum of cost and benefit of protein production
g = g0 - C(Z) + B(Z, L)
• Cost C(Z): reduction in growth rate due to the burden of producing the protein
• Benefit B(Z, L): increase in growth rate due to the action of the protein (lactose utilization)

Cost function can be measured by producing proteins without benefit
• Use the inducer IPTG to produce LacZ; this inducer cannot be metabolized, hence there is no benefit: g = g0 - C(Z), with B = 0
• Decoupling cost and benefit allows measuring all parameters (no free parameters)

Cost of full LacZ production is about 4.4% (i.e. cells grow 4.4% slower)
[Figure: cost (relative growth rate reduction) versus relative lac expression (Z/Z_WT); M9 + glycerol, 37 °C]
See also: Koch Mol. Evol. 1983; Lenski Mol. Biol. Evol. 1989; Dong, 1995

Benefit is measured by growth at various levels of lactose with full lac expression
g = g0 - C(Zmax) + B(Zmax, L)
• Constant cost
• Benefit depends on the concentration of the sugar lactose

Benefit of full LacZ production at saturating lactose is 15%
[Figure: benefit (relative growth rate difference) versus external lactose L (mM), logarithmic axis; B(Z,L) = B0[Z L_in]]
Red curve: model of lactose transport with experimentally measured parameters (Models: Kremling, 2001; Mackey, 2004)

Balance of cost and benefit predicts optimal expression level
• The calibrated fitness landscape
• Optimum level at low lactose is lower than wild-type

Optimal expression level is higher at high lactose concentrations

Wild-type protein level is predicted to be optimal at a lactose level of 0.6 mM

Predicted optimal protein level during evolution in a constant lactose environment
[Figure: optimal LacZ level (relative to wild-type)]

Experimental evolution using serial dilution
Day 1, Day 2, Day 3, ...
• Dilution rate 1:100
• Number of generations per day is log2(100) ≈ 6.6
See also: Lenski PNAS 2003; Palsson Nature 2002

Evolutionary experiment on seven lactose levels in parallel
• Minimal medium + IPTG and 0.1% glycerol
• Lactose concentrations: 0, 0.1, 0.2, 0.5, 1, 2, 5 mM
• LacZ activity measured every 20 generations (ONPG assay)
• Protein level measured by quantitative electrophoresis

Will the LacZ expression level evolve towards the predicted optimal level?
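
The balance g = g0 - C(Z) + B(Z, L) can be explored numerically. The sketch below is only an illustration: the nonlinear cost form, the saturating benefit form and all four parameter values are assumptions that do not appear on the slides. They are chosen so that the cost at wild-type expression is roughly 4-5% and the optimum sits near wild-type at about 0.6 mM lactose:

```python
# Sketch of the cost-benefit balance g = g0 - C(Z) + B(Z, L).
# ASSUMPTION: the functional forms and all four parameters below are
# illustrative placeholders, not values read from the slides.
ETA0 = 0.02   # assumed cost scale (relative growth-rate units)
M = 1.8       # assumed expression ceiling, in wild-type units of Z
DELTA = 0.17  # assumed maximal benefit per unit Z
K = 0.4       # assumed half-saturation lactose concentration (mM)

def cost(z):
    """Assumed nonlinear cost C(Z): rises faster than linearly as Z approaches M."""
    return ETA0 * z / (1.0 - z / M)

def benefit(z, lactose):
    """Assumed benefit B(Z, L): proportional to Z, saturating in lactose."""
    return DELTA * z * lactose / (K + lactose)

def relative_growth(z, lactose):
    """Growth-rate difference g - g0 = B(Z, L) - C(Z)."""
    return benefit(z, lactose) - cost(z)

def optimal_expression(lactose, steps=20000):
    """Grid search for the Z in [0, 0.99 * M] that maximizes growth at a given lactose level."""
    grid = [0.99 * M * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda z: relative_growth(z, lactose))

# The seven lactose levels of the evolution experiment (mM).
for lactose_mm in (0.0, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0):
    z_opt = optimal_expression(lactose_mm)
    print(f"L = {lactose_mm:4.1f} mM -> optimal Z/Z_WT = {z_opt:.2f}")
```

Under these assumed forms the optimum is zero without lactose and rises with lactose concentration, which is the qualitative prediction the evolution experiment tests.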
LacZ activity and protein level adapt to the environment within several hundred generations
[Figure: LacZ activity versus generations (0 to 500); curves for no lactose, 0.1 mM, 0.5 mM, 1 mM and 5 mM lactose]
Wild-type levels do not change at 0.5 mM lactose
Dekel & Alon, Nature (2005)

Adapted LacZ protein levels match predicted optima
[Figure: normalized LacZ activity versus lactose concentration L (mM)]
lacZ level measured after more than 550 generations

Arising questions:
• What is the molecular basis and dynamics of the adaptations?
• What is the source of the nonlinear cost of protein production?

Insights from optimality study
• Fitness function of lac protein expression was experimentally determined
• Fitness function predicts an optimum expression at each lactose environment
• Cells can tune protein levels accurately to reach optimal values within a few hundred generations
• Creates new quantitative research questions

Apparent non-optimality in biology (actually in our body)
• Placement of the windpipe in front of the esophagus (food can go down the wrong tube)
• Anatomy of the human eye, where rods and cones are located behind the neurons rather than in front as in the octopus eye, which makes a blind spot necessary
[Figure: vertebrate eye, octopus eye]

Criticism of adaptationist arguments in biology
• Organisms not decomposable
• Loose criteria for acceptance
• Rejection of one story leads to another
Gould and Lewontin, Proc. Roy. Soc.
(1979)

Criticism of adaptationist arguments in biology
• Organisms not decomposable:
  – “organisms as integrated wholes, fundamentally not decomposable into independent and separately optimized parts…constrained by phyletic heritage, pathways of development, and general architecture”
  -> Understanding the constraints is at the heart of the optimality model
  -> In some systems complexity might indeed not be decomposable
• Loose criteria for acceptance:
  – “The criteria for acceptance of a story are so loose that many pass without proper confirmation. Often, evolutionists use consistency with natural selection as the sole criterion and consider their work done when they concoct a plausible story.”
  -> Optimality models define a quantitative test for agreement with observations
  -> Population genetics can define criteria
  -> Falsifiable predictions are required
• Rejection of one story leads to another:
  – “The rejection of one adaptive story usually leads to its replacement by another, rather than to a suspicion that a different kind of explanation might be required. Since the range of adaptive stories is as wide as our minds are fertile, new stories can always be postulated. And if a story is not immediately available, one can always plead temporary ignorance and trust that it will be forthcoming.”
  -> Science proceeds by rejection of theories and replacement with new ones
  -> Over-fitting and fine tuning should be avoided
  -> Adequacy and predictive power are assessed by the community
Gould and Lewontin, Proc. Roy. Soc. (1979)

Drift-selection balance (Michael Brenner)

Fungal spore shape

Understanding network structure: optimality in metabolism

Solution to the pentose phosphate game in 7 steps
• Corresponds to the natural pathway
• Doesn't explain why the rules exist
• Supports the idea of simplicity
Are there simplifying principles to the structure of the central carbohydrate metabolism network?
But what do you mean by simplifying principles?
http://www.nytimes.com/2007/12/09/magazine/09lefthandturn.html?ex=1354856400&en=c9a577b0fac3b645&ei=5090&partner=rssuserland&emc=rss

Searching for design principles in networks
Analogy:
• Many stations are connected by “shortest paths”
• But not all
• Finding sets of shortest paths relates to function (modules = lines, hub = downtown)

We develop a method to find the shortest path from A to B

All possible reaction types are explored
• aldehyde dehydrogenase (CoA): pyruvate ↔ acetyl-CoA + CO2
• isomerase (keto to enol): pyruvate ↔ enolpyruvate
• kinase (carboxyl): pyruvate ↔ pyruvate-P

EC classes define 27 possible enzymatic reaction families

Optimization function finds the minimal number of steps between any two metabolites
Fitness (3)
• The shortest path can be found efficiently using a customized BFS (breadth-first search)

Are all pairs of metabolites connected by shortest possible paths?
(as allowed by biochemistry classes)
• Some pairs are connected by the shortest possible paths
• Other pairs could be connected in fewer steps via shortcuts
• Cluster together pairs that are connected by shortest paths
• Define these clusters as optimality modules

Optimality modules are defined to contain shortest paths
• Only metabolites connected by shortest possible paths are contained in an optimality module
[Figure: three panels over metabolites A to F: possible EC reactions (biochemistry), existing reactions (in organism), optimality modules]

Example: a possible shortcut in glycolysis breaks it into modules
[Figure: glycolysis chain GLU, DHAP, GAP, BPG, 3PG, 2PG, PYR, with a direct GAP → 3PG step (EC 1.2)]
• GAP → 3PG (EC 1.2) is biochemically feasible (exists in plants), but is not part of E. coli central metabolism
• Therefore glycolysis is not as short as possible and breaks down into optimality modules

Central carbon metabolism network breaks down into optimality modules
Noor et al., under review

Biomass precursors are key metabolites
• Design principle: every pair of consecutive precursors is connected by the minimal number of enzymatic steps

Central carbon metabolism is a minimal walk between the 13 biomass precursors
“Make things as simple as possible but not simpler”

A two phase optimality model structure
• Question: why is the metabolic network built the way it is?
• Phase 1:
  – Optimality model analysis for pairs of metabolites
  – Alternatives space: all ways to connect the two
  – Fitness function: minimal number of steps
  – Constraint: EC classes
  – Result: some pairs are optimally connected and some are not
• Draw groups: optimality modules
• Phase 2:
  – Constraint: pass through precursor metabolites
  – Optimality model analysis for consecutive precursor metabolites
  – Result: every pair is connected via the minimal number of steps
  – Predictions: other metabolic networks, different required precursors
• We gained insight into the constraints

Can carbon fixation metabolism be “enhanced”?

Carbon is assimilated into plants by the Calvin cycle

Can we find “better” ways to achieve carbon fixation?

We systematically explore all possible synthetic carbon fixation pathways

Summary: optimality models in biology as a useful research tool
• Optimality models test and sharpen our understanding (constraints, tradeoffs, fitness function)
• They have a defined structure that ensures rigor
• They suggest new experiments and quantitative questions
• At the molecular level they are becoming mature, thanks to available quantitative information and the ability to design and test predictions

References and recommended reading
• Cornish-Bowden, The Pursuit of Perfection - Aspects of Biochemical Evolution, Oxford University Press, 2004
• Stearns, The Evolution of Life Histories, Oxford University Press, 1992
• Gould and Lewontin, The Spandrels of San Marco and the Panglossian Paradigm: A Critique of the Adaptationist Programme, Proc. Roy. Soc. 1979
• Jacob, Evolution and Tinkering, Science 1977
• Alon, Biological networks - the tinkerer as an engineer, Science 2003
• Parker and Maynard Smith, Optimality theory in evolutionary biology, Nature 1990
• http://openwetware.org/wiki/Optimality_In_Biology
  Google: “optimality in biology”