The Use of Computer Simulation in Studying Biological Evolution: Pure Possible Processes, Selective Regimes and Open-ended Evolution
Philippe Huneman, IHPST (CNRS / Université Paris I Sorbonne)

• The interplay between evolutionary biology and computer science
• Bioinformatics, biocomputation
• Genetic algorithms (lexicon: « crossing over », genes, etc.); a minimal GA sketch follows in Part II below
• The radical AL claim: digital organisms are themselves living entities, rather than simulations of them (Adami 2002). The idea of non-carbon-based forms of life as capturing the essence of life
• Version #2: Darwinian evolution is an « algorithm » (Dennett; see Maynard Smith 2000)
• What are simulations doing / teaching?
• What is the role of natural selection in them?
• This talk investigates the relations between biological evolution and computer simulations of entities evolving through natural selection.

I. TYPOLOGY OF COMPUTER SIMULATIONS IN EVOLUTIONARY THEORY

A. Kinds of role of selection

A1. Formal selection context
• Todd and Miller (1995) on sexual selection: sexual selection is more an incentive for exploration than natural selection is, since females, through mate choice, internalize the constraints of natural selection.
• Maley on biodiversity: the emergence of new species is conditioned more by geographical barriers than by adaptive potential.
• Mikel Maron (2004): moths and industrial melanism.

A2. No selection context
• Boids (Reynolds)
• Chu and Adami (2000): simulation of phylogenies whose parameter is the mean number of same-order daughter families of a family
• McShea (1996, 2005): increase of complexity with no selection

B. Use: weak and strong
• B1. Weak: the model is used to test one hypothesis about one process; it simulates the behavior of the entities (boids; Maley's biodiversity; etc.). A way to test hypotheses about the world.
• B2. Strong: the entities of the model do not correspond to real entities; the simulation is meant to explore the kinds of behaviors of the digital entities themselves (Ray's Tierra, Holland's Echo, etc.). Hypotheses are made about the model itself.
“Digital organisms”, “defined by the sequence of instructions that constitute their genome, are not simulated: they are physically present in the computer and live there” (Adami 2002)
• Echo: unrealistic assumptions concerning reproduction, and the absence of reproductive isolation for species, make Echo a poor model of evolutionary biology (Crubelier 1997)
• Langton-Sayama Loop

II. NATURAL SELECTION AND PURE POSSIBLE PROCESSES
• In the (weak or strong) simulations there are causal processes, i.e. counterfactual dependencies between classes of sets of cells at one step and the global state at the next step:

Step $n$: cells $a_1, a_2, \ldots, a_k$, with property $P_n \equiv \bigwedge_i \bigvee_j A_i^{n,j}$ (a conjunction, over the cells $i$, of disjunctions, over $j$, of cell-state descriptions $A_i^{n,j}$)
Step $n+1$: cells $b_1^{n+1}, b_2^{n+1}, \ldots, b_k^{n+1}$, with property $P_{n+1}$ (the conjunction of all the $b_i^{n+1}$)

« If $P_n$ had not been the case, $P_{n+1}$ would not have been the case. »
Causation as counterfactual dependence between steps in CAs (Huneman, Minds and Machines, 2008)
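As a toy illustration of this counterfactual reading of CA causation, here is a minimal sketch; the majority-rule automaton, the property P (« at least k live cells ») and all names are illustrative assumptions of mine, not drawn from any system discussed in the talk.

```python
# Toy illustration of counterfactual dependence between CA steps
# (hypothetical majority-rule automaton; all names are illustrative).

def step(cells):
    """One synchronous update: each cell takes the majority value of
    itself and its two neighbours (wrapping around)."""
    n = len(cells)
    return [1 if cells[(i - 1) % n] + cells[i] + cells[(i + 1) % n] >= 2 else 0
            for i in range(n)]

def P(cells, k=3):
    """A global property: 'at least k cells are alive'."""
    return sum(cells) >= k

config_n = [0, 1, 1, 1, 0, 0, 1, 0]
config_n1 = step(config_n)
print(P(config_n), P(config_n1))            # P_n and P_{n+1} both hold

# Counterfactual test: in a nearby configuration where P_n fails,
# P_{n+1} fails too after one step.
counterfactual = [0, 1, 0, 0, 0, 0, 1, 0]   # only two live cells: P_n false
print(P(step(counterfactual)))              # False: P_{n+1} fails as well
```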
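And to anchor the GA lexicon invoked at the opening (« crossing over », genes, mutation, selection), a minimal generic genetic-algorithm sketch on the classic ONEMAX toy problem; every parameter value here is illustrative, not taken from any system in the talk.

```python
import random

# Minimal generic GA sketch: bit-string "genomes", fitness = number of 1s
# (the classic ONEMAX toy problem; all parameters are illustrative).
GENOME_LEN, POP_SIZE, GENERATIONS, MUT_RATE = 20, 30, 50, 0.02

def fitness(genome):
    return sum(genome)

def crossover(mum, dad):
    """One-point 'crossing over': the offspring takes a prefix from one
    parent and the suffix from the other."""
    cut = random.randrange(1, GENOME_LEN)
    return mum[:cut] + dad[cut:]

def mutate(genome):
    return [1 - g if random.random() < MUT_RATE else g for g in genome]

pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    # truncation selection: the fitter half becomes the parent pool
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP_SIZE // 2]
    pop = [mutate(crossover(random.choice(parents), random.choice(parents)))
           for _ in range(POP_SIZE)]
print(max(fitness(g) for g in pop))   # near GENOME_LEN after selection
```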
→ In « formal selection » simulations, these causal processes are actual « selective processes ».
• Yet the entities in the simulations cannot exactly match biological entities: in Echo you do not easily get species; in Tierra there are no lineages; etc. If a system is designed to study some level of biological reality, the other levels are not ipso facto given (whereas if you have, e.g., organisms, you thereby have genes, species, etc.)
→ In actual biology, all levels of the hierarchy act together.
• So computer simulations display « pure possible processes » concerning the entities modeled, located at a target level of the hierarchy (no implicit entangling between levels). In the case of formal selection simulations, « pure » selective processes occur.
• Examples of « natural selection » sensu Channon: Echo, or Hillis's coevolution between sorting programs and their test problems: « natural selection » simulations. Yet in Echo, for example, the class of possible actions is limited.

III. THE VALIDATION PROBLEM FOR COMPUTER SIMULATIONS

What do such simulations tell us?
• They correlate pure possible processes with patterns of evolution.
• They cannot prove that some process caused some evolutionary result, but they provide candidate causal explanations: « if pattern X is met, then process x is likely to have produced it ».
• Other causal processes may also have been at work, but not significantly so with regard to the outcome (noise?).
• This holds even if we have no idea of the ecological context, hence of the actual selective pressures.
• Adami, Pennock, Ofria and Lenski (2003) show that evolution is likely to have favoured complexity: if complexity increases in their sense, then deleterious mutations may have been selected, so that a temporary decrease in fitness may have been involved in the stabilisation of more functionally complex genomes.
• Chu and Adami (2000), investigating the patterns of abundance of taxa: if the distribution of taxa resembles a certain power-law scheme X, it is likely that the parameter m (the mean number of same-order daughter families of a family) was, in nature, close to the value of m involved in X (i.e. m = 1).

The validation problem
• Epstein (1999): the case of the Anasazi settlements.
• This does not prove that the rules ascribed to the individuals are the accurate ones. See also Reynolds's flocking boids: the simulation rules out a centrally controlled social organisation (but we need further assumptions to make this plausible).
• Even more so: the case of Arakawa's simulations in meteorology.
• Analysis by Küppers & Lenhard (2001) and Lenhard (2007): drop « realism » in order to achieve efficiency.

How are simulations to be validated in biology?
• McShea on complexity: he challenges Bonner's (1988) explanation of the increase of complexity through a selected increase of size in various lineages. McShea (2005) suggests that complexity increase can be produced with no natural selection, only variation (complexity defined as diversity); his models also produce patterns of complexity increase under various constraints (driven vs. passive trends, with no selection). The pattern found in the fossil record may be produced by such a process; but we need an idea of the processes likely to have actually occurred. (A minimal passive-trend sketch appears at the end of this part.)
• A minimal characterisation of computer simulations in evolutionary biology: they provide candidate explanations (pure possible processes) and null hypotheses for evolutionary patterns.
• For the same reason (they do not accommodate the impure processes that are the ones really occurring), they cannot prove anything by themselves.
• An example worth investigating: Hubbell's ecological neutral theory (2001).
• It skips the level of individual selection, yet generates the same outcomes that we observe concerning succession, stability and persistence in communities (see the drift sketch below).
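A minimal sketch of the zero-sum ecological drift at the heart of Hubbell's neutral theory, under strong simplifying assumptions of mine (a single local community, point-mutation speciation; the names J, NU, STEPS are illustrative):

```python
import random
from collections import Counter

# Minimal zero-sum ecological drift in the spirit of Hubbell (2001):
# each death is replaced by the offspring of a randomly chosen community
# member, or, with small probability NU, by a brand-new species
# ("point speciation"). All parameter values are illustrative.
J, NU, STEPS = 200, 0.01, 100_000

community = [0] * J          # start as a monodominant community
next_species = 1

for _ in range(STEPS):
    dead = random.randrange(J)
    if random.random() < NU:
        community[dead] = next_species   # speciation event
        next_species += 1
    else:
        community[dead] = random.choice(community)

abundances = sorted(Counter(community).values(), reverse=True)
print(len(abundances), abundances[:10])  # richness and rank-abundance head
```

With no fitness differences whatsoever, drift plus immigration of novelty already yields the few-common-species, many-rare-species abundance shape that the talk cites as matching observed communities.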
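And a deliberately simple passive-trend model, assumed here rather than taken from McShea's papers: lineages take unbiased steps in « complexity » above a reflecting floor, and the mean rises with no selection at all.

```python
import random

# Passive-trend toy model in the spirit of McShea: each lineage's
# "complexity" takes unbiased +1/-1 steps but cannot fall below a floor
# (a minimal viable complexity). With no selection and no upward bias,
# the mean across lineages still drifts upward.
LINEAGES, STEPS, FLOOR = 500, 1000, 1

complexity = [FLOOR] * LINEAGES
for _ in range(STEPS):
    for i in range(LINEAGES):
        complexity[i] = max(FLOOR, complexity[i] + random.choice((-1, 1)))

print(sum(complexity) / LINEAGES)   # well above FLOOR: a passive trend
```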
IV. APPLICATION: DISCONTINUITIES IN EVOLUTION

4.1. The longstanding problem with innovations
Darwinism is gradualist (small mutations selected, etc.). Cumulative selection accounts for adaptations.
• Novelties;
• Innovations (qualitative, e.g. morphological, differences);
• Key innovations: they trigger adaptive radiations and new phylogenetic patterns (the avian wing, fish gills, language…), that is, phylogenetic and ecological causal patterns.
• Pattern and processes: the role of « punctuated equilibria » theory (Eldredge and Gould 1976).

An issue with discontinuity
• Problem: what is the fitness value of half a novelty (half a wing!)?
→ Solutions:
- Find a benefit for each stage, in various species (Darwin on the eye);
- Conceive of it as an exaptation (e.g. feathers) (Gould and Vrba 1981);
- Developmental processes (Gould 1977; Müller and Newman 2005, etc.): variation is not « minor », it is a rearrangement of structures through a shuffling of developmental modules/timing (so the punctuated-equilibria pattern does not require a specific process).

4.2. Exploring discontinuity: compositional evolution
Watson (2005): « evolutionary processes involving the combination of systems and subsystems of semi-independently preadapted genetic material » (p. 3).
• The consideration of building blocks obeying new rules, inspired by the biological phenomena of sex and symbiosis, shows that in those processes the non-gradual emergence of novelties is possible.
• 1. A system with weak interdependencies between parts can undergo linear evolution: increases in complexity are linear functions of the values of the variables describing the system. Algorithms looking for optimal solutions in this way are called « hill-climbers »; they are paradigmatically gradual. They easily evolve systems that are more complex in a quantitative way, but they cannot reach systems that would display innovations.
• 2. With arbitrarily strong interdependencies between the parts, evolving a new complex system takes exponential time (time increases as an exponential function of the number of variables). Here the best algorithm for finding optimal solutions is random search.
• 3. But with modular interdependencies between parts (encapsulated parts, etc.), evolving new complex systems takes time that is a polynomial function of the number of variables (Watson 2005, 68-70).
• Algorithms of the « divide-and-conquer » class split the optimisation problem into subparts, and in turn split each subpart into further subparts: the initially exponential complexity of the problem, as approached through random search, is divided each time the general system is divided, so that in the end the problem has polynomial complexity.
• Those algorithms illustrate how to evolve systems that are not gradual or linear improvements of extant systems; and since they run in time polynomial in the variables, they are feasible in finite time, unlike random search processes.
• « Compositional evolution » concerns pure processes that embody those classes of algorithms with polynomial rates of complexification and that have genuine biological correspondents: sex and symbiosis, « mechanisms that encapsulate a group of simple entities into a complex entity » (Watson 2005, 3), which thus proceed exactly in the way algorithmically proper to polynomial-time complexity-increasing algorithms like divide-and-conquer.
• Watson refined the usual crossover clause in GAs, integrating various algorithmic devices (e.g. the « messy GA » of Goldberg, Korb and Deb 1989) in order to account for selection on blocks that takes into account correlations between distant blocks, hence the creation of new blocks (Watson 2005, 77). (A toy illustration follows.)
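A minimal sketch, mine rather than Watson's code, of why one compositional (crossover) move can assemble what single mutations cannot: a fitness function that rewards only complete modules, and two lineages preadapted on complementary blocks.

```python
import random

# Sketch of a Watson-style "compositional" move (not Watson's own code):
# fitness rewards only *complete* modules, so single bit-flips see a
# plateau, while one crossover between two individuals preadapted on
# complementary blocks assembles the full solution at once.
BLOCKS, BLOCK_LEN = 4, 5
N = BLOCKS * BLOCK_LEN

def fitness(g):
    """Number of fully solved modules (blocks of all 1s)."""
    return sum(all(g[b*BLOCK_LEN:(b+1)*BLOCK_LEN]) for b in range(BLOCKS))

# Two lineages, each preadapted on different modules:
left  = [1]*BLOCK_LEN*2 + [0]*BLOCK_LEN*2      # blocks 0 and 1 solved
right = [0]*BLOCK_LEN*2 + [1]*BLOCK_LEN*2      # blocks 2 and 3 solved

child = left[:N//2] + right[N//2:]             # one-point crossover at midpoint
print(fitness(left), fitness(right), fitness(child))   # 2 2 4

# A mutation-only hill-climber, by contrast, must set all BLOCK_LEN bits
# of a block before fitness registers any improvement: single flips in an
# unsolved block never raise fitness, so strict hill-climbing stalls.
def hill_climb_step(g):
    i = random.randrange(N)
    h = g[:]; h[i] = 1 - h[i]
    return h if fitness(h) > fitness(g) else g
```

The crossover jumps from two half-solutions (fitness 2 and 2) to the full solution (fitness 4), a move no sequence of accepted single bit-flips can make on this landscape.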
• This shows that processes formally structured like those encapsulating processes (symbiosis, endosymbiosis, perhaps lateral gene transfer) are likely to have provided evolvability towards the most complex innovations, the ones not reachable through gradual evolution.
• The bulk of the demonstration is the identity between algorithmic classes (hill-climbing, divide-and-conquer, random search) and evolutionary processes (gradual evolution, compositional evolution).
• So the solution of the gradualism issue is neither a quest for non-Darwinian explanations (« order for free », etc.), nor a reassertion of the power of cumulative selection that would simply need deeper investigation (Mayr *), but the formal design of new modes of selective processes, of the reasons for their differences, and of the differences between their evolutionary potentials. In this sense, discontinuities in evolution appear as the explananda of a variety of selective processes whose proper features and typical evolutionary patterns are demonstrated by computer science.

Open-ended evolution
• The potential for discontinuities and novelties is constant or increasing.
• New adaptive radiations (wings for insects and birds, etc.) open possibilities for further novelties.
• Not predictable, but retrodictable.

Modelling open-ended evolution
• Question: what is specific to evolution in the biosphere?
• There are limits to modelling open-ended evolution in ALife (Bedau and Packard 1998).
• Classify possible evolutionary patterns, with criteria that take into account the degree of likelihood of discontinuities and emergences.
• Those patterns will include classes of the pure possible processes that are directly implemented within the computational devices and appear as objects of investigation in computer science.
• Bedau and Packard (1998) distinguish three kinds of emergence: class II is « bounded emergence » (Holland's (1995) GA Echo), as opposed to class I, no emergence, found in an Echo simulation with no selection (what they call the « Echo neutral shadow »); class III is unbounded emergence, manifest in the Phanerozoic fossil record, i.e. the history of life. (A schematic version of their bookkeeping appears after the typology below.)
• « Bounded », for Bedau and Packard, means that the range of adaptations exhibited is somehow finite, which is not the case in class III.
• Intuition: there is no new environment to be colonized in digital evolution.

Channon's classification (2002)
1. Artificial selection in the SAGA simulation;
2. natural selection of program codes in Ray's Tierra, which now seems a limited evolution;
3. less limited evolution by Channon's « natural selection » in the Geb simulation.
Is this class 3 identical to Bedau and Packard's class III (the Phanerozoic record)?

Typology in terms of driving processes
• No selection. Phase transitions, etc.
• Gradual evolution. Smooth landscapes, cumulative selection (see the sketch below), the problem of the shifting balance theory.
• Compositional / discontinuous evolution. Moving (not smooth) landscapes; the problem of the facilitators of evolution (Wagner and Altenberg 1996: evolvability as constraints on the genotype-phenotype map). No fixed optima, hence some open-ended evolution.
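For the « gradual evolution » row of this typology, a minimal check of my own: on a smooth, additive landscape every beneficial one-bit change is visible to selection, so a mutation-only hill-climber reaches the optimum quickly (contrast the modular landscape in the Part IV sketch, where it stalls).

```python
import random

# On a smooth (additive) landscape every beneficial one-bit change is
# visible to selection, so a mutation-only hill-climber succeeds fast.
N = 40
g = [0] * N
steps = 0
while sum(g) < N:               # fitness = number of 1s (additive)
    i = random.randrange(N)
    if g[i] == 0:               # accept only strict improvements
        g[i] = 1
    steps += 1
print(steps)                    # roughly N * ln N flips (coupon collector)
```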
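And a schematic version, assumed rather than copied from Bedau and Packard (1998), of their evolutionary-activity bookkeeping: each genotype present at a step has its counter incremented, and diversity plus total activity summarize the run. Their real statistics also normalize against a « neutral shadow » run, omitted here; the long-term growth profile of such curves is what separates classes I-III.

```python
from collections import defaultdict

# Schematic evolutionary-activity bookkeeping in the spirit of Bedau and
# Packard (1998): each genotype present at a time step has its activity
# counter incremented; diversity and total activity are the summary
# statistics from which a class I / II / III diagnosis would be read.
activity = defaultdict(int)

def record(population, diversity, total_activity):
    present = set(population)
    for genotype in present:
        activity[genotype] += 1
    diversity.append(len(present))
    total_activity.append(sum(activity[g] for g in present))

# Usage with any simulation loop that yields a population of genotypes:
diversity, total_activity = [], []
toy_history = [["a", "a", "b"], ["a", "b", "b"], ["b", "b", "c"]]
for pop in toy_history:
    record(pop, diversity, total_activity)
print(diversity, total_activity)
```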
• Local patterns of evolution can be simulated, hence providing candidate processes.
• The general pattern of open-ended evolution in the Phanerozoic record is still unmatched (see Taylor 2004 for a state of the art).
• There is no a priori reason for this.
• But it may be that no pure possible process is likely to generate it.
• The possibilities provided by those models prepare the ground for empirically deciding about the specificity of life as a this-worldly feature (as opposed to « life » according to AL theorists).

Conclusion
• Computational models are not a very general domain of which biology would exemplify some cases (against the strong AL claim).
• On the contrary, they mostly provide pure possible processes that might causally contribute to the origin of traits or to evolutionary patterns.
• The class of possible processes being larger than that of the real processes, obviously not all simulated processes are likely to be met in actual biology.
• The main difference between algorithms and biology might not be the chemical implementation of earthly life (replicators as DNA, etc.), but the fact that the processes at work in biology are never pure, in the sense that they involve all the levels of the hierarchy.
• Algorithmic devices only permit us to single out one or a few entities within them; in this sense they only generate the pure processes involving solely those entities.
• This constrains the form of the validation problem for computer simulations in evolutionary biology.