“Solution Concepts in Coevolutionary Algorithms” (Dissertation) by Sevan Gregory Ficici

1.4 Foundations

Evolutionary algorithms typically have the following steps:
o Initialize the population
o Evaluate each member of the population and assign a rating
o If the halting criterion is met, then stop; otherwise…
o Select population members for breeding according to their ratings
o Generate “offspring” from the selected “parents” with variation operators
o Insert the offspring into the population, and go to Step 2

In coevolution, individuals are evaluated by how they interact with other individuals in the same population or in other populations, depending on the search problem
o In a single population of size n, there are n*(n-1)/2 pairwise interactions
o Between a population of size n and another population of size m, there are n*m interactions
o Evaluating every possible pairing in this way is called “complete mixing”
o A constant may be added to an individual’s fitness to make the value nonnegative

Potter introduces cooperative coevolution (distinct from cooperative game theory)
o Aims to solve a difficult problem X by coevolving an effective set of solutions to a decomposition of X; if X is decomposed into n sub-problems, then n reproductively isolated populations are coevolved to “cooperatively” solve X
o The less the sub-problems interact with each other, the more effective cooperative coevolution will be
o Can dynamically adjust the problem’s decomposition
o Requires all populations to adhere to a pre-specified interface that governs interaction between components

Competitive coevolution: coevolution applied to a zero-sum game
o Example: the iterated prisoner’s dilemma

Both “styles” of coevolution use multiple, reproductively isolated populations; both can use similar patterns of inter-population interaction, similar diversity-maintenance schemes, and so on. The most salient difference between cooperative and competitive coevolution resides in the game-theoretic properties of the domains to which these algorithms are applied

Assumptions when performing coevolutionary optimization
o We are initially ignorant of the gamut of behaviors available to an evolving agent
o We are initially ignorant of the outcomes obtained by the possible behaviors
o We treat each evolving individual as a “black box”
o We cannot definitively establish the identity of a behavior exhibited by an evolving agent without exhaustive testing
o We cannot assume that individuals with different genotypes must behave differently

In this dissertation, coevolutionary algorithms perform optimization, and the notion of optimality is specified by a solution concept

k-armed bandit problem: we have k slot machines and N coins; each slot machine has a different expected rate of reward that is unknown to us. Our task is to apportion our N coins among the k machines to maximize our expected cumulative return. Thus, we have finite resources with which to both explore the rates of return of the k machines and exploit the machine with the highest observed return.

When we apply evolution to a static multi-objective problem, the solution delivered is typically the Pareto front: the set of non-dominated feasible members of a trade-off surface; these individuals reside either in the evolving population or in an archive of some sort

The solution may be an individual, a group of individuals from different populations, a state of a population, or something in some other form

A behavior complex represents various types of strategy collections

Chapter 3: A Taxonomy of Issues and Research

Reasons to use a coevolutionary algorithm for machine learning
o Make more efficient use of finite computational power by focusing evaluation effort on the most relevant tests.
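The Pareto front mentioned earlier (the set of non-dominated members of a trade-off surface) can be extracted with a simple non-dominance filter. A minimal sketch in Python; the tuple representation of objective vectors and the maximization convention are illustrative assumptions, not details from the dissertation:

```python
def dominates(a, b):
    """a dominates b if a is at least as good on every objective and
    strictly better on at least one (maximization assumed)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated members of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Toy example with two objectives, both maximized:
pts = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
print(pareto_front(pts))  # → [(1, 5), (2, 4), (3, 3)]
```

In a coevolutionary setting the “objectives” are typically outcomes against a set of tests, so the same filter applies with one dimension per test.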
For example, the most relevant tests are those that best distinguish the quality of potential solutions
o Some domains intrinsically require coevolution, such as games
o Some domains require less (human-supplied) inductive bias when using coevolution than when using other search methods
o Some domains are “open ended,” having an infinite number of possible behaviors

Efficiency: the discussion focuses on minimal sorting networks and cellular automata for density classification
o Hillis and 16-input sorting networks: competitively coevolves samples of test cases so that they remain appropriate to the abilities of the evolving networks as the networks improve
  - Finds better results, including a solution with one more comparison-exchange operation than the currently known minimal network by Green
  - Juille uses a portion of Hillis’s best solution to improve upon the minimal network
o Juille and Pollack’s majority function: discover an automaton rule that causes the automaton to converge to a state of all ones if the initial condition (IC) has more ones than zeros, and to converge to a state of all zeros otherwise
  - Paredis was the first to attempt to use coevolution on the majority problem
  - Coevolves rules with actual initial conditions rather than density classes
  - Uses lifetime fitness evaluation (LTFE) to integrate multiple scores over multiple fitness evaluations

Intrinsically Interactive Domains

Van Valen’s Red Queen effect: if we simply monitor population fitness values (whether mean or maximum), we cannot reliably detect coevolutionary progress
o For example, a strong individual that interacts with superior individuals will appear weak, while a mediocre individual interacting with weak individuals will appear strong
o Several methods to detect and monitor progress rely on memory mechanisms, which prevent evolutionary “forgetting” by maintaining a history (or “memory”)
  - These operate by collecting the most fit individuals over evolutionary time (typically the most fit in each generation) and playing them against each other
  - Miller and Cliff’s current individual vs. ancestral opponents (CIAO)
  - Floreano and Nolfi’s master tournament (MT)
  - Stanley and Miikkulainen’s dominance tournament (DT)
    - Zero-sum symmetric “robot duel” game
    - Adds an individual to the collection if and only if it defeats all individuals already in the collection
    - Ensures no intransitive cycle among retained individuals

Loss of Gradient and Disengagement
o Coevolution entails two search problems
  - The primary search problem concerns the domain of interest
    Example: in the cellular-automaton research, find an optimal automaton rule
  - The secondary search problem concerns the discovery of interactions that allow us to search the primary domain effectively and to recognize solutions
    Example: in the cellular-automaton research, find appropriate automaton initial conditions
o Loss of gradient: if no member of the set of interactions can distinguish any two members of the current set of evolving individuals, then we have a loss of gradient in the primary search effort
  - In a single population, loss of gradient implies that all individuals receive the same fitness
  - When the primary and secondary search problems involve separate populations, a loss of gradient means that the populations have become disengaged
o Examples:
  - Juille and Pollack coevolve cellular-automaton rules with automaton initial conditions for a density-classification task
  - Generators (no input, one output) and predictors (one input, one output): a predictor guesses what the generator will output
  - Three-population framework: generators evolve to be predictable to “friendly” predictors and simultaneously unpredictable to “hostile” predictors
o Algorithmic remedies
  - Example: Phantom parasite.
If a strong individual a1 in population p1 beats every individual in population p2, then a1 loses to the phantom parasite; if an individual b1 in population p1 loses to some individual in population p2, then b1 beats the phantom parasite. This puts a1 at a disadvantage and prevents a1 from taking over population p1
  - Example: Moderating parasite virulence. Cartlidge and Bullock discount the fitness of individuals that attain perfect scores against the opposing population, preventing them from taking over and causing disengagement
  - Example: Paredis and Olsson slow down reproduction in the stronger population so that the weaker population has time to adapt

Intransitivity, Cycling, and the Red Queen
o Cycling population dynamics are caused by intransitive superiority structures
  - Example: Rock-Paper-Scissors (RPS)
    Nash equilibrium strategy: choose rock, paper, and scissors with equal probability
  - Example: Matching pennies. p1 wins if p1 and p2 both choose heads or both choose tails; p2 wins otherwise
    Nash equilibrium strategy: both players choose each pure strategy with probability one-half
o Van Valen’s Red Queen effect: to maintain a level of fitness in a dynamic environment, a species must continuously evolve.
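The RPS equilibrium noted above can be checked numerically: against the uniform mixed strategy, every pure strategy earns an expected payoff of zero, so neither player gains by deviating. A minimal sketch in Python; the payoff-matrix encoding is an illustrative assumption, not taken from the dissertation:

```python
# Row player's payoff matrix for Rock-Paper-Scissors:
# rows = row player's move, columns = column player's move,
# ordered (rock, paper, scissors); +1 = win, -1 = loss, 0 = tie.
RPS = [[ 0, -1,  1],
       [ 1,  0, -1],
       [-1,  1,  0]]

def expected_payoffs(matrix, mixed):
    """Expected payoff of each pure strategy against a mixed strategy."""
    return [sum(p * m for p, m in zip(row, mixed)) for row in matrix]

uniform = [1/3, 1/3, 1/3]
print(expected_payoffs(RPS, uniform))  # every pure strategy earns 0
```

Against a pure strategy, by contrast, one response strictly wins (e.g., paper beats a pure rock player), which is exactly the intransitive structure that drives cycling.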
The Red Queen effect also refers to an evolutionary “arms race” between two competing species, where each species forces the other to become increasingly competent at certain behaviors
o Examples:
  - Paredis describes cyclic dynamics on the majority function
  - Juille and Pollack describe cyclic dynamics on the majority function
  - The author used coevolution for a time-series prediction task
  - Miller discusses cyclic dynamics in pursuit-and-evasion contests
  - Nolfi and Floreano discuss cyclic dynamics in pursuit-and-evasion contests
o Algorithmic remedies
  - Nolfi and Floreano show that the effect of intransitivity can be diminished by adding various static obstacles to the environment that affect agent fitness
  - Bullock implements a diffuse selection pressure by evolving multiple, reproductively isolated populations and having each agent interact with members of every population; the greater genetic and behavioral diversity broadens selection pressure and dilutes the effect of intransitivity

Forgetting: the process of trait loss
o “Trait” refers to any measurable aspect of behavior
o Causes of trait loss:
  - The trait is selected against: individuals with the trait are less fit, on average, than individuals without it
  - The trait is not strongly acted upon by selection pressure and is left to drift according to biases in the variation operators
  - The trait is selected for but is difficult to maintain: the variation operators are strongly biased against it, making offspring likely to lack the trait
  - Each of these causes eventually leads to a population, at some later point in time, in which no individual has the trait
o An instance of trait loss becomes an instance of forgetting when, at some later point in time, the population has
  - no individual with a trait x, and
  - some individual that would gain an increase in fitness if trait x were obtained
o This suggests an intransitive structure is at work
o Focusing: when a trait is forgotten due to drift, selection pressure has become too narrow
o Examples:
  - Cliff and Miller discuss the role of intransitivity in forgetting in pursuit-and-evasion contests
  - Floreano and Nolfi use a shallow “Hall of Fame” memory to help stabilize cycling, but still observe forgetting due to intransitivity
  - Watson and Pollack provide vivid illustrations of how forgetting ensues from genetic drift in numbers games
o Algorithmic remedies:
  - Pollack and Blair’s work suggests that the game of backgammon naturally provides such diverse selection pressures and is therefore resistant to evolutionary forgetting
  - Boyd proves that contrite tit-for-tat, in a game that includes mistakes, prevents forgetting of important skills
  - Memory mechanisms maintain a collection of “good” individuals, thus encapsulating a wider range of phenotypes than is typically found in the evolving population at any one moment

What is the solution concept (what to remember)?
o This matters when a domain forces mutual exclusivity between certain traits, or when an evolutionary representation (genotype) cannot simultaneously encode all desired traits
o Almost all memory mechanisms in the literature are instances of a general “best of generation” (BOG) model in which
  - the most fit individual in each of the m most recent generations is retained by the memory mechanism, and
  - L of the m retained individuals are sampled without replacement for use in testing the individuals of the current generation
o Stanley and Miikkulainen propose that their dominance tournament can be adapted for use as a memory mechanism by retaining the most fit individual of the current generation only if it beats all the individuals previously retained by the memory

Fitness Deception Obscures Solutions
o Coordination game: a symmetric two-player variable-sum game in which both players must play the same pure strategy to receive the maximal payoff

Diversity Maintenance and Teaching
o Maintaining genetic and phenotypic diversity is a general antidote to all of the common pathologies
o Several methods have been reported for maintaining genetic and phenotypic diversity
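The best-of-generation memory model described above can be sketched as follows. A minimal illustration in Python; the class name, the numeric toy fitness, and the use of `random.sample` are my own choices, not details from the dissertation:

```python
import random
from collections import deque

class BOGMemory:
    """Best-of-generation memory: keep the champion of each of the
    m most recent generations; sample L of them as test opponents."""

    def __init__(self, m, L):
        self.champions = deque(maxlen=m)  # oldest champion drops out
        self.L = L

    def record_generation(self, population, fitness):
        """Retain the most fit individual of the current generation."""
        self.champions.append(max(population, key=fitness))

    def sample_opponents(self):
        """Sample L retained champions without replacement
        (fewer if the memory is not yet full)."""
        k = min(self.L, len(self.champions))
        return random.sample(list(self.champions), k)

# Toy usage: individuals are numbers, fitness is the value itself.
mem = BOGMemory(m=3, L=2)
for gen in [[1, 5, 2], [7, 3], [4, 9, 6], [8, 2]]:
    mem.record_generation(gen, fitness=lambda x: x)
print(list(mem.champions))  # → [7, 9, 8] (only the 3 most recent remain)
```

The dominance-tournament variant would replace `record_generation` with a rule that admits the current champion only if it defeats every champion already retained.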