Ficici_Solution_Concepts_Dissertation_Notes

“Solution Concepts in Coevolutionary Algorithms” (Dissertation) by Sevan Gregory Ficici
1.4 Foundations
Evolutionary algorithms typically have the following steps:
o Initialize population
o Evaluate each member of the population and assign a rating
o If halting criterion is met, then stop; otherwise…
o Select population members for breeding according to their ratings
o Generate “offspring” from selected “parents” with variation operators
o Insert offspring into population, and go to Step 2
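The steps above can be sketched as a short loop. This is only an illustration, not the dissertation's algorithm: the bit-string genotype, the "one-max" rating, truncation selection, and the names `rate`, `mutate`, and `evolve` are all assumptions chosen for brevity.

```python
import random

def rate(genome):
    # Hypothetical rating: number of 1 bits ("one-max").
    return sum(genome)

def mutate(genome, rng, p=0.05):
    # Variation operator: flip each bit independently with probability p.
    return [bit ^ 1 if rng.random() < p else bit for bit in genome]

def evolve(pop_size=20, length=16, max_gens=200, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize population.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(max_gens):
        # Step 2: evaluate each member and rank by rating.
        ranked = sorted(pop, key=rate, reverse=True)
        # Step 3: halting criterion (assumed here: a perfect genome exists).
        if rate(ranked[0]) == length:
            break
        # Step 4: select parents according to rating (truncation selection).
        parents = ranked[: pop_size // 2]
        # Step 5: generate offspring with the variation operator.
        offspring = [mutate(rng.choice(parents), rng)
                     for _ in range(pop_size - len(parents))]
        # Step 6: insert offspring, then return to Step 2.
        pop = parents + offspring
    return max(pop, key=rate)

best = evolve()
```

Because the parents are kept alongside their offspring, the best rating in this sketch never decreases from one generation to the next.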
In coevolution, an individual is evaluated by how it interacts with other individuals in its own
population or in other populations, depending on the search problem
o In a population of size n, there are n*(n-1) / 2 distinct pairwise interactions (each individual
interacts with the n-1 others)
o In a population of size n and another population of size m, there will be n*m interactions
 This is called “complete mixing”
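A quick way to check these counts is to enumerate the pairings. The sketch below assumes that individuals do not play themselves and that each unordered pair within a single population meets once; the function names are hypothetical.

```python
from itertools import combinations, product

def single_population_pairs(pop):
    # Every unordered pair of distinct individuals meets once.
    return list(combinations(pop, 2))

def two_population_pairs(pop_a, pop_b):
    # Every member of pop_a meets every member of pop_b.
    return list(product(pop_a, pop_b))

n, m = 5, 3
within = single_population_pairs(range(n))
between = two_population_pairs(range(n), range(m))
assert len(within) == n * (n - 1) // 2
assert len(between) == n * m
# Each individual takes part in n - 1 of the within-population pairings.
assert sum(1 for pair in within if 0 in pair) == n - 1
```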
A constant may be added to an individual’s fitness in order to make the value nonnegative
Potter introduces cooperative coevolution (not cooperative game theory)
o Aims to solve a difficult problem X by coevolving an effective set of solutions to a
decomposition of X; if X is decomposed into n sub-problems, then n reproductively isolated
populations are coevolved to “cooperatively” solve the problem X
o The less the sub-problems interact with each other, the more effective cooperative
coevolution will be
o Has the ability to dynamically adjust the problem’s decomposition
o Requires all populations to adhere to a pre-specified interface that governs interaction
between components
Competitive coevolution: coevolution applied to a zero-sum game
o Example: Iterated prisoner’s dilemma
Both “styles” of coevolution use multiple, reproductively isolated populations; both can use similar
patterns of inter-population interaction, similar diversity maintenance schemes, and so on
The most salient difference between cooperative and competitive coevolution resides primarily in
the game-theoretic properties of the domains to which these algorithms are applied
Assumptions when performing coevolutionary optimization
o Initially ignorant of the gamut of behaviors available to an evolving agent
o Initially ignorant of the outcomes obtained by the possible behaviors
o Treat evolving individual as a “black box”
o Cannot definitively establish identity of a behavior exhibited by an evolving agent without
exhaustive testing
o Cannot assume that individuals with different genotypes must behave differently
In this dissertation, coevolutionary algorithms perform optimization, and the notion of optimality is
specified by a solution concept
k-armed bandit problem: We have k slot machines and N coins; each of these slot machines has a
different expected rate of reward that is unknown to us. Our task is to apportion our N coins
amongst the k machines to optimize our expected cumulative return. Thus, we have finite resources
with which to explore the rates of return of the k machines and exploit the machine with the highest
observed return.
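The explore/exploit trade-off can be made concrete with a small simulation. The sketch below uses epsilon-greedy, a standard heuristic that is not taken from the notes, and assumes each machine pays 1 with a fixed hidden probability. The function name and parameters are hypothetical.

```python
import random

def epsilon_greedy(reward_probs, n_coins, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    k = len(reward_probs)
    pulls = [0] * k   # coins spent on each machine
    wins = [0] * k    # rewards observed from each machine
    total = 0
    for _ in range(n_coins):
        untried = [i for i in range(k) if pulls[i] == 0]
        if untried:
            arm = untried[0]                    # try every machine once
        elif rng.random() < epsilon:
            arm = rng.randrange(k)              # explore a random machine
        else:
            # Exploit the machine with the highest observed return.
            arm = max(range(k), key=lambda i: wins[i] / pulls[i])
        reward = 1 if rng.random() < reward_probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total += reward
    return pulls, total

pulls, total = epsilon_greedy([0.2, 0.5, 0.8], n_coins=1000)
```

A larger `epsilon` spends more coins on exploration; a smaller one commits earlier to whichever machine looks best so far.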
When we apply evolution to a static multi-objective problem, then the solution that is delivered is
typically the Pareto front, which is a set of non-dominated feasible members of a trade-off surface;
these individuals are either in the evolving population or in an archive of some sort
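A non-dominated filter shows what "Pareto front" means operationally. The sketch assumes every objective is maximized and that candidates are tuples of objective values; the sample points are made up.

```python
def dominates(a, b):
    # a dominates b if a is at least as good on every objective
    # and strictly better on at least one (maximization assumed).
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(points):
    # Keep the points that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points)]

candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
front = pareto_front(candidates)
assert front == [(1, 5), (2, 4), (3, 3)]
```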
The solution may be an individual, a group of individuals from different populations, a state of a
population, or in some other form
Behavior complex represents various types of strategy collections
Chapter 3: A Taxonomy of Issues and Research
 Reasons to use a coevolutionary algorithm for machine learning
o Make more efficient use of finite computational power by focusing evaluation effort on the
most relevant tests, for example those that best distinguish the quality of potential
solutions
o Some domains intrinsically require coevolution, such as games
o Some domains require less (human-supplied) inductive bias when using coevolution than
when using other search methods
o Some domains are “open ended”—having an infinite number of possible behaviors
 Efficiency: work focuses on minimal sorting networks and cellular automata for density classification
o Hillis and 16-input sorting networks: tries to competitively coevolve test-case samples such
that they remain appropriate to the abilities of the evolving networks as they improve
 Finds better results, including a solution that has one more comparison-exchange
operation than the currently known minimal network by Green
 Juille uses a portion of Hillis’s best solution to improve minimal network
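To make the network-versus-test interaction concrete, the sketch below scores a comparator network against test inputs. It is a simplified stand-in for the setup described above: it uses a 4-input network rather than 16 inputs, and it tests exhaustively on binary inputs (relying on the zero-one principle) instead of coevolving a sample of test cases. The function names are hypothetical.

```python
from itertools import product

def apply_network(network, values):
    # Each comparator (i, j) with i < j puts the smaller value at position i.
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

def failed_tests(network, n_inputs):
    # Zero-one principle: a comparator network sorts every input
    # if and only if it sorts every binary input.
    return [bits for bits in product((0, 1), repeat=n_inputs)
            if apply_network(network, bits) != sorted(bits)]

four_input = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]  # 5 comparators
assert failed_tests(four_input, 4) == []
# Removing the last comparator leaves inputs that the network cannot sort.
assert (0, 1, 0, 1) in failed_tests(four_input[:-1], 4)
```

In a coevolutionary setting, the inputs returned by `failed_tests` are the ones that still distinguish imperfect networks, which is why they are the tests worth keeping.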
o Juille and Pollack’s majority function: discover an automaton rule that will cause the
automaton to converge to a state of all ones if the initial condition (IC) has more ones than
zeros, and converge to a state of all zeros otherwise
 Paredis was the first to attempt to use coevolution with the majority problem
 Coevolve rules with actual initial conditions rather than density classes
 Uses lifetime fitness evaluation (LTFE) to integrate multiple scores over
multiple fitness evaluations
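The density classification task can be stated in a few lines of code. The sketch below assumes a one-dimensional binary automaton on a ring and uses a naive local-majority rule as a placeholder candidate; it is not one of the evolved rules, and all names are hypothetical.

```python
def step(cells, rule, radius):
    n = len(cells)
    # Ring lattice: neighborhoods wrap around the ends.
    return [rule(tuple(cells[(i + d) % n] for d in range(-radius, radius + 1)))
            for i in range(n)]

def classify(cells, rule, radius=3, max_steps=200):
    # Iterate until a fixed point or the step limit is reached.
    for _ in range(max_steps):
        nxt = step(cells, rule, radius)
        if nxt == cells:
            break
        cells = nxt
    if all(c == 1 for c in cells):
        return 1
    if all(c == 0 for c in cells):
        return 0
    return None  # did not converge to a uniform state

def target(initial):
    # Desired answer: 1 if the IC has more ones than zeros, else 0.
    return 1 if sum(initial) > len(initial) / 2 else 0

def local_majority(neigh):
    return 1 if sum(neigh) > len(neigh) / 2 else 0

easy = [1, 1, 1, 1, 0, 1, 1, 1, 1]
assert classify(easy, local_majority, radius=3) == target(easy) == 1

hard = [1, 1, 1, 0, 0]
# Local majority freezes into stable blocks here and fails the task.
assert classify(hard, local_majority, radius=1) is None
assert target(hard) == 1
```

A rule's fitness can then be taken as the fraction of sampled initial conditions for which `classify` agrees with `target`; which initial conditions to sample is the part that coevolution adapts.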
 Intrinsically Interactive Domains
 Van Valen’s Red-Queen Effect: if we simply monitor population fitness values (whether mean or
maximum), we cannot reliably detect coevolutionary progress
o For example, if a strong individual interacts with superior individuals, then the strong
individual will appear weak. On the other hand, a mediocre individual interacting with weak
individuals will appear to be strong
o Several methods to detect and monitor progress relate to memory mechanisms
 Prevent evolutionary “forgetting” by maintaining a history (or “memory”). These
operate by collecting the most fit individuals over evolutionary time (typically the
most fit in each generation) and playing them against each other
 Miller and Cliff’s current individual vs. ancestral opponents (CIAO)
 Floreano and Nolfi’s master tournament (MT)
 Stanley and Miikkulainen’s dominance tournament (DT)
o Zero-sum symmetric “robot duel” game
o Adds an individual to the collection if and only if it defeats all other
individuals already in the collection
o Ensures no intransitive cycle
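The admission rule of the dominance tournament is simple enough to sketch directly. The code below assumes a caller-supplied `beats(a, b)` predicate and a list of per-generation champions. Rock-paper-scissors serves as a hypothetical stand-in game to show that a cycle cannot enter the collection.

```python
def dominance_tournament(champions, beats):
    """champions: best individual of each generation, oldest first.
    beats(a, b): True if a defeats b."""
    dominant = []
    for candidate in champions:
        # Admit the candidate only if it defeats every earlier dominant strategy.
        if all(beats(candidate, d) for d in dominant):
            dominant.append(candidate)
    return dominant

WINS = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}

def rps_beats(a, b):
    return (a, b) in WINS

history = ["rock", "paper", "scissors", "rock"]
# Scissors beats paper but loses to rock, so it is rejected.
assert dominance_tournament(history, rps_beats) == ["rock", "paper"]
```

Each admitted individual beats all earlier ones, so the collection is totally ordered by admission time and monitoring it gives a monotone progress signal.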
Loss of Gradient and Disengagement
o Coevolution entails two search problems
 Primary search problem concerns the domain of interest
 Example: Cellular Automaton Research. Find an optimal automaton rule.
 Secondary search problem concerns the discovery of interactions that will allow us
to search the primary domain effectively and recognize solutions
 Example: Cellular Automaton Research. Find appropriate automaton initial
conditions
 Loss of gradient: If no member in the set of interactions can distinguish any two
members of the current set of evolving individuals, then we have a loss of gradient
in the primary search effort
 In single populations, gradient loss implies that all individuals receive the
same fitness
 When the primary and secondary search problems involve separate populations,
then a loss of gradient means that the populations have become disengaged
 Examples:
 Juille and Pollack coevolve cellular automaton rules with automaton initial
conditions for a density classification task
 Generators (no input, one output) and predictors (one input, one output)—
a predictor guesses what the generator will output
 Three population framework: generators evolve to be predictable to
“friendly” predictors and simultaneously unpredictable to “hostile”
predictors
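The loss-of-gradient condition defined above can be checked mechanically from an outcome matrix. The sketch assumes `outcomes[i][j]` holds the result of candidate i on test j; the function name is hypothetical.

```python
def has_gradient(outcomes):
    """outcomes[i][j]: result of candidate i on test (interaction) j."""
    n_tests = len(outcomes[0])
    # A test distinguishes two candidates if they score differently on it.
    # Gradient is lost when every test gives all candidates the same result.
    return any(len({row[j] for row in outcomes}) > 1 for j in range(n_tests))

# Every candidate passes every test: the tests are too easy, gradient is lost.
assert not has_gradient([[1, 1, 1], [1, 1, 1]])
# Every candidate fails every test: the tests are too hard, gradient is lost.
assert not has_gradient([[0, 0], [0, 0], [0, 0]])
# One test separates the candidates, so selection still has something to act on.
assert has_gradient([[1, 0], [1, 1]])
```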
o Algorithmic remedies
 Example: Phantom parasite. If a strong individual a1 in population p1 beats every
individual in population p2, then a1 loses to the phantom parasite; if an individual
b1 in population p1 loses to at least one individual in population p2, then b1 beats the
phantom parasite. This puts a1 at a relative disadvantage and prevents it from taking
over population p1
 Example: Moderating parasite virulence. Cartlidge and Bullock seek to discount the
fitness of individuals who attain perfect scores against the opposing population,
preventing them from taking over and causing disengagement
 Example: Paredis and Olsson’s approach was to slow down reproduction for the
stronger population, so the weaker population has time to adapt
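The phantom parasite remedy amounts to one extra, virtual opponent in the scoring. The sketch below assumes fitness is a simple win count; that scoring scheme and the function name are illustrative assumptions.

```python
def score_with_phantom(results):
    """results: list of booleans, True where the individual beat a real opponent."""
    real_wins = sum(results)
    # The phantom parasite defeats only individuals with a perfect record;
    # everyone else is credited with a win against it.
    beats_phantom = not all(results)
    return real_wins + (1 if beats_phantom else 0)

a1 = [True, True, True, True]    # beats every opponent
b1 = [True, True, True, False]   # loses once
assert score_with_phantom(a1) == 4
assert score_with_phantom(b1) == 4
```

Without the phantom, a1 would outscore b1 by 4 to 3; with it, the two tie, so a1 loses the edge that would let it take over the population.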
Intransitivity, Cycling, and the Red-Queen
o Cycling population dynamics: caused by intransitive superiority structures
 Example: Rock-Paper-Scissors (RPS)
 Nash equilibrium strategy: choose rock, paper, and scissors each with probability
one-third
 Example: Matching pennies game. Player p1 wins if both players choose heads or both
choose tails; p2 wins otherwise
 Nash equilibrium strategy: both players choose each pure strategy with
probability one-half
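Both equilibria can be verified by checking that, against the stated mixture, every pure strategy earns the same expected payoff, so no unilateral deviation helps. The payoff values of +1 for a win, -1 for a loss, and 0 for a tie are the usual convention and are assumed here.

```python
from fractions import Fraction

def expected_payoffs(payoff, mix):
    # Row player's expected payoff for each pure strategy
    # when the opponent plays the mixed strategy `mix`.
    return [sum(p * q for p, q in zip(row, mix)) for row in payoff]

# Row player's payoffs; rows and columns ordered rock, paper, scissors.
rps = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
third = Fraction(1, 3)
assert expected_payoffs(rps, [third] * 3) == [0, 0, 0]

# Matching pennies: the row player (p1) wins when the choices match.
pennies = [[1, -1],
           [-1, 1]]
half = Fraction(1, 2)
assert expected_payoffs(pennies, [half] * 2) == [0, 0]

# A biased opponent can be exploited: always answer the over-played rock with paper.
biased = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
assert max(expected_payoffs(rps, biased)) == Fraction(1, 4)
```

Both games are zero-sum, so the same indifference check applies to the other player with the payoff signs flipped.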
o Van Valen’s Red-Queen Effect: To maintain a level of fitness in a dynamic environment, a species
must continuously evolve. It also refers to an evolutionary “arms race” between two
competing species, where each species forces the other to become increasingly competent at
certain behaviors
o Examples:
 Paredis described cyclic dynamics on the majority function
 Juille and Pollack described cyclic dynamics on the majority function
 Author used coevolution for a time-series prediction task
 Miller discusses cyclic dynamics in pursuit-and-evasion contests
 Nolfi and Floreano discuss cyclic dynamics in pursuit-and-evasion contests
o Algorithmic Remedies
 Nolfi and Floreano show that the effect of intransitivity can be diminished by adding
various static obstacles to the environment that affect agent fitness
 Bullock implements a diffuse selection pressure by evolving multiple, reproductively
isolated populations and having each agent interact with members of each
population, creating greater genetic and behavioral diversity, which broadens
selection pressure and dilutes the effect of intransitivity
Forgetting: Process of trait loss
o “Trait” refers to any measurable aspect of behavior
o Causes of trait loss:
 Selected against—individuals with the trait are less fit, on average, than individuals
without the trait
 Trait is not strongly acted upon by selection pressure and is left to drift according to
biases in the variational operators
 Trait is selected for, but is difficult to maintain—the variational operators are
strongly biased against it, making offspring likely to lack the trait
 These causes eventually lead to a population at some later point in time where no
individual has the particular trait
o An instance of trait loss becomes an instance of forgetting when, at some later point in
time, both of the following hold
 No individual in the population has trait x
 Some individual would gain fitness if it acquired trait x
o This suggests an intransitive structure is at work
o Focusing: When a trait is forgotten due to drift, selection pressure has become too narrow
o Examples:
 Cliff and Miller discuss the role of intransitivity in forgetting in pursuit-and-evasion
contests
 Floreano and Nolfi use a shallow “Hall of Fame” memory to help stabilize cycling,
but still obtain forgetting due to intransitivity
 Watson and Pollack provide vivid illustrations of how forgetting ensues from genetic
drift in numbers games
o Algorithmic Remedies:
 Pollack and Blair’s work suggests that the game of backgammon naturally provides
diverse selection pressures and is therefore resistant to evolutionary forgetting
 Boyd proves that contrite tit-for-tat, in a setting where players make mistakes, prevents
forgetting of important skills
 Memory mechanisms maintain a collection of “good” individuals thus encapsulating
a wider range of phenotypes than is typically found in the evolving population at any
one moment
 What is the solution concept (what to remember)?
o When a domain forces mutual exclusivity between certain traits, or
when an evolutionary representation (genotype) cannot
simultaneously encode all desired traits
 Almost all memory mechanisms in the literature are instances of a general “best of
generation” (BOG) model where
 The most fit individual in each of the m most recent generations is retained
by the memory mechanism
 L of the m retained individuals are sampled without replacement for use in
testing individuals in the current generation
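The best-of-generation model described above maps onto a bounded queue plus a sampling step. The sketch below is an assumed, minimal rendering of it: the class name, the `fitness` callable, and the handling of early generations (when fewer than L individuals have been retained) are illustrative choices.

```python
import random
from collections import deque

class BestOfGenerationMemory:
    def __init__(self, m, sample_size, seed=0):
        self.retained = deque(maxlen=m)  # keeps only the m most recent champions
        self.sample_size = sample_size   # "L" in the description above
        self.rng = random.Random(seed)

    def record(self, population, fitness):
        # Retain the most fit individual of the current generation.
        self.retained.append(max(population, key=fitness))

    def sample_tests(self):
        # Sample without replacement; early on, fewer than L may be available.
        k = min(self.sample_size, len(self.retained))
        return self.rng.sample(list(self.retained), k)

memory = BestOfGenerationMemory(m=3, sample_size=2)
for gen in range(5):
    # Toy populations of integers, where fitness is the value itself.
    memory.record([gen, gen + 10, gen + 5], fitness=lambda x: x)
tests = memory.sample_tests()
```

After five generations only the three most recent champions (12, 13, 14) remain, and `tests` holds two distinct members of that set.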
 Stanley and Miikkulainen propose that their dominance tournament can be adapted
for use as a memory mechanism by retaining the most fit individual of the current
generation only if it beats all the individuals previously retained by the memory
Fitness Deception Obscures Solutions
o Coordination Game: symmetric two-player variable-sum game where both players must play
the same pure strategy to receive maximal payoff
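A two-strategy coordination game makes the definition concrete. The payoff values below are illustrative assumptions: matching choices pay 1 to each player and mismatches pay 0, so the game is symmetric and variable-sum.

```python
# Payoffs as (row player, column player); each player chooses "A" or "B".
payoffs = {
    ("A", "A"): (1, 1),
    ("A", "B"): (0, 0),
    ("B", "A"): (0, 0),
    ("B", "B"): (1, 1),
}

def is_pure_nash(profile):
    row, col = profile
    r_pay, c_pay = payoffs[profile]
    # Neither player can gain by unilaterally switching strategy.
    return (all(payoffs[(alt, col)][0] <= r_pay for alt in "AB")
            and all(payoffs[(row, alt)][1] <= c_pay for alt in "AB"))

equilibria = [p for p in payoffs if is_pure_nash(p)]
assert equilibria == [("A", "A"), ("B", "B")]

# Variable-sum: the total payoff depends on the outcome.
assert {sum(v) for v in payoffs.values()} == {0, 2}
```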
Diversity Maintenance and Teaching
o Maintaining genetic and phenotypic diversity is a general antidote to all of the common
pathologies
o Several methods have been reported to maintain genetic and phenotypic diversity