Multiobjective Optimisation

Soft Computing
Multiobjective Optimization
Richard P. Simpson
A Landscape of interest
Inverted Shekel’s Foxholes
Multiple Objective functions
 Recall the paper we discussed on Landscape
Smoothing and its complex Objective function




Calculate float_error
 Accumulates error of the 10 sums while in float state.
Calculate diff_total
 Sort chromosome values and calculate the sum of the
squares of the differences between the sequence
1,2,… 16 and the sorted chromosome
Calculate int_error
 Accumulates error of the 10 sums while in integer
state
Minimize diff_total + float_error+ 2*int_error
Multiple Objective functions
Minimize diff_total + float_error+ 2*int_error
The above formula is really just a weighted
sum of three different objective functions:
diff_total, float_error, and int_error.
The weighted-sum method used above is just one of
the many methods available, and one that, I
might add, has some problems.
Multi-Objective Evolutionary
Algorithms (MOEA’s)
 MOEA’s allow us to
 search for solutions to complex high
dimensional real-world applications that have
multi-objective goals.
 find solutions to problems using little problem
domain knowledge
 search in parallel easily
 find several trade-off solutions in a single run
of the algorithm (assuming niching is used)
 attack certain single objective problems from a
different perspective (Landscape smoothing)
So what is the general problem?
 The Multiobjective optimization problem
(MOP) can be defined as the problem of
finding [Osyczka 1985] a vector of decision
variables which satisfies constraints and
optimizes a vector function whose elements
represent the objective functions.
 Hence, the term “optimize” means finding
such a solution which would give the values
of all objective functions acceptable to the
designer.
Formally
 Find the vector X=[x1,x2,…,xn] that satisfies the m
inequality constraints gi(X)≥0 for i=1,…,m and the p equality constraints
hi(X)=0 for i=1,…,p, and that optimizes the vector
function f(X) = [f1(X), f2(X), …, fk(X)]
 Since we rarely have an X that minimizes/ maximizes
all the fi at the same time the meaning of optimum is
not well defined.
 What is optimum is often problem dependent.
 Let's first look at some previous research in this area.
What does optimum mean here?
 Having several objective functions implies
that we are trying to find a good compromise
rather than a single optimal solution.
 Francis Ysidro Edgeworth first proposed a
meaning for “optimum” in 1881 which was
generalized in 1896 by Vilfredo Pareto
Pareto optimality
 Pareto optimality
 optimality criterion for optimization problems
with multi-criteria objectives (multi-criteria
optimization). A state (a set of object
parameters) is said to be Pareto optimal, if
there is no other state dominating the state
with respect to a set of objective functions. A
state X dominates a state Y , if X is better
than Y in at least one objective function and
not worse with respect to all other objective
functions.
Pareto Optimality
 Another way of saying this is: X is Pareto
optimal if there exists no feasible vector X’
which would decrease some criterion without
causing a simultaneous increase in at least
one other criterion.
 This concept almost always gives not a single
solution, but rather a set of solutions called
the Pareto optimal set (its image in objective space is the Pareto front)
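The dominance test and the resulting non-dominated set can be sketched in a few lines of Python. This is a minimal illustration, assuming every objective is to be minimized and each solution is already represented by its tuple of objective values; the names `dominates` and `pareto_front` are chosen here for illustration and do not come from the slides.

```python
def dominates(x, y):
    """True if x dominates y: no worse in every objective, better in at least one."""
    no_worse = all(a <= b for a, b in zip(x, y))
    better_somewhere = any(a < b for a, b in zip(x, y))
    return no_worse and better_somewhere


def pareto_front(points):
    """Keep the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
front = pareto_front(points)
```

Here (3, 4) is dropped because (2, 3) is better in both objectives, and (5, 5) is dropped because every other point dominates it.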
Current Multi-objective Optimization
(from Carlos A. Coello Coello)
 There are over 30 mathematical programming
techniques for multi-objective optimization.
 These methods tend to generate elements of the
Pareto front one at a time.
 Most are sensitive to the shape of the Pareto front
(they may not work if the front is concave or disconnected).
 The first implementation of an evolutionary approach was by
Schaffer in 1984.
 After that the field was practically inactive until
around 1995, when it took off.
[Figure: Popularity of evolutionary algorithms in multiobjective optimization, citations by year up to 2001 (from Carlos A. Coello Coello)]
Classifying EMOO approaches
(Evolutionary Multi-Objective Optimization)
 First Generation Techniques
   Non-Pareto approaches
   Pareto approaches
 Second Generation Techniques
   PAES
   SPEA
   NSGA-II
   MOMGA
   micro-GA
Non-Pareto Techniques
 These are methods that do not use
information about Pareto fronts explicitly.
 Incapable of producing certain portions of the
Pareto front.
 Efficient and easy to implement, but
appropriate to handle only a few objectives.
Aggregate Objective Model
(weighted sum method)
 Aggregated fitness functions are basically just
a weighted sum of the objective functions.
This is what we did in Landscape smoothing.
 The weighted sum creates a single objective
function from the multi-objective fitness
function.
 Determining the weights to use in this sum is
nontrivial and is almost always problem
dependent.
Aggregate Function
 The weighted sum is basically in the following form:
$$\min \sum_{i=1}^{k} w_i f_i(x)$$
where $w_i \ge 0$ represents the weights. Often we assume
$$\sum_{i=1}^{k} w_i = 1$$
Applications
 Design of DSP system (Arslan, 1996)
 Water quality control (Garrett, 1999)
 System-level synthesis (Blickle, 1996)
 Design of optimal filters for lamps (Eklund,
1999)
 Landscape Smoothing (Simpson, 2004)
Vector Evaluated Genetic Algorithm
(VEGA)
 This work was performed by J. D. Schaffer in
1985 and can be found in the paper
   Schaffer, J.D., Multiple objective optimization
with vector evaluated genetic algorithms.
 In this method appropriate fractions of the
next generation, or subpopulations, were
selected from the whole of the old generation
according to each of the objectives,
separately.
 Crossover and mutation were applied as
usual after combining the sub-populations
VEGA
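[Figure: VEGA selection. Starting from generation(i), each section of the next population is filled using a separate objective function; the sections are then shuffled and the genetic operators applied to produce generation(i+1).]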
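The selection step of VEGA can be sketched as follows. This is an illustrative sketch only: it assumes minimization and uses binary tournaments to fill each section, which is a stand-in chosen for brevity and not necessarily the selection scheme of the original paper. The name `vega_select` and the sample population are hypothetical.

```python
import random


def vega_select(population, objectives, rng):
    """Build a mating pool with one equal-sized section per objective."""
    share = len(population) // len(objectives)
    pool = []
    for f in objectives:
        # Fill this section by looking at objective f alone (binary tournaments).
        for _ in range(share):
            a, b = rng.sample(population, 2)
            pool.append(a if f(a) <= f(b) else b)
    # Shuffle so that crossover mixes parents chosen on different objectives.
    rng.shuffle(pool)
    return pool


rng = random.Random(1)
population = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5), (0, 6)]
objectives = [lambda p: p[0], lambda p: p[1]]
pool = vega_select(population, objectives, rng)
```

Crossover and mutation would then be applied to `pool` as in an ordinary GA.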
Advantages and Disadvantages
 Efficient and easy to implement
 It does not have an explicit mechanism to
maintain diversity.
 It doesn’t necessarily produce non-dominated
vectors.
Sample Application of VEGA
 Combinational circuit design at the gate-level
(Coello,2000)
 Design multiplierless IIR filters (Wilson, 1993)
 Aerodynamic optimization (Rogers, 2000)
 Groundwater pollution containment (Ritzel,
1994)
Lexicographic Ordering
 Here the user is asked to rank the objectives
in order of importance.
 The optimal solution is then obtained by
minimizing the objective functions, starting
with the most important one and proceeding
according to the assigned order.
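In Python, lexicographic ordering falls out of tuple comparison, which compares the first elements and only looks at later ones to break ties. The sketch below assumes minimization; `lexicographic_best` and the sample candidates are names and data invented for this illustration.

```python
def lexicographic_best(candidates, objectives):
    """Pick the candidate that is best on the objectives taken in priority order."""
    return min(candidates, key=lambda c: tuple(f(c) for f in objectives))


candidates = [(1, 5), (1, 3), (2, 0)]


def first(c):
    return c[0]


def second(c):
    return c[1]


# Ranking the first objective as most important: the tie between (1, 5) and
# (1, 3) is broken by the second objective.
best_first_priority = lexicographic_best(candidates, [first, second])

# Reversing the ranking changes the answer.
best_second_priority = lexicographic_best(candidates, [second, first])
```

The example shows how strongly the result depends on the order the user assigns.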
Sample applications
 Symbolic layout compaction(Fourman, 1985)
 Robot path planning (Gacogne, 1999)
 Personnel scheduling (El Moudani et al., 2001)
Target Vector Approaches
 Definition of a set of goals (or targets) that we
wish to achieve for each objective function.
 The EA is set up to minimize differences
between the current solution and these goals.
 Can also be considered aggregating
approaches, but in this case, concave
portions of the Pareto front can be obtained.
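One simple member of this family scores a solution by its distance to the goal vector. The sketch below uses an L_p distance, which is an assumption made for illustration; other target-vector methods combine the differences in other ways. The name `goal_distance` and the sample goals are hypothetical.

```python
def goal_distance(objectives, goals, p=2):
    """L_p distance between a solution's objective values and the goal vector."""
    return sum(abs(f - g) ** p for f, g in zip(objectives, goals)) ** (1.0 / p)


candidates = [(1, 5), (2, 3), (4, 1)]
goals = (2, 2)  # the target value we would like for each objective

# The EA would minimize this distance; here we just pick the closest candidate.
closest = min(candidates, key=lambda c: goal_distance(c, goals))
```

With these goals the candidate (2, 3) is selected, since it misses the target by 1 in the second objective only.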
Advantages and Disadvantages
 Efficient and easy to implement
 Definition of goals may be difficult in some
cases
 Some methods have been known to
introduce misleading selection pressure
under certain circumstances.
 Goals must lie in the feasible region so that
the solutions generated are members of the
Pareto optimal set.
Sample Applications
 Intensities of emission lines of trace elements
(Wienke, 1992)
 Optimization of a fishery bio-economic model
( Mardle et al., 2000)
 Optimization of the counterweight balancing
of a robot arm (Coello, 1998)
Pareto-based Techniques
 Suggested by Goldberg (1989) to solve the
problems with Schaffer’s VEGA.
 Use of non-dominated ranking and selection
to move the population towards the Pareto
front
 Requires a ranking procedure and a
technique to maintain diversity in the
population (otherwise, the GA will tend to
converge to a single solution)
Multi-Objective Genetic Algorithm
(MOGA)
 Proposed by Fonseca and Fleming (1993)
   see “Genetic Algorithms for Multiobjective
Optimization: Formulation, Discussion and
Generalization”
 This approach consists of a scheme in which
the rank of an individual corresponds to the
number of individuals in the current
population by which it is dominated.
 It uses fitness sharing and mating restrictions.
MOGA Ranking
 A vector X=(u1,u2,…,un) is superior to
(dominates) another vector Y=(v1,v2,…,vn) if
ui<=vi for every i=1,…,n and there exists some i in 1,…,n
such that ui<vi.
 If X is superior to Y then Y is inferior to X.
 Let x be an individual in the population t.
Then rank(x,t)=1 + p(x), where p(x) is the
number of individuals in population t that x is
inferior to. Note that if x is a Pareto point then
it is inferior to no one, hence its rank is 1.
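The rank definition translates almost directly into code. This sketch assumes minimization and a population given as tuples of objective values; `moga_ranks` is a name chosen for this example.

```python
def dominates(x, y):
    """True if x is superior to y: x_i <= y_i for all i and x_i < y_i for some i."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def moga_ranks(population):
    """rank = 1 + number of individuals in the population that dominate it."""
    return [1 + sum(dominates(q, p) for q in population) for p in population]


population = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
ranks = moga_ranks(population)
```

The three non-dominated points get rank 1, (3, 4) is dominated only by (2, 3) and gets rank 2, and (5, 5) is dominated by all four others and gets rank 5, so ranks 3 and 4 are not represented.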
MOGA Ranking
 Assigning fitness according to rank
 Sort population according to rank. Note that some rank
values may not be represented.
 Assign fitnesses to individuals by interpolation from the
best (rank 1) to the worst in the usual way, according to
some function, usually linear.
 Average the fitnesses of individuals with the same
rank, so that all of them will be sampled at the same
rate. Note that this procedure keeps the global
population fitness constant while maintaining
appropriate selective pressure, as defined by the
function used.
Ranking example
 Suppose that we have 10 individuals in the population
that have ranks of 1, 2, 3, 1, 1, 2, 5, 3, 2, 5.
 Since the ranks present are 1, 2, 3, and 5, we could
create a roulette wheel by obtaining a
fitness for each rank as follows.
 Sort the ranks, obtaining 1, 1, 1, 2, 2, 2, 3, 3, 5, 5.
 Map each rank to its fitness via a function, say
f(x)=6-x, giving 5,5,5,4,4,4,3,3,1,1 for fitnesses.
 The pie is then broken into 35 slices, the first three
individuals getting 5 slices each, the next three getting 4 each, etc.
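The arithmetic of this example can be checked with a short script. The variable names are chosen for illustration; the mapping f(x)=6-x is the one used in the example.

```python
ranks = [1, 2, 3, 1, 1, 2, 5, 3, 2, 5]

# Sort the ranks, then map each rank to a fitness with f(x) = 6 - x.
sorted_ranks = sorted(ranks)
fitness = [6 - r for r in sorted_ranks]

# Each individual gets as many roulette-wheel slices as its fitness.
total_slices = sum(fitness)
selection_probability = [f / total_slices for f in fitness]
```

A rank-1 individual is therefore selected with probability 5/35 on each spin, and a rank-5 individual with probability 1/35.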
Advantages and Disadvantages
 Efficient and relatively easy to implement
 Its performance depends on the appropriate
selection of the sharing factor.
 MOGA was the most popular first-generation
MOEA and it normally outperformed all of its
contemporary competitors.
MOGA Applications
 Fault diagnosis ( Marcu, 1997)
 Control systems design (Chipperfield 1995)
 Design of antennas (Thompson, 2001)
 System-level synthesis (Dick, 1998)
Niched-Pareto Genetic Algorithm
(NPGA)
 Proposed by Horn et al. (1993,1994)
 It uses a tournament selection scheme based
on Pareto dominance. Two randomly chosen
individuals are compared against a
subset of the entire population (10% or so).
When both competitors are either dominated
or non-dominated (i.e., a tie), the result of the
tournament is decided through fitness sharing
in the objective domain.
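A single NPGA-style tournament might look like the sketch below. It is a simplified illustration under several assumptions: objectives are minimized, individuals are represented directly by their objective vectors, and the tie is broken by a plain niche count (the number of population members within a radius `sigma_share` in objective space), which stands in for the sharing scheme of the original papers. All names are hypothetical.

```python
import math
import random


def dominates(x, y):
    """True if x is no worse than y everywhere and better somewhere."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def niche_count(f, population, sigma_share):
    """How many population members lie within sigma_share of f."""
    return sum(1 for g in population if math.dist(f, g) < sigma_share)


def npga_tournament(a, b, population, rng, t_dom, sigma_share):
    """Return the winner of a tournament between candidates a and b."""
    comparison_set = rng.sample(population, t_dom)
    a_dominated = any(dominates(c, a) for c in comparison_set)
    b_dominated = any(dominates(c, b) for c in comparison_set)
    if a_dominated and not b_dominated:
        return b
    if b_dominated and not a_dominated:
        return a
    # Tie: prefer the candidate sitting in the less crowded niche.
    if niche_count(a, population, sigma_share) <= niche_count(b, population, sigma_share):
        return a
    return b


rng = random.Random(0)
population = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
# Using the whole population as the comparison set keeps the demo deterministic.
clear_winner = npga_tournament((2, 3), (3, 4), population, rng, 5, 1.5)
tie_winner = npga_tournament((1, 5), (2, 3), population, rng, 5, 1.5)
```

In the tie, (1, 5) wins because (2, 3) has a neighbour, (3, 4), inside its niche radius.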
Advantages and Disadvantages
 Easy to implement
 Efficient because it does not apply Pareto
ranking to the entire pop.
 It seems to have a good overall performance.
 Besides requiring a sharing factor, it requires
another parameter (tournament size)
Sample applications
 Analysis of experimental spectra (Golovkin,
2000)
 Feature selection (Emmanouilidis, 2000)
 Fault-tolerant systems design (Schott, 1995)
 Road systems design ( Haastrup and Pereira,
1997)
Non-dominated Sorting Genetic
Algorithm
 Proposed by Srinivas and Deb (1994)
 Uses classification layers.
 layer 1 is the set of non-dominated individuals
 layer 2 is the set of non-dominated individuals that
occur when layer 1 is removed. etc.
 Sharing is performed at each layer using dummy
fitnesses for that layer.
 Sharing spreads out the search over each
classification layer.
 High fitness of the upper levels implies that the
Pareto front is heavily searched.
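The layering step can be sketched by repeatedly peeling off the non-dominated set. This illustration assumes minimization and covers only the classification into layers, not the sharing or the dummy fitness assignment; `nondominated_layers` is a name chosen for this example.

```python
def dominates(x, y):
    """True if x is no worse than y everywhere and better somewhere."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def nondominated_layers(points):
    """Layer 1 is the non-dominated set; layer 2 is non-dominated once layer 1 is removed; etc."""
    remaining = list(points)
    layers = []
    while remaining:
        layer = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers


population = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
layers = nondominated_layers(population)
```

This direct version rescans the remaining points for every layer, which is simple but not the fastest way to do the sort.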
Research Questions at this time were:
 Are aggregating functions really doomed to fail when
the Pareto front is non-convex?
 Can we find ways to maintain diversity in the pop.
without using niches, which requires O(M^2) work,
where M refers to the pop. size?
 If we assume that there is no way to reduce the O(kM^2)
complexity required to perform Pareto ranking, how
can we design a more efficient MOEA?
 Do we have appropriate test functions and metrics to
evaluate an MOEA quantitatively?
 Will somebody develop theoretical foundations for
MOEA’s?
from Carlos A. Coello Coello
Generation 2 (Elitism)
 A new generation of algorithms came about
with the introduction of the notion of elitism.
 Elitism (in this context) refers to the use of an
external population to retain the non-dominated
individuals. Design issues include
   How does the external file interact with the
main population?
   What do we do when the external file is full?
   Do we impose additional criteria to enter the
file instead of just using Pareto dominance?
from Carlos A. Coello Coello
Second Generations Algorithms
include
 Strength Pareto Evolutionary Algorithm (SPEA),
Zitzler and Thiele (1999)
 Strength Pareto Evolutionary Algorithm 2 (SPEA 2),
Zitzler, Laumanns and Thiele (2001)
 Pareto Archived Evolution Strategy (PAES),
Knowles and Corne (2000)
 Nondominated Sorting Genetic Algorithm II (NSGA-II),
Deb et al. (2002)
 Niched Pareto Genetic Algorithm 2 (NPGA 2),
Erickson et al. (2001)
A quick look at the Pareto Archived
Evolution Strategy (PAES)
 (1+1) PAES is made up of 3 parts.
   The candidate solution generator
       this is basically simple random mutation
hillclimbing
       it maintains a single current solution
       at each iteration it produces a single new
candidate via random mutation
   The candidate solution acceptance function
   The Nondominated-Solutions (NDS) archive
PAES(1+1) Pseudocode
Generate initial random solution c and add it to the archive
repeat
    Mutate c to produce m and evaluate m
    if (c dominates m) discard m
    else if (m dominates c)
        replace c with m, and add m to the archive
    else if (m is dominated by any member of the archive) discard m
    else apply test(c, m, archive) to determine which becomes the
        new current solution and whether to add m to the archive
until a termination criterion has been reached
Test(c, m, archive)
if the archive is not full
    add m to the archive
    if (m is in a less crowded region of the archive than c)
        accept m as the new current solution
    else maintain c as the current solution
else
    if (m is in a less crowded region of the archive than x for some
            member x of the archive)
        add m to the archive, and remove a member of
            the archive from the most crowded region
    if (m is in a less crowded region of the archive than c)
        accept m as the new current solution
    else maintain c as the current solution
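The two pseudocode routines can be turned into a small runnable Python sketch. Several simplifications are assumptions of this sketch and not part of the pseudocode: crowding is measured on a fixed grid with hypothetical bounds `LO`, `HI` and `DIVISIONS` instead of the adaptive grid, `insert` also drops archive members that the newcomer dominates, and the test routine is named `archive_test`. Objectives are minimized and a solution is stored as a pair (x, objectives).

```python
import random

LO, HI, DIVISIONS = 0.0, 4.0, 8  # hypothetical fixed grid over each objective


def dominates(a, b):
    """True if objective vector a dominates b."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def cell(f):
    """Grid cell of objective vector f (values outside the grid are clamped)."""
    return tuple(min(DIVISIONS - 1, max(0, int((v - LO) / (HI - LO) * DIVISIONS))) for v in f)


def crowding(s, archive):
    """Number of other archive members in the same grid cell as solution s."""
    return sum(1 for t in archive if t is not s and cell(t[1]) == cell(s[1]))


def insert(m, archive):
    """Add m unless a member dominates it; drop the members that m dominates."""
    if not any(dominates(s[1], m[1]) for s in archive):
        archive[:] = [s for s in archive if not dominates(m[1], s[1])]
        archive.append(m)


def archive_test(c, m, archive, max_size):
    """Choose the new current solution when neither c nor m dominates the other."""
    m_less_crowded = crowding(m, archive) < crowding(c, archive)
    if len(archive) < max_size:
        insert(m, archive)
    else:
        worst = max(archive, key=lambda s: crowding(s, archive))
        if crowding(m, archive) < crowding(worst, archive):
            archive.remove(worst)
            insert(m, archive)
    return m if m_less_crowded else c


def paes(evaluate, mutate, x0, iterations, max_size=10):
    """(1+1) loop: one current solution, one mutant per iteration."""
    c = (x0, evaluate(x0))
    archive = [c]
    for _ in range(iterations):
        x = mutate(c[0])
        m = (x, evaluate(x))
        if dominates(c[1], m[1]):
            continue  # discard m
        if dominates(m[1], c[1]):
            insert(m, archive)
            c = m
        elif not any(dominates(s[1], m[1]) for s in archive):
            c = archive_test(c, m, archive, max_size)
    return archive


rng = random.Random(0)
front = paes(lambda x: (x * x, (x - 2.0) ** 2),   # two competing objectives
             lambda x: x + rng.gauss(0.0, 0.3),   # random mutation
             x0=3.0, iterations=2000)
```

The demo problem minimizes x^2 and (x-2)^2 over a single real variable, so the returned archive should hold mutually non-dominated trade-offs between the two.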
The Adaptive grid
 PAES uses a new crowding procedure based on
recursively dividing up the d-dimensional objective
space. This is done to minimize cost and to avoid
niche-size parameter setting.
 Phenotype space is divided into hypercubes, which
have a width of d_r/2^k in each dimension, where d_r is
the range (maximum minus minimum) of values in
objective d of the solutions currently in the archive,
and k is the subdivision parameter.
Example grid for d=2 objectives
 If we use 5 levels with 2 objectives we
basically have a quad-tree structure.
 Each level has 4 times the number of cells
the previous level has. 1,4, 16, 64, 256, 1024
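 Hence we have 1024 regions of size
(max-min)/2^5 in each dimension.
 For the simple case of k=3 the
indicated grid cell has grid-location
101-100, or in binary 101100.
[Figure: example grid for two objectives with each axis split into halves labelled 0 and 1, one grid cell indicated]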
So how do we find the
grid location of X?
 Recursively (for each dimension) go down the tree
left (0) or right (1) creating a binary number. This
requires k comparisons per dimension.
 Then concatenate the binary strings, creating a single
binary number.
 Note that the grid location of each of the previous 1024 cells
is just a 10-bit string.
 Converting this 10-bit string to an integer gives one
an index into an array Count[1024] that can be used to
store the crowding number.
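The bisection procedure can be sketched as below. The function name `grid_location`, the explicit per-objective bounds `lows` and `highs`, and the sample point are assumptions made for this illustration; in PAES the bounds would come from the solutions currently in the archive.

```python
def grid_location(f, lows, highs, k):
    """Bit string locating objective vector f after k bisections per dimension."""
    bits = []
    for value, lo, hi in zip(f, lows, highs):
        for _ in range(k):
            mid = (lo + hi) / 2.0
            if value >= mid:
                bits.append("1")  # go right: keep the upper half
                lo = mid
            else:
                bits.append("0")  # go left: keep the lower half
                hi = mid
    return "".join(bits)


# k = 3 with two objectives, both ranging over [0, 1].
location = grid_location((0.7, 0.55), (0.0, 0.0), (1.0, 1.0), 3)
index = int(location, 2)

# One counter per cell: 2^(k*d) cells for d objectives.
count = [0] * (2 ** (3 * 2))
count[index] += 1
```

The sample point lands in the cell 101-100, the same grid location as the k=3 example above, and the concatenated string 101100 converts to the array index 44.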