Soft Computing
Multiobjective Optimization
Richard P. Simpson

A Landscape of Interest
- Inverted Shekel’s Foxholes

Multiple Objective Functions
Recall the paper we discussed on Landscape Smoothing and its complex objective function:
- Calculate float_error: accumulates the error of the 10 sums while in float state.
- Calculate diff_total: sort the chromosome values and calculate the sum of the squares of the differences between the sequence 1, 2, …, 16 and the sorted chromosome.
- Calculate int_error: accumulates the error of the 10 sums while in integer state.
- Minimize diff_total + float_error + 2*int_error

Multiple Objective Functions
- Minimize diff_total + float_error + 2*int_error
- The above formula is really just a weighted sum of three different objective functions: diff_total, float_error, and int_error.
- This weighted-sum method is just one of the many methods in use, and one that, I might add, has some problems.

Multi-Objective Evolutionary Algorithms (MOEAs)
MOEAs allow us to
- search for solutions to complex, high-dimensional real-world applications that have multi-objective goals
- find solutions to problems using little problem domain knowledge
- search in parallel easily
- find several trade-off solutions in a single run of the algorithm (assuming niching is used)
- attack certain single-objective problems from a different perspective (Landscape Smoothing)

So what is the general problem?
The multiobjective optimization problem (MOP) can be defined [Osyczka 1985] as the problem of finding a vector of decision variables which satisfies constraints and optimizes a vector function whose elements represent the objective functions. Hence, the term “optimize” means finding a solution which gives values of all the objective functions that are acceptable to the designer.
Formally
Find the vector $X = [x_1, x_2, \dots, x_n]$ that satisfies the $m$ inequality constraints
  $g_i(X) \ge 0$ for $i = 1, \dots, m$
and the $p$ equality constraints
  $h_i(X) = 0$ for $i = 1, \dots, p$
and that optimizes the vector function
  $f(X) = [f_1(X), f_2(X), \dots, f_k(X)]$
- Since we rarely have an $X$ that minimizes/maximizes all the $f_i$ at the same time, the meaning of "optimum" is not well defined.
- What is optimum is often problem dependent. Let's first look at some previous research in this area.

What does optimum mean here?
- Having several objective functions implies that we are trying to find a good compromise rather than a single optimal solution.
- Francis Ysidro Edgeworth first proposed a meaning for “optimum” in 1881, which was generalized in 1896 by Vilfredo Pareto.

Pareto Optimality
- Pareto optimality is an optimality criterion for optimization problems with multi-criteria objectives (multi-criteria optimization).
- A state (a set of object parameters) is said to be Pareto optimal if there is no other state dominating it with respect to a set of objective functions.
- A state X dominates a state Y if X is better than Y in at least one objective function and not worse in any of the other objective functions.

Pareto Optimality
- Another way of saying this (for minimization): X is Pareto optimal if there exists no feasible vector X’ which would decrease some criterion without causing a simultaneous increase in at least one other criterion.
- This concept almost always gives not a single solution but rather a set of solutions, called the Pareto optimal set. The image of this set in objective space is the Pareto front; the two terms are often used interchangeably.

Current Multi-objective Optimization (from Carlos A. Coello Coello)
- There are over 30 mathematical programming techniques for multi-objective optimization.
- These methods tend to generate elements of the Pareto front one at a time.
- Most are sensitive to the shape of the Pareto front (they may not work if the front is concave or disconnected).
- The first implementation of an evolutionary approach was by Schaffer in 1984.
- After that the field was practically inactive until around 1995, when it took off.
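
Going back to the dominance definition on the Pareto optimality slides, here is a minimal Python sketch of it. It assumes every objective is to be minimized; the function names `dominates` and `pareto_set` and the sample points are made up for illustration and are not from any particular library.

```python
from typing import List, Sequence, Tuple


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """Return True if objective vector x dominates y (minimization assumed)."""
    no_worse = all(a <= b for a, b in zip(x, y))
    strictly_better = any(a < b for a, b in zip(x, y))
    return no_worse and strictly_better


def pareto_set(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates (brute force, O(k*M^2))."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors (f1, f2), both to be minimized.
points = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0), (5.0, 5.0)]
front = pareto_set(points)
print(front)
```

Here (3, 4) is dominated by (2, 3) and (5, 5) is dominated by (1, 5), so three trade-off points remain. None of them is better than the others in both objectives, which is why the result is a set rather than a single optimum.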
Popularity of Evolutionary Algorithms in Multiobjective Optimization
[Figure: citations by year, up to 2001 (from Carlos A. Coello Coello)]

Classifying EMOO Approaches (Evolutionary Multi-Objective Optimization)
- First-generation techniques
  - Non-Pareto approaches
  - Pareto approaches
- Second-generation techniques
  - PAES
  - SPEA
  - NSGA-II
  - MOMGA
  - micro-GA

Non-Pareto Techniques
- These are methods that do not use information about Pareto fronts explicitly.
- Incapable of producing certain portions of the Pareto front.
- Efficient and easy to implement, but appropriate for handling only a few objectives.

Aggregate Objective Model (weighted sum method)
- Aggregated fitness functions are basically just a weighted sum of the objective functions. This is what we did in Landscape Smoothing.
- The weighted sum creates a single objective function from the multi-objective fitness function.
- Determining the weights to use in this sum is nontrivial and is almost always problem dependent.

Aggregate Function
The weighted sum has the following form:
  $\min \sum_{i=1}^{k} w_i f_i(x)$
where $w_i \ge 0$ are the weights. Often we assume
  $\sum_{i=1}^{k} w_i = 1$

Applications
- Design of DSP systems (Arslan, 1996)
- Water quality control (Garrett, 1999)
- System-level synthesis (Blickle, 1996)
- Design of optimal filters for lamps (Eklund, 1999)
- Landscape Smoothing (Simpson, 2004)

Vector Evaluated Genetic Algorithm (VEGA)
- This work was performed by J. D. Schaffer in 1985 and can be found in the paper: Schaffer, J. D., "Multiple objective optimization with vector evaluated genetic algorithms."
- In this method, appropriate fractions of the next generation, or subpopulations, were selected from the whole of the old generation according to each of the objectives separately.
- Crossover and mutation were applied as usual after combining the subpopulations.

VEGA
[Diagram: generation(i) → fill each section of the new population using a separate objective function → shuffle → apply genetic operators → generation(i+1)]

Advantages and Disadvantages
- Efficient and easy to implement.
- It does not have an explicit mechanism to maintain diversity.
- It doesn’t necessarily produce non-dominated vectors.

Sample Applications of VEGA
- Combinational circuit design at the gate level (Coello, 2000)
- Design of multiplierless IIR filters (Wilson, 1993)
- Aerodynamic optimization (Rogers, 2000)
- Groundwater pollution containment (Ritzel, 1994)

Lexicographic Ordering
- Here the user is asked to rank the objectives in order of importance.
- The optimal solution is then obtained by minimizing the objective functions, starting with the most important one and proceeding according to the assigned order.

Sample Applications
- Symbolic layout compaction (Fourman, 1985)
- Robot path planning (Gacogne, 1999)
- Personnel scheduling (El Moudani et al., 2001)

Target Vector Approaches
- Definition of a set of goals (or targets) that we wish to achieve for each objective function. The EA is set up to minimize the differences between the current solution and these goals.
- These can also be considered aggregating approaches, but in this case concave portions of the Pareto front can be obtained.

Advantages and Disadvantages
- Efficient and easy to implement.
- Definition of the goals may be difficult in some cases.
- Some methods have been known to introduce misleading selection pressure under certain circumstances.
- Goals must lie in the feasible region so that the solutions generated are members of the Pareto optimal set.

Sample Applications
- Intensities of emission lines of trace elements (Wienke, 1992)
- Optimization of a fishery bio-economic model (Mardle et al., 2000)
- Optimization of the counterweight balancing of a robot arm (Coello, 1998)

Pareto-based Techniques
- Suggested by Goldberg (1989) to solve the problems with Schaffer’s VEGA.
- Use of non-dominated ranking and selection to move the population towards the Pareto front.
- Requires a ranking procedure and a technique to maintain diversity in the population (otherwise, the GA will tend to converge to a single solution).

Multi-Objective Genetic Algorithm (MOGA)
- Proposed by Fonseca and Fleming (1993); see “Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization”.
- This approach consists of a scheme in which the rank of an individual corresponds to the number of individuals in the current population by which it is dominated.
- It uses fitness sharing and mating restrictions.

MOGA Ranking
- A vector $X = (u_1, u_2, \dots, u_n)$ is superior to (dominates) another vector $Y = (v_1, v_2, \dots, v_n)$ if
  - $u_i \le v_i$ for every $i = 1, \dots, n$, and
  - there exists an $i \in \{1, \dots, n\}$ such that $u_i < v_i$.
- If X is superior to Y then Y is inferior to X.
- Let x be an individual in population t. Then rank(x, t) = 1 + p(x), where p(x) is the number of individuals in population t that x is inferior to.
- Note that if x is a Pareto point then it is inferior to no one, hence its rank is 1.

MOGA Ranking: assigning fitness according to rank
- Sort the population according to rank. Note that some rank values may not be represented.
- Assign fitnesses to individuals by interpolating from the best (rank 1) to the worst in the usual way, according to some function, usually linear.
- Average the fitnesses of individuals with the same rank, so that all of them will be sampled at the same rate. Note that this procedure keeps the global population fitness constant while maintaining appropriate selective pressure, as defined by the function used.

Ranking Example
- Suppose that we have 10 individuals in the population with ranks 1, 2, 3, 1, 1, 2, 5, 3, 2, 5.
- Since the ranks present are 1, 2, 3, and 5, we can create a roulette wheel by obtaining a fitness for each rank as follows.
- Sort the ranks, obtaining 1, 1, 1, 2, 2, 2, 3, 3, 5, 5.
- Map these ranks to fitnesses via a function, say f(x) = 6 - x, giving fitnesses 5, 5, 5, 4, 4, 4, 3, 3, 1, 1.
- The pie is then broken into 35 slices, the first three individuals getting 5 slices each, the next three getting 4 each, etc.

Advantages and Disadvantages
- Efficient and relatively easy to implement.
- Its performance depends on the appropriate selection of the sharing factor.
- MOGA was the most popular first-generation MOEA and it normally outperformed all of its contemporary competitors.

MOGA Applications
- Fault diagnosis (Marcu, 1997)
- Control systems design (Chipperfield, 1995)
- Design of antennas (Thompson, 2001)
- System-level synthesis (Dick, 1998)

Niched-Pareto Genetic Algorithm (NPGA)
- Proposed by Horn et al. (1993, 1994).
- It uses a tournament selection scheme based on Pareto dominance. Two randomly chosen individuals are compared against a subset of the entire population (10% or so).
- When both competitors are either dominated or non-dominated (i.e., a tie), the result of the tournament is decided through fitness sharing in the objective domain.

Advantages and Disadvantages
- Easy to implement.
- Efficient because it does not apply Pareto ranking to the entire population.
- It seems to have good overall performance.
- Besides requiring a sharing factor, it requires another parameter (the tournament size).

Sample Applications
- Analysis of experimental spectra (Golovkin, 2000)
- Feature selection (Emmanouilidis, 2000)
- Fault-tolerant systems design (Schott, 1995)
- Road systems design (Haastrup and Pereira, 1997)

Non-dominated Sorting Genetic Algorithm (NSGA)
- Proposed by Srinivas and Deb (1994).
- Uses classification layers:
  - layer 1 is the set of non-dominated individuals
  - layer 2 is the set of non-dominated individuals that remain when layer 1 is removed
  - etc.
- Sharing is performed at each layer using dummy fitnesses for that layer.
- Sharing spreads out the search over each classification layer.
- The high fitness of the upper layers implies that the Pareto front is heavily searched.
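
The two ranking schemes described above, the MOGA rank (1 + number of dominating individuals) and the NSGA classification layers, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all objectives are minimized, the objective vectors in `objs` are invented, and the helper names `moga_ranks` and `nondominated_layers` are hypothetical. Fitness sharing is left out entirely.

```python
def dominates(x, y):
    """True if x dominates y (minimization assumed)."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def moga_ranks(objs):
    """MOGA-style rank: 1 + number of individuals that dominate each one."""
    return [1 + sum(dominates(q, p) for q in objs) for p in objs]


def nondominated_layers(objs):
    """NSGA-style layers: peel off the non-dominated set repeatedly.

    Returns lists of indices into objs; layer 1 comes first.
    """
    remaining = list(range(len(objs)))
    layers = []
    while remaining:
        layer = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining)]
        layers.append(layer)
        remaining = [i for i in remaining if i not in layer]
    return layers


# Hypothetical objective vectors (f1, f2), both minimized.
objs = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5), (4, 4)]
ranks = moga_ranks(objs)
layers = nondominated_layers(objs)
print(ranks)
print(layers)

# The ranking example from the slides: ranks mapped through f(x) = 6 - x.
example_ranks = [1, 2, 3, 1, 1, 2, 5, 3, 2, 5]
fitness = [6 - r for r in example_ranks]
print(sum(fitness))  # total number of roulette-wheel slices
```

Note the difference between the two schemes: a MOGA rank counts dominators, so rank values can be skipped, while the NSGA layers are always numbered consecutively.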
Research Questions at this time were:
- Are aggregating functions really doomed to fail when the Pareto front is non-convex?
- Can we find ways to maintain diversity in the population without using niches, which require $O(M^2)$ work, where $M$ is the population size?
- If we assume that there is no way to reduce the $O(kM^2)$ complexity required to perform Pareto ranking, how can we design a more efficient MOEA?
- Do we have appropriate test functions and metrics to evaluate an MOEA quantitatively?
- Will somebody develop theoretical foundations for MOEAs?
(from Carlos A. Coello Coello)

Generation 2 (Elitism)
- A new generation of algorithms came about with the introduction of the notion of elitism.
- Elitism (in this context) refers to the use of an external population (the external file below) to retain the non-dominated individuals.
- Design issues include:
  - How does the external file interact with the main population?
  - What do we do when the external file is full?
  - Do we impose additional criteria for entering the file instead of just using Pareto dominance?
(from Carlos A. Coello Coello)

Second-Generation Algorithms include
- Strength Pareto Evolutionary Algorithm (SPEA), Zitzler and Thiele (1999)
- Strength Pareto Evolutionary Algorithm 2 (SPEA 2), Zitzler, Laumanns and Thiele (2001)
- Pareto Archived Evolution Strategy (PAES), Knowles and Corne (2000)
- Nondominated Sorting Genetic Algorithm II (NSGA-II), Deb et al. (2002)
- Niched Pareto Genetic Algorithm 2 (NPGA 2), Erickson et al. (2001)

A quick look at the Pareto Archived Evolution Strategy (PAES)
(1+1) PAES is made up of 3 parts:
- The candidate solution generator
  - this is basically simple random-mutation hill climbing
  - it maintains a single current solution
  - at each iteration it produces a single new candidate via random mutation
- The candidate solution acceptance function
- The Nondominated-Solutions (NDS) archive

PAES (1+1) Pseudocode
  Generate an initial random solution c and add it to the archive
  Mutate c to produce m and evaluate m
    if (c dominates m) discard m
    else if (m dominates c)
      replace c with m, and add m to the archive
    else if (m is dominated by any member of the archive) discard m
    else apply test(c, m, archive) to determine which becomes the new current solution and whether to add m to the archive
  until a termination criterion has been reached, return to the "Mutate c" step

Test(c, m, archive)
  if the archive is not full
    add m to the archive
    if (m is in a less crowded region of the archive than c)
      accept m as the new current solution
    else maintain c as the current solution
  else
    if (m is in a less crowded region of the archive than x for some member x of the archive)
      add m to the archive, and remove a member of the archive from the most crowded region
    if (m is in a less crowded region of the archive than c)
      accept m as the new current solution
    else maintain c as the current solution

The Adaptive Grid
- PAES uses a new crowding procedure based on recursively dividing up the d-dimensional objective space. This is done to minimize cost and to avoid having to set a niche-size parameter.
- Phenotype space is divided into hypercubes which have a width of $d_r / 2^k$ in each dimension, where $d_r$ is the range (maximum minus minimum) of values in objective $d$ of the solutions currently in the archive, and $k$ is the subdivision parameter.

Example grid for d = 2 objectives
- If we use 5 levels with 2 objectives we basically have a quad-tree structure.
- Each level has 4 times the number of cells the previous level has.
- Number of cells at each level: 1, 4, 16, 64, 256, 1024.
- Hence we have 1024 regions, each grid cell having width $(\max - \min)/2^5$ in each dimension.
- For the simple case of k = 3, the indicated cell has grid location 101-100, or as a single binary string 101100.
[Figure: example grid for k = 3 with the indicated cell]

So how do we find the grid location of X?
- Recursively (for each dimension) go down the tree, left (0) or right (1), creating a binary number. This requires k comparisons per dimension.
- Then concatenate the binary strings, creating a single binary number.
- Note that the grid location for the previous 1024 cells is just a 10-bit string.
- Converting this 10-bit string to an integer gives an index into an array Count[1024] that can be used to store the crowding number.
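
The grid-location procedure above can be sketched as follows. This is a minimal illustration, not the PAES reference implementation: the function name `grid_bits`, the convention that the lower half of an interval maps to 0 and the upper half to 1, the unit objective ranges, and the sample point are all assumptions made for the example.

```python
def grid_bits(point, mins, maxs, k):
    """Return the grid location of an objective vector as a bit string.

    For each objective, the range [min, max] is bisected k times;
    each bisection contributes one bit (0 = lower half, 1 = upper half).
    The per-objective strings are concatenated in objective order.
    """
    bits = []
    for value, lo, hi in zip(point, mins, maxs):
        for _ in range(k):
            mid = (lo + hi) / 2.0
            if value < mid:
                bits.append("0")
                hi = mid
            else:
                bits.append("1")
                lo = mid
    return "".join(bits)


# Hypothetical archive ranges: both objectives span [0, 1].
mins, maxs = (0.0, 0.0), (1.0, 1.0)
x = (0.7, 0.55)

# k = 3: three comparisons per objective, six bits in total.
loc3 = grid_bits(x, mins, maxs, 3)
print(loc3)

# k = 5 with 2 objectives: a 10-bit string indexing 1024 crowding counters.
count = [0] * 1024
index = int(grid_bits(x, mins, maxs, 5), 2)
count[index] += 1
print(index, count[index])
```

With k = 3 this sample point lands in cell 101-100, the same location as in the slide example. In a PAES archive the `count` array would be incremented when a solution enters a cell and decremented when one is removed, so comparing crowding between two solutions is just a comparison of two array entries.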