2012-03-27 Optimization Techniques for Design Space Exploration Zebo Peng Embedded Systems Laboratory (ESLAB) Linköping University Outline Optimization problems in ERT system design Heuristic techniques Simulated annealing Tabu search Prof. Z. Peng, ESLAB/LiTH, Sweden 2 1 2012-03-27 Design Space of ERT Systems Very large due to many solution parameters: architectures and components hardware/software partitioning mapping and scheduling operating systems and global control communication synthesis Hardware Software Embedded memory Microprocessor ASIC DSP Analog circuit Sensor Prof. Z. Peng, ESLAB/LiTH, Sweden Sourc e: S3 Network High-speed electronics 3 Design Space of ERT Systems Very bumpy due to the interdependence between the different parameters and the many design constraints (time, power, partial structure, ...). 27 25 Many embedded RT systems have a 23 very complex design space. 21 19 17 Prof. Z. Peng, 15 ESLAB/LiTH, Sweden 4 3,9 33 2 2012-03-27 Design Space Exploration What are needed in order to explore the complex design space to find a good solution: Exploration in the higher level of abstractions. Development of high-level analysis and estimation techniques. Employment of very fast exploration algorithms. Memory-less algorithms. Each solution needs a huge data structure to store, so we can’t afford to keep track of all visited solutions. Design space exploration should be formulated as optimization problems and powerful optimization techniques are needed! Prof. Z. Peng, ESLAB/LiTH, Sweden 5 The Optimization Problem The majority of design space exploration tasks can be viewed as optimization problems: To find - the architecture (type and number of processors, memory modules, and communication mechanism, as well as their interconnections), - the mapping of functionality onto the architecture components, and - the schedules of basic functions and communications, such that a cost function (in terms of implementation cost, performance, power, etc.) is minimized and a set of constraints is satisfied. Prof. Z. Peng, ESLAB/LiTH, Sweden 6 3 2012-03-27 Mathematical Optimization The design optimization problems can be formulated as to Minimize f(x) Subject to gi(x) bi; i = 1, 2, ..., m; where x is a vector of decision variables (x 0); f is the cost (objective) function; gi’s are a set of constraints. If f and gi are linear functions, we have a linear programming (LP) problem. LP can be solved by the simplex algorithm, which is an exact method. It will always identify the optimal solution if it exists. Prof. Z. Peng, ESLAB/LiTH, Sweden 7 Combinational Optimization (CO) In many design problems, the decision variables are restricted to integer values. The solution is a set, or a sequence, of integers or other discrete objects. We have an Integer Linear Programming (ILP) problem, which turns out to be more difficult to solve than the LP problems. Ex. System partitioning can be formulated as: Given a graph with costs on its edges, partition the nodes into k subsets no larger than a given maximum size, to minimize the total cost of the cut edges. A feasible solution (with n nodes) is represented as xi = j; j {1, 2, ..., k}, i = 1, 2, ..., n. They are called combinatorial optimization problems. Prof. Z. Peng, ESLAB/LiTH, Sweden 8 4 2012-03-27 Features of CO Problems Most CO problems, e.g., system partitioning with constraints, for ERT system designs are NP-compete. The time needed to solve an NP-compete problem grows exponentially with respect to the problem size n. Approaches for solving such problems are usually based on implicit enumeration of the feasible solutions. For example, to enumerate all feasible solutions for a scheduling problem (all possible permutation), we have: 20 tasks in 1 hour (assumption); 21 tasks in 20 hour; 22 tasks in 17.5 days; ... 25 tasks in 6 centuries. Prof. Z. Peng, ESLAB/LiTH, Sweden 9 Outline Optimization problems in ERT system design Heuristic techniques Simulated annealing Tabu search Prof. Z. Peng, ESLAB/LiTH, Sweden 10 5 2012-03-27 Heuristics A heuristic seeks near-optimal solutions at a reasonable computational cost without being able to guarantee either feasibility or optimality. Motivations: Many exact algorithms require a huge computation effort. The decision variables have complicated interdependencies. We have often nonlinear cost functions and constraints, even no mathematical functions. • Ex. The cost function f can, for example, be defined by a computer program (e.g., for power estimation). Approximation of the model for optimization. • A near optimal solution is usually good enough and could be even better than the theoretical optimum. Prof. Z. Peng, ESLAB/LiTH, Sweden 11 Constructive Problem specific Generic methods • List scheduling • Left-edge algorithm • Clustering • Divide and conquer • Branch and bound Transformational Iterative improvement Heuristic Approaches to CO • Kernighan-Lin algorithm for graph partitioning Prof. Z. Peng, ESLAB/LiTH, Sweden • • • • Neighborhood search Simulated annealing Tabu search Genetic algorithms 12 6 2012-03-27 Clustering for System Partitioning Each node initially belongs to its own cluster, and clusters are gradually merged until the desired partitioning is found. Merge operations are selected based on local information (closeness metrics), rather than global view of the system. 0 v2 v2 2 v1 5 3 v1 3 2 6 1 4 5 3 v5 3 v4 6 v3 7 v3 4 0 v3 5 v2 v5 v1 4 v4 v5 4 v4 v3 v2 4 v3 v2 v1 v4 v1 v4 v5 v5 v1 Prof. Z. Peng, ESLAB/LiTH, Sweden v5 v3 v2 v4 13 Branch-and-Bound Traverse an implicit tree to find the best leaf (solution). 4-City TSP 0 3 1 41 0 1 40 2 3 3 4 2 0 1 2 3 0 3 6 41 0 40 5 0 4 0 Total cost of this solution = 88 Prof. Z. Peng, ESLAB/LiTH, Sweden 14 7 2012-03-27 0 1 2 3 0 0 3 6 41 1 0 40 5 0 4 2 Branch-and-Bound Ex Low-bound on the cost function. Search strategy {0} L0 0 3 {0,1} L3 {0,3} L 41 {0,2} L6 {0,1,2} L 43 {0,1,3} L8 {0,2,1} L 46 {0,2,3} L 10 {0,3,1} L 46 {0,3,2} L 45 {0,1,2,3} L = 88 {0,1,3,2} L = 18 {0,2,1,3} L = 92 {0,2,3,1} L = 18 {0,3,1,2} L = 92 {0,3,2,1} L = 88 Prof. Z. Peng, ESLAB/LiTH, Sweden 15 Neighborhood Search Method Step 1 (Initialization) (A) Select a starting solution xnow X. (B) xbest = xnow, best_cost = c(xbest). Step 2 (Choice and termination) Choose a solution xnext N(xnow), in the neighborhood of xnow. If no solution can be selected, or the terminating criteria apply, then the algorithm terminates. Step 3 (Update) Re-set xnow = xnext. If c(xnow) < best_cost, perform Step 1(B). Goto Step 2. Prof. Z. Peng, ESLAB/LiTH, Sweden 16 8 2012-03-27 Neighborhood Search Method Very attractive for many CO problems as they have a natural neighborhood structure, which can be easily defined and evaluated. Ex. Graph partitioning: swapping two nodes. 5 15 2 65 35 20 5 45 5 40 35 56 4 8 24 23 5 15 2 65 35 8 24 Cost =209 20 5 45 23 Prof. Z. Peng, ESLAB/LiTH, Sweden 5 40 35 3 6 67 3 4 67 56 6 Cost =38 17 The Descent Method Step 1 (Initialization) Step 2 (Choice and termination) Choose xnext N(xnow) such that c(xnext) < c(xnow), and terminate if no such xnext can be found. Step 3 (Update) The descent method can easily be stuck at a local optimum. Cost Solutions Prof. Z. Peng, ESLAB/LiTH, Sweden 18 9 2012-03-27 Dealing with Local Optimality Enlarge the neighborhood. Start with different initial solutions. To allow “uphill moves”: Simulated annealing Tabu search Cost X Prof. Z. Peng, ESLAB/LiTH, Sweden Solutions 19 Outline Optimization problems in ERT system design Heuristic techniques Simulated annealing Tabu search Prof. Z. Peng, ESLAB/LiTH, Sweden 20 10 2012-03-27 Annealing Annealing is the slow cooling of metallic material after heating: A solid material is heated pass its melting point. Cooling is then slowly done. When cooling stops, the material settles usually into a low energy state (e.g., a stable structure). In this way, the properties of the materials are improved. Prof. Z. Peng, ESLAB/LiTH, Sweden 21 Annealing The annealing process can be viewed intuitively as: At high temperature, the atoms are randomly oriented due to their high energy caused by heat. When the temperature reduces, the atoms tend to line up with their neighbors, but different regions may have different directions. If cooling is done slowly, the final frozen state will have a nearminimal energy state. Prof. Z. Peng, ESLAB/LiTH, Sweden 22 11 2012-03-27 Simulated Annealing Annealing can be simulated using computer simulation: Generate a random perturbation of the atom orientations and calculates the resulting energy change. If the energy has decreased, the system moves to this new state. If energy has increased, the new state is accepted according to the laws of thermodynamics: At temperature t, the probability of an increase in energy of magnitude E is given by p (E ) 1 E e( k t ) where k is called the Boltzmann’s constant. Prof. Z. Peng, ESLAB/LiTH, Sweden 23 Probability of Accepting Higher-Energy States p(E) 1 0,9 0,8 0,7 0,6 0,5 0,4 0,3 0,2 0,1 0 p (E ) 1 E e( k t ) 0,2 0,4 0,6 0,8 1 1,2 1,4 1,6 1,8 2 2,2 2,4 2,6 2,8 3 3,2 3,4 3,6 3,8 Prof. Z. Peng, ESLAB/LiTH, Sweden E 24 12 2012-03-27 Simulated Annealing for CO The SA algorithm could be applied to combinatorial optimization: Thermodynamic simulation Combinatorial optimization System states Feasible solutions Energy Cost Change of state Moving to a neighboring solution Temperature “Control parameter” Frozen state Final solution Prof. Z. Peng, ESLAB/LiTH, Sweden 25 The SA Algorithm Select an initial solution xnow X; Select an initial temperature t > 0; Select a temperature reduction function ; Repeat Repeat Randomly select xnext N(xnow); = cost(xnext) - cost(xnow); If < 0 then xnow = xnext else generate randomly p in the range (0, 1) uniformly; If p < exp(-/t) then xnow = xnext; Until iteration_count = nrep; Set t = (t); Until stopping condition = true. Return xnow as the approximation to the optimal solution. Prof. Z. Peng, ESLAB/LiTH, Sweden 26 13 2012-03-27 The SA Algorithm Simulated annealing is really an optimization strategy rather than an algorithm a meta heuristic. To have a working algorithm, the following must be done: Selection of generic parameters. Problem-specific decisions must also be made. Prof. Z. Peng, ESLAB/LiTH, Sweden 27 A HW/SW Partitioning Example 75 00 0 70 00 0 optimum a t itera tion 1006 65 00 0 Cost function value 60 00 0 55 00 0 50 00 0 45 00 0 40 00 0 35 00 0 0 2 00 40 0 60 0 8 00 1 00 0 12 00 14 00 Nu mb er of iter ations Prof. Z. Peng, ESLAB/LiTH, Sweden 28 14 2012-03-27 The Cooling Schedule I Initial temperature (IT): IT must be "hot" enough to allow an almost free exchange of neighborhood solutions, if the final solution is to be independent of the starting one. Simulating the heating process: • A system can be first heated rapidly until the proportion of accepted moves to rejected moves reaches a given value; e.g., when 80% of moves leading to higher costs will be accepted. • Cooling will then start. Prof. Z. Peng, ESLAB/LiTH, Sweden 29 The Cooling Schedule II Temperature reduction scheme: A large number of iterations at few temperatures or a small number of iterations at many temperatures. Typically (t) = a x t, where a < 1; a should be large, usually between 0.8 and 0.99. For better results, the reduction rate should be slower in middle temperature ranges. Stopping conditions: Zero temperature - the theoretical requirement. A number of iterations or temperatures has passed without any acceptance of moves. A given total number of iterations have been completed (or a fixed amount of execution time). Prof. Z. Peng, ESLAB/LiTH, Sweden 30 15 2012-03-27 Problem-Specific Decisions The neighborhood structure should be defined such that: All solutions should be reachable from each other. Easy to generate randomly a neighboring feasible solution. Penalty for infeasible solutions, if the solution space is strongly constrained. The cost difference between s and s0 should be able to be efficiently calculated. The size of the neighborhood should be kept reasonably small. Many decision parameters must be fine-tuned based on experimentation on typical data. Prof. Z. Peng, ESLAB/LiTH, Sweden 31 Outline Optimization problems in ERT system design Heuristic techniques Simulated annealing Tabu search Prof. Z. Peng, ESLAB/LiTH, Sweden 32 16 2012-03-27 Introduction to Tabu Search Tabu search (TS) is a neighborhood search method which employs "intelligent" search and flexible memories to avoid being trapped at local optimum. To de-emphasize randomization. Moves are selected intelligently in each iteration (the best admissible move is selected). Use tabus to restrict the search space and avoid cyclic behavior (dead loop). The classification of tabus is based on the history of the search. Taking advantage of history. It emulates human problem solving process. Prof. Z. Peng, ESLAB/LiTH, Sweden 33 An Illustrative Example A set of tests is to be scheduled to check the correctness and other features of a given system. To find an ordering of the tests that maximizes the test performance (fault coverage, time, and power): A feasible solution can be simply represented by a permutation of the given set of tests. A neighborhood move can be defined by swapping two tests. The best move will be selected in each step. To avoid repeating or reversing swaps done recently, we classify as tabu all most recent swaps. Prof. Z. Peng, ESLAB/LiTH, Sweden 34 17 2012-03-27 TS Example 2 2 5 7 3 4 6 1 3 4 5 6 7 1 2 3 2 6 7 3 4 5 1 4 5 6 21 moves are possible in each iteration in this example Prof. Z. Peng, ESLAB/LiTH, Sweden 35 TS Example Let a paired test tabu to be valid only for three iterations (tabu tenure): 2 2 5 7 3 4 6 1 3 4 5 6 1 7 1 2 2 3 2 6 7 3 4 5 1 4 5 3 6 When a tabu move would result in a solution better than any visited so far, its tabu classification may be overridden (an aspiration criterion). Prof. Z. Peng, ESLAB/LiTH, Sweden 36 18 2012-03-27 A TS Process Tabu structure 2 3 4 5 6 Current solution 2 5 7 3 4 6 1 7 1 Performance = 10 Swap Value 2 3 4 5 2 2 4 7 3 5 6 1 3 4 5 6 6 7 1 Performance = 16 Top 5 candidates 5, 4 6 7, 4 4 3, 6 2 2, 3 0 4, 1 -1 Swap Value 3, 1 2 2, 3 1 4 3 3, 6 -1 5 7, 1 -2 6, 1 -4 2 3 6 Prof. Z. Peng, ESLAB/LiTH, Sweden 37 A TS Process Tabu structure 2 3 4 5 6 Current solution 2 4 7 3 5 6 1 7 1 Performance = 16 Swap Value 2 3 4 3 5 2 2 4 7 1 5 6 3 Performance = 18 3 4 5 6 6 7 3 1 3, 1 2 2, 3 1 3, 6 -1 7, 1 -2 6, 1 -4 Swap Value 1, 3 -2 2, 4 -4 4 2 7, 6 -6 5 4, 5 -7 5, 3 -9 2 3 6 Prof. Z. Peng, ESLAB/LiTH, Sweden Top 5 candidates 38 19 2012-03-27 A TS Process Tabu structure 2 3 4 5 6 Current solution 2 4 7 1 5 6 3 3 1 Performance = 18 7 Swap Value 2 2 4 2 7 1 5 6 3 -4 7, 6 -6 5 4, 5 -7 5, 3 -9 4 6 6 7 5 Swap Value 3 2 4, 5 6 5, 3 2 4 1 7, 1 0 5 1, 3 -3 2, 6 -6 3 Aspiration criterion applies! 6 Prof. Z. Peng, ESLAB/LiTH, Sweden -2 2, 4 2 1 Performance = 14 3 1, 3 4 2 3 Uphill moves are allowed! Top 5 candidates 39 A TS Process Tabu structure Current solution 2 4 2 7 1 5 6 3 1 3 5 6 7 2 3 4 2 2 7 1 4 6 3 Performance = 20 3 4 0 1, 3 -3 2, 6 -6 5 6 4 7, 1 0 4, 3 -3 3 6, 3 -5 5 5, 4 -6 2, 6 -8 6 Prof. Z. Peng, ESLAB/LiTH, Sweden 7 Swap Value 3 Best so far! 2 7, 1 2 2 6 5, 3 5 1 1 4, 5 1 6 5 Top 5 candidates Swap Value 3 2 Performance = 16 4 40 20 2012-03-27 Tabu Memory The paired test tabu makes use of recencybased memory (short-term memory). It should be complemented by frequency-based memory (long-term memory) to diversify the search into new regions. Diversification is restricted to operate only on particular occasions. For example, we can select those occasions where no admissible improving moves exist. Prof. Z. Peng, ESLAB/LiTH, Sweden 41 Tabu Memory Structure Iteration 26 Current solution 1 3 6 2 7 5 4 Performance = 12 (Recency-based) 1 1 4 4 5 6 7 3 3 3 1 4 6 2 4 5 4 2 1 5 2 1 Top 5 candidates Penalized Swap Value Value 2 5 7 3 1 2 3 2 6 1,4 3 2 2,4 -1 -6 3,7 -3 -3 1,6 -5 -5 6,5 -4 -6 3 (Frequency-based) P.V. = Value – Frequency_count Prof. Z. Peng, ESLAB/LiTH, Sweden 42 21 2012-03-27 Effects of Random Diversifications 2.4 5e+ 0 6 2 .4e+ 0 6 o p tim u m at iter atio n 1 9 4 1 Cost function value 2.3 5e+ 0 6 2 .3e+ 0 6 2.2 5e+ 0 6 2 .2e+ 0 6 2.1 5e+ 0 6 2 .1e+ 0 6 2.0 5e+ 0 6 2e+ 0 6 1.9 5e+ 0 6 0 5 00 10 00 1 50 0 2 00 0 2 50 0 3 00 0 N um b er o f ite ra tio n s 2. 00 4e + 0 6 o p t i m u m at i t er at i o n 1 9 4 1 Cost function value 2. 00 2e + 0 6 2e + 0 6 1. 99 8e + 0 6 1. 99 6e + 0 6 1. 99 4e + 0 6 1. 99 2e + 0 6 1 .9 9e + 0 6 0 5 00 1 00 0 1 50 0 20 00 25 00 3 00 0 N u m b e r o f i t era t i on s Prof. Z. Peng, ESLAB/LiTH, Sweden 43 The Basic TS Algorithm Step 1 (Initialization) (A) Select a starting solution xnow X. (B) xbest = xnow, best_cost = c(xbest). (C) Set the history record H empty. Step 2 (Choice and termination) Determine Candidate_N(xnow) as a subset of N(H, xnow). Select xnext from Candidate_N(xnow) to minimize c(H, x). Terminate by a chosen iteration cut-off rule. Step 3 (Update) Re-set xnow = xnext. If c(xnow) < best_cost, perform Step 1(B). Update the history record H. Return to Step 2. Prof. Z. Peng, ESLAB/LiTH, Sweden 44 22 2012-03-27 Tabu and Tabu Status A tabu is usually specified by some attributes of the moves. Typically when a move is performed that contains an attribute , a record is maintained for its reverse attribute. => Preventing reversals or repetitions! A tabu restriction is typically activated only under certain condition: Recency-based restriction: its attributes occurred within a limited number of iterations prior to the present iteration; Frequency-based restriction: occurred with a certain frequency over a longer span of iterations. The tabu restrictions and tenure should be selected to achieve cycle prevention and induce vigor into the search. Prof. Z. Peng, ESLAB/LiTH, Sweden 45 Tabu Tenure Decision The tabu tenure, t, must be carefully selected: For highly restrictive tabus, t should be smaller than for lesser restrictive tabus. It should be long enough to prevent cycling, but short enough to avoid driving the search away from the global optimum. t can be determined using static rules or dynamic rules: Static rule choose a value for t that remains fixed: t = constant (typically between 7 and 20). t = f(n), where n is the problem size (typically between 0.5 n1/2 and 2 n1/2. Experimentation must be carried out to choose the best tenure! Prof. Z. Peng, ESLAB/LiTH, Sweden 46 23 2012-03-27 Aspiration Criteria (AC) Used to determine when tabu restrictions can be overridden. They contribute significantly to the quality of the algorithm. Examples of Aspiration Criteria: Aspiration by Default: If all available moves are classified as tabu, and are not rendered admissible by some other AC, then a "least tabu" move is selected. This is always implemented, e.g., by selecting the tabu with the shortest time to become inactive. Aspiration by Objective: c(xtrial) < best_cost. Subdivide the search space into regions R R, and let best_cost(R) denote the minimum c(x) for x found in R. If c(xtrial) < best_cost(R), a move aspiration is satisfied. Prof. Z. Peng, ESLAB/LiTH, Sweden 47 Stopping Conditions TS does not converge naturally. A fixed number of iterations has elapsed in total. A fixed number of iterations has elapsed since the last best solution was found. A given amount of CPU time has been used. 2.45e+06 2.4e+06 optimum at iteration 1941 Cost function value 2.35e+06 2.3e+06 2.25e+06 2.2e+06 2.15e+06 2.1e+06 2.05e+06 2e+06 1.95e+06 0 500 1000 1500 2000 2500 3000 Number of iterations Prof. Z. Peng, ESLAB/LiTH, Sweden 48 24 2012-03-27 TS vs. SA Neighborhood space exploration: TS emphasizes complete neighborhood evaluation to identify moves of high quality. SA samples the neighborhood solutions randomly. Move evaluation: TS evaluates the relative attractiveness of moves in relation not only to objective function change, but also to factors of influence. SA evaluates moves only in terms of their objective function change. Prof. Z. Peng, ESLAB/LiTH, Sweden 49 TS vs. SA (Cont’d) Search guidance: TS uses multiple thresholds, reflected in the tabu tenures and aspiration criteria, which varies also non-monotonically. SA is based on a single threshold implicit in the temperature parameter that only changes monotonically. Use of memory: SA is memoryless. TS makes heavily and intelligently use of both short-term and longterm memory. TS can also use the mid-term memory for intensification. Prof. Z. Peng, ESLAB/LiTH, Sweden 50 25 2012-03-27 Summary Design space exploration is basically an optimization problem. Due to the complexity of the optimization problem, heuristic algorithms are widely used. Many general heuristics are based on neighborhood search principles. SA is applicable to almost any combinatorial optimization problem, and very simple to implement. TS has a natural rationale: it emulates intelligent uses of memory. When properly implemented, TS often outperforms SA (the execution time is often one order of magnitude smaller). Prof. Z. Peng, ESLAB/LiTH, Sweden 51 26