A parallel environment using taboo search and genetic algorithms for solving partitioning problems in codesign

A. HENNI*, M. KOUDIL*, K. BENATCHBA*, H. OUMSALEM*, K. CHAOUCHE*
*Institut National de formation en Informatique, BP 68M, 16270, Oued Smar, Algérie.

Abstract: - Partitioning has a major impact on the cost/performance characteristics of the system under design. It is an NP-Complete problem that deals with the different constraints relative to the system and the underlying target architecture. Existing partitioning approaches have a major drawback: they are generally dedicated to particular classes of applications and/or target architectures, which makes them difficult to generalize as soon as small changes are made to the system under design. Besides, it is generally difficult to include new strategies that might be better suited to the applications to partition. This paper introduces a parallel environment that offers the user the opportunity to describe and experiment with new heuristics and to test different sets of parameters. It is also possible to explore new hybridization schemes between different strategies.

Key-Words: - Codesign, Partitioning, Optimization, Genetic Algorithms, Taboo Search, Parallel Architectures.

1. Introduction

Partitioning is the process of determining the parts of the system that must be implemented in hardware and those that are to be implemented in software [1]. This task is of critical importance since it has a major impact on the cost/performance characteristics of the final product [2]. Any partitioning decision must, therefore, take into account system properties. It must also include several constraints related to the environment, the implementation platform and/or the system functionality requirements.
The weaknesses of current partitioning approaches are that:
- some of them are dedicated to a given application and hard to generalize;
- they operate at a single granularity level, either too low or too high, often missing interesting solutions;
- most of them are manual and difficult to apply as soon as the system size increases;
- they take into account only a small subset of the possible constraints that apply to systems (execution time, software and hardware space, communication, ...);
- they are dedicated to a given target architecture type, and impossible to extend to other platforms.
This paper introduces a parallel environment that offers the user the opportunity to describe and experiment with new heuristics and to test different sets of parameters. It is also possible to explore new hybridization schemes between different strategies. Section two reviews some of the reported partitioning approaches. The third one introduces the proposed partitioning approach, while the last one lists some experimental results.

2. Previous work

There exist different approaches to solve the partitioning problem. They can be either manual or automatic. Numerous approaches that try to solve the partitioning problem in codesign are manual [3-5].
Among the approaches that try to automatically solve the partitioning problem, the simplest technique is the exact one [6, 7]. Exact algorithms are based on the enumeration and evaluation of all possible solutions. In theory, they are guaranteed to reach all optimal solutions. In practice, these approaches are suitable for very small problems, but they become intractable as soon as the problem size grows. Their main drawback is an execution time that grows exponentially with the number of system tasks to partition. Computing time becomes prohibitive as the size goes over 20 [8]. For example, the exhaustive browse of the solution space for 64 tasks mapped onto two processors (2^64 partitions), on a 2 GHz computer, and assuming that the evaluation of a solution takes only one clock cycle, would take 292 years!
Approximate methods allow getting one (or several) solutions in an "acceptable" time. There are mainly two kinds of heuristics: methods dedicated to the problem to solve, and general heuristics that are not specific to a particular problem. Examples of dedicated strategies can be found in [2, 8, 9]. The main advantage of specific approaches is that they are "tailored" to the given problem. However, solutions of this type become hard to maintain as soon as a change appears, even a small one, in the type of systems to design.
General heuristics [10-13] are not dedicated to a particular type of problem and are widely used in other research fields dealing with NP-Complete problems. This class also contains algorithms that start from an initial solution (often randomly chosen) and iteratively improve it; the successive solutions are compared using a cost function (a minimal sketch of this scheme is given at the end of this section). The advantage of this type of heuristic is that the cost function can be chosen arbitrarily and is easy to modify. They also allow a solution to be reached in a short time. Their drawback is that there is no guarantee that the solution reached is the optimum: this kind of algorithm is often trapped in a local optimum of the cost function and never reaches a global optimum.
Recent works have been published [14-19] in the partitioning area, which tends to prove that the problem is still open.
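As an illustration of this class of algorithms, the sketch below shows a generic iterative-improvement loop for binary hardware/software partitioning: a random initial mapping is repeatedly perturbed and kept whenever the cost does not worsen. It is only a minimal sketch of the general scheme discussed above, not one of the cited algorithms; the cost function and the move operator are placeholders.

```python
import random

def iterative_improvement(num_tasks, cost, iterations=1000, seed=0):
    """Generic iterative-improvement heuristic for binary HW/SW partitioning.
    A solution is a list of 0/1 decisions (0 = software, 1 = hardware);
    `cost` is any user-supplied evaluation function (lower is better here)."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(num_tasks)]   # random initial solution
    best, best_cost = current[:], cost(current)
    for _ in range(iterations):
        neighbor = current[:]
        neighbor[rng.randrange(num_tasks)] ^= 1              # move: flip one mapping decision
        if cost(neighbor) <= cost(current):                  # greedy acceptance: may stay trapped
            current = neighbor                               # in a local optimum of the cost
            if cost(current) < best_cost:
                best, best_cost = current[:], cost(current)
    return best, best_cost

if __name__ == "__main__":
    # Toy cost: hardware area (number of tasks in HW) plus a small communication penalty.
    toy_cost = lambda s: sum(s) + 3 * sum(a != b for a, b in zip(s, s[1:]))
    print(iterative_improvement(10, toy_cost))
```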
3. The proposed partitioning approach

When the size and complexity of the problem rise, it becomes difficult for a human being to apprehend all the details, and manual resolution of the problem becomes intractable. We believe, like numerous other researchers in codesign, that the best automatic partitioning system is the one that allows the user to choose, among a variety of partitions, the solution best suited to their needs. This is the approach adopted in this work.

3.1. Heuristic execution schemes

The reported algorithms for automatic partitioning use either a dedicated approach or a single general approach. They are difficult to modify, and the insertion of a new heuristic leads to major modifications in the tools. Few works use general approaches. This is because most of these heuristics involve a great number of strategies and parameters, and the quality of a solution heavily depends on the choices made by the user. Among the numerous parameters the user must deal with are: the initial temperature, the temperature decrease policy and the stopping criterion for simulated annealing; the taboo list management policy, its size, and the parameters managing the different strategies for taboo search algorithms; the population size and the parameters used by the different operators of genetic algorithms; etc. It thus appears that, faced with the number and diversity of parameters and strategies to implement in these algorithms, a period of tests and evaluations is necessary to determine the best choices for the problems dealt with.
This is the reason why we propose an approach that allows the user to test partitioning heuristics. It offers the opportunity to study the parameter values, as well as the strategies to use, according to the type of problem. This work is inspired by that of Benatchba et al. [21], who propose to use general heuristics (simulated annealing, taboo search, genetic algorithms, scatter search, etc.) for satisfiability (Sat and Max-sat) problem solving. These heuristics can be executed sequentially or in a hybrid manner. The approach proposed here transposes the execution schemes introduced in [21] to the resolution of the partitioning problem. An execution model, named PARME (Partitioning and Max-sat Environment), was developed.
The aim of this approach is not to obtain an "ideal" heuristic able to solve every partitioning problem. In our opinion, it is utopian to claim to solve all problems with a single method, each heuristic having its advantages and weaknesses. Testing the algorithms allows comparative studies of the results given by these methods with different sets of parameters and strategies. An exact approach is also offered to the designer for an exhaustive search of solutions. This method presents acceptable times for applications that do not exceed 2^20 solutions.
An example of sequential execution consists of first executing a global method that explores a large solution space (such as a genetic algorithm), in order to determine one (or several) initial solution(s) for a local search heuristic. Figure 1 presents an example of sequential execution of two heuristics: a genetic algorithm runs and sends its final solution (best achieved solution) as the initial solution of a taboo search algorithm.
Figure 1: Example of sequential execution (a genetic algorithm passes its best solution as the initial solution of a taboo search).
It is also possible to experiment with different hybridization strategies consisting of a parallel execution of several algorithms that exchange solutions (best, worst, ...) every n iterations [21]. Figure 2 illustrates a hybridization scheme where a genetic algorithm and a taboo search work in parallel. Every j generations, the genetic algorithm sends its best solution to the taboo search, which locally tries to improve it; the taboo search, in turn, sends its worst solution every i iterations, in order to diversify the search space of the genetic algorithm and avoid local optima. A minimal sketch of this exchange scheme is given below.
Figure 2: Example of parallel hybridization (best solutions sent from the genetic algorithm to the taboo search; worst solutions sent back).
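To make the cooperation of Figure 2 concrete, the following sketch mimics the exchange pattern on a single machine: two workers run concurrently, the first exporting its best solution every j iterations and the second returning its worst every i iterations, each absorbing what the other produced. The heuristic bodies are reduced to random stubs; only the exchange scheme is illustrated, and the names, periods and stand-in cost are illustrative assumptions, not PARME's implementation.

```python
import queue, random, threading

def worker(name, inbox, outbox, send_every, pick, iterations=50, seed=0):
    """Stub heuristic exchanging solutions with a partner through FIFO queues."""
    rng = random.Random(seed)
    pool = [[rng.randint(0, 1) for _ in range(10)] for _ in range(5)]          # candidate solutions
    for it in range(1, iterations + 1):
        pool = [[bit ^ (rng.random() < 0.05) for bit in sol] for sol in pool]  # dummy search step
        try:
            pool[0] = inbox.get_nowait()        # absorb a solution sent by the partner, if any
        except queue.Empty:
            pass
        if it % send_every == 0:
            outbox.put(pick(pool))              # export a solution to the partner
            print(name, "sent a solution at iteration", it)

cost = sum                                      # stand-in cost function
best = lambda pool: max(pool, key=cost)
worst = lambda pool: min(pool, key=cost)
ga_to_ts, ts_to_ga = queue.Queue(), queue.Queue()

# The GA exports its best solution every j = 4 iterations; the taboo search
# returns its worst solution every i = 5 iterations to diversify the GA.
ga = threading.Thread(target=worker, args=("GA", ts_to_ga, ga_to_ts, 4, best))
ts = threading.Thread(target=worker, args=("TS", ga_to_ts, ts_to_ga, 5, worst))
ga.start(); ts.start(); ga.join(); ts.join()
```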
The heuristics used in both execution schemes can also be identical (several genetic algorithms running simultaneously, for example) in order to offer the opportunity to test different parameter sets, study their mutual influence, and determine those that give the best results for a given problem.

3.2. The execution model

An execution model, named PARME (Partitioning and Max-sat Environment), was designed to implement the different heuristic execution schemes introduced above. PARME is composed of a two-level hierarchy of processing nodes. At the highest level, a master processor manages and synchronizes all the activities of the different machine components. At the lowest level, a certain number of slave nodes operate in an autonomous manner, each one being in charge of executing a given partitioning algorithm. The nodes communicate with each other through the master processor, using an interconnection network, in order to exchange solutions and to receive from the master the orders and the parameters needed for each execution.
PARME offers the user the opportunity to execute several algorithms in parallel, by mapping to each node a different heuristic, or even the same one with different parameters. All the algorithms cooperate in the search for the best solution. PARME encloses, as illustrated in figure 3:
- a master processor that guides the search, collects and spreads information, distributes and activates the tasks, etc.;
- a set of slave processors that execute optimization algorithms; this set constitutes what we call the "node farm";
- an interconnection network that implements the communication functions between the different processors;
- a set of memories.
Figure 3: PARME execution model (SM: Secondary Memory; TSP: Specific Processor performing Taboo Search; GAP: Specific Processor performing Genetic Algorithm; LM: Local Memory).

3.2.1. The farm node

Each slave node of the farm is in charge of executing a given optimization algorithm (exact method, taboo search, genetic algorithm, ...), as illustrated in figure 3. The general architecture of a node is inspired by the work of Dours et al. [20], which presents a generic parallel machine for real-time application codesign. The structure of a node is composed of two types of processors:
- a computing processor, which executes the associated optimization algorithm;
- an exchange processor, in charge of collecting input solutions and sending results to the master.
A local memory (LM) stores the algorithm code and current results, while a secondary memory is used to store temporary data of interest (such as the best solution found up to the current iteration, the best solution found at the current iteration, ...).

3.2.2. The master processor

The master processor is the machine core. It is in charge of managing the node farm by: mapping the processes onto the different slave nodes; activating the number of nodes the application needs; and initializing them with specific parameters according to the method running on each node (a small illustration of this activation step is sketched below). The master also offers the user the opportunity to suspend or stop a processor during execution, and it is in charge of the coordination and the synchronization between the different nodes.
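A very small sketch of this management role is given below: the master holds a configuration per node (heuristic name and parameters), activates the corresponding processes, and gathers their results through an input queue standing in for the input FIFO of figure 3. The configuration keys, heuristic names and result fields are illustrative assumptions, not PARME's actual interface.

```python
import multiprocessing as mp

# One configuration per slave node; all keys and values here are illustrative.
NODE_CONFIGS = [
    {"node": 0, "heuristic": "genetic_algorithm", "params": {"population": 100, "crossover": 0.9}},
    {"node": 1, "heuristic": "taboo_search",      "params": {"list_size": 7, "iterations": 50}},
]

def slave(config, to_master):
    """Slave node stub: would run its heuristic, then report to the master's input FIFO."""
    result = {"node": config["node"], "heuristic": config["heuristic"], "best_cost": None}
    to_master.put(result)

if __name__ == "__main__":
    input_fifo = mp.Queue()                                            # master's input FIFO
    nodes = [mp.Process(target=slave, args=(cfg, input_fifo)) for cfg in NODE_CONFIGS]
    for n in nodes:                                                    # activate the required nodes
        n.start()
    for _ in NODE_CONFIGS:                                             # collect results asynchronously
        print("master received:", input_fifo.get())
    for n in nodes:
        n.join()
```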
Another task devoted to the master is to guide the search by collecting information (temporary best solutions) sent by the slave processors and routing it through the interconnection network towards the slave processors, according to a communication scheme described by the user. The master also takes care of the communication with the user by offering various numerical and graphical display tools that help the user follow the evolution of the execution and analyze the results of the different activated processors. To perform these tasks, the master processor uses three memories:
- a local memory that acts as a working space;
- a secondary memory that is used for communication with the user, by storing partial and final execution results;
- a FIFO memory, used as input, allowing the master to asynchronously collect all the solutions coming from the different slave processors. This way, the slave nodes can produce information that is consumed by the master at its own pace, without the master stopping its execution thread to receive it.

3.3. Implementation of PARME

The first implementation of PARME was done through a simulator on a single processor. This approach soon showed its limits as soon as more than two heuristics were run. A thread was used to simulate each processor, and synchronization was performed with critical sections. The second generation of PARME is implemented on a cluster of PCs. A client-server architecture is used: the master is implemented by a server application running on a single machine, while each slave processor is assigned to a different machine. Sockets are used for the communication facilities.

4. Simulations and results

This section presents a synthesis of some simulations performed with the PARME environment. The aim here is not to carry out a detailed study of the different heuristics, the hybridization schemes between them, the sets of parameters, or the different strategies. It is rather meant to emphasize the different possibilities offered by the environment for the automatic solving of the partitioning problem. The tests presented here have been performed on an Intel Pentium II at 333 MHz with 128 MB of RAM.
For the algorithm evaluation, several benchmarks have been used but, for space reasons, we focus on a particular one, which is a real program: it represents the entities of a genetic algorithm used in the PARME environment. The objective is to accelerate this algorithm which is, despite its rather small size, very time-consuming. This benchmark is composed of 10 entities. The target architecture is composed of an instruction-set processor and a hardware processor. The cost function is a linear combination of the three main criteria (respectively space, execution time and communication): f(x,y,z) = x+y+z.
The application of the exact method to the refined benchmark is possible, since the number of solutions is relatively low (2^10 = 1024). By exact method, we mean the exhaustive browse of the solution space (a sketch of this enumeration is given below); this method can be used for benchmarks that do not exceed 2^20 combinations. It is nevertheless clear that the application of a manual method is impossible. The results achieved by the exact method confirm the complexity of the problem: there exist only two feasible (positive cost) solutions to this problem. The first one, found at the 381st iteration, has a cost of 232299.984375. The second one is the optimum, reached at the 943rd iteration, with a cost of 213177.
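For illustration, the sketch below shows what such an exhaustive browse looks like for a 10-entity benchmark: all 2^10 = 1024 mappings are enumerated and evaluated with a linear cost of the form f(x,y,z) = x+y+z. The per-entity area, time and communication estimates are random placeholders, so the costs (and the feasibility convention used in PARME, where infeasible mappings receive a negative cost) are not those of the real benchmark.

```python
from itertools import product
import random

def evaluate(mapping, hw_area, sw_time, hw_time, comm):
    """Placeholder linear cost f(x, y, z) = x + y + z over space, time and communication terms."""
    x = sum(a for a, bit in zip(hw_area, mapping) if bit)                     # hardware space used
    y = sum(h if bit else s for s, h, bit in zip(sw_time, hw_time, mapping))  # execution time
    z = comm * sum(a != b for a, b in zip(mapping, mapping[1:]))              # HW/SW transfers
    return x + y + z

def exact_search(n_entities=10, seed=1):
    """Exhaustive browse of all 2^n mappings (1024 for the 10-entity benchmark)."""
    rng = random.Random(seed)
    hw_area = [rng.randint(1, 9) for _ in range(n_entities)]      # illustrative estimates only
    sw_time = [rng.randint(5, 20) for _ in range(n_entities)]
    hw_time = [rng.randint(1, 5) for _ in range(n_entities)]
    ranked = sorted((evaluate(m, hw_area, sw_time, hw_time, comm=4), m)
                    for m in product((0, 1), repeat=n_entities))  # 0 = software, 1 = hardware
    return ranked

if __name__ == "__main__":
    ranked = exact_search()
    print(len(ranked), "mappings evaluated; extreme costs:", ranked[0][0], ranked[-1][0])
```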
Knowing the optimal solution in advance, thanks to the exact method, allows us to study, in the following sections, two of the several heuristics implemented in the PARME environment, together with their parameters.

4.1. Application of a genetic algorithm alone

Table 1 summarizes some of the results obtained using the genetic algorithm (GA) with different strategies and sets of parameters. The selection method used in all cases is the "lottery wheel" (roulette wheel). Note that, in all cases, the initial population was randomly generated.
The simulations show that the solution "0000000000" (meaning that all the entities are mapped onto the software processor, named "0"), which has a cost of -27902, is a local optimum towards which the algorithm often converges quickly if the parameters are not well chosen. The mutation and crossover operators, when used alone, are not sufficient to escape this local optimum trap (simulations 2 and 3 in table 1). It is nevertheless possible, for this benchmark, to reach one of the feasible solutions despite the random generation of the initial population; the parameters allowing this are summarized by simulation 4 in table 1.
Simulation 3 shows that too small crossover and mutation probabilities fail to diversify the population and lead to an early convergence towards solutions that, for this benchmark, always turn out to be local optima. Too high values prevent the algorithm from properly exploring the solution space, by forcing it to change the search region too frequently; this gives the result observed in the first simulation of table 1, where the crossover and mutation probabilities are both equal to 1. Note that no feasible solution has been reached for populations under the size of 25. The influence of a population increase on the execution time is obvious and easy to understand, the operations of each generation being repeated a greater number of times. A sketch of the GA operators referred to in Table 1 is given at the end of this subsection.

Sim  NG   S  A     C         M          EC       It  ET
1    100  1  0.01  1/CRP     1/CRP      -70846    1   1.96
2     50  -  -     0.9/CRP   0.07/CRP   -27902    5   8.97
3    100  1  0.01  0.5/CRP   0.01/CRP   -27902    5  11.92
4     60  1  0.01  0.9/CRP   0.1/CRP    213177    7  12.96
5    100  5  0.01  0.9/CRP   0.09/CRP   213177    8  15.97
6    100  1  0.01  0.9/CRP   0.9/CRP    213177   12  20.95
7     25  1  0.01  0.9/NBS   0.1/NBS    -27902    7  13.94

NG: Number of Generations; S: Sigma (Sharing); A: Alpha (Sharing); C: Crossover probability / replacement technique; M: Mutation probability / replacement technique; CRP: Children Replace Parents; NBS: N Best Selection (the N best individuals are selected); EC: Elite solution Cost; It: Iteration number; ET: Execution Time (sec.). The iteration number and execution time correspond to the iteration at which the best solution is reached. The population size is always set to 100.
Table 1: Genetic algorithm results.

Since it was difficult to reach the optimum in 100 per cent of the simulations, because of the premature convergence of the GA, we decided to make different heuristics cooperate. The next section gives some simulation results.
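To relate the parameters of Table 1 to concrete operators, the sketch below shows one generation of a simple GA using "lottery wheel" (fitness-proportionate) selection, single-point crossover applied with probability pc, bit-flip mutation with probability pm, and children replacing their parents (CRP). The fitness function is a placeholder, sharing is omitted, and the sketch is an assumption about how such a GA can be written, not the GA implemented in PARME.

```python
import random

def lottery_wheel(population, fitness, rng):
    """Fitness-proportionate ("lottery wheel") selection of one parent."""
    weights = [max(fitness(ind), 1e-9) for ind in population]   # keep weights strictly positive
    return rng.choices(population, weights=weights, k=1)[0]

def next_generation(population, fitness, pc, pm, rng):
    """One GA generation: selection, single-point crossover, bit-flip mutation,
    with children replacing their parents (CRP in Table 1)."""
    new_pop = []
    while len(new_pop) < len(population):
        p1 = lottery_wheel(population, fitness, rng)
        p2 = lottery_wheel(population, fitness, rng)
        c1, c2 = p1[:], p2[:]
        if rng.random() < pc:                       # crossover probability (column C)
            cut = rng.randrange(1, len(p1))
            c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        for child in (c1, c2):
            for i in range(len(child)):
                if rng.random() < pm:               # mutation probability (column M)
                    child[i] ^= 1
        new_pop.extend([c1, c2])                    # children replace parents
    return new_pop[:len(population)]

if __name__ == "__main__":
    rng = random.Random(0)
    pop = [[rng.randint(0, 1) for _ in range(10)] for _ in range(100)]
    toy_fitness = sum                               # placeholder fitness function
    for _ in range(50):
        pop = next_generation(pop, toy_fitness, pc=0.9, pm=0.07, rng=rng)
    print("best toy fitness:", max(map(toy_fitness, pop)))
```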
4.2. Parallel hybridization of a taboo search and a genetic algorithm

The next step consisted of making heuristics cooperate in a parallel manner, exchanging their intermediate results. The simulations presented here use two heuristics: a genetic algorithm and a taboo search. The master processor runs on a server machine, while processor P0 (running the genetic algorithm) and processor P1 (running the taboo search algorithm) run on client machines. The following parameters are used:

P0 (genetic algorithm):
- Number of generations: 50
- Population size: 100
- Sharing: yes
- Frequency of solution sending to the master: 4
- Selection method in the input FIFO: best solution
- Selection method: lottery wheel
- Uniform crossover probability: 0.9
- Replacement technique after crossover: children replace parents
- Mutation probability: 0.07
- Replacement technique after mutation: children replace parents

P1 (taboo search):
- Taboo list type: static
- Taboo list size: 7
- Number of iterations: 50
- Selection method: best move
- Frequency of sending to the master: 5
- Number of iterations between explorations of the input FIFO: 5
- Selection method in the input FIFO: best solution

Table 2 gives some results of this hybridization.

Processor                 P0       P1
Iteration                 27       30
Cost of elite solution    213177   213177
Average time (sec.)       40.87    9.6

P0: Genetic Algorithm; P1: Taboo Search.
Table 2: Hybrid execution results.

Table 2 shows that the optimal solution was reached in 100% of the simulations using this parallel hybridization scheme, the two heuristics exchanging solutions every five iterations. The genetic algorithm found the best solution earlier than the taboo search. The average times (time taken to reach the optimum) show the relative slowness of the GA.

5. Conclusion

This paper presents a parallel environment called PARME that constitutes a research platform allowing the user to:
- describe new partitioning heuristics;
- test heuristics, sets of parameters and strategies, in order to determine the "good choices" for each class of problems encountered;
- experiment with different hybridization scenarios, allowing different heuristics to work simultaneously and collaborate in order to profit from the benefits of each method and overcome its weaknesses.
First simulated on a single machine, the environment is now implemented on a cluster of PCs according to a client-server scheme. PARME is currently used to experiment with different hybridization schemes and parameter sets, in order to solve NP-Complete problems encountered in various areas: Sat/Max-sat, data mining, etc.

References:
[1] KUMAR S., AYLOR J.H., JOHNSON B. AND WULF W.A., "The Codesign of Embedded Systems", Kluwer Academic Publishers, 1996.
[2] DE MICHELI G. AND GUPTA R., "Hardware/Software Co-design", Proceedings of the IEEE, Vol.85, N°3, pp.349-365, 1997.
[3] THEISSINGER M., STRAVERS P. AND VEIT H., "CASTLE: an interactive environment for HW-SW co-design", Proceedings of the Third International Workshop on Hardware/Software Codesign, pp.203-209, 1994.
[4] ISMAIL T.B., ABID M., O'BRIEN K. AND JERRAYA A.A., "An approach for hardware-software codesign", Proceedings of the Fifth International Workshop on Rapid System Prototyping, Shortening the Path from Specification to Prototype, pp.73-80, 1994.
[5] EDWARDS M.D., "A development system for hardware/software cosynthesis using FPGAs", Second IFIP International Workshop on Hardware/Software Codesign, 1993.
[6] D'AMBROSIO J.G. AND HU X., "Configuration-level hardware/software partitioning for real-time embedded systems", Proceedings of the Third International Workshop on Hardware/Software Codesign, pp.34-41, 1994.
[7] POP P., ELES P. AND PENG Z., "Scheduling driven partitioning of heterogeneous embedded systems", Dept. of Computer and Information Science, Linköping University, Sweden, 1998.
[8] ADAMS J.K. AND THOMAS D.E., "Multiple-process behavioral synthesis for mixed hardware-software systems", Proc. 8th Int. Symp. on System Synthesis, pp.10-15, 1995.
[9] KALAVADE A. AND LEE E.A., "The extended partitioning problem: hardware/software mapping and implementation-bin selection", Proceedings of the Sixth International Workshop on Rapid Systems Prototyping, Chapel Hill, NC, June 1995.
[10] GUPTA R.K., "Co-Synthesis of Hardware and Software for Digital Embedded Systems", Amsterdam: Kluwer, 1995.
[11] JANTSCH A., ELLERVEE P., OBERG J., HEMANI A. AND TENHUNEN H., "Hardware-software partitioning and minimizing memory interface traffic", Proc. of the European Design Automation Conference, pp.226-231, Sept. 1994.
[12] GAJSKI D.D., NARAYAN S., RAMACHANDRAN L. AND VAHID F., "System design methodologies: aiming at the 100 h design cycle", IEEE Trans. on VLSI Systems, Vol.4, N°1, pp.70-82, March 1996.
[13] HARTENSTEIN R., BECKER J. AND KRESS R., "Two-level hardware/software partitioning using CoDeX", IEEE Symposium and Workshop on Engineering of Computer-Based Systems, March 1996.
[14] CHATHA K.S. AND VEMURI R., "MAGELLAN: Multiway Hardware-Software Partitioning and Scheduling for Latency Minimization of Hierarchical Control-Dataflow Task Graphs", Proceedings of the 9th International Symposium on Hardware/Software Codesign (CODES 2001), Copenhagen, Denmark, April 25-27, 2001.
[15] BOLCHINI C., POMANTE L., SALICE F. AND SCIUTO D., "On-line fault detection in a hardware/software codesign environment: system partitioning", Proceedings of the International Symposium on Systems Synthesis, Vol.14, September 2001.
[16] DOURS D. ET AL., "Estimations rapides pour le partitionnement fonctionnel de systèmes temps-réel stricts distribués", RENPAR'14 / ASF / SYMPA, Hammamet, Tunisie, 10-13 avril 2002.
[17] NOGUERA J. AND BADIA R.M., "A hardware/software partitioning algorithm for dynamically reconfigurable architectures", Proceedings of the International Conference on Design Automation and Test in Europe (DATE'01), March 2001.
[18] SRINIVASAN V., RADHAKRISHNAN S. AND VEMURI R., "Hardware/software partitioning with integrated hardware design space exploration", Proc. of the International Conference on Design Automation and Test in Europe (DATE'98), p.28, February 1998.
[19] VAHID F., "A three-step approach to the functional partitioning of large behavioral processes", Proc. of the 11th International Symposium on System Synthesis, pp.152-157, Dec. 1998.
[20] DOURS D. ET AL., "Parallélisation automatique pour la conception de systèmes critiques sûrs", TSI, Vol.20, N°8, pp.1075-1100, 2001.
[21] BENATCHBA K., KOUDIL M., DRIAS H., OUMSALEM H. AND CHAOUCHE K., "PARME : un environnement pour la résolution du problème Max-Sat", CARI'02, Oct. 2002.