See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/267305841
All content following this page was uploaded by Mouloud Koudil on 28 November 2014.
A parallel environment using taboo search and genetic algorithms
for solving partitioning problems in codesign
A. HENNI*, M. KOUDIL*, K. BENATCHBA*,
H. OUMSALEM*, K. CHAOUCHE*
*Institut National de formation en Informatique, BP 68M, 16270, Oued Smar, Algérie.
Abstract: - Partitioning has a big impact on the cost/performance characteristics of the system under design. It is an NP-Complete problem that deals with the different constraints relative to the system and the underlying target architecture.
The existing partitioning approaches have a major drawback: they are generally dedicated to particular classes of
applications and/or target architectures, which makes them difficult to generalize as soon as small changes are made
to the system under design. Besides, it is generally difficult to include new strategies that might be better suited to the
applications to partition. This paper introduces a parallel environment that lets the user describe and
experiment with new heuristics and test different sets of parameters. It is also possible to explore new hybridization schemes
between different strategies.
Key-Words: - Codesign, Partitioning, Optimization, Genetic Algorithms, Taboo Search, Parallel Architectures.
1. Introduction
Partitioning is the process of determining which parts of the
system must be implemented in hardware and which parts
are to be implemented in software [1]. This task is of critical
importance since it has a big impact on the cost/performance
characteristics of the final product [2]. Any partitioning
decision must therefore take system properties into account.
It must also satisfy several constraints related to the
environment, the implementation platform and/or the
system functionality requirements. The weaknesses of
current partitioning approaches are that:
- some of them are dedicated to a given application and
hard to generalize;
- they operate at a single granularity level, either too low
or too high, often missing interesting solutions;
- most of them are manual and difficult to apply as soon
as the system size increases;
- they take into account only a small subset of the possible
constraints that apply to systems (execution time,
software and hardware space, communication…);
- they are dedicated to a given type of target architecture,
and impossible to extend to other platforms.
This paper introduces a parallel environment that lets
the user describe and experiment with new heuristics
and test different sets of parameters. It is also
possible to explore new hybridization schemes between
different strategies. Section two reviews some of the
reported partitioning approaches. The third one
introduces the proposed partitioning approach while the
last one lists some experimental results.
2. Previous work
There exist different approaches to solving the partitioning
problem. They can be either manual or automatic.
Among the numerous approaches to the partitioning
problem in codesign, a number are manual [3-5].
Among the approaches that try to solve the partitioning
problem automatically, the simplest technique is the exact
one [6, 7]. Exact algorithms are based on the
enumeration and evaluation of all possible solutions.
In theory, they are guaranteed to find all optimal
solutions. In practice, these approaches are ideal for very
small problems, but they become intractable as soon
as the problem size grows. Their main drawback
is the execution time, which grows exponentially with the
number of system tasks to partition. Computing time
becomes prohibitive once the number of tasks exceeds 20 [8].
For example, exhaustively browsing the solution space for
64 tasks mapped onto two processors (2^64 candidate
solutions) on a 2 GHz computer, assuming that evaluating
a solution takes only one clock cycle, would take about
292 years.
Approximate methods yield one (or several)
solutions in an "acceptable" time. There are mainly two
kinds of heuristics: methods dedicated to the problem
to solve, and general heuristics that are not specific
to a particular problem.
Dedicated strategies include those of [2, 8, 9]. The main
advantage of specific approaches is that they are
"tailored" to the given problem. However, solutions
of this type become hard to handle as soon as a change,
even a small one, appears in the type of systems to design.
The general heuristics [10-13] are not dedicated to a
particular type of problem, and are widely used in other
research fields that involve NP-Complete problems.
This class also contains algorithms that start with an
initial solution (often randomly chosen) that is iteratively
improved. The different solutions are compared using a
cost function. The advantage of this type of heuristic is
that the cost function can be chosen arbitrarily and is
easy to modify. Such heuristics also reach a
solution in a short time. Their drawback is that it is
impossible to guarantee that the solution reached is the
optimum. In fact, this kind of algorithm is often
trapped in a local optimum of the cost function, and
then never reaches a global optimum.
Recent works have been published [14-19] in the partitioning
area, which tends to show that the problem is still
open.
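The exhaustive-search estimate quoted in this section (64 tasks, two processors, a 2 GHz machine, one solution evaluated per clock cycle) can be checked with a few lines of arithmetic. The function name and the use of a 365.25-day year are choices made for this sketch only.

```python
# Back-of-the-envelope check of the exhaustive-search estimate.
# Assumptions taken from the text: 64 tasks, 2 processors,
# one candidate solution evaluated per clock cycle at 2 GHz.

def exhaustive_search_years(n_tasks: int, n_targets: int = 2,
                            clock_hz: float = 2e9) -> float:
    """Years needed to evaluate every mapping of n_tasks onto n_targets."""
    n_solutions = n_targets ** n_tasks        # size of the solution space
    seconds = n_solutions / clock_hz          # one evaluation per cycle
    seconds_per_year = 365.25 * 24 * 3600     # assumed year length
    return seconds / seconds_per_year


years_64 = exhaustive_search_years(64)
print(f"64 tasks on 2 processors: about {years_64:.0f} years")
```

Each additional task doubles the result, which is the exponential growth that makes the exact method impractical beyond small systems.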
3. The proposed partitioning approach
When the size and complexity of the problem rise, it
becomes difficult for a human being to grasp all
the details, and manual resolution of the problem
becomes intractable. We believe, like numerous other
researchers in codesign, that the best automatic
partitioning system is one that lets users
choose, among a variety of partitions, the best solution
for their needs. This is the approach adopted in this work.
3.1. Heuristic execution schemes
The reported algorithms for automatic partitioning use
either a dedicated approach or a single general
approach. They are difficult to modify, and inserting
a new heuristic leads to major modifications in the
tools. Few works use general approaches. This is because
most of these heuristics involve a great number of strategies
and parameters, and the quality of a solution heavily
depends on the choices made by users. Among the
numerous parameters the user must deal with, there are:
the initial temperature, the temperature decrease policy and
the stopping criterion for simulated annealing; the taboo
list management policy, its size, and the parameters
managing the different strategies for taboo search
algorithms; the population size and the parameters used
by the different operators of genetic algorithms, etc.
It thus appears that, faced with the number and
diversity of parameters and strategies to implement in
these algorithms, a period of tests and evaluations is
necessary to determine the best choices for the
problems dealt with.
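To make the size of this tuning task concrete, the following minimal sketch groups some of the parameters named above into plain records and enumerates a small test grid for a genetic algorithm. The class names, field names and values are hypothetical and are not taken from PARME.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TabooParams:
    """Hypothetical record for taboo search settings."""
    list_size: int
    list_policy: str          # e.g. "static"
    iterations: int


@dataclass(frozen=True)
class GAParams:
    """Hypothetical record for genetic algorithm settings."""
    population_size: int
    crossover_prob: float
    mutation_prob: float


def ga_parameter_grid(pop_sizes, crossover_probs, mutation_probs):
    """Every combination of the candidate values, one GAParams per test run."""
    return [GAParams(p, c, m)
            for p, c, m in product(pop_sizes, crossover_probs, mutation_probs)]


# Illustrative candidate values only.
grid = ga_parameter_grid([25, 100], [0.5, 0.9], [0.01, 0.1])
print(len(grid), "parameter sets to evaluate")
```

Even two candidate values per parameter already give eight runs for one heuristic, which illustrates why a dedicated test environment is useful.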
This is the reason why we propose an approach that
allows the user to test partitioning heuristics. It offers the
opportunity to study the parameter values, as well as the
strategies to use, according to the type of problem. This
work is inspired by that of Benatchba et al. [21], who
propose using general heuristics (simulated annealing,
taboo search, genetic algorithms, scatter search, etc.) to
solve satisfiability (Sat and Max-sat) problems. These
heuristics can be executed sequentially or in a hybrid
manner. The approach proposed here transposes the
execution schemes introduced in [21] to the
partitioning problem. An execution model, named
PARME (Partitioning and Max-sat Environment), was
developed. The aim of this approach is not to achieve an
"ideal" heuristic able to solve every partitioning
problem. In fact it is, in our opinion, utopian to claim
to solve all problems with a single method, each
heuristic having its advantages and weaknesses. Testing
the algorithms makes it possible to carry out comparative
studies of the results given by these methods with
different sets of parameters and strategies.
An exact approach is also offered to the designer for an
exhaustive search of solutions. This method runs in
acceptable time for applications that do not exceed 2^20
solutions.
An example of sequential execution is to first run
a global method that explores a
large solution space (such as a genetic algorithm) in order
to determine one (or several) initial solution(s) for a local
search heuristic. Figure 1 presents an example of
sequential execution of two heuristics: a genetic
algorithm executes and sends its final solution (the best
solution reached) as the initial solution to a taboo search
algorithm.
[Figure 1 diagram: Execution of a Genetic Algorithm -> Best Solution, used as Initial Solution -> Execution of a Taboo Search -> Final Solution]
Figure 1: Example of sequential execution.
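The sequential scheme of Figure 1 can be sketched as a two-stage pipeline. This is only an illustration under stated assumptions: the cost function (to be minimised), the target mapping, the tournament selection and all parameter values are hypothetical, and neither function reproduces the algorithms implemented in PARME.

```python
import random
from collections import deque

TARGET = (1, 0, 1, 1, 0, 0, 1, 0, 1, 1)  # hypothetical "ideal" mapping


def toy_cost(bits):
    """Hypothetical cost to minimise: entities mapped differently from TARGET."""
    return sum(b != t for b, t in zip(bits, TARGET))


def genetic_algorithm(cost, n_bits, rng, generations=10, pop_size=20,
                      p_cross=0.9, p_mut=0.05):
    """Very small GA; returns the best individual seen."""
    pop = [tuple(rng.randint(0, 1) for _ in range(n_bits))
           for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            # Binary tournament selection (a simplification of this sketch).
            p1 = min(rng.sample(pop, 2), key=cost)
            p2 = min(rng.sample(pop, 2), key=cost)
            child = p1
            if rng.random() < p_cross:  # uniform crossover
                child = tuple(a if rng.random() < 0.5 else b
                              for a, b in zip(p1, p2))
            # Bit-flip mutation, applied gene by gene.
            child = tuple(1 - g if rng.random() < p_mut else g for g in child)
            children.append(child)
        pop = children  # children replace parents
        best = min(pop + [best], key=cost)
    return best


def flip(bits, i):
    """Move entity i to the other processor."""
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]


def taboo_search(cost, start, iterations=50, tabu_size=7):
    """Best-move taboo search with a fixed-size list of recently flipped positions."""
    current = best = tuple(start)
    tabu = deque(maxlen=tabu_size)
    for _ in range(iterations):
        moves = [i for i in range(len(current)) if i not in tabu]
        if not moves:
            break
        i = min(moves, key=lambda k: cost(flip(current, k)))
        current = flip(current, i)
        tabu.append(i)
        if cost(current) < cost(best):
            best = current
    return best


rng = random.Random(0)
ga_best = genetic_algorithm(toy_cost, len(TARGET), rng)   # global stage
final = taboo_search(toy_cost, ga_best)                   # local stage
print("GA cost:", toy_cost(ga_best), "final cost:", toy_cost(final))
```

Because the taboo search keeps the best solution it has visited, starting with its input, the second stage can never return something worse than what the first stage handed over.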
It is also possible to experiment with different hybridization
strategies consisting of a parallel execution of several
algorithms that exchange solutions (best,
worst…) every n iterations [21]. Figure 2 illustrates a
hybridization scheme where a genetic algorithm and a
taboo search work in parallel. Every j generations, the
genetic algorithm sends its best solution to the taboo search,
which tries to improve it locally; the taboo search, in turn,
sends its worst solution every i iterations, in order to
diversify the search space of the genetic algorithm and
avoid local optima.
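The exchange pattern described above can be written down as a simple schedule. The sketch assumes, for simplicity, that both heuristics advance in lockstep on a shared iteration counter, which real parallel processes need not do; the function name and the periods used in the example are illustrative.

```python
def exchange_schedule(total_iters, ga_period_j, ts_period_i):
    """List the solution exchanges between a GA and a taboo search.

    Every ga_period_j iterations the GA sends its best solution to the
    taboo search; every ts_period_i iterations the taboo search sends
    its worst solution to the GA.
    """
    events = []
    for it in range(1, total_iters + 1):
        if it % ga_period_j == 0:
            events.append((it, "GA -> TS", "best"))
        if it % ts_period_i == 0:
            events.append((it, "TS -> GA", "worst"))
    return events


events = exchange_schedule(total_iters=12, ga_period_j=4, ts_period_i=5)
for it, direction, kind in events:
    print(f"iteration {it:2d}: {direction} ({kind} solution)")
```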
[Figure 2 diagram: Genetic Algorithm and Taboo Search timelines plotted against iterations (marks at i, j, 2i, 2j, 3i, 3j), with arrows for best-solution sending and worst-solution sending]
Figure 2: Example of parallel hybridization.
The heuristics used in both execution schemes can also
be identical (several genetic algorithms simultaneously,
for example) in order to offer the opportunity to: test
different parameter sets, study their mutual influence,
and determine those that give the best results for a given
problem.
3.2. The execution model
An execution model, named PARME (Partitioning and
Max-sat Environment), was designed to implement the
different heuristic execution schemes introduced above.
PARME is composed of a two-level hierarchy of
processing nodes. At the highest level, a master processor
manages and synchronizes all the activities of the
different machine components. At the lowest level, a
certain number of slave nodes operate autonomously.
Each one is in charge of executing a given
partitioning algorithm. The nodes communicate with
each other through the master processor, using an
interconnection network, in order to exchange solutions
and to receive from the master the orders and the
parameters needed for each execution.
PARME offers the user the opportunity to execute
several algorithms in parallel, by mapping to each node a
different heuristic or even the same one with different
parameters. All the algorithms cooperate in the search for
the best solution.
PARME comprises, as illustrated in figure 3:
- a master processor that guides the search, collects and
spreads information, distributes and activates the tasks,
etc.;
- a set of slave processors that execute optimization
algorithms. This set constitutes what we call the "node
farm";
- an interconnection network that implements the
functions of communication between the different
processors;
- a set of memories.
[Figure 3 diagram; recoverable labels: Master Processor, LM, SM, Input FIFO, Buffer, Data Bus, Activation Bus, GAP, TSP2, SAP]
SM: Secondary Memory.
TSP: Specific Processor performing Taboo search.
GAP: Specific Processor performing Genetic Algorithm.
LM: Local Memory.
Figure 3: PARME execution model.
3.2.1. The farm node
Each slave node is in charge of executing a
given optimization algorithm (exact method, taboo
search, genetic algorithm…) as illustrated in figure 3.
The general architecture of a node is inspired by the
work of Dours et al. [20], which presents a
generic parallel machine for real-time application
codesign. A node is composed of two
types of processors:
- a computing processor, which executes the associated
optimization algorithm;
- an exchange processor, in charge of collecting input
solutions and sending results to the master.
A local memory (LM) stores the algorithm code and
current results, while a secondary memory is used to
temporarily store interesting data (such as the best solution
found up to the current iteration, the best solution found
at the current iteration, …).
3.2.2. The master processor
The master processor is the core of the machine. It is in charge
of managing the node farm by: mapping the processes onto
the different slave nodes; activating the number of nodes
the application needs; and initializing them with
specific parameters according to the method running on
each node. The master offers the user the opportunity to
suspend or stop a processor during execution. It is also in
charge of coordination and synchronization
between the different nodes. Another task devoted to the
master is to guide the search by collecting information
(temporary best solutions) sent by slave processors and
routing it through the interconnection network towards
other slave processors, according to a communication
scheme described by the user. The master processor also
takes care of communication with the user by
offering various numerical and graphical display tools
that help them follow the evolution of the execution and
analyze the results of the different activated processors.
To perform these tasks, the master processor uses three
memories:
- a local memory that acts as a working space;
- a secondary memory that is used for communication
with the user, by storing partial and final
execution results;
- a FIFO memory, used as input, which collects
asynchronously all the solutions coming from the
different slave processors. This way, the slave nodes can
produce information that is consumed by the master at its
own pace, without the master interrupting its execution
thread to receive it.
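The role of the input FIFO can be illustrated with a thread-safe queue: slaves deposit solutions without waiting for the master, which drains the queue when it chooses to. This is a minimal sketch, not the PARME implementation; the names, the sample solutions and the convention that a lower cost is better are assumptions, and for simplicity the master only drains the queue after the slaves have finished.

```python
import queue
import threading


def slave(node_id, solutions, fifo):
    """Slave node: push (node_id, cost, solution) tuples into the master's FIFO."""
    for cost, solution in solutions:
        fifo.put((node_id, cost, solution))


def master_collect(fifo):
    """Master: drain the FIFO at its own pace, keeping the lowest-cost entry."""
    best = None
    received = 0
    while True:
        try:
            item = fifo.get_nowait()
        except queue.Empty:
            break
        received += 1
        if best is None or item[1] < best[1]:
            best = item
    return best, received


fifo = queue.Queue()
# Hypothetical intermediate results produced by two slave nodes.
work = {0: [(5, "0101"), (3, "0111")], 1: [(4, "1100"), (2, "1110")]}
threads = [threading.Thread(target=slave, args=(nid, sols, fifo))
           for nid, sols in work.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()

best, received = master_collect(fifo)
print("received", received, "solutions; best so far:", best)
```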
3.3. Implementation of PARME
The first implementation of PARME was a
simulator running on a single processor. A thread was
used to simulate each processor, and synchronization
was performed with critical sections. This approach soon
showed its limits once more than two heuristics were
run at the same time.
The second generation of PARME is implemented on
a cluster of PCs. A client-server architecture is used: the
master is implemented by a server application run on a
single machine, while each slave processor is assigned to
a different machine. Sockets are used for
communication.
4. Simulations and results
This section presents a synthesis of some simulations
performed in the PARME environment. The aim here is not
to carry out a detailed study of the different heuristics,
the hybridization schemes between them, the sets of
parameters, or the different strategies. It is rather meant
to emphasize the different possibilities offered by the
environment for automatically solving the partitioning
problem.
The tests presented here were performed on an Intel
Pentium II at 333 MHz with 128 MB of RAM.
For the algorithm evaluation, several benchmarks have
been used but, for space reasons, we focus on a particular
one, which is a real program. It represents the entities of
a genetic algorithm used in the PARME environment. The
objective is to accelerate this algorithm, which is, despite its
rather small size, very time-consuming. This benchmark
is composed of 10 entities. The target architecture is
composed of an instruction-set processor and a hardware
processor.
The cost function is a linear combination of the three
main criteria (respectively: space, execution time and
communication): f(x,y,z) = x+y+z.
The exact method can be applied to the refined
benchmark, since the number of solutions is
relatively low (2^10 = 1024). By exact method, we mean the
exhaustive browse of the solution space. This method can
be used for benchmarks that do not exceed 2^20
combinations. It is nevertheless clear that applying
a manual method is impossible. The results obtained
with the exact method confirm the complexity of the
problem: there exist only two feasible (positive cost)
solutions to this problem. The first one, at the 381st
iteration, gives the following cost: 232299.984375.
The second one is the optimum, reached at the 943rd
iteration and giving the following cost: 213177.
Knowing the optimal solution in advance,
thanks to the exact method, allows us to study, in the
following sections, two of the several heuristics that were
implemented in the PARME environment, and their
parameters.
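The exhaustive browse used above as the exact method can be sketched as a loop over all 2^10 mappings of 10 entities onto two processors. The per-entity figures, the communication term and the convention that a lower cost is better are hypothetical; only the form f(x,y,z) = x+y+z comes from the text.

```python
from itertools import product

# Hypothetical per-entity figures for a 10-entity system (not from the paper).
HW_AREA = [4, 2, 7, 1, 5, 3, 6, 2, 4, 3]   # space used if entity i is in hardware
SW_TIME = [9, 3, 8, 2, 6, 7, 5, 1, 9, 4]   # time spent if entity i is in software


def toy_cost(bits):
    """f(x, y, z) = x + y + z with x: space, y: execution time, z: communication."""
    space = sum(a for a, b in zip(HW_AREA, bits) if b == 1)
    time = sum(t for t, b in zip(SW_TIME, bits) if b == 0)
    # Assumed communication model: one unit per pair of neighbouring
    # entities that sit on different processors.
    comm = sum(1 for u, v in zip(bits, bits[1:]) if u != v)
    return space + time + comm


def exhaustive_search(cost, n_entities):
    """Evaluate every 0/1 mapping and return (best mapping, its cost, count)."""
    best_bits, best_cost, visited = None, None, 0
    for bits in product((0, 1), repeat=n_entities):
        visited += 1
        c = cost(bits)
        if best_cost is None or c < best_cost:
            best_bits, best_cost = bits, c
    return best_bits, best_cost, visited


best_bits, best_cost, visited = exhaustive_search(toy_cost, 10)
print(visited, "solutions evaluated; best mapping:",
      "".join(map(str, best_bits)))
```

The loop body runs once per candidate, so the running time doubles with every entity added, which is why the method is reserved for small benchmarks.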
4.1. Application of a genetic algorithm alone
Table 1 summarizes some of the results obtained using
the genetic algorithm (GA) with different strategies and
sets of parameters. The selection method used in all cases
is "the lottery wheel". Notice that, in all cases, the
initial population was randomly generated.
The simulations show that the solution
"0000000000" (which means that all the entities have
been mapped onto the software processor, named "0"), which
has a cost of -27902, is a local optimum towards which
the algorithm often converges quickly if the parameters
are not well chosen. The mutation and crossover
operators, when used alone, are not sufficient to escape
this local optimum (simulations 2 and 3 in table 1).
It is nevertheless possible, for this benchmark, to reach
one of the feasible solutions, despite the random
generation of the initial population. The parameters
that lead to this result are summarized by
simulation 4 in table 1.
Simulation 3 shows that crossover and mutation
probabilities that are too small prevent the population
from diversifying and lead to an early convergence towards
solutions that always appear to be, in the case of this
benchmark, local optima.
Values that are too high prevent the algorithm from
exploring the solution space correctly, by forcing it to
change search region too frequently. They give the result
that can be observed in the first simulation of table 1, where the
crossover and mutation probabilities are both equal to 1.
Notice that no feasible solution has been reached for
populations under the size of 25. The influence of a
larger population on the execution time is easy to
understand: the operations on each generation
are repeated a greater number of times.
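The "lottery wheel" selection mentioned above is commonly implemented as fitness-proportionate sampling. The sketch below assumes non-negative fitness values where higher is better; costs that can be negative, as in this benchmark, would first have to be shifted or rescaled, and how PARME does this is not stated in the text. All names and values are illustrative.

```python
import random


def lottery_wheel(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    weights = [fitness(ind) for ind in population]
    if min(weights) < 0:
        raise ValueError("lottery-wheel selection needs non-negative fitness")
    total = sum(weights)
    if total == 0:
        return rng.choice(population)   # all slices empty: uniform pick
    pick = rng.uniform(0, total)        # where the wheel stops
    acc = 0.0
    for ind, w in zip(population, weights):
        acc += w                        # walk along the slices of the wheel
        if pick <= acc:
            return ind
    return population[-1]               # guard against rounding error


# Two hypothetical individuals; "b" owns three quarters of the wheel.
scores = {"a": 1.0, "b": 3.0}
rng = random.Random(1)
counts = {"a": 0, "b": 0}
for _ in range(4000):
    counts[lottery_wheel(list(scores), scores.get, rng)] += 1
print(counts)
```

Weaker individuals keep a non-zero chance of being selected, which preserves some diversity but does not by itself prevent the premature convergence discussed above.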
Sim.  NG   S  A     C          M           EC      It  ET
1     100  1  0.01  1 / CRP    1 / CRP     -70846  1   1.96
2     50   -  -     0.9 / CRP  0.07 / CRP  -27902  5   8.97
3     100  1  0.01  0.5 / CRP  0.01 / CRP  -27902  5   11.92
4     60   1  0.01  0.9 / CRP  0.1 / CRP   213177  7   12.96
5     100  5  0.01  0.9 / CRP  0.09 / CRP  213177  8   15.97
6     100  1  0.01  0.9 / CRP  0.9 / CRP   213177  12  20.95
7     25   1  0.01  0.9 / NBS  0.1 / NBS   -27902  7   13.94
Sim.: simulation number (row order); NG: Number of Generations;
S: Sigma (Sharing); A: Alpha (Sharing);
C: Crossover probability / replacement technique;
M: Mutation probability / replacement technique;
NBS: N Best Selection (the N best individuals are selected);
CRP: Children Replace Parents; EC: Elite solution Cost;
It: Iteration number; ET: Execution Time (sec.).
The Iteration number and Execution Time correspond to
the iteration at which the best solution is reached.
The population size is always set to 100.
Table 1: Genetic algorithm results.
Since it was difficult to reach the optimum in 100 per cent
of the simulations, because of the premature convergence
of the GA, we decided to make different heuristics
cooperate. The next section gives some simulation results.
4.2. Parallel hybridization of a taboo search and a Genetic Algorithm
The next step consisted of making heuristics cooperate in
parallel, exchanging their intermediate results.
The simulations presented here use two heuristics: a
genetic algorithm and a taboo search. The master
processor runs on a server machine, while processors P0
(running the genetic algorithm) and P1 (running the
taboo search algorithm) run on client machines. The
following parameters are used:
P0 (genetic algorithm):
- Number of generations: 50
- Population size: 100
- Sharing: Yes
- Frequency of solution sending to master: 4
- Selection method in the input FIFO: best solution
- Selection method: lottery wheel
- Uniform crossover with probability: 0.9
- Replacement technique after crossover: children replace parents
- Mutation probability: 0.07
- Replacement method after mutation: children replace parents
P1 (taboo search):
- Taboo list type: Static
- Taboo list size: 7
- Number of iterations: 50
- Selection method: best move
- Frequency of sending to master: 5
- Number of iterations to explore input FIFO: 5
- Selection method in the input FIFO: best solution
Table 2 gives some results of such hybridization.
Processor  Iteration  Cost of elite solution  Average Time
P0         27         213177                  40.87
P1         30         213177                  9.6
P0: Genetic Algorithm; P1: Taboo Search.
Table 2: Hybrid execution results.
Table 2 shows that the optimal solution was reached in
100% of the simulations using this parallel
hybridization scheme, in which the two heuristics
exchange solutions every five iterations. The genetic
algorithm found the best solution at an earlier iteration
than taboo search (27 versus 30). The average times (time
taken to reach the optimum), however, show the slowness
of the GA (40.87 versus 9.6).
5. Conclusion
This paper presents a parallel environment called
PARME, a research platform that allows
the user to:
- describe new partitioning heuristics;
- test heuristics, sets of parameters and strategies, in order to
determine the "good choices" for each class of problems
encountered;
- experiment with different hybridization scenarios, allowing
different heuristics to work simultaneously and
collaborate in order to take advantage of the benefits of each
method and overcome its weaknesses.
First simulated on a single machine, this environment is
now implemented on a cluster of PCs according to a client-server scheme.
PARME is currently used to experiment with different
hybridization schemes and parameter sets, in order to solve
NP-Complete problems encountered in various areas:
Sat/Max-sat, Data mining…
References:
[1] KUMAR S. AYLOR J.H. JOHNSON B. AND WULF
W.A., "The Codesign of Embedded Systems",
Kluwer Academic Publishers, 1996.
[2] DE MICHELI G. AND GUPTA R., "Hardware/Software
Co-design", Proceedings of the IEEE, Vol.85, N°3,
pp.349-365, 1997.
[3] THEISSINGER M., STRAVERS P. AND VEIT H.,
"CASTLE: an interactive environment for HW-SW
co-design", Proceedings of the Third International
Workshop on Hardware/Software Codesign, pp.203-209, 1994.
[4] ISMAIL T.B., ABID M., O'BRIEN K. AND JERRAYA
A.A., "An approach for hardware-software
codesign", Proceedings of the Fifth International
Workshop on Rapid System Prototyping, Shortening
the Path from Specification to Prototype, pp.73-80,
1994.
[5] EDWARDS M.D., "A development system for
hardware/software cosynthesis using FPGAs",
Second IFIP International Workshop on Hw-Sw
Codesign, 1993.
[6] D'AMBROSIO J.G. AND HU X., "Configuration-level
hardware/software partitioning for real-time
embedded systems", Proceedings of the Third
International Workshop on Hardware/Software
Codesign, pp.34-41, 1994.
[7] POP P., ELES P. AND PENG Z., " Scheduling driven
partitioning of heterogeneous embedded systems",
Dept. of Computer and Information Science,
Linköping University, Sweden, 1998.
[8] ADAMS J.K. AND THOMAS D.E., "Multiple-Process
behavioral synthesis for mixed hardware-software
systems", in Proc. 8th Int. Symp. On System
Synthesis, pp.10-15, 1995.
[9] KALAVADE A. AND LEE E.A., "The extended
partitioning problem: hardware/software mapping
and implementation-bin selection", Proceedings of
the Sixth International Workshop on Rapid Systems
Prototyping, Chapel Hill, NC, June 1995.
[10] GUPTA R.K., "Co-Synthesis of Hardware and
Software for digital embedded systems",
Amsterdam: Kluwer, 1995.
[11] JANTSCH A., ELLERVEE P., OBERG J., HEMANI A.
AND TENHUNEN H., "Hardware-software
partitioning and minimizing memory interface
traffic", Proc. of the European Design Automation
Conference, pp.226-231, Sept. 1994.
[12] GAJSKI D.D., NARAYAN S., RAMACHANDRAN L.
AND VAHID F., "System design methodologies:
aiming at the 100h design cycle", IEEE Trans. on
VLSI Systems, Vol.4, N°1, pp.70-82, March 1996.
[13] HARTENSTEIN R., BECKER J. AND KRESS R.,
"Two-level hardware/software partitioning using
CoDe-X", IEEE Symposium and Workshop on
Engineering of Computer-Based Systems, March
1996.
[14] CHATHA K.S. AND VEMURI R., “MAGELLAN:
Multiway Hardware-Software Partitioning and
Scheduling for Latency Minimization of
Hierarchical Control-Dataflow Task Graphs”,
Proceedings of 9th International Symposium on
Hardware-Software Codesign (CODES 2001), April
25-27, Copenhagen, Denmark, 2001.
[15] BOLCHINI C., POMANTE L., SALICE F. AND SCIUTO
D., “H/W embedded systems: on-line fault detection
in a hardware/software codesign environment:
system partitioning”, Proceedings of the
International Symposium on Systems Synthesis,
Vol. 14, September 2001.
[16] DOURS D. ET AL., "Estimations rapides pour le
partitionnement fonctionnel de systèmes temps-réel
stricts distribués", RENPAR'14, ASF,
SYMPA, Hamamet, Tunisie, 10 - 13 avril 2002.
[17] NOGUERA J. AND BADIA R.M., “A
hardware/software partitioning algorithm for
dynamically reconfigurable architectures”,
Proceedings of the International Conference on
Design Automation and Test in Europe (DATE’01),
March 2001.
[18] SRINIVASAN I., RADHAKRISHNAN S. AND VEMURI
R., “hardware/software partitioning with integrated
hardware design space exploration”, Proc. of the
International Conference on Design Automation and
Test in Europe (DATE’98), pp. 28, February 1998.
[19] VAHID F., “A three-step approach to the functional
partitioning of large behavioural processes”, Proc.
of 11th International Symposium on System
Synthesis, pp. 152-157, Dec. 1998.
[20] DOURS D. ET AL., "Parallélisation automatique pour
la conception de systèmes critiques sûrs", TSI VOL
20,N° 8, P. 1075-1100, 2001.
[21] BENATCHBA K., KOUDIL M., DRIAS H., OUMSALEM
H. AND CHAOUCHE K., "PARME un environnement
pour la résolution du problème Max-Sat", CARI'02,
Oct. 2002.