Learning as Applied to Stochastic Optimization for Standard Cell Placement
Lixin Su, Wray Buntine, A. Richard Newton, and Bradley S. Peters
University of California at Berkeley, Department of EECS
Email: lixinsu@eecs.berkeley.edu
Abstract
Although increasingly important, stochastic
algorithms are often slow, since a large number of
random design perturbations is required to achieve an
acceptable result; they have no built-in "intelligence".
In this work, we use regression to learn a swap
evaluation function while simulated annealing is applied
to the 2D standard-cell placement problem. The learned
evaluation function is then used in a trained simulated
annealing algorithm (TSA). The annealing-quality
improvement of TSA was 15% ~ 43% for the set of
examples used in learning and 7% ~ 21% for new
examples. With the same amount of CPU time, TSA
improved the annealing quality by up to 28% for some of
the benchmark circuits we tested. In addition, the learned
evaluation function successfully predicted the effect of the
windowed-sampling technique, deriving the informally
accepted advantages of windowing from the test set
automatically.
1. INTRODUCTION
Stochastic combinatorial optimization techniques, such
as simulated annealing and genetic algorithms, have
become increasingly important in design automation as
the size of design problems has grown and the design
objectives have become increasingly complex. In
addition, as we move toward deep-submicron
technologies, the cost function must often evolve to
handle a variety of tradeoffs among, for example, area,
power, and timing. Design technologists can often tackle
simplified versions of some of these problems, where the
optimization objective can be stated in terms of a small
number of well-defined variables, using deterministic
algorithms. Such algorithms can produce results as good
as or better than the stochastic approaches in a shorter
period of time. Unfortunately, they run into a variety of
difficulties as the problems scale up and the objective
functions begin to capture the real constraints imposed
on the design, such as complex timing, power dissipation,
or test requirements. Stochastic algorithms are naturally
suited to these larger and more complex problems since
they are very general: they make random perturbations to
designs and, in some regular fashion, let a cost function
determine whether to keep the resulting change.
However, stochastic algorithms are often slow since a
large number of random design perturbations are required
to achieve an acceptable result—they have no built-in
“intelligence”. The goal of this research was to determine
whether statistical learning techniques can improve the
run-time performance of stochastic optimization for a
particular solution quality. In this paper, we present
results for simulated annealing as representative
stochastic optimization approach and the standard-cellbased layout placement problem was selected to evaluate
the utility of such a learning-based approach. The
standard cell problem was selected since it is a very well
explored problem using both deterministic as well as
“manually trained” stochastic approaches. While the
longer-term goal of this research is to apply incremental
probabilistic learning approaches to the stochastic
optimization, such as Bayesian [13,21] learning, in this
paper we report results for a regression-based empirical
approach to learning, to determine whether the overall
approach had a basis for further research.
Stochastic placement algorithms have evolved
significantly since their initial application in the EDA
area more than a decade ago [1], and over that period the
quality of the results they produce has improved
markedly. For example, in the development of the
TimberWolf system [6-8], a general-purpose placement
and routing package based on simulated annealing, many
approaches have been tried to speed up the algorithm:
reducing the computation time of each move, early
rejection of bad moves, efficient and adaptive cooling
schedules combined with windowed sampling, and
hierarchical or clustered annealing. In many ways, these
variations and improvements can be viewed as "manually
learned" approaches, based on considerable experimental
as well as theoretical work over a long period of time.
In this work, we explore another opportunity for
improving the utility of a stochastic algorithm through
automatic learning of the relative importance of various
criteria in the optimization strategy. We learn from
previous annealing runs to distinguish potentially good
moves from bad ones. The good ones will be selected
with a higher probability to expand the search.
Some preliminaries of both simulated annealing and
linear regression are presented in the next section. In
Section 3, we describe in detail how we apply linear
regression as our learning technique and how we apply
the learned information in the modified simulated
annealing. Section 4 contains our experimental results and
in Section 5 we present our conclusions and areas for
future work.
2. SIMULATED ANNEALING AND REGRESSION
There are many versions of the simulated annealing
algorithm [14,17-19]. All of them require the definition
of a move set: the set of local perturbations that can be
made to the current solution. The algorithm is composed
of a series of Metropolis procedures [9,16], each of which
is composed of a series of moves drawn from the move
set according to a so-called proposal distribution [5].
Each move is accepted if it passes a Boltzmann test [1],
controlled by a parameter called temperature; otherwise,
it is rejected. The algorithm starts from an initial solution
and works through the perturbations; when it stops, the
last or best-ever-seen solution is returned as the final
solution.
The move set completely defines the structure of the
solution space, which can be represented as a graph: the
nodes represent solutions, and the edges represent
specific moves leading from one solution to another. In
our implementation of the simulated annealing placement
algorithm, pair-wise cell swaps constitute the move set.
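
To make the loop structure concrete, the following Python sketch implements a generic annealer over a pair-wise-swap move set. This is our illustration, not the paper's implementation; the cost function and cooling parameters are placeholders.

import math, random

def anneal(cost, initial, n_temps=300, swaps_per_temp=20, t0=1.0, alpha=0.97):
    """Simulated annealing with a pair-wise-swap move set (illustrative)."""
    state = list(initial)
    cur = cost(state)
    best, best_cost = list(state), cur
    t = t0
    for _ in range(n_temps):                 # one Metropolis procedure per temperature
        for _ in range(swaps_per_temp):      # moves drawn from the move set
            i, j = random.sample(range(len(state)), 2)
            state[i], state[j] = state[j], state[i]      # propose a pair-wise swap
            new = cost(state)
            delta = new - cur
            # Boltzmann test: accept improvements always, uphill moves
            # with probability exp(-delta / t)
            if delta <= 0 or random.random() < math.exp(-delta / t):
                cur = new
                if cur < best_cost:
                    best, best_cost = list(state), cur
            else:
                state[i], state[j] = state[j], state[i]  # reject: undo the swap
        t *= alpha                           # a simple geometric cooling schedule
    return best, best_cost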
Linear regression is a well-developed statistical
technique for data fitting; it establishes a response model
between the inputs and the output of the form
y = P\beta + \epsilon [12], where P is called the design
matrix. Linear regression estimates \beta as
\hat{\beta} = (P^T P)^{-1} P^T y, the value that minimizes
the Residual Sum of Squares
RSS = (y - P\hat{\beta})^T (y - P\hat{\beta}), where ^
denotes an estimated value. For additional properties of
this estimate, we refer the reader to [12]. The following
fact will be used in the discussion later in this paper.
Fact 1: If any parameter in the model is scaled by \alpha,
the corresponding \hat{\beta} is scaled by \alpha^{-1};
all the other \hat{\beta}'s and the predicted response
remain unchanged.
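
As a quick numerical check of Fact 1, the following sketch (ours; the data are synthetic) fits \hat{\beta} = (P^T P)^{-1} P^T y and verifies the scaling behavior:

import numpy as np

rng = np.random.default_rng(0)
P = rng.random((50, 3))                       # design matrix
y = P @ np.array([2.0, -1.0, 0.5]) + 0.01 * rng.standard_normal(50)

beta_hat = np.linalg.solve(P.T @ P, P.T @ y)  # beta = (P'P)^-1 P'y

# Scale the first predictor by alpha: its coefficient scales by 1/alpha,
# the other coefficients and the fitted response are unchanged.
alpha = 10.0
P2 = P.copy(); P2[:, 0] *= alpha
beta2 = np.linalg.solve(P2.T @ P2, P2.T @ y)
assert np.allclose(beta2[0], beta_hat[0] / alpha)
assert np.allclose(beta2[1:], beta_hat[1:])
assert np.allclose(P2 @ beta2, P @ beta_hat)  # predictions are identical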
3. LEARNING
Learning [11,24], in this paper, means the construction
of an evaluation function for swaps based on past
annealing experience. The procedure we use to derive the
evaluation function is linear regression; however,
incremental/adaptive approaches, in particular those
based on Bayesian theory [13,21] or Support Vector
Machines (SVMs) [24], offer significant potential.
Learning is appealing because the evaluation function
need not be obtained by explicit insight into the problem
and explicit programming. Rather, the problem-specific
details, or better still the problem-domain-specific details,
are extracted automatically from the history of running
the optimization method on representative design
problems and observing the quality of the results.
In conventional simulated annealing, the Boltzmann test
is the only guidance used during the search. Given a
finite amount of computation time, this guidance alone is
clearly not sufficient to yield an efficient search, so we
employ an additional evaluation function to help expand
nodes locally. In choosing an evaluation function, the
goal is that it help guarantee a high-quality final solution.
For our test problem, the evaluation of a swap should
reflect the swap's contribution to the quality of the final
solution. Regression, with the final annealing quality as
its response, was chosen as our learning engine. Each
swap is characterized by a parameter vector, which serves
as the input to the regression model.
3.1 The form of the response model
To this end, we first run conventional simulated
annealing and collect the data used to perform the linear
regression and establish the response model, which
ideally takes the following form:

    y = \beta_0 + \sum_{i=1}^{t} \sum_{j=1}^{q} \sum_{k=1}^{l} \beta_{ijk} p_{ijk}    (1)

In our conventional implementation of simulated
annealing, we have t = 300 temperatures and l = 20 swaps
per temperature, and each move is characterized by q = 7
parameters. p_{ijk} is the jth parameter of the kth swap
tried at the ith temperature if the swap is accepted;
otherwise it is set to zero. y is the final cost-function
value returned by each simulated annealing run. The
model thus tries to correlate each accepted swap with the
final solution quality. To gain flexibility and reduce the
number of predictors, we evenly divide the temperatures
into r = 10 ranges [10] and assume that
\beta_{ijk} = \beta_{jg} if temperature i belongs to the
gth range. The response model then takes the new form:

    y = \beta_0 + \sum_{i=1}^{r} \sum_{j=1}^{q} \beta_j^i P_j^i    (2)

where P_j^i is the sum of the jth parameter over all
accepted swaps in the ith temperature range. Clearly, all
swaps tried in the same temperature range are treated
equally.
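
To make the construction of the predictors concrete, here is a sketch of how the P_j^i of Equation (2) could be accumulated from a logged annealing run. This is our reconstruction; the helper names and logging format are hypothetical.

import numpy as np

N_TEMPS, N_RANGES, Q = 300, 10, 7   # temperatures, ranges, parameters per swap

def predictors(accepted_swaps):
    """accepted_swaps: list of (temperature_index, parameter_vector) pairs.
    Returns the r*q predictor vector of Equation (2): for each temperature
    range, the element-wise sum of the parameter vectors of accepted swaps."""
    P = np.zeros((N_RANGES, Q))
    per_range = N_TEMPS // N_RANGES
    for temp_idx, params in accepted_swaps:
        P[temp_idx // per_range] += params
    return P.ravel()

# One regression row per annealing run: x = predictors(run), response = final
# cost. With many runs stacked into a matrix X and a response vector y:
#   beta = np.linalg.lstsq(np.column_stack([np.ones(len(y)), X]), y, rcond=None)[0]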
3.2 Model parameter definition
The cost function for the placement problem is defined
as the sum of the half-perimeters of the bounding boxes
of all nets in the net-list. The swap parameters selected
for our training experiment, p_i, i = 1,...,7, are defined as
follows; for detailed definitions, please see the full
version of the paper [25].
p_1: the Euclidean distance between cells c1 and c2
p_2: the square of the distance between c1 and the origin
(0,0) (a control parameter)
p_3: the square of the distance between c2 and the origin
(0,0) (a control parameter)
p_4: the connectivity between c1 and c2
p_5: the external connectivity of the cells in the
candidate move
p_6, p_7: the total force change on the swapped cells [15,25]
All of the above parameters and the final annealing
quality were normalized to lie between 0 and 1. As a first
step, an individual model is constructed for each specific
circuit. We expect the normalization to make the
individual models less instance-specific.
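
A sketch of how the per-swap parameter vector might be computed and normalized follows. This is illustrative only: p_5 here is one plausible reading of "external connectivity", and p_6, p_7 are omitted since the force-change definitions appear only in [15,25].

import math

def swap_parameters(c1, c2, pos, connectivity):
    """Parameter vector for swapping cells c1 and c2 (illustrative sketch)."""
    (x1, y1), (x2, y2) = pos[c1], pos[c2]
    p1 = math.hypot(x1 - x2, y1 - y2)      # Euclidean distance between c1, c2
    p2 = x1 * x1 + y1 * y1                 # squared distance of c1 to origin
    p3 = x2 * x2 + y2 * y2                 # squared distance of c2 to origin
    p4 = connectivity[c1].get(c2, 0)       # connectivity between c1 and c2
    # External connectivity: connections of c1 and c2 to cells outside the pair
    p5 = sum(connectivity[c1].values()) + sum(connectivity[c2].values()) - 2 * p4
    return [p1, p2, p3, p4, p5]            # p6, p7 (force change) omitted here

def normalize(column):
    """Min-max normalize one parameter column to [0, 1], as in Section 3.2."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo or 1) for v in column]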
3.3 Application of the evaluation function
Trained simulated annealing (TSA) is the same as the
conventional algorithm except that SmartMove, shown in
Figure 1, is used instead of RandomMove when a new
move is proposed for the Boltzmann test. At each
temperature, the walk in the solution space is a Markov
chain, and this modification of the algorithm [5] makes
theoretical analysis based on Markov chain theory [2-4]
extremely hard. However, we are more concerned with
the practical effect of the modification on our placement
test problem than with the theoretical issues. In this
paper, ChooseSetSize is set to 5 [10].

SmartMove {
    for (i = 1; i <= ChooseSetSize; i++) {
        m_i = RandomMove;
        y(m_i) = \beta_1^r p_1^i + ... + \beta_q^r p_q^i;   // r: current temperature range
    }
    m* = the m_i for which y(m_i) is minimum;
    return(m*);
}

Figure 1: SmartMove
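
A runnable rendering of Figure 1 follows (our sketch; the data-structure names are hypothetical, and the learned \beta's are assumed to be available per temperature range):

import random

def smart_move(beta, range_idx, swap_params, n_cells, choose_set_size=5):
    """Runnable version of the SmartMove pseudocode (a sketch).

    beta[range_idx][j] : learned coefficient of swap parameter j in the
                         current temperature range (from the fitted model);
    swap_params(move)  : the q-dimensional parameter vector of a move."""
    best_move, best_score = None, float("inf")
    for _ in range(choose_set_size):
        move = tuple(random.sample(range(n_cells), 2))  # a random pair-wise swap
        # Score the move with the learned linear segment for this range;
        # a lower predicted contribution to the final cost is better.
        score = sum(b * p for b, p in zip(beta[range_idx], swap_params(move)))
        if score < best_score:
            best_move, best_score = move, score
    return best_move  # then proposed to the Boltzmann test as usual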
However, is it possible to use only part of the response
model as the evaluation function? From the scaling
independence of the model parameters summarized by
Fact 1 in Section 2, the ith segment of the model in effect
correlates the average parameter vector, taken over all
accepted swaps in a given temperature range, with the
final annealing quality. Each segment therefore captures
the "average" information of all accepted swaps in that
temperature range, and so is eligible for use as the
evaluation function.
Another question is, since the input to the learned model
is a summation of accepted-swap parameters, can a single
swap's parameters be used as the input to the evaluation
function? Recall that under the assumption
\beta_{ijk} = \beta_{jg}, Equation (1) and Equation (2)
are equivalent. Hence, the evaluation function can be
seen as estimating the contribution of a single accepted
swap to the final cost function.
4. EXPERIMENTAL RESULTS
We compare three variations of standard-cell
placement: Conventional Simulated Annealing (CSA);
Trained Simulated Annealing (TSA), where the \beta's
are obtained by running CSA on each of the eight
Group 1 benchmark circuits; and an Untrained version of
Simulated Annealing (USA), where the \beta's are all set
equal to 1. The benchmark circuits used in this paper are
listed in Table 1.
NAME        CELLS    NETS     SOURCE   GROUP
C5315         729      907       1       1
C3540         761      811       1       1
I10          1269     1526       1       1
Struct       1888     1920       2       1
C6288        1893     1925       1       1
Industry1    2271     2593       2       1
Primary2     2907     3029       2       1
Biomed       6417     5737       2       1
Industry2   12142    13419       2       2
Industry3   15059    21940       2       2
Avq.small   21854    22118       2       2
Avq.large   25114    25378       2       2
S9234         979     1007       3       2
S15850       3170     3183       3       2
S38417       8375     8403       3       2

Table 1: Benchmarks used in the experiments
The circuits from Source 1 were taken from the MCNC91
combinational benchmark set, the circuits from Source 2
from [26], and the circuits from Source 3 from the
ISCAS89 sequential benchmarks.
All of the above benchmark circuits are divided into two
groups: Group 1 contains the circuits used to construct
the response models, and Group 2 consists of circuits
used only to "blind" test the generality of the trained
model.
Parameters in the experiment such as the number of
temperature regions, the choose set size, and so on were
selected after extensive empirical analysis that
demonstrated that these values were sufficient to
represent the problem domain well [10].
4.1 Self test of the individual and general model
First, we constructed individual models for each of the
Group 1 circuits. The learning data were obtained by
running CSA 5,000 times [25] for each circuit. The 300
temperatures in CSA were evenly divided into 10 ranges,
and the number of swaps per temperature was set to 20 in
view of the size of the net-lists. Although around 30
million swaps were tried during the learning phase, this
is still a small number compared with the total number of
possible swaps, N! \cdot N(N-1)/4 (i.e., the number of
edges in the solution-space graph), where N is the
number of cells in the net-list, at least 700 in our
experiments. Hence it cannot be taken for granted that an
individual model helps improve annealing quality even
on the same circuit. So, we applied each individual model
to its own circuit by running TSA, to see whether the
individual model is robust over the whole, huge solution
space.
Second, we used simple averaging of the parameters
across the training circuits to build an overall general
model. While there are more effective statistical
approaches to combining the individual models, simple
averaging serves as a straightforward, if sub-optimal,
baseline. This general model was then applied to all the
Group 1 circuits by running TSA, to see whether the
general model works as well as the individual models do.
The above TSA runs were conducted while varying the
number of Swaps Per Temperature (SPT) from 10 to 800.
For each fixed SPT, the annealing quality of TSA was
recorded; in this paper, annealing quality means the
average of the final cost-function values over enough
annealing runs that the accuracy is kept within 1% [25].
CSA was run under the same conditions. For each SPT,
we compute the ratio of the annealing quality returned by
TSA to that returned by CSA; (1 - ratio) is the Percentage
Improvement (PI) of TSA over CSA. The best and worst
PIs are summarized in Table 2.
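As a worked example of this metric: at SPT = 20 on C6288, the mean final cost is 39,293 for CSA and 28,686 for TSA (see Table 4), so the ratio is about 0.73 and the PI is about 27%.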
Example      TSA with Individual Model    TSA with General Model
             Best PI      Worst PI        Best PI      Worst PI
C5315          25%          15%             25%          15%
C3540          34%          24%             34%          24%
I10            25%          18%             25%          18%
Struct         39%          26%             39%          26%
C6288          42%          23%             42%          23%
Industry1      33%          20%             34%          20%
Primary2       43%          28%             44%          18%
Biomed         38%          17%             39%          17%

Table 2: Comparison of TSA and CSA for the Group 1 circuits
TSA with both the individual and the general models
works well over a large range of SPT, although the
models were learned with SPT = 20. With the same
number of steps walked in the solution space, the
annealing quality returned by TSA is at least 15% better
than that returned by CSA.
4.2 Blind test of the general model
Next we tested the generality of the general model in a
more demanding way: the general model was applied to
the Group 2 circuits, which played no role in the learning
of the model. The experiments were conducted as for the
Group 1 circuits, except that only the general model was
used in TSA. The results are summarized in Table 3.
Circuit      Best PI   Worst PI
Industry2      40%       10%
Industry3      37%        7%
Avq.small      35%        8%
Avq.large      34%        7%
S9234          37%       27%
S15850         41%       21%
S38417         38%       14%

Table 3: Comparison of TSA and CSA for the Group 2 circuits
With the same number of steps walked in the solution
space, TSA with the general trained model improved the
annealing quality by 7% ~ 41% compared with CSA.
Notice that except for s9234, s15850, and s38417, all of
the "blindly" tested circuits are one to three times larger
than the largest circuit in Group 1, from which the
general model was trained.
4.3 Information captured by the model
We were concerned with whether the model had really
learned something general and useful. The untrained
USA variant was used to determine whether the choose-set
itself, rather than the training, was playing the key role
in the improvement. For a representative example,
C6288, the quality of CSA and USA was almost the
same, with TSA again producing significantly better
results. For the data in Table 4, SPT is 20 and the general
model is used in TSA.
         Min      Mean     Max      Std
CSA     38484    39293    39915     347
USA     37779    38835    39993     361
TSA     27805    28686    29426     343

Table 4: Distribution of solutions of CSA, USA, and TSA
Furthermore, we investigated a typical TSA annealing
run for avq.small. SPT was set to 70, and at each
temperature we recorded the average distance between
the two cells of the swap proposed by the model just
before the Boltzmann test. Over the years, researchers
have found that a windowing approach, in which the
maximum distance between the cells in a candidate move
decreases as the temperature decreases, tends to improve
the overall run-time performance of annealing at little or
no cost in the quality of the final result [1,6]. As the data
in Figure 2 illustrate, the general model, derived purely
from the training runs on the test examples and without
any a priori hints, rediscovered that a windowing
approach leads to better utility. Our approach has indeed
learned something nontrivial!

Figure 2: Average proposed swap distance as a
function of temperature for TSA on avq.small
4.4 CPU timing analysis
The use of the response model in TSA introduces some
computational overhead, since it involves learning, swap
evaluation and selection, and extra updating of the
internal data structures when a swap is accepted. Because
of the generality of the learned model, training need only
be performed once, so it is not a significant cost.
In our implementation, the overall CPU time can be
controlled by changing SPT. CSA was run with SPT
varying from 10 to 1,500 and TSA with SPT varying
from 10 to 800, using the general model; both the
annealing quality and the CPU time were recorded. For
each circuit, we established a piecewise-linear
relationship between annealing quality and CPU time
through linear interpolation of the data points, for both
CSA and TSA. Based on these two relationships, for a
given CPU time, TSA always returns a better annealing
result than CSA. The best percentage improvement for
each circuit was recorded and is summarized in Table 5.
For the same amount of CPU time, the annealing quality
improved by 12% ~ 22% for all the Group 2 circuits
using TSA instead of CSA; for the Group 1 circuits, the
best percentage improvement of final quality was
14% ~ 28%.

Circuit      Best PI      Circuit     Best PI      Circuit     Best PI
Industry2      12%        S15850        21%        struct        20%
Industry3      14%        S38417        17%        C6288         28%
Avq.small      15%        C5315         15%        S9234         22%
Avq.large      14%        C3540         23%        Primary2      32%
Industry1      18%        I10           14%        biomed        18%

Table 5: Best percentage annealing-quality
improvement for a given CPU time
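
The quality-versus-time comparison can be reproduced with ordinary linear interpolation; a minimal sketch follows (ours, with placeholder data arrays):

import numpy as np

def quality_at(cpu_budget, cpu_times, qualities):
    """Piecewise-linear interpolation of annealing quality at a CPU budget.
    cpu_times must be sorted in increasing order (one point per SPT value)."""
    return np.interp(cpu_budget, cpu_times, qualities)

# With measured (CPU time, mean final cost) points for each algorithm, e.g.
#   csa_t, csa_q = ... from the SPT sweep 10..1500 of CSA
#   tsa_t, tsa_q = ... from the SPT sweep 10..800 of TSA
# the improvement at a common budget is:
#   pi = 1.0 - quality_at(budget, tsa_t, tsa_q) / quality_at(budget, csa_t, csa_q)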
5. CONCLUSIONS AND FUTURE WORK
We have demonstrated that a stochastic algorithm, in
this case simulated annealing, can be trained using prior
examples to improve its quality-of-results on a
mainstream EDA application by 7% ~ 43% over a
conventional approach with the same number of moves in
the solution space. For a given CPU time, TSA can
outperform CSA by up to 28%. A simple regression
model was used to train the algorithm; however, we
believe that incremental probabilistic approaches to
training, such as the application of Bayes' rule or SVMs
[24], can also be used. We demonstrated the robustness of
the approach by using representative "blind" examples to
test the general model.
We did not set out to make TSA beat the best existing
simulated annealing algorithms for standard-cell
placement; rather, we showed that TSA with learned
information significantly outperforms CSA. We believe
that, despite our simple implementation of CSA, learning
has revealed some useful properties of the solution space
(within the framework of simulated annealing) for the
standard-cell placement problem.
Hence this learning approach can now be applied to more
complicated simulated annealing algorithms [20,22,23] to
improve their performance. We also believe the
application of general-purpose stochastic algorithms, with
built-in general-purpose approaches to learning, could
eventually form the basis of a general and adaptive
approach to the solution of a variety of VLSI CAD
problems.
6. ACKNOWLEDGEMENTS
This work was supported in part by Semiconductor
Research Corporation under contract DC-324-030, by the
Digital Equipment Corporation and by Intel. The authors
sincerely thank them for their continuing support.
7. REFERENCES
[1] S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi,
"Optimization by simulated annealing", Science, vol. 220,
pp. 671-680, 13 May 1983
[2] D. Mitra, F. Romeo, et al. “Convergence and finite time
behavior of simulated annealing”, Advances in Applied
Probability, 18:747-771, 1986
[3] P. J. M. van Laarhoven and E. H. L. Aarts, "Simulated
annealing: theory and applications", Kluwer Academic
Publishers, 1987
[4] Alistair Sinclair and Mark Jerrum, “Approximate counting,
uniform generation and rapidly mixing markov chains”,
Information and Computation, 82:93-133 (1989)
[5] Brian D. Ripley, “Stochastic Simulation”, John Wiley &
Sons, 1987
[6] Carl Sechen and Alberto Sangiovanni-Vincentelli, “The
TimberWolf placement and routing package”, IEEE
Journal of Solid-State Circuits, vol. sc-20, No. 2, April
1985
[7] Wern-Jieh Sun and Carl Sechen, “Efficient and effective
placement for very large circuits”, IEEE Transactions on
Computer-Aided Design of Integrated Circuits and
Systems, vol.14, No.3, March 1995
[8] William Swartz, et al. “Timing driven placement for large
standard cell circuits”, Proceedings of the 32nd Design
Automation Conference, pp.211-215, 1995
[9] D. F. Wong, et al. “Simulated annealing for VLSI design”,
Kluwer Academic Publishers, 1988
[10] Bradley S. Peters, Lixin Su and Richard Newton,
“Improvement of stochastic optimization through the use
of learning”, UC-Berkeley SRC Review, October 1995
[11] L. G. Valiant, “A theory of the learnable”,
Communications of the ACM, vol.27, November 1984
[12] G. A. F. Seber, “Linear Regression Analysis”, John Wiley
& Sons, 1977
[13] Kai-Fu Lee and Sanjoy Mahajan, “A pattern classification
approach to evaluation function learning”, Artificial
Intelligence, 36 (1988) 1-25
[14] David Aldous and Umesh Vazirani, “Go with the winners
algorithms”, Proceedings. 35th Annual Symposium on
Foundations of Computer Science, pp.492-501, 1994
[15] M. Hannan, P. K. Wolff and B. Agule, "Some
experimental results on placement technique", Proc. 12th
Design Automation Conference, 1976, pp. 214-244
[16] N. Metropolis, et al. "Equations of state calculations by
fast computing machines", J. Chem. Phys. 21 (1953), 1087-92
[17] J. W. Greene and K. J. Supowit, “Simulated annealing
without rejected moves”, International Conference on
Computer Design, 1984, 658-63
[18] T. P. Moore and Aart J. de Geus, “Simulated annealing
controlled by a rule-based expert system”, International
Conference on Computer Aided Design, 1985, 200-202
[19] S. Mallela, et al. “Clustering based simulated annealing
for standard cell placement”, 25th ACM/IEEE Design
Automation Conference, 1988, 312-317
[20] E. H. L. Aarts and P. J. M. van Laarhoven, “A new
polynomial-time cooling schedule”, International
Conference on Computer Aided Design, 1985, 206-208
[21] S. James Press, “Bayesian statistics: principles, models,
and applications”, New York: Wiley, c1989
[22] M. D. Huang, F. Romeo and A. Sangiovanni-Vincentelli,
“An efficient general cooling schedule for simulated
annealing”, International Conference on Computer Aided
Design, 1986, 381-384
[23] Jimmy Lam, Jean-Marc Delosme, “Performance of a new
annealing schedule”, 25th ACM/IEEE Design Automation
Conference, 1988, 306-311
[24] Vladimir N. Vapnik, “The nature of statistical learning
theory”, Springer, 1995
[25] http://www-cad.eecs.berkeley.edu/~lixinsu
[26] http://www.cbl.ncsu.edu/benchmarks