A reliability-driven placement procedure based on thermal force model

Jing Lee
Department of Electronic Engineering
Southern Taiwan University of Technology
1 Nan-Tai St, Yung-Kang City, Tainan Hsien, Taiwan 710, R.O.C.
Email: leejing@mail.stut.edu.tw
Abstract
This paper deals with placing chips on an MCM in chip array style so as to
minimize the system failure rate. The placement procedure begins by constructing an
initial placement based on cooling considerations. Then, a thermal force model is presented that
transforms the reliability-driven placement problem into a set of simultaneous nonlinear
equations whose solution gives the thermal-force-equilibrium locations of the chips. A modified
Newton-Raphson method is used to solve this system of equations. Finally, a chip assignment
procedure transforms the thermal-force-equilibrium placement into an array style placement
with minimum thermal distortion. Two assignment methods are developed and compared with each
other. Experiments on three industrial MCMs designed by IBM show that the obtained
placements significantly improve system reliability over the original designs.
Additionally, a simulated annealing approach is presented to assess the performance of
the proposed method.
Index Terms—Force-directed placement, reliability, thermal force, thermal placement.
1. Introduction
A Multi-Chip Module (MCM) considered in this paper is described as a package
combining multiple chips into a single system-level unit. The resulting module is capable of
handling an entire function. MCMs provide a very high level of system integration, with
hundreds of bare chips that can be placed very close to each other on a substrate. Therefore,
systems based on MCM architectures can achieve much denser circuits and much shorter
interconnect distances among the chips than those in which chips are packaged in a single
chip module and placed on PCBs. However, this denser integration results in higher heat flux
densities at the substrate and creates a very challenging thermal management problem. For
example, IBM’s S/390 servers, which have 35 chips mounted on a 12.7 cm × 12.7 cm
substrate, dissipate 1274 W per module [1]. If the dissipated heat is not properly removed,
higher operating temperatures can occur. A higher temperature not only affects circuit
performance directly by slowing down the transistors on chips, but also decreases their
reliability. As a result, supporting high heat fluxes while maintaining relatively low chip
temperatures is one of the major challenges facing today’s MCM system designers [2], [3].
The MCM placement problem is to assign the exact locations of chips on a substrate
subject to timing, thermal, and routability constraints [4], [5]. Most of the previous placement
methods used for MCM are extensions of well-known methods from the VLSI domain or the
PCB area [6], [7], which are mainly focused on routability. However, temperature distribution
on an MCM substrate is the most important reliability factor. It is conceivable that a
placement tool without thermal considerations could place some chips with high heat
dissipation closely spaced together. This would result in hot spots on the substrate, even
though the total power consumption is constrained. To overcome the problem of overheating,
it is essential to develop good chip placement techniques for optimizing the system reliability;
this is usually referred to as the thermal placement problem.
There are mainly two types of chip placements related to the MCM design, namely, full
custom style and chip array style. In the full custom style placement, the active substrate is
treated as a continuous plane on which chips of varying sizes and shapes are free to reside
anywhere on the active substrate as shown in figure 1(a). On the other hand, in the chip array
style placement, the active substrate is partitioned into a matrix of identical chip sites into
which the chips are placed as shown in figure 1(b). Note that the pitch (i.e., center-to-center
spacing) of the chip sites in the x-direction can differ from the pitch in the y-direction.
Fig. 1. Two types of MCM placements: (a) full custom style; (b) chip array style, with chip-site pitches px and py.
Basically, placements of different styles need different placement algorithms. Previous
studies on the thermal placement problem of MCM thus fall into two major categories:
iterative-based approaches for chip array style placements and force-directed algorithms for
full custom style placements. The iterative-based approaches consist of simulated annealing
approaches [8]-[10] and hybrid genetic algorithms [11], [12]. Force-directed algorithms
include the fuzzy-force [13] and the thermal-force [14] algorithms.
In this study, the thermal-force algorithm of [14] is extended to
cover the chip array style thermal placement problem. The method generates high-quality
solutions efficiently. Another important merit is that the proposed thermal
force model is easily combined with other force models developed for the objectives of
routability and performance [15]-[18]. A multiobjective placement problem can thus be
modeled by a hybrid force model that combines different force models, and solved
by the same technique presented in this paper. In addition, a simulated annealing approach is
presented to assess the performance of the proposed method.
The rest of this paper is organized as follows. Preliminaries, including the
problem description, reliability evaluation, packaging structure, and temperature calculation,
are provided in Section 2. The thermal placement algorithm based on a modified thermal
force model is presented in Section 3. The simulated annealing approach is presented in Section 4.
Examples with computational results are given in Section 5. Conclusions are drawn in Section
6.
2. Preliminaries
2.1 Problem description
The chip array style thermal placement problem can be stated briefly as follows: given a
set of chips $C = \{c_i \mid 1 \le i \le m\}$ with heat dissipations $Q = \{q_i \mid 1 \le i \le m\}$ and a set of
chip sites $S = \{s_j \mid 1 \le j \le n\}$, $n \ge m$, on a two-dimensional substrate as shown in figure 1(b),
assign each chip to one of the chip sites such that the system failure rate is minimized. For
most practical cases, some chips may have been pre-assigned to some chip sites for timing or
cooling considerations. These chips are called fixed chips; the others are called movable chips.
Both types of chips are considered in the study.
2.2 Reliability evaluation
It is well known that most of the physical and chemical processes that can cause
component failure are usually accelerated at elevated temperatures [19]. In addition, the
unevenly distributed power dissipation of chips on a substrate may result in hot spots, which
can induce thermal stresses. When the stresses are severe enough and/or go through enough
cycles, they can cause chip failure, usually by rupturing of the solder joints [20]. Hence,
temperature is generally considered a key parameter in failure mechanisms. An Arrhenius
relation has generally been adopted to model the strong dependence of failure rate on
temperature,
$$\lambda(T_i) = \lambda(T_r)\,\exp\!\left[\frac{E_a}{k}\left(\frac{1}{T_r} - \frac{1}{T_i}\right)\right] \qquad (1)$$
where $\lambda(T_i)$ and $\lambda(T_r)$ are the failure rates of an individual chip at a temperature of $T_i$ K and at
a reference temperature of $T_r$ K, respectively; $E_a$ is the activation energy (eV); and $k$ is
Boltzmann's constant. To determine the failure rate of an individual chip, various
operating parameters need to be specified. Without loss of generality, in this paper all chips
are assumed to have the same $\lambda(T_r)$ and $E_a$, namely 1 Fit (i.e., $10^{-9}$/hour) and 1 eV,
respectively. The objective is to minimize the system failure rate of an MCM, $\lambda_S$, which is
given by the sum of the individual chip failure rates in (1):
$$\lambda_S = \sum_{i=1}^{m} \lambda(T_i) \qquad (2)$$
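As a minimal numerical sketch of equations (1) and (2), the Arrhenius relation can be evaluated as follows. The reference temperature `T_REF_K` and the sample chip temperatures are illustrative assumptions (the text does not state the reference temperature); the defaults of 1 Fit and 1 eV follow the values given above.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant k in eV/K


def chip_failure_rate(t_chip_k, t_ref_k, lambda_ref_fit=1.0, ea_ev=1.0):
    """Eq. (1): failure rate (Fit) of one chip at temperature t_chip_k (kelvin)."""
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_ref_k - 1.0 / t_chip_k)
    return lambda_ref_fit * math.exp(exponent)


def system_failure_rate(chip_temps_k, t_ref_k, lambda_ref_fit=1.0, ea_ev=1.0):
    """Eq. (2): sum of the individual chip failure rates."""
    return sum(chip_failure_rate(t, t_ref_k, lambda_ref_fit, ea_ev)
               for t in chip_temps_k)


T_REF_K = 323.15  # hypothetical reference temperature (50 °C), not from the paper
temps_k = [c + 273.15 for c in (101.0, 91.2, 84.8)]  # illustrative chip temperatures
print(system_failure_rate(temps_k, T_REF_K))
```

Because the exponent grows quickly with temperature, the hottest chips dominate the sum, which is why the placement rules that follow focus on avoiding hot spots.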
2.3 Package structure and temperature evaluation
To estimate $\lambda_S$, one has to know the temperature distribution of the chips on the
substrate. Since the package structure and cooling conditions directly influence the temperature
distribution, they must be determined first. However, a practical package structure usually
is not completely determined at the placement stage, and a practical structure is also too
complex for fast calculation of the temperature profiles. So, a simplified package model, as illustrated in
figure 2, is used to reduce the computation time of calculating the temperature
profiles. The package consists of a sandwich structure formed from the ceramic multilayer
substrate, epoxy adhesive, and aluminum heat sink, with thicknesses of 3.75 mm, 0.076 mm, and
1.27 mm, respectively. Within each layer, the material is assumed to be linear, isotropic, and
homogeneous. Temperature and heat flow are continuous at interfaces between layers.
Thermal conductivities of the multilayer substrate, the epoxy layer, and the heat sink are 39.4
W/mK, 0.276 W/mK, and 195 W/mK, respectively. Chips are treated as heat fluxes applied directly
to the substrate. Heat loss from the package into the board and from the finned side is
quantified by heat transfer coefficients, $h_{top}$ and $h_{bot}$. Air conduction or convection within the
space between the MCM ceramic surface and the cover is neglected (the worst case effect on
results). An adiabatic (zero heat flux) condition is used as the boundary condition at the substrate edges.
The TAMS (Thermal Analyzer for Multilayer Structures) program developed by Ellison
is used to calculate the temperature distributions of chips on the substrate. This computer
program can predict the steady-state temperature in four-layer-rectangular structures with
anisotropic conductivity, lumped thermal resistances, and planar-discrete sources [21].
Fig. 2. Package model: multilayer substrate, epoxy layer, and heat sink, with heat transfer coefficients htop and hbot.
3. Thermal force placement algorithm
The complete placement approach, named Thermal Force Placement (TFP) algorithm,
consists of three phases: generating a ‘good’ initial placement in phase 1, solving the system
of thermal force equations for obtaining a thermal-force-equilibrium (TFE) placement in
phase 2, and transforming the TFE placement to a chip array style placement in phase 3.
3.1 Initial placement
Since a good initial placement usually speeds up the entire placement procedure and
leads to a better final placement, it is desirable to construct a good initial placement. In
general, a good placement for reliability can be constructed by the following two rules:
Rule I. High power chips are preferred to be placed around the border of the substrate.
Rule II. Avoid placing high power chips on neighboring chip sites.
Rule I is based on the fact that chip sites at the border of a substrate have a larger cooling
area than those in the inner region, since a real substrate always has a larger area than its
active substrate. Rule II is used to avoid the hot spots that are usually generated by placing high
power chips close to each other. A simple and effective initial placement algorithm based on
the above two rules is provided below.
Here, the configuration of chip sites is considered as a series of concentric rectangular
rings with a core. The initial placement algorithm begins by sorting the chips in descending
order of their heat dissipations, that is, $q_i \ge q_{i+1}$ for i = 1 to m-1. Then, to satisfy Rule I,
chips are assigned to chip sites from the outer ring to the inner ring. During
ring assignment, every rectangular ring is further partitioned into four arrays of chip sites. As
an example, figure 3(a) shows the ring structure of a 6×6 matrix of chip sites. The chip sites
in the outer ring are partitioned into four arrays, $S_L = \{s_{L,i} \mid 1 \le i \le 5\}$, $S_B = \{s_{B,i} \mid 1 \le i \le 5\}$,
$S_R = \{s_{R,i} \mid 1 \le i \le 5\}$, and $S_T = \{s_{T,i} \mid 1 \le i \le 5\}$. After that, to satisfy Rule II, chips are
assigned to the four arrays one by one in the rotation order SL, SR, ST, SB, SL, SR, ST, SB, and
so forth. Within an array, chips are assigned to chip sites in alternating
order. In the case of figure 3(a), the chip assignment order in the rectangular ring is sL,1, sR,1,
sT,1, sB,1, sL,3, sR,3, sT,3, sB,3, sL,5, sR,5, sT,5, sB,5, sL,2, sR,2, sT,2, sB,2, sL,4, sR,4, sT,4, sB,4. As a result,
the chip placement in Ring II is as depicted in figure 3(b).
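The assignment order within one ring can be sketched in a few lines. The function names `ring_assignment_order` and `place_ring` are hypothetical, and the sketch handles a single ring only; it reproduces the order listed above for arrays of five sites.

```python
def ring_assignment_order(k):
    """Site labels of one rectangular ring, in the order chips are assigned.

    k is the number of chip sites in each of the four arrays SL, SR, ST, SB.
    """
    # Alternating order inside an array: odd positions first, then even ones.
    positions = list(range(1, k + 1, 2)) + list(range(2, k + 1, 2))
    # Rotation order L, R, T, B across the four arrays for every position.
    return [f"s{side},{p}" for p in positions for side in ("L", "R", "T", "B")]


def place_ring(powers, k):
    """Map the highest-power chips to ring sites (chip index -> site label)."""
    by_power = sorted(range(len(powers)), key=lambda i: -powers[i])
    return dict(zip(by_power, ring_assignment_order(k)))


print(ring_assignment_order(5)[:8])
```

Visiting positions 1, 3, 5 before 2, 4 means that consecutive chips in the power-sorted list never land on neighboring sites of the same array, which is the intent of Rule II.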
For ease of programming, all chips are assumed to be movable and all chip sites are
unoccupied at the beginning. If the practical problem contains fixed chips, each fixed
chip swaps positions with the chip occupying its pre-assigned chip site
after the above procedure.
Fig. 3. Initial placement: (a) configuration of chip sites (Ring II, Ring I, and core, with the outer ring partitioned into the arrays SL, SB, SR, and ST); (b) chip placement in Ring II (chips C1 to C20).
3.2 Thermal-force-equilibrium placement
The thermal force model was first presented by the author for thermal placement in the
full custom design style [14]. It is based on the observation that the heat loss conducted from the
substrate edges is insignificant compared to the heat loss from the top and bottom
sides of the substrate. Therefore, a rectangular substrate with several heat sources can be
transformed into an unbounded substrate containing an infinite number of mirror image heat
sources, as shown in figure 4. The unbounded substrate has the same thermophysical
properties as the original bounded substrate [22], [23]. The mirror image substrate at the rth
row and the cth column is called the r-c-substrate. The chips on the r-c-substrate are denoted
by the superscript (r, c).
On the unbounded substrate, the temperature rise of a considered chip
results from two sources: the heat generated by itself and the heat conducted from other chips
(including the image heat sources). In the thermal force model, the heat conducted from other chips is treated as
repulsive forces pushing the considered chip. If the considered chip
can move freely on the substrate, it will move away from these chips to the force
equilibrium position, where it can be expected to have a lower temperature,
since the temperature rise caused by other chips is reduced by enlarging the distances between
them. Since the heat flux decreases with the square of the distance from the heat source in an
infinite body, it is reasonable to formulate the thermal force exerted on $c_i$ by $c_j^{(r,c)}$ as
$$f_{ij}^{(r,c)} = \begin{cases} 0, & \text{if } c_i \text{ is fixed or } c_j^{(r,c)} = c_i \\ \dfrac{q_j}{(\Delta x_{ij}^{(r,c)})^2 + (\Delta y_{ij}^{(r,c)})^2}, & \text{otherwise} \end{cases} \qquad (3)$$
where $\Delta x_{ij}^{(r,c)} = x_j^{(r,c)} - x_i$, $\Delta y_{ij}^{(r,c)} = y_j^{(r,c)} - y_i$, and $(x_j^{(r,c)}, y_j^{(r,c)})$ denotes the coordinates of
$c_j^{(r,c)}$. The location of $c_j$ in the real substrate is simply denoted as $(x_j, y_j)$.
Fig. 4. Replace insulated boundaries by mirror image sources. The real substrate (row 0, column 0) is surrounded by mirror image substrates, shown for rows and columns -2 to 2, each containing mirror images of the heat sources qi and qj.
For most actual MCMs, the chip pitches in the x- and y-directions may be unequal. This
implies that thermal forces of the same magnitude exerted on a chip in different directions
might have different pushing effects. Thus, the thermal-force model must be modified as
$$f_{ij}^{(r,c)} = \begin{cases} 0, & \text{if } c_i \text{ is fixed or } c_j^{(r,c)} = c_i \\ \dfrac{q_j}{(\gamma\,\Delta x_{ij}^{(r,c)})^2 + (\Delta y_{ij}^{(r,c)})^2}, & \text{otherwise} \end{cases} \qquad (4)$$
where $\gamma = p_y/p_x$, and $p_x$ and $p_y$ are the pitches in the x- and y-directions, respectively.
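Expanding (4) to cover all m chips, one obtains the net thermal force on $c_i$ to be
$$\vec{F}_i = \sum_{j=1}^{m}\left\{ \vec{f}_{ij}^{\,(0,0)} + \sum_{a=1}^{\infty}\left[ \sum_{c=-a}^{a}\left( \vec{f}_{ij}^{\,(a,c)} + \vec{f}_{ij}^{\,(-a,c)} \right) + \sum_{r=-a+1}^{a-1}\left( \vec{f}_{ij}^{\,(r,a)} + \vec{f}_{ij}^{\,(r,-a)} \right) \right] \right\} \qquad (5)$$
where $\vec{f}_{ij}^{\,(r,c)}$ is the repulsive force of magnitude $f_{ij}^{(r,c)}$ exerted on $c_i$ by $c_j^{(r,c)}$, and the index a runs over the successive rings of image substrates around the real substrate.
Theoretically, the maximum value of a is infinity. However, since $f_{ij}^{(r,c)}$ is inversely
proportional to the square of the distance $r_{ij}^{(r,c)}$, setting the maximum value of a to five is adequate [14].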
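The truncated sum in equation (5) can be sketched as below. Several details are assumptions made for illustration and are not taken from the paper: the substrate occupies [0, width] × [0, height], image coordinates are obtained by reflection across the substrate edges, each force acts along the straight line from the (image) source to the chip, all chips are movable, and no source coincides with the considered chip. The parameter `gamma` stands for the pitch ratio py/px of equation (4).

```python
import math


def image_coordinate(u, extent, k):
    """Coordinate of a source in the k-th image substrate along one axis."""
    # Even image substrates are translated copies, odd ones are mirrored.
    return k * extent + (u if k % 2 == 0 else extent - u)


def net_thermal_force(i, chips, width, height, gamma=1.0, a_max=5):
    """Net force (Fx, Fy) on chip i; chips is a list of (x, y, q) tuples."""
    xi, yi, _ = chips[i]
    fx = fy = 0.0
    for j, (xj, yj, qj) in enumerate(chips):
        for r in range(-a_max, a_max + 1):        # image rows
            for c in range(-a_max, a_max + 1):    # image columns
                if j == i and r == 0 and c == 0:
                    continue                      # no force of a chip on itself
                dx = image_coordinate(xj, width, c) - xi
                dy = image_coordinate(yj, height, r) - yi
                magnitude = qj / ((gamma * dx) ** 2 + dy ** 2)   # eq. (4)
                dist = math.hypot(dx, dy)
                fx -= magnitude * dx / dist       # repulsion: away from source
                fy -= magnitude * dy / dist
    return fx, fy


chips = [(3.0, 5.0, 1.0), (7.0, 5.0, 1.0)]
print(net_thermal_force(0, chips, width=10.0, height=10.0))
```

Looping r and c over the square range from -a_max to a_max visits the same image substrates as the ring-by-ring sum in (5), only in a different order. In the example the two chips sit closer to each other than to the edges, so the force on the first chip points toward the left edge.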
In this phase, a system of thermal-force equations based on the initial placement is
constructed according to equation (5). Setting $\vec{F}_i = 0$ for $1 \le i \le m$ and solving the system of equations by a
modified Newton-Raphson method [14] gives the TFE positions of the chips $c_i$; the
obtained placement is called a thermal-force-equilibrium (TFE) placement.
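To define a convergent solution properly, set
$$\mathrm{Norm} = \frac{1}{2m}\sum_{i=1}^{m}\left(F_{i,x}^{2} + F_{i,y}^{2}\right) \qquad (6)$$
where $F_{i,x}$ and $F_{i,y}$ are the x- and y-components of the net thermal force on $c_i$. The stopping criterion is set to Norm < 0.0001 according to the experimental results.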
To show the effect of introducing the pitch ratio $\gamma$ into the thermal force model, figures 5(a) and (b)
show two TFE placements for a 30-chip-site module with $\gamma = 1$ (i.e., without consideration
of the unequal pitch effect) and $\gamma = p_y/p_x$ (i.e., with consideration of the unequal pitch
effect), respectively. In the example, twenty-nine chips with different powers are placed on a
substrate of 6×5 chip sites. Both TFE placements have the desirable feature of
spreading the chips out over the substrate, because the chips repel each other in the
thermal force model. This feature is important since it reduces the difficulty of the chip
assignment procedure. In addition, the TFE placement in figure 5(b) is closer to a 6×5
matrix than the one in figure 5(a). So, introducing $\gamma$ into the thermal-force model is helpful
for generating an array style placement. More data about the example, listed in Table 1, will
be described in Section 5.
Fig. 5. TFE placements with power distributions (W) of a 30-chip-site module: (a) $\gamma = 1$; (b) $\gamma = p_y/p_x$.
3.3 Chip assignment
Since a TFE placement is usually not in exact array style, a chip assignment
procedure is needed to transform the TFE placement into an array style placement with
minimum distortion of the thermal field. In this paper, two different assignment techniques,
Linear Assignment (LA) and Thermal Assignment (TA), are proposed and compared with each
other.
3.3.1 Linear assignment
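Let $a_{ij}$ be a Boolean variable describing the assignment of chip $c_i$ to chip site $s_j$:
$$a_{ij} = \begin{cases} 1, & \text{if } c_i \text{ is assigned to } s_j \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$
Each chip must be placed, and at most one chip can be assigned to any chip site. Therefore
$$\sum_{i=1}^{m} a_{ij} \le 1 \quad \text{for } j = 1, 2, \ldots, n \qquad (8)$$
$$\sum_{j=1}^{n} a_{ij} = 1 \quad \text{for } i = 1, 2, \ldots, m \qquad (9)$$
The objective in the linear assignment is to
$$\text{minimize} \quad \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}\, d_{ij} \qquad (10)$$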
where dij is the distance between ci and sj. This linear assignment problem can be solved by
the Hungarian method due to Kuhn [24]. As an example, figure 6 shows the final placement
which is transformed from figure 5(b) by the linear assignment.
A weakness of the linear assignment is that it does not reflect the fact that moving
a hotter chip produces larger thermal distortion in the TFE field than moving a cooler chip.
In addition, it needs m × n memory locations for storing $d_{ij}$. For very large problems, this memory
requirement might rule out the method.
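To make the objective (10) and its constraints concrete, the sketch below solves a tiny instance by exhaustive search. This is deliberately not the Hungarian method of [24]: brute force is swapped in only because it fits in a few lines, and it is usable only for a handful of chips. The function name and the sample coordinates are hypothetical.

```python
import math
from itertools import permutations


def linear_assignment_bruteforce(chip_xy, site_xy):
    """Minimise the total chip-to-site distance of eq. (10) by exhaustive search.

    Returns (sites, cost), where sites[i] is the site index given to chip i.
    """
    m, n = len(chip_xy), len(site_xy)
    # The m x n distance table d_ij whose storage cost is discussed above.
    dist = [[math.dist(c, s) for s in site_xy] for c in chip_xy]
    best_sites, best_cost = None, math.inf
    # Every ordered choice of m distinct sites uses each site at most once
    # and places every chip exactly once.
    for sites in permutations(range(n), m):
        cost = sum(dist[i][j] for i, j in enumerate(sites))
        if cost < best_cost:
            best_sites, best_cost = list(sites), cost
    return best_sites, best_cost


tfe_positions = [(0.1, 0.1), (0.9, 0.2), (0.2, 0.8)]   # illustrative TFE positions
site_centres = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(linear_assignment_bruteforce(tfe_positions, site_centres))
```

Note that the cost depends only on distances, so a 7-W chip and a 30-W chip are treated alike; this is the weakness that the thermal assignment addresses.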
Fig. 6. Final placement with chip temperature (°C) and chip power (W, in parentheses)
distributions of the 30-chip-site module after linear assignment. Values as extracted from the figure:
101 (30), 102 (25), 103 (30), 97.7 (27), 101 (30), 105 (30), 91.2 (16), 91.6 (13), 89.4 (16), 103 (30),
101 (30), 90.2 (16), 88.0 (16), 84.8 (7), 87.1 (16), 103 (25), 90.1 (16), 107 (30), 104 (25), 100 (30),
91.4 (16), 93.6 (13), 93.9 (16), 107 (30), 100 (30), 98.1 (27), 104 (30), 103 (25), 102 (30).
3.3.2 Thermal assignment
Thermal assignment is developed to take the effect of chip power into account. It
consists of three steps: fixing, rebalancing, and assignment. In the fixing step, if only one chip
is located in a chip site in the TFE placement, that chip is assigned to and fixed at the chip site; if
more than one chip is located in the same chip site, the one with the largest heat dissipation is
assigned to and fixed at the chip site. Next, in the rebalancing step, the system of thermal-force
equations is reconstructed for the unfixed chips and solved again to obtain their new TFE positions. After that, each
unfixed chip is assigned to the nearest vacant chip site to complete the placement.
Figure 7 shows the final placement transformed from figure 5(b) by the thermal
assignment. Consider the case of the 7-W chip and the 16-W chip located in the same chip
site in figure 5(b): the 16-W chip occupies the chip site after thermal assignment,
but this is not always the case after linear assignment. So, thermal assignment can
be expected to generate a better final placement than linear assignment because it produces less
thermal distortion in the chip assignment procedure. Experimental evidence is provided in
Section 5. Another important merit is that thermal assignment needs no extra memory
space.
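The fixing and assignment steps can be sketched as follows. The rebalancing step is omitted, since it requires the thermal-force solver, so unfixed chips keep their original TFE positions here. Two further assumptions are made for illustration: a chip is taken to lie in the site whose centre is nearest, and unfixed chips are processed in descending order of power, an order the text does not specify.

```python
import math


def thermal_assignment(chip_xy, powers, site_xy):
    """Fixing and assignment steps of TA; returns {chip index: site index}."""
    sites = range(len(site_xy))
    # Fixing step: each chip claims the site it lies in (nearest site centre);
    # when several chips claim one site, the highest-power chip keeps it.
    holder = {}
    for chip, xy in enumerate(chip_xy):
        site = min(sites, key=lambda s: math.dist(xy, site_xy[s]))
        if site not in holder or powers[chip] > powers[holder[site]]:
            holder[site] = chip
    assignment = {chip: site for site, chip in holder.items()}
    # (The rebalancing step would recompute TFE positions of unfixed chips here.)
    # Assignment step: each unfixed chip goes to its nearest vacant site.
    vacant = [s for s in sites if s not in holder]
    unfixed = [c for c in range(len(chip_xy)) if c not in assignment]
    for chip in sorted(unfixed, key=lambda c: -powers[c]):
        site = min(vacant, key=lambda s: math.dist(chip_xy[chip], site_xy[s]))
        assignment[chip] = site
        vacant.remove(site)
    return assignment


site_centres = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
tfe_positions = [(0.1, 0.1), (0.2, 0.15), (0.9, 0.9)]
print(thermal_assignment(tfe_positions, [16, 7, 30], site_centres))
```

In the example the 16-W and 7-W chips compete for the same site; the 16-W chip keeps it and the 7-W chip is moved, mirroring the case discussed above.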
Fig. 7. Final placement with chip temperature (°C) and chip power (W, in parentheses)
distributions of the 30-chip-site module after thermal assignment. Values as extracted from the figure:
101 (30), 103 (25), 104 (30), 98.7 (27), 102 (30), 106 (30), 91.5 (16), 92.7 (13), 91.9 (16), 105 (30),
101 (30), 90.5 (16), 89.6 (16), 91.5 (16), 101 (25), 103 (25), 90.7 (16), 106 (30), 82.2 (7), 100 (30),
91.4 (16), 93.5 (13), 92.6 (16), 102 (30), 100 (30), 98.0 (27), 104 (30), 103 (25), 100 (30).
4. Simulated annealing approach
Simulated annealing (SA) is a general purpose combinatorial optimization technique that
is analogous to the process of metallurgical annealing, in which a system is heated and then
cooled gradually until the material achieves certain desired metallurgical properties [25]. It
has been shown to produce good quality placements for routability [26], [27]. So, a simulated
annealing approach is also proposed here for comparison with, and validation of, the TFP algorithm.
4.1 Concept of SA
In general, SA starts by randomly generating an initial array style placement P and
initializing the so-called temperature parameter T. Then, at each iteration a candidate
placement P’ is found by randomly selecting two chips of unequal heat dissipation in the current
placement and interchanging their positions. Whether P’ is accepted as the new placement
depends on $\lambda_S(P)$, $\lambda_S(P’)$, and T. P’ replaces P if $\lambda_S(P’) < \lambda_S(P)$ or, in case $\lambda_S(P’) \ge \lambda_S(P)$,
with a probability that is a function of T and $\Delta = \lambda_S(P’) - \lambda_S(P)$. The probability is
generally computed following the Boltzmann distribution, $e^{-\Delta/T}$.
At the beginning, T is set to a very high value such that most of the candidate placements
are accepted. Then T is gradually decreased, so the candidates with higher failure rates than
the current placement have less chance of being accepted. Finally, T is reduced to a very low
value so that only the candidates with lower system failure rate than the current placement are
accepted, and the algorithm converges to a placement of a low system failure rate. The
procedure of SA is shown in figure 8.
4.2 Initial temperature
The initial temperature must be chosen so that almost all candidate solutions are
accepted initially. That is, the initial acceptance rate $\chi_0$ must be close to unity. Here we use the
method of [28] to determine the initial temperature $T_0$. In this method, $T_0$ is
determined from the average change of the cost function over several random
perturbations of a trial solution, using
$$\chi_0 = e^{-\Delta_{av}/T_0} \qquad (11)$$
where $\Delta_{av}$ is the average change of $\Delta$.
From Eq. (11), we get
$$T_0 = \frac{\Delta_{av}}{\ln(\chi_0^{-1})} \qquad (12)$$
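Equation (12) is straightforward to evaluate once a few trial perturbations have been sampled. In the sketch below, the function name is hypothetical and the average change is taken as the mean absolute cost change, which is one possible reading of "average change"; the sampled values are made up for illustration.

```python
import math


def initial_temperature(deltas, chi0):
    """Eq. (12): T0 from sampled cost changes and the target acceptance rate chi0."""
    if not 0.0 < chi0 < 1.0:
        raise ValueError("chi0 must lie strictly between 0 and 1")
    # Assumption: the average change is the mean of |delta| over the samples.
    delta_av = sum(abs(d) for d in deltas) / len(deltas)
    return delta_av / math.log(1.0 / chi0)


sampled_deltas = [1200.0, -800.0, 450.0, -1500.0]   # illustrative cost changes (Fit)
t0 = initial_temperature(sampled_deltas, chi0=0.95)
print(t0)
```

With an acceptance rate of 0.95, ln(1/0.95) is about 0.051, so the initial temperature comes out at roughly 19.5 times the average cost change.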
procedure SA( )
    T ← T0                          // initial temperature
    P ← random initial placement
    λS(P) ← TAMS(P)                 // calculate the temperature distribution and λS of P
                                    // with the TAMS package
    while (T > Tf)                  // Tf is the frozen temperature
        for i ← 1 to M              // M is the length of the Markov chain
            P’ ← PERTURB(P)         // randomly interchange two chips of unequal power
            λS(P’) ← TAMS(P’)
            Δ ← λS(P’) - λS(P)
            if Δ < 0 or RANDOM(0,1) < e^(-Δ/T) then
                P ← P’
                λS(P) ← λS(P’)
            endif
        end
        T ← SCHEDULE(T)
    end
    OUTPUT(P)
Fig. 8. Procedure of SA.
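A runnable counterpart of the procedure in Fig. 8 is sketched below under stated assumptions. TAMS is not available here, so a toy cost function (`toy_cost`, hypothetical) stands in for the thermal analysis: chips sit on a single row and the cost penalises high-power neighbors. A plain geometric schedule is used, and, unlike Fig. 8, the sketch also remembers the best placement seen.

```python
import math
import random


def simulated_annealing(placement, powers, cost, t0, t_final, chain_len,
                        alpha=0.9, seed=0):
    """Fig. 8 with a geometric schedule; also remembers the best placement seen."""
    rng = random.Random(seed)
    current, current_cost = list(placement), cost(placement)
    best, best_cost = list(current), current_cost
    t = t0
    while t > t_final:
        for _ in range(chain_len):
            a, b = rng.sample(range(len(current)), 2)
            if powers[current[a]] == powers[current[b]]:
                continue  # PERTURB only swaps chips of unequal power
            candidate = list(current)
            candidate[a], candidate[b] = candidate[b], candidate[a]
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Accept improvements always, deteriorations with probability e^(-delta/T).
            if delta < 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = list(current), current_cost
        t *= alpha
    return best, best_cost


POWERS = [30, 30, 30, 7, 7, 7]   # illustrative chip powers (W)


def toy_cost(placement):
    """Stand-in for TAMS: sum of power products of neighbouring sites on a row."""
    return sum(POWERS[placement[s]] * POWERS[placement[s + 1]]
               for s in range(len(placement) - 1))


start = [0, 1, 2, 3, 4, 5]       # start[s] is the chip placed at site s
best, best_cost = simulated_annealing(start, POWERS, toy_cost,
                                      t0=1000.0, t_final=1.0, chain_len=12)
print(best, best_cost)
```

In the real procedure every call to the cost function is a full thermal analysis, which is why the number of candidate evaluations dominates the runtime of SA.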
4.3 Cooling schedule
The choice of an appropriate cooling schedule is crucial for the performance of the
simulated annealing algorithm. The cooling schedule defines the value of T at each iteration k,
$T_{k+1} = f(T_k, k)$. Theoretical results on non-homogeneous Markov chains [29] state that, under
particular conditions on the cooling schedule, simulated annealing converges in
probability to a global optimum as $k \to \infty$. The logarithmic law fulfils these conditions. However, it
is too slow for practical applications. Instead, the geometric law $T_{k+1} = \alpha T_k$ is frequently
used, where α is the cooling rate parameter, which is determined experimentally. Kirkpatrick
et al. [25] first proposed this rule with α = 0.9. To save runtime, the cooling schedule is
usually divided into two or three stages. TimberWolf [27], the most widely used and
successful placement package based on simulated annealing, suggests α = 0.8, 0.95, and 0.8 in
the high, medium, and low temperature ranges, respectively. We tried several different cooling
strategies for the tested problems, and the best one is a two-stage schedule: α is set to
0.85 until the acceptance probability falls below 0.6, and then α is set to 0.95.
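The two-stage geometric schedule can be sketched as below. It is assumed here that "the probability" is measured as the fraction of accepted candidates in the most recent Markov chain, and that the schedule does not switch back once the slow stage has started; the function name and the sample acceptance ratios are illustrative.

```python
def two_stage_temperatures(t0, acceptance_ratios, alpha_fast=0.85,
                           alpha_slow=0.95, threshold=0.6):
    """Temperatures produced by the two-stage geometric law T_{k+1} = alpha * T_k.

    acceptance_ratios[k] is the fraction of candidates accepted at temperature k.
    """
    temps = [t0]
    slow_stage = False
    for ratio in acceptance_ratios:
        if ratio < threshold:
            slow_stage = True            # switch once, then stay in the slow stage
        alpha = alpha_slow if slow_stage else alpha_fast
        temps.append(alpha * temps[-1])
    return temps


print(two_stage_temperatures(100.0, [0.9, 0.5, 0.7]))
```

Cooling quickly while almost everything is accepted saves evaluations in the regime where the search is close to random, and the slower rate is reserved for the range where acceptance becomes selective.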
4.4 Length of Markov chain
The length of the Markov chain, M, is the number of trials at each temperature. In general,
the larger M is, the better the results obtained. However, the runtime increases
rapidly. A recommended value of M as a function of the problem size m is given in [30]. We
tried several different functions for M, and the experimental results show that
$$M = 2m \qquad (13)$$
is the best one. Setting M to a value higher than that given by Eq. (13) does not further improve the final
solutions in our tested cases.
5. Examples and computational results
The present algorithms have been implemented in C and run on a 2.8 GHz
Pentium IV personal computer. Three industrial MCMs designed by IBM are used to test the
proposed algorithms.
5.1 The benchmark MCMs
Table 1 summarizes some information about the benchmark MCMs. The 30-chip-site
module and the 31-chip-site module are derived from IBM’s GEMI modules [31], [32]. The
real substrate and the active substrate of the 30-chip-site module are squares measuring
127.5 mm and 111.85 mm on a side, respectively. The active substrate is divided into a 6×5
matrix of identical chip sites. Each chip site is a rectangle of 18.45 mm × 22.05 mm.
Twenty-nine chips, with chip sizes ranging from 12.9 mm to 17.4 mm on a side and
power ranging from 7 W to 30 W, are placed on the substrate. The 31-chip-site module is
very similar to the 30-chip-site module except for the first row, which has six chip sites. Each chip site
in the first row is a square with an 18.45 mm edge.
A large example, the 121-chip-site module, is derived from IBM’s TCM with 110 chips [33].
The substrate and the active substrate are squares measuring 127.5 mm and 118.8
mm on a side, respectively. The active substrate is divided into an 11×11 matrix of identical chip sites.
Each chip and each chip site is a square with a 6.5 mm and a 10.8 mm edge, respectively. The chip
power ranges from 8.9 W to 20 W.
Table 1. MCM information

| Module | No. of chips | Chip sites | Px (mm) | Py (mm) | Power dissipation (power × number of chips) |
|---|---|---|---|---|---|
| 30-chip-site | 29 | 6×5 | 18.45 | 22.05 | 30W×12, 27W×2, 25W×4, 16W×8, 13W×2, 7W×1 |
| 31-chip-site | 31 | 1×6 | 18.45 | 18.45 | 30W×14, 27W×2, 25W×4, 16W×8, 13W×2, 7W×1 |
| | | 5×5 | 18.45 | 22.05 | |
| 121-chip-site | 110 | 11×11 | 10.8 | 10.8 | 20W×17, 19.5W×4, 17.3W×1, 16.9W×8, 15W×2, 14.7W×1, 14.3W×1, 13.9W×1, 13.8W×1, 13.6W×1, 10.9W×1, 10W×1, 8.9W×71 |
For a fair comparison, all examples are treated as having the same package
structure and cooling conditions as depicted in figure 2. Because the average heat flux is very
high in all examples, the cooling conditions selected are forced convection at a velocity of 2.5
m/s for the top side, and jet impingement at a velocity of 0.5 m/s for the bottom side.
Correspondingly, $h_{top}$ and $h_{bot}$ are 43.8 W/m²K and 832 W/m²K, respectively.
5.2 Comparisons between TA and LA
The thermal performance, $\lambda_S$, and runtime for the tested examples are summarized in Table
2, where $T_{av}$, $T_{SD}$, $T_{max}$, $T_{min}$, and $\Delta T$ are the mean, standard deviation, maximum,
minimum, and range of the chip temperatures, respectively. As expected, $\lambda_S$ with TA is
0.7% to 3.4% lower than $\lambda_S$ with LA. Thus, TA is superior to LA.
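Table 2. Comparisons between TA and LA.

| MCM | Algorithm | Tav (°C) | TSD (°C) | Tmax (°C) | Tmin (°C) | ΔT (°C) | λS (Fit) | Runtime (s) |
|---|---|---|---|---|---|---|---|---|
| 30-chip-site | TA | 97.8 | 6.1 | 106 | 82 | 24 | 6.90 × 10^4 | 15.5 |
| 30-chip-site | LA | 97.7 | 6.4 | 107 | 85 | 22 | 6.95 × 10^4 | 15.3 |
| 31-chip-site | TA | 102.8 | 6.6 | 112 | 88 | 24 | 11.4 × 10^4 | 21.4 |
| 31-chip-site | LA | 102.6 | 8.2 | 115 | 86 | 29 | 11.8 × 10^4 | 21.2 |
| 121-chip-site | TA | 161.0 | 10.3 | 184 | 149 | 35 | 2.75 × 10^7 | 1407 |
| 121-chip-site | LA | 160.9 | 11.1 | 187 | 148 | 39 | 2.81 × 10^7 | 1413 |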
5.3 Comparisons between TFP and IBM
The results obtained by IBM, the TFP algorithm, and the SA approach are compared in Table 3.
The IBM chip placements are taken from [32], [33], but the temperature distributions
are analyzed under the present package structure and cooling conditions. The results show
that the placements obtained by the TFP algorithm achieve a significant reliability improvement over the
original placements by IBM. The ratios of $\lambda_S$ with TFP to $\lambda_S$ with IBM are 88.5%, 94.2%, and
22.0% for the 30-chip-site module, the 31-chip-site module, and the 121-chip-site
module, respectively.
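The quoted ratios follow directly from the system failure rates reported in Table 3, as the short check below shows. The dictionary layout is only an illustrative way of holding the table values.

```python
# System failure rates (Fit) from Table 3: (IBM placement, TFP placement).
failure_rates = {
    "30-chip-site": (7.8e4, 6.9e4),
    "31-chip-site": (12.1e4, 11.4e4),
    "121-chip-site": (12.5e7, 2.75e7),
}

ratios = {name: 100.0 * tfp / ibm for name, (ibm, tfp) in failure_rates.items()}
for name, ratio in ratios.items():
    print(f"{name}: TFP/IBM = {ratio:.1f}%")
```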
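Table 3. Comparisons among IBM, TFP, and SA

| MCM | Algorithm | Tav (°C) | TSD (°C) | Tmax (°C) | Tmin (°C) | ΔT (°C) | λS (Fit) | Runtime (s) |
|---|---|---|---|---|---|---|---|---|
| 30-chip-site | IBM | 98.4 | 8.1 | 108 | 80 | 28 | 7.8 × 10^4 | N.A. |
| 30-chip-site | TFP | 97.8 | 6.1 | 106 | 82 | 24 | 6.9 × 10^4 | 15.5 |
| 30-chip-site | SA | 97.4 | 6.2 | 104 | 85 | 19 | 6.7 × 10^4 | 4390 |
| 31-chip-site | IBM | 103 | 7.9 | 115 | 88 | 27 | 12.1 × 10^4 | N.A. |
| 31-chip-site | TFP | 102.8 | 6.6 | 112 | 88 | 24 | 11.4 × 10^4 | 21.4 |
| 31-chip-site | SA | 102.7 | 6.2 | 110 | 91 | 19 | 11.0 × 10^4 | 6450 |
| 121-chip-site | IBM | 158.9 | 31.6 | 228 | 124 | 104 | 12.5 × 10^7 | N.A. |
| 121-chip-site | TFP | 161.0 | 10.3 | 184 | 149 | 35 | 2.75 × 10^7 | 1407 |
| 121-chip-site | SA | 160.6 | 9.3 | 181 | 151 | 30 | 2.6 × 10^7 | 195963 |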
For further comparison, the chip placements with powers and temperature distributions
of the 121-chip-site module obtained by IBM and by the TFP algorithm are shown in figures 9(a)
and (b), respectively. In figure 9(a), all the high power chips are placed in the central region of
the substrate, and the low power chips are placed at the border of the substrate. So, the
central region is much hotter than the border of the substrate. The value
of $\Delta T$ is as high as 104 °C. By contrast, in figure 9(b), most high power chips are placed around
the border of the substrate, and most low power chips are placed in the central region of the
substrate. So, the temperature distribution on the substrate shown in figure 9(b) is more
uniform than that in figure 9(a); $\Delta T$ in figure 9(b) is only 35 °C.
5.4 Comparisons between TFP and SA
In Table 3, one can see that the system failure rates λS obtained by the TFP algorithm are
only 3.0%, 3.6%, and 5.8% higher than those obtained by the SA approach for the
30-chip-site, 31-chip-site, and 121-chip-site modules, respectively. However, the TFP
runtimes are less than 1% of the SA runtimes in all three cases. Note that SA needs to
calculate the temperature distribution and λS for every candidate placement in order to
decide whether the candidate placement is accepted. Calculating the temperature distribution
on the substrate is very time consuming, since it requires solving a three-dimensional partial
differential equation. Virtually all iterative approaches suffer from the same difficulty, so
these methods are generally unsuitable even for medium-sized thermal placement problems.
By contrast, the TFP algorithm calculates the temperature distribution and λS only once,
after the final placement has been determined. It can therefore be applied more effectively
to large thermal placement problems.
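The cost of a per-candidate evaluation can be illustrated with a minimal pairwise-interchange annealing sketch in Python. It is not the SA implementation used in the experiments: the cooling schedule, the names `anneal` and `toy_cost`, and the toy cost itself are assumptions made for illustration, and `toy_cost` merely stands in for the expensive thermal solve and failure-rate calculation. The counter reports how many times that expensive step would run.

```python
import math
import random


def toy_cost(powers):
    # Hypothetical stand-in, NOT the paper's thermal model: penalize
    # high-power chips sitting next to each other on a one-dimensional row.
    return sum(a * b for a, b in zip(powers, powers[1:]))


def anneal(placement, cost, t_start=1.0, t_end=1e-3, alpha=0.9,
           moves_per_temp=20, seed=0):
    """Pairwise-interchange simulated annealing (illustrative schedule).

    `cost` is called once per candidate placement, which is where a real
    thermal placement would have to solve for the temperature distribution.
    Returns (best placement, best cost, number of cost evaluations).
    """
    rng = random.Random(seed)
    current = list(placement)
    current_cost = cost(current)
    evaluations = 1
    best, best_cost = list(current), current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            # Candidate move: swap the chips at two distinct sites.
            i, j = rng.sample(range(len(current)), 2)
            current[i], current[j] = current[j], current[i]
            candidate_cost = cost(current)
            evaluations += 1
            delta = candidate_cost - current_cost
            # Metropolis rule: always accept improvements, sometimes accept
            # worse candidates with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current_cost = candidate_cost
                if current_cost < best_cost:
                    best, best_cost = list(current), current_cost
            else:
                # Rejected: undo the swap.
                current[i], current[j] = current[j], current[i]
        t *= alpha  # geometric cooling
    return best, best_cost, evaluations


powers = [20.0, 20.0, 16.9, 8.9, 8.9, 8.9]  # chip powers in W, illustrative
best, best_cost, n_evals = anneal(powers, toy_cost)
print(best, best_cost, n_evals)
```

Even this six-chip toy run calls the cost function more than a thousand times. When each call is a three-dimensional thermal solve, the evaluation count dominates the runtime, whereas a constructive method needs that solve only for the final placement.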
Fig. 9 Placements with chip power (in parentheses, W) and temperature (°C) distributions:
(a) placement by IBM; (b) placement by TFP algorithm.
5.5 Relationship between thermal performances and system reliability
It seems reasonable to expect that a placement with a higher average temperature also has a
higher system failure rate. However, the present study shows that this is not the case. For the
121-chip-site module, λS for TFP is only 22% of λS for IBM, yet Tav for TFP is 2.1 °C higher
than Tav for IBM. So, Tav is not a good measure of λS. Note that when chip positions are
interchanged in a placement, a decrease in one chip's temperature is always accompanied by
an increase in another chip's temperature, so the value of Tav does not differ significantly
between placements. Instead, placements with better reliability always have a lower ΔT,
because they have a lower Tmax and a higher Tmin. This conclusion is very important, since
it means that a reliability-optimization placement problem can be simplified to a
uniform-temperature-distribution placement problem.
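The link between temperature spread and system failure rate can be checked directly against the Table 3 rows for the 31-chip and 121-chip modules. The short Python sketch below is only a reading aid: the dictionary layout and the name `summarize` are assumptions, and it assumes that the first two numeric columns of each row are Tmax and Tmin (consistent with the third column being their difference) and that the fourth is the system failure rate.

```python
# Rows copied from Table 3 as (Tmax, Tmin, system failure rate).
# Assumption: temperatures are in degrees C; column meaning as stated above.
table3 = {
    "31 chips": {"IBM": (115, 88, 12.1e4), "TFP": (112, 88, 11.4e4)},
    "121 chips": {"IBM": (228, 124, 12.5e7), "TFP": (184, 149, 2.75e7)},
}


def summarize(table):
    """Return the temperature spread and failure rate relative to IBM."""
    summary = {}
    for module, methods in table.items():
        ibm_rate = methods["IBM"][2]
        summary[module] = {
            method: {
                "delta_t": t_max - t_min,
                "rate_vs_ibm": round(rate / ibm_rate, 2),
            }
            for method, (t_max, t_min, rate) in methods.items()
        }
    return summary


summary = summarize(table3)
for module, methods in summary.items():
    for method, stats in methods.items():
        print(module, method, stats)
```

For both modules the placement with the smaller spread between Tmax and Tmin is also the one with the lower failure rate, and the reduction is largest for the 121-chip module, where the spread changes the most.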
6. Conclusion
This paper deals with placing chips in array style on an MCM substrate to minimize the
system failure rate. A TFP algorithm and a simulated annealing approach are presented for
this problem. Three industrial MCMs designed by IBM are examined using the proposed
methods. Compared with the simulated annealing approach and the IBM designs, the TFP
algorithm generates excellent solutions both effectively and efficiently. Since the thermal
placement problem is an NP-hard combinatorial optimization problem, it is very important to
obtain 'good' solutions efficiently, especially for very large problems. The TFP algorithm
may be the best method when both effectiveness and efficiency are considered. In addition,
by combining the proposed force model with other force models developed for the objectives
of routability and performance, a multiobjective optimal placement problem can be solved by
the same technique presented in this paper.
Acknowledgments
This work was supported by the National Science Council, Republic of China under
contract no. NSC91-2215-E-218-005. I am pleased to thank Professor Jung-Hua Chou for his
valuable comments and suggestions concerning this paper.
References
[1] Katopis GA. The evolution of ceramic packages for S/390 servers. Proc. of the Pacific
Rim/ASME Intern Electron Packag; 2001. p. 13-20.
[2] Kam T, Rawat S, Kirkpatrick D, Roy R, Spirakis GS, Sherwani N. EDA challenges facing
future microprocessor design. IEEE Trans on Computer-Aided Des 2000; 19(12):
1498-506.
[3] Garimella SV, Joshi YK, Bar-Cohen A, Mahajan R, Toh KC, et al. Thermal challenges in
next generation electronic system-summary of panel presentations and discussions. IEEE
Trans on Comp Packag Technol 2002; 25(4): 569-75.
[4] Moresco LL. Electronic system packaging: The search for manufacturing the optimum in
a sea of constraints. IEEE Trans Comp Hybrids Manufact Technol 1990; 13(3): 494-508.
[5] Sandborn PA, Moreno H. Conceptual Design of Multichip Modules and Systems. MA:
Kluwer; 1994.
[6] Sherwani NA, Yu Q, Badida S. Introduction to Multichip Modules. New York: Wiley;
1995.
[7] Sherwani NA. Algorithms for VLSI Physical Design Automation. 3rd ed. MA: Kluwer;
1999.
[8] Chao KY, Wong DF. Thermal placement for high-performance multi-chip modules.
Intern. Conf. on Computer Design; 1995, p. 218-23.
[9] Lampaert K, Gielen G, Sansen W. Thermally constrained placement of small-power IC’s
and multi-chip modules. Thirteenth IEEE SEMI-THERM Symp; 1997, p. 106-11.
[10] Tsai CH, Kang SM. Cell-level placement for improving substrate thermal distribution.
IEEE Trans on Computer-Aided Des 2000; 19(2): 253-66.
[11] Tang MC, Carothers JD. Consideration of thermal constraints during multichip module
placement. Electronics Letters 1997; 33(12): 1043-5.
[12] Beebe C, Carothers JD, Ortega A. Object-oriented thermal placement using an accurate
heat model. Proc of the 32nd Hawaii Intern Conf on System Sciences; 1999, p. 1-10.
[13] Huang YJ, Guo MH, Fu SL. Reliability and routability consideration for MCM
placement. Microelectronics Reliab 2002; 42: 83-91.
[14] Lee J. Thermal placement algorithm based on heat conduction analogy. IEEE Trans on
Comp Packag Technol 2003; 26(2): 473-82.
[15] Quinn N, Breuer M. A forced directed component placement procedure for printed circuit
boards. IEEE Trans Circuits Syst 1979; 26(6): 377-88.
[16] Osterman MD, Pecht M. Placement for reliability and routability of convectively cooled
PWB's. IEEE Trans on Computer-Aided Des 1990; 9(7): 734-44.
[17] Eisenmann H, Johannes FM. Generic global placement and floorplanning. Proc of the
ACM/IEEE Design Automation Conf; 1998. p. 269-74.
[18] Mo F, Tabbara A, Brayton RK. A force-directed macro-cell placer. Proc of Intern Conf on
Computer-Aided Des; 2000. p. 177-80.
[19] Lall P, Pecht M, Hakim EB. Influence of temperature on microelectronics and system
reliability. FL: CRC Press; 1997.
[20] Steinberg DS. Thermal stress failures of surface mounted components. In: Bar-Cohen A,
Kraus AD, editors. Advances in thermal modeling of electronic components and system.
vol. 3, New York: ASME/IEEE Press; 1993, p. 257-302.
[21] Ellison GN. Thermal computations for electronic equipment, New York: Van Nostrand
Reinhold; 1983.
[22] Dean DJ. Thermal design of electronic circuit boards and packages. Scotland:
Electrochemical Publications Ltd; 1985.
[23] Palisoc AL, Lee CC. Exact thermal representation of multilayer rectangular structures by
infinite plate structures using the method of images. J Appl Phys 1988; 64(12): 6851-7.
[24] Burkard RE, Cela E. Linear assignment problems and extensions. In: Du DZ and
Pardalos PM, editors. Handbook of combinatorial optimization, supplement volume A.
MA: Kluwer; 1999, p. 75-149.
[25] Kirkpatrick S, Gelatt CD, Vecchi MP. Optimization by simulated annealing. Science
1983; 220(4598): 671-80.
[26] Sait SM, Youssef H. Iterative computer algorithms with applications in engineering.
California: IEEE Press; 1999.
[27] Sechen C, Sangiovanni-Vincentelli A. The TimberWolf placement and routing package.
IEEE J. Solid-State Circuits 1985; 20(2): 510-22.
[28] Johnson DS, Aragon CR, McGeoch LA, Schevon C. Optimization by simulated annealing:
an experimental evaluation. Part I. AT&T Bell Lab, Murray Hill; 1987.
[29] Aarts EHL, Lenstra JK. Local search in combinatorial optimization, UK: Wiley; 1997.
[30] Shahookar K, Mazumder P. VLSI cell placement techniques. ACM Computing Surveys
1991; 23(2): 143-220.
[31] Katopis GA, Becker WD, Mazzawy TR, Stoller H. Packaging 1000 MIPS for IBM’s
S/390 G5 server. Electronic Components and Technology Conf 1999; p. 680-5.
[32] Katopis GA, Becker WD, Mazzawy TR, Smith HH, Vakirtzis CK, et al. MCM
technology and design for the S/390 G5 system. IBM J of Research and Development
1999; 43(5/6): p. 21-49.
[33] Goth GF, Zumbrunnen ML, Moran KP. Dual-Tapered-Piston (DTP) module cooling for
IBM enterprise system/9000 systems. IBM J of Research and Development 1992; 36(4):
p. 805-16.