ABSTRACT
Title of dissertation:
ELECTRO-THERMAL CODESIGN IN LIQUID
COOLED 3D ICS: PUSHING THE POWER-PERFORMANCE LIMITS
Bing Shi, Doctor of Philosophy, 2013
Dissertation directed by: Professor Ankur Srivastava
Department of Electrical and Computer Engineering
The performance improvement of today’s computer systems is usually accompanied by increased chip power consumption and system temperature. Modern
CPUs dissipate an average of 70–100 W, while spatial and temporal power
variations result in hotspots with even higher power density (up to 300 W/cm²).
The coming years will continue to witness a significant increase in CPU power dissipation due to advanced multi-core architectures and 3D integration technologies.
Increased chip power density, leakage power and system temperature have become major obstacles to further improvement in chip
performance. The conventional air-cooled heat sink has proven
insufficient for three-dimensional integrated circuits (3D-ICs), so better cooling
solutions are necessary. Micro-fluidic cooling, which integrates micro-channel heat
sinks into the silicon substrates of the chip and uses liquid flow to remove heat from inside
the chip, is an effective active cooling scheme for 3D-ICs. While micro-fluidic
cooling provides excellent cooling for 3D-ICs, the associated overhead (the cooling power
consumed by the pump to drive the coolant through the micro-channels) is significant.
Moreover, the 3D-IC structure imposes constraints on micro-channel locations
(essentially resource conflicts with through-silicon vias (TSVs) or other structures).
In this work, we investigate optimized micro-channel configurations that address the aforementioned considerations. We develop three micro-channel structures
(hotspot-optimized cooling configuration, bended micro-channel and hybrid cooling
network) that provide sufficient cooling to the 3D-IC with minimum cooling power
overhead while remaining compatible with existing electrical structures
such as TSVs. These configurations achieve up to 70% cooling power savings
compared with a configuration without any optimization. Based on these configurations, we then develop a micro-fluidic cooling based dynamic thermal management
approach that maintains the chip temperature by controlling the fluid flow rate
(pressure drop) through the micro-channels. These cooling configurations are designed
after the electrical parts and are therefore compatible with the current standard IC
design flow.
Furthermore, the electrical, thermal, cooling and mechanical aspects of a 3D-IC
are interdependent. Hence the conventional design flow, which designs the cooling configuration after the electrical design is finished, results in inefficiencies. To
overcome this problem, we investigate an electrical-thermal co-design methodology for 3D-ICs. Two co-design problems are explored: TSV assignment and
micro-channel placement co-design, and gate sizing and fluidic cooling co-design.
The experimental results show that co-design enables a fundamental power-performance improvement over the conventional design flow, which separates the
electrical and cooling design. For example, the gate sizing and fluidic cooling co-design achieves 12% power savings under the same circuit timing constraint and
16% circuit speedup under the same power budget.
ELECTRO-THERMAL CODESIGN IN LIQUID COOLED 3D ICS:
PUSHING THE POWER-PERFORMANCE LIMITS
by
Bing Shi
Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
2013
Advisory Committee:
Professor Ankur Srivastava, Chair/Advisor
Professor Joseph JaJa
Professor Shuvra Bhattacharyya
Professor Donald Yeung
Professor Doron Levy
© Copyright by
Bing Shi
2013
ACKNOWLEDGEMENT
I would like to thank my advisor, Professor Ankur Srivastava, for the support
and guidance he has provided throughout my time in the Ph.D. program. Thank
you for introducing me to the world of Electronic Design Automation, for giving me
so many opportunities, for helping me every step of the way, and for encouraging me in
those hard times.
In addition, I would like to thank Professor Joseph JaJa, who helped me a lot with
my Ph.D. oral qualifying exam, research proposal and Ph.D. dissertation. I would
like to thank Professor Shuvra Bhattacharyya for his support in my competition
for the ECE dissertation fellowship.
I would also like to thank my committee members, Professor Joseph JaJa, Professor Shuvra Bhattacharyya, Professor Donald Yeung and Professor Doron Levy,
for their time, comments and feedback.
I also thank all past and present members of our lab: Domenic Forte, Yufu
Zhang, Caleb Serafy, Tiantao Lu and Chongxi Bao, for their help, friendship, and
support. I am grateful for all the fun times we have shared throughout the years.
Finally, I would like to thank my parents and my family for their ongoing
support and encouragement. Thank you for all of your love and support over the
course of my long journey as an academic.
Table of Contents
List of Figures
List of Tables
1 Introduction
  1.1 Thermal Issues in 3D-ICs
  1.2 Conventional Dynamic Thermal Management
  1.3 Interlayer Micro-fluidic Cooling
  1.4 Interdependency between Electrical, Thermal, Reliability and Cooling
  1.5 Advantage of Electrical and Cooling System Co-Design
  1.6 Thesis Outline
2 Background
  2.1 Basics of Three Dimensional Integrated Circuit
  2.2 Fundamental Characteristics of Fluids in Micro-channels
    2.2.1 Conservation Law of Fluid Dynamics
    2.2.2 Dimensionless Numbers in Fluid Mechanics
    2.2.3 Single and Two Phase Flow
    2.2.4 Laminar and Turbulent Flow
  2.3 Thermal Modeling of 3D-IC with Micro-fluidic Cooling
    2.3.1 Distributed RC Thermal Model
    2.3.2 Cooling Performance of Micro-channels
    2.3.3 Overall Thermal Model of 3D-IC with Micro-channels
    2.3.4 Thermal Impact of TSVs
  2.4 Modeling of Power Consumption
    2.4.1 Dynamic Power Consumption
    2.4.2 Leakage Power Consumption
    2.4.3 Micro-channel Cooling Power
      2.4.3.1 Straight Micro-channels
      2.4.3.2 Micro-channels with Bends
3 Design of Micro-fluidic Cooling Configurations for 3D-ICs
  3.1 Motivation of Micro-Fluidic Cooling
  3.2 Micro-channel Design Considerations/Constraints
    3.2.1 Cooling Power Consumption
    3.2.2 Non-uniform Power Profile
    3.2.3 TSV Constraint
    3.2.4 Thermal stress
  3.3 Hotspot Optimized Non-Uniform Micro-channel
    3.3.1 Problem Formulation
    3.3.2 Heuristic for Micro-channel Placement
    3.3.3 Workload-balanced Initial Micro-channel Distribution
    3.3.4 Micro-channel Cost Assignment
  3.4 TSV Constrained Bended Micro-channel
    3.4.1 Motivation of Using Bended Micro-channel
    3.4.2 Problem Formulation
    3.4.3 Overall Micro-channel Design Flow
    3.4.4 Mincost Flow Based Micro-channel Design
      3.4.4.1 Initialization of Minimum Cost Flow Network
      3.4.4.2 Cost Assignment
    3.4.5 Micro-channel Refinement
      3.4.5.1 Temperature and Pumping Power Analysis
      3.4.5.2 Iterative Micro-channel Optimization
  3.5 Hybrid Cooling Network
    3.5.1 Motivation of Hybrid Cooling Network
    3.5.2 Algorithm for Hybrid Cooling Network Design
    3.5.3 Micro-channel Priority Assignment/Update
    3.5.4 Thermal TSV Allocation and Sizing
      3.5.4.1 Basic Thermal TSV Placement Approach
      3.5.4.2 Modified Thermal TSV Allocation and Sizing Approach
      3.5.4.3 Finding Maximum Independent Set E
  3.6 Considering Thermal Variations
  3.7 Cooling Performance of Micro-channel Designs
  3.8 Runtime Thermal Management Using Micro-channels
    3.8.1 Algorithm for Micro-fluidic Based DTM
    3.8.2 Performance of Micro-channel Based DTM
  3.9 Summary
4 Co-design of Electrical and Fluidic Cooling Systems
  4.1 Motivation for Co-Design
  4.2 Co-optimization of TSV Assignment and Micro-Channel Placement
    4.2.1 Problem Formulation
    4.2.2 Algorithm for TSV Assignment and Micro-channel Placement Co-optimization
      4.2.2.1 Overall Design Flow
      4.2.2.2 Multi-commodity Minimum Cost Flow Formulation
      4.2.2.3 Iterative Optimization
    4.2.3 Computational Simplifications
      4.2.3.1 Multi Layer Case
      4.2.3.2 Two Layer Case
    4.2.4 Performance of TSV Assignment and Micro-channel Placement Co-design
      4.2.4.1 Comparison of Wirelength and Pumping Power
      4.2.4.2 Tradeoff Between Wirelength and Pumping Power
  4.3 Co-optimization of Gate Sizing and Micro-Fluidic Cooling
    4.3.1 Motivation of Simultaneous Gate Sizing and Micro-channel Distribution
    4.3.2 Modeling of Gate Delay
    4.3.3 Problem Formulation
    4.3.4 Algorithm for Gate Sizing and Micro-channel Placement Co-optimization
      4.3.4.1 Step 1: Ideal Heat Sink and Gate Size Co-optimization
      4.3.4.2 Step 2: Micro-channel Distribution for Ideal Case
      4.3.4.3 Step 3: Gate Size and Grid Temperature Refinement
      4.3.4.4 Step 4: Micro-channel Distribution Refinement
      4.3.4.5 Step 5: Re-iteration and Stopping Criteria
    4.3.5 Performance of Gate Sizing and Micro-channel Placement Co-design
      4.3.5.1 Comparison of Power Consumption
      4.3.5.2 Comparison of Circuit Delay
    4.3.6 Power-Performance Tradeoff
  4.4 Summary
5 Conclusion and Discussion
  5.1 Conclusion
  5.2 Future Work
Bibliography
List of Figures
1.1 Interdependency between Electrical, Thermal, Reliability and Cooling
2.1 Stacked 3D-IC with micro-channel cooling system
2.2 Control volume of fluid
2.3 (a)-(f) Two phase flow patterns, (g) Evaporation process in a channel
2.4 Comparison of single and two phase flow
2.5 (a) Laminar flow pattern, (b) Turbulent flow pattern, (c) Transitional flow pattern
2.6 Fluid in micro-channel with bends
2.7 RC network for 3D-IC thermal modeling
2.8 Micro-channel thermal model
2.9 Thermal resistive network of one 3D-IC layer with micro-channels
2.10 A 3D-IC grid with thermal TSV
2.11 Exponential leakage model versus quadratic leakage model
3.1 Micro-channel and TSV configuration
3.2 Pumping power versus chip power consumption
3.3 Thermal stress inside and surrounding TSV (a) when chip temperature is 100℃, (b) when chip temperature is 50℃ (assuming stress free temperature is 250℃)
3.4 Potential locations of micro-channels: (a) uniform spreading of micro-channels, (b) workload-balanced micro-channel spreading
3.5 Example of formulating mincost flow network, (a) 3D-IC structure, (b) abstract grid graph, (c) minimum cost flow network
3.6 (a) Cost initialization, (b) Cost update
3.7 Example of silicon layer thermal profile with TSV and (a) straight, (b) bended micro-channels
3.8 Example of micro-channel infrastructure design using minimum cost flow
3.9 Micro-channel infrastructure design flow
3.10 Cost assignment
3.11 Examples of (a) unbalanced cooling demand, (b) different number of bends
3.12 Example of pairwise cooling workload balance
3.13 Examples of bend elimination
3.14 Overall design flow of micro-channel and thermal TSV co-optimization
3.15 Change in interdependence region of a grid (a) after allocating or enlarging a thermal TSV, (b) after shrinking a thermal TSV
3.16 Flow chart of micro-channel placement
3.17 Comparison of Pumping Power
3.18 Runtime pressure drop control versus fixed pressure drop for (a) group L, (b) group M, (c) group H
4.1 Conventional chip design flow
4.2 Thermal profile of one 3D-IC layer, and an example of TSV and micro-channel allocation where TSVs constraint us from allocating micro-channels at hotspots
4.3 Overall design flow of MCMCF based algorithm
4.4 3D-IC with potential TSV and micro-channel locations
4.5 Multi-commodity min-cost flow formulation
4.6 Computationally simplifying transformation for multi-layer case
4.7 Computationally simplifying transformation for two-layer case
4.8 Tradeoff between wirelength and pumping power
4.9 Overall design flow
4.10 Delay versus power tradeoff for benchmark 1
List of Tables
3.1 Comparison of pumping power
4.1 Problem formulation
4.2 Benchmark Information
4.3 Comparison between our approach, TSV first and channel first approach (P_pump: W, WL: m, temperature: °C)
4.4 Comparison of total power consumption (power: W, t_cons: ns)
4.5 Comparison of circuit performance (power: W, t_cons: ns)
Chapter 1
Introduction
Moore’s law has predicted a spectacular exponential growth in chip performance. However, in recent years, such performance improvements are slowing down,
leading the research community to investigate alternative technologies that can restore the expected Moore’s law rhythm in the functionality and cost of electronic
products.
The three dimensional integrated circuit (3D-IC), which contains two or more
layers of active electronic components that are stacked vertically, has become a significant technology for achieving continued performance improvements. The 3D-IC
allows a significant increase in device densities, as well as faster on-chip communications compared with equivalent 2D circuits due to the shortening of interconnection
length and increased bandwidth [30][88]. Besides the performance improvement,
3D-IC can also result in overall system energy savings and co-integration of heterogeneous components [22][49].
Despite these advantages, the 3D-IC also brings forth new challenges to chip
thermal management due to the stacked structure.
1.1 Thermal Issues in 3D-ICs
Modern CPUs dissipate an average of 70–100 W, while spatial and
temporal power variations result in hotspots with even higher power density (up
to 300 W/cm²). The coming years will continue to witness a significant increase in
CPU power dissipation due to advanced multi-core architectures and 3D integration technologies. An increase in CPU power density is usually accompanied by a drastic
increase in chip temperature. Increased chip power density, leakage power and system temperature have become major obstacles to further
improvements in chip performance. The advent of 3D integration technology exacerbates the on-chip thermal problems, since the power density increases dramatically
due to the stacking of several microprocessor chips, and also due to the constraints imposed
on heat flow paths (by several intermediate layers).
Recent data show that more than 50% of all IC failures are related to thermal issues [58]. For instance, excessive temperature reduces the electron and hole
mobilities, which increases circuit propagation delay [44][83]; thermal variations and on-chip hotspots cause reliability problems such as circuit mismatch and
reduced chip lifetime (due to the cumulative damage caused by excessive temperature) [29][10][50]. Hence, loss of performance and reliability due to unpredictable
thermal hotspots has become a major issue and a limiting factor for further performance improvement in modern computer systems.
Furthermore, with continued scaling, the impact of leakage power is growing
as well. Today, up to 50% (or even more) of the total power consumption is leakage
power [38]. It has been shown that leakage and temperature are highly interdependent: higher temperature increases the leakage power, which in turn further increases the temperature [47][80][27]. This interdependency increases both the importance
and the difficulty of chip thermal management. The interdependence between temperature and leakage has been known for years, and several attempts have been made
at design time to better estimate and control leakage and temperature through
various design decisions [66][81]. For example, [66] estimates the chip thermal and
leakage profiles while accounting for their interdependence, and [81] estimates the
chip leakage profile while accounting for thermal variability.
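The temperature-leakage feedback loop described above can be illustrated with a small fixed-point iteration. This is only a minimal sketch under stated assumptions: the chip is lumped into a single thermal resistance, leakage follows an exponential temperature dependence, and every parameter value and name (`steady_state`, `r_thermal`, `beta`, the 150 °C runaway limit) is hypothetical rather than taken from this dissertation or the cited works.

```python
import math


def steady_state(p_dynamic, r_thermal, t_ambient=45.0,
                 leak_ref=10.0, t_ref=45.0, beta=0.02,
                 t_runaway=150.0, max_iter=200, tol=1e-6):
    """Iterate temperature -> leakage -> temperature until it settles.

    Assumed lumped model (illustrative only):
      P_leak(T) = leak_ref * exp(beta * (T - t_ref))       [W]
      T         = t_ambient + r_thermal * (P_dyn + P_leak)  [deg C]
    Returns (temperature, leakage power), or None if the loop
    exceeds t_runaway or fails to converge (thermal runaway).
    """
    temp = t_ambient
    for _ in range(max_iter):
        p_leak = leak_ref * math.exp(beta * (temp - t_ref))
        new_temp = t_ambient + r_thermal * (p_dynamic + p_leak)
        if new_temp > t_runaway:
            return None  # leakage and temperature keep feeding each other
        if abs(new_temp - temp) < tol:
            return new_temp, p_leak
        temp = new_temp
    return None


stable = steady_state(p_dynamic=60.0, r_thermal=0.3)   # good cooling
runaway = steady_state(p_dynamic=60.0, r_thermal=1.0)  # poor cooling
```

With these assumed parameters the lower thermal resistance settles at a stable temperature with leakage above its reference value, while the higher thermal resistance takes the runaway branch. This is the sense in which better cooling can reduce total power, not just temperature.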
In conventional computer systems, the thermal issues within the chip are handled at the package level by attaching a large heat sink on top of the chip, which
dissipates heat into the surrounding air, together with air-based cooling devices
such as fans and air conditioners. Such “remote cooling” approaches are limited
in the following ways:
1. Failure to account for temporal variations: processor operation varies greatly
during runtime due to the nature of different applications and data.
The demand for resources by different applications also varies. Processor
operation and resource demand influence the on-chip power and thermal states,
hence the chip power and thermal profiles change during runtime.
Conventional air cooling, which ignores the real-time chip operation
and cooling demand, is therefore inefficient.
2. Failure to account for spatial variations: the chip power and thermal profiles also
vary spatially, since different parts of the chip exhibit different
switching activities. Such variations result in thermal hotspots, which are
important issues in electronic systems. Conventional heat sinks usually
provide uniform cooling, which is very inefficient when there are hotspots.
3. Insufficient cooling capability: conventional heat sinks are usually attached at
the top of the chip, which makes them ineffective at removing heat from inside the
chip. For 3D-ICs in particular, air-based cooling has already proven
insufficient. As illustrated in [8], if two 100 W/cm² microprocessors are
stacked on top of each other, the power density becomes 200 W/cm², which is
beyond the heat removal limits of air-cooled heat sinks.
Many efforts have been made to mitigate the thermal issues in CPU
chips. These efforts can be classified into three categories: CPU thermal management schemes [11][16][20][21][53][64][63], materials with better thermal properties
[67][79] and advanced cooling schemes [43][84][46][9]. In this work, we focus on
new cooling technology and dynamic thermal management for 3D-ICs.
1.2 Conventional Dynamic Thermal Management
Chip performance and temperature are usually closely related. To
improve the performance delivered by the microprocessor, we could increase the
transistor integration density of the chip, or increase the supply voltage and clock
frequency, both of which lead to increased chip power consumption and temperature. Dynamic thermal management (DTM), where the chip operation is controlled during
runtime to curtail thermal emergencies, can better address the temporal and
spatial variations of the on-chip power and thermal profiles (in addition to the conventional package-level cooling scheme). In conventional DTM schemes, thermal
management is achieved by controlling processor knobs such as core frequency
and supply voltage [64][25][13][41] or task scheduling [93], which in effect control the power dissipation in different parts of the chip. These schemes basically
manage the chip temperature by controlling the heat dissipation rate and pattern.
For example, in dynamic voltage and frequency scaling (DVFS), the supply voltage
and operating frequency of microprocessors are dynamically controlled to reduce
the chip power consumption, thereby reducing the temperature as well. However,
decreasing the supply voltage or operating frequency potentially reduces performance.
Hence in conventional DTM schemes, constraining the chip temperature is usually accompanied by reductions in performance.
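The power-performance tradeoff of DVFS can be made concrete with the standard first-order switching power model, P = a · C · V² · f. The sketch below is illustrative only: the effective capacitance, voltages and frequencies are hypothetical operating points, and leakage power is deliberately ignored.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """First-order switching power P = activity * C_eff * Vdd^2 * f, in watts."""
    return activity * c_eff * vdd ** 2 * freq


def dvfs_saving(v_scale, f_scale):
    """Fraction of dynamic power saved when Vdd and f are scaled by these factors."""
    return 1.0 - v_scale ** 2 * f_scale


# Hypothetical operating points: 1 nF effective capacitance, 1.0 V at 3 GHz,
# then both supply voltage and frequency scaled to 80% of nominal.
nominal = dynamic_power(1e-9, 1.0, 3.0e9)
scaled = dynamic_power(1e-9, 0.8, 2.4e9)
saving = dvfs_saving(0.8, 0.8)
```

Under this model, scaling both knobs to 80% removes roughly 49% of the dynamic power but also gives up 20% of the clock frequency, which is the performance penalty that conventional DTM accepts.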
With the continued application of conventional thermal management techniques, many of today’s electronic systems fall short of their inherent physical
limits while operating at the highest power dissipation allowed by the available thermal management technology. CMOS, telecommunications, active sensing and imaging have undergone tremendous technological innovation over the last 40 years.
However, despite the need and the potential for enhanced thermal management,
electronic cooling technologies have changed very little in the past two or three
decades, continuing instead to implement a “remote cooling” paradigm with only
incremental improvements in performance.
1.3 Interlayer Micro-fluidic Cooling
Relying on the conventional air-cooled heat sink for the thermal management
of 3D-ICs could have catastrophic consequences. On one hand, due to the strong
thermal-performance interdependency, designers will resort to aggressive shutdown or slowdown to limit on-chip temperatures, resulting in significant underutilization of the available devices, hurting overall performance and leading 3D-ICs
to experience greater fractions of dark silicon than 2D-ICs. On
the other hand, the heat removal challenge could limit the number of 3D layers or
the physical design optimization space. Consequently, if the performance and energy
efficiency promised by 3D integration are to be realized, the thermal challenge needs
to be actively addressed.
Micro-fluidic cooling, which integrates micro-channel heat sinks into the silicon
substrates of the chip and uses liquid flow to remove heat from inside the
chip, can overcome this limitation. It has been reported to support heat dissipation
higher than 700 W/cm² [84]. Despite this excellent cooling capability, micro-channel based heat removal carries an overhead: the cooling
system consumes extra energy to pump the coolant through the channels. This has motivated a body of work that attempts to improve micro-channel
cooling effectiveness (thereby reducing the cooling energy consumption) by: a) controlling dimensional parameters such as channel width, height and aspect ratio
[42][84], b) investigating more sophisticated micro-channel infrastructures such as
cross-linked micro-channels [32], micro-pin-fins [52][59], and tree- or serpentine-shaped
micro-channels [68][23], and c) using hotspot-optimized micro-channel structures
[12][76]. Recently, micro-channel cooling has also been adopted in dynamic
thermal management to control the runtime CPU performance and chip temperature by tuning the fluid flow rate through the channels [19][18].
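The idea of managing temperature by tuning the flow rate, and the pumping cost this incurs, can be sketched with a toy model. The code below is not the algorithm of this dissertation or of [19][18]; it is a hedged illustration that assumes laminar flow (flow rate proportional to pressure drop through a hypothetical hydraulic resistance `r_hyd`), pumping power equal to pressure drop times volumetric flow rate, and a lumped thermal resistance made of a fixed conduction term plus a coolant heating term. All parameter values are invented for illustration.

```python
def pump_power(dp, r_hyd):
    """Pumping power = pressure drop * flow rate, with Q = dp / r_hyd (laminar)."""
    return dp * dp / r_hyd


def peak_temp(chip_power, dp, r_hyd, t_inlet=25.0, r_cond=0.2, cv=4.18e6):
    """Lumped peak temperature in deg C for a given pressure drop in Pa.

    cv is the volumetric heat capacity of the coolant (about 4.18e6 J/(m^3 K)
    for water); r_cond is an assumed fixed conduction resistance in K/W.
    """
    flow = dp / r_hyd  # volumetric flow rate in m^3/s
    return t_inlet + chip_power * (r_cond + 1.0 / (cv * flow))


def min_pressure_drop(chip_power, t_max, r_hyd,
                      dp_lo=1e3, dp_hi=2e5, iters=50):
    """Bisection for the smallest pressure drop that keeps peak_temp <= t_max."""
    if peak_temp(chip_power, dp_hi, r_hyd) > t_max:
        return None  # even the maximum pressure drop cannot cool the chip
    for _ in range(iters):
        mid = 0.5 * (dp_lo + dp_hi)
        if peak_temp(chip_power, mid, r_hyd) <= t_max:
            dp_hi = mid
        else:
            dp_lo = mid
    return dp_hi


R_HYD = 1e10  # hypothetical hydraulic resistance in Pa*s/m^3
dp_100w = min_pressure_drop(100.0, 85.0, R_HYD)
dp_150w = min_pressure_drop(150.0, 85.0, R_HYD)
```

In this toy model a hotter workload needs a larger pressure drop, and pumping power grows with the square of that pressure drop. Running the pump at a fixed worst-case setting therefore wastes energy whenever the chip is lightly loaded, which is the motivation for adjusting the flow at runtime.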
1.4 Interdependency between Electrical, Thermal, Reliability and
Cooling
The electrical, thermal, reliability and cooling aspects of 3D-ICs are all interdependent. As Figure 1.1 shows, higher performance usually leads to
greater chip power consumption and generates more heat. An increase in chip temperature
has several detrimental effects:
1. It results in higher circuit delay or delay uncertainty, which in turn limits
the performance improvement.
2. Due to the interdependency between temperature and leakage power, an increase
in chip temperature further increases the power consumption.
3. High chip temperature also exacerbates electromigration, which causes
reliability loss.
On the other hand, the heat level inside the chip also determines the micro-fluidic
cooling system configuration, which in turn changes the temperature and power distribution (due to the thermal-power interdependence), thereby changing the circuit delay
and chip lifetime. Furthermore, micro-fluidic cooling also causes
greater thermal gradients. Such thermal gradients and the reduced chip temperature
cause greater thermal stress, which on one hand might result in mechanical
reliability issues such as crack formation, and on the other hand changes the
transistor delay.
Figure 1.1: Interdependency between Electrical, Thermal, Reliability and Cooling
1.5 Advantage of Electrical and Cooling System Co-Design
In the conventional IC design flow, the electrical parts of the chip are designed
first. The cooling system is then designed around the electrical system already in
place. However, due to the aforementioned interdependency, such a design methodology (which separates electrical and cooling system design) is inefficient. Co-design of
the electrical and fluidic cooling systems is necessary. It has the following advantages:
1. Higher cooling in timing-critical areas results in better-performing designs,
since transistor delay increases with temperature.
2. Higher cooling in timing-critical areas enables us to aggressively pursue
performance enhancements that dissipate more power, such as increasing the supply voltage.
This results in higher performance without impacting temperature, since
the extra heat can be managed by micro-fluidics.
3. Design optimization (placement, floorplanning, etc.) can be more aggressive, since temperature issues can
be addressed by aggressive cooling.
4. Increasing the cooling level in high-leakage areas helps reduce the overall
power, since leakage is a highly non-linear function of temperature. The reduction in leakage may be significant enough to make the increase in pumping power
irrelevant.
5. Micro-fluidics may impact silicon thickness, causing TSV performance degradation. With smart electrical design, this degradation could potentially be removed. For example, degradation in TSV performance could be overcome by
stronger drivers.
1.6 Thesis Outline
In this work, we investigate the optimization of micro-fluidic cooling systems that
provide sufficient cooling to the 3D-IC with minimum overhead while
addressing the design constraints imposed by the 3D-IC structure. Three
micro-fluidic cooling configurations are proposed: hotspot-optimized non-uniform
micro-channel, bended micro-channel and hybrid cooling network.
To fully exploit the interdependency among the electrical, thermal, reliability and cooling aspects of 3D-ICs, we also investigate electrical and micro-fluidic
co-design methodologies. With co-design, fundamental power-performance improvements can be achieved.
This dissertation is organized in five chapters. Following this introduction, Chapter 2 gives
the background on 3D-ICs and micro-fluidic cooling. In that chapter, we briefly
introduce the fundamentals of micro-fluidic cooling, as well as the thermal and power
modeling of 3D-ICs with micro-fluidic cooling. Chapter 3 discusses the design considerations of micro-fluidic cooling in 3D-ICs and presents three micro-channel heat sink
configurations that address these considerations. A micro-fluidic cooling based
dynamic thermal management (DTM) scheme is also proposed. In Chapter 4, we investigate the electrical and cooling system co-design methodology. In that chapter,
we focus on two aspects of the co-design: a) TSV assignment and micro-channel
placement co-optimization, and b) gate sizing and micro-channel co-optimization.
Finally, we conclude in Chapter 5 with a summary of the main findings of this work
and consider further prospects of this research field.
Chapter 2
Background
2.1 Basics of Three Dimensional Integrated Circuit
The 3D-IC contains two or more layers of active electronic components that
are stacked vertically. Figure 2.1 shows a three-tiered stacked 3D-IC. In the 3D-IC, each active layer contains functional units such as cores and caches.
The metal layer contains wires that enable communication among different components. There is also a metal-oxide layer above each metal layer. Through-silicon vias
(TSVs) are inserted in the 3D-IC to deliver signal, power and ground among different tiers.
In a 3D-IC, since several layers of power-dissipating electronic components
are stacked vertically, the power density is usually higher than in 2D-ICs, leading to
potential thermal issues. Moreover, the thermal conductivity of the oxide layer is low,
which reduces heat transfer towards the ambient. This exacerbates the
thermal problems in 3D-ICs. Hence an important issue with 3D-ICs is the removal
of the high-density heat resulting from stacking several microprocessor chips. Although
current 3D-IC designs are limited to partitioning of memory and datapath across
layers, future 3D-IC designs are expected to have significantly more complex architectures
and higher integration levels, associated with very high power dissipation and
heat density.
In order to alleviate the thermal issues, micro-channel based liquid cooling and
thermal TSVs have been adopted. As shown in Figure 2.1, micro-channel heat sinks
are embedded below the active layers. Liquid is pumped through each channel, and
takes away the heat generated in the active layers [39][43]. The heated coolant is
then cooled down in the heat exchanger, and recirculates into the fluid pump again
for the cooling in the next circulation. On the other hand, TSVs, which are usually
made of copper and have better thermal conductivity than silicon or metal-oxide,
can help improve conduction of heat between different layers. When the number of
signal TSVs is not enough, dummy thermal TSVs are inserted to further mitigate
the thermal issues.
Figure 2.1: Stacked 3D-IC with micro-channel cooling system
2.2 Fundamental Characteristics of Fluids in Micro-channels
2.2.1 Conservation Law of Fluid Dynamics
The behavior of the fluid inside the micro-channels is governed by the conservation laws of fluid dynamics. Consider the control volume of fluid U and its surface S (as shown in Figure 2.2).
Figure 2.2: Control volume of fluid
The fluid flow in the control volume is governed by the
following mass, momentum and energy conservation equations [87][62][85][37][78]:
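$$\begin{aligned}
\text{Mass conservation:} \quad & \frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \vec{v}) = 0 \\
\text{Momentum conservation:} \quad & \rho\left(\frac{\partial \vec{v}}{\partial t} + \vec{v} \cdot \nabla \vec{v}\right) = -\nabla p + \mu \nabla^2 \vec{v} \\
\text{Energy conservation:} \quad & C_v \frac{dT}{dt} + \nabla \cdot (-k_f \nabla T) + C_v \vec{v} \cdot \nabla T = \dot{P}
\end{aligned} \tag{2.1}$$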
Here ⃗v is the flow velocity vector, T is the fluid temperature, Ṗ is the volumetric
heat generation rate, and p is the pressure inside fluid. Also, ρ, µ, Cv and kf are
the density, viscosity, volumetric specific heat and thermal conductivity of the fluid,
respectively.
2.2.2 Dimensionless Numbers in Fluid Mechanics
The governing equations above are complex partial differential equations (PDE).
Researchers in fluid mechanics introduced a set of dimensionless numbers which
could help simplify the complex problem and also better understand the relative
importance of forces, energies, or time scales [87][55]. Some of these dimensionless
numbers are Reynolds number (Re), Prandtl number (Pr) and Nusselt number (Nu),
etc.
Reynolds number Re: The Reynolds number measures the ratio of
inertial forces to viscous forces, and is defined as:
$$Re = \frac{\rho v L_c}{\mu} \tag{2.2}$$
where v is the mean fluid velocity and Lc is the characteristic length. In straight
micro-channels, the characteristic length is usually given by the hydraulic diameter Dh .
When the cross section of the channel is circular, Dh is the diameter of the cross section, while in rectangular channels, Dh is defined as Dh =
4 · cross sectional area/perimeter = 4∆x∆z/(2∆x + 2∆z), where ∆x and ∆z
are the width and height of the micro-channel. Usually, the Reynolds number is
used to distinguish between laminar and turbulent flow, which will be explained
later.
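As a concrete illustration of Equation 2.2, the following sketch computes the hydraulic diameter of a rectangular channel and the resulting Reynolds number. The function names and the numerical values (a 100 µm × 200 µm channel, water-like fluid properties, a mean velocity of 1 m/s) are assumptions chosen for illustration and are not taken from this work.

```python
def hydraulic_diameter(width, height):
    """Hydraulic diameter of a rectangular cross section: 4 * area / perimeter."""
    return 4.0 * width * height / (2.0 * width + 2.0 * height)


def reynolds_number(density, velocity, char_length, viscosity):
    """Reynolds number Re = rho * v * L_c / mu (Equation 2.2)."""
    return density * velocity * char_length / viscosity


# Assumed, illustrative values: water-like coolant in a 100 um x 200 um channel.
d_h = hydraulic_diameter(100e-6, 200e-6)            # m
re = reynolds_number(1000.0, 1.0, d_h, 1.0e-3)      # kg/m^3, m/s, m, Pa*s
print(f"D_h = {d_h * 1e6:.1f} um, Re = {re:.1f}")
```

With these assumed values the Reynolds number is on the order of 10^2, which illustrates why flow in channels of this size tends to fall well inside the low-Reynolds-number range.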
Prandtl number Pr: The Prandtl number is the ratio of momentum diffusivity
(kinematic viscosity) to thermal diffusivity.
$$Pr = \frac{\text{kinematic viscosity}}{\text{thermal diffusivity}} = \frac{\mu/\rho}{k_f/(\rho C_v)} = \frac{C_v \mu}{k_f} \tag{2.3}$$
Nusselt number Nu: The Nusselt number is the ratio of convective to conductive
heat transfer across the boundary between the fluid and solid. The Nusselt number
is defined as:
$$Nu = \frac{h L_c}{k_f} \tag{2.4}$$
where h, Lc and kf are the convective heat transfer coefficient, channel characteristic
length and fluid thermal conductivity. Usually, Nu is used to calculate the convective
heat transfer coefficient h. Many studies have characterized the Nusselt
number in micro-channels and expressed it as a function of the Reynolds number and
Prandtl number [15][5][89][94].
2.2.3 Single and Two Phase Flow
The working fluid in the micro-channel can be either single phase or two phase.
The single phase flow consists of exclusively liquid coolant as the working fluid, while
two phase flow consists of both liquid and vapor.
When the power density is too high so that the liquid absorbs too much heat
and its temperature increases dramatically, part of the liquid will become vapor and
two phase flow is formed. The two phase flow exhibits different patterns. Figure
2.3(a)-(f) shows the two phase flow patterns in horizontal channels. When the flow
rate is low, the flow usually exhibits a bubbly (Figure 2.3(a)) or plug pattern (Figure
2.3(b)). As the flow rate increases, the pattern becomes stratified (Figure 2.3(c)) and
wavy (Figure 2.3(d)), and finally slug (Figure 2.3(e)) and annular (Figure 2.3(f))
[82][51].
Figure 2.3(g) shows the evaporation process in a channel. As the single
phase liquid absorbs heat and its temperature rises to the evaporation point,
small bubbles appear. As the fluid continues to absorb heat along the channel,
plug and slug flows appear. The flow becomes wavy and annular in the end.
Figure 2.4 compares the cooling effectiveness of single and two phase flows.
Figure 2.3: (a)-(f) Two phase flow patterns: (a) bubbly, (b) plug, (c) stratified, (d) wavy, (e) slug, (f) annular; (g) Evaporation process in a channel
It plots the solid temperature at the micro-channel outlet location Tw versus the
footprint power density Pa for both single and two phase flows at the same pumping
power [6]. It shows that two phase flow achieves a lower solid temperature than single
phase flow, which indicates that two phase flow has higher cooling effectiveness than
single phase flow.
Figure 2.4: Comparison of single and two phase flow (Tw in ℃ versus Pa in W/cm2)
2.2.4 Laminar and Turbulent Flow
The flow inside micro-channels can be laminar, turbulent, or transitional [55].
Figures 2.5(a), 2.5(b), 2.5(c) show these three types of patterns. Laminar flow
(Figure 2.5(a)) occurs when fluid flows in parallel layers, with no disruption between
the layers. That is, the pathlines of different particles are parallel. It generally
happens in small channels and at low flow velocities. In turbulent flow (as shown
in Figure 2.5(b)), vortices and eddies appear, and make the flow unpredictable.
Turbulent flow generally happens at high flow rates and in larger channels. Transitional
flow (Figure 2.5(c)) is a mixture of laminar and turbulent flow, with turbulence in
the center of the channel, and laminar flow near the edges.
Usually, Reynolds number is used to predict the type of flow (whether laminar,
turbulent or transitional) in straight channels. For example, as [55] shows:
When Re < 2100, it is laminar flow; when 2100 < Re < 4000, it is transitional
flow; when Re > 4000, it is turbulent flow.
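The thresholds above translate directly into a small classifier. This is a minimal sketch; the function name is hypothetical, and the treatment of the exact boundary values Re = 2100 and Re = 4000 (counted here as transitional) is an assumption, since the text only gives strict inequalities.

```python
def classify_flow(re):
    """Classify straight-channel flow with the thresholds quoted from [55]."""
    if re < 2100:
        return "laminar"
    if re <= 4000:      # assumption: both boundary values count as transitional
        return "transitional"
    return "turbulent"


for re_value in (150, 3000, 12000):
    print(re_value, classify_flow(re_value))
```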
Figure 2.5: (a) Laminar flow pattern, (b) Turbulent flow pattern, (c) Transitional flow pattern
When the channel involves a more complex structure, the fluid exhibits more
complicated behavior. Figure 2.6 shows an example in which flow that would be laminar in
a straight channel passes through a micro-channel with bends. When fluid enters a channel, it
first undergoes a flow development process and, after traveling some distance
downstream, becomes fully developed laminar flow. Then, when the flow comes
across a bend, it becomes turbulent/developing around the corner and settles down
into fully developed laminar flow again after traveling some distance downstream
[68].
Figure 2.6: Fluid in micro-channel with bends
2.3 Thermal Modeling of 3D-IC with Micro-fluidic Cooling
2.3.1 Distributed RC Thermal Model
The chip thermal behavior can be modeled by a distributed RC network by
partitioning it into fine grids. In this network, each grid is represented by a node.
The voltage at each node represents the temperature at that grid. The current
source in each grid represents the power dissipated at that location, so the chip
power profile decides the current injected at each grid. Each resistance represents a
heat transfer path between grids, while capacitors indicate the ability to store heat
[77].
Figure 2.7 shows an example of the RC network for one 3D-IC layer. In this
network, Ri,j (i, j = 1...6) indicates the heat path (thermal resistance) between grids
i and j, Ci represents the thermal capacitance of grid i. According to the thermal
model, the thermal dynamics of each grid i is governed by the following equation:
$$\frac{dT_i}{dt} = -\sum_{\forall j \in N(i)} \frac{T_i - T_j}{R_{i,j} C_i} + \frac{P_i}{C_i}, \quad \forall \text{ grids } i \tag{2.5}$$
Here Ti is the temperature of grid i, and Pi is the power consumption at this grid,
N (i) represents the set of grids adjacent to grid i.
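To make Equation 2.5 concrete, the sketch below advances the grid temperatures by one time step. Forward Euler integration is used only because it is the simplest scheme to read; it is an assumption of this example rather than the solver used in this work, and it requires a time step that is small compared with the products Ri,j Ci. The function name, the data layout and the two-grid toy values are likewise hypothetical.

```python
def euler_step(temps, power, cap, res, dt):
    """Advance all grid temperatures by one forward-Euler step of Equation 2.5.

    temps, power, cap are lists indexed by grid; res maps a pair (i, j),
    listed once per adjacent pair, to the thermal resistance R_ij.
    """
    rate = [p / c for p, c in zip(power, cap)]          # P_i / C_i term
    for (i, j), r in res.items():
        flow = (temps[i] - temps[j]) / r                # heat flowing from grid i to grid j
        rate[i] -= flow / cap[i]
        rate[j] += flow / cap[j]
    return [t + dt * d for t, d in zip(temps, rate)]


# Toy case (assumed values): two grids coupled by one resistance, no power injected.
temps = [80.0, 40.0]
for _ in range(200):
    temps = euler_step(temps, power=[0.0, 0.0], cap=[1.0, 1.0],
                       res={(0, 1): 2.0}, dt=0.1)
print(temps)
```

In this toy case the two grids exchange heat until they settle at their common average temperature, which is the behavior Equation 2.5 predicts when no power is injected.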
Some works are mainly interested in the steady state thermal behavior.
In this case, the thermal model can be simplified to a resistive network that represents
the steady state chip thermal behavior, and the governing equation in Equation 2.5
reduces to a set of linear equations in temperature and power, as Equation 2.6 shows.
Figure 2.7: RC network for 3D-IC thermal modeling
Given a chip thermal resistive network and power profile, the temperature
profile can be estimated by solving the following system of linear equations:
$$G \cdot \vec{T} = \vec{P} \tag{2.6}$$
Here G is the thermal conductance matrix determined by the thermal resistance network, and P⃗ = {Pi , ∀grid i} and T⃗ = {Ti , ∀grid i} represent the power and temperature
profiles.
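The steady state computation can be sketched on a toy network. The example below assumes a chain of three grids in which the last grid is tied to a fixed-temperature ambient, so that G is non-singular and the solution is the temperature rise above ambient. The chain topology, the resistance values and the small Gaussian elimination helper are assumptions made for illustration; a practical implementation would use a sparse linear solver.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# Toy chain: grid 0 -- grid 1 -- grid 2 -- ambient, every resistance 2 K/W (assumed).
g = 1.0 / 2.0                       # conductance of each link, W/K
G = [[g, -g, 0.0],
     [-g, 2 * g, -g],
     [0.0, -g, 2 * g]]              # last diagonal entry includes the link to ambient
P = [1.0, 0.0, 0.0]                 # 1 W injected at grid 0
T = solve_linear(G, P)              # temperature rise above ambient, K
print(T)
```

Because the injected 1 W must cross all three resistances on its way to the ambient, the grid farthest from the ambient ends up the hottest.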
2.3.2 Cooling Performance of Micro-channels
The heat removal through micro-channels comprises an intricate combination of heat conduction, convection and coolant flow. Consider the micro-channel in
Figure 2.8. Heat dissipated in the surrounding regions (basically the active layers) first conducts to the micro-channel sidewalls. The heat is then absorbed by the fluid through
convection. The heated fluid is then carried away by the moving flow. These three
aspects can be captured by three types of thermal resistances:
Rcond for conduction, Rconv for convection and Rheat for fluid flow (as shown
in Figure 2.8).
Figure 2.8: Micro-channel thermal model
Conductive resistance Rcond : It is decided by thermal characteristics of the silicon that conducts heat dissipated in surrounding region to micro-channel sidewalls.
It can be calculated using the model in [77].
Convective resistance Rconv : It results from the convection of the fluid, which moves
heat from the micro-channel sidewalls into the coolant. The convective
resistance depends on the fluid properties and the area for heat transfer between the
micro-channel sidewalls and the fluid. Assume the micro-channel has been discretized
into grids along the fluid direction z, where the size of each grid is ∆x×∆y ×∆z as Figure
2.8 shows. Let Rconv be the convective resistance between the micro-channel and
sidewalls in each grid. As shown in [84], Rconv = 1/(hAh), where Ah is the surface
area for heat transfer in each grid. If we assume that heat can be transferred
from all four sidewalls, the surface area of each grid is Ah = 2∆z(∆x + ∆y). The
parameter h is the convective heat transfer coefficient explained in Section 2.2.2.
Given the Nusselt number Nu and the micro-channel dimensions, it is calculated by
h = Nu kf /Dh , where kf is the fluid thermal conductivity and Dh is the hydraulic diameter.
So the convective resistance can be expressed as:
$$R_{conv} = \frac{1}{h A_h} = \frac{D_h}{2 Nu\, k_f \Delta z (\Delta x + \Delta y)} \tag{2.7}$$
Heat resistance Rheat : The heat resistance represents the heat
carried downstream by the moving fluid:
$$R_{heat} = \frac{1}{C_v \rho f} \tag{2.8}$$
Here f is the volumetric flow rate in each channel. It depends on the fluid velocity
v and micro-channel cross sectional area: f = velocity ∗ cross sectional area =
v∆x∆y. Cv is the fluid specific heat, and ρ is fluid density.
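Equations 2.7 and 2.8 can be evaluated per grid as in the sketch below. The function names are hypothetical, and the inputs (Nu = 4, water-like conductivity and specific heat, a 100 µm × 200 µm cross section, a 100 µm grid length and a velocity of 1 m/s) are assumed values for illustration only. Here Cv is taken per unit mass, in line with its multiplication by ρ in Equation 2.8.

```python
def convective_resistance(nu, k_f, dx, dy, dz):
    """R_conv = D_h / (2 Nu k_f dz (dx + dy)) for one grid (Equation 2.7)."""
    d_h = 4.0 * dx * dy / (2.0 * dx + 2.0 * dy)     # hydraulic diameter of the cross section
    h = nu * k_f / d_h                              # convective heat transfer coefficient
    a_h = 2.0 * dz * (dx + dy)                      # all four sidewalls transfer heat
    return 1.0 / (h * a_h)


def heat_resistance(c_v, rho, velocity, dx, dy):
    """R_heat = 1 / (C_v rho f) with flow rate f = v dx dy (Equation 2.8)."""
    flow_rate = velocity * dx * dy
    return 1.0 / (c_v * rho * flow_rate)


# Assumed, illustrative inputs in SI units.
r_conv = convective_resistance(nu=4.0, k_f=0.6, dx=100e-6, dy=200e-6, dz=100e-6)
r_heat = heat_resistance(c_v=4180.0, rho=1000.0, velocity=1.0, dx=100e-6, dy=200e-6)
print(f"R_conv = {r_conv:.1f} K/W, R_heat = {r_heat:.2f} K/W")
```

Note that Rheat falls as the velocity rises while Rconv, for a fixed Nu, does not depend on the velocity at all, so in this model a higher flow rate reduces only the flow-related resistance.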
2.3.3 Overall Thermal Model of 3D-IC with Micro-channels
As indicated earlier, the thermal behavior of micro-channels can also be modeled by a thermal resistance network (Figure 2.8). The parameters of this resistive network could be computed using the equations described above or experiment
based approaches [54]. The 3D-IC resistive network and micro-channel network can
be combined to generate a unified model that captures the steady state thermal
behavior of 3D-ICs with liquid cooling (Figure 2.9). Other aspects of 3D-ICs such
as the thermal impact of TSVs and thermal wake effect [54] can also be incorporated
in this resistive network.
Figure 2.9: Thermal resistive network of one 3D-IC layer with micro-channels
2.3.4 Thermal Impact of TSVs
Besides micro-fluidic cooling, some works have also proposed the use of dummy
thermal TSVs for 3D-IC temperature reduction [24][17][90]. Because the
oxide layer thermally separates different tiers of the 3D-IC, heat cannot be
effectively dissipated between tiers. Dummy thermal TSVs were first proposed
by [14] as additional heat dissipation paths to alleviate on-chip temperature issues,
and have since been adopted in 3D-ICs [24][17][90]. Since the TSV fill materials such
as copper usually have much higher thermal conductivity than silicon and oxide,
thermal TSVs could enhance the vertical heat transfer between different 3D-IC tiers
and to the heat sinks by reducing the effective thermal resistances.
To quantify the thermal effect of a TSV on the 3D-IC, assume a thermal
TSV is inserted in a 3D-IC grid as Figure 2.10 shows. The dimension of the grid is
∆x × ∆y × ∆z, and its original vertical thermal conductivity is kyold . Assuming the
cross sectional area of the thermal TSV is Atsv , the vertical thermal conductivity of this
grid after inserting the thermal TSV becomes:
$$k_y^{new} = k_{tsv}\frac{A_{tsv}}{\Delta x \Delta z} + k_y^{old}\left(1 - \frac{A_{tsv}}{\Delta x \Delta z}\right) = k_y^{old} + (k_{tsv} - k_y^{old})\frac{A_{tsv}}{\Delta x \Delta z} \tag{2.9}$$
Since the thermal conductivity of TSV fill material ktsv is usually larger than the
original thermal conductivity kyold (which is generally the thermal conductivity of
silicon and metal-oxide), the thermal conductivity of this grid will increase after inserting the thermal TSV, which could result in better heat transfer between different
tiers, and thus more uniform thermal profile.
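Equation 2.9 is an area-weighted average, which the following sketch evaluates for one grid. The function name and the numbers (a 100 µm × 100 µm grid footprint, a 10 µm × 10 µm TSV, and conductivities of 150 and 400 W/(m·K) standing in for the original material and the copper fill) are assumptions for illustration.

```python
def vertical_conductivity_with_tsv(k_old, k_tsv, a_tsv, dx, dz):
    """Area-weighted vertical conductivity after inserting a TSV (Equation 2.9)."""
    fraction = a_tsv / (dx * dz)        # share of the grid footprint occupied by the TSV
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("TSV cross section must fit inside the grid footprint")
    return k_old + (k_tsv - k_old) * fraction


# Assumed values: 100 um x 100 um footprint, 10 um x 10 um TSV, W/(m*K) conductivities.
k_new = vertical_conductivity_with_tsv(k_old=150.0, k_tsv=400.0,
                                       a_tsv=10e-6 * 10e-6, dx=100e-6, dz=100e-6)
print(f"k_y_new = {k_new:.2f} W/(m*K)")
```

Since the gain scales with the occupied area fraction, a single small TSV changes the grid conductivity only slightly; a noticeable improvement requires a larger total TSV area in the grid.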
Figure 2.10: A 3D-IC grid with thermal TSV
2.4 Modeling of Power Consumption
The chip power consumption has two major components: dynamic power and
leakage power [86]. Dynamic power results from charging of transistor load capacitances when they are switched, while leakage power is the power consumed by
transistors when they are in idle state.
At the system level, there are generally three power states. A) Active mode,
where the system is performing some operation. In this mode, the chip dissipates
both dynamic power and leakage power. B) Standby mode, where the system is
idle but ready to execute an operation. In this mode, the circuit dissipates only
leakage power. C) Inactive mode, where the power supply to circuits is shut down
by power gating or other leakage reduction techniques. A very small amount of power
is dissipated in this mode.
In addition to the chip power consumption, the micro-fluidic heat sink also
consumes extra power for performing chip cooling. This power basically comes from
the pump to inject the coolant through micro-channels. This extra cooling power
consumption is called “pumping power”.
2.4.1 Dynamic Power Consumption
The dynamic power depends on the transistor load capacitances being charged,
the rate of switching, the supply voltage, etc. [86]. For each gate gi , its dynamic power
can be calculated by $P_{d,i} = \alpha_i C_{d,i} V_{dd}^2 F$, where Cd,i is the load capacitance of gate gi ,
αi is its average switching activity in each cycle and F is the clock frequency. The
capacitance Cd,i is proportional to the gate width si , hence the dynamic power
can also be represented as a function of gate size and clock frequency as Equation
2.10 shows. In the equation, βd,i depends on the switching activity αi and supply
voltage Vdd , etc.
$$P_{d,i} = \alpha_i C_{d,i}(s_i) V_{dd}^2 F = \beta_{d,i} s_i F \tag{2.10}$$
2.4.2 Leakage Power Consumption
Current leaks through transistors even when they are turned off, resulting in
leakage power consumption. There are three main components of leakage power: reverse biased junction leakage, sub-threshold leakage and gate oxide tunneling leakage
[86].
The junction leakage and sub-threshold leakage increase with temperature
while the gate leakage is rather insensitive to temperature. [47] models the leakage-temperature dependency as:
$$P_{l,i} = \beta_{l,1} T_i^2 e^{-\beta_{l,2}/T_i} + \beta_{l,3} \tag{2.11}$$
As shown in [91], the variation of $e^{-1/T_i}$ is very small in the normal range of chip
operating temperature. Hence, some works also approximate the leakage model as
a quadratic function of temperature as Equation 2.12 shows [91]. The quadratic
fitting parameters ε1,2,3 are obtained from the underlying model in [47]. We tested
the accuracy of this quadratic model. Figure 2.11 shows that the quadratic model
is very close to the exponential model given in [47].
$$P_{l,i} = \varepsilon_1 T_i^2 + \varepsilon_2 T_i + \varepsilon_3 \tag{2.12}$$
Figure 2.11: Exponential leakage model versus quadratic leakage model (transistor leakage power in W versus temperature in ℃)
The leakage power is also a linear function of gate width si [86]. Hence the overall
leakage power can be modeled as (here φ is a constant):
$$P_{l,i} = \varphi \cdot s_i \cdot (\beta_{l,1} T_i^2 e^{-\beta_{l,2}/T_i} + \beta_{l,3}) \approx \varphi \cdot s_i \cdot (\varepsilon_1 T_i^2 + \varepsilon_2 T_i + \varepsilon_3) \tag{2.13}$$
According to the power models, a larger gate size results in higher dynamic and leakage
power, which leads to a temperature increase. The temperature increase in turn leads
to a further increase in leakage power.
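The dependence of power on gate size and temperature in Equations 2.10 and 2.13 can be sketched as follows. All coefficient values below are hypothetical placeholders chosen only so that the example runs (they are not fitted to [47] or [91]), the function names are assumptions, and the temperature is taken in kelvin.

```python
import math


def dynamic_power(beta_d, size, freq):
    """P_d = beta_d * s * F (Equation 2.10)."""
    return beta_d * size * freq


def leakage_power(phi, size, temp, b1, b2, b3):
    """phi * s * (b1 T^2 exp(-b2 / T) + b3), the exponential form of Equation 2.13."""
    return phi * size * (b1 * temp ** 2 * math.exp(-b2 / temp) + b3)


def leakage_power_quadratic(phi, size, temp, e1, e2, e3):
    """phi * s * (e1 T^2 + e2 T + e3), the quadratic form of Equation 2.13."""
    return phi * size * (e1 * temp ** 2 + e2 * temp + e3)


# Placeholder coefficients (assumed), temperature in kelvin.
coeffs = dict(b1=1.0e-12, b2=2000.0, b3=1.0e-9)
p_small = leakage_power(1.0, 1.0, 350.0, **coeffs)
p_large = leakage_power(1.0, 2.0, 350.0, **coeffs)     # gate twice as wide
p_hot = leakage_power(1.0, 1.0, 380.0, **coeffs)       # same gate, 30 K hotter
print(p_small, p_large, p_hot)
```

Doubling the gate width doubles the leakage at a fixed temperature, and raising the temperature raises it further, which is the feedback described above.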
2.4.3 Micro-channel Cooling Power
2.4.3.1 Straight Micro-channels
The power used by micro-channels for performing chip cooling comes from the
work done by the fluid pump to push the coolant fluid into micro-channels. It is a
strong function of the level of heat removal desired. Basically, to maintain acceptable
thermal levels, increase of chip power dissipation would result in increased pumping
power Ppump , which is decided by the pressure drop and coolant fluid flow rate.
$$P_{pump} = \sum_{n=1}^{N} f_n \Delta p_n \tag{2.14}$$
Assuming there are N micro-channels, ∆pn and fn are the pressure drop and fluid
flow rate of the n-th micro-channel.
Here we assume the flow is fully developed laminar flow. The pressure drop in
a micro-channel is decided by:
$$\Delta p = \frac{2\gamma\mu L v}{D_h^2} \tag{2.15}$$
where L is channel length, Dh is hydraulic diameter, v is fluid velocity, µ is the
viscosity of fluid and γ is a function of micro-channel aspect ratio (∆y/∆x) [42][39].
In this work, we assume that all straight micro-channels have the same width
and height. Usually fluid pumps are designed to work such that all the micro-channels experience the same pressure drop ∆p. For a given pressure drop that
the pump delivers across all channels, the fluid velocity v can be estimated using
Equation 2.15. The fluid flow rate f = v∆x∆y is therefore also a function of the pressure drop
∆p. Because all channels have the same dimensions and the same pressure drop,
they also have the same velocity and fluid flow rate.
Given this, the pumping power can be rewritten as:
$$P_{pump} = N f \Delta p = \frac{N \Delta x \Delta y D_h^2 \Delta p^2}{2\gamma\mu L} \tag{2.16}$$
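The chain from pressure drop to pumping power (Equation 2.15 solved for v, then f = v∆x∆y, then Equation 2.16) can be sketched as below. The function name is hypothetical, and the value used for γ is an arbitrary placeholder, since its dependence on the aspect ratio is given in [42][39] and not reproduced here; the remaining inputs are assumed illustrative values.

```python
def straight_pumping_power(n_channels, dp, dx, dy, length, mu, gamma):
    """Pumping power of N identical straight channels under pressure drop dp (Equation 2.16)."""
    d_h = 4.0 * dx * dy / (2.0 * dx + 2.0 * dy)
    velocity = dp * d_h ** 2 / (2.0 * gamma * mu * length)   # Equation 2.15 solved for v
    flow_rate = velocity * dx * dy                           # f = v * dx * dy
    return n_channels * flow_rate * dp


# Assumed inputs: 50 channels, 10 kPa, 100 um x 200 um cross section, 1 cm length.
base = dict(dx=100e-6, dy=200e-6, length=0.01, mu=1.0e-3, gamma=10.0)
p_1 = straight_pumping_power(50, 1.0e4, **base)
p_2 = straight_pumping_power(50, 2.0e4, **base)      # pressure drop doubled
p_3 = straight_pumping_power(100, 1.0e4, **base)     # channel count doubled
print(p_1, p_2 / p_1, p_3 / p_1)
```

The ratios show the two scalings contained in Equation 2.16: pumping power grows quadratically with the pressure drop and linearly with the number of channels.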
2.4.3.2 Micro-channels with Bends
Consider the micro-channel structure shown in Figure 2.6. The existence of a
bend causes a change in the flow properties which impact the cooling effectiveness
and pressure drop. An otherwise fully developed laminar flow in the straight part of
the channel becomes turbulent/developing around the corner when it comes across a 90◦ bend,
and settles down into fully developed laminar flow again after traveling some distance
downstream (see Figure 2.6). So a channel with bends has three distinct
regions: 1) the fully developed laminar flow region, 2) the bend corner, and 3) the
developing/turbulent region after the bend [33][68]. The length of the flow developing
region is [69]:
$$L_d = \left(0.06 + 0.07\frac{\Delta y}{\Delta x} - 0.04\frac{\Delta y^2}{\Delta x^2}\right) Re\, D_h \tag{2.17}$$
where Re is the Reynolds number, and ∆x, ∆y and Dh are the micro-channel
width, height and hydraulic diameter.
The rectangular bend impacts the pressure drop. Due to the presence of bends,
the pressure drop in the channel is greater than an equivalent straight channel with
exactly the same dimensions. The total pressure drop in a channel with bends is
the sum of the pressure drop in the three regions described above (which finally
depend on how many bends the channel has). Assume L is the total channel length,
and m is the bend count. Therefore m · Ld is the total length that has developing/turbulent flow and m · ∆x is the total length attributed to corners (see Figure
2.6). Hence the effective channel length attributed to fully developed laminar flow
is L − m · Ld − m · ∆x. The pressure drop in the channel is the sum of the pressure
drop in each of these regions.
Pressure drop in fully developed laminar region: The total pressure drop in
fully developed laminar region is [42]:
$$\Delta p_f = \frac{2\gamma\mu (L - m \cdot L_d - m \cdot \Delta x) v}{D_h^2} = \frac{2\gamma\mu L_f v}{D_h^2} \tag{2.18}$$
Here Lf = L − m · Ld − m · ∆x is the total length of the fully developed laminar
region which is explained above, the other parameters are the same as in Equation
2.15.
Pressure drop in flow developing region: The pressure drop in each flow
developing region is $\delta p_d = \frac{2\mu v}{D_h^2}\int_0^{L_d} \psi(z)\,dz$ [56]. Here ψ(z) is given by
$\psi(z) = 3.44\sqrt{(Re\,D_h)/z}$, where z is the distance from the entrance of the developing region
in the flow direction. Assuming there are a total of m corners in a given micro-channel, there are m developing regions with the same length Ld in this channel.
By substituting the expressions of ψ(z) and Ld into the equation of δpd and evaluating
the integral, we get the total pressure drop of the developing regions in this
micro-channel:
$$\Delta p_d = m \cdot \delta p_d = m K_d \rho v^2 \tag{2.19}$$
where $K_d = 13.76\left(0.06 + 0.07\frac{\Delta y}{\Delta x} - 0.04\frac{\Delta y^2}{\Delta x^2}\right)^{1/2}$ is a constant associated with the
aspect ratio ∆y/∆x. Please refer to [33][56] for details.
Pressure drop in corner region: The total pressure drop at all the 90◦ bends in
a micro-channel is decided by:
$$\Delta p_{90^\circ} = m \cdot \delta p_{90^\circ} = m \frac{\rho}{2} K_{90} v^2 \tag{2.20}$$
where m is the number of corners in the channel, δp90◦ is the pressure drop at each
bend corner and K90 is the pressure loss coefficient for 90◦ bend whose value can be
found in [33].
Total pumping power: The total pressure drop in a micro-channel with bends is
the sum of the pressure drop in the three regions discussed above:
$$\Delta p = \Delta p_d + \Delta p_f + \Delta p_{90^\circ} = \frac{2\gamma\mu L_f}{D_h^2} v + m\left(K_d + \frac{K_{90}}{2}\right)\rho v^2 \tag{2.21}$$
According to Equation 2.21, the total pressure drop of a micro-channel is a quadratic
function of the fluid velocity v. For a given pressure difference applied on a micro-channel, we can calculate the associated fluid velocity by solving Equation 2.21.
With the fluid velocity, we can then estimate the fluid flow rate f , and thus estimate
the thermal resistance and pumping power for this channel. Hence the pumping
power as well as cooling effectiveness of micro-channels with bends is a function of
1) number of bends, 2) location of channels, and 3) pressure drop across the channel.
Comparing Equations 2.15 and 2.21 shows that, due to the presence of bends, if the same
pressure drop is applied to a straight and a bended micro-channel of the same
length, the bended channel will have a lower fluid velocity, which leads to a lower
cooling capability. Therefore, to provide the same amount of cooling, the overall
pressure drop that the pump delivers must increase, which increases
the pumping power. However, bends allow for better coverage in the presence of TSVs.
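The velocity comparison between a straight and a bended channel can be sketched by solving the quadratic in Equation 2.21 for v. This is a minimal sketch under stated assumptions: Ld from Equation 2.17 is substituted into Lf using Re = ρvDh/µ, which keeps the equation quadratic in v; Kd is taken as 13.76 times the square root of the aspect-ratio polynomial; and the values of γ and K90 are arbitrary placeholders (the real values come from [42][39] and [33]). The sketch does not check that Lf stays positive.

```python
import math


def bent_channel_velocity(dp, length, bends, dx, dy, mu, rho, gamma, k90):
    """Solve Equation 2.21 for the velocity v under a pressure drop dp."""
    d_h = 4.0 * dx * dy / (2.0 * dx + 2.0 * dy)
    ratio = dy / dx
    c = 0.06 + 0.07 * ratio - 0.04 * ratio ** 2      # L_d = c * Re * D_h (Equation 2.17)
    if c <= 0.0:
        raise ValueError("Equation 2.17 gives a non-positive length for this aspect ratio")
    k_d = 13.76 * math.sqrt(c)
    # With Re = rho * v * D_h / mu, L_f = L - m*dx - m*c*rho*v*D_h^2/mu, so
    # Equation 2.21 becomes dp = a*v + b*v^2 with the coefficients below.
    a = 2.0 * gamma * mu * (length - bends * dx) / d_h ** 2
    b = bends * rho * (k_d + k90 / 2.0 - 2.0 * gamma * c)
    if b == 0.0:
        return dp / a                                 # straight channel: Equation 2.15
    return (-a + math.sqrt(a * a + 4.0 * b * dp)) / (2.0 * b)


# Assumed inputs: 10 kPa across a 1 cm long, 100 um x 200 um channel, water-like fluid.
common = dict(dp=1.0e4, length=0.01, dx=100e-6, dy=200e-6,
              mu=1.0e-3, rho=1000.0, gamma=10.0, k90=1.0)
v_straight = bent_channel_velocity(bends=0, **common)
v_bent = bent_channel_velocity(bends=4, **common)
print(v_straight, v_bent)
```

Under the same pressure drop, the channel with four bends carries the fluid more slowly than the straight one, consistent with the comparison of Equations 2.15 and 2.21.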
Chapter 3
Design of Micro-fluidic Cooling Configurations for 3D-ICs
3.1 Motivation of Micro-Fluidic Cooling
The coming years will witness a significant increase in CPU power dissipation
due to advanced multi-core architectures and 3D integration technologies. The
thermal problem in 3D-IC is even more severe compared with 2D circuits, because
the power density is usually higher due to the stacked architecture. Moreover,
the thermal conductivity of oxide layer is low and hence would reduce the heat
conduction towards the ambient. The conventional air cooling has been proved
to be insufficient for future high performance 3D-ICs even with sophisticated DTM
schemes [8]. As a result, more effective active cooling schemes are being investigated
for high performance 3D-ICs [39][43]. Micro-channel cooling, which integrates micro-channel heat sinks into each tier of the 3D-IC and uses liquid flow to remove heat
from within the 3D chip, is an effective active cooling scheme for 3D-IC. It has
been reported to support heat dissipation higher than 700 W/cm2 with single phase
flow [84]. When the working fluid is two phase flow, the heat removal rate is even
higher.
Figure 3.1: Micro-channel and TSV configuration
3.2 Micro-channel Design Considerations/Constraints
As shown in Figure 2.1, each tier of 3D-IC contains an active silicon layer
and silicon substrate. The micro-channels are placed horizontally in the silicon
substrate. TSVs such as power/ground TSV, signal TSV, etc, are incorporated
for communications between layers and delivery of power and ground. Figure 3.1
shows a possible configuration of micro-channels and TSVs in the silicon substrate
of 3D-IC [40][45]. In each 3D-IC tier, micro-channels are etched in the inter-layer
region (silicon substrate). Fluidic channels (fluidic TSVs) go through all the tiers
and deliver coolant to micro-channels. TSVs also go through the silicon substrate
vertically to deliver signal, power and ground.
Though the micro-channel heat sink is capable of achieving good cooling performance, many problems need to be addressed when designing the micro-channel
infrastructure for cooling 3D-IC so as to ensure the reliability of the chip and also
improve the effectiveness of the micro-channel [72].
Figure 3.2: Pumping power versus chip power consumption (Ppump in W versus total 3D-IC chip power in W)
3.2.1 Cooling Power Consumption
The micro-fluidic cooling is active by nature. That is, the fluid pump consumes extra energy for pushing the coolant through the micro-channels (we call this
pumping power consumption). The pumping power can be quite significant. Figure
3.2 shows the pumping power required to maintain the 3D chip below temperature
constraints (85℃) for different chip power profiles using the conventional approach
of spreading straight micro-channels all over each tier. For each power profile, we
find the minimum pressure drop required to maintain the chip temperature within
constraints and then estimate the pumping power under this pressure drop using
Equation 2.16. As we can see, to maintain the chip temperature within acceptable levels, pumping power increases very fast as the total chip power increases.
Therefore controlling the micro-channel pumping power is very important.
3.2.2 Non-uniform Power Profile
The underlying heat dissipated in each active silicon layer exhibits great non-uniformity [39][60]. Such non-uniformity in the power profile results in hotspots in
thermal profiles. Therefore, when designing micro-channel heat sink infrastructure,
one should account for this non-uniformity in thermal and power profiles. Simply
minimizing the total equivalent thermal resistance of the micro-channels while failing
to consider the non-uniformity of the power profile will lead to suboptimal design.
For example, conventional approaches for micro-channel designs spread the entire
surface to be cooled with channels, and find the width and height of micro-channels
that minimize the overall thermal resistance [84][42]. Although this approach helps
reduce the peak temperature around the hotspot region, it overcools areas that are
already sufficiently cool. This is wasteful from the point of view of pumping power.
3.2.3 TSV Constraint
3D-ICs impose significant constraints on how and where the micro-channels
could be located due to the presence of TSVs, which allow different layers to communicate. As illustrated in Figure 3.1, micro-channels are allocated in the interlayer
bulk silicon regions. TSVs also exist in this region, causing a resource conflict. A
3D-IC usually contains thousands of TSVs which are incorporated with clustered
or distributed topologies [26][57]. These TSVs form obstacles to the micro-channels
since the micro-channels cannot be placed at the locations of TSVs. Therefore the
presence of TSVs limits the available spaces for micro-channels, and designing the
micro-channel infrastructure should take this fact into consideration.
3.2.4 Thermal Stress
The TSV fill materials are usually different from silicon. For example, copper
has low resistivity and is therefore widely used as the material for TSV fill. Because
the annealing temperature is usually much higher than the operating temperature,
thermal stress will appear in silicon substrate and TSV after cooling down to room
temperature due to the thermal expansion mismatch between copper and silicon
[92][7]. This thermal stress might result in reliability problems such as cracking.
Moreover, as shown in [92][28], thermal stress also influences electron/hole
mobilities significantly, hence changing the gate delay. Therefore, if the gates on
critical paths are allocated near TSVs (basically regions with high thermal stress),
timing violation might occur.
The existence of micro-channels which influences the temperature around
TSVs will influence the thermal stress, thereby changing the mechanical reliability analysis and timing analysis in the 3D-IC with TSVs. For example, Figure 3.3
shows the thermal stress inside and surrounding a TSV at different thermal conditions. Figure 3.3(a) depicts the thermal stress when chip temperature is 100℃ and
annealing temperature (which is basically the stress free reference temperature) is
250℃. The figure shows that large thermal stress (up to 490MPa) appears surrounding the TSV. Figure 3.3(b) depicts the thermal stress when the chip temperature
is 50℃. In this case (where chip temperature is 50℃), the overall thermal stress is
increased (compared with the previous case where chip temperature is 100℃), and
the maximum thermal stress reaches up to 670MPa. Such phenomenon indicates
that reduction in chip temperature results in an increase in thermal stress. Hence
the existence of micro-channels, which generally reduces chip temperature, may increase the TSV-induced thermal stress. Such phenomenon should be considered
when designing the micro-channel infrastructure.
Figure 3.3: Thermal stress inside and surrounding TSV (a) when chip temperature is 100℃, (b)
when chip temperature is 50℃ (assuming stress free temperature is 250℃)
Moreover, if micro-channels are placed too close to the TSVs, the silicon walls
between the TSVs and micro-channels will be more likely to crack because the walls
are thin. These facts further limit the locations of micro-channels.
In this chapter, we propose three micro-channel structures (cooling configurations) to improve the cooling effectiveness while still satisfying the design constraints
imposed on micro-channels. These three structures are: non-uniform (hotspot optimized) micro-channels [76], bended (TSV-constrained) micro-channels [73] and
hybrid cooling network [75]. We also investigate a micro-channel based dynamic
thermal management scheme that controls the runtime chip temperature by tuning
the pressure drop (fluid flow rate) through micro-channels [71].
3.3 Hotspot Optimized Non-Uniform Micro-channel
The first configuration is hotspot-optimized non-uniformly distributed micro-channels [76]. In this work, we start from the regular straight micro-channels. According to the micro-channel thermal model in Section 2.3.2, the cooling effectiveness
of micro-channels depends on the dimension and distribution of micro-channels, as
well as the fluid flow rate through micro-channels. The pumping power required by
micro-channels also depends on these parameters as Equation 2.16 shows. Here we
assume the micro-channel width and height are fixed. The optimal micro-channel
width and height were investigated in [84][42], etc. In this case, designing the
optimal micro-channel structure is basically deciding the count and distribution of
micro-channels. For a given pressure drop, increasing the number of micro-channels
increases the coverage of the cooling system and thereby improves the heat removal
rate. But it also leads to a linear increase in total pumping power.
3.3.1 Problem Formulation
Given a 3D-IC design, its power distribution is a function of the architecture and application. Assume that the power profile is given (this assumption will be
generalized later) and that we know a set of potential target locations for
micro-channels (see Figure 3.4); all locations containing TSVs have been removed
from this set for the sake of illustration. We want to find the number and locations
of channels such that the temperature all over the chip is within acceptable limits
while the number of channels is minimized (assuming the pressure drop ∆p is fixed). The
problem is formulated as follows:
\[
\begin{aligned}
\text{unknowns:}\quad & B \\
\min\quad & N = \mathrm{sum}(B) \\
\text{s.t.}\quad & \vec{T}(B) \le T_{max}
\end{aligned}
\tag{3.1}
\]
Here, B = {B1 , B2 , ..., BN } is a vector representing the locations of all micro-channels. Assuming we know the set of potential micro-channel locations, each
element Bn (n = 1...N ) in B corresponds to one of these locations and its value is
assigned as:
\[
B_n =
\begin{cases}
1, & \text{micro-channel exists in this location} \\
0, & \text{otherwise}
\end{cases}
\tag{3.2}
\]
N is the total number of micro-channels placed. When pressure drop ∆p is given,
pumping power only depends on the total number of micro-channels N , so the
objective in Equation 3.1 basically minimizes the pumping power. For a given
allocation of micro-channels, the thermal resistive network can be used to estimate
temperature profile by dividing the 3D-IC into grids (using the approach illustrated
in Section 2.3.3). Finding the optimal locations of micro-channels is a complex
discrete problem. Now we describe an iterative heuristic that finds a good solution.
3.3.2 Heuristic for Micro-channel Placement
Algorithm 1 gives the basic framework of our heuristic. The heuristic is based
on iterative improvement. We start by finding a set of potential locations for microchannels as Figure 3.4 shows. Note that all the locations containing TSVs and
other structures are removed from this set for the sake of illustration. In reality the
potential location would be limited by TSV locations, etc. The detailed approach for
finding the potential micro-channel locations is given in Section 3.3.3. In the initial
micro-channel design, micro-channels are placed at all the potential locations. We
assign each micro-channel a cost which represents the impact of removing the microchannel on thermal profile. Given the initial design and micro-channel cost, the
algorithm iteratively removes micro-channels until further removal results in thermal
violation. In each iteration, the micro-channel with the smallest cost is removed.
After each micro-channel removal, the costs of the remaining micro-channels need
to be updated. This is because the impact of removing a micro-channel on the
thermal profile is a function of both the power profile and also which micro-channels
have been removed so far. A micro-channel that had little impact on the thermal
profile if many micro-channels were present in its neighborhood might have a much
Figure 3.4: Potential locations of micro-channels: (a) uniform spreading of micro-channels, (b)
workload-balanced micro-channel spreading
Algorithm 1 Heuristic for micro-channel placement
Starting from micro-channels placed at all potential locations:
1. Initialize the cost (defined below) for each micro-channel;
2. Set viscosity µ = µ(Tin ), where Tin is coolant inlet temperature;
3. Repeat:
4.   Remove micro-channel with the lowest cost;
5.   Generate the new resistive thermal model;
6.   Estimate the temperature profile T⃗ ;
7.   If T⃗ ≤ Tmax , update cost and viscosity, and go to step 4;
8.   Else stop.
higher impact when its neighboring micro-channels have been removed. Since the
fluid viscosity µ is a function of fluid temperature, we also update the value of fluid
viscosity after each iteration. To estimate the new viscosity, we calculate the average
fluid temperature among all channels and look up the associated viscosity in the
table in [34].
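The removal loop of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: the names `place_channels`, `max_temperature` and `update_cost` are hypothetical, the thermal model (including the viscosity update) is assumed to be hidden inside the `max_temperature` callback, and the sketch assumes that a removal which would violate the thermal constraint is simply not committed.

```python
import math
from typing import Callable, Dict, Hashable, Set


def place_channels(
    candidates: Set[Hashable],
    initial_cost: Dict[Hashable, float],
    max_temperature: Callable[[Set[Hashable]], float],
    update_cost: Callable[[Dict[Hashable, float], Hashable], None],
    t_max: float,
) -> Set[Hashable]:
    """Greedy removal loop in the spirit of Algorithm 1 (illustrative only)."""
    placed = set(candidates)
    cost = dict(initial_cost)
    while placed:
        # Pick the cheapest remaining channel; sorting makes ties deterministic
        # (this assumes the channel identifiers are orderable, e.g. integers).
        victim = min(sorted(placed), key=lambda c: cost[c])
        trial = placed - {victim}
        # Assumption: the check is done on a trial copy, so a removal that
        # would cause a thermal violation is never committed.
        if max_temperature(trial) > t_max:
            break
        placed = trial
        update_cost(cost, victim)  # hypothetical hook for the cost update
        del cost[victim]
    return placed


# Toy 1-D example; every number below is made up for illustration.
def toy_max_temperature(placed: Set[int]) -> float:
    """Peak temperature grows with the distance from the hotspot at x = 2."""
    if not placed:
        return math.inf
    return 60.0 + 10.0 * min(abs(c - 2) for c in placed)


result = place_channels(
    candidates={0, 1, 2, 3, 4},
    initial_cost={0: 1.0, 1: 2.0, 2: 5.0, 3: 2.0, 4: 1.0},
    max_temperature=toy_max_temperature,
    update_cost=lambda cost, removed: None,  # cost update omitted in the toy
    t_max=65.0,
)
print(result)
```

In the toy example only the channel directly over the hotspot survives, because every other channel can be removed without pushing the peak temperature above the limit.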
The complexity in this optimization problem comes from the fact that as we
change the location of channels, the underlying thermal resistive network changes.
In order to estimate the thermal impact, we need to solve Equation 2.6 every time
we have a new resistive network. Even though estimating the thermal profile has
linear complexity, each solve can still be expensive because of the fine granularity
of the grid. The success (both performance and runtime) of this algorithm critically
depends on how potential micro-channel locations are distributed (which basically
decides the initial micro-channel distribution) and how micro-channel cost is assigned and updated. In the next three subsections, we will discuss these aspects
and investigate ways to improve the efficiency of the algorithm (basically reducing
the required number of iterations in the algorithm).
3.3.3 Workload-balanced Initial Micro-channel Distribution
The heuristic of micro-channel placement starts with an initial distribution
where micro-channels are placed at all the potential locations and iteratively removes
micro-channels. The complexity of the algorithm mostly comes from the thermal
estimation in each iteration. Hence we should reduce the number of iterations
required by the algorithm (while still maintaining its performance), which critically
depends on how potential micro-channel locations are distributed. So in this section,
we investigate the method to find a good initial micro-channel distribution, which
is basically finding a set of potential locations of micro-channels.
As shown in [39] and [60], the underlying heat dissipated in each active silicon
layer exhibits great non-uniformity. For example, typical CPU designs are generally
very hot in areas surrounding ALU and cooler around caches. Therefore, spreading
micro-channels all over the 3D chip or using an arbitrary initial micro-channel distribution may result in imbalanced micro-channel cooling workloads and wasted pumping
power. For example, consider the 3D-IC shown in Figure 3.4, where the height of each
arrow indicates the power density in the active silicon layer. If the
potential micro-channel locations are spread over the entire chip as Figure 3.4(a) shows, the
regions with higher power density are covered by a similar number of micro-channels
as the lower power regions. Since all channels have the same pressure drop and dimensions (and therefore provide the same cooling capability), cooling the higher power
density region requires increasing the pressure drop or dimensions of all channels,
which is unnecessary for the low power regions and wastes pumping power.
Therefore, we consider spreading the potential micro-channel locations according to the spatial variations in power/thermal profiles on chip. Intuitively, in those
locations where the potential cooling workload is high, we try to place more micro-channels as Figure 3.4(b) shows. In other words, each micro-channel should absorb
the same or a similar amount of heat. This initial distribution could then be further optimized by our iterative approach described earlier. The problem of finding the initial
micro-channel distribution is formally stated as follows:
Problem Statement: Given a 3D-IC and a power profile, we would like to
find N potential micro-channel locations in the micro-channel layers such that all
the channels will absorb the same amount of heat. The amount of heat each micro-channel absorbs can be estimated as follows. Assume the 3D-IC is divided into
grids and modeled as a thermal resistive network (as in Figure 2.9). The heat absorbed
by micro-channel (i, j) is:
\[
P_{heat,i,j} = \sum_{\substack{\forall (i_2,j_2,k_2) \in G(i,j) \\ \forall (i_1,j_1,k_1) \notin G(i,j)}} I(i_1,j_1,k_1;\, i_2,j_2,k_2)
\tag{3.3}
\]
Here I(i1 , j1 , k1 ; i2 , j2 , k2 ) is the heat (current) flowing from grid (i1 , j1 , k1 ) to grid
(i2 , j2 , k2 ) (note that (i1 , j1 , k1 ) and (i2 , j2 , k2 ) must be neighboring grids), and
G(i, j) is the set of grids covered by micro-channel (i, j) (the micro-channel located
at the i-th/j-th grid in x/y direction). Therefore (i2 , j2 , k2 ) is a grid inside micro-channel (i, j) while (i1 , j1 , k1 ) is outside micro-channel (i, j), and I(i1 , j1 , k1 ; i2 , j2 , k2 )
indicates the heat flowing into micro-channel (i, j) from grid (i1 , j1 , k1 ) to grid
(i2 , j2 , k2 ). Here I(i1 , j1 , k1 ; i2 , j2 , k2 ) can be estimated by thermal analysis. For example, assuming the temperatures at grids (i1 , j1 , k1 ) and (i2 , j2 , k2 ) are $T_{i_1,j_1,k_1}$ and
$T_{i_2,j_2,k_2}$, then $I(i_1,j_1,k_1; i_2,j_2,k_2) = (T_{i_1,j_1,k_1} - T_{i_2,j_2,k_2}) / R_{i_1,j_1,k_1}^{i_2,j_2,k_2}$, where $R_{i_1,j_1,k_1}^{i_2,j_2,k_2}$ is
the thermal resistance between grids (i1 , j1 , k1 ) and (i2 , j2 , k2 ) (it is usually a combination of convective and conductive resistances). Therefore I(i1 , j1 , k1 ; i2 , j2 , k2 )
depends on the micro-channel structure (location and size). Assuming the total
number of potential micro-channel locations N is fixed, we would like to allocate
these N micro-channels so that the heat each micro-channel absorbs (Pheat,i,j ) is
the same.
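As a small illustration of Equation 3.3, the following Python sketch sums the heat crossing the boundary of one micro-channel, given grid temperatures and pairwise thermal resistances. The function name `heat_absorbed` and the dictionary-based data layout are assumptions made for this example; in practice the temperatures would come from the thermal analysis of Section 2.3.3.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int, int]


def heat_absorbed(
    channel_cells: Set[Cell],
    temperature: Dict[Cell, float],
    resistance: Dict[Tuple[Cell, Cell], float],
) -> float:
    """Sum (T_outside - T_inside) / R over every boundary pair (Equation 3.3).

    `resistance` holds one entry per pair of neighboring grids; the order of
    the two cells in the key does not matter.
    """
    total = 0.0
    for (a, b), r in resistance.items():
        if a in channel_cells and b not in channel_cells:
            a, b = b, a  # orient the pair as (outside, inside)
        if b in channel_cells and a not in channel_cells:
            total += (temperature[a] - temperature[b]) / r
    return total


# Toy data (made up): one channel cell with two hotter neighbors.
channel = {(0, 0, 0)}
temps = {(0, 0, 0): 40.0, (1, 0, 0): 80.0, (0, 1, 0): 70.0, (1, 1, 0): 75.0}
res = {
    ((1, 0, 0), (0, 0, 0)): 2.0,   # outside -> inside
    ((0, 0, 0), (0, 1, 0)): 5.0,   # listed in the opposite order
    ((1, 0, 0), (1, 1, 0)): 1.0,   # both outside, ignored
}
p_heat = heat_absorbed(channel, temps, res)
print(p_heat)
```

Pairs with both grids inside or both outside the channel contribute nothing, which matches the two index conditions under the sum in Equation 3.3.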
The difficulty in this problem comes from the fact that the amount of heat
each micro-channel absorbs is hard to decide before micro-channel placement, since
the location of micro-channels and pressure drop will largely influence the direction
of heat flow and thereby influence the heat each micro-channel absorbs. Therefore,
we use a minimum cost flow based heuristic to find a good initial micro-channel
density distribution.
Formulation of minimum cost flow problem:
To form the minimum cost flow problem, we first divide the 3D-IC into
coarse grids, each of which can contain several micro-channels. We would
like to decide the density distribution of the potential micro-channel locations among
the grids, that is, the number of micro-channels in each grid. Note that, since each micro-channel
spans the whole chip in the z direction, the number and locations of micro-channels
in the grids at the same (x, y) position are the same. So we use Ni,j to denote the
number of channels in the i-th/j-th grids in x/y direction (note that the grid network
is coarse). The density of micro-channels should be proportional to the potential
cooling workload for the micro-channels in this region.
After dividing the 3D-IC into grids, we perform a thermal analysis based on
this grid division assuming there is no micro-channel, and estimate the temperature
at each grid. Meanwhile, we abstract the 3D-IC structure as an undirected graph.
Figure 3.5 gives an example of how we form the minimum cost flow problem based on
the given 3D-IC structure and thermal profile. Figure 3.5(a) shows a 3D-IC with two
active silicon layers and a micro-channel layer in between. This 3D-IC is divided
into coarse grids and an associated graph which captures the 3D-IC structure is
formed in Figure 3.5(b) and the corresponding minimum cost flow problem is given
in Figure 3.5(c). As we can see from Figure 3.5(b), each grid is represented by a
node, and each pair of neighboring grids (nodes) are connected by an undirected
edge. Based on this graph and the temperature profile, the minimum cost flow
problem is formed as follows:
Figure 3.5: Example of formulating mincost flow network, (a) 3D-IC structure, (b) abstract grid
graph, (c) minimum cost flow network
Nodes: a) Each node (i, j, k) in the active silicon layer forms a source node,
with $a_{i,j,k} = \max\{0, T_{i,j,k} - T_{in}\}$ units of flow available, where Ti,j,k is the temperature
at grid (i, j, k) and Tin is the constant fluid inlet temperature. As shown in Figure
3.5(b), the active layer nodes are represented by black dots, and become source
nodes in the minimum cost flow problem in Figure 3.5(c).
b) There is a single sink node with demand $\sum_{\forall (i,j,k) \in \text{active layer}} a_{i,j,k}$. This node is
represented by a black square in the minimum cost flow network in Figure 3.5(c).
c) Each of the other grids/nodes is represented by an intermediate node (gray
dots in Figure 3.5(c)).
Edges: a) Similar to the graph in Figure 3.5(b), in the minimum cost flow network in
Figure 3.5(c), each pair of neighboring nodes is connected by an edge and the edges
are bi-directional (can take heat flow in either direction). Each edge has unlimited
capacity and also a cost which is assigned as:
\[
\mathrm{cost}(i_1,j_1,k_1;\, i_2,j_2,k_2) = r_1 \cdot (T_{i_1,j_1,k_1} + T_{i_2,j_2,k_2})/2
\tag{3.4}
\]
Here cost(i1 , j1 , k1 ; i2 , j2 , k2 ) denotes the cost of edge connecting nodes (i1 , j1 , k1 )
and (i2 , j2 , k2 ). The cost is basically decided by the average temperature of the two
neighboring nodes, and r1 is a constant scaling factor.
b) All the nodes in the micro-channel layers are connected to the sink node
with the capacity and cost defined as follows:
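\[
\begin{aligned}
\text{capacity:}\quad & \mathrm{cap}(i,j,k) = r_2 V - r_3\, n_{TSV}^{i,j,k} \\
\text{cost:}\quad & \mathrm{cost}(i,j,k;\, \mathrm{sink}) = r_4\, T_{i,j,k}
\end{aligned}
\tag{3.5}
\]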
Here V is a constant representing the maximum number of micro-channels each
grid can contain, $n_{TSV}^{i,j,k}$ represents the number of TSVs in grid (i, j, k), r2 , r3 are
constant scaling factors and r4 is a small constant. The edge capacity is decided by
the maximum number of micro-channels each grid can contain, which depends on
the number of TSVs in the grid. TSVs in a grid reduce the
capacity of that grid since micro-channels cannot be placed where there
are TSVs.
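The edge attributes of Equations 3.4 and 3.5 are simple enough to write down directly. The sketch below is only illustrative: the function names and the default values of the scaling factors r1 to r4 are placeholders chosen for the example, not values from this work.

```python
from typing import Tuple


def neighbor_edge_cost(t1: float, t2: float, r1: float = 1.0) -> float:
    """Equation 3.4: cost of the edge between two neighboring grids."""
    return r1 * (t1 + t2) / 2.0


def sink_edge(
    temp: float,
    n_tsv: int,
    v_max: int,
    r2: float = 1.0,
    r3: float = 1.0,
    r4: float = 0.01,
) -> Tuple[float, float]:
    """Equation 3.5: (capacity, cost) of the edge from a channel-layer grid to the sink.

    v_max is the constant V (maximum number of channels a grid can hold);
    the default scaling factors are placeholders, not values from the text.
    """
    capacity = r2 * v_max - r3 * n_tsv
    cost = r4 * temp
    return capacity, cost


# Toy numbers: a grid that can hold 10 channels but contains 2 TSVs.
print(neighbor_edge_cost(80.0, 60.0))
print(sink_edge(temp=50.0, n_tsv=2, v_max=10))
```

With these placeholder factors, each TSV removes one unit of capacity from its grid, so a grid crowded with TSVs can accept little or no flow toward the sink.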
The minimum cost flow problem basically sends the flows from source nodes
to the sink node through some of the edges so that the total cost of the selected
edges is minimized. The solution of minimum cost flow gives the amount of flow
(ei,j,k ) that passes through each micro-channel layer node (i, j, k). Assuming N is
the total number of potential micro-channel locations that we would like to find, the
number of micro-channels in grids (i, j, ∀k) is assigned as follows:
\[
N_{i,j} = \mathrm{round}\left( \frac{\sum_{\forall k} e_{i,j,k}}{\sum_{\forall i,j,k} e_{i,j,k}}\, N \right)
\tag{3.6}
\]
The round() function means rounding the fractional number to the nearest integer
number. After getting the number of micro-channels in each grid, we uniformly
place such amount of micro-channels in each grid. That is, Ni,j micro-channels are
uniformly distributed in grids (i, j, ∀k) (note that we had used a coarse grained grid
structure). Figure 3.4(b) shows such a workload-balanced micro-channel distribution. The grids with higher power density are allocated more channels, and within
each grid (i, j, ∀k), Ni,j micro-channels spread uniformly if the TSVs do not block
the placement of micro-channels.
To account for the presence of TSVs, during the micro-channel placement,
when there are TSVs anywhere along a micro-channel location, no micro-channel is allocated in this location.
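The allocation rule of Equation 3.6 can be sketched as follows, assuming the minimum cost flow has already been solved and the flow through each micro-channel layer node is available as a dictionary. The function name `allocate_channels` is hypothetical. Python's built-in `round` rounds halves to the nearest even integer, so the sketch uses `floor(x + 0.5)` as one reading of "nearest integer"; that choice is an assumption.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple


def allocate_channels(
    flow: Dict[Tuple[int, int, int], float], n_total: int
) -> Dict[Tuple[int, int], int]:
    """Equation 3.6: share n_total channels in proportion to the flow per (i, j)."""
    total = sum(flow.values())
    if total <= 0:
        raise ValueError("total flow must be positive")
    per_column: Dict[Tuple[int, int], float] = defaultdict(float)
    for (i, j, _k), e in flow.items():
        per_column[(i, j)] += e  # sum over k for a fixed (i, j)
    # floor(x + 0.5) rounds halves up; this rounding rule is an assumption.
    return {
        ij: int(math.floor(e * n_total / total + 0.5))
        for ij, e in per_column.items()
    }


# Toy flow values (made up): grid (0, 0) carries most of the heat flow.
flows = {(0, 0, 0): 30.0, (0, 0, 1): 30.0, (1, 0, 0): 20.0, (2, 0, 0): 20.0}
counts = allocate_channels(flows, n_total=10)
print(counts)
```

Because each entry is rounded independently, the counts do not always add up to exactly N; a real flow would need a small correction step for that case.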
3.3.4 Micro-channel Cost Assignment
Given the initial micro-channel distribution, we iteratively remove micro-channels
to save pumping power as Algorithm 1 shows. To determine the order in which
micro-channels are removed, we assign a cost to each micro-channel, which indicates the cost of removing this micro-channel. In each iteration, the micro-channel
with the smallest cost is removed. After each micro-channel removal, the cost of
remaining micro-channels is updated. In this subsection, we discuss how micro-channel
cost is assigned and updated.
Defining Micro-channel Cost: The temperature at an on chip location
largely depends on the power dissipated in that region, and its neighboring regions.
Thus, we use a “weighted power” based approach for micro-channel cost assignment.
Basically each micro-channel should absorb the heat generated in the region right
below and above itself in active layers and also the heat generated in near neighbors.
To assign the cost of micro-channels, we define a region of influence (ROI) for each
potential micro-channel. The ROI of a micro-channel is the region to which this
channel provides cooling (that is, the region right below and above this channel in
active layer and also in the near neighbors). The dark region in Figure 3.6(a) shows
the ROI of micro-channel 3. We divide the 3D-IC into fine grained grids, each of
which contains at most one micro-channel. Let Wi,j denote the cost of the micro-channel located in position (i, j) (i-th grid in x direction in micro-channel layer j).
It is assigned as the weighted sum of the power dissipated in its ROI:
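\[
\begin{aligned}
W_{i,j} = {} & u_1 \Big( w_0 \cdot P_{i,j+1} + \sum_{b=1}^{b_{max}} w_b \cdot (P_{i+b,j+1} + P_{i-b,j+1}) \Big) \\
& + u_2 \Big( w_0 \cdot P_{i,j-1} + \sum_{b=1}^{b_{max}} w_b \cdot (P_{i+b,j-1} + P_{i-b,j-1}) \Big)
\end{aligned}
\tag{3.7}
\]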
Here $P_{i,j} = \sum_{\forall k} P_{i,j,k}$, where Pi,j,k is the power dissipated at grid (i, j, k) (the
i-th/j-th/k-th grid in x/y/z direction). In z direction the channel covers the whole
chip, so we sum up the power in all grids in z direction (denoted by Pi,j ) and the
channel cost is a weighted sum of Pi,j .
The weight is decided by the distance from the heat source to the micro-channel.
In Equation 3.7, u1 and u2 are the vertical weight factors. Assume micro-channels
absorb heat from the active layers right above and below them. As Figure
3.6(a) shows, u1 is the vertical weight factor for the power from the active layer
above the micro-channel; it is inversely proportional to the vertical distance between
the micro-channel and its top active layer. Similarly, u2 is the vertical weight factor for
the power from the active layer below the micro-channel, and its value is decided
in a similar way. Here wb is the horizontal weight factor. We assume horizontally
each channel has a coverage of bmax in x direction, that is, each channel absorbs the
heat in the region within a distance of bmax from it in x direction. Note that the
horizontal distance here is measured in x direction since in z direction the channel
covers the whole chip. The horizontal weight factor wb (b = 1...bmax ) is decided by
the distance from the channel to the heat source in x direction (measured by b). We
set w0 = 1 and wb is monotonically decreasing with distance b.
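Equation 3.7 translates into a short routine. The sketch below assumes the z-summed power Pi,j is stored in a dictionary keyed by (i, j) and that grids outside the chip contribute zero power; the function name `channel_cost` and the numeric weights are illustrative choices, not values from this work.

```python
from typing import Dict, Sequence, Tuple


def channel_cost(
    power: Dict[Tuple[int, int], float],
    i: int,
    j: int,
    w: Sequence[float],
    u1: float,
    u2: float,
) -> float:
    """Equation 3.7: weighted power in the region of influence of channel (i, j).

    power[(i, j)] is the power already summed over z; w[0] is w_0 and
    w[b] for b = 1..b_max are the horizontal weights, so b_max = len(w) - 1.
    Missing entries (outside the chip) are treated as zero, an assumption.
    """
    def p(ii: int, jj: int) -> float:
        return power.get((ii, jj), 0.0)

    def layer_term(jj: int) -> float:
        term = w[0] * p(i, jj)
        for b in range(1, len(w)):
            term += w[b] * (p(i + b, jj) + p(i - b, jj))
        return term

    # Layer j + 1 is the active layer above, layer j - 1 the one below.
    return u1 * layer_term(j + 1) + u2 * layer_term(j - 1)


# Toy power map (made up): channel at i = 1 in channel layer j = 1.
P = {(0, 2): 1.0, (1, 2): 4.0, (2, 2): 2.0, (1, 0): 3.0}
cost_11 = channel_cost(P, i=1, j=1, w=[1.0, 0.5], u1=1.0, u2=2.0)
print(cost_11)
```

In the toy call the upper layer contributes 4 + 0.5 · (2 + 1) = 5.5 and the lower layer contributes 3, which is then scaled by u2 = 2.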
Updating Micro-channel Cost: After removing a micro-channel, we should
update the cost of remaining channels. Basically, after a channel is removed, its
neighboring channels should take care of the region covered by the removed channel
(Figure 3.6(b)), and thus the cost of these neighboring channels should increase.
Assuming (i0 , j) is the micro-channel we have just removed (the channel located at
the i0 -th grid in x direction in layer j), we update the cost of the remaining micro-channels in layers j − 2, j and j + 2 as Figure 3.6(b) shows (note that layers j ± 1
are active layers). The update function is as follows:
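\[
\left.
\begin{aligned}
W_{i,j} &= W_{i,j} + w_{|i_0-i|} \cdot W_{i_0,j} \\
W_{i,j\pm 2} &= W_{i,j\pm 2} + u_3 \cdot w_{|i_0-i|} \cdot W_{i_0,j}
\end{aligned}
\right\}
\quad \forall i \ \text{s.t.}\ |i_0 - i| \le b_{max}
\tag{3.8}
\]
Figure 3.6: (a) Cost initialization, (b) Cost update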
Here wb is the horizontal weight factor, and u3 is the vertical weight factor decided
by the vertical distance between two micro-channel layers.
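The update rule of Equation 3.8 can be sketched as an in-place update of a cost dictionary. The function name `remove_and_update` and the data layout (costs keyed by (i, layer) for the channels still present) are assumptions made for this example.

```python
from typing import Dict, Sequence, Tuple


def remove_and_update(
    cost: Dict[Tuple[int, int], float],
    i0: int,
    j: int,
    w: Sequence[float],
    u3: float,
) -> None:
    """Remove channel (i0, j) and redistribute its cost following Equation 3.8.

    cost maps (i, layer) to W for the channels still present; w[b] is the
    horizontal weight for distance b, so b_max = len(w) - 1.
    """
    removed = cost.pop((i0, j))
    b_max = len(w) - 1
    for (i, layer) in list(cost):
        d = abs(i0 - i)
        if d > b_max:
            continue  # outside the horizontal coverage of the removed channel
        if layer == j:
            cost[(i, layer)] += w[d] * removed
        elif abs(layer - j) == 2:
            cost[(i, layer)] += u3 * w[d] * removed


# Toy costs (made up): remove the channel at i0 = 1 in channel layer 1.
W = {(0, 1): 2.0, (1, 1): 1.0, (2, 1): 3.0, (1, 3): 4.0, (5, 1): 7.0}
remove_and_update(W, i0=1, j=1, w=[1.0, 0.5], u3=0.25)
print(W)
```

Only channels within bmax grids of the removed channel are touched, so the far-away channel at i = 5 keeps its original cost.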
The algorithm iterates until further removal of micro-channels results in thermal violation. The remaining micro-channels form the final cooling system. The
cooling effectiveness of the resultant micro-channel design will be given in Section
3.7, which shows that the non-uniform micro-channel design can result in more than
50% pumping power savings compared with the conventional design.
Though significant power saving is achieved, this non-uniform micro-channel
structure is still inefficient in dealing with the spatial constraints imposed by TSVs.
In the next section, we will investigate a TSV constrained micro-channel design that
can better address this problem and further save pumping power.
3.4 TSV Constrained Bended Micro-channel
3.4.1 Motivation of Using Bended Micro-channel
The previous configuration uses straight channels that spread in areas that
demand high cooling. If the spatial distribution of micro-channels is unconstrained
then such an approach results in the best cooling efficiency with the minimum cooling energy. However 3D-ICs impose significant constraints on how and where the
micro-channels could be located due to the presence of TSVs, which allow different
layers to communicate. A 3D-IC usually contains thousands of TSVs which are incorporated with clustered or distributed topologies [57]. These TSVs form obstacles
to the micro-channels since the channels cannot be placed at the locations of TSVs.
Therefore the presence of TSVs prevents distribution of straight micro-channels.
This results in the following problems.
1. As illustrated in Figure 3.7(a), micro-channels would fail to reach thermally
critical areas thereby resulting in thermal violations and hotspots.
2. To fix the thermal hotspots in areas where micro-channels cannot reach, we
need to increase the fluid flow rate resulting in a significant increase in cooling
energy.
To address this problem, we investigate micro-channels with bends as illustrated
in Figure 3.7(b). With the bended structure, the micro-channels can reach those TSV-blocked hotspot regions that straight micro-channels cannot reach. This results
in better coverage of hotspots and therefore better cooling efficiency and reduced
cooling energy. While micro-channels with bends (or serpentine organization of
micro-channels) have been investigated in the past [68][23], our work is the first
to investigate this structure in the context of 3D-ICs and, more specifically, to address
the constraint imposed by TSVs on the spreading of straight micro-channels [73].
Figure 3.7: Example of silicon layer thermal profile with TSV and (a) straight, (b) bended micro-channels
3.4.2 Problem Formulation
In this work, we would like to decide the locations and geometry of micro-channels with bended structure so that their cooling effectiveness is maximized. Designing the 3D-IC micro-channel infrastructure is a very complex problem. For example,
there are exponentially many ways to incorporate micro-channels with bends, and evaluating their
impact on the silicon temperature requires us to solve a complex system of thermal
equations. The specific problem formulation is as follows.
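\[
\begin{aligned}
\min\quad & P_{pump}(e^l_{i,j}, \Delta p) \\
\text{s.t.}\quad & \sum_{\forall j \in N(i)} e^l_{i,j} = 1, && \forall \text{grid } i \in \{CI, CO\},\ \forall \text{channel layer } l \\
& \sum_{\forall j \in N(i)} e^l_{i,j} = k \in \{0, 2\}, && \forall \text{grid } i \notin \{CI, CO, TSV\},\ \forall \text{channel layer } l \\
& e^l_{i,j} = 0, && \text{if grid } i \text{ or } j \in \{TSV\},\ \forall \text{channel layer } l \\
& T^l_i(e^l_{i,j}, \Delta p) \le T_{max}, && \forall \text{grid } i,\ \forall \text{channel layer } l \\
& e^l_{i,j} \in \{0, 1\}, && \forall \text{grids } i, j,\ \forall \text{channel layer } l \\
& e^l_{i,j} = e^l_{j,i}, && \forall \text{grids } i, j,\ \forall \text{channel layer } l
\end{aligned}
\tag{3.9}
\]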
Figure 3.8: Example of micro-channel infrastructure design using minimum cost flow
Figure 3.8 represents the problem formulation graphically. Given a set of
stacked silicon layers, some of the intermediate layers between silicon layers would
have micro-channels (as shown in Figure 3.8(a), two intermediate layers consist of
micro-channels). The locations of input and output orifices for the micro-channels
are assumed known. We would like to find micro-channel routes from one side to
the other such that the routes do not intersect, avoid TSVs and provide sufficient
cooling at minimum pumping energy.
We impose a graph on each micro-channel layer as indicated in Figure 3.8(b).
In the graph, each grid is represented by a node, and the edges define the immediate
neighbors of a node. The micro-channel routing would be performed on this graph.
If there is a TSV located on a grid, then its corresponding neighborhood edges
are removed since micro-channels cannot be routed through TSVs. Let $e^l_{i,j} = 1$
represent the fact that there is a channel connecting grids i and j in the l-th
micro-channel layer of the 3D-IC (so i and j must be neighboring nodes in the
grid graph and $e^l_{i,j} = e^l_{j,i}$). Neither i nor j should have a TSV (because TSVs
will not allow channels to go through them). In the first constraint, {CI, CO}
represents the set of input and output orifice nodes, N (i) represents the set of
i’s neighboring nodes. So the first constraint imposes that the input and output
orifice nodes must have a neighboring grid they are connected to so that their
incoming/outgoing fluid can be pushed into/out-of the micro-channel layer. The
next constraint imposes that, for each grid, either there is a channel going through
this grid (and therefore $\sum_{\forall j \in N(i)} e^l_{i,j} = 2$), or no micro-channel goes through it (and
therefore $\sum_{\forall j \in N(i)} e^l_{i,j} = 0$). In the third constraint, {TSV} represents the set of
grids containing TSVs, so micro-channels cannot be routed through these nodes.
The following constraint imposes that the temperature is within acceptable limits
and the objective tries to minimize the pumping power.
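The structural constraints of Equation 3.9 (everything except the thermal constraint) can be checked with a simple degree count. The sketch below is illustrative only: the function name `routing_is_feasible`, the representation of a routing as a list of undirected edges between (x, y) grid coordinates, and the 4-neighbor adjacency are assumptions for this example.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Node = Tuple[int, int]


def routing_is_feasible(
    edges: List[Tuple[Node, Node]],
    grid_nodes: Iterable[Node],
    orifices: Set[Node],
    tsv_nodes: Set[Node],
) -> bool:
    """Check the degree and TSV constraints of Equation 3.9 for one channel layer.

    Each undirected edge is listed once (so symmetry holds by construction).
    The temperature constraint is not checked here.
    """
    degree: Counter = Counter()
    for a, b in edges:
        if a in tsv_nodes or b in tsv_nodes:
            return False  # channels cannot pass through TSV grids
        if abs(a[0] - b[0]) + abs(a[1] - b[1]) != 1:
            return False  # only neighboring grids may be connected
        degree[a] += 1
        degree[b] += 1
    for node in grid_nodes:
        if node in tsv_nodes:
            continue
        if node in orifices:
            if degree[node] != 1:
                return False  # inlet/outlet connects to exactly one neighbor
        elif degree[node] not in (0, 2):
            return False  # a channel either passes through or is absent
    return True


# Toy 3x3 layer with a TSV between the inlet (0, 0) and the outlet (2, 0).
grid = [(x, y) for x in range(3) for y in range(3)]
tsvs = {(1, 0)}
ports = {(0, 0), (2, 0)}
straight = [((0, 0), (1, 0)), ((1, 0), (2, 0))]
bended = [((0, 0), (0, 1)), ((0, 1), (1, 1)), ((1, 1), (2, 1)), ((2, 1), (2, 0))]
print(routing_is_feasible(straight, grid, ports, tsvs))
print(routing_is_feasible(bended, grid, ports, tsvs))
```

The straight route is rejected because it crosses the TSV grid, while the route with two bends satisfies all the structural constraints.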
Figure 3.9: Micro-channel infrastructure design flow
3.4.3 Overall Micro-channel Design Flow
This is a very complex problem since: 1) the variables need to be discrete,
and 2) the thermal and pumping power models are highly nonlinear. In this section
we present a design methodology for this problem, as illustrated in Figure 3.9. Our methodology
follows a sequence of logical steps. First the severity of the thermal problem and the
need for having micro-channels is evaluated by performing a full scale thermal analysis. Based on the severity of the thermal problem (location, intensity of hotspots)
an initial micro-channel design is developed. This design is further improved for
reducing the cooling power footprint and improving the thermal effectiveness using
iterative methods. Now we go into the details of these individual steps.
3.4.4 Mincost Flow Based Micro-channel Design
The full scale 3D thermal analysis would identify locations of hotspots in
different layers which cannot be removed by conventional package/air cooling based
approaches. These are the areas which require sufficient proximity to the micro-channels.
Since solving the formulation in Equation 3.9 is intractable, we use simple
models to come up with a sufficiently good initial micro-channel infrastructure which
is iteratively improved subsequently. In order to develop this initial solution we use
the minimum cost flow formulation.
3.4.4.1 Initialization of Minimum Cost Flow Network
Consider the 3D-IC and the corresponding grid graph of each micro-channel
layer as illustrated in Figure 3.8(a)(b). For each micro-channel layer, we instantiate
a minimum cost flow problem as follows (see Figure 3.8(c) for illustration). The
nodes corresponding to the input/output orifices for the given micro-channel layer
are assigned a supply/demand of one flow unit. All nodes in the grid graph have
a capacity of one. The edges have unlimited capacity and are bi-directional (can take
fluid flow in either direction). As indicated earlier the edges between two neighboring nodes exist only if neither of the nodes has a TSV. This enforces the routing
constraint imposed by TSVs. Figure 3.8(c) indicates the flow network for the two
micro-channel layers.
Each node has a cost whose assignment would be discussed subsequently. We
would like to send flow from inlet nodes to outlet nodes such that the capacity
constraints are not violated and the cost is minimum. Assigning the node capacity
to be 1 would ensure that all the flow from inlet to outlet follows simple paths (non-intersecting and non-cyclic). A minimum cost flow formulation with a well defined
node capacity could be solved using very similar methods as a formulation with edge
capacity alone [65]. It is noteworthy that because there is an edge between each
pair of neighboring nodes, the flow path could take several bends if necessary.
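One common way to handle node capacities with an edge-capacity solver is node splitting: every node v becomes an "in" copy and an "out" copy joined by an arc that carries the node's capacity and cost. The sketch below shows this standard reduction; it is not necessarily the exact method of [65], and the function name `split_nodes` and the tuple layout of the arcs are assumptions for this example.

```python
import math
from typing import Dict, Hashable, List, Tuple

Arc = Tuple[Tuple[Hashable, str], Tuple[Hashable, str], float, float]


def split_nodes(
    node_cost: Dict[Hashable, float],
    edges: List[Tuple[Hashable, Hashable]],
    node_capacity: float = 1,
) -> List[Arc]:
    """Turn node capacities/costs into arc capacities/costs by node splitting.

    Returns directed arcs (tail, head, capacity, cost). Each undirected grid
    edge becomes two directed arcs with unlimited capacity and zero cost.
    """
    arcs: List[Arc] = []
    for v, c in node_cost.items():
        # The internal arc enforces "at most node_capacity units through v".
        arcs.append(((v, "in"), (v, "out"), node_capacity, c))
    for u, v in edges:
        arcs.append(((u, "out"), (v, "in"), math.inf, 0.0))
        arcs.append(((v, "out"), (u, "in"), math.inf, 0.0))
    return arcs


# Toy graph (made up): two neighboring grids, one of them over a hotspot.
arcs = split_nodes({"a": -5.0, "b": 0.1}, [("a", "b")])
for arc in arcs:
    print(arc)
```

After this transformation the problem only has arc capacities, so any standard minimum cost flow solver can be applied with the orifice nodes as supply and demand.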
3.4.4.2 Cost Assignment
The cost assignment should be such that the minimum cost flow formulation
develops an initial infrastructure that distributes the micro-channels with higher
density in areas that demand more cooling. The chip scale thermal analysis would
identify locations of grids in the silicon layers that are in dire need of cooling (see
Figure 3.8(a)). A silicon layer would be cooled by the micro-channels both above
and below (unless the silicon layer is at the very top or very bottom of the stack).
For example, the middle silicon layer in Figure 3.8(a) could be cooled by two micro-channel layers unlike the top and bottom silicon layers.
As illustrated in Figure 3.8(b), each micro-channel layer is represented as a grid
graph. The amount of cooling required at a certain node in this graph is a function
of how hot the top and bottom grids in the silicon layers are. It also depends on
how we choose to distribute the cooling demand at a certain location in the silicon
layer between the micro-channel layers just above and just below. Let us suppose a
certain location in the silicon layer has temperature T ≥ Tmax and requires cooling
(estimated by full scale thermal analysis). Let uT (with 0 ≤ u ≤ 1) represent the
fraction of this cooling demand assigned to the micro-channel grid right above and
(1−u)T represent the cooling demand assigned to the micro-channel grid just below.
If u is set very low then most of the cooling will be done by the channel layer below
and vice versa for large u. Let $u^l_i$ be the heat load partitioning factor of grid i in
silicon layer l. It is assigned as follows.
Case 1: If l is the topmost (bottommost) layer, then $u^l_i = 0$ ($u^l_i = 1$) so that all the
cooling demand goes to the micro-channel layer right below (above) l, which is layer
l − 1 (l + 1).
Case 2: If l is neither the top nor the bottom layer, $0 \le u^l_i \le 1$, implying that the heat
generated in grid i of silicon layer l needs to be distributed between the two micro-channel
layers right above and below. If the channel layers above and below (layers l + 1 and
l − 1) have the same number of TSVs then $u^l_i = 1/2$, else it is scaled linearly such
that more cooling demand is assigned to the micro-channel layer with fewer TSVs.
Given the partitioning factor uli , the cost is assigned as follows. (See Figure
3.10 for an illustration.) Let cost(i, l) denote the cost for node i in micro-channel
layer l (hence layers l − 1 and l + 1 correspond to silicon layers just below and above
the micro-channel layer l), three cases are considered depending on whether there
is hotspot below and above this node in the silicon layers l − 1 and l + 1.
Case 1: Hotspots on both sides. When the grid i in both silicon layers l − 1 and
l + 1 are in hotspot regions (Til−1 > Tmax and Til+1 > Tmax ), the micro-channel
should provide cooling to both sides (above and below), so the cost is:
\[
\mathrm{cost}(i, l) = -\left[ (1 - u^{l+1}_i)\, T^{l+1}_i + u^{l-1}_i\, T^{l-1}_i \right]
\tag{3.10}
\]
Here the first component inside the square bracket indicates the cooling demand
from the silicon grid above and the second component corresponds to the cooling
demand from the silicon grid just below. Higher demand leads to lower cost since we
would like micro-channels to pass through high cooling demand regions. See Figure
3.10 for an illustration.
Figure 3.10: Cost assignment
Case 2: Hotspot on one side. When the silicon grid i on only one side (l − 1 or
l + 1) is in a hotspot region (but not both), the cost is assigned as
\[
\mathrm{cost}(i, l) =
\begin{cases}
-(1 - u^{l+1}_i)\, T^{l+1}_i, & \text{if } T^{l+1}_i \ge T_{max} \\
-u^{l-1}_i\, T^{l-1}_i, & \text{if } T^{l-1}_i \ge T_{max}
\end{cases}
\tag{3.11}
\]
Case 3: No hotspot in either side. When there is no hotspot in either side, then
the node cost is assigned to a small positive value cost(i, l) = ϵ > 0.
The minimum cost flow formulation would therefore route flows such that
maximum number of high cooling demand grids are touched by the channels. The
non-hotspot regions are assigned a small positive cost. This would enable the minimum cost flow formulation to avoid areas that do not demand high cooling.
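The three cost cases (Equations 3.10 and 3.11 plus the small positive cost for non-hotspot nodes) can be folded into one routine. The sketch below is illustrative: the function name `node_cost`, the argument names and the default value of ϵ are assumptions, and it uses the test T ≥ Tmax for both sides, although Case 1 in the text is written with a strict inequality.

```python
def node_cost(
    t_above: float,
    u_above: float,
    t_below: float,
    u_below: float,
    t_max: float,
    eps: float = 1e-3,
) -> float:
    """Cost of node i in micro-channel layer l (Equations 3.10 and 3.11).

    t_above, u_above: temperature and partitioning factor of grid i in the
    silicon layer l + 1; t_below, u_below: the same for silicon layer l - 1.
    eps is the small positive cost for nodes with no hotspot on either side
    (its value here is a placeholder).
    """
    cost = 0.0
    has_hotspot = False
    if t_above >= t_max:
        # The layer above sends the share (1 - u) of its demand downward.
        cost -= (1.0 - u_above) * t_above
        has_hotspot = True
    if t_below >= t_max:
        # The layer below sends the share u of its demand upward.
        cost -= u_below * t_below
        has_hotspot = True
    return cost if has_hotspot else eps


# Toy temperatures (made up) with T_max = 85.
both = node_cost(100.0, 0.5, 90.0, 0.5, t_max=85.0)
one_side = node_cost(100.0, 0.0, 60.0, 0.5, t_max=85.0)
neither = node_cost(60.0, 0.5, 60.0, 0.5, t_max=85.0)
print(both, one_side, neither)
```

Hotter grids produce more negative costs, so the minimum cost flow is drawn toward them, while cool grids carry a small penalty.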
3.4.5 Micro-channel Refinement
The primary objective of the minimum cost flow formulation is to come up
with an initial micro-channel design that carries cooling in sufficient proximity of hot
areas. This is not enough to guarantee effective cooling. For example, some channels
may have several bends and/or may be routed over a disproportionately large number of
hotspots. Both of these situations degrade the overall cooling quality.
In this section we present approaches for iteratively refining the design for improved
cooling effectiveness. The micro-channel infrastructure refinement process works as
illustrated in Figure 3.9.
3.4.5.1 Temperature and Pumping Power Analysis
The impact of micro-channels on the 3D-IC thermal profile is a function of
how the micro-channels are routed and also how much fluid flow they carry. The
initial design generated using minimum cost flow technique does not prescribe the
pressure drop and the fluid flow rate that the channels need to work at. Hence
given the micro-channel design, we then need to estimate the smallest pressure drop
that the pump needs to work at such that thermal constraints are satisfied. Given
the micro-channel design, the smallest pressure drop value results in the smallest
pumping energy. As indicated earlier, we assume that all channels are subjected
to the same pressure drop by the pump, hence the minimum pressure drop can be
determined by linearly increasing pressure drop (∆p) and calculating the thermal
profile for each value until the thermal constraints are met. For a given pressure
drop across the pump and a given micro-channel design, Equation 2.21 could be
used to determine the velocity (fluid flow rate) in each channel. Note that because
each channel has different number of bends and total length, the flow rate would
be different too. Based on the flow rate information which is computed for a given
pressure drop, the associated thermal conductance matrix G could be computed.
This information could be used to estimate the thermal profile of the 3D-IC for
a given pressure drop. After finding the minimum required pressure drop (∆p),
we could calculate the required pumping power. This technique is highlighted in
Algorithm 2.
Algorithm 2 Finding the minimum required pumping power
1. ∆p = ∆pmin , and repeat steps 2-6:
2. Calculate the fluid velocity using Equation 2.21;
3. Calculate thermal conductance matrix G;
4. Estimate temperature profile;
5. If thermal violation occurs, ∆p = ∆p + δp;
6. Else break;
7. Calculate pumping power.
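A minimal sketch of the linear search in Algorithm 2 is given below. The thermal and hydraulic models are not reproduced; they are represented by two hypothetical callbacks, peak_temp (steps 2-4) and pump_power (step 7). The upper bound dp_max is also an assumption, added so that the loop terminates when no pressure drop is sufficient.

```python
def min_pumping_power(peak_temp, pump_power, t_max, dp_min, dp_step, dp_max):
    """Linear search for the smallest pressure drop that meets t_max.

    peak_temp(dp): hypothetical callback that computes fluid velocities,
        builds the conductance matrix G and returns the peak temperature.
    pump_power(dp): hypothetical callback returning the pumping power.
    Returns (dp, power), or None if no dp <= dp_max is sufficient.
    """
    dp = dp_min
    while dp <= dp_max:
        if peak_temp(dp) <= t_max:   # no thermal violation: stop searching
            return dp, pump_power(dp)
        dp += dp_step                # thermal violation: raise pressure drop
    return None
```

For example, with a toy model in which the peak temperature falls linearly with the pressure drop, the search stops at the first value that satisfies the constraint.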
3.4.5.2 Iterative Micro-channel Optimization
The objective of minimum cost flow formulation did not capture cooling energy
and/or number of bends in the channels. Figure 3.11 illustrates typical situations
that can occur. In Figure 3.11, the two micro-channels have significantly different cooling demands (Figure 3.11(a)) and numbers of bends (Figure 3.11(b)). Such
imbalance (in cooling demand and bend count) increases the required
pressure drop and thereby the pumping energy. The basic idea is that all
the channels should have similar levels of heat load, length and number of bends.
Figure 3.11: Examples of (a) unbalanced cooling demand, (b) different number of bends
Hence if a channel has too many bends or goes through many hotspots while others
are shorter, then other channels could be made longer thereby more uniformly distributing the heat load and also reducing the number of bends in the most critical
micro-channel.
Based on these considerations, we try to refine the initial design by 1) balancing the heat loads among micro-channels and 2) reducing unnecessary bends.
Micro-channel heat load balancing:
Starting from the initial design we identify the micro-channels which have disproportionately high heat removal load and spread their heat load into neighboring
channels.
Algorithm 3 highlights the iterative pairwise micro-channel cooling load balance process. In the first iteration of pairwise micro-channel cooling workload balance, we start from the channel with the highest cooling workload. Here the cooling
workload is measured by the total heat absorbed by the micro-channel, which could
be calculated using P = (Tout − Tin )/Rio . Here Tin is the fluid supply temperature
at micro-channel inlet, and Tout is the fluid temperature at micro-channel outlet,
Rio is the total thermal resistance between the fluid inlet and outlet of that specific channel. Given the pressure drop, power profile of the 3D-IC and the location
and dimensions of the micro-channels, these parameters could be easily calculated
(see discussion in Sections 2.3 and 2.4, as well as reference [76]). Assuming i is the
channel with the highest cooling workload, we then pick one of i’s neighbors (either
left or right) with lower cooling workload, say channel k, and balance the workload
between channels i and k.
Algorithm 3 Pairwise micro-channel cooling load balance
Repeat:
1. Pick the micro-channel with highest cooling load i;
2. Pick a micro-channel k from i's neighbors with smaller cooling load, that is, $k = \arg\min_{k \in \{i-1, i+1\}} load(k)$;
3. Equally divide the hotspot region covered by channels i and k, and assign one of the
region to channel i, the other to channel k;
4. Remove some edges on the boundary between these two regions from the grid graph;
5. Resolve the minimum cost flow based on new graph;
6. Temperature analysis and calculating minimum required pumping power using Algorithm 2;
7. If no further pumping power saving could be achieved, stop.
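Steps 1 and 2 of Algorithm 3 can be sketched as follows. The function names are hypothetical, channels are assumed to be indexed left to right so that the neighbors of channel i are i − 1 and i + 1, and the cooling load of each channel is computed as P = (Tout − Tin)/Rio.

```python
def channel_load(t_in, t_out, r_io):
    """Heat absorbed by one channel, P = (Tout - Tin) / Rio."""
    return (t_out - t_in) / r_io


def pick_balance_pair(loads):
    """Return (i, k): the most loaded channel and its less loaded neighbor.

    loads: list of cooling loads, one entry per channel, ordered by position.
    k is None when no neighbor has a strictly smaller load.
    """
    i = max(range(len(loads)), key=lambda c: loads[c])
    neighbors = [k for k in (i - 1, i + 1) if 0 <= k < len(loads)]
    lighter = [k for k in neighbors if loads[k] < loads[i]]
    if not lighter:
        return i, None
    return i, min(lighter, key=lambda c: loads[c])
```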
To balance the workload of channels i and k, we first partition the hotspot
region covered by channels i and k. This region is bounded by channels min(i, k)−1
and max(i, k) + 1. For instance, in Figure 3.12 we would like to
balance the workload between channels 2 and 3, so the hotspot region covered
by channels 2 and 3 is bounded by channels 1 and 4 (the region identified by the dotted
line in Figure 3.12). To partition this region equally, we would like the
two resulting parts to have similar total heat load (cooling demand). As
indicated earlier, the cost of a node i at the l-th micro-channel layer signifies the
degree of cooling desired there. The total cooling needed in the region covered by
channels i and k is simply the sum total of the cost in all the associated grids. We
would like each channel to be assigned about half of this total cooling load in that
region. Hence we would like to partition this region into two subregions with the
same total cooling load.
Starting from the top left grid of the region covered by i and k, we traverse
the grid network in a row major form (left to right and then bottom). As soon as
we have collected grids whose sum total of cooling load is 1/2 of that of the region,
we stop. The boundary between these two subregions is defined in this fashion. A
row major form of traversal ensures that each channel will be somewhat uniformly
loaded with heat from a spatial perspective. Now one region is assigned to i and
the other is assigned to k. In order to find the exact route of the micro-channels we
can remove the edges connecting the two regions and solve the minimum cost flow
formulation once again (see Figure 3.12). This would ensure that channels i and k
do not encroach on each others regions. In the case where the minimum cost flow
could not return feasible solution due to the removal of too many edges, we will add
some removed edges back until a feasible solution is returned.
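The row major partition described above can be sketched as follows. This is an illustration only: the function name is hypothetical, and the input is assumed to be a non-negative cooling demand per grid (for example, the negated node cost in hotspot grids and zero elsewhere).

```python
def split_region_row_major(demand):
    """Split a region into two parts with roughly equal cooling demand.

    demand: list of rows, each a list of non-negative demands per grid.
    Grids are visited left to right, then row by row; the first part is
    closed as soon as it holds at least half of the total demand.
    Returns two lists of (row, column) grid coordinates.
    """
    total = sum(map(sum, demand))
    cells = [(r, c) for r, row in enumerate(demand) for c in range(len(row))]
    acc = 0.0
    for n, (r, c) in enumerate(cells, start=1):
        acc += demand[r][c]
        if acc >= total / 2:
            return cells[:n], cells[n:]
    return cells, []
```

The edges of the grid graph that connect grids of the first part to grids of the second part are the ones removed before the minimum cost flow is solved again.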
Figure 3.12: Example of pairwise cooling workload balance
The minimum cost flow gives a refined micro-channel structure design. We
then redo the temperature analysis and find the minimum pumping power for the
new design using Algorithm 2.
In the next iteration of optimization, we find the currently highest workload
micro-channel in the new design and do pairwise load balance on this channel using
the new graph updated in the previous iteration. We repeat this process iteratively
until no further pumping power saving could be achieved.
Bend Elimination
As shown in section 2.4.3.2, the corners/bends in the micro-channel will introduce considerable pressure drop, which increases the pumping power. Bends in
micro-channels allow us to reach areas which cannot be directly connected due to the
presence of TSV obstacles. But unnecessary bends which have been incorporated
due to the heuristic nature of our algorithm provide little benefit while impacting
the cooling quality. As a final refinement step we develop a pattern matching based
scheme for removing unnecessary and redundant bends on the channel networks.
We first generate a library of patterns of unnecessary corners and use
pattern matching to find those unnecessary corners in our design. Then, we replace
those corner patterns with equivalent patterns that have fewer corners. Figure
3.13 highlights a few patterns and their replacement patterns. This step should be
performed in a judicious fashion. Removing corners in the hotspot region might
lead to reduction in the micro-channel cooling performance since it reduces the level
of coverage. Hence we only remove those corners in the non-hotspot regions which
can easily be identified by the thermal analysis. The algorithms used for pattern
matching are similar to those used in technology mapping. The exact details of how
pattern matching is done are omitted here.
Figure 3.13: Examples of bend elimination
3.5 Hybrid Cooling Network
3.5.1 Motivation of Hybrid Cooling Network
Besides micro-channels, TSVs are also considered as an alternative solution
for cooling of 3D-ICs. TSVs are usually made of copper which has better thermal
conductivity than silicon or metal-oxide, and hence enable better vertical heat conduction between different layers. When the number of signal TSVs is not enough,
dummy thermal TSVs are inserted to further mitigate the thermal issues.
Both micro-channels and thermal TSVs have advantages and drawbacks in
performing 3D-IC cooling.
Micro-channel liquid cooling: The cooling effectiveness of micro-channel
is quite high and they have been reported to support heat densities as high as
700W/cm2 [84]. However as illustrated earlier, the drawback of micro-channel based
heat removal technology is that the cooling system consumes extra energy for pumping the coolant through channels. In addition, the presence of TSVs that
connect signals and power between layers constrains the locations where channels
could be placed, since micro-channels cannot be placed in the locations where these
TSVs are allocated (as shown in Figure 3.1). This constraint limits the heat removal
capability of micro-channels.
Thermal TSV: The thermal TSVs help alleviate the 3D-IC thermal issues
by establishing heat transfer paths from heat source to heat sink using high thermal conductivity materials, so that heat can be more effectively absorbed by heat
sinks. It also moves heat from hot to cool areas (without consuming extra cooling
power) to balance the heat between layers and make the thermal profile more uniform. However, thermal TSVs only help redistribute heat instead of removing heat.
Moreover, since the TSVs can only be placed in the whitespace between the layout,
the number and locations of thermal TSVs are limited by the chip floorplan. As a
result, their cooling capability is limited. Also, large number of TSVs will increase
the fabrication cost, degrade the yield of chips and exacerbate the thermal stress
problem in 3D-IC.
Based on these considerations, in this section, we propose a hybrid 3D-IC
cooling scheme: a cooling network which uses micro-channel based liquid cooling
together with thermal TSVs [75]. In this hybrid cooling network, micro-channels and
thermal TSVs work in a mutually complementary way. Thermal TSVs redistribute
heat and establish heat dissipation paths that deliver heat to micro-channels, and
the heat is then removed by micro-channels. This hybrid cooling scheme would
provide a sufficient level of cooling to the 3D-IC using less cooling power and fewer thermal
TSVs. To extract maximum cooling effectiveness, we would like to co-optimize the
allocation of micro-channels and thermal TSVs.
3.5.2 Algorithm for Hybrid Cooling Network Design
Our algorithm for micro-channel and thermal TSV co-optimization is based
on iterative improvement. The overall iterative design flow is similar to the algorithm in Section 3.3. But instead of iteratively removing micro-channels, we use a
constructive approach. That is, we start from the 3D-IC structure without any
micro-channel or thermal TSV, and iteratively add micro-channels and size thermal TSVs until they could provide sufficient cooling. The overall constructive design
approach is illustrated in Algorithm 4 and Figure 3.14.
Algorithm 4 Heuristic for micro-channel and thermal TSV co-optimization
Starting from the 3D-IC structure without micro-channels or thermal TSVs:
1. Assuming we are given a set of potential micro-channel locations, initialize the priority
level of each potential micro-channel;
2. Repeat until thermal constraint is satisfied:
3. Add a micro-channel with highest priority;
4. Decide the locations and sizes of thermal TSVs;
5. Set up thermal resistive network, estimate thermal profile;
6. If thermal constraint is satisfied, stop;
7. Else update priority of un-added channels and go to step 2.
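The constructive loop of Algorithm 4 can be sketched as follows. The three callbacks (place_tsvs, peak_temp and update_priorities) are hypothetical placeholders for the thermal TSV planning, the thermal analysis and the priority update; none of them is implemented here.

```python
def codesign(priorities, place_tsvs, peak_temp, update_priorities, t_max):
    """Greedy micro-channel / thermal TSV co-design loop (Algorithm 4).

    priorities: dict mapping each candidate channel to its initial priority.
    place_tsvs(added): hypothetical callback returning a TSV allocation.
    peak_temp(added, tsvs): hypothetical callback returning peak temperature.
    update_priorities(remaining, last): hypothetical callback returning the
        updated priority dict of the channels that are not yet added.
    Returns (channels, tsvs), or None if all candidates are used up.
    """
    added = []
    remaining = dict(priorities)
    while remaining:
        best = max(remaining, key=remaining.get)   # step 3: highest priority
        added.append(best)
        del remaining[best]
        tsvs = place_tsvs(added)                   # step 4: TSV allocation
        if peak_temp(added, tsvs) <= t_max:        # steps 5-6: thermal check
            return added, tsvs
        remaining = update_priorities(remaining, best)  # step 7
    return None
```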
The algorithm starts by finding a set of potential locations for micro-channels.
We use the algorithm proposed in Section 3.3.3 to find the set of potential micro-channel locations.
Based on the potential micro-channel locations, we assign a priority for each
potential micro-channel. The priority is associated with the significance of the micro-channel in removing heat. In each iteration, we first add the micro-channel with the
highest priority (that is, the most important micro-channel). Then we insert or
size thermal TSVs based on the current micro-channel allocation. After the micro-channel and thermal TSV placement in each iteration, we check if the current cooling
Figure 3.14: Overall design flow of micro-channel and thermal TSV co-optimization
system design could provide enough cooling to the 3D-IC. If not, we will continue
to add more micro-channels and resize thermal TSVs. Once we have added a micro-channel, we need to update the priority of the remaining un-added micro-channels
before adding another micro-channel. We repeat this iterative process until thermal
constraint is satisfied.
The success of this approach depends on how micro-channel priority is assigned
and how thermal TSVs are allocated and sized. The next three subsections explain
them in detail.
3.5.3 Micro-channel Priority Assignment/Update
The micro-channel priority assignment and update is similar to the micro-channel cost assignment/update approach presented in Section 3.3.4, with slight
modifications to the update formulation, as Equation 3.12 shows.
\[ \begin{cases} W_{i,j} = W_{i,j} - w_{|i_0 - i|} \cdot W_{i_0,j} \\ W_{i,j\pm 2} = W_{i,j\pm 2} - u_3 \cdot w_{|i_0 - i|} \cdot W_{i_0,j} \end{cases} \qquad \forall i \ \text{s.t.}\ |i_0 - i| \le b_{max} \tag{3.12} \]
Basically, when we add a micro-channel, this micro-channel absorbs heat from
the regions surrounding it, so the cooling workload of its potential neighboring
micro-channels would reduce. Hence the priority of the potential neighboring micro-channels should decrease as Equation 3.12 shows.
3.5.4 Thermal TSV Allocation and Sizing
After inserting a micro-channel in each iteration, we place thermal TSVs in
the remaining available area to further reduce the chip temperature. For thermal
TSV allocation and sizing, we use the basic idea of iterative thermal conductivity
updating proposed in [24], but improve it for better rate of convergence.
3.5.4.1 Basic Thermal TSV Placement Approach
In the approach proposed in [24], the 3D-IC is divided into fine grids. It
finds the distribution of thermal TSVs by calculating the desired vertical thermal
conductivity of each grid that could eliminate or mitigate thermal problem. Their
approach is based on iterative improvement. To update the thermal conductivity in
each iteration, the vertical thermal gradient qz between two vertically neighboring
grids is calculated, and the vertical thermal conductivity kz in each grid is updated
using the following equation:
\[ k_z^{new} = \frac{q_z^{old}}{q_z^{new}}\, k_z^{old} \tag{3.13} \]
where $q_z^{old}$ is the current vertical thermal gradient, and the new thermal gradient
$q_z^{new}$ (which is the desired thermal gradient after this iteration) is chosen as some
value closer to the ideal thermal gradient $q_{ideal}$ than $q_z^{old}$:
\[ |q_z^{new}| = q_{ideal} \left( \frac{|q_z^{old}|}{q_{ideal}} \right)^{\theta} \tag{3.14} \]
Here θ is a user defined parameter between 0 and 1, which is used to control the rate
of convergence. In each iteration, the thermal conductivity of all grids is updated
simultaneously. Once the algorithm converges, they calculate the number/size of
thermal TSVs in each grid that could result in the desired thermal conductivity
using Equation 2.9.
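One update step of Equations 3.13 and 3.14 for a single grid can be sketched as follows. The function name is hypothetical, and we assume that the sign of the gradient does not change in one step, so that the ratio in Equation 3.13 can be evaluated with magnitudes.

```python
def update_conductivity(k_old, q_old, q_ideal, theta):
    """One conductivity update for a single grid (Equations 3.13 and 3.14).

    k_old: current vertical thermal conductivity of the grid.
    q_old: current vertical thermal gradient (non-zero).
    q_ideal: ideal thermal gradient magnitude (positive).
    theta: user parameter in (0, 1); values near 1 give small steps.
    Returns (k_new, |q_new|).
    """
    q_new = q_ideal * (abs(q_old) / q_ideal) ** theta   # Eq. 3.14
    k_new = k_old * abs(q_old) / q_new                  # Eq. 3.13 (magnitudes)
    return k_new, q_new
```

For instance, with q_ideal = 1, q_old = 4 and θ = 0.5, the target gradient is 2 and the conductivity doubles. With θ = 1 the target equals the old gradient and the conductivity is unchanged, which is why a θ close to 1 converges slowly.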
Adding a thermal TSV will change the thermal conductivity matrix G (given
in Section 2.3.1) and hence change the thermal gradient qz across the chip. So basically every time we have placed or sized a thermal TSV, we need to recompute the
thermal profile and get the updated thermal gradient qz before updating the thermal
conductivity of the next grid. Nevertheless, in [24], the thermal conductivities of all
grids are updated simultaneously in each iteration. In order to simultaneously update the thermal conductivity of all grids without recalculating the thermal profile,
the parameter θ should be close to 1 so that the change in thermal conductivity in
each step is very small and therefore has little influence on the thermal gradient of
other grids. However using such a θ value leads to slower convergence rate.
3.5.4.2 Modified Thermal TSV Allocation and Sizing Approach
In our modified thermal TSV planning approach, we still use the basic iterative
updating framework given in [24]. However, as explained earlier, the approach
proposed in [24] needs to use a large θ which indicates slower rate of convergence.
So in our modified approach, instead of modifying the thermal conductivity in all
grids in each iteration, we only update a subset of the grids E. The grids in this
subset E should satisfy the following two conditions: a) all the grids in this set
have very small interdependence with each other, and b) they have large influence
on the hotspot regions. The first condition ensures that only those grids that are
independent of each other are updated. So when we change the thermal conductivity
of a grid in set E, the thermal gradient of other grids in this set almost does not
change. Hence we could simultaneously update all the grids in this set using a
small θ which indicates faster rate of convergence. The second condition ensures
that we focus on updating those grids that are most likely to reduce the hotspot
temperature. This could help us to reduce the number and size of thermal TSVs
used. We call this subset “maximum independent set E”.
The success of this approach depends on how many independent grids we
could find and simultaneously update without recomputing the thermal profile in
each iteration. The micro-channel heat sinks basically behave as heat isolators
(since they carry heat away) and therefore reduce the interdependence between
grids. Hence the existence of micro-channels leads to more independent grids that
can be updated simultaneously.
Based on these two conditions, our modified thermal TSV placement and sizing
algorithm works as follows:
Algorithm 5 Algorithm of thermal TSV placement and sizing
1. Estimate interdependency of each pair of grids;
2. Repeat steps 3-6 until the stop condition is satisfied:
3. Assign a weight to each grid according to its interdependency with hotspot grids;
4. Find the maximum independent set E;
5. Update the thermal conductance of the grids in set E using the approach given in Section 3.5.4.1;
6. Update thermal gradient and grid interdependency, go to step 2.
7. Calculate thermal TSV size/density in each grid based on the achieved thermal conductivity.
In the next subsection, we explain how to find the maximum independent set
E in detail.
3.5.4.3 Finding Maximum Independent Set E
For a given 3D-IC structure, to estimate the interdependency between grids, we
first calculate the inverse of thermal conductance matrix G. This inverse matrix
$H = G^{-1}$ satisfies T = H·Q. Here H(i, j) basically indicates how much temperature
increase in grid i is caused by the power dissipation in grid j. If H(i, j) > 0, when
the thermal conductivity at grid j changes, it will affect the temperature at grid i.
The interdependency of each pair of grids depends on how many power sources
they share. Here we use the interdependency matrix INT to indicate the interdependency between each pair of grids. The interdependency matrix is defined as:
\[ INT(i, j) = INT(j, i) = \begin{cases} 1, & \text{if } H(i) \cdot H^T(j) > \zeta \\ 0, & \text{otherwise} \end{cases} \tag{3.15} \]
Here INT is symmetric. INT(i, j) indicates whether grids i and j are interdependent (1 indicates the two grids are interdependent and 0 indicates they are independent). INT(i, j) is decided by the correlation between grids i and j, which is
measured by $H(i) \cdot H^T(j)$ (H(i) represents the i-th row of matrix H). When the
correlation is very small (less than ζ), we assume the two grids are independent
and set INT(i, j) to 0; otherwise, we set it to 1, which indicates the two grids are
dependent.
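Equation 3.15 can be sketched in a few lines. The function name is hypothetical, and the matrix H is assumed to be already available as a list of rows (the inversion of G is not shown). Diagonal entries are computed by the same rule as all other entries.

```python
def interdependency_matrix(H, zeta):
    """Build INT from H = G^{-1} using Equation 3.15.

    H: square matrix given as a list of rows.
    zeta: correlation threshold.
    INT[i][j] is 1 when the dot product of rows i and j of H exceeds zeta.
    """
    n = len(H)
    INT = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            corr = sum(a * b for a, b in zip(H[i], H[j]))  # H(i) . H(j)^T
            INT[i][j] = INT[j][i] = 1 if corr > zeta else 0
    return INT
```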
Once we get the interdependency matrix, we would like to find the set of grids
that are: a) independent of each other and b) have maximum dependency with
hotspot grids. To achieve this, we assign a weight to each grid which indicates its
interdependency with all hotspot grids, and then find the set of independent grids
with the maximum total weights.
Grid weight assignment
The weight of each grid $c_i$, which represents its interdependency with hotspot
regions, is assigned as follows:
\[ c_i = INT(i) \cdot \vec{E}^T, \quad \text{for each grid } i \tag{3.16} \]
where INT(i) is the i-th row of interdependency matrix INT, and $\vec{E} = \{E_j, \forall \text{ grid } j\}$
is a vector indicating whether each grid is a hotspot:
\[ E(j) = \begin{cases} 1, & \text{if } T_j > T_{max}, \text{ where } T_j \text{ is the temperature of grid } j \\ 0, & \text{otherwise} \end{cases} \tag{3.17} \]
Here Tmax is the thermal constraint.
If the weight of a grid is high, this basically means that the grid has higher
interdependency with hotspots. We would like to focus on updating those grids that
have higher interdependency with hotspots since inserting thermal TSVs in these
grids can better reduce the hotspot temperature.
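The weight assignment of Equations 3.16 and 3.17 amounts to counting, for each grid, the hotspot grids it is interdependent with. A minimal sketch, with a hypothetical function name and INT given as a list of rows:

```python
def grid_weights(INT, temps, t_max):
    """Weights c_i = INT(i) . E^T (Equations 3.16 and 3.17).

    INT: interdependency matrix as a list of rows of 0/1 entries.
    temps: temperature of each grid.
    t_max: thermal constraint.
    """
    E = [1 if t > t_max else 0 for t in temps]            # Eq. 3.17
    return [sum(a * e for a, e in zip(row, E)) for row in INT]  # Eq. 3.16
```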
Finding independent grids with maximum total weight
Given the weight of each grid and the interdependence between them, we
would like to find the set of grids which are independent and have the maximum
total weight. This is a maximum-weight independent set problem, which is equivalent to the weighted clique problem on the complement graph and is NP-complete.
Many existing works have proposed heuristics to find a good solution.
Here we use the adaptive, randomized greedy approach in [31].
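To illustrate the selection problem, the sketch below uses a plain deterministic greedy rule. This is a simpler stand-in, not the adaptive, randomized greedy approach of [31]: grids are visited in decreasing weight and a grid is kept only if it is independent of all grids kept so far. Skipping grids with zero weight is an additional assumption.

```python
def greedy_independent_set(INT, weights):
    """Greedy heuristic for a heavy set of mutually independent grids.

    INT: interdependency matrix (1 = interdependent, 0 = independent).
    weights: weight c_i of each grid.
    Returns the sorted indices of the selected grids.
    """
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    chosen = []
    for i in order:
        if weights[i] <= 0:
            continue  # no interdependence with any hotspot (assumption: skip)
        if all(INT[i][j] == 0 for j in chosen):
            chosen.append(i)
    return sorted(chosen)
```

Being a heuristic, this rule can miss the optimum. On a chain of four grids with weights 3, 5, 4 and 1 it selects grids 1 and 3 (total weight 6), while grids 0 and 2 would give a total weight of 7.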
Once we get this maximum independent set E, we simultaneously update the
thermal conductivity of the grids in this set using the approach illustrated in Section
3.5.4.1. Since the grids in this set have very small interdependence with each other,
we can use a θ close to 0 thereby achieving faster convergence rate. Moreover, because we only update grids that are highly interdependent with hotspots, we could
use fewer thermal TSVs.
Updating interdependence matrix
A change in the thermal TSVs changes the thermal resistive network and thereby
the grid interdependency. So after we update the thermal TSVs in each
iteration, we should update the interdependence matrix based on the new thermal
resistive network. A simple approach is to regenerate the thermal conductance
matrix G and then recalculate its inverse matrix H as well as the interdependency
matrix INT after every iteration. However, calculating the inverse
matrix H is time consuming. To save time in computing the interdependence matrix,
we calculate (initialize) matrices H and INT only once, at the beginning of the
algorithm before allocating or sizing any thermal TSV, and every time we update a
thermal TSV, we update only some elements of matrix INT instead of recalculating
the whole matrix.
By exploring the interdependency matrix, we found that the interdependency
between two grids largely depends on the distance between them. We define an
interdependence region for each grid, which includes all the grids that are interdependent with this grid. We found that each grid usually has higher interdependence
with the grids close to it and smaller or no interdependence with grids far away.
So the interdependence region of a grid is usually a region surrounding that grid, as
Figure 3.15 shows.
As we have added or enlarged a thermal TSV, the interdependency between
grids would generally increase because the thermal conductivity increased. So the
interdependence region of each grid is enlarged (as Figure 3.15(a) shows). On the
other hand, as we reduce the size of a thermal TSV, the interdependency between
grids reduces and the interdependence region of each grid shrinks (as Figure 3.15(b)
Figure 3.15: Change in interdependence region of a grid (a) after allocating or enlarging a thermal
TSV, (b) after shrinking a thermal TSV
shows). Usually the change in a thermal TSV only affects the interdependence
regions of the grids close to this TSV. The level by which we enlarge/shrink the
interdependence region of each grid depends on the distance between this grid and
the newly allocated/sized thermal TSV, and also depends on the amount by which
we have sized the thermal TSV.
Once we have updated the interdependence region of the grids, we can modify the
interdependence matrix INT based on the new interdependence region of each grid.
Stop condition: We keep updating the thermal conductivities iteratively until one
of the following situations occurs:
a) Thermal constraint is satisfied. In this case, no more thermal TSV is needed.
b) The thermal TSV capacity is reached. In this case, no more thermal TSV
could be added.
c) Peak temperature cannot be further reduced. In this case, the algorithm
converges, so adding more thermal TSV will not help reducing the chip temperature.
After the thermal TSV allocation/sizing, we perform thermal analysis and
check if the resultant micro-channel and thermal TSV allocation could provide
enough cooling to the 3D-IC. If the resultant maximum temperature is within the
thermal constraint, then the current design is our final design. Otherwise, we will
continue to add micro-channels and size thermal TSVs until thermal constraint is
satisfied.
3.6 Considering Thermal Variations
The previous approaches in Sections 3.3-3.5 assume that the power profile is
fixed and known, and design the cooling structure based on the given power profile.
In reality, CPU power profiles are a strong function of the application and vary based
on the workload the CPU is experiencing at a given time. We address this problem
by using multiple training power profiles. Given a set of training power profiles (that
represent different classes of applications and workload levels), we would design the
cooling structure (non-uniform, bended micro-channel or hybrid cooling system)
which provides enough cooling to all the power profiles using minimum amount of
pumping power. Conventionally such approaches are addressed by choosing the
profile with the highest total dissipated power (TDP) and designing the cooling
system based on it. But such an approach fails to account for the fact that a power
profile with a smaller TDP might end up with thermal violations due to the nature
of its hotspots even if the profile with higher TDP does not. The advantage of using
multiple training power profiles is that the resultant micro-channel network could
adapt to various power profiles.
Figure 3.16: Flow chart of micro-channel placement
Our approach that accounts for multiple power profiles is illustrated in Figure
4.3. We start with the power profile with the highest TDP and design the cooling
structure for this power profile using the heuristics given in Sections 3.3-3.5. We call
this a pilot power profile. Then we test if all the power profiles meet the thermal
constraint. If a set of power profiles violate the thermal constraint, then the pilot
power profile is refined using Algorithm 6 and the cooling structure is re-designed
based on the new pilot power profile.
Algorithm 6 Pilot power profile refinement
Assuming temperature constraint violation occurs in power profiles $\vec{P}_1, \vec{P}_2, ..., \vec{P}_M$;
1. For m = 1 to M
2. Increase power density of pilot power profile in the region where thermal violations occur in power profile $\vec{P}_m$.
The refining process in step 2 of Algorithm 6 is basically increasing power
density of pilot power profile in the regions where thermal violation occurs in the
other power profiles. This would enable the micro-channel placement heuristic to
allocate more cooling (either micro-channels or thermal TSVs) in that region. For
example, if the violation occurs at grid (i, j, k) of power profile $\vec{P}_m$, we increase the
power consumption at grid (i, j, k) and all grids surrounding (i, j, k) in the pilot
power profile. The level of increase depends on the degree of thermal violation and
the distance from (i, j, k). The performance of this heuristic depends on the range in
which we choose to increase the power in the pilot profile. If this range is large, the
algorithm will converge faster but might have more channels and therefore higher
pumping power.
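Step 2 of Algorithm 6 can be sketched as follows. The text only states that the increase depends on the degree of violation and on the distance from (i, j, k); the particular scaling rule below, the parameters radius and gain, and the restriction to grids in the same layer k are all assumptions made for illustration.

```python
def refine_pilot(pilot, violations, t_max, radius=1, gain=0.05):
    """Raise pilot power around grids that violate the thermal constraint.

    pilot: dict mapping grid (i, j, k) to its power in the pilot profile.
    violations: list of ((i, j, k), temperature) from the other profiles.
    radius: range (in grids) around a violation that is scaled (assumed).
    gain: relative increase per degree of violation at distance 0 (assumed).
    Returns a new dict; the input profile is not modified.
    """
    refined = dict(pilot)
    for (i, j, k), temp in violations:
        excess = temp - t_max                      # degree of violation
        for (a, b, c) in list(refined):
            if c != k:
                continue                           # same layer only (assumed)
            d = max(abs(a - i), abs(b - j))        # distance in grids
            if d <= radius:
                refined[(a, b, c)] *= 1.0 + gain * excess / (1 + d)
    return refined
```

A larger radius raises the power over a wider area, which corresponds to the trade-off described above: faster convergence, but possibly more channels and higher pumping power.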
3.7 Cooling Performance of Micro-channel Designs
Now we compare the cooling effectiveness of the three micro-channel designs.
In our experiment, we use a three-tier stacked 3D structure. In the 3D-IC, three
active layers are vertically stacked and the micro-channel layers are below each active
layer. There is also an air-cooled heat sink at the top of 3D-IC. We use the ITC’99
circuits, which are typical synthesized circuits consisting of AND, OR, NOT, NAND
and NOR gates, to generate the 3D-IC benchmarks [4]. Each 3D-IC layer contains
several arbitrarily chosen ITC’99 circuits. We use the Capo placer to place the gates
in each layer [1]. To obtain the power profiles for each layer, we randomly assign a
switching activity factor (between [0, 1]) for each gate and use the power models in
[47][86] to estimate the power consumption. Based on the placement information,
we also find the whitespace between layout, and randomly allocate 1000 signal TSVs
in the whitespace. This forms our testing benchmarks.
The chip dimension is W = L = 9mm. We set up the resistive network by
using the hotspot-like model in three dimensions [77]. The micro-channel width
and height are ∆x = 100µm and ∆z = 200µm, and the diameter of each TSV is 10µm.
The overall thermal resistance of the heat sink for air cooling is 0.5℃/W. The inlet
coolant temperature is 10℃ and the maximum temperature constraint Tmax is 85℃.
We compare the pumping power of the three micro-channel designs proposed in
this chapter. The comparison is given in Table 3.1 and Figure 3.17. For all the power
profiles, the air cooling cannot provide sufficient cooling to reduce the chip temperature below the thermal constraint. Here, All channels indicates the conventional
micro-channel design that spreads straight micro-channels all over the interlayer
regions, and save indicates the pumping power saving of each design over the All
channels design. As we can see from the table, the Non-uniform micro-channel design saves about 57% pumping power on average compared with the All channels design. Using
bended micro-channels saves another 11% pumping power. Among these three
approaches, the Hybrid cooling network saves the most pumping power (78% pumping
power savings on average compared with the conventional All channels design).
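As a quick consistency check, the average savings quoted above can be recomputed from the average pumping powers reported in Table 3.1 (8.2 W for All channels, 3.5 W, 2.6 W and 1.8 W for the other designs). The helper name below is hypothetical.

```python
def pumping_power_saving(p_design, p_all):
    """Relative pumping power saving of a design over All channels."""
    return 1.0 - p_design / p_all


# Average Ppump values (W) from the last row of Table 3.1.
averages = {"Non-uniform": 3.5, "Bended": 2.6, "Hybrid": 1.8}
savings = {
    name: round(100 * pumping_power_saving(p, 8.2))
    for name, p in averages.items()
}
# savings == {"Non-uniform": 57, "Bended": 68, "Hybrid": 78}
```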
Figure 3.17: Comparison of Pumping Power (Ppump (W) versus Pchip (W) for the All channels, Non-uniform, Bended and Hybrid designs)
Table 3.1: Comparison of pumping power (Pchip and Ppump in W)
           All channels   Non-uniform          Bended               Hybrid
Pchip      N    Ppump     N    Ppump   save    N    Ppump   save    N    Ppump   save
273.6      90   7.6       9    0.7     91%     8    0.6     92%     2    0.1     99%
305.7      90   7.9       16   1.3     84%     14   1.2     85%     6    0.5     94%
331.2      90   8.1       25   2.2     73%     20   1.7     79%     7    0.6     93%
362.7      90   8.2       27   2.4     71%     22   2.0     76%     12   1.0     88%
381.9      90   8.3       30   2.7     67%     24   2.2     73%     14   1.2     86%
413.5      90   8.3       39   3.6     57%     26   2.4     71%     17   1.5     82%
438.1      90   8.4       39   3.7     56%     28   2.5     70%     20   1.9     77%
462.9      90   8.4       51   4.7     44%     36   3.3     61%     23   2.1     75%
498.7      90   8.5       60   5.6     34%     40   3.7     56%     29   2.7     68%
517.3      90   8.5       62   5.9     31%     47   4.4     48%     43   4.0     53%
544.1      90   8.5       62   5.9     31%     56   5.2     39%     45   4.2     51%
Average    90   8.2       38   3.5     57%     29   2.6     68%     20   1.8     78%
3.8 Runtime Thermal Management Using Micro-channels
Recently, the micro-fluidic cooling has also been adopted in dynamic thermal
management (DTM) to control the runtime CPU performance and chip temperature
by tuning the fluid flow rate through micro-channels [19][18][61].
In this section, we investigate a micro-channel based DTM scheme that
provides sufficient cooling to the 3D-IC using a minimal amount of cooling energy [71].
Assuming the micro-channel structure has already been decided using any of the aforementioned structures (Sections 3.3-3.5), this DTM scheme dynamically
controls the pressure drop across the micro-channels based on the runtime cooling
demand. We now explain our micro-channel based DTM scheme in detail.
3.8.1 Algorithm for Micro-fluidic Based DTM
The on-chip temperature profile is a strong function of the power dissipated,
while the power dissipation depends on the applications, which change at runtime.
In order to track the runtime thermal and power state, thermal sensors are placed
at various chip locations. Our micro-channel based DTM keeps track of power
profiles at runtime using the information obtained from the thermal sensors and the adaptive
Kalman filter based estimation approach proposed in [96], and then decides the
micro-channel pressure drop accordingly.
To estimate the power profile, [96] assumes there are M different power states
(power profiles), each of which essentially represents a certain class of applications.
The Kalman filter holds a belief of what the current power profile is and predicts
the temperature profile based on this belief. Meanwhile, the thermal sensors keep
measuring the temperature. The power estimation method in [96] iteratively compares the temperature predicted by Kalman filter and sensor observations. If the
error between them is close to zero, this indicates that the belief of current power
state is correct. Otherwise, the belief might be wrong, which means the power state
has changed. Once the change in power state is detected, it tries to decide the new
power state, which is the one most likely to result in the current sensor reading. Interested readers are referred to [96] for the details of this adaptive power estimation
approach.
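The belief check described above can be illustrated with a deliberately simplified sketch. It is not the adaptive Kalman filter of [96]: as an assumption for illustration, each of the M power states is given a fixed vector of predicted sensor temperatures, the belief is kept when the prediction error is within a hypothetical tolerance, and otherwise the state whose prediction is closest to the sensor readings is chosen. All names and numbers are hypothetical.

```python
def update_power_state(belief, predicted_temps, sensor_temps, tolerance=1.0):
    """Keep the current belief if it explains the sensors, else pick the best state.

    predicted_temps[m] is the list of sensor temperatures predicted for power
    state m (a static stand-in for the Kalman filter prediction).
    """
    def error(state):
        # Euclidean distance between predicted and measured temperatures.
        return sum(
            (p - s) ** 2 for p, s in zip(predicted_temps[state], sensor_temps)
        ) ** 0.5

    if error(belief) <= tolerance:
        return belief  # prediction matches the sensors, belief is kept
    # Belief rejected: choose the state most consistent with the readings.
    return min(range(len(predicted_temps)), key=error)


# Two hypothetical power states observed by two sensors (temperatures in deg C).
predicted = [[60.0, 62.0], [75.0, 80.0]]
print(update_power_state(0, predicted, [60.2, 61.9]))  # belief 0 still holds
print(update_power_state(0, predicted, [74.0, 81.0]))  # switches to state 1
```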
Once the power profile is obtained, we select the best pressure drop which provides enough cooling for this power profile using minimum pumping power. Hence
the micro-channel based DTM problem is formally stated as follows.
Given: a 3D-IC design whose power distribution is a function of the architecture
and application. Assuming the power profiles are given (or estimated using appropriate sensors) and the micro-channel structure is fixed, we would like to find
the pressure drop for each power profile such that the temperature across the chip
is within acceptable limits while minimizing pumping power:
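\[
\begin{aligned}
\min\; P_{pump}(\Delta p) \;\Leftrightarrow\; \min\; & \Delta p \\
\text{s.t.}\quad & G(\Delta p) \cdot \vec{T} = \vec{P} \\
& \vec{T} \le T_{max} \\
& \Delta p_{min} \le \Delta p \le \Delta p_{max}
\end{aligned}
\tag{3.18}
\]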
The objective minimizes the pumping power used by micro-channels. When the
regular straight micro-channels are used, the pumping power can be calculated using
Equation 2.16. If bended micro-channels are used, the pumping power is calculated
using Equations 2.14 and 2.21.
The first constraint indicates the resistive thermal model, where P⃗ is a 3D-IC power profile and T⃗ is the corresponding thermal profile, and G is the thermal
conductivity matrix which depends on the pressure drop ∆p. The second constraint
indicates that the peak temperature should not exceed the thermal constraint Tmax .
The last constraint gives the feasible range of pressure drop.
This optimization problem is difficult to solve directly because of the complexity of the thermal model and the impact of micro-channels on temperature. Therefore
we use a linear search based approach to find the best pressure drop. Since the
micro-channel structure is already decided, the pumping power Ppump is
only a function of the pressure drop in this problem. It can be proved that the pumping
power for both straight and bended micro-channels is a monotonically increasing function
of the pressure drop ∆p. Hence minimizing the pressure drop minimizes the
pumping power, and the problem simplifies to finding the minimum pressure drop
that provides enough cooling.
The pressure drop ∆p influences the heat resistance Rheat , thereby changing
the cooling performance. Increase in pressure drop results in increased fluid velocity
v and flow rate f , while higher flow rate results in smaller heat resistance Rheat and
hence better cooling performance.
In summary, a larger pressure drop would result in better cooling at the cost
of higher pumping power. Hence cooling effectiveness is a monotonic function of
pressure drop. Therefore the linear search approach can find the best pressure drop.
Specifically, this is done by starting from the minimum pressure drop ∆p = ∆pmin
and increasing it step by step until the thermal constraint is satisfied. Due to the monotonic nature of the impact of pressure drop on micro-channel cooling effectiveness,
this linear search yields the optimal pressure drop (to within the step size) for
a given micro-channel configuration.
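The linear search described above can be sketched as follows. The thermal model is passed in as a callback, since evaluating the peak temperature requires the full resistive model of Equation 3.18; the toy model, the units and the numeric values below are hypothetical and serve only to exercise the search.

```python
def min_pressure_drop(peak_temperature, dp_min, dp_max, step, t_max):
    """Return the smallest pressure drop on the search grid that meets t_max.

    peak_temperature(dp) must be non-increasing in dp (more pressure, more
    cooling). Returns None if even dp_max cannot satisfy the constraint.
    """
    n_steps = int(round((dp_max - dp_min) / step))
    for n in range(n_steps + 1):
        dp = dp_min + n * step
        if peak_temperature(dp) <= t_max:
            return dp  # first feasible point is optimal by monotonicity
    return None


def toy_peak_temperature(dp):
    # Hypothetical monotone model: peak temperature falls as pressure rises.
    return 40.0 + 900.0 / dp


best = min_pressure_drop(toy_peak_temperature, dp_min=10, dp_max=50, step=5, t_max=85.0)
print(best)
```

Because the search stops at the first feasible grid point, its cost grows linearly with the number of pressure levels, which is acceptable when the levels are few or the result is precomputed.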
3.8.2 Performance of Micro-channel Based DTM
We then implemented the runtime thermal management by micro-channel
pressure drop control. Here we assume the underlying micro-channel design is the
non-uniform straight micro-channel configuration proposed in Section 3.3. We use
the same 3D structure as Section 3.7 and tested three groups of benchmarks with
different power profiles. In the first group (group L), we generate 6 different 3D-IC
power profiles whose total dissipated power (TDP) ranges from 220W to 320W. Based
on the non-uniform micro-channel design, we select the best pressure drop for each
power profile and calculate the associated pumping power. The second (group M )
and third (group H ) groups are generated in a similar way, but with higher total
dissipated power.
Figure 3.18 shows the required pumping power for each group of benchmarks
using the runtime DTM and the fixed pressure drop approach. In the fixed pressure drop approach, we use the lowest pressure drop that provides enough cooling to all
benchmarks in the group. Pchip is the TDP of each benchmark. The runtime pressure drop control achieves an average of 39%, 43% and 46% pumping
power saving for benchmark groups L, M and H, respectively. The pressure drop calculation can
be done offline and stored in a table. Once we detect that a specific power profile occurs,
we simply look up the best pressure drop for this power profile.
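A minimal sketch of the offline table and the runtime lookup is given below. The thermal model, the candidate pressure levels and the power states are hypothetical placeholders; in practice each table entry would come from the linear search of Section 3.8.1 applied to the actual thermal model.

```python
def build_pressure_table(peak_temperature, power_states, candidate_drops, t_max):
    """Offline step: map each power state to its smallest feasible pressure drop."""
    table = {}
    for state in power_states:
        feasible = [
            dp for dp in sorted(candidate_drops)
            if peak_temperature(state, dp) <= t_max
        ]
        table[state] = feasible[0] if feasible else None
    return table


def toy_peak_temperature(power_w, dp):
    # Hypothetical model: 10 deg C inlet plus a rise that shrinks with pressure.
    return 10.0 + power_w * (0.1 + 1.0 / dp)


table = build_pressure_table(
    toy_peak_temperature,
    power_states=[200, 250, 300],
    candidate_drops=[2, 4, 6, 8, 10],
    t_max=85.0,
)
runtime_choice = table[250]          # runtime step: a plain lookup
fixed_choice = max(table.values())   # fixed approach: must cover the worst case
print(table, runtime_choice, fixed_choice)
```

The fixed pressure drop equals the largest table entry, so the runtime scheme saves pumping power whenever the current power state is not the most demanding one in its group.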
Figure 3.18: Runtime pressure drop control versus fixed pressure drop for (a) group L, (b) group
M, (c) group H. Each panel plots pumping power Ppump (W) against chip power Pchip (W) for the fixed pressure and dynamic pressure approaches.
3.9 Summary
This chapter investigated optimized micro-fluidic cooling configurations.
The first configuration (hotspot-optimized non-uniform micro-channel design) allocates micro-channels only in hotspot regions so that fewer channels are used, thereby
saving pumping power. In this configuration, straight micro-channels are used.
Straight micro-channels are easy to manufacture and more power efficient than
bended micro-channels of the same length. However, straight micro-channels
are inefficient in addressing the spatial constraints imposed by TSVs. Hence in the
second configuration, we proposed the use of bended micro-channels, which can be
flexibly routed to hotspot regions while avoiding TSVs. In order to further reduce the
pumping power overhead, we also proposed a hybrid cooling network which uses
dummy thermal TSVs (that reinforce vertical heat transfer) and micro-channels together. Compared with the conventional micro-channel design that spreads straight
micro-channels all over the interlayer region, the three optimized configurations result
in 57%, 68% and 78% pumping power savings, respectively. In these designs, micro-channel structures are designed after the electrical part of the chip, hence they are
compatible with the standard IC design flow.
We also proposed a micro-channel based dynamic thermal management method
that controls the pressure drop at runtime to allow real time thermal control.
Through runtime pressure drop tuning, we can further save about 43% pumping
power compared with using fixed pressure drop.
However, as illustrated in Section 1.4, the electrical, thermal, reliability and
cooling aspects are all interdependent. Hence, separating the design of the electrical
and cooling systems will lead to sub-optimal designs. In the next chapter, we will
investigate electrical and cooling system co-design to achieve further power-performance improvement.
Chapter 4
Co-design of Electrical and Fluidic Cooling Systems
4.1 Motivation for Co-Design
In the conventional chip design flow, cooling considerations are put in place
after the entire system has been designed (as Figure 4.1 shows). Such a postfix
approach can lead to sub-optimality, such as significant pumping power, competition
with TSVs, thickening of silicon substrate and impact on reliability.
As illustrated in Section 1.4, the electrical, thermal, reliability and cooling
aspects are all interdependent. It is important to investigate the interplay between
electrical and fluidic aspects, and develop avenues for co-design. Such co-design can
result in the following advantages:
1. Higher cooling in timing critical areas results in better performing designs
since transistor delay increases with temperature.
2. Higher cooling in timing critical areas enables us to aggressively pursue high
power dissipating performance enhancements such as increasing supply voltage. This results in higher performance without impacting temperature since
the extra heat can be managed by micro-fluidics.
3. The design optimization (placement, floorplanning, etc.) could be more aggressive since temperature issues can
be addressed by aggressive cooling.
Figure 4.1: Conventional chip design flow
4. Increasing the cooling levels in high leakage areas helps reduce the overall
power since leakage is a highly non-linear function of temperature. Reduction in leakage may be significant enough to make increase in pumping power
irrelevant.
5. Micro-fluidics may impact silicon thickness causing TSV performance degradation. By smart electrical design, this degradation could potentially be removed. For example, degradation in TSV performance could be overcome by
stronger drivers.
In this chapter, we investigate two electrical and cooling co-design problems.
Section 4.2 investigates the TSV allocation/assignment and micro-channel placement co-design [70], and in Section 4.3, a gate sizing and micro-fluidic co-design
problem is investigated [71].
4.2 Co-optimization of TSV Assignment and Micro-Channel Placement
In 3D-ICs, the interlayer nets use TSVs to deliver signals and power among
different layers. Recently, significant attention has been paid to the problem of
allocating interlayer nets to TSVs that allow their successful routing. Existing work
mostly tries to address this problem with the objective of minimizing total wirelength. Two general approaches have been investigated: Post-Placement [48][90][95]
and In-Placement [36]. In Post-Placement approaches, cells are first placed in the
3D-IC. This determines the whitespace distribution capable of supporting TSVs.
These potential TSV locations are then allocated to the interlayer nets such that
the total wirelength is minimized [48][90][95]. In-Placement approaches perform simultaneous optimization of cell placement, TSV placement and interlayer net to
TSV assignment during the 3D-IC placement process itself. While both approaches
have their advantages, in our work, we assume the placement to be already done
before TSV assignment to the interlayer nets (Post-Placement paradigm), though
our work could also be extended to the In-Placement approach.
Conventional Post-Placement approaches for interlayer net to TSV assignment
do not consider the possibility of adding micro-channels in the interlayer regions.
TSVs impose significant constraints on how and where the micro-channels can be
located, and form obstacles to the micro-channel placement since micro-channels
cannot be placed at the locations of TSVs. The locations of TSVs are essentially
decided by the allocation of interlayer nets to TSVs. The existing works on Post-Placement TSV allocation (which ignore the possibility of allocating channels) and
micro-channel placement as proposed in the previous chapter (which assumes the
TSV locations to be fixed) do not consider the possibility of combining these steps
to obtain better results.
Figure 4.2: Thermal profile of one 3D-IC layer, and an example of TSV and micro-channel
allocation where TSVs constrain us from allocating micro-channels at hotspots
Two trivial approaches for allocating TSVs to nets and micro-channels to
interlayer regions together can be conceived as follows: TSV first approach and
Micro-channel first approach. If micro-channels are allocated before TSVs, there
is a possibility of increase in wirelength since the available whitespace for TSVs
shrinks due to the existence of micro-channels which deter allocation of TSVs in
those areas. A TSV first approach also has disadvantages. For instance, if TSVs are
placed at or near a hotspot region and prevent the allocation of micro-channels
at that hotspot, the cooling effectiveness of micro-channels will suffer.
In this section, we investigate the simultaneous co-optimization of TSV assignment and micro-channel allocation such that the total wirelength is minimized and
the micro-channel cooling effectiveness is maximized [70]. As stated earlier, we assume
a Post-Placement paradigm.
4.2.1 Problem Formulation
The problem is stated in Table 4.1. The objective minimizes a combination
of the cooling power required by micro-channels and the total wirelength used by
all interlayer nets. It is noteworthy that an interlayer net is allocated to a set of
TSVs since several TSVs spanning multiple layers may be needed to connect the
source-destination pairs. The co-optimization of micro-channel allocation and TSV
assignment is complex due to its discrete nature and the complexity of thermal
estimation. Hence, we focus on developing effective heuristics that exploit specific
mathematical properties present in this problem.
4.2.2 Algorithm for TSV Assignment and Micro-channel Placement Co-optimization
4.2.2.1 Overall Design Flow
The overall design flow is shown in Figure 4.3. We use multi-commodity min-cost flow to formulate and solve some critical aspects of the problem, hence we call this
approach MCMCF. In Section 4.2.3 we discuss simplifications to this formulation
that enable us to solve the problem efficiently. We first find the thermal criticality
of all grid locations in the 3D-IC chip using a full chip thermal analysis assuming
there are no micro-channels. Also, based on the 3D-IC structure and placement, we
identify all the potential locations of micro-channels and TSVs.
Table 4.1: Problem formulation
Given:
  I.1: A 3D-IC placed netlist. The placement information can be used to generate potential TSV locations;
  I.2: A netlist that describes a set of interlayer nets;
  I.3: The power profile of the 3D-IC;
  I.4: A set of potential locations for interlayer micro-channels. These channels are to be incorporated in the interlayer region of the chip;
We would like to:
  O.1: Decide the locations of TSVs;
  O.2: Assign a set of TSVs to each interlayer net;
  O.3: Decide the number and locations of micro-channels;
In such a way that:
  C.1: The assigned set of TSVs for each interlayer net forms a path connecting the source and destination terminals of the net;
  C.2: The locations of micro-channels and TSVs do not conflict (see Figure 2.1 for detail);
  C.3: The micro-channels provide sufficient cooling for the 3D-IC, i.e. Ti ≤ Tmax for all locations i;
  C.4: The total wirelength and required pumping power by micro-channels is minimized: $\min\; u_1 N + u_2 \sum_{\forall r} WL_r$, where N is the number of channels and WL_r is the bounding box wirelength of the r-th interlayer net, which depends on the TSV set it has been allocated to. Constants u1 and u2 can be chosen based on the preference for a particular tradeoff.
We assume the 3D-IC is divided into small grids (i, j, k), with i, j representing
the face of the 3D-IC and k representing the longitudinal direction along which
the micro-channel runs. The location of a micro-channel is the (i, j)-th
grid where the channel is located; in the k direction, the channel spans the
entire chip. A TSV is identified by the (i, j, k)-th grid it is located at.
After the initial thermal analysis, we define a thermal criticality for each potential micro-channel location, which represents the demand for allocating a micro-channel at that
location. The criticality factor c(i, j) for each micro-channel location (i, j) is defined
as:
\[
c(i, j) = \sum_{k=0}^{K} w(i, j, k) \cdot \max\left[0,\; T_{i,j,k} - T_{max}\right]
\tag{4.1}
\]
where Ti,j,k represents the temperature at grid (i, j, k) and Tmax is the maximum thermal constraint. Parameter w(i, j, k) represents the thermal significance of a certain
grid, and K is the number of grids in the longitudinal direction in which the channel spans the entire chip. Based on the criticality factor c(i, j), we formulate the
MCMCF problem and obtain the TSV assignment and micro-channel allocation
simultaneously (see Figure 4.3). Thermal analysis and 2D routing are then conducted to evaluate the performance of the resulting design. If the design results in
a thermal violation, ends up having significant wirelength or places too many micro-channels in some locations (this will increase cooling power and might also degrade
wirelength), we refine the criticality factor c(i, j) (increase or decrease c(i, j)
accordingly) and re-solve the MCMCF problem.
We repeat this process iteratively until we obtain a design that achieves the required tradeoff between cooling power and wirelength.
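The criticality factor of Equation 4.1 can be evaluated directly from a gridded temperature map. The sketch below assumes the temperatures and weights are stored as nested lists indexed as [i][j][k]; the data layout and the example values are hypothetical.

```python
def criticality(temps, weights, t_max):
    """Evaluate c(i, j) = sum_k w(i, j, k) * max(0, T(i, j, k) - t_max).

    temps[i][j][k] and weights[i][j][k] are nested lists of equal shape.
    Returns a 2D list c[i][j] with one value per potential channel location.
    """
    return [
        [
            sum(w * max(0.0, t - t_max) for t, w in zip(t_along_k, w_along_k))
            for t_along_k, w_along_k in zip(t_row, w_row)
        ]
        for t_row, w_row in zip(temps, weights)
    ]


# One row (i = 0), two channel locations (j = 0, 1), three grids along k.
temps = [[[80.0, 90.0, 95.0], [70.0, 75.0, 80.0]]]
weights = [[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]]
c = criticality(temps, weights, t_max=85.0)
print(c)
```

Only grids hotter than Tmax contribute, so a location whose whole path is below the constraint has zero criticality and no initial demand for a channel.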
4.2.2.2 Multi-commodity Minimum Cost Flow Formulation
Given: a) the 3D-IC structure, b) potential locations of TSVs and micro-channels, c) the interlayer netlist and d) criticality factor c(i, j), the multi-commodity
min-cost flow (MCMCF) problem is illustrated in Figures 4.4 and 4.5. Figure 4.4
illustrates a 3D-IC with three active layers and two interlayer nets along with four
potential TSVs. The potential locations of micro-channels have also been indicated.
Figure 4.3: Overall design flow of MCMCF based algorithm
Both front and top views are shown, indicating the TSV and net locations
in the 3D-IC grids. Our objective is to find the allocation of nets to TSVs and
of micro-channels to channel locations such that the cumulative objective stated in the previous section is
minimized: $\min\; u_1 N + u_2 \sum_{\forall r} WL_r$. Assuming u1 and u2 are the same for
ease of exposition, we instantiate a multi-commodity min-cost flow formulation
as follows.
For each net, we allocate one unit of unique commodity flow. Hence J nets
would correspond to J distinct units of commodity flows. The flow network has
one node for each terminal of the nets and also the potential TSV locations. We
assume that the net terminal in the higher layer is the source of this one unit flow
and net terminal in the lower layer is the sink. We assume all nets are two terminal.
For the example shown in Figure 4.4, the flow network is illustrated in Figure 4.5
which indicates that net1 and net2 terminals in the top layer are sources. They are
connected by directional edges to the TSV nodes in that active layer. If the nets
span multiple (more than 2) layers, then the TSVs in this layer would connect to the
TSVs in the layer just below to transfer the signal. This is also indicated in Figure
4.5(a) where TSVs in layer 1 are connected by directional edges to TSVs in layer
2. Finally destination or sink terminals of nets are also connected by directional
edges to TSVs in that layer as indicated in the figure. Note that the edges always
carry flow from source to sinks. Also, by construction this network forms a directed
acyclic graph. Let us ignore the presence of micro-channels for now (for ease of
explanation). The problem of allocating nets to TSVs can be modeled as a multi-commodity min-cost flow (the flow graph is illustrated in Figure 4.5(a)). Let
each net to TSV edge or TSV to TSV edge have a cost equal to the half-perimeter
of the bounding box between the two. Let each TSV node have a total capacity
of 1. Also, every edge has an individual commodity capacity of 1 and a total capacity
of 1. The multi-commodity min-cost flow solution that sends the unique commodity
flow from each net source to the corresponding net sink on the network in Figure
4.5(a) at the minimum total cost corresponds to the net to TSV assignment with
minimum total wirelength. A total unity capacity for each TSV node ensures that
only one net is allocated to it.
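The edge set of the net-to-TSV flow network (without micro-channels) can be sketched for a single two-terminal net as follows. The sketch assumes the net crosses one or more interlayer regions from top to bottom, that every TSV in one region may connect to every TSV in the region below, and that edge cost is the half-perimeter (Manhattan) distance between the two endpoints. The data layout and names are hypothetical, and capacities are omitted since they are all 1.

```python
def manhattan(a, b):
    """Half-perimeter of the bounding box of two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def build_net_edges(net, source, sink, tsv_levels):
    """Directed, costed edges for one net: source -> TSVs -> ... -> sink.

    tsv_levels is a list of dicts {tsv_name: (x, y)}, one dict per interlayer
    region crossed by the net, ordered from the source layer downwards.
    """
    src_node, snk_node = (net, "source"), (net, "sink")
    edges = []
    for name, xy in tsv_levels[0].items():
        edges.append((src_node, name, manhattan(source, xy)))
    for upper, lower in zip(tsv_levels, tsv_levels[1:]):
        for u_name, u_xy in upper.items():
            for l_name, l_xy in lower.items():
                edges.append((u_name, l_name, manhattan(u_xy, l_xy)))
    for name, xy in tsv_levels[-1].items():
        edges.append((name, snk_node, manhattan(xy, sink)))
    return edges


levels = [{"t1": (1, 1), "t2": (4, 0)}, {"t3": (2, 2), "t4": (5, 5)}]
edges = build_net_edges("net1", source=(0, 0), sink=(3, 3), tsv_levels=levels)
print(len(edges))
```

In this example the cheapest source-to-sink path is source, t1, t3, sink with cost 2 + 2 + 2 = 6; the min-cost flow solver selects such paths for all nets jointly, subject to the unit TSV capacities.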
Figure 4.4: 3D-IC with potential TSV and micro-channel locations
Now we extend this formulation to account for the presence of micro-channels.
As indicated in Figure 4.4, some micro-channel locations conflict with TSVs while
others do not. The micro-channels are allocated an additional flow commodity. Hence
if there are J interlayer nets then the total number of commodities becomes J + 1. Figure
4.5(b) indicates the process of accounting for micro-channels in the flow network of
Figure 4.5(a). The figure shows the top view of an interlayer region where the micro-channels are located and potentially conflict with TSVs. The micro-channels span
the entire length of the chip in the k direction. If there is even one TSV allocated
in this path, a channel cannot be allocated and vice versa. Some potential
micro-channel locations do not have any potential TSVs while others do (see Figure
4.5(b)). We instantiate a source at the beginning of each potential channel location
and a sink at the end. This source contains a unique commodity corresponding
to the fluid flow. Note that all sources corresponding to micro-channel locations
have the same flow type (same commodity). Several paths exist between the source
and sink for a particular channel location. The simplest path is the one that goes
through all the grids that span the entire length of the chip (see Figure 4.5(b)).
Some of these grids are potential TSV locations and have other nets and/or TSVs
connected to them by way of directional edges (as indicated in Figure 4.5(a)). The
fluid flow edges are directed longitudinally while the net interconnection edges are
directed vertically. As indicated earlier, the TSV nodes have a total capacity of
Figure 4.5: Multi-commodity min-cost flow formulation
one. The longitudinal edges that represent fluid flow edges have unit capacity for
the fluid flow commodity while 0 capacity for all the J net commodities. The fluid
flow edges that connect the adjacent grids (direct edges in Figure 4.5(b)) have a
cost of 0. Now each intermediate grid in this direct path is also connected directly
to the fluid sink for that channel location by way of offset edges as well. This edge
also has a fluid commodity capacity of 1 and net commodity capacity of 0. The
cost of this edge is c(i, j) which represents the cooling demand for that channel
location. All the edges that represent net to TSV or TSV to TSV connections have
a fluid flow commodity capacity of 0. Note that different micro-channel locations
do not interfere since the network does not have any edges that interconnect them.
Now the cheapest way of sending a fluid flow commodity from source to sink is to
follow the simple path that spans all the adjacent nodes. The cost of this path is
0 but it forces us to use all the potential TSV nodes on the path since they have a
total capacity of 1. Hence none of these potential TSV nodes can be used for net
interconnection. On the other hand, if any one of these nodes has been allocated to
a net, a fluid commodity cannot go through this simple path anymore and it has to
take any one of the alternative paths to the sink (see offset edges in Figure 4.5(b)).
Any such alternative path has a cost of c(i, j) which represents the price that we
pay by not having a channel at that location since we would rather use some of the
TSVs on the channel path for routing nets.
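The cost that the flow network assigns to a given solution can be sketched without a flow solver. The function below assumes u1 = u2 as in the text: it adds the bounding box wirelength of all nets to the criticality c(i, j) of every channel location that is blocked because at least one TSV on its path is used by a net. The dictionary layout and the example values are hypothetical.

```python
def assignment_cost(net_wirelength, used_tsvs, channel_tsvs, criticality):
    """Total cost of an assignment and the list of channel locations it blocks.

    net_wirelength: {net: wirelength}
    used_tsvs:      set of TSV names allocated to nets
    channel_tsvs:   {channel: set of potential TSVs lying on its path}
    criticality:    {channel: c(i, j)}
    """
    wirelength = sum(net_wirelength.values())
    # A channel takes its zero-cost direct path only if no TSV on it is used.
    blocked = sorted(ch for ch, tsvs in channel_tsvs.items() if tsvs & used_tsvs)
    penalty = sum(criticality[ch] for ch in blocked)
    return wirelength + penalty, blocked


cost, blocked = assignment_cost(
    net_wirelength={"net1": 6, "net2": 4},
    used_tsvs={"t1", "t3"},
    channel_tsvs={"ch0": {"t1", "t2"}, "ch1": set(), "ch2": {"t4"}},
    criticality={"ch0": 3.0, "ch1": 5.0, "ch2": 1.0},
)
print(cost, blocked)
```

Here ch0 is blocked by t1, so the solution pays c = 3.0 on top of the wirelength of 10, while ch1 and ch2 keep their channels at zero cost.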
Sending min-cost multi-commodity flow on this network results in an allocation of nets to TSVs and micro-channels to channel locations such that the total
cost is minimized. This cost is a combination of c(i, j) and bounding box wirelength,
and represents a balance between cooling and wirelength. Solving the multi-commodity flow
problem with integral flows is challenging since it is NP-complete in general,
although several effective heuristics have been developed. In Section 4.2.3, we investigate some specific properties of our problem that help us simplify the formulation,
thereby enabling us to use simpler, computationally efficient heuristics.
4.2.2.3 Iterative Optimization
As indicated in Figure 4.3, once an allocation of micro-channels and TSVs
has been conducted, we perform: a) routing to compute the actual wirelengths,
and b) thermal analysis. If the wire-lengths are unacceptable, thermal violations
occur or the system is overcooled (pumping power is wasted), the c(i, j) values are
re-allocated and the problem re-solved. If wire-lengths are very high, then c(i, j)
values are uniformly scaled down enabling us to prefer wirelength over channels. If
the system experiences thermal violations, c(i, j) values are increased enabling us
to use more micro-channels. If the system is non-uniformly overcooled then regions
where excessive cooling is available are subjected to a reduction in c(i, j) which
could end up in removing the channels in favor of using TSVs. Such an approach
assists in achieving the optimal balance between wirelength and cooling power while
satisfying the thermal and interconnection constraints.
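One refinement step of the criticality factors can be sketched as below. The text does not specify the update amounts, so the multiplicative factors scale and boost, as well as the way violating and overcooled locations are passed in, are assumptions made purely for illustration.

```python
def refine_criticality(c, wirelength_too_high, hot_locations, overcooled_locations,
                       scale=0.8, boost=1.25):
    """Return updated criticality factors for the next MCMCF solve.

    c maps a channel location (i, j) to its current criticality factor.
    scale (< 1) and boost (> 1) are hypothetical tuning factors.
    """
    updated = dict(c)
    if wirelength_too_high:
        # Uniformly scale down so that wirelength is preferred over channels.
        updated = {loc: value * scale for loc, value in updated.items()}
    for loc in hot_locations:
        updated[loc] *= boost   # thermal violation: raise channel demand
    for loc in overcooled_locations:
        updated[loc] *= scale   # overcooled region: release TSV locations
    return updated


c0 = {(0, 0): 4.0, (0, 1): 2.0}
c1 = refine_criticality(c0, False, hot_locations=[(0, 0)],
                        overcooled_locations=[(0, 1)], scale=0.5, boost=2.0)
c2 = refine_criticality(c0, True, [], [], scale=0.5)
print(c1, c2)
```

A multiplicative update leaves a zero criticality at zero, so a practical implementation would probably also need an additive term for locations that become hot only after the first allocation.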
4.2.3 Computational Simplifications
4.2.3.1 Multi Layer Case
Solving multi-commodity flow instances in general is computationally intractable.
For our specific case, this is a bigger issue since the number of commodities is linear
in the number of interlayer nets J (which could be quite large). This significantly
adds to the number of unknowns in the problem formulation making its solving
computationally expensive. We first simplify the formulation without losing optimality followed by effective heuristics. We transform the flow graph illustrated in
Figure 4.5 to the one illustrated in Figure 4.6. For the moment let us ignore the
fluid flow network in Figure 4.5(b). Let us replicate the entire
network graph in Figure 4.5(a) J times (one replica for each net). This is illustrated
in Figure 4.6(a). Basically, every TSV node in the original network appears J
times in the new network. The graphs for each net do not have any common edges,
hence we don’t need to represent the net flows by different commodities. All the
net flows belong to the same commodity. The edge costs and the node/edge capacities are exactly the same as before. Sending unit commodity min-cost flow on
this network, though, does not solve our problem. This is because the same TSV
may be used by two or more nets. In order to address this problem we can allocate
a bundle capacity to all replicated TSV nodes corresponding to the same TSV. A
bundle capacity constraint in network flow problems allocates a total capacity to
a bundle of nodes or edges. In our case we can set a bundle capacity constraint
of 1 to all replicated TSV nodes belonging to the same TSV. This is illustrated in
Figure 4.6(a). The problem continues to be NP complete but we have eliminated
the need for different commodities by adding an additional bundle constraint. We
found through our experiments that this significantly enhanced the computational
efficiency.
Adding micro-channel allocation constraints is illustrated in Figure 4.6(b).
Just as Figure 4.5(b), each potential micro-channel location has the associated network as illustrated, but the TSV nodes in the fluid network do not have the edge
connections to net flow in this case. Instead of allocating a different commodity
to fluid flow, we allocate the same commodity. Now the TSV location nodes in
the network of Figure 4.6(b) have a bundle capacity of 1 with the replicated TSV
nodes for the corresponding TSV in the rest of the network. Hence if a TSV is
allocated to a net, then it cannot be allocated to any other net or micro-channel.
The problem now becomes a (single-commodity) min-cost flow problem with bundle capacity constraints. While solving this problem formulation is NP Complete,
it has significantly smaller number of unknowns although the constraints are a bit
more complex. We solve this problem by assuming that the discrete flow variables
are continuous. This results in a linear programming approximation (polynomially
solvable) for this discrete problem. After getting the solution, non-discrete values
are rounded up appropriately to give a valid solution.
Figure 4.6: Computationally simplifying transformation for multi-layer case
4.2.3.2 Two Layer Case
Now we discuss the special case where there are only two active layers stacked
together. While the simplification for multi-layer case described above could certainly be applied here, there are additional transformations we can use. Consider
the instance illustrated in Figure 4.7(a) where we have two nets and two TSVs. Once
again, let us ignore the micro-channel constraints for the moment. Allocation of nets
to TSVs in this case is easier than the multi-layer case since it can be transformed
to a simple case of bipartite matching. We instantiate a network as illustrated in
Figure 4.7(b). For each net (unlike net terminal in the previous case) we have a
node and for each TSV we have a node. We have directed edges between nets and
TSVs whose cost is the total bounding box between the net’s two terminals and the
Figure 4.7: Computationally simplifying transformation for two-layer case
TSV pads in the corresponding layers (see Figure 4.7(b) for an illustration). Each
node corresponding to the nets has a unit flow (of the same commodity) available.
We also have a super sink that is connected to all the TSVs. The TSV nodes have
a capacity of 1. Sending min-cost flow from net nodes to the super sink would essentially correspond to allocation of nets to TSVs with minimum total wirelength
optimally in polynomial time. In order to add micro-channel location constraints
to this formulation, we essentially apply the method used in the multi-layer case
(with bundle constraints). Note that in this case, no replication of nodes for TSV
assignment was needed as in the multi-layer case, hence the generated formulation
is much simpler than simply applying the previous technique to this case directly.
The problem is still NP Complete due to the bundle capacity constraints. We simplify the formulation to a linear program LP by assuming the flow variables are
continuous. The generated continuous solution is then discretized by rounding of
the non-discrete variables.
Note that multi-pin interlayer nets are first partitioned into multiple
two-pin nets, and the aforementioned method is then used to assign all the two-pin nets to
TSVs.
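The two-layer transformation described above, before the micro-channel bundle constraints are added, can be sketched as a small min-cost flow. The Python sketch below is illustrative only: the function names, the added super source, the half-perimeter cost and the toy coordinates are assumptions made for this example, not the implementation used in this work.

```python
from math import inf

def hpwl(a, b):
    """Half-perimeter (bounding-box) wirelength between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def min_cost_assignment(cost):
    """Assign each row (net) to a distinct column (TSV) at minimum total cost.

    Solved as a min-cost flow by successive shortest augmenting paths
    (Bellman-Ford on the residual graph). Returns (assignment, total_cost).
    """
    n, m = len(cost), len(cost[0])
    src, sink = 0, n + m + 1          # nets: 1..n, TSVs: n+1..n+m
    graph = [[] for _ in range(n + m + 2)]

    def add_edge(u, v, cap, w):
        # Each edge is [to, residual capacity, cost, index of reverse edge].
        graph[u].append([v, cap, w, len(graph[v])])
        graph[v].append([u, 0, -w, len(graph[u]) - 1])

    for i in range(n):
        add_edge(src, 1 + i, 1, 0)                 # one unit of flow per net
        for j in range(m):
            add_edge(1 + i, 1 + n + j, 1, cost[i][j])
    for j in range(m):
        add_edge(1 + n + j, sink, 1, 0)            # each TSV has capacity 1

    total = 0
    for _ in range(n):
        dist = [inf] * len(graph)
        prev = [None] * len(graph)
        dist[src] = 0
        for _pass in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if dist[u] == inf:
                    continue
                for idx, (v, cap, w, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + w < dist[v]:
                        dist[v], prev[v], changed = dist[u] + w, (u, idx), True
            if not changed:
                break
        if dist[sink] == inf:
            raise ValueError("fewer TSVs than nets")
        v = sink
        while v != src:                            # push one unit along the path
            u, idx = prev[v]
            graph[u][idx][1] -= 1
            graph[v][graph[u][idx][3]][1] += 1
            v = u
        total += dist[sink]

    assignment = [None] * n
    for i in range(n):
        for v, cap, _w, _rev in graph[1 + i]:
            if v > n and cap == 0:                 # saturated net -> TSV edge
                assignment[i] = v - 1 - n
    return assignment, total

def assign_nets_to_tsvs(nets, tsvs):
    """nets: list of (terminal_in_layer_1, terminal_in_layer_2); tsvs: list of (x, y)."""
    cost = [[hpwl(t1, tsv) + hpwl(t2, tsv) for tsv in tsvs] for (t1, t2) in nets]
    return min_cost_assignment(cost)
```

For example, `assign_nets_to_tsvs([((0, 0), (0, 0)), ((10, 10), (10, 10))], [(9, 9), (1, 1)])` pairs each net with its nearby TSV. Adding the bundle constraints would turn this into the relaxed LP discussed above, which this sketch does not attempt.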
4.2.4 Performance of TSV Assignment and Micro-channel Placement Co-design
4.2.4.1 Comparison of Wirelength and Pumping Power
In our experiments, we tested both two-layer and three-layer 3D-ICs. We use the
IBM-PLACE 2.0 circuits with placement information as the benchmarks [2]. For
each test, we choose two or three circuits from ibm01 to ibm10, where each circuit
corresponds to one 3D-IC layer. Based on the placement information, we find the
whitespace in the layout, which gives the potential TSV locations. The
number of potential TSV locations ranges from around 50 to 1000. We also randomly
generate 30 to 200 interlayer nets. To obtain the power profile for each layer, we
randomly assign a value to each cell as its power density. The chip
dimension is 9 × 9 mm². The micro-channel width × height is 100 × 200 µm², and
the diameter of a TSV is 10 µm. The maximum temperature constraint Tmax is 85 °C.
We compare the wirelength and pumping power achieved by our co-optimization
approach against the TSV first and micro-channel first approaches.
1. The TSV first approach first assigns TSVs to interlayer nets assuming there are
no micro-channels. Once TSVs are assigned and hence TSV locations are
decided, we allocate micro-channels in the remaining interlayer regions using
the approach in Section 3.3;
2. The micro-channel first approach allocates micro-channels first assuming there are
no TSVs, and then assigns interlayer nets to the remaining available TSV
locations.
For each approach, once we obtain the TSV assignment result, we route the
interlayer net terminals to the TSVs (or TSVs to TSVs) in each layer separately
using the Labyrinth 2D router [3] to obtain the total wirelength (WL). We also estimate
the pumping power Ppump based on the number of channels used and the given
pressure drop. Table 4.2 shows the benchmark information.
Table 4.2: Benchmark Information
Ckt  # Layer  # TSV  # Interlayer nets
1    2        56     30
2    2        119    50
3    2        190    80
4    2        348    100
5    2        652    125
6    3        175    50
7    3        511    80
8    3        714    100
9    3        1111   200
Table 4.3: Comparison between our approach, TSV first and channel first approach (Ppump: W, WL: m, temperature: °C)
     Air cool | TSV first               | Micro-channel first     | Co-optimization         | WL change wrt        | Ppump change wrt
Ckt  Tpeak    | WL    Ppump  Below Tmax | WL    Ppump  Below Tmax | WL    Ppump  Below Tmax | TSV first  MC first  | TSV first  MC first
1    106.46   | 0.19  2.13   Y          | 0.22  1.48   Y          | 0.19  1.70   Y          | +0.00%     -13.64%   | -20%       +14%
2    101.11   | 0.28  n/a    N          | 0.33  0.84   Y          | 0.28  1.03   Y          | +0.00%     -15.15%   | n/a        +22%
3    121.97   | 0.45  n/a    N          | 0.52  1.04   Y          | 0.47  1.32   Y          | +4.44%     -9.62%    | n/a        +27%
4    110.25   | 1.25  6.81   Y          | 1.33  1.70   Y          | 1.26  2.55   Y          | +0.80%     -5.26%    | -63%       +50%
5    128.83   | 1.53  5.11   Y          | 1.63  1.70   Y          | 1.52  2.55   Y          | -0.65%     -6.75%    | -50%       +50%
6    135.28   | 0.37  n/a    N          | 0.43  4.25   Y          | 0.37  5.10   Y          | +0.00%     -13.95%   | n/a        +20%
7    154.06   | 0.91  20.42  Y          | 0.98  5.53   Y          | 0.92  5.95   Y          | +1.10%     -6.12%    | -71%       +8%
8    152.39   | 1.55  9.36   Y          | 1.63  6.04   Y          | 1.51  6.80   Y          | -2.58%     -7.36%    | -27%       +12%
9    161.05   | 2.99  10.21  Y          | 3.22  5.29   Y          | 3.00  5.95   Y          | +0.33%     -6.83%    | -42%       +12%
Avg           |                         |                         |                         | +0.38%     -9.41%    | -46%       +24%
Table 4.3 shows the comparison of wirelength and micro-channel cooling power
for the three approaches. In the table, "Below Tmax" indicates whether the achieved thermal
profile satisfies the thermal constraint, and "MC first" denotes the micro-channel
first approach. Table 4.3 shows that air cooling results in thermal violations
for all power profiles, while micro-channels can provide sufficient cooling. The TSV first
approach achieves good wirelength compared with the micro-channel first approach, but it
uses about 160% more pumping power (averaged over the benchmarks where it meets the
thermal constraint), since the existence of TSVs deters the optimal allocation of
micro-channels. Furthermore, for some benchmarks, the TSVs are allocated in thermally
critical regions, in which case micro-channels cannot effectively cool these regions,
thereby causing thermal violations. In contrast, the micro-channel first approach saves
pumping power but results in up to 15% wirelength increase compared with the TSV first
approach. Our approach considers both wirelength and pumping power simultaneously.
The wirelength increase in our approach compared with the TSV first approach
is only 0.38% on average, while compared with the MC first approach, our approach saves 9.41%
wirelength. In some benchmarks, our approach even results in slightly better WL
than the TSV first approach. This is because both approaches use the bounding box
wirelength (which gives a lower bound of the routing wirelength) when
solving the TSV assignment problem, while the real routing result also depends on
the relative positions between interlayer net terminals and TSV locations. Therefore,
in these benchmarks, although our approach results in a slight degradation in
bounding box wirelength, its real routing wirelength is better than that of the TSV first
approach. Comparing the micro-channel pumping power, our approach achieves 46%
pumping power savings compared with the TSV first approach, and uses 24% more
pumping power compared with the micro-channel first approach. Moreover, for benchmarks
where thermal violations occur using the TSV first approach, our approach
reduces the temperature below the thermal constraint without consuming excessive pumping power.
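As a quick arithmetic check of the pumping power comparison, the figure of about 160% can be recomputed from the Ppump columns of Table 4.3. The sketch below assumes the figure is the mean of the per-benchmark relative increases over the benchmarks where the TSV first approach has a feasible solution; that averaging convention is an assumption of this example, and the helper name is hypothetical.

```python
# Ppump values (W) copied from Table 4.3; None marks "n/a" (thermal violation).
tsv_first_ppump = [2.13, None, None, 6.81, 5.11, None, 20.42, 9.36, 10.21]
mc_first_ppump = [1.48, 0.84, 1.04, 1.70, 1.70, 4.25, 5.53, 6.04, 5.29]

def mean_relative_increase(values, baseline):
    """Average of (value - baseline) / baseline in percent, skipping n/a entries."""
    changes = [100.0 * (v - b) / b
               for v, b in zip(values, baseline) if v is not None]
    return sum(changes) / len(changes)

overhead = mean_relative_increase(tsv_first_ppump, mc_first_ppump)
print(f"TSV first vs. MC first pumping power overhead: {overhead:.1f}%")
```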
4.2.4.2 Tradeoff Between Wirelength and Pumping Power
The value of criticality factor c(i, j) could be adjusted to control the weight
between wirelength and cooling provided by micro-channels. Usually, decrease in
pumping power is at the cost of increased wirelength, and vice versa. Such tradeoff is illustrated in Figure 4.8, which shows the wirelength versus pumping power
for one benchmark (all data points satisfy the thermal constraints). When thermal violations occur, more efficient allocation of micro-channels could be adopted
by sacrificing some wirelength, or more channels are allocated in the unused regions
surrounding the hotspot which leads to an increase in pumping power. When pumping power is too high, we could try to better allocate micro-channels to improve its
cooling effectiveness at a cost of longer wirelength. When wirelength is more preferable, we could assign TSVs towards further reduction of wirelength while sacrificing
micro-channel cooling effectiveness (leading to higher pumping power).
Figure 4.8: Tradeoff between wirelength and pumping power (WL from 0.19 to 0.215 m versus Ppump from 1.4 to 2.2 W)
4.3 Co-optimization of Gate Sizing and Micro-Fluidic Cooling
4.3.1 Motivation of Simultaneous Gate Sizing and Micro-channel Distribution
Distribution of channels in the interlayer region (deciding the channel placement) can be controlled to favor some sub-regions over others. As investigated in the
previous chapter, the distribution of channels can be used to control the local temperature of 3D-IC subregions, unlike conventional air cooling where no such control
is possible. This localized thermal control enabled by apt distribution of channels
(higher channel counts in some areas over lower channel counts in others) offers several advantages to the 3D-IC design process, which are ignored by the conventional
postfix approach for design of the cooling system.
The power, performance and temperature aspects of 3D-ICs have a very complex interdependence. Temperature profile depends on both the amount as well as
distribution of power. Non-linear leakage thermal interdependence implies higher
temperatures leading to greater power. Higher temperature also impacts the device
performance. Addressing these complex interdependencies between power, temperature and performance has been a major focus of research both for 2D and 3D
ICs. Localized temperature control enabled by micro-channel distribution can be
exploited in a number of ways by the 3D-IC design optimization process.
1. Improving the circuit speed: Allocation of greater cooling surrounding
timing critical areas could be used by 3D-IC design methods to improve timing
further by aggressive timing optimization since the associated power dissipation could be addressed by greater cooling. Reduced temperatures would also
contribute to an overall speeding up of circuit.
2. Reducing dynamic and leakage power dissipation: Greater cooling
in high leakage areas would directly reduce their leakage levels due to nonlinear dependence between leakage and temperature. Reduction in temperature around timing critical circuits would result in an overall speeding up of
the design. Hence we do not need aggressive timing optimization helping save
both dynamic and leakage power. Reduction in power would further reduce
temperatures causing a favorable positive feedback. The reduction in power
dissipation may be significantly greater than an increase in the pumping power
(experimental results to support this claim would be provided subsequently).
Hence the total power of the 3D-IC including dynamic, leakage and pumping
would be reduced.
3. Reduction in pumping power: Design of 3D-IC would decide the location
and nature of hotspots and nature of power dissipation. Co-optimization of
the 3D-IC system and the channel distribution could be used to simplify the
cooling configuration and therefore save pumping power.
4. Fundamental advancement in power-performance tradeoff: Per the
advantages noted above, co-optimization of cooling and the 3D-IC design enables better performance under a given power envelope and better power for a
given performance constraint, thereby resulting in fundamental improvement
in power-performance tradeoff. Experimental data to support this claim is
illustrated subsequently.
Overall, it can be seen that there is sufficient motivation for co-optimization of
the 3D-IC physical design as well as distribution of channels. Co-design of 3D-IC and
the fluidic cooling infrastructure can fundamentally improve the power performance
tradeoff in 3D-ICs. In this section we attempt to highlight the need for this co-design
and the associated challenges and opportunities. We investigate the simultaneous
gate sizing and micro-channel distribution problem in 3D-ICs as an illustration of
the advantages of this co-optimization [71].
4.3.2 Modeling of Gate Delay
The maximum delay of a circuit is usually decided by the latency of its critical
paths, which is largely influenced by the delay of the gates on these critical paths.
The gate delay is influenced by many parameters, such as the gate size, carrier
mobility, and threshold voltage.
Many works model the gate delay as a posynomial function of the gate sizes as:
$$ d_i \propto \eta_{0i} + \frac{\sum_{\forall k \in FO(g_i)} \eta_{ki} \cdot s_k}{s_i} \tag{4.2} $$
Here $s_i$ is the width of gate $g_i$, and $s_k, \forall k \in FO(g_i)$, are the sizes of all of gate $g_i$'s fanouts
[35]. This model shows that the gate delay is a monotonically decreasing function
of its own size, but a monotonically increasing function of the sizes of its fanout
gates. Therefore, an increase in the size of gate $g_i$ can result in a reduction in $g_i$'s
delay; however, this would increase the delay of gate $g_i$'s fanin gates.
Some of the circuit parameters, such as the threshold voltage and mobility,
are sensitive to temperature [86]. The work in [47] models the dependency of gate delay on
temperature as a polynomial function:
$$ d_i \propto T_i^{\sigma}, \quad \sigma \approx 1.19 \tag{4.3} $$
By incorporating the impact of both gate sizes and temperature, we can model the gate
delay as a function of gate sizes and temperature:
$$ d_i \propto T_i^{\sigma} \cdot \left( \eta_{0i} + \frac{\sum_{\forall k \in FO(g_i)} \eta_{ki} \cdot s_k}{s_i} \right) \tag{4.4} $$
Here $s_i$, $T_i$ are the width and temperature of gate $g_i$, $s_k$ is the width of $g_i$'s fanout
gates, and $\sigma$, $\eta_{0i}$ and $\eta_{ki}$ are constants.
This model shows that change in the following parameters can result in gate
delay reduction: (a) increase of its own width, (b) decrease in the width of its
fanouts, and (c) reduction in gate temperature.
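The three trends listed above can be checked numerically with a direct transcription of Equation 4.4. In this sketch the proportionality constant is taken as 1, and the coefficient values and the function name are hypothetical placeholders rather than fitted model parameters.

```python
def gate_delay(s_i, fanout_sizes, temp, eta0=1.0, eta_k=None, sigma=1.19):
    """Delay of gate g_i per Equation 4.4, up to a constant factor.

    s_i          : width of the gate itself
    fanout_sizes : widths s_k of the fanout gates
    temp         : gate temperature T_i
    eta0, eta_k  : model constants (placeholder values, assumed positive)
    """
    if eta_k is None:
        eta_k = [1.0] * len(fanout_sizes)
    load = sum(e * s for e, s in zip(eta_k, fanout_sizes))
    return temp ** sigma * (eta0 + load / s_i)

base = gate_delay(1.0, [1.0, 2.0], 350.0)
wider = gate_delay(2.0, [1.0, 2.0], 350.0)           # (a) larger own width
smaller_fanout = gate_delay(1.0, [0.5, 1.0], 350.0)  # (b) smaller fanout widths
cooler = gate_delay(1.0, [1.0, 2.0], 320.0)          # (c) lower temperature
```

Each of the three variants yields a smaller delay than `base`, matching cases (a) to (c).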
4.3.3 Problem Formulation
The problem of gate sizing and micro-channel placement co-optimization is
formally stated as follows. Given a 3D-IC circuit and the associated gate and TSV
placement (as Figure 3.1 shows), we would like to decide the sizes of all gates and the
locations of interlayer micro-channels such that the total power consumption (including
the dynamic and leakage power, as well as the pumping power consumed by
micro-channels) is minimized, while keeping the longest path
delay within the timing constraint and the silicon temperature below the maximum constraint. The
channels should not conflict with TSVs, which have been placed already.
The co-optimization problem is formulated in Equation 4.5. Here we assume that
gates and TSVs have been placed on a grid (each gate/TSV is within a grid). Also,
gate sizing does not change the gate's grid location. Note that these assumptions
are similar to those of other works dealing with in-place gate sizing.
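$$
\begin{aligned}
&\text{Decision variables: } \vec{s},\ B \\
&\min \sum_{\forall \text{gate } g_i} (P_{d,i} + P_{l,i}) + P_{pump} \\
&\text{s.t.} \\
&\quad 1.\ t_j + d_i(\vec{s}, T_i) \le t_i, \quad \forall \text{gate } g_i,\ g_j \in FI(g_i) \\
&\quad 2.\ t_i < t_{con}, \quad \forall \text{gate } g_i \in PO \\
&\quad 3.\ G(B) \cdot \vec{T} = \vec{P}(\vec{s}, F, \vec{T}) \\
&\quad 4.\ 0 \le \vec{T} \le \vec{T}_{max} \\
&\quad 5.\ s_{min} \le s_i \le s_{max}, \quad \forall \text{gate } g_i
\end{aligned}
\tag{4.5}
$$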
The decision variables in this problem are the gate sizes $\vec{s}$ and the micro-channel locations B.
The objective of the optimization problem is to minimize the total power
consumption of the 3D-IC (including dynamic, leakage and pumping power) for the
given timing constraint tcon . Here Pd,i and Pl,i represent the dynamic and leakage
power of gate gi , which can be calculated based on the models in Sections 2.4.1 and
2.4.2. The dynamic power depends on the gate sizes ⃗s and clock frequency F , and
leakage power depends on both gate sizes ⃗s and thermal profile T⃗ (temperature in
all grids). The clock frequency is usually decided by the maximum circuit delay.
Hence, in this work, we assume the clock frequency is the inverse of timing constraint
F = 1/tcon .
The first two constraints are timing constraints, indicating that the signal
propagation delay from the primary inputs (PIs) to primary outputs (POs) should
be within the timing constraint tcon . Here ti denotes the signal arrival time at the
output of gate gi from the primary inputs and di is the propagation delay of gate
gi . The delay, which depends on gate sizes and temperature, is calculated using the
model in Section 4.3.2 (Equation 4.4). We assume the 3D-IC is divided into grids. For ease of
explanation, we assume each grid only contains one gate. Hence grid i contains gate
gi and has the temperature Ti . If a grid does not have a gate, the corresponding
power is 0 and the temperature would be decided by neighboring grids based on
the conductivity matrix G. The 3D-IC thermal profile T⃗ is then represented by
the temperature of all grids: T⃗ = {Ti,∀grids:i }. Note that this formulation is easily
extendable to the case where each grid contains multiple gates.
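Constraints 1 and 2 are the usual arrival-time form of a path delay bound: the tightest arrival times satisfy $t_i = \max_{g_j \in FI(g_i)} t_j + d_i$. The sketch below checks them on a small netlist. The dictionary-based netlist representation, the gate names and the assumption that primary inputs arrive at time 0 are illustrative choices, not part of the formulation.

```python
def arrival_times(fanin, delay):
    """Tightest arrival times satisfying t_j + d_i <= t_i for every fan-in g_j.

    fanin : dict gate -> list of fan-in gates (empty if driven by primary inputs)
    delay : dict gate -> propagation delay d_i
    The netlist is assumed to be acyclic (combinational logic).
    """
    t = {}

    def visit(g):
        if g not in t:
            t[g] = max((visit(j) for j in fanin[g]), default=0.0) + delay[g]
        return t[g]

    for g in fanin:
        visit(g)
    return t

def meets_timing(fanin, delay, primary_outputs, t_con):
    """Constraint 2: every primary-output arrival time is below t_con."""
    t = arrival_times(fanin, delay)
    return all(t[g] < t_con for g in primary_outputs)

fanin = {"g1": [], "g2": [], "g3": ["g1", "g2"]}
delay = {"g1": 1.0, "g2": 2.0, "g3": 1.5}
times = arrival_times(fanin, delay)
```

Here `g3` is the only primary output and its arrival time is set by the slower fan-in `g2`.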
The third constraint indicates the interdependency between temperature and
power. Let T⃗ and P⃗ (⃗s, F, T⃗ ) represent the thermal and power profiles at all grids i in
3D-IC. The power dissipated in a grid i is Pi = Pd,i + Pl,i (if a grid does not have any
gate then its power is 0). Note that the power profile is a function of gate sizes and
temperatures. Here G represents the 3D-IC conductivity matrix which depends on
the properties of the material, TSVs as well as design of the micro-channel structure
B. The last two constraints are the maximum temperature constraint and feasible
gate size range.
The power, temperature and gate delay are interdependent in a complex way,
making this co-optimization problem difficult to solve. The allocation of micro-channels at discrete locations adds further complexity to this problem.
4.3.4 Algorithm for Gate Sizing and Micro-channel Placement Co-optimization
The problem formulation illustrated above is quite complex. We develop an
iterative optimization approach where each step systematically solves some aspects
of the problem. We have strived to use rigorous optimization methods as much as
possible. Fundamentally the overall optimization problem is decomposed into two:
deciding the gate sizes and grid temperatures simultaneously and then designing the
micro-channel distribution which removes the heat generated by the circuit (function of temperature and gate size) while coming as close as possible to the prescribed
temperature. This process is iterated several times as summarized below.
Step 1: Ideal heat sink and gate size co-optimization: We first simplify the
problem by assuming that temperature in each grid is perfectly controllable and is
not dependent on the 3D-IC conductivity matrix G. The resulting solution allocates a gate size and temperature level to each gate/grid. The ideal case acts as a
guideline to following optimization steps which would then strive to get as close to
this ideal solution as possible.
Step 2: Micro-channel distribution for the ideal case: Interlayer micro-channels are now placed such that: a) the heat levels decided by step 1 are effectively removed and the grid temperatures are as close to those prescribed by step 1
as possible, b) micro-channels are not allocated in areas with TSVs, and c) the smallest
number of channels is allocated for minimal pumping power.
Step 3: Gate size and grid temperature refinement: Since step 2 will be
unable to entirely meet the ideal case solution of step 1, the gate size and grid
temperature solution needs to be refined to account for the current micro-channel
network in place.
Step 4: Micro-channel distribution refinement: The solution from step 3 gives
a modified gate size and grid temperature prescription. Hence the micro-channel
network needs to be refined further.
Figure 4.9: Overall design flow
Step 5: Iterate steps 3 and 4 until the convergence criterion is met: The convergence criterion could be a maximum number of iterations or a minimum level of improvement achieved per iteration.
Figure 4.9 illustrates the overall approach. In each step we strive to use algorithms and heuristics which draw upon rigorous optimization theory while exploiting
the structure in the problem formulation. Now we describe each step in detail.
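The control flow of steps 1 to 5 can be summarized as a short loop. This is only a structural sketch: the four callables stand in for the optimization steps described in the following subsections, and their names, signatures and the stopping thresholds are hypothetical.

```python
def co_optimize(ideal_sizing, place_channels, refine_sizing, total_power,
                max_iters=10, min_improvement=1e-3):
    """Iterate gate sizing and channel placement; return the best solution seen.

    ideal_sizing()                 -> (sizes, temps)   Step 1
    place_channels(sizes, temps)   -> channels         Steps 2 and 4
    refine_sizing(channels)        -> (sizes, temps)   Step 3
    total_power(sizes, temps, ch)  -> float (dynamic + leakage + pumping)
    """
    sizes, temps = ideal_sizing()                      # Step 1
    channels = place_channels(sizes, temps)            # Step 2
    best = (total_power(sizes, temps, channels), sizes, temps, channels)
    for _ in range(max_iters):                         # Step 5: iterate
        sizes, temps = refine_sizing(channels)         # Step 3
        channels = place_channels(sizes, temps)        # Step 4
        power = total_power(sizes, temps, channels)
        improvement = best[0] - power
        if improvement > 0:
            best = (power, sizes, temps, channels)
        if improvement < min_improvement:              # converged or got worse
            break
    return best
```

Returning the best solution seen, rather than the last one, is a design choice of this sketch that guards against an iteration that makes the total power worse.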
4.3.4.1 Step 1: Ideal Heat Sink and Gate Size Co-optimization
Let us first simplify the optimization problem in Equation 4.5 as:
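$$
\begin{aligned}
&\text{Decision variables: } \vec{s},\ \vec{T} \\
&\min \sum_{\forall \text{gate } g_i} \left( P_{d,i}(s_i) + P_{l,i}(s_i, T_i) \right) + \lambda \sum_{\forall \text{grid } i} \frac{1}{T_i} \\
&\text{s.t.} \\
&\quad 1.\ t_j + d_i(\vec{s}, T_i) \le t_i, \quad \forall \text{gate } g_i,\ g_j \in FI(g_i) \\
&\quad 2.\ t_i < t_{con}, \quad \forall \text{gate } g_i \in PO \\
&\quad 3.\ 0 \le \vec{T} \le \vec{T}_{max} \\
&\quad 4.\ s_{min} \le s_i \le s_{max}, \quad \forall \text{gate } g_i
\end{aligned}
\tag{4.6}
$$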
In this formulation, the grid temperature $T_i$ is assumed to be perfectly controllable
through an ideal heat sink. The constraints signify meeting the timing constraint
while staying within the temperature and gate size constraints. The objective has two
components: minimization of power as well as an additional term $\sum_{\forall \text{grid } i} \frac{1}{T_i}$. This
term signifies the fact that reducing $T_i$ comes at the penalty of a more complex heat
sink (which would be designed in the subsequent steps). Without this term, this
optimization problem would trivially assign all $T_i$ to be as small as possible (because
that would benefit both timing and power). The solution of this problem represents an
allocation of gate sizes along with grid temperatures, and is used as a starting
point for further optimization.
In order to solve this problem we make the following transformation: $s_i = e^{x_i}$
and $T_i = e^{y_i}$. Based on this transformation, the gate delay and power consumption
models described in Sections 2.4.1 and 2.4.2 become:
$$ d_i = e^{\sigma y_i} \cdot \Big( \eta_{0i} + \sum_{\forall k \in FO(g_i)} \eta_{ki} \cdot e^{x_k - x_i} \Big), \quad P_{d,i} = \beta_{d,i} F e^{x_i}, \quad P_{l,i} = e^{x_i} \cdot (\varepsilon_1 e^{2 y_i} + \varepsilon_2 e^{y_i} + \varepsilon_3) $$
It can be seen that the models
for delay, leakage and dynamic power are convex functions of the variables $x_i$ and $y_i$.
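One way to see the convexity is to expand each transformed model into a sum of exponentials of affine functions. The expansion below is a sketch that assumes the coefficients $\eta_{0i}$, $\eta_{ki}$, $\beta_{d,i}$, $\varepsilon_1$, $\varepsilon_2$ and $\varepsilon_3$ are non-negative, which is not stated explicitly here.

```latex
\begin{aligned}
d_i     &= \eta_{0i}\, e^{\sigma y_i}
           + \sum_{\forall k \in FO(g_i)} \eta_{ki}\, e^{\sigma y_i + x_k - x_i}, \\
P_{d,i} &= \beta_{d,i} F\, e^{x_i}, \\
P_{l,i} &= \varepsilon_1\, e^{x_i + 2 y_i}
           + \varepsilon_2\, e^{x_i + y_i}
           + \varepsilon_3\, e^{x_i}.
\end{aligned}
```

Every term is a non-negative multiple of the exponential of an affine function of the variables, which is convex, and a sum of convex functions is convex. Under the stated sign assumption, the timing constraint $t_j + d_i \le t_i$ is then a convex constraint, since the arrival times enter linearly.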
Theorem 1: The formulation in Equation 4.6 can be solved optimally using convex
optimization approaches.
Proof: As indicated, the gate delay, dynamic power and leakage power functions are convex
w.r.t. the variables $x_i$ and $y_i$. Hence the constraints are convex. The term $\sum_{\forall \text{grid } i} \frac{1}{T_i}$
gets transformed to $\sum_{\forall \text{grid } i} e^{-y_i}$, which is a convex function, too. Hence the overall
objective function is convex as well, making the whole formulation optimally solvable
using polynomial time convex methods.
4.3.4.2 Step 2: Micro-channel Distribution for Ideal Case
Step 1 has assigned gate sizes and grid temperature values. The gate sizes
and temperatures decide the overall power dissipation profile while the temperature
assignments indicate the level of cooling necessary in each grid. Together, these two
aspects profoundly impact the design of the interlayer micro-fluidic system. The
problem with the “ideal formulation” of step 1 is that it assumes perfect control of
each grid temperature which is not possible even with interlayer micro-fluidics. By
nature, micro-fluidic channels carry heat along the direction of fluid flow. They are
incapable of controlling grid level temperatures. This is because, even though they
enable localized cooling, they cannot completely remove the thermal cross-coupling
of neighboring grids. The decision of allocating or removing a micro-channel will
influence all the grids adjacent to this micro-channel. Hence in this step, we would
like to allocate channels such that the power dissipation levels are removed while
ensuring the grid temperatures are as close as possible to the prescribed levels from
step 1. We use least square fit (LSF) to find the micro-channel placement:
$$ \min \ \lVert G(B) \cdot \vec{T}_{desire} - \vec{P}_{desire} \rVert_2 \tag{4.7} $$
Here $\vec{T}_{desire}$ is the prescribed thermal profile decided by the previous step. $\vec{P}_{desire}$ is
the sum of dynamic and leakage power calculated based on the prescribed gate sizes
and temperatures using the power models in Sections 2.4.1 and 2.4.2. The objective
is to decide the channel allocation such that the RMS (root-mean-square) error is
minimized. B is the allocation of micro-channels and G(B) is the associated thermal
conductivity matrix. For a given allocation of micro-channels, the associated
conductivity matrix could be generated using the modeling approach described in
Section 2.3. It is noteworthy that for a given set of potential channel locations, we
would like to choose a subset such that the aforementioned objective is minimized.
To solve this, we first formulate the problem as an integer program. Essentially
we assign a binary decision variable to each potential micro-channel location
and show that the conductivity matrix G is a linear function of these
binary variables (proofs are omitted here). By approximating the binary variables
as continuous, this problem becomes minimizing the RMS error of an affine function
(since $\vec{T}_{desire}$ and $\vec{P}_{desire}$ are known, $G(B) \cdot \vec{T}_{desire} - \vec{P}_{desire}$ is an affine function of
B), which can be solved efficiently. After solving this problem, we round the continuous
variables to obtain the locations of micro-channels. Note that the objective
here is to generate a fluidic cooling solution that comes as close as possible to the
prescribed $\vec{T}_{desire}$ and $\vec{P}_{desire}$.
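The relax-and-round procedure can be sketched on a toy instance. The sketch assumes the linear dependence takes the form $G(B) = G_0 + \sum_k b_k G_k$, so that the residual is $\sum_k b_k (G_k \vec{T}_{desire}) + (G_0 \vec{T}_{desire} - \vec{P}_{desire})$. The projected gradient solver, the 0.5 rounding threshold and all names below are illustrative assumptions, not the solver used in this work.

```python
def select_channels(effects, residual, iters=2000):
    """Relax b_k in {0, 1} to [0, 1], minimize
    || sum_k b_k * effects[k] + residual ||^2 by projected gradient descent,
    then round each b_k to the nearest integer.

    effects[k] : vector G_k . T_desire (hypothetical contribution of channel k)
    residual   : vector G_0 . T_desire - P_desire (no channels allocated)
    Returns (rounded 0/1 choices, relaxed continuous solution).
    """
    n_ch, n_grid = len(effects), len(residual)
    # Step size 1/L, where L = 2 * ||A||_F^2 bounds the gradient's Lipschitz constant.
    frob2 = sum(a * a for col in effects for a in col)
    step = 1.0 / (2.0 * frob2)
    b = [0.5] * n_ch
    for _ in range(iters):
        err = [residual[g] + sum(b[k] * effects[k][g] for k in range(n_ch))
               for g in range(n_grid)]
        grad = [2.0 * sum(effects[k][g] * err[g] for g in range(n_grid))
                for k in range(n_ch)]
        # Gradient step followed by projection onto the box [0, 1].
        b = [min(1.0, max(0.0, b[k] - step * grad[k])) for k in range(n_ch)]
    return [1 if bk >= 0.5 else 0 for bk in b], b

chosen, relaxed = select_channels([[-1.0, 0.0], [0.0, -1.0]], [1.0, 0.1])
```

In this toy instance the first candidate channel removes most of the residual and is kept, while the second contributes little and is rounded away.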
4.3.4.3 Step 3: Gate Size and Grid Temperature Refinement
Since the micro-channel solution from step 2 may not be able to come very close
to the solution desired by step 1, we need to refine the original solution. Following
are the objectives of this refinement step. 1) Step 2 synthesized a micro-channel
solution which controls how power and temperature impact each other. This needs
to be accounted for in the gate sizing solution. The ideal case of step 1 had assumed
a perfectly controllable grid temperature. With the new channel infrastructure in place, this assumption does not hold anymore. Hence the gate sizing needs to be
re-evaluated. 2) We may still want to refine the channel structure further, based on
newly prescribed temperature and gate sizes. Hence we would like to generate new
assignments for grid temperature while accounting for the current cooling system in
place.
In order to achieve the latter objective we divide the temperature Ti into two
components: controllable and uncontrollable parts, Tc,i and Tnc,i . The uncontrollable temperature is decided by the relationship between power and temperature
which is a function of gate sizes and also the micro-channel structure in place. The
controllable part is an additional parameter which we can control to prescribe any
change in temperature. It would be used to further refine the micro-channel structure. The gate/grid temperature Ti = Tnc,i · Tc,i . Here Tc,i = 1 indicates no change
at gate gi (or grid i), Tc,i < 1 indicates greater need for cooling and Tc,i > 1 indicates
less cooling necessary. The formulation at this step can be represented as follows.
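$$
\begin{aligned}
&\text{Decision variables: } \vec{s},\ \vec{T}_{nc},\ \vec{T}_c \\
&\text{Objective: } \min \sum_{\forall \text{gate } g_i} \left( P_{d,i}(s_i) + P_{l,i}(s_i, T_{nc,i} \cdot T_{c,i}) \right) + \lambda \sum_{\forall \text{grid } i} \frac{1}{T_{c,i}}
\end{aligned}
\tag{4.8}
$$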
The objective structure is the same as the ideal case in step 1. However, the
temperature affecting the gate leakage has two components now: uncontrollable
part Tnc,i and controllable part Tc,i . Because the controllable component is being
assigned by us in this step, we would like Tc,i to be as large as possible indicating
minimal need for channels. This would help reduce pumping power. Hence the
objective combines total power dissipated (the first two terms) along with pumping
power (the third term).
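Constraints 1, 2:
$$
\begin{aligned}
&1.\ t_j + d_i(\vec{s}, T_{nc,i} \cdot T_{c,i}) \le t_i, \quad \forall \text{gate } g_i,\ g_j \in FI(g_i) \\
&2.\ t_i < t_{con}, \quad \forall \text{gate } g_i \in PO
\end{aligned}
\tag{4.9}
$$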
This set of timing constraints (constraints 1 and 2) is similar to the ideal case
except the gate temperature has two components.
Constraint 3:
$$ G(B) \cdot \vec{T}_{nc} = \vec{P}_d(\vec{s}) + \vec{P}_l(\vec{s}, \vec{T}_{nc}) \tag{4.10} $$
As indicated earlier, Tnc,i is the uncontrollable temperature which is decided by
the power being dissipated and also the cooling system in place. Constraint 3
establishes the relationship between chip power dissipation and Tnc,i . Note that we
do not include Tc,i in this equation, because this parameter is being controlled to
prescribe refinements in the cooling system, and would be used by future steps to
redesign the cooling system.
Unlike the ideal case in step 1, Tc,i should not be arbitrarily assigned in each
grid since we already have a micro-channel network in place. For example, if a grid
i already has a channel underneath, then increasing Tc,i would prescribe removal of
this channel. But doing so without accounting for the impact on other grids may
result in significant sub-optimality since removal of a channel would affect a large
number of grids. Also, if a grid i is located close to a TSV, then even if it has a
small value of Tc,i (indicating a need for channels), its extra cooling demands may
never be met due to physical constraints imposed by TSVs. To account for these
issues, the following constraints are imposed on the control of Tc,i .
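Constraints 4, 5:
$$
\begin{aligned}
&4.\ \vec{T}_{c,min} \le \vec{T}_c \le \vec{T}_{c,max} \\
&5.\ T_{c,i} = T_{c,j}, \quad \forall \text{adjacent grids } i, j \text{ along the channel direction}
\end{aligned}
\tag{4.11}
$$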
Tc,min,i and Tc,max,i values control how the Tc,i values are allocated (T⃗c,max , T⃗c,min are
vectorized Tc,max,i , Tc,min,i ). Tc,min,i ≤ 1 and Tc,max,i ≥ 1. A small value of Tc,min,i
implies the possibility of adding more cooling around grid i, while a large value of
Tc,min,i implies smaller chance of adding extra cooling around i. Similarly, a large
value of Tc,max,i implies that grid i is close to some existing channels, hence great
temperature increase would occur if the cooling around grid i is removed. A small
value of Tc,max,i implies that the impact of existing cooling configuration on grid i
is small since they are far away. By appropriately assigning the values for Tc,min,i
and Tc,max,i , we can control the degree of change that is prescribed to the cooling
system by the optimization formulation. The Tc,min,i and Tc,max,i values for each Tc,i
are allocated using the following rules.
Rule 1: If grid i is in the close vicinity of a TSV, then allocating channels
nearby would be tougher. Hence we do not wish to have too much additional control
of temperature at grid i. Therefore, Tc,min,i and Tc,max,i are allocated to be closer
to each other such that significant changes in the fluidic structure around i is not
prescribed by the optimization formulation. We use a formula based on distance
and number of closeby TSVs to compute this range.
Rule 2: If a channel is already allocated very close to grid i, then Tc,min,i is
assigned to 1 and Tc,max,i is assigned to be a large value. This indicates that the
step 3 formulation only has the option of suggesting removal of a channel from this
location.
Rule 3: If a channel is allocated close but not too close to a grid i, then
Tc,min,i < 1 and its value is a function of the number of potential channel locations
in the close vicinity. The more potential channel locations there are, the smaller the value of
Tc,min,i . Tc,max,i is allocated to be a value greater than 1, and is a function of the
distance to the closest channel in the current design. The greater the distance, the smaller
the value of Tc,max,i . This is because prescribing an increase in grid temperature by
removing channels will only be effective if they are located sufficiently close.
Rule 4: If no channel is allocated in sufficient vicinity then Tc,min,i has the
smallest value possible indicating that a channel could be added and Tc,max,i = 1
indicating that there is little possibility of removal of a channel.
Rule 5: All Tc,i for the grids along the same micro-channel are allocated to be
the same. Since each micro-channel spans the whole interlayer region in the z direction,
the prescribed changes for grids along the same micro-channel are assigned
to be the same due to the nature of micro-channels. This is captured by constraint 5.
Allocating Tc,min,i and Tc,max,i values is very critical since the ranges decide
what kind of changes from the current fluidic structure end up being prescribed.
The rules above attempt to constrain the formulation of step 3 to prescribe changes
which are in sync with the current fluidic system in place. Also, as we re-iterate, we
would like to make fewer modifications in the micro-channel structure. This could
be achieved by reducing the range for Tc,i as iterations progress.
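Rules 1 to 4 can be illustrated with a small helper that returns the range for one grid. The exact formulas are not given above, so every threshold, constant and functional form in this sketch is a hypothetical placeholder chosen only to reproduce the qualitative behaviour of the rules. Applying Rule 1 last, as a narrowing of whatever range the other rules produce, is likewise an assumption.

```python
def tc_bounds(dist_to_channel, n_candidate_sites, near_tsv,
              very_close=1, close=3, tc_floor=0.8, tc_ceiling=1.2):
    """Return (Tc_min, Tc_max) for one grid. All numbers are placeholders.

    dist_to_channel   : grid distance to the nearest allocated channel,
                        or None if no channel is in sufficient vicinity
    n_candidate_sites : number of potential channel locations nearby
    near_tsv          : True if the grid is in the close vicinity of a TSV
    """
    if dist_to_channel is not None and dist_to_channel <= very_close:
        lo, hi = 1.0, tc_ceiling              # Rule 2: only removal is possible
    elif dist_to_channel is not None and dist_to_channel <= close:
        # Rule 3: more candidate sites -> smaller Tc_min;
        #         greater distance to the channel -> smaller Tc_max.
        lo = max(tc_floor, 1.0 - 0.05 * n_candidate_sites)
        hi = 1.0 + (tc_ceiling - 1.0) / dist_to_channel
    else:
        lo, hi = tc_floor, 1.0                # Rule 4: only addition is possible
    if near_tsv:
        # Rule 1: pull both bounds toward 1 so little change is prescribed.
        lo, hi = (1.0 + lo) / 2.0, (1.0 + hi) / 2.0
    return lo, hi
```

Rule 5 is not a bound but an equality across grids, so it would be imposed separately as constraint 5. Shrinking `tc_floor` and `tc_ceiling` toward 1 over the iterations would correspond to the range reduction mentioned above.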
Solving this formulation is more complex than the ideal case of step 1. Here
too, we transform the temperatures $T_{nc,i} = e^{y_{nc,i}}$, $T_{c,i} = e^{y_{c,i}}$, and the gate size $s_i = e^{x_i}$.
Hence the prescribed temperature is $T_i = T_{nc,i} \cdot T_{c,i} = e^{y_{nc,i} + y_{c,i}}$. With this transformation,
the gate delay, dynamic and leakage power become convex functions of the
gate size and temperature variables $x_i$, $y_{nc,i}$ and $y_{c,i}$. The objective and constraints
1, 2 in Equations 4.8 and 4.9 remain convex. Constraints 4 and 5 are also convex (since
ranges of the primary variables can be transformed to appropriate ranges of the
transformed variables). Constraint 3, however, is problematic. In this constraint,
$T_{nc,i}$ and the power dissipation values are convex functions of $x_i$ and $y_{nc,i}$. However, the
equality relationship in the constraint causes the convexity to break down. In order
to address this problem, we represent the power dissipation of gate $g_i$ (leakage
+ dynamic) as a piecewise linear function of the gate size parameter $x_i$ and the
uncontrollable temperature variable $y_{nc,i}$. Note that the right hand side of the constraint
is basically the power dissipation of all gates. We also represent $T_{nc,i} = e^{y_{nc,i}}$ (on
the left hand side) as a piecewise linear function of $y_{nc,i}$. The underlying model
parameters can be used to generate the coefficients for the piecewise linearization
(these are standard approaches and therefore omitted for brevity). Because both the
gate power dissipation and $T_{nc,i}$ are convex functions of $x_i$ and $y_{nc,i}$, the following
approach can be used to replace the variables $T_{nc,i}$, $P_{d,i}$, $P_{l,i}$ in constraint 3 by the
underlying piecewise linearization.
$$
\begin{aligned}
Power_i &\ge \phi_{m,1} \cdot x_i + \phi_{m,2} \cdot y_{nc,i} + \phi_{m,3}, \quad \forall m = 1 \ldots M \\
Temp_i &\ge \phi_{n,1} \cdot y_{nc,i} + \phi_{n,2}, \quad \forall n = 1 \ldots N
\end{aligned}
\tag{4.12}
$$
Here M and N are the numbers of linearizations imposed on the gate power dissipation and $T_{nc,i}$. $Power_i$ represents an upper bound on gate $g_i$'s total power. The
M-piecewise linearization is derived from the underlying model. Similarly, $Temp_i$ is
an upper bound on $T_{nc,i}$. Constraint 3 is now written as:
Constraint 3:
$$ G(B) \cdot \vec{Temp} = \vec{Power} \tag{4.13} $$
Here T⃗ emp and P⃗ ower are vectorized P oweri and T empi . This modification enables
128
us to linearize constraint 3, which could now be augment with the other constraints
and solved with standard convex optimization methods. The final solution of this
optimization would be xi , ync,i and yc,i values for all gates. These would now be used
to refine the micro-channel distribution.
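
The text omits how the coefficients in Equation 4.12 are generated. As a minimal sketch, assuming a tangent-line linearization (one common choice, not necessarily the one used in this work), the following builds $N$ tangent lines to $e^{y}$ and evaluates the tightest value of $Temp_i$ that satisfies all of them. The function names, the range $[0, 1]$ and $N = 5$ are illustrative assumptions.

```python
import math

def tangent_cuts(y_lo, y_hi, n):
    """Return n (slope, intercept) pairs of tangent lines to exp(y),
    taken at evenly spaced points in [y_lo, y_hi]. Requires n >= 2."""
    cuts = []
    for k in range(n):
        y0 = y_lo + (y_hi - y_lo) * k / (n - 1)
        # Tangent at y0: exp(y0) + exp(y0) * (y - y0)
        cuts.append((math.exp(y0), math.exp(y0) * (1.0 - y0)))
    return cuts

def pwl_lower_bound(cuts, y):
    """Smallest Temp value satisfying Temp >= a*y + b for every cut."""
    return max(a * y + b for a, b in cuts)

cuts = tangent_cuts(0.0, 1.0, 5)  # N = 5 pieces on an assumed range
samples = [i / 100 for i in range(101)]
worst_gap = max(math.exp(y) - pwl_lower_bound(cuts, y) for y in samples)
print(f"largest gap between exp(y) and its 5-piece bound: {worst_gap:.4f}")
```

Because $e^{y}$ is convex, every tangent lies on or below the curve, so the maximum over the cuts never exceeds $e^{y}$ and the gap shrinks as more pieces are added. The same construction extends to the two-variable power model by taking tangent planes in $(x_i, y_{nc,i})$.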
4.3.4.4 Step 4: Micro-channel Distribution Refinement
As in step 2, we would like to design the micro-channel distribution to remove
the heat dissipation determined by the gate sizes (and temperatures), while also accounting
for the change in the current configuration prescribed by $T_{c,i}$. This step is essentially
the same as step 2, with a few changes. The formulation
solved in step 3 uses the upper bounds $Power_i$ and $Temp_i$, as illustrated in Equations
4.12 and 4.13. Hence, for a given gate sizing and micro-fluidic configuration, we need
to recompute the actual uncontrollable thermal profile $\vec{T}_{nc}$, which is done by
solving Equation 4.10 for the assigned gate sizes. Note that this equation is complex
to solve because of the interdependence between leakage power and temperature. Solving it gives the
actual $\vec{T}_{nc}$ profile for the given gate sizing solution. We then combine the actual $T_{nc,i}$
with the prescribed $T_{c,i}$ values to obtain the target grid temperature $T_i = T_{nc,i} \cdot T_{c,i}$.
The resulting target thermal profile plays the role of $\vec{T}_{desire}$ in step 2. Since the target
thermal profile and gate sizes are known, the chip power profile can be computed
as well; this constitutes $\vec{P}_{desire}$. Using these values, a new channel distribution
is computed using the techniques described in step 2.
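
The text does not say how the leakage-temperature interdependence is resolved numerically. One simple option is a fixed-point iteration, sketched below under the assumption that the thermal equation has the conductance form $G \cdot \vec{T} = \vec{P}_{dyn} + \vec{P}_{leak}(\vec{T})$, analogous to Equation 4.13. The 3-cell conductance matrix, the power values and the exponential leakage model are hypothetical and are not taken from this work.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def thermal_fixed_point(G, p_dyn, leak, tol=1e-10, max_iter=100):
    """Iterate T <- G^{-1} (P_dyn + P_leak(T)) until T stops changing."""
    T = solve_linear(G, p_dyn)  # start from the leakage-free profile
    for _ in range(max_iter):
        p_total = [pd + leak(t) for pd, t in zip(p_dyn, T)]
        T_new = solve_linear(G, p_total)
        if max(abs(a - b) for a, b in zip(T_new, T)) < tol:
            return T_new
        T = T_new
    raise RuntimeError("leakage-temperature iteration did not converge")

# Hypothetical 3-cell example: conductances in W/K, powers in W,
# temperatures expressed as rise above ambient in K.
G = [[3.0, -1.0, 0.0], [-1.0, 3.0, -1.0], [0.0, -1.0, 3.0]]
p_dyn = [10.0, 20.0, 10.0]
leak = lambda t: 0.5 * math.exp(0.01 * t)  # assumed leakage model
T = thermal_fixed_point(G, p_dyn, leak)
print([round(t, 3) for t in T])
```

The iteration converges here because leakage depends only weakly on temperature in the toy model. With a strongly temperature-dependent leakage term the update may diverge, which corresponds physically to thermal runaway, and a damped or Newton-type solver would be a safer choice.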
4.3.4.5 Step 5: Re-iteration and Stopping Criteria
Steps 3 and 4 are iterated to keep improving the overall solution. First,
we would like to point out that the formulation in step 3 indirectly captures the pumping
power through the term $\lambda \sum_{\forall \text{grid}: i} \frac{1}{T_{c,i}}$.
Second, as we iterate, Equation 4.11 controls
the tolerable level of change from the current micro-channel allocation. By shrinking
the range of $T_{c,i}$ as we iterate, the amount of change in the cooling solution becomes
progressively smaller. Hence the procedure converges after a few iterations. This approach unifies
the design of the cooling structure with gate sizing. This is a significant improvement
over conventional approaches, which usually design the cooling infrastructure after
designing the electrical aspects. In the next section we illustrate how such co-design
can fundamentally improve the power-performance tradeoff in 3D-ICs.
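
The effect of shrinking the allowed range of $T_{c,i}$ can be illustrated with a toy loop. This is only a sketch: Equation 4.11 is not reproduced here, so the multiplicative bounds $[1/r, r]$, the geometric shrink schedule and the stand-in update rule are all assumptions, and the real steps 3 and 4 are replaced by a simple move toward a hypothetical target temperature.

```python
def shrinking_ranges(r0, decay, n_iter):
    """Yield (lower, upper) multiplicative bounds on T_c for each iteration.
    The half-width r - 1 shrinks geometrically, so the bounds approach (1, 1)."""
    r = r0
    for _ in range(n_iter):
        yield (1.0 / r, r)
        r = 1.0 + (r - 1.0) * decay

def iterate(t_start, t_target, r0=1.2, decay=0.5, n_iter=8):
    """Toy stand-in for steps 3-4: move t toward t_target, clamping each
    multiplicative update T_c = t_new / t to the currently allowed range."""
    t = t_start
    history = [t]
    for lo, hi in shrinking_ranges(r0, decay, n_iter):
        t_c = min(max(t_target / t, lo), hi)
        t = t * t_c
        history.append(t)
    return history

history = iterate(t_start=80.0, t_target=60.0)
steps = [abs(b - a) for a, b in zip(history, history[1:])]
print([round(t, 3) for t in history])
```

Each step is bounded by the current range, so the change per iteration cannot grow once the range is tighter than the requested update. The schedule only guarantees that the iterates settle; if the range shrinks too quickly, they may settle before reaching the best cooling configuration, so the shrink rate is a tuning choice.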
4.3.5 Performance of Gate Sizing and Micro-channel Placement Codesign
To verify the power and performance improvement achieved by our approach,
we compare our co-optimized design with two other approaches.
1. The thermal-aware gate sizing approach with pure air cooling (Air Cool approach). In this
approach, the overall thermal resistance of the heat sink for
air cooling is 0.5℃/W.
2. The postfix approach, which performs gate sizing first and then places micro-channels
using the approach in [76] (Postfix approach).
The experimental setup is similar to that of Section 3.7. In this experiment, we
place a total of 2000 TSVs in the whitespace. The parameters of the delay, thermal and
power models are obtained from [47][86][91] and SPICE simulation.
4.3.5.1 Comparison of Power Consumption
We compare the total power consumption resulting from the three approaches.
For the Air Cool approach, the power consumption consists of dynamic and leakage
power, while for the Postfix approach and our approach, the total power consumption also
includes the pumping power consumed by the micro-channels. Table 4.4 shows the power
consumption resulting from these approaches. For each benchmark, we tested the power
consumption under two timing constraints: one tight and one looser.
The tight timing constraint is the best achievable timing constraint for the Air
Cool approach (that is, the tightest timing constraint at which all approaches can be compared).
Table 4.4 shows that, under the same performance constraint, our approach achieves
13.33% total power savings on average compared with the Air Cool approach, indicating that
the use of micro-channels not only does not increase the total system power consumption,
but actually helps save power. Compared with the Postfix approach,
which performs gate sizing and micro-channel placement separately, our co-design
approach achieves 12.05% power savings on average. This is because a) the micro-channel structure
is optimized, and b) micro-channels, which reduce chip temperature, also help reduce
the leakage power and circuit delay, causing a favorable positive feedback.
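Table 4.4: Comparison of total power consumption (power: W, $t_{con}$: ns)

| Benchmark | #Gates | $t_{con}$ (tight/loose) | Total power: Air Cool | Total power: Postfix | Total power: Our | Power saving w.r.t. Air Cool | Power saving w.r.t. Postfix |
|---|---|---|---|---|---|---|---|
| 1 | 343380 | 48 (tight) | 294 | 289 | 254 | 13.61% | 12.11% |
| 1 | 343380 | 70 (loose) | 226 | 223 | 197 | 12.83% | 11.66% |
| 2 | 394152 | 74 (tight) | 256 | 251 | 219 | 14.45% | 12.75% |
| 2 | 394152 | 95 (loose) | 233 | 219 | 189 | 18.88% | 13.70% |
| 3 | 342267 | 70 (tight) | 221 | 218 | 191 | 13.57% | 12.39% |
| 3 | 342267 | 90 (loose) | 182 | 189 | 164 | 9.89% | 13.23% |
| 4 | 295632 | 39 (tight) | 293 | 287 | 258 | 11.95% | 10.10% |
| 4 | 295632 | 60 (loose) | 214 | 210 | 189 | 11.68% | 10.00% |
| 5 | 208575 | 51 (tight) | 284 | 291 | 248 | 12.67% | 14.78% |
| 5 | 208575 | 61 (loose) | 251 | 245 | 219 | 12.75% | 10.61% |
| 6 | 181722 | 55 (tight) | 232 | 232 | 206 | 11.21% | 11.21% |
| 6 | 181722 | 75 (loose) | 190 | 188 | 167 | 12.11% | 11.17% |
| Average | | | 240 | 237 | 208 | 13.33% | 12.05% |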
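
The saving columns of Table 4.4 follow from the power columns as (baseline − ours) / baseline. The short script below is a sketch that recomputes them from the tabulated total power values; the dictionary layout and the function name are illustrative, and only the numbers are taken from the table.

```python
# (Air Cool, Postfix, Our) total power in W, copied from Table 4.4.
TABLE_4_4 = {
    (1, "tight"): (294, 289, 254), (1, "loose"): (226, 223, 197),
    (2, "tight"): (256, 251, 219), (2, "loose"): (233, 219, 189),
    (3, "tight"): (221, 218, 191), (3, "loose"): (182, 189, 164),
    (4, "tight"): (293, 287, 258), (4, "loose"): (214, 210, 189),
    (5, "tight"): (284, 291, 248), (5, "loose"): (251, 245, 219),
    (6, "tight"): (232, 232, 206), (6, "loose"): (190, 188, 167),
}

def saving_pct(baseline, ours):
    """Relative power saving of `ours` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline

for (bench, kind), (air, postfix, ours) in sorted(TABLE_4_4.items()):
    print(f"benchmark {bench} ({kind}): "
          f"{saving_pct(air, ours):.2f}% vs Air Cool, "
          f"{saving_pct(postfix, ours):.2f}% vs Postfix")

# Column means over all twelve rows, for comparison with the Average row.
n = len(TABLE_4_4)
mean_air, mean_postfix, mean_ours = (
    sum(row[k] for row in TABLE_4_4.values()) / n for k in range(3)
)
print(f"mean power: {mean_air:.1f} / {mean_postfix:.1f} / {mean_ours:.1f} W")
```

The recomputed per-row savings agree with the table up to rounding in the last digit, and the column means round to the 240, 237 and 208 W reported in the Average row.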
4.3.5.2 Comparison of Circuit Delay
We also compare the best achievable circuit delay under the same power envelope. This was
obtained by performing a binary search on the timing constraint $t_{con}$.
Table 4.5 shows that our co-optimized design achieves a 15.88% circuit speedup on average over
the Air Cool and Postfix approaches, while still consuming the same (or even less)
power.

Table 4.5: Comparison of circuit performance (power: W, $t_{con}$: ns)

| Benchmark | Air Cool: best $t_{con}$ | Air Cool: power | Postfix: best $t_{con}$ | Postfix: power | Our: best $t_{con}$ | Our: power | Circuit speedup |
|---|---|---|---|---|---|---|---|
| 1 | 48 | 294 | 48 | 289 | 40 | 289 | 16.67% |
| 2 | 74 | 256 | 74 | 251 | 60 | 251 | 18.92% |
| 3 | 70 | 221 | 70 | 218 | 57 | 218 | 18.57% |
| 4 | 39 | 293 | 39 | 287 | 34 | 287 | 12.82% |
| 5 | 51 | 284 | 51 | 291 | 44 | 277 | 13.73% |
| 6 | 55 | 232 | 55 | 232 | 47 | 231 | 14.55% |
| Average | 56 | 263 | 56 | 261 | 47 | 259 | 15.88% |

4.3.6 Power-Performance Tradeoff
To characterize the tradeoff between system performance and power consumption, we plot
the circuit delay versus power consumption for benchmark 1 in
Figure 4.10. For all three approaches, the power consumption increases as the
timing constraint becomes tighter.

Figure 4.10: Delay versus power tradeoff for benchmark 1 (x-axis: Max delay (ns); y-axis: Total power (W); curves: Air Cool, Postfix, Co-design)

In the figure, the solid line is the power consumption of the conventional gate sizing
approach using pure air cooling. This line is
the best power-delay tradeoff that the conventional gate sizing approach can achieve.
The tradeoff achieved by the Postfix approach shows a slight (but not significant) improvement
over the conventional gate sizing approach. Using co-design, however, results
in a significant performance-power improvement. The figure shows that for all timing
constraints we tested, our design always dissipates less power than
the other two approaches. Similarly, when the available power budget is fixed, our
design achieves better circuit speed, indicating a fundamental power-performance
improvement achieved by 3D-IC electrical and cooling system co-design.
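
For a fixed power budget, the best achievable delay can be located with the binary search on $t_{con}$ mentioned in Section 4.3.5.2. The following is a minimal sketch, assuming that the optimized power is non-increasing in $t_{con}$ (consistent with the trend in Figure 4.10). The function `toy_power` is a hypothetical stand-in for running the optimizer at a given $t_{con}$; its coefficients are not benchmark data.

```python
def best_tcon(power_at, budget, lo, hi, tol=1e-3):
    """Return (approximately) the smallest t_con in [lo, hi] whose optimized
    power does not exceed `budget`, or None if even `hi` is infeasible.
    Assumes power_at(t_con) is non-increasing in t_con."""
    if power_at(hi) > budget:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if power_at(mid) <= budget:
            hi = mid  # feasible: try a tighter timing constraint
        else:
            lo = mid  # infeasible: relax the timing constraint
    return hi

# Hypothetical stand-in for "optimize at this t_con and return power in W".
toy_power = lambda t_con: 100.0 + 9000.0 / t_con

t_best = best_tcon(toy_power, budget=294.0, lo=30.0, hi=80.0)
print(f"best t_con under a 294 W budget: {t_best:.2f} ns")
```

The search keeps `hi` feasible at every step, so the returned constraint always respects the budget. Each probe corresponds to one full optimization run, which makes the logarithmic number of probes the main practical advantage over sweeping $t_{con}$.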
4.4 Summary
In this chapter, we investigated two electrical and cooling system co-design problems:
a) TSV assignment and micro-fluidic cooling co-optimization, and b) gate
sizing and micro-fluidic cooling co-optimization.
We first investigated the co-optimization of TSV assignment to interlayer nets
and micro-channel allocation, such that both wirelength and micro-channel cooling
energy are optimized. We proposed a multi-commodity min-cost flow based formulation,
followed by simplifying transformations that enable the use of effective polynomial-time
heuristics. The experimental results show that our co-optimization approach
achieves 46% cooling power savings or 7.6% wirelength reduction compared with
approaches that assign TSVs and allocate micro-channels separately.
We then investigated a co-optimization approach for 3D-IC gate sizing and
micro-fluidic cooling design that fully exploits the interdependency between power,
temperature and circuit delay to push the power-performance tradeoff beyond conventional
limits. We proposed a unified formulation to model this co-optimization
problem and used an iterative optimization approach to solve it. The experimental
results show a fundamental power-performance improvement, with 12%
power savings and 16% circuit speedup.
Compared with the conventional design flow, which separates the electrical and
cooling system design, the co-design methodology can fundamentally improve the
system power and performance. Furthermore, it also allows a more flexible tradeoff
between system performance (such as wirelength and circuit delay) and power
consumption.
Chapter 5
Conclusion and Discussion
5.1 Conclusion
In this work, we investigated several aspects of micro-fluidic cooling for 3D-ICs.
Micro-fluidic cooling is capable of removing very high density heat. However,
there are also overheads and constraints associated with micro-fluidic cooling, such as
significant extra cooling power consumption and resource conflicts with TSVs.
To overcome these overheads and account for the design constraints,
we proposed three micro-fluidic cooling configurations that result in significant
cooling power savings while avoiding the TSVs. In these designs, the micro-channel
structures are designed after the electrical part of the chip, hence they are
compatible with the standard IC design flow. Besides optimized cooling configurations,
we also proposed a micro-channel based dynamic thermal management method
that controls the fluid velocity at runtime to allow real-time thermal control.
The electrical, thermal, reliability and cooling aspects are all interdependent.
Therefore, although these cooling system designs are compatible with the standard
IC design flow, separating the design of the electrical and cooling systems actually leads
to sub-optimal designs. Hence, we then investigated electrical and cooling system
co-design to achieve further power-performance improvement.
We first investigated the co-optimization of TSV assignment to interlayer nets
and micro-channel allocation, such that both wirelength and micro-channel cooling
energy are optimized. We proposed a multi-commodity flow based formulation,
followed by simplifying transformations that enable the use of effective polynomial-time
heuristics. The experimental results show that our co-optimization approach
achieves 46% cooling power savings or 7.6% wirelength reduction compared with
approaches that assign TSVs and allocate micro-channels separately.
We then investigated a co-optimization approach for 3D-IC gate sizing and
micro-fluidic cooling design that fully exploits the interdependency between power,
temperature and circuit delay to push the power-performance tradeoff beyond conventional
limits. We proposed a unified formulation to model this co-optimization
problem and used an iterative optimization approach to solve it. The experimental
results show a fundamental power-performance improvement, with 12%
power savings and 16% circuit speedup.
With micro-fluidic cooling available, designers can now perform
more aggressive performance optimization, since the resulting heat can be removed
by the liquid flow in the micro-channels. Furthermore, co-optimization helps
fully exploit the advantages of micro-fluidic cooling and results in a fundamental
improvement in the system power-performance tradeoff.
5.2 Future Work
The use of micro-fluidic cooling in 3D-ICs is still a new technology, and several
problems need to be addressed.
The first direction is a more extensive investigation of electro-thermo-mechanical
co-design. The existence of micro-channels not only influences the gate sizing and
TSV assignment, as explored in Chapter 4, but also changes the whole physical design
process, including 3D-IC partitioning and floorplanning. For example, 3D-IC
partitioning can be optimized more aggressively to achieve better bandwidth, and
floorplanning can be optimized to save chip area. Besides physical design,
micro-fluidic cooling also enables more aggressive architectural-level design
with less concern about temperature, since the resulting heat can be removed
by the micro-channels.
The second direction is the reliability associated with micro-channels. As mentioned
earlier, TSVs are incorporated in 3D-ICs to enable interlayer communication
and the delivery of power/ground. Copper, due to its low resistivity, is a commonly
used material for TSV fill. Since chips are usually annealed at a temperature
much higher than their operating temperature, thermal stress arises when they cool down from the
annealing temperature, due to the coefficient of thermal expansion (CTE) mismatch
between the TSV fill material (e.g. copper) and silicon.
This thermal stress might cause reliability problems such as cracking. The existence
of micro-channels changes the 3D-IC thermal profile and hence also influences the
thermal stress field inside 3D-ICs. The impact of micro-fluidic cooling on chip
reliability (through thermal stress) needs to be analyzed. Besides thermal stress, the
coolant fluid inside the micro-channels also causes mechanical stress on the micro-channel
sidewalls. The intensity of this stress depends on the distribution and dimensions of the
micro-channels and the fluid flow rate (velocity) through them (along with the
choice of material). Such mechanical stress also needs to be investigated.
Furthermore, the thermal stress inside a 3D-IC also influences carrier mobility,
and hence gate delays. The impact of stress on gate/circuit delay is
complex, depending on the intensity of the stress, the location of gates and TSVs, and
the type of transistor (NMOS or PMOS). Since micro-fluidic cooling influences
the thermal stress, it also influences the circuit delay through thermal stress. As
a result, it fundamentally changes timing analysis in 3D-ICs. When performing
statistical timing analysis in 3D-ICs, this fact should be taken into consideration
[74]. Moreover, this thermal stress effect should also be considered when designing the
micro-fluidic cooling configuration, which again requires electrical and cooling
system co-design.
Bibliography
[1] Capo: a large-scale fixed-die floorplacer.
http://vlsicad.eecs.umich.edu/BK/PDtools/Capo/.
[2] Ibm-place 2.0 benchmark. In http://er.cs.ucla.edu/benchmarks/ibm-place2/.
[3] Labyrinth global router. In http://cseweb.ucsd.edu/~kastner/research/labyrinth/.
[4] ITC’99 benchmarks. http://www.cad.polito.it/downloads/tools/itc99.html.
[5] T. M. Adams, S. I. Abdel-Khalik, S. M. Jeter, and Z. H. Qureshi. An experimental investigation of single-phase forced convection in microchannels.
International Journal of Heat and Mass Transfer, pages 851–857, 1998.
[6] Bruno Agostini, John Richard Thome, Matteo Fabbri, and Bruno Michel. High
heat flux two-phase cooling in silicon multimicrochannels. IEEE Transactions
on Components and Packaging Technologies, Vol.31, 2008.
[7] K. Athikulwongse, A. Chakraborty, Jae-Seok Yang, D.Z. Pan, and Sung Kyu
Lim. Stress-driven 3d-ic placement with tsv keep-out zone and regularity study.
In IEEE/ACM Intl. Conf. on Computer Aided Design (ICCAD’10), 2010.
[8] Muhannad S. Bakir, Calvin King, and et al. 3D heterogeneous integrated
systems: Liquid cooling, power delivery, and implementation. In IEEE Custom
Integrated Circuits Conference, pages 663–670, 2008.
[9] Avram Bar-Cohen. Thermal management of on-chip hot spots and 3d chip
stacks. In IEEE International Conference on Microwaves, Communications,
Antennas and Electronics Systems, pages 1–8, 2009.
[10] James R Black. Electromigration - a brief survey and some recent results. IEEE
Transactions on Electron Devices, 16:338–347, 1969.
[11] David Brooks and Margaret Martonosi. Dynamic thermal management for
high-performance microprocessors. In Proc. of the 7th Intl. Symp. on High-Performance Computer Architecture (HPCA’01).
[12] Thomas Brunschwiler, Bruno Michel, Hugo Rothuizen, Urs Kloter, Bernhard
Wunderle, and Herbert Reichl. Hotspot-optimized interlayer cooling in vertically integrated packages. Proc. Materials Research Society (MRS) Fall Meeting, 2008.
[13] Thomas D Burd, Trevor A Pering, Anthony J Stratakos, and Robert W Brodersen. A dynamic voltage scaled microprocessor system. Solid-State Circuits,
IEEE Journal of, 35:1571–1580, 2000.
[14] Ting-Yen Chiang, K. Banerjee, and K.C. Saraswat. Effect of via separation and
low-k dielectric materials on the thermal characteristics of Cu interconnects. In
IEEE Intl. Electron Devices Meeting, IEDM Technical Digest, pages 261–264,
2000.
[15] S.B. Choi, R.F. Barron, and R.O. Warrington. Fluid flow and heat transfer
in micro tubes. Micromechanical sensors, actuators and systems, ASME DSC,
pages 123–128, 1991.
[16] Aviad Cohen, Lev Finkelstein, Avi Mendelson, Ronny Ronen, and Dmitry
Rudoy. On estimating optimal performance of cpu dynamic thermal management. IEEE Computer Architecture Letters, 2:6, 2003.
[17] Jason Cong and Yan Zhang. Thermal via planning for 3-D ICs. In IEEE/ACM
Intl. Conf. on Computer Aided Design (ICCAD’05), pages 744–751, 2005.
[18] Ayse K. Coskun, David Atienza, Tajana Simunic Rosing, and et al. Energy-efficient variable-flow liquid cooling in 3D stacked architectures. In Conference
on Design, Automation and Test in Europe (DATE’10), pages 111–116, 2010.
[19] Ayse K. Coskun, Jose L. Ayala, David Atienza, and Tajana Simunic Rosing.
Modeling and dynamic management of 3D multicore systems with liquid cooling. In 17th Annual IFIP/IEEE International Conference on Very Large Scale
Integration, pages 60–65, 2009.
[20] Ayse Kivilcim Coskun, Tajana Simunic Rosing, and Kenny C. Gross. Temperature management in microprocessor socs using online learning. In Design
Automation Conference (DAC’08).
[21] Ayse Kivilcim Coskun, Tajana Simunic Rosing, and Kenny C. Gross. Proactive
temperature management in MPSoCs. In Proceedings of the 2008 International
Symposium on Low Power Electronics and Design, pages 165–170, 2008.
[22] William J Dally. Future directions for on-chip interconnection networks. In
OCIN Workshop, 2006.
[23] Lotfollah Ghodoossi. Thermal and hydrodynamic analysis of a fractal microchannel network. Energy Conversion and Management, Elsevier, pages 771–788, 2005.
[24] Brent Goplen and Sachin Sapatnekar. Thermal via placement in 3D ICs. In
International Symposium on Physical Design (ISPD’05), pages 167–174, 2005.
[25] Vinay Hanumaiah, Sarma Vrudhula, and Karam S Chatha. Performance optimal online dvfs and task migration techniques for thermally constrained multicore processors. Computer-Aided Design of Integrated Circuits and Systems,
IEEE Transactions on, 30:1677–1690, 2011.
[26] Michael B. Healy and Sung Kyu Lim. Power delivery system architecture for
many-tier 3d systems. In Electronic Components and Technology Conference,
pages 1682–1688, 2010.
[27] Huang Huang, Gang Quan, and Jeffrey Fan. Leakage temperature dependency
modeling in system level analysis. In 11th International Symposium on Quality
Electronic Design (ISQED), pages 447–452, 2010.
[28] H. Irie, K. Kita, K. Kyuno, and A. Toriumi. In-plane mobility anisotropy and
universality under uni-axial strains in n- and p-mos inversion layers on (100),
(110), and (111) si. In IEEE International Electron Devices Meeting, pages
225–228, 2004.
[29] Muhamad Amri Ismail, Iskhandar Md Nasir, and Razali Ismail. Modeling of
temperature variations in mosfet mismatch for circuit simulations. In Quality
Electronic Design, 2009. ASQED 2009. 1st Asia Symposium on, pages 357–362,
2009.
[30] Philip Jacob, Okan Erdogan, Aamir Zia, Paul M Belemjian, Russell P Kraft,
and John F McDonald. Predicting the performance of a 3d processor-memory
chip stack. IEEE Design & Test of Computers, 22:540–547, 2005.
[31] Arun Jagota and Laura A. Sanchis. Adaptive, restart, randomized greedy
heuristics for maximum clique. Journal of Heuristics, 7:565 – 584, 2001.
[32] Linan Jiang, Jae-Mo Koo, and et al. Cross-linked microchannels for vlsi hotspot
cooling. In ASME 2002 International Mechanical Engineering Congress and
Exposition, 2002.
[33] Satish Kandlikar, Srinivas Garimella, and et al. Heat transfer and fluid flow in
minichannels and microchannels. Elsevier, 2005.
[34] J. Kestin. Viscosity of liquid water in the range -8 °C to 150 °C. J. Phys. Chem.
Ref. Data, 7, 1978.
[35] Mahesh Ketkar, Kishore Kasamsetty, and Sachin S. Sapatnekar. Convex delay
models for transistor sizing. In Design Automation Conference (DAC’00), pages
655–660, 2000.
[36] Dae Hyun Kim, Krit Athikulwongse, and Sung Kyu Lim. A study of through-silicon-via impact on the 3D stacked IC layout. In IEEE/ACM Intl. Conf. on
Computer Aided Design (ICCAD’09), pages 674–680, 2009.
[37] Duckjong Kim, Sung Jin Kim, and Alfonso Ortega. Compact modeling of fluid
flow and heat transfer in pin fin heat sinks. Journal of Electronic Packaging,
2004.
[38] N.S. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J.S. Hu, M.J. Irwin,
M. Kandemir, and V. Narayanan. Leakage current: Moore's law meets static
power. IEEE Computer Society, 36(12):68–75.
[39] Yoon Jo Kim, Yogendra K. Joshi, and et al. Thermal characterization of interlayer microfluidic cooling of three dimensional integrated circuits with nonuniform heat flux. ASME Trans. Journal of Heat Transfer, 2010.
[40] CR King, D. Sekar, M.S. Bakir, B. Dang, J. Pikarsky, and J.D. Meindl. 3d
stacking of chips with electrical and microfluidic i/o interconnects. In Electronic
Components and Technology Conference, pages 1–7, 2008.
[41] Alexander Klaiber et al. The technology behind crusoe processors. Transmeta
Technical Brief, 2000.
[42] Roy W. Knight, Donald J. Hall, and et al. Heat sink optimization with application to microchannels. IEEE Trans. on Components, Hybrids, and Manufacturing Technology, pages 832–842, 1992.
[43] Jae-Mo Koo, Sungjun Im, Linan Jiang, and Kenneth E. Goodson. Integrated microchannel cooling for three-dimensional electronic circuit architectures. ASME Trans. Journal of Heat Transfer, pages 49–58, 2005.
[44] K. Laker and W. Sansen. Design of analog integrated circuits and systems.
New York: McGraw-Hill, 1994.
[45] Young-Joon Lee and Sung Kyu Lim. Co-optimization of signal, power, and
thermal distribution networks for 3D ICs. In Electrical Design of Advanced
Packaging and Systems Symposium, pages 163–155, 2008.
[46] Jing Li and H. Miyashita. Efficient thermal via planning for placement of 3d
integrated circuits. In IEEE International Symposium on Circuits and Systems
(ISCAS’07), pages 145–148, 2007.
[47] W. Liao, L. He, and K.M. Lepak. Temperature and supply voltage aware
performance and power modeling at microarchitecture level. IEEE Trans. on
Computer-Aided Design of Integrated Circuits and Syst., 24:1042–1053, 2005.
[48] Xiaodong Liu, Yufan Zhang, Gary Yeap, and Xuan Zeng. An integrated algorithm for 3D-IC TSV assignment. In DAC, pages 652–657, 2011.
[49] James J-Q Lu, Ken Rose, and Susan Vitkavage. 3d integration: Why, what,
who, when? Future Fab Intl, 23, 2007.
[50] Zhijian Lu, John Lach, Mircea Stan, and Kevin Skadron. Banking chip lifetime:
Opportunities and implementation. In Proceedings of the 1st Workshop on High
Performance Computing Reliability Issues (HPCRI05), 2005.
[51] M. Dang, I. Hassan, and R. Muwanga. Adiabatic two phase flow distribution and
visualization in scaled microchannel heat sinks. Experiments in Fluids, 2007.
[52] Christophe Marques and Kevin W. Kelly. Fabrication and performance of a
pin fin micro heat exchanger. Journal of Heat Transfer, pages 434–444, 2004.
[53] Steven M. Martin, Krisztian Flautner, Trevor Mudge, and David Blaauw. Combined dynamic voltage scaling and adaptive body biasing for lower power microprocessors under dynamic workloads. In IEEE/ACM Intl. Conf. on Computer
Aided Design (ICCAD’02).
[54] H. Mizunuma, C. L. Yang, and Y. C. Lu. Thermal modeling for 3D-ICs with
integrated microchannel cooling. In IEEE/ACM Intl. Conf. on Computer Aided
Design, pages 256–263, 2009.
[55] Bruce Roy Munson, Donald F. Young, Theodore H. Okiishi, and Wade W.
Huebsch. Fundamentals of fluid mechanics. Wiley, 2008.
[56] Y. S. Muzychka and M. M. Yovanovich. Modelling friction factors in noncircular ducts for developing laminar flow. In 2nd AIAA Theoretical Fluid
Mechanics Meeting, 1998.
[57] Mohit Pathak, Young-Joon Lee, Thomas Moon, and Sung Kyu Lim. Through-silicon-via management during 3d physical design: When to add and how
many? In IEEE/ACM International Conference on Computer-Aided Design
(ICCAD’10), pages 387–394, 2010.
[58] Massoud Pedram and Shahin Nazarian. Thermal modeling, analysis and management in VLSI circuits: Principles and methods. Proceedings of the IEEE,
94:1487–1501, 2006.
[59] Yoav Peles, Ali Kosar, Chandan Mishra, Chih-Jung Kuo, and Brandon Schneider. Forced convective heat transfer across a pin fin micro heat sink. International Journal of Heat and Mass Transfer, pages 3615–3627, 2005.
[60] Kiran Puttaswamy and Gabriel H. Loh. Thermal analysis of a 3D die-stacked
high-performance microprocessor. In Proceedings of the 16th ACM Great Lakes
symposium on VLSI (GLSVLSI’06 ), 2006.
[61] Hanhua Qian, Xiwei Huang, Hao Yu, and Chip Hong Chang. Cyber-physical
thermal management of 3d multi-core cache-processor system with microfluidic
cooling. Journal of Low Power Electronics, 2011.
[62] Weilin Qu, Issam Mudawar, Sang-Youp Lee, and Steven T. Wereley. Experimental and computational investigation of flow development and pressure drop
in a rectangular micro-channel. Journal of Electronic Packaging, 2006.
[63] Ravishankar Rao and Sarma Vrudhula. Performance optimal processor throttling under thermal constraints. In Proc. of Intl. Conf. on Compilers Architectures and Synthesis for Embedded Systems (CASES’07), pages 257–266, 2007.
[64] Ravishankar Rao, Sarma Vrudhula, and Naehyuck Chang. An optimal analytical solution for processor speed control with thermal constraints. In Proceedings
of the 2006 International Symposium on Low Power Electronics and Design,
pages 292–297, 2006.
[65] Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Network flows:
Theory, algorithms and applications. Prentice Hall, 1993.
[66] Takashi Sato, Junji Ichimiya, Nobuto Ono, and Masanori Hashimoto. On-chip
thermal gradient analysis considering interdependence between leakage power
and temperature. In IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, pages 3491–3499, 2006.
[67] TH Schubert, L Ciupiński, J Morgiel, H Weidmüller, T Weissgärber, and
B Kieback. Advanced composite materials for heat sink applications. Euro
PM, 2007.
[68] S.M. Senn and D. Poulikakos. Laminar mixing, heat transfer and pressure drop
in tree-like microchannel nets and their application for thermal management
in polymer electrolyte fuel cells. Journal of Power Sources, Vol. 130, pages
178–191, 2004.
[69] R. K. Shah and A. L. London. Laminar flow forced convection in ducts: A
source book for compact heat exchanger analytical data. Academic, 1978.
[70] Bing Shi, Caleb Serafy, and Ankur Srivastava. Co-optimization of tsv assignment and micro-channel placement for 3d-ics. In Proceedings of the 23rd ACM
international conference on Great lakes symposium on VLSI, pages 337–338,
2013.
[71] Bing Shi and Ankur Srivastava. Cooling of 3d-ic using non-uniform microchannels and sensor based dynamic thermal management. In Communication,
Control, and Computing (Allerton), 2011 49th Annual Allerton Conference on,
pages 1400–1407, 2011.
[72] Bing Shi and Ankur Srivastava. Liquid cooling for 3D-ICs. In invited paper,
First International IEEE Workshop on Thermal Modeling and Management:
Chips to Data Centers, 2011.
[73] Bing Shi and Ankur Srivastava. Tsv-constrained micro-channel infrastructure
design for cooling stacked 3d-ics. In Proceedings of the 2012 ACM international
symposium on International Symposium on Physical Design, pages 113–118,
2012.
[74] Bing Shi and Ankur Srivastava. Thermal stress aware 3d-ic statistical static
timing analysis. In Proceedings of the 23rd ACM international conference on
Great lakes symposium on VLSI, pages 281–286, 2013.
[75] Bing Shi, Ankur Srivastava, and Avram Bar-Cohen. Hybrid 3d-ic cooling system using micro-fluidic cooling and thermal tsvs. In VLSI (ISVLSI), 2012
IEEE Computer Society Annual Symposium on, pages 33–38, 2012.
[76] Bing Shi, Ankur Srivastava, and Peng Wang. Non-uniform micro-channel design
for stacked 3D-ICs. In Design Automation Conference (DAC’11), 2011.
[77] Kevin Skadron, Mircea R. Stan, Karthik Sankaranarayanan, Wei Huang,
Sivakumar Velusamy, and David Tarjan. Temperature-aware microarchitecture: Modeling and implementation. ACM Trans. on Architecture and Code
Optimization, 1(1):94–125, 3.
[78] Arvind Sridhar, Alessandro Vincenzi, Martino Ruggiero, Thomas Brunschwiler,
and David Atienza. 3D-ICE: Fast compact transient thermal modeling for 3D
ICs with inter-tier liquid cooling. In IEEE/ACM Intl. Conf. on Computer
Aided Design (ICCAD’10), 2010.
[79] Linda Stappers, Yanli Yuan, and Jan Fransaer. Novel composite coatings for
heat sink applications. Journal of The Electrochemical Society, 152:C457–C461,
2005.
[80] Haihua Su, Frank Liu, Anirudh Devgan, Emrah Acar, and Sani Nassif. Full
chip leakage estimation considering power supply and temperature variations.
In Proceedings of the 2003 international symposium on Low power electronics
and design, pages 78–83, 2003.
[81] Haihua Su, Frank Liu, Anirudh Devgan, Emrah Acar, and Sani Nassif. Full
chip leakage estimation considering power supply and temperature variations.
In Proceedings of the 2003 International Symposium on Low Power Electronics
and Design (ISLPED’03), pages 78 – 83, 2003.
[82] John R. Thome. Engineering data book iii. Wolverine Tube, 2004.
[83] Y. Tsividis. Operation and Modeling of the Mos Transistor. Oxford University
Press, 2004.
[84] D. B. Tuckerman and R. F. W. Pease. High-performance heat sinking for VLSI.
IEEE Electron Device Letters, pages 126–129, 1981.
[85] R. Walchli, T. Brunschwiler, B. Michel, and D. Poulikakos. Combined local
microchannel-scale cfd modeling and global chip scale network modeling for
electronics cooling design. International Journal of Heat and Mass Transfer,
2010.
[86] Neil Weste and David Harris. Cmos vlsi design: A circuits and systems perspective. Addison Wesley, 2010.
[87] Frank M. White. Fluid mechanics. McGraw-Hill Book Company, 1986.
[88] Dong Hyuk Woo, Nak Hee Seong, Dean L Lewis, and H-HS Lee. An optimized
3d-stacked memory architecture by exploiting excessive, high-density tsv bandwidth. In IEEE 16th International Symposium on High Performance Computer
Architecture (HPCA), pages 1–12, 2010.
[89] P. Y. Wu and W.A. Little. Measuring of the heat transfer characteristics of gas
flow in fine channel heat exchangers for micro miniature refrigerators. Cryogenics, 1994.
[90] Jin-Tai Yan, Yu-Cheng Chang, and Zhi-Wei Chen. Thermal via planning for
temperature reduction in 3D ICs. In IEEE International SOC Conference
(SOCC’10), pages 392–395, 2010.
[91] C.Y. Yang, J.J. Chen, L. Thiele, and T.W. Kuo. Energy-efficient real-time
task scheduling with temperature-dependent leakage. In Conference on Design,
Automation and Test in Europe, pages 9–14, 2010.
[92] Jae-Seok Yang, Krit Athikulwongse, Young-Joon Lee, Sung Kyu Lim, and
David Z. Pan. Tsv stress aware timing analysis with applications to 3d-ic
layout optimization. In Proceedings of the 47th Design Automation Conference
(DAC’10), 2010.
[93] Jun Yang, Xiuyi Zhou, Marek Chrobak, Youtao Zhang, and Lingling Jin. Dynamic thermal management through task scheduling. In Performance Analysis
of Systems and software, 2008. ISPASS 2008. IEEE International Symposium
on, pages 191–201, 2008.
[94] D. Yu, R. Warrington, R. Barron, and T. Ameen. An experimental and theoretical investigation of fluid flow and heat transfer in microtubes. Proceedings
of the ASME/JSME Thermal Engineering Conference, pages 523–530, 1995.
[95] Tianpei Zhang, Yong Zhan, and Sachin S. Sapatnekar. Temperature-aware
routing in 3D ICs. In ASP-DAC, pages 309–314, 2006.
[96] Yufu Zhang and Ankur Srivastava. Adaptive and autonomous thermal tracking
for high performance computing systems. In Design Automation Conference
(DAC’10), 2010.