Carbon Nanotube MOSFETs (CNTs)

Task C: 3D Integration
Neil Goldsman, Bruce Jacob, Martin Peckerar
The component of Task C led by Neil Goldsman covers three subtasks:
• Subtask 1: Modeling and Prototyping Device and Chip Heating for 2D & 3D
Integration
• Subtask 2: Modeling and Prototyping for on Chip Electromagnetic Effects
• Subtask 3: Prototyping, Modeling and Processing 3D Structures
1. Subtask 1: Modeling and Prototyping Device and Chip Heating for 2D & 3D
Integration
1.1 Introduction:
We report on a novel method for predicting the temperature profile of an integrated
circuit at the resolution of a single device. This work has been performed by Akin Akturk
and Neil Goldsman. Recently, Latise Parker, a new MS candidate, has begun to
participate as well.
As chips are packed with more transistors per unit area, chip manufacturers must cope
with several problems to guarantee good chip performance. One of the most important of
these is full-chip heating. Investigators have pointed out that towards the end of the
semiconductor roadmap, there will be more devices per unit area due to scaling down of
physical device dimensions. The resulting real estate crowding will induce high
temperatures and temperature gradients on the chip. The increase in device density per
chip, as well as higher device capacitance, higher clock speeds and larger on-state
leakage currents, will give rise to more power dissipation, which will translate into
higher on-chip temperatures. This problem can be alleviated somewhat by reducing
supply voltages. However, noise margins restrict the level to which supply voltages can
be reduced, so increased power dissipation is inevitable.
The problems that we describe above for 2D chips are highly exacerbated for 3D
integration since the circuit surface area is greatly reduced, and the generated heat cannot
easily escape through surface cooling. Preliminary research has been done to estimate
the temperature profile for given chips. Here we address the need for a tool that
calculates chip temperatures and establishes the necessary link between single device
operation and full-chip heating for both planar and 3D ICs. We have developed a
new methodology for predicting full-chip heating at the resolution of a single device. On
the device level, we first obtain electrical characteristics of a MOSFET for the given
voltage and temperature by self-consistently solving coupled quantum and semiconductor
equations. We then solve the system on the chip level, where the thermal coupling
between devices is modeled by a lumped circuit type thermal network. We next obtain
the model for the thermal network comprised of passive thermal elements like thermal
resistances and capacitances, and heating sources. From the layout design and spatial
considerations, we obtain values for the thermal resistances and capacitances between
individual devices, and between a single device and ground. To determine the strength of each
heating source (driving force in the thermal network corresponding to a single device),
we extend the results of the individual MOSFET operation to the entire chip by a
statistical Monte Carlo-type algorithm. Thus we account for application and location
specific effects on the full-chip heating while achieving the coupling between individual
devices and their collective operation. Using our modeling technique, we obtain the
effects of power density on the full-chip heating and on the single device performance.
To achieve efficient chip designs, we also offer solutions for removing heat away from
the hottest regions of the chip surface. In Fig. 1.1, we show our device and chip levels
and their interaction.
Figure 1.1: (Top) Each MOSFET device is modeled by a lumped circuit for chip thermal
analysis. (Bottom) Devices and their interaction are shown. Heat flow between devices
causes thermal coupling.
1.2. Device Performance and Full-Chip Heating Model:
As mentioned above, we solve the coupled device performance equations along with the
full-chip heating model. To obtain device performance for the given boundary conditions, we
solve the semiconductor equations along with the Schrödinger equation. We next solve the
lumped thermal network for the full chip. Here we will first elaborate on the device
model and later on the thermal network.
1.2.1. Device Performance Model
We developed a quantum device solver based on the quantum and semiconductor
equations. We list these device equations below starting from Schrödinger equation (1)
followed by Poisson, electron current continuity, hole current continuity (2-6), and lattice
heat flow (7) equations. In addition we have one more equation which we call the
population equation (8) that gives the density of electrons in the channel by summing
contributions from different subbands.
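\[ E\,\psi(y) = -\frac{\hbar^2}{2m^*}\frac{d^2\psi(y)}{dy^2} - q\,\phi(x,y)\,\psi(y) \tag{1} \]
\[ \nabla^2\phi = -\frac{q}{\epsilon}\,(p - n + D) \tag{2} \]
\[ \frac{\partial n}{\partial t} = \frac{1}{q}\nabla\cdot\mathbf{J}_n + GR_n \tag{3} \]
\[ \frac{\partial p}{\partial t} = -\frac{1}{q}\nabla\cdot\mathbf{J}_p + GR_p \tag{4} \]
\[ \mathbf{J}_n = -q\,n\,\mu_n\nabla(\phi + \phi_{QM} + \phi_{HS}) + \mu_n kT\,\nabla n \tag{5} \]
\[ \mathbf{J}_p = -q\,p\,\mu_p\nabla(\phi + \phi_{QM} + \phi_{HS}) - \mu_p kT\,\nabla p \tag{6} \]
\[ C\frac{\partial T}{\partial t} = \nabla\cdot(\kappa\nabla T) - \nabla\phi\cdot(\mathbf{J}_n + \mathbf{J}_p) \tag{7} \]
\[ n(y) = \frac{m^* kT}{\pi\hbar^2}\sum_i |\psi_i(y)|^2 \ln\!\left(1 + e^{(E_F - E_i)/kT}\right) \tag{8} \]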
The symbols φ, E, EF, ψ, n and p, Jn and Jp, T, D, GRn and GRp, C and κ denote the
electrostatic potential, eigenenergy, Fermi level, wave function, electron and hole
concentrations, electron and hole current densities, lattice temperature, net dopant
concentration, electron and hole net generation-recombination rates, heat capacity and
thermal diffusion constant, respectively. The other parameters have the usual meaning.
In Eqn. (1), x is parallel to the Si-SiO2 interface in the MOSFET and y is normal to x and
points in the direction of the substrate.
We solve device equations numerically starting from the Schrödinger equation, which is
used to resolve the confinement effects in the MOSFET channel near the Si-SiO2
interface. We solve for the eigenenergies and eigenfunctions of the Schrödinger equation
to obtain the wave functions. We then solve the Poisson equation for the electrostatic
potential φ. We next add the effects of electron transport in the channel by modifying the
electron concentration through the electron current continuity equation. We also solve for
the hole transport through the hole current continuity equation. We last ascertain that the
Fermi level complies with the calculated electron concentration and the wave functions.
Here we will discuss the heat flow equation and the ways to embed temperature
dependencies into these equations.
To obtain the non-isothermal device performance, we solve the differential heat flow
equation (7) and let the values of some simulation parameters change according to the
lattice temperature between iterations. The variables that are explicitly varied by
temperature are written below in the order of thermal voltage, intrinsic carrier
concentration, electron-hole mobilities, electron-hole saturation velocities, built-in
potentials and the bandgap of silicon:
\[ V_{TH}(T) = V_{TH}(T_o)\left(\frac{T}{T_o}\right) \tag{9} \]
\[ n_o(T) = n_o(T_o)\left(\frac{T}{T_o}\right)^{1.5} \exp\!\left[\frac{E_g(T)}{2kT}\left(\frac{T\,E_g(T_o)}{T_o\,E_g(T)} - 1\right)\right] \tag{10} \]
\[ \mu(T) = \mu(T_o)\left(\frac{T}{T_o}\right)^{-2.5} \tag{11} \]
\[ \upsilon_{sat}(T) = \upsilon_{sat}(T_o)\,\frac{1 + e^{T/2T_o}}{1 + e^{1/2}} \tag{12} \]
\[ \phi_{built\text{-}in}(T) = V_{TH}(T)\,\ln\!\left(\frac{n}{n_o(T)}\right) \tag{13} \]
\[ E_g(T) = E_g(T_o)\left[1 - 2.4\times10^{-4}\,(T - T_o)\right] \tag{14} \]
Here T is the lattice temperature in kelvin. We assume that thermal equilibrium is
established between the carriers and the lattice, so the electron and hole temperatures are
also equal to T. To is the reference lattice temperature, taken to be 300 K for this work.
The values of these parameters at room temperature are approximately 0.0258 V for VTH,
1.45×10^10 cm^-3 for no, 0.7×10^7 cm/s for υsat, and 1.12 eV for Eg. Values of μ and
φbuilt-in are not given because they also depend on other parameters not specified
here.
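The scaling laws above can be collected into a few helper functions. The following is a minimal Python sketch, not the authors' solver. It assumes a thermal voltage of 0.0258 V at 300 K (kT/q), reads the mobility exponent in Eqn. (11) as negative so that mobility falls with temperature, and takes the reference mobility `mu_ref` as a caller-supplied, hypothetical input because the text does not give one. The saturation velocity and built-in potential are left out.

```python
import math

K_B_EV = 8.617333e-5  # Boltzmann constant in eV/K
T_REF = 300.0         # reference lattice temperature To in K (from the text)


def thermal_voltage(t, v_ref=0.0258):
    """Eqn. (9): thermal voltage in volts, linear in temperature."""
    return v_ref * (t / T_REF)


def bandgap(t, eg_ref=1.12):
    """Eqn. (14): silicon bandgap in eV, linear in temperature."""
    return eg_ref * (1.0 - 2.4e-4 * (t - T_REF))


def intrinsic_concentration(t, n_ref=1.45e10):
    """Eqn. (10): intrinsic carrier concentration in cm^-3, normalized at To."""
    exponent = (bandgap(T_REF) / (2.0 * K_B_EV * T_REF)
                - bandgap(t) / (2.0 * K_B_EV * t))
    return n_ref * (t / T_REF) ** 1.5 * math.exp(exponent)


def mobility(t, mu_ref):
    """Eqn. (11): power-law mobility; mu_ref is a hypothetical reference value."""
    return mu_ref * (t / T_REF) ** -2.5
```

Calling `intrinsic_concentration(400.0)` against `intrinsic_concentration(300.0)` shows the exponential growth the text describes, while `mobility(400.0, 1.0)` gives the relative mobility drop over the same interval.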
Eqn. (9) indicates thermal voltage changes linearly with temperature. The bandgap of
silicon in Eqn. (14) also varies linearly with temperature. However, carrier mobilities in
Eqn. (11) vary by a power law relation. Intrinsic carrier concentration and the saturation
velocity have the strongest dependency on temperature, which is exponential in nature.
Thus it appears that they are likely to have the strongest influence on the device
performance. However, investigations have shown that although that might be the case in
pn junctions, it is not the case in MOSFET devices unless temperature increases to such
high levels where the control of the gate over the channel is lost due to abundance in
intrinsic carriers for transport. Analyses show that non-isothermal MOSFET operation is
affected mostly through the carrier mobilities, saturation velocities and built-in boundary
potentials. As temperature increases, current decreases due to reduction in mobilities, and
increases slightly due to the saturation velocity and built-in boundary potentials
(temperature effectively lowers the threshold voltage). Thus, as temperature increases,
current decreases for moderate temperatures falling in the operating range of most of
today’s devices. However, for high temperatures such as 100 K above the ambient, the
effects of intrinsic carrier concentration may play a leading role and the MOSFET might
run into a condition much like thermal runaway in pn junctions.
Analysis shows that the temperature variation within a bulk MOSFET that is a few
degrees Kelvin higher than the boundaries is not high enough to cause major effects on
the charge transport and power consumption. Thus we adopt the following algorithm for
updating the values of the temperature dependent variables written above. We set the
temperature values at the boundaries to a constant value. This value can either be the
room temperature (the ambient temperature for the chip), or a temperature higher than
that. If the device is isolated from the chip or the chip just starts operating so that all
devices sit at the ambient level, we use the value of room temperature for the boundaries.
However, the temperature boundaries for a device near the center of the chip are higher
due to heating of the chip. Furthermore, we either fix all of the temperature boundaries
at the source, drain and substrate contacts, or fix some and let the others float. We observe
that all cases give similar results because the temperature variation is small in the channel.
Additionally, we let heat flow in the lateral direction within the bulk silicon, and assume
that there is no heat flow on the SiO2 because its thermal diffusion constant is one
hundred times smaller than that of silicon. We fix the values of thermal voltage, intrinsic
carrier concentration, saturation velocity, built-in boundary voltages, and bandgap of
silicon according to the given boundary temperature. We later solve the semiconductor
equations along with the quantum equations iteratively. We also update the value of the
mobilities and the thermal diffusion constant (temperature dependency of the thermal
diffusion constant will be discussed later), at each iteration. Thus we self consistently
solve semiconductor equations, quantum equations, and ascertain that the calculated
carrier mobilities satisfy transport considerations.
While we are solving the semiconductor equations, we also solve for the differential heat
flow equation. The heat flow equation provides the coupling between the lattice
temperature and the state variables such as current density and electric field. We repeat
the heat flow equation here for easy reference:
\[ C\frac{\partial T}{\partial t} = \nabla\kappa\cdot\nabla T + \kappa\nabla^2 T + H \tag{15} \]
Here, H is the heating term and we let the value of thermal diffusion constant κ vary with
doping level and the lattice temperature:
\[ \kappa(T) = \frac{\kappa(T_o)}{1 + \dfrac{D}{2.8\times10^{19}}}\left(\frac{T}{T_o}\right)^{-4/3} \tag{16} \]
We set the value of κ(To) to 1.5 W/(K·cm) for silicon at room temperature. We substitute
Eqn. (16) into Eqn. (15), and solve for the lattice temperature for the given current
densities and electrostatic potential. Here the $-\nabla\phi\cdot\mathbf{J}$ term can be
recognized as the Joule heating. Analysis shows that Joule heating contributes almost all
the heating. As mentioned before, when we are solving the semiconductor and quantum
equations, we also solve the differential heat flow equation. We then update the values of
mobilities and thermal diffusion constant until all equations and boundaries are satisfied.
1.2.2. Full-Chip Heating Model
We obtain the temperature map of the full-chip by solving the heat flow equation. We
transform the differential heat flow equation given in Eqns. (7) and (15) to a lumped heat
flow equation. We do this to overcome the difficulties introduced by differences in the
scales of a single device and the full-chip, where the dimensions of the full-chip are
thousands of times larger than the corresponding dimensions in a MOSFET. Using the
differential heat flow equation for the full chip would require too many mesh points and is
not practical for our simulation case.
We first use the following transformation to ease the numerical solution including
temperature dependency of the thermal diffusion constant:
\[ \bar{T} = T_o + \frac{1}{\kappa(T_o)}\int_{T_o}^{T}\kappa(\tau)\,d\tau \tag{17} \]
Thus the differential heat flow equation in terms of the averaged temperature $\bar{T}$ becomes:
\[ C\frac{\partial\bar{T}}{\partial t} = \kappa(T_o)\,\nabla^2\bar{T} + H(T) \tag{18} \]
Here heating is in terms of the actual temperature T. If we assume that heat capacity has
the same temperature dependency as the thermal diffusion constant, then C is not a
function of T. In addition, the actual temperature T can be written in terms of $\bar{T}$ as follows:
\[ T = T_o\left(1 - \frac{\bar{T} - T_o}{3T_o}\right)^{-3} \tag{19} \]
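The forward and inverse transformations can be checked numerically. The sketch below assumes the temperature dependence κ(T) ∝ (T/To)^(-4/3) read from Eqn. (16), for which the integral in Eqn. (17) has the closed form used in `averaged_temperature`. The function names are illustrative only.

```python
T_REF = 300.0  # reference temperature To in K


def averaged_temperature(t):
    """Eqn. (17) evaluated in closed form for kappa(T) ~ (T/To)**(-4/3)."""
    return T_REF + 3.0 * T_REF * (1.0 - (T_REF / t) ** (1.0 / 3.0))


def actual_temperature(t_bar):
    """Eqn. (19): recover the actual temperature from the averaged one."""
    return T_REF * (1.0 - (t_bar - T_REF) / (3.0 * T_REF)) ** -3


# Round trip: an actual temperature of 350 K maps to an averaged
# temperature slightly below 350 K and back again.
t_bar = averaged_temperature(350.0)
t_back = actual_temperature(t_bar)
```

Because κ decreases with temperature, the averaged temperature always lies between To and the actual temperature, so the correction grows with the temperature rise.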
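We then integrate Eqn. (18) over our unit block, a single MOSFET device, writing κo for κ(To):
\[ \int_V C\frac{\partial\bar{T}}{\partial t}\,dV = \kappa_o\oint_S \nabla\bar{T}\cdot d\mathbf{S} + \int_V H\,dV \tag{20} \]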
We enclose the MOSFET by a rectangular prism. Here V and S are the volume and the
six faces of that prism, respectively. We note that heat flows in the direction of
decreasing temperature, thus $-\kappa_o\nabla\bar{T}$ represents the heat flux. We take time and
space derivatives of temperature as constant in the volume and on the given face, respectively.
Taking the integrals in Eqn. (20) we obtain:
\[ CV\frac{\partial\bar{T}}{\partial t} + \sum_{f=1}^{6}\frac{\kappa_o\,\Delta\bar{T}_f\,S_f}{l_f} = \int_V H\,dV \tag{21} \]
Here $l_f$ and $\Delta\bar{T}_f$ are the distance and temperature difference between the centers of
adjacent prisms along the normal to one of the six faces $S_f$, and $\bar{T}$ is the temperature
at the mid-point of that prism. The expression in Eqn. (21) is analogous to a KCL
type nodal equation, where the terms on the left hand side are the capacitive and resistive
components of the network, while the right side is the source term, like a current source in
the KCL network. Thus, taking $\bar{T}$ analogous to voltage, we can write the equivalent thermal
resistances and capacitances as follows:
\[ C^{th} = CV \tag{22} \]
\[ R^{th}_f = \frac{l_f}{\kappa_o S_f} \tag{23} \]
One capacitive and six resistive components connect the device to other devices and
ground. We determine the values of thermal resistances and capacitance from layout
design and geometrical considerations. We then use Eqn. (21) to solve for the averaged
temperature for the calculated resistances, capacitances and source term. We obtain the
Joule heating from MOSFET simulations using the actual temperature as described in the
previous section. As a reminder, we obtain the MOSFET performance for given boundary
conditions and then extend these results to the chip surface. We get the Joule heating for
each device by an MC type methodology. In the following section we will elaborate on
the coupled device and full-chip system.
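As an illustration of Eqns. (22) and (23), the sketch below computes a thermal resistance and capacitance from geometry. Only the device density (forty million devices per square centimeter) and κ(To) = 1.5 W/(K·cm) come from the report. The slab thickness and the volumetric heat capacity of silicon (about 1.63 J/(cm^3·K)) are assumptions made for the example, so the resulting numbers are not the report's values.

```python
def thermal_resistance(length_cm, kappa_w_per_cm_k, face_area_cm2):
    """Eqn. (23): R_th = l / (kappa * S), in K/W."""
    return length_cm / (kappa_w_per_cm_k * face_area_cm2)


def thermal_capacitance(heat_capacity_j_per_cm3_k, volume_cm3):
    """Eqn. (22): C_th = C * V, in J/K."""
    return heat_capacity_j_per_cm3_k * volume_cm3


# Device pitch implied by forty million devices per square centimeter.
pitch_cm = (1.0 / 40e6) ** 0.5

# Hypothetical slab thickness that carries the lateral heat flow.
thickness_cm = 30e-4

# Lateral resistance between two neighboring device blocks.
r_lateral = thermal_resistance(pitch_cm, 1.5, pitch_cm * thickness_cm)

# Capacitance of one device block (assumed silicon heat capacity).
c_block = thermal_capacitance(1.63, pitch_cm * pitch_cm * thickness_cm)
```

For a square block the pitch cancels in the lateral resistance, leaving 1/(κ·thickness), which is why the mutual resistance is mainly set by how thick a layer is assumed to conduct heat sideways.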
1.2.3. Coupled Device and Full-Chip Heating Model:
We solve self-consistently the device equations along with full-chip heating equations.
The solution necessitates convergence at the device level and the chip level. The device
level involves a coupled solution of semiconductor and quantum equations for voltage
bias and temperature conditions. We calculate the Joule heating due to a single device
using the actual temperature boundaries. We then use the solution of the device
performance equations to obtain the distributed performances of devices throughout the
whole chip by using the application and operation statistics. We then shrink each unit to a
single node on our thermal network and solve a lumped KCL-type thermal network for
nodal voltages. In the electrical analogy, nodal voltages are similar to averaged
temperatures. Thus we get the averaged temperature of each device on chip. We later
compare these temperatures with the ones used for obtaining performance figures. We
iteratively repeat the process until we achieve convergence at the device and chip levels.
Here we will elaborate more on the details of the mixed-mode solution.
For our representative simulation, we use the devices and layout of a Pentium processor.
As our representative device, we use a 0.4μm width, 90nm effective gate length (0.13μm
physical gate length) well-tempered MOSFET. We obtain the doping profile of our
device by using the analytical expression for a 50nm device and stretching appropriate
dimensions. We first solve device equations for this representative device on the chip.
The representative device is chosen as the one that sits at the median temperature of the
chip. During simulation we update the temperature dependent device parameters
according to that temperature. We obtain the median temperature of the entire chip after
we solve the lumped thermal network. Initially, this temperature is equal to the ambient
temperature since no device is working yet. During the mixed-mode simulation, we
update this temperature between iterations until convergence. We also decide on some
average voltage bias conditions for that device. We use a value of 1.5V for the
gate-to-source and drain-to-source biases, which are reasonable average voltage conditions
during inverter switching. We also take the device as “On” for ten percent of the period,
which is used as the weighting factor for power for determining steady state conditions.
We then obtain the Joule heating of that device for these bias conditions and pass the
calculated value to the lumped thermal network for being used as the current source. For
an isolated device that is one of forty million devices in a square centimeter area, we find
the power density to be 1000 W/cm^2. This is larger than the power density of a Pentium
III, but here collective device behavior has not been considered yet.
We next solve for the nodal voltages of the RthCth thermal network. These nodal voltages
are equal to averaged device temperatures in the electrical analogy. Before solving the
thermal network we first need to construct it in conjunction with the layout design,
geometrical considerations and device performance. We replace each device on the chip
with a node that has a current source, thermal capacitance and five thermal resistances.
The current source, capacitance and one of the resistances are between the given node
and ground, and the other resistances are between nodes. Device performance along with
an MC type methodology (reflects the non-uniformity between device operations)
determines the value of the current source. However, the values of thermal resistances
and capacitances are obtained from the layout design and geometrical arguments. We
roughly estimate that there are forty million devices in an area of one square centimeter.
Furthermore we use the package configuration for a Pentium 4 processor, which consists
of the die, heat spreader, and package. Our calculations yield 200 K/W for the mutual
resistances and 2.0×10^6 K/W for the resistance connected to the ground. We use the same
values for all the devices assuming that they are uniformly distributed on the surface.
Once we have the values of resistances and sources for all the nodes, we obtain the
temperature that corresponds to each node or device by solving forty million KCL-type
equations for each node (i,j):
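\[ C^{th}\frac{\bar{T}^{k}_{i,j} - \bar{T}^{k-1}_{i,j}}{\Delta t} + \frac{\bar{T}^{k}_{i,j}}{R^{th}_{i,j}} + \sum_{\pm}\frac{\bar{T}^{k}_{i,j} - \bar{T}^{k}_{i\pm1,j}}{R^{th}_{i\pm\frac{1}{2},j}} + \sum_{\pm}\frac{\bar{T}^{k}_{i,j} - \bar{T}^{k}_{i,j\pm1}}{R^{th}_{i,j\pm\frac{1}{2}}} = I^{k}_{i,j}\!\left(T^{k-1}_{i,j}\right) \tag{24} \]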
Here the subscript (i+½,j) denotes the resistance between nodes (i,j) and (i+1,j). To solve the KCL
equations, we first reduce the size of the system [19] by using a Thevenin equivalent
circuit on a subblock of twelve by twelve nodes. At each side of the block, we introduce
new nodes that are half resistance away from the boundary nodes. We then separately
short the new nodes introduced on each of the four sides. Size reduction and the
formation of new nodes are shown in Fig. 1.2. We next solve the system by a bilateral
conjugate gradient method for nodal temperatures.
Figure 1.2: Size reduction methods are applied on a subblock of five by five. We obtain
four port Thevenin representation of each block and use that representation instead as
shown at the bottom of the figure.
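To make the nodal equations concrete, here is a small steady-state sketch in Python. It is not the authors' solver: it drops the capacitive term, skips the Thevenin size reduction, works on a 5 by 5 grid instead of forty million nodes, and uses a plain conjugate gradient iteration. The resistances (200 K/W mutual, 2.0×10^6 K/W to ground) are the report's values. The per-device source of 25 μW is an assumption obtained by dividing 1000 W/cm^2 by forty million devices per square centimeter.

```python
def apply_network(temps, n, r_mutual, r_ground):
    """Left-hand side of the steady-state KCL equations on an n-by-n grid."""
    out = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            idx = i * n + j
            total = temps[idx] / r_ground  # path to thermal ground
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < n and 0 <= b < n:  # edge nodes have fewer neighbors
                    total += (temps[idx] - temps[a * n + b]) / r_mutual
            out[idx] = total
    return out


def solve_cg(sources, n, r_mutual, r_ground, tol=1e-12, max_iter=1000):
    """Conjugate gradient solve for the temperature rise at every node."""
    x = [0.0] * (n * n)
    r = list(sources)  # residual for the zero initial guess
    p = list(r)
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = apply_network(p, n, r_mutual, r_ground)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


n = 5
sources = [25e-6] * (n * n)  # assumed heat source per device, in W
sources[12] = 250e-6         # hypothetical hot device at the grid center
rise = solve_cg(sources, n, 200.0, 2.0e6)
```

With uniform sources every node settles at the source strength times the ground resistance, since no heat flows laterally. A hot device raises its neighbors as well, because the mutual resistances are far smaller than the resistance to ground.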
1.3.1 Application and Results
So far, we have discussed the device and the full-chip heating models. Solution of the
device level determines the current (heat) sources in the thermal network. We then solve
the thermal network and obtain the value of the median temperature which in turn is the
input to the device solution. The feedback from the thermal network to the device is
straightforward because we just solve for one device for one temperature boundary.
However the relation is complicated for going from the device to the chip level, because
each device on the chip operates under different conditions that are affected by the
running applications and chip design. Thus we extend our device results to the entire chip
using an MC type methodology to account for the application and operation statistics of
the chip. To achieve this, we divide the chip into functional blocks and group them such
as cache, floating point unit, execution unit, etc. In this work, we use the die photo of the
Pentium III shown in Fig 1.3.
Figure 1.3: Pentium III processor die photo for 0.18 micron technology.
We obtain the percentage of consumed power to the total power for each block as shown
in Table 1.1. We renormalize those percentage powers and add instruction decode unit and
miscellaneous powers. We then distribute miscellaneous power to other units in
proportion to their areas. We next normalize these percentage powers by the
corresponding areas of each unit. Thus we obtain an estimate on the likelihood of finding
an active device in that unit relative to others.
Table 1.1: Percentage areas and powers of functional blocks in a Pentium III chip.

Pentium III Unit              Percentage Area   Percentage Power   Normalized Power/Area
Clock (CLK)                   1.0               5.2                1.0
Issue Logic (ISL)             9.5               14.1               0.29
Memory Order Buffer (MOB)     3.3               4.7                0.28
Register Alias Table (RAT)    3.3               4.7                0.28
Bus Interface Unit (BIU)      4.3               5.9                0.27
Execution Unit (EU)           9.5               13.0               0.26
Fetch                         12.5              16.9               0.26
Decode Unit (DU)              14.6              17.2               0.23
L1 Data Cache (L1C)           12.5              9.8                0.15
L2 Data Cache (L2C)           29.8              8.5                0.05
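The last column can be approximated by dividing each block's percentage power by its percentage area and scaling so that the clock equals 1.0. The sketch below does only that. It does not repeat the report's renormalization or the redistribution of miscellaneous power, so the results agree with the tabulated values only to within about 0.01.

```python
# (percentage area, percentage power, tabulated normalized power/area)
TABLE = {
    "CLK": (1.0, 5.2, 1.0),
    "ISL": (9.5, 14.1, 0.29),
    "MOB": (3.3, 4.7, 0.28),
    "RAT": (3.3, 4.7, 0.28),
    "BIU": (4.3, 5.9, 0.27),
    "EU": (9.5, 13.0, 0.26),
    "Fetch": (12.5, 16.9, 0.26),
    "DU": (14.6, 17.2, 0.23),
    "L1C": (12.5, 9.8, 0.15),
    "L2C": (29.8, 8.5, 0.05),
}


def normalized_power_density(table):
    """Power per unit area for each block, scaled so the largest value is 1."""
    density = {name: power / area for name, (area, power, _) in table.items()}
    peak = max(density.values())
    return {name: value / peak for name, value in density.items()}


normalized = normalized_power_density(TABLE)
```

The clock dominates because it dissipates 5.2 percent of the power in 1.0 percent of the area, while the L2 cache spreads 8.5 percent of the power over almost 30 percent of the die.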
In Table 1.1, we note that the clock has the highest likelihood while L2 data cache has the
least. Furthermore, we take the uniform probability between 0.5 and 1.0 as corresponding
to an “On” state in terms of device operation. Likewise “Off” state is represented by a
probability between 0 and 0.5. We then assume that a device in the clock unit is always
“On” such that it is always associated with a probability between 0.5 and 1.0. We next
use the normalized power ratios to weight the “On” probabilities of each unit. For
example, a device in the fetch block is represented by a number between 0.5 and 1.0
(the “On” state) 26 out of 100 times, a fraction equal to its normalized power of 0.26.
We repeat this procedure for each functional unit and determine a representative
number for each device in each unit. We then weight the calculated Joule heating of the
simulated device by these probabilities to obtain the strength of the current source that
corresponds to each device on the entire chip. Thus we also achieve the feedback in the
direction of device to entire chip. We then obtain full-chip heating in conjunction with
device operations. For convergence, we ascertain that the median temperature calculated
by the thermal network gives the same device performance for each device, or the heat
current calculated for each device gives the same median temperature used before. We
summarize our algorithm in Fig. 1.4.
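One possible reading of the activity assignment described above is sketched below. It is an interpretation, not the authors' code: each device is "On" with a probability equal to the normalized power of its block, an "On" device draws a weight between 0.5 and 1.0, an "Off" device draws a weight between 0 and 0.5, and the weight then scales the Joule heating of the simulated device. The function name and the seed handling are illustrative.

```python
import random


def sample_weights(normalized_power, count, seed=0):
    """Draw one activity weight per device for a block with the given normalized power."""
    rng = random.Random(seed)
    weights = []
    for _ in range(count):
        if rng.random() < normalized_power:
            weights.append(rng.uniform(0.5, 1.0))  # "On" state
        else:
            weights.append(rng.uniform(0.0, 0.5))  # "Off" state
    return weights


# A clock device (normalized power 1.0) is always "On".
clock_weights = sample_weights(1.0, 1000)

# A fetch device (normalized power 0.26) is "On" roughly 26 times out of 100.
fetch_weights = sample_weights(0.26, 20000, seed=1)

# Source strength per device: weight times an assumed 25 microwatt Joule heating.
fetch_sources = [w * 25e-6 for w in fetch_weights]
```

The resulting list of source strengths is what would feed the current sources of the thermal network, one entry per device in the block.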
Input:
- Device Configuration
- RC Thermal Network
- Application Related Activity Profile
Device Level:
- Solve Non-Isothermal DD-QM Eqns.
- Obtain Joule Heating
Chip Level:
- Solve RthCth Thermal Network
- Obtain Temperature Profile
Output:
- Device Characteristics
- Full-Chip Temperature Profile
Figure 1.4: Coupled algorithm flowchart.
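The loop in the flowchart is a fixed-point iteration between the device level and the chip level. The toy sketch below shows only that control flow. Both models are hypothetical stand-ins: the device heating is assumed to fall with temperature like the mobility power law, and the chip is collapsed to a single effective resistance of 1.2×10^6 K/W, a made-up value.

```python
def device_joule_heating(temp_k, p_ref=25e-6, t_ref=300.0):
    """Stand-in for the device solver: heating in W at the given temperature."""
    return p_ref * (temp_k / t_ref) ** -2.5


def chip_median_temperature(power_w, t_ambient=300.0, r_eff=1.2e6):
    """Stand-in for the thermal network: median temperature in K."""
    return t_ambient + r_eff * power_w


def solve_coupled(t_ambient=300.0, tol=1e-6, max_iter=100):
    """Alternate device and chip solutions until the temperature stops changing."""
    temp = t_ambient  # no device is working yet
    for iteration in range(1, max_iter + 1):
        power = device_joule_heating(temp)
        new_temp = chip_median_temperature(power, t_ambient)
        if abs(new_temp - temp) < tol:
            return new_temp, power, iteration
        temp = new_temp
    raise RuntimeError("coupled iteration did not converge")


median_temp, heating, iterations = solve_coupled()
```

In this toy setting the iteration converges quickly because heating decreases as temperature rises, which damps each update. The regime described in the text where current grows with temperature would not have this property.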
In Fig. 1.5, we show temperature dependent device performance characteristics. For an
n-MOSFET, as temperature increases, current decreases in the linear and saturation
regions. However, the current-voltage characteristics behave differently under high
temperature conditions, where current increases as temperature increases. This results in
positive feedback and thermal instability, so chip designers try to avoid such
situations. Here we offer methods to predict temperature maps of different IC designs
corresponding to different running applications.
Figure 1.5: Temperature dependent current-voltage characteristics of a 0.1μm nMOSFET for VGS=0.7,1.0,1.5V. As temperature increases, current decreases.
In Fig. 1.6, we show the functional blocks used for the Pentium III chip. Percentage areas
and powers of each block are listed in Table 1.1. The clock occupies the smallest area on the
chip, while the L2 cache has the largest area. On the other hand, the clock has the highest
normalized power and the L2 cache has the least. Thus a device in the clock unit operates
much more frequently than a device in the L2 cache. This is reflected in the highest
and lowest temperatures, which correspond to the clock and the L2 cache, respectively. We also
show our calculated temperature profile for the Pentium III in Fig. 1.6. The temperature peaks
at seventy degrees above the ambient, while the median and lowest temperatures are
thirty and twenty degrees above the outside temperature. These numbers are in agreement
with maximum tolerable temperatures of 80 and 90 degrees above the ambient for
Pentium III processors. This temperature profile can be used to relieve problems related
to hot spots on the chip by offering ways for rearranging the spatial distribution of
functional units and utilization of thermal contacts with direct connections to the
problematic areas.
Figure 1.6: a) Functional blocks of the Pentium III chip. The clock has the smallest area but
the largest normalized power, whereas the L2 cache has the largest area but the smallest
normalized power, as listed in Table 1.1. b) Our calculated temperature map for the
Pentium III reaches a peak in the clock block (seventy degrees above the ambient) and
has the lowest temperature plateau in the L2 cache (thirty degrees above the ambient).
2. Subtask 2: Modeling and Prototyping for on Chip Electromagnetic Effects
2.1 Introduction
We introduce a time-domain method to simulate the digital signal propagation along
on-chip interconnects by solving Maxwell’s equations with the Alternating-Direction-Implicit
(ADI) method. With this method, we are able to resolve the large scale (i.e. on-chip
electromagnetic wave propagation) and fine scale (i.e. skin depth and substrate
current) structure in the same simulation, and the simulation time step is not limited by
the Courant condition. The simulations allow us to calculate in detail parasitic current
flow inside the substrate; propagation losses; skin-depth; and dispersion of digital signals
on non-ideal interconnects. We have found considerable substrate currents and losses that
depend on the substrate doping. So far, most of our applications have been for planar
chips. Over the next several quarters, we plan to adapt the method for 3D chips as well.
MS candidate Xi Shao has contributed significantly to this work under the supervision of
Prof. Neil Goldsman.
Inductive and capacitive coupling, and resistive losses in interconnects and substrates, are
significant barriers in the development of high-speed digital and analog ICs. Accurate
modeling of modern on-chip interconnects (including coupling and losses) usually
requires a full-wave solution to Maxwell’s equations. However, such a solution is
difficult because the wavelengths of interest are much larger than the fine topological
structure of ICs. (Wavelengths are typically on the mm to cm scale, while chip
structures are on the micron scale.) In addition, digital and mixed (broadband) signal
applications often require analysis in the time domain. Conventional Maxwell solvers
typically use the explicit Finite-Difference-Time-Domain (FDTD) method. However, the
conventional method is limited by the Courant condition
($\Delta t \le 1/\left(c\sqrt{1/\Delta x^2 + 1/\Delta y^2 + 1/\Delta z^2}\right)$), which requires prohibitively small time steps to
resolve fine structure on the submicron scale. To overcome this problem, we have applied
the Alternating-Direction-Implicit (ADI) method to solve Maxwell’s equations in ICs,
and have overcome the Courant limit. We have used the method to model the
Metal-Insulator-Semiconductor-Substrate (MISS) structure. The simulations allow us to
calculate in detail parasitic current flow inside the substrate, propagation losses,
skin depth and dispersion of digital signals on non-ideal interconnects. We have found
considerable substrate currents and losses that depend on the substrate doping.
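To see how restrictive the Courant condition is, the bound can be evaluated directly. The sketch below assumes the vacuum speed of light for c and accepts one, two or three grid spacings. The 0.1 μm spacing and the 2×10^-13 s time step are the values quoted later in this report for the skin depth test.

```python
import math

C_LIGHT = 2.998e8  # speed of light in vacuum, m/s (assumed propagation speed)


def courant_time_step(*spacings_m):
    """Largest explicit FDTD time step allowed for the given grid spacings."""
    return 1.0 / (C_LIGHT * math.sqrt(sum(1.0 / d ** 2 for d in spacings_m)))


# Limit set by the finest spacing alone (0.1 micron).
dt_limit = courant_time_step(0.1e-6)

# Ratio between the ADI time step used in the report and the explicit limit.
speedup = 2e-13 / dt_limit
```

For the 0.1 μm spacing the limit is about 0.33×10^-15 s, so a 2×10^-13 s time step is several hundred times larger than an explicit scheme would allow.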
2.2 Simulation Method
In the ADI method Maxwell’s equations (1) are discretized on the conventional Yee’s
staggered grids with the electric field on the grid cell edge center, and magnetic field on
the grid cell face center. In this way, the zero-divergence of the magnetic field is
maintained throughout the simulation.
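\[ \frac{\partial\mathbf{D}}{\partial t} = \nabla\times\mathbf{H} - \mathbf{J}, \qquad \frac{\partial\mathbf{B}}{\partial t} = -\nabla\times\mathbf{E}, \qquad \mathbf{B} = \mu\mathbf{H},\quad \mathbf{D} = \epsilon\mathbf{E},\quad \mathbf{J} = \sigma\mathbf{E} \tag{1} \]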
At each step, by manipulating Maxwell’s equations, we transform the differential
equations into a system of tri-diagonal algebraic equations. Here, we give an example of
discretizing the Ex component (equations (2-7)) during the two alternating steps. In step
1, the first half (Bz) of the right hand side in equation (2) and the first half (Ex) of the
right hand side in equation (3) are treated as implicit. We substitute equation (3)
(for $B^{n+1}_{z,(i+\frac{1}{2},j\pm\frac{1}{2},k)}$) back into equation (2) and obtain the tri-diagonal
equation (4). For the other two components (Ey and Ez), we perform similar manipulations to
form a system of tri-diagonal equations, which can be easily solved with a tri-diagonal
matrix solver. The magnetic field is updated using equations similar to equation (3).
In the next step, we treat the other half (By) as implicit in equation (5), and (Ex) as implicit
in equation (6). We obtain the tri-diagonal system in equation (7) for Ex. Similarly, we
can obtain the other two tri-diagonal systems for Ey and Ez and solve them to update the
electric field. The magnetic field is updated with equations similar to equation (6).
These two steps are alternated thereafter.
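STEP 1 (for Ex and Bz component):
\[ \frac{\epsilon E^{n+1}_{x,(i+\frac{1}{2},j,k)} - \epsilon E^{n}_{x,(i+\frac{1}{2},j,k)}}{\Delta t} = \frac{1}{\mu}\frac{B^{n+1}_{z,(i+\frac{1}{2},j+\frac{1}{2},k)} - B^{n+1}_{z,(i+\frac{1}{2},j-\frac{1}{2},k)}}{\Delta y} - \frac{1}{\mu}\frac{B^{n}_{y,(i+\frac{1}{2},j,k+\frac{1}{2})} - B^{n}_{y,(i+\frac{1}{2},j,k-\frac{1}{2})}}{\Delta z} - \sigma E^{n+1}_{x,(i+\frac{1}{2},j,k)} \tag{2} \]
\[ \frac{B^{n+1}_{z,(i+\frac{1}{2},j+\frac{1}{2},k)} - B^{n}_{z,(i+\frac{1}{2},j+\frac{1}{2},k)}}{\Delta t} = \frac{E^{n+1}_{x,(i+\frac{1}{2},j+1,k)} - E^{n+1}_{x,(i+\frac{1}{2},j,k)}}{\Delta y} - \frac{E^{n}_{y,(i+1,j+\frac{1}{2},k)} - E^{n}_{y,(i,j+\frac{1}{2},k)}}{\Delta x} \tag{3} \]
\[ a\,E^{n+1}_{x,(i+\frac{1}{2},j+1,k)} + b\,E^{n+1}_{x,(i+\frac{1}{2},j,k)} + c\,E^{n+1}_{x,(i+\frac{1}{2},j-1,k)} = d \tag{4} \]
where
\[ a = -\frac{1}{\mu\epsilon}\frac{\Delta t^2}{\Delta y^2}, \qquad b = 1 + \frac{2}{\mu\epsilon}\frac{\Delta t^2}{\Delta y^2} + \frac{\sigma\Delta t}{\epsilon}, \qquad c = -\frac{1}{\mu\epsilon}\frac{\Delta t^2}{\Delta y^2}, \]
\[ d = E^{n}_{x,(i+\frac{1}{2},j,k)} - \frac{1}{\mu\epsilon}\frac{\Delta t}{\Delta z}\left(B^{n}_{y,(i+\frac{1}{2},j,k+\frac{1}{2})} - B^{n}_{y,(i+\frac{1}{2},j,k-\frac{1}{2})}\right) + \frac{1}{\mu\epsilon}\frac{\Delta t}{\Delta y}\left(B^{n}_{z,(i+\frac{1}{2},j+\frac{1}{2},k)} - B^{n}_{z,(i+\frac{1}{2},j-\frac{1}{2},k)}\right) - \frac{1}{\mu\epsilon}\frac{\Delta t^2}{\Delta y}\left(\frac{E^{n}_{y,(i+1,j+\frac{1}{2},k)} - E^{n}_{y,(i,j+\frac{1}{2},k)}}{\Delta x} - \frac{E^{n}_{y,(i+1,j-\frac{1}{2},k)} - E^{n}_{y,(i,j-\frac{1}{2},k)}}{\Delta x}\right) \]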
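STEP 2 (for the Ex and By components):

$$\frac{\varepsilon E^{n+2}_{x,(i+1/2,j,k)} - \varepsilon E^{n+1}_{x,(i+1/2,j,k)}}{\Delta t} = \frac{1}{\mu}\,\frac{B^{n+1}_{z,(i+1/2,j+1/2,k)} - B^{n+1}_{z,(i+1/2,j-1/2,k)}}{\Delta y} - \frac{1}{\mu}\,\frac{B^{n+2}_{y,(i+1/2,j,k+1/2)} - B^{n+2}_{y,(i+1/2,j,k-1/2)}}{\Delta z} - \sigma E^{n+2}_{x,(i+1/2,j,k)} \tag{5}$$

$$\frac{B^{n+2}_{y,(i+1/2,j,k+1/2)} - B^{n+1}_{y,(i+1/2,j,k+1/2)}}{\Delta t} = \frac{E^{n+1}_{z,(i+1,j,k+1/2)} - E^{n+1}_{z,(i,j,k+1/2)}}{\Delta x} - \frac{E^{n+2}_{x,(i+1/2,j,k+1)} - E^{n+2}_{x,(i+1/2,j,k)}}{\Delta z} \tag{6}$$

$$a\,E^{n+2}_{x,(i+1/2,j,k-1)} + b\,E^{n+2}_{x,(i+1/2,j,k)} + c\,E^{n+2}_{x,(i+1/2,j,k+1)} = d, \tag{7}$$

where

$$a = -\frac{1}{\mu\varepsilon}\frac{\Delta t^{2}}{\Delta z^{2}}, \qquad b = 1 + \frac{2\,\Delta t^{2}}{\mu\varepsilon\,\Delta z^{2}} + \frac{\sigma\,\Delta t}{\varepsilon}, \qquad c = -\frac{1}{\mu\varepsilon}\frac{\Delta t^{2}}{\Delta z^{2}}$$

$$d = E^{n+1}_{x,(i+1/2,j,k)} + \frac{1}{\mu\varepsilon}\frac{\Delta t}{\Delta y}\left(B^{n+1}_{z,(i+1/2,j+1/2,k)} - B^{n+1}_{z,(i+1/2,j-1/2,k)}\right) - \frac{1}{\mu\varepsilon}\frac{\Delta t}{\Delta z}\left(B^{n+1}_{y,(i+1/2,j,k+1/2)} - B^{n+1}_{y,(i+1/2,j,k-1/2)}\right) - \frac{1}{\mu\varepsilon}\frac{\Delta t^{2}}{\Delta z}\left(\frac{E^{n+1}_{z,(i+1,j,k+1/2)} - E^{n+1}_{z,(i,j,k+1/2)}}{\Delta x} - \frac{E^{n+1}_{z,(i+1,j,k-1/2)} - E^{n+1}_{z,(i,j,k-1/2)}}{\Delta x}\right)$$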
The 3D-ADI method for solving Maxwell’s equations is unconditionally stable, so the
simulation time step is not limited by Courant’s condition. In our code, the grid spacing is
non-uniform in all three dimensions. This allows us to place enough resolution at the
places of interest. The simulation time step is chosen to resolve the bandwidth
of the signal. Mur’s first-order absorbing boundary condition is applied to simulate free
space.
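As an illustration of the tri-diagonal solve that each ADI half-step relies on, the following is a minimal Python sketch of the Thomas algorithm applied to a system of the form of equation (4). It is not the authors' code: the function names, the uniform-grid assumption, and the sample right-hand side are hypothetical, and the coefficient expressions follow our reading of equation (4).

```python
def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm for a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].

    a[0] and c[-1] refer to points outside the system and are ignored.
    """
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def adi_coefficients(dt, dy, eps, mu, sigma):
    """Coefficients (a, b, c) as read from equation (4); uniform spacing assumed."""
    r = dt * dt / (mu * eps * dy * dy)
    return -r, 1.0 + 2.0 * r + sigma * dt / eps, -r


# Hypothetical example: five unknowns along y in vacuum, arbitrary right-hand side d.
a, b, c = adi_coefficients(dt=2e-13, dy=1e-7, eps=8.854e-12, mu=1.2566e-6, sigma=0.0)
d = [1.0, 2.0, 3.0, 2.0, 1.0]
ex_new = solve_tridiagonal([a] * 5, [b] * 5, [c] * 5, d)
```

Because b = 1 + 2r + σ∆t/ε always exceeds |a| + |c| = 2r, the matrix is strictly diagonally dominant, so the elimination needs no pivoting and costs a number of operations proportional to the number of unknowns per grid line.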
2.3 Model Verification and Example Simulation Results
To test the code we applied it to a standard metal skin depth problem with a known
analytical solution. We performed a 2D simulation of EM wave propagation under a
metal strip of conductivity = 3.9×10^7 S/m. The domain bottom is bounded with a Perfect
Electric Conductor (PEC). The smallest grid size of 0.1 um is placed inside the metal. The
grid along the Z direction is of uniform size = 150 um. The Courant condition requires ∆t <
0.33×10^-15 sec. Our simulation time step is ∆t = 2×10^-13 sec. The excitation frequency = 50
GHz. The skin current Jz inside the metal has the analytical solution
$$J_z \propto \cos(k_z z + y/\delta)\,\exp(-y/\delta), \tag{8}$$
where δ is the skin depth, $\delta = \sqrt{2/(\omega\mu\sigma)}$, and kz is the wave number along
the guide. Inside the metal, the wave is damped in the Y direction and grazes along the Z
direction. Fig. 2.1 shows the agreement between the simulation and analytical calculation for
current Jz inside the metal. The agreement is excellent. With the ADI method, we are able to
reveal the grazing wave pattern inside the metal.
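The numbers quoted for this verification problem can be cross-checked with a short calculation. The sketch below is ours, not part of the original code; it assumes the vacuum permeability for the metal, the free-space speed of light for the Courant bound, and the usual explicit-scheme bound ∆t < 1/(c·sqrt(Σ 1/∆²)) taken over the two grid directions of the 2D problem.

```python
import math

MU0 = 4e-7 * math.pi   # vacuum permeability, H/m (assumed for the metal)
C0 = 299_792_458.0     # speed of light in vacuum, m/s


def skin_depth(freq_hz, sigma, mu=MU0):
    """Skin depth delta = sqrt(2 / (omega * mu * sigma)), in meters."""
    omega = 2.0 * math.pi * freq_hz
    return math.sqrt(2.0 / (omega * mu * sigma))


def courant_limit(*spacings, c=C0):
    """Largest stable explicit time step for the given grid spacings (meters)."""
    return 1.0 / (c * math.sqrt(sum(1.0 / h ** 2 for h in spacings)))


delta = skin_depth(50e9, 3.9e7)           # about 0.36 um
dt_max = courant_limit(0.1e-6, 150e-6)    # about 0.33e-15 s
speedup = 2e-13 / dt_max                  # ADI step relative to the Courant bound
```

Under these assumptions the skin depth comes out at roughly 0.36 um, so the 0.1 um grid places a few cells within one skin depth, and the ADI time step of 2×10^-13 sec is about 600 times larger than the Courant bound.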
We applied the 3D ADI code to study digital pulse propagation along a
Metal-Insulator-Semiconductor-Substrate (MISS) structure. Fig. 2.2 shows a cross-section of
the interconnect MISS structure we simulated with our ADI code. Along Z, the direction
of wave propagation, we have 120 grid points of uniform spacing = 25 um. In the XY
cross-section we have a non-uniform mesh with finest grid spacing = 0.1 um. The time step
is ∆t = 1×10^-13 sec, and the mesh has 83×89×120 space grid points, run for 1000 time steps.
The simulation time is 3-4 hours on a PC.
Figs. 2.3-2.7 show simulation results for a fast 1 V, 20 ps digital pulse with a rise time of
2 ps, excited at one end of the interconnect. The metal strip conductivity = 5.8×10^7 S/m,
typical for copper. The substrate doping is set to n = 10^17 /cm^3, which corresponds to a
substrate conductivity = 2260 S/m. Fig. 2.3 shows the voltage signal at Z = 0, 500, and
1000 um. At Z = 500 and 1000 um, the signal amplitude is lowered and the pulse is broadened.
The higher frequency components of the signal suffer larger damping. Figure 2.4 shows a cross
section of Ey at Z = 1000 um and t = 50 ps. The Ey field concentrates inside the SiO2
layer. This corresponds to skin-depth mode propagation along the MISS structure.
The EM field is guided along the channel formed by the metal skin current and the substrate
current. Figure 2.5 shows the XY cross section of the current Jz inside the metal (along the
direction of signal propagation) at Z = 1000 um and t = 37 ps. We see that, due to the
skin depth effect, the current concentrates near the metal edges and surfaces, giving rise
to resistive losses. Figure 2.6 shows the top view of the substrate current Jz at 5 um below
the SiO2 layer for t = 37 ps. The bright and dark shaded areas correspond to the rising and
falling of the signal, showing the current spreading almost a tenth of a mm away from
the interconnect edge, which can lead to parasitic crosstalk. Fig. 2.7 shows a
side view giving the depth of the substrate current. The substrate skin depth is tens of
microns. The substrate current contributes most to the parasitic interconnect losses.
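The quoted substrate conductivity and the "tens of microns" skin depth can be checked with a rough estimate. The sketch below is ours and rests on stated assumptions: the conductivity follows σ = q·n·μ (so the mobility is back-computed from the quoted values rather than taken from the source), the substrate permeability is that of vacuum, and the two frequencies are only representative choices (50 GHz, and 0.35 divided by the 2 ps rise time as a common bandwidth rule of thumb).

```python
import math

Q_E = 1.602176634e-19   # elementary charge, C
MU0 = 4e-7 * math.pi    # vacuum permeability, H/m (assumed for the substrate)


def implied_mobility_cm2(sigma_s_per_m, n_per_cm3):
    """Electron mobility (cm^2/V/s) implied by sigma = q * n * mu."""
    n_per_m3 = n_per_cm3 * 1e6
    return sigma_s_per_m / (Q_E * n_per_m3) * 1e4


def skin_depth_um(freq_hz, sigma_s_per_m):
    """Skin depth sqrt(2 / (omega * mu0 * sigma)), returned in microns."""
    omega = 2.0 * math.pi * freq_hz
    return math.sqrt(2.0 / (omega * MU0 * sigma_s_per_m)) * 1e6


mobility = implied_mobility_cm2(2260.0, 1e17)        # roughly 1400 cm^2/V/s
depth_50ghz = skin_depth_um(50e9, 2260.0)            # roughly 47 um
depth_rise = skin_depth_um(0.35 / 2e-12, 2260.0)     # roughly 25 um at 175 GHz
```

Both frequency choices give a skin depth of a few tens of microns, in line with the penetration depth described for Fig. 2.7.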
Fig. 2.8 shows the voltage at different Z locations for substrate dopings of n
= 10^16 and 10^18 /cm^3. The EM waves, all of which are in the skin-depth mode region,
suffer different dispersion and dissipation for different substrate dopings. Lower
substrate doping in this region yields more losses and dispersion.
Figure 2.1: Model verification, showing excellent agreement between the numerical and
analytical results. Top: geometry, with the source at z = 0 and f = 50 GHz. Middle and
bottom: pattern of the current Jz inside the metal obtained from the simulation and from
the analytical calculation. The metal strip conductivity = 3.9×10^7 S/m.
Figure 2.2: Cross section of the simulated MISS structure. The XY cross section of the
metal strip is 6 um × 1.8 um and the thickness of the SiO2 layer is 2 um. Z is the direction of
propagation and lumped current flow.
Figure 2.3: Voltage observed at different Z locations (z = 500 um, 1000 um) along the
MISS strip. Solid line shows simulation with substrate doping = 10^17 /cm^3. Shows digital
signal losses and dispersion.
Figure 2.4: Cross section of the Ey (V/m) field at Z = 1 mm and t = 50 ps (plot labels:
metal, SiO2; scale ×10^5). The dark shading corresponds to the weak electric field. The
electric field Ey concentrates in the SiO2 layer.
Figure 2.5: Cross section of current Jz (mA/um^2) inside the metal at Z = 1 mm and t = 37
ps. The dark color corresponds to strong current. Shows metal skin-depth effect losses.
Figure 2.6: Top view of current Jz (uA/um^2) in the substrate (5 um below the SiO2 layer) at t
= 37 ps. Bright and dark shaded areas correspond to the rising and falling of the signal.
Shows potential interference and coupling in the lateral direction.
Figure 2.7: Side view of current Jz (uA/um^2) in the substrate at X = 0. Shows current
penetration into the substrate. Substrate doping n = 10^17 /cm^3.
Figure 2.8: Voltage observed at different Z locations along the MISS strip with substrate
doping n1 = 10^18 and n2 = 10^16 /cm^3.
3. Subtask 3: Prototyping, Modeling and Processing 3D Structures
3.1 Process Integration:
Key processes for developing 3D integrated circuits are wafer and die bonding,
micrometer via etching and metalization. To facilitate the development of these
processes, Chris Bles has been hired as an engineer by Task C under the direction of
Neil Goldsman, George Metze and Mike Khbeis (3D Integration) of the Joint Program.
Chris received his BS in ECE from UMD in 2003, and will be entering the MS program
in ECE at UMD in Fall 2004. This section of the report describes Chris’ work on process
development and his interactions with LPS under the auspices of Task C (3D IC) of the
Joint Program.
To keep track of all the wafer lots, a database was developed to organize the various
process information. Once the database is online, it will contain information on all tools
and recipes used to process wafers and will be accessible through the glue system. Each wafer
lot also has an associated history describing which processes have been run and what the
results were. This allows us to track statistics on the effectiveness
of our various processes.
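The report does not describe the database layout. Purely as an illustration of how lot histories can support this kind of statistics tracking, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, lot identifiers, tool names, and recipe names are all hypothetical and are not taken from the actual database.

```python
import sqlite3


def create_schema(conn):
    """Hypothetical two-table layout: wafer lots and the steps run on them."""
    conn.executescript(
        """
        CREATE TABLE lot (
            lot_id      TEXT PRIMARY KEY,
            description TEXT
        );
        CREATE TABLE process_step (
            step_id   INTEGER PRIMARY KEY AUTOINCREMENT,
            lot_id    TEXT NOT NULL REFERENCES lot(lot_id),
            tool      TEXT NOT NULL,
            recipe    TEXT NOT NULL,
            succeeded INTEGER NOT NULL,   -- 1 = good result, 0 = failed
            notes     TEXT
        );
        """
    )


def record_step(conn, lot_id, tool, recipe, succeeded, notes=""):
    """Append one entry to a lot's processing history."""
    conn.execute(
        "INSERT INTO process_step (lot_id, tool, recipe, succeeded, notes) "
        "VALUES (?, ?, ?, ?, ?)",
        (lot_id, tool, recipe, int(succeeded), notes),
    )


def success_rate(conn, tool, recipe):
    """Fraction of recorded runs of a recipe on a tool that succeeded."""
    row = conn.execute(
        "SELECT AVG(succeeded) FROM process_step WHERE tool = ? AND recipe = ?",
        (tool, recipe),
    ).fetchone()
    return row[0]


# Made-up entries, for illustration only.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO lot VALUES ('LOT-A', 'blank wafers')")
conn.execute("INSERT INTO lot VALUES ('LOT-B', 'thermal oxide wafers')")
record_step(conn, "LOT-A", "bonder", "dry-activation", True)
record_step(conn, "LOT-B", "bonder", "dry-activation", False, "voids from particles")
rate = success_rate(conn, "bonder", "dry-activation")
```

Keeping one row per process step, rather than one row per lot, is what makes per-tool and per-recipe statistics a single aggregate query.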
To create vias in wafers, Chris developed expertise in photolithography. This
included learning to use a spinner and contact aligner. The spinner is used to deposit a
thin film of photoresist on a 6” wafer. The wafer is then inserted into the contact aligner
along with an appropriate mask. After the wafer is exposed to the UV light of the contact
aligner, it is chemically developed and ready for further processing. Most of the wafer
lots Chris processed involved photolithography at some point. The contact aligner is
capable of resolving features down to about one micron, so that is the smallest via that can
be etched so far.
The vias are created in the silicon by a plasma etcher. An RF source ionizes
various gases, and the plasma is controlled by permanent magnets and electromagnets. The
ions in the plasma assist the reactive species in a chemical reaction with the surface of the
wafer, ultimately resulting in a high aspect ratio via. The wafers are then run through a
metal deposition tool which places metal on the wafer under high pressure and temperature.
Chris has developed expertise in running these processes and performing basic maintenance
on the tools, as well as in troubleshooting and repairing the tools when they go down.
The wafer bonding process involved much more experimentation than via design
since the via recipes were already programmed in the tools. Multiple approaches to wafer
bonding were taken. In some cases, polymers were used as a glue to hold the wafers together.
In others, direct bonding was achieved with various types of surface activation. Single die
were also placed on wafers using a programmable, automated placing tool.
Wafers bonded with HD4002 polyimide were patterned with trenches to allow
the polymer to outgas during the bonding process. As the wafers are heated and pressure
is applied, various gases are forced from the polymer and cannot diffuse through the
wafer. If the film has no trenches, the gas will collect in bubbles, destroying the bond
yield. Another material called cyclotene (BCB) does not outgas but, like the polyimide,
cannot easily be dispensed as a thin film on individual chips. Surface activation requires
no additional substance to achieve a bond.
Wet and dry surface activation chemically prepare a surface for bonding.
However, particles are very destructive to this process. An attempt at wet activation with
hydrogen peroxide resulted in far too many particles for practical use, so we developed a
dry activation process with additional measures to control particles. The current process
uses the plasma etcher to bombard the wafer surface with oxygen or argon plasma for 5
minutes. The O2 plasma leaves oxygen atoms on top of the silicon crystal which, along
with hydrogen, can bond to the other wafer. Ar blasts the surface, removing Si atoms
from the top of the wafer so that two such wafers will bond to each other. The wafers are then
brought into contact in a wafer bonding tool. To test the bonds, a scanning acoustic
microscope is used: it detects voids as a change in acoustic impedance, giving a clear picture
of how good the bond is. Using this process, successful bonds were achieved on blank
wafers and on thermal oxide coated wafers with trenches for outgassing (see below).
Figures 3.1 and 3.2 contrast the wet and dry surface activation approaches. Figure 3.1 is a
bonded pair of wafers that were soaked in H2O2. Figure 3.2 is a thermal oxide coated wafer
pair that was activated with an O2 plasma.
Figure 3.3 is a particle controlled Ar plasma activated bond. While the bond area approaches
100%, the measured bond energy was somewhat lower than expected. However, the
bonded pairs survived a grinding process intended to remove all but 15 um of one of the
wafers, which is the critical test for the bond being compatible with the 3D integration
process.
Figures 3.4 and 3.5 show problems encountered on oxide coated wafers after a 24 hour
cure at 200 °C. The oxide has no place to outgas, so the bond process will ultimately fail
unless steps are taken to prevent this problem.
Figure 3.6 is an oxide bonded pair with 1-2 um trenches etched into the oxide. Figure 3.7
shows that the trenches allow the oxide to outgas so that no additional voids form. The
large void was formed by a particle and is not due to the outgassing problem.
3.2 Passive and Active 3DI Test Structures
We are actively pursuing the development of IC test structures to help
determine the benefits and applications of 3DI. This part of the program is being pursued by
Ph.D. candidate Zeynep Dilli, who is being supervised by Neil Goldsman. The test
structures consist of integrated circuits which are designed to measure and investigate
electronic components. We have designed more than ten integrated circuits and had them
fabricated through the MOSIS processing facility. Our first set of chips was developed to help
measure how the parasitic effects of interconnects and bonding pads reduce the
performance of standard planar circuits. We found that the effect of bonding pads can be
enormously detrimental. For the 0.6 micron AMI MOSIS process, the capacitance of the
pad would slow the switching speed of an inverter by approximately a factor of one
thousand (1000). Therefore, much circuit design is devoted toward overcoming this effect
with expensive pad driving circuits. Transforming to 3D ICs would obviate the need for
many of these bonding pads, and therefore could significantly increase circuit speed and
performance. Other test structures include 2D and 3D inductors, transformers, and 3D
capacitors. We have also developed a theory to calculate the inductance of these 3D
structures as a function of frequency. We are currently performing experiments to test
this theory and optimize inductor design. Another test circuit that we are pursuing is an
integrated circuit for communications. Radio frequency (RF) circuits are very susceptible to
noise. As a result, integration of RF and digital electronics remains a challenge. 3D
integration may be the solution to this problem. By separating analog and digital levels in
a 3D circuit, and by placing shielding between the layers, much of the digital interference
that is so detrimental to the analog components of the circuits may be eliminated. To
investigate this, we developed test chips containing phase-locked loops and FSK
transmitters that were subjected to digital interference. We found that significant reductions
in the interference could be achieved by introducing large amounts of on-chip shielding
around the RF component of the ICs. We expect that transforming to 3D ICs will allow
for even more shielding, and increased separation between analog and digital
components, thereby helping to realize robust integration of RF and digital circuits.