Task C: 3D Integration
Neil Goldsman, Bruce Jacob, Martin Peckerar

The component of Task C led by Neil Goldsman covers three subtasks:
• Subtask 1: Modeling and Prototyping Device and Chip Heating for 2D & 3D Integration
• Subtask 2: Modeling and Prototyping for on Chip Electromagnetic Effects
• Subtask 3: Prototyping, Modeling and Processing 3D Structures

1. Subtask 1: Modeling and Prototyping Device and Chip Heating for 2D & 3D Integration

1.1 Introduction:
We report on a novel method for predicting the temperature profile of an integrated circuit at the resolution of a single device. This work has been performed by Akin Akturk and Neil Goldsman. Recently, Latise Parker, a new MS candidate, has begun to participate as well.

As chips are packed with more transistors per unit area, chip manufacturers must cope with several problems to guarantee good chip performance. One of the most important of these problems is full-chip heating. Investigators have pointed out that towards the end of the semiconductor roadmap, there will be more devices per unit area due to the scaling down of physical device dimensions. The resulting real estate crowding will induce high temperatures and temperature gradients on the chip. The increase in device density per chip, as well as higher device capacitance, higher clock speeds and more on-state leakage currents, will give rise to more power dissipation, which will translate into higher on-chip temperatures. This problem can be alleviated somewhat by reducing supply voltages. However, noise margins restrict the level to which supply voltages can be reduced, so increased power dissipation is inevitable. The problems described above for 2D chips are greatly exacerbated for 3D integration, since the circuit surface area is greatly reduced and the generated heat cannot easily escape through surface cooling. Preliminary research has been done to estimate the temperature profile for given chips.
However, we fully address the need for a tool that calculates chip temperatures and establishes the necessary link between single-device operation and full-chip heating for both planar and 3D ICs.

We have developed a new methodology for predicting full-chip heating at the resolution of a single device. On the device level, we first obtain the electrical characteristics of a MOSFET for the given voltage and temperature by self-consistently solving coupled quantum and semiconductor equations. We then solve the system on the chip level, where the thermal coupling between devices is modeled by a lumped circuit-type thermal network. We next obtain the model for the thermal network, which comprises passive thermal elements (thermal resistances and capacitances) and heating sources. From the layout design and spatial considerations, we obtain values for the thermal resistances and capacitances between individual devices, and between a single device and ground. To determine the strength of each heating source (the driving force in the thermal network corresponding to a single device), we extend the results of the individual MOSFET operation to the entire chip by a statistical Monte Carlo-type algorithm. We thus account for application-specific and location-specific effects on full-chip heating while coupling the individual devices to their collective operation. Using our modeling technique, we obtain the effects of power density on full-chip heating and on single-device performance. To achieve efficient chip designs, we also offer solutions for removing heat from the hottest regions of the chip surface. In Fig. 1.1, we show our device and chip levels and their interaction.

Figure 1.1: (Top) Each MOSFET device is modeled by a lumped circuit for chip thermal analysis. (Bottom) Devices and their interaction are shown. Heat flow between devices causes thermal coupling.

1.2.
Device Performance and Full-Chip Heating Model:
As mentioned above, we solve the coupled device performance equations along with the full-chip heating model. To obtain the device performance for the given boundary conditions, we solve the semiconductor equations along with the Schrödinger equation. We next solve the lumped thermal network for the full chip. Here we first elaborate on the device model and then on the thermal network.

1.2.1. Device Performance Model
We developed a quantum device solver based on the quantum and semiconductor equations. We list these device equations below, starting from the Schrödinger equation (1), followed by the Poisson, electron current continuity and hole current continuity equations (2)-(6), and the lattice heat flow equation (7). In addition, we have one more equation, which we call the population equation (8), that gives the density of electrons in the channel by summing contributions from different subbands.

$$E\,\psi(y) = -\frac{\hbar^2}{2m^*}\frac{d^2\psi(y)}{dy^2} - q\,\phi(x,y)\,\psi(y) \tag{1}$$

$$\nabla^2\phi = -\frac{q}{\epsilon}\,(p - n + D) \tag{2}$$

$$\frac{\partial n}{\partial t} = \frac{1}{q}\nabla\cdot J_n + GR_n \tag{3}$$

$$\frac{\partial p}{\partial t} = -\frac{1}{q}\nabla\cdot J_p + GR_p \tag{4}$$

$$J_n = -q\,n\,\mu_n\nabla\left(\phi + \phi_{QM} + \phi_{HS}\right) + \mu_n kT\,\nabla n \tag{5}$$

$$J_p = -q\,p\,\mu_p\nabla\left(\phi + \phi_{QM} + \phi_{HS}\right) - \mu_p kT\,\nabla p \tag{6}$$

$$C\frac{\partial T}{\partial t} = \nabla\cdot(\kappa\nabla T) - (J_n + J_p)\cdot\nabla\phi \tag{7}$$

$$n = \frac{m^* kT}{\pi\hbar^2}\sum_i |\psi_i|^2 \ln\left(1 + e^{(E_F - E_i)/kT}\right) \tag{8}$$

The symbols $\phi$, $E$, $E_F$, $\psi$, $n$ and $p$, $J_n$ and $J_p$, $T$, $D$, $GR_n$ and $GR_p$, $C$ and $\kappa$ denote the electrostatic potential, wave energy, Fermi level, wave function, electron and hole concentrations, electron and hole current densities, lattice temperature, net dopant concentration, electron and hole net generation-recombination rates, heat capacity and thermal diffusion constant, respectively. The other parameters have the usual meaning. In Eqn. (1), x is parallel to the Si-SiO2 interface in the MOSFET, and y is normal to x and points toward the substrate.

We solve the device equations numerically, starting from the Schrödinger equation, which is used to resolve the confinement effects in the MOSFET channel near the Si-SiO2 interface. We solve for the eigenenergies and eigenfunctions of the Schrödinger equation to obtain the wave functions.
We then solve the Poisson equation for the electrostatic potential $\phi$. We next add the effects of electron transport in the channel by modifying the electron concentration through the electron current continuity equation. We also solve for the hole transport through the hole current continuity equation. Last, we ascertain that the Fermi level complies with the calculated electron concentration and the wave functions.

Here we discuss the heat flow equation and the ways to embed temperature dependencies into these equations. To obtain the non-isothermal device performance, we solve the differential heat flow equation (7) and let the values of some simulation parameters change according to the lattice temperature between iterations. The variables that are explicitly varied with temperature are written below, in the order of thermal voltage, intrinsic carrier concentration, electron and hole mobilities, electron and hole saturation velocities, built-in potentials and the bandgap of silicon:

$$V_{TH}(T) = V_{TH}(T_o)\,\frac{T}{T_o} \tag{9}$$

$$n_o(T) = n_o(T_o)\left(\frac{T}{T_o}\right)^{1.5}\exp\left[-\frac{E_g(T)}{2kT}\left(1 - \frac{T}{T_o}\,\frac{E_g(T_o)}{E_g(T)}\right)\right] \tag{10}$$

$$\mu(T) = \mu(T_o)\left(\frac{T}{T_o}\right)^{-2.5} \tag{11}$$

$$\upsilon_{sat}(T) = \upsilon_{sat}(T_o)\,\frac{1 + e^{1/2}}{1 + e^{T/2T_o}} \tag{12}$$

$$\phi_{built\text{-}in}(T) = V_{TH}(T)\,\ln\left(\frac{n}{n_o(T)}\right) \tag{13}$$

$$E_g(T) = E_g(T_o)\left[1 - 2.4\times10^{-4}\,(T - T_o)\right] \tag{14}$$

Here T is the lattice temperature in kelvins. We assume that thermal equilibrium is established between the carriers and the lattice, so the electron and hole temperatures are also equal to T. $T_o$ is the reference lattice temperature, taken to be 300 K for this work. The room-temperature values of these parameters are approximately 0.0258 V for $V_{TH}$, $1.45\times10^{10}$ cm$^{-3}$ for $n_o$, $0.7\times10^{7}$ cm/s for $\upsilon_{sat}$, and 1.12 eV for $E_g$. Values of $\mu$ and $\phi_{built\text{-}in}$ are not given because they also depend on other parameters not specified here. Eqn. (9) indicates that the thermal voltage changes linearly with temperature. The bandgap of silicon in Eqn. (14) also varies linearly with temperature. However, carrier mobilities in Eqn. (11) vary by a power law relation.
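The temperature scaling relations above lend themselves to a short numerical sketch. The Python below is an illustration only, not the authors' solver: the function names are hypothetical, the Boltzmann constant is the standard value in eV/K, the thermal voltage at 300 K is taken as 0.0258 V, and the mobility exponent of -2.5 assumes that mobility falls with temperature, as the surrounding text states.

```python
import math

K_B = 8.617333e-5   # Boltzmann constant in eV/K (standard value)
T_REF = 300.0       # reference lattice temperature T_o in K


def thermal_voltage(T, vth_ref=0.0258):
    """Eqn. (9): thermal voltage in volts, linear in T."""
    return vth_ref * T / T_REF


def bandgap(T, eg_ref=1.12):
    """Eqn. (14): silicon bandgap in eV, linear in T."""
    return eg_ref * (1.0 - 2.4e-4 * (T - T_REF))


def intrinsic_concentration(T, n_ref=1.45e10):
    """Eqn. (10): intrinsic carrier concentration in cm^-3."""
    # Expanded exponent: Eg(To)/(2 k To) - Eg(T)/(2 k T)
    exponent = bandgap(T_REF) / (2.0 * K_B * T_REF) - bandgap(T) / (2.0 * K_B * T)
    return n_ref * (T / T_REF) ** 1.5 * math.exp(exponent)


def mobility_scale(T, power=-2.5):
    """Eqn. (11): mobility relative to its value at T_o (assumed exponent sign)."""
    return (T / T_REF) ** power


if __name__ == "__main__":
    for T in (300.0, 350.0, 400.0):
        print(T, thermal_voltage(T), bandgap(T),
              intrinsic_concentration(T), mobility_scale(T))
```

Running the loop for 300 K, 350 K and 400 K shows the contrast drawn in the text: the thermal voltage and bandgap change by a few percent, the mobility factor follows a power law, and the intrinsic concentration grows by orders of magnitude.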
Intrinsic carrier concentration and saturation velocity have the strongest dependence on temperature, which is exponential in nature. It would therefore appear that they have the strongest influence on device performance. However, investigations have shown that although that may be the case in pn junctions, it is not the case in MOSFETs unless the temperature rises so high that the gate loses control over the channel because of the abundance of intrinsic carriers available for transport. Analyses show that non-isothermal MOSFET operation is affected mostly through the carrier mobilities, saturation velocities and built-in boundary potentials. As temperature increases, current decreases due to the reduction in mobilities, and increases slightly due to the saturation velocity and built-in boundary potentials (temperature effectively lowers the threshold voltage). The net effect is that current decreases with increasing temperature over the moderate temperatures that make up the operating range of most of today's devices. However, at high temperatures, such as 100 K above the ambient, the intrinsic carrier concentration may play a leading role, and the MOSFET might run into a condition much like thermal runaway in pn junctions.

Analysis shows that the temperature variation within a bulk MOSFET, where the interior is a few kelvins hotter than the boundaries, is not large enough to cause major effects on charge transport and power consumption. We therefore adopt the following algorithm for updating the values of the temperature-dependent variables written above. We set the temperature at the boundaries to a constant value. This value can be either room temperature (the ambient temperature for the chip) or a higher temperature. If the device is isolated from the chip, or the chip has just started operating so that all devices sit at the ambient level, we use room temperature for the boundaries.
However, the temperature boundaries for a device near the center of the chip are higher due to heating of the chip. Furthermore, we either fix all temperature boundaries at the source, drain and substrate contacts, or fix some and let the others float. We observe that all cases give similar results because the temperature variation in the channel is small. Additionally, we let heat flow in the lateral direction within the bulk silicon, and assume that there is no heat flow in the SiO2 because its thermal diffusion constant is one hundred times smaller than that of silicon. We fix the values of the thermal voltage, intrinsic carrier concentration, saturation velocity, built-in boundary voltages and bandgap of silicon according to the given boundary temperature. We then solve the semiconductor equations along with the quantum equations iteratively. We also update the values of the mobilities and the thermal diffusion constant at each iteration (the temperature dependency of the thermal diffusion constant is discussed below). We thus solve the semiconductor and quantum equations self-consistently, and ascertain that the calculated carrier mobilities satisfy transport considerations.

While we are solving the semiconductor equations, we also solve the differential heat flow equation. The heat flow equation provides the coupling between the lattice temperature and state variables such as the current density and electric field. We repeat the heat flow equation here for easy reference:

$$C\frac{\partial T}{\partial t} = \nabla\kappa\cdot\nabla T + \kappa\nabla^2 T + H \tag{15}$$

Here, H is the heating term, and we let the value of the thermal diffusion constant $\kappa$ vary with the doping level and the lattice temperature:

$$\kappa(T) = \frac{\kappa(T_o)}{1 + \dfrac{D}{2.8\times10^{19}}}\left(\frac{T}{T_o}\right)^{-4/3} \tag{16}$$

We set the value of $\kappa(T_o)$ to 1.5 W/(K·cm) for silicon at room temperature. We substitute Eqn. (16) into Eqn. (15), and solve for the lattice temperature for the given current densities and electrostatic potential. Here the $J\cdot\nabla\phi$ term can be recognized as the Joule heating.
Analysis shows that Joule heating contributes almost all the heating. As mentioned before, while we are solving the semiconductor and quantum equations, we also solve the differential heat flow equation. We then update the values of the mobilities and thermal diffusion constant until all equations and boundary conditions are satisfied.

1.2.2. Full-Chip Heating Model
We obtain the temperature map of the full chip by solving the heat flow equation. We transform the differential heat flow equation given in Eqns. (7) and (15) into a lumped heat flow equation. We do this to overcome the difficulties introduced by the difference in scale between a single device and the full chip, whose dimensions are thousands of times larger than the corresponding dimensions of a MOSFET. Using the differential heat flow equation for the full chip would require too many mesh points and is not practical for our simulation case.

We first use the following transformation to ease the numerical solution, including the temperature dependency of the thermal diffusion constant:

$$\bar T = T_o + \frac{1}{\kappa(T_o)}\int_{T_o}^{T}\kappa(\tau)\,d\tau \tag{17}$$

The differential heat flow equation in terms of the averaged temperature $\bar T$ then becomes:

$$C\frac{\partial \bar T}{\partial t} = \kappa(T_o)\,\nabla^2\bar T + H(T) \tag{18}$$

Here the heating is in terms of the actual temperature T. If we assume that the heat capacity has the same temperature dependency as the thermal diffusion constant, then C is not a function of $\bar T$. In addition, the temperature T can be written in terms of $\bar T$ as follows:

$$T = T_o\left[1 - \frac{\bar T - T_o}{3T_o}\right]^{-3} \tag{19}$$

We then integrate Eqn. (18) over our unit block, a single MOSFET device:

$$\int_V C\frac{\partial \bar T}{\partial t}\,dV - \kappa_o\oint_S \nabla\bar T\cdot dS = \int_V H\,dV \tag{20}$$

We enclose the MOSFET in a rectangular prism. Here V and S are the volume and the six faces of that prism, respectively. We note that heat flows in the direction of decreasing temperature, so $-\kappa_o\nabla\bar T$ represents the heat flux. We take the time derivative of the temperature as constant within the volume and the space derivative as constant on each face. Taking the integrals in Eqn.
(20) we obtain:

$$CV\frac{\partial \bar T}{\partial t} + \kappa_o\sum_{f=1}^{6}\frac{\Delta\bar T_f}{l_f}\,S_f = \int_V H\,dV \tag{21}$$

Here $l_f$ and $\Delta\bar T_f$ are the distance and the temperature difference between the centers of adjacent prisms along the normal to one of the six faces $S_f$, with the difference taken as the temperature of the prism minus that of its neighbor. $\partial\bar T/\partial t$ is the temperature variation at the mid-point of that prism. The expression in Eqn. (21) is analogous to a KCL-type nodal equation, where the terms on the left-hand side are the capacitive and resistive components of the network, while the right-hand side is the source term, like a current source in the KCL network. Taking $\bar T$ as analogous to voltage, we can write the equivalent thermal capacitance and resistances as follows:

$$C^{th} = CV \tag{22}$$

$$R^{th}_f = \frac{l_f}{\kappa_o S_f} \tag{23}$$

One capacitive and six resistive components connect the device to other devices and to ground. We determine the values of the thermal resistances and capacitance from the layout design and geometrical considerations. We then use Eqn. (21) to solve for the averaged temperature for the calculated resistances, capacitances and source term. We obtain the Joule heating from MOSFET simulations using the actual temperature, as described in the previous section. As a reminder, we obtain the MOSFET performance for given boundary conditions and then extend these results to the chip surface. We get the Joule heating for each device by an MC-type methodology. In the following section we elaborate on the coupled device and full-chip system.

1.2.3. Coupled Device and Full-Chip Heating Model:
We solve the device equations self-consistently along with the full-chip heating equations. The solution requires convergence at both the device level and the chip level. The device level involves a coupled solution of the semiconductor and quantum equations for the given voltage bias and temperature conditions. We calculate the Joule heating due to a single device using the actual temperature boundaries.
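Returning briefly to the lumped elements of Eqns. (22) and (23) and to the averaged temperature of Eqns. (17) and (19), the following Python sketch shows how these quantities might be evaluated. It is illustrative only: the function names are hypothetical, the closed form of the forward transform assumes the $(T/T_o)^{-4/3}$ temperature dependence of Eqn. (16) at fixed doping, and any geometry passed in must use consistent units.

```python
T_REF = 300.0  # reference lattice temperature T_o in K


def averaged_temperature(T, T0=T_REF):
    """Eqn. (17) in closed form, assuming kappa(T) scales as (T/T0)**(-4/3)."""
    return T0 + 3.0 * T0 * (1.0 - (T0 / T) ** (1.0 / 3.0))


def actual_temperature(T_avg, T0=T_REF):
    """Eqn. (19): recover the actual temperature from the averaged one."""
    return T0 * (1.0 - (T_avg - T0) / (3.0 * T0)) ** -3


def thermal_capacitance(heat_capacity, volume):
    """Eqn. (22): C_th = C * V."""
    return heat_capacity * volume


def thermal_resistance(length, kappa_ref, face_area):
    """Eqn. (23): R_th = l_f / (kappa_o * S_f)."""
    return length / (kappa_ref * face_area)


if __name__ == "__main__":
    T = 370.0
    T_avg = averaged_temperature(T)
    print(T_avg, actual_temperature(T_avg))  # second value returns to 370 K
```

The round trip through both transforms returns the starting temperature, which is a quick consistency check between Eqns. (17) and (19) under the stated assumption.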
We then use the solution of the device performance equations to obtain the distributed performance of devices throughout the whole chip by using the application and operation statistics. We then shrink each unit to a single node on our thermal network and solve a lumped KCL-type thermal network for the nodal voltages. In the electrical analogy, nodal voltages correspond to averaged temperatures, so we obtain the averaged temperature of each device on the chip. We then compare these temperatures with the ones used for obtaining the performance figures. We repeat the process iteratively until we achieve convergence at the device and chip levels. Here we elaborate on the details of the mixed-mode solution.

For our representative simulation, we use the devices and layout of a Pentium processor. As our representative device, we use a well-tempered MOSFET with 0.4 μm width and 90 nm effective gate length (0.13 μm physical gate length). We obtain the doping profile of our device by using the analytical expression for a 50 nm device and stretching the appropriate dimensions. We first solve the device equations for this representative device on the chip. The representative device is chosen as the one that sits at the median temperature of the chip. During simulation we update the temperature-dependent device parameters according to that temperature. We obtain the median temperature of the entire chip after we solve the lumped thermal network. Initially, this temperature is equal to the ambient temperature since no device is working yet. During the mixed-mode simulation, we update this temperature between iterations until convergence. We also decide on average voltage bias conditions for that device. We use a value of 1.5 V for the gate-to-source and drain-to-source biases, which are reasonable average voltage conditions during inverter switching.
We also take the device as “On” for ten percent of the period, which is used as the weighting factor for power when determining steady-state conditions. We then obtain the Joule heating of that device for these bias conditions and pass the calculated value to the lumped thermal network, where it is used as the current source. For an isolated device that is one of forty million devices in a square centimeter area, we find the power density to be 1000 W/cm². This is larger than the power density of a Pentium III, but collective device behavior has not been considered yet.

We next solve for the nodal voltages of the $R^{th}C^{th}$ thermal network. These nodal voltages are equal to the averaged device temperatures in the electrical analogy. Before solving the thermal network, we first need to construct it in conjunction with the layout design, geometrical considerations and device performance. We replace each device on the chip with a node that has a current source, a thermal capacitance and five thermal resistances. The current source, the capacitance and one of the resistances are between the given node and ground, and the other resistances are between nodes. Device performance, along with an MC-type methodology that reflects the non-uniformity between device operations, determines the value of the current source. The values of the thermal resistances and capacitances, however, are obtained from the layout design and geometrical arguments. We roughly estimate that there are forty million devices in an area of one square centimeter. Furthermore, we use the package configuration for a Pentium 4 processor, which consists of the die, heat spreader and package. Our calculations yield 200 K/W for the mutual resistances and $2.0\times10^{6}$ K/W for the resistance connected to ground. We use the same values for all the devices, assuming that they are uniformly distributed on the surface.
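The network just described can be sketched at small scale. The Python below is an illustration under stated assumptions, not the authors' code: it solves only the steady state (no capacitances) on a tiny grid, treats the chip edge as insulating, and uses a plain conjugate-gradient iteration in place of the size reduction and solver used in this work. The function names are hypothetical, the 200 K/W and 2.0e6 K/W defaults are the values quoted above, and the per-device powers in the usage lines are made up.

```python
def apply_network(T, nx, ny, r_mutual, r_ground):
    """Net heat leaving each node: T/R_ground plus flow to the four neighbours."""
    out = [0.0] * (nx * ny)
    for i in range(ny):
        for j in range(nx):
            k = i * nx + j
            flow = T[k] / r_ground
            for ii, jj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if 0 <= ii < ny and 0 <= jj < nx:   # insulating chip edge (assumption)
                    flow += (T[k] - T[ii * nx + jj]) / r_mutual
            out[k] = flow
    return out


def solve_thermal_grid(power, nx, ny, r_mutual=200.0, r_ground=2.0e6,
                       tol=1e-10, max_iter=1000):
    """Steady-state temperature rise (K) per node for heat sources `power` (W)."""
    T = [0.0] * (nx * ny)
    r = list(power)                     # residual for the zero initial guess
    p = list(r)
    rs = sum(v * v for v in r)
    b_norm2 = rs
    if rs == 0.0:
        return T
    for _ in range(max_iter):
        Ap = apply_network(p, nx, ny, r_mutual, r_ground)
        alpha = rs / sum(a * b for a, b in zip(p, Ap))
        T = [t + alpha * pv for t, pv in zip(T, p)]
        r = [rv - alpha * a for rv, a in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        if rs_new <= tol * tol * b_norm2:
            break
        p = [rv + (rs_new / rs) * pv for rv, pv in zip(r, p)]
        rs = rs_new
    return T


if __name__ == "__main__":
    nx = ny = 5
    hot = [0.0] * (nx * ny)
    hot[2 * nx + 2] = 1.0e-3            # hypothetical single hot device, in W
    print(solve_thermal_grid(hot, nx, ny))
```

With identical sources on every node the lateral flows cancel and each node settles at its power times the ground resistance. A single hot node produces a peak at that node with heat spreading to its neighbours through the much smaller mutual resistances.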
Once we have the values of the resistances and sources for all the nodes, we obtain the temperature that corresponds to each node or device by solving forty million KCL-type equations, one for each node (i,j):

$$C^{th}_{i,j}\,\frac{\bar T^{k}_{i,j} - \bar T^{k-1}_{i,j}}{\Delta t} + \frac{\bar T^{k}_{i,j}}{R^{th}_{i,j}} + \sum_{\pm}\frac{\bar T^{k}_{i,j} - \bar T^{k}_{i\pm1,j}}{R^{th}_{i\pm\frac{1}{2},j}} + \sum_{\pm}\frac{\bar T^{k}_{i,j} - \bar T^{k}_{i,j\pm1}}{R^{th}_{i,j\pm\frac{1}{2}}} = I^{k}_{i,j}\left(T^{k-1}_{i,j}\right) \tag{24}$$

Here $R^{th}_{i+\frac{1}{2},j}$ gives the resistance between nodes (i,j) and (i+1,j). To solve the KCL equations, we first reduce the size of the system [19] by using a Thevenin equivalent circuit on a subblock of twelve by twelve nodes. At each side of the block, we introduce new nodes that are half a resistance away from the boundary nodes. We then separately short the new nodes introduced on each of the four sides. The size reduction and the formation of new nodes are shown in Fig. 1.2. We next solve the system for the nodal temperatures by a bilateral conjugate gradient method.

Figure 1.2: Size reduction methods are applied on a subblock of five by five. We obtain a four-port Thevenin representation of each block and use that representation instead, as shown at the bottom of the figure.

1.3.1 Application and Results
So far, we have discussed the device and the full-chip heating models. The solution at the device level determines the current (heat) sources in the thermal network. We then solve the thermal network and obtain the value of the median temperature, which in turn is the input to the device solution. The feedback from the thermal network to the device is straightforward because we solve for just one device with one temperature boundary. However, the relation is more complicated in going from the device to the chip level, because each device on the chip operates under different conditions that are affected by the running applications and the chip design. We therefore extend our device results to the entire chip using an MC-type methodology to account for the application and operation statistics of the chip.
To achieve this, we divide the chip into functional blocks and group them, such as cache, floating point unit, execution unit, etc. In this work, we use the die photo of the Pentium III shown in Fig. 1.3.

Figure 1.3: Pentium III processor die photo for 0.18 micron technology.

We obtain the percentage of the total power consumed by each block, as shown in Table 1.1. We renormalize those percentage powers and add the instruction decode unit and miscellaneous powers. We then distribute the miscellaneous power to the other units in proportion to their areas. We next normalize these percentage powers by the corresponding areas of each unit. We thus obtain an estimate of the likelihood of finding an active device in that unit relative to the others.

Table 1.1: Percentage areas and powers of functional blocks in a Pentium III chip.

Pentium III Unit             Percentage Area   Percentage Power   Normalized Power/Area
Clock (CLK)                        1.0                5.2                 1.0
Issue Logic (ISL)                  9.5               14.1                 0.29
Memory Order Buffer (MOB)          3.3                4.7                 0.28
Register Alias Table (RAT)         3.3                4.7                 0.28
Bus Interface Unit (BIU)           4.3                5.9                 0.27
Execution Unit (EU)                9.5               13.0                 0.26
Fetch                             12.5               16.9                 0.26
Decode Unit (DU)                  14.6               17.2                 0.23
L1 Data Cache (L1C)               12.5                9.8                 0.15
L2 Data Cache (L2C)               29.8                8.5                 0.05

In Table 1.1, we note that the clock has the highest likelihood while the L2 data cache has the lowest. Furthermore, we take a uniformly distributed number between 0.5 and 1.0 as corresponding to an “On” state in terms of device operation. Likewise, the “Off” state is represented by a number between 0 and 0.5. We then assume that a device in the clock unit is always “On”, so that it is always associated with a number between 0.5 and 1.0. We next use the normalized power ratios to weight the “On” probabilities of each unit. For example, a device in the fetch block, whose normalized power is 0.26, is represented by a number between 0.5 and 1.0 (“On” state) 26 out of 100 times.
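The weighting procedure described above can be sketched as follows. This Python is one possible reading of the text rather than the authors' implementation: the function names and unit keys are hypothetical, the area and power percentages are those of Table 1.1, and the random number generator is Python's standard one with an arbitrary seed.

```python
import random

# (percentage area, percentage power) per functional block, from Table 1.1
UNITS = {
    "CLK": (1.0, 5.2), "ISL": (9.5, 14.1), "MOB": (3.3, 4.7), "RAT": (3.3, 4.7),
    "BIU": (4.3, 5.9), "EU": (9.5, 13.0), "Fetch": (12.5, 16.9),
    "DU": (14.6, 17.2), "L1C": (12.5, 9.8), "L2C": (29.8, 8.5),
}


def normalized_power_per_area(units):
    """Power share divided by area share, scaled so the densest block equals 1."""
    density = {name: power / area for name, (area, power) in units.items()}
    peak = max(density.values())
    return {name: value / peak for name, value in density.items()}


def sample_activity(weight, rng):
    """Return a number in [0.5, 1.0] ("On") with probability `weight`,
    otherwise a number in [0, 0.5) ("Off")."""
    if rng.random() < weight:
        return rng.uniform(0.5, 1.0)
    return rng.uniform(0.0, 0.5)


if __name__ == "__main__":
    weights = normalized_power_per_area(UNITS)
    rng = random.Random(0)              # arbitrary seed for repeatability
    for name, w in weights.items():
        draws = [sample_activity(w, rng) for _ in range(10000)]
        on_fraction = sum(d >= 0.5 for d in draws) / len(draws)
        print(f"{name:6s} weight={w:.2f} on_fraction={on_fraction:.2f}")
```

Dividing the power column by the area column and scaling to the clock reproduces the last column of Table 1.1 to within rounding, and the sampled "On" fraction of each block tracks its weight.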
We repeat this procedure for each functional unit and determine a representative number for each device in each unit. We then weight the calculated Joule heating of the simulated device by these probabilities to obtain the strength of the current source that corresponds to each device on the entire chip. This provides the feedback in the direction from the device to the entire chip. We then obtain the full-chip heating in conjunction with device operations. For convergence, we ascertain that the median temperature calculated by the thermal network gives the same device performance for each device, or equivalently that the heat current calculated for each device gives the same median temperature used before. We summarize our algorithm in Fig. 1.4.

Input: Device Configuration; RC Thermal Network; Application Related Activity Profile
Device Level: Solve Non-Isothermal DD-QM Eqns.; Obtain Joule Heating
Chip Level: Solve RthCth Thermal Network; Obtain Temperature Profile
Output: Device Characteristics; Full-Chip Temperature Profile
Figure 1.4: Coupled algorithm flowchart.

In Fig. 1.5, we show temperature-dependent device performance characteristics. For an nMOSFET, as temperature increases, current decreases in the linear and saturation regions. However, the current-voltage characteristics differ under high-temperature conditions, where current increases as temperature increases. This results in positive feedback and thermal instability, a situation that chip designers want to avoid. Here we offer methods to predict temperature maps of different IC designs corresponding to different running applications.

Figure 1.5: Temperature dependent current-voltage characteristics of a 0.1 μm nMOSFET for VGS = 0.7, 1.0, 1.5 V. As temperature increases, current decreases.

In Fig. 1.6, we show the functional blocks used for the Pentium III chip. The percentage areas and powers of each block are listed in Table 1.1.
The clock occupies the smallest area on the chip, while the L2 cache has the largest area. On the other hand, the clock has the highest normalized power and the L2 cache has the lowest. A device in the clock unit therefore operates much more frequently than a device in the L2 cache. This shows up as the highest and lowest temperatures occurring in the clock and the L2 cache, respectively. We also show our calculated temperature profile for the Pentium III in Fig. 1.6. The temperature reaches a peak of seventy degrees above the ambient, while the median and lowest temperatures are thirty and twenty degrees above the outside temperature. These numbers are in agreement with maximum tolerable temperatures of 80 and 90 degrees above the ambient for Pentium III processors. This temperature profile can be used to relieve problems related to hot spots on the chip, by suggesting ways to rearrange the spatial distribution of functional units and to use thermal contacts with direct connections to the problematic areas.

Figure 1.6: a) Functional blocks of the Pentium III chip: the clock has the smallest area but the largest normalized power, whereas the L2 cache has the largest area but the smallest normalized power, as shown in Table 1.1. b) Our calculated temperature map for the Pentium III reaches a peak in the clock block (seventy degrees above the ambient) and has the lowest temperature plateau in the L2 cache (thirty degrees above the ambient).

2. Subtask 2: Modeling and Prototyping for on Chip Electromagnetic Effects

2.1 Introduction
We introduce a time-domain method to simulate digital signal propagation along on-chip interconnects by solving Maxwell’s equations with the Alternating-Direction-Implicit (ADI) method. With this method, we are able to resolve the large-scale (i.e. on-chip electromagnetic wave propagation) and fine-scale (i.e. skin depth and substrate current) structure in the same simulation, and the simulation time step is not limited by the Courant condition.
The simulations allow us to calculate in detail the parasitic current flow inside the substrate, propagation losses, skin depth, and dispersion of digital signals on non-ideal interconnects. We have found considerable substrate currents and losses that depend on the substrate doping. So far, most of our applications have been for planar chips. Over the next several quarters, we plan to adapt the method for 3D chips as well. MS candidate Xi Shao has contributed significantly to this work under the supervision of Prof. Neil Goldsman.

Inductive and capacitive coupling and resistive losses in interconnects and substrates are significant barriers in the development of high-speed digital and analog ICs. Accurate modeling of modern on-chip interconnects (including coupling and losses) usually requires a full-wave solution to Maxwell’s equations. However, such a solution is difficult because the wavelengths of interest are much larger than the fine topological structure of ICs. (Wavelengths are typically on the mm to cm scale, while chip structures are on the micron scale.) In addition, digital and mixed (broadband) signal applications often require analysis in the time domain. Conventional Maxwell solvers typically use the explicit Finite-Difference-Time-Domain (FDTD) method. However, the conventional method is limited by the Courant condition

$$\Delta t \le \frac{1}{c\,\sqrt{1/\Delta x^2 + 1/\Delta y^2 + 1/\Delta z^2}},$$

which requires prohibitively small time steps to resolve fine structure on the submicron scale. To overcome this problem, we have applied the Alternating-Direction-Implicit (ADI) method to solve Maxwell’s equations in ICs, which removes the Courant limit. We have used the method to model the Metal-Insulator-Semiconductor-Substrate (MISS) structure.
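To give a sense of scale for the Courant limit, the Python sketch below evaluates the bound for a given set of grid spacings. It is illustrative only: the function name is hypothetical, the vacuum speed of light is used for the wave speed (a slower wave in a dielectric would relax the bound somewhat), and the 0.1 µm and 150 µm spacings are the ones quoted for the verification run in Section 2.3.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def courant_limit(spacings, speed=C0):
    """Largest stable explicit-FDTD time step for the given grid spacings (m)."""
    return 1.0 / (speed * math.sqrt(sum(1.0 / d ** 2 for d in spacings)))


if __name__ == "__main__":
    dt_max = courant_limit([0.1e-6, 150e-6])   # finest cross-section cell, coarse z cell
    print(dt_max)                               # about 3.3e-16 s
    print(2e-13 / dt_max)                       # ratio of the quoted ADI step to this bound
```

The finest spacing dominates the bound, which is why a single 0.1 µm cell forces a sub-femtosecond explicit step even when the grid along the propagation direction is coarse.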
2.2 Simulation Method
In the ADI method, Maxwell’s equations (1) are discretized on conventional Yee staggered grids, with the electric field at the centers of the grid cell edges and the magnetic field at the centers of the grid cell faces. In this way, the zero divergence of the magnetic field is maintained throughout the simulation.

$$\nabla\times H = \frac{\partial D}{\partial t} + J, \qquad \nabla\times E = -\frac{\partial B}{\partial t}, \qquad B = \mu H, \quad D = \varepsilon E, \quad J = \sigma E \tag{1}$$

At each step, by manipulating Maxwell’s equations, we transform the differential equations into a system of tri-diagonal algebraic equations. Here, we give an example of discretizing the Ex component (equations (2)-(7)) during the two alternating steps. In step 1, the first half (Bz) of the right-hand side of equation (2) and the first half (Ex) of the right-hand side of equation (3) are treated implicitly. We substitute equation (3) for $B^{n+1}_{z,(i+1/2,\,j\pm1/2,\,k)}$ back into equation (2) and obtain the tri-diagonal equation (4). For the other two components (Ey and Ez), we perform similar manipulations to form a system of tri-diagonal equations, which can be easily solved with a tri-diagonal matrix solver. The magnetic field is updated using equations similar to equation (3). In the next step, we treat the other half (By) implicitly in equation (5), and Ex implicitly in equation (6). We obtain the tri-diagonal system in equation (7) for Ex. Similarly, we can obtain the other two tri-diagonal systems for Ey and Ez and solve them to update the electric field. The magnetic field is updated with equations similar to equation (6). These two steps are alternated thereafter.
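STEP 1 (for Ex and Bz component):

$$\frac{E^{n+1}_{x,(i+\frac12,j,k)} - E^{n}_{x,(i+\frac12,j,k)}}{\Delta t} = \frac{1}{\mu\varepsilon}\,\frac{B^{n+1}_{z,(i+\frac12,j+\frac12,k)} - B^{n+1}_{z,(i+\frac12,j-\frac12,k)}}{\Delta y} - \frac{1}{\mu\varepsilon}\,\frac{B^{n}_{y,(i+\frac12,j,k+\frac12)} - B^{n}_{y,(i+\frac12,j,k-\frac12)}}{\Delta z} - \frac{\sigma}{\varepsilon}\,E^{n+1}_{x,(i+\frac12,j,k)} \tag{2}$$

$$\frac{B^{n+1}_{z,(i+\frac12,j+\frac12,k)} - B^{n}_{z,(i+\frac12,j+\frac12,k)}}{\Delta t} = \frac{E^{n+1}_{x,(i+\frac12,j+1,k)} - E^{n+1}_{x,(i+\frac12,j,k)}}{\Delta y} - \frac{E^{n}_{y,(i+1,j+\frac12,k)} - E^{n}_{y,(i,j+\frac12,k)}}{\Delta x} \tag{3}$$

$$a\,E^{n+1}_{x,(i+\frac12,j-1,k)} + b\,E^{n+1}_{x,(i+\frac12,j,k)} + c\,E^{n+1}_{x,(i+\frac12,j+1,k)} = d \tag{4}$$

where

$$a = c = -\frac{1}{\mu\varepsilon}\frac{\Delta t^2}{\Delta y^2}, \qquad b = 1 + \frac{\sigma\Delta t}{\varepsilon} + \frac{2}{\mu\varepsilon}\frac{\Delta t^2}{\Delta y^2},$$

$$d = E^{n}_{x,(i+\frac12,j,k)} - \frac{1}{\mu\varepsilon}\frac{\Delta t}{\Delta z}\left(B^{n}_{y,(i+\frac12,j,k+\frac12)} - B^{n}_{y,(i+\frac12,j,k-\frac12)}\right) + \frac{1}{\mu\varepsilon}\frac{\Delta t}{\Delta y}\left(B^{n}_{z,(i+\frac12,j+\frac12,k)} - B^{n}_{z,(i+\frac12,j-\frac12,k)}\right) - \frac{1}{\mu\varepsilon}\frac{\Delta t^2}{\Delta y}\left(\frac{E^{n}_{y,(i+1,j+\frac12,k)} - E^{n}_{y,(i,j+\frac12,k)}}{\Delta x} - \frac{E^{n}_{y,(i+1,j-\frac12,k)} - E^{n}_{y,(i,j-\frac12,k)}}{\Delta x}\right)$$

STEP 2 (for Ex and By component):

$$\frac{E^{n+2}_{x,(i+\frac12,j,k)} - E^{n+1}_{x,(i+\frac12,j,k)}}{\Delta t} = \frac{1}{\mu\varepsilon}\,\frac{B^{n+1}_{z,(i+\frac12,j+\frac12,k)} - B^{n+1}_{z,(i+\frac12,j-\frac12,k)}}{\Delta y} - \frac{1}{\mu\varepsilon}\,\frac{B^{n+2}_{y,(i+\frac12,j,k+\frac12)} - B^{n+2}_{y,(i+\frac12,j,k-\frac12)}}{\Delta z} - \frac{\sigma}{\varepsilon}\,E^{n+2}_{x,(i+\frac12,j,k)} \tag{5}$$

$$\frac{B^{n+2}_{y,(i+\frac12,j,k+\frac12)} - B^{n+1}_{y,(i+\frac12,j,k+\frac12)}}{\Delta t} = \frac{E^{n+1}_{z,(i+1,j,k+\frac12)} - E^{n+1}_{z,(i,j,k+\frac12)}}{\Delta x} - \frac{E^{n+2}_{x,(i+\frac12,j,k+1)} - E^{n+2}_{x,(i+\frac12,j,k)}}{\Delta z} \tag{6}$$

$$a\,E^{n+2}_{x,(i+\frac12,j,k-1)} + b\,E^{n+2}_{x,(i+\frac12,j,k)} + c\,E^{n+2}_{x,(i+\frac12,j,k+1)} = d \tag{7}$$

where

$$a = c = -\frac{1}{\mu\varepsilon}\frac{\Delta t^2}{\Delta z^2}, \qquad b = 1 + \frac{\sigma\Delta t}{\varepsilon} + \frac{2}{\mu\varepsilon}\frac{\Delta t^2}{\Delta z^2},$$

$$d = E^{n+1}_{x,(i+\frac12,j,k)} + \frac{1}{\mu\varepsilon}\frac{\Delta t}{\Delta y}\left(B^{n+1}_{z,(i+\frac12,j+\frac12,k)} - B^{n+1}_{z,(i+\frac12,j-\frac12,k)}\right) - \frac{1}{\mu\varepsilon}\frac{\Delta t}{\Delta z}\left(B^{n+1}_{y,(i+\frac12,j,k+\frac12)} - B^{n+1}_{y,(i+\frac12,j,k-\frac12)}\right) - \frac{1}{\mu\varepsilon}\frac{\Delta t^2}{\Delta z}\left(\frac{E^{n+1}_{z,(i+1,j,k+\frac12)} - E^{n+1}_{z,(i,j,k+\frac12)}}{\Delta x} - \frac{E^{n+1}_{z,(i+1,j,k-\frac12)} - E^{n+1}_{z,(i,j,k-\frac12)}}{\Delta x}\right)$$

The 3D-ADI method for solving Maxwell’s equations is unconditionally stable, and the simulation time step is not limited by Courant’s condition. In our code, the grid spacing is non-uniform in all three dimensions. This allows us to place sufficient resolution at the places of interest. The simulation time step is chosen to resolve the bandwidth of the signal. Mur’s first-order absorbing boundary condition is applied to simulate free space.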
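Each implicit half-step reduces to tri-diagonal systems of the form of equations (4) and (7), which a standard Thomas-algorithm sweep can solve in linear time. The Python below is a generic sketch, not the authors' solver: the function names are hypothetical, the coefficients assume uniform spacing along the implicit direction, and the end rows simply drop the out-of-range neighbour (a zero-field boundary chosen only for illustration).

```python
import math


def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm for a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].

    a[0] and c[-1] are ignored. Intended for diagonally dominant systems."""
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                       # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def adi_coefficients(dt, dy, eps, mu, sigma, n):
    """Coefficients a, b, c of equation (4) on a uniform line of n unknowns."""
    r = dt * dt / (mu * eps * dy * dy)
    a = [-r] * n
    b = [1.0 + sigma * dt / eps + 2.0 * r] * n
    c = [-r] * n
    a[0] = 0.0      # no neighbour below the first unknown (assumed boundary)
    c[-1] = 0.0     # no neighbour above the last unknown (assumed boundary)
    return a, b, c


if __name__ == "__main__":
    a, b, c = adi_coefficients(dt=2.0e-13, dy=1.0e-7, eps=8.854e-12,
                               mu=4.0e-7 * math.pi, sigma=0.0, n=8)
    print(solve_tridiagonal(a, b, c, [1.0] * 8))
```

Because b exceeds |a| + |c| for any positive time step, the system is diagonally dominant and the sweep needs no pivoting, which is one reason the implicit update stays cheap even when the time step is far above the Courant bound.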
2.3 Model Verification and Example Simulation Results

To test the code, we applied it to a standard metal skin depth problem with a known analytical solution. We performed a 2D simulation of EM wave propagation under a metal strip of conductivity σ = 3.9×10^7 S/m. The bottom of the domain is bounded by a Perfect Electric Conductor (PEC). The smallest grid size of 0.1 um is placed inside the metal. The grid along the Z direction has a uniform size of 150 um. The Courant condition requires ∆t < 0.33×10^-15 sec. Our simulation time step is ∆t = 2×10^-13 sec. The excitation frequency is 50 GHz. The skin current Jz inside the metal has the analytical solution

$$J_z \propto \cos(k_z z + y/\delta)\,\exp(-y/\delta), \tag{8}$$

where $\delta = \sqrt{2/(\omega\mu\sigma)}$ is the skin depth and $k_z$ is the wave number along the guide. Inside the metal, the wave is damped in the Y direction and grazes along the Z direction. Fig. 2.1 shows the agreement between the simulation and the analytical calculation for the current Jz inside the metal. The agreement is excellent. With the ADI method, we are able to reveal the grazing wave pattern inside the metal.

We applied the 3D ADI code to study digital pulse propagation along a Metal-Insulator-Semiconductor-Substrate (MISS) structure. Fig. 2.2 shows a cross-section of the interconnect MISS structure we simulated with our ADI code. Along Z, the direction of wave propagation, we have 120 grid points with a uniform spacing of 25 um. In the XY cross-section we have a non-uniform mesh with a finest grid spacing of 0.1 um, and the time step is ∆t = 1×10^-13 sec, giving 8389120 space grid points, with 1000 time steps. The simulation time is 3-4 hours on a PC. Figs. 2.3-2.7 show simulation results for a fast 1 V, 20 psec digital pulse with a rise time of 2 ps, excited at one end of the interconnect. The metal strip conductivity is σ = 5.8×10^7 S/m, typical for copper. The substrate doping is set to n = 10^17 /cm^3, which corresponds to a substrate conductivity of 2260 S/m. Fig. 2.3 shows the voltage signal at Z = 0, 500, and 1000 um.
At Z = 500 and 1000 um, the signal amplitude is lowered and broadened. The higher frequency components of the signal suffer larger damping. Figure 2.4 shows a cross section of Ey at Z = 1000 um and t = 50 ps. The Ey field concentrates inside the SiO2 layer. This corresponds to skin-depth mode propagation along the MISS structure. The EM field is guided along the channel formed by the metal skin current and the substrate current. Figure 2.5 shows the XY cross section of the current Jz inside the metal (along the direction of signal propagation) at Z = 1000 um and t = 37 ps. We see that, due to the skin depth effect, the current concentrates near the metal edges and surfaces, giving rise to resistive losses. Figure 2.6 shows the top view of the substrate current Jz at 5 um below the SiO2 layer for t = 37 ps. The bright and dark shaded areas correspond to the rising and falling of the signal, showing that the current spreads almost a tenth of a mm away from the interconnect edge, which can lead to parasitic crosstalk. Fig. 2.7 shows a side view giving the depth of the substrate current. The substrate skin depth is tens of microns. The substrate current contributes most to the parasitic interconnect losses. Fig. 2.8 shows the voltage at different Z locations for substrate dopings of n = 10^16 and 10^18 /cm^3. The EM waves, all of which are in the skin-depth mode region, suffer different dispersion and dissipation for different substrate dopings. Lower substrate doping in this region yields more losses and dispersion.

Figure 2.1: Model verification, showing excellent agreement between the numerical and analytical results. (Top) Geometry, with the source at z = 0 and f = 50 GHz. (Middle and bottom) Pattern of the current Jz inside the metal obtained from the simulation and from the analytical calculation. The metal strip conductivity is σ = 3.9×10^7 S/m.

Figure 2.2: Cross section of the simulated MISS structure.
The XY cross section of the metal strip is 6 um x 1.8 um and the thickness of the SiO2 layer is 2 um. Z is the direction of propagation and lumped current flow.

Figure 2.3: Voltage observed at different Z locations (z = 500 um, 1000 um) along the MISS strip. The solid line shows the simulation with substrate doping n = 10^17 /cm^3. The figure shows digital signal losses and dispersion.

Figure 2.4: Cross section of the Ey (V/m) field at Z = 1 mm and t = 50 ps. The dark shading corresponds to a weak electric field. The electric field Ey concentrates in the SiO2 layer.

Figure 2.5: Cross section of the current Jz (mA/um^2) inside the metal at Z = 1 mm and t = 37 ps. The dark color corresponds to strong current. The figure shows metal skin-depth effect losses.

Figure 2.6: Top view of the current Jz (uA/um^2) in the substrate (5 um below the SiO2 layer) at t = 37 ps. Bright and dark shaded areas correspond to the rising and falling of the signal. The figure shows potential interference and coupling in the lateral direction.

Figure 2.7: Side view of the current Jz (uA/um^2) in the substrate at X = 0. The figure shows current penetration into the substrate. Substrate doping n = 10^17 /cm^3.

Figure 2.8: Voltage observed at different Z locations along the MISS strip with substrate doping n1 = 10^18 and n2 = 10^16 /cm^3.

3. Subtask 3: Prototyping, Modeling and Processing 3D Structures

3.1 Process Integration:

Key processes for developing 3D integrated circuits are wafer and die bonding, micrometer via etching, and metalization. To facilitate the development of these processes, Chris Bles has been hired as an engineer by Task C under the direction of Neil Goldsman, George Metze and Mike Khbeis (3D Integration) of the Joint Program. Chris received his BS in ECE from UMD in 2003, and will be entering the MS program in ECE at UMD in Fall 2004. This section of the report describes Chris’ work on process development and his interactions with LPS under the auspices of Task C (3D IC) of the Joint Program.
To keep track of all the wafer lots, a database was developed to organize the process information. When the database is online, it will contain information on all tools and recipes used to process wafers and will be accessible through the glue system. Each wafer lot also has an associated history describing which processes have been run and what the results were. This allows us to track statistics on the effectiveness of our various processes.

To create vias in wafers, Chris developed expertise in photolithography. This included learning to use a spinner and a contact aligner. The spinner is used to deposit a thin film of photoresist on a 6” wafer. The wafer is then inserted into the contact aligner along with an appropriate mask. After the wafer is exposed to the UV light of the contact aligner, it is chemically developed and is ready for further processing. Most of the wafer lots Chris processed involved photolithography at some point. The contact aligner is capable of resolving features down to about one micron, so that is the smallest via that can be etched so far. The vias are created in the silicon by a plasma etcher. An RF source ionizes various gases, and the plasma is controlled by permanent magnets and electromagnets. The ions in the plasma assist the reactive species in a chemical reaction with the surface of the wafer, ultimately resulting in a high aspect ratio via. The wafers are then run through a metal deposition tool, which places metal on the wafer under high pressure and temperature. Chris has developed expertise in running these processes and performing basic maintenance on the tools, as well as in troubleshooting and repairing the tools when they go down.

The wafer bonding process involved much more experimentation than via design, since the via recipes were already programmed in the tools. Multiple approaches to wafer bonding were taken. In some cases, polymers were used as a glue to hold them together.
In others, direct bonding was achieved with various types of surface activation. Single die were also placed on wafers using a programmable, automated placing tool.

Wafers bonded with HD4002 polyimide were patterned with trenches to allow the polymer to outgas during the bonding process. As the wafers are heated and pressure is applied, various gases are forced from the polymer and cannot diffuse through the wafer. If the film has no trenches, the gas will collect in bubbles, destroying the bond yield. Another material, cyclotene (BCB), does not outgas but, like the polyimide, cannot easily be dispensed as a thin film on individual chips.

Surface activation requires no additional substance to achieve a bond. Wet and dry surface activation chemically prepare a surface for bonding. However, particles are very destructive to this process. An attempt at wet activation with hydrogen peroxide resulted in far too many particles for practical use, so we developed a dry activation process with additional measures to control particles. The current process uses the plasma etcher to bombard the wafer surface with an oxygen or argon plasma for 5 minutes. The O2 plasma leaves oxygen atoms on top of the silicon crystal which, along with hydrogen, can bond to the other wafer. The Ar plasma bombards the surface, removing Si atoms from the top of the wafer, so that two such wafers will bond to each other. The wafers are then brought into contact in a wafer bonding tool. To test the bonds, we use a scanning acoustic microscope, which detects voids as a change in acoustic impedance and therefore gives a clear picture of the bond quality. Using this process, successful bonds were achieved on blank wafers and on thermal oxide coated wafers with trenches for outgassing (see the figures below).

Figures 3.1 and 3.2 contrast the wet and dry surface activation approaches. Figure 3.1 is a bonded pair of wafers that were soaked in H2O2.
Figure 3.2 is a thermal oxide coated wafer pair that was activated with an O2 plasma.

Figure 3.3 is a particle-controlled Ar plasma activated bond. While the bond area approaches 100%, the measured bond energy was somewhat lower than expected. However, the bonded pairs survived a grinding process intended to remove all but 15 um of one of the wafers, which is the critical test for the bond being compatible with the 3D integration process.

Figures 3.4 and 3.5 show problems encountered on oxide coated wafers after a 24 hour cure at 200 °C. The oxide has no place to outgas, so the bond process will ultimately fail unless steps are taken to prevent this problem.

Figure 3.6 is an oxide bonded pair with 1-2 um trenches etched into the oxide. Figure 3.7 shows that the trenches allow the oxide to outgas, so no additional voids form. The large void was formed by a particle and is not due to the outgassing problem.

3.2 Passive and Active 3DI Test Structures

We are actively pursuing the development of IC test structures to help determine the benefits and applications of 3DI. This part of the program is being pursued by Ph.D. candidate Zeynep Dilli, who is being supervised by Neil Goldsman. The test structures consist of integrated circuits that are designed to measure and investigate electronic components. We have designed more than ten integrated circuits, which were fabricated through the MOSIS processing facility. Our first set of chips was developed to help measure how the parasitic effects of interconnects and bonding pads reduce the performance of standard planar circuits. We found that the effect of bonding pads can be enormously detrimental. For the 0.6 micron AMI MOSIS process, the capacitance of the pad would slow the switching speed of an inverter by approximately a factor of one thousand (1000). Therefore, much circuit design is devoted to overcoming this effect with expensive pad driving circuits.
Transforming to 3D IC’s would obviate the need for many of these bonding pads, and therefore could significantly increase circuit speed and performance.

Other test structures include 2D and 3D inductors, transformers, and 3D capacitors. We have also developed a theory to calculate the inductance of these 3D structures as a function of frequency. We are currently performing experiments to test this theory and optimize inductor design.

Another test circuit that we are pursuing is an integrated circuit for communications. Radio frequency (RF) circuits are very susceptible to noise. As a result, integration of RF and digital electronics remains a challenge. 3D integration may be the solution to this problem. By separating the analog and digital levels in a 3D circuit, and by placing shielding between the layers, much of the digital interference that is so detrimental to the analog components of the circuits may be eliminated. To investigate this, we developed test chips containing phase-locked loops and FSK transmitters that were subjected to digital interference. We found that significant reductions in the interference could be achieved by introducing large amounts of on-chip shielding around the RF component of the IC’s. We expect that transforming to 3D IC’s will allow for even more shielding and increased separation between analog and digital components, thereby helping to realize robust integration of RF and digital circuits.