P-2-6

Design of Capacitive-Coupling-Based Simultaneously Bi-directional Transceivers for 3DIC

Myat Thu Linn Aung1, Eric Lim2, Takefumi Yoshikawa3, Tony T. Kim1
1 VIRTUS, School of EEE, Nanyang Technological University, Singapore (aung0038@e.ntu.edu.sg; thkim@ntu.edu.sg)
2 Panasonic Semiconductor, Singapore
3 Mixed-signal Technology Development Division, Panasonic Corporation, Japan

Abstract – Capacitive-coupling-based simultaneously bidirectional chip-to-chip transceivers for 3DIC are presented. By employing a multi-level signaling strategy with a novel cascaded capacitor configuration, the proposed transceivers can transmit and receive data simultaneously through a single inter-chip coupling capacitor and double the throughput per interconnect without significant overheads in circuit area and power. In this work, several design and implementation issues such as dead zones, parasitic capacitance, leakage, and coupling effects are addressed. The proposed transceivers are designed and simulated in a commercial 65nm CMOS technology.

I. INTRODUCTION

Market demand for high-speed, low-power systems has pushed three-dimensional (3D) integration forward as a solution to complex integration and to various issues arising from deep-submicron integration. 3D integration is attractive because it offers higher I/O counts, heterogeneous integration, and wide communication parallelism. Wired interconnections such as TSVs or micro bumps have been considered promising solutions for 3DIC integration, providing high interconnect density and lower parasitics than wire-bonding techniques. However, they have a high development cost due to fabrication process complexity, considerable reliability issues such as thermal and mechanical stress due to wafer thinning, and limitations in TSV placement [1].
As an alternative to physical connections, non-contact (proximity) communication for 3DIC can be realized with inductive or capacitive coupling. Both methods can be implemented in standard CMOS technology, and assembly can be completed at die level so that chips can be verified before die stacking. Because there is no physical connection, ESD protection can be omitted and parasitics are significantly reduced, which improves the overall performance in terms of speed and power consumption [1]. Because inductive coupling is current driven, the energy transmitted through an inductive channel can be controlled dynamically according to the communication distance [2]; hence, communication across more than two tiers (stacked face-to-face or face-to-back) can be realized with this method [3]. Voltage-driven capacitive coupling, in contrast, permits only face-to-face stacking in order to guarantee sufficient inter-chip coupling. Although this limits the number of tiers to two, capacitive coupling consumes less power and occupies less chip area, which enables high interconnect density [4].

Several capacitive-coupling-based transceivers, both unidirectional and bi-directional, have been presented [5-8]. In this paper, we propose capacitive-coupling-based transceivers with improved throughput and without significant power and area overheads. A simultaneous bidirectional signaling principle [9] is explored and a novel cascaded capacitor interconnect structure is proposed. With the proposed interconnect, two chips can send and receive data simultaneously, which can double the throughput per interconnect compared to previously reported capacitive interconnects [5-8]. Throughout the paper, the design challenges for the transceivers are discussed for a commercial 65nm CMOS technology. SPICE simulation shows that the transceivers consume 324µW at a data rate of 6Gbps.
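As a quick consistency check on this headline figure, the energy per bit follows from dividing the total power by the aggregate data rate. The short Python sketch below does this arithmetic for the 324µW and 6Gbps values quoted in the text; the function name `energy_per_bit_pj` and the constant names are illustrative assumptions, not part of the design.

```python
def energy_per_bit_pj(power_w: float, data_rate_bps: float) -> float:
    """Return the energy spent per transferred bit, in picojoules."""
    if data_rate_bps <= 0:
        raise ValueError("data rate must be positive")
    # Power (J/s) divided by rate (bit/s) gives J/bit; scale to pJ.
    return power_w / data_rate_bps * 1e12


# Values quoted in the paper: total power of both transceivers and the
# aggregate rate (3 Gbps in each direction over one interconnect).
TOTAL_POWER_W = 324e-6
AGGREGATE_RATE_BPS = 6e9

energy_pj = energy_per_bit_pj(TOTAL_POWER_W, AGGREGATE_RATE_BPS)
print(f"{energy_pj:.3f} pJ/b")
```

The result agrees with the 0.054pJ/b reported alongside the simulation waveforms in Section V.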
Figure 1. Three different types of capacitive-coupling-based interconnection: (a) uni-directional signaling; (b) bi-directional signaling; (c) proposed simultaneous bi-directional signaling.

II. COMPARISON OF CAPACITIVE-COUPLING-BASED TRANSCEIVERS

Fig. 1 compares the uni- and bi-directional signaling schemes with the proposed scheme. In uni-directional signaling [5-7] (Fig. 1(a)), a signal can be transmitted only from chip A to chip B through the coupling capacitor (CC), whereas in bi-directional signaling [8] (Fig. 1(b)), transmission and reception are possible at both chips but only one direction is allowed at a time. The drivers therefore have to be multiplexed to avoid simultaneous data transmission. A slight degradation in speed is expected compared to uni-directional signaling because of the parasitics added by the additional circuits [8].

To overcome these limitations and improve the throughput per interconnect, we propose a multi-level signaling scheme [9] with a new transceiver design; the simplified schematic is shown in Fig. 1(c). Unlike bi-directional signaling, the proposed scheme allows simultaneous data transmission and reception at both chips, thanks to the cascaded capacitor configuration formed by the driving capacitors (CD) and the coupling capacitor (CC), in which CD decouples the transmitted signal from the received signal at nodes A and B. The signal transmitted from chip B is recovered by the chip A receiver by comparing the voltage level at node A with two voltage references (RefH and RefL), and vice versa. As a chip can transmit regardless of the transmission state of the other chip, the data throughput per interconnect is doubled compared to conventional uni- and bi-directional signaling.

III. MULTI-LEVEL SIGNALING STRATEGY OF PROPOSED CASCADED CAPACITOR CONFIGURATION

In this section, the signaling strategy for the proposed cascaded capacitor configuration is discussed. Fig. 2 explains the clock and data signaling principle of this work, which employs multiple levels. The cascaded configuration of CC and CD determines the signal levels of the internal nodes (A and B) through charge sharing. In the clock link, assuming that the clock signals from both sides have the same phase, node A has only two levels. In the data link, however, four signal levels ('00', '01', '10', and '11') are possible. For example, when the transmitted data (TxL) is high, node A settles at one of the two upper levels ('10' or '11'), where '10' means that the received data is '0' and '11' means that the received data is '1'. Similarly, node A settles at one of the two lower levels ('00' or '01') when TxL is low; '00' is interpreted as data '0' and '01' as data '1'. Note that TxL decides whether the received signal levels are in the upper or the lower region.

Figure 2. Multi-level signaling principle of the proposed transceiver (clock signaling and data signaling).

Even though a full voltage swing is desired for the received signals at nodes A and B, some voltage range is inevitably lost in a real design because of the limitations on implementable capacitance values. For instance, the gap between the '01' and '10' levels is a dead zone (DZ) in which no data can be retrieved (Fig. 2). Its magnitude is set by the capacitance ratio k of the coupling capacitor (CC) to the driving capacitor (CD), i.e., CC = kCD. Fig. 3(a) depicts the signal swings (VA and VB) for different k values, which are given by the following equations.

$$\frac{V_A}{V_{DATA}} = \frac{1+k}{1+2k}, \qquad \frac{V_B}{V_{DATA}} = \frac{k}{1+2k} \tag{1}$$

$$\frac{V_{DZ}}{V_{DATA}} = \frac{1}{1+2k} \tag{2}$$

If k were infinitely large, VA/VDATA and VB/VDATA would both become 0.5, so the upper and lower swings would be exactly VDD/2. This is ideal for 4-level signaling, and no DZ would be observed. However, smaller CC and CD are desirable for power and area efficiency. In practice, therefore, the values of CC and CD are determined by the tradeoff between a larger signal swing and a smaller area.

Figure 3. Effect of the capacitance ratio CC/CD and the parasitic capacitance CP on the signal swing: (a) without parasitic capacitance; (b) with parasitic capacitance.

IV. TRANSCEIVER DESIGN CONSIDERATIONS

In this section, several non-ideal factors such as parasitic capacitance, leakage, and coupling to the floating node A are discussed, followed by the solutions developed for the transceiver design.

A. Impact of Parasitic Capacitance on Signaling

In addition to the capacitance ratio (k), the parasitic capacitance (CP) of the circuits connected to nodes A and B degrades the performance of the proposed signaling scheme. CP not only attenuates the signal swings but also introduces an additional dead zone (DZ1), as shown in Fig. 3(b). The signal attenuation can be estimated by an attenuation factor (CP/CD + 1), and the magnitude of DZ1 is VDATA - V'A - V'B. Using this information, the new signal swings (V'A and V'B) and the new dead zone (V'DZ) can be estimated by the following equations.

$$\frac{V'_A}{V_{DATA}} \approx \frac{V_A / V_{DATA}}{\frac{C_P}{C_D} + 1}, \qquad \frac{V'_B}{V_{DATA}} \approx \frac{V_B / V_{DATA}}{\frac{C_P}{C_D} + 1} \tag{3}$$

$$\frac{V'_{DZ}}{V_{DATA}} \approx \frac{V_{DZ} / V_{DATA}}{\frac{C_P}{C_D} + 1} \tag{4}$$

It is assumed that k >> 1 and CP << CD for simplicity. Fig. 4 shows the simulated signal attenuation with respect to CP/CD. The simulation indicates that CP affects the size of DZ1 more than the size of DZ. As CP approaches CD, almost half of the voltage swing is occupied by the dead zones (DZ and DZ1). It is therefore desirable to keep CD sufficiently larger than CP to achieve reasonable signal swings.

Figure 4. Simulated attenuation of the signal swings (solid line) for different CP/CD ratios at k = 5 (vertical axis: voltage at node A in V; horizontal axis: CP/CD). The dotted line represents the signal swings estimated from equations (3) and (4).

B. Impact of Leakage on Signaling

Leakage in nano-scale CMOS devices (65nm CMOS in this work) is significant because of the aggressive scaling of channel lengths, gate oxide thickness, and doping profiles. Because of this leakage, a floating node cannot hold data for a long time without periodic refresh cycles, so the leakage has to be carefully controlled for reliable transceiver operation. In this work, the transceivers have floating input nodes A and B (Fig. 1) that are vulnerable to leakage into and out of them. The primary leakage component is the gate leakage of the input devices in the receivers (Fig. 5(b)). Careful leakage control for the floating nodes is particularly critical when the transmitted signals do not change over a long period of time.

To address this leakage problem, a clamping circuit (Fig. 5(a)) is proposed for stabilizing and initializing the signal levels of nodes A and B. The signal level is clamped indirectly through the open state of the transmission gate (TG) in order to isolate the large parasitic capacitance at node A' in the clamping circuit from node A. If TxL and RxL do not change for a long time, node A follows node A', which is set by the clamping circuit. Since TxL and RxL keep the voltage of node A' close to the desired level, the signal levels are maintained over a long time. Fig. 6 shows the simulation results of the proposed signaling with and without the proposed clamping circuit. As expected, the signal levels remain unchanged as a result of the clamping operation. The signal swings are further attenuated by the additional parasitics introduced by the TG. RefH and RefL labeled in Fig. 6 are the reference levels of the receivers for the upper and lower signal swings, respectively.

Figure 5. (a) Clamping circuit for the floating node A; (b) schematic of the receiver for sensing the upper signal swings ('10', '11').

Figure 6. Simulation result of the voltage swings at different signaling levels with and without the clamping circuit (vertical axis: voltage at node A in V; horizontal axis: time in ms).

C. Transceiver Design

The primary goals of the transceiver design are to achieve fast signal transitions at the driver output, minimize the coupling effect at nodes A and B, and reduce power consumption. One benefit of the proposed cascaded capacitor configuration is that the two inverter drivers only have to drive the capacitors (CD, CC, CD) in series, whose combined capacitance is smaller than that of a single coupling capacitor CC. Therefore, the driver consumes less power than drivers that drive CC directly [5-8].

For the receiver design, a sense-amplifier-based receiver [10] can be employed for fast sensing without static power. However, the large voltage swing at its internal node can couple back to the floating nodes A and B and corrupt the received signals. To mitigate this coupling effect, we adopt a simple differential amplifier (Fig. 5(b)) whose internal node, INT_RCV, has a small voltage swing, which reduces the charge coupled to node A through the overlap capacitance (Cgd2). In addition, M2 has to be sized so that its gate capacitance stays low, minimizing the parasitic capacitance (CP) while maintaining the desired signal swing. To improve the slew rate of the differential amplifier with less driving current, the inverter at INT_OUT is kept at minimum size. An NMOS differential pair is used for sensing the upper signal region (Fig. 5(b)), while a PMOS differential pair is used for sensing the lower signal region.

V. PROPOSED TEST ARCHITECTURE AND SIMULATION

The test structure (Fig. 7) consists of one clock link and four data links with different k ratios in order to test the interconnection capability of each electrode. For implementation, the capacitance ratios (6f/2.5f, 7f/2.5f, 8f/2.5f and 10f/2f) are chosen to provide large VB or signal swings with the extracted CP value. The clock link, with a capacitance ratio of 10f/2f, is used to transfer the system clock to the other chip for clock recovery and data sampling. The two inverter drivers drive the signal onto CD while the signal level at node A is sensed by two receivers. Of the two receiver outputs, only one is selected by TxL (see Section III) to generate the output RxL. Test and control circuits are also designed to characterize the data transfer speed and the bit-error rate (BER) of the designed electrodes. The voltage-controlled delay line (VCDL) between the clock generator and the clock link controls the position of the sampling edge. The transmitted data (Tx) generated by the pseudo-random binary sequence (PRBS) generator is sent to the selected data link through the Demux block. The transmitted TxL signal is sent back by the other chip through the same electrode, and the BER block compares Tx with Rx and increments the counter output when a mismatch is detected.

Figure 7. Proposed transceivers and built-in self-test architecture.

A custom layout is desirable for the proposed transceiver to improve layout efficiency and reduce parasitics. The driving capacitor (CD) can be implemented as a metal-insulator-metal (MIM) capacitor, while the top metal layer (M7) can be used to form CC when the dies are stacked face-to-face (Fig. 8). The passivation layer can optionally be thinned to further enhance the coupling of CC.

Figure 8. Proposed face-to-face die stacking and implementation of CD and CC.

Fig. 9 presents the SPICE simulation waveforms of the proposed simultaneous bi-directional transceivers at 6Gbps, where each transceiver transmits at 3Gbps simultaneously; the energy consumption is 0.054pJ/b. The prototype will be fabricated in a 65nm CMOS technology.

Figure 9. Simulation waveforms of the simultaneously bi-directional signaling (traces: TxL, TxR, A, B, RxL and RxR in V, and supply current I in mA with an annotated average of -0.270). The total power consumption of the two transceivers is 324µW @ 6Gbps.

VI. CONCLUSION

In this paper, capacitive-coupling-based simultaneously bidirectional transceivers for 3DIC are presented. The proposed transceivers can double the throughput per interconnect by transmitting and receiving signals simultaneously through a single coupling capacitor. This is achieved by employing multi-level signaling with a novel cascaded capacitor interconnect structure. Several implementation issues such as dead zones, parasitics, leakage, and coupling effects are addressed. SPICE simulation demonstrates that the transceivers consume 324µW at a data rate of 6Gbps. The transceivers and test circuits are designed and will be fabricated in a commercial 65nm CMOS technology.

REFERENCES

[1] W. R. Davis, et al., "Demystifying 3D ICs: the pros and cons of going vertical," Design & Test of Computers, IEEE, vol. 22, pp. 498-510, 2005.
[2] N. Miura, et al., "A 195Gb/s 1.2W 3D-stacked inductive inter-chip wireless superconnect with transmit power control scheme," in Solid-State Circuits Conference, 2007. Digest of Technical Papers. IEEE International, 2005, pp. 264-597 vol. 1.
[3] M. Saen, et al., "3-D System Integration of Processor and Multi-Stacked SRAMs Using Inductive-Coupling Link," Solid-State Circuits, IEEE Journal of, vol. 45, pp. 856-862, 2010.
[4] R. Canegallo, et al., "System on chip with 1.12mW-32Gb/s AC-coupled 3D memory interface," in Custom Integrated Circuits Conference, 2009. IEEE, pp. 463-466.
[5] R. J. Drost, et al., "Proximity communication," Solid-State Circuits, IEEE Journal of, vol. 39, pp. 1529-1535, 2004.
[6] L. Luo, et al., "3 Gb/s AC coupled chip-to-chip communication using a low swing pulse receiver," Solid-State Circuits, IEEE Journal of, vol. 41, pp. 287-296, 2006.
[7] A. Fazzi, et al., "3-D Capacitive Interconnections for Wafer-Level and Die-Level Assembly," Solid-State Circuits, IEEE Journal of, vol. 42, pp. 2270-2282, 2007.
[8] A. Fazzi, et al., "3-D Capacitive Interconnections With Mono- and Bi-Directional Capabilities," Solid-State Circuits, IEEE Journal of, vol. 43, pp. 275-284, 2008.
[9] K. Jin-Hyun, et al., "A 4-Gb/s/pin low-power memory I/O interface using 4-level simultaneous bi-directional signaling," Solid-State Circuits, IEEE Journal of, vol. 40, pp. 89-101, 2005.
[10] P. Wijetunga and A. F. J. Levi, "3.3 GHz sense-amplifier in 0.18µm CMOS technology," in Circuits and Systems, 2002. IEEE International Symposium on, pp. 764-767.
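The swing and dead-zone expressions in Eqs. (1)-(4) can be evaluated numerically with a few lines of Python. The sketch below is only an illustration under the paper's own simplifications (ideal capacitors and the first-order attenuation factor CP/CD + 1). The function names, the normalization VDATA = 1, and the convention that the first bit of each level label is the locally transmitted bit are assumptions introduced here, not part of the original design.

```python
from typing import Dict, Tuple


def ideal_swings(k: float) -> Tuple[float, float, float]:
    """Eqs. (1)-(2): VA, VB and DZ normalized to VDATA, with CC = k * CD."""
    if k <= 0:
        raise ValueError("capacitance ratio k must be positive")
    va = (1 + k) / (1 + 2 * k)   # swing at node A caused by the local driver
    vb = k / (1 + 2 * k)         # swing at node A caused by the remote driver
    dz = 1 / (1 + 2 * k)         # gap between the '01' and '10' levels
    return va, vb, dz


def attenuated_swings(k: float, cp_over_cd: float) -> Tuple[float, float, float, float]:
    """Eqs. (3)-(4) plus DZ1 = VDATA - V'A - V'B, all normalized to VDATA."""
    if cp_over_cd < 0:
        raise ValueError("CP/CD must be non-negative")
    va, vb, dz = ideal_swings(k)
    factor = cp_over_cd + 1      # first-order attenuation factor (CP/CD + 1)
    va_p, vb_p, dz_p = va / factor, vb / factor, dz / factor
    dz1 = 1 - va_p - vb_p        # headroom lost above the '11' level
    return va_p, vb_p, dz_p, dz1


def node_levels(k: float, cp_over_cd: float = 0.0) -> Dict[str, float]:
    """Four normalized levels at node A, keyed as local bit then received bit."""
    va_p, vb_p, _, _ = attenuated_swings(k, cp_over_cd)
    return {"00": 0.0, "01": vb_p, "10": va_p, "11": va_p + vb_p}


if __name__ == "__main__":
    for ratio in (0.0, 0.2, 1.0):
        va_p, vb_p, dz_p, dz1 = attenuated_swings(5, ratio)
        print(f"CP/CD={ratio:.1f}  V'A={va_p:.3f}  V'B={vb_p:.3f}  "
              f"DZ={dz_p:.3f}  DZ1={dz1:.3f}")
```

With k = 5 and CP/CD = 1, DZ1 alone comes out at half of VDATA in this first-order model, which is in line with the observation in Section IV-A that the dead zones occupy roughly half of the swing when CP approaches CD. Since Eqs. (3)-(4) assume CP << CD, values near CP/CD = 1 should be read as rough estimates only.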