An Optical Data Receiver for Integrated Photonic Interconnects by Michael S. Georgas Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY September 2009 @ Massachusetts Institute of Technology 2009. All rights reserved. ,./1 I A uthor .......... ... Department of Eectrical Engineering and Computer Science September 4, 2009 Certified by ..... Accepted by... / -/ Vladimir Stojanovid Assistant Professor Thesis Supervisor 7................................................................ Terry Orlando Chairman, Department Committee on Graduate Theses MASSCHUSRTTS INS OF TECHNOLOGY SEP 3 0 2009 LIBRARIES ARCHIVES An Optical Data Receiver for Integrated Photonic Interconnects by Michael S. Georgas Submitted to the Department of Electrical Engineering and Computer Science on September 4, 2009, in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering and Computer Science Abstract The throughput bounds of traditional interconnect networks in microprocessors are being pushed to their limits. In past single-core processors, the number of long global wires constituted only a small fraction of the total. However, with the emergence of multi-core systems, where each core must be able to communicate with each other as well as off-chip memory, global interconnects have become a major bottleneck. The solution has been proposed through integrated photonic networks, where multiple channels of information can be placed onto a single low-latency waveguide, reducing the number of interconnects and increasing the speed of transimssion. This work presents a novel optical data receiver for integrated optical links. Both the optical receiver and the photodiode are monolithically-integrated in the same CMOS substrate. The highly-digital receiver senses the photodiode current using a regenerative cross-coupled latch. The photodiode is modelled as an ideal current source with a capacitance in parallel. The receiver operates in two phases, receiving one bit per clock cycle, and is able to resolve input photocurrents of less than 50,PA at 5-Gb/s with a power consumption of less than 500yW (100fJ/bit). The receiver was fabricated in a 32-nm CMOS process as part of a flexible test vehicle that will demonstrate various optical components and the electronic systems that interface with them. Thesis Supervisor: Vladimir Stojanovid Title: Assistant Professor Acknowledgments Over the past two years I have learned a tremendous amount. I am grateful to everyone that I have learned something from. In particular I thank my supervisor, Professor Vladimir Stojanovid, for all of his support and guidance in this work. I thank my closest collaborators, Ben Moss and Jonathan Leu, and I thank the rest of the Integrated Photonics team. I am also grateful to have enjoyed a mid-afternoon coffee with some of the most interesting people that I have met. I thank Ranko Sredojevid for all of the discussion (ranging from politics to parallel computing) and the rest of my research group here at MIT. I thank all of the other new friends that I have made. I thank my family, and for more than the last two years. You have always been a great support for me. I thank Diana for always believing in me, even when I do not believe in myself. I could not have made it this far without all of you. Support My work over the past two years has been funded by NSERC, DARPA, and Texas Instruments. I thank each group for helping to fund my exploration of this interesting research. Contents 1 2 15 Introduction 1.1 Background ................... 1.2 Motivation ............... 1.3 Previous Work . .. . ........ .. 16 ................ 1.3.1 Optical Waveguides ................... 1.3.2 Optical Modulators ................ 1.3.3 Optical Receivers ................... 1.4 Contributions of this Thesis ................... 1.5 Summary 17 .. ......... ................... 15 18 ..... . . ..... 19 ...... 24 .... . . . . ...................... 18 25 . 27 Receiver Design 2.1 Recever Block Diagram ................... 2.2 Photodiode .......... ................... 2.2.1 Semiconductor Material 2.2.2 Geometry ................... 2.2.3 Photodiode Model ................... 28 .. 29 30 2.4 Offset Compensation ....... 2.5 Latch Output Buffer ............... 2.6 Dynamic to Static Converter ................... 2.7 Photodiode Clamp ................... 2.8 Clock Enable 2.9 Summary 32 ..... ................... Latching Sense Amplifier ................ . .................. .. ......... 2.3 ................... 27 ....... ....... .. ................ ..... .. 33 . 35 .. .. 38 .... . ....... ......... ......... 38 .. ... 39 40 40 i ~- ~--.(~I.--~-.-.~II -i--^-li. i-iii-l;il-"-;-"i--.i'.--~''~ 3 Receiver Simulation 4 3.1 Transient Simulation ....... 3.2 Offset Compensation Simulation ......... 3.3 Setup Time ......................... ........... . . . . . 43 . . . . 46 . . . .. . 46 3.4 Characterization of the Receiver's Sampling Aperture . . . . . . . . . 49 3.5 Eye Diagram 3.6 Process Variation . 3.7 Summary . .... .. . ................. . . . . . . 51 .............. . . . . . . 53 . . . . . .. . 55 ..... ...................... EOS2 Test Chip 57 4.1 Chip Organization 4.2 Digital Test Structures ....... 4.3 5 43 . ............ . . . . . 57 . . . . . . . 61 ..... . . .. . . 61 . 63 . 64 .. . . . . 65 4.2.1 Scan Chain ....... 4.2.2 High-Speed Signal Synchronization 4.2.3 PRBS/Pattern Generator 4.2.4 Counters ......... 4.2.5 Snapshot. .............. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . ... ................. . . . . . 66 . . . . . 67 Conclusion and Future Work 5.1 Conclusion . . .................. 5.2 Future Work ......... 69 ............. . . .................... . 5.2.1 Co-design of Optical Devices and Electrical Circuits .... 5.2.2 Optical Device Design, Layout, and Verification . ....... 69 ... 70 . .. 70 70 A Testbench Schematics 73 B Power Measurement in Simulation 77 C Sampling Aperture Data 79 D Custom Standard Cell Library 81 D.1 Register ....... ... ........................... . 81 D.2 MUX . 82 ................................... 85 E Scan Chain Control E.1 85 Correct Signalling for the Scan Chain ................... E.1.1 Reading from EOS2 . .................. E.1.2 Writing from EOS2 . ................. F PRBS Generation . . . . . . 86 86 89 List of Figures 17 . ......... 1-1 An integrated photonic link implementing WDM [1]. 1-2 A ring with a modulator on the EOS2 test chip ............. 19 1-3 Optical data receivers from the literature. . ............... 21 2-1 Block diagram of the optical data receiver implemented in the EOS2 ... test chip ................... ........... .. 27 . . 29 2-2 The absorption coeffecient plotted against wavelenth of light [15]. 2-3 A cross-sectional view of the waveguide showing the mode propagation. [Figure courtesy of Jason Orcutt] . ................... 31 32 2-4 A 3D model of one of the photodiodes implemented in EOS2 .... 2-5 Photodiode Model . 32 2-6 The latching data receiver implemented on the EOS2 test chip..... . 34 2-7 7-bit offset compensation scheme for latching data receiver........ 35 2-8 Offset compensation performance...... 2-9 The output buffer of the optical receiver. . ...... ....... ................... . . ........... 2-10 Dynamic-to-static schematic. ................... 38 .. ....... 36 39 .... 2-11 The photodiode-clamping circuit used in the optical data receiver. .. 39 2-12 The clock-enabling block for each optical data receiver. . ........ 40 44 3-1 Transient simulation waveforms demonstrating bit decisions. ..... 3-2 Propagation delay through the latch plotted against setup time. The input photocurrent was varied, and the input data is switched from 0 .. to 1. Clock frequency is 5-GHz. ................... 3-3 Sampling apertuture characterization for 0-1 step input. . ....... 48 50 i 3-4 In Situ eye diagram from simulation. 3-5 Transient plot of the positive output terminal of the receiver during a ................... DC input photocurrent. Superimposed on the plot is the distribution of threshold values from monte carlo simulation. 4-1 Layout photo of the EOS2 chip . . . . . . . . . . . . . . . . . . 4-2 Organization of the EOS2 electrical cell. 4-3 An optical link in EOS2 . 4-4 Block diagram showing cell operation. . . . . . . 4-5 Block diagram of scan chain cells..... . . . 4-6 High-speed SEnable synchronization. . . . . 4-7 High-speed signal synchronization . 4-8 PRBS/pattern generator block . 4-9 Output transition counter with reference. . . . . . . ......... . . . . . . . . 58 . . . . . . 58 . . . . . .. . 59 . . . . . . 60 . . . . . . . 62 . . . . . . 63 . . . . . . . . 64 . 65 . . . . . . 66 .. . . . . . . . . . . . . 4-10 Snapshot implemented for the modulator and data receiver of EOS2. A-1 Transient analysis testbench . . . . . . . .... 54 . . 67 . . . . . . 73 . . . . 74 A-3 Testbench for setup and hold time simulation. . . . . . . . . . . . . . 74 A-4 Sampling aperture characterization testbench. . . . . . . . . . . . . . 75 A-5 Variation analysis testbench. ........... . 75 A-2 Offset compensation testbench. ......... . . . . . . . B-I Testbench for power measurement simulation . . . . . . . . . . . . . 77 C-I Sampling apertuture characterization for 1-0 step input. . . . . . . . 80 D-I Register design. ................. D-2 M UX design . .. .. . . . . . . . . . . .... . . . . . . . . . . . . . . . . . . . . . . . . . . . E-i Block diagram of scan chain cells. ................... F-I The output of the PRBS generator, driven at 5-GHz .......... . List of Tables 1.1 Summary of optical receivers scaled to 32nm. Power scaling for digital designs follows P = fCVDD, with C scaling linearly with process. Power scaling for analog designs follows a linear scaling with current and supply, based on maintaining a constant transistor transconductance. 20 2.1 Example configuration data. Threshold values are found using Figure 2-7 ...... 3.1 37 ............................ Power consumption for each block of the data receiver during 5-GHz 45 operation with a 100-uA 'on' photocurrent. . ............... F.1 Progression of PRBS pattern. . .................. . 90 Chapter 1 Introduction This thesis presents an energy-efficient monolithically-integrated optical receiver for photonic interconnects in a 32-nm CMOS process. The work will analyze the various aspects of the receiver, focusing on challenges such as process variation, sensitivity, energy-efficiency, and integrated photodiode modelling. 1.1 Background Optics has been used to transmit high-bandwidth data since the 1970s, when it was first used in long-haul interconnects. In computing, optical networks have been envisioned as a replacement for electrical backbones since the early 1990s [2] when they were used to connect multiple computers to form a single super-computer. Since that time, Moore's law continued to drive CMOS gate sizes smaller and smaller, increasing clock speeds. The price, however, was paid in power dissipation. Processor manufacturers such as Intel and AMD drove clock speeds to the point where the heat produced could not be pulled away fast enough, and so a practical limit in CPU clock speed was reached in the range of 3- to 4-GHz. Designers had to find a way to keep computation scaling, and the answer was found in parallelism. Since the release of the first multi-core processors, the trend in industry and academia has been to continue to increase the number of cores on each chip. At the time of this proposal, it is common for consumer computers to have dual- or quad- i-1 core processors, such as the Intel Core i7 [3]. More specialized consumer hardware has seen 9-core processors, such as the IBM Cell in the Playstation 3 [4], and 16-core server processors, such as the third-generation SPARC processor [5]. Going forward, we can envision systems with upwards of 256 cores. In order to fully harness the computing capability of many-core systems, communication between cores and with a shared off-chip memory must be extremely fast. In past single-core processors, the number of long global wires constituted only a small fraction of the total. However, with the emergence of multi-core systems, where each core must be able to communicate with each other as well as off-chip memory, global interconnects have become a major bottleneck. Power density requirements further dictate that the communication must also be low power and energy-efficient. The problem with electrical I/O is that even its projected performance will not satisfy the bandwidth and energy-efficiency demands of many-core systems [1]. This is due in part to the fact that electrical I/O performance is ceasing to scale with technology, and is instead becoming channel-limited [6]. Integrated photonic links provide a new type of channel for inter- and intra-chip I/O. Wavelength-division multiplexing (WDM) allows for several high-speed data streams to be placed onto a single lowlatency waveguide, reducing the number of interconnects and increasing the energy efficiency and bandwidth density. 1.2 Motivation The reward of a monolithically-integrated optical link is high, but there are many challenges that stand in its way. The characteristics of such a link are well-suited to high-speed communication, but have significanly different characteristics than traditional electrical interconnect. As a result, it may be difficult for designers to adapt to optical I/O design. Designers must be knowledgeable in optical device physics in order to determine which of the limited materials and layers to use in a given semiconductor process. They must be able to take these materials and create optical device geometries that either route optical data or serve as an interface between the optical LM and electrical domains. Finally, they must have a strong system-design background in order to optimize the electrical-optical-electrial link as a whole. This thesis addresses this emerging design issue through the implementation of an optical data receiver. Each of the parts of the receiver will be described and analyzed, exemplifying how designers must use their knowledge of both optics and electronics in order to exploit optical I/O. This serves to help future designers by focusing on the most important design parameters. Previous Work 1.3 Chip A Vertical Coupler V and Tap External Lser Soure Ring Modulator with A Resonance Chip B Photodetector. Ring Modulator Driver V Waveguide Ring Modulator with 2 Resomnance Mode Fiber Receiver , O\ Ring Filter with Xl Resonance Ring Filter 2 Resonance with Figure 1-1: An integrated photonic link implementing WDM [1]. This section reviews the building blocks and previous work done on integrated optical interconnects. A diagram of an integrated optoelectronic system is shown in Figure 1-1, with the main building blocks being: 1. integrated photonic waveguide - the path along which the modulated optical data can travel 2. optical modulator - converts electronic data into an optical signal 3. optical receiver - converts optically modulated data back into the electrical domain In Figure 1-1, the optical source is an off-chip continuous-wave laser. This laser light is coupled onto the IC by means of a vertical coupler which redirects the light to travel in the plane of the chip. Once propagating along the integrated photonic waveguides, the continuous-wave laser light can be manipulated. Different wavelengths can be re-routed by means of wavelength-selective filters, and data can be imprinted onto each wavelength by means of an optical modulator. The modulated light can be placed onto a bus common with data modulated on different wavelengths, without interference. This optically modulated data can then be routed to an optical receiver, either on the same chip or a different one. While these components have been demonstrated in the past, one of the key challenges in this work is that the components must be monolithically integrated onto a single wafer of silicon, in order to enable high energy efficiency and bandwidth density. In the following sections, we outline each of the above components in more depth. 1.3.1 Optical Waveguides Optical confinement in any medium is created by an index difference between the core and surrounding materials. Integrated photonic waveguides have been demonstrated recently in two main ways: in special processes, such as silicon-on-insulator (SOI); and in bulk CMOS processes [7]. In an SOI process, the waveguide core can be created using the body, with the buried-oxide layer creating the cladding. In bulk CMOS, the waveguide core can be made from the poly-silicon layer used to create the unsilicided resistors, located above the shallow-trench isolation (STI). One of the main problems with bulk CMOS integration is that the oxide layer is much thinner, increasing optical loss to the substrate. In order to mitigate this effect, an air-undercladding is created during post-processing, as demonstrated in [8]. 1.3.2 Optical Modulators The purpose of the optical modulator is to take an electrical signal, and imprint it onto a continuous wave of light, generating an optically modulated signal. In this work, resonant ring modulators are used. Resonant ring filters (Figure 1-2) are circular optical structures adjacent to a waveguide. Each filter is tuned to a particular photon wavelength. At that wavelength, light travelling down the waveguide will cou- - ple into the filter and remain confined there, eventually being dissipated or dropped. All other wavelengths will travel down the waveguide undisturbed, except for some relatively small wavelength-dependent loss. Figure 1-2: A ring with a modulator on the EOS2 test chip. A modulator based on this structure exploits the fact that the resonant frequency can be tuned thermally, or by injecting charge into the ring to change the index of refraction. To generate an optical '0', the ring is tuned to the wavelength of interest and the transmission through the waveguide is decreased. To generate an optical '1', the ring is tuned away from the wavelength of interest, decreasing that wavelength's confinement and increasing its transmission [9]. 1.3.3 Optical Receivers The final component in the optical link is the receiver, which converts the opticallymodulated data back into the electrical domain. This is accomplished by sensing the photocurrent produced by a photodiode that is being illuminated, and converting the current into a voltage waveform. Previous implementations can be broken down into two main types of implementations: current-sensing and current-integrating. The two methods are discussed further in the following subsections. To date, the majority of optical receiver designs in the literature have addressed telecommunications applications, where a single, discrete receiver is connected to a fiber-optic cable and used to generate an electrical data signal. One major challenge associated with this is the very large parasitic capacitance due to packaging of discrete components. However, a major benefit is that since the phodiode is external, any semiconductor material desired can be used. In contrast, in order to achieve a large receiver density with high energy efficiency, we focus on monolithic integration. As a M result, in this work we are constrained to the material selection available in the 32-nm bulk CMOS process. The performance of several recent optical receivers is summarized in Table 1.1. The table shows the process, data rate, power consumption, and topology used. Also shown are power and energy/bit numbers scaled to a 32-nm process node. This was done in order to create a more accurate performance comparison with the receiver proposed in this work. Process [6] [10] [11] [12] This Work 90-nm 0.13-um SOI 80-nm 0.13-um,Ge-on-SOI 32-nm, SiGe PD Speed (Gb/s) 16 4x10 20-GHz 10 5 Power (mW) 23 120 2.2 30 0.5 Power (mW) (scaled) 8.17 19.7 0.88 4.92 0.5 pJ/bit (scaled) 0.51 0.492 0.492 <0.1 Topology DDR Latch TIA/Limit TIA TIA/Limit Latch Table 1.1: Summary of optical receivers scaled to 32nm. Power scaling for digital designs follows P = fCVD, with C scaling linearly with process. Power scaling for analog designs follows a linear scaling with current and supply, based on maintaining a constant transistor transconductance. The integration of all necessary optical and electrical components onto a single substrate was recently demonstrated in [10], where a single SOI substrate was used to create a 4xl0-Gb/s transceiver, with the photodiodes flip-chip bonded to the CMOS die. The receiver consists of a TIA followed by a limiting amplifier. impedance of the TIA is kept small by using resistive feedback. The input This allowed for high-speed operation, despite the relatively large parasitic capacitance. Current-Sensing Receivers In current-sensing implementations, the input photocurrent is converted to a voltage waveform by means of some transimpedance gain. This can be accomplished by a transimpedance amplifier, or as is the case in this work, a current-sense-amplifier. As will be shown in Chapter 2, one advantage of this approach is that it easily enables the implementation of a highly-digital design, with no biasing required at the input (b) [12] (a) [11] TIAF nt.En .. Vdd , P"U (c) Narasimha Data Recever [10] LGnd (d) Receiver-less Optical Clock Injection [13] ceiver presented in [11]. This work targeted low-power, high-bandwidth optical Daylinks T T. ---------------- VWeora~~ --------------,------------------- Figure 1-3: Optical data receivers from the literature. nodes. The disadvantage however is that only the input photocurrent at the time of evaluation is used, increasing the vulnerability due to random noise. Figure 1-3a shows the CMOS trans-impedance amplifer for the Clock optical Injecdata receiver presented in [11]. This work targeted low-power, high-bandwidth optical links for short-distance fiber-optic interconnects. Standard TIA topologies were investigated, and limitations in the 80-nm CMOS process used were discussed. The TIA shown in Figure 1-3a consists of a modified conventional regulated cascode, and overcomes process limitations such as low supply headroom in order to achieve its target specification. The design also uses peaking inductors in order to increase the bandwidth, which further illustrates the problem with analog-style optical receiver designs going forward into more advanced CMOS processes. The photodiode capacitance for the work was 220-fF. In [12], the advantages of monolithic integration of the photodiode and receiver circuit are discussed, with the benefits of the use of Germanium presented. The work uses a Ge-on-SOI detector, which is stated to have a responsivity of 0.35 A/W and an intrinsic capacitance of 30-fF. The TIA ( 1-3b) used is a differential, common-gate topology, that also uses peaking inductors. The TIA is followed by a 5-stage limiting amplifier, and then a buffer to drive a 50-Q load. The complete receiver operates at 12-Gb/s, and the authors note the low photodiode biases used. Data-rate scaling through the implementation of wavelength-division multiplexing (WDM) is presented in [10]. The strategy of the transceiver presented was to take multiple 10-Gb/s links and put them in parallel in order to achieve higher data rates. This was done to address the fact that the channel can become the most expensive part of a long I/O link. The data receiver used is shown in Figure 1-3c. Similar to the receiver presented in [12], it consists of a TIA front-end followed by a 5-stage limiting amplifier and 50-Q output buffer. The photodiode is flip-chip bonded to the die, with light directed to it by means of a holographic lens. What is remarkable about this work is the level of optelectronic integration, with the optical transmitters and receivers implemented on the same CMOS SOI chip. Current-Integrating Receivers Designs that use an analog amplifying front-end burn a lot of quiescent power in order to achieve high bandwidth and low noise [14]. In current-integrating approaches, the input photocurrent is integrated into a capacitance, which converts the signal into a voltage waveform that is integrated over time. The capacitance can be some fixed capacitance, the capacitance of the diode, or the parasitic capacitance of the interfacing transistors. It can also be the capacitance of a programmable current source [6] that continuously discharges the average photo-current. The advantage of the current-integrating approach is that the input current signal is now being integrated over as much as an entire bit time, increasing the sensitivity of the receiver. The receiver will also be more robust to a random noise source at the input, as each noise contribution should, on average, cancel over the integrating time. The clearest example of an integrating-type receiver is presented in [13], and is shown in Figure 1-3d. In this receiver-less design, two optical clock pulses that are out of phase are propagated to two photodiodes that are stacked on top of each other. As the optical clock pulses reach their target photodiodes, the photocurrent that is generated is used to charge/discharge the node looking into the digital logic block. The capacitance at this node is composed of the diode's own capacitances, as well as any parasitic capacitances. Assuming that the diodes are well matched, and that there is enough optical power and high-enough conversion efficiency, the input node to the digital logic will have enough range to drive the following logic blocks. Figure 1-3e shows the optical data receiver presented in [14]. One of the main challenges with using clock-based designs that latch the decisions of comparators is that there must be some reset phase in the operation. Since we only want to use one photodiode and one waveguide to save area, this means that it is difficult to achieve better than one bit per clock period. An innovative solution to this is presented in [14], where double-data-rate (DDR) operation was achieved. The author integrates the photocurrent from the diode onto its own parasitic capacitance. A feedback loop substracts an average optical current from the capacitor in order to prevent the capacitor's voltage from increasing to the supply. The bit decision is then made by comparing the voltage on the capacitor to the voltage on the capacitor from the previous bit. If the voltage is greater, then photocurrent has been generated during this bit decision and an optical '1' is received. If the voltage is smaller (due to the charge subtracted by the feedback loop), then an optical '0' is received. The receiver implemented in [14] was able to resolve a 11-pA input current at 1.6-Gb/s with a power consumption of only 3-mW. The design was implemented in a 0.25-tum CMOS process with a photodiode capacitance of 420-fF. Figure 1-3f shows an updated version of [14], as presented in [6], as part of a complete optical transceiver. In [6], the receiver demultiplexes the data stream into five parallel receivers by means of five different clock phases. Each of the data receivers sequentially uses two consecutive clock phases to implement the double-sampling previously described. This allows the receiver to integrate the input photocurrent for an entire bit-time and then evaluate the decision for another bit-time, significantly increasing the sensitivity. This yields a high data rate while only using a single photodiode and a relatively low-speed clock. 1.4 Contributions of this Thesis This thesis presents the first-generation implementation of an optical data receiver for monolithically-integrated silicon photonic interconnects. The optical receiver presented is a photocurrent-sensing receiver that amplifies the input signal by means of a modified sense amplifier. The amplification is enabled through the positive feedback of two cross-coupled inverter structures in the latch. The challenge in this work is to design a highly energy- and area-efficient receiver for massively integrated applications, that is is robust to process and noise in a highly digital environment. The additional practical challenge associated with this work is that monolithic silicon photonic devices are still under development, so this receiver has to be flexible enough to accomodate a large range of device variations. The issue is compounded by the fact that the work is done in a pilot 32-nm process, where creating a large and complex chip is extremely difficult. 1.5 Summary In this chapter, the use of optical interconnects for emerging multi- and many-core processors was presented. The main building blocks of an electrical-optical-electrical link were outlined, with particular attention paid to the optical data receivers. The two main types of optical data receivers (current-sensing and current-integrating) were contrasted against each other. Finally, the design and implemenation of a highlydigital, monolithically-integrated optical data receiver in a 32-nm process was then introduced as the goal of this thesis. Chapter Receiver Design The optical receiver presented in this work is a highly-digital, synchronous, currentsensing data receiver that interfaces with a monolithically-integrated photodiode. The simulated receiver will be shown to be able to resolve input photocurrents of less than 50-pA at 5-Gb/s with a power consumption of less than 500-pW (100fJ/bit). In this chapter the optical receiver is introduced. An overview of the receiver illustrates its basic operation. Each of the parts of the receiver is then presented and examined in more detail. 2.1 Recever Block Diagram + Buffer + Latching Sense V SAmplifier V n Photodiode Clamp Buffer Vbuf- Vbuf + Vout + Dynamic-to-Static + Converter -Oglobal enable - Vout- Clock Enable Figure 2-1: Block diagram of the optical data receiver implemented in the EOS2 test chip. i -i-^lii) i-iiilii--i:: --- i--i----~~~-'~-~-i'~~~~W*-i*~~il-.i~~ ~i~-~i The optical data receiver presented in this thesis is shown in Figure 2-1. The receiver consists of several highly-digital blocks, including one that provides feedback. The front-end of the receiver consists of a monolithically-integrated photodiode connected directly to a modified latching current-sense amplifier. The output of the latch (dynamic signals Vdy,+, -) is then buffered and converted to static signals V,,t+, - that are valid over an entire bit time. The buffered signals Vbuf+, - are also used to drive a feedback block that shorts the photodiode once the bit decision has been made. The entire receiver is driven by a clock that can be disabled if the receiver is not to be used. Disabling the clock eliminates switching-power consumption in the rest of the receiver. The only block that would consume static current is the offsetcompensation block inside the Latching Sense Amplifer. A separate configuration bit disables the bias in this block, eliminating power consumption in the disabled mode. The optical receiver operates in two clock phases, receiving 1 bit per clock period. In the reset phase (first half clock cycle), the sense amplifier is reset to a metastable initial state. In the evaluation phase (second half clock cycle), the latching sense amplifier evaluates the current generated by the photodiode. By the end of the evaulation phase, the output of the sense amplifier has been resolved to a digital value that is then stored at the onset of the next reset phase. 2.2 Photodiode The main element that allows the optically-modulated data signal to be converted into the electrical domain is the photodiode, which converts incident optical power into electrical current that can be detected. The design, simulation, and implementation of the photodiode used in this work was done by Jason Orcutt. Some key modelling parameters and theory of operation are presented in this section. 2.2.1 Semiconductor Material The mechanism by which a photocurrent is generated by an optical optical signal is the absorption of a photon, and consequently the generation of an electron-hole pair in the depletion region of the diode. The electric field that exists in the depletion region then causes the electron and hole to be swept to the n- and p-doped terminals, respectively, creating a current that can be measured at the diode's terminals. Some of the fundamental parameters to consider are the bandgap of the material, which describes the amount of optical energy required to excite an electron from the valence to conduction band and create an electron-hole pair, the wavelength of the incident optical power, which describes how much energy each incident photon has (Equation 2.1), and whether or not the the semiconductor material has a direct or indirect bandgap. The absorption coeffecient, a [cm - 1] describes how much light is absorbed by a material, and can be plotted as a function of wavelength. Ephoto = he (2.1) Ge E E Eg,d, 0.6 0.8 1.0 1.2 1.4 1.6 1.8 Wavelength [pm] Figure 2-2: The absorption coeffecient plotted against wavelenth of light [15]. Figure 2-2 shows the absorption coefficients for silicon and germanium at various wavelengths. Note that since silicon has an indirect bandgap structure, its absorption rolls off slowly. Germanium, which has an indirect bandgap around 1.8-um also rolls off, but at its direct bandgap between 1.5 and 1.6-nm the absorption increases quickly with photon energy. The photodiode material used in this work is a Silicon-Germamium (SiGe) alloy. The availability of this material is a fortunate coincidence. SiGe is becoming more common in advanced CMOS processes, as the Germanium is implanted into the drain regions of the PMOS transistors in order to improve mobility. We exploit the availability of the Germanium in order to create the photodiodes. With the photodiode created using SiGe, and the optical waveguides (which should have as little absorption as possibile) created using polysilicon, an appropriate range of wavelengths for the light in this work can then be selected. The wavelength should be large enough that the light is not absorbed by the poly-silicon waveguide, but also small enough that it can be absorbed by the high-absorption direct-bandgap of the Germanium in the SiGe. The exact wavelength used will also be a function of the mole fraction of Germanium, with the anticipated range for this process being 1200-1300-nm. 2.2.2 Geometry Once the semiconductor materials for the photodiode have been selected, the geometry must be designed. The layout of the photodiode relative to the direction of the incident optical power depends on the application, and will affect design metrics such as the efficiency and responsivity of the diode, which describe how much current will be generated per incident optical power. The efficiency of the photodiode describes how many electron-hole pairs are generated per incident photon. The responsivity is defined as the amount of photocurrent generated per incident optical power, and can be used directly in a link-budget calcuation. In the case of the EOS2 test chip, the incident optical power is propagating down a poly-silicon waveguide, and must be absorbed near the termination of the waveguide. Figure 2-3 is a simulation result that shows the intensity of the optical power through --- , I _I Figure 2-3: A cross-sectional view of the waveguide showing the mode propagation. [Figure courtesy of Jason Orcutt] the core of an integrated waveguide. It is the portion of the mode outside of the core that will be absorbed. Figure 2-4 shows a model of the photodiode placement relative to the waveguide termination. The model (not to scale) shows that the photodiodes are made using the same p-n junctions as in the MOSFETS of the process. The p-n junctions are oriented radially away from the waveguide, ensuring that there is an absorptive region along the entire length of the waveguide. Photodiodes are placed on each side of the waveguide in order to maximize absorption. The terminals are then tied together using the metal layers, in order to collect the photocurrent in parallel. A final layout picture of the photodiode can be seen in the far-right of Figure 4-3b. 'I- -,~ r - - -- I Figure 2-4: A 3D model of one of the photodiodes implemented in EOS2. 2.2.3 Photodiode Model One of main challenges with designing any system composed of both optical and electrical elements is properly modelling one set of elements in a manner that fits into the design environment of the other. We needed to create a model of the photodiode that could be integrated into the existing electrical design flow. The basic functionality of the photodiode is represented by a current source, representing the photocurrent that is generated when the photodiode is excited by an optical source, and a capacitance that is seen across the diode's terminals (Figure 2-5). *l on,off diode Figure 2-5: Photodiode Model In this work, the modelling is particularly challenging, as all of the optical elements are implemented monolithically in a CMOS process with relatively little information I I ~- about critical doping concentrations and other photonic design and variation parameters. As a result, we use reasonable estimates for the diode's parameters. The dark current of the diode should be very small and is therefore negligible. This is because the p-n junctions used are the same as those used in transistors, and the dark current will be on the same order as any leakage. The relatively small dimension of the diode enables fast operation even without a reverse bias, enabling the interesting current-sensing receiver described in this work. The difference between the diode's on- and off-currents was taken to be 10-dB, due to the expected extinction ratio of the modulator implemented on the chip. The diode capacitance used is 10-fF. Note that this capacitance is significanly smaller than other diode capacitances in the optical communication literature, due to the fact that it is monolithically integrated. While this is certainly good, the penalty that is paid is that in a monolithic implementation, the photodiode material selection is limited only to what is available in the process. The absorption, and correspondingly the photocurrent and sensitivity, therefore potentially suffer from monolithic integration. 2.3 Latching Sense Amplifier For this thesis work, an optical data receiver based on a current-sense-amplifier is proposed. The current-sense-amplifier with the photodiode connected to the input terminals is show in Figure 2-6. As described in Section 2.2, the photodiode is modelled as an ideal current source in parallel with a capacitance. The latch design leverages the fact that the photodiode used can be operated without a reverse-bias applied across its terminals. This allows for a fully differential, highly digital receiver to be designed, without having to worry about biasing the diode. In the reset phase, all of the latch's terminals are precharged to the supply. The photodiode is shorted. During the evaluation phase, transistors M 1,2 are enabled by the clock, and each branch of the latch begins to discharge. If an optical 1 is received, then a photocurrent will be generated, causing current to flow from the node ------lr~'-"~~"~""~"-*ri;i~:r-^-)?ic~~~ ~ Figure 2-6: The latching data receiver implemented on the EOS2 test chip. in- to node in+. This will cause the negative branch of the latch to discharge more quickly, and therefore latch low. In the absence of a photocurrent, the branches would discharge at an equal rate, and (without offset compensation) the latch would not be able to make a decision. Offset compensation will be examined in the next section, and can help the negative branch to latch high in the absence of a photocurrent. The latch's transistor sizes were selected based on speed and sensitivity constaints, as detailed in [16]. NMOS transitors M1,2 are ideally made small, in order to maximize the evalutation time of the latch. Non-minimum-size devices were used in order to achieve the target data rate. The latch itself consists of two cross-coupled inverters (NMOS M3 ,4 and PMOS M5,6 ). In each of the inverters, the NMOS-to-PMOS sizing ratio was made larger than that used for a balanced inverter. This serves to lower the threshold of the cross-coupled inverters, and again increases the evaluation time where the positive-feedback integrates the input signal. Resetting the latch is achieved through transistors M7 ,8,9,10 which precharge all of the internal nodes high. Transistor M11 further balances the two latch branches in the reset phase. Transistors M 7,8,9,10,11 are ideally small to reduce parasitics. 2.4 Offset Compensation Figure 2-7: 7-bit offset compensation scheme for latching data receiver. The latching sense-amplifier presented in Section 2.3 is a differential structure and is nominally perfectly balanced. This balance can be disrupted through process variation, which can only be compensated after fabrication, and so a programmable offset compensation block must be implemented. The offset compensation must have a range that is sufficiently large to counter the effects of both optical and electrical variation, but also enough resolution to measure the eye diagram in situ as will be presented in Section 3.5. The offset compensation is also required to upset the balance of the latch during nominal operation. When an optical-1 is received, the photodiode in Figure 2-6 generates a current from node Vi,- to Vi,+. This will cause Vi,+ to discharge more slowly and therefore latch high. When an optical-0 is receiver, though, little or no I L I ---------------- .. .......... i : -lil- - - - - - - -- photocurrent will be generated, and the latch input will appear to be approximately zero. Without compensation, the decision that the latch will make will be random and dependant on noise at the inputs. Offset compensation is therefore used in order to raise the threshold of the latch, so that V,,t+ will always latch low for an optical-0. 140 - -Clock Speed:5GHz 120 100 80 60 . -. ............. . 40 20 ,. . 0 0 i . 20 40 i. 60 80 100 120 1 DAC Code (a) Offset Compensation Codes S1LSB=uA ......... .... : i .................... * 1 LSB= 1.1uA .. .................. ..... . : . ... : .... .... ... .... .... ... .... ... .... ... .. .. , ... ,, i r 0 :- 6646 .. 1- j- : * -1 "2 0 20 40 60 80 DAC Code 100 120 140 0 (b) Integral Nonlinearity Error * 20 40 80 60 DAC C ode 100 120 140 (c) Differential Non linearity Error Figure 2-8: Offset compensation performance. The offset compensation is shown in Figure 2-7. The compensation consists of a digitally-programmable binary-weighted current mirror (transistors M30-M43) that pulls additional current off of one of the two input nodes shown in Figure 2-6. Transistors M30-36, which have the same gate bias, have binary-weighted widths to generate the binary-weighted currents. Transistors M3 7- 4 3 control which branches are enabled, and are sized the same as M30-M36 to ensure that branches carrying more current have a correspondingly smaller impedance. The node that is compensated is controlled by transistors M46 and M4 7 . The gate bias for the binary-weighted current branches is generated by transistors M25- 29 . In the event that offet compensation is ~! !! . . . . . . . . . ... . . . .. . . .= .. _ ... . .. .. . i + not required, the current mirror can be completely shut off by turning M 24 on and M25 off. This sets the bias voltage equal to zero and prevents any current from being generated in the branches. Figure 2-8 summarizes the performance of the offset compensation block. Figure 28a shows the meta-stable input current that correspondes to of the 128 codes. The range of compensation was determined based on variation analysis that will be further analyzed in Section 3.2. It is interesting to note the non-ideality of the codes near the origin. this shift in the transfer function is due to the capacitance that is added just by turning on transistor M46 . The magnitude of the shift then represents the amount of current required to overcome this additional capicitance. Figures 2-8b and 2-8c show the integral-nonlinearity error (INL) and differential-nonlinearity error (DNL) of the DAC, respectively. These metrics describe the linearity of the DAC. In order to configure the offset compensation, the following steps should be carried out experimentally. Here they are done in simulation for the purpose of this thesis. The results are summarized in Table 2.1. 1. Determine the threshold for the receiver when the input bit is an optical 0. 2. Determine the threshold for the receiver when the input bit is an optical 1. 3. Set the offset compensation to the code half-way between the two thresholds found in steps 1 and 2. ION,IoFF Optical '0' Optical '1' Optical DAC (p 1 A) 120,12 100,10 80,8 60,6 40,4 20,2 Threshold 9 8 7 6 5 2 Threshold 103 80 61 44 29 14 Threshold 56 44 34 25 17 8 Configuration 100 0111 101 0011 101 1101 110 0110 110 1110 111 0111 Table 2.1: Example configuration data. Threshold values are found using Figure 2-7 2.5 Latch Output Buffer Vdy n +/- >0, Vbuf-/+ Figure 2-9: The output buffer of the optical receiver. The latch shown in Figure 2-6 is followed by a buffer stage. The buffers are implemented as inverters, as shown in Figure 2-9. The purpose of the buffers is to increase the sensitivity of the latch at the cost of delay. For a given clock speed, the sensitivity is improved as the capacitance seen at the output nodes is a much weaker function of the previous bit received. This is because the capacitance looking into the dynamic-to-static converter, presented in section 2.6, depends strongly on the previous bit, which it holds on floating nodes until the decision for the current bit is made. During the implementation of the EOS2 test chip, an error was made in the placement of this buffer. The output of the latch was fed directly into the input of the dynamic-to-static converter. The buffered signal was used only in feedback for the photodiode clamp. Not only does this error mean that the capacitances seen at the decision nodes of the latch will depend on the previous bit, but the feedback clamp is also delayed by an inverter delay. The simulation results presented throughout the thesis use the flawed schematic just described, in order to be able to more closely match measured results with simulation. The advantage of the flawed design is that there is less delay between the clock signal and the output data becoming valid. 2.6 Dynamic to Static Converter The dynamic-to-static converter used in the optical receiver is shown in Figure 210. Dynamic-to-static conversion is necessary because the output decisions from the latching sense-amplifier are only valid for half of a bit-time (they are reset during the Vbuf- M21 M18 M19 M2 Vu- M20 M16 M1 M22 - Vout- V Vbu+ Vout + M M12 buf M13 Figure 2-10: Dynamic-to-static schematic. other half). The output of the dynamic-to-static converter is valid over the entire bit time. The dynamic-to-static converter consists of two cross-coupled tristate buffers (transistors M12 -19). While the clock is high, the latch output bits can be set through transistors M20-23 based on the decision of the sense-amplifier. During the receiver's reset phase (I=0), the tristate buffers become cross-coupled inverters, holding their previous values. The output dynamic-to-static converter's output nodes are not driving by any other signal, as V b; + /- are both high. 2.7 Photodiode Clamp Vbuf+ In+ Vbuf- In- Figure 2-11: The photodiode-clamping circuit used in the optical data receiver. The photodiode clamping circuit used is shown in Figure 2-11. The clamp consists of a single NAND-gate that drives an NMOS transistor. The NAND-gate is connected to the outputs of the sense-amplifier, before the dynamic-to-static conversion. During the precharge phase, the sense-amplifier outputs are both HIGH, and the NAND's output will be 0. During the evaluation phase, the sense-amplifier's outputs start off the same and then diverge to a decision. It is only once the divergence has occurred that the NAND's inputs differ, yielding a HIGH signal at the output that enables the clamping NMOS. Therefore, the photodiode is only shorted once the bit decision has been made. The photodiode must be shorted in order to prevent a forward-biasing of the substrate. At the end of the evaluation phase, the photodiodes terminals are connected to two virtual grounds (being held by transistors M1, 2 ). If additional photocurrent causes Vi,- to go below ground, then the M2 will have to start pulling the node up to ground, which an NMOS cannot do well. If Vi,- is driven low enough, the diode formed by the N-doped drain of M2 and the P-type substrate could turn on. 2.8 Clock Enable enable Figure 2-12: The clock-enabling block for each optical data receiver. The clock-enabling scheme is shown in Figure 2-12. The Figure shows that the clock used for each data receiver is derived from a global high-speed clock. By setting the enable bit low, the local clock will not oscillate, and the rest of the highly digital receiver will not switch, eliminating power consumption aside from leakage. 2.9 Summary This chapter introduced the highly-digital, synchronous, current-sensing data receiver presented in this work. The receiver's basic operation was outlined. Each of the components of the receiver was then presented, with operation and design decisions explained. In the next chapter, circuit simulations are presented in order to further illustrate the circuit's functionality, and also to demonstrate the achieved performance. Chapter 3 Receiver Simulation In the previous chapter we outlined the design of the optical data receiver, showing how each of the blocks were designed and work together. In this chapter, the full operation is shown with simulated results. The simulations further illustrate how the receiver operates, as well as analyze the performance achieved. 3.1 Transient Simulation The in-depth analysis of the optical data receiver starts by first looking at the waveforms at the nodes of the data receiver. To do this, a transient simulation is set up, as shown in the testbench Figure A-1. The current source of the photodiode's model is set to be a pulsed current source, with each pulse representing a bit of data to be received. The offset compensation is set such that the threshold value is placed between the 1- and 0-bit threshold values, as determined by Table 2.1. Figure 3-1 shows the receiver voltage waveforms during multiple bit decisions. In this example, the photodiode on-current was set to 100-ipA, while the off-current was only 10-pA, assuming an extinction ratio of 10-dB. The current signal (Figure 3ib) was initially low, and transitions HIGH in time for the first rising clock edge (Figure 3-1a). When the clock transitions high, the two photodiode input nodes (Figure 3-1c) both race toward ground. At 200-ps, the positive node has current being driven onto it from the photodiode, while current is being removed from the i;L ~ -- =----:~.-----r)l-;-~-i;_-~-_-1CI _ o 0.5 0 . 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.7 0.8 0.9 0.7 0.8 0.9 0.7 08 0.9 1 Time (ns) (a) Clock Signal Time (ns) (b) Photodiode Current 0.5 > 0 0.2 0.3 0.4 0.5 0.5 Time (ns) (c) Input Node Voltages z 0.5 0 0.2 0.3 0.4 0.5 0.6 1 Time (ns) (d) VDYN 0.2 0,3 0.4 0.5 0,6 Time (ns) (e) V,,t+ Figure 3-1: Transient simulation waveforms demonstrating bit decisions. negative node at the same time. This causes the negative node to race toward ground more quickly. Figure 3-1d shows the response of the dynamic decision nodes of the receiver (labeled Vdy, + /- in Figure 2-6). During the reset phase, when the clock 44 is LOW, all of the internal nodes are pre-charged HIGH. When clock goes HIGH during the evaluation phase, it is clear that if an optical '1' is being received, the cross-coupled inverters cause Vdan+ to latch high and Vdayif an optical 'O' is being received, such as at 600- and and Vday- 8 0 0 -ps, to latch low. Similarly, Vdyn+ will latch LOW will latch HIGH. The outputs of the basic receiver are dynamic however, and must be integrated into a static system. Figure 3-le shows the positive output of the dynamic-to-static converter. The data at this node is valid for an entire bit period. Latch Offset Compensation Output Buffer Dynamic-to-static Converter Photodiode Clamp Clock Enable Power (uW) 110.2 237.5 2.4 12.6 0.3 53.4 Total 416.4 Table 3.1: Power consumption for each block of the data receiver during 5-GHz operation with a 100-uA 'on' photocurrent. Power consumption analysis is presented in Table 3.1, where the power of each block is shown for operation at 5-GHz and a photodiode 'on' current of 100-uA. From the table, it is clear that the offset compensation block dominates the power consumption. Despite the use of current mirrors with ratios larger than 1, this power results from the total range required by the compensation, while ensuring that all mirroring transistors are near saturation. It is important to note that this power consumption will decrease should the offset not be required to shift so far. This may be the result of a smaller photodiode 'on' current, or from process variation. In the former case, the problem that would arise is that the measurable eye-diagram would have less vertical resolution. That is, the number of offset codes required to sweep out the vertical opening of the eye would decrease. Therefore, future work should address an offset compensation scheme that can add resolution without requiring such a large photodiode 'on' current (which requires a large compensation power). A .__. ~~;~~__;_~~_~ ~ ~. .~~. ~.._ _.....~.....l__,x I-I- -- ------- ;l;x r. programmable array of gate capacitances is a good solution. The energy-per-bit is computed from Table 3.1 to be 83.3-fJ/bit. From Table 1.1, it is clear that the energy-per-bit of the data receiver is competitive with the current state of the art, though it only represents simulated results. Power was measured using the method presented in Appendix B. 3.2 Offset Compensation Simulation The offset compensation block of the optical data receiver was introduced in the previous chapter, and is a particularly crucial block. It is the means by which the threshold of the receiver can be set in order to accomodate different incident optical powers. It is also the means by which the eye diagram of the receiver will be measured. As a result, the offset compensation transfer function must be determined carefully. The testbench for the offset compensation simulation is shown in Figure A-2. The figure shows that for each experiment, the input photocurrent is set to a DC value, and an offset compensation code is set. A transient simulation is then run, with a 5-GHz clock driving the optical data receiver. For each compensation code, the DC current value is increased over multiple simulations. When the positive output of the receiver eventually latches positive, it means the meta-stable current-code pair has been determined. The resulting transfer function was presented in Figure 2-8a. 3.3 Setup Time One of the main metrics in evaluating the performance of a digital block is the relationship between the setup time and the delay. The setup time is defined as the time between a change in the input data and the rising edge of the clock. The delay through the block is the time from the positive edge of the clock to the change in the output of the block. In this section we examine the effect of the data input's phase and magnitude on the ability of the data receiver to evaluate the correct bit. Figure A-3 shows the testbench for the setup time test. The figure shows that the phase of the data transition relative to the rising clock edge was swept (Tsetup). The propagation delay from the positive clock edge to the final latching of the data (Tdelay) is then plotted against the setup time, with power recorded for each measurement. Note that delay was measured as the time between each waveform reaching half its final value. For the optical data receiver proposed, the propagation delay was recorded for a range of setup times across several input signal strengths (which all assumed an extinction ratio of 10-dB). The results are shown in Figure 3-2. Figure 3-2a shows the propagation delay plotted against the setup time for a range of input photocurrents. The plot shows that for larger input optical powers (and therefore larger photocurrents), the propagation delay through the optical receiver decreases. This directly relates to how quickly the receiver can be clocked while still receiving the correct bits. Figure 3-2a also shows that as the setup time decreases, the propagation delay through the latch increases. This is due to the fact that the input photocurrent must charge various parasitic capacitances at the input terminals in order to impact the circuit. Figure 3-2b shows another performance metric: the total delay of the latch, equal to the sum of the setup time and propagation delay. From the plot, it appears that the fastest operation will be achieved from minimizing the setup time. Figure 3-2c shows the power consumption of the receiver for each of the input signal strengths. It is interesting to note that the power actually increases as the signal strength increases. From Table 3.1, we know that this is because a larger threshold shift (and therefore larger offset current) is required for larger input optical powers. Note that the discrepency between the larger total power in Figure 3-2c with respect to the number reported in Table 3.1 is due to a set of buffers that were included in the testbench. The buffers were test structures and Table 3.1 is the true power measure. _L I 1111 ill : --- 120-uA --.... 80-uA ..... 100-uA + ++ + . + :o Scooo .000. .00 0. I 0 -10 + + + ++ ++ I 10 0,00o. * 60-uA o 40-uA + 20-uA . I 30 20 i 40 5 Ttp (ps) (a) Propagation Delay plotted against the setup time. (a) Propagation Delay plotted against the setup time. Tetup (ps) (b) Total Delay plotted against setup time. i- - -- ----- 120-uA 10 0-u A ....... ................... .................. . ............. .......... ... * + * ** *+ * 444** ++++++ Ii 0 t + ++ 10 ....................... * * + 4* * * . .. .. .. .. . .. .. .. .. .. .. .. .. . + +:+++++ I -10 .* * ++-+ i 20 Tsetup (pS) * 4* * * * 30 60-uA 6 0-uA 40-uA + 20-uA * . ... . .. ++++ + +:+++++ 0 **4 * . .. .. . +++ ++ 40 (c) Average power over a bit period plotted against setup time. Figure 3-2: Propagation delay through the latch plotted against setup time. The input photocurrent was varied, and the input data is switched from 0 to 1. Clock frequency is 5-GHz. 3.4 Characterization of the Receiver's Sampling Aperture This section characterizes the sampling aperture of the latch-based optical receiver. The sampling aperture gives us a way to measure the bandwidth of the receiver. The characterization follows the analysis presented in [17], where the latch's bit decision is represented as a weighted time-average of the input signal. One difference between the work in this thesis and that in [17] is that our input signal is current-mode. The DC gain that we present here is a then a transimpedance gain as opposed to a unitless one. The testbench for the aperture characterization is shown in Figure A-4. The input signal consists of a photocurrent step that again assumes a 10-dB extinction ratio. The step's phase relative to the clock is swept, and for each phase, the receiver's threshold is swept by changing the offset compensation code. The latch's bit decision is recorded. The phase-configuration-bit space can then be plotted, showing all of the metastable points and yielding an eye diagram. Figure 3-3 shows the the simulation results from the receiver's aperture analysis for a 0-1 step transition (the results for a 1-0 step transition can be found in Appendix C). Figure 3-3a shows the offset current required to bring the latch to a meta-stable state for each phase of the input step. From the figure, we can see that if the step signal arrives early (negative phase), then the input nodes have time to settle and the input photocurrent appears as a DC signal. The meta-stable code is then predicted by looking at Figure 2-8a. As the step occurs closer and closer to the clock edge, its photocurrent will have less and less impact on the latch's decision. Less offset current is required by the compensation block to match the input signal, and we see a decrease in the metastable code. In the extreme, when the bit arrives much later than the clock edge, the offset compensation block must only match the off-current of the input signal. Again, this appears as a DC-input that can be predicted by Figure 2-8a. Implicit in the code plotted in Figure 3-3a is any non-linearity in the current-DAC ~r -. -.r-- ---,-..rr-x---- ---r--;; - ;-- i:r---iici-r;-;i-ii.;cr;;;.ii i;.i;r--a;;i;r~:il---r;r~r~li;r~;=r--l -~~~;ir~~B~" ;;~~ ':;~-~~~~i w~~;i_:: 70 - 40-uA - - 60-uA --- 80-uA 60 50 40 30 --- . ............... 20 20 10 0 -100 K ......... L -50 0 50 phase (ps) 100 150 0 -100 -50 0 50 phase (ps) (a) cMS(-r) 100 150 (b) IMS(7) x 1010 phase (ps) phase (ps) (c) SSF(T). (d) ISF(r-). 235 234 ,_ 231 230 40-uA -- 60-uA -- 80-uA 0 200 400 600 Frequency GHz (e) FFT(ISF) 800 1000 229 228 -----0 40-uA: BW=14.9-GHz 60-uA: BW=14.9-GHz ............. 80-uA: BW=14.9-GHz 5 10 Frequency GHz 15 (f) FFT(ISF) (zoom view) Figure 3-3: Sampling apertuture characterization for 0-1 step input. i=. -; shown in Figure 2-7. Using the transfer function in Figure 2-8a, the meta-stable code (cMS) is mapped to a meta-stable input current (IMS), shown in Figure 3-3b. The Step Sensitivity Function (SSF) is defined by Equation 3.2, and is used to plot a normalized version of the IMS, shown in Figure 3-3c. If the slope of the step transition is large, it means that the receiver is very sensitive to the arrival time of the input step. The receiver's sampling aperture would be considered fast. SSF-() Is(T) norm( ISForm(T) maz(IMs(r) = - min(Ins(T)) ) - min(Ins(T)) SSForm(T) (3.1) (3.2) The Impulse Sensitivity Function (ISF) is defined in Equation 3.2 as the derivative of the SSF, and is plotted in Figure 3-3d. Through examination of the SSF, we determined that an SSF with a steeper slope has a smaller aperture in time during which it samples its input. This is seen directly in Figure 3-3d, where a receiver with a smaller aperture would appear to have a sharper impulse. Since this impulse the same impulse that we are using to sample our imput, it must be thin enough (have enough bandwidth) to accomodate our desired data rate. As discussed in [17], the bandwidth of the latching receiver can be computed by taking the FFT of the ISF and measuring the 3-dB bandwidth. Figures 3-3e and 3-3f show the FFT of the ISF. From the figures, the bandwidth of the receiver is clearly larger than 10-GHz for each of the input current step sizes measured. This verifies that the latch is fast enough for the 5-Gb/s target data rate. 3.5 Eye Diagram In the previous section we used the SSF and ISF to characterize the sampling aperture of the latching optical receiver. We can also use the cfgMS to create an in situ eye diagram. In order to obtain the eye diagram, the IMS is recorded for a different bit transitions (010,100,101,000,etc) and then superimposed. The resulting I 80 60 V 40 ...... 40 uA 20 .. -100 -50 0 50 100 150 time (ps) 200 250 300 350 (a) 40-uA step input. 60 ...... C_ 40 -....... 60 ......... 20 . . -100 .......... .......... ....... .... t-.....6 OuA .. .. -50 0 50 100 150 time (ps) 200 250 300 350 200 250 300 350 (b) 60-uA step input. 80 .... -100 50 0 50 100 150 time (ps) (c) 80-uA step input. Figure 3-4: In Situ eye diagram from simulation. plot is shown in Figure 3-4, and represents a plot of metastable currents for each phase. The plot is used to determine the optimal sampling phase and threshold value for each optical signal power. It is important to emphasize that this is an experiment that will be performed onchip, and will be the only method to measure an eye-diagram. The difference between the chip measurment and the one presented here is that while the eye diagram shown in Figure 3-4 is derived from a single ideal transient simulation for each phase and offset value, the on-chip measurement will take the average of 220 such simulations for each phase and threshold value. 3.6 Process Variation One of the major challenges with using such an advanced CMOS process as 32-nm is combating the process variation and mismatch. These effects were mitigated by implementing an offset compensation block with sufficient range to accomodate the worst variation. The offset compensation block was presented in Section 2.4. In this section, the performance of the compensation block is examined through variation analysis. Process variation is first studied through examination of the threshold shifts of the latch for a DC input current over several Monte Carlo simulations. The testbench for the experiment is shown in Figure A-5. The figure shows taht for each simulation, the input photocurrent is set to a DC value. A transient simulation is then run with the offset compensation code slowly incremented through the transient simulation, such that each code will be valid over several high-speed clock periods. When the code is incremented to the point where the positive output of the data receiver transitions from high to low, then the meta-stable code that corresponds to the DC input current for that Monte Carlo run has been found. Note that this threshold measurement differs from the one used in Section 3.2 because if a new transient simulation was run for each offset code, the Monte Carlo parameters would be reset. The results of the DC-input variation analysis are shown in Figure 3-5. The plots show a transient simulation with a duration of 2560ns. The codes are swept from -127 through 127, as shown in Figure 3-5a, with each code being valid for 10-ns. With a clock rate of 1-GHz used for this experiment, each code is valid for 10 clock periods, eliminating the effects of switching the codes during the simulation. Each input current bias was simulated with 30 Monte Carlo runs. The threshold for each run was recorded, and the mean and standard deviations computed. The Gaussian distributions are superimposed onto the threshold plots in Figures 3-5b- 35f. rl""l~u"~""m"l""x~""-~""""~ 50 . . ..... ...... .. . . . .. ...... .: . 0-C : i0 0,5 15 2 25 time (us) (a) Offset Code S. .................... ........ .........SI I--"m'°°' 0AC0 I I I 1____ J................. L................ I.-r . Nominal DAC=0 U 0 0.5 1 1.5 2 2.5 time x10 (b) Idiode=0-pA II I . 0 5 . ............. . .-. .. I0 0" 0.5 .. ........... .........I......... r .. ....... .......... .. ............... ": 2 1.5 Nominal DAC=16 ....... .. ................. ., 1 - mean=13 stdev=28 2.5 time x lo (c) Idiode=20- A I I r _ . _ ............ I.................. . 0.5.............. .. V n - - 0 -. r...... 0.5 1 - ................. - Nominal DAC=31 mean=28 stdev=22 " 1.5 2 2.5 time x 10o (d) Idiode=40-pA 1 MI ! 0. ............... IO r....... ................. , .............. ....... .......... ........ ........ . . . . . .. . .! 13 - -' ' -....... 0.5 1 Nominal DAC=45 mean=39 stdev=27 2 1.5 -- ....-..- 2.5 time x lO. a (e) Idiode=60-/tA .... I "-. .. , ................ . -- 05................. :................. :..,/ .......... "................. .. i...... . .... .... 0.5. ..... ., - - Nominal DAC=63 mean=64 - stdev=29 0 0.5 1 1.5 2 time 2.5 e x 10 (f) Idiode=80-pA Figure 3-5: Transient plot of the positive output terminal of the receiver during a DC input photocurrent. Superimposed on the plot is the distribution of threshold values from monte carlo simulation. From the plots, it is clear that the peak of the Guassian distribution is located near the nominal threshold crossing with no variation, as expected. The figures also show that Guassian curves spread themself over a wide range of offset codes. While this partly implies that the variation is severe, it also shows that the offset compensation has good resolution in this implementation, meaning that the offset code required will be quite close to the true meta-stable threshold. The fact that the Gaussian curves largely flatten out within the range of the compensation code indicates that the offset compensation scheme also has sufficient tuning range. The triple-transition seen in Figure 3-5b is due to the parasitic capacitance of the branch-selection switch itself. It is clear that when the branch selection switch is moved from the positive-branch to the negative branch, even when the offset code is zero, there is some ambiguity created in the threshold of the comparator. Finally, while the Gaussian curves from each of the plots overlap each other when superimposed, implying that there exist variation combinations where receiving data is impossible, it is important to remember that each data point represents a different Monte Carlo simulation, and the metastable points for the on- and off-currents will shift together for a given run. 3.7 Summary This chapter built upon the understanding of the optical data receiver presented in Chapter 2 by analyzing the performance of the receiver through a series of experiments. The basic implementation and operation of the optical data receiver was introduced, with assumptions made on the type of optical signal that will be received. Then the main offset tuning mechanism was explored through its own calibration testbench. The offset compensation block's ability to mitigate the effects of process variation was presented, indicated that the receiver should work even in this advanced CMOS process. The speed of the latch was explored by examining the different delays associated with its operation. Finally, a sophisticated method for measuring an in situ eye diagram yielded a predicted bandwith for the receiver. Chapter 4 EOS2 Test Chip Our research team has executed a test chip called EOS2 that implements the main photonic building blocks listed in Section 1.3. The purpose of the chip is to serve as a test vehicle through which each of the components can be fully characterized. The chip contains many different combinations of modulators, receivers, and waveguides in order to develop an experimental understanding of the characteristics of each design. The large degree of redundency serves to combat the unknown severity of process variation in the pilot 32-nm process run. The EOS2 chip was submitted for fabrication in the Spring of 2008, and is due to return in the Fall of 2009. A layout photograph is shown in Figure 4-1. 4.1 Chip Organization The EOS2 chip is composed of optical components and the electrical circuits that interface with them. The electrical circuits are grouped into cells, where each cell contains two modulator drivers, two optical clock receivers, and two optical data receivers. The cells also contain test structures such as Pseudo-Random Binary Sequence (PRBS) generators, counters, and data snapshots (Figure 4-2). Programming of the configuration bits is achieved through a single scan chain which is routed throughout the chip. Configuration bits are first shifted into the scan chain, and then loaded into their respective circuits. - - II - - II Figure 4-1: Layout photo of the EOS2 chip. II I I I II I I !~ I I i tI Buffers t ! ! I 1 !I i ! !I ! i High-speed clock buffer L---- ---- --- ---- -- ---- ---- --- ---- --- Figure 4-2: Organization of the EOS2 electrical cell. - Ir NOW= - - -Optical Fiber I - - Un-modulated Light External Laser Chip Boundary Poly Waveguide I M Vertical Coupler Modulated Ring Ivlodulator , Light I | Modulator Driver Optical Data Receiver (a) Block diagram of a photonic link implemented on EOS2. Clock Rx (x2) Modulator (x2) and PRBS Data Rx (x2) 220-pm (b) A single optical link in EOS2. Figure 4-3: An optical link in EOS2. The electrical cells are connected to variations of photodiodes, waveguides, and modulators in order to fully characterize those components. Figure 4-3 shows the architecture of the EOS2 optical links. Figure 4-3a is a block diagram of the optical link implemented within a single cell. Figure 4-3b is the coressponding layout. In this link, light from an off-chip laser is coupled onto the chip through the vertical - I I -I ar coupler (lower-left corner of Figure 4-3b. The light propagates along the waveguide, and is modulated by the ring modulator in the center. This modulated data signal then continues along the same waveguide until it is fed into the photodiode (top right-hand corner of Figure 4-3b. The photodiode converts the optical signal into a photocurrent, which is then detected by the data receiver. The electrical test setup within each cell is shown in Figure 4-4. The data originates in the transmitter (TX) block, propagates through an optical waveguide, and is received by a photodiode and the receivers of the receiver (RX) block. On the transmitter's side, data from a PRBS/pattern generator drives both the modulator, and a snapshot block. The PRBS/pattern generator can be configured to generate either a pseudo-random bit sequence with a known starting point, or seed. It can also be configured in a racetrack mode, with a pre-programmed 32-bit pattern repeating itself. The snapshot is used to capture the most recent string of data that was driven into the modulator. Scanchain Modulator Driver Snapsho RxData-2 PRBS aSeCounter Snapshot RxClk-1 RxClk-2 Figure 4-4: Block diagram showing cell operation. On the receiver's side of the block diagram, there are two types of input paths: that for the optical data receiver and that for the optical clock receiver. In each cell, there are two optical data receivers. One is connected to an optical circuit such as that shown in Figure 4-3. The other is connect to either a vertically-illuminated photodiode, or a waveguide connected directly to a vertical coupler. This second photodiode is driven by optical data that has been modulated off-chip, and is a backup testing option in the event that the on-chip modulators do not work as expected. The data receiver is selected by of a mux that drives a 32-bit snapshot block. The snapshot records the 32 most recently received bits. The receiver side also contains two clock receivers. The clock receivers are driven by an off-chip optical clock, and are tested by counting the number of output electrical pulses relative to a reference counter driven by an electrical clock. A mux is again used to select which clock receiver is in operation. The counter in the receiver block is a 20-bit pulse counter. The input to the counter is driven by a third mux that selects between the data receiver's path and the clock receiver's path. For the data receiver, the counter will be used primarily for the measurement of an in situ eye diagram. It is important to be clear that in this design, a bit-error-rate measurement cannot be performed on-chip. 4.2 Digital Test Structures This section presents the implementations of the digital test structures used in each cell of the EOS2 test chip. 4.2.1 Scan Chain The EOS2 test chip is fully programmable by means of a single scan chain consisting of a long array of registers. The scan chain consists of two different types of cells: one for writing bits to the chip and one for reading bits from the chip. The scan chain is controlled through five low-speed signals: * Scan In (SI): the data bit to be loaded into the chip * Scan Out (SO): the data bit to be read out of the chip * Scan Clock (SCLK): the clock signal for reading and writing bits on the chip * Scan Enable (SE): controls whether data is being scanned into the chip, or whether the circuits-under-study are being run * Scan Update (SU): controls the loading of configuration bits into the circuitsunder-study Since the scan signals must be controlled from off-chip, the clock frequency used to program the chip is significantly slower than the targetted data rate. The relatively slow reprogramming of the chip is a limiting factor in testing. bCircuit bCircuit 0 SI DQ SI SO 0 1 D QSO SE SE SCLKooK ._LKSCLK SO ! S1K DQ SU U... SC xSCLK SCLKnex (a) Scan chain read cell. L; K (b) Scan chain write cell. Figure 4-5: Block diagram of scan chain cells. Figure 4-5 shows the block diagram for the scanchain's read and write cells. Figure 4-5a shows the read cell, which is used to read a value back from the chip (through pin bCircuit). The bit will then be shifted out of the chip through the scan chain and analyzed. Figure 4-5b shows the scan cell that is used to write a configuration parameter into the chip. It is nearly identical to the scan chain read cell, except that an additional latch is added to the SO line. The latch is controlled by the Scan Update (SU) signal, and is transparent when SU is enabled. It is important to note that the relative phases of SU, SCLK, and SE are critical, as incorrect signalling may cause incorrect data to be written to or read from the chip. The correct signalling procedure is outlined in Appendix E. Each of the scan cells is clocked by the scan cell after it. That is, the direction of clock propogation is opposite that of the flow of data. This was done in order to avoid hold-time violations. 4.2.2 High-Speed Signal Synchronization The EOS2 test chip has a single scan-enabling signal SEnable for both the lowand high-speed blocks. When Senable is 1, all of the digital structures in the selected portion of the chip shift data to the next structure. This means that the PRBS is running in either of its two modes, the snapshot is receiving data, and the scan chain is shifting data across in response to Sclk. It is important to recognize that the while the transition edge of the Senable signal can be tightly controlled with respect to the other low-speed signals (Sclk, Sreset, Supdate, Sin, Sout), its phase relative the corresponding high-speed TX and RX clock edges is extremely difficult to determine, and will vary across the chip. As a result, it is possible that the Senable signal will transition near a high-speed positive clock edge in the local high-speed blocks. This could potentially cause some registers to update, based on Senable=l1 and others based on Senable=0O, corrupting the data. The phase issue was addressed SEnable D Q SEnable (local) Figure 4-6: High-speed SEnable synchronization. by re-synchronizing the Senable signal relative to the local high-speed clock phase. Figure 4-6 shows the synchronization circuit. In the synchronization circuit, a single register controls the local SEnable signal. On the positive high-speed clock edge, the local SEnable signal is updated with the global value set by the Senable pin. The local clock domain then has an entire bit time for Senable to settle, and the data will processed correctly. (a) A sweep of possible SEnable step phases. (b) The synchronized local SEnable signal. (c) Clock Signal. Figure 4-7: High-speed signal synchronization. 4.2.3 PRBS/Pattern Generator The PRBS/Pattern generator implemented in the EOS2 chip (Figure 4-8) consists of 32 registers and supporting logic. The PRBS can be configured to achieve two different modes of operation: PRBS mode and pattern mode. In the PRBS mode, the registers create a 31-bit Linear Feedback Shift Register (LFSR). The outputs from the 28th and 31st registers are summed and fed back into the first register in order to achieve the correct sequence [18]. The 32nd register in the block is bypassed. In the pattern mode, the feedback is changed and the 32nd register is not bypassed. A 32-bit pattern that has been loaded into the registers is allowed to shift arround in a circle. In both PRBS and pattern modes, the initial pattern is set through the scan chain. The same cells used to implement the scan chain's read cells were again used in order to be able to have a pattern loaded into the PRBS. The SO bit of each stage drives the SIN bit of the next, and all blocks share a common clock signal. A pattern SE 0 A . 1W II 1W clock -- 1 Figure 4-8: PRBS/pattern generator block. can be loaded into the PRBS through the circuitIn input of each scan cell. The figure also shows how the outputs of the 31st and 28th stages are summed in a 1-bit adder in order to create the necessary feedback bit. In PRBS mode, this bit is selected by a MUX and sent to the input of the first scan cell. In pattern mode, the output of a 32nd register is instead routed back to the first scan cell. Figure F-1 in Appendix F shows the first few outputs of the PRBS generator in PRBS mode. Table F.1 will help the reader map out the states of the PRBS generator and see how bit 0 and bit 3 are summed and fed back into bit 30. 4.2.4 Counters The counter blocks in the EOS2 test chip are used to perform on-chip measurements, with only the results being scanned out. This approach was necessary, as there are not enough pads on the chip to measure every signal directly, and bringing high-speed data off-chip is not a trivial task. The counter functionality implemented in EOS2 is shown in Figure 4-9. The two counters each consist of 20 registers, where the clock signal comes from the Qoutput of the previous stage, and each D-input comes from the stages own inverted Q-output. The counters are both reset to a full count of all '1' and count down. The low Figure 4-9: Output transition counter with reference. upper counter in the diagram is the reference counter and counts from 220 - 1 down to zero. An additional register serves as a 21st bit, indicated whether or not a zerocount has been reached. When the count reaches zero, the inputs to the reference and measurement counters are switched to ground from the reference clock and input signal, respectively. The data can then be scanned out of the chip and analyzed. 4.2.5 Snapshot The snapshot is a 32-bit shift register that records the most recent receiver ouput transititions. There is a snapshot for both the modulator driver and the data receiver. The data from the snapshot can be scanned out of the chip, and th TX and RX patterns compared. This can be used to determine the necessary phase difference between the TX and RX high-speed clocks. Figure 4-10 shows the block diagram of the snapshot implemented in EOS2. It is important to note that the snapshot uses scan cells with some feedback as opposed to regular D-flip-flops. This is done in order to preserve the data when SEnable is 0. Figure 4-10: Snapshot implemented for the modulator and data receiver of EOS2. 4.3 Summary This chapter presented the EOS2 test chip in which the optical data receiver was implemented. The organization of the chip was outlined at both the chip and test-cell levels. Each test cell was then described in detail, with an example optical link shown. The high-speed test structures were then presented in order to show how some of the simulation results shown in Chapter 3 will be measured. I 11--l-I ---------- =& -- -- - - --- ---- -- ", Chapter 5 Conclusion and Future Work 5.1 Conclusion The emergence of multi-core processors has created an I/O bottleneck both on-chip between the cores, and between the cores and off-chip memory. Optical interconnects are an attractive solution, as wavelength-division multiplexing allows multiple channels of information to be placed onto a single low-latency waveguide, reducing the number of interconnects and increasing the speed of transmission. This thesis presented a monolithically-integrated optical data link, focusing on the design of an optical data receiver. The proposed receiver is a highly-digital, current-sensing optical data receiver that interfaces with a monolithically-integrated photodiode. The receiver is able resolve input photocurrents of less than 50-/pA at a data rate of 5-Gb/s. The power consumption at this rate is less than 0.5-mW, achieving better than 100-fJ/bit. This work demonstrated how an optical link designer must first study the characteristics of the optical interface, and then design the corresponding circuitry. The optical data receiver was presented as part of the EOS2 test chip, a flexible test vehicle that will demonstrate various optical components and the electronic systems that interface with them. -- 5.2 .1 . ..... i. ....i.-.~;-.-j' I----- -- ------ i--------ll----i.^"l~ ~ Future Work The field of monolithically-integrated photonic links for multi-core processors is still an emerging one, and as such there are several areas in which more work can be done, from the design of circuits, to the CAD-tools and infrastructure supporting the actual design, to post-processing techniques. 5.2.1 Co-design of Optical Devices and Electrical Circuits One of the main challenges described in this work related to the use of monolithicallyintegrated photodiodes, which have a different set of characteristics than the discrete photodiodes used in the literature. While a much more limited material selection makes the design of highly-efficient and responsive monolithically-integrated photodiodes difficult, their smaller capacitances makes them ideal candidates for high-speed receivers. Furthermore, by having the diode integrated with the rest of the circuitry, the designer has complete control over the design of the circuits, the photodiode, and also the interface between them. This gives the designer an unprecedented ability to tightly integrate and co-design the two components. Going forward, we can expect to see photodetectors that have been custom designed for a particular style of circuit receiver. The design of the two components will be an iterative procedure, yielding a more system-optimized architecture. 5.2.2 Optical Device Design, Layout, and Verification One of the largest challenges, even in conventional circuit design, is ensuring that performance does not degrade as the design matures from equations, to SPICE simulations, to post-layout extracted simulations, to the final chip. Even verifying that the layout and schemtic views contain identical connectivity is not a trivial task. When optical links are incorporated into a current circuit CAD flow, the connectivity problem only worsens. Designers must often manually check the interfaces between the electrical and optical blocks. A full link simulation is also impossible, as there is no optical simulator built into existing circuit CAD tools. Design rule ";~~"~; checking, which describes how the different layers in the chip's process are allowed to be organizied, is also difficult, as many of the rules intended to protect transistor and other electrical component fabrication must be broken in order to create the optical structures. This requires a large degree of communication with the fabrication company. Appendix A Testbench Schematics IPD out+ out+ O I t :tL Figure A-i: Transient analysis testbench. out- I ----11- IPD Cfg fg cfg -- in+ out+ CP, PD r---- t out+ out- out- RxData in- , 0 out+ clock q Figure A-2: Offset compensation testbench. 'PD t ,, Stin+ out+out+ - cfg IPD cfg RxData CPD out+ int tsetupe : out- Sclock tcqi Figure A-3: Testbench for setup and hold time simulation. out- PD cfg ,in , fg efg t-in+ out+ ' out , in- --- ,t out+ RxData PD PD out+ out- clock - It tsetup I Figure A-4: Sampling aperture characterization testbench. IPD I cfg t out+ t IPD tfg Cfg cfg in+ out+ out- out- RxData D T out+ inclock Figure A-5: Variation analysis testbench. Appendix B Power Measurement in Simulation { -I- Figure B-1: Testbench for power measurement simulation. The testbench for measuring the average power is shown in Figure B-1. The voltage source shown is the supply for the circuit under test. The current-controlled current source is a current source with a gain of 1 that replicated the current from the voltage source in the figure, with no feedback. (B.1) Vcap TdI(t)dt 1 C - IAVGT 77 (B.2) (B.3) i The equation governing the voltage across the capacitor is given by Equation B.3. The length of the transient simulation is given by T, the capacitance by C, and the average current from the supply is given by IAVG. Rearranging the expression and multiplying by the voltage from the supply, we can compute the average power consumption over the simulation interval (Equation B.5). Pavg = VDDIAVG =VDTD a (B.4) (B.5) By measuring the voltage across the capacitor at the end of the simulation, and by knowing the capacitance and the length of the simulation. The average power can then be computed. Appendix C Sampling Aperture Data This appendix contains the sampling aperture data for a 1-0 step transition, as opposed to the 0-1 step transition presented in Section 3.4. From the FFT, we verify that the receiver has sufficient bandwidth for our data rate. "~;";"";': ---;-~~;;;;~ -- ~~~;;-~--~~;----~---~----~--~~~------ 70 -- 40-uA ---60-uA - - 80-uA 60 50 t 40 S30 20 . . 10 0 -10 0 0 50 phase (ps) -50 100 1 50 200 -100 -50 0 50 phase (ps) (a) cMS(r) 100 150 100 150 (b) IMS(r) x 1010 0.8 0.6 C) 0.4 0.2 bu phase (ps) u uU 1ub -100 -50 0 (c) SSF(r). (d) ISF(r). 250 235 . . ..i .... 200 150 100 50 -50 0 234 233 232 .. . . . .. . . . .. . . ..... . .. . . 231 . .. . . . .. . :. 0 -100 50 phase (ps) .. ..... ... ....... ... . . i 200 - 40-uA ....... - - - 60-uA --- 80-uA I 400 600 Frequency GHz (e) FFT(ISF) 230 229 38R 800 1000 0 -40-uA: BW=17.3-GHz --60-uA: BW=14.9-GHz . ---80-uA: BW=17.3-GHz i i , 5 10 Frequency GHz 15 (f) FFT(ISF) (zoom view) Figure C-1: Sampling apertuture characterization for 1-0 step input. Appendix D Custom Standard Cell Library There was no standard cell design library available in the advanced and experimental 32-nm process that this thesis work was done in, and so a small standard cell library had to be designed in order to implement the digital blocks. This appendix outlines the design of some of the key components in the standard cell library and the decisions that were made in their selection. D.1 Register Figure D-1 shows the register design used in the EOS2 test chip. It is a MASTERSLAVE style register from [19] [20], with the first latch being set when the clock is low, and the second being set when the clock transistions from low to high. There are several conservative design features implemented in the latch. Each half of the latch is transparent for half of a clock period. During -= 0 clock phase, the first half of the latch is transparent. Note that the feedback for the first half is disabled during this time. This is to avoid competition. When the clock transisitions from low to high, and )=1, the feedback in the first half of the latch is enabled, and the state is stored. The second half of the latch is enabled for writing, with its feedback disabled. The state of the second half of the latch is stored when the clock returns to zeros. The Q-output of the register is isolated by means of an inverter. This is to prevent whatever circuitry is driven by the latch from over-powering the latches feedback and changing the state that is stored. In the output loop, the clocked transistors were placed closer to the output node to increase the speed of enabling the feedback. The register is asynchronously resetable. D.2 MUX The MUX used for standard cell library is a transmission-gate MUX, and is shown in Figure D-2. The transission-gate style MUX was selected for its simplicity, but care was required when preceding or following the gate by a long wire or large capacitance. The reason for this is that the transmission-gate MUX is much weaker than a NANDbased MUX. As area was not a primary concern in a 32-nm CMOS process, a NANDbased MUX could have been a more appropriate choice. (a) Register used in EOS2. 45 40 35 30 .. -- +T Tcq-T q setup ................. .. 25 20 . 3 1- . ". I 14 . . I 15 ................................ .............. I 16 I I I 17 18 19 Tsetu p (ps) (b) Clock-to-Q plotted against the setup time for the EOS2 register. Figure D-1: Register design. 83 1 inO out sell inl T sell Figure D-2: MUX design. Appendix E Scan Chain Control E.1 Correct Signalling for the Scan Chain This section explains the proper signalling sequence for the scanchain implemented on the EOS2 test chip. Proper signalling ensures that the correct bits are read from and written two the test circuits. The two different types of scan chain cells are shown in Figure 4-5, and are repeated in Figure E-i for convenience. bCircuit bCircuit SO SCLK--+- (a) Scan chain read cell. (b) Scan chain write cell. Figure E-1: Block diagram of scan chain cells. ~L-~X~I'~I"X^~.-(-~---.-_i.--I--_I----- E.1.1 i;~-il--.-r-i;~w~~wl~~;~l-i.~_-~~~~c~~ Reading from EOS2 Reading data has a large margin for error. The bit of interest is connected to node bCircuit in Figure E-la. An example of bits to be read is the count from the output transition counter of Section 4.2.4. In this case, the reference counter has finished counting, and the count is no longer changing. When the EOS2 link circuits are running, the scan enable (SE) signal has been set low. The 2-to-1 MUX in Figure Ela is passing the input from the circuit to the input of the register. The scan clock (SCLK) may or may not be enabled. When data is to be scanned out, the scan clock must be pulsed at least once, moving the bCircuit value from the input of the register to the output. When SE is switched high, each scan cell takes the previous cell's output, which has already been set to the previous cell's corresponding bCircuit value. All of the bCircuit values are then scanned out of the chip as in a normal shift register. E.1.2 Writing from EOS2 Writing data to EOS2 is more complicated and has a larger potential for error. Writing data relies upon the programmer's ability to completely control all of the SCLK, SE, and SU signals. These signals are relatively low-speed, so this is not unrealistic. The scanchain write cell is shown in Figure E-1b, and is identical to the read cell except for the addition of a latch and scan update (SU) signal. The cell is used to write configuration data to the EOS2 test chip. The proper signalling sequence for writing a bit to EOS2 is given by: 1. SE=I, SU=O0, SCLK is clocking: data from a computer-interface is being shifted into EOS2. 2. SE=1, SU=0O, SCLK=O: stop the slow scan clock. Now the data to be written to the circuit is stored at SO of the respective scan cell. 3. SE=1, SU=I, SCLK=0O: bCircuit acquires the value to be written to the circuit. 4. SE=-i, SU=0O, SCLK=0O: bCircuit holds the value to be written to the circuit and is no isolated from the rest of the scan chain. 5. SE=0, SU=O, SCLK=O: the input to the register is now set to the value of the bit that was intended to be written to EOS2 (the value of bCircuit). 6. SE=0, SU=0O, SCLK pulsed once: the value written to the circuit is now stored at SO 7. SE=I, SU=0O, SCLK is clocking: data is scanned out from the chip to be analyzed. The scanchain read cells have read a value from the circuit. The scanchain write cells have stored the value that was written to the circuit. This can be compared to the value that was intended to be written for verification. Appendix F PRBS Generation This appendix shows the first 67 states of the PRBS pattern generator when seeded with all l's. This is intended to give the reader an idea of how the prbs generates all of the bit sequences with pulses of different lengths. Figure F-1 shows the output of the PRBS generator being driven by a 5-GHz clock. In Table F.1, the reader can follow through each of the states, seeing clearly how bits 0 and 3 are summed and fed back into bit 30. 1 , _0 ' •. 0 5 . . 10 15 20 25 . • 30 35 40 45 time (ns) Figure F-1: The output of the PRBS generator, driven at 5-GHz. 50 bit 30 29 27 26 25 21 20 19 18 17 16 15 14 seed 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 4 5 0 0 0 0 1 0 0 0 1 1 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 state 6 0 0 0 0 0 1 1 1 11 1 1 1 1 1 1 1 state 7 state 8 state 9 state 29 state 30 state 31 state 32 state 33 state 34 state 35 state 36 state 37 state 38 state 57 state 58 state 59 state 60 state 61 state 62 state 63 state 64 state 65 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 1 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 state state state state 28 24 23 22 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 12 11 10 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 00 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 Table F.1: Progression of PRBS pattern. 9 8 0 0 0 0 0 0 0 0 7 6 5 4 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 3 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 2 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 (output) Bibliography [1] C. Batten, A. Joshi, J. Orcutt, A. Khilo, B. Moss, C. Holzwarth, M. Popovic, Hanqing Li, H. Smith, J. Hoyt, F. Kartner, R. Ram, V. Stojanovic, and K. Asanovic. Building manycore processor-to-dram networks with monolithic silicon photonics. High Performance Interconnects, 2008. HOTI '08. 16th IEEE Symposium on, pages 21-30, Aug. 2008. [2] Rajeev J. Ram. MIT 6.731: Semiconductor Optoelectronics [course notes]. 2008. [3] J.R. Sauer, D.J. Blumenthal, and A.V. Ramanan. Photonic interconnects for gigabit multicomputer communications. LTS, IEEE, 3(3):12-19, Aug 1992. [4] Intel. Core i7 processor, 2008. http://www.intel.com/products/processor/corei7/index.htm. [5] J. Kahle. The cell processor architecture, Nov. 2005. [6] G. Konstadinidis, M. Rashid, P.F. Lai, Y. Otaguro, Y. Orginos, S. Parampalli, M. Steigerwald, S. Gundala, R. Pyapali, L. Rarick, I. Elkin, Yuefei Ge, and I. Parulkar. Implementation of a third-generation 16-core 32-thread chipmultithreading spares processor. Solid-State Circuits Conference, 2008. ISSCC 2008. Digest of Technical Papers. IEEE International,pages 84-597, Feb. 2008. [7] S. Palermo, A. Emami-Neyestanak, and M. Horowitz. A 90 nm cmos 16 gb/s transceiver for optical interconnects. Solid-State Circuits, IEEE Journal of, 43(5):1235-1246, May 2008. [8] J.S. Orcutt, A. Khilo, M.A. Popovic, C.W. Holzwarth, B. Moss, Hanqing Li, M.S. Dahlem, T.D. Bonifield, F.X. Kartner, E.P. Ippen, J.L. Hoyt, R.J. Ram, and V. Stojanovic. Demonstration of an electronic photonic integrated circuit in a commercial scaled bulk cmos process. Lasers and Electro-Optics, 2008 and 2008 Conference on Quantum Electronics and Laser Science. CLEO/QELS 2008. Conference on, pages 1-2, May 2008. [9] C.W. Holzwarth, J.S. Orcutt, Hanqing Li, M.A. Popovic, V. Stojanovic, J.L. Hoyt, R.J. Ram, and H.I. Smith. Localized substrate removal technique enabling strong-confinement microphotonics in bulk si cmos processes. Lasers and ElectroOptics, 2008 and 2008 Conference on Quantum Electronics and Laser Science. CLEO/QELS 2008. Conference on, pages 1-2, May 2008. [10] M. Lipson. Guiding, modulating, and emitting light on silicon-challenges and opportunities. Lightwave Technology, Journal of, 23(12):4222-4238, Dec. 2005. [11] A. Narasimha, B. Analui, Y. Liang, T.J. Sleboda, S. Abdalla, E. Balmater, S. Gloeckner, D. Guckenberger, M. Harrison, R.G.M.P. Koumans, D. Kucharski, A. Mekis, S. Mirsaidi, Dan Song, and T. Pinguet. A fully integrated 4 10-gb/s dwdm optoelectronic transceiver implemented in a standard 0.13 um cmos soi technology. Solid-State Circuits, IEEE Journal of, 42(12):2736-2744, Dec. 2007. [12] C. Kromer, G. Sialm, T. Morf, M.L. Schmatz, F. Ellinger, D. Erni, and H. Jackel. A low-power 20-ghz 52-db-ohm transimpedance amplifier in 80-nm cmos. SolidState Circuits, IEEE Journal of, 39(6):885-894, June 2004. [13] C.L. Schow, L. Schares, S.J. Koester, G. Dehlinger, R. John, and F.E. Doany. A 15-gb/s 2.4-v optical receiver using a ge-on-soi photodiode and a cmos ic. Photonics Technology Letters, IEEE, 18(19):1981-1983, Oct.1, 2006. [14] C. Debaes, A. Bhatnagar, D. Agarwal, R. Chen, G.A. Keeler, N.C. Helman, H. Thienpont, and D.A.B. Miller. Receiver-less optical clock injection for clock distribution networks. Selected Topics in Quantum Electronics, IEEE Journal of, 9(2):400-409, March-April 2003. [15] A. Emami-Neyestanak, D. Liu, G. Keeler, N. Helman, and M. Horowitz. A 1.6 gb/s, 3 mw cmos receiver for optical communication. VLSI Circuits Digest of Technical Papers, 2002. Symposium on, pages 84-87, 2002. [16] S. Mukhopadhyay, R.V. Joshi, Keunwoo Kim, and Ching-Te Chuang. Variability analysis for sub-100nm pd/soi sense-amplifier. pages 488-491, March 2008. [17] M. Jeeradit, J. Kim, B. Leibowitz, P. Nikaeen, V. Wang, B. Garlepp, and C. Werner. Characterizing sampling aperture of clocked comparators. In VLSI Circuits, 2008 IEEE Symposium on, pages 68-69, June 2008. [18] Peter Alfke. Efficient shift registers, Ifsr counters, and long pseudo-random sequence generators. Technical report, Xilinx, 1996. [19] V. Stojanovic, V. Oklobdzija, and R. Bajwa. Comparative analysis of latches and flip-flops for high-performance systems. In Computer Design: VLSI in Computers and Processors, 1998. ICCD '98. Proceedings. International Conference on, pages 264-269, Oct 1998. [20] G. Gerosa, S. Gary, C. Dietz, Dac Pham, K. Hoover, J. Alvarez, H. Sanchez, P. Ippolito, Tai Ngo, S. Litch, J. Eno, J. Golab, N. Vanderschaaf, and J. Kahle. A 2.2 w, 80 mhz superscalar rise microprocessor. Solid-State Circuits, IEEE Journal of, 29(12):1440-1454, Dec 1994.