Variation Aware Design of Data Receiver Circuits for On-Chip Optical Interconnect

by

Michael James Mills

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of

Master of Engineering in Electrical Engineering and Computer Science

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2002

© Massachusetts Institute of Technology 2002. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, May 24, 2002

Certified by: Duane S. Boning, Associate Professor of Electrical Engineering and Computer Science, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Theses

Variation Aware Design of Data Receiver Circuits for On-Chip Optical Interconnect

by

Michael James Mills

Submitted to the Department of Electrical Engineering and Computer Science on May 24, 2002, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science

Abstract

Optical transmission offers an attractive interconnect alternative because light has small propagation delay and negligible crosstalk compared to electrical interconnect. Optical interconnect integrates transmitters and receivers on the same chip, imposing different constraints on receiver design than those faced in the telecommunications industry. This thesis presents an optical data receiver circuit designed to handle low signal levels and large photodetector capacitance. A focus is placed on variation robustness and the key metric is propagation delay. Because thousands of receivers might be integrated on a single chip, power and die area are minimized. The receiver consists of a clocked sense amplifier operating in positive feedback.
A current mirror at the input isolates detector parasitics from the switching node, increasing evaluation speed. An extra bit is transmitted with each data bus to serve as a reference. In 0.18 um CMOS simulations, the circuit evaluates correctly at 2.0 GHz and produces acceptable output at more than 1.0 GHz. The circuit dissipates 310 uW of power and consumes 130 um^2 of area. Variation analysis explores changes in evaluation speed for asymmetrical and uniform circuit variations. Asymmetrical variations have a greater effect on performance, which makes circuit matching important. A test chip using free space illumination for proof of concept was submitted for fabrication in December, 2001.

Thesis Supervisor: Duane S. Boning
Title: Associate Professor of Electrical Engineering and Computer Science

Acknowledgments

I could fill more pages than are in this thesis with acknowledgments. Indeed, I could ramble on for a year and a day about all the thousands of unbelievable people who have made a difference in my life. However, in doing so I would detract from the two people who deserve top billing. For all the wonderful people in my life, I will always be grateful, but this work is for my mother and father.

Mom, Dad, I am humbled by your continuous, unconditional, and utterly selfless dedication. In every decision, every day of my life, I think about what you would want me to do, not because you taught me to behave a certain way, but because I hope with each act I can become slightly more deserving of the remarkable treatment you give me. I am awed by the way you live your lives. Your compassion, your integrity, and your sincere desire to do good things inspire me to become a better person. As I journey forward, I can only hope to live my own life with as much dignity and honor.
Hopefully, at the end of the journey, I can look back and know that I've taken the high road, that I've made you proud, and that I've fulfilled the tremendous debt I feel, by making the most of every single opportunity afforded me. This thesis represents the first step of my journey, and I believe it is a good start. I have given it my best effort with the hope that you will see in this work not what I have accomplished, but what you have accomplished, for everything in this work, and everything I have done, is a testament to you as parents. Mom, Dad, I owe everything to you.

Of course, this research would not have been possible without the support of MARCO, DARPA, the Interconnect Focus Center, and a thesis advisor with a great sense of humor. Thanks for everything, Duane.

Contents

1 Introduction
1.1 Issues
1.1.1 Clock vs. Data
1.1.2 Challenges of Monolithic Integration
1.2 Previous Work
1.2.1 Clock Receiver Circuits
1.2.2 Examples of Clocked Data Receiver Circuits
1.3 Data Receiver Overview

2 Design
2.1 Schaffer and Mitkas Cell
2.1.1 Upside Down or Right Side Up
2.1.2 Timing and Output
2.1.3 Charge Sharing
2.2 Current Mirror Input
2.2.1 Design
2.2.2 Analysis
2.2.3 Transient Issues
2.3 Reference Circuit
2.4 Summary

3 Process, Sizing, and Simulation
3.1 Process Overview
3.2 Sizing and DC Biasing
3.2.1 The Latch
3.2.2 Current Mirrors
3.2.3 Current Sources
3.2.4 Reference Circuit
3.3 Simulation Results
3.3.1 Test Waveform
3.3.2 Output Waveforms
3.3.3 Simulation Measurements
4 Variation Analysis
4.1 Receiver Variation Overview
4.2 Uniform Variation Results
4.2.1 Photodiode Capacitance
4.2.2 Photocurrent
4.2.3 Channel Length Variation
4.2.4 Temperature
4.2.5 Threshold Voltage
4.2.6 Supply Voltage
4.2.7 Summary
4.3 Differential Variation Results
4.3.1 The Latch
4.3.2 Input Transistors
4.3.3 Input Stage and Reference Circuit
4.3.4 Input and Reference Photocurrent
4.4 Conclusions

5 Test Chip
5.1 Building Blocks
5.2 Data Generator
5.3 Receiver Test Circuitry
5.4 Clock Distribution
5.5 PLL Design
5.5.1 Phase-Frequency Detector
5.5.2 Charge Pump
5.5.3 Voltage-Controlled Oscillator
5.5.4 Frequency Divider
5.5.5 Stabilizing the Loop
5.6 Testing Summary

6 Conclusion
6.1 Summary
6.2 Final Thoughts: Contributions

A TSMC 0.18 um Digital CMOS Process Characteristics

List of Figures

1-1 Guided wave approach to optical interconnect
1-2 Optical clock distribution scheme proposed by Lum [8]
1-3 Synchronous sense amplifier using positive feedback for amplification
1-4 Waveforms for sense amplifier in Fig. 1-3
1-5 Positive feedback sense amplifier by Ayadi, et al. [2]
1-6 Positive feedback sense amplifier by Schaffer and Mitkas [13]

2-1 Flipping Schaffer and Mitkas latch "upside down"
2-2 Using NFET's for series devices reduces transistor size
2-3 Both configurations establish a differential across the input nodes.
2-4 When CLK goes low, the NOR gate temporarily goes high until the signal propagates through the inverters.
2-5 Upside down Schaffer and Mitkas cell
2-6 Basic waveforms for Schaffer and Mitkas cell
2-7 Input waveforms including charge sharing
2-8 Output waveforms including charge sharing
2-9 Adding a current mirror isolates the diode capacitance (left). However, the current mirror needs bias current (middle), and this bias current must be subtracted to get the correct photocurrent out (right).
2-10 Input stage using current mirror
2-11 Full receiver circuit
2-12 Small signal model of current mirror input stage
2-13 Simplified small signal model of current mirror input stage
2-14 Approximate small signal model of current mirror input stage
2-15 VIN vs. VREF for fast time constant, τ
2-16 Transient response of VIN for an arbitrary bit pattern
2-17 VIN transients for period T = 0.2τ
2-18 VIN transients for period T = 0.4τ
2-19 VIN transients for period T = 0.8τ
2-20 VIN and VREF on a full scale voltage range
2-21 Averaging two inputs in the current domain.
2-22 Identically sized current mirrors present the same impedance
2-23 A "zero bit" does not require a diode at all
2-24 Current averaging with photodiode input
2-25 Reference circuit shared by all bits in a data bus
2-26 Example of 128 bit optical data bus with a single reference circuit
2-27 Full receiver and reference circuit schematic

3-1 Test setup for finding fT
3-2 Layout of receiver circuit for a single data bit (12.6 um x 10.6 um)
3-3 Dimensions and transistor names for receiver circuit. Large, unlabeled transistors are MOS decoupling capacitors.
3-4 Layout of reference circuit (7.0 um x 10.6 um) (left), along with dimensions and transistor names (right). Large, unlabeled transistors are MOS decoupling capacitors.
3-5 Input test pattern (top) and corresponding VIN waveform (bottom)
3-6 Test circuit for waveforms in Fig. 3-5
3-7 Simulation waveforms for clock (grey) and input photocurrent (black)
3-8 Receiver CLK and corresponding RST waveform
3-9 Simulated VIN and VREF waveforms
3-10 Simulated IN1 and IN2 waveforms
3-11 Zoomed in plot for six cycles of IN1 and IN2
3-12 Simulated output waveforms, Q and /Q
3-13 Zoomed in plot of output waveforms corresponding to Fig. 3-11
3-14 Power dissipation of receiver circuit as a function of clock frequency
3-15 Average power dissipation at 1.0 GHz for data bus of varying size
3-16 Average die area per bit for data bus of varying size
3-17 Example of evaluation speed definition: Cycle A evaluates correctly but will not produce a satisfactory output. Cycle B does not evaluate correctly.

4-1 VIN vs. VREF with a slow time constant
4-2 Exponential VIN waveform discharging across VREF (top) and corresponding plot of total differential as a function of time (bottom)
4-3 Total differential for several different values of VREF
4-4 Total differential at three different evaluation times: before, on, and after the zero crossing (teval-min) of total differential
4-5 Evaluation speed as a function of photodiode capacitance
4-6 Evaluation speed as a function of detector photocurrent
4-7 Changes in evaluation speed as a function of channel length variation
4-8 Changes in evaluation speed as a function of temperature variation
4-9 Changes in evaluation speed as a function of threshold voltage variation (Numbers refer to magnitude of VTN and VTP)
4-10 Changes in evaluation speed as a function of supply voltage variation
4-11 Absolute value of changes in evaluation speed relative to source variation percentage, as given by regression models in Table 4.1
4-12 Changes in evaluation speed as a function of differential variation between input and reference side of latch
4-13 Changes in evaluation speed as a function of differential variation between left and right input transistors
4-14 Changes in evaluation speed as a function of differential variation between input stage and reference circuit
4-15 Changes in evaluation speed as a function of input and reference photocurrent

5-1 General testing strategy for data receiver circuit
5-2 Schematic of DFF used in testchip
5-3 Schematic of two input mux using transmission gates
5-4 Schematic of FIFO buffer
5-5 Data generator for driving off-chip laser source
5-6 Receiver test circuitry using FIFO buffer
5-7 Multiplexing external and internal clock signals
5-8 Clock distribution
5-9 Clock distribution waveforms (corresponding to Fig. 5-8)
5-10 Phase-locked loop block diagram
5-11 Phase-frequency detector using DFF's with reset capability
5-12 Flip-flop with reset signal for use in PFD
5-13 Schematic diagram of charge pump
5-14 Current-starved inverter with variable propagation delay
5-15 VCO consisting of five current-starved inverters
5-16 Control circuitry for VCO: IIN sets offset frequency, and VIN controls output frequency
5-17 VCO frequency as a function of control voltage for different offset levels
5-18 Frequency divider using toggle flip-flops
5-19 Lead-lag loop filter for PLL
5-20 Loop transmission bode plots using lead-lag loop filter

A-1 I-V Characteristics for NFET with W = 0.5 um, L = 0.18 um
A-2 I-V Characteristics for NFET with W = 1.0 um, L = 0.18 um
A-3 I-V Characteristics for NFET with W = 5.0 um, L = 0.18 um
A-4 I-V Characteristics for PFET with W = 0.5 um, L = 0.18 um
A-5 I-V Characteristics for PFET with W = 1.0 um, L = 0.18 um
A-6 I-V Characteristics for PFET with W = 5.0 um, L = 0.18 um
A-7 Current gain vs. frequency for 0.18 um TSMC process
A-8 fT crossover for minimum length NFET
A-9 fT crossover for minimum length PFET
A-10 Transconductance, gm, of NFET with L = 0.18 um
A-11 Transconductance, gm, of NFET with L = 0.36 um
A-12 Transconductance, gm, of PFET with L = 0.18 um
A-13 Transconductance, gm, of PFET with L = 0.36 um

List of Tables

1.1 Design requirements for optical clock and data receiver circuits
2.1 Equations for output of a low-pass filter driven by a pulse train
2.2 Net input current to latch for high and low bits
3.1 Summary of TSMC 0.18 um Digital Logic Process
3.2 Circuit performance as a function of bias current and geometry for input stage transistor, M10
3.3 Transistor sizes for receiver and reference circuit in 0.18 um technology
3.4 Summary of bias and performance characteristics in 0.18 um CMOS
4.1 Comparison of regression models for different variation sources
5.1 Example PLL values using lead-lag loop filter

Chapter 1

Introduction

This thesis describes the design and testing of an optical data receiver circuit, with special emphasis on variation robustness. Optical transmission provides an attractive interconnect alternative in modern VLSI chips because of its speed and lack of parasitics. The ideas presented herein, and in previous theses by Sam and Lum [12] [8], differ from those long pursued in the telecommunications industry in two important ways. First, the communication is intra-chip, rather than inter-chip. Second, the receivers are designed with monolithic CMOS integration in mind. Before optical transmission can serve as a viable substitute for metal interconnect, researchers must find economical ways to integrate it into standard CMOS processes.
Figure 1-1: Guided wave approach to optical interconnect (diagram labels: laser diode, input signal, polysilicon, silicon dioxide, wafer, photodiode, output current)

Figure 1-1 describes an optical interconnect scheme. Those familiar with optical interconnect call this the "guided wave" approach because waveguides physically direct the optical signal across the chip, much like a fiber optic cable.

The transmission scheme in Fig. 1-1 works as follows. First, a laser diode generates optical pulses. These pulses travel along a waveguide constructed from two materials with different dielectric constants, such as polysilicon and silicon dioxide. At the end of the waveguide, light strikes a photodetector, which converts photons into current. A receiver circuit amplifies the small output current into a rail-to-rail digital signal.

The following sections outline several issues concerning optical interconnect, as well as some previous work in the area. Section 1.3 concludes with a roadmap for the following chapters and a short summary of the design strategy.

1.1 Issues

Optical interconnect presents unique design challenges because it requires monolithic integration of all components. Furthermore, different applications have different design criteria. This section describes the challenges and constraints for two specific applications: optical clock distribution and optical data transmission. The subtle differences between the two motivate several design decisions in Ch. 2.

1.1.1 Clock vs. Data

Clock distribution presents a peculiar engineering problem. The signal is periodic and predictable, almost everything on the chip needs it, and delay does not matter so long as the clock arrives everywhere at the same time. In practice, however, the clock does not reach all gates simultaneously; the signals are skewed relative to one another.
Generally speaking, clock skew is the single most important metric for clock distribution, while power consumption and area are secondary concerns because of the relatively small number of receivers.

Figure 1-2 shows an optical clock distribution scheme [8]. This method distributes the clock optically at a global level, but electrically at the local level where metal interconnect causes little skew. Note that while either an on-chip or off-chip light source can provide the signal, implementing the laser driver circuitry off-chip saves real estate and power, and reduces noise.

Figure 1-2: Optical clock distribution scheme proposed by Lum [8] (diagram labels: optical clock source, waveguides, local electrical clock network)

Data transmission involves a different set of challenges. The data need not arrive at exactly the same time, only within a certain window. Latency becomes the key factor, because reducing latency allows the circuit to run at higher clock frequencies. Also, with possibly thousands of copies of the receiver on a single chip, area and power consumption become important issues.

Despite these challenges, a designer has one important advantage when developing a topology for a data receiver circuit: a clock signal. The pre-existence of a synchronous signal forms the basis for many data receiver designs, including the one presented in this thesis.

Table 1.1: Design requirements for optical clock and data receiver circuits

Data Issues
    LATENCY is key metric
    Presence of clock signal allows more versatility in design
    Will be replicated many times, must be low power and small in size

Clock Issues
    SKEW biggest factor
    Latency a non-issue
    Relatively small number of receivers, less demanding on size and power

Table 1.1 summarizes the key issues pertaining to optical receiver design. An effective clock receiver circuit minimizes skew, without much regard for power or size.
In comparison, a good data receiver minimizes latency while paying careful attention to power consumption and real estate.

1.1.2 Challenges of Monolithic Integration

Monolithic integration means fabricating the transmitter, waveguide, detector, and receiver circuit together in a single process. In order to provide an economical alternative to metal interconnect, these items should require as few process modifications as possible. This set of constraints presents several design challenges.

First, since optical interconnect targets VLSI (CMOS) chips, the receiver circuit should use MOS devices and minimize the number of passive components. Most digital logic processes do not contain large built-in capacitors, limiting the designer to parasitics and MOS capacitors. Furthermore, such processes often use silicide to reduce polysilicon or diffusion resistance, so resistors consume a tremendous amount of area.

Second, CMOS integration can severely limit the quality of photodetectors. Silicon detectors cannot be used with silicon waveguides because if one is transparent to a certain wavelength of light, the other one will be as well. Thus, some sort of process modification is inevitable. One particularly promising technology involves growing germanium photodiodes directly on a silicon substrate [7].

1.2 Previous Work

This section looks at previous work in the field of optical interconnect, for both data and clock applications. The first section outlines a pair of clock receiver designs that illustrate some of the nuances of on-chip optical communication. These designs form a basis for the variation analysis in Ch. 4, and some of the variation aware implementation decisions in Ch. 3.

Next, Section 1.2.2 gives an overview of two data receiver designs, both of which use sense amplifiers operating in positive feedback. In particular, the second circuit serves as a basis for the design in Ch. 2.
1.2.1 Clock Receiver Circuits

Sam's thesis discusses the design of an optical clock receiver circuit, and sources of skew in optical clock distribution. During her treatment of skew, Sam maps out a set of four process and environmental variation sources that serve as a basis for the variation analysis in Ch. 4: power supply, temperature, channel length, and threshold voltage.

In Sam's circuit, these variation sources contribute to clock skew by causing asymmetrical clipping in the amplifier [12]. At the last amplification stage of the clock receiver, a sine wave passes through an inverter, turning it into a full scale square wave. If the input signal of the inverter is not biased exactly at the switching threshold, the inverter clips asymmetrically. This produces subtle differences in rise and fall times, which ultimately translate into skew and duty cycle variations [12].

Lum builds upon Sam's work by designing specifically with variation robustness in mind. Additional feedback biasing keeps signals centered around the switching threshold of the inverters, while a linear voltage regulator rejects power supply variations. A bandgap reference biases the entire circuit in an effort to eliminate temperature variation effects [8].

Lum succeeds in eliminating skew due to environmental variations (power supply and temperature), but he has less success designing around process variations (channel length and threshold voltage). In the end, process variation sources still present the biggest headaches for designers [8].

1.2.2 Examples of Clocked Data Receiver Circuits

As Sec. 1.1.1 points out, a pre-existent clock signal provides a competitive advantage in data receiver design. The two designs in this section highlight that concept, using the principle illustrated in Fig. 1-3.

Figure 1-3: Synchronous sense amplifier using positive feedback for amplification

The circuit operates as follows.
A low signal on LATCH turns off the inverters. Light shines on the input photodiode, inducing a small amount of current. This current causes the input node to drift relative to the reference node, building up a small differential voltage. When LATCH goes high, the inverters turn on again and positive feedback takes over, amplifying the small differential all the way to the rails. Also, with a high value on LATCH the circuit holds state, so it doubles as a flip-flop for part of the cycle, a desirable feature since most data buses end in a flip-flop.

Figure 1-4: Waveforms for sense amplifier in Fig. 1-3 (traces: input node and reference node between VDD and GND, LATCH, light input)

Figure 1-4 illustrates waveforms for the circuit in Fig. 1-3. Both the input and reference node start at some metastable point. Optical input causes current to flow onto the input node, increasing the voltage slightly. Positive feedback amplifies this small potential difference when LATCH goes high.

The first design using this concept, shown in Fig. 1-5, consists of two inverters in positive feedback connected to virtual power supplies controlled by a signal called STORE. The reset signal provides a way to precharge the input and reference nodes to the same level [2].

Figure 1-5: Positive feedback sense amplifier by Ayadi, et al. [2]

Signal STORE starts out low, while RST goes high for a short period of time. This precharges the input and reference nodes (Q and /Q) to the same (metastable) state. Optical input causes node Q to drift above /Q, building up a potential across the inputs. As soon as STORE goes high, the inverters turn on and the circuit rapidly amplifies the input differential using positive feedback.

Unfortunately, the "virtual rails" in this circuit store charge when STORE turns off. The inverters can still function for short periods of time using this charge reservoir.
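The drift-then-regenerate behavior described above for Figs. 1-3 through 1-5 can be illustrated with a minimal sketch. It assumes the standard first-order model of a regenerative latch, in which the differential grows exponentially with a time constant tau once the inverters turn on. The function name, the time constant, and the target voltage are hypothetical values chosen for illustration; they are not taken from the thesis or from the cited designs.

```python
import math


def regeneration_time(dv0, v_target, tau):
    """Time for a latch differential to grow from dv0 to v_target.

    Assumes the first-order model dv(t) = dv0 * exp(t / tau), where tau is
    the regeneration time constant of the cross-coupled inverters.
    """
    if dv0 <= 0 or v_target <= 0 or tau <= 0:
        raise ValueError("dv0, v_target and tau must all be positive")
    if v_target <= dv0:
        return 0.0  # differential is already at or beyond the target
    return tau * math.log(v_target / dv0)


if __name__ == "__main__":
    TAU = 50e-12     # hypothetical regeneration time constant (seconds)
    V_TARGET = 1.0   # hypothetical differential treated as "resolved" (volts)
    for dv0 in (1e-3, 10e-3, 100e-3):
        t = regeneration_time(dv0, V_TARGET, TAU)
        print(f"initial differential {dv0 * 1e3:6.1f} mV -> {t * 1e12:6.1f} ps")
```

Under this model, every tenfold increase in the initial differential shortens regeneration by tau times ln(10), roughly 2.3 time constants. This is one way to see why the size of the differential built up before the latch fires matters for speed.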
Figure 1-6 shows the schematic diagram for a second design, which fixes the charge storage problem by precharging all nodes to a predictable value. In this diagram, Q and /Q precharge to ground, while the input and reference nodes charge to (VDD - VTP) [13].

Figure 1-6: Positive feedback sense amplifier by Schaffer and Mitkas [13]

A high signal on /LAT disconnects the NFET and PFET of each inverter while precharging Q and /Q to ground. Meanwhile, a short low pulse on /RST resets the input and reference nodes. Switching /LAT from high to low reconnects the inverters, and charge redistributes between /Q and the input node (and between Q and the reference node). Since Q and /Q both precharge to the exact same voltage, they take the same amount of charge off the input and reference nodes. Thus, although charge sharing still occurs, it happens in equal proportions so as not to cause bit errors.

The development here centers around the evaluation of a "one" bit in response to a high optical input. However, the circuit must also evaluate "zeroes" in response to low (or absent) optical inputs. There are several ways to accomplish this. One might design mismatch into the circuit, creating an unbalanced sense amplifier that evaluates zeroes by default. Differential signaling provides another alternative because light always shines on at least one of the diodes.

For the sake of variation robustness, this thesis pursues a third option. A specialized reference circuit biases the reference side halfway in between a high and low input, so that the input triggers the circuit on "one" bits, and the reference triggers the circuit on "zero" bits.

1.3 Data Receiver Overview

The next chapter details the design of an optical data receiver circuit based on the sense amplifier by Schaffer and Mitkas in Fig. 1-6. A few slight modifications improve the performance of this circuit, but ultimately capacitance on the switching nodes limits speed.
Specifically, the photodiode adds 100 fF or more to nodes that normally only have 5-10 fF.

This thesis proposes to isolate the large photodiode capacitance using a current mirror. This technique reflects the small signal input current from the photodetector to the input side of the sense amplifier, which remains a low capacitance node.

A special reference circuit drives an identical current mirror on the other side of the sense amplifier. This circuit "averages" low and high input levels to produce a signal halfway in between. On high bits, the input signal exceeds the reference and the circuit evaluates a "one," while on low bits the reference exceeds the input, resulting in evaluation of a "zero." This strategy requires that an additional reference bit accompany each optical data bus.

Chapter 3 outlines implementation and simulation of the data receiver in 0.18 um CMOS technology. Discussion focuses on choosing bias parameters to maximize speed while maintaining low power, small size, and variation robustness. In an effort to quantify circuit performance, Ch. 3 defines the term evaluation speed. Simulation results show an evaluation speed of 2.0 GHz for 10 uA of input photocurrent, a 100 fF photodiode capacitance, and 305.74 uW of power dissipation. Each receiver bit occupies an area of 133.56 um^2, while the reference circuit takes up 74.20 um^2.

Using these parameters, Ch. 4 performs variation analysis on the receiver. The chapter begins by attempting to quantify how circuit parameters determine evaluation speed. Then it presents simulation results for both uniform and differential variation sources.

Uniform variation refers to changes that affect the whole circuit, such as a uniform increase in supply voltage. Changes in evaluation speed are plotted as a function of photodiode capacitance, input current, and the four variation sources outlined by Sam: power supply, temperature, channel length, and threshold voltage [12].
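Two points from the overview above, the cost of placing the photodiode capacitance on a switching node and the reference level set halfway between a high and a low input, can be illustrated with a short calculation. The sketch below is an idealized illustration and not the analysis carried out in the thesis: it assumes a constant current charging an ideal capacitor, an integration window of half a clock period at 2.0 GHz, and zero photocurrent for a "zero" bit. The 10 uA photocurrent and the 100 fF and 5-10 fF capacitances come from the text; the function names are hypothetical.

```python
import math


def integrated_voltage(current, window, capacitance):
    """Voltage change when a constant current charges an ideal capacitor."""
    if capacitance <= 0:
        raise ValueError("capacitance must be positive")
    return current * window / capacitance


def net_latch_current(i_input, i_high, i_low):
    """Input current minus a reference set midway between high and low."""
    i_reference = 0.5 * (i_high + i_low)
    return i_input - i_reference


if __name__ == "__main__":
    I_PHOTO = 10e-6    # photocurrent for a "one" bit, from the text (amps)
    I_DARK = 0.0       # assumed photocurrent for a "zero" bit (amps)
    WINDOW = 0.25e-9   # assumed: half a clock period at 2.0 GHz (seconds)

    # Same current and window, different node capacitance.
    for cap in (100e-15, 10e-15, 5e-15):
        dv = integrated_voltage(I_PHOTO, WINDOW, cap)
        print(f"C = {cap * 1e15:5.1f} fF -> dV = {dv * 1e3:6.1f} mV")

    # The sign of the net current is what distinguishes the two bit values.
    for name, i_in in (("one", I_PHOTO), ("zero", I_DARK)):
        net = net_latch_current(i_in, I_PHOTO, I_DARK)
        print(f"{name:>4} bit -> net current {net * 1e6:+.1f} uA")
```

With these assumptions, the same photocurrent develops ten to twenty times more voltage on a 5-10 fF node than on a 100 fF node in the same window, and the midpoint reference yields net currents of equal magnitude and opposite sign for the two bit values.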
Differential variation describes how the circuit behaves when the input and reference sides vary asymmetrically. The treatment in Ch. 4 looks at changes in evaluation speed as a function of differential channel length variation in three different parts of the circuit, as well as mismatches between the optical power of the reference and input signals.

Chapter 5 details the implementation of a test chip submitted for fabrication in December, 2001. The testing strategy uses an on-chip signal source to drive an off-chip laser, ensuring synchronous optical input signals. The chapter also discusses the design of a phase-locked loop for on-chip clock multiplication, and explains how to stabilize the loop using an external loop filter.

Finally, Ch. 6 reviews the problem statement and summarizes the analysis, implementation, and simulation of the receiver. It also presents some final thoughts on the contributions of this thesis, and how these contributions are generally useful to future designers and researchers.

Chapter 2

Design

This chapter outlines the design and operation of an optical data receiver circuit. The design is based on a synchronous sense amplifier by Schaffer and Mitkas (Fig. 1-6, page 22) [13]. Discussion focuses first on the Schaffer and Mitkas latch, and some slight modifications that improve performance. After that, Secs. 2.2 and 2.3 describe the two main contributions of this thesis, namely isolating photodiode parasitics with a current mirror, and creating a reference circuit using current domain arithmetic.

2.1 Schaffer and Mitkas Cell

The Schaffer and Mitkas latch provides a good starting point because of its small size and power consumption. However, optimizing the cell for maximum performance requires a more in-depth exploration of its operation. Note that this thesis uses the terms "sense amplifier" and "latch" interchangeably, since the Schaffer and Mitkas sense amplifier also holds state.
2.1.1 Upside Down or Right Side Up

Figure 2-1 proposes flipping the Schaffer and Mitkas latch "upside down." In the original version, the charge up path of Q and /Q goes through two PFET's, whereas the charge down path goes through a single NFET. In the flipped version the two series devices are NFET's.

Figure 2-1: Flipping Schaffer and Mitkas latch "upside down"

Assuming the mobility of electrons is roughly twice that of holes, two series PFET's would have to be about four times as wide as a single NFET to ensure equal rise and fall times, as shown in Fig. 2-2. Unfortunately, capacitance scales with geometry, and these large transistors limit switching speed, so designers prefer to use NFET's when multiple devices must be placed in series.

Figure 2-2: Using NFET's for series devices reduces transistor size

Note that flipping the "polarity" also requires inverting the control signals. In Fig. 2-1, /LAT becomes LAT and /RST becomes RST. The inputs, on the other hand, need not be reversed. Since it is only necessary to build up a differential across the inputs, the input stage can source or sink current. In other words, both configurations in Fig. 2-3 work equally well. This fact becomes important in Sec. 2.2 because an n-channel current mirror is much faster than a p-channel one.

Figure 2-3: Both configurations establish a differential across the input nodes.

2.1.2 Timing and Output

In a synchronous system, a clock replaces the latch signal, and the reset signal is generated from the clock. For example, the circuit in Fig. 2-4 creates a pulse on the falling edge of the clock, where the propagation delay of the inverters determines the width of the pulse. In simulation, three inverters usually produce a pulse long enough to reset the latch.

The following discussion on timing refers to the diagram in Fig.
2-5, which shows the "upside down" Schaffer and Mitkas latch with transistor names and control signals relabeled. Figure 2-6 plots the associated waveforms for this circuit.

Bit evaluation consists of two phases. First, CLK goes low and RST pulses high. This turns off M5 and M6 while simultaneously resetting the input nodes (IN1 and IN2) to the same value, namely VTN. The voltage never drops any lower than this point because M7 and M8 turn off.

Figure 2-4: When CLK goes low, the NOR gate output temporarily goes high until the signal propagates through the inverters.

Figure 2-5: Upside down Schaffer and Mitkas cell

After RST returns to zero, the cell begins building up a differential across the inputs. During the first clock period in Fig. 2-6, the optical input (IPHOTO) is high. This example uses photodiodes as current sinks, so a high optical pulse drains current off IN1. The low optical input in the second clock period has no effect, so IN1 stays constant. In both clock cycles, IN2 drifts down slightly, indicating a reference input that sinks an intermediate amount of current, in between an optical one and an optical zero. Section 2.3 discusses how to design such a reference.

Figure 2-6: Basic waveforms for Schaffer and Mitkas cell

In the second phase of evaluation, CLK goes high but RST stays low, connecting the latch like a pair of cross-coupled inverters. Positive feedback amplifies the differential until IN1 and IN2 saturate. For example, look at the first clock period. Although IN2 drifts down slightly, IN1 (the input node) drops by more. When M5 and M6 turn back on, the two inverters (M3 and M7 on the left and M4 and M8 on the right) quickly amplify the differential using positive feedback.
The reverse happens in the second clock period. With no optical input, IN1 stays constant, but IN2 still drifts down slightly by default. Positive feedback once again amplifies the differential, but this time in the other direction.

There are three important points about these waveforms that need to be mentioned. First, note that IN1 and IN2 saturate at (VDD - VTN) on the positive side, and ground on the negative side. NFET's make good pull-down devices, but bad pull-up devices because they stop providing current once the source gets within a threshold voltage of the gate. In this case, M5 to M8 are all NFET's, so the inputs swing between zero and (VDD - VTN). On the other hand, outputs Q and /Q charge up through PFET's and charge down through NFET's, so they swing all the way from zero to VDD. If the outputs did not swing rail to rail, then transistors in the next stage would always be partially on, dissipating large amounts of static power.

Second, notice in Fig. 2-6 that optical input has no effect when CLK is high. The circuit functions with or without return-to-zero signaling.

Finally, Sec. 1.2.2 claims that, although charge sharing does occur in the Schaffer and Mitkas latch, it cannot cause bit errors. The next section describes this important phenomenon in more detail.

2.1.3 Charge Sharing

The output nodes, Q and /Q, have the same capacitance, $C_Q$, and precharge to the same voltage, VDD. As a result, they both contain an amount of charge equal to $V_{DD} C_Q$. The input nodes, on the other hand, charge to different voltages. If IN2 charges to $V_{IN2}$, then IN1 charges to $(V_{IN2} + \Delta V)$, where $\Delta V$ represents the differential built up across IN1 and IN2 due to an optical input. Both input nodes have the same capacitance, $C_{IN}$.

When the clock signal goes high, M5 and M6 short the input and output nodes together. The total charge on the input side becomes the sum of the charge on /Q and IN1. Likewise, the charge on Q and IN2 pools together on the reference side.
Equations 2.1 and 2.2 express these relationships.

$$Q_{input} = V_{DD} C_Q + (V_{IN2} + \Delta V) C_{IN} \tag{2.1}$$

$$Q_{ref} = V_{DD} C_Q + V_{IN2} C_{IN} \tag{2.2}$$

The total charge on each side redistributes over the new capacitance, which is effectively the sum of the capacitances on the individual nodes which are now shorted together. Equations 2.3 and 2.4 provide expressions for the new voltages, $V'_{input}$ and $V'_{ref}$, on each side.

$$V'_{input} = \frac{Q_{input}}{C_{IN} + C_Q} \tag{2.3}$$

$$V'_{ref} = \frac{Q_{ref}}{C_{IN} + C_Q} \tag{2.4}$$

The difference between these new voltages represents the total differential across the latch after charge sharing occurs. As shown in Eq. 2.7, the new differential, $\Delta V'$, is smaller than the original differential by a factor $C_{IN}/(C_{IN} + C_Q)$, but has the same sign. So although charge sharing dilutes the magnitude of the differential, it does not change the direction, and therefore cannot cause bit errors.

$$\Delta V' = V'_{input} - V'_{ref} \tag{2.5}$$

$$\Delta V' = \frac{Q_{input} - Q_{ref}}{C_{IN} + C_Q} \tag{2.6}$$

$$\Delta V' = \Delta V \left( \frac{C_{IN}}{C_{IN} + C_Q} \right) \tag{2.7}$$

Note that decreasing the magnitude of the differential can potentially decrease speed. Luckily, the capacitance $C_{IN}$ tends to be larger than $C_Q$, so $C_{IN}/(C_{IN} + C_Q)$ is usually between 0.5 and 1.0.

Figure 2-7: Input waveforms including charge sharing

Figure 2-7 updates the waveforms to account for charge sharing. When CLK goes high, charge immediately redistributes from the output nodes to the input nodes, and the voltage on nodes IN1 and IN2 jumps up to a new value.

Charge sharing also produces glitching on the output nodes, Q and /Q. In other words, when the clock turns on, Q and /Q charge down, IN1 and IN2 charge up, and they meet in the middle at their new values, $V'_{input}$ and $V'_{ref}$. Figure 2-8 shows all four waveforms superimposed upon one another.
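
The charge sharing result can be checked numerically. The following is a minimal sketch of Eqs. 2.1 to 2.7; the node voltages and capacitances are assumed values chosen only for illustration (they are not taken from the simulations in this thesis), and the function name `share_charge` is hypothetical.

```python
def share_charge(v_q, v_in2, delta_v, c_in, c_q):
    """Return (V'_input, V'_ref, dV') after input and output nodes are shorted."""
    q_input = v_q * c_q + (v_in2 + delta_v) * c_in   # Eq. 2.1
    q_ref = v_q * c_q + v_in2 * c_in                 # Eq. 2.2
    v_input_new = q_input / (c_in + c_q)             # Eq. 2.3
    v_ref_new = q_ref / (c_in + c_q)                 # Eq. 2.4
    return v_input_new, v_ref_new, v_input_new - v_ref_new   # Eq. 2.5


# Assumed, illustrative values (not from the thesis simulations).
V_DD = 1.8        # V, precharge voltage of Q and /Q
V_IN2 = 0.5       # V, reference-side input node before the clock edge
DELTA_V = -0.05   # V, IN1 sits 50 mV below IN2
C_IN = 10e-15     # F, capacitance of each input node
C_Q = 5e-15       # F, capacitance of each output node

v_input_new, v_ref_new, dv_new = share_charge(V_DD, V_IN2, DELTA_V, C_IN, C_Q)
dilution = C_IN / (C_IN + C_Q)   # factor in Eq. 2.7

print(f"dilution factor         = {dilution:.3f}")
print(f"differential before     = {DELTA_V * 1e3:.2f} mV")
print(f"differential after      = {dv_new * 1e3:.2f} mV")
print(f"Eq. 2.7 prediction      = {DELTA_V * dilution * 1e3:.2f} mV")
```

With these assumed numbers the differential shrinks but keeps its sign, which is the property the argument above relies on.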
Figure 2-8: Output waveforms including charge sharing

2.2 Current Mirror Input

Connecting a photodiode directly to the inputs of the sense amplifier places the largest capacitance in the circuit, the junction capacitance of the photodiode, directly on the switching nodes, IN1 and IN2. Adding a current mirror solves this problem by isolating the photodiode capacitance, while still reflecting the small signal photocurrent to the input of the latch.

2.2.1 Design

Figure 2-9 shows the evolution of a current mirror input stage. The left diagram demonstrates the basic idea of a current mirror. In reality, M10 needs some sort of bias current, which also gets reflected, as shown in the middle diagram. Adding an identical source of bias current on top of M11 subtracts this offset away, leaving only IPHOTO at the output (right diagram).

In a real design, p-channel transistors replace the ideal current sources. Figure 2-10 shows the final configuration. The current mirror input stage attaches to IN1 of the sense amplifier. An identical set of transistors (M14 and M15) must be connected to the other side to preserve balance.

Figure 2-9: Adding a current mirror isolates the diode capacitance (left). However, the current mirror needs bias current (middle), and this bias current must be subtracted to get the correct photocurrent out (right).

Figure 2-10: Input stage using current mirror

Figure 2-11 shows the new receiver circuit. The grey dotted line effectively outlines a new sense amplifier, with inputs VIN and VREF.

In this document, the term "sense amplifier" or "latch" (or "Schaffer and Mitkas cell") refers to the basic sense amplifier consisting of transistors M1 to M9. Transistors M11, M13, M14, and M15 are called the "input transistors" because they directly drive the input nodes, IN1 and IN2.
Finally, the "input stage" refers to transistors M10 and M12, as well as the photodiode. On a similar note, most analysis in this document deals with the voltages VIN and VREF in Fig. 2-11. These nodes directly drive the input transistors, and therefore control the input nodes, IN1 and IN2.

Figure 2-11: Full receiver circuit

2.2.2 Analysis

Figure 2-12 gives an equivalent small signal model for the current mirror input stage. The photodiode consists of a small signal current source, IPHOTO, a junction capacitance, $C_J$, and a junction resistance, $R_J$.

Figure 2-12: Small signal model of current mirror input stage

Shorting the drain and gate of M10 (diode connected) causes the transconductance source to look like a resistor with value $1/g_{m10}$. Grouping the resistors and capacitors in parallel condenses the circuit to the model in Fig. 2-13.

Figure 2-13: Simplified small signal model of current mirror input stage

Reasonably sized integrated transistors usually have gate capacitances on the order of 1-10 fF, whereas most photodiodes have junction capacitances of 100 fF or more. So, the other capacitors are insignificant compared to $C_J$. Also, $1/g_{m10}$ will be small for a well designed current mirror. Since the other resistors in parallel are quite large, they can be ignored. These assumptions produce the approximate small signal model shown in Fig. 2-14.

Figure 2-14: Approximate small signal model of current mirror input stage

As the input photocurrent pulses high and low, VIN charges and discharges with a time constant given by Eq. 2.8.

$$\tau = \frac{C_J}{g_{m10}} \tag{2.8}$$

Figure 2-15 shows VIN charging up and down with a time constant fast enough for VIN to saturate on every cycle.
Unfortunately, a large photodiode capacitance often lengthens the time constant, meaning that VIN does not saturate on every cycle, causing the DC offset to drift around. This turns out to be one of the biggest design challenges of all, so the next section takes an in depth look at the transient behavior of VIN.

Figure 2-15: VIN vs. VREF for fast time constant, $\tau$

2.2.3 Transient Issues

Define VLO as the steady state value of VIN with no optical input, and VHI as the steady state value of VIN with maximum optical input. In other words, VLO is the gate to source voltage on M10 required to sink the bias current, and VHI is the gate to source voltage required to sink the bias current as well as the photocurrent. The two voltages differ by an amount approximately equal to the photocurrent times the small signal resistance, as given by Eq. 2.9.

$$V_{HI} \approx V_{LO} + \frac{I_{PHOTO}}{g_{m10}} \tag{2.9}$$

Comparing Eq. 2.9 to Eq. 2.8 reveals a tradeoff between voltage swing and speed: both the swing and the time constant scale as $1/g_{m10}$, and speed goes as the inverse of the time constant.

Figure 2-16 displays a more realistic set of waveforms for VIN in response to an arbitrary bit pattern. Notice that VLO provides a lower bound, VHI an upper bound, and that the time constant is not short enough for VIN to swing the full distance between them in one clock period.

The dashed lines in Fig. 2-16 represent the "trajectory" of VIN, or rather the path it would follow with an infinite clock period. The actual waveform always lies on one of these paths, but jumps from curve to curve as the input switches back and forth. In mathematical terms, the waveform in every period behaves as an exponential, only time shifted and with a different set of initial conditions.

Figure 2-16: Transient response of VIN for an arbitrary bit pattern

Near VLO, the waveform charges up faster than it charges down, causing the DC value of VIN to drift upwards. Likewise, near VHI the DC value tends to drift downwards.
Amazingly, one can show that this "low frequency" drift actually has the same time constant, $\tau$, as the dashed exponential traces in Fig. 2-16.

More formally, VIN is the output, $v_o$, of a single pole system satisfying the differential equation in Eq. 2.10. For simplicity, treat the driving term, $i_{IN} R$, as a simple voltage source, $v_s$ (Eq. 2.11).

$$RC \frac{dv_o}{dt} + v_o = i_{IN} R \tag{2.10}$$

$$RC \frac{dv_o}{dt} + v_o = v_s \tag{2.11}$$

The homogeneous solution to this differential equation is an exponential with time constant $\tau = RC$. Assume a square wave drives the system, such that the input equals $v_s = (V_{HI} - V_{LO})$ on odd periods and zero on even periods. In both cases, a constant term equal to the input provides the particular solution. Then Eqs. 2.12 (charge up) and 2.13 (charge down) represent the total solutions, where A depends on the initial conditions.

$$v_o = v_s + A e^{-t/\tau} \tag{2.12}$$

$$v_o = A e^{-t/\tau} \tag{2.13}$$

For a given period length, T, let $x = T/\tau$. The input starts by going high in the first period, so solving the "charge up" equation with initial condition $v_o = 0$ gives the expression in Eq. 2.14 for $v_o$ during the interval (0 < t < T). At the end of the first period (after time T), the value of $v_o$ is given by Eq. 2.15.

$$v_o(t) = v_s (1 - e^{-t/\tau}) \tag{2.14}$$

$$v_o(t = T) = v_s (1 - e^{-x}) \tag{2.15}$$

Starting at time T, the input goes low, and $v_o$ begins discharging. Using the new set of initial conditions from Eq. 2.15 and the "charge down" equation yields the result in Eq. 2.16 for (T < t < 2T). Notice that the exponential is time shifted by an amount T. Equation 2.17 gives the final value for this period, which can then be used as an initial condition for the next time period, (2T < t < 3T).

$$v_o(t) = v_s (1 - e^{-x}) e^{-(t-T)/\tau} \tag{2.16}$$

$$v_o(t = 2T) = v_s (e^{-x} - e^{-2x}) \tag{2.17}$$

In this manner, one can derive the formula for $v_o$ at any arbitrary point in time. Table 2.1 summarizes the first five periods.
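
The period-by-period procedure can also be carried out numerically. The sketch below applies the charge up and charge down solutions (Eqs. 2.12 and 2.13) over one period at a time, using the end value of each period as the initial condition of the next. The ratio x = 0.8 and the function name `boundary_values` are assumptions made only for this illustration.

```python
import math


def boundary_values(x, periods, v_s=1.0):
    """Return v_o at t = T, 2T, ..., for an input that is high on odd periods.

    x is the ratio T / tau; v_o(0) = 0 is the initial condition.
    """
    values = []
    v = 0.0
    decay = math.exp(-x)                       # e^{-T/tau}
    for period in range(1, periods + 1):
        target = v_s if period % 2 == 1 else 0.0   # input level this period
        # Single-pole response over one period: relax toward the input level.
        v = target + (v - target) * decay
        values.append(v)
    return values


X = 0.8   # assumed ratio T / tau, for illustration only
ends = boundary_values(X, 5)
for n, v in enumerate(ends, start=1):
    print(f"v_o({n}T) = {v:.4f} v_s")
```

The first two values reproduce Eqs. 2.15 and 2.17, and the later ones continue the same alternating pattern of exponentials.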
| Time Period | $v_o(t)$ |
| --- | --- |
| 0 < t < T | $v_s [1 - e^{-t/\tau}]$ |
| T < t < 2T | $v_s [(1 - e^{-x}) e^{-(t-T)/\tau}]$ |
| 2T < t < 3T | $v_s [1 - (1 - e^{-x} + e^{-2x}) e^{-(t-2T)/\tau}]$ |
| 3T < t < 4T | $v_s [(1 - e^{-x} + e^{-2x} - e^{-3x}) e^{-(t-3T)/\tau}]$ |
| 4T < t < 5T | $v_s [1 - (1 - e^{-x} + e^{-2x} - e^{-3x} + e^{-4x}) e^{-(t-4T)/\tau}]$ |

Table 2.1: Equations for output of a low-pass filter driven by a pulse train

The expressions in Table 2.1 follow a distinct pattern. Namely, the formula for odd periods (1, 3, 5, ...) follows the "charge up" form in Eq. 2.18, and the formula for even, "charge down" periods (2, 4, 6, ...) follows Eq. 2.19. Equation 2.20 defines $\alpha$, where n is a non-negative integer that indexes the period, so that period n + 1 spans nT < t < (n + 1)T.

$$v_o(t) = v_s (1 - \alpha e^{-(t-nT)/\tau}), \quad n = 0, 2, 4, \ldots \tag{2.18}$$

$$v_o(t) = v_s \alpha e^{-(t-nT)/\tau}, \quad n = 1, 3, 5, \ldots \tag{2.19}$$

$$\alpha = \sum_{k=0}^{n} (-e^{-x})^k \tag{2.20}$$

Several other results follow naturally from these expressions. First, define $\beta$ as the final value of $\alpha$ as n goes to infinity (Eq. 2.21), which reduces to the expression in Eq. 2.22.

$$\beta = \alpha(n \to \infty) = \sum_{k=0}^{\infty} (-e^{-x})^k \tag{2.21}$$

$$\beta = \frac{1}{1 + e^{-x}} \tag{2.22}$$

In steady state, $v_o = v_s (1 - \beta)$ at the beginning of any charge up cycle, and $v_o = v_s \beta$ at the beginning of any discharge cycle. Equations 2.23 and 2.24 express this mathematically. Note that these equations only apply in steady state for a square wave input.

$$v_o(nT) = v_s (1 - \beta), \quad n \text{ even} \tag{2.23}$$

$$v_o(nT) = v_s \beta, \quad n \text{ odd} \tag{2.24}$$

From these two expressions, one can derive the size of the envelope, or rather the fraction of the total voltage swing, (VHI - VLO), that VIN occupies. For example, look forward at Figs. 2-17 to 2-19 (page 42). The two black lines on each graph outline the envelope, and the actual signal can be seen by tracing the dotted exponential "trajectories" back and forth between the upper and lower envelope. Equation 2.26 shows that envelope size increases linearly with $\beta$, making $\beta$ an important design parameter. In an ideal world, $\beta$ would be one, and VIN would occupy 100% of the voltage range.

$$\text{envelope} = v_s \beta - v_s (1 - \beta) \tag{2.25}$$

$$\text{envelope} = v_s (2\beta - 1) \tag{2.26}$$

Finally, Eqs.
2.27 to 2.33 derive the time behavior of the envelope, or in other words, the DC drift. Equation 2.27 gives the value of $v_o$ at the beginning of any discharge cycle (n = 1, 3, 5, ...). Ultimately this approaches $v_s \beta$, which expands to the geometric series in Eq. 2.28. Equation 2.29 defines y as the difference between $v_o(nT)$ and its final value. Expanding this expression and factoring out an $e^{-(n+1)x}$ yields Eq. 2.31, where the series of exponentials is simply $\beta$ (Eq. 2.32). Finally, Eq. 2.33 makes the substitution nT = t. As claimed, the DC drift depends only on the time constant $\tau$.

$$v_o(nT) = v_s (1 - e^{-x} + e^{-2x} - \cdots + e^{-(n-1)x} - e^{-nx}), \quad n = 1, 3, 5, \ldots \tag{2.27}$$

$$v_s \beta = v_s (1 - e^{-x} + e^{-2x} - \cdots - e^{-nx} + e^{-(n+1)x} - e^{-(n+2)x} + \cdots) \tag{2.28}$$

$$y = v_s \beta - v_o(nT), \quad n = 1, 3, 5, \ldots \tag{2.29}$$

$$y = v_s (e^{-(n+1)x} - e^{-(n+2)x} + \cdots) \tag{2.30}$$

$$y = v_s e^{-(n+1)x} (1 - e^{-x} + e^{-2x} - \cdots) \tag{2.31}$$

$$y = v_s \beta e^{-nx} e^{-x} \tag{2.32}$$

$$y = v_s \beta e^{-x} e^{-t/\tau} \tag{2.33}$$

If y represents the distance of the envelope from its final value as a function of time, then $(v_s \beta - y)$ represents the equation for the upper envelope. Similarly, $(v_s (1 - \beta) - y)$ gives an expression for the bottom envelope. Equations 2.35 and 2.37 show expressions for the envelope as a function of time.

$$v_{top} = v_s \beta - y \tag{2.34}$$

$$v_{top} = v_s \beta (1 - e^{-x} e^{-t/\tau}) \tag{2.35}$$

$$v_{bot} = v_s (1 - \beta) - y \tag{2.36}$$

$$v_{bot} = v_s - v_s \beta (1 + e^{-x} e^{-t/\tau}) \tag{2.37}$$

Once again, Figs. 2-17 to 2-19 plot the exponential "trajectory" curves in dotted lines with the envelope superimposed on top of them in solid lines. Each diagram shows the waveforms for a period of $6\tau$, enough time for the exponential to reach 99.75% of its final value. A waveform like the one in Fig. 2-16 (page 37) can be seen by tracing the exponentials back and forth between the upper and lower envelope, starting at zero.

The key parameter in these diagrams is the ratio of the clock period, T, to the time constant, $\tau$. In Fig. 2-17, T is only a fraction of $\tau$. This results in a narrow envelope, and the system takes many clock periods to settle.
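
The envelope expressions lend themselves to a quick numerical check. The sketch below evaluates Eq. 2.22 and Eq. 2.26 for three ratios of clock period to time constant; the ratios 0.2, 0.4, and 0.8 are taken here as the values in the captions of Figs. 2-17 to 2-19, and the function names are hypothetical.

```python
import math


def beta(x):
    """Steady-state limit of the alternating series (Eq. 2.22), x = T / tau."""
    return 1.0 / (1.0 + math.exp(-x))


def envelope_fraction(x):
    """Fraction of the swing (V_HI - V_LO) occupied in steady state (Eq. 2.26)."""
    return 2.0 * beta(x) - 1.0


def envelope_bounds(x, t_over_tau, v_s=1.0):
    """Upper and lower envelope at time t (Eqs. 2.34 to 2.37)."""
    b = beta(x)
    y = v_s * b * math.exp(-x) * math.exp(-t_over_tau)   # Eq. 2.33
    return v_s * b - y, v_s * (1.0 - b) - y


for x in (0.2, 0.4, 0.8):
    pct = 100.0 * envelope_fraction(x)
    print(f"T = {x} tau: beta = {beta(x):.3f}, envelope = {pct:.0f}% of full swing")
```

Note that the lower envelope evaluates to zero at t = 0, consistent with the initial condition used in Eq. 2.14, and that both envelopes approach their steady-state values with time constant $\tau$.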
The envelope takes up $v_s (2\beta - 1) = 0.10 v_s$, or 10% of the total possible voltage swing.

Figures 2-18 and 2-19 show what happens as the period becomes an increasingly larger fraction of the time constant. It takes fewer clock periods to settle, and the envelope broadens. The envelopes occupy 20% and 38% of the total voltage range in Figs. 2-18 and 2-19, respectively. In fact, Fig. 2-19 exhibits a large enough swing for VIN to cross the 50% point on every cycle. With VREF biased at 50%, this is a necessary but not sufficient feature for correct evaluation.

Chapters 3 and 4 discuss how the shape of VIN affects evaluation speed and how to choose parameters for a specific process.

Figure 2-17: VIN transients for period $T = 0.2\tau$ (horizontal axis: time in units of $\tau$)

Figure 2-18: VIN transients for period $T = 0.4\tau$ (horizontal axis: time in units of $\tau$)

Figure 2-19: VIN transients for period $T = 0.8\tau$ (horizontal axis: time in units of $\tau$)

2.3 Reference Circuit

A sense amplifier cannot evaluate "ones" and "zeroes" correctly without an appropriate reference. One strategy involves constructing an imbalanced sense amplifier so that the circuit has a tendency to evaluate one way or the other. The work in this thesis goes another direction, keeping the amplifier itself perfectly balanced, while providing a reference input halfway between a one and a zero.

Figure 2-20: VIN and VREF on a full scale voltage range

Figure 2-20 shows an ideal reference. The input waveform, VIN, swings back and forth between its minimum (VLO) and maximum (VHI) values, while the reference voltage splits it right down the middle.
| | Low Bit | High Bit |
| --- | --- | --- |
| $I_{IN1}$ | $g_{m11}(V_{LO} - V_{LO}) = 0$ | $g_{m11}(V_{HI} - V_{LO}) = I_{PHOTO}$ |
| $I_{IN2}$ | $g_{m14}(V_{REF} - V_{LO}) = \frac{1}{2} I_{PHOTO}$ | $g_{m14}(V_{REF} - V_{LO}) = \frac{1}{2} I_{PHOTO}$ |
| $(I_{IN1} - I_{IN2})$ | $-\frac{1}{2} I_{PHOTO}$ | $\frac{1}{2} I_{PHOTO}$ |

Table 2.2: Net input current to latch for high and low bits

Table 2.2 explains the "ideal" choice of a reference voltage. When VIN = VLO, no net current flows into IN1 of the latch. Small changes in VIN around this point change the input current by $g_{m11} \times \Delta V_{IN}$. An input of VIN = VHI designates a full power optical input, so IPHOTO flows into IN1. Voltage VREF, on the other hand, stays at $\frac{1}{2}(V_{HI} + V_{LO})$, so $\frac{1}{2} I_{PHOTO}$ always flows into IN2. The net current into the sense amplifier, $(I_{IN1} - I_{IN2})$, has exactly the same magnitude for high and low bits, but a different sign.

In conclusion, an ideal reference circuit should average VHI and VLO, or in the current domain, average the "on" and "off" photocurrent. Unfortunately, this requires knowledge of IPHOTO. Therefore, a single extra bit accompanies every optical data bus to provide a DC reference of steady state optical power. This additional bit is continually on, so the reference photodiode always sources a current equal to IPHOTO.

Given the linear relationship between current and voltage in small signal, creating a reference voltage becomes trivial. Consider the scheme shown in Fig. 2-21.

Figure 2-21: Averaging two inputs in the current domain.

In the current domain, addition consists simply of connecting two wires together. In Fig. 2-21, the currents for a "one bit" and a "zero bit" add linearly and split up into the bottom two branches based on the relative terminating impedances. If the two impedances are equal, then current splits half and half.

Figure 2-22 replaces the impedances with current mirrors. Since M18 and M19 are identical, they present equivalent loads and current still splits half and half. Both transistors in Fig.
2-22 receive the desired reference current, so either one can provide the reference voltage, $V_{REF} = \frac{1}{2}(V_{HI} + V_{LO})$.

Figure 2-22: Identically sized current mirrors present the same impedance

Figure 2-23 proposes schematics for the ideal current sources in Figs. 2-21 and 2-22. The input current for a high bit equals (IPHOTO + IBIAS). However, a low bit consists only of bias current, so the photodiode can be omitted.

Figure 2-23: A "zero bit" does not require a diode at all

Replacing the ideal current sources with the virtual inputs from Fig. 2-23 produces the circuit in Fig. 2-24. These transistors share many nodes, including VREF. Figure 2-25 shows a condensed version of the full reference circuit. The sizes of transistors M16 to M19 must match their counterparts in the receiver circuit.

Figure 2-24: Current averaging with photodiode input

Figure 2-25: Reference circuit shared by all bits in a data bus

All bits in the data bus share this reference circuit. For instance, the 128 bit bus in Fig. 2-26 consists of 129 optical signal lines. The extra signal always transmits a high bit, providing a reference of steady state optical power for the reference circuit. A designer might choose to use several optical reference lines to increase variation robustness (see Chapter 4 on variation). For each additional signal, one simply adds another copy of the circuit in Fig. 2-25 in parallel.
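
The current domain arithmetic of this section can be sketched in a few lines. The example below is only an idealized bookkeeping model under stated assumptions: all mirrors are perfectly matched, parallel copies of the reference circuit share a common VREF node so that their currents pool and split equally, and the bias and photocurrent values are illustrative. The function names are hypothetical.

```python
def reference_current(i_bias, i_photos):
    """Current delivered to each output mirror of the reference circuit.

    i_photos holds one photocurrent per optical reference line. Each copy
    contributes a "one bit" branch (bias + photocurrent) and a "zero bit"
    branch (bias only); the pooled current is assumed to split equally
    across the 2 * len(i_photos) identical mirrors.
    """
    n = len(i_photos)
    total = sum(i_bias + ip for ip in i_photos) + n * i_bias
    return total / (2 * n)


def net_latch_current(bit, i_bias, i_photo, i_ref):
    """(I_IN1 - I_IN2) after the matching bias current is subtracted on each side."""
    i_in1 = (i_bias + (i_photo if bit else 0.0)) - i_bias
    i_in2 = i_ref - i_bias
    return i_in1 - i_in2


I_BIAS = 50e-6    # A, assumed bias current (illustrative)
I_PHOTO = 10e-6   # A, photocurrent for a fully "on" optical input

i_ref = reference_current(I_BIAS, [I_PHOTO])
high = net_latch_current(True, I_BIAS, I_PHOTO, i_ref)
low = net_latch_current(False, I_BIAS, I_PHOTO, i_ref)
print(f"reference current = {i_ref * 1e6:.1f} uA")
print(f"net current, high bit = {high * 1e6:+.1f} uA")
print(f"net current, low bit  = {low * 1e6:+.1f} uA")

# Two reference lines with slightly different optical power (assumed values):
i_ref_2 = reference_current(I_BIAS, [9e-6, 11e-6])
print(f"reference current, two lines = {i_ref_2 * 1e6:.1f} uA")
```

Under these assumptions the net current has equal magnitude and opposite sign for high and low bits, as in Table 2.2, and pooling several reference lines averages their photocurrents.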
Figure 2-26: Example of 128 bit optical data bus with a single reference circuit

2.4 Summary

Inverting the polarity of the Schaffer and Mitkas sense amplifier ensures that the two series devices are NFET's rather than PFET's. A clock signal replaces the latch signal, and the circuit in Fig. 2-4 (page 28) generates a reset signal from the clock.

During the first clock phase, reset pulses high, precharging IN1 and IN2 to the same value. A reference input always sinks $\frac{1}{2} I_{PHOTO}$ of current from IN2, while the optical input controls the amount of charge flowing into IN1.

After the clock goes high, charge redistributes from the output to input nodes, causing an immediate increase in IN1 and IN2 and a decrease in Q and /Q. This charge sharing dilutes the magnitude of the differential across the latch, but not the sign, so the circuit still evaluates correctly using positive feedback, as shown in Fig. 2-8 (page 32).

Adding a current mirror on IN1 isolates the photodiode capacitance, allowing the latch to switch faster. Equation 2.8 (page 35) expresses the time constant on the input node in terms of the junction capacitance and the transconductance of the diode connected transistor, M10.

Section 2.2.3 discusses the transient behavior of VIN due to an arbitrary input waveform. Specifically, the DC offset of VIN tends to drift around. On any given cycle, VIN charges up or down exponentially, occupying some fraction (Eq. 2.26, page 40) of the total voltage swing between VLO and VHI (Eq. 2.9, page 36). To avoid this effect, the time constant should be small compared to the clock period.

A reference circuit averages the photocurrent from a high and low optical signal to create a reference voltage that is equivalent to an input of $\frac{1}{2} I_{PHOTO}$. This requires an extra, reference optical path.
All bits in the data bus can share the same reference voltage.

Figure 2-27 shows the final schematic of the receiver and the reference circuit. All references to transistor names and node names in this thesis refer to this diagram.

[Figure 2-27: final receiver schematic, with blocks labeled Sense Amplifier, Current Mirror Input Stage, and Shared Reference Circuit]

Chapter 3

Process, Sizing, and Simulation

This chapter describes implementation of the data receiver circuit in TSMC's 0.18 um digital CMOS process, provided through the MOSIS prototyping service. Simulations are done using Avant! Star-HSPICE and device models provided by MOSIS. The next few sections provide a brief description of the process and sizing considerations, followed by simulation results. Simulations use a test photodiode providing 10 uA of bias current with a junction capacitance of 100 fF.

3.1 Process Overview

TSMC's 0.18 um digital CMOS process contains one poly layer and six metal layers. The physical gate length is 0.16 um with an oxide thickness of 32 Å for a 1.8 V supply. The process also provides an additional set of transistors with a 70 Å gate dielectric for I/O interface at 3.3 V. For digital design, TSMC claims densities of over 100,000 gates per mm^2, logic speeds of over 400 MHz, and a ring oscillator delay of 28 ps. Table 3.1 summarizes some useful process characteristics [6].

Despite the relatively analog nature of receiver design, a digital process is the appropriate testing platform because it demonstrates the feasibility of integrating optical interconnect into VLSI logic chips. A digital process ordinarily imposes serious constraints on the quality of passive components, but the design in Ch. 2 avoids passive components altogether. These process constraints are a big reason why.
| Parameter | Value |
| --- | --- |
| Supply Voltage | 1.8 V |
| Interconnect | 6 Metal, 1 Poly |
| Drawn Gate Length | 0.18 um |
| Physical Gate Length | 0.16 um |
| Gate Oxide Thickness | 32 Å |
| 6T SRAM Cell Size | 4.65 um^2 |
| Ring Oscillator Delay | 28 ps |
| Leakage Current | 0.1 nA/um |
| Salicide | CoSi2 |
| Metal | AlCu |
| Via Fill | Tungsten |

Table 3.1: Summary of TSMC 0.18 um Digital Logic Process

Effective design requires an understanding of device performance, and how it changes based on process parameters and bias conditions. In this case, information about the current-voltage (I-V) characteristics, unity current gain frequency ($f_T$), and transconductance ($g_m$) proves useful. TSMC does not provide information about these factors, so they must be extracted through simulation. Appendix A compiles graphs of these parameters.

Figures A-1 to A-3 on page 120 show the I-V characteristics of a minimum length NFET as width varies from 0.5 um (minimum) to 5.0 um. Page 121 shows the same graphs for a minimum length PFET. According to the graphs, a minimum width PFET can comfortably deal with currents on the order of 50-100 uA, while an NFET can sink almost three times as much.

Figure 3-1 shows the test setup for finding $f_T$. A DC voltage source biases the gate through a huge inductor. This convenient trick essentially removes the voltage source from the circuit during AC simulation. The $f_T$ frequency occurs where current gain crosses one. Figure A-7 (page 122) plots current gain versus frequency for transistor widths ranging from 0.5 um to 5.0 um. As expected, DC parameters (such as width) have little or no effect on $f_T$. The crossover points in Figs. A-8 and A-9 (page 122) show an $f_T$ of roughly 50 GHz for an NFET, and about a factor of three less for a PFET (17-18 GHz).

Figure 3-1: Test setup for finding $f_T$

Finally, consider how the tradeoff between bias current and size affects $g_m$.
Figures A-10 to A-13 (pages 123 and 124) show how gm varies for a diode connected transistor with lengths of 0.18 um and 0.36 um. As expected, doubling the length substantially reduces transconductance. Also, notice that current tends to have more effect on gm in Fig. A-10, while geometry is more effective in Fig. A-12. The trick is figuring out which parameter gives a greater benefit relative to cost at the current operating point.

3.2 Sizing and DC Biasing

Extremely high performance chips operate around a gigahertz in 0.18 um technology, so this implementation targets 1.0 GHz as a nominal operating frequency. The following sections choose DC parameters for the latch, current mirrors, current sources, and reference circuit. Figure 3-2 shows the receiver layout for a single data bit, measuring 12.6 um wide and 10.6 um tall. Figure 3-4 displays the layout of the reference circuit, measuring 7.0 um wide and 10.6 um tall. Sections 3.2.1 to 3.2.4 refer to these layouts.

Figure 3-2: Layout of receiver circuit for a single data bit (12.6 um x 10.6 um)

3.2.1 The Latch

In Ch. 4 (page 89), simulations show the latch to be fairly variation resistant, unless transistor strengths are grossly mismatched. In general, as long as an appropriate differential builds up across the inputs, the latch evaluates correctly. Therefore, increasing geometries in the latch provides little benefit in terms of variation robustness.

On the other hand, larger geometries in the latch decrease speed. Increasing transistor size increases capacitance, and the circuit requires more time to evaluate. After all, the goal is to minimize capacitance on the switching nodes.

Given this tradeoff, the design utilizes minimum sized transistors in the latch, with efforts to minimize capacitance on key nodes whenever possible. For example, refer to Fig. 3-3, which shows the names and sizes of the transistors from Fig. 3-2.
Figure 3-3: Dimensions and transistor names for receiver circuit. Large, unlabeled transistors are MOS decoupling capacitors.

Devices M1 to M9 form the central latch. The four PFETs stack in series, with the non-shared (higher capacitance) nodes connected to VDD. Transistors M5, M6, and M9 are also laid out in series. Once again, the critical nodes, IN1 (between M5 and M9) and IN2 (between M9 and M6), occupy the shared nodes to minimize capacitance. All devices measure the minimum size of 0.5 um in width and 0.18 um in length.

3.2.2 Current Mirrors

Chapter 2 derives an approximate expression for the critical time constant in the receiver, τ ≈ C/gm10 (Eq. 2.8, page 35). A smaller τ means faster transients, and faster evaluation, with one catch. The product of input current, IPHOTO, and the small signal resistance on VIN (1/gm10) determines the maximum voltage swing on VIN, as shown in Eq. 2.9 (page 36). Increasing gm10 too much makes the voltage swing too small, and decreasing gm10 too much makes the input waveform too slow. Both of these effects can decrease the magnitude of the differential that builds up across IN1 and IN2.

Geometry and bias current both control gm. Increasing transistor size increases capacitance on IN1, making the latch switch slower, but increasing bias current increases static power dissipation. So, both control knobs have tradeoffs.

In addition, the output waveforms of the latch affect dynamic (switching) power dissipation. As shown in Fig. 2-8 (page 32), Q and /Q experience charge sharing with IN1 and IN2. This glitching can cause unnecessary transitions in logic connected to the output nodes. As Eq. 2.7 (page 31) demonstrates, charge sharing depends on CIN, which is dominated by the capacitance of the current mirrors.
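The opposing pulls of gm10 on the time constant and on the voltage swing can be made concrete with a short numerical sketch. It assumes the first-order forms described in the text (a time constant of C/gm10 and a swing of IPHOTO/gm10) and uses the test photodiode values quoted in this chapter. The function names and the gm10 sweep points are illustrative and are not taken from the thesis.

```python
# First-order sketch of the gm10 tradeoff on the input node VIN.
# Assumptions: tau ~ C / gm10 and swing ~ I_PHOTO / gm10, as described in
# the text; the gm10 values swept below are illustrative only.

C_PD = 100e-15     # photodiode junction capacitance (F)
I_PHOTO = 10e-6    # photocurrent (A)


def time_constant(gm10, c=C_PD):
    """Input time constant in seconds, tau ~ C / gm10."""
    return c / gm10


def max_swing(gm10, i_photo=I_PHOTO):
    """Maximum voltage swing on VIN in volts, I_PHOTO * (1 / gm10)."""
    return i_photo / gm10


if __name__ == "__main__":
    for gm10 in (250e-6, 625e-6, 1000e-6):
        tau_ns = time_constant(gm10) * 1e9
        swing_mv = max_swing(gm10) * 1e3
        print(f"gm10 = {gm10 * 1e6:5.0f} umho   "
              f"tau = {tau_ns:.3f} ns   swing = {swing_mv:.1f} mV")
```

Raising gm10 shortens the time constant and shrinks the swing by the same factor, which is why neither extreme is attractive.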
Performance Factor             Dependencies    Reasons
Power Dissipation              + IBIAS         Static power = VDD x IBIAS
                               + W, + L        ↑ CIN ⇒ more glitching at output ⇒ more dynamic power dissipation
Speed - Phase 1                + IBIAS         ↑ gm10 ⇒ ↓ τ
(Clock is low)                 + W, - L        ↑ gm10 ⇒ ↓ τ
                               - W, - L        ↑ capacitance on VIN ⇒ ↑ τ (negligible)
Speed - Phase 2                - IBIAS         ↑ gm10 ⇒ ↓ voltage swing on VIN ⇒ ↓ differential across IN1 and IN2
(Clock is high)                - W, + L        ↑ gm10 ⇒ ↓ voltage swing on VIN ⇒ ↓ differential across IN1 and IN2
                               - W, - L        ↑ CIN ⇒ latch switches slower

Table 3.2: Circuit performance as a function of bias current and geometry for input stage transistor, M10

Table 3.2 summarizes this complex set of dependencies. In essence, one must make a tradeoff between speed and power dissipation, which requires considering operation in both clock phases. For instance, in the first phase, a fast transient moves VIN around VREF quickly, but that means less voltage swing, which in turn means a smaller differential for the sense amplifier to evaluate when the clock goes high. Furthermore, all of these factors are functions of transistor size and bias current, making evaluation speed a complicated, non-linear function. However, with the aid of simulation and some educated guesses, a designer can converge on an acceptable solution.

First, recall the tradeoffs discussed in Sec. 3.1 for determining gm. Doubling the length greatly reduces the transconductance. Also, NFETs have higher transconductance than PFETs. Given this information, and the unfavorable set of dependencies for length in Table 3.2, it seems obvious to use minimum length NFETs for the current mirrors. Increasing length would make the circuit more variation resistant, but the cost is too high.

Closer inspection of Fig. A-10 (page 123) reveals some interesting trends. Section 3.1 mentions that, depending on the bias point, sometimes current gives more control over gm, and sometimes geometry has more effect.
For a minimum length NFET, current has a bigger effect. Diminishing returns on width set in around one micron for low bias currents, and around two to three microns for higher bias currents.

A process of careful simulation finally converges on a value of 50 uA for IBIAS with a transistor width of 3.0 um. This particular operating point yields a respectable gm10 of 625 umho. With a photodiode capacitance of 100 fF, the time constant comes out to about 0.16 ns, allowing VIN a 16 mV swing with very little DC drift at a gigahertz. In fact, according to Eq. 2.26 (page 40), the envelope utilizes over 99% of the total voltage swing.

These choices maximize differential build-up speed in the first clock phase, while still providing a nice balance with evaluation in the second phase because the 3.0 um wide current mirror does not load down the inputs too much. Power dissipation stays just under 300 uW, which seems like a reasonable budget for a single bit. Of course, power and speed can always be traded off for one another.

Figure 3-3 shows that capacitance on IN1 can be further reduced by folding the input transistors. Voltage VIN occupies the drain node of M10, which effectively has the capacitance of a transistor only 1.5 um wide. Likewise, IN1 occupies the shared node of M11. A centroid style layout could reduce this capacitance even more.

3.2.3 Current Sources

The current sources, consisting of transistors M12, M13, M15, M16, and M17, supply bias current to the current mirrors. As seen in the previous section, bias current plays a large role in setting the DC operating point. Therefore, it would be nice to scale up the geometries of these transistors, making them more variation resistant. Unfortunately, capacitance scales with geometry.

This tradeoff can be partially circumvented using clever layout. As seen in Fig. 3-3, folding a transistor doubles the effective width, without increasing the size of the drain.
In fact, the shared node has a channel on both sides, so folding a transistor actually decreases total drain capacitance by one unit of sidewall capacitance. Doubling length, on the other hand, approximately doubles the capacitance. In conclusion, scaling the transistors up to W/L = 1.0 um / 0.36 um improves variation robustness, while increasing the capacitance by a factor less than two. Considering the current sources contribute very little capacitance on the input nodes to start with, this turns out to be a reasonable tradeoff.

3.2.4 Reference Circuit

In order to function as an appropriate sense amplifier, it is imperative that the circuit be balanced on both sides. Furthermore, all of the analysis done in Ch. 2 relies on the assumption that IN1 and IN2 have the same capacitance. Therefore, M14 and M15 must match M11 and M13 exactly. Likewise, the reference circuit (M16 to M19) must match its set of input transistors (M14 and M15), or the reference voltage becomes useless.

Figure 3-4 shows the layout of the reference circuit on the left, and the names, locations, and sizes of the transistors on the right. Note that this layout makes every effort to match the transistors perfectly, even folding transistors the same way as in the receiver layout. Chapter 4 discusses the importance of symmetrical layout in a sense amplifier.

The extra, unlabeled transistors in Figs. 3-2, 3-3, and 3-4 are MOS capacitors used for local decoupling of power supplies. Adding these capacitors wherever there is free space minimizes high frequency voltage supply noise.

Figure 3-4: Layout of reference circuit (7.0 um x 10.6 um) (left), along with dimensions and transistor names (right). Large, unlabeled transistors are MOS decoupling capacitors.

In a real design, the photodiode has its own layout, which can vary tremendously in size. Typical sizes range from 10 um by 10 um to 100 um by 100 um.
Table 3.3 (page 60) summarizes transistor sizes for both the receiver and reference circuits, according to the labeling conventions in Fig. 2-27 (page 49). Note that adding more bits only replicates transistors M1 to M15. A single reference circuit (M16 to M19) can service the whole bus.

Transistor   Width     Length        Transistor   Width     Length
M1           0.5 um    0.18 um       M11          3.0 um    0.18 um
M2           0.5 um    0.18 um       M12          1.0 um    0.36 um
M3           0.5 um    0.18 um       M13          1.0 um    0.36 um
M4           0.5 um    0.18 um       M14          3.0 um    0.18 um
M5           0.5 um    0.18 um       M15          1.0 um    0.36 um
M6           0.5 um    0.18 um       M16          1.0 um    0.36 um
M7           0.5 um    0.18 um       M17          1.0 um    0.36 um
M8           0.5 um    0.18 um       M18          3.0 um    0.18 um
M9           0.5 um    0.18 um       M19          3.0 um    0.18 um
M10          3.0 um    0.18 um

Table 3.3: Transistor sizes for receiver and reference circuit in 0.18 um technology

3.3 Simulation Results

Simulations use Avant! Star-HSPICE. Due to the nonlinear nature of circuit operation, AC analysis is not possible, so the next section focuses on finding an appropriate test waveform for transient analysis. The following sections present simulated output waveforms and results. All simulations use a photodiode that supplies 10 uA of photocurrent with a parasitic junction capacitance of 100 fF.

3.3.1 Test Waveform

A good test pattern should exercise all the worst case scenarios, and ideally exhibit a 50% bit density to allow some representative power measurements. Figure 3-5 proposes an input waveform, and plots the corresponding response of VIN, as per the test circuit in Fig. 3-6. As will be seen, this waveform indeed exercises all three "worst case scenarios," namely a high bit after a series of low bits, a low bit after a series of high bits, and an alternating sequence of ones and zeroes.

After a long series of high bits, VIN settles to its maximum value, VHI. A subsequent low bit requires the input node to discharge all the way down from its maximum value, making this the most difficult low bit to evaluate.
Usually, VIN starts the cycle somewhere below VHI, so it does not have to discharge as far to evaluate an incoming zero.

A similar situation occurs when a high bit follows a long string of low bits. The input settles to VLO, the minimum possible value, making this combination the most difficult high bit to evaluate.

Finally, the circuit must behave predictably in steady state, even if the input asks it to flip back and forth repeatedly. The circuit should never fail on an alternating pattern if it does not fail for the two previously mentioned patterns, because VIN never reaches VLO or VHI. Nonetheless, a "1010..." pattern adds some redundancy to the tests and shows off the DC drift behavior.

Figure 3-5: Input test pattern (top) and corresponding VIN waveform (bottom)

Figure 3-6: Test circuit for waveforms in Fig. 3-5

As claimed, the waveform shown in Fig. 3-5 exercises all three test cases, and has approximately a 50% bit density as well. The input starts with a zero, followed by 20 consecutive ones, 20 zeroes, and five alternating "10" sequences.

3.3.2 Output Waveforms

As in Fig. 3-6 (page 61), a capacitor and a current source replace the photodiode. Figure 3-7 shows the clock and input waveforms, as generated by SPICE, and Fig. 3-8 shows the reset waveform plotted against the clock. The clock runs at one gigahertz, and all sizing corresponds to that derived in Sec. 3.2.

Figure 3-7: Simulation waveforms for clock (grey) and input photocurrent (black)

Figure 3-8: Receiver CLK and corresponding RST waveform

The simulated input and reference waveforms in Fig.
3-9 look similar to those developed in Ch. 2, except with more noise due to capacitive feedthrough from the switching nodes. Two important properties of these waveforms merit discussion.

Figure 3-9: Simulated VIN and VREF waveforms

First, the voltage range between VLO and VHI spans only 14.58 mV instead of the predicted 16 mV. This discrepancy arises because gm10 changes over time as a function of current. In addition, the output resistances of M10 and M12 have a small effect on the total small signal resistance. Taking the other resistances into account and averaging gm10 between the two operating points gives a more accurate predicted swing of 14.76 mV.

Second, due to their small size, the waveforms are particularly susceptible to voltage spikes that propagate to the node through parasitic capacitors. These spikes constitute generic switching noise from CLK, RST, and even the evaluation of IN1 and IN2. Luckily, only the time average of VIN over the clock cycle matters, because that determines the total amount of charge placed on IN1 and IN2. This time average matches the ideal value closely for the strings of high and low bits, but the alternating sequence has some destructive interference that causes it to utilize less of the total voltage swing than expected. (The calculations on page 57 predict over 99% using Eq. 2.26.)

These results justify the alternating pattern of the test waveform, as well as the decision to increase gm10 and push envelope utilization over 99%. Aiming for a utilization close to one hundred percent ensures that the actual waveform will meet a specification slightly less.
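As a rough cross-check of the test pattern from Sec. 3.3.1 and the envelope utilization figure, the sketch below builds the bit sequence and passes it through an ideal single-pole model of VIN. This is an assumption-laden simplification: it ignores switching noise and the variation of gm10 with current, and it is not the Eq. 2.26 derivation itself. The function names are hypothetical.

```python
import math

TAU_NS = 0.16    # input time constant from Sec. 3.2.2 (ns)
T_BIT_NS = 1.0   # bit period at 1.0 GHz (ns)


def test_pattern():
    """One zero, 20 ones, 20 zeroes, then five alternating '10' pairs."""
    return [0] + [1] * 20 + [0] * 20 + [1, 0] * 5


def vin_at_bit_ends(bits, tau=TAU_NS, t_bit=T_BIT_NS):
    """Normalized VIN (VLO = 0, VHI = 1) at the end of each bit period.

    Assumes an ideal single-pole response with no switching noise.
    """
    decay = math.exp(-t_bit / tau)
    v, samples = 0.0, []
    for b in bits:
        v = b + (v - b) * decay  # exponential settling toward the bit level
        samples.append(v)
    return samples


if __name__ == "__main__":
    bits = test_pattern()
    v = vin_at_bit_ends(bits)
    density = sum(bits) / len(bits)
    # Swing used by the last '10' pair, as a fraction of VHI - VLO
    utilization = v[-2] - v[-1]
    print(f"bits = {len(bits)}, ones density = {density:.3f}")
    print(f"alternating-pattern utilization = {utilization:.4f}")
```

Under these idealized assumptions the pattern comes out at just under 50% ones, and the alternating section uses more than 99% of the full swing, in line with the figure quoted above.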
Figure 3-10: Simulated IN1 and IN2 waveforms

Figure 3-11: Zoomed in plot for six cycles of IN1 and IN2

Figure 3-10 shows the waveforms on IN1 and IN2 for the entire test pattern, and Fig. 3-11 shows an enlarged portion for six cycles where the input switches from a sequence of high bits to a sequence of low bits. These plots are the simulation counterpart of Fig. 2-7 (page 31).

Figures 3-10 and 3-11 clearly demonstrate four things. First, IN1 and IN2 do not go all the way to VDD. As Ch. 2 explains, M5 and M6 make bad pull-up devices, so the output should be taken off Q and /Q.

Second, the "gain" from VIN to IN1 is inverting. A high optical signal (first three cycles in Fig. 3-11) causes VIN to increase, which turns on M11 and drains charge off IN1, causing it to go down.

Third, Fig. 3-11 clearly shows the differential building up. As expected, the smallest one occurs on the fourth cycle when the input switches from high to low. It takes some amount of time for VIN to discharge below VREF in Fig. 3-9. As a result, an incorrect differential initially builds up, at least until VIN crosses VREF and begins erasing it, ultimately building up the correct differential. In the other five cycles, VIN starts in the right place, so a correct differential begins building up right away (as soon as the reset signal turns off).

Finally, look at the spikes at the beginning of each evaluation phase in Fig. 3-11. These result from the charge sharing discussed in Sec. 2.1.3 (page 30). When CLK goes high, M5 and M6 short the outputs and inputs of the latch together.
They immediately redistribute charge and assume the same value, causing a spike in IN1 and IN2, and a drop in Q and /Q.

Next, look at the output waveforms, shown in Fig. 3-12 for the entire test pattern, and in Fig. 3-13 for the same six enlarged cycles as Fig. 3-11. Consider the following two points about these waveforms.

First, Q and /Q exhibit the same charge sharing behavior just discussed for the input nodes. Specifically, each time the clock goes high, Q and /Q drop to meet IN1 and IN2. This glitching can cause unnecessary transitions in combinational logic attached to the output. This results in excess power dissipation, or in some cases, logical errors. For instance, dynamic logic requires a glitch free input.

Second, notice that the output itself is a type of dynamic logic. Both outputs start high, and one selectively discharges low. This also means that the signal only stays latched for half the clock period. During the other half, both outputs are high. One could possibly take advantage of this "dynamic logic" behavior to asynchronously signal data acquisition. For example, an XOR gate connected to Q and /Q only evaluates high after the receiver successfully acquires a signal. This might allow some sort of asynchronous optimization inside a single clock period, but the circuit itself relies on a synchronous environment to function.

Figure 3-12: Simulated output waveforms, Q and /Q

Figure 3-13: Zoomed in plot of output waveforms corresponding to Fig. 3-11

3.3.3 Simulation Measurements

The last section presented and explained the simulation output waveforms. This section looks at how the circuit performs in terms of power, speed, and size.
Total power consumption breaks down into two parts. Static power refers to power dissipated continuously due to DC bias currents. Dynamic, or switching, power occurs due to the charging and discharging of capacitors during operation, so it depends on frequency (Eq. 3.2). The sum of the two gives total power dissipation, as shown in Eq. 3.1.

$$ P_{total} = P_{static} + P_{dynamic} \tag{3.1} $$

$$ P_{total} = \alpha + \beta f_{CLK} \tag{3.2} $$

Static power dissipation can be calculated by multiplying the supply voltage times the total DC bias current. The receiver uses three legs of bias current, each supplying the same amount of current, IBIAS. The reference circuit, on the other hand, uses two units of bias current plus one unit of photocurrent. Remember that light continually shines on the photodiode. This produces a DC current, so it must be included in static power calculations. Equations 3.4 and 3.6 give calculated values for static power consumption.

$$ P_{receiver-static} = V_{DD}\,(3 I_{BIAS}) \tag{3.3} $$

$$ P_{receiver-static} = 270\ \mathrm{uW} \tag{3.4} $$

$$ P_{reference-static} = V_{DD}\,(2 I_{BIAS} + I_{PHOTO}) \tag{3.5} $$

$$ P_{reference-static} = 198\ \mathrm{uW} \tag{3.6} $$

Figure 3-14 plots simulated values of power dissipation for frequencies ranging from 750 MHz to 1.25 GHz. As expected, power increases linearly with frequency. Running a linear regression using least squares gives the dotted line in Fig. 3-14, and values for α and β in Eq. 3.2. The reference circuit dissipates only static power, so it does not depend on frequency. Equations 3.7 and 3.8 give extracted values for power dissipation in the receiver and reference circuit as a function of frequency.

Figure 3-14: Power dissipation of receiver circuit as a function of clock frequency

$$ P_{receiver}\ (\mathrm{uW}) = 279.456 + 26.433\, f_{CLK}\ (\mathrm{GHz}) \tag{3.7} $$

$$ P_{reference}\ (\mathrm{uW}) = 196.454 \tag{3.8} $$

Simulated values for static power dissipation match closely with the calculated values.
The constant term in Eq. 3.7 differs from the calculated value by 9 uW, while the measured value in Eq. 3.8 differs from the calculated value by only 2 uW.

It helps to know how much the reference circuit affects power and area consumption for a whole data bus. The average power dissipation per bit, as shown in Fig. 3-15, consists of the power dissipation of the receiver circuits plus the power dissipation of a single reference circuit averaged over the number of bits. As the number of bits becomes large, the average power dissipation approaches the power dissipation of a single receiver circuit.

Figure 3-16 shows a similar plot for receiver and reference circuit areas. Each individual receiver circuit takes up an area of 133.56 um^2, and each reference circuit takes up 74.20 um^2. However, as the size of the data bus increases, the cost of the reference is averaged out over all the bits.

Figure 3-15: Average power dissipation at 1.0 GHz for data bus of varying size

Figure 3-16: Average die area per bit for data bus of varying size

For variation robustness, a designer might choose to have multiple reference circuits for a large data bus. In this case, Figs. 3-15 and 3-16 become very important because they clearly demonstrate the average cost of a reference circuit given a certain number of bits.

The final, and most important, parameter of interest is evaluation speed. Variation analysis in Ch. 4 deals predominantly with how variation affects the waveforms on VIN and VREF, the most sensitive nodes in the circuit.
Hence, the most useful definition of evaluation speed should measure whether the circuit accurately compares VIN and VREF, not whether the circuit produces glitch free output.

Therefore, evaluation speed is defined as the maximum clock frequency for which all bits are evaluated correctly. In other words, the circuit makes a correct decision. The nodes of interest in this measurement are IN1 and IN2.

Figure 3-17: Example of evaluation speed definition: Cycle A evaluates correctly but will not produce a satisfactory output. Cycle B does not evaluate correctly.

For example, take the plot in Fig. 3-17. Despite the messy waveforms, cycle A "evaluates" correctly. Nodes Q and /Q will not display satisfactory outputs, but the circuit begins evaluating in the right direction, indicating that VIN and VREF established a correct differential across the inputs of the sense amplifier. Therefore, A is considered a correct evaluation. Cycle B, on the other hand, evaluates the wrong way, indicating an incorrect differential. Evaluation speed is the maximum clock frequency for which no errors (like cycle B) occur on any bit.

Under the operating point established in Sec. 3.2, the receiver circuit reaches an evaluation speed of 2.000 GHz. Table 3.4 summarizes the bias point and performance parameters discussed in this chapter.
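The power and area figures in Sec. 3.3.3 can be cross-checked with a few lines of arithmetic. The sketch below recomputes the hand-calculated static power, evaluates the fitted expression of Eq. 3.7, and averages the cost of one shared reference circuit over an N-bit bus. The function names are hypothetical, and the per-bit averaging simply follows the verbal description of Figs. 3-15 and 3-16 rather than any script from the thesis.

```python
VDD = 1.8         # supply voltage (V)
I_BIAS = 50e-6    # bias current per leg (A)
I_PHOTO = 10e-6   # DC photocurrent (A)

# Fitted simulation results quoted in Eqs. 3.7 and 3.8 (uW, f in GHz)
ALPHA_UW = 279.456
BETA_UW_PER_GHZ = 26.433
P_REFERENCE_UW = 196.454

AREA_RECEIVER_UM2 = 12.6 * 10.6   # receiver layout, Fig. 3-2
AREA_REFERENCE_UM2 = 7.0 * 10.6   # reference layout, Fig. 3-4


def static_power_uw():
    """Hand-calculated static power of receiver and reference, in uW."""
    receiver = VDD * 3 * I_BIAS * 1e6
    reference = VDD * (2 * I_BIAS + I_PHOTO) * 1e6
    return receiver, reference


def receiver_power_uw(f_ghz):
    """Fitted receiver power from Eq. 3.7, in uW."""
    return ALPHA_UW + BETA_UW_PER_GHZ * f_ghz


def per_bit_cost(n_bits, f_ghz=1.0):
    """Average power (uW) and area (um^2) per bit with one shared reference."""
    power = receiver_power_uw(f_ghz) + P_REFERENCE_UW / n_bits
    area = AREA_RECEIVER_UM2 + AREA_REFERENCE_UM2 / n_bits
    return power, area


if __name__ == "__main__":
    print("static power (uW):", static_power_uw())
    for n in (1, 8, 64):
        p, a = per_bit_cost(n)
        print(f"{n:3d} bits: {p:7.2f} uW/bit, {a:7.2f} um^2/bit")
```

As the bus widens, both averages approach the receiver-only values, which is the trend the text attributes to Figs. 3-15 and 3-16.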
Bias Parameters
Latch W/L                                 0.5 um / 0.18 um
Current Mirrors W/L                       3.0 um / 0.18 um
Current Sources W/L                       1.0 um / 0.36 um
IBIAS                                     50 uA
Simulation IPHOTO                         10 uA
Simulation C                              100 fF

Simulation Results
Receiver Power Dissipation (1.0 GHz)      305.74 uW
Reference Power Dissipation               196.45 uW
Receiver Area                             133.56 um^2
Reference Area                            74.20 um^2
Maximum Evaluation Speed                  2.000 GHz

Table 3.4: Summary of bias and performance characteristics in 0.18 um CMOS

Chapter 4
Variation Analysis

A practical design must not only function correctly, but do so despite a myriad of variations in the environment and the process. There are two types of variation. First, uniform variations can occur throughout the chip for a variety of reasons. Resistive drops in the package can decrease the effective internal supply voltage. Slight process errors might increase all drawn dimensions by a small amount. The temperature of the operating environment can vary tremendously from the test environment, and so on.

Second, differential variation can seriously affect circuits which rely heavily on transistor matching, including the receiver circuit presented in this thesis. If the two sides of a sense amplifier vary non-uniformly, the circuit becomes mismatched, possibly causing incorrect operation.

This chapter begins by discussing how variation affects circuit operation, followed by a presentation of simulated results for uniform and differential variation, respectively. The last section summarizes key points, and suggests some ways to design around variation.

4.1 Receiver Variation Overview

Chapter 2 talks about the benefits of a fast time constant on VIN. This section expands on that by trying to quantify the differential across IN1 and IN2 as a function of the time constant, τ, the clock period, T, and the reference voltage level, x.

Voltage VIN controls the gate of M11, and therefore the current into IN1. Likewise, VREF supplies the gate voltage of M14, and determines the current into IN2.
The total differential across IN1 and IN2 equals the net charge divided by the capacitance. Since current expresses the rate of charge flow, the change in differential, ΔVdiff, is proportional to the difference in currents (Eq. 4.1), which in turn is proportional to the difference in VIN and VREF (Eq. 4.2).

$$ \Delta V_{diff} \propto (I_{IN1} - I_{IN2}) \tag{4.1} $$

$$ \Delta V_{diff} \propto (V_{IN} - V_{REF}) \tag{4.2} $$

$$ V_{diff} \propto \int (V_{IN} - V_{REF})\, dt \tag{4.3} $$

In other words, the quantity (VIN - VREF) determines the rate and sign of the net charge flow onto the input nodes. Integrating this quantity over time gives the total differential, as shown in Eq. 4.3.

Figure 4-1: VIN vs. VREF with a slow time constant

For example, look at the second clock cycle in Fig. 4-1. The integral across the whole clock period equals the area of region B minus the area of region A. In this case, the integral equals zero, so no differential builds up across the input nodes. However, with a faster time constant (or longer clock period), region B would be bigger than region A, resulting in a net positive differential.

Figure 4-2 shows a more realistic plot of VIN for a period of 4τ. The voltage swing between VLO and VHI is normalized to 1.0, such that VLO = 0.0, VREF = 0.5, and VHI = 1.0. The "normalized differential" in the bottom graph refers to the geometric area of the dark shaded region in the top graph minus the light shaded region.

Figure 4-2: Exponential VIN waveform discharging across VREF (top) and corresponding plot of total differential as a function of time (bottom)

The graphs in Fig. 4-2 operate as follows. At some point in time, the input switches low after a large number of high bits. As Ch.
3 explains, this constitutes the worst case scenario for evaluating a low bit. Voltage VIN has previously settled to VHI, but now begins discharging exponentially with time constant τ. Since (VIN - VREF) > 0, a positive differential accumulates on the input nodes. This differential increases until (VIN - VREF) = 0, at which point the total differential curve in Fig. 4-2 reaches a local maximum.

After VIN crosses VREF, the differential begins decreasing because (VIN - VREF) < 0. At some point, the differential returns to zero, indicating that the (incorrect) positive differential has been entirely erased. After that point, VIN remains below VREF and the differential continues decreasing.

Chapter 3 defines evaluation speed as the maximum clock frequency for which all bits are evaluated correctly (page 70). This requires only that the circuit make a correct decision, not produce usable output. However, defining evaluation speed this way makes it useful for measuring the calibration between VIN and VREF, a critical metric for variation analysis.

In fact, making a correct decision only requires that a correct differential build up across IN1 and IN2. In Fig. 4-2, the circuit can evaluate correctly any time after the zero crossing of total differential. Therefore, by definition, the zero crossing of the total differential curve determines evaluation speed.

A differential builds up during the first half of the clock period, and the sense amplifier evaluates it during the second half. So one half of the clock period, T/2, must be greater than teval-min, the minimum time required to establish a correct differential. Of course, teval-min is the zero crossing of Vdiff(t), the total differential as a function of time (Eq. 4.4). This relationship between clock period and teval-min translates into the expression for maximum evaluation speed in Eq. 4.5.
$$ V_{diff}(t_{eval-min}) = 0 \tag{4.4} $$

$$ f_{eval-max} = \frac{1}{2\, t_{eval-min}} \tag{4.5} $$

Integrating (VIN - VREF) over time provides an analytic expression for Vdiff(t), which can be solved to find feval-max. The following derivations continue to ignore constant terms, instead normalizing the voltages to one. Equations 4.6 and 4.7 give expressions for VIN and VREF as a function of time for a low optical input (assuming VIN starts at VHI).

$$ V_{IN} = e^{-t/\tau} \tag{4.6} $$

$$ V_{REF} = x \quad (0 < x < 1) \tag{4.7} $$

Equation 4.8 expresses total differential as an integral of (VIN - VREF), ignoring constant terms. Substituting Eqs. 4.6 and 4.7 into this expression and integrating from 0 to t gives the expression for Vdiff(t) in Eq. 4.11.

$$ V_{diff}(t) = \int_0^t \left[ V_{IN}(t') - V_{REF}(t') \right] dt' \tag{4.8} $$

$$ V_{diff}(t) = \int_0^t \left( e^{-t'/\tau} - x \right) dt' \tag{4.9} $$

$$ V_{diff}(t) = \left[ -\tau e^{-t'/\tau} - x t' \right]_0^t \tag{4.10} $$

$$ V_{diff}(t) = \tau \left( 1 - e^{-t/\tau} \right) - x t \tag{4.11} $$

Notice that Eq. 4.11 expresses Vdiff in terms of x, the value of the reference voltage. Ideally, x = 0.5, corresponding to a value of (VHI + VLO)/2. However, in real life, VREF can move around due to variations, making either high or low bits evaluate more slowly. Figure 4-3 plots total differential as a function of time on high to low transitions for VREF levels ranging from 0.25 ((VHI + VLO)/4) to 0.75 (3(VHI + VLO)/4).

Figure 4-3: Total differential for several different values of VREF

Finally, given the time constant, one can set Eq. 4.11 equal to zero and find teval-min. Chapter 3 calculates a τ of about 0.16 ns (page 57). Solving Eq. 4.4 with this value (and x = 0.5) gives teval-min = 0.2546 ns, corresponding to a maximum evaluation speed of 1.964 GHz. This compares favorably with the simulated value of 2.000 GHz from Ch. 3.

Note that these calculations come out correctly without the constant scalars. Adding a constant in front of Eq.
4.11 changes its magnitude, but not the location of its roots. So, normalizing everything to one does not affect the outcome of the calculations.

As a final illustration, Fig. 4-4 plots VIN, VREF, and total differential using the value of τ calculated in Ch. 3. The term teval represents three possible times at which the circuit could evaluate. In other words, the clock goes high at teval. In the top panel, teval comes before the zero crossing, so Vdiff(teval) > 0. The circuit (incorrectly) evaluates a high bit. In the second panel, teval occurs exactly at the zero crossing of Vdiff(t), namely 0.2546 ns. In this case, no differential exists; the circuit produces an indeterminate (and unpredictable) result. In the third panel, teval occurs after the zero crossing. The circuit correctly evaluates a low bit.

The results in this section apply to both high and low input transitions. Due to the symmetric nature of the charge up and charge down waveforms, teval-min is the same in both cases, assuming x = 0.5.

[Figure 4-4 plot: VIN, VREF, and total differential versus time (ns), with teval marked in each of three panels]
Figure 4-4: Total differential at three different evaluation times: before, at, and after the zero crossing of total differential (teval-min)
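Equation 4.11 has no closed-form root, so a numerical solve makes a convenient cross-check of the numbers in this section. The following is a minimal sketch, not the procedure used in the thesis. It assumes the normalized waveforms of Eqs. 4.6 and 4.7, treats τ = 0.16 ns as exact (the text only gives "about 0.16 ns"), assumes the mirrored waveform VIN = 1 - exp(-t/τ) for a low to high transition, and takes the slower of the two transitions as the limit on evaluation speed. All function names are illustrative.

```python
import math

def v_diff_low(t, tau, x):
    """Eq. 4.11: normalized total differential for a high-to-low input."""
    return -tau * math.expm1(-t / tau) - x * t

def v_diff_high(t, tau, x):
    """Assumed mirror case: VIN = 1 - exp(-t/tau), integrated against VREF = x."""
    return (1.0 - x) * t + tau * math.expm1(-t / tau)

def zero_crossing(f, tau, x, span=50.0, iters=200):
    """Bisection for the nonzero root of f on (1e-6*tau, span*tau)."""
    lo, hi = 1e-6 * tau, span * tau
    f_lo, f_hi = f(lo, tau, x), f(hi, tau, x)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change in the search interval")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid, tau, x)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

def f_eval_max(tau, x):
    """Return (teval-min, feval-max) using Eq. 4.5 on the slower transition."""
    t_low = zero_crossing(v_diff_low, tau, x)
    t_high = zero_crossing(v_diff_high, tau, x)
    t_min = max(t_low, t_high)
    return t_min, 1.0 / (2.0 * t_min)

if __name__ == "__main__":
    for x in (0.25, 0.5, 0.75):
        t_min, f_max = f_eval_max(0.16, x)  # tau in ns, so f_max is in GHz
        print(f"x = {x:.2f}: teval-min = {t_min:.4f} ns, feval-max = {f_max:.3f} GHz")
```

With τ set to exactly 0.16 ns and x = 0.5, the zero crossing lands near 0.255 ns, slightly above the 0.2546 ns quoted in the text, presumably because the value of τ from Ch. 3 is only approximately 0.16 ns. Under the stated assumptions, moving x away from 0.5 in either direction lengthens teval-min by the same amount, since one of the two transitions always becomes the slow one.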
4.2 Uniform Variation Results

Uniform variation means the same parameter changes by the same amount everywhere in the circuit. This section first discusses how evaluation speed changes with photocurrent and photodiode capacitance. These parameters determine speed over a much larger frequency range, so they act more like design parameters than variation sources. After discussing photocurrent and capacitance, Secs. 4.2.3 to 4.2.6 focus on the four variation sources originally outlined by Sam: supply voltage, channel length, temperature, and threshold voltage [12].

4.2.1 Photodiode Capacitance

Photodiode capacitance determines the time constant on the input node. Increasing the capacitance by a factor of ten increases τ by a factor of ten, which should decrease evaluation speed by a factor of ten.

[Figure 4-5 plot: evaluation speed versus photodiode capacitance (fF)]
Figure 4-5: Evaluation speed as a function of photodiode capacitance

Figure 4-5 shows the simulated frequency versus capacitance graph for values of Cj ranging from 25 fF to 250 fF. The curve is slightly convex to the origin because of the inverse relationship between capacitance and evaluation speed. However, at lower values of Cj, parasitic capacitance in the transistors becomes significant, and the inverse relationship breaks down.

4.2.2 Photocurrent

Changes in input current only scale the magnitude of the input waveforms, not the time dependence. So although scaling down the input photocurrent degrades overall circuit performance due to smaller differentials, it should not affect evaluation speed, which only requires making a correct decision.

However, in reality photocurrent does affect evaluation speed, for two reasons.
First, the changing photocurrent causes small fluctuations in gm10, and therefore the time constant. More importantly, the simulated waveforms have a lot of switching noise on them (see Fig. 3-9, page 63). Increasing optical power enlarges the voltage swing on VIN, making noise less significant.

[Figure 4-6 plot: evaluation speed versus input and reference photocurrent (uA)]
Figure 4-6: Evaluation speed as a function of detector photocurrent

Figure 4-6 shows how evaluation speed changes with photocurrent. At high current levels, additional increases provide only a marginal advantage (by increasing gm10). At low current levels, evaluation speed starts dropping off quickly because switching noise begins to dominate.

Remember that this analysis assumes the reference circuit and data bits both receive the same amount of photocurrent. Section 4.3.4 looks at what happens when the currents are mismatched.

4.2.3 Channel Length Variation

In general, designers use the minimum value for channel length. This makes length particularly susceptible to variation because it is generally the smallest drawn dimension on the chip.

Figure 4-7 plots percent changes in evaluation speed for increases in length ranging from 0% to 40%. Unfortunately, TSMC does not provide models for lengths less than 0.18 um, restricting simulations to one-sided variation. Note that a 5% increase in length corresponds to an increase of 0.009 um in all transistors, even those with lengths greater than 0.18 um.
[Figure 4-7 plot: % change in evaluation speed versus % change in channel length (based on minimum L = 0.18 um), showing measured and fitted values]
Figure 4-7: Changes in evaluation speed as a function of channel length variation

Increasing gate length has two effects. First, the transconductance decreases, effectively making transistors slower. Second, it increases gate capacitance, which also tends to make circuits slower. For these reasons, the speed of digital circuits usually depends heavily on gate length.

Data receiver performance, on the other hand, is limited by the photodiode capacitance, not parasitics. So increasing gate length in a uniform manner causes only slight performance degradation, mostly because gm10 goes down.

Frequency and channel length exhibit an approximately linear relationship. A least squares regression produces the dashed line in Fig. 4-7, and the expression in Eq. 4.12.

\[ \Delta f_{\text{eval-max}}(\%) = -1.4670 - 0.1592\, \Delta L(\%) \tag{4.12} \]

4.2.4 Temperature

During normal operation, environmental temperature can vary over a wide range. Furthermore, depending on the kind of circuitry surrounding the receiver, local temperature can greatly exceed that introduced by the environment alone. Modern chips give off a tremendous amount of heat, and dense clusters of logic running at high speeds can exhibit hot spots in excess of 75 degrees Celsius [3].

[Figure 4-8 plot: % change in evaluation speed versus % change in temperature (from 300 K), showing measured and fitted values]
Figure 4-8: Changes in evaluation speed as a function of temperature variation

Therefore, it is extremely desirable that the circuit be resistant to temperature variation over a wide range. Unlike channel length and other sources, where a 20% or 40% variation is unlikely to ever occur, a circuit might reasonably be expected to perform over the entire range of temperatures in Fig. 4-8 (ranging from -33.15 to 86.85 degrees Celsius).

Changing temperature affects circuit performance in a variety of ways.
Most notably, transconductance depends strongly on temperature. As temperature decreases, gm10 increases, and τ decreases, boosting evaluation speed. Evaluation speed varies approximately linearly with temperature, as given by the regression in Eq. 4.13.

\[ \Delta f_{\text{eval-max}}(\%) = -1.1952 - 0.3531\, \Delta T(\%) \tag{4.13} \]

4.2.5 Threshold Voltage

For most digital circuits, increasing threshold voltage slows down the logic because it reduces the effective gate overdrive, (VGS - VT). In analog circuits, bias currents usually set the small signal parameters, making performance more or less independent of threshold voltage.

However, Fig. 4-9 shows a rather peculiar relationship between frequency and threshold voltage. Note that an "increase" in threshold voltage really refers to an increase in magnitude, since VTP is negative for PMOS devices. These measurements sweep ΔVT from -0.1 V to +0.1 V, which comes out to approximately plus or minus 20% for VT on the order of 0.5 V.

Two things stand out in Fig. 4-9. First, the relationship is positive rather than negative. Increasing threshold voltage actually increases speed. Second, the relationship is quite strong. Evaluation speed changes by nearly 30% as threshold voltage sweeps a range of 40%.

Although many factors might contribute to this strange relationship, it appears to occur mainly because of reduced swing on IN1 and IN2. For instance, if threshold voltage increases to 0.6 V, then IN1 and IN2 precharge to 0.6 V rather than 0.5 V. Likewise, when they charge up to (VDD - VTN), they only charge up to 1.2 V rather than 1.3 V (the actual values are somewhat lower due to the backgate effect).
[Figure 4-9 plot: % change in evaluation speed versus approximate % change in |VTN| and |VTP|, showing measured and fitted values]
Figure 4-9: Changes in evaluation speed as a function of threshold voltage variation (numbers refer to magnitude of VTN and VTP)

The reduced swing on IN1 and IN2 translates into less switching noise on VIN and VREF, and thus improved performance. Equation 4.14 gives a linear regression model for evaluation speed as a function of threshold voltage variation.

\[ \Delta f_{\text{eval-max}}(\%) = -2.9991 + 0.6995\, \Delta |V_{TN,P}|(\%) \tag{4.14} \]

4.2.6 Supply Voltage

Supply voltage varies for any number of reasons. For example, external power supplies might not provide exactly 1.8 V, or the value might vary with temperature. Furthermore, IR drops in the package and interconnect can lead to differences between the VDD the circuit observes and that measured on the outside.

In most digital circuits, increasing supply voltage increases speed. A high logical bit takes on the value of VDD, which acts as the VGS of the next logic gate. Higher VDD therefore translates into more gate overdrive and more speed. However, once again the receiver circuit displays a counterintuitive trend. In Fig. 4-10, evaluation speed exhibits a strong negative dependence on supply voltage.

[Figure 4-10 plot: % change in evaluation speed versus % change in VDD (from 1.8 V), showing measured and fitted values]
Figure 4-10: Changes in evaluation speed as a function of supply voltage variation

The reasoning here closely parallels that described on page 84 for threshold voltage, with one slight twist. Changes in supply voltage affect not only IN1 and IN2, but the clock and reset signals as well. An increase in supply voltage increases the voltage swing on IN1, IN2, CLK, and RST, all of which contribute additional switching noise to VIN. This accounts for the larger slope in Fig. 4-10 than in Fig. 4-9. Equation 4.15 gives a linear regression model for supply voltage variation.
\[ \Delta f_{\text{eval-max}}(\%) = -1.4056 - 1.1583\, \Delta V_{DD}(\%) \tag{4.15} \]

4.2.7 Summary

Table 4.1 summarizes the linear regression models fitted to the variation parameters in the four previous sections. The key term is the linear term, which indicates by what percent evaluation speed changes for a one percent change in the variation parameter.

Obviously, supply and threshold voltage variations affect performance the most (largest linear terms). These factors directly affect the amount of switching noise on VIN and VREF, which can disrupt circuit evaluation.

Channel length and temperature, on the other hand, do not seem to affect circuit performance much at all. The circuit's ability to function depends on its ability to compare VIN and VREF, upon which channel length and temperature have little effect.

Variation Source      Linear Term    Constant Term
Channel Length        -0.1592        -1.4670
Temperature           -0.3531        -1.1952
Threshold Voltage     +0.6995        -2.9991
Supply Voltage        -1.1583        -1.4056
Table 4.1: Comparison of regression models for different variation sources

Note that Figs. 4-7 to 4-10 all maintain a constant aspect ratio, where the y-axis always has twice the range of the x-axis. This allows one to visually compare the slope of each variation effect across graphs.

[Figure 4-11 plot: absolute % change in evaluation speed versus % change in variation source, with fitted curves for VDD, threshold, temperature, and length variation]
Figure 4-11: Absolute value of changes in evaluation speed relative to source variation percentage, as given by regression models in Table 4.1

As an additional aid, Fig. 4-11 plots all four models from Table 4.1 on the same graph.
The variation source on the x-axis ranges from -40% to +40%, with changes in evaluation speed on the y-axis. Clearly, supply voltage variations impact circuit performance the most, followed by threshold voltage, temperature, and channel length.

4.3 Differential Variation Results

Differential variation means that components on the input and reference side of the circuit change in different ways. These variations have a much more pronounced effect on performance of the optical data receiver than uniform variations. In Sec. 4.2, a 20% variation in supply voltage results in a 23% change in evaluation speed, whereas channel length only produces a 3% change for the same amount of source variation. In comparison, a differential variation of 20% in channel length can cause up to a 50% decrease in evaluation speed.

In this section, channel length serves as a tool to explore how differential variation affects circuit performance. Channel length makes a good barometer of differential variation effects for four reasons. First, the simulations are easy. Second, channel length affects circuit performance in an intuitive way. Third, it allows simulation over variation ranges similar to those used in Sec. 4.2, whereas the circuit simply fails for a 20% differential variation in supply voltage.

Finally, some parameters, such as temperature and supply voltage, only change gradually. Temperature simply cannot change by 50 degrees from one side of the receiver to the other (a distance of 13.6 um). In these cases, uniform variation is more relevant.

Chapter 2 describes naming conventions for the different parts of the receiver circuit (page 33). Corresponding to those conventions, the next three sections present simulation results for differential variation between transistors in the latch, between the input transistors, and between the input stage and reference circuit. In addition, the fourth section looks at what happens when the input and reference photocurrent do not match.
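The uniform-variation percentages quoted above follow from the linear terms in Table 4.1. The sketch below is illustrative only: the dictionary and function names are not from the thesis, and it assumes that the quoted "change" is the linear term multiplied by the source variation, an interpretation that reproduces the 23% and 3% figures. The full fitted model, including the constant term, is shown separately.

```python
# (linear term, constant term) from Table 4.1, both in percent.
MODELS = {
    "channel length": (-0.1592, -1.4670),
    "temperature": (-0.3531, -1.1952),
    "threshold voltage": (+0.6995, -2.9991),
    "supply voltage": (-1.1583, -1.4056),
}

def linear_change(source, delta_pct):
    """Change in evaluation speed (%) from the linear term alone (assumed reading)."""
    slope, _ = MODELS[source]
    return slope * delta_pct

def fitted_change(source, delta_pct):
    """Full regression model: constant term plus linear term."""
    slope, const = MODELS[source]
    return const + slope * delta_pct

def rank_by_sensitivity():
    """Variation sources ordered from largest to smallest |linear term|."""
    return sorted(MODELS, key=lambda s: abs(MODELS[s][0]), reverse=True)

if __name__ == "__main__":
    for source in rank_by_sensitivity():
        print(f"{source:18s} linear: {linear_change(source, 20):+7.2f}%  "
              f"fitted: {fitted_change(source, 20):+7.2f}%")
```

The ranking returned by this sketch matches the ordering stated in the text: supply voltage, threshold voltage, temperature, then channel length. Note that the constant terms are regression intercepts, so the fitted models do not pass exactly through zero change at zero variation.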
During the following discussions, keep in mind the 3% decrease in evaluation speed due to uniform channel length variation (of 20%). This number serves as a useful benchmark for comparing uniform and differential variation. In almost all cases, differential variation effects far exceed the minor degradation caused by uniform variation.

4.3.1 The Latch

The discussion on sizing in Sec. 3.2 claims that differential variation inside the latch has little effect on circuit performance. This remains true compared to differential variation in other parts of the circuit, but evaluation speed can still decrease by 15% or 16% for a 20% asymmetrical variation, compared to only 3% for uniform variation.

[Figure 4-12 plot: surface of % change in evaluation speed versus % change in L on the input side of the latch and % change in L on the reference side of the latch]
Figure 4-12: Changes in evaluation speed as a function of differential variation between input and reference side of latch

Figure 4-12 plots changes in evaluation speed on the z-axis, with the two variation sources on the x-axis and y-axis. The "input" side of the latch consists of all the transistors on the input side (M1, M3, M5, and M7), and the "reference" side of the latch consists of all the transistors on the reference side (M2, M4, M6, and M8). The reset transistor, M9, does not change.

Uniform variation occurs along the diagonal from (0%, 0%) to (+20%, +20%), and has little effect on evaluation speed. Moving away from this line increases differential variation, as lengths change in different proportions. Maximum differential variation occurs at the corners ((0%, +20%) and (+20%, 0%)), where the circuit exhibits decreases in speed due to mismatch.

Although the magnitude of the differential across IN1 and IN2 remains the most important ingredient for correct evaluation, mismatch in the latch does create a propensity for evaluating one way or the other.
Overcoming this propensity requires a larger differential, which takes more time and slows down the circuit.

4.3.2 Input Transistors

As in Ch. 2, the "input transistors" refer to the four transistors connected directly to the inputs of the sense amplifier. Namely, the "left" input transistors, M11 and M13, connect to IN1, and the "right" input transistors, M14 and M15, connect to IN2.

[Figure 4-13 plot: surface of % change in evaluation speed versus % change in L (M11 and M13) and % change in L (M14 and M15)]
Figure 4-13: Changes in evaluation speed as a function of differential variation between left and right input transistors

In Fig. 4-13, evaluation speed decreases roughly 12% along the diagonal as the lengths vary uniformly from 0% to 20%, but drops by nearly 44% at the corners. This happens because differential variation changes the transconductance of the input transistors. As shown in Eqs. 4.16 and 4.17, this changes the scalar terms relating the input currents to VIN and VREF. This has a similar effect to simply scaling VIN and VREF themselves.

\[ I_{IN1} \propto g_{m11} V_{IN} \tag{4.16} \]

\[ I_{IN2} \propto g_{m14} V_{REF} \tag{4.17} \]

Increases in length on the reference side make M14 weaker, diluting the effectiveness of VREF. Since VREF is responsible for evaluating low bits, this causes errors on low transitions of the input. Likewise, increases in length on the input side make the input signal weaker, and the circuit fails on high bits. Along the diagonal, the circuit fails on low and high bits at the same speed, indicating that VREF has been appropriately chosen to maximize evaluation speed for both high and low bits.

4.3.3 Input Stage and Reference Circuit

The left axis in Fig. 4-14 designates variation in the input stage, namely M10 and M12. Recall that gm10 determines the time constant on VIN, and M12 supplies bias current. The reference circuit refers to transistors M16 to M19.

Uniform length variations along the diagonal of Fig.
4-14 decrease evaluation speed by less than 8%, but differential variation at the corners drops evaluation speed by nearly 50%, or rather a factor of two. The circuit goes from 2.00 GHz with no variation to 1.02 GHz with 20% variation in the input stage relative to the reference circuit.

Differential variation between the input stage and reference circuit hurts for two reasons. First, the variation effectively shifts VIN and VREF relative to one another. Second, the increase in length of M10 changes gm10. This increases the time constant, which also slows down the circuit.

[Figure 4-14 plot: surface of % change in evaluation speed versus % change in L (M10 and M12) and % change in L in the reference circuit]
Figure 4-14: Changes in evaluation speed as a function of differential variation between input stage and reference circuit

4.3.4 Input and Reference Photocurrent

Depending on the location of optical transmitters and receivers, and the nature of the optical signal paths in between, the optical power delivered to each photodiode can vary. The reference circuit operates under the assumption that all bits within the same bus follow similar optical paths and deliver the same amount of power. This section looks at what happens when variations disturb the match.

The term "input photocurrent" refers to the maximum steady state current produced by the photodiodes for a high optical signal. In the reference circuit, the diode always produces this maximum value. The input photodiode, on the other hand, swings back and forth between zero and the maximum value, depending on the input.

Simulations in Ch. 3 assume a photocurrent of 10 uA. Figure 4-15 shows how evaluation speed changes for small perturbations around this point. Recall that the reference circuit essentially averages the maximum and minimum current values. Thus, changing the reference photocurrent causes significant movement in VREF.
[Figure 4-15 plot: surface of % change in evaluation speed versus maximum input photocurrent (uA) and reference photocurrent (uA)]
Figure 4-15: Changes in evaluation speed as a function of input and reference photocurrent

On the other hand, changing the input photocurrent only affects the voltage swing between VLO and VHI. This makes changes in VREF more or less significant compared to the size of the input waveform. For example, in Fig. 4-15, changes in reference photocurrent are more significant with a small input current, because of the smaller voltage swing on VIN. This accounts for the much lower dip on the right-hand side of the diagram.

4.4 Conclusions

Of the uniform variations, temperature presents possibly the greatest danger. Specifications might require the chip to perform over huge temperature ranges, whereas 20% variations in other parameters are somewhat unlikely to occur.

After temperature, any uniform variation that introduces switching noise on the critical nodes can cause problems. Supply and threshold voltage variation introduce this kind of noise by enlarging the "digital" waveforms in the circuit. Reduced swing on the clock and reset signals could potentially alleviate some of this problem, but at the cost of speed or possibly power. Of course, increasing photocurrent or decreasing diode capacitance can always offset the effects of uniform variation.

In terms of differential variation, the input stage and reference circuit present the biggest problems, along with mismatches in photocurrent between the input and reference photodiodes. Variation between the input transistors also causes a fairly serious degradation in speed, while variation in the latch has little effect.

Obviously, differential variation presents the greatest challenge in the design of an optical data receiver circuit. However, this is not actually new information. Designers have always been aware of the need to accurately match components in sense amplifiers.
Good layout can go a long way to improve transistor matching. Beyond that, scaling up transistor dimensions makes them more resistant to geometric variations.

If the distance between the receiver and reference circuits becomes a problem, then introducing multiple reference bits might help. For instance, using a reference bit for every bank of 32 data bits ensures that no receiver is farther than 16 bits away from a reference circuit.

Alternatively, if current matching becomes an issue, a designer could install multiple reference circuits in parallel. Averaging several optical reference signals, instead of just one, reduces deviation from the true mean.

In the end, a designer must determine an acceptable set of performance specifications, and make the sacrifices necessary to achieve them. Luckily, sense amplifiers have been around for a long time, so a wealth of information on dealing with them already exists.

Chapter 5

Test Chip

In order for the data receiver circuit to function properly, incoming optical data must be synchronized with the receiver clock. In a completely integrated solution, the same clock drives optical transmitters and receivers, automatically enforcing this synchronicity requirement. However, the test chip design uses free space illumination from an external laser to test functionality of the receiver alone. This presents the challenge of synchronizing output from an external laser with the internal test chip clock.

Figure 5-1 shows a testing strategy to overcome this constraint. An on-chip "data generator" stores test data to drive the laser. Using the two input signals, Program and Data In, one can manually program a test pattern into the generator. Driving both the receiver and the generator with the same clock circuitry ensures synchronization between the two.

This chapter discusses operation of the components in Fig. 5-1 one by one, starting with general digital building blocks, the data generator, and the receiver circuitry.
Section 5.4 discusses clock distribution hardware, which includes a phase-locked loop (PLL). Stabilizing this loop can prove difficult, so Sec. 5.5 outlines PLL design in more detail. The last section summarizes the testing strategy, and discusses a test chip submitted for fabrication in December, 2001.

[Figure 5-1 diagram: on-chip data generator, clock circuitry, and receiver circuitry, with an off-chip laser]
Figure 5-1: General testing strategy for data receiver circuit

5.1 Building Blocks

This section describes several key digital components. First, Fig. 5-2 shows the schematic of a "D" flip-flop (DFF). This common implementation for a DFF dissipates little power, allows for simple layout, and runs comfortably at speeds in the gigahertz range. Note that Q resolves when CLK goes high, making this a rising edge flip-flop.

[Figure 5-2 schematic: DFF with input D, output Q, and clock inputs CLK and /CLK]
Figure 5-2: Schematic of DFF used in test chip

Next, look at the two-input multiplexor (MUX) in Fig. 5-3. A high input on SEL turns on the top transmission gate, and a low input turns on the bottom one. The use of both an NFET and a PFET in the transmission gate ensures rail-to-rail swing on the output.

[Figure 5-3 schematic: inputs IN1 and IN0, select SEL, output OUT]
Figure 5-3: Schematic of two-input MUX using transmission gates

Finally, Fig. 5-4 shows the schematic of a first-in, first-out (FIFO) buffer consisting of 63 "D" flip-flops. The data generator and receiver circuit both use this buffer to store sequential data. Section 5.4 explains the choice of 63 bits during the discussion of clock distribution.

[Figure 5-4 schematic: chain of 63 DFFs, numbered 0 to 62, sharing CLK]
Figure 5-4: Schematic of FIFO buffer

5.2 Data Generator

The data generator stores arbitrary bit patterns programmed by the user, and generates a synchronized signal to drive the laser. As shown in Fig. 5-5, building such a structure requires only a MUX and a FIFO buffer.
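A behavioral model helps show why a MUX and a shift register are sufficient. The sketch below is an illustration, not the test chip netlist. It assumes that a high PRG selects Data In and a low PRG selects feedback from the last flip-flop, and that Data Out is taken from the last stage. The class and method names are hypothetical.

```python
class DataGenerator:
    """Behavioral model: 63-stage shift register with a 2:1 MUX at its input."""

    def __init__(self, length=63):
        self.bits = [0] * length  # bits[0] is the first flip-flop

    def clock(self, prg, data_in=0):
        """Apply one rising clock edge; return Data Out as seen before the edge."""
        data_out = self.bits[-1]
        # MUX: external data while programming, otherwise recirculate the output.
        new_bit = data_in if prg else data_out
        # Every flip-flop takes the value of its left neighbor.
        self.bits = [new_bit] + self.bits[:-1]
        return data_out

    def program(self, pattern):
        """Shift in a full pattern with PRG held high."""
        if len(pattern) != len(self.bits):
            raise ValueError("pattern length must match the buffer length")
        for bit in pattern:
            self.clock(prg=1, data_in=bit)

if __name__ == "__main__":
    gen = DataGenerator()
    pattern = [1 if i % 5 in (0, 1) else 0 for i in range(63)]
    gen.program(pattern)
    first_pass = [gen.clock(prg=0) for _ in range(63)]
    print(first_pass == pattern)
```

In this model the pattern comes back out in the order it was shifted in and then repeats every 63 clocks, which is the periodic behavior the data generator relies on.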
[Figure 5-5 schematic: MUX selected by PRG (Program) feeding a 63-bit FIFO buffer, with Data In, Data Out, and CLK]
Figure 5-5: Data generator for driving off-chip laser source

Setting PRG high breaks the loop, and the buffer begins accepting external data. This constitutes the programming phase. Each time CLK goes high, the first flip-flop stores the value on Data In, and all the flip-flops shift their values to the right by one. Doing this 63 times programs the entire array.

A low value on PRG closes the loop, causing the bit pattern to continually cycle. Thus, the data generator drives the off-chip laser with a periodic sequence of 63 pre-programmed data bits. This synchronizes optical input with the on-chip clock because the same clock signal drives both the data generator and the receiver circuit.

5.3 Receiver Test Circuitry

The receiver test circuitry captures and stores receiver output for later use. In other words, a tester can program an arbitrary bit pattern into the data generator at low speed, run the chip at high speed, and then read the results later at low speed.

This storage mechanism also allows a person to take multiple output samples and calculate bit error rates. For example, consider a bit error that occurs 10% of the time. To the naked eye, the waveforms appear correct because 90% of the transitions look right, but one out of ten individual samples should exhibit the bit error. By storing many different output samples for the same input, one can speculate on how often these errors occur.

[Figure 5-6 schematic: optical input diode, data receiver, and reference circuit (VREF, IPHOTOREF, IBIAS), with a RUN/HOLD MUX feeding a 63-bit FIFO buffer and Data Out]
Figure 5-6: Receiver test circuitry using FIFO buffer

As shown in Fig. 5-6, this data storage mechanism closely resembles the data generator from Sec. 5.2. To understand this diagram, first look at the receiver and reference circuit blocks.
An off-chip current source, IBIAS, sinks current from M1, which biases Vp. A second off-chip current source, IPHOTOREF, provides reference photocurrent in lieu of a second photodiode.

A current input replaces the reference photodiode for two reasons. First, one cannot expect the same kind of match between two discrete, off-chip laser sources as from two laser diodes built next to each other on the same chip. A reference optical path works because it mirrors a data bit exactly. However, when the lasers exist in completely different packages, this mirroring breaks down, and the reference optical path no longer makes sense.

Second, an off-chip photocurrent reference allows greater versatility in testing. Subtle adjustments in the reference current can tweak circuit performance and provide insight into circuit operation inside the chip. Also, by finding the reference current level that maximizes evaluation speed for high and low transitions, one can estimate how much current the input photodiode produces (the optimum reference current should be exactly half of the maximum input photocurrent). In addition, a tester can conduct primitive variation analysis by varying the reference current up and down by small amounts while measuring evaluation speed.

Next, look at the outputs of the data receiver. Only one of the outputs is needed to verify functionality, but both must see the exact same load. Once again, any mismatch between the sides of the receiver can cause bit errors, so if Q drives a flip-flop, then /Q must drive a flip-flop as well. One of them hangs unconnected, and the other drives a buffer similar to the data generator.

A high select signal on the MUX causes the circuit to "run," and data streams from the receiver into the FIFO buffer. When the select signal goes low, the MUX switches to feedback and no new data flows into the buffer. Instead, it cycles repeatedly through 63 previously stored bits, effectively "holding" the data.
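One way to turn stored samples into an error-rate estimate is sketched below. This is a hypothetical post-processing script, not part of the test chip. It assumes that each capture is the full 63-bit buffer contents read out at low speed, that the programmed pattern is known, and that the cyclic alignment between the two is unknown, so the best-matching rotation is used. The function names are illustrative.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def best_alignment_errors(expected, capture):
    """Fewest bit errors over all cyclic rotations of the expected pattern."""
    n = len(expected)
    return min(hamming(expected[k:] + expected[:k], capture) for k in range(n))

def estimate_ber(expected, captures):
    """Estimated bit error rate over several stored captures of the same input."""
    total_bits = len(expected) * len(captures)
    total_errors = sum(best_alignment_errors(expected, c) for c in captures)
    return total_errors / total_bits

if __name__ == "__main__":
    pattern = [1 if (i * i) % 7 < 3 else 0 for i in range(63)]
    clean = pattern[5:] + pattern[:5]          # rotated copy, no errors
    noisy = pattern[10:] + pattern[:10]
    noisy[0] ^= 1                              # one injected bit error
    print(estimate_ber(pattern, [clean, noisy]))
```

Choosing the best rotation can undercount errors when the programmed pattern is highly repetitive, so a pattern with little self-similarity is a better choice for this kind of estimate.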
5.4 Clock Distribution

Figure 5-7 illustrates a means for supplying both low and high frequency clock signals to the test chip. Setting CLK Select low allows high frequency clock multiplication using a phase-locked loop (PLL), while setting it high passes the external clock input directly to the on-chip components.

[Figure 5-7 diagram: MUX controlled by CLK Select choosing between the external CLK and the PLL output to produce the on-chip clock]
Figure 5-7: Multiplexing external and internal clock signals

Normally, one uses the low frequency clock for programming the data generator and reading information from the receiver buffer at low speeds. However, some off-chip clock sources might be able to drive the internal chip circuitry directly. In other words, while the PLL can only provide a high frequency clock, the external input can potentially provide both. The rest of this section discusses issues related to clock distribution, while Sec. 5.5 describes the design of the PLL.

Figure 5-8 illustrates the clock distribution scheme. The output of the multiplexor from Fig. 5-7 drives the input of a three layer tree. Each level of the tree branches four times, for a total of 4^3, or 64, leaves.

[Figure 5-8 diagram: clock tree from /CLK (Input) with 63 nodes driving 63 flip-flops (CLK (Flip-flop)) and a 64th node, via node "A", driving the receiver (/CLK (Receiver), RST)]
Figure 5-8: Clock distribution

Sixty-three leaves of the tree drive flip-flops, while the last node drives the receiver circuit itself. As a result, the flip-flops latch at the same time or right before the receiver circuit resets. This ensures that the flip-flops latch the evaluated signal from the previous cycle, not the reset value of the receiver. Figure 5-9 illustrates these timing constraints more clearly.

[Figure 5-9 timing diagram: waveforms for /CLK (Input), CLK (Flip-flop), node "A", /CLK (Receiver), and RST, with the inverter delay and NOR gate delay (2x inverter delay) marked]
Figure 5-9: Clock distribution waveforms (corresponding to Fig. 5-8)

Each unit of time in Fig. 5-9 represents one inverter delay, while the NOR gate takes approximately two inverter delays.
Three delays after the input goes low, the flip-flop clock goes high, followed by node "A" going low. Two time periods after that, the receiver clock and reset signals switch at about the same time. This gives the flip-flops approximately three inverter delays to latch a signal before the receiver begins resetting. In simulation, this provides sufficient time.

Notice that an inverted clock signal (/CLK) drives the receiver circuit. The receiver resets and begins a new evaluation cycle on the falling edge of its input clock, essentially making it a "falling edge" element. However, the rest of the test circuitry operates on the rising edge of the clock. To reconcile this, the clock tree drives the receiver with an inverted clock signal.

5.5 PLL Design

A phase-locked loop provides an on-chip means of multiplying the input clock frequency. In particular, the test chip uses the PLL architecture in Fig. 5-10, which multiplies the input clock signal by 64. Essentially, the loop tries to lock the phases of fIN and fFB, which requires locking their frequencies as well. This forces fOUT to run 64 times faster than fFB and the input, effectively multiplying the frequency.

The loop operates as follows. A phase-frequency detector (PFD) senses the phase difference between the input and feedback signals, and encodes the result in a pair of pulse trains on UP and /DN. These signals drive a charge pump, which sinks or sources current into the loop filter, Z(s). The output voltage of this filter, which represents the relative phase difference between the input and feedback signals, then drives a voltage-controlled oscillator (VCO).

For example, consider a phase increase. The PFD detects the change in phase and increases the density of the "up" train relative to the "down" train. The charge pump turns this into a current, and the loop filter takes the average value.
Due to the increased density of "up" pulses, the average value increases, and the VCO increases frequency slightly, allowing the feedback signal to catch up to the input in phase.

Figure 5-10: Phase-locked loop block diagram

The test chip implementation uses an off-chip loop filter, along with external pins to control charge pump gain and VCO offset. This allows a tremendous amount of versatility, but also introduces a lot of variables. The following four sections describe the implementation of each element in the loop and introduce their transfer characteristics. Section 5.5.5 discusses stability concerns, and gives an example of how to stabilize the PLL for operation at a gigahertz.

5.5.1 Phase-Frequency Detector

A phase-frequency detector (PFD) differs from a simple phase detector in that it can track frequencies over a wide range as well as measure phase difference for signals of similar frequencies. For example, an XOR gate can detect phase for two signals at the same frequency, but a great number of different frequency combinations produce the same output patterns. Thus an XOR gate is a phase detector, but not a phase-frequency detector. On the other hand, consider the PFD in Fig. 5-11. When fFB < fIN, only the "up" signal turns on, and when fFB > fIN, only the "down" signal turns on. As a result, the loop always gravitates toward the center frequency, where the PFD acts predominantly as a phase detector [11].

Figure 5-11: Phase-frequency detector using DFF's with reset capability

Figure 5-11 does not show inputs for the "D" flip-flops because they are always connected to VDD. In other words, the flip-flops have only two states. When they reset, Q goes to zero, and when the clock goes high, Q goes to one and stays high until reset. Figure 5-12 shows a circuit that implements this behavior [11].
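The frequency-steering property of the PFD can be illustrated with an idealized event-driven model. The sketch below is not the test chip circuit: it assumes ideal rising edges, zero gate delay, and the common arrangement in which both flip-flops reset as soon as both outputs are high (the reset logic is an assumption here). The function name `pfd_average` is hypothetical.

```python
def pfd_average(f_in, f_fb, t_stop):
    """Return the time-average of (UP - DN) for ideal rising-edge inputs."""
    # Rising-edge times of each input; the first edge arrives after one period.
    edges = [(k / f_in, "in") for k in range(1, int(t_stop * f_in) + 1)]
    edges += [(k / f_fb, "fb") for k in range(1, int(t_stop * f_fb) + 1)]
    edges.sort()

    up = dn = 0          # flip-flop outputs; D inputs are tied high
    t_prev = 0.0
    area = 0.0           # running integral of (UP - DN)
    for t, source in edges:
        area += (up - dn) * (t - t_prev)
        t_prev = t
        if source == "in":
            up = 1       # rising edge on fIN sets the "up" flip-flop
        else:
            dn = 1       # rising edge on fFB sets the "down" flip-flop
        if up and dn:    # assumed reset: both flip-flops clear together
            up = dn = 0
    area += (up - dn) * (t_stop - t_prev)
    return area / t_stop


slow_feedback = pfd_average(1.0, 0.9, 1000.0)   # fFB < fIN
fast_feedback = pfd_average(1.0, 1.1, 1000.0)   # fFB > fIN
```

Under these assumptions the average is positive when the feedback runs slow and negative when it runs fast, so the sign of the PFD output always points toward lock.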
Figure 5-12: Flip-flop with reset signal for use in PFD

Many sources characterize a PLL in terms of phase input and phase output, in which case the phase detector merely normalizes the phase difference to one. Since phase ranges from zero to $2\pi$, this means dividing by $2\pi$ [11] [15]. However, the loop in Fig. 5-10 characterizes the PLL in terms of frequency. In this case, the PFD not only normalizes the phase, but also converts from frequency to phase, equivalent to an integral in the time domain and a pole at zero in the frequency domain. Equation 5.1 first multiplies by $2\pi$ to convert from hertz to radians per second, and then divides by s to integrate frequency into phase. However, the PFD normalizes phase to one, so dividing Eq. 5.1 by $2\pi$ gives the transfer function for the PFD, as shown in Eq. 5.2.

$$\phi\,(\text{radians}) = \frac{2\pi}{s}\, f\,(\text{Hz}) \tag{5.1}$$

$$G_{PFD}(s) = \frac{1}{s} \tag{5.2}$$

5.5.2 Charge Pump

Every time the "up" signal from the PFD turns on (whenever /UP goes low), the charge pump sources current, and every time the "down" signal turns on, the charge pump sinks current, such that the net current out of the charge pump is proportional to the difference between the two pulse trains. The average value of this output current represents the phase difference between fIN and fFB.

Figure 5-13 shows the charge pump schematic. In this diagram, inverters I1 and I2 act as power supplies for two current sources, M7 and M8. A low input on /UP "turns on" M7 by connecting it to the positive rail, and a high input on DN "turns on" M8 by connecting it to ground. Transistors M3 to M5 bias the gates of M7 and M8 so they provide an amount of current equal to ICP. However, due to a finite voltage drop in I1, the source of M7 never goes all the way to VDD. Adding transistors M1 and M2 compensates for this non-ideality. By mimicking the pull-up PFET inside I1, they ensure that all three PFET's (M3, M4, and M7) have exactly the same gate to source voltage.
Transistor M6 provides a similar service for the NFET current mirror. These extra transistors ensure extremely close matching between ICP and the output current.

Figure 5-13: Schematic diagram of charge pump

Equation 5.3 expresses the output current, IOUT, in terms of UP and DN, the output signals of the PFD. Mathematically, these signals take on a value of either zero or one (where /UP takes on the opposite), consistent with the claim that the PFD normalizes phase difference to one. Therefore, as shown in Eq. 5.4, the charge pump merely converts the output of the PFD into a current, making the gain ICP.

$$I_{OUT} = I_{CP}\,(UP - DN) \tag{5.3}$$

$$G_{CP}(s) = I_{CP} \tag{5.4}$$

5.5.3 Voltage-Controlled Oscillator

Voltage-controlled oscillators tend to introduce non-linearity and high gain into the PLL, making loop stabilization difficult. These constraints have motivated some advanced PLL architectures [11]. Luckily, the test chip can get by with a fairly low performance, ring-oscillator architecture. Varying the propagation delay of individual inverters in the ring changes the frequency of oscillation. For example, consider the current-starved inverter shown in Fig. 5-14.

Figure 5-14: Current-starved inverter with variable propagation delay

This "inverter" cannot source or sink more current than M1 and M2 provide. Changing the voltages VN and Vp changes the current, and therefore the propagation delay through the inverter. Hooking an odd number of these stages together forms a ring oscillator, as shown in Fig. 5-15.

Figure 5-15: VCO consisting of five current-starved inverters

All inverters in the ring share the same control voltages, VN and Vp. The circuitry shown in Fig.
5-16 generates VN and Vp based on an input voltage, VIN, and an offset current, IIN.

Figure 5-16: Control circuitry for VCO: IIN sets offset frequency, and VIN controls output frequency

In Fig. 5-16, M1 provides between zero and 50 uA of current as VIN varies between 0.6 V and 1.8 V. For input voltages below this range, M1 goes into the subthreshold region and behaves in a highly non-linear fashion. In comparison, IIN sinks between 200 uA and 300 uA under normal operation. In other words, IIN sinks most of the current, while VIN causes only small changes. This keeps the gain from VIN to fOUT low, while at the same time allowing a large frequency range of operation by changing the offset, IIN. External tuning also ensures that the VCO can be tweaked to operate at the desired frequency despite process variations (which affect this VCO architecture quite a bit).

Figure 5-17 plots VCO frequency as a function of VIN for several different values of the offset, IIN. The dashed lines in Fig. 5-17 represent fitted linear regressions using a least squares method. Equation 5.5 gives the equations for these lines.

$$f_{OUT}\,(\text{GHz}) = \begin{cases} 0.7565 + 0.1032\,V_{IN}, & I_{IN} = 200\ \text{uA} \\ 0.9010 + 0.0802\,V_{IN}, & I_{IN} = 250\ \text{uA} \\ 1.0166 + 0.0590\,V_{IN}, & I_{IN} = 300\ \text{uA} \end{cases} \tag{5.5}$$

Figure 5-17: VCO frequency as a function of control voltage for different offset levels

Once again, many sources model the PLL in terms of phase, in which case the VCO acts as an integrator [11] [15]. However, with a frequency output, like the block diagram in Fig. 5-10, the VCO acts as a simple gain stage, and the integrator belongs in the PFD. Equation 5.6 expresses the transfer function of the VCO as a linear gain from VIN to fOUT. Varying IIN changes the gain slightly, but it remains on the order of $10^8$ Hz/V.
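The linear fits of Eq. 5.5 are easy to evaluate numerically. The following sketch simply tabulates the three fitted lines and inverts one of them; the function names `vco_frequency_hz` and `control_voltage_for` are hypothetical helpers, and the model is only meaningful inside the fitted range of VIN (0.6 V to 1.8 V).

```python
# Fitted lines from Eq. 5.5: offset current (uA) -> (intercept GHz, slope GHz/V).
VCO_FITS = {
    200: (0.7565, 0.1032),
    250: (0.9010, 0.0802),
    300: (1.0166, 0.0590),
}


def vco_frequency_hz(v_in, i_in_ua=250):
    """Output frequency in hertz predicted by the linear fit."""
    intercept_ghz, slope_ghz_per_v = VCO_FITS[i_in_ua]
    return (intercept_ghz + slope_ghz_per_v * v_in) * 1e9


def control_voltage_for(f_out_hz, i_in_ua=250):
    """Control voltage (V) at which the fitted line reaches f_out_hz."""
    intercept_ghz, slope_ghz_per_v = VCO_FITS[i_in_ua]
    return (f_out_hz / 1e9 - intercept_ghz) / slope_ghz_per_v


f_mid = vco_frequency_hz(1.2, 250)          # middle of the 0.6 V to 1.8 V range
v_for_1ghz = control_voltage_for(1.0e9, 250)
gain_hz_per_v = VCO_FITS[250][1] * 1e9      # slope of the fit in Hz/V
```

With the 250 uA offset, the fit puts roughly 1 GHz near the middle of the control range, and the slope works out to about $8 \times 10^7$ Hz/V.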
$$G_{VCO}(s) = \begin{cases} 0.1032\ \text{GHz/V}, & I_{IN} = 200\ \text{uA} \\ 0.0802\ \text{GHz/V}, & I_{IN} = 250\ \text{uA} \\ 0.0590\ \text{GHz/V}, & I_{IN} = 300\ \text{uA} \end{cases} \tag{5.6}$$

5.5.4 Frequency Divider

The frequency divider in the feedback path takes fOUT from the VCO as an input, and produces fFB as an output. Figure 5-18 shows how to construct a frequency divider using six "toggle" flip-flops. A toggle flip-flop consists of a DFF with inverter feedback, so that the output inverts each time the clock goes high.

Figure 5-18: Frequency divider using toggle flip-flops

It takes two clock periods for the output of a toggle flip-flop to cycle from one to zero and back to one again. Thus, each toggle flip-flop divides the frequency by a factor of two, and six stages divide the input frequency by $2^6$, or 64. Equation 5.8 expresses the transfer function of the frequency divider.

$$f_{FB} = \frac{f_{OUT}}{64} \tag{5.7}$$

$$G_{DIV64}(s) = \frac{1}{64} \tag{5.8}$$

5.5.5 Stabilizing the Loop

The previous sections summarized all of the loop components except the loop filter. The output current from the charge pump, IOUT, flows into the loop filter, creating a voltage to drive the VCO. Thus, the impedance of the loop filter, Z(s), represents the "gain" between IOUT of the charge pump and VIN of the VCO (Eq. 5.10).

$$V_{IN}(\text{VCO}) = I_{OUT}(\text{CP})\,Z(s) \tag{5.9}$$

$$G_{LF}(s) = Z(s) \tag{5.10}$$

Combining the gain of the loop filter with the gains of the other four stages gives the loop transmission, as shown in Eq. 5.11. To remain stable, the PLL loop transmission must have sufficient phase margin at crossover. The following discussion shows how to stabilize the loop for operation at a gigahertz with approximately 60 degrees of phase margin.

$$L(s) = G_{PFD}(s)\,G_{CP}(s)\,Z(s)\,G_{VCO}(s)\,G_{DIV64}(s) \tag{5.11}$$

$$L(s) = \frac{0.0802\, I_{CP}}{64\,s}\, Z(s) \tag{5.12}$$

Stabilizing the loop consists of five steps. First, choose the VCO offset current. The frequency plots in Fig. 5-17 show that an offset current of 250 uA puts the center frequency right in the middle of the useful input voltage range. From Eq.
5.6, this gives a VCO transfer function of 0.0802 GHz/V. Equation 5.12 combines this value with the transfer functions of the other four elements.

Second, find a topology for the loop filter. For example, Fig. 5-19 shows a lead-lag filter commonly used in phase-locked loops. This circuit has a pole at the origin, so L(s) starts with -180 degrees of phase at DC. However, a zero "leads" in before crossover to boost the phase, which goes back down after the "lag" pole kicks in.

Figure 5-19: Lead-lag loop filter for PLL

Equations 5.13 and 5.14 give expressions for the frequency of the lead zero and the lag pole in radians per second for the circuit in Fig. 5-19.

$$\omega_{lead\text{-}zero} = \frac{1}{R\,(C_1 + C_2)} \tag{5.13}$$

$$\omega_{lag\text{-}pole} = \frac{1}{R\,C_2} \tag{5.14}$$

Third, decide on a crossover point for the loop. For fOUT running at a gigahertz, the loop operates 64 times slower (the speed of fFB and fIN), or in other words floop ≈ 15.6 MHz. In order to minimize jitter, the crossover of the loop should be at least 100 times slower than floop. Plotting the loop transmission from Eq. 5.12 with a charge pump gain of 20 uA gives a crossover frequency around $10^6$ rad/s, or about 160 kHz.

Fourth, calculate component values for the loop filter. In this case, choosing C1 = 1000 pF, C2 = 100 pF, and R = 2.7 kΩ places the pole and zero a factor of eleven apart with a maximum phase boost of 56.4 degrees occurring at 1.1167 Mrad/s.

Finally, adjust the DC gain elements in the loop to fine tune crossover and achieve maximum phase margin. In this example, increasing the charge pump gain, ICP, to 26.7 uA moves crossover to 1.1164 Mrad/s. Figure 5-20 shows Bode plots for the final parameter values, with a reported gain margin of 84.855 dB (at 2.69e+08 rad/sec) and a phase margin of 56.442 degrees.
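The filter calculations in the fourth step can be checked numerically. The sketch below evaluates Eqs. 5.13 and 5.14 for the quoted component values and computes the phase boost of the lead-lag pair. It assumes, as the text states, that L(s) sits at -180 degrees apart from this boost, so the boost at crossover equals the phase margin. The function names are hypothetical, and the loop gain itself is not modeled.

```python
import math

R = 2.7e3        # ohms
C1 = 1000e-12    # farads
C2 = 100e-12     # farads


def lead_zero(r, c1, c2):
    """Lead zero in rad/s (Eq. 5.13)."""
    return 1.0 / (r * (c1 + c2))


def lag_pole(r, c2):
    """Lag pole in rad/s (Eq. 5.14)."""
    return 1.0 / (r * c2)


def phase_boost_deg(w, w_zero, w_pole):
    """Phase added by the zero/pole pair at angular frequency w, in degrees."""
    return math.degrees(math.atan(w / w_zero) - math.atan(w / w_pole))


w_zero = lead_zero(R, C1, C2)
w_pole = lag_pole(R, C2)
w_peak = math.sqrt(w_zero * w_pole)        # geometric mean of pole and zero
max_boost = phase_boost_deg(w_peak, w_zero, w_pole)

# Phase margin at the crossover quoted in the text (1.1164 Mrad/s).
phase_margin = phase_boost_deg(1.1164e6, w_zero, w_pole)

f_zero_khz = w_zero / (2 * math.pi) / 1e3  # lead zero in kHz
f_loop_mhz = 1.0e9 / 64 / 1e6              # reference frequency for a 1 GHz output
```

Because the boost curve is nearly flat around its peak, placing crossover at 1.1164 Mrad/s rather than exactly at the geometric mean costs essentially no phase margin.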
Figure 5-20: Loop transmission Bode plots using lead-lag loop filter (plot title: Gm = 84.855 dB at 2.69e+08 rad/sec, Pm = 56.442 deg. at 1.1164e+06 rad/sec; x-axis: Frequency (rad/sec))

Table 5.1 compares the important frequencies for this set of parameters. Note that the lead zero occurs a factor of $\sqrt{11}$ lower than crossover, and the lag pole kicks in a factor of $\sqrt{11}$ higher. In other words, the maximum phase boost occurs at the geometric mean of the pole and zero.

Parameter   Value      Frequency       Value
R           2.7 kΩ     flead-zero      53.6 kHz
C1          1000 pF    fcrossover      177.8 kHz
C2          100 pF     flag-pole       589.8 kHz
ICP         26.7 uA    fIN and floop   15.6 MHz
                       fOUT            1.0 GHz

Table 5.1: Example PLL values using lead-lag loop filter

This discussion represents just one example of a stable PLL configuration. With an externally variable charge pump gain, VCO offset, and loop filter, the PLL can be customized to lock over a wide range of target frequencies.

5.6 Testing Summary

Figure 5-1 (page 96) gives an overview of the testing strategy. The clock circuitry accepts an external clock signal and either multiplies it using a PLL or feeds it directly through to the internal chip circuitry. This clock network drives both the receiver and the data generator. The data generator cycles through a pre-programmed test pattern to drive the laser. Driving the laser with an on-chip component ensures synchronization between incoming optical data and the local receiver clock.

The receiver circuitry consists of the receiver circuit and a buffer. As optical data streams in from the laser, the receiver evaluates each bit and stores the result in a buffer. Flipping a control signal causes the buffer to hold its values for later inspection. An external "photocurrent reference" replaces the reference photodiode for convenience and testing versatility.

On December 3, 2001, a test chip was submitted for fabrication.
In addition to the optical testing scheme described here, the chip also contains two other types of test circuits. One takes an electrical input, and the other takes a manual (external) input.

The "electrical input" comes from the data generator. In addition to driving an off-chip laser, the data generator also controls a charge pump. Based on an external bias current, the charge pump supplies input current to a receiver. On high bits it sources current, and on low bits it does nothing. This input source, combined with a 160 fF capacitor (the expected capacitance of the test photodiodes), simulates a virtual photodiode input.

The manually driven circuit represents a "bare bones" implementation. All inputs come from external pins, and all outputs go to external pins. The on-chip clock drives this circuit, but everything else comes from outside. All three circuits share the same clock signal, power grids, and bias currents, but each receiver uses its own photocurrent reference.

Chapter 6

Conclusion

This chapter begins by summarizing the contributions of the thesis and then moves to a discussion of how these contributions benefit future designers of optical receiver circuits and other applications.

6.1 Summary

This thesis targets the design of a variation-robust data receiver circuit for on-chip optical interconnect. However, monolithic integration in a digital CMOS technology presents several unique design challenges. Monolithic integration limits photodetector design to the materials and doping levels available in the process. This can lead to lower efficiency and larger parasitics. Laser diode integration limits the amount of available optical power, which often means small current levels at the photodetector. An effective data receiver circuit circumvents these shortcomings to provide the fastest propagation delay possible while still maintaining low power and area costs.
Furthermore, the design must function amidst numerous, inevitable variations in the process and the environment. Luckily, monolithic integration also means synchronization between the transmitter and receiver. Thus, a designer can leverage the presence of a local clock signal as a powerful design tool.

This thesis draws upon a previous set of designs that address similar specifications. Many designs of this sort use a latching sense amplifier because it takes advantage of the clock and holds state at the end of each cycle. Making a few modifications to such a circuit significantly enhances performance in the face of large photodiode parasitics. Specifically, adding a current mirror between the photodiode and the sense amplifier input node isolates the large photodiode capacitance from the switching nodes. A specialized reference circuit drives the second input of the sense amplifier so that it correctly evaluates both high and low bits. One extra optical path accompanies each set of data bits to serve as a measure of steady state optical power for the reference circuit. This reference averages the steady state current of a high and low bit to produce a reference voltage that is exactly half of the voltage swing on the photodiode input node. This minimizes propagation delay for both high and low input transitions.

A 0.18 um digital CMOS technology provides a testbed for circuit implementation. Increasing transistor width in the current mirrors increases transconductance and speeds up photodiode transients. Transistors acting as current sources are sized at twice the minimum width and length to add bias point stability in the face of process variation. The latch itself exhibits surprising robustness, and operates handily at more than a gigahertz using minimum sized transistors.

Effective quantification of receiver performance requires defining "evaluation speed," the maximum frequency for which all input bits are evaluated correctly.
This definition measures only whether the sense amplifier makes an accurate logical decision, not whether it produces acceptable logic levels at the output. Therefore, evaluation speed measures mostly the functionality of the input stage and reference circuit. With a photodiode capacitance of 100 fF and an input photocurrent of 10 uA, the circuit achieves an evaluation speed of 2.0 GHz, and provides practical output at frequencies beyond 1.0 GHz. At a gigahertz, each receiver bit dissipates 305.74 uW of power and occupies 133.56 um^2, while the reference circuit dissipates 196.45 uW of power and occupies 74.20 um^2.

Process and environmental variations can drastically change these numbers. Uniform channel length and temperature variations over wide ranges produce little effect. Threshold voltage and power supply variations, on the other hand, cause larger changes in evaluation speed, mainly due to the introduction of extra switching noise at the photodiode input. Differential variation degrades performance more than uniform variation. While uniform length variations only decrease evaluation speed by about 3%, asymmetrical variations between the input stage and reference circuit can cut speed by as much as 50%.

The receiver's variation sensitivity makes matching an important concern. Good layout techniques, larger transistors, and multiple reference circuits can all help reduce the impact of process and environmental variations. In the end, the tradeoff is between cost and performance. Adding more hardware increases variation robustness, but consumes chip real estate. Similarly, better photodiodes increase evaluation speed, but require costly process modifications. A designer must balance these tradeoffs to achieve the desired performance given a certain budget constraint.
6.2 Final Thoughts: Contributions

This thesis makes two specific contributions, namely using a current mirror to isolate capacitance, and current domain arithmetic to construct a reference circuit. These ideas should be thought of not just in terms of optical receivers, but as generally useful circuit tools.

Most digital circuits rely on the ability to switch nodes quickly. The current mirror technique in this thesis provides a means of coupling high capacitance sources into critical nodes without decreasing switching speed. For example, one might use this technique to improve the performance of sense amplifiers in random access memories, where high capacitance bit lines greatly resemble photodiodes.

Likewise, current domain arithmetic provides a useful and powerful tool for circuit analysis. This thesis merely uses the technique to average two signals, but Sec. 4.4 hints at the true potential when it suggests averaging more inputs. In reality, one can construct virtually arbitrary expressions consisting of addition, subtraction, and division using only current mirrors and wires.

Great things are accomplished in small steps, and the most important things to take away from any work like this are the small contributions that designers can store away in their toolbox and one day use to build something great. Hopefully, the reader can take away from this thesis not only an enhanced knowledge of optical data receivers, but an assemblage of ideas that can transcend the field of optical interconnect given a little ingenuity.
Appendix A

TSMC 0.18 um Digital CMOS Process Characteristics

Figure A-1: I-V Characteristics for NFET with W = 0.5 um, L = 0.18 um
Figure A-2: I-V Characteristics for NFET with W = 1.0 um, L = 0.18 um
Figure A-3: I-V Characteristics for NFET with W = 5.0 um, L = 0.18 um
Figure A-4: I-V Characteristics for PFET with W = 0.5 um, L = 0.18 um
Figure A-5: I-V Characteristics for PFET with W = 1.0 um, L = 0.18 um
Figure A-6: I-V Characteristics for PFET with W = 5.0 um, L = 0.18 um
Figure A-7: Current gain vs. frequency for 0.18 um TSMC process
Figure A-8: fT crossover for minimum length NFET
Figure A-9: fT crossover for minimum length PFET
Figure A-10: Transconductance, gm, of NFET with L = 0.18 um
Figure A-11: Transconductance, gm, of NFET with L = 0.36 um
Figure A-12: Transconductance, gm, of PFET with L = 0.18 um
Figure A-13: Transconductance, gm, of PFET with L = 0.36 um

Bibliography

[1] S. B. Alexander. Optical Communication Receiver Design. SPIE - The International Society for Optical Engineering Press, Bellingham, 1997.
[2] K. Ayadi, M. Kuijk, P. Heremans, G. Bickel, G. Borghs, and R. Vounckx. A monolithic optoelectronic receiver in standard 0.7 um CMOS operating at 180 MHz and 176 fJ light input energy. IEEE Photonics Technology Letters, 9(1):88-90, 1997.
[3] A. P. Chandrakasan. 6.374: Analysis and Design of Digital Integrated Circuits Lecture Notes, Fall 2000. 6.374 is a graduate class at the Massachusetts Institute of Technology.
[4] J. A. del Alamo. Integrated Microelectronic Devices: Physics and Modeling. Lecture notes for 6.720J/3.43J at Massachusetts Institute of Technology, August 2000.
[5] Taiwan Semiconductor Manufacturing Co., LTD. TSMC 0.18um Logic 1P6M Salicide 1.8V/3.3V Design Rule. Correspondence with MOSIS and TSMC, March 1999. Intellectual Property of TSMC.
[6] Taiwan Semiconductor Manufacturing Co., LTD.
Technology and manufacturing - 0.18 micron. World Wide Web, April 2002. Information freely available on TSMC's website.
[7] H. C. Luan. Ge Photodetectors for Si Microphotonics. PhD thesis, Massachusetts Institute of Technology, 2001.
[8] A. Lum. An On Chip Low Skew Optical Clock Receiver. Master of Engineering thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2001.
[9] D. A. B. Miller. Optical Interconnects to Silicon. IEEE Journal on Selected Topics in Quantum Electronics, 6(6):1312-1317, 2000.
[10] R. Ram. Personal communication, 2001. Ram is an Associate Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology.
[11] B. Razavi. RF Microelectronics. Prentice Hall PTR, Upper Saddle River, New Jersey, 1998.
[12] S. L. Sam. Characterization of Optical Interconnects. Master of Science thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2000.
[13] M. E. Schaffer and P. A. Mitkas. Smart photodetector array for page-oriented optical memory in 0.35 um CMOS. IEEE Photonics Technology Letters, 10(6):866-868, 1998.
[14] S. M. Sze. Physics of Semiconductor Devices. John Wiley and Sons, New York, 1981.
[15] M. H. Perrott, M. D. Trott, and C. G. Sodini. A general PLL modeling approach for Σ-Δ frequency synthesizers. Correspondence: Charles Sodini or Michael Perrott, Massachusetts Institute of Technology, Cambridge, MA.
[16] H. Zimmermann. Integrated Silicon Opto-electronics. Springer, Berlin, 2000.