Circuits for the Design of a Serial Communication System Utilizing SiGe HBT Technology

by

Thomas W. Krawczyk Jr.

A THESIS SUBMITTED TO THE EXAMINING COMMITTEE OF RENSSELAER POLYTECHNIC INSTITUTE IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

MAJOR SUBJECT: ELECTRICAL ENGINEERING

John F. McDonald, Chair
Gary Saulnier, Prof. ECSE
Kenneth A. Connor, Prof. ECSE
Lester Rubenfeld, Prof. Math
Donald Millard, Prof. ECSE

Rensselaer Polytechnic Institute
Troy, New York
November 2000

© Copyright 2000
by
Thomas W. Krawczyk Jr.
All Rights Reserved

Table of Contents

List of Figures
List of Tables
Acknowledgements
Abstract
1. Introduction & Historical Review
  1.1. Motivation and Goals
  1.2. The three chips
  1.3. Project time line
  1.4. State of the Art
  1.5. Contribution to the Field
    1.5.1. Feed Forward Interpolated VCO
    1.5.2. Transmitter Interleaving Architecture
    1.5.3. Symmetric Multiplexer
    1.5.4. Receiver PLL
  1.6. SiGe 5 HP Overview
  1.7. Testing Equipment
  1.8. Document Logistics
2. Serial Communication
  2.1. Serial Communication Block Diagram
  2.2. Transmitter / Multiplexer / Clock Multiplier
  2.3. Transport Channel
  2.4. Receiver / Demultiplexer / Clock & Data Recovery
  2.5. Internal Testing
  2.6. Support Circuits
3. Current Starving VCO
  3.1. Project History
  3.2. The need for a VCO
  3.3. Simple Current Starving VCO
  3.4. Basic Operation
    3.4.1. Adjustable Voltage Reference
    3.4.2. Final Implementation
    3.4.3. Testing Results
    3.4.4. Optimization of Simple CS VCO (post-fabrication)
  3.5. Current Starving with Feed Forwarding
    3.5.1. Final Implementation
    3.5.2. Testing results
  3.6. Conclusions and Future Work
4. Feed Forward Interpolated VCO
  4.1. Project History
  4.2. The Evolution
  4.3. Basic Operation
  4.4. Stage Decoupling
  4.5. Circuit Implementation and Analysis
    4.5.1. Cascode amplifiers
    4.5.2. Emitter Resistor for linearity and gain adjustment
    4.5.3. Center capacitor to control frequency range center
    4.5.4. Bypass resistor to prevent stage decoupling
  4.6. System Analysis
    4.6.1. Branch current to frequency
    4.6.2. Center frequency and intrinsic stage delay
    4.6.3. Frequency gain at the center frequency
    4.6.4. Frequency Range
  4.7. Phase Noise
    4.7.1. The Impulse Sensitivity Function
    4.7.2. Solving for phase noise
    4.7.3. Phase noise comparison between the FFI and CS VCOs
  4.8. Jitter
  4.9. Interconnect Parasitic Simulations
  4.10. HDL Model
  4.11. Final Implementation
    4.11.1. Circuit Parameters
    4.11.2. Layout Considerations
  4.12. Experimental Results
    4.12.1. Frequency Response
    4.12.2. Common Mode Gain (5 GHz VCO)
    4.12.3. Response versus supply voltage (5 GHz VCO)
    4.12.4. Phase noise measurements
    4.12.5. Jitter measurements
5. Design of the Transmitter
  5.1. Project History
  5.2. Top Level Architecture Overview
  5.3. 16-1 Multiplexer
    5.3.1. The Case for the Symmetric Multiplexer
    5.3.2. Final Implementation and Simulation
  5.4. Phase-Locked Loop (Frequency Synthesizer)
    5.4.1. Input Filter
    5.4.2. Phase Detector
      5.4.2.1. Phase detector (Serdes I)
      5.4.2.2. Phase detector (Serdes II)
      5.4.2.3. Phase detector (Serdes III)
    5.4.3. The VCO
    5.4.4. Loop Filter
      5.4.4.1. Serdes I Loop Filter
      5.4.4.2. Serdes II Loop Filter
      5.4.4.3. Serdes III Loop Filter
    5.4.5. PLL Loop Response
    5.4.6. Lock Acquisition
      5.4.6.1. Serdes I Simulated Acquisition
      5.4.6.2. Serdes II Simulated Acquisition
      5.4.6.3. Serdes III Simulated Acquisition
    5.4.7. 20 / 40 Gb/s Implementation
  5.5. Clock Distribution
  5.6. Data Encoding
  5.7. Line Driver
  5.8. Internal Testing Circuitry
    5.8.1. Serdes I
    5.8.2. Serdes II
  5.9. Implementation and Fabrication
    5.9.1. Serdes I
    5.9.2. Serdes II
  5.10. Testing Results
    5.10.1. Serdes I (transmitter test results)
    5.10.2. Serdes II (transmitter test results)
  5.11. Future Design
    5.11.1. 8B/10B Encoding
    5.11.2. Transmitter data retiming
    5.11.3. LC Oscillator
6. Design of the Receiver
  6.1. Project History
  6.2. Receiver Architecture
  6.3. Receiver PLL
    6.3.1. Phase Detector
      6.3.1.1. Transition Detector (PD)
      6.3.1.2. NRZ Phase / Frequency Detector (PD/FD)
    6.3.2. The Loop Filter
      6.3.2.1. FET Charge Pump / Proportional Control (Serdes I)
      6.3.2.2. Negative Impedance Charge Pump (Serdes II)
      6.3.2.3. Mixed Loop (Serdes III)
    6.3.3. PLL Loop Response
      6.3.3.1. Serdes I (FET charge pump)
      6.3.3.2. Serdes II (negative impedance charge pump)
      6.3.3.3. Serdes III (dual-loop / referenced loop)
  6.4. 4-16 Demultiplexing
  6.5. Registers and Decoding
  6.6. Line Receiver
  6.7. Test Circuitry
    6.7.1. On-chip test pattern generation
    6.7.2. True error rate detector (TERD)
  6.8. Implementation and Fabrication
    6.8.1. Serdes I
    6.8.2. Serdes II
  6.9. Testing Results
    6.9.1. Serdes I (receiver test results)
    6.9.2. Serdes II (receiver test results)
  6.10. Future Work
    6.10.1. Sampling offset correction
    6.10.2. 40 Gb/s?
    6.10.3. Demultiplexer improvements
Discussion & Conclusion
References
A. IBM SiGe 5 HP
  A.1. NPN Vbe characteristics
  A.2. NPN Ic versus Vce characteristics
  A.3. NPN fT Curves
B. CML Logic Gates
  B.1. CML Voltage Swing (non-linearized, digital)
  B.2. CML Signals
  B.3. Voltage Reference
  B.4. Buffer with emitter follower outputs
C. CML Circuit Details
  C.1. Linearizing the differential amplifier
  C.2. Current bypassing
  C.3. CML delay increasing
D. Transistor Sizing to Minimize VCO Delay
E. SpectreHDL models
  E.1. FFI VCO
  E.2. 3-State PD
  E.3. Transition Detector PD
  E.4. Histogram generator
  E.5. Jittered data source
F. Toplevel Chip Schematics
  F.1. Serdes I Transmitter
  F.2. Serdes I Receiver
  F.3. Serdes II Transceiver

List of Figures

Figure 1-1. Past and proposed future research
Figure 2-1. Toplevel System Block Diagram
Figure 3-1. Four stage VCO diagram
Figure 3-2. Current Starving VCO frequency and gain response
Figure 3-3. Adjustable Voltage Reference
Figure 3-4. Layout of Simple CS VCO
Figure 3-5. Test data from Simple CS VCO
Figure 3-6. Frequency Response versus emitter length in delay elements
Figure 3-7. Feed-forward CS VCO block diagram
Figure 3-8. Feed forward CS VCO frequency response and gain
Figure 3-9. Feed-forward CS Delay Element
Figure 3-10. Testing Data from feed-forward CS VCO
Figure 4-1. Schematic for Delay Interpolated VCO element
Figure 4-2. Feed Forward VCO block diagram
Figure 4-3. FFI VCO under boundary conditions
Figure 4-4. Feed-forward interpolated simulated response
Figure 4-5. Delay versus weighting factor with single stage imbalance
Figure 4-6. Decoupling versus delay injection
Figure 4-7. Schematic for FFI VCO element
Figure 4-8. FFI VCO frequency versus emitter resistance
Figure 4-9. FFI VCO frequency versus centering capacitor
Figure 4-10. FFI VCO frequency versus bypass resistance
Figure 4-11. FFI VCO Frequency Range
Figure 4-12. FFI VCO System from control voltage to frequency
Figure 4-13. Simulated versus analytical response of the FFI Architecture
Figure 4-14. Center frequency simulation and model
Figure 4-15. Current pulse effect on phase
Figure 4-16. Simulated ISF for FFI VCO and output waveform
Figure 4-17. ISF rms values for various ring oscillators
Figure 4-18. FFI with capacitive interconnect parasitics
Figure 4-19. FFI Layout
Figure 4-20. Reducing substrate coupling
Figure 4-21. FFI waveform at 5 GHz
Figure 4-22. FFI VCO measured results
Figure 4-23. FFI common mode response
Figure 4-24. FFI response versus supply voltage
Figure 4-25. Open loop phase noise of FFI VCO
Figure 4-26. FFI VCO analytical and measured jitter
Figure 5-1. Transmitter and multiplexer architecture
Figure 5-2. Data timing for the 4-1 multiplexer
Figure 5-3. CML Two Level Multiplexer
Figure 5-4. Simulation Testing of CML 2:1 Multiplexer
Figure 5-5. Simulation Results for CML 2:1 Multiplexer
Figure 5-6. CML Single Level Symmetric Multiplexer
Figure 5-7. Symmetric multiplexer transistor states
Figure 5-8. Multiplexer Eye Diagrams
Figure 5-9. Multiplexer Layout for Serdes I and II
Figure 5-10. Linear model of PLL
Figure 5-11. Frequency synthesizer evolution
Figure 5-12. Schematic for input filter
Figure 5-13. Input filter frequency response
Figure 5-14. Phase detector schematics
Figure 5-15. Simulated phase detector responses
Figure 5-16. PLL frequency detector
Figure 5-17. Passive Loop Filter
Figure 5-18. Tx PLL passive loop filter
Figure 5-19. Tx PLL active loop filter
Figure 5-20. Active loop filter transfer function
Figure 5-21. Receiver III integrator
Figure 5-22. Voltage spectral density for optimal loop bandwidth
Figure 5-23. PLL simulated step responses
Figure 5-24. PLL I simulated acquisition plots
Figure 5-25. PLL II simulated acquisition plots
Figure 5-26. 5/10 GHz PLL implementation
Figure 5-27. Clocking scheme for transmitter
Figure 5-28. Transmitter clock timing
Figure 5-29. Load counter
Figure 5-30. Serdes I LFSR
Figure 5-31. True error rate detector
Figure 5-32. Serdes II bit pattern generator
Figure 5-33. Serdes I transmitter layout and photograph
Figure 5-34. Serdes II chip layout and microphotograph
Figure 5-35. Transmitter waveform (Serdes I)
Figure 5-36. Serdes 2 transmitter eye diagram
Figure 5-37. Tx PLL measured phase noise spectra
Figure 5-38. Data and clock timing
Figure 6-1. Top level receiver architecture
Figure 6-2. Receiver PLL evolution
Figure 6-3. Receiver topology
Figure 6-4. Transition detector in prototype I
Figure 6-5. Transition detector in prototype II
Figure 6-6. Gain of transition detector with data jitter
Figure 6-7. Phase detector for NRZ data
Figure 6-8. Receiver loop filter
Figure 6-9. MOSFET charge pump integrator
Figure 6-10. Proportional control and summing junction
Figure 6-11. Serdes I loop locking in
Figure 6-12. Frequency and phase lock-in of serdes III Rx PLL
Figure 6-13. 4-16 demultiplexer architecture
Figure 6-14. Serdes I receiver layout artwork and photograph
Figure 6-15. Serdes I receiver locked to data
Figure 6-16. Serdes I recovered clock showing jitter
Figure 6-17. Serdes II Rx locked to data
Figure 6-18. Serdes II receiver clock phase noise
Figure 6-19. Revised 4-to-16 demultiplexer
Figure A-1. Ic-Vbe characteristics for npn transistor
Figure A-2. npn transconductance
Figure A-3. Ic-Vce characteristics for npn transistor
Figure A-4. fT vs Ic characteristics for npn transistor
Figure B-1. Current switching versus differential input voltage
Figure B-2. Simple CML Gate
Figure B-3. Reference Voltage Generator
Figure B-4. CML Buffer with emitter followers
Figure C-1. Linearizing differential amplifier with emitter resistors
Figure C-2. Branch current response for various emitter resistors
Figure C-3. Simulated / Analytical Gain
Figure C-4. Limiting full current switching with bypass resistors
Figure C-5. Current limiting effects of bypass resistor
Figure C-6. Current gain effects of bypass resistor
Figure C-7. Designing for gain with emitter and bypass resistors
Figure C-8. Collector Capacitor
Figure C-9. Delay Model with Collector Capacitor
Figure D-1. Delay from emitter follower to differential amplifier
Figure D-2. Delay from differential amp to emitter follower
Figure D-3. Emitter follower size between driver and receiver
Figure D-4. Delay when using optimized emitter follower
Figure D-5. Delay difference between circuit with follower and one without

List of Tables

Table 1-1. Equipment used for testing
Table 4-1. Circuit parameters for calculating jitter
Table 5-1. Pin-out of Serdes I transmitter
Table 5-2. Bondpad pin-out of Serdes II chip
Table 6-1. Pin-out of Serdes I transmitter

Acknowledgements

First and foremost, I want to thank my family. Although they have little knowledge of the research I have done, they have helped more than they know. Without them this would have been a much more difficult undertaking.

I want to thank my advisor, Jack McDonald, for his assistance and guidance during the past few years, and for providing me with the opportunity to work with cutting-edge SiGe technology. The members of my committee, Kenneth Connor, Gary Saulnier, Les Rubenfeld, and Don Millard, also deserve thanks for providing insight and guidance in my research. I would like to extend a special thank you to Dr. Millard for being a wonderful mentor and friend since I began graduate school. He has always been there for me.

Also, without my fellow Frisc members and friends, Pete Curran, Samuel Steidl, Matthew Ernest, Steven Carlough, and Bryan Goda, this certainly would have been a boring voyage. Thanks for the help.
I am indebted to Hank Dardy and Basil Decina at NRL (contract #N00173-99-1G013) for their support in this work. I also wish to thank Sierra Monolithics Incorporated and IBM for fabricating my chip designs and for providing additional insight into this research, and Intel for providing a fellowship to support this work.

When I left high school I said, "I've just conquered a small hill in my life only to look out and see a huge range of mountains before me." Am I perhaps standing on the top of the first mountain I saw?

Abstract

The rapid growth of digital communications demands higher-speed serial communication circuits. Present-day technologies barely manage to keep up with this demand, and new techniques are required to ensure that serial communication can continue to expand. The goal of this work was to research, design, implement, test, and evaluate high-speed serial communication circuits. The research involved an in-depth study of the state of the art in high-speed digital and analog circuits, SiGe technology, and serial communication circuits.

Two prototype 20 Gb/s transceiver chips were designed in bipolar current mode logic (CML) using IBM's SiGe 0.5 µm heterojunction bipolar transistor (HBT) technology. Following fabrication of the two designs, the completed chips were extensively tested, and the test results were compared with the results expected from simulation. After optimization and many improvements, a prototype communication system was designed and prepared for fabrication.

The optimized second prototype operated at speeds in excess of 20 Gb/s. It used a novel four-stage feed-forward interpolated (FFI) ring voltage-controlled oscillator (VCO) architecture, for which RPI is pursuing a patent. By feeding each stage's output forward by one stage, the architecture increased the core frequency by more than 33%, with a phase noise of -90.2 dBc/Hz at a 1 MHz offset.
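To relate a single-offset phase-noise figure such as -90.2 dBc/Hz at 1 MHz to timing jitter, the Python sketch below integrates an assumed phase-noise profile and applies the standard conversion sigma_t = sqrt(2 * integral of L(f) df) / (2 * pi * f0). It assumes a pure 1/f^2 slope passing through the quoted point, a 5 GHz carrier, and a 100 kHz to 100 MHz integration band; these values and the function name are illustrative assumptions, not the method or data of this work.

```python
import math


def rms_jitter_from_1_over_f2(l_dbc_hz: float, f_offset: float,
                              f_carrier: float, f_lo: float,
                              f_hi: float) -> float:
    """Estimate rms timing jitter (seconds) of an oscillator whose
    single-sideband phase noise L(f) is ASSUMED to fall as 1/f^2 through
    the point (f_offset, l_dbc_hz). Illustrative only: real VCOs also show
    1/f^3 and flat noise regions that this model ignores."""
    l0 = 10.0 ** (l_dbc_hz / 10.0)  # dBc/Hz -> linear ratio per Hz
    # Closed-form integral of L(f) = l0 * (f_offset / f)**2 over [f_lo, f_hi]
    integral = l0 * f_offset ** 2 * (1.0 / f_lo - 1.0 / f_hi)
    phase_var = 2.0 * integral  # count both sidebands -> variance in rad^2
    return math.sqrt(phase_var) / (2.0 * math.pi * f_carrier)


if __name__ == "__main__":
    # Assumed operating point: 5 GHz carrier, 100 kHz to 100 MHz band.
    sigma_t = rms_jitter_from_1_over_f2(-90.2, 1e6, 5e9, 1e5, 1e8)
    print(f"open-loop rms jitter estimate: {sigma_t * 1e12:.2f} ps")
```

Because the 1/f^2 integral is dominated by its lower limit, the result depends strongly on the chosen band edge. Inside a PLL, the loop suppresses VCO noise below the loop bandwidth, so closed-loop jitter is normally well below an open-loop estimate like this one.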
The transmitter took advantage of the quadrature phases of the VCO in a unique multiplexing technique that required the development of a new 2-to-1 multiplexer. This multiplexer had full input-to-output symmetry on all three inputs and was capable of retiming the output data. The PLL had a wide bandwidth of 30 MHz, to suppress VCO noise, and produced 2.0 ps rms of in-band jitter from 100 kHz to 100 MHz. The receiver, similar in both prototypes, utilized all eight phases of the VCO to oversample every data bit by a factor of two in the phase detector (PD). It was capable of extracting timing information from every rising and falling transition. The loop filter incorporated a negative impedance charge pump integrator which exhibited excellent performance. Four bits of data were sampled through the PD, and a 4-to-16 demultiplexer produced the 16 bits of parallel data. A third prototype was developed, but not fabricated, using the data acquired from the first two designs. The transmit PLL bandwidth was optimized to account for the phase noise measurements of the VCO. As a result, a frequency detector was required and added to the PLL to increase the pull-in range. The loop filter was also modified to use the negative impedance charge pump from the receiver PLL. The receiver demultiplexer scheme was improved to relax the timing constraints. In addition, the receiver PLL was optimized to improve the bit error rate.

1 Introduction & Historical Review

1.1. Motivation and Goals
The research presented in this thesis deals with understanding and designing the critical components that make up a serializing and deserializing, or Serdes, circuit. The extremely complicated nature of such a system required a focused study that did not address many of the issues that are present in a similar commercially designed product. Funding for the project was acquired through Dr. Jack McDonald from the Naval Research Lab, NRL.
The requirements were to design a SiGe short-haul Serdes system capable of 20 Gb/s that would assist in research that may eventually lead to 40 Gb/s. Serdes circuits, discussed more thoroughly in the following chapter, consist of three parts: a transmitter, a receiver, and a channel. The transmitter accepts streams of data in parallel and multiplexes them together into a single serial stream. Distinguishing the bits at the receiver input, after they travel through the channel, is a primary concern. The receiver accepts the serial stream and demultiplexes it back into parallel data. It must be sensitive to changes in the data in order to limit the error rate. The channel connects the transmitter and receiver, and typically consists of amplifiers, repeaters, and optical fiber. IBM’s SiGe HBT process technology was chosen because of the Frisc group’s strength in high-speed bipolar design, and because of the state-of-the-art nature of the process in the industry. The process provides integration with current CMOS technology, enabling a very wide variety of circuit topologies. This research used the 5 HP process technology, with 5 levels of metal. It offered 50 GHz fT (transition frequency) HBTs and 0.25 µm CMOS transistors. One way of grouping Serdes circuits is by the distance over which the serialized data is expected to travel. Systems such as the Synchronous Optical Network (SONET) are implemented over distances greater than 100 km and are considered long-haul. Short-haul Serdes, on the other hand, is limited to short distances, such as a LAN, or between CPUs in a multi-processor system. This distinction between short- and long-haul systems has important implications for the critical specifications of the circuit. For long-haul systems, phase noise is critical, as it dictates the total bit error rate (BER) through the long and noisy channel. Short-haul is less sensitive to phase noise and is instead focused on bit throughput and higher bandwidth.
Current industry level Serdes designs, as of the year 2000, run at 10 Gb/s and utilize the same or similar 5 HP technology. Pushing the goal to 20 Gb/s and even 40 Gb/s was intended to place this research on the cutting edge and evaluate the maximum potential of the technology. In addition to the goals of the NRL contract, various other factors motivated the development of this project. One was the available test equipment. The lack of facilities to test a packaged part necessitated a chip with wafer probing capabilities. This limited the total testable signals to 12 RF and 12 DC at one time. Without packaging, a fully integrated solution was necessary, rather than one that needed off-chip components, such as capacitors and op-amps.

1.2. The three chips
The total design process consisted of three separate designs. The first design, Serdes I, was a prototype that tested some of the key components of a complete design. It was submitted for fabrication in February 1999. This chip was an excellent starting point for the development of a fully functional chip. Serdes II was investigated and studied after the results from Serdes I were analyzed. It possessed improvements in important areas such as the PLL, the multiplexer, the receiver topology, and the VCO. Unfortunately, the tape-out date was earlier than expected and allowed only one month for final design and layout. This proved to be a difficult time line, and some design issues were left unresolved. Following the collection of data from Serdes II, a third iteration, Serdes III, was investigated. The design goal was to solve most of the issues uncovered in Serdes I and Serdes II. Although no new layout was done for Serdes III, a complete set of new simulated schematics was created. With the addition of some minor support circuits, a fully functional and optimized Serdes chip could be implemented.
1.3. Project time line
[Figure 1-1 Past and proposed future research. This is a time line of the goals and accomplishments of this Serdes research.]

A time line indicating completed goals is shown in Fig. 1-1. Research into high speed communication circuits was initiated in August 1998. A paper that appeared in ISSCC 1998, titled “A 10 Gb/s Si-Bipolar TX/RX Chipset for Computer Data Transmission” [1], was the basis for the majority of the research. The paper presented a novel idea for a voltage controlled oscillator, VCO, and a description of a transmitter and receiver circuit. The VCO is the most important circuit in the design of communication circuits and, as such, was the starting point for this research. A simple four phase buffer oscillator was designed and simulated. The method for frequency control for this oscillator originated from a modified version of Samuel Steidl’s VCO implementation [2]. An advanced version of this VCO, with a 66% speed improvement, was subsequently implemented. The desire to further increase the frequency led to a study of phase multiplication techniques [3], [4]. Three separate VCO test chips were laid out to test various aspects of the above techniques.
Each chip contained several versions of a unique VCO design: with and without phase multiplication, and under several different loading conditions. In November 1998, the transmitter circuit started to take shape. One component of a serializing circuit is the final multiplexer. To design this, a unique register “shuffling” method was evaluated. As it provided better performance than other techniques and worked with a slower rate multi-phase clock, it was chosen for the final design. In order to test the transmitter, a linear feedback shift register, LFSR, was used to provide pseudo-random data. An additional requirement of the transmitter was operation at a speed relative to a fixed low frequency clock. This required the development of a phase locked loop, PLL, capable of locking the high rate internal clock to a low frequency external reference clock. Starting in December and during transmitter development, a receiver design was examined. Many improvements were added to the fundamental architecture found in [1]. Instead of gathering timing data from every fourth transition, it was determined that better performance could be achieved if every transition were used. Since no detailed mechanism for feedback control was described, some ideas were gathered from a clock and data recovery paper [5]. Starting with these ideas, a unique PLL was created for clock recovery. Because of the difficulty of using external function generators, an internal testing source was developed to provide different bit patterns to exercise the circuit completely. All six chips, including an integrated transmitter/receiver chip, were designed and laid out using Cadence software. Simulation was done using HSpice, Matlab, and a digital simulator developed by Peter F. Curran. Final designs were shipped to IBM during the first week of February 1999. After six months in fabrication, a finished wafer was returned to RPI at the beginning of August of the same year.
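Pseudo-random test data of the kind mentioned above is commonly produced by a maximal-length LFSR. The following is a minimal sketch, assuming a PRBS7 pattern (polynomial x^7 + x^6 + 1), a common choice for serial link testing; the thesis does not state which polynomial its on-chip LFSR used, so the helper name `prbs7`, the default seed, and the taps are illustrative assumptions.

```python
def prbs7(seed: int = 0x7F, nbits: int = 127) -> list[int]:
    """Generate nbits of a PRBS7 (x^7 + x^6 + 1) sequence with a 7-bit Fibonacci LFSR.

    Hypothetical helper for illustration only; the seed must be a nonzero 7-bit value.
    """
    if not 0 < seed < 128:
        raise ValueError("seed must be a nonzero 7-bit value")
    state = seed
    out = []
    for _ in range(nbits):
        # Feedback bit is the XOR of the register bits for x^7 and x^6 (bits 6 and 5).
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        out.append(new_bit)
        # Shift left, insert the feedback bit at the LSB, and keep 7 bits of state.
        state = ((state << 1) | new_bit) & 0x7F
    return out
```

Because the register steps through all 127 nonzero states before repeating, one period of `prbs7()` contains 64 ones and 63 zeros, so the test pattern is nearly DC-balanced while still exercising many different bit sequences.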
Chip testing began with a detailed study of the three VCO chips and the test source VCO in the receiver. It became apparent that most of the circuits underperformed when compared to simulation results. It appeared that under heavily loaded conditions the circuits slowed down more than expected. The transmitter test chip was tested and found to work with a 25% reduction in frequency. This testing was followed by a detailed inspection of the receiver chip, which was found to work nearly at the design speed. During this time, data was being collected for a conference paper to be submitted to the International Solid State Circuits Conference, ISSCC. Although the chips performed slightly slower than anticipated, the paper still presented significant advances over the state of the art. Unfortunately the paper was not accepted, most likely because there was a frequency mismatch between the transmitter and receiver. During the remainder of September, a thorough simulation of the VCO, including layout parasitics, was performed. The initial results showed a close match to the results measured from the fabricated wafer. Some discrepancy remains regarding how loading affects the speed of the devices. A continuation of this work will attempt to match simulations accurately to measured results to ensure that future designs will respond as expected. It was necessary to produce a second Serdes chip, drawing on the success of the first test chip, that would meet the goal of 20 Gb/s. Additional circuitry was needed to round out the design: a 4-to-16 demultiplexer, an internal testing scheme, transmitter and receiver integration onto one chip, packagability, and improved performance. A comprehensive study was performed to determine exactly why and how the chips underperformed. The design was modified to ensure that the parts would meet the required specifications. This included a complete redesign of the VCO into the Feed Forward Interpolated VCO (FFI VCO).
The new design was based upon the results of the previous design and the development of a new multiplexer. In February 2000, an invention disclosure record entitled “The Symmetric Multiplexer” was submitted to RPI [6]. The invention improved the standard CML multiplexer and reduced phase noise and jitter at the transmitter output. Serdes II was finished and submitted to Sierra Monolithics Incorporated, SMI, for fabrication at the end of March 2000 (SMI volunteered silicon on an experimental run). It contained many improvements on the previous design and was capable of being C4 packaged and wafer tested. After its completion, an additional invention disclosure record that focused on the FFI VCO was submitted [7]. The VCO is a novel approach to designing ring oscillators. It improves upon many key parameters of the standard ring VCO. The Serdes II chip was received three months after tape-out, in the middle of July 2000. Testing began immediately with a complete characterization of the FFI VCO, including its frequency response, CMRR, phase noise, supply response, and jitter. A high quality spectrum analyzer was rented to aid in testing and data acquisition. Testing of the transmitter was followed by a look at clock jitter and data eye diagrams. The transmitter was a complete success and operated at 20 Gb/s with rms jitter of 2.0 ps in the frequency band of 100 kHz to 100 MHz. The symmetric multiplexer appeared to work exactly as expected. Testing the receiver confirmed an anticipated problem with a small lock-in range. This was also seen in Serdes I and was not completely addressed in the second prototype. Following the tape-out of Serdes II, intense work was done on Serdes III. Several last minute problems were discovered in Serdes II that were corrected in the next iteration. Data collected from Serdes II allowed the optimization of important PLL parameters in order to reduce jitter and improve the pull-in time.
A problem with a small pull-in range in both receiver PLLs required a complete redesign of the loop and the addition of a reference signal. Using the data collected from Serdes II, a journal article was submitted to the Journal of Solid-State Circuits, JSSC, in October. It was titled “A Transmitter Architecture for High Speed Short-Haul Serial Communication,” and it detailed the FFI VCO, the symmetric multiplexer, and the transmitter architecture. At the end of September, the RPI patent office reported that it was going to pursue U.S. patents for both inventions. This would start with an immediate application for provisional patents that would protect the work after disclosure.

1.4. State of the Art
In the quick-paced research area of high speed communications, industry is currently cresting the 10 Gb/s barrier while research is beginning in the 40 Gb/s regime. New microelectronic technologies such as AlInAs/InGaAs heterojunction bipolar transistors (HBT) and SiGe HBTs [8], [9] are playing leading roles. In particular, SiGe HBT and CMOS technology is proving itself to be a high-speed (60-90 GHz fT), high-yield, high-integration, and low-cost solution [10], [11]. It possesses the strengths of silicon because of similar fabrication techniques, but benefits from higher frequencies with the introduction of germanium [12]. The current state of the art in high-speed serial communications can be broken down into three basic design areas: VCOs; clock multiplier units (CMU), or transmitters; and clock and data recovery (CDR) circuits, or receivers. As the speed of serial communication circuits increases, so too must the speed of the core building block of the circuit, the VCO. Multi-phase ring oscillators with top speeds approximately equal to 1/10th of their technology’s fT are being improved [1], [13], [14]. It is common to see speeds around 5 GHz, with maximum quoted speeds up to approximately 15 GHz through clock phase multiplication [3], [4].
Their Q of unity and high noise characteristics make them more suitable for short-haul systems or for systems that can tolerate phase noise. In-depth analyses of the sources of phase noise are allowing tight optimization of circuits [15]-[19]. CMOS differential ring oscillators running at speeds up to 5 GHz exhibit -95 dBc/Hz of phase noise at 1 MHz [18], while bipolar rings are quoted as having phase noise values of -86 dBc/Hz at 1 MHz [20]. Jitter, generally expressed by the constant κ (the rms jitter accumulated over a time ΔT is κ√ΔT), has been documented as κ = 22 n√s for a silicon bipolar ring running at 625 MHz with a 0.6 mA tail current [17]. Ring oscillator architecture is straightforward and simple to understand. Through creative interstage feedback techniques, the VCO frequency and phase noise can be improved. A four stage ring VCO that increases its speed by 33% by leapfrogging the output of one stage to the input of the stage ahead is documented in [1]. This improves the speed by reducing the effective delay of every stage. A similar, more general approach is presented in [13], which utilizes sub-feedback inverters that create fast and slow loops which can be mixed together. An earlier approach, [23], has a five stage core that potentiometrically mixes the output from the third and fifth stages. By doing this, the ring is able to operate variably between a 3 stage and a 5 stage oscillator. Finally, by using a negative skewed delay scheme, the core frequency of a CMOS ring oscillator is improved by 50% [24]. This is accomplished by compensating for the slower PMOS transistors by tying the PMOS input to the output of a stage two gates back. This turns the PMOS transistor on sooner than the NMOS, thus improving its speed at the expense of additional power requirements. LC oscillators, on the other hand, which possess a high Q and extremely low noise and jitter, are being rigorously researched as VCOs for long-haul serial communication.
Unlike multi-phase oscillators that can generate frequencies higher than their core frequencies, LC oscillators are typically run at the baud rate of the communication channel. Thus, for a 10 Gb/s Serdes implementation, a 10 GHz LC VCO is required. A 5 GHz VCO developed by IBM [21] was quoted as having a phase noise of -98 dBc/Hz at 100 kHz, with a power of 15 mW. A second, 11 GHz VCO with an integrated inductor is documented as having a -78 to -87 dBc/Hz phase noise at a 100 kHz offset from the carrier [22]. The state of the art in transmitter, or CMU, research is measured primarily by the maximum bit rate relative to the transistor technology, the clock jitter produced at that rate, and the phase noise of the oscillator. A 1.062 Gb/s transmitter implementation [26] utilizes a half-rate ring oscillator. The ring oscillator incorporates two mixing elements between every pair of delay elements to control the rate of oscillation. Its quadrature outputs are further broken up into four quarter-rate signals that drive the 10-to-1 multiplexer. The PLL achieves an rms jitter performance of 9.8 ps. A low noise, 12.5 Gb/s CMU is described in [27]. It possesses a differential single phase LC oscillator with a phase noise of -101 dBc/Hz at 1 MHz. The PLL has a very low bandwidth of 300 kHz in order to reduce in-band noise. Its reference is at approximately 195.3 MHz (12.5 GHz divided by 64), and it utilizes a standard 3-state phase detector (PD). The loop filter consists of a negative impedance amplifier and a single pole, single zero RC filter. The output jitter is quoted as 0.4 ps. An interesting non-optical transceiver described in [28] utilizes a 4-PAM (pulse amplitude modulation) serial link for 8 Gb/s communications. It essentially transmits and receives four level logic, which carries two bits per symbol and therefore allows twice the bit rate for the same symbol rate and bandwidth. It exhibits a transmitter output jitter of 2 ps and a receiver jitter of 4 ps.
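To make the 4-PAM trade-off concrete: each four level symbol carries two bits, so an 8 Gb/s 4-PAM link signals at 4 Gsymbols/s. The sketch below assumes a Gray-coded mapping of bit pairs onto the levels -3, -1, +1, +3 (adjacent levels differ in one bit); [28] may use a different mapping or level spacing, and the names `GRAY_4PAM`, `pam4_encode`, and `symbol_rate` are hypothetical.

```python
# Assumed Gray-coded 4-PAM mapping: neighboring levels differ in exactly one bit.
GRAY_4PAM = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}


def pam4_encode(bits: list[int]) -> list[int]:
    """Map an even-length bit list onto 4-PAM levels, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("4-PAM needs an even number of bits")
    return [GRAY_4PAM[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def symbol_rate(bit_rate: float, bits_per_symbol: int = 2) -> float:
    """Symbol (baud) rate needed to carry bit_rate at bits_per_symbol."""
    return bit_rate / bits_per_symbol
```

Here `symbol_rate(8e9)` gives 4e9, so the 8 Gb/s link uses the same symbol rate, and hence roughly the same channel bandwidth, as a 4 Gb/s binary NRZ link; the cost is a smaller voltage spacing between adjacent levels.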
As bit rates are pushed higher relative to the transistor technology speed, certain problems arise. In the transmitter PLL, a clock frequency divider is needed to drive the PD along with the reference signal, and to drive the multiplexer inputs. A master-slave (MS) latch with inverted feedback often suffices, but for extremely high VCO speeds a new approach is required. A dynamic frequency divider capable of speeds up to 79 GHz, using transistors with an fT of 80 GHz, is described in [29]. It uses an XOR multiplier, relies on the low pass filtering inherent in the multiplier, and feeds the output back into the multiplier. The only stable condition is when the output is at half the frequency of the input. The state of the art in receiver, or CDR, design is measured by the ability to extract data in the presence of both data and clock jitter, and the ability to tolerate pseudo-random data. The design described in [30] uses a full rate ring oscillator with a 12.5 GHz clock to extract the 8B/10B encoded data at 10 Gb/s. The VCO exhibits a phase noise of approximately -80 dBc/Hz at 1 MHz. The PLL has a bang-bang PD and is frequency locked by a 195.3 MHz reference signal. The data PD has a pull-in range of 0.6% and a hold-in range of 1.2%. This receiver is quoted as exceeding the SONET OC-192 specifications by 50%. A 50 GHz fT SiGe 10 Gb/s CDR for SONET is described in [31]. It utilizes an LC tank VCO running at 10 GHz with a phase noise of -80 dBc/Hz at 100 kHz. The PD is a Hogge type, and the charge pump uses an active MOSFET positive-feedback pull-up amplifier. The recovered clock rms jitter was measured at less than 1 ps, with a bit error rate of 10⁻⁹. SONET specifications for jitter tolerance, jitter transfer, and jitter generation were all met. A very high speed CDR discussed in [32] uses a silicon bipolar process with an fT of 12 GHz for 8 Gb/s operation. The loop filter and VCO are off-chip, but the frequency detector and PD are both on-chip. The clock jitter was measured at 1.5 ps rms.

1.5. Contribution to the Field
An important aspect of Ph.D. research is advancing the state of the art and showing that the work builds upon the shoulders of others rather than merely reinventing the wheel. Four key components of this research can be singled out as original and novel, and RPI is pursuing U.S. patents for two of them.

1.5.1. Feed Forward Interpolated VCO
The Feed Forward Interpolated VCO is an improvement over the standard ring oscillator [1]. The ring VCO in [23] utilizes a similar feed-forward method to extend the frequency range, but the feed-forwarding remains fixed and is not used as the delay control mechanism. The design presented in this thesis, however, uses feed-forwarding both to increase the frequency range and as the primary method of controlling the stage delay. It is versatile and allows adjustments to be made to the center frequency, tuning range, and gain through simple parameter changes. When configured for maximum operating speed, the VCO is 33% faster than a simple four stage ring oscillator utilizing the same power. This increase in speed can be traded for additional phase noise and jitter suppression, making the FFI VCO a viable alternative to LC tanks in a short-haul communication channel. An invention disclosure record for this circuit was submitted to the RPI patent office in May 2000. In September 2000, the patent office declared that it was going to pursue a U.S. patent for this invention.

1.5.2. Transmitter Interleaving Architecture
As the bit rate is pushed higher with respect to the technology speed, it becomes increasingly difficult to design VCOs that can keep up. Fractional rate oscillators can solve this difficulty, but impose tight timing constraints on the output multiplexer. The transmitter design discussed in this thesis utilizes a relatively slow, well understood, quarter frequency multi-phase VCO.
The novel transmitter architecture allows in-quadrature phases of the VCO to control a 4-to-1 multiplexer. Although this approach is similar to the design given in [1], there are a few differences. First, the 4-to-1 multiplexer in [1] is implemented as a single gate, whereas the transmitter interleaving architecture breaks the problem into multiple gates. Second, the multiplexer in [1] requires multiple level clock inputs, which requires the clock phases to be skewed. Third, the multiplexer in [1] requires three levels of logic, while this new architecture requires only two. This is important for power saving applications that are limited to two levels of logic.

1.5.3. Symmetric Multiplexer
During the development of the transmitter, a problem arose that required the basic 2-to-1 multiplexer to be rethought: the 2-to-1 multiplexer had become a critical timing path in the transmitter. In other words, any delay mismatches in this circuit were propagated to the output. After analyzing the problem, a new multiplexer was developed that had perfect timing symmetry and possessed none of the problems of the original multiplexer. This discovery enabled the new architecture to operate smoothly. A U.S. patent for the symmetric multiplexer, like the FFI VCO, is being pursued by the RPI patent office.

1.5.4. Receiver PLL
The critical circuit in the design of the receiver PLL was the phase detector (PD). Typically, a Hogge-type [31], [52] or a bang-bang type PD [30] is used in high speed serial receivers. The 20 Gb/s goal of this work required a PD to operate twice as fast using the same technology speed. A bang-bang or Hogge style PD with this speed capability would be difficult to design and would require a clock at the same frequency as the data. As a result, a new PD had to be developed. The new design, called a transition detector (TD), incorporates eight MS-latches, each clocked by a different phase of the VCO.
This allowed the data to be oversampled by a factor of two and both timing information and data to be collected.

1.6. SiGe 5 HP Overview
IBM’s 5 HP SiGe BiCMOS process incorporates 0.5 µm HBT transistors and 0.35 µm CMOS transistors. The epitaxially graded Ge base in the HBT allows fT speeds of up to 60 GHz. Also included in the technology are: high breakdown NPN transistors, gated lateral PNP transistors, polysilicon resistors, Metal-Insulator-Metal (MIM) capacitors, substrate contacts, precision oxide/nitride decoupling capacitors, Schottky barrier diodes, varactor diodes, PIN diodes, electro-static discharge (ESD) devices, last metal (LM) spiral inductors, resistors (NS, RN, and RI), and LM bondpads.

Between three and five layers of metal are provided at the back end of the line for interconnect (Serdes I was submitted on a DARPA multi-user wafer which only allowed three levels of metal; Serdes II was submitted through Sierra Monolithics and had the full five levels of metal). The first level of metal is for local interconnect and has a minimum width of 0.8 µm and a fixed thickness of 0.63 µm. The last, or highest, level, called LM, has a minimum width of 2.4 µm and a thickness of 2.07 µm. LM is typically used for bond and C4 pads, power and ground wiring, inductors, and MIM capacitors. An extension to the 5 HP process allows LM to be substituted with analog metal (AM), which is 4 µm thick and separated by 3 µm from the next layer of metal. AM is primarily used for inductors, which require low resistance and low capacitance to the substrate. Except for AM, all layers of metal are separated by 1.2 µm of silicon dioxide. The Cadence design kit from IBM provides full Spectre and HSpice models for the devices listed above. The kit allows the extraction of interconnect capacitance and resistance to enable full parasitic simulation. The section “IBM SiGe 5 HP” on page 156 describes important NPN HBT parameters in more detail. Appendix A.1. describes the turn-on characteristics of the transistor, specifically the collector current versus base-emitter voltage. The relationship between the collector current and the collector-to-emitter voltage is discussed in Appendix A.2. fT is a figure of merit for the transistor family, and its relation to the collector current is useful when biasing the transistor for maximum performance. A plot of the transistor fT versus collector current can be found in Appendix A.3.

1.7. Testing Equipment
Table 1-1 Equipment used for testing
| Type | Model | Specs | Usage |
|---|---|---|---|
| time-domain oscilloscope | Tektronix 11801C | 50 GHz | transmitter eye diagrams; time-domain jitter measurements |
| spectrum analyzers | Rohde & Schwarz FSEM 30; HP 8563E | 30 Hz to 26.5 GHz | VCO frequency response; VCO common mode response; VCO frequency versus power supply; VCO phase noise; transmitter PLL phase noise; receiver PLL phase noise |
| signal source | HP 4430B | < 1 GHz | low phase noise jitter measurements |
| signal source | HP 8350B | < 10 GHz | high frequency receiver measurements |
| power supply | Agilent E3631A | 3 ch. DC | LabView-controlled VCO frequency and supply response |
| 10 channel RF probes | GGB | > 1 GHz | all high speed RF measurements |
| 12 channel DC probes | GGB | < 1 GHz | simple control lines in Serdes II |
| LabView & GPIB | | | simplified the collection of most data, including VCO phase noise and responses |

1.8. Document Logistics
This thesis is sectioned into an abstract, six chapters, a conclusion, and appendices. This introduction is the first chapter; it describes the goals and motivations behind this project and discusses the state of the art, the novelty of this work, and the test equipment. The second chapter goes through the basic block diagram of a serial communication system and the function of each block. Chapters three and four detail the development and results of the two VCOs researched in this work.
Chapter five details the transmitter, including the PLL, architecture, and test structures. The last chapter discusses the receiver, its operation, and test results. Appendices include information on the SiGe process used in this work and circuit details of this technology. In addition, the last appendix has the top level schematics for the Serdes I and II chips. Three different Serdes designs were researched in this work. The first two were fabricated, and the third represents research for the future. Each design is designated by the names Serdes I, Serdes II, or Serdes III. Certain conventions were followed throughout this document. First, node names in schematics and within equations are in bold font, such as z20 and a11. Second, equation variables are italicized, as in fo and ω2. Third, in plots that contain both simulated and measured data, the simulated data is usually expressed as a dotted line and the measured data as a solid line. Fourth, for equations solved for the general case, the units are usually expressed as a function of the transistor size. This shows how the constants and variables change depending on the transistor size. In contrast, absolute units were used for specific circuits and fabricated circuits.

2 Serial Communication
The exchange of high speed serial data involves three primary components: transmitter, receiver, and transport channel. A transmitter (Tx) gathers low rate parallel data and transforms it into high speed serial data. The signal is then transported through the channel, potentially air or wire, to a receiver. The receiver (Rx) must then demodulate the signal, extract the clock, and demultiplex the data. The received information is fed out of the receiver as parallel data.

[Figure 2-1 Toplevel System Block Diagram. The transmitter accepts parallel data and serializes it to a NRZ signal. The receiver accepts the bit stream, extracts the clock and demultiplexes the data.]

2.1. Serial Communication Block Diagram
Shown above in Fig. 2-1 is a basic block diagram of a serial communication system. Although most systems do not look exactly like this, there is enough in common between this system and others to say that this diagram represents such systems fairly accurately.

2.2. Transmitter / Multiplexer / Clock Multiplier
The transmitter’s role is to accept a data word of a specified width, serialize it, and drive the data onto a channel. The width of the word depends on the application and is a function of the input and output bandwidths. For example, an 8 Gb/s serializer would require 16 bits at 500 Mbit/s or 64 bits at 125 Mbit/s. Serializing involves multiplexing the data into an ordered bit stream, which is typically in a non-return-to-zero (NRZ) format. Driving a channel may require only a simple 50 Ω amplifier, or it may require a more sophisticated circuit that is capable of driving an optical driver. Depending on the specifications, the accepted data may be encoded. The encoding process may include encryption, compression, bit stuffing, error checking, and framing [33]. Depending on the design of the receiver, it may be necessary to introduce additional transitions into the data to meet critical phase locked loop (PLL) specifications in the receiver. 8B/10B encoding is popular and guarantees at least one transition every 5 bits [34]. If channel alignment is required, meaning that bit 0 in the Tx comes out on bit 0 in the Rx, then encoding will be needed.
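The word-width arithmetic above, and the transition-density concern that motivates line coding, can be checked with a small sketch. The helpers are hypothetical: `lane_rate` divides the serial rate by the word width, `serialize` emits each word least significant bit first (an assumed bit order; a real serializer may send the MSB first), and `max_run_length` reports the longest run of identical bits, the quantity that a code such as 8B/10B bounds so that the receiver PLL sees regular transitions.

```python
def lane_rate(serial_rate: float, width: int) -> float:
    """Per-line rate of the parallel interface feeding a width-to-1 serializer."""
    return serial_rate / width


def serialize(words: list[int], width: int) -> list[int]:
    """Flatten parallel words into one serial bit stream, LSB first (assumed order)."""
    return [(w >> i) & 1 for w in words for i in range(width)]


def max_run_length(bits: list[int]) -> int:
    """Longest run of identical consecutive bits (0 for an empty stream)."""
    longest = run = 0
    prev = None
    for b in bits:
        run = run + 1 if b == prev else 1
        longest = max(longest, run)
        prev = b
    return longest
```

For example, `lane_rate(8e9, 16)` gives 5e8 (500 Mbit/s per line) and `lane_rate(8e9, 64)` gives 1.25e8, matching the figures in the text, while `max_run_length(serialize([0xFFFF, 0x0000], 16))` is 16, far longer than the run of 5 that 8B/10B permits. Unencoded data of this kind is exactly what can starve a receiver PLL of transitions.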
After possible encoding, the bits are stored in a register sized for the incoming word and the multiplexer width. When the multiplexer is narrower than the word, the bits may be fed into a shift register before being multiplexed [35]. This register and the subsequent multiplexer must be timed very carefully to ensure that bits are sampled correctly and that no races or runt pulses occur. Sometimes a first-in first-out (FIFO) buffer is added to relax the timing constraints between the data load clock and the reference clock.

The PLL clocks the multiplexer, and the multiplexer performs the serialization. This operation may require multiple stages, such as a 32-4 multiplexer followed by a 4-1 multiplexer, or simply a 16-1 multiplexer. Timing at this stage becomes more critical because the output rate of the multiplexer is the serial data rate. Often multiple clock phases or clock frequencies are needed.

The retiming circuit before the line driver re-establishes the transition locations in order to remove jitter or noise introduced by the registers and multiplexers [42]. This circuit is clocked directly by the PLL so that it is as noiseless as possible. When low output jitter is the limiting factor in the design, a retiming circuit is absolutely required. The retiming circuit, or multiplexer, is often unable to drive the pad and external load directly, so a line driver is needed [36], [37]. It matches the internal circuit impedance to the output impedance and, if necessary, amplifies the signal to the desired voltage swing.

Perhaps the most important circuit in the transmitter is the PLL, otherwise known as the frequency synthesizer or clock multiplier unit (CMU). It generates the internal clock signals, which may be multi-phase or multi-frequency. It must have low phase noise, low jitter, and low frequency drift to generate a similarly low phase noise data stream.
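As a sketch of the 16-1 case, the Python model below builds the serializer from a binary tree of 2:1 multiplexers and also gives the inverse tree of 1:2 demultiplexers that the receiver uses (Section 2.4). It is a purely logical model under our own assumptions: a power-of-two word width, and each 2:1 level switching on both clock phases so that it is clocked at half its output bit rate. It says nothing about the timing margins discussed above, and the lane pairing is one convenient choice, not necessarily the one used on the Serdes chips.

```python
def mux2(a, b):
    """2:1 multiplexer: alternate bits from two half-rate streams."""
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def tree_serialize(lanes):
    """Serialize 2**k parallel lanes through k levels of 2:1 multiplexers.

    Lane i is paired with lane i + half at every level, which makes the
    serial output come out in lane order (lane 0 first).
    """
    if not lanes or len(lanes) & (len(lanes) - 1):
        raise ValueError("number of lanes must be a power of two")
    while len(lanes) > 1:
        half = len(lanes) // 2
        lanes = [mux2(lanes[i], lanes[i + half]) for i in range(half)]
    return lanes[0]


def tree_deserialize(stream, width):
    """Inverse tree of 1:2 demultiplexers (even bits to one output, odd bits to the other)."""
    lanes = [stream]
    while len(lanes) < width:
        lanes = [lane[0::2] for lane in lanes] + [lane[1::2] for lane in lanes]
    return lanes


def tree_clock_rates(bit_rate_hz, width):
    """Clock frequency of each 2:1 (or 1:2) level, serial-side level first."""
    rates = []
    rate = bit_rate_hz
    while width > 1:
        rate //= 2          # each level runs at half the rate of the level before it
        rates.append(rate)
        width //= 2
    return rates


word = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
serial = tree_serialize([[bit] for bit in word])
print(serial == word, tree_clock_rates(10_000_000_000, 16))
```

For a 10 Gb/s stream and a 16-bit word, the level clocks halve from 5 GHz down to 625 MHz, the same sequence as the receiver demultiplexer example in Section 2.4.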
The transmitter PLL, as opposed to the receiver PLL, usually has a very low bandwidth in conjunction with a low phase noise VCO to generate the cleanest clock signal. The PLL locks the phase of an internal high speed clock to an externally supplied low speed reference. In this way the reference dictates the exact frequency at which data is transmitted. For instance, a 10 Gb/s system may have a 625 MHz reference clock and a 10 GHz internal clock. The PLL must then match the two frequencies after dividing the internal clock by 16.

The PLL consists of three basic components: a phase detector (PD), a loop filter (LF), and a voltage controlled oscillator (VCO). The PD generates a signal that is a function of the phase difference between the divided-down internal clock and the external reference. Because the PD in this case operates at the low reference rate (a 625 MHz clock versus a 10 Gb/s data rate), it can generate an accurate, linear measure of phase difference. The LF typically consists of an active filter with high DC gain, a specific bandwidth, and a high frequency pole. With most of the other gains and parameters in the PLL fixed, the LF is the only circuit that can be adjusted to meet the specifications. The VCO accepts a voltage input and generates an output signal whose frequency is a function of that input. Ideally this relationship is linear, which leads to closed-form linear solutions for the PLL.

One of the most important figures of merit for the transmitter is the output data jitter. Jitter is created inside the VCO and partially filtered out by the PLL. The retiming circuit and all circuits after it add slight jitter to the signal. The transmitter data eye closes horizontally as more jitter is introduced.

2.3. Transport Channel
The channel carries the data from the transmitter to the receiver, and may be electrical, optical, wireless, or any combination of the three.
For long-haul communication the channel is a significant and sometimes dominant source of phase noise and jitter. For short-haul communication, however, we assume that the channel's contribution is negligible.

2.4. Receiver / Demultiplexer / Clock & Data Recovery
The receiver must extract a clock from a very high frequency serial signal plagued with jitter and noise, and use that clock to sample the data. This process is called clock and data recovery, and it is made more difficult because transition locations are not guaranteed.

A line amplifier with a specific input impedance amplifies the signal to internal levels while minimizing distortion. The amplifier must have a large bandwidth, typically about 50% higher than the baud rate. Noise injection from this circuit must be minimized because the data signal is already saturated with jitter. When an optical channel is used, a photodiode drives the receiver input and a transimpedance amplifier is required.

The receiver has a PLL that is very different from the PLL in the transmitter. First, the PD must operate at or near the data rate, which requires a simpler circuit that may provide only a non-linear output. The PD must also be able to handle random data with random transition locations if the data is NRZ. In addition, the key PLL parameters must be tuned to a signal with high noise content, whereas the transmitter PLL has a low noise reference as its input. Additional circuitry is needed to sample the data using the recovered clock unless the PD does so inherently.

As in the transmitter, a reference clock may be used to bring the receiver VCO close to the data frequency before clock extraction occurs. This greatly enhances the operating range of the receiver PLL. The drawback is that two separate PDs and a circuit that can switch between them are needed. This introduces two loops built from common components, which must be able to operate independently.
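The frequency bookkeeping behind the 10 Gb/s example in Section 2.2 applies equally to a receiver reference loop. The Python sketch below, with hypothetical helper names, computes the VCO frequency for a given clocking ratio and the feedback divide ratio N that gives f_vco = N·f_ref in lock; the quarter-rate case corresponds to the 20 Gb/s design with a 5 GHz VCO discussed in Chapter 3. The linear VCO function is the idealized model mentioned in Section 2.2, and its center and gain values are made up for illustration.

```python
def vco_frequency_hz(bit_rate_hz: int, clock_fraction: int = 1) -> int:
    """VCO frequency for full-rate (1), half-rate (2) or quarter-rate (4) clocking."""
    frequency, remainder = divmod(bit_rate_hz, clock_fraction)
    if remainder:
        raise ValueError("bit rate is not divisible by the clock fraction")
    return frequency


def divide_ratio(f_vco_hz: int, f_ref_hz: int) -> int:
    """Feedback divider N such that, in lock, f_vco = N * f_ref."""
    n, remainder = divmod(f_vco_hz, f_ref_hz)
    if remainder:
        raise ValueError("VCO frequency is not an integer multiple of the reference")
    return n


def linear_vco_hz(v_ctrl, f_center_hz, v_center, k_vco_hz_per_v):
    """Idealized linear VCO: f = f_center + K_vco * (v_ctrl - v_center)."""
    return f_center_hz + k_vco_hz_per_v * (v_ctrl - v_center)


f_vco = vco_frequency_hz(10_000_000_000)          # full-rate 10 Gb/s
print(f_vco, divide_ratio(f_vco, 625_000_000))    # 625 MHz reference
print(vco_frequency_hz(20_000_000_000, 4))        # quarter-rate 20 Gb/s
print(linear_vco_hz(0.1, 10e9, 0.0, 1e9))         # hypothetical 1 GHz/V gain
```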
A common component in dual-loop PLLs is a lock detect circuit, which determines whether phase lock has been lost; if it has, the loop switches back to the external reference loop. This circuit is useful in a high noise environment where data jitter can cause the PLL to become unstable. It also allows the software layer to be notified so that lost data can be resent.

Once a clock has been extracted from the serial signal and the data captured, the data can be demultiplexed through a series of samplers at decreasing clock rates. For instance, in a 10 Gb/s system the first resampled data would pass through a 1-to-2 demultiplexer driven by a 5 GHz clock. The second stage would consist of two 1-to-2 demultiplexers driven by a 2.5 GHz clock, and so on. If a multiphase clock is used, multiple samples can be taken with separate samplers. This allows the use of a clock at a fraction of the data bit rate.

One of the most important parameters in the design of the receiver PLL is its jitter transfer function, which determines how sensitive the system is to data jitter. The PLL should track low frequency jitter very well; in this region the jitter transfer function should be close to 0 dB. At high frequencies the transfer function should roll off in accordance with the loop bandwidth. Another important parameter is jitter peaking, the amount by which the jitter transfer function rises above 0 dB. This parameter describes the amplification of high frequency jitter components such as those from spurious modulation. It is especially important in SONET repeaters that feed the recovered receiver clock into a separate transmitter, because a chain of many repeaters is very sensitive to this form of jitter.

After the data is fully demultiplexed to the desired parallel width, it can be decoded according to the encoding scheme used in the transmitter. In some cases this also involves channel framing, which lines up transmitter input channel n with receiver output channel n.
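The jitter transfer and jitter peaking behavior described above can be illustrated with the textbook second-order, type-II PLL model, whose jitter transfer function is H(s) = (2ζω_n s + ω_n²)/(s² + 2ζω_n s + ω_n²). The loop order is not specified at this point in the text, so this model, the 1 MHz natural frequency f_n, and the damping factors ζ below are assumptions chosen only to show the shape: 0 dB tracking at low frequency, roll-off above the loop bandwidth, and peaking that shrinks as damping increases.

```python
import math


def jitter_transfer_db(f_hz, f_n_hz, zeta):
    """|H(j*2*pi*f)| in dB for H(s) = (2*zeta*wn*s + wn^2) / (s^2 + 2*zeta*wn*s + wn^2)."""
    s = 1j * 2 * math.pi * f_hz
    wn = 2 * math.pi * f_n_hz
    h = (2 * zeta * wn * s + wn**2) / (s**2 + 2 * zeta * wn * s + wn**2)
    return 20 * math.log10(abs(h))


def jitter_peaking_db(f_n_hz, zeta, points=4000):
    """Maximum of the jitter transfer in dB, scanned on a log grid from f_n/100 to 100*f_n."""
    return max(
        jitter_transfer_db(f_n_hz * 10 ** (-2 + 4 * k / points), f_n_hz, zeta)
        for k in range(points + 1)
    )


f_n = 1e6  # hypothetical natural frequency
for zeta in (0.5, 0.707, 2.0):
    print(f"zeta={zeta}: peaking {jitter_peaking_db(f_n, zeta):.2f} dB, "
          f"H(100 f_n) {jitter_transfer_db(100 * f_n, f_n, zeta):.1f} dB")
```

Because this loop has a zero, its peaking never reaches exactly 0 dB; heavy damping only reduces it, which is why repeater chains are designed with large damping factors.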
Once the data is decoded, it may, as in the transmitter, be placed in a FIFO to relax the timing constraints on the received-data clock.

2.5. Internal Testing
Internal testing involves performance verification of the transmitter and receiver before and after they are connected in a complete system. For a chip with both transmitter and receiver components, this may involve a loopback path across the chip from the output of the Tx to the input of the Rx. The parallel data from the Tx and Rx can then be compared to determine the bit error rate (BER). Additional testing modes may involve additional outputs that show the health of the system [38]. Outputs may also be duplicated and fed to test equipment while actual data is being transmitted.

2.6. Support Circuits
Other circuitry may be needed in the system depending on the application. For example, if a transmitter and receiver are required to operate at different fixed frequencies, selectors and special input pins are required. Also, circuits within the chip may not be needed all the time, and in some cases a power management system can cut off their power. This option reduces overall power consumption but requires additional power-switching circuits.

3 Current Starving VCO

3.1. Project History
The Current Starving VCO (CS VCO) was used exclusively in the first serdes design, fabricated in February 1999, in the transmitter, the receiver, and various oscillator test structures. Its performance was sufficient, but the design required some revision to meet frequency specifications. Its deficiencies and unpredictable behavior, however, resulted in its elimination from all subsequent designs. The feed forward version of the CS VCO was not intended for use in the transmitter and receiver design. It was instead designed to push the upper frequency limit of the ring oscillator design.
However, it had the potential to be used in future transmitter and receiver designs to double the speed to 40 Gb/s.

3.2. The need for a VCO
PLLs, frequency locked loops (FLLs), clock extractors, and frequency synthesizers all require a voltage controlled oscillator. These circuits create one or many signals whose frequency is a function of an external control voltage. In a PLL or clock extractor, a DC voltage is generated based upon the difference between the VCO signal and an external signal. This voltage is then fed back into the VCO to create a stable phase feedback loop. Frequency synthesizers incorporate frequency dividers to create signals of various frequencies from the VCO's fixed frequency. VCOs for Serdes circuits are usually either LC (inductor-capacitor) oscillators or ring oscillators, each with benefits and drawbacks. All VCOs discussed in this section are four stage ring oscillators, which produce eight unique phases when used with differential logic. The architecture of the receiver and transmitter requires this crucial multiple-phase characteristic.

3.3. Simple Current Starving VCO
The Simple CS ring oscillator has four stages [39], shown in Fig. 3-1, and is able to create eight unique phases. The frequency of oscillation is defined by

$$ f = \frac{1}{2 \cdot 4T} \tag{3-1} $$

where T is the delay through one stage. A factor of two is necessary because after a signal passes through the four stages it has only changed sign, and it requires another trip through all four to complete one period. The frequency and gain response for this oscillator is shown in Fig. 3-2.

Figure 3-1 Four stage VCO diagram (stages A, B, C, D with phase outputs ΦA to ΦD and stage delay T)

Frequency control is accomplished through variable delay elements arranged in a ring with an odd number of inversions. The operating frequency range is a function of the delay element range and the number of stages in the ring. The Simple CS stage is a buffer with level two emitter followers, described in Appendix B.4. on page 162. The current source of the differential circuit is connected to the aVref circuit in order to control its current.

3.4. Basic Operation
Current starving VCOs control their frequency by varying the delay through each stage of the ring. Each stage has a differential amplifier with one or more adjustable current sources at the bottom of the tree. In this way, the stage increases its delay as its current decreases. This effect is primarily a result of the lower current reducing the fT of the transistor, as shown in Appendix A.2. on page 158. Even though the smaller current has less capacitor charging ability, the associated smaller voltage swing compensates for it, producing no net effect on delay.

Figure 3-2 Current Starving VCO frequency and gain response (frequency in GHz and gain in GHz/V versus control voltage in V)

The CS VCO's usable frequency range lies between control voltages of -1.5 V and -1.0 V or higher. The lower end of the range is limited by the small voltage swing on the output. These simulation results were obtained with one minimally sized buffer on each stage's output. Interconnect parasitics were not included.

Even though current starving is a simple technique for controlling delay, it has numerous disadvantages. The first obvious problem is that undesirable conditions occur at the limits of the control voltage. At the minimum extreme, the current can be decreased to the point that sustained oscillations can no longer occur, because the voltage swing decreases and the gain drops below one. At the maximum extreme, the transistors pass the peak of the fT curve, fT begins to drop off, and the transistors begin to slow. This is potentially disastrous in a phase locked loop because the VCO gain becomes negative and the loop becomes unstable.
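Eq. (3-1) and the eight-phase property of the four stage differential ring can be captured in a few lines of Python. The function names are ours, and the generalization to N stages simply replaces the 4 in (3-1) with N; with differential logic each stage provides a true and a complement output, so the 2N phases are spaced one stage delay apart.

```python
def ring_frequency_hz(stage_delay_s: float, stages: int = 4) -> float:
    """Eq. (3-1) for N stages: an edge must pass through the ring twice per period."""
    return 1.0 / (2 * stages * stage_delay_s)


def stage_delay_s(frequency_hz: float, stages: int = 4) -> float:
    """Inverse of Eq. (3-1): per-stage delay needed for a target frequency."""
    return 1.0 / (2 * stages * frequency_hz)


def phase_step_deg(stages: int = 4) -> float:
    """Spacing of the 2*stages differential output phases, in degrees."""
    return 360.0 / (2 * stages)


print(ring_frequency_hz(25e-12) / 1e9, "GHz for a 25 ps stage delay")
print(phase_step_deg(), "degrees between adjacent phases")
print(stage_delay_s(5.75e9) * 1e12, "ps per stage for 5.75 GHz")
```

At 5 GHz a 25 ps stage delay puts the eight phases 45 degrees apart, one edge every 25 ps.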
Another problem is that the delay as a function of current is non-linear. Fig. 3-2 shows the basic frequency response for the Simple CS VCO excluding interconnect parasitic effects. The gain varies from 3.0 GHz/V to 0.5 GHz/V along the curve and is never constant. A non-linear gain makes phase locked loops difficult to design. The output voltage swing is also a concern because as the current increases, the voltage swing across the pull-up resistors also increases. This alters the load driving ability and creates a situation that is difficult to model analytically. Another problem is that the single-ended control voltage does not possess the common-mode noise immunity inherent in differential wiring. When phase noise is a dominant design factor, this architecture can be quite limiting. There are benefits to this style of ring VCO, including its simplicity and large tuning range. The layout footprint is also quite small, which minimizes interconnect delays.

3.4.1. Adjustable Voltage Reference

Figure 3-3 Adjustable Voltage Reference (inputs Vctrl and Vee; resistors R1 and Re; reference current Ir; output aVref). The input voltage controls the total current through this circuit. In turn this current is mirrored to all connected sources.

The active current sources in the CS stages mirror a reference circuit that can vary its current as a function of a single-ended input voltage, as depicted in Fig. 3-3. The current through the reference circuit, and its derivative with respect to the control input, are defined by the following equations:

$$ I_r = \frac{V_{ee} + V_{ctrl} - 3V_{be}}{R_1 + R_e} \tag{3-2} $$

$$ \frac{dI_r}{dV_{ctrl}} = \frac{1}{R_1 + R_e} \tag{3-3} $$

The emitter resistor, Re, is matched to the emitter resistors of the current sources so that the same voltage exists across both. R1 determines the current gain of the circuit, and its value is selected based upon the input voltage swing and the required output current swing.
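Eqs. (3-2) and (3-3) can be evaluated directly, and a numerical derivative confirms that the reference gain is constant because (3-2) is linear in V_ctrl. In the Python sketch below we read V_ee in (3-2) as the magnitude of the negative supply, so that the numerator is positive over the control range; that reading, and every component value, are assumptions for illustration and are not the values used on the chip.

```python
def reference_current_a(v_ctrl, v_ee, v_be, r1_ohm, re_ohm):
    """Eq. (3-2): current through the adjustable voltage reference."""
    return (v_ee + v_ctrl - 3.0 * v_be) / (r1_ohm + re_ohm)


def reference_gain_a_per_v(r1_ohm, re_ohm):
    """Eq. (3-3): dIr/dVctrl, independent of the operating point."""
    return 1.0 / (r1_ohm + re_ohm)


# Hypothetical values: |Vee| = 5.2 V, Vbe = 0.85 V, R1 = 2 kOhm, Re = 500 Ohm.
V_EE, V_BE, R1, RE = 5.2, 0.85, 2000.0, 500.0

for v in (-1.6, -1.2, -0.8):  # the -1.6 V to -0.8 V control range used for this VCO
    i_r = reference_current_a(v, V_EE, V_BE, R1, RE)
    print(f"Vctrl = {v:+.1f} V -> Ir = {i_r * 1e3:.3f} mA")

# A central difference reproduces Eq. (3-3).
dv = 1e-3
slope = (reference_current_a(-1.2 + dv, V_EE, V_BE, R1, RE)
         - reference_current_a(-1.2 - dv, V_EE, V_BE, R1, RE)) / (2 * dv)
print(slope, reference_gain_a_per_v(R1, RE))
```

The sketch makes the design trade explicit: R1 + Re sets the current gain, so a larger R1 trades current range for finer control resolution.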
An additional diode is added to decrease the voltage drop across R1, allowing a smaller resistor. A common approach to designing a current mirror is to include base-current compensation through a transistor located on the output (see Appendix B.3. on page 161). This allows the current reference to drive more loads and lessens the current degradation as more loads are added. The problem with this approach is that it limits the frequency response of the circuit, so it was not included in the design. The current driving capability of the circuit without base-current compensation should be sufficient to drive a single VCO with an equivalent of 8 µm of loading.

3.4.2. Final Implementation
The development of the transmitter and receiver played a defining role in the design of this VCO. To meet a goal of 20 Gb/s with a quarter-rate architecture, a VCO centered at 5 GHz was needed. A control voltage range from -0.8 V to -1.6 V was chosen because of the solid transfer characteristics, and because those limits correspond to one and two Vbe drops. At the center of the control range a frequency of 5.75 GHz was achieved, corresponding to a 15% safety margin.¹

¹ This safety margin was built in because parasitic simulations were not done prior to fabrication. It was felt that a margin greater than 10% would adequately account for interconnect effects.

Symmetry was the leading motivation behind the layout of the Simple CS VCO shown in Fig. 3-4. The four stages were laid out in a square with the inputs and outputs facing the center. In this way the interconnect between stages could be limited to a small region in the center of the design. Power and ground rails, as well as the two reference rails (aVref, Vref), were placed in closed concentric LM rings around the top.

Figure 3-4 Layout of Simple CS VCO. Shown above is the layout for the Simple CS VCO. All inputs and outputs face inward to minimize the effects of interconnect parasitics.
Symmetry was the most important design requirement.

In addition to the CS VCOs in the transmitter and receiver, a separate test chip containing CS VCOs was also made. This allowed a more straightforward measurement of the VCO's frequency and gain characteristics. This test chip also included an XOR phase multiplier [3],[4],[20] tree in order to reach frequencies double and quadruple the nominal 5 GHz. The goal of the multipliers was only to see how far the technology could be pushed.

3.4.3. Testing Results
The plot in Fig. 3-5 shows the results from an ideal interconnect simulation, a simulation with capacitive¹ interconnect, and measured results from the fabricated circuits. The 20% decrease in speed between the ideal simulation and the measured results is immediately obvious. Unfortunately this was larger than the 15% safety margin and resulted in a frequency range that did not meet the 5 GHz center frequency specification. Between control voltages of -1.6 V and -1.4 V the measured VCO tracked expectations very closely, but above -1.4 V the VCO response becomes lethargic. This is likely due to too much current in the tree, which causes fT to fall faster than the model predicts.

¹ The IBM 1999B SiGe design kit does not include interconnect resistances correctly and typically simulates with a faster response than with capacitance only. Resistance values are also very small and can be ignored for these localized wires. For these reasons, only capacitance was included.

Figure 3-5 Test data from Simple CS VCO (frequency in GHz versus control voltage in V; curves: Simulated, Parasitics, Measured). Simulations with and without interconnect parasitics, and measured results, are shown in this plot. Measured results track closely with the parasitic simulation at low control voltages.

3.4.4. Optimization of Simple CS VCO (post-fabrication)
From Fig. 3-5 it is clear that the oscillator underperformed and missed the 5 GHz target. This can be directly attributed to initial simulations that did not include resistive and capacitive interconnect parasitics. Although the layout footprint of the VCO is very small and designed to minimize wire lengths, parasitics still had a significant influence on speed. The receiver VCO has a frequency range of 4.25 GHz to almost 4.9 GHz. Because 20 Gb/s is the target data rate, 5 GHz should fall in the middle of the operating range of both the transmit and receive VCOs. Given that the initial design was slow, how can it be ensured that the next version will meet specifications? Can the measured and simulated results be used to maximize the likelihood of a successful design?

Each of the four VCO stages must be loaded by an identical buffer, which then drives subsequent circuitry. By using the smallest transistors, 1 µm, in the buffers, the loading on the VCO is minimized and its operating frequency is maximized. Under such conditions the easiest method for increasing the frequency response is to increase the power of the delay elements by using larger transistors. This has the immediate effect of reducing the effective loading on each gate and increasing the frequency at a given control voltage. The devices in the first design iteration had 2 µm emitter lengths and were slightly slow, so an increase in emitter length should bring the VCO within specifications. Fig. 3-6 shows the relationship between frequency response and the transistor size used in the delay stages of the VCO. Because interconnect parasitic simulations require a complete layout, this simulation uses ideal interconnects. As expected, performance increases when larger devices are used.
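The margin and resizing arithmetic of Sections 3.4.2 to 3.4.4 can be laid out explicitly, as in the Python sketch below. Two points are our assumptions: "effective loading" is taken as the buffer emitter length divided by the delay-element emitter length, which reproduces the 0.5 and 0.4 µm/µm figures quoted for 1 µm buffers, and the 12% speed-up at -1.5 V is read from the ideal-interconnect simulation of Fig. 3-6.

```python
DESIGN_CENTER_GHZ = 5.75   # simulated center frequency (Section 3.4.2)
TARGET_GHZ = 5.0           # required center for 20 Gb/s quarter-rate
MEASURED_SLOWDOWN = 0.20   # measured versus ideal simulation (Section 3.4.3)

margin = DESIGN_CENTER_GHZ / TARGET_GHZ - 1.0
expected_center = DESIGN_CENTER_GHZ * (1.0 - MEASURED_SLOWDOWN)  # falls below the target


def effective_loading(buffer_emitter_um, delay_emitter_um):
    """Assumed definition: load (buffer) emitter length per delay-element emitter length."""
    return buffer_emitter_um / delay_emitter_um


old_load = effective_loading(1.0, 2.0)      # first design iteration
new_load = effective_loading(1.0, 2.5)      # proposed resizing
load_reduction = 1.0 - new_load / old_load

SPEEDUP = 0.12                              # read from Fig. 3-6 at -1.5 V
measured_range = (4.25, 4.9)                # receiver VCO, GHz
projected_range = tuple(f * (1.0 + SPEEDUP) for f in measured_range)

print(f"margin {margin:.0%}, center after 20% slowdown {expected_center:.2f} GHz")
print(f"loading {old_load:.2f} -> {new_load:.2f} um/um ({load_reduction:.0%} lower)")
print(f"projected range {projected_range[0]:.3f} to {projected_range[1]:.3f} GHz")
```

A 20% slowdown on top of a 15% margin leaves the center near 4.6 GHz, consistent with the measured receiver range. The projected upper limit computes to 5.488 GHz, which the text quotes as 5.48 GHz.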
Figure 3-6 Frequency Response versus emitter length in delay elements (frequency in GHz versus control signal in V, for emitter lengths of 2, 2.5, 3, 4, 6, and 10 µm). By increasing the emitter lengths and keeping the loading the same, the effective loading is decreased and the performance improves. This simulation does not include interconnect parasitics.

It can be seen that a relatively small increase in transistor size from 2 µm to 2.5 µm achieves a 12% increase in speed at a control voltage of -1.5 V. The 2 µm and 2.5 µm delay elements have effective loadings of 0.5 µm/µm and 0.4 µm/µm respectively, a 20% decrease. Assuming that the interconnect parasitic effects stay the same or decrease, the 2.5 µm delay elements should bring about a 12% increase in the VCO response. A 12% improvement on the measured range of 4.25 GHz to 4.9 GHz yields a range of 4.76 GHz to 5.48 GHz, which brings the 5 GHz target within the operating range.

3.5. Current Starving with Feed Forwarding
Some advantages of the four stage simple VCO circuit include symmetric phases that minimize phase differences, generation of rising edges every 25 ps at 5 GHz, and a large frequency range. The motivation for a new VCO design is to increase the frequency beyond the limits of this simple design.

One method is to use a delay cell that averages the signals from the last two stages, as shown in Fig. 3-7 [1],[13],[23],[24]. Stage C accepts inputs from stage B and stage A, stage D accepts inputs from C and B, and so on. The idea is that the average of the previous two signals occurs earlier than the previous signal alone.

Figure 3-7 Feed-forward CS VCO block diagram. Each stage in the VCO receives signals from the previous stage and the stage preceding that one. Stage A can realize an effective decrease in delay by utilizing the signal from stage C. The inversions to induce oscillations are left out for clarity.
Mathematically, the nth element presents its output after the average of the (n-1)st and (n-2)nd element outputs plus the delay of the nth element:

$$ t_n = \frac{t_{n-1} + t_{n-2}}{2} + T_i \tag{3-4} $$

In steady state consecutive edges are equally spaced by some Δ, so substituting t_{n-1} = t_n - Δ and t_{n-2} = t_n - 2Δ into (3-4) gives 3Δ/2 = T_i, that is,

$$ t_n - t_{n-1} = \frac{2}{3} T_i \tag{3-5} $$

which shows that the effective gate delay is reduced to two thirds of the intrinsic stage delay, T_i. The intrinsic delay is defined as the delay of the stage if its inputs were tied together and it were treated as a normal buffer.

Figure 3-8 Feed forward CS VCO frequency response and gain (frequency in GHz and gain in GHz/V versus control voltage in V). The Feed Forward CS VCO was designed to achieve the highest frequency possible. After optimization it operates at twice the speed of the Simple CS VCO.

3.5.1. Final Implementation
An important consideration in the design of the feed-forward delay element is its higher complexity: having two inputs instead of one increases the delay. Also, because there are twice as many wires between stages in the feed-forward design, the layout will be larger and more limited by interconnect parasitics. With this in mind, the simplest possible averaging circuit was created, using a minimum number of additional transistors and resistors. The final schematic is shown in Fig. 3-9. Its operation is as follows. If Q2 and Q4 are on, Q1 and Q3 are off, and signal b arrives first, then signal b will begin to turn Q3 on and Q4 off. This will start to draw current through Rc1. If b were to switch completely, both Rc1 and Rc2 would carry the same current: an undesirable condition in which the output is the average of a one and a zero, which is undefined. The normal operating condition involves b partially switching, followed by the beginning of a switch in the a signal.
When this occurs, more current flows through Rc1 and less through Rc2. The effective switching input can be said to occur between the two signals, a and b.

Figure 3-9 Feed-forward CS Delay Element (transistors Q1 to Q4, collector resistors Rc1 and Rc2, control nodes aVref and Vref). This circuit operates by averaging the a and b inputs through common pull-up resistors. The aVref node is varied in order to control the total current through the tree. Lower current corresponds to longer delay.

One important characteristic of the two current starving VCO circuits is the choice of collector resistors, which affects the output amplitude and the gate delay. An increase in resistance causes an increase in amplitude and an increase in delay, because the same amount of current produces a larger voltage swing and a larger RC time constant. The simple CS VCO was designed around an operating frequency of 5 GHz, so a resistance was chosen that gave a 200 mV to 400 mV swing around 5 GHz. The feed-forward CS VCO, on the other hand, was designed to achieve the highest possible frequency response, so a resistor small enough to maximize the frequency while leaving a 150 mV to 200 mV swing was used. Fig. 3-8 shows the frequency response of the feed-forward CS VCO.

3.5.2. Testing results
The feed-forward CS VCO was not used in the first transmitter and receiver design but was implemented in a test chip. It was configured with one load to achieve the smallest loading effect and thus the highest frequency. The simulation and measured results are plotted in Fig. 3-10.

Figure 3-10 Testing Data from feed-forward CS VCO (frequency in GHz versus control voltage in V; curves: 1 Load Simulated, 4 Loads Simulated, 1 Load w/ Parasitics, 1 Load Measured). The implementation of the Feed Forward Current Starving VCO had only a single load in order to achieve the highest frequency possible.
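The steady-state result (3-5) can be confirmed by simply iterating the edge-time recurrence (3-4) from arbitrary starting times. Writing d for the stage spacing, (3-4) gives d_new = T_i - d/2, so the error halves and flips sign on every step and the spacing settles quickly at 2T_i/3. The Python sketch below does this; the 30 ps intrinsic delay is a hypothetical value.

```python
def settled_stage_spacing(t_intrinsic, iterations=60):
    """Iterate Eq. (3-4) and return the steady-state spacing t_n - t_(n-1)."""
    t_n2, t_n1 = 0.0, t_intrinsic            # arbitrary starting edge times for stages n-2, n-1
    for _ in range(iterations):
        t_n = (t_n1 + t_n2) / 2.0 + t_intrinsic   # Eq. (3-4)
        t_n2, t_n1 = t_n1, t_n
    return t_n1 - t_n2


T_I = 30.0  # hypothetical intrinsic stage delay, ps
spacing = settled_stage_spacing(T_I)
print(f"effective delay {spacing:.6f} ps = {spacing / T_I:.6f} x Ti")
# Four stage ring, period = 8 x stage spacing; 1/ps = 1000 GHz
print(f"ring frequency {1e3 / (8 * spacing):.2f} GHz "
      f"vs {1e3 / (8 * T_I):.2f} GHz without feed forward")
```

The ideal 1.5x frequency gain from (3-5) is an upper bound; the extra input of the averaging stage raises its intrinsic delay, as noted in Section 3.5.1.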
The measured results are only about 4% lower than simulations with interconnect included. Simulations with one load and no parasitics show a peak frequency of 12 GHz. With parasitics the frequency drops by 6% to 11 GHz, which tracks very closely with the measured results. The steep drop-off of the measured results at the high end is likely due to a high collector current causing a drop in transistor fT that is not accurately accounted for in the models.¹

¹ This is supported by information gathered at a meeting at IBM in 1999 concerning measured results from the DARPA 2 run. An IBM device modeler was quoted as saying that the fT curves drop off faster than the models predict.

3.6. Conclusions and Future Work
The Current Starving VCOs presented in this section are compact and easy to implement, but they have some crucial deficiencies. Their performance was about 5% worse than expected from simulations with interconnect parasitics. Feed forwarding allowed a near doubling of speed at the expense of a slightly more complicated circuit. If implemented correctly, this additional speed could be traded off for a reduction in noise. With an increase in the power supplied to the VCO that was implemented, the desired specifications should be achieved. However, future research into this VCO topology should be limited because its response is difficult to model and it utilizes a delay strategy that is poorly understood.

4 Feed Forward Interpolated VCO

4.1. Project History
The Feed Forward Interpolated VCO evolved from the Current Starving Feed Forward VCO and replaced all instances of that VCO in the second serdes chip, submitted in March 2000. Additional test structures were added to further exercise this VCO, and an invention disclosure record was submitted to RPI in May 2000. An RPI provisional patent was awarded in September 2000.

4.2. The Evolution
The evolution of the Feed Forward Interpolated VCO (FFI VCO) began with the Feed Forward Current Starving VCO (FFCS VCO) discussed in Chapter 3. Each stage of the FFCS VCO averaged the output from the previous stage and the stage before that to generate a signal with a smaller effective delay. The averaging was fixed and reduced the effective delay to two thirds of the intrinsic stage delay, a reduction of about 33%.

A common approach in the design of a standard ring oscillator stage without feed forwarding is to use delay interpolation, as shown in Fig. 4-1. The idea is to split the input signal into a slow and a fast path and create a weighted sum of the two to form the output. Common pull-up resistors, level 3 control inputs, and emitter resistors for linearity make this possible. The slow path need only delay the signal longer than the fast path, and a simple capacitor can do the trick. The benefits of this VCO stage include a uniform output voltage swing, a fairly linear response, no limits of operation, and easy minimum frequency control through the capacitor.

Figure 4-1 Schematic for Delay Interpolated VCO element. This VCO element linearly interpolates the input signal after it travels through a fast and a slow path. The slow path is created with the addition of a capacitor.

The idea for the FFI VCO came from looking at the Delay Interpolated VCO and realizing that the fast path could be implemented as the signal from the stage before the previous stage, and the slow path as the signal from the previous stage. This insight immediately eliminated the need for the slow path capacitor and nearly doubled the speed of the VCO. The FFI VCO is a delay interpolated VCO with the normal and delayed signals created from different stages rather than from within each stage. This forces each stage to have two inputs rather than one and eliminates the need for the slow path capacitor. The schematic for the FFI stage can be found in Fig. 4-7 on page 44.

4.3. Basic Operation
On a block diagram level, the FFI VCO looks identical to the Feed Forward Current Starving VCO, as shown in Fig. 4-2. The difference is in the method used to control the delay through each stage. The FFCS VCO controls delay by varying the current through its buffer, which is directly related to the delay through its gate; the feed forward technique simply reduces the effective gate delay by about 33%. The FFI VCO, on the other hand, linearly interpolates the signals received from the previous two stages. The current, which remains the same through the tree, is gradually shifted between the two inputs, p and l, as shown in Fig. 4-7. The p (previous) input arrives from one stage back, and the l (leap) input arrives from the stage before that. The two signals are weighted by the control signal and summed by the common pull-up resistors. The final result is the frequency response shown in Fig. 4-4.

Figure 4-2 Feed Forward VCO block diagram (stages A to D; stage n receives stages n-1 and n-2). Each stage in the VCO receives signals from the previous stage and the stage preceding that one. Stage A can realize an effective decrease in delay by utilizing the signal from stage C. (The inversions, to induce oscillations, are left out for clarity.)

Figure 4-3 FFI VCO under boundary conditions. Diagram (a) shows the VCO running in the four stage configuration with the control voltage set to a minimum value. Diagram (b) shows the VCO in the two stage configuration, at the maximum control voltage.

The minimum operating frequency is defined by the oscillation of the system when the leap signal is ignored and only the previous signal is used. In this case, the system is running as a four stage oscillator and has a frequency of about 3.9 GHz. When the control voltage is switched in the other direction, the leap signal is used and the previous stage's output is ignored.
In this configuration the system runs as two separate two stage ring oscillators with a frequency of approximately 7.9 GHz. These two cases are depicted in Fig. 4-3. It is useful to describe the system in terms of an effective delay for all control voltages between the minimum and maximum values.

Figure 4-4 Feed-forward interpolated simulated response (frequency, 3.5 to 8.0 GHz, versus control voltage, -0.2 to 0.2 V). The frequency response of the FFI VCO is linear across a large range, from 4.75 GHz to 7.00 GHz. System gain is flat across the operating range.

The effective delay of a stage is defined as the delay of a stage in a four stage oscillator that has the same frequency as the feed forward oscillator. This parameter can be found by setting the intrinsic delay of a stage to T, setting s equal to the weighting factor between 0 and 1, and looking at the output transition times of stages n, n-1, and n-2. The weighting factor is a constant that indicates how much of the leap signal is being used: set to 0, the ring acts as a normal four stage oscillator, and set to 1, it acts as a two stage oscillator. The edge time of stage n is given by

$$ t_n = T + s\,t_{n-2} + (1 - s)\,t_{n-1} \tag{4-1} $$

which is the intrinsic delay through the stage plus the weighted sum of the edge times of the previous two stages. Solving for the time difference between two consecutive stages yields

$$ T_{eff} = t_n - t_{n-1} = \frac{T}{1 + s} \tag{4-2} $$

$$ f_{vco} = \frac{1}{8\,T_{eff}} = \frac{1 + s}{8\,T} \tag{4-3} $$

which are the effective delay and the frequency of the VCO in terms of the intrinsic delay of each stage. The factor of eight is needed because it takes two complete trips through the four stages to make one period of the VCO. For s equal to 0, the effective delay equals the intrinsic delay of the stage. At the other extreme, when s equals 1, the effective delay is one half of the intrinsic delay.
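The recursion (4-1) can be iterated directly to confirm the closed forms (4-2) and (4-3). The following is a minimal sketch, not the author's simulation: the intrinsic delay T = 31.25 ps is a hypothetical value (chosen so that s = 0.5 gives 6 GHz), and the two starting edge times are arbitrary. The last helper applies the same linear recursion to a single-edge timing disturbance. The next edge moves by (1 - s) of the disturbance, but because e_n + s·e_{n-1} is conserved, the disturbance settles at 1/(1 + s) of its size, which is the two-thirds figure (at s = 0.5) used in the phase noise comparison of Section 4.7.3.

```python
def simulate_edges(T, s, n_edges=300):
    """Iterate the edge-time recursion of Eq. (4-1):
    t_n = T + s*t_{n-2} + (1 - s)*t_{n-1}.
    The two starting edges (0 and T) are arbitrary initial conditions."""
    t = [0.0, T]
    for _ in range(n_edges):
        t.append(T + s * t[-2] + (1.0 - s) * t[-1])
    return t


def effective_delay(T, s):
    """Closed-form effective stage delay, Eq. (4-2)."""
    return T / (1.0 + s)


def vco_frequency(T, s):
    """VCO frequency, Eq. (4-3): two trips around the four stages per period."""
    return 1.0 / (8.0 * effective_delay(T, s))


def disturbance_shift(s, delta=1.0, n_edges=200):
    """Long-term shift of later edges after one edge is moved by delta.
    Because (4-1) is linear, the shift obeys e_n = s*e_{n-2} + (1 - s)*e_{n-1};
    e_n + s*e_{n-1} is conserved, so the shift settles at delta / (1 + s)."""
    e_prev2, e_prev1 = 0.0, delta
    for _ in range(n_edges):
        e_prev2, e_prev1 = e_prev1, s * e_prev2 + (1.0 - s) * e_prev1
    return e_prev1


if __name__ == "__main__":
    T = 31.25e-12  # hypothetical intrinsic stage delay, seconds
    for s in (0.0, 0.25, 0.5, 0.75, 0.9):
        t = simulate_edges(T, s)
        settled = t[-1] - t[-2]  # spacing of the last two edges
        print(f"s={s:4.2f}  T_eff sim={settled * 1e12:6.2f} ps  "
              f"(4-2)={effective_delay(T, s) * 1e12:6.2f} ps  "
              f"f={vco_frequency(T, s) / 1e9:5.2f} GHz")
    print("disturbance kept at s = 0.5:", disturbance_shift(0.5))
```

Setting s = 1 in `simulate_edges` makes the edge spacing alternate between T and 0 instead of settling, a numerical hint of the stage decoupling discussed in Section 4.4.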
The halving of the effective delay at s = 1 makes sense, because the system in this configuration has two stages rather than four. (4-2) also shows that the Feed Forward CS VCO, where s is fixed at 0.5, has an effective delay equal to (2/3)T, as was shown previously.

The benefits of the FFI VCO are numerous and represent many improvements over the previously discussed designs. The use of feed forward techniques allows the VCO to exceed the maximum frequency achievable by a simple four stage ring oscillator. This is extremely important if a solid high speed eight phase VCO is required. Fig. 4-4 shows a linear frequency range for control voltages from -0.2 V to 0.2 V. This linear range is very important when designing phase locked loops, because linearity results in simple closed form solutions. In addition, this VCO has a response with an obvious center and with limits approaching an asymptotic minimum or maximum. In contrast, the CS VCO stops operating below a certain frequency. Although the control voltage would never be driven to such extreme values during normal operation, this can happen in PLLs during power up. Often an integrator, or a capacitor that is never guaranteed to have a specific voltage, will be attached to the VCO control inputs. If it has a poor initial condition, which is then maintained by a non-oscillating VCO, the system will become unstable. It is therefore important to provide the largest possible control voltage range over which the VCO will still oscillate.

Current through the FFI stage is linearly switched between the previous and feed forward stages, which forces the total current running through the stage to remain constant. This is important for keeping a constant voltage swing, which ensures consistent operation in a system where a variation in voltage swing would cause a change in frequency. The SNR also depends on the output voltage swing, which, if it varies, can complicate the analysis. This is the problem encountered with the CS VCO described in Chapter 3.
Differential signaling is used for the control input and throughout the rest of this design. This is crucial when designing for low noise operation, since differential wiring provides strong common-mode rejection.

One exciting feature of the FFI VCO, examined in detail in the next section, is the extraordinary capacity for customization of this circuit. First, by controlling the linearity through emitter resistors, different frequency gains can be obtained (Fig. 4-8). Second, a capacitor at the top of the tree controls the center frequency (Fig. 4-9). Third, bypass resistors limit the frequency range and prevent stage decoupling (Fig. 4-10). One minor drawback of this design is its slightly larger layout footprint: the cascode amplifiers introduce four additional transistors, and if a large capacitor is necessary, a large amount of space may be required.

4.4. Stage Decoupling

A serious problem exists in the FFI VCO if the weighting factor is pushed to its maximum value of 1. In this case, each stage n uses only the signal from stage n-2, as depicted in Fig. 4-3(b). The VCO now appears and operates as two completely independent oscillators. The phase difference between consecutive stages is no longer constant and may fluctuate wildly. This undesirable effect is called stage decoupling and must be addressed in VCO design.

The model used to analyze this situation is an ideal FFI VCO in which one stage has a different delay. This modified delay represents the sum of the maximum individual delay excursions that may exist in the real VCO due to unbalanced loading effects, process variations, and signal noise. The stage transfer functions are

$$ a_n = T + s\,c_{n-1} + (1 - s)\,d_{n-1} + N \tag{4-4} $$

$$ b_n = T + s\,d_{n-1} + (1 - s)\,a_n \tag{4-5} $$

$$ c_n = T + s\,a_n + (1 - s)\,b_n \tag{4-6} $$

$$ d_n = T + s\,b_n + (1 - s)\,c_n \tag{4-7} $$

with stage a receiving the additional delay N.
The time of an output change of each stage is represented by a letter and a subscript, where the letter identifies the stage and the subscript n denotes the nth output change from that stage. The output edges appear in the time order

$$ \{ a_0, b_0, c_0, d_0, a_1, \ldots, d_1, \ldots, d_n, \ldots \} \tag{4-8} $$

The next step is to look at the time between successive outputs of any one stage,

$$ a_{n+1} - a_n = \frac{4T + N}{1 + s} \tag{4-9} $$

which is simply the sum of the effective delays of the four stages. (4-9) is the same for all stages, even though N occurs only in stage a, provided that stage decoupling has not occurred. Solving for the time differences between consecutive stage outputs using (4-4) through (4-9) yields

$$ a_n - d_{n-1} = \frac{T}{1 + s} + \frac{N}{1 - s^4} \tag{4-10} $$

$$ b_n - a_n = \frac{T}{1 + s} - \frac{s\,N}{1 - s^4} \tag{4-11} $$

$$ c_n - b_n = \frac{T}{1 + s} + \frac{s^2 N}{1 - s^4} \tag{4-12} $$

$$ d_n - c_n = \frac{T}{1 + s} - \frac{s^3 N}{1 - s^4} \tag{4-13} $$

which are the desired solutions.

Figure 4-5 Delay versus weighting factor with single stage imbalance (normalized inter-stage delays ad, ba, cb, and dc, and the N = 0 curve, versus weighting factor from 0 to 1). With non-ideal delay stages used in the FFI VCO, stage decoupling (an effective delay going to zero) can occur when the weighting factor is too high. This is because the VCO acts as two independent two stage oscillators instead of one four stage oscillator.

These equations are in the form of the effective delay plus a term for the unbalanced delay N. The delays between stages c and b and between a and d increase rapidly as s approaches 1, while the delays between stages d and c and between b and a decrease rapidly under the same condition. This divergence is expected, because the sum of the four delays follows the effective delay curve very closely when there is no unbalanced delay. This effect is plotted in Fig. 4-5.
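Because (4-4) through (4-7) are linear recursions, the steady-state delays can be obtained simply by iterating them and comparing against (4-10) through (4-13). The sketch below does this in normalized units (T = 1) and also solves b_n - a_n = 0 from (4-11), which reduces to N/T = (1 - s)(1 + s²)/s, for the weighting factor at which decoupling begins. The starting edge times and the bisection solver are illustrative choices, not taken from the thesis.

```python
def simulate_imbalanced_ring(T, N, s, cycles=500):
    """Iterate Eqs. (4-4)-(4-7) and return the settled inter-stage delays
    (a_n - d_{n-1}, b_n - a_n, c_n - b_n, d_n - c_n)."""
    c, d = 0.0, T  # arbitrary previous edges c_{-1}, d_{-1}
    for _ in range(cycles):
        d_prev = d
        a = T + s * c + (1.0 - s) * d + N   # (4-4): stage a carries the extra delay N
        b = T + s * d + (1.0 - s) * a       # (4-5): uses d_{n-1} and the new a_n
        c = T + s * a + (1.0 - s) * b       # (4-6)
        d = T + s * b + (1.0 - s) * c       # (4-7)
    return (a - d_prev, b - a, c - b, d - c)


def closed_form_delays(T, N, s):
    """Eqs. (4-10)-(4-13): effective delay plus an imbalance term."""
    base = T / (1.0 + s)
    k = N / (1.0 - s**4)
    return (base + k, base - s * k, base + s**2 * k, base - s**3 * k)


def decoupling_weight(n_over_t, tol=1e-12):
    """Weighting factor where b_n - a_n = 0 in Eq. (4-11): the root on (0, 1) of
    (1 - s)(1 + s^2) - s*(N/T). That expression is strictly decreasing in s,
    so bisection finds the unique root."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (1.0 - mid) * (1.0 + mid**2) - mid * n_over_t > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    T, N = 1.0, 0.1  # normalized: 10% extra delay in stage a
    for s in (0.2, 0.5, 0.8):
        sim = simulate_imbalanced_ring(T, N, s)
        ref = closed_form_delays(T, N, s)
        print(s, [round(x, 6) for x in sim], [round(x, 6) for x in ref])
    print("decoupling s for N/T = 0.1:", round(decoupling_weight(0.1), 4))
```

For N/T = 0.1 the root lies just above 0.95, in line with the 10% example given for Fig. 4-6.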
Fig. 4-5 also shows the curve for all inter-stage delays when no extra delay is introduced. The divergence between this nominal curve and each of the unbalanced curves can be clearly seen. Each stage is affected by the additional delay, but when analyzing stage decoupling it is only necessary to look at b_n - a_n. The delay ba is the most seriously affected of the shrinking delays because it is measured relative to the output of the stage with the additional delay. Stage decoupling occurs when ba goes to 0 and the output of stage b coincides with the output of stage a. Although the equations are continuous at this point, reasonable operation dictates that stage output times be sequential.

Figure 4-6 Decoupling versus delay injection (weighting factor s at which stage decoupling occurs versus normalized additional delay N/T, from 0 to 2). When an unbalanced delay is injected into a single stage, decoupling between stages occurs when the weighting factor reaches a specific value.

Setting ba in (4-11) equal to 0 and solving for s yields the weighting factor at which stage decoupling occurs for a specific value of N, namely the s that satisfies

$$ \frac{N}{T} = \frac{(1 - s)(1 + s^2)}{s} $$

This solution is shown in Fig. 4-6. As the injected delay increases, the point at which stage decoupling occurs moves down from the maximum value of 1. Stage decoupling is clearly a problem and results in a VCO that operates improperly. To avoid it, the weighting factor must be limited to a value less than that given in Fig. 4-6, based upon the maximum expected delay injection from noise sources and parameter variations. For example, if a maximum 10% deviation is expected (an extremely large value), then s must be kept below approximately 0.95. In practice this VCO has a very large operating range, part of which can be sacrificed to prevent stage decoupling.¹

¹ For the final implementation of this system, s was kept below 0.8 to introduce a large safety margin in which no decoupling will occur.

4.5. Circuit Implementation and Analysis

Figure 4-7 Schematic for FFI VCO element (collector resistors Rc, centering capacitor Cc, level 1 outputs z10/z11, previous inputs p20/p21, leap inputs l20/l21, level 2 outputs z20/z21, bypass resistors Rb, control inputs c30/c31, emitter resistors Re, tail current source Is). This VCO element linearly interpolates, through the control voltage (c), the signals from the previous buffer (p) and the buffer before that (l). Rb limits the operating range of the VCO, Re adjusts the control voltage range, and Cc defines the center of the operating range.

The circuit shown in Fig. 4-7 represents one element of the FFI VCO. It is a three input pseudo-buffer with emitter follower outputs. The control signal, c, is common to all stages and must be on level 3. The l (leap) and p (previous) inputs are on level 2, which is matched to the output level. The collector resistors, Rc, are set to generate a 250 mV voltage swing. The current sources were chosen to maximize the fT of all transistors. Transistor sizing is a very important parameter when designing such circuits; further details are given in Appendix D on page 173.

Each stage in this VCO drives two identical stages and the external circuitry, which typically consists of four minimally sized buffers. For a VCO stage with x µm transistors, the external buffer appears as a 1/x effective load and is 1/(2x+1) of the total load driven per stage. If 1 µm transistors are used, the buffer accounts for 33% of the load. If, however, 10 µm transistors are used, the buffer accounts for a nominal 4.8% of the load. So for larger VCO stages the external buffer becomes less visible, at the cost of more power and physical space. A compromise of 4 µm transistors per gate was chosen, which gives external loads of 11% of the total. Another design challenge, for maximizing frequency response, is to size the differential amplifier transistors independently of the emitter follower transistors; see Appendix D on page 173 for a detailed analysis.
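The load-fraction arithmetic above follows from counting loads in units of one minimally sized buffer input: a stage with x µm transistors drives two identical stages (a load of 2x in those units) plus one external buffer (a load of 1). A small sketch, assuming input load scales linearly with transistor size as the text implies:

```python
def buffer_load_fraction(x_um):
    """Share of a VCO stage's total load contributed by one minimally sized
    external buffer, assuming input load scales linearly with transistor size.
    Loads are counted in units of the buffer's input load:
    two identical VCO stages -> 2*x, external buffer -> 1."""
    stage_loads = 2.0 * x_um
    buffer_load = 1.0
    return buffer_load / (stage_loads + buffer_load)


if __name__ == "__main__":
    for x in (1, 4, 10):
        print(f"{x:2d} um transistors: buffer is "
              f"{100 * buffer_load_fraction(x):.1f}% of the load")
```

This prints 33.3%, 11.1%, and 4.8% for the three sizes discussed.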
Independent sizing of the differential amplifier and emitter follower transistors was not deemed necessary, because the design specifications of 5 and 10 GHz were easily met without such optimization.

4.5.1. Cascode amplifiers

Above the level 2 differential amplifiers are cascode, or common base, amplifiers. They present a low input resistance as the load of the common emitter differential amplifier and act as an impedance transformer. Some delay is introduced by their presence, but this is offset by an increase in driving ability and by isolation from the capacitor Cc. This isolation helps to ensure a linear relationship between an increase in Cc and the resulting increase in delay. The cascode amplifiers also help to reduce phase noise by providing a low impedance node, which limits the effect that noise has on the phase.

4.5.2. Emitter Resistor for linearity and gain adjustment

An ideal differential amplifier has infinite gain, is digital in nature, and requires only that one input be greater than the other for switching. Real bipolar amplifiers are not ideal and possess a high gain approaching 6 (see Appendix C.1 on page 164). High gain is undesirable when designing PLLs, because the VCO will generate more noise and the loop filters will require smaller bandwidths. Without modification, a small change in the control voltage would cause a large change in current. The solution is to include emitter degeneration resistors, Re, which reduce the gain and produce a more linear transfer function. A complete analysis of a differential amplifier with emitter resistors is presented in Appendix C.1 on page 164.

The value of Re was chosen based upon the desired control voltage range of ±0.2 V, the linearity across that range, and the frequency range. Fig. 4-8 shows the frequency response of the VCO as a function of the emitter resistance. Values of Re below approximately 300 Ω·µm are non-linear at the extremes and produce a relatively large gain. Re values above 500 Ω·µm are quite linear but have a limited frequency range and produce a small gain.
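The linearizing effect of Re can be illustrated with the textbook large-signal model of a bipolar differential pair with a resistor Re in each emitter: v_d = v_T ln(i1/i2) + Re(i1 - i2), with i1 + i2 = I_o. This is a generic sketch under idealized assumptions (no base current, no Early effect, v_T ≈ 25.85 mV near 300 K). It is not the Appendix C.1 model, it ignores the bypass resistors, and the tail current and resistor values are hypothetical absolute values rather than the Ω·µm normalized values of Fig. 4-8.

```python
import math

V_T = 0.02585  # thermal voltage kT/q near 300 K, volts


def diff_current(v_d, i_o, r_e, iterations=200):
    """Differential current i1 - i2 of an ideal bipolar pair with emitter
    degeneration r_e (ohms, one per emitter) and tail current i_o (amps).
    Solves v_d = V_T*ln(i1/i2) + r_e*(i1 - i2), i1 + i2 = i_o, by bisection;
    the left side is strictly increasing in i1, so the root is unique."""
    def residual(i1):
        i2 = i_o - i1
        return V_T * math.log(i1 / i2) + r_e * (i1 - i2) - v_d

    lo, hi = i_o * 1e-15, i_o * (1.0 - 1e-15)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    i1 = 0.5 * (lo + hi)
    return 2.0 * i1 - i_o


def center_transconductance(i_o, r_e):
    """Small-signal d(i1 - i2)/dv_d at v_d = 0: 1 / (2*V_T/i_o + r_e)."""
    return 1.0 / (2.0 * V_T / i_o + r_e)


if __name__ == "__main__":
    i_o = 1.0e-3  # hypothetical tail current, 1 mA
    for r_e in (0.0, 100.0, 300.0):  # hypothetical emitter resistors, ohms
        row = [diff_current(v, i_o, r_e) / i_o for v in (0.05, 0.1, 0.2)]
        print(f"Re={r_e:5.0f} ohm  i_d/I_o at 50/100/200 mV: "
              + ", ".join(f"{x:.3f}" for x in row)
              + f"  gm(0)={center_transconductance(i_o, r_e) * 1e3:.2f} mS")
```

With Re = 0 the pair follows I_o·tanh(v_d/2v_T) and saturates within a few tens of millivolts; increasing Re lowers the center slope and stretches the near-linear region, which is the gain versus range trade-off described for Fig. 4-8.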
Conversely, a small gain, and therefore a limited frequency range, restricts the ability of the PLL to reach its target frequency under all environmental and processing conditions. A trade-off therefore exists between high and low resistor values, and the choice depends on the needs of the circuit.

Figure 4-8 FFI VCO frequency versus emitter resistance (frequency versus control voltage from -0.4 V to 0.4 V for Re = 0, 200, 400, 600, and 800 Ω·µm; resistor values are normalized to the size of transistors in µm). By adjusting the emitter resistor, Re, the gain of the VCO can be controlled. A higher resistance decreases the gain.

4.5.3. Center capacitor to control frequency range center

The capacitor Cc between the level 1 outputs is parasitic in nature and is used only to slow the circuit down. Increasing its size increases the delay through the gate, which corresponds to a decrease in frequency. This component is very useful for centering the frequency range on a given specification; simulation results are shown in Fig. 4-9. The disadvantage of this component arises when very low frequencies are needed, because they require a large capacitor. Large capacitive elements require a significant amount of space, and because each of the four stages needs one, their size can become prohibitive. Fortunately, for center frequencies from 2 GHz through 8 GHz the component size is quite reasonable. In Fig. 4-9, capacitor values are normalized to the size of transistors in µm.
Figure 4-9 FFI VCO frequency versus centering capacitor (frequency on a logarithmic axis versus control voltage from -0.3 V to 0.4 V for Cc = 0, 25, 50, 100, 150, and 250 fF/µm). A frequency centering capacitor, Cc, is added to increase the delay of the stage in order to move the frequency range to within specifications.

4.5.4. Bypass resistor to prevent stage decoupling

The last, and perhaps most important, elements to be discussed are the bypass resistors, Rb. Their purpose, discussed in Sec. 4.4 on page 40, is to prevent stage decoupling by preventing full switching of current in the tree. In addition to adding decoupling stability to the VCO, these elements can also be used to limit the frequency range while keeping the gain nearly constant. See Fig. 4-10 for the frequency response of the VCO for different values of Rb.

The bypass resistor is tied between the collector of the control input transistor and the top of the current source. Each of these nodes is kept at a nearly constant voltage because the bases of the level above fix their emitter voltages. Since the voltage across the resistor is constant, the current through it is also constant. This ensures that some current from the active current source always flows through both branches of the tree, and thus prevents complete depletion of current through either branch. A smaller resistor allows more current to flow and, in the limit, the control transistors are completely bypassed and both branches receive exactly equal current. A complete analysis of this effect is detailed in Section C.2 on page 166. In Fig. 4-10, resistor values are normalized to the size of transistors in µm.
Figure 4-10 FFI VCO frequency versus bypass resistance (frequency versus control voltage from -0.4 V to 0.4 V for Rb = 1.6, 2.4, 4.0, and 8.0 kΩ·µm). By adjusting the bypass resistor, Rb, the maximum current through each branch can be limited. This resistor prevents stage decoupling and allows frequency range control.

4.6. System Analysis

The frequency profile of the FFI VCO is a function of various circuit parameters, including the nominal stage delay To, Rb, Re, and Cc. If Rb is removed, Re is set to 0, and Cc is chosen to center the range at 6.0 GHz, the frequency response is as shown in Fig. 4-11. The range is from 3.9 GHz to 7.9 GHz, which is a one octave range. The period of the VCO is governed by (4-3), which yields 8T when s = 0 and 4T when s = 1, hence the octave range. The addition of the other circuit components only decreases this range.

A more comprehensive look at the total system response requires an analysis of the modified differential amplifier and of the relationship between the weighting factor s and the current switching between branches. Fig. 4-12 shows a diagram of the VCO frequency profile as a function of control voltage. The three primary curve parameters are the frequency range, the center frequency, and the gain at the center frequency. Mathematical models describing each of these parameters are given in the following sections.

Figure 4-11 FFI VCO Frequency Range (frequency, 3.5 to 8.0 GHz, versus control voltage, -0.2 to 0.2 V). This is the response when Rb is removed, Re is set to 0, and Cc is set to give a 6 GHz center frequency.

Figure 4-12 FFI VCO system from control voltage to frequency (annotated with the frequency range, the gain, and the center frequency). An analysis of the FFI system should incorporate a study of the circuit response and the dynamics of the top-level architecture.

4.6.1. Branch current to frequency

Relating circuit parameters such as Rb and Re to the frequency profile involves a circuit level description of the differential amplifier. Circuit level analyses are usually expressed in terms of differential branch current and as such do not relate directly to frequency. Relating branch current to frequency is therefore necessary to obtain the final transfer function.

From (4-3) we can find the frequency as a function of the weighting factor s, which is directly related to the branch currents by

$$ s = \frac{1}{2}\left(1 + \frac{i_L - i_P}{I_o}\right) \tag{4-14} $$

$$ f = \frac{1}{8T}\,(s + 1) = \frac{1}{16T}\left(3 + \frac{i_d}{I_o}\right) \tag{4-15} $$

where T is the intrinsic stage delay, i_L is the current through the branch that accepts input from the "leaped" stage, i_P is the current through the branch that accepts input from the "previous" stage, i_d = i_L - i_P is the differential current, and I_o is the total current of the tree. This relationship is confirmed in Fig. 4-13, where the simulated frequency versus current is shown along with the results from (4-3) and (4-14). Results at a weighting value of 0.5 show the largest slope difference between the analytical model and simulation. This slope difference is important when analyzing the frequency gain, and a factor α is introduced to compensate. Taken directly from Fig. 4-13, α has a value of 1.3.

Figure 4-13 Simulated versus analytical response of the FFI architecture (frequency, 3 to 8 GHz, and effective delay, 10 to 35 ps, versus weighting factor s from 0 to 1). The gray, dashed lines represent the simulated frequency response for varying branch currents, and the black continuous lines represent the analytical expectation.

4.6.2. Center frequency and intrinsic stage delay

The center frequency is directly related to the intrinsic stage delay by (4-3) when s is set to 0.5. The intrinsic delay can be accurately modeled by the results presented in Appendix C.3 on page 171.
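The mapping from branch currents to frequency in (4-14) and (4-15) can be evaluated directly. In the sketch below the tail current is shifted between the branches while their sum stays fixed at I_o. The endpoints i_d = -I_o and i_d = +I_o give 1/(8T) and 1/(4T), the one-octave span noted for Fig. 4-11, and the balanced point i_d = 0 (s = 0.5) gives 3/(16T), the center frequency that the model below builds on. T and I_o are hypothetical values, and the correction factor α from Fig. 4-13 is not applied.

```python
def weighting_factor(i_l, i_p, i_o):
    """Eq. (4-14): s = (1 + (i_L - i_P)/I_o) / 2."""
    return 0.5 * (1.0 + (i_l - i_p) / i_o)


def frequency_from_current(i_d, i_o, T):
    """Eq. (4-15): f = (3 + i_d/I_o) / (16 T), with i_d = i_L - i_P.
    The empirical slope correction alpha from Fig. 4-13 is not applied."""
    return (3.0 + i_d / i_o) / (16.0 * T)


if __name__ == "__main__":
    T = 30e-12    # hypothetical intrinsic stage delay, seconds
    i_o = 1.0e-3  # hypothetical tail current, amps
    for i_l in (0.0, 0.25e-3, 0.5e-3, 0.75e-3, 1.0e-3):
        i_p = i_o - i_l  # the tail current is only shifted between branches
        s = weighting_factor(i_l, i_p, i_o)
        f = frequency_from_current(i_l - i_p, i_o, T)
        print(f"i_L={i_l * 1e3:.2f} mA  s={s:.3f}  f={f / 1e9:.3f} GHz")
```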
The center frequency is modeled as

$$ f_c = \frac{3}{16\left(T_o + \ln(2)\,(2 R_c C_c)\right)} \tag{4-16} $$

and is validated in Fig. 4-14. The intrinsic stage delay is also plotted in Fig. 4-14 because these values are needed for the frequency gain and frequency range models. The nominal delay, To, found through simulation, is 21 ps.

4.6.3. Frequency gain at the center frequency

Figure 4-14 Center frequency simulation and model (simulated and modeled VCO center frequency, 0 to 12 GHz, and intrinsic stage delay, 0 to 180 ps, versus normalized capacitance from 0 to 250 fF/µm). The modeled results follow the simulated results closely.

The analytical model for current gain as a function of Rb and Re is solved in Appendix C.1 on page 164 and Appendix C.2 on page 166. To find the gain from input voltage to output frequency, two elements are needed: the voltage to current gain and the current to frequency gain. The former was solved in (C-12) on page 169, and the latter is determined by differentiating (4-15) and substituting the intrinsic delay equation (C-14) on page 171. The result is

$$ \frac{df}{dv_d} = \frac{di_d}{dv_d}\cdot\frac{df}{di_d} = \frac{1}{\dfrac{2\gamma v_T R_b}{R_b I_o - 2 v_{be}} + R_e \parallel R_b}\cdot\frac{\alpha}{16\left(T_o + 0.7\,(2 R_c C_c)\right) I_o} \tag{4-17} $$

which includes all circuit parameters: Rb, Re, Rc, Cc, Io, and the nominal stage delay To. The factor α is included to compensate for the difference in weighting factor and frequency gain between the simulated and analytical results.

4.6.4. Frequency Range

The frequency range of the FFI VCO is mainly governed by the bypass resistor and partially by the emitter resistor. Appendix C.2 on page 166 describes how these parameters limit the differential current through each branch in the VCO stage. This current is related to the maximum frequency, fmax, through (4-15), where i_d is replaced with i_d,max, which is given in (C-5) on page 167. Taking this value, subtracting the center frequency fc, and multiplying by two yields the frequency range, frange. Using the intrinsic delay relationship from (C-14) on page 171 and (4-15) yields

$$ f_{range} = 2\,(f_{max} - f_c) = \frac{i_{d,max}}{8 I_o T}, \qquad T = T_o + 0.7\,(2 R_c C_c) \tag{4-18} $$

where substituting i_d,max from (C-5) contributes a factor 1/(R_b + 2R_e), so that the full expression has the denominator 8 I_o (R_b + 2R_e)(T_o + 0.7(2R_cC_c)). In that expression, v_d should be set to the maximum differential voltage allowed during normal operation of the VCO.

4.7. Phase Noise

The phase noise of an oscillator is an extremely important consideration during the design phase. VCO phase noise and phase jitter directly affect system performance. In serial communication circuits, a bit stream is generated with the time between transitions defined by the jitter in the VCO and the PLL. The transport mechanism, which includes the wire and buffering circuits, also introduces noise, which appears as phase jitter. The larger the jitter at the receiver, the more difficulty the PLL will have tracking the data, and consequently data corruption will increase. It is therefore imperative to minimize jitter at the source to ensure maximum data throughput [15].

4.7.1. The Impulse Sensitivity Function

Noise in circuits is typically related to thermal, device (shot and flicker), or external effects. The relationship of these effects to phase noise can be quite complicated and difficult to solve analytically. A straightforward method that combines an analytical foundation with some simulation uses the impulse sensitivity function (ISF) [18]. It yields a closed form solution relating circuit noise to phase noise.
Circuit noise appears as either amplitude or phase variations in the output of oscillators. In "digital" ring oscillators, the amplitude variations are small because of the limiting nature of the circuits. Phase variations, on the other hand, are governed by

$$ \Delta\phi = \Gamma(\omega_o, t)\,\frac{\Delta q}{q_{swing}} \tag{4-19} $$

where Δq is a charge step applied to a specific node, q_swing is the nominal charge swing on that node (q_swing = C_node V_swing), and Γ(ωo,t) is the ISF. Γ(ωo,t) can be considered the normalized phase response of the VCO to a current pulse at a specific point in the output waveform. The ISF is large when a current pulse causes a large change in phase and small when the pulse causes a small phase change. Fig. 4-15 shows an example of the effect on phase of two current pulses of the same size applied at different positions. The case on the left applies the pulse during the rising edge, which effectively increases the rise time and decreases the phase. The pulse applied to the flat portion of the curve causes little or no phase change, because the circuit restores the initial value before the edge arrives.

Figure 4-15 Current pulse effect on phase (a pulse on the edge has a large phase effect; a pulse on the flat portion has a small phase effect). A current pulse, or charge step, applied to a node in the VCO has a phase effect that depends on the temporal location of the pulse.

Fig. 4-16 shows the simulated ISF for the FFI VCO and the values of the output at the time the current pulse is applied. The response appears as it should, with an increase during the rising edge, a decrease during the falling edge, and zero when the output is constant. This form is very similar to the derivative of the waveform. The important values obtained from these results are the dc and rms values of the ISF. The rms value of 0.077 is used to determine the phase noise, and the non-zero dc value of 0.001 shows the upconversion of low frequency noise into close-in phase noise.
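The dc and rms values quoted for Fig. 4-16 are simple averages over one period of the sampled ISF, and by (4-19) each ISF sample can be recovered from a simulated phase shift as Γ = Δφ·q_swing/Δq. The sketch below assumes uniformly spaced samples over one period; the sinusoidal test ISF is synthetic, scaled only so that its rms equals 0.077, and is not the FFI VCO's simulated ISF.

```python
import math


def isf_from_phase_shifts(delta_phis, delta_q, q_swing):
    """Invert Eq. (4-19): Gamma = delta_phi * q_swing / delta_q for each
    simulated charge-injection point."""
    return [dphi * q_swing / delta_q for dphi in delta_phis]


def isf_dc_and_rms(isf_samples):
    """Mean (dc) and rms of ISF samples taken uniformly over one period."""
    n = len(isf_samples)
    dc = sum(isf_samples) / n
    rms = math.sqrt(sum(g * g for g in isf_samples) / n)
    return dc, rms


if __name__ == "__main__":
    n = 1000
    amp = 0.077 * math.sqrt(2.0)  # synthetic sinusoidal ISF with rms 0.077
    isf = [amp * math.sin(2.0 * math.pi * k / n) for k in range(n)]
    dc, rms = isf_dc_and_rms(isf)
    print(f"dc = {dc:.2e}, rms = {rms:.4f}")
```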
The rms value of the ISF is only meaningful when compared against other similar ring oscillators. Fig. 4-17 shows various oscillators and their associated rms values. The single ended and differential points are CMOS rings tuned to maintain a constant frequency independent of the number of stages. Their values drop with increasing N because each stage's transitions represent a smaller fraction of the total period and thus have smaller effects on the ISF. The CS (Current Starving) oscillator shows a reasonable match with the other differential oscillators. The FFI oscillator, on the other hand, shows a much lower ISF than systems with the same number of stages. This has important ramifications for the total phase noise and is discussed further in Section 4.7.3.

Figure 4-16 Simulated ISF for FFI VCO and output waveform (ISF, -0.5 to 0.4, and waveform voltage, -1.3 V to -0.85 V, versus normalized time from 0 to 2π rad). The FFI VCO ISF is shown along with the waveform at the point where the pulse is applied.

Figure 4-17 ISF rms values for various ring oscillators (rms value of the ISF, 0.1 to 1.0, versus number of stages N, 3 to 10). Shown in this plot are the rms values for the FFI, CS (Current Starving), CMOS differential (DE), and CMOS single ended (SE) ring oscillators.

4.7.2. Solving for phase noise

Using the superposition integral, the phase response to any injected noise current i(t) is

$$ \phi(t) = \int_{-\infty}^{t} \frac{\Gamma(\omega_o, \tau)}{q_{swing}}\, i(\tau)\, d\tau \tag{4-20} $$

The single-sideband phase noise spectrum due to a white noise current source is given by [18]

$$ L\{\omega_{off}\} = \frac{\Gamma_{rms}^2}{q_{swing}^2}\cdot\frac{\overline{i_n^2}/\Delta f}{4\,\omega_{off}^2} \tag{4-21} $$

where Γrms is the rms value of the ISF, i_n²/Δf is the single-sideband power spectral density of the noise current source, and ωoff is the offset from the carrier.
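Equation (4-21), fed with the white-noise PSD form given next in (4-22), makes a short single-node estimate. In this sketch Γrms = 0.077 comes from Fig. 4-16, q_swing uses the 180 fF level one capacitance quoted in Section 4.7.3 together with the 250 mV swing of Section 4.5, G = 1/(100 Ω) and I_c = 1.6 mA (half of the 3.2 mA Iee) are taken from Table 4-1, and a temperature of 300 K is assumed. It illustrates the formulas for one node only; it is not the multi-node calculation behind (4-23) and is not expected to reproduce the -93.0 dBc/Hz figure.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def noise_current_psd(temp_k, g_pullup, i_c):
    """Eq. (4-22): resistor thermal noise plus collector shot noise, A^2/Hz."""
    return 4.0 * K_B * temp_k * g_pullup + 2.0 * Q_E * i_c


def phase_noise_dbc_per_hz(gamma_rms, q_swing, in2_per_hz, f_offset_hz):
    """Eq. (4-21), converted to dBc/Hz, for a white noise current source."""
    w_off = 2.0 * math.pi * f_offset_hz
    l_lin = (gamma_rms**2 / q_swing**2) * in2_per_hz / (4.0 * w_off**2)
    return 10.0 * math.log10(l_lin)


if __name__ == "__main__":
    in2 = noise_current_psd(300.0, 1.0 / 100.0, 1.6e-3)
    q_swing = 180e-15 * 0.25  # C_node * V_swing
    for f_off in (1e5, 1e6, 1e7):
        print(f"{f_off:8.0e} Hz offset: "
              f"{phase_noise_dbc_per_hz(0.077, q_swing, in2, f_off):7.1f} dBc/Hz")
```

Because (4-21) scales as 1/ω_off², the estimate falls by about 6 dB per octave (20 dB per decade) of offset, and lowering Γrms or raising q_swing lowers it directly, which is the mechanism behind the comparison in Section 4.7.3.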
Noise in the FFI circuit element shown in Fig. 4-7 is generated primarily by HBT shot noise and resistor thermal noise. The nodes of interest, those generating the most noise and most sensitive to current fluctuations, are the level one outputs, z10 and z11. The level 2 outputs introduce twice the shot noise but are less susceptible to current induced phase variations because of their low output resistance and strong restoring force. The single-sideband power spectral density (PSD) of the resistor noise plus the collector shot noise is

$$ \frac{\overline{i_n^2}}{\Delta f} = 4kTG + 2 q_e I_c \tag{4-22} $$

where G is the conductance of the pull-up resistors and I_c is the collector current, which is half the tail current. Further refinement of (4-21) and (4-22), and substitution of values for temperature, resistance, and current for optimal operation, yields

$$ L\{\omega_{off}\} = \frac{N\,l}{2\,\omega_{off}^2}\cdot\frac{\Delta\phi_{rms}^2}{\Delta q^2}\cdot 161\times10^{-24}\ \frac{\mathrm{A}^2}{\mathrm{Hz}} \tag{4-23} $$

where N is the number of stages, l is the length of the transistors in µm, and Δφrms is the rms phase deviation for a simulated charge injection of Δq.

Using (4-23) at a frequency offset of 1 MHz, the FFI VCO has a phase noise of -93.0 dBc/Hz and the CS VCO has a phase noise of -79.1 dBc/Hz. If cascode amplifiers are added to the CS VCO to achieve a more accurate comparison, its phase noise decreases to -85.1 dBc/Hz. Both VCOs have about the same center frequency¹ and both consume the same amount of power.

4.7.3. Phase noise comparison between the FFI and CS VCOs

The benefit of using the FFI architecture for VCO design, rather than a standard ring VCO, is at least 8 dB of phase noise reduction. This improvement is quite compelling because it comes without the need for additional power. Two main factors contribute to the noise reduction. First, the FFI VCO has a higher frequency because of its novel architecture.
This higher frequency can be traded for an increase in level one capacitance: capacitance was added to each stage to slow it down and bring its speed in line with that of a standard ring oscillator. Additional capacitance helps to absorb current noise by decreasing the bandwidth at the outputs; it essentially softens the voltage spike caused by an injection of charge at the output node. The CS VCO, for example, has a level one capacitance of 28 fF, while the FFI VCO has a capacitance of about 180 fF. The second effect is a result of the averaging that occurs between the two inputs of each gate. Any noise disturbance on one input is offset by averaging and results in a change of 66% of the unaveraged expected result. At first it would appear that the effect should be only 50%, but because the effect propagates through multiple averages, the progression leads to a 66% change. This factor of two thirds corresponds to a 2.2 dBc/Hz decrease in the overall phase noise.

¹ The center frequency of the CS VCO is actually about 70% of that of the FFI VCO. If the frequencies were properly matched, the noise gap between the two would only widen because of the larger capacitor required by the FFI.

4.8. Jitter

Jitter in a ring VCO is generated by four primary noise sources within each variable delay element: thermal noise from the collector resistors, tail current noise, sampling of input noise by the switching of differential pairs, and noise at the VCO input [17], [18]. The parameter κ is used as a time domain figure of merit relating the standard deviation of a transition to the amount of time over which it is measured:

$$ \kappa = \frac{\sigma_t}{\sqrt{\Delta T}} \tag{4-24} $$

Each noise source contributes to the total κ as described in detail in [19]. This equation is valid for all time in the open loop case, and for times less than the loop time constant in the closed loop PLL case. In this VCO, the noise generating sources in the delay element are frequency independent because of the nature of the frequency control.
Thermal noise from the collector resistors remains constant because the capacitance and resistance remain constant. Noise introduced by the degenerated tail current source also remains fixed. The input differential pair noise depends on the current through each pair, and that current is linearly steered between the inputs. Since the total current remains constant, the total noise contribution from the pairs remains approximately constant. For these reasons, the jitter introduced by one stage is constant over all frequencies.

Although the noise-induced jitter per stage remains the same, the total jitter per transition depends strongly on the transition interpolating ability of the VCO. When the VCO operates in the four-stage mode, the jitter in one period results from the jitter of all four stages. However, as the weighting factor is shifted to favor the feed-forward signal, the jitter introduced during a full period comes from only two stage elements rather than four. The result, after including (4-2), is that κ varies according to

$$\kappa = \frac{\sigma_t}{\sqrt{\dfrac{\omega}{\omega_o}\,(1+s)\,\Delta T}} \qquad (4\text{-}25)$$

The factor ω/ω_o is added to normalize in terms of transitions, independent of frequency. Using (4-3) and solving for s as a function of the frequency fraction gives

$$s \approx \frac{3\omega}{2\omega_o} - 1 \qquad (4\text{-}26)$$

Because the per-stage jitter is frequency independent, the normalized quantity σ_t/√ΔT in (4-25) can be identified with κ_o, and substituting (4-26) into (4-25) yields

$$\kappa \approx \sqrt{\frac{2}{3}}\,\frac{\omega_o}{\omega}\,\kappa_o \qquad (4\text{-}27)$$

where κ_o is the nominal jitter constant for an identical ring oscillator without feed-forward interpolation, ω_o is the center frequency, and ΔT is the time over which the open-loop jitter is measured. This equation is graphed in Fig. 4-26. Using the derivation in [19] and the data in Table 4-1 yields a κ_o of 18 n√s. Calculation and simulation showed that the largest contributors to overall jitter were the input differential pairs and the emitter followers.

Table 4-1 Circuit parameters for calculating jitter.
Parameter     Value
R_e           100 Ω
R_c           100 Ω
I_ee          3.2 mA
K_o           5.5 GHz/V
e_n(vco)      4.6 nV/√Hz
R_base        152 Ω × 8 (4 inputs, 4 followers)

4.9. Interconnect Parasitic Simulations

Interconnect parasitics are increasingly important in the design of high-speed circuits. In slower, larger circuits the capacitance and resistance of the interconnect were dwarfed by device parameters. With very small devices this is no longer true, and interconnect parameters are as large as, or larger than, device parameters. Also, as operating frequencies increase, speed-of-light propagation time becomes a larger fraction of the overall cycle time.

In general, non-ideal interconnect increases the delay through the wires. This is crucial for ring oscillators, since their operation requires stringent control over delay. If interconnect is properly simulated and accounted for, an underperforming VCO can be avoided: the oscillator must achieve "ideal" speeds significantly higher than specified. It is not uncommon for interconnect to reduce speeds by 10% to 20%. To ensure operation at 5 GHz, the FFI VCO was designed with a 20% safety margin; that is, the circuit was designed to run at 6 GHz without interconnect effects included. This safety margin, in addition to the already large frequency range, assures proper operation at 5 GHz. The VCO will fail to meet the specification only if a 20% interconnect effect is combined with a 20% decay from other negative effects.

Fig. 4-18 shows the frequency response before and after adding interconnect capacitance.¹ The frequency drops by a uniform 12%. Larger effects were seen in the Current Starving VCO because of its smaller transistors and the resulting larger share of interconnect capacitance in the total capacitance.
[Figure 4-18 plot: Frequency (GHz) versus Control Voltage (V), curves labeled "No Parasitics" and "Capacitive Parasitics".]
Figure 4-18 FFI with capacitive interconnect parasitics. The introduction of interconnect parasitics reduces the performance of high-speed circuits; when designing a ring oscillator it is absolutely necessary to include these effects.

1. The IBM 1999B SiGe design kit does not account for interconnect resistances correctly and typically shows a faster response than with capacitance only. Resistance values are also very small for these localized wires. For these reasons, only capacitance was included.

4.10. HDL Model

A transistor-level model of this ring oscillator includes 60 active devices and 12 devices for the required balancing loads. If a frequency divider is needed, such as the 1/8 divider in the transmitter frequency synthesizer, 54 additional devices are needed to represent the entire VCO. The processing time required to simulate 126 devices is prohibitive and limits design iteration. The solution was to create an analog Hardware Description Language (HDL) model of the VCO [40]. Spectre HDL, a Cadence package, was used because it is tightly integrated into Cadence and is very similar to VerilogA, the leading analog HDL. The code for the VCO, shown in Appendix E.1 on page 179, was modeled after the simulation data in Fig. 4-18. Input loading effects were included in the model so that no additional circuitry needed to be added. The output was modeled, somewhat inaccurately, as a sine wave and was passed through a small buffer to transform it into something more representative of a real signal. The time required to simulate the transmitter PLL was reduced by about 60% with very little effect on accuracy. The time saved allowed more frequent design iterations, since each highly accurate simulation usually takes hours to run.
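The idea behind the HDL model, replacing 126 transistors with a phase-domain description, can be sketched in a few lines. The Python class below is not the Spectre HDL code of Appendix E.1; it is a hypothetical stand-in that assumes a linear tuning law clamped to a finite range, whereas the actual model was fit to the simulated curve of Fig. 4-18. The center frequency, gain, and clamp limits in the usage lines are placeholders. Keeping the phase as an explicit state variable is what makes instantaneous frequency and phase directly observable.

```python
import math


class BehavioralVCO:
    """Phase-domain VCO sketch (hypothetical linear tuning law, not the thesis model)."""

    def __init__(self, f_center_hz: float, k_vco_hz_per_v: float,
                 f_min_hz: float, f_max_hz: float) -> None:
        self.f_center = f_center_hz
        self.k_vco = k_vco_hz_per_v
        self.f_min = f_min_hz
        self.f_max = f_max_hz
        self.phase = 0.0  # accumulated phase in radians, observable at every step

    def frequency(self, v_ctrl: float) -> float:
        """Instantaneous frequency for a differential control voltage, clamped to the range."""
        f = self.f_center + self.k_vco * v_ctrl
        return min(max(f, self.f_min), self.f_max)

    def step(self, v_ctrl: float, dt_s: float) -> float:
        """Advance the phase by 2*pi*f*dt and return a sine-wave output sample."""
        self.phase += 2.0 * math.pi * self.frequency(v_ctrl) * dt_s
        return math.sin(self.phase)


if __name__ == "__main__":
    # Placeholder parameters: 5 GHz center, 5 GHz/V gain, 4 to 6 GHz clamp.
    vco = BehavioralVCO(5e9, 5e9, 4e9, 6e9)
    for _ in range(1000):            # 1 ns of simulated time at +50 mV control
        vco.step(0.05, 1e-12)
    print(f"f_inst = {vco.frequency(0.05) / 1e9:.2f} GHz, "
          f"cycles = {vco.phase / (2 * math.pi):.2f}")
```

In a PLL simulation the same `phase` attribute can be compared directly against the reference phase, which is the kind of visibility a transistor-level run does not provide.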
Another benefit of the HDL model is the ability to extract parameter values such as instantaneous frequency and phase, which was extremely helpful in analyzing the PLL. In a transistor-level simulation these values are hidden.

4.11. Final Implementation

The final implementation of the FFI VCO was used in revision two (Serdes II) of the transmitter and receiver and in the FFI VCO test chip. The specifications were based on the goal of a 20 Gb/s communication system and the architecture chosen for that system.

4.11.1. Circuit Parameters

Both the transmitter (4-1 multiplexer core) and the receiver (twice-oversampling core) required a quarter-frequency clock, which called for a VCO with a 5 GHz center frequency. To remain conservative and ensure that the 5 GHz specification would be reached, 4 µm transistors were used and a centering capacitor, Cc, of 19 fF/µm, or 76 fF, was chosen (see Fig. 4-14 on page 51). Under ideal simulation conditions this put the center frequency at 6.2 GHz, and at 5.4 GHz when parasitics are included.

The frequency range specification was partially dictated by the uncertainty of achieving the 5 GHz center. Process variations, interconnect parasitics, model inaccuracies, and other simulation difficulties necessitated a large range so that 5 GHz could still be reached despite any deviation in the center frequency. In addition, because the bypass resistors intended to prevent stage decoupling also affect the frequency range, their effect had to be considered. The decision was made to maximize the frequency range (see Fig. 4-10 on page 48) while responding conservatively to the stage decoupling problem. The value of Rb was chosen to be 6.4 kΩ·µm, yielding a VCO with a large range and strong protection against stage decoupling.

The gain of the VCO was chosen based upon the input control voltage swing and the need to provide a linear response across all control values.
Since a reasonable voltage swing for CML circuits is 250 mV, as noted in Appendix C, a range corresponding to this swing was chosen for the VCO. This yielded a value of Re equal to 400 Ω·µm.

In addition to the 5 GHz VCO, a high-speed 10 GHz VCO was also designed for testing within the Serdes II chip. It had no centering capacitor so that the maximum frequency could be achieved. The ultimate goal was to determine whether this faster 10 GHz VCO could be used to design a 40 Gb/s communication system.

4.11.2. Layout Considerations

A poor layout can result in an underperforming circuit; consequently, layout is an extremely important design concern. Proper layout of a ring oscillator minimizes noise and interconnect parasitic effects. In addition, because these oscillators generate considerable “digital” noise, it is crucial to isolate them from nearby analog circuits.

The first goal in the FFI layout, shown in Fig. 4-19, was to minimize the number of interstage wires and make them symmetrical to guarantee uniform phase spacing. The solution was to design a single compact stage and position four copies of it around a center, with the inputs and outputs in the middle. This provided perfect symmetry and minimal interconnect but required four different device orientations. Differing orientations introduce directional process variations into the design, but symmetry appeared to be the more important factor.

Substrate coupling¹ and power supply noise, although partially rejected by the differential circuit topology, must be addressed. Substrate noise can originate from external as well as internal circuits. Minimizing the effect of external substrate noise on the VCO, and of its internal switching on external circuits, involved the design of a deep trench moat with a substrate contact ring along the inside, as shown in Fig. 4-20.
This provided a ground return path from the enclosed circuitry to the substrate contacts and minimized coupling outside the ring because of the long path around the deep trench. This is critical for this VCO because its high-frequency, multi-phase digital signals are often near low-noise analog loop filters in PLLs. The compact design also forces substrate noise to appear as one common-mode source, minimizing its influence.

[Figure 4-19 layout drawing with dimension labels 225 µm and 171 µm; labeled features: deep trench moat, grounded substrate ring, centering capacitor, power and ground rails.]
Figure 4-19 FFI Layout. Shown here is the final layout of the FFI VCO. Outputs can be taken from the center or the edges of the block.

1. Substrate noise in this SiGe technology is of particular importance because the substrate is lightly doped.

[Figure 4-20 cross-section; labels: substrate contact, deep trench (DT), short ground return path, internal circuitry, external circuitry, silicon surface.]
Figure 4-20 Reducing substrate coupling. By using a deep trench moat and substrate contacts, substrate coupling can be minimized.

Minimizing the length of the supply lines to the pads provides a low-resistance ground return path. As with substrate noise suppression, a compact design forces the supplies to appear as one common-mode source. When routing to external circuits where phase uniformity was important, the signals were taken from the center of the VCO to ensure equal-length wires. In addition, dummy buffers were included wherever a VCO phase output was not needed, to maintain consistent loading.

4.12. Experimental Results

A test chip implementing a 5 GHz (Cc = 76 fF) and a 10 GHz (Cc = 0 fF) FFI VCO was designed alongside the Serdes 2 chip. It placed the two VCOs in an environment identical to that found in the transmitter and receiver. Two input pads with capacitor bypass provided a differential input for each VCO. The remaining four high-frequency pads were dedicated to a buffered output and a 1/8 divided output of each VCO.
The slower VCO was used in the Serdes 2 transmitter and receiver and had a center frequency target of 5 GHz. The higher-speed VCO was designed for use in the Serdes 3 project with a center frequency of 10 GHz.

Figure 4-21 FFI waveform at 5 GHz. This waveform was captured with the control voltage set to generate a 5 GHz output. The peak-to-peak swing is approximately 300 mV.

4.12.1. Frequency Response

The shape of the measured frequency response in Fig. 4-22 is nearly identical to the simulated response: it is smooth, linear around zero, and monotonically increasing. The differences are in the frequency range and center. The center frequency at 0 mV control voltage was expected to be 5.33 GHz but was measured about 11% lower, at 4.72 GHz. The frequency range dropped 17%, from 2.72 GHz to 2.27 GHz. In addition, the gain at the center decreased from 5.57 GHz/V to 4.98 GHz/V.

The offset between simulation and measurement is likely due to a larger-than-anticipated capacitance on the level 1 nodes of the ring stages. Base capacitance modeling has always been a difficult issue, and capacitance has a considerable effect on frequency; a capacitance increase of 50 fF yields a frequency change that would account for the observed decrease. Another possibility is poor modeling of fT, which has a dramatic effect on frequency. Part of this effect can be seen in Fig. 4-24, where the supply voltage was increased beyond the nominal voltage. This increased the current and, up to a point, increased the frequency. Although the CML trees were optimally designed for maximum fT, clearly more collector current results in a better response.

[Figure 4-22 plot: Frequency (GHz) versus Control Voltage (mV), simulated (with parasitics) and measured curves for Cc = 0 fF and Cc = 76 fF.]
Figure 4-22 FFI VCO measured results. This plot shows results simulated with interconnect parasitics, and measured results, for the FFI VCO.
The target of 5 GHz for the slower VCO was achieved at a control voltage of 60 mV rather than the expected -50 mV.

4.12.2. Common Mode Gain (5 GHz VCO)

The common mode gain is the change in output frequency produced by a common-mode change in the input while the differential voltage is held constant. As the common mode voltage is decreased, the level 3 differential pair begins to press into the active current source below it. Although the current should ideally remain constant as the source's collector moves, the Early effect produces a slight slope in the response (see Fig. A-3 on page 158), so the current decreases as the collector-to-emitter voltage decreases. At some point the source transistor begins to saturate and the collector current drops more rapidly. With higher common mode voltages, the level 3 transistors are pulled away from the active sources, which causes the same current effect discussed above. Although the level 3 transistors then press into the level 2 transistors, there is little effect because the active source maintains a constant current. With a differential gain of 5 GHz/V from Fig. 4-22 and a common mode gain of 0.5 MHz/mV, the common mode rejection ratio (CMRR) is 20 dB.

[Figure 4-23 plot: Frequency (GHz) and Common Mode Gain (MHz/mV) versus Common Mode Control Voltage (mV), -400 mV to 400 mV.]
Figure 4-23 FFI common mode response. The common mode response of the FFI is quite flat, with only a 1% deviation in frequency when the common mode is swept through ±100 mV.

4.12.3. Response versus supply voltage (5 GHz VCO)

The frequency of the VCO continues to increase as the supply voltage is made more negative, down to -4.3 V. This can be attributed to the transistor fT increasing with collector current. Below that voltage the transistors begin to experience high-current effects and fT drops.
At the peak-frequency supply voltage of -4.3 V the collector current is approximately 1.1 mA, which is higher than the 0.8 mA expected for fastest operation. The power supply gain at the nominal -3.3 V supply is -600 kHz/mV.

[Figure 4-24 plot: Center Frequency (GHz) versus Supply Voltage (V), -2.5 V to -6 V.]
Figure 4-24 FFI response versus supply voltage. At the nominal supply voltage of -3.4 V the center frequency is 4.6 GHz. Smaller supply magnitudes show a quick decrease in frequency, while larger magnitudes show an increase in frequency until -4.5 V. Beyond -4.5 V the frequency drops quickly.

4.12.4. Phase noise measurements

Phase noise measurements, shown in Fig. 4-25, are very close to the ISF predictions in Section 4.7.2 on page 56. At a 1 MHz offset from the carrier, the phase noise was measured at -90 dBc/Hz and calculated to be -93 dBc/Hz. The difference can best be attributed to testing effects, probe and wiring losses, and higher temperatures than anticipated. Because of the noisy test environment, a special filter was built to suppress noise on the differential input. The filter consisted of a very low bandwidth differential RC filter with a non-electrolytic capacitor. In addition, because supply noise was an important contributor, batteries were used to power the chip.

[Figure 4-25 plot: Phase Noise (dBc/Hz) versus offset frequency (kHz), 100 kHz to 100 MHz.]
Figure 4-25 Open loop phase noise of FFI VCO. This plot shows the phase noise versus the carrier offset frequency. The data was collected using a LabView program in conjunction with a spectrum analyzer and special software supplied with the equipment.

4.12.5. Jitter measurements

The measured jitter versus frequency is plotted in Fig. 4-26. The data was collected from an open-loop VCO circuit using an HP 11801C sampling oscilloscope with ΔT set to 50 ns.
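With ΔT fixed at 50 ns, the jitter model of Section 4.8 translates directly into an expected RMS timing deviation. The sketch below evaluates it under stated assumptions: (4-27) is taken as κ ≈ √(2/3)·(ω_o/ω)·κ_o; s runs from 0 (four-stage operation) to 1 (two-stage operation), so by (4-26) the tuning range spans 2/3 to 4/3 of the center frequency; the κ_o of 18 quoted in Section 4.8 is assumed to be in n√s; and the 5 GHz center frequency is an illustrative choice rather than a measured value. The function names are hypothetical.

```python
import math


def s_from_frequency(f_hz: float, f_center_hz: float) -> float:
    """Eq. (4-26): interpolation weight s ~ 3f / (2 f_o) - 1 (frequency ratios equal omega ratios)."""
    return 3.0 * f_hz / (2.0 * f_center_hz) - 1.0


def kappa_ffi(f_hz: float, f_center_hz: float, kappa_o: float) -> float:
    """Eq. (4-27), taken as kappa ~ sqrt(2/3) * (f_o / f) * kappa_o."""
    return math.sqrt(2.0 / 3.0) * (f_center_hz / f_hz) * kappa_o


def rms_jitter_s(kappa: float, delta_t_s: float) -> float:
    """Eq. (4-24) rearranged: sigma_t = kappa * sqrt(delta_T)."""
    return kappa * math.sqrt(delta_t_s)


if __name__ == "__main__":
    f_o = 5e9          # illustrative center frequency, Hz
    kappa_o = 18e-9    # 18 n*sqrt(s), unit assumed
    delta_t = 50e-9    # measurement window used with the sampling oscilloscope
    for f in (2 * f_o / 3, f_o, 4 * f_o / 3):   # four-stage end, center, two-stage end
        k = kappa_ffi(f, f_o, kappa_o)
        print(f"f = {f / 1e9:.2f} GHz  s = {s_from_frequency(f, f_o):+.2f}  "
              f"kappa = {k * 1e9:.1f} n*sqrt(s)  "
              f"sigma_t = {rms_jitter_s(k, delta_t) * 1e12:.2f} ps")
```

Because κ falls as 1/f in this model, the accumulated jitter over a fixed 50 ns window shrinks toward the two-stage end of the range, which is the trend plotted in Fig. 4-26.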
The model described by (4-27) accurately predicts the end points of the jitter function, but the results are off by as much as 20% in between. This can be attributed to the fact that when the VCO operates more like a four-stage oscillator, it exhibits fast rise times. During interpolation, however, the VCO output becomes more sinusoidal and its edges slow down (the rise time increases), which increases the jitter. As s is increased and a two-stage oscillator is approached, the rise time becomes more representative of that assumed in the model. At the target operating frequency of 5 GHz, κ is equal to 14.2 n√s, which is 36% lower than κ when operating as a normal four-stage oscillator.

[Figure 4-26 plot: measured and analytical κ versus Frequency (GHz), 3.0 GHz to 6.0 GHz.]
Figure 4-26 FFI VCO analytical and measured jitter. This plot shows how jitter is related to the frequency of oscillation. Jitter improves at higher frequencies because the system operates with fewer stages.

5. Design of the Transmitter

5.1. Project History

The first transmitter was submitted to IBM for fabrication in February 1999 as a stand-alone chip. It generated all 16 parallel data bits internally and had no mechanism to accept externally supplied data. The 20 Gb/s bit-rate specification was not achieved because of a VCO load imbalance. The second prototype, submitted to Sierra Monolithics Inc. in April 2000, was a unified transmitter-receiver chip. It contained improvements to the first prototype and was designed to be a fully working chip capable of being packaged or wafer tested. The transmitter in this implementation easily reached the 20 Gb/s target data rate. An invention disclosure record for the symmetric multiplexer was submitted in February 2000, and RPI has subsequently stated that it will pursue a U.S. patent for this invention.

5.2. Top Level Architecture Overview

The goal of the transmitter is to accept low-speed parallel data and multiplex it into high-speed serial data. In some cases, it must first encode the data by adding extra bits for error correction, byte alignment, word framing, or channel synchronization. The encoded data is then multiplexed from n parallel bits to a single bit stream. An additional stage, driven by a very low noise PLL, may then be used to retime the data [42] to remove accumulated noise. Finally, an amplifier drives the external channel that carries the signal.

This Serdes project did not investigate data encoding because of limited time and resources. Although a full-featured chip may include data encoding, a system of this type can still operate without it; presumably the role of the encoder would be off-loaded to the next level of hardware or software.

A 16-to-1 multiplexer was implemented as four 4-stage registers and one 4-to-1 multiplexer. The design revolved around a unique multiplexing scheme that required four inputs and could run with a quarter-frequency clock: the output data rate was 20 Gb/s, but the oscillator ran at 5 GHz. Since 16 external bits were to be supplied to the chip and the multiplexing scheme required four bits, a front-end register that could be expanded to meet a parallel data word of any width was designed.

Instead of adding a separate stage to perform symbol retiming, the retiming function was pushed into the multiplexer. This necessitated a complete redesign of the standard multiplexing CML gate so that it could meet the stringent timing requirements for transmission. The symmetric multiplexer evolved from this redesign process. Like the retiming circuit, the channel amplifier was also incorporated into the multiplexer, which involved ramping up transistor sizes and changing the output stage of the multiplexer.
[Figure 5-1 block diagram: 16 inputs at 1.25 Gb/s enter four shift registers (A to D), whose outputs feed a 4-1 multiplexer producing the 20 Gb/s output; the VCO and PLL supply the clock.]
Figure 5-1 Transmitter and multiplexer architecture. The top level transmitter design consists of a 16-1 multiplexer driven by a 5 GHz PLL. Four 4-stage shift registers capture 16 bits of data every 800 ps. These then feed the 4-1 multiplexer in order to serialize the data.

5.3. 16-1 Multiplexer

Fig. 5-1 depicts the core of the transmitter, the multiplexer. It is divided into a 4 × 4 shift register bank and a 4-to-1 multiplexer, both shown in the same figure. Together they capture 16 bits of data every 800 ps and serialize them into a single bit stream. The width of each bit at 20 Gb/s is 50 ps.

The shift registers consist of four cascaded MS-latches, each with a 2-to-1 multiplexer front-end. By selecting different inputs, the array of four latches can either load external data or accept data from the previous latch. Clocking the select line ensures that, after three bits have been shifted through, the next "shift" results in a load. Consecutive load pulses are separated by 16 bit widths, or 800 ps. The tail latch of each register shifts in a zero, which is harmless because that bit is overwritten by new data before it can make it out of the head latch.

[Figure 5-2 timing diagram: register outputs A, B, C, D; interleaved first-stage sequences BA and CD; final output sequence CBAD (b0 a0 d0 c1 b1 a1 d1 c2 ...); 0° and 90° clocks; time axis 0 to 400 ps.]
Figure 5-2 Data timing for the 4-1 multiplexer. The multiplexer interleaves the incoming data by using a multi-phase, quarter-frequency clock. Timing of this circuit is critical because this circuit is also responsible for retiming the data.

The unique nature of the multiplexer requires data in registers A and D to be offset by 50 ps (90° of the 5 GHz clock) from data in registers B and C. This offset was accomplished by clocking the registers with two quadrature phases of the PLL. Each of the four registers is connected to the 4-to-1 multiplexer as an input.
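The register loading and the quadrature-clocked interleaving described above can be illustrated with an idealized behavioral model. This is a sketch, not the MS-latch circuit: it assumes ideal latches and clocks, treats the head latch as each register's output, and uses a select polarity (the first-stage multiplexers pass B and C while the 0° clock is high; the final multiplexer passes BA while the 90° clock is high) chosen so that the output order matches the CBAD sequence of Fig. 5-2. The actual circuit polarities may differ, and all function names are hypothetical. The last function shows why the 0°/90° relationship matters: a skew on the 90° clock makes consecutive bits alternately wider and narrower.

```python
from typing import List, Sequence

PERIOD_PS = 200.0   # one period of the 5 GHz quarter-rate clock


def serialize_register(words: Sequence[Sequence[int]], depth: int = 4) -> List[int]:
    """Behavioral parallel-load shift register (one of registers A to D).

    latches[0] is the head latch feeding the 4-to-1 multiplexer. Every `depth`-th
    clock loads a new external word; the other clocks shift toward the head.
    """
    latches = [0] * depth
    out: List[int] = []
    for cycle in range(depth * len(words)):
        if cycle % depth == 0:
            latches = list(words[cycle // depth])   # load through the 2-to-1 front ends
        out.append(latches[0])                      # head latch drives the multiplexer
        latches = latches[1:] + [0]                 # shift; the tail latch takes in a zero
    return out


def clock_high(t_ps: float, phase_deg: float) -> bool:
    """Ideal 50% duty-cycle clock that rises at phase_deg within each period."""
    return (t_ps / PERIOD_PS - phase_deg / 360.0) % 1.0 < 0.5


def selected_register(t_ps: float) -> str:
    """Register reaching the output at time t under the assumed select polarity."""
    clk0, clk90 = clock_high(t_ps, 0.0), clock_high(t_ps, 90.0)
    ba = "B" if clk0 else "A"      # first-stage multiplexer producing BA
    cd = "C" if clk0 else "D"      # first-stage multiplexer producing CD
    return ba if clk90 else cd     # final multiplexer, selected by the 90-degree clock


def slot_widths(skew_ps: float = 0.0) -> List[float]:
    """Output bit widths when both edges of the 90-degree clock arrive skew_ps late."""
    q = PERIOD_PS / 4.0
    edges = sorted([0.0, q + skew_ps, 2 * q, 3 * q + skew_ps, PERIOD_PS])
    return [later - earlier for earlier, later in zip(edges, edges[1:])]


if __name__ == "__main__":
    print(serialize_register([[1, 0, 1, 1], [0, 1, 1, 0]]))          # [1, 0, 1, 1, 0, 1, 1, 0]
    print("".join(selected_register(t) for t in (25, 75, 125, 175)))  # CBAD
    print(slot_widths(), slot_widths(5.0))  # [50.0, 50.0, 50.0, 50.0] [55.0, 45.0, 55.0, 45.0]
```

A single 5 GHz clock and its quadrature copy thus produce four distinct 50 ps windows per period, which is why no 10 GHz clock is needed; any error in the 90° phase shows up directly as alternating bit widths.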
A special “shuffling” clocking scheme is used to multiplex the data. This eliminates the need for the 10 GHz clock that would typically be required to convert the final two 10 Gb/s signals into one 20 Gb/s signal; a single-frequency clock can control both the shift registers and the multiplexers. Multiplexing is accomplished by offsetting registers A and D by 90° from registers B and C (see Fig. 5-2). This creates the basic interleaved data sequences, BA and CD, which are synchronized with the first stage of 2-to-1 multiplexers. Interleaving was not strictly necessary to create the sequences, but without it, coincident edges and timing glitches could have been introduced.

Signals BA and CD arrive at the final multiplexer in phase with each other. The phase of this multiplexer's select signal is shifted exactly 90° from the select signal of the preceding multiplexers. This effectively cuts both BA and CD in half and combines them to form a CBAD signal. Therefore, the final output edges are created from two sources: the final multiplexer select and the change of inputs during selection.

The phase difference between the 90° and 0° signals is critical in determining any output transition offsets: any mismatch between the phases translates directly into a phase offset between consecutive transitions in the bit stream. To guarantee a 90° phase difference, a delay that exactly matches the delay of the two 2-to-1 multiplexers is introduced. The easiest way to do this is to use a matched multiplexer whose a input is set to 0 and b input is set to 1. Although this technique consumes some power, it is necessary to significantly reduce phase mismatch.

5.3.1. The Case for the Symmetric Multiplexer

The 2-to-1 multiplexer is the final non-amplifying stage in most serial transmitter circuits. It is therefore of utmost importance to study and understand the performance of this gate and how it affects the data stream.
A typical 2-to-1 CML multiplexer using levels 1 and 2 is shown in Fig. 5-3. Data inputs a and b are on level 1, and the select input, s, is on level 2. In a clocked circuit the important performance parameter is the delay from an input transition to the output transition, taken as the largest delay over all possible combinations of inputs and outputs. This parameter, in conjunction with other gate delays, ultimately determines the maximum speed at which the circuit can be clocked.

The multiplexer performance metric, however, is very different in a transmitter, where the multiplexers perform the retiming. Delay through the gate is of secondary importance, whereas the shape and aperture of the eye diagram are of critical importance. Bit widths must remain consistent, and bit amplitudes must remain large enough to be received in the presence of noise.

[Figure 5-3 schematic: outputs z0, z1; transistors Q1 to Q6; data inputs a0/a1 and b0/b1 on level 1; select inputs s0/s1 on level 2.]
Figure 5-3 CML Two Level Multiplexer. The level difference between the inputs a and b and the select input s produces a phase mismatch when a, b, and s are aligned by 90°.

The data and select signals arriving at the multiplexer are forced to a phase difference of 90° by the VCO and the overall circuit architecture. It is questionable whether an exact 90° difference is appropriate for this gate, because the inputs arrive on different levels. Is there any inherent difference between their respective delays? Does a better choice of phase exist that generates a more uniform output? How does the difference in levels affect the loading on, and drive from, the previous gates? The circuit in Fig. 5-4 was designed and simulated to answer these questions. Signals a and b are complements of each other, and the select signal's phase, ∅, is varied around 90°. Ideally, the average value of the output coincides with the median when ∅ is equal to zero. This condition corresponds to an output with a 50% duty cycle, in which each bit is of equal width.
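The averaging measurement works because the time-averaged output voltage is a direct readout of the duty cycle. The sketch below illustrates this for an idealized multiplexer with no internal delay asymmetry, using the Fig. 5-4 stimulus (a at 0°, b as its complement, s at 90° + ∅). Under those assumptions the averaged output sits exactly at the median when ∅ = 0 and moves linearly as ∅ changes; in the real CML gate, unequal data and select delays shift this crossing, which is what the simulation is designed to expose. The select polarity (z = a while s is high), the sampled evaluation, and the function names are assumptions of this sketch.

```python
def mux_duty_cycle(phi_deg: float, samples: int = 3600) -> float:
    """Fraction of one period that an ideal 2:1 multiplexer output is high.

    a: square wave rising at 0 deg; b: its complement; select s rises at 90 + phi deg.
    Assumed polarity: z = a while s is high, z = b otherwise.
    """
    high = 0
    for k in range(samples):
        theta = 360.0 * (k + 0.5) / samples        # sample at the middle of each bin
        a = theta % 360.0 < 180.0
        s = (theta - 90.0 - phi_deg) % 360.0 < 180.0
        z = a if s else not a                      # b is the complement of a
        high += z
    return high / samples


def average_output(phi_deg: float, v_high: float, v_low: float) -> float:
    """Time-averaged output voltage, the quantity the averaging load in Fig. 5-4 reports."""
    return v_low + mux_duty_cycle(phi_deg) * (v_high - v_low)


if __name__ == "__main__":
    for phi in (-13.5, 0.0, 13.5):
        print(f"phi = {phi:+.1f} deg  duty = {mux_duty_cycle(phi):.3f}")
```

For this ideal gate the duty cycle follows 0.5 − ∅/180 (for |∅| < 90°), so any offset of the measured 50% crossing away from ∅ = 0 isolates the gate's own input asymmetry.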
The results of the analysis, shown in Fig. 5-5, indicate that a phase offset of 13.5° (7.5 ps) is needed to maintain a 50% duty cycle. This effect results from the data being on level 1 and the select lines being on level 2: for a select change to propagate to the output, it must travel through two levels of logic, whereas a data change only needs to travel through one. There is also a loading difference between the two logic levels. The collectors on level 1 see the pull-up resistors and the base of the following gate, whereas on level 2 the collectors see two emitters from the level above.

[Figure 5-4 test setup: complementary data a (0°) and b (180°) and select s (90° + ∅) drive a loaded 2:1 MUX whose output z is averaged; time axis 0 to 400 ps.]
Figure 5-4 Simulation Testing of CML 2:1 Multiplexer. By varying the select phase relative to the data phase and averaging the output signal over time, the ideal select-to-data phase offset can be measured.

Since both the data and select signals are driven from the VCO, a 50% duty cycle at a 90° phase difference is desired. The multiplexer, however, requires a 103.5° phase difference for a symmetric output. A delay element could be added to the data lines to add 7.5 ps, but a better solution was invented: the symmetric multiplexer. The symmetric multiplexer accepts all inputs on the same level, has the same loading per input, and ensures that any input (data or select) propagates to the output in the same amount of time. An implementation of the gate is shown in Fig. 5-6. The left-hand side of the multiplexer implements the OR condition a·s + b·s̄, which generates the high output, and the right-hand side implements the inverse condition (ā + s̄)·(b̄ + s), which generates the low output. The four transistors in the center, Q1-Q4, act as a shared differential amplifier. Under all static conditions, one branch has a high-level and a low-level transistor, and the other branch has both transistors in an intermediate state.
The branch with the high-level transistor carries all of the current and produces the z output.

[Figure 5-5 plot: Average Output Voltage (V), -0.9 V to -1.3 V, versus Phase (degrees), -180° to 180°.]
Figure 5-5 Simulation Results for CML 2:1 Multiplexer. The crossing point, or 50% duty cycle point, occurs at 13.5° (7.5 ps). This shows an asymmetry between the select and data inputs.

[Figure 5-6 schematic: two input stages flanking a central output stage; outputs z0, z1; central transistors Q1 to Q4; inputs a0/a1, b0/b1, s0/s1; tail currents of ½I and I.]
Figure 5-6 CML Single Level Symmetric Multiplexer. A novel implementation of a multiplexer with all inputs on level 1, identical loading per input, and a completely symmetric response.

Fig. 5-7 shows the state of each transistor for each combination of input values. "H" represents a high state, the highest voltage, and indicates which transistor carries the current. The medium level falls halfway between the high and low levels. To ensure proper noise margins, the voltage difference between the high and low levels is increased to 500 mV, which places a 250 mV difference between the two top voltage levels. Each transistor in the central tree of the multiplexer is driven by two differential pairs. This allows the 12 input transistors to be reduced in size without any loss of signal integrity, and also directly compensates for the doubled loading on each input. A drawback is that each input requires a minimum of 2 µm of load, regardless of the output driving ability. The power requirements for this circuit are also four times higher than those of a typical level 1 output CML multiplexer. On the other hand, since this circuit requires only one level of logic, the negative power supply can be reduced by at least 25%.
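The intended logic of the symmetric multiplexer can be cross-checked against the transistor-state table of Fig. 5-7. The short script below transcribes the a, b, s, and Z columns of that table and confirms that Z = a·s + b·s̄ for all eight input combinations, and that the pull-low condition (ā + s̄)·(b̄ + s) is its exact complement, so exactly one side of the gate is active at a time. It checks Boolean behavior only; the H/M/L transistor voltage levels are not modeled, and the function names are hypothetical.

```python
from itertools import product

# (a, b, s) -> Z, transcribed from the table of Fig. 5-7
FIG_5_7_Z = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 0,
    (1, 0, 0): 0, (1, 0, 1): 1, (1, 1, 0): 1, (1, 1, 1): 1,
}


def pull_high(a: int, b: int, s: int) -> int:
    """Left-hand side of the gate: a*s + b*not(s)."""
    return (a & s) | (b & (1 - s))


def pull_low(a: int, b: int, s: int) -> int:
    """Right-hand side of the gate: (not a + not s) * (not b + s)."""
    return ((1 - a) | (1 - s)) & ((1 - b) | s)


def check_fig_5_7() -> bool:
    """True if the table matches the high side and the low side is its exact complement."""
    return all(
        pull_high(a, b, s) == FIG_5_7_Z[(a, b, s)]
        and pull_low(a, b, s) == 1 - FIG_5_7_Z[(a, b, s)]
        for a, b, s in product((0, 1), repeat=3)
    )


if __name__ == "__main__":
    print(check_fig_5_7())   # True
```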
a  b  s    Q1  Q2  Q3  Q4    Z
0  0  0    M   M   L   H     0
0  0  1    M   M   H   L     0
0  1  0    L   H   M   M     1
0  1  1    M   M   H   L     0
1  0  0    M   M   L   H     0
1  0  1    H   L   M   M     1
1  1  0    L   H   M   M     1
1  1  1    H   L   M   M     1

Figure 5-7 Symmetric multiplexer transistor states. The states of transistors Q1-Q4 are defined as high (H), low (L), and middle (M). The transistor in the high state carries the current and dictates the output value.

5.3.2. Final Implementation and Simulation

Serdes I did not use the symmetric multiplexer and had a 15% phase error in alternating edges, as shown in the simulation in Fig. 5-8. Panel (a) shows the eye diagram of the standard CML multiplexer. The inputs were chosen to exercise the circuit as much as possible, i.e., using 50 ps input pulses and differing a and b inputs when the select input changes. At the center voltage of 125 mV, two distinct crossings can be seen, which result from the input-to-output delay imbalance in the CML circuit: a select transition takes about 10 ps longer to reach the output than an a or b transition. Panel (b) shows a much cleaner eye diagram for the symmetric multiplexer. The improvement lies in the circuit architecture, which was designed with symmetry to ensure that any input change propagates to the output in the same amount of time. As a result, the transmitter output benefits from a clean, low phase noise multiplexer signal.

The 4-to-1 multiplexer with symmetric architecture in Serdes II also acts as the line driver, driving the pads directly. The motivation for this design feature was to eliminate the noise that an additional line driver would introduce; by integrating the two components, the total phase noise is reduced. To accomplish this, larger 12 µm transistors, capable of sinking 9.6 mA, were used in the final multiplexer. In addition, a cascode amplifier was added to the output stage to limit the loading on the differential pair.
Driving the final 12 µm output stage required ramping up transistor sizes so that the input stage of the final multiplexer was not loaded down. Starting with a 1 µm input stage, two intermediate emitter followers of 2 µm and 4 µm were added. This enabled an output stage with 8 µm transistors, each capable of driving transistors of their own size or larger. This output stage drives the final multiplexer, which has a 4 µm input. Once again, two emitter followers, of 6 µm and 8 µm, were added, followed by the 12 µm output stage. This technique required a total current of 63 mA, compared with 15.4 mA for the standard CML multiplexer and its associated pad driver.

Figure 5-8 Multiplexer Eye Diagrams [plots: output voltage (V) versus time (ps); (a) standard CML multiplexer, (b) symmetric multiplexer]. These plots are output eye diagrams for the standard CML multiplexer (a) and the symmetric multiplexer (b). Both circuits received identical 20 Gb/s inputs and identical loading.

Figure 5-9 Multiplexer Layout for Serdes I and II [layouts: (a) 4x4 registers and CML multiplexer; (b) 4x4 registers and 3 symmetric multiplexers]. The transmitter 16-to-1 multiplexer consists of a 4x4 shift register and a 4-to-1 multiplexer. The layouts for Serdes I (a) and Serdes II (b) are shown here.

5.4. Phase Locked Loop (Frequency Synthesizer)

When reducing phase noise in the transmitter becomes the most important design factor, the transmitter phase locked loop (PLL) becomes the most important circuit in the system. Its role is to generate a high frequency, extremely low noise clock from a low frequency, noisy, externally supplied reference clock. For the transmitter PLL in this design, the external reference is at 625 MHz and the PLL clock output is at 5 GHz. The standard linear model of a PLL, shown in Fig. 5-10, has a phase detector (PD), a loop filter (LF), and a VCO.
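The linear model can be evaluated numerically. The Python sketch below (function and constant names are illustrative, not from the thesis) computes the closed-loop phase transfer function θo/θi = [KdF(s)Ko/s] / [1 + KdF(s)Ko/(Ns)] of a PLL with a divide-by-N feedback path, where N = 8 follows from the 625 MHz reference and the 5 GHz output. To stay minimal it assumes F(s) = 1 (a first-order, type I loop) and a hypothetical PD gain; it is not a model of any of the three Serdes loop filters.

```python
import math


def closed_loop(f_hz, kd, ko, n, loop_filter):
    """Closed-loop phase transfer theta_o / theta_i of a divide-by-n PLL.

    kd: phase detector gain (V/rad)
    ko: VCO gain (rad/s/V); the VCO contributes ko / s (a phase integrator)
    loop_filter: callable returning F(s) for a complex s
    """
    s = 2j * math.pi * f_hz
    forward = kd * loop_filter(s) * ko / s   # PD -> LF -> VCO
    return forward / (1 + forward / n)       # feedback through the 1/n divider


# Placeholder values: KD is hypothetical; 6 GHz/V matches the FFI VCO gain quoted later.
KD = 0.05                  # V/rad (hypothetical)
KO = 2 * math.pi * 6e9     # rad/s/V
N = 8                      # 5 GHz / 625 MHz
K = KD * KO / N            # loop gain (rad/s) of the type I loop with F(s) = 1

for f in (1e3, K / (2 * math.pi), 1e9):
    h = closed_loop(f, KD, KO, N, lambda s: 1.0)
    print(f"{f / 1e6:10.3f} MHz  |H|/N = {abs(h) / N:.3f}")
```

At low offset frequencies |θo/θi| approaches N, i.e. the reference phase is multiplied by 8 along with its frequency; for this F(s) = 1 case the response falls to 1/√2 of that value at f = K/2π with K = KdKo/N. Swapping in a different `loop_filter` callable is how one would explore the active filters discussed below.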
The phase detector subtracts the phase of the input signal from the phase of the output signal. This gives a measure of the phase offset between the two signals and is the mechanism that allows the phases to be locked together. The loop filter shapes the output of the phase detector to meet certain feedback characteristics, such as output noise, pull-in range¹, and pull-in time². The VCO acts as an integrator, converting a control signal into an oscillating signal represented as a phase. Finally, a 1/8 frequency divider is used to match the internal frequency to the external input frequency, as required by the PD.

Figure 5-10 Linear model of PLL [block diagram: input filter Y(s) → phase detector Kd → loop filter F(s) → VCO Ko/s → to transmitter, with the frequency divider in the feedback path; signals vi, θi, θo]. The PLL used in the transmitter consists of three primary parts: phase detector, loop filter, and VCO. An input filter is added to reduce the noise level of the input signal.

¹ Pull-in range is the maximum range of frequencies over which the PLL can eventually acquire lock. This PLL parameter is primarily a function of the PD implementation, but is also determined by the frequency range of the VCO.
² Pull-in time, or acquisition time, is the amount of time it takes the PLL to achieve lock from an initial frequency deviation that is within the pull-in range.

The transmitter's frequency synthesizer went through three major revisions during its evolution. These revisions are depicted in Fig. 5-11. During the rapid development of the first transmitter prototype, a PLL was designed that had minimal functionality and poor performance. The goal was to quickly develop a clock multiplier without concern for phase noise and jitter performance. With more time and results from Serdes I, a highly improved Serdes II PLL evolved.
It possessed a 3-state PD, which improved the lock-in range¹ and acquisition time; an active op-amp style LF, which further improved key characteristics; and the FFI VCO, which reduced noise and increased performance. Still missing from this design was an optimized bandwidth driven by previous results and specifications. Measuring the noise characteristics of the VCO and gathering information about the noise spectrum of the input noise source were key to bandwidth optimization. Test data from the first two prototypes, better simulation techniques, and further research yielded the final PLL design. VCO noise spectra allowed a much better bandwidth design, further minimizing PLL output phase noise. A smaller bandwidth required frequency detection in the PD because of the much longer pull-in time. Another improvement replaced the clumsy op-amp integrator with a high performance specialized integrator, which is also used in the receiver PLL.

¹ The lock-in range, a function of the PD and the PLL bandwidth, is defined as the maximum frequency deviation for which the PD will remain in lock, where the PD is in its linear range and does not slip.

Figure 5-11 Frequency synthesizer evolution [diagram: Serdes I: XOR PD, type I passive LF (RC low pass filter), CS Simple VCO; Serdes II: input filter, 3-state PD, type II active LF (op-amp filter), FFI VCO; Serdes III: 3-state PD with frequency detector, type II active LF with specialized integrator and optimized bandwidth, FFI VCO]. The transmitter's frequency synthesizer went through three major evolutionary steps. The first had the most basic components and provided minimal functionality. The second incorporated better components to minimize noise and improve the acquisition range and time. The third, unfabricated version added advanced PLL components and optimized key design variables based upon simulations and measurements from the other prototypes.

5.4.1.
Input Filter

An effective technique for reducing PLL phase noise is to drive the PLL with a very clean reference source¹. The PLL has the ability to lock a noisy VCO to a clean reference and reduce the total output noise to a level below that of the VCO. With this in mind, an input bandpass filter was designed and implemented to reduce the out-of-band noise of the signal source. This technique was added to the Serdes II design but removed from the subsequent design because a better input signal generator was acquired.

¹ The signal source used in the Frisc testing lab is very old and very noisy. In practice, a well controlled, low phase noise signal generator would be used as the reference and an input filter would not be needed.

Figure 5-12 Schematic for input filter [schematic: resistor-tree attenuator followed by a CML amplifier with an RC bandpass network; component values R1 = 800 Ω, R2 = 224 Ω, R3 = 2 kΩ, C1 = 500 fF, C2 = 500 fF]. The input filter is a bandpass filter centered on the reference frequency. It is intended to filter out low and high frequency noise associated with this signal.

Fig. 5-12 depicts the schematic of the input filter, which consists of an input attenuator and an active bandpass filter. The active component of the filter is simply a high-gain, two-stage buffer with level one and level two outputs. The first stage does not affect the voltage gain of the amplifier and has Darlington pair inputs to reduce the input current by a factor of β. Twenty-five percent larger pull-up resistors were used to increase the total gain to approximately 5. The input resistor-tree attenuator compensates for the large total gain of the bandpass filter by reducing the input amplitude by 78%. The frequency transfer function of the input filter is shown in Fig. 5-13. The peak was designed to be at precisely 625 MHz, with a bandwidth large enough to account for parameter mismatches and frequency adjustment.
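The 78% attenuation and the gain of about 5 are consistent with the "slightly greater than unity" passband gain reported for Fig. 5-13. The Python sketch below makes that arithmetic explicit under one assumption that the thesis does not state outright: that the attenuator behaves as an unloaded divider formed by R1 = 800 Ω in series and R2 = 224 Ω in shunt, and that the bandpass network passes the 625 MHz peak without further loss. Loading by the Darlington inputs is ignored.

```python
import math

# Assumed reading of Fig. 5-12: unloaded divider, R1 in series, R2 in shunt.
R1 = 800.0       # ohms
R2 = 224.0       # ohms
AMP_GAIN = 5.0   # approximate gain of the two-stage CML amplifier (from the text)

ratio = R2 / (R1 + R2)                # fraction of the input amplitude that survives
attenuation_pct = 100 * (1 - ratio)   # amplitude reduction in percent
net_gain = ratio * AMP_GAIN           # passband gain, assuming no loss in the RC network

print(f"attenuation {attenuation_pct:.1f} %, net gain {net_gain:.3f} "
      f"({20 * math.log10(net_gain):+.2f} dB)")
```

Under these assumptions the divider keeps 224/1024 of the input, an attenuation of about 78%, and the net gain comes out a little under 1.1, which matches the description of a slightly-above-unity peak.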
Because the final effect of this filter on the output phase noise of the PLL was not known, a multiplexer was added after this circuit so that it could be bypassed if necessary. This makes it possible to determine the filter's actual usefulness.

Figure 5-13 Input filter frequency response [plot: gain (dB) versus frequency (MHz)]. At the reference frequency of 625 MHz the input filter achieves a gain slightly greater than unity. All other frequencies are attenuated.

5.4.2. Phase Detector

A phase detector produces a signal that yields information about the difference between the phases of its two inputs. Ideally it produces a perfectly linear response for all phase differences and has an arbitrary gain. For real circuits, however, we must settle for non-linear responses that may have regions where the gain becomes negative, where the function is periodic in π/2 or π rather than 2π, and where the gain varies across the range.

5.4.2.1. Phase detector (Serdes I)

Two different phase detectors were investigated in Serdes I and Serdes II: the XOR, or Gilbert multiplier, and the 3-state, respectively. The schematic for the XOR PD, shown in Fig. 5-14, consists of a single-tree CML gate with emitter followers. At one extreme, the inputs are in phase and the average value of the output is 0. At the other extreme, when the inputs are 180° apart, the output is 1. For the 3-state detector, the output is taken differentially across its two internal signals, VU and VD. The rising edges of these signals, which are the outputs of the two resettable MS latches, coincide with the rising edges of the input signals, Vi and Vo. The falling edges, on the other hand, are triggered together after both have risen. This creates a wider pulse on VU or VD when the associated input arrives first.

The output of the XOR PD, shown in Fig. 5-15, has a linear response from -180° to 180°.
Outside that range the output slope is negative, which produces a temporarily unstable PLL response before the phase detector output enters a positively sloped region again. The gain is about 0.53 V/rad, which is relatively high. It is set by the large input control range of the VCO used in Serdes I, the Simple Current Starving version of the VCO.

Figure 5-14 Phase detector schematics [schematics: (a) XOR phase detector with inputs vi and vo and output vd; (b) 3-state phase detector with two resettable D latches producing vU and vD, reset through an AND gate]. The XOR detector (a) uses an XOR logic cell to perform phase detection. The 3-state detector (b) uses two resettable MS latches and an AND gate.

5.4.2.2. Phase detector (Serdes II)

Fig. 5-15 also shows the output of the 3-state PD. Its response is greatly improved over that of the XOR PD. First, the slope is always positive and extends across the entire input phase difference range. This greatly improves the response of the PLL during lock acquisition, which is discussed in Section 5.4.6. Another important improvement appears when the phase error is continuously increased above 180°, which is common with larger frequency offsets. Although the plot shows the output at -120 mV above 180°, the output will step to 0 mV and continue to rise beyond that phase. This effect increases the pull-in range.

To implement the 3-state PD, one significant hurdle related to the reset feedback through the AND gate had to be resolved. Proper operation occurs when the second output edge from the latches drives the AND gate high, resets both latches, and brings the AND output low again. In simulation, however, the very narrow reset pulse was failing to reset one of the latches. The problem was traced to the non-uniform loading of the output latches and the asymmetry in the AND gate inputs. The solution was to use a single-ended AND gate to provide symmetric loading and matched input levels for both latches. This ensured that both latches were uniformly reset and eliminated the timing issues.
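The qualitative difference between the two detectors can be reproduced with idealized textbook characteristics. The Python sketch below is not a model of the simulated curves in Fig. 5-15: it normalizes both outputs to a unit swing, treats the XOR average as a triangle wave (0 in phase, 1 at 180°, period 2π) and the 3-state average as linear over (-2π, 2π), and estimates the detector gain with a central difference. The function names are illustrative.

```python
import math


def wrap(phi):
    """Wrap a phase (rad) to the interval (-pi, pi]."""
    w = math.fmod(phi + math.pi, 2 * math.pi)
    if w <= 0:
        w += 2 * math.pi
    return w - math.pi


def xor_pd(phi):
    """Idealized normalized XOR PD average: 0 in phase, 1 at 180 degrees, period 2*pi."""
    return abs(wrap(phi)) / math.pi


def tristate_pd(phi):
    """Idealized normalized 3-state PD average: linear over (-2*pi, 2*pi).

    Beyond +/-2*pi the detector slips a cycle; math.fmod keeps the sign of phi,
    so the output steps back toward zero and then keeps rising (or falling).
    """
    return math.fmod(phi, 2 * math.pi) / (2 * math.pi)


def slope(pd, phi, h=1e-6):
    """Central-difference estimate of the detector gain (normalized units per rad)."""
    return (pd(phi + h) - pd(phi - h)) / (2 * h)


for deg in (-135, -45, 45, 135):
    phi = math.radians(deg)
    print(f"{deg:5d} deg  XOR slope {slope(xor_pd, phi):+.3f}  "
          f"3-state slope {slope(tristate_pd, phi):+.3f}")
```

The printout shows the XOR gain changing sign between negative and positive phase errors, while the 3-state gain stays positive across the whole range; in a real circuit these normalized slopes are scaled by the detector's output swing to give Kd.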
Figure 5-15 Simulated phase detector responses [plot: average output of the 3-state PD (mV, left axis) and the XOR PD (V, right axis) versus phase difference (degrees)]. Plotted above is the average of the output signal of the two phase detectors. The XOR phase detector has a valid range between 0° and 180°, and the 3-state detector output is valid for any phase difference.

These PDs are used in a frequency synthesizer that includes a divide-by-8 component. The nature of the PLL gain, K, and the 3 dB bandwidth is such that both are reduced by a factor of N. This factor is incorporated into the PD gain, which gives the XOR PD an adjusted gain of 66.3 mV/rad (0.53 V/rad divided by 8) and the 3-state PD an adjusted gain of 5.25 mV/rad. The lock-in range of the PLL is (π/2)K using the XOR PD and πK using the 3-state PD. The larger range of the 3-state PD provides higher resistance to cycle slips and yields a shorter pull-in time when used with a frequency detector. The pull-in time with the XOR PD is about four times longer than with the 3-state PD for the same PLL bandwidth, and the pull-in range is about four times larger for the 3-state PD. The simulated figure of merit¹, M, for the XOR gate is quite high, approaching 1 million. This was expected because of the very simple nature of the XOR gate. The 3-state PD, on the other hand, has a value of about 22, which is appropriate for a circuit of this complexity.

¹ The figure of merit, M, for a PD is Vdo/Kd, where Vdo is the mean value of the PD output and Kd is its gain. A low M value for a PD yields a small pull-in range.

5.4.2.3. Phase detector (Serdes III)

Research into Serdes III called for a decreased bandwidth in order to further suppress spurious noise introduced by the PD. Side effects of this decrease are a reduction in the pull-in range and an increase in the pull-in time.
A very effective way to counter these negative effects is to add a frequency detector (FD) to the 3-state PD. This circuit is able to detect cycle slips and provide a strong pull-in signal in response. A cycle slip occurs when the phase error exceeds the bounds of the PD (0, 2π) and the output steps (see Fig. 5-15). This is indicative of a large frequency error, and if the proper circuitry is added to sense this event, a large change can be made to the loop filter integrator.

Figure 5-16 PLL frequency detector [block diagram: the 3-state PD outputs vU and vD pass through two slip detectors to the loop filter; each slip detector uses a delay and resettable D latches driven by vi and vo to produce a cycle slip signal]. A frequency detector detects cycle slips from the PD and performs large control voltage changes. This allows a much wider pull-in range and shorter pull-in times.

The schematic in Fig. 5-16 shows the implemented frequency detector that was added to the Serdes III design. The detector compares the input to the output of the PD. When a cycle slip occurs, an output edge normally created on vU by the rising edge of vi is missing, and this is sensed by the slip detector. The detector then adds or removes a fixed amount of charge from the charge pump integrator, which causes a step change in the output of the integrator. The key to implementing the FD is to ensure that the induced frequency step, Δωc, does not exceed twice the lock-in range, ωL; otherwise the frequency would oscillate around ωL and never acquire lock. Typically Δωc is conservatively set to ωL so that the pull-in time is minimized and PLL lock is ensured.

5.4.3. The VCO

Serdes I used the Simple CS VCO with a gain of approximately 0.5 GHz/V. Its highly variable gain and non-linear frequency response made analytical modeling of the PLL difficult. The second and third prototypes used the FFI VCO, which has a consistent gain of 6 GHz/V. Its linear response made analytical modeling much easier to perform.

5.4.4.
Loop Filter

The loop filter in a PLL plays a critical role in determining the PLL bandwidth. Usually the gains of the PD and the VCO are fixed, so the loop filter is the only component available to control the bandwidth. A high bandwidth corresponds to a strong ability to track the input phase at high frequencies. This would be very useful for a receiver that needs to track an incoming signal plagued with transmitter and line noise; this ability is discussed further in the following chapter. A small PLL bandwidth, on the other hand, ignores phase variations on the input and performs very slow tracking. This is the necessary situation for a transmitter, since it needs to generate a very clean VCO signal, independent of the noise introduced by the input reference signal and by the VCO. Reducing the bandwidth too much, however, prevents the PLL from tracking out the VCO phase noise. An optimum bandwidth for minimum total output phase noise therefore exists and should be determined.

5.4.4.1. Serdes I Loop Filter

The transmitter PLL in the first prototype used a passive low pass filter¹. The filter is a two-stage RC ladder and has two poles, but for the purpose of analysis the higher frequency pole can be ignored, since it only helps to reduce spurious modulation². The loop is therefore considered a two-pole loop: one pole in the loop filter and one pole in the VCO. The poles are at 30 MHz (ωn) and 207 MHz when the capacitance and resistance values are 2 pF and 1 kΩ, respectively. The decision was made to use two RC stages rather than one to increase the high frequency signal rejection.

Figure 5-18 Tx PLL passive loop filter [schematic: two-stage RC ladder with R = 1 kΩ and C = 2 pF; sketch of |F(jω)| (dB) versus log f with the corner at ωn]. A second order low pass filter using a two-stage RC ladder configuration.
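The two pole frequencies follow directly from the ladder's component values. A single RC section with R = 1 kΩ and C = 2 pF would have its pole at 1/(2πRC), about 80 MHz; cascading two equal sections gives the transfer function 1/(1 + 3sRC + (sRC)²), whose roots split into one lower and one higher pole. The Python sketch below assumes the ladder is unloaded (the aVref control input of the VCO is ignored), which is why it lands close to, but not exactly on, the quoted values.

```python
import math


def rc_ladder_poles(r, c):
    """Pole frequencies (Hz) of an unloaded two-section RC ladder with equal R and C.

    Its transfer function is 1 / (1 + 3 sRC + (sRC)^2), so the poles lie at
    s = -(3 -/+ sqrt(5)) / (2RC) rad/s.
    """
    tau = r * c
    w_low = (3 - math.sqrt(5)) / (2 * tau)
    w_high = (3 + math.sqrt(5)) / (2 * tau)
    return w_low / (2 * math.pi), w_high / (2 * math.pi)


f_low, f_high = rc_ladder_poles(1e3, 2e-12)
print(f"poles at {f_low / 1e6:.1f} MHz and {f_high / 1e6:.1f} MHz")
```

The result, roughly 30.4 MHz and 208 MHz, agrees with the 30 MHz (30.3 MHz in Section 5.4.6.1) and 207 MHz figures given for this filter.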
The resistor and capacitor values were maximized, for low bandwidth as discussed above, based primarily on the proper operation of the PLL and on layout limitations (capacitors consume large amounts of area). Since the PD output is differential, symmetric loading requires duplicating the RC ladder. The four capacitors were therefore limited to about 2 pF because they take up a large amount of layout space. Resistor sizes, on the other hand, were reasonably small, but values larger than 1 kΩ introduced considerable loading effects because this RC circuit had to drive the VCO aVref control circuit.

¹ The design time for this critical Serdes I component was very limited, and effort was put only into the PLL's proper operation rather than its optimization. In the end it worked well enough to drive the transmitter and allow collection of all desired data.
² A common problem in frequency synthesizers is spurious modulation, which results from the normally much higher frequency output of the PD. As a result of the frequency divider, these lower frequency signals are not adequately attenuated by the loop filter and are passed on to the VCO as unwanted phase noise.

5.4.4.2. Serdes II Loop Filter

Further research and design allowed a much improved loop filter to be used in Serdes II. The first important enhancement was the move to an active rather than passive filter. The use of an integrator allowed a loop filter dc gain, F(0), approaching infinity, in contrast to a passive filter's dc gain of unity. From this, the PD static phase error,

$$\theta_{eo} = \frac{V_{co}}{K_d F(0)} - \frac{V_{do}}{K_d} \tag{5-1}$$

becomes approximately zero when the PD offset voltage¹, Vdo, is zero, where Kd is the gain of the PD and Vco is the static control voltage² of the VCO.
Under these conditions the input phase difference is kept near zero when the PLL is in lock, which improves the purity of the synthesized frequency [41] and aids acquisition.

Figure 5-19 Tx PLL active loop filter [schematic: differential RC low pass front end (R1/2, C3) followed by an op-amp integrator (R2, C); the op-amp has a FET front end (high input impedance), an NPN differential gain stage, and a low output impedance output stage]. This active loop filter incorporates a low pass front end followed by an integrator. The op-amp has a FET input stage to minimize loading, a high gain NPN stage, and a low impedance output stage.

Resistors R1 and R2, capacitor C, and the amplifier in Fig. 5-19 form the core of the filter. These elements form an integrator with a zero at

$$\omega_2 = \frac{1}{R_2 C} \tag{5-2}$$

and a gain of

$$K_h = \frac{R_2}{R_1} \tag{5-3}$$

at frequencies above ω2.

¹ Vdo is the free running, or offset, phase detector voltage. It represents the DC output voltage offset of the PD and is a property of the PD alone.
² The static control voltage, Vco, is the control voltage applied to the VCO that matches the input and output frequencies. It depends on the input signal and the VCO properties.

The choice of 6.4 MHz for the loop bandwidth was based loosely on comparisons with other similar loops, which have bandwidths of approximately 1 MHz [41]. These similar loops, however, use a much cleaner LC VCO, so a larger bandwidth was needed here to compensate. The final design of the loop filter yielded values for R1, R2, and C of 16.7 kΩ, 6.67 kΩ, and 14.1 pF, respectively. ω2 was 1.7 MHz, Kh was 0.4, and the total loop gain and bandwidth was 6.4 MHz. In addition, the low frequency gain, which is governed by the gain of the amplifier, is about 5.

Figure 5-20 Active loop filter transfer function [plot: gain (dB) versus frequency (Hz), with ω2 and ω3 marked]. The active loop has a 1.7 MHz zero which forces a high DC gain. A pole at 21 MHz attenuates high frequencies to reduce spurious modulation.
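The quoted ω2 and Kh follow from the final component values. The Python sketch below evaluates Eqs. (5-2) and (5-3) and, assuming the ideal proportional-integral form F(s) = (1 + sR2C)/(sR1C) that those two equations imply, shows the magnitude response settling to Kh above the zero. It deliberately ignores the op-amp's finite gain of about 5, the C3 pole, and the internal 100 MHz pole, so it does not reproduce the low- and high-frequency ends of Fig. 5-20.

```python
import math

R1 = 16.7e3    # ohms
R2 = 6.67e3    # ohms
C = 14.1e-12   # farads

K_h = R2 / R1                      # high-frequency gain, Eq. (5-3)
f_2 = 1 / (2 * math.pi * R2 * C)   # zero of Eq. (5-2), converted from rad/s to Hz


def f_ideal(f_hz):
    """Ideal PI integrator F(s) = (1 + s*R2*C) / (s*R1*C); ignores the op-amp's
    finite gain, the C3 pole, and the internal 100 MHz pole."""
    s = 2j * math.pi * f_hz
    return (1 + s * R2 * C) / (s * R1 * C)


print(f"K_h = {K_h:.3f}, f_2 = {f_2 / 1e6:.2f} MHz")
for f in (1e5, f_2, 1e8):
    print(f"{f / 1e6:8.3f} MHz  |F| = {abs(f_ideal(f)):.3f}")
```

The computed values, Kh of about 0.40 and a zero near 1.69 MHz, match the 0.4 and 1.7 MHz given above.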
The addition of a low pass filter, or pole, to minimize spurious modulation is realized through element C3 in Fig. 5-19, with a cut-off frequency at ω3. The pole is at 21 MHz, which yields a capacitor value of 1.8 pF.

The open loop frequency response is plotted in Fig. 5-20. Below the zero at ω2, the integrator produces a -20 dB/dec slope, which is not realized at low frequencies because of the op-amp's finite gain of 13.5 dB. Above ω2, the gain is Kh until the pole at ω3, where the curve drops off at -20 dB/dec. An additional pole at approximately 100 MHz exists within the op-amp for loop stability.

5.4.4.3. Serdes III Loop Filter

The Serdes III loop filter is implemented with a negative impedance amplifier (NIA) charge pump [27]. Fig. 5-21 shows that the circuit has an RC filter which is balanced, or floated, between a pull-up resistance and a pull-down negative resistance. As long as the sum of these resistances equals zero, the filter nodes are allowed to float. Any deviation from zero results in a drift of the differential output voltage toward infinity or toward zero. To ensure a reasonable initial condition, the pull-up resistors should be slightly smaller than the NIA resistance so that the differential voltage is slowly pulled toward zero.

The negative resistance is generated through a linearized CML feedback tree that is very similar to the storage mechanism in an MS latch. The current through one branch is

$$i_0 = \frac{I_o}{2} - \frac{v_0 - v_1}{R} \tag{5-4}$$

where Io is the total current through the tree, R is the value of the pull-up resistors, and v0 and v1 are the outputs and the nodes of the capacitor. Technically, the circuit acts as a negative impedance,

$$\frac{v_0 - v_1}{i_0 - i_1} = -R_n \tag{5-5}$$

which is based upon a differential voltage and current. The end result is that the differential voltage, v1 - v0, is allowed to float at any value less than R·Io.
The resistance value of the NIA, Rn, is the sum of the linearizing resistors and the emitter resistance, as described in Appendix C.1.

Figure 5-21 Receiver III integrator [schematic: negative impedance amplifier with pull-up resistors Rp; capacitors C1 and C2 between nodes v0 and v1 (outputs z0, z1); current trees driven by int0/int1 and step0/step1 with a ref input; tail current Io (7x)]. The integrator used in Serdes III consists of a negative impedance amplifier, which essentially "floats" a capacitor, and current trees that move charge on and off each end.

The striking benefit of this negative impedance charge pump is that it allows charge to be removed from either end of the capacitor while the differential center voltage is maintained. Removing capacitor charge through a CML tree causes a differential voltage change, and when a constant current is drawn, the voltage ramps accordingly, which demonstrates the integration. There are two methods for affecting the differential output voltage, each handled by its own circuit. The first is a standard current source that uses a linearized CML tree with inputs int0 and int1 to draw current from either side of the filter. The amplifier gain, Ka, is approximately 1 mA/V. This value can be derived from the linearized CML tree plot in Fig. C-3. The constant includes a factor of 1/2 because the current is split between two paths, one directly through the pull-up resistor and one through the filter. The second method is a step input used in conjunction with the frequency detector in the PD. In the case of a 3-state PD, a cycle slip detected by the FD pulses one step input or the other and causes a large change of charge on the capacitor. The size of the step current source dictates the amount of change.

Serdes III was the first design with a loop gain optimized for minimal output phase noise, based on measured and simulated phase spectra from the FFI VCO discussed in Section 4.12.4. With this information and phase noise data on the reference source, the noise spectrum plot shown in Fig.
5-22 can be created. It shows the voltage spectral density of the FFI VCO and of a very low noise reference source. The frequency at the point of intersection indicates the ideal value for the loop bandwidth. Values lower than this allow more VCO noise to propagate to the output, while values higher than this allow more reference noise to propagate to the output and increase the spurious modulation from the reference.

Figure 5-22 Voltage spectral density for optimal loop bandwidth [plot: Φ (dBc/Hz) versus offset frequency; VCO curve falling at -20 dB/dec, reference source curve, and effective reference curve raised by 18 dB; their intersection marks the optimum loop bandwidth for minimum noise]. Shown above is the voltage spectral density of the VCO and the reference source. The point where they intersect is, to first order, the optimal place to set the loop bandwidth.

The reference source to be used is quoted as having a noise spectral density of -140 dBc/Hz at frequencies below 1 GHz. This level must then be raised by the effect of the PLL multiplication factor of 8, which is equivalent to 20 log10(8), or about 18 dB. The VCO voltage spectral density was found through simulation, analysis, and measurement, and has a value of -90.2 dBc/Hz at 1 MHz. The relatively high noise content of the VCO and the low noise content of the reference source placed the optimal loop bandwidth, K, at 33 MHz. Suppressing spurious modulation requires placing a pole at 4K, 132 MHz, far enough above K that the PLL response is not affected. At a reference frequency of 625 MHz, this results in a 13 dB suppression of spurious noise, which by

$$\sigma_t = (50\,\text{ps})\,\frac{\pi}{4}\,\pi \delta N\,\frac{K^2}{f_r^2} \tag{5-6}$$

is equivalent to an rms data jitter of 5 ps. The PD minimum duty cycle, δ, is approximately 0.03. This σt is one tenth of a bit width, which is unacceptable. Clearly the suppression of spurious modulation is critical in minimizing jitter, so a loop bandwidth of 6 MHz was used instead of 33 MHz.
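Several of the Serdes III loop numbers can be reproduced with a few lines of arithmetic: the 20 log10(N) penalty on the reference noise, the spur suppression of a single pole at 4K, and, for the 6 MHz design that follows, the required Kh and the bound on the FD step current from Eq. (5-11). The Python sketch below assumes that bound reads I ≤ 2CωL·fc/Ko with ωL = πK expressed in rad/s, and takes C to be C1 = 802 pF; both are readings chosen because they reproduce the quoted 3.4 mA, not statements from the thesis. Variable names are illustrative.

```python
import math

N = 8              # multiplication factor, 5 GHz / 625 MHz
F_REF = 625e6      # Hz, reference frequency f_r
K_OPT = 33e6       # Hz, noise-optimal loop bandwidth
K_BW = 6e6         # Hz, loop bandwidth actually chosen

# Reference phase noise rises by 20*log10(N) when multiplied up to the output.
noise_penalty_db = 20 * math.log10(N)

# Single-pole attenuation of the reference spur with the pole placed at 4K.
spur_suppression_db = 20 * math.log10(F_REF / (4 * K_OPT))

# Loop gain K = Ko*Kd*Kt*Kh = 2*pi*(6 MHz), solved for Kh.
K_O = 34.5e9       # rad/s/V, VCO gain
K_D = 5.25e-3      # V/rad, PD gain including the divide-by-8 factor
K_T = 1e-3         # A/V, integrator transconductance
K_h = 2 * math.pi * K_BW / (K_O * K_D * K_T)   # ohms

# Assumed reading of Eq. (5-11): I <= 2*C*wL*fc/Ko, with wL = pi*K in rad/s
# and C taken as C1 = 802 pF.
C1 = 802e-12
w_L = math.pi * 2 * math.pi * K_BW
i_step_max = 2 * C1 * w_L * F_REF / K_O

print(f"noise penalty    {noise_penalty_db:.1f} dB")
print(f"spur suppression {spur_suppression_db:.1f} dB")
print(f"K_h              {K_h:.0f} ohms")
print(f"I_step max       {i_step_max * 1e3:.2f} mA")
```

The results, about 18 dB, 13.5 dB, 208 Ω, and 3.4 mA, line up with the figures used in the text, which supports reading ωL in rad/s even though it is quoted as 18.8 MHz. The design rationale is the one given above: the noise-optimal 33 MHz bandwidth leaves too much spur-induced jitter, so the 6 MHz bandwidth was adopted.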
With the 6 MHz bandwidth, the rms jitter due to spurious modulation is 0.14 ps, which is considerably lower. With K at 6 MHz, the PLL zero (ω2) is placed at K/4, or 954 kHz, to give a 13% response overshoot, and the pole (ω3) at 4K, or 24 MHz. For a VCO gain, Ko, of 34.5 Grad/s/V, a PD gain, Kd, of 5.25 mV/rad, and a loop filter gain, Kt, of 1 mA/V, the high frequency gain Kh must be set to 208 for K = KoKdKtKh = 2π(6 MHz). Solving for the loop components from

$$F(s) = K_h \, \frac{s + \omega_2}{s} \, \frac{1}{1 + s/\omega_3} \tag{5-7}$$

$$K_h = \frac{C_1 R}{C_1 + C_2} \tag{5-8}$$

$$\omega_2 = \frac{1}{C_1 R} \tag{5-9}$$

$$\omega_3 = \frac{C_1 + C_2}{C_1 C_2 R} \tag{5-10}$$

yields C1 = 802 pF, C2 = 53 pF, and R = 208 Ω.

The size of the stepping transistors can be found using

$$I \le \frac{2 C \omega_L}{K_o} \, f_c \tag{5-11}$$

where C is the capacitor size, ωL is the lock-in range (πK = 18.8 MHz), Ko is the VCO gain (34.5 Grad/s/V), and fc is the reference frequency (625 MHz). For this implementation the calculated current is 3.4 mA, corresponding to a transistor size of 4 µm. The ref input is used in conjunction with the step inputs and allows them to be driven single-ended to save power.

5.4.5. PLL Loop Response

The value of the PLL gain, K, is directly related to the 3 dB point, and its design is based on two factors: the VCO noise response and the input noise level. Small values of K yield strong input noise immunity, as the PLL is very slow to respond to input deviations, but pass all of the low frequency VCO noise to the output. A small bandwidth is also effective at reducing spurious modulation. A large value of K, on the other hand, allows the PLL to track the input very closely and attenuate a considerable portion of the low frequency VCO noise, but means that any input noise is passed on to the output. K, as a frequency, also has a directly proportional effect on the pull-in range and an inverse relationship with the pull-in time.
Put simply, a larger K allows the PLL to lock more quickly over a larger frequency range.

The process of choosing K is affected by the output noise specifications for the PLL, but no noise specifications were given for the design of this PLL, as it was meant for short-haul communications, where noise does not play a crucial role. Instead, K was chosen small enough to limit the effects of the input noise, but not so small as to adversely affect the layout with large component sizes. Ensuring proper operation was also important, so design limits were not pushed and a "middle of the road" approach was taken.

The step responses for the passive loop of Serdes I and the active loop of Serdes II are shown in Fig. 5-23. Both show a very clean, non-oscillatory response, which indicates adequate choices of pole locations. Serdes II has a longer settling time due to its smaller bandwidth and does not undershoot. From [41], the damping factor, ζ, is calculated to be 0.47 and 0.65 for the PLL in Serdes I and Serdes II, respectively.

Figure 5-23 PLL simulated step responses [plot: PLL step output (rad) versus time (ns) for Serdes I and Serdes II, with the step input]. The above plots, simulated in MATLAB, show the step responses of the PLLs in Serdes I and II. The longer settling time of PLL 2 corresponds to its smaller bandwidth. PLL 3 has nearly the same response as PLL 2.

PLL phase noise in this case is realized as output phase noise of the transmitter, so no direct PLL phase noise can be measured. Section 5.10 details the noise results for the two transmitter designs. No simulation of phase noise in the PLL was done for this particular design.

5.4.6. Lock Acquisition

Lock acquisition can be described by two factors: the pull-in time, Tp, and the pull-in range, ωp.
The pull-in time represents the maximum amount of time the PLL takes to acquire lock and track the input phase when started out of lock. The pull-in range is the largest frequency error for which the PLL will acquire lock. Both are important metrics in describing the usefulness of the PLL; ideally Tp would be zero and ωp would cover the entire frequency range of the VCO.

Figure 5-24 PLL I simulated acquisition plots [plot: control voltage (V) versus time (ns) for initial frequencies from 660 MHz to 730 MHz]. The above plots show the PLL in Serdes I during simulated acquisition, which is idealized and not equivalent to real-life behavior. This is also known as the jellyfish plot.

5.4.6.1. Serdes I Simulated Acquisition

Since Serdes I used a passive loop filter, the pull-in range is restricted by, and equal to, the frequency of the dominant pole ω3 at 30.3 MHz. This is a result of the -π/2 phase shift introduced by the pole, which effectively nulls the pull-in voltage. If, for example, a -π phase shift were introduced, the PD output would be inverted, push-out would occur, and the PLL would move further away from lock. The pull-in time is a complicated parameter to derive; an expression and its derivation are presented on pages 186-187 of [41]. A rough approximation of the pull-in time from simulation is 100 ns.

5.4.6.2. Serdes II Simulated Acquisition

The simulated response of the Serdes II PLL is shown in Fig. 5-25. The pull-in time is about four times that of Serdes I due to the smaller loop bandwidth and different phase detector characteristics. With similar loop bandwidths and similar loop filters, the pull-in time for a PLL with a 3-state PD is about 4 times smaller than with an XOR PD, and the pull-in range is about 4 times larger. This is primarily due to the negative slope that exists in the XOR response but not in the 3-state response, as shown in Fig. 5-15.
Figure 5-25 PLL II simulated acquisition plots [plot: loop filter output (V), -0.25 to 0.25 V, versus time (ns), 0 to 500 ns, for initial frequencies from 600 MHz to 900 MHz]
The above plots show PLL II during simulated acquisition. This is fairly representative of actual acquisition. However, SPICE has an advantage in setting initial conditions and can show a better response than real life. Because of its shape, this is informally called the squid plot.

The simulated pull-in time for the Serdes II implementation is about 400 ns, and the pull-in range is approximately 75% of the full range of the VCO (600 to 900 MHz). The addition of the 3-state PD has greatly enhanced the pull-in range at the expense of pull-in time. This is a very favorable trade-off, since typical pull-in time specifications are on the order of microseconds.

5.4.6.3. Serdes III Simulated Acquisition
The third prototype has characteristics very similar to the second, including loop bandwidth, pole and zero locations, phase detectors, VCOs, and gains. Its acquisition plots are therefore nearly identical to those shown in Fig. 5-25; see Section 5.4.6.2 for pull-in times and pull-in ranges. The FLL used in this implementation does not have a considerable effect, but it does reduce the pull-in time by about 10%.

5.4.7. 20 / 40 Gb/s Implementation
One area pursued in the development of the second prototype was the ability to run the transmitter at either 20 or 40 Gb/s. Adding a second, higher-speed VCO, multiplexers on the outputs, and an additional multiplexed divide-by-two circuit was rather straightforward, as shown in Fig. 5-26. The primary difficulty arose in designing a loop bandwidth appropriate for both VCOs. In the 5 GHz mode the detector gain is Kd/8, and in the 10 GHz mode it is Kd/16. This requires halving the loop pole frequency so that stable operation is guaranteed in both situations.
This reduction has negative implications for the pull-in time, because pull-in time is inversely related to the pole frequency: halving the frequency doubles the pull-in time.

Figure 5-26 5/10 GHz PLL implementation [block diagram: 625 MHz reference, 3-state PD, loop filter, 5 GHz and 10 GHz VCOs with 4 phases, divide-by-2, divide-by-8]
Creating a 5 and 10 GHz PLL involved adding a 10 GHz VCO and various multiplexers to select the correct phases and the proper division circuit.

5.5. Clock Distribution
Clock distribution in the transmitter involves delivering the PLL outputs to the shift registers, to the external circuitry for data loading, and to the multiplexers, with maximum phase alignment. All prototype transmitters used the same clocking scheme.

A chain of delay buffers, whose inputs are the PLL 0° and PLL 90° signals, constitutes most of the clock distribution system (see Fig. 5-27). It ensures that data and clock travel in the same direction and that delays in the shift registers, buffers, and multiplexers are matched to delays in the delay chain.

The most critical path in the clock distribution circuitry lies between the PLL and the 4-to-1 multiplexer. Here the PLL 0° and PLL 90° signals must stay phase matched to ensure alignment of bit edges at the output. Offsets in these signals translate directly into phase jitter and more difficult signal reception. To ensure alignment, the delay chain was designed to be symmetrically loaded, of minimal length, and balanced. The 4-to-1 multiplexer was designed as a two-stage multiplexer, and its architecture requires critical timing. For these reasons, a delay of exactly one multiplexer was added to the 90° line, which aligns the clocks at the multiplexers. Consequently, the SEL 0° and SEL 90° signals are offset by exactly one multiplexer gate delay.

The next most important timing event is the clocking of the four shift registers.
The 90° branch of the delay chain and its inversion clock all four registers. The load presented by the 8 latches (4 MS-latches) was a concern, so a driver buffer was added to the front of each register. This forced the addition of an equivalent delay into the delay chain. The difference in gate delays between the CLK AD input and the SEL 0° signal was designed to be zero, to ensure maximum noise margin. The timing diagram in Fig. 5-28 depicts the precise relationship between the signals.

Loading the 16 bits of parallel data requires a clock edge every 800 ps (50 ps × 16 bits), four times the PLL period. This requires a load counter (Fig. 5-29), which is essentially a frequency divider. Not only does the load counter have to divide by four, it also has to create two load signals separated by 100 ps, because of the clock offset on registers A and D versus B and C. The load signals switch the multiplexer at the input of each bit from shift mode to load mode. Data is then latched into the register on the next rising clock edge.

The final aspect of clock distribution is the generation of the signal that informs the external circuitry that the transmitter is ready for new parallel data. The straightforward solution is to use the LOAD AD signal. This guarantees that when both loads have completed, the data has had the maximum amount of time to settle.

Although the use of a delay chain makes clock distribution straightforward and very reliable, it has one serious drawback. Since it lies between the PLL and the output multiplexer, it contributes to the overall phase noise and jitter of the circuit. This noise has several sources: shot noise and thermal noise in the chain of buffers, fabrication mismatches between the 0° and 90° phase lines, and coupling between the lines and the substrate. Minimizing these effects involved designing a symmetric and compact layout of the delay chain.
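The load-counter timing described above can be checked at the level of pulse windows. The sketch below is a timing-level illustration, not the gate-level circuit of Fig. 5-29. It assumes that LOAD AD is high for the one PLL period in four in which a hypothetical 2-bit counter reads zero, and that LOAD BC is the same 200 ps pulse delayed by 100 ps. The constant and function names are hypothetical.

```python
PLL_PERIOD_PS = 200     # 5 GHz PLL clock
BIT_PS = 50             # 20 Gb/s serial bit time
WORD_BITS = 16          # parallel word width
LOAD_OFFSET_PS = 100    # LOAD BC lags LOAD AD (registers B, C vs A, D)


def load_windows(n_words: int):
    """Return (load_ad, load_bc) as lists of (start_ps, end_ps) windows.

    A 2-bit counter advances on every PLL rising edge. LOAD AD is high for
    the one PLL period in four where the count is zero; LOAD BC is the same
    pulse shifted by LOAD_OFFSET_PS.
    """
    load_ad, load_bc = [], []
    count = 0
    for edge in range(4 * n_words):
        t = edge * PLL_PERIOD_PS
        if count == 0:
            load_ad.append((t, t + PLL_PERIOD_PS))
            load_bc.append((t + LOAD_OFFSET_PS,
                            t + LOAD_OFFSET_PS + PLL_PERIOD_PS))
        count = (count + 1) % 4
    return load_ad, load_bc


# One 16-bit word spans four PLL periods: 16 x 50 ps = 4 x 200 ps = 800 ps.
word_period_ps = WORD_BITS * BIT_PS

if __name__ == "__main__":
    ad, bc = load_windows(3)
    print("word period:", word_period_ps, "ps")
    print("LOAD AD:", ad)
    print("LOAD BC:", bc)
```

Each word produces one pair of overlapping 200 ps load pulses, 800 ps apart. This matches the divide-by-four behavior and the 100 ps stagger between the A/D and B/C registers.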
Figure 5-27 Clocking scheme for transmitter [schematic: externally supplied parallel data, registers A to D, LOAD AD and CLK AD, load counter, delay chain, PLL 0° and 90° outputs]
The top-level schematic of the transmitter clocking circuitry includes the PLL as the clock generator, a delay chain for distribution, the registers, and the 4-to-1 multiplexer.

Figure 5-28 Transmitter clock timing [timing diagram, 0 to 800 ps: PLL 0°, PLL 90°, CLK AD, SEL 0° (pulse every 4th CLK 0° edge), LOAD AD, A/D, B/C, BA/CD, SEL 90°, SO]
The timing of the transmitter revolves around the delay chain, which ensures that the data and the clock flow in the same direction. The bottom three signals show how the 4-to-1 multiplexer interleaves to produce the output.

Figure 5-29 Load counter [schematic and timing: LOAD AD and LOAD BC, 800 ps period, 200 ps pulses, 100 ps offset]
The load counter divides the PLL signal by four and generates two 200 ps load pulses offset from each other by 100 ps.

5.6. Data Encoding
Data encoding is a general term for techniques such as encryption, compression, improved transition density, error detection, channel alignment, byte alignment, DC balance, simplified clock recovery, and frame detection. Typically, improved transition density and channel alignment are performed on-chip, although all could potentially be performed off-chip. No encoding was performed in either Serdes I or Serdes II. See Section 5.11.1 on page 118 for a brief study and recommendation of the 8B/10B encoding scheme.

5.7. Line Driver
The purpose of the line driver is to amplify the transmitter signal and drive the 50 Ω output line. Depending on the specifications, this can be either a single-ended or a differential circuit [48], [36], [37]. At these speeds, differential is usually the optimum choice.
The bandwidth of the circuit must be large enough that it does not attenuate the high-frequency components and close the signal eye. Noise is also an issue, since any phase noise introduced by the line driver appears directly on the output. The line driver in Serdes I used a simple pad driver circuit that was not optimized for this purpose. In Serdes II, however, the line driver was integrated into the final output multiplexer, which limited the introduction of noise. The output voltage swing was designed to be 400 mV.

5.8. Internal Testing Circuitry
5.8.1. Serdes I
Serdes I was designed without the ability to accept external parallel data. Instead, the data was generated pseudo-randomly on chip by a 16-bit linear feedback shift register (LFSR).

A true maximal-length 16-bit LFSR would create a sequence of 65,535 states. The register transmits 16 bits, shifts once, and repeats, so the serialized length would exceed 1 million bits. This was judged too long, because it would make it very difficult to determine during testing whether the transmitter was working correctly. An oscilloscope can capture only a limited record, and it would be nearly impossible to find the exact position within the sequence. Instead, a four-bit maximal-length LFSR followed by a 12-bit shift register was implemented. The circuit, shown in Fig. 5-30, has 16 MS-latches clocked through a buffer tree, an XNOR gate for feedback, and an AND gate to create a synchronizing signal. The synchronizing signal, SYNC, senses all zeros in the LFSR and was placed on an output pad in order to detect the start of the sequence. ZBIT is the final bit of the generator and was also placed on a pad to analyze the operation of the circuit. A 4-input AND gate, not shown in the figure, detects whether the LFSR contains all ones. If it does, the gate inverts the output of the XNOR to force proper oscillation.
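The generator described above can be checked in software. In the sketch below, the feedback taps (XNOR of stages 0 and 3) are inferred from the register contents printed in Fig. 5-30, not taken from the schematic. The names step, sync, and zbit are hypothetical. Stepping from the first row reproduces all fifteen rows and then returns to the start. This confirms a 15-word period and a 15 × 16 = 240-bit serial stream.

```python
# Register contents listed in Fig. 5-30 (stage 0 on the left, stage 15 on the right).
FIG_5_30 = [
    "0000111011001010", "1000011101100101", "0100001110110010",
    "1010000111011001", "0101000011101100", "0010100001110110",
    "1001010000111011", "1100101000011101", "0110010100001110",
    "1011001010000111", "1101100101000011", "1110110010100001",
    "0111011001010000", "0011101100101000", "0001110110010100",
]


def step(state: str) -> str:
    """Advance the 16-stage register by one clock.

    Feedback into stage 0 is XNOR(stage 0, stage 3), inferred from the
    figure's sequence. If stages 0-3 hold all ones (the XNOR lock-up
    state), the feedback is inverted, mimicking the 4-input AND guard.
    """
    fb = 1 - (int(state[0]) ^ int(state[3]))
    if state[:4] == "1111":
        fb ^= 1
    return str(fb) + state[:-1]


def sync(state: str) -> bool:
    """SYNC: asserted when the 4-bit LFSR section holds all zeros."""
    return state[:4] == "0000"


def zbit(state: str) -> str:
    """ZBIT: the last stage (15) of the generator."""
    return state[-1]


if __name__ == "__main__":
    state = FIG_5_30[0]
    for row in FIG_5_30:
        print(row, "SYNC" if sync(row) else "", "ZBIT =", zbit(row))
        state = step(state)
    print("back to start:", state == FIG_5_30[0])
    print("serial length:", len(FIG_5_30) * len(FIG_5_30[0]), "bits")
```

Only one of the fifteen states has SYNC asserted. That state marks the start of the 240-bit pattern on the oscilloscope.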
Figure 5-30 Serdes I LFSR [schematic: 4-bit LFSR (stages 0 to 3) followed by a 12-bit shift register (to stage 15), with CLOCK, SYNC and ZBIT]
Register contents over one period: 0000111011001010, 1000011101100101, 0100001110110010, 1010000111011001, 0101000011101100, 0010100001110110, 1001010000111011, 1100101000011101, 0110010100001110, 1011001010000111, 1101100101000011, 1110110010100001, 0111011001010000, 0011101100101000, 0001110110010100.
A 16-bit on-chip pseudo-random pattern generator consists of a 4-bit LFSR and a 12-bit shift register. The circuit used in the transmitter is capable of generating a 240-bit serial stream.

5.8.2. Serdes II
Off-chip testing of this serial communication system requires test equipment that operates at the bandwidth of the transmitter and receiver. At the rates being designed for, no such equipment exists, so comprehensive testing must be done on-chip. The implemented testing scheme feeds the transmitter serial output directly to the receiver. It also feeds the received parallel data back into the transmitter, as shown in Fig. 5-31 [43]. A single-bit offset between the receiver outputs and the transmitter inputs allows data input on Tx pin 0 to travel through the loop 16 times and then emerge on pin 15 of the Rx. By generating a pseudo-random sequence (see Fig. 5-30) at the input and verifying that sequence at the output, the bit error rate (BER) can be measured. The verifying circuit generates a pulse every time a good sequence is measured, and a missing pulse indicates a bit error. A divider was added at the output so that high BER measurements could be made without high-bandwidth test equipment. A 12-bit maximal-length LFSR can generate a 4095-bit sequence. Since the total sequence must traverse the loop 16 times, a minimum BER of 10^-5 can be detected with this method. The maximum time is determined by the time length of the test.
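The packet arithmetic in the preceding paragraph can be reproduced with a short script. The thesis states only that the 12-stage LFSR is maximal length (Fig. 5-32). The taps 12, 6, 4, 1 used below are an assumption, taken from the standard primitive polynomial x^12 + x^6 + x^4 + x + 1 and written in Galois form. The sketch counts the period, multiplies it by the 16 loop passes, and converts one missed pulse per packet into an error rate. The constant and function names are hypothetical.

```python
TAPS_MASK = 0x829    # assumed taps 12, 6, 4, 1 (bits 11, 5, 3, 0); not given in the thesis
LOOP_PASSES = 16     # the sequence traverses the Tx/Rx loop 16 times
BIT_RATE = 20e9      # 20 Gb/s serial rate


def galois_lfsr_period(mask: int = TAPS_MASK, seed: int = 1) -> int:
    """Count clocks until a right-shifting Galois LFSR returns to its seed."""
    state, n = seed, 0
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= mask      # feed the output bit back into the tapped stages
        n += 1
        if state == seed:
            return n


period = galois_lfsr_period()               # 2**12 - 1 if the polynomial is maximal
bits_per_packet = period * LOOP_PASSES       # bits checked per good-packet pulse
ber_one_error_per_packet = 1 / bits_per_packet
packet_time_s = bits_per_packet / BIT_RATE

if __name__ == "__main__":
    print("period:", period)
    print("bits per packet:", bits_per_packet)
    print(f"one error per packet: BER ~ {ber_one_error_per_packet:.2e}")
    print(f"packet duration at 20 Gb/s: {packet_time_s * 1e6:.2f} us")
```

The result is 4095 × 16 = 65,520 bits per packet, so one error per packet corresponds to a BER of about 1.5 × 10^-5. This agrees with the 10^-5 order of magnitude quoted in the text. At 20 Gb/s each packet lasts about 3.3 µs.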
Figure 5-31 True error rate detector [block diagram: bit pattern generator (LFSR) driving Tx pins 0 to 15, transceiver loop-back into Rx pins 0 to 15, bit pattern verification on Rx bit 15 with good-pattern output and reset]
The TERD operates by feeding the transmitter output back into the receiver and feeding the deserialized data back into the transmitter. A one-bit offset, together with an LFSR and a verifier, determines the BER.

The TERD requires proper channel alignment, which is accomplished through data encoding and decoding. These circuits were not included in the second prototype. The bit pattern generator was therefore configured to feed directly into the transmitter through the pin mapping shown at the top of Fig. 5-32. Various bits had to be duplicated, but after inversion and separation the data is still sufficiently random.

Figure 5-32 Serdes II bit pattern generator [schematic: 12-stage LFSR with CLOCK and reset, whose output pins are mapped onto the 16 Tx input pins]
A 12-stage LFSR with feedback to three stages yields a maximal-length LFSR. A reset line was needed for use in the bit pattern verifying circuit.

5.9. Implementation and Fabrication
5.9.1. Serdes I
A -4.5 V power supply was chosen for this chip. This left plenty of headroom for the three levels of logic and the active current sources. Power minimization was not a design goal, so this voltage was not optimized. Fig. 5-33 shows the artwork and a photograph of the fabricated first transmitter design, and Table 5-1 lists the pad connections. The chip has two inputs: the 625 MHz reference clock and a full/half-rate frequency selector. Three outputs were included to diagnose problems with the PLL and delay chain. Two pads output the LFSR sequence, and another pad indicates when the LFSR is reset.

5.9.2. Serdes II
The goals for the second Serdes chip were to correct problems from the first iteration, combine the transmitter and receiver into one chip, and make the chip packageable. Correcting the problems involved redesigning the VCO and PLLs to meet the 20 Gb/s specification. Combining the two systems allowed the development of an on-chip testing circuit (TERD) that could perform full feedback testing. A drawback was that fewer probe pads were available on the larger chip. Designing for packageability involved the use of an array of C4 pads for flip-chip packaging. Pad drivers and receivers were developed to accept and drive the 16 bits of parallel input and output data.

Table 5-1 Pin-out of Serdes I transmitter
Pin   I/O         Description
S0    -           not used
S1    RF input    reference clock (625 MHz)
S2    DC input    frequency select (20 Gb/s or 10 Gb/s)
S3    RF output   PLL output (5 GHz)
S4    RF output   delay chain output (/8) (625 MHz)
S5    RF output   delay chain output (5 GHz)
S6    -           not used
S7    RF output   LFSR: sequence reset pulse
S8    RF output   LFSR: sequence
S9    RF output   transmitter out
S10   -           not used
S11   -           not used

The east half of the chip contains the transmitter, as shown in Fig. 5-34. High-frequency probe pads T4 and T5 were used for the differential serial output signals. The 625 MHz reference input pad, T8, and the PLL clock output pad, T9, were required for testing. An on-chip LFSR, part of the test system, could be selected through DC pad C8 to drive the transmitter. Bit 3 of the LFSR was routed to output pad T1 to verify the proper functioning of the test system. The transmitter used two VCOs, which could be multiplexed into the clock synthesizer PLL through pad C11. A selectable divide-by-2 circuit, driven by pad C10, was added to the output of the PLL for half-frequency operation of the transmitter. An input filter to help suppress high-frequency phase noise from the reference could be activated by pad C9.
Figure 5-33 Serdes I transmitter layout and photograph [artwork (left) and fabricated chip (right): PLL, LFSR, test circuitry, delay chain, multiplexer, and driver, with pads S0 to S11]
On the left is the final artwork for the first transmitter design. On the right is a microphotograph of the fabricated part.

The receiver, located on the west side of the chip, accepts differential serial data on the two high-frequency pads R4 and R5. The recovered clock, important for lock verification, was routed to pad R8. Using pads C3 and C4, four different demultiplexed bits could be analyzed on pad R9 for proper operation. The test source built into the receiver was controlled through C1 and C2, enabling three different test patterns. The true error rate detector circuit pulses pad R0 when a bad packet is seen and toggles R1 when a good packet is detected. To reduce chip power, the circuits were optimized around a supply voltage of -3.3 V. This represents a power savings of roughly 25% compared with the -4.5 V supply of Serdes I.
Table 5-2 Bond-pad pin-out of Serdes II chip
Pin   I/O      Description
T0    RF out   duplicated data into Rx
T1    RF out   LFSR: bit 3 into Tx
T2    Power    Vee (-3.3 V)
T3    Power    Gnd
T4    RF out   differential serial out
T5    RF out   differential serial out
T6    Power    Gnd
T7    Power    Vee (-3.3 V)
T8    RF in    ref clock (625 MHz)
T9    RF out   PLL out (divided by 8)
R0    RF out   TERD: bad packet seen
R1    RF out   TERD: toggle every full packet
R2    Power    Vee (-3.3 V)
R3    Power    Gnd
R4    RF in    differential serial in
R5    RF in    differential serial in
R6    Power    Gnd
R7    Power    Vee (-3.3 V)
R8    RF out   receiver clock
R9    RF out   selected demuxed data
C0    DC in    Rx test source control voltage
C1    DC in    Rx test source select A
C2    DC in    Rx test source select B
C3    DC in    TERD: select A test bit
C4    DC in    TERD: select B test bit
C5    Power    Gnd
C6    Power    Vee (-3.3 V)
C7    -        not used
C8    DC in    select Tx input source
C9    DC in    enable Tx input filter
C10   DC in    enable Tx PLL divide-by-2
C11   DC in    select VCO (5/10 GHz)

Figure 5-34 Serdes II chip layout and microphotograph [layout: receiver (Rx) pads R0 to R9, transmitter (Tx) pads T0 to T9, control pads C0 to C11, 16-bit input data, 16-bit output data]
Shown here is the full Serdes II chip, including a microphotograph in the bottom-left corner. The testing pads are located around the perimeter.

5.10. Testing Results
5.10.1. Serdes I (transmitter test results)
An output waveform captured directly from the oscilloscope is shown in Fig. 5-35(a). It shows the bit pattern expected from the on-chip LFSR testing circuitry. The ability of the PLL to achieve lock was very poor, and a narrow pull-in range of 420 MHz to 460 MHz was measured. The hold-in range was larger, from 393 MHz to 490 MHz, equivalent to a data bit rate of 12.6 Gb/s to 15.7 Gb/s. At a bit rate of 15.3 Gb/s, the rms phase jitter was measured¹ to be 6.3 ps, or about 10% of the bit width.
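The conversion between reference frequency and bit rate used above follows from the clock plan. The PLL multiplies the 625 MHz reference by 8 to reach 5 GHz, and the 4-to-1 multiplexer sends four bits per 5 GHz cycle, so the bit rate is 32 times the reference frequency. The hypothetical helpers below reproduce the hold-in conversion and the ratio of jitter to bit width.

```python
PLL_MULTIPLIER = 8   # 625 MHz reference -> 5 GHz PLL clock
MUX_RATIO = 4        # 4-to-1 interleaving of the quadrature clock phases


def bit_rate_from_ref(f_ref_hz: float) -> float:
    """Serial bit rate implied by a reference frequency (32 x f_ref)."""
    return f_ref_hz * PLL_MULTIPLIER * MUX_RATIO


def jitter_fraction_of_ui(rms_jitter_s: float, bit_rate: float) -> float:
    """RMS jitter as a fraction of one bit period (unit interval)."""
    return rms_jitter_s * bit_rate


if __name__ == "__main__":
    for f_ref in (393e6, 490e6, 625e6):
        print(f"{f_ref / 1e6:.0f} MHz -> {bit_rate_from_ref(f_ref) / 1e9:.2f} Gb/s")
    print(f"6.3 ps at 15.3 Gb/s: {jitter_fraction_of_ui(6.3e-12, 15.3e9):.1%} of a bit")
```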
Figure 5-35 Transmitter waveform (Serdes I)
(a) The output waveform of the transmitter running at 15 Gb/s with a 350 mVp-p swing. The pseudo-random pattern matches the expected pattern from simulations. (b) An eye diagram at 15 Gb/s showing the relatively large phase noise and its effect in closing the eye.

Although the transmitter was designed to operate at 20 Gb/s, it reached only 15 Gb/s, 25% below target. The shortfall can be attributed to two factors. The first was the VCO loading environment, which ideally consists of equal loading by four minimum-sized buffers. Instead, one stage was loaded with two buffers, two other stages with one each, and the fourth with none². The effect was a reduction in speed, probably due to the double load on one stage, and a non-quadrature phase mismatch between stages. The second factor was that the simulations did not adequately account for interconnect parasitics. Resistive and capacitive effects at these frequencies can have a profound effect on the overall speed of the chip. Limited time and an incomplete understanding of these simulations produced slower-than-expected results.

¹ Performing a true phase noise and jitter measurement requires a spectrum analyzer capable of an absolute reading. A time-domain oscilloscope, such as the one used to collect this data, merely measures the jitter between the signal and the trigger. If the trigger signal is correlated in time with the measured signal, the jitter measurement can be considerably less than the absolute jitter.
² This was an unintended oversight. The receiver, designed a few weeks later, had ideal loading characteristics. This improved its response but left the transmitter and receiver with two non-overlapping frequency ranges.

Both issues were addressed and solved in Serdes II. The loads on the transmitter and receiver VCOs were carefully checked to make sure loading was balanced and minimal.
Interconnect simulations produced better designs in critical circuits such as the VCO and PLL, and a wide margin was built into the design of the VCO to account for unknown effects.

5.10.2. Serdes II (transmitter test results)
The Serdes II design attained the 20 Gb/s target bit rate. The corresponding eye diagram is shown in Fig. 5-36. The output voltage swing is 350 mV, and the eye is 30 ps wide and 200 mV high. This is a large improvement over the original design, which failed to meet the specifications. The eye diagram is also much cleaner and more symmetric, with less total rms jitter.

Figure 5-36 Serdes 2 transmitter eye diagram
Shown here is an eye diagram at the target 20 Gb/s. It shows an opening 30 ps wide and 200 mV high.

The PLL has a wide pull-in range from 3.6 to 5.3 GHz (14.27 to 21.58 Gb/s), which is more than 75% of the total frequency range of the FFI VCO. The hold-in range is identical to the pull-in range, indicating a well-balanced and nearly optimal PD. When using the higher-speed VCO, the pull-in range changed to 5.4 to 7.6 GHz, yielding an upper data rate of 30 Gb/s.

Jitter measures the accumulation of transition offsets over a given length of time. In open loop, without a PLL, the accumulated jitter of a free-running clock keeps growing as the measurement interval increases. When the oscillator is placed in a PLL, the jitter levels off and becomes constant after about one loop time constant. For the Serdes II PLL, the time-domain oscilloscope measured 4.3 ps of jitter with the reference signal and 2.9 ps without. This indicates that the signal source was introducing considerable jitter.

Fig. 5-37 shows the phase noise spectra of the open-loop VCO, the open-loop reference, the open-loop reference plus 18 dB, and the closed-loop PLL. The reference plus 18 dB is the effective phase noise seen at the input to the PLL. The offset arises because the 625 MHz reference is multiplied by 8 to reach 5 GHz, and 20 log10 8 ≈ 18 dB. The closed-loop PLL phase noise behaved as expected. First, at low frequencies the phase noise approached that of the reference.
This was expected, because these frequencies are well below the 6.2 MHz loop bandwidth. There the PLL tracks out the VCO noise and leaves only the reference noise at the output. The difference between the PLL and reference phase noise is likely due to noise introduced in the loop filter. Close to the 6.2 MHz loop bandwidth, both the reference and the VCO noise contributed to the total noise. Above the loop bandwidth, the phase noise should follow the VCO phase noise more closely, and that is what was seen.

A more accurate way to measure jitter is in the frequency domain. This allows the in-band low-frequency jitter, which is easily tracked out by the receiver PLL, to be excluded from the rms jitter measurement. Integrating the PLL phase noise plot from 100 kHz to 100 MHz gives an rms jitter of 1.4 ps. This value is lower than the 4.3 ps found with the time-domain oscilloscope, which indicates that the reference signal contains a large amount of low-frequency jitter.

The preliminary specification for OC-192 SONET states that the maximum acceptable jitter must be less than 0.09 UI (unit interval) for 10^12 bits. Finding the associated rms jitter involves integrating the Gaussian probability density function (pdf) from x to infinity and setting the result equal to the bit error rate of 10^-12. The value of x is about 7.03 standard deviations, which yields an rms jitter specification of about 1.28 ps at 10 Gb/s. The transmitter jitter of approximately 1.4 ps is larger than this SONET-derived specification. However, this circuit was not designed with SONET in mind, and higher jitter is more acceptable for short-haul communications.

Figure 5-37 Tx PLL measured phase noise spectra [plot: phase noise (dBc/Hz), -60 to -140, versus offset frequency (MHz), 0.1 to 100 MHz, for the open-loop VCO, the open-loop reference, the reference shifted by 18 dB, and the closed-loop PLL]
The closed-loop PLL behaved as expected, tracking out the VCO noise at low frequencies and following the VCO noise at high frequencies.
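Three calculations in this section can be reproduced numerically. The first is the 18 dB offset for a reference multiplied by 8. The second is the conversion of an integrated phase-noise curve into rms jitter. The third is the Gaussian tail point for a 10^-12 error rate. The sketch below is a minimal version under stated assumptions. L(f) is taken as single-sideband phase noise in dBc/Hz referred to the 5 GHz carrier, and the integral uses the trapezoidal rule. The flat -100 dBc/Hz profile in the usage lines is hypothetical and is not the measured data of Fig. 5-37. The function names are also hypothetical.

```python
import math


def multiplication_penalty_db(n: int) -> float:
    """Phase-noise increase when a reference is multiplied by n (20*log10 n)."""
    return 20 * math.log10(n)


def rms_jitter_s(freqs_hz, l_dbc_hz, f_carrier_hz):
    """RMS jitter (s) from a single-sideband phase-noise profile L(f).

    Integrates 10**(L/10) with the trapezoidal rule, doubles the result for
    both sidebands to get the phase variance (rad^2), then converts the rms
    phase to time at the carrier frequency.
    """
    area = 0.0
    for i in range(len(freqs_hz) - 1):
        df = freqs_hz[i + 1] - freqs_hz[i]
        area += 0.5 * df * (10 ** (l_dbc_hz[i] / 10) + 10 ** (l_dbc_hz[i + 1] / 10))
    return math.sqrt(2 * area) / (2 * math.pi * f_carrier_hz)


def gaussian_tail_sigma(ber: float) -> float:
    """Solve Q(x) = ber by bisection, where Q(x) = 0.5 * erfc(x / sqrt(2))."""
    lo, hi = 0.0, 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2)) > ber:
            lo = mid          # tail still too large: x must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(f"x8 multiplication: +{multiplication_penalty_db(8):.2f} dB")
    # Hypothetical flat -100 dBc/Hz profile from 100 kHz to 100 MHz, 5 GHz carrier.
    f = [10 ** (5 + 3 * k / 199) for k in range(200)]
    print(f"jitter: {rms_jitter_s(f, [-100.0] * len(f), 5e9) * 1e12:.2f} ps rms")
    x = gaussian_tail_sigma(1e-12)
    ui_s = 1 / 10e9           # unit interval at 10 Gb/s
    print(f"x = {x:.2f} sigma, 0.09 UI limit = {0.09 * ui_s / x * 1e12:.2f} ps rms")
```

With measured data, freqs_hz and l_dbc_hz would hold the PLL spectrum between 100 kHz and 100 MHz. Solving Q(x) = 10^-12 gives x ≈ 7.03, which places 0.09 UI at 10 Gb/s at roughly 1.28 ps rms.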
5.11. Future Design
The extremely large scope of this project left a number of research areas untouched in the first two fabricated designs and the third simulated design. The basic elements of the transmitter were designed, with optimization and research performed only in specific areas. The remainder of this section describes key areas recommended for future work, to turn these designs into highly functional, production-worthy designs.

5.11.1. 8B/10B Encoding
8B/10B encoding addresses transition density, error detection, command insertion, and DC balance [26], [35]. It does so by adding two bits of information for every eight input bits. This requires a 25% increase in line rate for the same information throughput.

The frequency of transitions in the data is a very important factor in the design of the receiver. In general, the more transitions provided to the receiver, the better the PLL's ability to lock to the serial stream. 8B/10B encoding guarantees a maximum run length of five bits and a minimum transition density of 30 transitions per 100 bits. Defining a minimum density makes it easier to model the data stream arriving at the receiver.

Another feature of the encoded stream is that its running disparity, the difference between the number of ones and zeros sent, stays bounded. Each 10-bit character has either equal numbers of ones and zeros or a disparity of two, and the encoder alternates the sign of the nonzero disparities. A single bit error therefore tends to show up either as an invalid character or as a running-disparity violation, which allows single bit errors to be detected. In addition, because of the much larger 10-bit word space, the decoder can detect undefined words and flag them as errors.

The DC balance is the fraction of transmitted bits that are ones. For high-speed optical links it is very desirable to have a DC balance of 0.5, which corresponds to equal numbers of ones and zeros. This stabilizes effects such as heating in the optical circuits, which can depend on the values of the bits being sent. 8B/10B guarantees a long-term DC balance of 0.5, because running disparity keeps the difference between the number of ones and zeros bounded.
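The three stream properties discussed above, run length, transition density, and DC balance, are easy to measure on a bit string. The sketch below uses hypothetical helper names and applies them to the K28.5 comma character of 8B/10B in both running-disparity forms. Each form alone is unbalanced (six or four ones). The encoder alternates between them, so the pair is exactly balanced, and neither form contains a run longer than five bits.

```python
def max_run_length(bits: str) -> int:
    """Longest run of identical consecutive bits."""
    best = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def transitions(bits: str) -> int:
    """Number of 0->1 and 1->0 changes between adjacent bits."""
    return sum(prev != cur for prev, cur in zip(bits, bits[1:]))


def dc_balance(bits: str) -> float:
    """Fraction of bits that are ones (0.5 means equal ones and zeros)."""
    return bits.count("1") / len(bits)


# K28.5 comma character in its two running-disparity forms.
K28_5_RD_MINUS = "0011111010"   # six ones: sent when running disparity is negative
K28_5_RD_PLUS = "1100000101"    # four ones: sent when running disparity is positive

if __name__ == "__main__":
    stream = K28_5_RD_MINUS + K28_5_RD_PLUS
    print("max run:", max_run_length(stream))
    print("transitions per 100 bits:", 100 * transitions(stream) / len(stream))
    print("DC balance:", dc_balance(stream))
```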
Since data encoding occurs at the parallel data rate of 1.25 Gb/s, the necessary circuitry can be designed entirely in CMOS. This reduces power and area and allows the use of powerful EDA tools for layout and design.

An additional role for 8B/10B encoding is channel alignment, which guarantees that bit 0 of the Tx is connected to bit 0 of the Rx. This requires a 16-bit rotator with a detection mechanism that rotates the streams until they match.

5.11.2. Transmitter data retiming
One technique for reducing the output phase jitter of the transmitter is to clock the output signal directly from the PLL through an MS-latch. This retiming circuit removes the noise introduced by the multiplexers and provides the shortest signal path between the transmitter serial output and the PLL.

A significant source of jitter on the output data is deterministic jitter, which results from non-periodic, data-induced noise. Pull-up resistors at the top of CML trees are a common source. As current flows through a resistor it heats up, and a warmer resistor produces higher rms noise. The ultimate effect is that the noise becomes dependent on the data stream. A stream with a large number of zeros will have a higher noise component than one with equal numbers of ones and zeros.

The problem with data retiming is that it requires a latch that can operate at the full speed of the transmitter. In this case that speed is 20 GHz, and if encoding is introduced it can be as high as 25 GHz. Simulations show latch operation becoming unreliable above 15 GHz. This is a result of the large delay through the two CML tree gates and the feedback inherent in these circuits. Direct data retiming is unattainable unless a much faster latch is found, but other improvements can still be made.
Since the final 4-to-1 (symmetric) multiplexer defines the output jitter, an improvement would be to drive the multiplexer directly from the PLL rather than through the timing delay chain. This adds design difficulty, because the clock timing of the entire transmitter then runs opposite to the flow of the data. The primary benefit of this method is the removal of the phase noise contributed by five delay-chain buffers.

Figure 5-38 Data and clock timing [(a) current method: data and PLL clock travel through the transmitter in the same direction; (b) proposed method: the PLL clock enters at the output multiplexer and runs opposite to the data]
By moving the PLL to the input of the multiplexer (b), the clock must run opposite to the data. This creates timing difficulties but decreases the output phase noise of the transmitter.

5.11.3. LC Oscillator
The primary drawback of using the FFI ring oscillator in the transmitter is its very poor phase noise. LC oscillators have much higher quality factors and considerably less phase noise and jitter [21], [22], [44], [45]. One problem with typical LC VCOs is that they produce only a single-phase clock, whereas the transmitter architecture in this research requires a clock and its quadrature. A possible option, and an area for further research, is multiphase LC oscillators [46], [47]. These offer the best of both worlds: low phase noise and quadrature outputs.

6 Design of the Receiver
6.1. Project History
The first receiver (Serdes I) was designed for fabrication in February 1999 and had only a 1-to-4 demultiplexer and a clock extractor. Various improvements and optimizations yielded Serdes II, a more efficient design capable of full 16-bit demultiplexing and external data input.

6.2. Receiver Architecture
Figure 6-1 Top level receiver architecture [block diagram: data into the phase detector (PD), loop filter (PI control), VCO with 8 phases, 4-bit and 16-bit demultiplexed data outputs]
The receiver is a PLL with a PD, called a transition detector, a PI loop filter, a VCO, and a demultiplexer to extract the NRZ bits from the serial data.
The receiver is a PLL and demultiplexer that locks an internal VCO to externally supplied data and extracts the non-return-to-zero (NRZ) bits from the data. Data arrives serially as a differential signal and is buffered in preparation for driving the PD. The information collected about transition phases is combined and fed into a proportional-integral loop filter. The filtered signal drives the VCO to a frequency that matches the frequency of the external data. In addition to collecting timing information, the PD also performs a 1-to-4 non-aligned demultiplexing of the data. Another circuit, also driven by the VCO, finishes the demultiplexing and generates 16 bits of parallel data.

6.3. Receiver PLL
The receiver PLL is a clock and data recovery (CDR) circuit. Its primary role is to extract the data bits from the serial signal and ensure that the extracted bits are not corrupted. The process is more difficult than in a standard PLL, because random or pseudo-random data has no guaranteed transition times. The 3-state and XOR PDs used in the transmitter PLLs, for example, can operate only with periodic signals. The receiver therefore needs a specialized PD that can handle non-periodic information and allow a VCO to lock to the fundamental frequency of the data. Merely locking the VCO to the data frequency is only half the problem. The system must also use the recovered clock to sample, or extract, the information contained in the data stream.

The receiver designs for Serdes I through III all use a transition detector (TD) PD. It oversamples the data signal by a factor of two and generates a digital measure of the phase difference between this signal and the clock. In effect, it indicates whether the clock is too fast or too slow relative to the data. With this information lock can be acquired, and because of the nature of the sampling, data can easily be extracted.
The problem with this PD, which was addressed in the third prototype, is the very small pull-in range of the PLL. Without an analog measure of phase difference, the clock and data frequencies have to be very close for the PLL to pull in. Fig. 6-2 depicts block diagrams for the three receiver prototypes. The first and second designs differ in the integrator design and the VCO. The third integrates into the PLL an entirely new loop that is very good at acquiring frequency lock but poor at extracting the data [14], [30], [51]. Combined with the TD PD, this loop greatly increases the PLL's pull-in range without any sacrifice in performance.

Figure 6-2 Receiver PLL evolution (Serdes I: transition detector (PD), FET charge pump, gain block, VCO; Serdes II: transition detector (PD), negative impedance charge pump, gain block, VCO; Serdes III: a 3-state PD on a reference input and a transition detector (PD) on the data, with gain blocks 1 and 2, a negative impedance charge pump, and the VCO). The receiver PLL has gone through two major improvements. The first design used a FET charge pump, which was replaced with a negative impedance charge pump in the second design. The third prototype added a referenced frequency detector, which greatly improved the pull-in range of the loop.

6.3.1. Phase Detector

6.3.1.1. Transition Detector (PD)

Data transitions provide the only means to measure the phase of the incoming serial data. If the data were periodic, a transition would be assured at a specific time and could be compared directly with a coincident VCO transition, as in the clock synthesizer PLL in the transmitter. Data, however, is by definition non-periodic, and transition locations cannot be assured at any time. For example, the receiver could be presented with ten ones followed by twelve zeros, a sequence containing only two transitions. Since a transition between bits cannot be guaranteed, the PD must take no action when no transitions are received and must track the phase when transitions are received.
The aspect of the clock recovery circuit that had the most critical implications for its development was the use of the same eight-phase ring oscillator as in the transmitter. It was felt that matching the oscillators in the transmitter and receiver would ensure that they operate at the same speeds and would require the development of only one VCO. Running at 5 GHz, either the CS or the FFI VCO generates eight unique phases (0°, 45°, 90°, 135°, ...), each separated by 25 ps. (Footnote 1: Although the VCO has only four unique outputs, the inverse of each yields the remaining four phases.) Serial data arriving at 20 Gb/s is broken up into bits 50 ps wide. Taking full advantage of the multiphase clock, the data is sampled on every clock phase, resulting in a twice-oversampling receiver scheme: for every bit, two samples of the signal are taken. Sampling is handled by eight MS-latches, each clocked by one of the eight clock phases (see Fig. 6-3). In the locked and stable condition, four of the latches sample at the center of the bits and return data, while the other four sample on the transitions and return only timing information. If the latches are labeled consecutively by their clock phase inputs, W, X, Y, Z and their inverses, then the data latches are W, Y, W̄, and Ȳ, while the timing latches are X, Z, X̄, and Z̄.

Figure 6-3 Receiver topology (eight sampling latches clocked by W, X, Y, Z and their inverses, 25 ps apart over the 200 ps VCO period; adjacent samples are XORed so that W⊗X, Y⊗Z, W̄⊗X̄, and Ȳ⊗Z̄ signal FAST, while X⊗Y, Z⊗W̄, X̄⊗Ȳ, and Z̄⊗W signal SLOW). The receiver is made up of eight MS-latches, each tied to a unique phase of the VCO. Since the phases are separated by 25 ps, the data is twice oversampled, so transition timing information can be extracted from all edges.
FAST or SLOW in the diagram is a command to the VCO.

Fig. 6-4 shows the transition detector used in Serdes I in detail. Data is latched by L1 using Φₙ, the n-th buffered phase of the VCO. Φₙ and Φₙ₊₁ are consecutive phases of the VCO, separated by 25 ps, or 45°, and Φₙ is equal to Φₙ₊₈. The sampled data, sₙ, is XORed with the sample from the previous detector, sₙ₋₁, and retimed by L2. The clock input to this latch comes six phases later, or after 150 ps, to allow the output of the XOR to settle to the correct value. tₙ, the output of L2, indicates whether a transition has occurred during this phase slice. The total time that tₙ remains high depends on the period of the VCO and on whether additional transitions are detected in this phase slice. With the VCO running at 5 GHz, the minimum time that tₙ is high is 200 ps. The circuit is repeated eight times to collect timing information from every transition.

Figure 6-4 Transition detector in Serdes I (MS-latch L1, clocked by Φₙ, produces sₙ; sₙ and sₙ₋₁ are XORed and retimed by latch L2, clocked by Φₙ₊₆, to produce tₙ). The first iteration of the transition detector had a latch to sample the data. This sample and the sample from the previous detector are XORed together and latched again to produce the transition detector signal.

The phase plot in Fig. 6-4 shows a transition detector on the X (45°) phase. It uses samples from itself and from the previous detector to detect transitions within the shaded region, and the XOR of these samples is clocked six phases later. One of the factors that defines the performance of this circuit is the time between sampling the data and the change of the detected-transition signal. With the six-phase (150 ps) retiming delay and an assumed 20 ps gate delay, this time is approximately 170 ps. Since the detected-transition signal is then high for 200 ps, the effect of a single transition lasts for a total of 370 ps after the sample, which is equivalent to about 7 bits.
This is important because, during lock, the VCO frequency should adjust as quickly as possible after a transition is detected. The digital nature of this circuit results in discrete changes to the VCO output, so oscillations are natural when in lock. If the PD delay is large, these oscillations also grow, as the VCO's frequency continuously overshoots and undershoots. A further analysis of this phenomenon can be found in Section 6.3.2 on page 130.

The motivating factor in the design of the Serdes II TD, shown in Fig. 6-5, was to reduce the delay through the detector. In the first prototype this delay was 170 ps, which directly affected the ability of the PLL to acquire and maintain lock. Improving on that design required a closer look at the timing requirements of the XOR. The two-level nature of the XOR gate requires the level 2 input to precede the level 1 input by approximately 10 ps. The time between the sampled data sₙ₋₁ and sₙ is 25 ps, and with the additional 5 ps of delay introduced by the level 2 output of the MS-latch, a total of 30 ps separates the level 2 input of the XOR gate from the sₙ₋₁ signal. When 40 ps of buffer delay is added to the sₙ₋₁ signal, the required 10 ps between the inputs of the XOR gate is realized.

Figure 6-5 Transition detector in Serdes II (a single MS-latch L1, clocked by Φₙ, produces sₙ; sₙ and sₙ₋₁ drive the two levels of the XOR gate to produce tₙ). Optimization of the transition detector allowed the removal of the second MS-latch and reduced the total delay by about 75%. The circuit is simpler and requires a less complicated layout.

When the timing is optimized to this extent, the second MS-latch, L2, is no longer needed. The same 200 ps pulse is created, but the total transition detector delay is reduced from 170 ps to 40 ps. An additional benefit is the simplified layout of this circuit: only one clock phase is required, whereas the Serdes I circuit needed a complex routing scheme because two phases were necessary.
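The FAST/SLOW decisions of Fig. 6-3 can be illustrated with a short behavioral sketch. It is not a model of the circuit timing: it assumes the eight latch outputs are already available as a list that alternates data samples (W, Y, W̄, Ȳ) with timing samples (X, Z, X̄, Z̄), and the function names `td_decisions` and `td_data` are hypothetical. A transition between a data sample and the following timing sample yields FAST (as for W⊗X), a transition between a timing sample and the following data sample yields SLOW (as for X⊗Y), and a bit interval with no transition yields no command.

```python
from typing import List, Optional


def td_decisions(samples: List[int]) -> List[Optional[str]]:
    """Behavioral model of the twice-oversampling transition detector.

    samples alternates data and timing samples: [d0, e0, d1, e1, d2, ...],
    with data samples (bit centers) at even indices and timing samples
    (nominal bit boundaries) at odd indices.
    Returns one decision per bit interval: "FAST", "SLOW", or None.
    """
    decisions: List[Optional[str]] = []
    for i in range(0, len(samples) - 2, 2):
        d_prev, edge, d_next = samples[i], samples[i + 1], samples[i + 2]
        if d_prev ^ edge:
            # e.g. W xor X = 1: the transition fell before the timing sample,
            # so the clock lags the data and the VCO is told to speed up
            decisions.append("FAST")
        elif edge ^ d_next:
            # e.g. X xor Y = 1: the transition fell after the timing sample,
            # so the clock leads the data and the VCO is told to slow down
            decisions.append("SLOW")
        else:
            # no transition in this bit interval: no action is taken
            decisions.append(None)
    return decisions


def td_data(samples: List[int]) -> List[int]:
    """The recovered bits are simply the data (even-index) samples."""
    return samples[0::2]
```

For example, `td_decisions([0, 1, 1, 1, 1, 0, 0, 0, 0])` returns `['FAST', None, 'FAST', None]`, and `td_data` applied to the same list returns the recovered bits `[0, 1, 1, 0, 0]`.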
The gain of the transition detector is not clearly defined because of the digital nature of the circuit. When the phase difference is greater than zero, the detector generates a slow pulse, and when it is less than zero, it generates a fast pulse. There is no linear relationship between phase and output, so the instantaneous gain must be regarded as infinite. The average gain, however, is finite and can be found once a statistical distribution of transitions, or jitter, is introduced. A real data signal does not have perfect transition separation; instead, its transitions are separated by a constant plus a random Gaussian variable. This jitter acts as "transition fuzz" that effectively gives the PD a gain. The calculation of this gain is shown in Fig. 6-6 for both a uniform and a Gaussian distribution. Fundamentally, it comes down to subtracting the two areas created by splitting the probability density function (pdf) around zero, after setting a specific mean and standard deviation. For Gaussian jitter an approximation is needed: the gain is assumed linear along a line that passes through the average output at one standard deviation.

Figure 6-6 Gain of transition detector with data jitter (instantaneous PD output of ±A versus θ, and average PD output versus average phase error θe; for a uniform transition pdf with standard deviation σ the average output at θe = σ is 0.58A, giving K′d = 0.58A/σ; for a Gaussian pdf with standard deviation σ it is 0.68A, giving K′d = 0.68A/σ). Solving for the gain of the transition detector must take into account the fact that the data has jitter. This jitter spreads out the transitions, producing an average PD output.

To include the effect of the transition density (tpb, transitions per bit), Kd is multiplied by tpb. A factor of four must also be included to account for the fact that a slow/fast pulse is carried across 4 bit widths. This yields the final transition detector gain:

$$K_d = V_p\,\frac{0.68}{\sigma}\,4\,(\mathrm{tpb}), \qquad \sigma = \sigma_t\,\frac{2\pi\ \mathrm{rad}}{100\ \mathrm{ps}} \qquad (6\text{-}1)$$

In the Serdes I implementation, with a pulse size Vp of 300 mV, a transition density of 1/4, and an rms jitter σt of 4 ps, the detector gain equals 811 mV/rad. In the Serdes II transition detector, the pulse size was reduced to 40 mV, yielding a smaller gain of 108 mV/rad.

6.3.1.2. NRZ Phase/Frequency Detector (PD/FD) (Hogge)

The digital nature of the transition detector PD and its phase response yields a very poor pull-in range. Once lock is acquired, however, this PD has very strong noise immunity and an inherent ability to extract data from the signal. The Hogge PD improves the poor pull-in range but has no net effect on the properties of the TD PD. Its use in conjunction with the transition detector PD was evaluated for Serdes III but not implemented. The schematic of the Hogge PD [52], [53] is shown in Fig. 6-7; it operates on the NRZ data and generates an analog signal based upon the phase difference between the data and the VCO. The data, vi, must arrive at half the frequency of the clock, vo, for the PD to operate correctly. This is accomplished by dividing the input data signal down by four, which has the negative effect of removing three out of every four edges. The two latches and the va XOR gate retime the data by creating pulses based on data transitions but timed to the clock transitions. The vb XOR gate, on the other hand, produces a similar waveform, but with edges timed to the data transitions. The dc component of the difference between these two signals yields a measure of the phase difference.

Figure 6-7 Phase detector for NRZ data (schematic: data vi and clock vo drive two latches with outputs Q1 and Q2, with a critical delay in the data path; XOR gates produce va and vb, and their difference vd is plotted against ∆θ from −π to π for 50% transition density). This circuit shows one technique for detecting phase for NRZ data in a PLL. The bit rate of the data and the frequency of the clock must be the same. The output is taken differentially and yields a continuous analog signal as a function of phase difference.
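Returning to the transition detector, the factors 0.58 and 0.68 in Fig. 6-6 and the gains quoted after (6-1) can be reproduced numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes zero-mean jitter with the linear approximation drawn through the one-sigma point, as described for Fig. 6-6.

```python
import math


def uniform_factor() -> float:
    """Average PD output at a phase error of one sigma, per unit A, uniform jitter.

    A uniform pdf with standard deviation sigma has half-width sqrt(3)*sigma,
    so the average output rises linearly as A*theta/(sqrt(3)*sigma).
    """
    return 1.0 / math.sqrt(3.0)


def gaussian_factor() -> float:
    """Same quantity for Gaussian jitter: P(x < sigma) - P(x > sigma) = erf(1/sqrt(2))."""
    return math.erf(1.0 / math.sqrt(2.0))


def td_gain(vp: float, tpb: float, sigma_t: float, t_ref: float = 100e-12) -> float:
    """Transition detector gain Kd of (6-1), in V/rad.

    vp: pulse size (V); tpb: transitions per bit; sigma_t: rms jitter (s);
    t_ref: the 100 ps time base used in (6-1) to express jitter in radians.
    """
    sigma = sigma_t * 2.0 * math.pi / t_ref   # rms jitter in radians
    return vp * (0.68 / sigma) * 4.0 * tpb


print(f"uniform: {uniform_factor():.2f}, gaussian: {gaussian_factor():.2f}")
print(f"Serdes I:  {td_gain(0.300, 0.25, 4e-12) * 1e3:.0f} mV/rad")
print(f"Serdes II: {td_gain(0.040, 0.25, 4e-12) * 1e3:.0f} mV/rad")
```

The computed factors are about 0.58 (that is, 1/√3) and 0.68 (erf(1/√2)). The gains come out near 812 and 108 mV/rad, matching the 811 and 108 mV/rad quoted for Serdes I and Serdes II to within rounding.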
The most important aspect of implementing this PD was maximizing its figure of merit. In this case the figure of merit is defined by the range of pulse widths in vb relative to the constant width of the va pulses. Ideally, the widths of vb would range from 0 to twice the width of a va pulse. Achieving this required a fine adjustment of the critical delay, which is approximately the delay through an MS-latch. The figure of merit is maximized by minimizing the integral of the vd versus ∆θ plot over a full 2π radians. The gain of this PD is a function of the transitions per bit (tpb) of the incoming data stream; for a 11001100... stream, for example, tpb equals 0.5. From simulation, the gain was found to be 80 mV/rad/tpb, including the divide-by-4 circuit. Ultimately this PD was not used because it was exceedingly difficult to optimize the delays in the circuit. Slowing down the clock and data was the only way to correct the problem, and as a result the pull-in range suffered. The Serdes III implementation instead addressed the small pull-in range by using an external reference signal.

6.3.2. The Loop Filter

The purpose of the loop filter is to take the digital transition information from the eight transition detectors and create an appropriate VCO control signal. The transition detectors yield relative information about the data and clock phase offset, so an integrator is required. An integrator alone is insufficient: followed by the VCO, which itself integrates frequency into phase, it would leave the loop with two integrators and no stabilizing zero. A proportional term is therefore summed with the integrator output, and together the proportional and integral control form the PI loop filter. Although the loop filter in Fig. 6-8 is expressed as an integral and a proportional gain, it can also be expressed by the pole-zero equation

$$F(s) = K_P + \frac{K_I}{s} = K_h\,\frac{s+\omega_2}{s}, \qquad K_h = K_P, \qquad \omega_2 = \frac{K_I}{K_P} \qquad (6\text{-}2)$$

where ω2 is the loop zero and Kh is the high-frequency gain.
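Equation (6-2), together with the loop figures given later in Section 6.3.3.1, can be checked with a short calculation. This is a sketch: the Serdes I values (KI = 50 Mrad/s, KP = 0.25, a summing-junction gain of 0.286, Ko = 3.14 Grad/s/V, Kd = 811 mV/rad) come from the text, while treating the summing gain as a common factor on both paths, and the function names, are assumptions of the sketch. The damping factor uses the form ζ = 0.5√(K/ω2) of (6-6).

```python
import math


def pi_to_pole_zero(k_i: float, k_p: float, k_sum: float = 1.0):
    """Convert KP + KI/s into Kh (s + w2)/s, per (6-2).

    k_sum is a gain applied after the sum (the summing junction). It scales
    both paths equally, so it changes Kh but not the zero w2.
    Returns (Kh, w2 in rad/s).
    """
    k_h = k_p * k_sum
    w2 = k_i / k_p
    return k_h, w2


def loop_response(k_o: float, k_d: float, k_h: float, w2: float):
    """Total loop gain K = Ko*Kd*Kh (rad/s) and damping factor 0.5*sqrt(K/w2)."""
    k = k_o * k_d * k_h
    zeta = 0.5 * math.sqrt(k / w2)
    return k, zeta


k_h, w2 = pi_to_pole_zero(k_i=50e6, k_p=0.25, k_sum=0.286)
k, zeta = loop_response(k_o=3.14e9, k_d=0.811, k_h=k_h, w2=w2)
print(f"Kh = {k_h * 1e3:.1f} m, zero = {w2 / (2 * math.pi) / 1e6:.1f} MHz")
print(f"loop gain = {k / (2 * math.pi) / 1e6:.1f} MHz, damping = {zeta:.2f}")
```

With these inputs the sketch gives Kh = 71.5 m, a zero near 31.8 MHz, a loop gain near 29.0 MHz, and ζ ≈ 0.48, consistent with the 32 MHz, 29 MHz, and 0.5 quoted in the text.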
Unlike the frequency synthesizer in the transmitter, the integral and proportional components here must operate at the frequency of the clock and accept four faster and four slower signals. This necessitates specialized circuits able to handle the much higher frequency. The Serdes III design, although slightly more complicated, still contains the basic components shown in Fig. 6-8.

Figure 6-8 Receiver loop filter (four faster and four slower signals from the phase detector(s) feed an integral path KI/s and a proportional path KP, whose sum drives the VCO, Ko, which provides 8 phases). The receiver loop filter accepts eight "digital" signals from the transition detectors and produces an analog control signal for the VCO.

6.3.2.1. FET Charge Pump / Proportional Control (Serdes I)

The charge pump integrator shown in Fig. 6-9 uses four field effect transistor (FET) pairs to place charge on and remove charge from the capacitor. Each FET can act independently of the others, so one could be adding charge while another is removing it. The nFET and pFET sizes were chosen carefully to have matching currents. Each FET draws on average 60 µA during one complete period of the clock. With a 300 mV input from the PD, this corresponds to a transconductance of 0.2 mS (60 µA / 300 mV). With Cf equal to 4 pF, a slow/fast pulse changes the capacitor voltage by ±3 mV (60 µA for 200 ps on 4 pF). Dividing the FET transconductance by the capacitance yields the integrator gain KI = 50 Mrad/s.

Proportional control, on the other hand, is handled through eight differential switches, one for each fast and slow PD output, with one branch of each tied together to form a single-ended "analog" signal (Fig. 6-10). By default, without any fast or slow signals, all fast trees pull 0.75 mA through the pull-up resistor Rcc and all slow trees pull 0 mA. In this way, the voltage across Rcc increases when a fast signal is received and decreases when a slow signal is received. Rcc was set to 100 Ω, which produces a 75 mV change for each input pulse.
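The Serdes I integrator and proportional gains above follow from a few lines of arithmetic. The sketch uses only component values from the text; treating each charge packet as delivered over one full 200 ps clock period is the sketch's own reading of "during one complete period of the clock".

```python
# Serdes I loop-filter arithmetic (Section 6.3.2.1). Component values come
# from the text; charge delivery over one full 200 ps clock period is an
# assumption of this sketch.
I_AVG = 60e-6      # average FET current over one clock period (A)
V_PULSE = 0.300    # slow/fast pulse amplitude from the PD (V)
C_F = 4e-12        # integrating capacitor Cf (F)
T_CLK = 200e-12    # period of the 5 GHz VCO (s)
I_TREE = 0.75e-3   # current pulled by one proportional tree (A)
R_CC = 100.0       # pull-up resistor Rcc (ohm)

g_fet = I_AVG / V_PULSE          # FET transconductance (S)
dv_cap = I_AVG * T_CLK / C_F     # capacitor step per slow/fast pulse (V)
k_i = g_fet / C_F                # integrator gain KI (1/s)
dv_prop = I_TREE * R_CC          # proportional step per pulse (V)
k_p = dv_prop / V_PULSE          # proportional gain KP (dimensionless)

print(f"g = {g_fet * 1e3:.1f} mS, capacitor step = {dv_cap * 1e3:.1f} mV")
print(f"KI = {k_i / 1e6:.0f} Mrad/s, KP = {k_p:.2f}")
```

It prints a transconductance of 0.2 mS, a capacitor step of 3.0 mV, KI = 50 Mrad/s, and KP = 0.25.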
The emitter follower tied to Rcc only introduces a DC offset so that the circuit interfaces properly with the summing junction. Like the integrator, the proportional circuit is designed so that its inputs can all operate independently.

Figure 6-9 MOSFET charge pump integrator (four MOSFET pairs, switched by slow inputs S1–S4 and fast inputs F1–F4, charge and discharge Cf to produce Vint, with the bottom supply at −2 V; an additional MOSFET balances the current drawn from the base. S: a slow signal places a charge packet on the capacitor. F: a fast signal removes a charge packet from the capacitor). The FETs in this circuit act as current switches that add charge to and remove charge from a capacitor. This action integrates the slow and fast inputs.

Figure 6-10 Proportional control and summing junction (differential pairs driven by complementary F1 and S1 inputs, repeated 4 times for each S/F pair, share the pull-up Rcc; the result is summed with Vint to produce the aVref control voltage of the VCO). This circuit provides the proportional gain for the loop filter and sums the result with the signal from the charge pump integrator, ultimately driving the aVref control voltage for the VCO.

For each 300 mV input pulse, the output of the proportional control circuit changes by 75 mV, which corresponds to a proportional gain, KP, of 0.25. The summing junction combines the outputs of the integrator and the proportional stage and introduces an additional gain of 0.286 into the total gain of the loop. Given these gains, the loop filter has a zero, ω2 = KI/KP, at 32 MHz and a high-frequency gain, Kh = 0.25 × 0.286, of 71.5 m. Collecting all the gains of this circuit and multiplying by the pulse period shows a ±0.7° phase change of the VCO for every slow/fast pulse.

6.3.2.2. Negative Impedance Charge Pump (Serdes II)

The goal for the receiver in the Serdes II implementation was to replace the FET charge pump and proportional control with a much simpler negative impedance charge pump while keeping all the PLL parameters the same.
The FET pump had several problems: poor high-frequency response, difficulty in matching the pull-up and pull-down components, high capacitance discharge, and significant complexity. The negative impedance pump solved all of these problems with a smaller and simpler circuit. Using the circuit in Fig. 5-21, equations (5-7)–(5-10), and a loop natural frequency, zero, and pole of 25 MHz, 6.4 MHz, and 102 MHz, respectively, gives C1 = 575 pF, C2 = 38 pF, and R = 43 Ω. The high-frequency pole was added to reduce spurious modulation and clock jitter, and had little effect on the overall loop response.

6.3.2.3. Mixed Loop (Serdes III)

The primary design goal of the third Serdes implementation was to improve the poor pull-in range of the transition detector, which was due to its non-linear nature. The poor range required the serial data frequency to be very close to the nominal frequency of the VCO for pull-in to occur, which, for a specific bit rate, is very difficult to guarantee across all thermal, process, and implementation variations. An initial approach fed a down-counted data signal into a separate Hogge-style NRZ PD (Section 6.3.1.2 on page 129). The idea was to use a second PD with a larger pull-in range that could be coupled with the TD PD loop for a better overall pull-in range. This NRZ PD proved difficult to design because of very strict delay requirements, and it did not significantly improve the pull-in range. A second approach used an additional loop that accepts a reference at (bit rate)/8 and was designed to respond identically to the loop in the transmitter (Section 5.4 on page 82). Its loop filter output is summed with the transition detector output of the original loop to create the VCO's control voltage, as shown in Fig. 6-2 on page 123. The purpose of the new loop is to acquire frequency lock, which pulls the first PLL into lock because of the common integrator.
The second loop is able to acquire solid phase lock once within its lock-in range and then begin to extract data. The parameters for the new loop are identical to those used previously, so the only remaining design choices are the gain of the TD PD and its filter. Choosing an appropriate gain for the transition detector involves a trade-off between bit error rate and lock-in range. At one extreme, a large gain gives the PLL a large lock-in range, approximately equal to the bandwidth of the loop; doubling the PD gain, for instance, doubles the lock-in range. This higher gain, however, results in a higher bit error rate (BER) because of the large phase correction. At the other extreme, a small gain limits the bandwidth and the lock-in range but reduces the error rate. The effect of a large gain on BER results from consecutive transitions that are jittered in one direction, causing an accumulation of phase change. The mean frequencies of the data and of the clock are assumed to be constant, an assumption that is reasonable over the few transitions needed in this analysis. The BER of single-bit errors is given by Q(jitter > 25 ps), which equals 3×10⁻¹⁵ for an rms data jitter of 4 ps and a bit width of 50 ps, where Q(x) is the integral from x to infinity of the normalized Gaussian probability density function (pdf). If the BER introduced by the TD is less than this value, its effects can, in general, be ignored. The TD introduces a phase change of ∆t ps per transition. The worst case for an error is when enough phase changes from consecutive data jitter move the clock phase by 12.5 ps, followed by a jitter of −12.5 ps in the other direction, so that the phase difference between the clock and the data reaches 25 ps. This is best solved by example. Assume ∆t equals 5 ps.
Q(jitter > 0 ps) = 5×10⁻¹ (make a 5 ps phase adjustment)
Q(jitter > 5 ps) = 6×10⁻² (jitter must exceed 5 ps)
Q(jitter > 10 ps) = 9×10⁻⁴ (... and so on)
Q(jitter > 15 ps) = 1×10⁻⁶
Q(jitter < −10 ps) = 9×10⁻⁴ (bit error!)
Total probability = 3×10⁻¹⁴

In this example, there were four consecutive "jitters" in the positive direction, causing a clock phase change of 25 ps, followed by a jitter of 10 ps in the opposite direction. The probabilities of these individual events are multiplied together to find the total probability of an error from this chain of events. The same analysis with ∆t equal to 4 ps gives 7×10⁻¹⁹. In conclusion, as long as ∆t is kept below about 4 ps, the effect of accumulated jitter on phase is smaller than the chance of a single-bit error and can be ignored.

Without an integrator in the loop, the VCO control voltage cannot exceed the maximum swing of the TD. Given a 1010 sequence at 20 Gb/s (tpb = 1), there would be four overlapping pulses of magnitude ∆v, which, when multiplied by the VCO gain, yield the frequency deviation. This defines the lock-in range of the TD loop:

$$\omega_L = \Delta v\,K_o\,(4\,\mathrm{tpb}) \qquad (6\text{-}3)$$

where ∆v is the magnitude of the voltage pulse from the TD. The factor 4tpb accounts for the fact that the TD has no effect on the frequency if there are no transitions: the more transitions, the larger the potential frequency deviation. Relating the voltage change to the associated time change yields

$$\Delta v = \frac{2\,\Delta t\,f_c^2}{K_o} \qquad (6\text{-}4)$$

Combining the previous two equations gives the lock-in range as a function of ∆t:

$$\omega_L = 2\,\Delta t\,f_c^2\,(4\,\mathrm{tpb}) \qquad (6\text{-}5)$$

where fc is the clock frequency. Typical specifications for a receiver of this type call for a reference signal within 100 ppm of the frequency of the data. Using a more conservative value of 1000 ppm gives a maximum reference deviation of 20 MHz.
Using this value in (6-5) gives a minimum ∆t of 0.4 ps. For the final implementation, a phase correction of 0.6 ps per transition was chosen, so the lock-in range is 30 MHz at 0.25 transitions per bit. This corresponds to a 4 mV pulse, which is generated within the TD by combining the eight slow and fast signals through a common set of pull-up resistors. The resistors were set at 5 Ω, with a 0.8 mA current source in each tree.

6.3.3. PLL Loop Response

6.3.3.1. Serdes I (FET charge pump)

The total loop gain, or bandwidth, is the product of the VCO gain, Ko = 3.14 Grad/s/V; the PD gain, Kd = 811 mV/rad; and the loop filter gain, Kh = 71.5 m, and equals 29 MHz. With the loop zero at 32 MHz, this yields a damping factor

$$\zeta = 0.5\sqrt{\frac{K}{\omega_2}} \qquad (6\text{-}6)$$

of 0.5, where K is the total loop gain. This is underdamped, with an overshoot of 30%. For all higher transition rates the PD gain increases, which improves the damping factor.

Fig. 6-11 depicts the Serdes I PLL locking into a 6.1 Gb/s (tpb = 0.25) data stream. Using an AHDL program, the data was given an rms jitter of 4 ps, approximately the amount produced by the associated transmitter. Up until 5 ns the PLL is pulling in, and after 10 ns lock-in has occurred. The large deviations around 6.1 GHz are due to the proportional control mechanism pulsing the frequency to cause a phase correction. During the phase correction, the integrator forces the average frequency to equal that of the data. The non-linear "digital" nature of the PD results in a very limited pull-in range: simulation over various initial frequency offsets yields a range of about 2%. The hold-in range, on the other hand, is quite large because of the integrator.

6.3.3.2. Serdes II (negative impedance charge pump)

Fundamentally, the Serdes II implementation was very similar to the Serdes I version.
The key parameters, including loop bandwidth, were kept the same, although a slightly different PD, an improved loop filter, and an improved VCO were used. As a result, the response is nearly identical to that of the Serdes I design shown in Fig. 6-11.

Figure 6-11 Serdes I loop locking in (VCO frequency, 6.06 to 6.17 GHz, versus time, 0 to 40 ns). This plot shows the Serdes I receiver VCO locking into 6.1 Gb/s data with 4 ps jitter. Once frequency lock is established, the proportional pulses oscillate around the target frequency.

6.3.3.3. Serdes III (dual-loop / referenced loop)

The Serdes III implementation has two loops: an independent loop that dictates the frequency, and a dependent loop that phase locks to the incoming data. Fig. 6-12 shows the frequency loop locking in to a 750 MHz reference signal, which corresponds to a 6 GHz clock. Because the same PLL was used in the transmitter of the Serdes III implementation, the acquisition plots in Section 5.4.6.3 on page 101 show behavior identical to the operation of this frequency loop. Fig. 6-12 also shows the phase plot for the phase loop locking in to data with tpb = 0.25. Lock-in occurs when the clock frequency is about 6.02 GHz, within 20 MHz of the 6 GHz target. Lock-in had been expected when the clock was within half of the 30 MHz lock-in range, or 15 MHz. The noise seen on the locked-in phase plot comes from 4 ps rms jitter added to the data through an HDL model (Appendix E.5 on page 183), which enabled a more accurate and faster simulation. The choice of jitter follows directly from the jitter produced by the transmitter, assuming that the channel introduces little noise.
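The lock-in numbers used in Section 6.3.2.3 and in the Serdes III simulation above can be reproduced with a short calculation. This sketch assumes that the lock-in range of (6-5) reads 2∆t fc²(4 tpb) with fc = 5 GHz; that reading is an assumption, chosen because it reproduces the quoted 0.4 ps minimum step and the 30 MHz lock-in range. The function names are hypothetical.

```python
# Lock-in range of the TD loop, assuming (6-5) reads f_L = 2*dt*fc**2*(4*tpb)
# with fc = 5 GHz (an assumed reading that reproduces the quoted numbers).
F_C = 5e9  # clock frequency fc (Hz)


def lock_in_range(dt: float, tpb: float, f_c: float = F_C) -> float:
    """Lock-in range (Hz) for a per-transition phase step dt (s)."""
    return 2.0 * dt * f_c ** 2 * (4.0 * tpb)


def min_phase_step(f_dev: float, tpb: float, f_c: float = F_C) -> float:
    """Smallest per-transition phase step dt (s) whose lock-in range covers f_dev (Hz)."""
    return f_dev / (2.0 * f_c ** 2 * (4.0 * tpb))


f_dev = 1000e-6 * 20e9                     # 1000 ppm of the 20 Gb/s data rate
dt_min = min_phase_step(f_dev, tpb=0.25)   # minimum per-transition phase step
f_lock = lock_in_range(0.6e-12, tpb=0.25)  # range for the chosen 0.6 ps step
td_pulse = 5.0 * 0.8e-3                    # 5 ohm pull-ups, 0.8 mA per tree (V)

print(f"reference deviation = {f_dev / 1e6:.0f} MHz")
print(f"minimum dt = {dt_min * 1e12:.2f} ps")
print(f"lock-in range = {f_lock / 1e6:.0f} MHz (half: {f_lock / 2e6:.0f} MHz)")
print(f"TD pulse = {td_pulse * 1e3:.0f} mV")
```

It reports a 20 MHz reference deviation, a minimum ∆t of 0.40 ps, a 30 MHz lock-in range (15 MHz for half of it), and a 4 mV TD pulse.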
Figure 6-12 Frequency and phase lock-in of Serdes III Rx PLL (clock frequency, 5.94 to 6.08 GHz, and sampling phase, 0 to 350°, versus time, 0 to 100 ns). The dual-loop nature of the Serdes III Rx PLL allows an independent referenced loop to frequency lock close to the data frequency. The second loop phase locks when the data and reference frequencies are within 0.3% of each other.

6.4. 4-16 Demultiplexing

The transition detector naturally performs 1-to-4 demultiplexing: it has eight sampling circuits, four of which sample actual data. The data bits become available sequentially, so all four are valid together for only one bit time: 50 ps at 20 Gb/s. This can make timing very difficult. Serdes I was not capable of 4-16 demultiplexing; it could only output the four sampled bits directly from the detector. The demultiplexer added to Serdes II is shown in Fig. 6-13. It uses four 4-bit MS-latches, each clocked by one of four phase-offset clocks. The clocks are generated by a counter driven by a phase from the PLL. Each latch samples all four bits from the transition detector at once. The transition of the fourth bit, followed by the transition of the first bit, defines the window in which the clock must sample the data. Delays on the clock lines had to be carefully balanced and tightly controlled to ensure that the bits were sampled at the correct time.

Figure 6-13 4-16 demultiplexer architecture (transition detector outputs da–dd are captured by registers clocked by Φ1–Φ4, and the demultiplexed data is resampled on Φ4+τ within the clock window). The demultiplexer accepts each set of four bits from the transition detector and samples each set into one of four separate registers. Once 16 bits are captured, those registers are resampled by a 16-bit register to produce the final output.

After all four latches contain a total of 16 bits, another bank of latches resamples all the bits at once.
This register uses the fourth clock, Φ4, plus a small delay. The delay must be longer than the delay through the first register, so that the fourth bank is captured correctly, and shorter than the time at which the first bank is next sampled. For a 20 Gb/s system, the clock has a 200 ps window, and the resampling edge was placed as close to its center as possible.

6.5. Registers and Decoding

A First In First Out (FIFO) buffer is often added to the output of the demultiplexer. Through the use of a separate load clock, it relaxes the timing constraint on the circuit that reads the 16 bits of parallel data off the chip. A FIFO was not included in either Serdes I or Serdes II, in which the output data is only latched in the 4-16 demultiplexer.

Data decoding is a general term for such techniques as decryption, decompression, error detection, channel alignment, byte alignment [38], DC voltage balance, simplified clock recovery, frame detection [33], and so on. No encoding was performed in either Serdes I or Serdes II. See Section 5.11.1 on page 118 for a brief study and recommendation of the 8B/10B encoding scheme.

6.6. Line Receiver

The line receiver accepts serial data at up to 20 Gb/s. Its bandwidth must be wide enough, usually 50% higher than the 10 GHz fundamental, to ensure that the data is reproduced accurately [14], [48], [36], [37], [49]. The Serdes I line receiver consists of a simple single-ended pad receiver and is not optimized for bandwidth. The Serdes II circuit is fully differential and consists of a 6 µm buffer with emitter followers and 50 Ω termination resistors.

6.7. Test Circuitry

6.7.1. On-chip test pattern generation

Testing the receiver by itself at speed is impossible without a 10 GHz differential signal generator to drive the data inputs. To eliminate reliance on external test hardware, the necessary generator was added on chip.
This was done in both fabricated Serdes chips by using a 5 GHz VCO in three different configurations. The first signal was generated by multiplying separate phases of the VCO to create a 10 GHz bit stream. The second was simply one phase of the VCO, at 5 GHz, and the third was a phase divided by two, at 2.5 GHz. A 4-to-1 multiplexer was added to select between these three generated signals and a fourth, external data signal.

6.7.2. True error rate detector (TERD)

The true error rate detection circuit operates between the transmitter and receiver. It determines the bit error rate through an LFSR matched to the transmitter LFSR. Its operation was discussed in detail in Section 5.8.2 on page 107.

6.8. Implementation and Fabrication

6.8.1. Serdes I

As stated previously, the power supply in the Serdes I chips was chosen to be −4.5 V. This left plenty of room for the three levels of logic and the active current sources. Power minimization was not a design goal, so this voltage was not optimized. A −2.0 V supply was also required for the bottom of the CMOS charge pump. Table 6-1 shows the pin-out of the receiver chip, and Fig. 6-14 shows the final layout artwork and a microphotograph of the fabricated part.

The receiver in the Serdes I implementation was limited to testing pads only, so it did not support the full 4-to-16 demultiplexer. Instead, the sampled data from the transition detector was fed directly to output pads. No additional circuitry was added to retime the output data, so the four bits were not presented to the output at the same time. To test the high-speed operation of the receiver, an on-chip data test source was created. This circuit generated periodic signals at 10 GHz, 5 GHz, and 2.5 GHz. Two DC pads, R0 and R1, were used to select between the three data source inputs and an externally supplied input, and R2 was used as a control voltage for the VCO.
The receiver clock was connected to pad R5, and the output data was connected to pads R8 through R11. To aid in testing, the charge-pump capacitor was brought out to pad R4 through a high-resistance path, so that proper operation of the charge pump could be confirmed while the circuit was running.

Table 6-1 Pin-out of Serdes I receiver
Pin   I/O      Description
R0    DC in    test source (SELECT A)
R1    DC in    test source (SELECT B)
R2    RF out   test source output
R3    DC in    control voltage for test source
R4    RF out   integrator voltage (capacitor)
R5    RF out   receiver clock
R6    Power    -2 V (FET charge pump)
R7    RF in    receiver input
R8    RF out   data 3
R9    RF out   data 2
R10   RF out   data 1
R11   RF out   data 0

[Figure 6-14 graphic: layout artwork (left) and microphotograph of the fabricated chip (right), with pads S0 to S11 and the test source, clock, transition detector, and charge pump blocks labeled]
Figure 6-14 Serdes I receiver layout artwork and photograph. On the left is the final artwork for the first receiver design. On the right is a microphotograph of the fabricated part.

6.8.2. Serdes II

The full chip layout and pin-outs are shown and described in Section 5.9.2. on page 109.

6.9. Testing Results

6.9.1. Serdes I (receiver test results)

The receiver circuit has a pull-in range of 18.7 to 18.9 Gb/s. This is the range of data rates over which the PLL can acquire lock at the onset of new data. Once lock has been acquired, the circuit can maintain it over its hold-in range of 16.4 to 19.6 Gb/s. This situation is undesirable for two important reasons. First, the pull-in range dictates the allowable range of data rates, because the communication system cannot be expected to initialize at a lower bit rate and then ramp up to the nominal bit rate. Second, the hold-in range did not meet the 20 Gb/s specification. The poor pull-in range is caused by the non-linear nature of the transition detector: it has a very high gain and saturates beyond a small phase deviation, which limits its ability to correct larger phase differences.
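The practical consequence of the two measured ranges can be stated as a small lookup. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes, as in the Serdes I measurements, that the pull-in range lies inside the hold-in range.

```python
"""Classify a data rate against a PLL's pull-in and hold-in ranges (sketch)."""
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LockRanges:
    pull_in: Tuple[float, float]  # Gb/s: rates the PLL can acquire from an unlocked start
    hold_in: Tuple[float, float]  # Gb/s: rates it can track once already locked

    def classify(self, rate_gbps: float) -> str:
        lo, hi = self.pull_in
        if lo <= rate_gbps <= hi:
            return "acquires lock"
        lo, hi = self.hold_in
        if lo <= rate_gbps <= hi:
            return "holds lock only if already locked"
        return "cannot lock"

    def usable_range(self) -> Tuple[float, float]:
        """A link that starts unlocked is limited to the pull-in range."""
        return self.pull_in


# Measured Serdes I receiver ranges quoted in Section 6.9.1.
SERDES_I = LockRanges(pull_in=(18.7, 18.9), hold_in=(16.4, 19.6))

for rate in (18.8, 19.4, 20.0):
    print(f"{rate} Gb/s: {SERDES_I.classify(rate)}")
```

With the Serdes I numbers, 20 Gb/s falls outside both ranges, and every rate outside the narrow 18.7 to 18.9 Gb/s band can only be tracked, never acquired.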
The low hold-in range is due to the lower-than-expected frequency range of the current starving VCO, shown in Fig. 3-5 on page 27. Fig. 6-15 shows the receiver locked to data at 19.4 Gb/s (the oscilloscope is triggered on the input signal). Fig. 6-15(a) shows a locked condition with data arriving at 20 bits per transition (0.05 tpb), and Fig. 6-15(b) shows a locked condition at 10 bits per transition (0.1 tpb). When the receiver is locked to data at 0.05 tpb (10 ones, 10 zeros), an rms phase jitter of 2.64 ps is measured, as shown in Fig. 6-16. When the transition density is decreased to 0.016 tpb (32 ones, 32 zeros), a jitter value of 8 ps is measured. The results indicate that lock can be maintained for a data stream with an edge every 300 bits before the clock jitter becomes too large and lock is lost.

[Figure 6-15 graphic: panels (a) and (b), each showing the recovered clock and the sampled data]
Figure 6-15 Serdes I receiver locked to data. The plots show the recovered clock and the sampled data for a data rate of 19.4 Gb/s. (a) is fed with data at 20 bits per transition and (b) with data at 10 bits per transition.

Figure 6-16 Serdes I recovered clock showing jitter. This plot shows a receiver locked to data with a 30% duty cycle. The recovered clock has an rms jitter of 2.6 ps.

6.9.2. Serdes II (receiver test results)

The results from the second receiver iteration were very similar to the first, as expected. The main difference was that the receiver integrator had a design flaw that prevented it from operating as an integrator; instead, it behaved like a low-pass filter. This limited the hold-in range to the pull-in range, which was 4.20 to 4.63 GHz, or 16.8 to 18.5 Gb/s. Although this small hold-in range is a problem, the small pull-in range is a more serious concern. The only way to solve this problem is to provide the receiver with a reference signal very close to the data frequency. This solution was evaluated and simulated in Serdes III.

Fig. 6-17 shows the receiver in lock with the data, with the clock at 4.5 GHz. This was achieved by using an external source running at the same frequency as the clock. The internal source also operated correctly with various combinations of frequencies. One combination had the internal source VCO running at 3.7 GHz with the divide-by-2 enabled and a clock at 4.63 GHz. This corresponds to data with 5 ones and 5 zeros, which also indicates that the receiver is able to lock on both rising and falling data transitions.

[Figure 6-17 graphic: oscilloscope capture of the input data and the receiver clock]
Figure 6-17 Serdes II Rx locked to data. The oscilloscope capture shows the input data and the receiver clock locked to it. Both are at 4.5 GHz, and the data represents a bit pattern of 1100 at 18 Gb/s.

One way to measure the performance of the receiver is to examine the phase noise of the recovered clock as a function of transition density [14], [31]. Fig. 6-18 shows four phase noise measurements for periodic data streams of different lengths. The data was generated with the HP 8563 low phase noise signal source. The curve for 100 bits represents a series of 50 ones followed by 50 zeros. As the plot shows, the fewer the transitions, the higher the phase noise. At 1 MHz, a transition density of 0.052 yields a phase noise of -112 dBc/Hz, and a density of 0.0064 yields -88 dBc/Hz. As the clock phase noise increases, so does the jitter, which leads to a higher BER. For the 19-bit sequence, the minimum length measured, integrating the phase noise from 1 MHz to 1 GHz gives an rms jitter of approximately 2.0 ps.

[Figure 6-18 graphic: phase noise (dBc/Hz, -130 to -70) versus frequency offset (MHz, 0.1 to 100) for periodic sequences of 19, 76, 100, and 156 bits]
Figure 6-18 Serdes II receiver clock phase noise. This plot shows the phase noise for bit sequences of various lengths. Each sequence consists of a string of ones followed by a string of zeros, with the period indicated in the plot. As expected, the fewer the transitions, the larger the phase noise.
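The conversion from a phase noise curve to rms jitter used above is the standard one: integrate the single-sideband phase noise L(f) over the offset band, double it for both sidebands, take the square root to get rms phase in radians, and divide by 2π times the carrier frequency. The sketch below assumes a 4.5 GHz carrier (the recovered-clock frequency quoted above) and a hypothetical 1/f² profile anchored at -112 dBc/Hz at 1 MHz; it does not use the measured data of Fig. 6-18.

```python
"""RMS jitter from a single-sideband phase noise profile (sketch)."""
import math
from typing import Callable


def rms_jitter_s(phase_noise_dbc: Callable[[float], float],
                 f_lo_hz: float, f_hi_hz: float, f_carrier_hz: float,
                 n_points: int = 4000) -> float:
    """Integrate L(f) in dBc/Hz over [f_lo, f_hi] and convert to rms jitter in seconds.

    Uses the trapezoidal rule on a log-spaced frequency grid. The factor of 2
    accounts for both sidebands, since L(f) is a single-sideband quantity.
    """
    ratio = f_hi_hz / f_lo_hz
    freqs = [f_lo_hz * ratio ** (k / (n_points - 1)) for k in range(n_points)]
    psd = [10.0 ** (phase_noise_dbc(f) / 10.0) for f in freqs]  # dBc/Hz -> 1/Hz
    area = sum(0.5 * (psd[k] + psd[k + 1]) * (freqs[k + 1] - freqs[k])
               for k in range(n_points - 1))
    phase_rms_rad = math.sqrt(2.0 * area)
    return phase_rms_rad / (2.0 * math.pi * f_carrier_hz)


def example_profile(f_hz: float) -> float:
    """Hypothetical profile: -112 dBc/Hz at 1 MHz, falling at 20 dB/decade."""
    return -112.0 - 20.0 * math.log10(f_hz / 1e6)


jitter = rms_jitter_s(example_profile, 1e6, 1e9, 4.5e9)
print(f"rms jitter: {jitter * 1e12:.3f} ps")
```

Because the example profile is invented, its result should not be compared with the 2.0 ps figure; reproducing that number would require the measured curve itself.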
The final test of the receiver involved connecting the output of the transmitter back into the receiver, exercising the full potential of the built-in testing circuitry. The first problem encountered was the inability to feed back a differential signal, because two matched lines from the output of the Tx to the input of the Rx could not be guaranteed. The probes, connectors, and cables introduce too much variation in length: even a few millimeters of mismatch can skew the two halves of the differential signal by a considerable fraction of the 50 ps bit period. It was concluded that for differential testing, the part would have to be packaged and mounted on a board.

Because differential testing was not possible, the system was set up for single-ended testing. This was done by tying one side of the receiver input to a DC reference voltage halfway between the high and low transmitter signal levels. This technique sacrificed the benefits of a differential signal, and the link would not operate at either 20 or 10 Gb/s. The feed-through pad showed a highly corrupted signal. The single-ended technique and/or a bandwidth problem in the differential pad receiver prevented a full test of the feedback testing scheme.

6.10. Future Work

6.10.1. Sampling offset correction

One characteristic of data arriving at a receiver, typically seen in optical systems, is that bits are skewed toward one transition. This is usually an effect of the non-linear nature of the photodiode, but it can also result from the transmitter or from the channel itself. The consequence is an increase in BER if samples are taken at the exact center of the bit. The solution is to allow the data sampling points to be offset relative to the data transitions.

6.10.2. 40 Gb/s?

The first step in moving to a 40 Gb/s solution is to use a 10 GHz ring oscillator. Given such an oscillator, the next problem is the design of the receiver amplifier.
This amplifier will require a bandwidth of at least 20 GHz and must be able to drive a significant number of loads. It may be necessary to give up phase detection on every transition and use only every fourth edge, to reduce the MS-latch loading. This solution still requires four data latches plus one transition latch, a load that may still be too high. Another solution would be to use a bang-bang phase detector that requires a clock and its quadrature at half the baud rate [26], [32]. This solution requires only four MS-latches.

6.10.3. Demultiplexer improvements

A problem found during testing of the Serdes II chip was in the 4-to-16 demultiplexer described in Section 6.4. on page 138. Because of stringent timing constraints and excessive loading, the set of four 4-bit latches was failing to latch the data. Fig. 6-19 depicts an improved demultiplexer that operates in stages. The first stage latches the four data bits on one of the PLL clock phases. The clock is then divided by two and used to clock the next stage of eight latches. The clock is then divided again, and the data is latched into 16 latches. The final stage realigns all the data edges by latching the 16 bits simultaneously.

[Figure 6-19 graphic: data inputs da, db, dc, dd from the transition detector, clock Φ1 (200 ps period), toggle flip-flop divide-by-two stages, and the demultiplexed data outputs]
Figure 6-19 Revised 4-to-16 demultiplexer. To reduce the timing requirements on the demultiplexer, the data is demultiplexed in stages. Each stage is clocked at half the frequency of the previous stage.

Discussion & Conclusion

In conclusion, three 20 Gb/s communication systems were designed, and two were fabricated in IBM’s SiGe 5 HP process. Each design built on test results from the previous implementations, and the third and final design was intended for future research and development. The second iteration was a unified transceiver chip containing a transmitter and a receiver.
It had wirebond pads for wafer-probe testing as well as C4 pads for flip-chip packaging. Through the C4 pads, 16 bits of parallel data could be supplied to and extracted from the chip. An internal testing circuit enabled complete testing of the chip without the need for packaging.

The Feed Forward Interpolated VCO, a four-stage ring oscillator that uses novel feed-forwarding techniques, was developed. Its very high intrinsic frequency required added capacitance to slow it down to 5 GHz. Its flexibility makes it an excellent choice for short-haul communication systems. Phase noise at 1 MHz was measured as -90.5 dBc/Hz, which is one of the best figures reported for a ring oscillator at this speed. The associated jitter is quite small and varies in an interesting way with the control voltage.

The transmitter in the second prototype had a very wide operating range of 14.27 to 21.58 Gb/s. A time-domain sampling oscilloscope measured an rms clock jitter of 4.3 ps, or 0.086 UI. Using a spectrum analyzer, however, the rms clock jitter from 100 kHz to 100 MHz was measured at 1.4 ps. The eye diagram was very symmetric, indicating that the symmetric multiplexer and data interleaving scheme operated as expected.

The second receiver did not have an external reference and, therefore, had only the high-speed data stream to lock to. This limited the pull-in range to 16.8 to 18.5 Gb/s. The rms clock jitter, obtained by integrating the measured phase noise, was approximately 2.0 ps. At transition rates as low as 78 bits per transition, the receiver was still able to maintain lock. This is credited to the phase detector, which is able to use every transition for phase information.

A third prototype was developed, but not fabricated, using the data acquired from the first two designs. The transmitter PLL bandwidth was further optimized, and a negative impedance amplifier loop filter was added. A frequency-locked loop was added to the receiver PLL to greatly enhance the pull-in range.
The demultiplexer scheme was also improved to minimize the timing constraints.

References

[1] R. C. Walker, K. Hsieh, T. A. Knotts, and C. Yen, “A 10 Gb/s Si-Bipolar TX/RX Chipset for Computer Data Transmission,” IEEE International Solid-State Circuits Conference, pp. 302-303, 1998.
[2] S. A. Steidl, “A 32-Word by 32-Bit Three-Port Bipolar Register File Implemented Using a SiGe HBT BiCMOS Technology,” Candidacy document, Rensselaer Polytechnic Institute, Department of Electrical Engineering, May 1999.
[3] P. M. Campbell, H. J. Greub, A. Garg, S. A. Steidl, S. Carlough, M. Ernest, R. Philhower, C. Maier, R. P. Kraft, and J. F. McDonald, “A Very-Wide-Bandwidth Digital VCO Using Quadrature Frequency Multiplication and Division Implemented in AlGaAs/GaAs HBTs,” Proc. GaAs IC Symp., pp. 311-314, 1995.
[4] A. W. Buchwald, and K. W. Martin, “High-speed voltage-controlled oscillator with quadrature outputs,” Electronics Letters, vol. 27, no. 4, pp. 309-310, February 1991.
[5] R. Walker, C. Stout, and C-S. Yen, “A 2.488 Gb/s Si-Bipolar Clock and Data Recovery IC with Robust Loss of Signal Detection,” IEEE International Solid-State Circuits Conference, pp. 246-247, 1997.
[6] M. Ernest, T. W. Krawczyk, and J. F. McDonald, “Symmetric Multiplexer,” Invention Disclosure Record, Rensselaer Polytechnic Institute, February 2000.
[7] T. W. Krawczyk, and J. F. McDonald, “The Feed Forward Voltage Controlled Ring Oscillator,” Invention Disclosure Record, Rensselaer Polytechnic Institute, May 2000.
[8] D. C. Ahlgren, G. Freeman, S. Subbanna, R. Groves, D. Greenberg, J. Malinowski, D. Nguyen-Ngoc, S. J. Jeng, K. Stein, K. Schonenberg, D. Kiesling, B. Martin, S. Wu, D. L. Harame, and B. Meyerson, “A SiGe HBT BiCMOS technology for mixed signal RF applications,” Proceedings of the IEEE Bipolar/BiCMOS Circuits and Technology Meeting, Minneapolis, MN, pp. 195-197, September 1997.
[9] K. Washio, E. Ohue, K. Oda, M. Tanabe, H. Shimamoto, and T.
Onai, “95 GHz fT Self-Aligned Selective Epitaxial SiGe HBT with SMI Electrodes,” IEEE International Solid-State Circuits Conference, pp. 312-313, 1998.
[10] L. Larson, M. Case, S. Rosenbaum, D. Rensch, P. MacDonald, M. Matloubian, M. Chen, D. Harame, J. Malinowski, B. Meyerson, M. Gilbert, and S. Mass, “Si/SiGe HBT Technology for Low-Cost Monolithic Microwave Integrated Circuits,” IEEE International Solid-State Circuits Conference, pp. 80-81, 1996.
[11] J. R. Long, M. A. Copeland, S. J. Kovacic, D. S. Malhi, and D. L. Harame, “RF Analog and Digital Circuits in SiGe Technology,” IEEE International Solid-State Circuits Conference, pp. 82-83, 1996.
[12] K. Ismail, “Si/SiGe CMOS: Can it extend the lifetime of Si,” IEEE International Solid-State Circuits Conference, pp. 116-117, 1997.
[13] L. Sun, T. Kwasniewski, and K. Iniewski, “A Quadrature Output Controlled Ring Oscillator Based on Three-Stage sub-feedback Loops,” IEEE International Symposium on Circuits and Systems, vol. 2, pp. 176-179, 1999.
[14] R. Walker, C. Stout, and C-S. Yen, “A 2.488 Gb/s Si-Bipolar Clock and Data Recovery IC with Robust Loss of Signal Detection,” IEEE International Solid-State Circuits Conference, pp. 246-247, 1997.
[15] L. Dai, and R. Harjani, “Comparisons and Analysis of Phase Noise in Ring Oscillators,” IEEE International Symposium on Circuits and Systems, pp. 77-80, May 2000.
[16] A. Hajimiri, and T. H. Lee, “A General Theory of Phase Noise in Electrical Oscillators,” IEEE Journal of Solid-State Circuits, vol. 33, no. 2, pp. 179-194, February 1998.
[17] J. A. McNeil, “Jitter in Ring Oscillators,” IEEE Journal of Solid-State Circuits, vol. 32, pp. 870-879, June 1997.
[18] A. Hajimiri, S. Limotyrakis, and T. H. Lee, “Jitter and Phase Noise in Ring Oscillators,” IEEE Journal of Solid-State Circuits, vol. 34, no. 6, pp. 790-804, June 1999.
[19] T. H. Lee, and A. Hajimiri, “Oscillator Phase Noise: A Tutorial,” IEEE Journal of Solid-State Circuits, vol. 35, no. 3, pp.
326-335, March 2000.
[20] H. Matsuoka, and T. Tsukahara, “A 5-GHz Frequency-Doubling Quadrature Modulator with a Ring-Type Local Oscillator,” IEEE Journal of Solid-State Circuits, vol. 34, pp. 1345-1348, September 1999.
[21] J. Plouchart, H. Ainspan, M. Soyuer, and A. Ruehli, “A Fully-Monolithic SiGe Differential Voltage-Controlled Oscillator for 5 GHz Wireless Applications,” IEEE Radio Frequency Integrated Circuits Symposium, pp. 57-60, 2000.
[22] M. Soyuer, J. N. Burghartz, H. A. Ainspan, K. A. Jenkins, P. Xiao, A. R. Shahani, M. S. Dolan, and D. L. Harame, “An 11-GHz 3-V SiGe Voltage Controlled Oscillator with Integrated Resonator,” IEEE Journal of Solid-State Circuits, vol. 32, no. 9, pp. 1451-1454, September 1997.
[23] S. K. Enam, and A. A. Abidi, “A 300-MHz Voltage-Controlled Ring Oscillator,” IEEE Journal of Solid-State Circuits, vol. 25, no. 1, pp. 312-315, February 1990.
[24] S. Lee, B. Kim, and K. Lee, “A Novel High-Speed Ring Oscillator for Multiphase Clock Generation Using Negative Skewed Delay Scheme,” IEEE Journal of Solid-State Circuits, vol. 32, no. 2, pp. 1451-1454, February 1997.
[25] D. C. Ahlgren, M. Gilbert, D. Greenberg, S. J. Jeng, J. Malinowski, D. Nguyen-Ngoc, K. Schonenberg, K. Stein, R. Groves, K. Walter, G. Hueckel, D. Colavito, G. Freeman, D. Suderland, D. L. Harame, and B. Meyerson, “Manufacturability demonstration of an integrated SiGe HBT technology for the analog and wireless market place,” IEEE International Electron Devices Meeting Technical Digest, San Francisco, CA, pp. 859-862, December 1996.
[26] J. F. Ewen, A. X. Widmer, M. Soyuer, K. R. Wrenner, B. Parker, and H. A. Ainspan, “Single-Chip 1062 Mbaud CMOS Transceiver for Serial Data Communications,” IEEE International Solid-State Circuits Conference, pp. 32-33, 1995.
[27] D. Friedman, M. Meghelli, B. Parker, H. Ainspan, and M. Soyuer, “Sub-picosecond SiGe BiCMOS Transmit and Receive PLLs for 12.5 Gbaud Serial Data Communication,” Symposium on VLSI Circuits, pp.
132-135, 2000.
[28] R. Farjad-Rad, C. Yang, M. Horowitz, and T. Lee, “A 0.3-µm CMOS 8-Gb/s 4-PAM Serial Link Transceiver,” IEEE Journal of Solid-State Circuits, vol. 35, no. 5, pp. 757-764, May 2000.
[29] H. Knapp, T. F. Meister, M. Wurzer, D. Zoschg, K. Aufinger, and L. Treitinger, “A 79 GHz Dynamic Frequency Divider in SiGe Bipolar Technology,” IEEE International Solid-State Circuits Conference, pp. 208-209, 2000.
[30] M. Meghelli, B. Parker, H. Ainspan, and M. Soyuer, “SiGe BiCMOS 3.3V Clock and Data Recovery Circuits for 10Gb/s Serial Transmission Systems,” IEEE International Solid-State Circuits Conference, pp. 56-57, 2000.
[31] Y. M. Greshishchev, and P. Schvan, “SiGe Clock and Data Recovery IC with Linear-Type PLL for 10-Gb/s SONET Application,” IEEE Journal of Solid-State Circuits, vol. 35, no. 9, pp. 1353-1359, September 2000.
[32] A. Pottbacker, U. Langmann, and H. Schreiber, “A Si Bipolar Phase and Frequency Detector IC for Clock Extraction up to 8 Gb/s,” IEEE Journal of Solid-State Circuits, vol. 27, no. 12, pp. 1747-1751, December 1992.
[33] S. Shioiri, M. Soda, T. Monikawa, T. Hashimoto, F. Sato, and K. Emura, “A 10 Gb/s SiGe Framer/Demultiplexer for SDH Systems,” IEEE International Solid-State Circuits Conference, pp. 202-203, 1998.
[34] A. X. Widmer, “Method of Coding to Minimize Delay at a Communication Node,” U.S. Patent 4665517, assigned to International Business Machines, 1987.
[35] M. Fukaishi, S. Nakamura, A. Tajima, Y. Kinoshita, Y. Suemura, H. Suzuki, T. Itani, H. Miyamoto, N. Henmi, T. Yamazaki, and M. Yotsuyanagi, “A 2.125-Gb/s BiCMOS Fiber Channel Transmitter for Serial Data Communications,” IEEE Journal of Solid-State Circuits, vol. 34, no. 9, pp. 1325-1330, September 1999.
[36] Y. M. Greshishchev, and P. Schvan, “A 60-dB Gain, 55-dB Dynamic Range, 10-Gb/s Broad-Band SiGe HBT Limiting Amplifier,” IEEE Journal of Solid-State Circuits, vol. 34, no. 12, pp. 1914-1920, December 1999.
[37] W.
Pöhlmann, “A Silicon-Bipolar Amplifier for 10 Gbit/s with 45 dB Gain,” IEEE Journal of Solid-State Circuits, vol. 29, no. 5, pp. 551-556, May 1994.
[38] K. Kawai, and H. Ichino, “A 0.6 W 10 Gb/s SONET/SDH Bit-Error-Monitoring LSI,” IEEE International Solid-State Circuits Conference, pp. 54-55, 2000.
[39] S. Finocchiaro, G. Palmisano, R. Salerno, and C. Sclafani, “Design of Bipolar Ring Oscillators,” IEEE International Symposium on Circuits and Systems, vol. 1, pp. 5-8, 1999.
[40] Y. Chen, S. Koneru, E. Lee, and R. Geiger, “Simulation of Random Jitter in Ring Oscillators with SPICE,” IEEE International Symposium on Circuits and Systems, vol. 2, pp. 1154-1157, 1997.
[41] D. H. Wolaver, Phase-Locked Loop Circuit Design, Englewood Cliffs, NJ: Prentice Hall, 1991.
[42] T. Kuroda, T. Fujita, Y. Itabashi, S. Kabumoto, M. Noda, and A. Kanuma, “1.65 Gb/s 60 mW 4:1 Multiplexer and 1.8 Gb/s 80 mW 1:4 Demultiplexer ICs Using 2V 3-Level Series-Gated ECL Circuits,” IEEE International Solid-State Circuits Conference, pp. 36-37, 1995.
[43] D. Chen, and R. Waldron, “A Single-Chip 266 Mb/s CMOS Transmitter/Receiver for Serial Data Communications,” IEEE International Solid-State Circuits Conference, pp. 100-101, 1993.
[44] M. Soyuer, K. A. Jenkins, J. N. Burghartz, H. A. Ainspan, F. J. Canora, S. Ponnapalli, J. F. Ewen, and W. E. Pence, “A 2.4 GHz Silicon Bipolar Oscillator with Integrated Resonator,” IEEE Journal of Solid-State Circuits, vol. 31, no. 2, pp. 268-270, February 1996.
[45] F. Svelto, S. Deantoni, and R. Castello, “A 1.3 GHz Low-Phase Noise Fully Tunable CMOS LC VCO,” IEEE Journal of Solid-State Circuits, vol. 35, no. 3, pp. 356-361, March 2000.
[46] J. J. Kim, and B. Kim, “A Low-Phase-Noise CMOS LC Oscillator with a Ring Structure,” IEEE International Solid-State Circuits Conference, pp. 430-431, 2000.
[47] C. Wu, and H.
Kao, “A 1.8 GHz CMOS Quadrature Voltage-Controlled Oscillator (VCO) Using the Constant-Current LC Ring Oscillator Structure,” IEEE International Symposium on Circuits and Systems, vol. 4, pp. 378-381, 1998.
[48] J. Akagi, Y. Kuriyama, M. Asaka, T. Sugiyama, N. Iizuka, K. Tsuda, and M. Obara, “Five AlGaAs/GaAs HBT ICs for a 20 Gb/s Optical Receiver,” IEEE International Solid-State Circuits Conference, pp. 168-169, 1994.
[49] M. Soda, H. Tezuka, F. Sato, T. Hashimoto, S. Nakamura, T. Tatsumi, T. Suzaki, and T. Tashiro, “Si-Analog ICs for 20 Gb/s Optical Receiver,” IEEE International Solid-State Circuits Conference, pp. 170-171, 1994.
[50] A. Rofougaran, J. Rael, M. Rofougaran, and A. Abidi, “A 900 MHz CMOS LC-Oscillator with Quadrature Outputs,” IEEE International Solid-State Circuits Conference, pp. 392-393, 1996.
[51] B. L. Thompson, and H. Lee, “A BiCMOS Receiver/Transmit PLL Pair for Serial Data Communications,” IEEE Custom Integrated Circuits Conference, pp. 29.6.1-29.6.5, May 1992.
[52] C. R. Hogge, “A Self Correcting Clock Recovery Circuit,” IEEE Journal of Lightwave Technology, vol. LT-3, no. 6, pp. 1312-1314, December 1985.
[53] D. Y. Wu, A. C. Yen, D. Meeker, S. Beccue, K. Pedrotti, J. Penney, A. Price, and K. C. Wang, “Two Phase Detectors for 2.5-10 Gb/s NRZ Data Operation: a Hogge and a Balanced Mixer,” GaAs IC Symp., pp. 266-269, 1996.

Appendix A. IBM SiGe 5 HP

A.1. NPN Vbe characteristics

The SiGe npn transistor Vbe characteristics are important for several reasons. First, they indicate the turn-on voltage of the transistor: the voltage below which the transistor is considered off. Second, at a given operating collector current they can be used to find the base-emitter voltage. Third, and perhaps most importantly, the derivative of the collector current, Ic, with respect to Vbe is the transconductance. This parameter is found in Fig. A-2 by taking the slope at half the peak fT current.
This is the current that flows through each transistor of an optimized differential pair when both inputs are biased identically.

[Figure A-1 graphic: normalized Ic (ln(mA/µm), -20 to 6) versus Vbe (0.4 to 1.0 V), simulated and analytical curves]
Figure A-1 Ic-Vbe characteristics for npn transistor. The plot shows the collector current at a fixed Vce of 2 V versus Vbe. The analytical approximation is accurate up to the operating point of 0.7 mA/µm.

[Figure A-2 graphic: normalized Ic (mA/µm, 0 to 0.8) versus Vbe (0.87 to 0.93 V), with the slope annotated as 120 Ω-µm (8.33 mS/µm)]
Figure A-2 NPN transconductance. The transconductance is the slope of the curve at the point where the collector current is half the maximum-fT current.

Comparing the simulated transconductance to that given by

$$ r_e = \gamma \frac{v_T}{i_e}, \qquad g_m = \frac{1}{\gamma} \frac{i_e}{v_T} \tag{A-1} $$

yields a fudge factor, γ, of 1.65. The analytical curve in Fig. A-1 is found from

$$ I_c = I_s \, e^{V_{be}/V_T} \tag{A-2} $$

where Is is graphically determined to be 30 fA.

A.2. NPN Ic versus Vce characteristics

[Figure A-3 graphic: collector current (mA/µm, 0 to 4) versus collector-emitter voltage (0 to 5 V) for base currents from 0 to 250 µA/µm in 50 µA/µm steps]
Figure A-3 Ic-Vce characteristics for npn transistor. The plot shows the collector current versus collector-emitter voltage for different base currents. Breakdown occurs at a Vce of 3 V.

The Ic versus Vce characteristics of the npn transistor reveal important design parameters. The first is the breakdown voltage of 3 V, which is the maximum voltage that can be applied across the collector-emitter junction. Above this voltage the base current loses control over the collector current, and large amounts of current begin to flow. The Early voltage, the voltage at which the backward linear extrapolations of all the curves meet, is about 45 V.
This parameter is related to the output resistance looking into the collector by

$$ r_o = \frac{V_A}{I_c} \tag{A-3} $$

where Ic is the collector current in the active region. The normalized value of ro is 80 kΩ-µm.

A.3. NPN fT Curves

[Figure A-4 graphic: transition frequency (GHz, 0 to 70) versus normalized collector current (mA/µm, 0.001 to 10) for transistor sizes of 1, 2.5, 5, 10, and 20 µm]
Figure A-4 fT vs Ic characteristics for npn transistor. The maximum transition frequency of the SiGe npn transistors occurs at approximately 0.8 mA/µm. Above that current the fT drops off rapidly, and that range should be avoided during design.

The most important design parameter found in the fT curves in Fig. A-4 is the DC collector bias current for maximum operating frequency. Although this normalized current increases slightly for larger transistors, a value of 0.8 mA/µm is reasonable for all sizes. Also worth noting in this plot is that larger transistors, which draw more current, operate faster: the smallest transistor has a peak fT of approximately 50 GHz, and the largest peaks at 62 GHz.

Appendix B. CML Logic Gates

B.1. CML Voltage Swing (non-linearized, digital)

The CML voltage swing is found by analyzing the collector current through each of the two transistors in a differential pair with a DC differential voltage on the inputs. The voltage swing must be large enough to ensure that nearly all of the current flows through only one transistor. Fig. B-1 depicts how the current shifts from one transistor to the other as the differential voltage changes. At about ±200 mV, at least 99% of the current flows through one leg of the CML buffer. This is the assigned minimum operating voltage swing; a more conservative 250 mV or greater was used throughout this project.
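The relation behind Fig. B-1 can be sketched with the ideal exponential model of a bipolar pair, in which the collector currents divide as Ic1/Ic2 = exp(ΔV/(γVT)). The sketch assumes T = 300 K, ignores emitter and base resistance, and reuses the fudge factor γ = 1.65 from Appendix A.1 as the slope factor in the exponent; that reuse is an assumption, not something the text states.

```python
"""Current steering in an ideal bipolar differential pair (sketch)."""
import math

BOLTZMANN_OVER_Q = 8.617333262e-5  # k/q in V/K
GAMMA = 1.65  # fudge factor from Appendix A.1, reused here as a slope factor (assumption)


def thermal_voltage(temp_k: float = 300.0) -> float:
    return BOLTZMANN_OVER_Q * temp_k


def on_fraction(dv_volts: float, gamma: float = GAMMA, temp_k: float = 300.0) -> float:
    """Fraction of the tail current in the favored transistor for input dv.

    From Ic1/Ic2 = exp(dv / (gamma*VT)); emitter and base resistances are ignored.
    """
    return 1.0 / (1.0 + math.exp(-dv_volts / (gamma * thermal_voltage(temp_k))))


def swing_for_fraction(fraction: float, gamma: float = GAMMA, temp_k: float = 300.0) -> float:
    """Inverse of on_fraction: differential input needed to steer `fraction` of the current."""
    return gamma * thermal_voltage(temp_k) * math.log(fraction / (1.0 - fraction))


for dv_mv in (200, 250):
    print(f"{dv_mv} mV -> {100 * on_fraction(dv_mv / 1000):.2f}% in one leg")
print(f"99% needs {1000 * swing_for_fraction(0.99):.0f} mV")
```

Under these assumptions, 200 mV and 250 mV steer about 99.1% and 99.7% of the current into one leg, close to the values quoted for Fig. B-1. With γ = 1 the same swings would steer more than 99.9%, so the agreement depends on the fudge factor.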
[Figure B-1 graphic: percentage of total current (log scale, 0.001% to 100%) versus differential voltage (-300 to 0 mV)]
Figure B-1 Current switching versus differential input voltage. The input to a differential pair controls the switching of current between two branches. A critical current level must be reached to ensure that the digital gate has completely switched. For 99.7% of the current in one branch, a minimum of 250 mV must be applied.

B.2. CML Signals

CML circuits possess important attributes called signal levels, which are necessary to connect multiple gates together. The need to merge multiple differential pairs arises from the small but desirable voltage swing (Appendix B.1.), the large base-to-emitter voltage (Appendix A.1.), and the technique used. Merging pairs involves stacking them so that the current through one is a function of the state of another. In this way, different current paths can be connected to the pull-up resistors, which form the output. Other techniques exist for combining differential pairs (see Section 5.3.1. on page 75), but they are not by themselves considered CML.

[Figure B-2 graphic: two stacked differential pairs with inputs a0/a1 (upper) and b0/b1 (lower), transistor Q1, pull-up resistors r1 and r2, and outputs x0/x1, y0/y1, and z0/z1]
Figure B-2 Simple AND CML Gate. This gate shows how multiple differential pairs can be merged to produce a two-level gate.

In Fig. B-2, the differential input a must be at a higher potential than input b, specifically one Vbe higher, to ensure that transistor Q1 does not saturate. Input a is said to be on level 1 (0 mV, -250 mV) and b is said to be on level 2 (-900 mV, -1150 mV). A supply voltage as low as -3.2 V allows up to three levels of inputs. Level 1 outputs, x, are taken at the bottom of the pull-up (collector) resistors at the top of the tree. Level 2 (y) and level 3 (z) outputs are generated from emitter followers and a diode. The size of pull-up resistors r1 and r2 is based on the current source, to produce a nominal voltage swing of at least 250 mV.
For 1 µm transistors biased at a current of 0.8 mA, the resistors are set to 400 Ω. In general, the normalized resistor value is 400 Ω-µm.

B.3. Voltage Reference

All CML gates require a current source to fix the current through the differential-pair switch. The simplest approach, a passive source, places a resistor at the bottom of the tree; the voltage across it is nearly constant and depends only on the lowest transistor pair. This technique has high common-mode gain on the lowest differential pair and often requires a large resistor.

[Figure B-3 graphic: reference generator built from transistors Q1 and Q2 and resistor R1 producing Vref, with example mirrored sources (1x, 400 Ω; 2x, 200 Ω; 0.75 mA and 1.5 mA); R1 values of 1.73 kΩ and 0.87 kΩ are listed for Vee of -4.5 V and -3.2 V]
Figure B-3 Reference Voltage Generator. Active current sources configured as a current mirror require a reference voltage to set the current.

A more common approach is to use an active current source implemented as a current mirror. Fig. B-3 shows the generating circuit, which produces a mirror current of 0.75 mA/µm. This current was chosen as the current necessary to achieve the maximum operating frequency of the transistors (see Appendix A.3.). The emitter degeneration resistor typically has about 0.3 V across it and is scaled to set currents smaller or larger than the mirror current. For instance, if a 4 µm transistor circuit requires 3.0 mA, a 100 Ω emitter resistor is used. Transistor Q2 provides base current compensation by supplying the base current to all connected circuits. This allows a larger number of sources to be used and prevents current degradation as sources are added. The value of R1 depends on the supply voltage of the circuits; designs with different supplies need only change this resistor to keep the current fixed throughout.

B.4. Buffer with emitter follower outputs

A buffer accepts a single input and duplicates it on its output.
Its many uses include impedance conversion (high input impedance and low output impedance), introduction of a fixed delay, and level shifting. Buffers also form the foundation for more complicated circuits.

The circuit in Fig. B-4 can accept input a on level 1, 2, or 3, since it has only one differential pair. The level 1 output, x, is taken from the bottom of the pull-up resistors, and the level 2 output, y, is taken from the emitter followers.

[Figure B-4 graphic: differential pair with inputs a0/a1, pull-up resistors to Vcc giving outputs x0/x1, and emitter followers Q1 and Q2 giving outputs y0/y1; tail to Vee]
Figure B-4 CML Buffer with emitter followers. A basic buffer with level 1 and level 2 outputs. It can accept input at any level.

The emitter follower output provides much higher drive capability than the level 1 output. This is because the level 1 output is passively pulled up through the resistors and actively pulled down through the differential pair. As more loads are added, the base current for each must be supplied through the passive resistors, which causes a voltage drop and limits the voltage swing. The passive pull-up through the resistors also limits the speed of the gate. The emitter followers, on the other hand, provide a low-impedance output through β amplification of the current in transistors Q1 and Q2. In this case, the output is actively pulled up through the follower transistor and actively pulled down through the current source.

Appendix C. CML Circuit Details

C.1. Linearizing the differential amplifier

The differential amplifier is very effective in digital circuits because of its high voltage gain. For analog circuits, where a linear response is needed, this gain must be reduced to meet specifications. The preferred method is to include emitter resistors that augment the emitter resistance, re, already present in the transistor.
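The gain approximation used in this appendix, Ad ≈ 1/(re + Re) with re about 120 Ω-µm normalized (Appendix A.1), and a voltage gain of Ad times Rc, reduces to a few lines of arithmetic. This minimal sketch takes the formula as written, assumes normalized resistances scale as 1/(emitter length), and uses a hypothetical 1 µm pair with 400 Ω collector resistors.

```python
"""Degenerated differential pair gain following Ad ≈ 1/(re + Re) (sketch)."""

RE_INTRINSIC_OHM_UM = 120.0  # normalized intrinsic emitter resistance (Appendix A.1)


def transconductance_s(re_ext_ohm_um: float, emitter_len_um: float) -> float:
    """Ad ≈ 1/(re + Re), in siemens; normalized resistances scale as 1/emitter length."""
    total_ohm = (RE_INTRINSIC_OHM_UM + re_ext_ohm_um) / emitter_len_um
    return 1.0 / total_ohm


def voltage_gain(re_ext_ohm_um: float, emitter_len_um: float, rc_ohm: float) -> float:
    """Total voltage gain: Ad multiplied by the collector resistance Rc."""
    return transconductance_s(re_ext_ohm_um, emitter_len_um) * rc_ohm


# Hypothetical 1 um pair with 400 ohm collector resistors (the B.2 pull-up value).
for re_ext in (0, 200, 400, 600, 800):
    print(f"Re = {re_ext:3d} ohm-um: Ad = {1e3 * transconductance_s(re_ext, 1.0):.2f} mS, "
          f"gain = {voltage_gain(re_ext, 1.0, 400.0):.2f}")
```

The form 1/(re + Re) corresponds to taking the differential output current i1 - i0 for a differential input; each branch individually changes by half of that.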
[Figure C-1: Linearizing the differential amplifier with emitter resistors (collector resistors Rc, emitter resistors Re, inputs a0 and a1, branch currents i0 and i1). The emitter resistors add to the intrinsic emitter resistance of the differential pair transistors and decrease the total gain of the circuit.]

The emitter resistance is defined as the resistance from the base to the emitter, looking into the emitter. It is the inverse of the transconductance, gm. The normalized value found through simulation in Appendix A.1 is about 120 Ω-µm. The inverse of the sum of this value and the emitter resistor Re yields the gain of the circuit, defined as output current over input voltage:

$$A_d \approx \frac{1}{r_e + R_e}, \qquad r_e = \frac{V_T}{I_e} = \frac{1}{g_m} \qquad \text{(C-1)}$$

To find the total voltage gain, Ad must be multiplied by the collector resistance Rc.

A plot of the currents i0 and i1 versus the differential input voltage between a0 and a1 is shown in Fig. C-2. The plot with 0 Ω-µm represents the nominal transfer function for a digital gate. The gain is high, and an input voltage of 100 mV ensures a nearly complete switch of current. For digital circuits, this allows for a high noise margin and fast switching characteristics. For analog circuits, on the other hand, the active, linear region of the curve is very small: ±50 mV. The addition of emitter resistors is therefore crucial for reducing the gain and spreading out the linear region. The choice of resistor is determined by the output range needed and the gain at an input of 0 V.

[Figure C-2: Branch current response for various emitter resistors. Branch current {i0, i1} (mA/µm) versus differential voltage (V), from −0.40 V to 0.40 V, for emitter resistors of 0, 200, 400, 600, and 800 Ω-µm. This plot shows the transfer of current from one branch to the other when the differential inputs are changed. Each pair of curves has a fixed emitter resistor.]

A comparison between (C-1) and the simulated results is plotted in Fig.
C-3 and shows a very good match.

[Figure C-3: Simulated / analytical gain. Gain (mA/V/µm) and inverse gain (V/mA-µm) versus normalized Re (0 to 800 Ω-µm). (C-1) follows the simulated results for the transconductance of a CML buffer with emitter resistors shown here.]

C.2. Current bypassing

In some situations it may be necessary to limit the extent of current switching in a differential amplifier. For example, the FFI VCO requires a minimum current flow through both branches, no matter what the input is. The solution is to include a bypass resistor, which ensures that some constant current flows in addition to the current defined by the differential transistor pair.

[Figure C-4: Limiting full current switching with bypass resistors (inputs a0, a1; bypass resistors Rb; emitter resistors Re; branch currents i0, i1). The addition of bypass resistors allows some current to always flow around the differential pair. This prevents a complete switching of current.]

Two behaviors result from the addition of the bypass resistor. First, a full switch of current through the tree is prevented, which is the desired result. Second, the gain of the circuit decreases, because the lower collector current in the differential pair reduces its transconductance. Each of these effects is modeled in this section and compared to simulation results. In addition, two equations are provided that can be used as design tools when gain and current range are specified.

The maximum current in a branch is a function of the total current, the bypass and emitter resistors, and the input voltage. We start with the assumption that branch 1 has zero emitter current, i.e., a0 is much higher than a1. The currents through the two bypass resistors are the same. It is also assumed that there is a differential pair above this one with emitter voltages at the same potential.
We define the equations

$$I_o = i_{e1} + 2 i_b \qquad \text{(C-2)}$$

$$i_{e1} = \frac{v_o + \frac{v_d}{2} - v_{be}}{R_e} \qquad \text{(C-3)}$$

$$i_b = \frac{v_o}{R_b} \qquad \text{(C-4)}$$

where Io is the total current through the tree, vd is the differential input voltage, and vo is the voltage across the bypass resistor. The value for vbe is found in Fig. A-2 on page 157. Solving for the current through branch 0 yields

$$I_{max} = I_o - i_b = \frac{I_o (R_e + R_b) - v_{be} + \frac{v_d}{2}}{R_b + 2R_e} \qquad \text{(C-5)}$$

$$I_{max}\Big|_{R_e = 0} = I_o - \frac{v_{be} - \frac{v_d}{2}}{R_b} \qquad \text{(C-6)}$$

Fig. C-5 shows the analytical and simulated results for the maximum current as a fraction of the total current. It covers emitter resistors of 0 Ω-µm and 400 Ω-µm with a differential input of 400 mV. With large bypass resistor values, the circuit allows almost a full current switch, because less current is bypassed around the differential pair. Below about 10 kΩ-µm the reduction becomes much larger. At about 3 kΩ-µm, Rb is too small for any current switching to take place.

[Figure C-5: Current limiting effects of bypass resistor. Maximum current fraction versus bypass resistor (0 to 40 kΩ-µm) for Vd = 400 mV; simulated and analytical curves for Re = 0 and Re = 400 Ω-µm. The bypass resistor prevents current from being completely shut off in a differential branch. The maximum current allowed to flow divided by the total current is called the maximum current fraction.]

The next step is to examine how the gain is affected by the addition of the bypass resistor. The primary cause of the decrease in transconductance is the decrease in collector current in the differential pair. Gain is directly related to transconductance and emitter resistance. A second-order effect arises because the voltage across, and the current through, the bypass resistor increase when the collector current through the emitter resistor increases.
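The maximum-current expressions (C-5) and (C-6) derived above are easy to evaluate numerically. They can also be checked against the equations (C-2) through (C-4) from which they come. The following is a minimal Python sketch in normalized units: currents in mA/µm and resistances in kΩ-µm, so that a current times a resistance is in volts.

The following are illustrative assumptions, not the values read from Fig. A-2:
- v_be = 0.9 V.
- Io = 0.75 mA/µm.
- The function names.

The output will therefore not reproduce Fig. C-5 exactly. The model assumes that the off branch carries no emitter current, so a fraction near or below 0.5 only means that switching has collapsed.

```python
def bypass_current(i_o, r_b, r_e, v_d, v_be):
    """i_b obtained by solving (C-2)-(C-4) simultaneously."""
    return (i_o * r_e + v_be - v_d / 2) / (r_b + 2 * r_e)


def max_current(i_o, r_b, r_e, v_d, v_be):
    """(C-5): on-branch current I_max = I_o - i_b."""
    return (i_o * (r_e + r_b) - v_be + v_d / 2) / (r_b + 2 * r_e)


def max_current_no_re(i_o, r_b, v_d, v_be):
    """(C-6): the R_e = 0 special case of (C-5)."""
    return i_o - (v_be - v_d / 2) / r_b


def max_current_fraction(i_o, r_b, r_e, v_d, v_be):
    """I_max / I_o, the quantity plotted in Fig. C-5."""
    return max_current(i_o, r_b, r_e, v_d, v_be) / i_o


if __name__ == "__main__":
    I_O, V_BE, V_D = 0.75, 0.9, 0.4  # assumed: mA/um, V, V
    for r_b in (3.0, 5.0, 10.0, 20.0, 40.0):  # kohm-um
        f0 = max_current_fraction(I_O, r_b, 0.0, V_D, V_BE)
        f4 = max_current_fraction(I_O, r_b, 0.4, V_D, V_BE)  # 400 ohm-um
        print(f"Rb = {r_b:4.1f} kohm-um: Re=0 -> {f0:.3f}, "
              f"Re=400 ohm-um -> {f4:.3f}")
```

Because kΩ times mA gives volts, no unit conversion factors appear anywhere in the formulas.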
Solving for the gain can be broken into separate pieces: how the emitter current changes relative to the input voltage, and how the total current changes relative to the emitter current. Equation (C-7) shows this relationship:

$$\frac{di}{dv} = \frac{di_e}{dv} \cdot \frac{di}{di_e} \qquad \text{(C-7)}$$

The next step is to solve for the bypass current relative to the emitter current:

$$\frac{di_b}{di_e} = \frac{d}{di_e}\left(\frac{v_{be} + i_{e1} R_e}{R_b}\right) = \frac{R_e}{R_b} \qquad \text{(C-8)}$$

Since the sum of the bypass current and the emitter current is the total current i, the total current relative to the emitter current is

$$\frac{di}{di_e} = \frac{di_b}{di_e} + \frac{di_e}{di_e} = \frac{R_e}{R_b} + 1 \qquad \text{(C-9)}$$

Next, the emitter current is expressed in terms of the other parameters:

$$i_e = \frac{I_o R_b - 2 v_{be}}{2R_e + 2R_b} \qquad \text{(C-10)}$$

From (C-1) on page 164, the derivative of the emitter current with respect to the input voltage is the inverse of the sum of the emitter resistances. Equation (A-1) on page 157 yields the transconductance:

$$\frac{di_e}{dv} = \frac{1}{r_e + R_e} = \frac{1}{\gamma v_T \dfrac{2R_e + 2R_b}{I_o R_b - 2 v_{be}} + R_e} \qquad \text{(C-11)}$$

Using (C-7), (C-9), and (C-11) and simplifying yields the desired result

$$\frac{di}{dv} = \frac{di_d}{dv_d} = \frac{1}{\dfrac{2 \gamma v_T R_b}{I_o R_b - 2 v_{be}} + R_e \parallel R_b} \qquad \text{(C-12)}$$

where id and vd are the differential current and differential voltage, respectively. Results from this analysis are compared to simulated results in Fig. C-6. The upper curves in Fig. C-6 (Re = 0) slope upward, because increasing Rb increases the collector current in the pair and therefore its transconductance. The lower curves (Re = 400 Ω-µm) show a very flat response. In this case the gain is fixed by the emitter resistor and is not affected by the collector current (see Appendix C.1 on page 164).
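The closed form (C-12) can be cross-checked numerically against the chain-rule product of (C-11) and (C-9) from which it was simplified. The following is a minimal Python sketch in normalized units: currents in mA/µm, resistances in kΩ-µm, and voltages in V, so gains come out in mA/V/µm.

The following are illustrative assumptions:
- The ideality factor γ = 1.
- v_T = 25.85 mV.
- v_be = 0.9 V.
- The function names.

The thesis takes γ from (A-1) and v_be from Fig. A-2.

```python
def emitter_current(i_o, r_b, r_e, v_be):
    """(C-10): i_e = (I_o*R_b - 2*v_be) / (2*R_e + 2*R_b)."""
    num = i_o * r_b - 2.0 * v_be
    if num <= 0.0:
        raise ValueError("R_b too small: no current reaches the differential pair")
    return num / (2.0 * r_e + 2.0 * r_b)


def gain_chain_rule(i_o, r_b, r_e, v_be, gamma=1.0, v_t=0.02585):
    """(C-7): di/dv = (di_e/dv from C-11) * (di/di_e from C-9)."""
    r_int = gamma * v_t / emitter_current(i_o, r_b, r_e, v_be)  # intrinsic r_e
    die_dv = 1.0 / (r_int + r_e)  # (C-11)
    di_die = r_e / r_b + 1.0      # (C-9)
    return die_dv * di_die


def gain_closed_form(i_o, r_b, r_e, v_be, gamma=1.0, v_t=0.02585):
    """(C-12): 1 / (2*gamma*v_T*R_b/(I_o*R_b - 2*v_be) + R_e||R_b)."""
    emitter_current(i_o, r_b, r_e, v_be)  # same validity check as (C-10)
    r_par = r_e * r_b / (r_e + r_b)       # R_e || R_b
    return 1.0 / (2.0 * gamma * v_t * r_b / (i_o * r_b - 2.0 * v_be) + r_par)


if __name__ == "__main__":
    I_O, V_BE = 0.75, 0.9  # assumed: mA/um and V
    for r_b in (5.0, 10.0, 20.0, 40.0):  # kohm-um
        g0 = gain_closed_form(I_O, r_b, 0.0, V_BE)
        g4 = gain_closed_form(I_O, r_b, 0.4, V_BE)  # R_e = 400 ohm-um
        print(f"Rb = {r_b:4.1f}: gain Re=0 -> {g0:.2f}, "
              f"Re=400 ohm-um -> {g4:.2f} mA/V/um")
```

As Rb grows, (C-12) tends to 1/(2γv_T/Io + Re). This is (C-1) with the pair biased at Io/2 per side.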
[Figure C-6: Current gain effects of bypass resistor. Gain (mA/V/µm) versus bypass resistor (0 to 40 kΩ-µm); simulated and analytical curves for Re = 0 and Re = 400 Ω-µm. The bypass resistor lowers the current through the differential pair, which in turn decreases the transconductance and therefore the gain.]

Fig. C-7 is a surface plot showing the relationship between current gain and the emitter and bypass resistors. This plot can be useful when designing a linearized differential amplifier with bypass resistors.

[Figure C-7: Designing for gain with emitter and bypass resistors. Gain contours (0 to 9 mA/V/µm) versus bypass resistor (0 to 40 kΩ-µm) and emitter resistor (0 to 800 Ω-µm). This plot is useful for designing with bypass resistors when gain is specified.]

C.3. Increasing CML delay

It is sometimes necessary to increase the delay of a CML gate to meet certain timing requirements. One example is a ring oscillator that must be centered at a frequency lower than its free-running frequency. The addition of a capacitor across the level 1 outputs degrades the rise time and thus increases the gate delay. This solution is easy to implement and simple to model.

[Figure C-8: Collector capacitor. A capacitor Cc across the collector resistors Rc is equivalent to two series capacitors of 2Cc. A collector capacitor can be used to increase the delay through a CML gate by increasing the rise time.]

Modeling the new gate delay first involves determining the gate delay without the capacitor. This nominal delay is represented by To and is approximately equal to 12.5 ps. The extra delay is modeled as an RC charging circuit with a time constant of 2RcCc. The factor of 2 arises from the equivalent circuit shown in Fig. C-8. In that circuit, the capacitor Cc across the outputs is replaced by two series capacitors of value 2Cc, each charged through one collector resistor Rc.
An additional factor of ln(2) multiplies the time constant to account for the point at which the output is considered switched. The switching output follows

$$v_o = V_o + I_o R_c\, e^{-t/(2 R_c C_c)} \qquad \text{(C-13)}$$

where V_o is the final output level. The output is considered switched when the exponential term has decayed to half its initial value, which is approximately when the differential voltage is 0 V. The total delay is therefore

$$T = T_o + \ln(2)\,(2 R_c C_c) \qquad \text{(C-14)}$$

[Figure C-9: Delay model with collector capacitor. Gate delay (ps) versus capacitance (fF/µm), analytical and simulated. The delay of a CML gate versus level 1 capacitance derived in this section is consistent with simulated results.]

Appendix D. Sizing Transistors to Minimize VCO Delay

The design of digital logic gates in SiGe technology always includes a consideration of transistor size. Emitter lengths range from 1 µm to 20 µm. If multiple-finger emitters are used, effective lengths can reach 40 µm. Usually, larger transistors have smaller delay but consume proportionally more current. A trade-off must therefore be made among power, layout space, and delay specifications.

Logic gates can be extremely varied and may include such functions as multiplexed XOR and five-input AND/OR cells. The delay through each of these depends on the number of inputs and outputs, the input and output levels, and various other factors. An in-depth analysis of all these factors would be very complicated, and the results would be difficult to use. A more general solution, and the one followed in this appendix, is to consider simple buffers with emitter followers driving other buffers. Although this is not a completely accurate representation of most logic gates, the conclusions of the analysis are very useful in the design of all gates. If a buffer is driving multiple receivers, the case is reduced to one with a single receiver whose size equals the sum of the receiver sizes. For instance, if a driver has four 1 µm loads, they can be treated as one 4 µm load.
The following analysis is also extremely useful in the optimization of ring oscillators. These circuits incorporate a ring of two or more buffers that oscillate because of an odd number of inversions, and they are very sensitive to gate delays. If a buffer has a delay of 25 ps, then a 1-2 ps difference in delay can change the final oscillation frequency by 4% or more.

The type of loads that will be driven by the VCO is also important when choosing device sizes. For instance, if the VCO has buffers with 1 µm devices, then a 1 µm load on each stage will introduce a proportionally huge loading effect on the system. This analysis assumes that the receiver circuit is fixed and that design work is done on the driver. The data presented here, however, can be useful for the design of the receiver as well.

[Figure D-1: Delay from emitter follower to differential amplifier. Delay (ps, contours 10 to 30) versus receiver size (µm) and emitter follower size (µm), with design points. In general, the larger the emitter follower, the more capable it is of driving larger differential amplifiers. A rule of thumb for designing an emitter follower that minimizes delay without using considerable power is to use 2 µm devices plus 1 µm per 5 µm of load.]

Fig. D-1 shows the effect on delay of using different-sized emitter followers to drive various receiver loads. The larger the emitter follower, the smaller the delay, since the higher-powered follower has a lower output resistance. Combined with the receiver input base capacitance, this lower resistance produces a smaller delay. The figure also shows the delay accelerating as the emitter follower shrinks while the receiver size remains fixed. The acceleration occurs because delay is proportional to the follower's output resistance, which varies inversely with follower size.
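The ring-oscillator sensitivity quoted above (a 1-2 ps change on a 25 ps stage delay) can be checked numerically. So can the collector-capacitor model of Appendix C.3, which is used to pull a ring oscillator below its free-running frequency. The following minimal Python sketch uses two relations:
- The common estimate f = 1/(2·N·t_d) for an N-stage ring. This is a standard approximation, not a formula from this thesis.
- Equation (C-14), T = T_o + ln(2)(2R_cC_c), with T_o = 12.5 ps.

The following are illustrative assumptions:
- The 4-stage ring.
- The 400 Ω collector resistor (a 1 µm gate under the 400 Ω-µm normalization of Appendix B).
- The function names.

```python
import math

T_O = 12.5e-12  # nominal CML gate delay without a collector capacitor, s


def gate_delay(r_c, c_c, t_o=T_O):
    """(C-14): T = T_o + ln(2) * (2 * R_c * C_c)."""
    return t_o + math.log(2) * 2 * r_c * c_c


def capacitor_for_delay(target_delay, r_c, t_o=T_O):
    """Invert (C-14) for the collector capacitance giving target_delay."""
    if target_delay < t_o:
        raise ValueError("a collector capacitor can only add delay")
    return (target_delay - t_o) / (2 * math.log(2) * r_c)


def ring_frequency(n_stages, stage_delay):
    """Standard estimate f = 1 / (2 * N * t_d) for an N-stage ring."""
    return 1.0 / (2 * n_stages * stage_delay)


if __name__ == "__main__":
    n = 4  # assumed number of stages
    f_nom = ring_frequency(n, 25e-12)
    for extra in (1e-12, 2e-12):
        f = ring_frequency(n, 25e-12 + extra)
        print(f"+{extra * 1e12:.0f} ps per stage: "
              f"{100 * (f / f_nom - 1):+.1f} % frequency change")
    c = capacitor_for_delay(25e-12, 400.0)
    print(f"Cc for a 25 ps stage with Rc = 400 ohm: {c * 1e15:.1f} fF")
```

Because f is inversely proportional to t_d, a fractional delay change Δ produces a fractional frequency change of 1/(1 + Δ) − 1. This is why small delay errors translate almost one-for-one into frequency errors.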
Also shown on this plot are design points that establish a good rule of thumb for designing emitter followers based on receiver loading for less critical gates. Obviously, the largest emitter followers yield the smallest delay, but beyond a certain point larger devices do not yield substantial improvement. The design rule is to use followers of at least 2 µm and add 1 µm per 5 µm of load. Following this rule yields very small delays without huge power consumption.

[Figure D-2: Delay from differential amp to emitter follower. Delay (ps, contours 7 to 16) versus amplifier size (µm) and emitter follower size (µm), with design points. Designing CML logic gates often requires designing an emitter follower stage. The choice of follower is based on many factors, including the specific differential amplifier driving the followers. In general, the larger the follower compared to the amplifier, the larger the delay through the gate.]

After choosing an emitter follower, the next step is to design the differential amplifier that forms the core of the driver. Fig. D-2 shows the delay from the amplifier to the emitter follower for different sizes of each. Here the effect is the opposite of that shown in Fig. D-1: a larger follower size now increases the delay. This is because the followers now act as loads on the amplifier, and the larger transistors add base capacitance. The ideal situation would be to have the smallest emitter followers possible, but this is not an option once loading effects are considered. A good rule is to use an amplifier that is at least half the size of the emitter followers. This yields good delay and driving properties.

From Fig. D-1 and Fig. D-2, it is clear that a trade-off exists when designing an emitter follower to be placed between two differential amplifiers.
A larger follower is better able to drive loads; however, the increase in size makes it harder for the first amplifier to drive the follower. A closer look at this situation yields Fig. D-3, which shows the optimum follower size for a given driver and receiver amplifier size. For instance, in a ring oscillator with 2 µm buffers each driving a 1 µm load, the optimal follower is about 6 µm in size. From Fig. D-4, the delay through the gate will be about 23 ps.

[Figure D-3: Size of emitter follower between driver and receiver. Optimal follower size (µm, contours 2 to 20) versus driver size and receiver size (1 to 10 µm), with ring VCO and feed forward VCO design points. Emitter followers are used when a gate needs to drive another gate on level 2 or lower, or when the receiver is a large load. The optimal transistor size to minimize delay through the driver and receiver gates is a function of the transistor sizes in the driver and the receiver.]

Ring oscillators typically have a buffer of size x driving the next buffer and a load. Minimizing and balancing the external loading on each buffer means that each stage has a 1 µm buffer hanging on it. For standard ring VCOs, this defines an emitter follower design line, shown in Fig. D-3 and Fig. D-4. For the feed forward VCO, each stage of size x must drive two inputs of size x, yielding a different design curve.

The final step is to justify the use of the emitter follower. Since the follower adds delay to the buffer-follower-buffer system, it may be better (less delay) to remove it completely. Fig. D-5 shows the difference in delay between a system with and without an emitter follower. In almost all instances it is beneficial to include the follower, unless the receiver is much smaller than the driver.
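The rules of thumb in this appendix can be collected into a small helper for quick sizing of less critical gates. The following minimal Python sketch does three things:
- It lumps several receivers into one equivalent load (the sum of their sizes).
- It sizes the emitter follower at 2 µm plus 1 µm per 5 µm of load.
- It makes the amplifier at least half the follower size.

Treating the 1 µm per 5 µm rule as a continuous slope, rather than rounding to whole micrometres, is an assumption, and the function names are hypothetical. For delay-critical gates such as ring VCO stages, the optimized sizes of Fig. D-3 apply instead.

```python
def equivalent_load(receiver_sizes_um):
    """Treat several receivers as one load whose size is the sum of theirs."""
    return sum(receiver_sizes_um)


def follower_size(load_um):
    """Rule of thumb: at least 2 um, plus 1 um per 5 um of load."""
    return 2.0 + load_um / 5.0


def min_amplifier_size(follower_um):
    """Rule of thumb: the amplifier is at least half the follower size."""
    return follower_um / 2.0


if __name__ == "__main__":
    load = equivalent_load([1, 1, 1, 1])  # four 1 um loads -> one 4 um load
    ef = follower_size(load)
    print(f"load {load} um -> follower {ef:.1f} um, "
          f"amplifier >= {min_amplifier_size(ef):.1f} um")
```

Real layouts use discrete device sizes, so the returned values would be rounded to the nearest available emitter length.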
[Figure D-4: Delay when using optimized emitter follower. Minimum delay (ps, contours 20 to 35) versus driver size and receiver size (1 to 10 µm), with ring VCO and feed forward VCO design points. The plot shows the minimum delay achievable between two differential amplifiers when using an optimized emitter follower.]

[Figure D-5: Delay difference between circuit with follower and circuit without. Delay difference (ps, contours −2 to 10) versus driver size and receiver size (1 to 10 µm). An emitter follower between differential amplifiers introduces additional delay, but in most cases it reduces the overall delay of the system. Only in cases with large drivers and smaller receivers does the emitter follower increase the delay.]

Appendix E. SpectreHDL models

E.1. FFI VCO

// Spectre AHDL for FFI VCO 4u, ahdl
//
// This cell emulates the functioning of the FFI VCO.
// It has 4 sine wave outputs, each offset from the next
// by 45 degrees.
// Additional outputs give the instantaneous
// frequency and the phase relative to a fixed-frequency
// source.
//
// Thomas Krawczyk 7/00
//
#define PI 3.1415926535

module b_ffi5 ( w20, w21, x20, x21, y20, y21, z20, z21, Vref, s30, s31) (fc, offset, divider, mfreq)
node [V,I] w20;
node [V,I] w21;
node [V,I] x20;
node [V,I] x21;
node [V,I] y20;
node [V,I] y21;
node [V,I] z20;
node [V,I] z21;
node [V,I] s30;
node [V,I] s31;
node [V,I] phase;   // Phase difference output, in degrees
node [V,I] freq;    // Instantaneous frequency output, in GHz
node [V,I] Vref;

// Center frequency with 0 control voltage
parameter real fc = 5.96G;
// DC voltage offset on terminal outputs
parameter real offset = -1.1;
// In a PLL, incorporate the 1/8 or 1/16 divider into the model
parameter real divider = 1 from (0.25:64);
// Frequency with which to compare and determine phase offset
parameter real mfreq = 5G;
{
  table VCOdata;
  real control_voltage, f;
  real ph, mph;              // VCO phase and reference phase, in radians
  real s[15], factor[15];    // 15 table points (indices 0-14)
  initial {
    // Mapping data between input control voltage and output frequency,
    // collected from simulation. Must be positive, so a 450m offset is introduced.
    s[0]  = 0.500; factor[0]  = 0.733;
    s[1]  = 0.600; factor[1]  = 0.733;
    s[2]  = 0.700; factor[2]  = 0.747;
    s[3]  = 0.800; factor[3]  = 0.805;
    s[4]  = 0.850; factor[4]  = 0.849;
    s[5]  = 0.900; factor[5]  = 0.896;
    s[6]  = 0.950; factor[6]  = 0.950;
    s[7]  = 1.000; factor[7]  = 1.000;
    s[8]  = 1.050; factor[8]  = 1.046;
    s[9]  = 1.100; factor[9]  = 1.091;
    s[10] = 1.150; factor[10] = 1.134;
    s[11] = 1.200; factor[11] = 1.168;
    s[12] = 1.300; factor[12] = 1.218;
    s[13] = 1.400; factor[13] = 1.230;
    s[14] = 1.500; factor[14] = 1.230;
    VCOdata = $build_table(2, factor, s, 15);
  }
  analog {
    control_voltage = V(s31,s30) + 450m;
    // Find the frequency multiplier from the control voltage
    f = $interpolate(VCOdata, control_voltage);
    // Find the phase of the w20 output
    ph = 2*PI*integ(fc*f/divider, 0);
    // Find the phase of the reference signal used for the phase difference
    mph = 2*PI*integ(mfreq, 0);
    // Generate the signals for each phase output (45 degrees apart)
    V(w20) <- offset + sin(ph);
    V(w21) <- offset - sin(ph);
    V(x20) <- offset + sin(ph + PI/4);
    V(x21) <- offset - sin(ph + PI/4);
    V(y20) <- offset + sin(ph + 2*PI/4);
    V(y21) <- offset - sin(ph + 2*PI/4);
    V(z20) <- offset + sin(ph + 3*PI/4);
    V(z21) <- offset - sin(ph + 3*PI/4);
    // Return the phase difference in degrees
    V(phase) <- (ph-mph)/PI*180;
    // Return the exact frequency in GHz
    V(freq) <- fc*f/divider/1G;
  }
}

E.2. 3-State PD

//
// Spectre AHDL for SERDES3, PD_3state, ahdl
//
// This module emulates the 3-state Phase Detector.
// It looks for rising transitions of the vi and vo inputs
// and forces the output to a +1 or -1 state depending on
// which input went high. When both eventually go high,
// the output is reset. The slip outputs, although not
// implemented, give a pulse when the detector exceeds its
// maximum value.
//
// Thomas Krawczyk 9/27/00
//
module PD_3state ( vd0, vd1, vi_slip10, vi_slip11, vo_slip10, vo_slip11, Vref1, Vref2, vi20, vi21, vo20, vo21) ()
node [V,I] vd0;
node [V,I] vd1;
node [V,I] vi_slip10;
node [V,I] vi_slip11;
node [V,I] vo_slip10;
node [V,I] vo_slip11;
node [V,I] Vref1;   // Can ignore
node [V,I] Vref2;   // Can ignore
node [V,I] vi20;
node [V,I] vi21;
node [V,I] vo20;
node [V,I] vo21;
{
  real vo_center = -1.07;   // Center output voltage
  real vo_swing = 144m;     // Swing either high or low
  integer i_rise = -1;      // -1 = start, 0 = low, 1 = transition, 2 = high
  integer o_rise = -1;
  real out0, out1;
  initial {
    // Start with both outputs at the nominal (reset) level
    out0 = vo_center;
    out1 = vo_center;
  }
  analog {
    // Make sure we get a time point at the input crossings.
    $threshold( V(vi20)-V(vi21), 1 );
    $threshold( V(vo20)-V(vo21), 1 );
    if( V(vi20) > V(vi21) ) {
      if( i_rise < 2 ) i_rise++;
    } else i_rise = 0;
    if( V(vo20) > V(vo21) ) {
      if( o_rise < 2 ) o_rise++;
    } else o_rise = 0;
    // Input vi positive transition?
    if( i_rise == 1 && o_rise == 0 ) {
      out0 = vo_center + vo_swing;
      out1 = vo_center - vo_swing;
    }
    // Input vo positive transition?
    if( i_rise == 0 && o_rise == 1 ) {
      out0 = vo_center - vo_swing;
      out1 = vo_center + vo_swing;
    }
    // Both transitions detected: reset output back to nominal values
    if( i_rise >= 1 && o_rise >= 1 ) {
      out0 = out1 = vo_center;
    }
    if( i_rise == -1 && o_rise == -1 ) {
      out0 = out1 = vo_center;
    }
    // Give the output signals a rise time and 3 gate delays
    V(vd0) <- $transition( out0, 60p, 20p, 20p );
    V(vd1) <- $transition( out1, 60p, 20p, 20p );
    // Frequency slip detectors are not implemented
    V(vi_slip10) <- -1.5;
    V(vi_slip11) <- -1.5;
    V(vo_slip10) <- -1.5;
    V(vo_slip11) <- -1.5;
  }
}

E.3. Transition Detector PD

//
// Spectre AHDL for SERDES3, RxEdgeExtraction, ahdl
//
// This is a model for the Transition Phase Detector circuit.
//
// Clock inputs are w2 - z2.
// Data inputs are dw1 - dz1.
// Sampled outputs are da2 - dd2.
// Fast and slow commands to the VCO are f20 and s21.
//
// Each region is 25 ps wide.
//
//      \2|1/
//     3 \|/ 0
//     ---+---
//     4 /|\ 7
//      /5|6\
//
module RxEdgeExtraction ( da20, da21, db20, db21, dc20, dc21, dd20, dd21, f20, s21,
                          dw10, dw11, dx10, dx11, dy10, dy11, dz10, dz11,
                          w20, w21, x20, x21, y20, y21, z20, z21, region) ()
node [V,I] da20; node [V,I] da21;
node [V,I] db20; node [V,I] db21;
node [V,I] dc20; node [V,I] dc21;
node [V,I] dd20; node [V,I] dd21;
node [V,I] f20;  node [V,I] s21;
node [V,I] dw10; node [V,I] dw11;
node [V,I] dx10; node [V,I] dx11;
node [V,I] dy10; node [V,I] dy11;
node [V,I] dz10; node [V,I] dz11;
node [V,I] w20;  node [V,I] w21;
node [V,I] x20;  node [V,I] x21;
node [V,I] y20;  node [V,I] y21;
node [V,I] z20;  node [V,I] z21;
node [V,I] region;   // AHDL output of the current sampling region
{
  integer reg = 0;          // Current region, 0-7 (0-45 degrees = region 0)
  integer out[8];           // Detected transitions per region, summed at end
  integer sum;              // Sum of output array
  integer i;                // Index for summing loop
  integer da, db, dc, dd;   // Sampled outputs: (0,1) map to (-1,1)
  integer data_val;         // Last data value
  real out_center = -1;     // Center of fast/slow output
  real out_diff = 4m;       // Fast/slow differential output per edge
  real data_center = -1.1;  // Center of sampled data output
  real data_amp = 150m;     // Amplitude of sampled data output
  analog {
    if( V(w20) > V(w21) && reg == 7 ) {
      if( V(dw10) > V(dw11) ) da = 1;
      else da = -1;
      reg = 0; out[reg] = 0;
    }
    if( V(x20) > V(x21) && reg == 0 ) {
      reg = 1; out[reg] = 0;
    }
    if( V(y20) > V(y21) && reg == 1 ) {
      if( V(dw10) > V(dw11) ) db = 1;
      else db = -1;
      reg = 2; out[reg] = 0;
    }
    if( V(z20) > V(z21) && reg == 2 ) {
      reg = 3; out[reg] = 0;
    }
    if( V(w20) < V(w21) && reg == 3 ) {
      if( V(dw10) > V(dw11) ) dc = 1;
      else dc = -1;
      reg = 4; out[reg] = 0;
    }
    if( V(x20) < V(x21) && reg == 4 ) {
      reg = 5; out[reg] = 0;
    }
    if( V(y20) < V(y21) && reg == 5 ) {
      if( V(dw10) > V(dw11) ) dd = 1;
      else dd = -1;
      reg = 6; out[reg] = 0;
    }
    if( V(z20) < V(z21) && reg == 6 ) {
      reg = 7; out[reg] = 0;
    }
    // Look for transitions and insert 1 into the output array of the current
    // region
    if( (V(dw10) > V(dw11)) && data_val == 0 ) {
      out[reg] = 1;
      data_val = 1;
    }
    if( (V(dw10) < V(dw11)) && data_val == 1 ) {
      out[reg] = 1;
      data_val = 0;
    }
    // Sum the fast/slow regions
    sum = -out[0]+out[1]-out[2]+out[3]-out[4]+out[5]-out[6]+out[7];
    V(da20) <- data_center + da*data_amp;
    V(da21) <- data_center - da*data_amp;
    V(db20) <- data_center + db*data_amp;
    V(db21) <- data_center - db*data_amp;
    V(dc20) <- data_center + dc*data_amp;
    V(dc21) <- data_center - dc*data_amp;
    V(dd20) <- data_center + dd*data_amp;
    V(dd21) <- data_center - dd*data_amp;
    V(f20) <- $transition(out_center + out_diff/2*sum, 50p, 20p, 20p);
    V(s21) <- $transition(out_center - out_diff/2*sum, 50p, 20p, 20p);
    V(region) <- reg;
  }
}

E.4. Histogram generator

// Spectre AHDL for SERDES3, histogram, ahdl
// This cell allows the plotting of a histogram of voltages.
// It samples the "vin" signal and places it in one of "bins" bins.
// The "sweep" output signal sweeps across all bins while the "plot"
// output shows the current value of that bin.
// To create the histogram, simply set "sweep" as the x axis and
// "plot" as the y axis.
//
// Thomas Krawczyk 9/26/00
//
module histogram ( plot, sweep, vin, mean, rms) ( bins, low_v, high_v, begin )
node [V,I] plot;
node [V,I] sweep;
node [V,I] vin;
node [V,I] mean;
node [V,I] rms;
parameter real bins = 16 from (1:1025);
parameter real low_v = 0;
parameter real high_v = 1;
parameter real begin = 1n from (0:inf);
{
  integer bin[1024];
  integer index;
  integer s = 0;         // Current sweep index
  integer count = 0;     // Total samples
  real range;            // Difference between low_v and high_v
  real mu, sigma;        // Mean and standard deviation
  real sum, sq_sum;      // Sum and sum of squares of the samples
  initial {
    range = high_v - low_v;
    // Clear the accumulators and outputs before sampling begins
    sum = 0; sq_sum = 0;
    mu = 0; sigma = 0;
  }
  analog {
    if( $time() > begin ) {
      count++;
      sum += V(vin);
      sq_sum += V(vin)*V(vin);
      mu = sum/(1.0*count);
      sigma = sqrt(( sq_sum - 2*mu*sum + count*mu*mu )/(1.0*count));
      // Place the sample in its bin; out-of-range samples are dropped
      index = (V(vin)-low_v)/range * bins;
      if( index >= 0 && index < bins ) bin[index]++;
      // Step the sweep index through the bins
      s++;
      if( s == bins ) s = 0;
    }
    V(mean) <- mu;
    V(rms) <- sigma;
    V(sweep) <- low_v + s/(1.0*bins)*range;
    V(plot) <- bin[s];
  }
}

E.5. Jittered data source

// Spectre AHDL for PeteExp, datasource, ahdl
#define PI 3.1415926535
#define getbitnum(t) floor(t*Bps)
module datasource ( d0, d1, sweep, Jout ) (Offset, Vmag, Bps, Sigma)
node [V,I] d0;
node [V,I] d1;
node [V,I] sweep;
node [V,I] Jout;
parameter real Offset = -1.50e-1;
parameter real Vmag = -1.50e-1;
parameter real Bps = 2.0e10 from (0:inf);
parameter real Sigma = 1.0e-11;
{
  // Local variables
  integer bitnumber, newbitnumber, cbnum, cbval;
  real jitter;
  real c_0, c_1, c_2, d_1, d_2, d_3, T, X, p;
  initial {
    bitnumber = 0; newbitnumber = 0; jitter = 0.0;
    // Coefficients of the rational approximation to the inverse normal
    // distribution (Abramowitz and Stegun 26.2.23)
    c_0 = 2.515517; c_1 = 0.802853; c_2 = 0.010328;
    d_1 = 1.432788; d_2 = 0.189269; d_3 = 0.001308;
  }
  analog {
    newbitnumber = getbitnum($time());   // Integer bit index (time*Bps)
    // Fractional position within the current bit
    V(sweep) <- ($time()*Bps - newbitnumber);
    if (newbitnumber != bitnumber) {
      bitnumber = newbitnumber;
      // Create a Gaussian jitter value for this new bit
      jitter = $random();
      if (jitter <= 0.5) p = jitter; else p = 1.0 - jitter;
      T = sqrt( ln(1.0/(p*p)) );
      X = T - (c_0 + c_1*T + c_2*(T*T))/(1
            + d_1*T + d_2*(T*T) + d_3*(T*T*T));
      // Apply the sign (symmetric distribution) and scale by Sigma
      if (jitter > 0.5) {
        jitter = -1.0*X*Sigma;
      } else {
        jitter = X*Sigma;
      }
      $break_point((1.0+newbitnumber)/Bps+jitter);
    }
    V(Jout) <- jitter;
    // Get the current bit number, which the jitter may shift
    cbnum = floor(($time()+jitter)*Bps);
    // Convert bit number to bit value (alternating 0/1 pattern)
    cbval = cbnum % 2;
    V(d0) <- $slew(Offset-Vmag*(2*cbval-1), 3.0e10, -3.0e10);
    V(d1) <- $slew(Offset+Vmag*(2*cbval-1), 3.0e10, -3.0e10);
  }
}

Appendix F. Top-level Chip Schematics

F.1. Serdes I Transmitter

F.2. Serdes I Receiver

F.3. Serdes II Transceiver