Circuits for the Design of a Serial
Communication System
Utilizing SiGe HBT Technology
by
Thomas W. Krawczyk Jr.
A THESIS SUBMITTED TO THE EXAMINING
COMMITTEE OF RENSSELAER POLYTECHNIC INSTITUTE
IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
MAJOR SUBJECT: ELECTRICAL ENGINEERING
John F. McDonald, Chair
Gary Saulnier, Prof. ECSE
Kenneth A. Connor, Prof. ECSE
Lester Rubenfeld, Prof. Math
Donald Millard, Prof. ECSE
Rensselaer Polytechnic Institute
Troy, New York
November 2000
© Copyright 2000
by
Thomas W. Krawczyk Jr.
All Rights Reserved
Table of Contents
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
1. Introduction & Historical Review . . . . . . . . . . . . . . . . . 1
1.1. Motivation and Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2. The three chips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3. Project time line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4. State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5. Contribution to the Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5.1. Feed Forward Interpolated VCO. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5.2. Transmitter Interleaving Architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5.3. Symmetric Multiplexer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5.4. Receiver PLL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6. SiGe 5 HP Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.7. Testing Equipment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.8. Document Logistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2. Serial Communication . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1. Serial Communication Block Diagram . . . . . . . . . . . . . . . . . . . . . . 15
2.2. Transmitter / Multiplexer / Clock Multiplier . . . . . . . . . . . . . . . . . . 16
2.3. Transport Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4. Receiver / Demultiplexer / Clock & Data Recovery . . . . . . . . . . . . 18
2.5. Internal Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.6. Support Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3. Current Starving VCO . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1. Project History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2. The need for a VCO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.3. Simple Current Starving VCO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.4. Basic Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.4.1. Adjustable Voltage Reference. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.4.2. Final Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.4.3. Testing Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.4.4. Optimization of Simple CS VCO (post-fabrication). . . . . . . . . . . . . . . . . . . 27
3.5. Current Starving with Feed Forwarding . . . . . . . . . . . . . . . . . . . . . 29
3.5.1. Final Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5.2. Testing results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.6. Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4. Feed Forward Interpolated VCO . . . . . . . . . . . . . . . . . 35
4.1. Project History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2. The Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3. Basic Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.4. Stage Decoupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.5. Circuit Implementation and Analysis . . . . . . . . . . . . . . . . . . . . . . . 44
4.5.1. Cascode amplifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.5.2. Emitter Resistor for linearity and gain adjustment . . . . . . . . . . . . . . . . . . . . 45
4.5.3. Center capacitor to control frequency range center . . . . . . . . . . . . . . . . . . . 46
4.5.4. Bypass resistor to prevent stage decoupling . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.6. System Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.6.1. Branch current to frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.6.2. Center frequency and intrinsic stage delay . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.6.3. Frequency gain at the center frequency. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.6.4. Frequency Range. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.7. Phase Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.7.1. The Impulse Sensitivity Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.7.2. Solving for phase noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.7.3. Phase noise comparison between the FFI and CS VCOs . . . . . . . . . . . . . . . 57
4.8. Jitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.9. Interconnect Parasitic Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.10. HDL Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.11. Final Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.11.1. Circuit Parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.11.2. Layout Considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.12. Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.12.1. Frequency Response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.12.2. Common Mode Gain (5 GHz VCO) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.12.3. Response versus supply voltage (5 GHz VCO) . . . . . . . . . . . . . . . . . . . . . 68
4.12.4. Phase noise measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.12.5. Jitter measurements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5. Design of the Transmitter . . . . . . . . . . . . . . . . . . . . . . 72
5.1. Project History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.2. Top Level Architecture Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.3. 16-1 Multiplexer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.3.1. The Case for the Symmetric Multiplexer . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.3.2. Final Implementation and Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.4. Phase Locked Loop (Frequency Synthesizer) . . . . . . . . . . . . . . . . . 82
5.4.1. Input Filter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.4.2. Phase Detector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.4.2.1. Phase detector (Serdes I) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.4.2.2. Phase detector (Serdes II) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.4.2.3. Phase detector (Serdes III) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.4.3. The VCO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.4.4. Loop Filter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.4.4.1. Serdes I Loop Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.4.4.2. Serdes II Loop Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.4.4.3. Serdes III Loop Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.4.5. PLL Loop Response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.4.6. Lock Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.4.6.1. Serdes I Simulated Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.4.6.2. Serdes II Simulated Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.4.6.3. Serdes III Simulated Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.4.7. 20 / 40 Gb/s Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.5. Clock Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.6. Data Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.7. Line Driver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.8. Internal Testing Circuitry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.8.1. Serdes I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.8.2. Serdes II. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.9. Implementation and Fabrication . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.9.1. Serdes I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.9.2. Serdes II. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.10. Testing Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.10.1. Serdes I (transmitter test results). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.10.2. Serdes II (transmitter test results) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.11. Future Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
5.11.1. 8B/10B Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.11.2. Transmitter data retiming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.11.3. LC Oscillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6. Design of the Receiver . . . . . . . . . . . . . . . . . . . . . . . . 121
6.1. Project History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.2. Receiver Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.3. Receiver PLL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6.3.1. Phase Detector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.3.1.1. Transition Detector (PD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.3.1.2. NRZ Phase / Frequency Detector (PD/FD) . . . . . . . . . . . . . . . . . . . . . 129
6.3.2. The Loop Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.3.2.1. FET Charge Pump / Proportional Control (Serdes I) . . . . . . . . . . . . . . 131
6.3.2.2. Negative Impedance Charge Pump (Serdes II) . . . . . . . . . . . . . . . . . . . 133
6.3.2.3. Mixed Loop (Serdes III) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.3.3. PLL Loop Response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.3.3.1. Serdes I (FET charge pump) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.3.3.2. Serdes II (negative impedance charge pump) . . . . . . . . . . . . . . . . . . . . 136
6.3.3.3. Serdes III (dual-loop / referenced loop) . . . . . . . . . . . . . . . . . . . . . . . . 137
6.4. 4-16 Demultiplexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.5. Registers and Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.6. Line Receiver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.7. Test Circuitry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.7.1. On-chip test pattern generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.7.2. True error rate detector (TERD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.8. Implementation and Fabrication . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.8.1. Serdes I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.8.2. Serdes II. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.9. Testing Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.9.1. Serdes I (receiver test results) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.9.2. Serdes II (receiver test results) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6.10. Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
6.10.1. Sampling offset correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
6.10.2. 40 Gb/s?. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
6.10.3. Demultiplexer improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
Discussion & Conclusion . . . . . . . . . . . . . . . . . . . . . . . . 150
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
A. IBM SiGe 5 HP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
A.1. NPN Vbe characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
A.2. NPN Ic versus Vce characteristics . . . . . . . . . . . . . . . . . . . . . . . . . 158
A.3. NPN fT Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
B. CML Logic Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
B.1. CML Voltage Swing (non-linearized, digital) . . . . . . . . . . . . . . . 160
B.2. CML Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
B.3. Voltage Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
B.4. Buffer with emitter follower outputs . . . . . . . . . . . . . . . . . . . . . . . 162
C. CML Circuit Details . . . . . . . . . . . . . . . . . . . . . . . . . 164
C.1. Linearizing the differential amplifier . . . . . . . . . . . . . . . . . . . . . . 164
C.2. Current bypassing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
C.3. CML delay increasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
D. Transistor Sizing to Minimize VCO Delay . . . . . . . 172
E. SpectreHDL models . . . . . . . . . . . . . . . . . . . . . . . . . 178
E.1. FFI VCO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
E.2. 3-State PD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
E.3. Transition Detector PD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
E.4. Histogram generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
E.5. Jittered data source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
F. Toplevel Chip Schematics . . . . . . . . . . . . . . . . . . . . . 184
F.1. Serdes I Transmitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
F.2. Serdes I Receiver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
F.3. Serdes II Transceiver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
List of Figures
Figure 1-1. Past and proposed future research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Figure 2-1. Toplevel System Block Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Figure 3-1. Four stage VCO diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Figure 3-2. Current Starving VCO frequency and gain response . . . . . . . . . . . . . . . . . 23
Figure 3-3. Adjustable Voltage Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Figure 3-4. Layout of Simple CS VCO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Figure 3-5. Test data from Simple CS VCO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Figure 3-6. Frequency Response versus emitter length in delay elements . . . . . . . . . . 29
Figure 3-7. Feed-forward CS VCO block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Figure 3-8. Feed forward CS VCO frequency response and gain . . . . . . . . . . . . . . . . 31
Figure 3-9. Feed-forward CS Delay Element . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Figure 3-10. Testing Data from feed-forward CS VCO . . . . . . . . . . . . . . . . . . . . . . . . . 33
Figure 4-1. Schematic for Delay Interpolated VCO element . . . . . . . . . . . . . . . . . . . . 36
Figure 4-2. Feed Forward VCO block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Figure 4-3. FFI VCO under boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Figure 4-4. Feed-forward interpolated simulated response . . . . . . . . . . . . . . . . . . . . . 38
Figure 4-5. Delay versus weighting factor with single stage imbalance . . . . . . . . . . . 42
Figure 4-6. Decoupling versus delay injection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Figure 4-7. Schematic for FFI VCO element . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Figure 4-8. FFI VCO frequency versus emitter resistance . . . . . . . . . . . . . . . . . . . . . . 46
Figure 4-9. FFI VCO frequency versus centering capacitor . . . . . . . . . . . . . . . . . . . . . 47
Figure 4-10. FFI VCO frequency versus bypass resistance . . . . . . . . . . . . . . . . . . . . . . 48
Figure 4-11. FFI VCO Frequency Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Figure 4-12. FFI VCO System from control voltage to frequency . . . . . . . . . . . . . . . . . 49
Figure 4-13. Simulated versus analytical response of the FFI Architecture . . . . . . . . . . 50
Figure 4-14. Center frequency simulation and model . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Figure 4-15. Current pulse effect on phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Figure 4-16. Simulated ISF for FFI VCO and output waveform . . . . . . . . . . . . . . . . . . 55
Figure 4-17. ISF rms values for various ring oscillators . . . . . . . . . . . . . . . . . . . . . . . . . 55
Figure 4-18. FFI with capacitive interconnect parasitics . . . . . . . . . . . . . . . . . . . . . . . . 61
Figure 4-19. FFI Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Figure 4-20. Reducing substrate coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Figure 4-21. FFI waveform at 5 GHz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Figure 4-22. FFI VCO measured results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Figure 4-23. FFI common mode response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Figure 4-24. FFI response versus supply voltage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Figure 4-25. Open loop phase noise of FFI VCO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Figure 4-26. FFI VCO analytical and measured jitter . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Figure 5-1. Transmitter and multiplexer architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Figure 5-2. Data timing for the 4-1 multiplexer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Figure 5-3. CML Two Level Multiplexer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Figure 5-4. Simulation Testing of CML 2:1 Multiplexer . . . . . . . . . . . . . . . . . . . . . . . 77
Figure 5-5. Simulation Results for CML 2:1 Multiplexer . . . . . . . . . . . . . . . . . . . . . . 78
Figure 5-6. CML Single Level Symmetric Multiplexer . . . . . . . . . . . . . . . . . . . . . . . . 78
Figure 5-7. Symmetric multiplexer transistor states . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Figure 5-8. Multiplexer Eye Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Figure 5-9. Multiplexer Layout for Serdes I and II . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Figure 5-10. Linear model of PLL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Figure 5-11. Frequency synthesizer evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Figure 5-12. Schematic for input filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Figure 5-13. Input filter frequency response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Figure 5-14. Phase detector schematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Figure 5-15. Simulated phase detector responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Figure 5-16. PLL frequency detector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Figure 5-17. Passive Loop Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Figure 5-18. Tx PLL passive loop filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Figure 5-19. Tx PLL active loop filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Figure 5-20. Active loop filter transfer function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Figure 5-21. Receiver III integrator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Figure 5-22. Voltage spectral density for optimal loop bandwidth . . . . . . . . . . . . . . . . 96
Figure 5-23. PLL simulated step responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Figure 5-24. PLL I simulated acquisition plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Figure 5-25. PLL II simulated acquisition plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Figure 5-26. 5/10 GHz PLL implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Figure 5-27. Clocking scheme for transmitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Figure 5-28. Transmitter clock timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Figure 5-29. Load counter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Figure 5-30. Serdes I LFSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Figure 5-31. True error rate detector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Figure 5-32. Serdes II bit pattern generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Figure 5-33. Serdes I transmitter layout and photograph . . . . . . . . . . . . . . . . . . . . . . . 111
Figure 5-34. Serdes II chip layout and microphotograph . . . . . . . . . . . . . . . . . . . . . . . 113
Figure 5-35. Transmitter waveform (Serdes I) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Figure 5-36. Serdes 2 transmitter eye diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Figure 5-37. Tx PLL measured phase noise spectra . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Figure 5-38. Data and clock timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Figure 6-1. Top level receiver architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Figure 6-2. Receiver PLL evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Figure 6-3. Receiver topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Figure 6-4. Transition detector in prototype I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Figure 6-5. Transition detector in prototype II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Figure 6-6. Gain of transition detector with data jitter . . . . . . . . . . . . . . . . . . . . . . . . 128
Figure 6-7. Phase detector for NRZ data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Figure 6-8. Receiver loop filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Figure 6-9. MOSFET charge pump integrator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Figure 6-10. Proportional control and summing junction . . . . . . . . . . . . . . . . . . . . . . . 132
Figure 6-11. Serdes I loop locking in . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Figure 6-12. Frequency and phase lock-in of serdes III Rx PLL . . . . . . . . . . . . . . . . . 138
Figure 6-13. 4-16 demultiplexer architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Figure 6-14. Serdes I receiver layout artwork and photograph . . . . . . . . . . . . . . . . . . . 143
Figure 6-15. Serdes I receiver locked to data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Figure 6-16. Serdes I recovered clock showing jitter. . . . . . . . . . . . . . . . . . . . . . . . . . 145
Figure 6-17. Serdes II Rx locked to data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
Figure 6-18. Serdes II receiver clock phase noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Figure 6-19. Revised 4-to-16 demultiplexer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Figure A-1. Ic-Vbe characteristics for npn transistor . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Figure A-2. npn transconductance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Figure A-3. Ic-Vce characteristics for npn transistor . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
Figure A-4. fT vs Ic characteristics for npn transistor . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
Figure B-1. Current switching versus differential input voltage . . . . . . . . . . . . . . . . . . 160
Figure B-2. Simple CML Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
Figure B-3. Reference Voltage Generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
Figure B-4. CML Buffer with emitter followers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
Figure C-1. Linearizing differential amplifier with emitter resistors . . . . . . . . . . . . . . . 164
Figure C-2. Branch current response for various emitter resistors . . . . . . . . . . . . . . . . . 165
Figure C-3. Simulated / Analytical Gain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Figure C-4. Limiting full current switching with bypass resistors . . . . . . . . . . . . . . . . . 166
Figure C-5. Current limiting effects of bypass resistor . . . . . . . . . . . . . . . . . . . . . . . . . 167
Figure C-6. Current gain effects of bypass resistor . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Figure C-7. Designing for gain with emitter and bypass resistors . . . . . . . . . . . . . . . . . 170
Figure C-8. Collector Capacitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
Figure C-9. Delay Model with Collector Capacitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Figure D-1. Delay from emitter follower to differential amplifier . . . . . . . . . . . . . . . . 173
Figure D-2. Delay from differential amp to emitter follower . . . . . . . . . . . . . . . . . . . . . 174
Figure D-3. Emitter follower size between driver and receiver . . . . . . . . . . . . . . . . . . . 175
Figure D-4. Delay when using optimized emitter follower . . . . . . . . . . . . . . . . . . . . . . 176
Figure D-5. Delay difference between circuit with follower and one without . . . . . . . . 177
List of Tables
Table 1-1. Equipment used for testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Table 4-1. Circuit parameters for calculating jitter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Table 5-1. Pin-out of Serdes I transmitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Table 5-2. Bondpad pin-out of Serdes II chip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Table 6-1. Pin-out of Serdes I transmitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
Acknowledgements
First and foremost, I want to thank my family. Although they have little knowledge of
the research I have done, they have helped more than they know. Without them this would
have been a much more difficult undertaking.
I want to thank my advisor, Jack McDonald, for his assistance and guidance during
the past few years, and for providing me with the opportunity to work with cutting edge
SiGe technology. The members of my committee, Kenneth Connor, Gary Saulnier, Les
Rubenfeld, and Don Millard, also deserve thanks for providing insight and guidance in my
research. I would like to extend a special thank you to Dr. Millard for being a wonderful
mentor and friend since I began graduate school. He has always been there for me.
Also, without my fellow Frisc members and friends, Pete Curran, Samuel Steidl,
Matthew Ernest, Steven Carlough, and Bryan Goda, this certainly would have been a
boring voyage. Thanks for your help.
I am indebted to Hank Dardy and Basil Decina at NRL (contract #N00173-99-1G013) for their support in this work. I also wish to thank Sierra Monolithics Incorporated
and IBM for the fabrication of my chip designs and for providing additional insight in this
research; and Intel, for providing a fellowship to support this work.
When I left high school I said, “I’ve just conquered a small hill in my life only to look
out and see a huge range of mountains before me.” Am I perhaps standing on the top of the
first mountain I saw?
Abstract
The current high-growth nature of digital communications demands higher speed
serial communication circuits. Present day technologies barely manage to keep up with this
demand, and new techniques are required to ensure that serial communication can continue
to expand and grow.
The goal of this work was to research, design, implement, test and evaluate high
speed serial communication circuits. Research involved an in-depth study of the state of the
art in high speed digital and analog circuits; SiGe technology; and serial communication
circuits. Two prototype 20 Gb/s transceiver chips were designed using current mode logic
(CML) bipolar logic families and using IBM’s SiGe 0.5 µm heterojunction bipolar
transistor (HBT) technology. Following fabrication of two designs, the completed chips
were extensively tested, and test results were compared to expected results from
simulation. After optimization and many improvements, a prototype communication
system was designed and prepared for fabrication.
The optimized second prototype operated at speeds in excess of 20 Gb/s. It utilized a
novel four stage feed-forward interpolated ring voltage controlled oscillator (VCO)
architecture, for which RPI is pursuing a patent. By feed-forwarding every stage’s output
by one stage the architecture improved the core frequency by more than 33% with a phase
noise of -90.2 dBc/Hz at 1 MHz. The transmitter took advantage of the phase quadrature
nature of the VCO in a unique multiplexing technique that required the development of a
new 2-to-1 multiplexer. This multiplexer had full input to output symmetry on all three
inputs and was capable of performing output data retiming. The PLL had a wide bandwidth
of 30 MHz, to suppress VCO noise, and produced in-band jitter of 2.0 ps from 100 kHz to
100 MHz.
The receiver, similar in both prototypes, utilized the full eight phases of the VCO to
oversample every data bit twice in the phase detector (PD). It was capable of extracting
timing information from every rising and falling transition. The loop filter incorporated a
negative impedance charge pump integrator which exhibited excellent performance. Four
bits of data were sampled through the PD and a 4-to-16 demultiplexer produced the 16 bits
of parallel data.
A third prototype was developed, but not fabricated, using the data acquired from the
first two designs. The transmit PLL bandwidth was optimized to account for the phase
noise measurements of the VCO. As a result, a frequency detector was required and added
to the PLL to increase the pull-in range. The loop filter was also modified to use the
negative impedance charge pump from the receiver PLL. The receiver demultiplexer
scheme was improved to decrease the timing constraints. In addition, the receiver PLL was
optimized to improve the bit error rate.
1
Introduction & Historical Review
1.1. Motivation and Goals
The research presented in this thesis deals with understanding and designing the
critical components that make up a serializing and deserializing, or Serdes, circuit. The
extremely complicated nature of such a system required a focused study that did not address
many of the issues that are present in a similar commercially designed product.
Funding for the project was acquired through Dr. Jack McDonald from the Naval
Research Lab, NRL. The requirements were to design a SiGe short-haul Serdes system
capable of 20 Gb/s that would assist in research that may eventually lead to 40 Gb/s.
Serdes circuits, discussed more thoroughly in the following chapter, consist of three
parts: a transmitter, a receiver, and a channel. The transmitter accepts streams of data in
parallel and multiplexes them together into a single serial stream. Distinguishing the bits at
the receiver input, after they travel through the channel, is a primary concern. The receiver
accepts the serial stream and demultiplexes it back to the parallel data. It must be sensitive
to changes in the data, in order to limit the error rate. The channel connects the transmitter
and receiver, and typically consists of amplifiers, repeaters, and optical wiring.
IBM’s SiGe HBT process technology was chosen because of the Frisc group’s
strength in high-speed bipolar design, and because of the state-of-the-art nature of the
process in the industry. The process provides integration with current CMOS technology
enabling a very wide variety of circuit topologies. This research used the 5 HP process
technology, with 5 levels of metal. It offered 50 GHz fT (transition frequency) HBT and
0.25 µm CMOS transistors.
One way of grouping Serdes circuits is by the distance over which the serialized data
is expected to travel. Systems, such as Synchronous Optical Network (SONET), are
implemented over distances greater than 100 km, and are considered long-haul. Short-haul
Serdes, on the other hand, is limited to short distances, such as a LAN, or between CPUs in
a multi-processor system. This distinction between short and long haul systems has
important implications on the critical specifications of the circuit. For long-haul systems,
phase noise is critical, as it dictates the total bit error rate (BER) through the long and noisy
channel. Short-haul is less sensitive to phase noise and is instead focused on bit throughput
and higher bandwidth.
Current industry level Serdes designs, as of the year 2000, run at 10 Gb/s and utilize
the same or similar 5 HP technology. Pushing the goal to 20 Gb/s and even 40 Gb/s was
intended to place this research on the cutting edge and evaluate the maximum potential of
the technology.
In addition to the goals of the NRL contract, various other factors motivated the
development of this project. First was the available test equipment. The lack of facilities to
test a packaged part necessitated a chip with wafer probing capabilities. This limited the
total testable signals to 12 RF and 12 DC at one time. Without packaging, a fully integrated
solution was necessary, rather than one that needed off chip components, such as capacitors
and op-amps.
1.2. The three chips
The total design process consisted of three separate designs. The first design, Serdes
I, was a prototype that tested some of the key components of a complete design. It was
fabricated in February 1999. This chip was an excellent starting point for the development
of a fully functional chip.
Serdes II was investigated and studied after the results from Serdes I were analyzed.
It possessed improvements in important areas such as the PLL, the multiplexer, the receiver
topology, and the VCO. Unfortunately the tape-out date was earlier than expected and
allowed only one month for final design and layout. This proved to be a difficult time line
and some design issues were left unresolved.
Following the collection of data from Serdes II, a third iteration, Serdes III, was
investigated. The design goal was to solve most of the issues uncovered from Serdes I and
Serdes II. Although no new layout was done for Serdes III, a complete set of new simulated
schematics was created. With the addition of some minor support circuits, a fully
functional and optimized Serdes chip could be implemented.
1.3. Project time line
[Figure 1-1: time line chart spanning August 1998 to late 2000. Milestone labels: start of
research; paper search; simple VCO; leap VCO; transmitter designed; receiver designed; final
checks; Serdes I design submitted; candidacy preparation; chips received; test VCOs; test
transmitter; test receiver; submit to ISSCC; candidacy; additional simulations; start work on
Serdes II; SymMux patent; SMI offer to fabricate; intense effort to design Serdes II; submit
Serdes II; FFI VCO patent; Serdes II received; test FFI VCO; test transmitter; test receiver;
both patents pending; submit JSSC paper; complete thesis; defend thesis.]
Figure 1-1 Past and proposed future research
This is a time line of the goals and accomplishments of this Serdes
research.
A time line indicating completed goals is shown in Fig. 1-1. Research into high speed
communication circuits was initiated in August 1998. A paper that appeared in ISSCC
1998, titled “A 10 Gb/s Si-Bipolar TX/RX Chipset for Computer Data Transmission” [1],
was the basis for the majority of the research. The paper presented a novel idea for a voltage
controlled oscillator, VCO, and a description of a transmitter and receiver circuit.
VCOs are the most important circuit in the design of communication circuits, and as
such, were the starting point for this research. A simple four phase buffer oscillator was
designed and simulated. The method for frequency control for this oscillator originated
from a modified version of Samuel Steidl’s VCO implementation [2]. An advanced version
of this VCO, with a 66% speed improvement, was subsequently implemented. The desire
to further increase the frequency led to a study of phase multiplication techniques [3], [4].
Three separate VCO test chips were laid out to test various aspects of the above techniques.
Each chip contained several versions of a unique VCO design: with and without phase
multiplication, and under several different loading conditions.
In November 1998, the transmitter circuit started to take shape. One component of a
serializing circuit is the final multiplexer. To design this, a unique register “shuffling”
method was evaluated. As it provided better performance than other techniques and worked
with a slower rate multi-phase clock, it was chosen for the final design. In order to test the
transmitter, a linear feedback shift register, LFSR, was used to provide pseudo-random
data. An additional requirement of the transmitter was operation at a speed relative to a
fixed low frequency clock. This required the development of a phase locked loop, PLL,
capable of synching a low frequency external reference clock to the high rate internal clock.
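The pseudo-random source described above can be illustrated in software. The following is a minimal sketch, assuming a 7-bit Fibonacci LFSR with feedback polynomial x^7 + x^6 + 1. The register length, the polynomial, and the function name prbs7 are illustrative assumptions and are not taken from the Serdes I design.

```python
def prbs7(seed=0x7F, nbits=127):
    """Return nbits from a 7-bit Fibonacci LFSR (assumed polynomial x^7 + x^6 + 1)."""
    if not 0 < seed < 128:
        raise ValueError("seed must be a non-zero 7-bit value")
    state = seed
    bits = []
    for _ in range(nbits):
        # Feedback is the XOR of the two oldest register bits (taps 7 and 6).
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        # Shift left by one and insert the feedback bit at the bottom.
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


if __name__ == "__main__":
    sequence = prbs7(nbits=254)
    # A maximal-length 7-bit register repeats every 2**7 - 1 = 127 bits.
    print(sequence[:127] == sequence[127:])
```

A maximal-length sequence exercises every non-zero register state once per period, which is why such generators are a common on-chip stand-in for random data.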
Starting in December and during transmitter development, a receiver design was
examined. Many improvements were added to the fundamental architecture found in [1].
Instead of gathering timing data from every fourth transition, it was determined that better
performance could be achieved if every transition were used. Since no detailed mechanism
for feedback control was described, some ideas were gathered from a clock and data
recovery paper [5]. Starting with these ideas, a unique PLL was created for clock recovery.
Because of the difficulty of using external function generators, an internal testing source
was developed to provide different bit patterns to exercise the circuit completely.
All six chips, including an integrated transmitter/receiver chip, were designed and
laid out using Cadence software. Simulation was done using HSpice, Matlab, and a digital
simulator developed by Peter F. Curran. Final designs were shipped to IBM during the first
week of February 1999. After six months in fabrication, a finished wafer was returned to
RPI in the beginning of August of the same year.
Chip testing began with a detailed study of the three VCO chips and the test source
VCO in the receiver. It became apparent that most of the circuits underperformed,
when compared to simulation results. It appeared that under heavily loaded conditions the
circuits slowed down more than expected. The transmitter test chip was tested and found to
work with a 25% reduction in frequency. This testing was followed by a detailed inspection
of the receiver chip, which was found to work nearly at the design speed.
During this time, data was being collected for a conference paper to be submitted to
the International Solid State Circuits Conference, ISSCC. Although the chips performed
slightly slower than anticipated, the paper still showed significant advances in state of the
art research. Unfortunately the paper was not accepted, most likely because there was a
frequency mismatch between the transmitter and receiver.
During the remainder of September, a thorough simulation of the VCO, including
layout parasitics, was performed. The initial results showed a close match to the results
measured from the fabricated wafer. Some discrepancy remains regarding how loading
affects the speed of the devices. A continuation of this work will attempt to match
simulations accurately to measured results to ensure that future designs will respond as
expected.
It was necessary to produce a second Serdes chip, drawing on the success of
the first test chip, that would meet the goal of 20 Gb/s. Additional circuitry was needed
to round out the design: a 4-to-16 demultiplexer, an internal testing scheme, transmitter and
receiver integration onto one chip, packagability, and improved performance.
A comprehensive study was performed to determine exactly why and how the chips
underperformed. The design was modified to ensure that the parts would meet the required
specifications. This included complete redesign of the VCO into the Feed Forward
Interpolated VCO (FFI VCO). The new design was based upon the results of the previous
design and the development of a new multiplexer.
In February 2000, an invention disclosure record entitled “The Symmetric
Multiplexer,” was submitted to RPI [6]. The invention improved the standard CML
multiplexer and reduced phase noise and jitter at the transmitter output.
Serdes II was finished and submitted to Sierra Monolithics Incorporated, SMI, for
fabrication¹ at the end of March 2000. It contained many improvements on the previous
design and was capable of being C4 packaged and wafer tested. After its completion, an
additional invention disclosure record that focused on the FFI VCO was submitted [7]. The
VCO is a novel approach to designing ring oscillators. It improves upon many key
parameters of the standard ring VCO.
1. SMI volunteered silicon on an experimental run.
The Serdes II chip was received three months after tapeout, in the middle of July
2000. Testing began immediately with a complete characterization of the FFI VCO
including its frequency response, CMRR, phase noise, supply response, and jitter. A high
quality spectrum analyzer was rented to aid in testing and data acquisition. Testing of the
transmitter was followed by a look at clock jitter and data eye diagrams. The transmitter
was a complete success, and operated at 20 Gb/s with rms jitter of 2.0 ps in the frequency
band of 100 kHz to 100 MHz. The symmetric multiplexer appeared to work exactly as
expected. Testing the receiver confirmed an anticipated problem with low lock-in range.
This was also seen in Serdes I and was not completely addressed in the second prototype.
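To put the measured jitter in context, it helps to express it as a fraction of the bit period (unit interval). The short sketch below does this arithmetic for the 2.0 ps rms and 20 Gb/s figures quoted above; the function name jitter_in_ui is an illustrative assumption.

```python
def jitter_in_ui(rms_jitter_ps: float, bit_rate_gbps: float) -> float:
    """Express rms jitter as a fraction of one unit interval (one bit period)."""
    unit_interval_ps = 1000.0 / bit_rate_gbps  # 1 / (Gb/s) is in ns, so scale to ps
    return rms_jitter_ps / unit_interval_ps


if __name__ == "__main__":
    # At 20 Gb/s one bit lasts 50 ps, so 2.0 ps rms is 4% of the bit period.
    print(jitter_in_ui(2.0, 20.0))
```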
Following the tape-out of Serdes II, intense work was done on Serdes III. Several last
minute problems were discovered in Serdes II that were corrected in the next iteration. Data
collected from Serdes II allowed the optimization of important PLL parameters in order to
reduce jitter, and improve the pull-in time. A problem with a small pull-in range in both
receiver PLLs required a complete redesign of the loop and the addition of a reference
signal.
Using the data collected in Serdes II, a journal article was submitted to the Journal of
Solid-State Circuits, JSSC, in October. It was titled “A Transmitter Architecture for High
Speed Short-Haul Serial Communication,” and it detailed the FFI VCO, the symmetric
multiplexer and the transmitter architecture.
At the end of September, the RPI patent office reported that they were going to pursue
U.S. patents for both inventions. This would start with an immediate application for
provisional patents that would protect the work after disclosure.
1.4. State of the Art
In the quick-paced research area of high speed communications, industry is currently
cresting the 10 Gb/s barrier while research is beginning in the 40 Gb/s regime. New
microelectronic technologies such as AlInAs/InGaAs heterojunction bipolar transistors
(HBT), and SiGe HBTs [8], [9] are playing leading roles. In particular, SiGe HBT and
CMOS technology is proving itself to be a high-speed (60-90 GHz fT), high-yield, high-integration, and low-cost solution [10], [11]. It possesses the strengths of silicon because of
similar fabrication techniques, but benefits from higher frequencies with the introduction
of germanium [12].
The current state of the art in high-speed serial communications can be broken down
in three basic design areas: VCOs; clock multiplier units (CMU), or transmitters; and clock
and data recovery (CDR) circuits, or receivers.
As the speed of serial communication circuits increases, so too must the speed of the
core building block of the circuit, the VCO. Multi-phase ring oscillators with top speeds
approximately equal to 1/10th of their technology’s fT are being improved [1], [13], [14].
It is common to see speeds around 5 GHz, with maximum quoted speeds up to
approximately 15 GHz through clock phase multiplication [3], [4]. Their Q of unity and
high noise characteristics are more suitable for short-haul systems or for systems that can
tolerate phase noise. In-depth analysis of the sources of phase noise is allowing tight
optimization of circuits [15]-[19]. CMOS differential ring oscillators running at speeds up
to 5 GHz exhibit -95 dBc/Hz of phase noise at 1 MHz [18], while bipolar rings are quoted
as having phase noise values of -86 dBc/Hz at 1 MHz [20]. Jitter, generally expressed by
the κ constant, has been documented for a silicon bipolar ring running at 625 MHz with a
0.6 mA tail current at 22 n s [17].
Ring oscillator architecture is straightforward and simple to understand. Through
interesting and creative interstage feedback techniques, the VCO frequency and phase
noise can be improved. A four stage ring VCO that increases its speed by 33% by leapfrogging the output of one stage to the input of the stage ahead is documented in [1]. This
improves the speed by reducing the effective delay of every stage. A similar, more general
approach is presented in [13], which utilizes sub-feedback inverters that create fast and
slow loops which can be mixed together. An earlier approach, [23], has a five stage core
that potentiometrically mixes the output from the third and fifth stages. By doing this, the
ring is able to operate variably between a 3 stage and a 5 stage oscillator. Finally, by using
a negative skewed delay scheme, the core frequency of a CMOS ring oscillator is improved
by 50% [24]. This is accomplished by compensating for the slower PMOS transistors by
tying the PMOS input to the output of a stage two gates back. This turns the transistor on
sooner than the NMOS, thus improving its speed at the expense of additional power
requirements.
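The link between stage delay and ring frequency discussed above can be made concrete with the usual first-order relation f = 1 / (2 N t_d) for an N-stage ring. The sketch below is a minimal illustration under that assumption; the 25 ps stage delay is a hypothetical value chosen only to give round numbers and is not a measured figure from any of the cited designs.

```python
def ring_frequency_hz(stages: int, stage_delay_s: float) -> float:
    """First-order ring oscillator frequency: one period is two trips around the ring."""
    return 1.0 / (2 * stages * stage_delay_s)


if __name__ == "__main__":
    base_delay = 25e-12  # hypothetical per-stage delay, in seconds
    base = ring_frequency_hz(4, base_delay)
    # A 33% speed-up corresponds to an effective stage delay of 1/1.33 of the original.
    faster = ring_frequency_hz(4, base_delay / 1.33)
    print(base, faster / base)
```

Under this model a 33% frequency increase is equivalent to cutting the effective stage delay by roughly one quarter, which is consistent with the description of feed-forward techniques reducing the effective delay of every stage.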
LC oscillators, on the other hand, which possess a high Q and extremely low noise and
jitter, are being rigorously researched as VCOs for long-haul serial communication. Unlike
multi-phase oscillators that can generate frequencies higher than their core frequencies, LC
oscillators are typically run at the baud rate of the communication channel. Thus, for a 10
Gb/s Serdes implementation, a 10 GHz LC VCO is required. A 5 GHz VCO developed by
IBM [21] was quoted as having a phase noise of -98 dBc/Hz at 100 kHz, with a power of
15 mW. A second 11 GHz VCO with an integrated inductor is documented as having a -78
to -87 dBc/Hz phase noise at a 100 kHz offset from the carrier [22].
The state-of-the-art in transmitter, or CMU, research is measured primarily by the
maximum bit rate compared to the transistor technology, the clock jitter produced at that
rate, and the phase noise of the oscillator.
A 1.062 Gb/s transmitter implementation, [26], utilizes a half-rate ring oscillator. The
ring oscillator incorporates two mixing elements between every pair of delay elements to
control the rate of oscillation. Its quadrature outputs are further broken up into four quarter-rate signals that drive the 10-to-1 multiplexer. The PLL achieves an rms jitter performance
of 9.8 ps.
A low noise, 12.5 Gb/s CMU is described in [27]. It possesses a differential single
phase LC oscillator with a phase noise of -101 dBc/Hz at 1 MHz. The PLL has a very low
bandwidth of 300 kHz in order to reduce in-band noise. Its reference is at approximately
195.3 MHz and it utilizes a standard 3-state phase detector (PD). The loop filter consists of
a negative impedance amplifier and a single pole, single zero RC filter. The output jitter is
quoted as 0.4 ps.
An interesting non-optical transceiver described in [28] utilizes a 4-PAM (pulse
amplitude modulation) serial link for 8 Gb/s communications. It essentially transmits and
receives four level logic, which allows twice the bit rate for the same symbol rate. It
exhibits a transmitter output jitter of 2 ps and a receiver jitter of 4 ps.
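The benefit of multi-level signalling can be checked with a line of arithmetic: each symbol with M levels carries log2(M) bits. The sketch below applies this to the 8 Gb/s 4-PAM link; the function name symbol_rate_gbaud is an illustrative assumption.

```python
import math


def symbol_rate_gbaud(bit_rate_gbps: float, levels: int) -> float:
    """Symbol rate needed to carry a bit rate using PAM with the given number of levels."""
    bits_per_symbol = math.log2(levels)
    return bit_rate_gbps / bits_per_symbol


if __name__ == "__main__":
    # 4-PAM carries 2 bits per symbol, so 8 Gb/s needs 4 Gbaud;
    # binary (2-level) signalling would need 8 Gbaud for the same bit rate.
    print(symbol_rate_gbaud(8.0, 4), symbol_rate_gbaud(8.0, 2))
```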
As bit rates are pushed higher relative to the transistor technology speed, certain
problems arise. In the transmitter PLL, a clock frequency divider is needed to drive the PD
along with the reference signal, and to drive multiplexer inputs. A feedback MS-latch often
suffices, but for extremely high VCO speeds a new approach is required. A dynamic
frequency divider capable of speeds up to 79 GHz using transistors with an fT of 80 GHz
is described in [29]. It uses an XOR multiplier, a low pass filter inherent in the multiplier,
and it feeds the output back into the multiplier. The only stable condition is when the output
is at half the frequency of the input.
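The half-frequency condition follows from the product-to-sum identity. The derivation below is a sketch that assumes an ideal multiplier and a low-pass filter that removes the sum-frequency term completely; the symbols f_in and f_out are introduced here for illustration.

```latex
% Multiplier output for input frequency f_in and fed-back frequency f_out:
\cos(2\pi f_{\mathrm{in}} t)\,\cos(2\pi f_{\mathrm{out}} t)
  = \tfrac{1}{2}\cos\!\big(2\pi (f_{\mathrm{in}} - f_{\mathrm{out}})\, t\big)
  + \tfrac{1}{2}\cos\!\big(2\pi (f_{\mathrm{in}} + f_{\mathrm{out}})\, t\big)

% The low-pass filter keeps only the difference term, and the loop is
% self-consistent only if that term is the fed-back signal itself:
f_{\mathrm{out}} = f_{\mathrm{in}} - f_{\mathrm{out}}
\quad\Longrightarrow\quad
f_{\mathrm{out}} = \frac{f_{\mathrm{in}}}{2}
```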
The state-of-the-art in receiver, or CDR, design is measured by the ability to extract
data in the presence of both data and clock jitter, and the ability to tolerate pseudo-random
data.
The design described in [30] uses a full rate ring oscillator with a 12.5 GHz clock to
extract the 8B/10B encoded data at 10 Gb/s. The VCO exhibits a phase noise of
approximately -80 dBc/Hz at 1 MHz. The PLL has a bang-bang PD and is frequency locked
by a 195.3 MHz reference signal. The data PD has a pull-in range of 0.6% and a hold-in
range of 1.2%. This receiver is quoted as exceeding the SONET-192 specifications by 50%.
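The clock rates quoted above follow from the coding overhead. The sketch below works through the arithmetic: 8B/10B coding sends 10 line bits for every 8 data bits, so 10 Gb/s of data requires a 12.5 Gb/s line rate. The divide ratio of 64 used to relate the 12.5 GHz clock to the 195.3 MHz reference is an assumption inferred from the two numbers, not a value stated in the cited paper.

```python
def line_rate_gbps(payload_gbps: float, data_bits: int = 8, code_bits: int = 10) -> float:
    """Line rate after block coding that maps data_bits onto code_bits."""
    return payload_gbps * code_bits / data_bits


if __name__ == "__main__":
    line = line_rate_gbps(10.0)            # 12.5 Gb/s for 8B/10B-coded 10 Gb/s data
    assumed_divide_ratio = 64              # assumption: inferred, not from the source
    reference_mhz = line * 1000.0 / assumed_divide_ratio
    print(line, reference_mhz)             # 12.5 GHz clock and about 195.3 MHz reference
```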
A 50 GHz fT SiGe 10 Gb/s CDR for SONET is described in [31]. It utilizes an LC
tank VCO running at 10 GHz with a phase noise of -80 dBc/Hz at 100 kHz. The PD is a
Hogge type, and the charge pump uses an active MOSFET positive-feedback pull-up
amplifier. The recovered clock rms jitter was measured at less than 1 ps, with a bit error
rate of 10^-9. SONET specifications for jitter tolerance, jitter transfer, and jitter generation
were all met.
A very high speed CDR discussed in [32] uses a silicon bipolar process with an fT of
12 GHz for 8 Gb/s operation. The loop filter and VCO are off-chip but the frequency
detector and PD are both on-chip. The clock jitter was measured at 1.5 ps rms.
1.5. Contribution to the Field
An important aspect of Ph.D. research is advancement of the state of the art, and
proving that such work builds upon the shoulders of others and is not merely a reinvention
of the wheel. Four key components of this research can be quickly singled out as original
and novel, and RPI is pursuing U.S. patents for two of them.
1.5.1. Feed Forward Interpolated VCO
The Feed Forward Interpolated VCO is an improvement over the standard ring
oscillator [1]. The ring VCO in [23] utilizes a similar feed-forward method to extend the
frequency range but the feed-forwarding remains fixed and is not used as the delay control
mechanism. The design presented in this thesis, however, uses feed-forwarding to increase
the frequency range and also as the primary method to control the stage delay. It is versatile
and allows adjustments to be made to the center frequency, tuning range, and gain through
simple parameter changes. The VCO is 33% faster than a simple four stage ring oscillator
utilizing the same power, when it is configured for maximum operating speed. This
increase in speed can be traded for additional phase noise and jitter suppression, making the
FFI VCO a viable alternative to LC tanks when used in a short-haul communication
channel.
An invention disclosure record for this circuit was submitted in May 2000 to the RPI
patent office. In September 2000, the patent office declared that they were going to pursue
a U.S. patent for this invention.
1.5.2. Transmitter Interleaving Architecture
As the bit rate is pushed higher, with respect to the technology speed, it becomes
increasingly difficult to design VCOs that can keep up. Fractional rate oscillators can solve
this difficulty, but require tight timing constraints on the output multiplexer. The
transmitter design discussed in this thesis utilizes a relatively slow, well understood,
quarter frequency multi-phase VCO. The novel transmitter architecture allows in-quadrature phases of the VCO to control a 4-to-1 multiplexer.
Although this approach is similar to the design given in [1], it differs in three
ways. First, the 4-to-1 multiplexer in [1] is implemented as a single gate, whereas the
transmitter interleaving architecture breaks the problem into multiple gates. Second, the
multiplexer in [1] requires multiple level clock inputs, which in turn requires the clock
phases to be skewed. Third, the multiplexer in the paper requires three levels of logic while
this new architecture requires only two. This is important for power saving applications that
require only two levels.
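The idea of letting quadrature clock phases steer a 4-to-1 multiplexer built from several 2-to-1 gates can be illustrated behaviourally. The sketch below is a minimal logical model under stated assumptions: two ideal quarter-rate square-wave clocks, clk_i and clk_q, 90 degrees apart, and a two-level tree of 2-to-1 selectors. The gate arrangement, the names mux2 and interleave, and the lane ordering are illustrative and are not claimed to match the circuit developed in this thesis.

```python
def mux2(select, a, b):
    """Ideal 2-to-1 multiplexer: pass a when select is high, otherwise b."""
    return a if select else b


def interleave(lanes):
    """Serialize four equal-length quarter-rate bit lists into one full-rate list."""
    serial = []
    for word in zip(*lanes):
        # One clock period is split into four quarters by the two quadrature clocks.
        for quarter in range(4):
            clk_i = 1 if quarter in (0, 1) else 0  # in-phase clock, high in first half
            clk_q = 1 if quarter in (1, 2) else 0  # quadrature clock, delayed by 90 degrees
            first_pair = mux2(clk_q, word[1], word[0])
            second_pair = mux2(clk_q, word[2], word[3])
            serial.append(mux2(clk_i, first_pair, second_pair))
    return serial


if __name__ == "__main__":
    lanes = [[1, 0], [0, 0], [1, 1], [0, 1]]
    print(interleave(lanes))  # lane bits appear in order 0, 1, 2, 3 for each word
```

In this model no selector ever needs a clock faster than one quarter of the bit rate, which is the property that makes a fractional-rate oscillator attractive.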
1.5.3. Symmetric Multiplexer
During the development of the transmitter a problem developed that required the
basic 2-to-1 multiplexer to be rethought. The problem was that the 2-to-1 multiplexer had
become a critical timing path in the transmitter. In other words, any delay mismatches in
this circuit were propagated to the output. After analyzing the problem, a new multiplexer
was developed that had perfect timing symmetry and possessed none of the problems of the
original multiplexer. This discovery enabled the new architecture to operate smoothly. A
U.S. patent for the symmetric multiplexer, like the FFI VCO, is being pursued by the RPI
patent office.
1.5.4. Receiver PLL
The critical circuit in the design of the receiver PLL was the phase detector (PD).
Typically, a Hogge-type [31], [52] or a bang-bang type PD [30] is used in high speed serial
receivers. The 20 Gb/s goal of this work required a PD to operate twice as fast using the
same technology speed. A bang-bang or Hogge style PD with this speed capability would
be difficult to design and would require a clock at the same frequency as the data. As a
result, a new PD had to be developed.
The new design, called a transition detector (TD), incorporates eight MS-latches,
each clocked by a different phase of the VCO. This allowed the data to be twice
oversampled and both timing information and data to be collected.
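A behavioural view of twice-oversampled detection may help here. With eight sampling phases per VCO period and two samples per bit, each period covers four bits. The sketch below is a simplified model and not the transition detector circuit itself: it assumes ideal samples, recovers data from every other sample, and flags a transition wherever two adjacent samples differ. The function names are illustrative.

```python
def transition_positions(samples):
    """Indices i where sample i and sample i + 1 differ, i.e. an edge lies between them."""
    return [i for i in range(len(samples) - 1) if samples[i] != samples[i + 1]]


def recover_bits(samples):
    """Take one sample per bit from a twice-oversampled stream."""
    return samples[0::2]


if __name__ == "__main__":
    # Four bits (1, 0, 0, 1), each sampled twice by the eight clock phases.
    samples = [1, 1, 0, 0, 0, 0, 1, 1]
    print(recover_bits(samples))          # the original four bits
    print(transition_positions(samples))  # odd indices: edges fall between bit pairs
```

In this model, transitions at odd indices indicate that the sampling phases are aligned with the bit boundaries, while a transition at an even index would indicate that the clock has drifted by one sample position. Every rising or falling edge contributes such an indication.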
1.6. SiGe 5 HP Overview
IBM’s 5 HP SiGe BiCMOS process incorporates 0.5 µm HBT transistors and 0.35
µm CMOS transistors. The epitaxially graded Ge base in the HBT allows fT speeds of up
to 60 GHz. Also included in the technology are: high breakdown NPN transistors, gated
lateral PNP transistors, polysilicon resistors, Metal-Insulator-Metal (MIM) capacitors,
substrate contacts, precision oxide/nitride decoupling capacitors, Schottky barrier diodes,
varactor diodes, PIN diodes, electro-static discharge (ESD) devices, last metal (LM) spiral
inductors, resistors (NS, RN, and RI), and LM bondpads.
Between three and five layers of metal are provided at the back end of the line for
interconnect¹. The first level of metal is for local interconnect and has a minimum width of
0.8 µm and a fixed thickness of 0.63 µm. The last, or highest level, called LM has a
minimum width of 2.4 µm, and a thickness of 2.07 µm. LM is typically used for bond and
C4 pads, power and ground wiring, inductors, and MIM capacitors. An extension to the 5
HP process allows LM to be substituted with analog metal (AM) which is 4 µm thick and
separated by 3 µm from the next layer of metal. AM is primarily used for inductors which
require low resistance and low capacitance to the substrate. Except for AM, all layers of
metal are separated by 1.2 µm of silicon dioxide.
The Cadence design kit from IBM provides full Spectre and HSpice models for the
devices listed above. The kit allows the extraction of interconnect capacitance and
resistance to enable full parasitic simulation.
The appendix “IBM SiGe 5 HP” on page 156 describes important NPN HBT parameters in
more detail. Appendix A.1. describes the turn on characteristics of the transistor,
specifically the collector current versus base-emitter voltage. The relationship between the
collector current and the collector to emitter voltage is discussed in Appendix A.2. fT is a
figure of merit for the transistor family and its relation to the collector current is useful
when biasing the transistor for maximum performance. A plot of the transistor fT versus
collector current can be found in Appendix A.3.
1. Serdes I was submitted in a DARPA multi-user wafer which only allowed three levels of metal. Serdes II
was submitted through Sierra Monolithics and had the full five levels of metal.
1.7. Testing Equipment
Table 1-1 Equipment used for testing
Type | Model | Specs | Usage
time-domain oscilloscope | Tektronix 11801C | 50 GHz | transmitter eye diagrams; time-domain jitter measurements
spectrum analyzers | Rohde & Schwarz FSEM 30; HP 8563E | 30 Hz to 26.5 GHz (each) | VCO frequency response; VCO common mode response; VCO frequency versus power supply; VCO phase noise; transmitter PLL phase noise; receiver PLL phase noise
signal source | HP 4430B | < 1 GHz | low phase noise jitter measurements
signal source | HP 8350B | < 10 GHz | high frequency receiver measurements
power supply | Agilent E3631A | 3 ch. DC | LabVIEW controlled VCO frequency and supply response
10 channel RF probes | GGB | > 1 GHz | all high speed RF measurements were made using these probes
12 channel DC probes | GGB | < 1 GHz | used in Serdes II for simple control lines
LabVIEW & GPIB | | | LabVIEW and GPIB hardware simplified the collecting of most data, including VCO phase noise and responses
1.8. Document Logistics
This thesis is sectioned into an abstract, six chapters, a conclusion, and appendices.
This introduction is the first chapter; it describes the goals and motivations behind this
project and discusses the state-of-the-art, the novelty of this work, and the test equipment.
The second chapter goes through the basic block diagram of a serial communication system
and the function of each block. Chapters three and four detail the development and results
of the two VCOs researched in this work. Chapter five details the transmitter, including the
PLL, architecture, and test structures. The last chapter discusses the receiver, its operation,
and test results. Appendices include information on the SiGe process used in this work, and
circuit details of this technology. In addition the last appendix has the top level schematics
for the Serdes I and II chips.
Three different Serdes designs were researched in this work. The first two were
fabricated and the third represents research for the future. Each design is designated by the
names Serdes I, Serdes II, or Serdes III.
Certain conventions were followed throughout this document. First, node names in
schematics and within equations are in bold font, such as z20 and a11. Second, equation
variables are italicized, as in fo, and ω2. Third, in plots that contain both simulated and
measured data, the simulated data is usually expressed as a dotted line and the measured
data line is solid. Fourth, for equations solved for the general case the units are usually
expressed as a function of the transistor size. This shows how the constants and variables
change depending on the transistor size. In contrast, absolute units were used for specific
circuits and fabricated circuits.
2. Serial Communication
The exchange of high speed serial data involves three primary components: a
transmitter, a receiver, and a transport channel. A transmitter (Tx) gathers low rate parallel
data and transforms it into high speed serial data. The signal is then transported through the
channel, such as air or wire, to a receiver. The receiver (Rx) must then demodulate the
signal, extract the clock, and demultiplex the data. The received information is fed out
of the receiver as parallel data.
Figure 2-1 Toplevel System Block Diagram
The transmitter accepts parallel data and serializes it to a NRZ signal.
The receiver accepts the bit stream, extracts the clock and demultiplexes
the data.
[Block diagram. Transmitter: DATA IN, encoding, registers, multiplexer, retimer, line driver, Tx PLL, Tx VCO, clock tree, reference clock, internal testing, support circuits. Transport channel. Receiver: line receiver, Rx PLL, Rx VCO, clock tree, reference clock, demux, registers, decode, DATA OUT, internal testing, support circuits.]
2.1. Serial Communication Block Diagram
Shown above in Fig. 2-1 is a basic block diagram of a serial communication system.
Although most systems do not look exactly like this, they have enough in common with
it that the diagram represents such systems fairly accurately.
2.2. Transmitter / Multiplexer / Clock Multiplier
The transmitter’s role is to accept a data word of a specified width, serialize it, and
drive the data onto a channel. The width of the word depends on the application and is a
function of the input and output bandwidths. For example, an 8 Gb/s serializer would
require 16 bits at 500 Mb/s or 64 bits at 125 Mb/s. Serializing involves multiplexing the
data into an ordered bit stream, which is typically in a non-return-to-zero (NRZ) format.
Driving the channel may require only a simple 50 Ω amplifier, or it may require a
more sophisticated circuit that is capable of driving an optical driver.
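The width and rate figures in the example above follow from one relation: the serial rate equals the word width times the per-line parallel rate. A minimal Python sketch of that arithmetic follows; the function name `parallel_rate_mbps` is illustrative only and does not come from the design.

```python
def parallel_rate_mbps(serial_rate_gbps: float, word_width: int) -> float:
    """Per-line parallel rate (Mb/s) needed to feed a serializer.

    Assumes no encoding overhead: serial rate = word width * parallel rate.
    """
    if word_width <= 0:
        raise ValueError("word_width must be positive")
    return serial_rate_gbps * 1000.0 / word_width


# The two cases quoted in the text for an 8 Gb/s serializer.
for width in (16, 64):
    print(f"{width} bits -> {parallel_rate_mbps(8.0, width)} Mb/s per line")
```

If an encoding such as 8B/10B is applied, the serial rate would have to be scaled by the code overhead before this calculation; that step is left out here.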
It is possible, depending on the specifications, that the accepted data may be encoded.
The encoding process may include encryption, compression, bit stuffing, error checking,
and framing [33]. Depending on the design of the receiver, it may be necessary to introduce
additional transitions into the data to meet critical phase locked loop (PLL) specifications
in the receiver. 8B/10B encoding is popular and guarantees at least one transition every 5
bits [34]. If channel alignment is required, meaning that bit 0 in the Tx comes out on
bit 0 in the Rx, then encoding will be needed.
After possible encoding, the bits are stored in a register of appropriate size for the
incoming word and the multiplexer width. When the multiplexer is narrower than the
word, the bits may be fed into a shift-register before being multiplexed [35]. This
register and the subsequent multiplexer must be timed very carefully to ensure that bits are
sampled correctly and that no race conditions or runt pulses occur. Sometimes a first-in first-out (FIFO)
system is added to lessen the timing constraints between the data load clock and the
reference clock.
The PLL clocks the multiplexer and the multiplexer performs the serialization
function. This operation may require multiple gates, such as a 32-4 multiplexer followed
by a 4-1 multiplexer, or simply a 16-1 multiplexer. Timing at this stage becomes more
critical as the output rate of the multiplexer is at the serial data rate. Often multiple clock
phases or clock frequencies are needed.
The retiming circuit before the line driver re-establishes the transition locations in
order to remove any jitter or noise introduced by the registers and multiplexers [42]. This
circuit is clocked directly by the PLL to be as noiseless as possible. When low output jitter
is the limiting factor in the design, then a retiming circuit is absolutely required.
The retiming circuit, or multiplexer, is often unable to drive the pad and external load
directly, so a line driver is needed [36], [37]. It matches the internal circuitry impedance to
the output impedance and amplifies the signal to a desirable voltage swing if necessary.
Perhaps the most important circuit in the transmitter is the PLL, otherwise known as
the frequency synthesizer or clock multiplier unit (CMU). It generates the internal clock
signals, which may be multi-phase or multi-frequency. It is required to have low phase
noise, low jitter, and low frequency drift in order to generate a similarly low phase noise data
stream. The transmitter PLL, as opposed to the receiver PLL, usually has a very low
bandwidth in conjunction with a low phase noise VCO to generate the cleanest clock signal.
The PLL locks the phase of an internal high speed clock to an externally supplied low
speed reference. In this way the reference is able to dictate the exact frequency at which data is
transmitted. For instance, a 10 Gb/s system may have a 625 MHz reference clock and a 10
GHz internal clock. The PLL must then match the two frequencies after dividing the
internal clock by 16.
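The relationship between the reference and the internal clock in this example can be checked with a few lines of Python. This is only a sketch that assumes an integer-N feedback divider; the helper name `divide_ratio` is hypothetical.

```python
def divide_ratio(vco_hz: float, ref_hz: float) -> int:
    """Integer feedback divide ratio that maps the VCO clock onto the reference.

    Assumes an integer-N loop: the divided VCO clock must equal the reference.
    """
    ratio = vco_hz / ref_hz
    n = round(ratio)
    if abs(ratio - n) > 1e-9:
        raise ValueError("reference is not an integer sub-multiple of the VCO")
    return n


# 10 GHz internal clock locked to a 625 MHz reference (values from the text).
print(divide_ratio(10e9, 625e6))
```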
The PLL consists of three basic components: a phase detector (PD), a loop filter (LF),
and a voltage controlled oscillator (VCO). The PD generates a signal which is a function of
the phase difference between the divided down internal clock and the external reference. In
low speed applications such as this (625 MHz clock versus 10 Gb/s data rate) the PD can
generate an accurate, linear measure of phase difference. The LF typically consists of an
active filter with high DC gain which has a specific bandwidth and a high frequency pole.
With most of the other gains and parameters in the PLL fixed, the LF is the only circuit that
is adjustable to meet the specifications. The VCO accepts a voltage input and generates an
output signal which has a frequency that is a function of the input. Ideally this relationship
is linear which leads to closed-form linear solutions for the PLL.
One of the most important figures of merit for the transmitter is the output data jitter.
Jitter is created inside the VCO and partially filtered out by the PLL. The retiming circuit
and all circuits thereafter add slight jitter to the signal. The transmitter data eye closes
horizontally as more jitter is introduced into the circuit.
2.3. Transport Channel
The channel carries the data from the transmitter to the receiver, and may be
electrical, optical, wireless, or any combination of the three. For long-haul communication
the channel is a significant and sometimes dominant source of phase noise and jitter. For
short-haul communications, however, we assume that the channel is negligible.
2.4. Receiver / Demultiplexer / Clock & Data Recovery
The receiver must extract a clock from a very high frequency serial signal, plagued
with jitter and noise, and use that clock to sample the data. This process is called clock and
data recovery and is made more difficult because transition locations are not guaranteed.
A line amplifier with a specific input impedance amplifies the signal to internal levels
while minimizing the distortion. The amplifier must have a large bandwidth, typically
about 50% higher than the baud rate. Noise injection from this circuit must be minimized
because the data signal is already saturated with jitter. When an optical channel is used, a
photodiode drives the receiver input and a transimpedance amplifier is required.
The receiver has a PLL that is very different from the PLL in the transmitter. First,
the PD must operate at or near the data rate, which requires a simpler circuit and one that
may only provide a non-linear output. The PD must also be able to handle random data that
has random transition locations, if the data is of the NRZ variety. In addition, the key PLL
parameters must be tuned to a signal with high noise content as compared to the PLL in the
transmitter which has a low noise reference as its input. Additional circuitry will be needed
to sample the data using the recovered clock unless the PD does so naturally.
As in the case of the transmitter, a reference clock may be used to bring the receiver VCO
close to the data frequency before clock extraction occurs. This greatly enhances the
operating range of the receiver PLL. The drawback is that two separate PDs and a circuit
that can switch between them are needed. This introduces two loops consisting of common
components which must be able to operate independently.
A common component in dual loop PLLs is a lock detect circuit, which determines whether
phase lock has been lost; if it has, the loop switches back to the external reference loop. This
circuit is useful in a high noise environment where data jitter can cause the PLL to become
unstable. It also allows the software layer to be notified so that the lost data can be resent.
Once a clock has been extracted from the serial signal, and the data captured, the data
can then be demultiplexed through a series of samplers at decreasing clock rates. For
instance, in a 10 Gb/s system the first resampled data would pass through a 1-to-2
demultiplexer driven by a 5 GHz clock. The second stage would consist of two 1-to-2
demultiplexers driven by a 2.5 GHz clock and so on. If a multiphase clock is used, then
multiple samples can be taken with separate samplers. This allows the use of a clock at a
fraction of the data bit rate.
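The staged demultiplexing described above can be tabulated with a short Python sketch. It assumes a binary tree of 1-to-2 demultiplexers in which each stage is clocked at half the bit rate arriving at that stage, as in the 10 Gb/s example; the function name `demux_tree` is illustrative.

```python
def demux_tree(serial_rate_gbps: float, stages: int):
    """List (stage, number of 1-to-2 demuxes, clock in GHz) for a binary tree.

    Assumes every 1-to-2 demultiplexer is clocked at half its input bit rate.
    """
    plan = []
    for stage in range(1, stages + 1):
        n_demux = 2 ** (stage - 1)            # demuxes double at each stage
        clock_ghz = serial_rate_gbps / 2 ** stage
        plan.append((stage, n_demux, clock_ghz))
    return plan


for stage, n_demux, clock_ghz in demux_tree(10.0, 3):
    print(f"stage {stage}: {n_demux} x 1-to-2 demux at {clock_ghz} GHz")
```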
One of the most important parameters in the design of the receiver PLL is its jitter
transfer function. This determines how sensitive the system is to data jitter. The PLL should
be able to track low frequency jitter very well; in this range the jitter transfer function should
be close to 0 dB. At high frequencies the transfer function should drop off in conjunction
with the bandwidth of the loop. Another important parameter is called jitter peaking. This
parameter describes high frequency jitter components such as those from spurious
modulation. This is especially important in SONET repeaters that feed the receiver clock
back into a separate transmitter. A sequence of many repeaters is very sensitive to this
form of jitter.
After the data is fully demultiplexed down to the desired parallel data width it can be
decoded based upon the encoding scheme used in the transmitter. In some cases this also
involves channel framing which lines up transmitter input channel n with receiver output
channel n. Once the data is decoded it may, like the transmitter, be placed in a FIFO to
reduce the timing constraint on the data received clock.
2.5. Internal Testing
Internal testing involves performance verification of the transmitter and receiver
before and after being connected in a complete system. For a chip with both transmitter and
receiver components, this may involve a feedback path across the chip from the output of
the Tx to the input of the Rx. The parallel data from the Tx and Rx can then be compared
to determine the bit error rate (BER).
Additional testing modes may involve additional outputs that show the health of the
system [38]. Outputs may also be duplicated and fed to testing equipment while actual data
is being transmitted.
2.6. Support Circuits
Other circuitry may be needed in the system depending on the application. For
example, if a transmitter and receiver are required to operate at different fixed frequencies,
selectors and special input pins are required. Also, circuits within the chip may not be
needed all the time, and in some cases a power management system can cut off power. This
option reduces overall power consumption but requires additional power-switching
circuits.
3. Current Starving VCO
3.1. Project History
The Current Starving VCO (CS VCO) was used exclusively in the first serdes design,
which was fabricated in February 1999, in the transmitter, the receiver, and in various
oscillator test structures. Its performance was sufficient but the design required some
revision to meet frequency specifications. Deficiencies and unpredictable behavior,
however, resulted in its elimination from all subsequent designs.
The feed forward version of the CS VCO was not intended for use in the transmitter
and receiver design. It was instead designed to push the upper frequency limit in the ring
oscillator design. However, it had the potential for use in future transmitter and receiver
designs in order to double the speed to 40 Gb/s.
3.2. The need for a VCO
PLLs, frequency locked loops (FLL), clock extractors, and frequency synthesizers all
require a voltage controlled oscillator. A VCO creates one or more signals whose
frequency is a function of an external control voltage. In a PLL, or clock extractor, a
DC voltage is generated based upon the difference between the VCO signal and an external
signal. This voltage is then fed back into the VCO to create a stable phase feedback loop.
Frequency synthesizers incorporate frequency dividers to create signals of varying
frequencies based upon the VCO’s fixed frequency.
VCOs for Serdes circuits are usually either an LC (inductor, capacitor) oscillator or
ring oscillator; each having benefits and drawbacks. All VCOs discussed in this section are
four stage ring oscillators which produce eight unique phases when used with differential
logic. The architecture of the receiver and transmitter requires this crucial multiple-phase
characteristic.
3.3. Simple Current Starving VCO
The Simple CS ring oscillator has four stages [39], shown in Fig. 3-1, and is able to
create eight unique phases. The frequency of oscillation is defined by
$$ f = \frac{1}{2 \cdot 4T} \tag{3-1} $$
where T is the delay through the gate. A factor of two is necessary because after a signal
passes through four buffers it has only changed sign, and it requires another trip through all
four to complete one period. The frequency and gain response for this oscillator are shown in Fig. 3-2.
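As a quick numerical check of Eq. 3-1, the sketch below converts between stage delay and oscillation frequency for a ring with a configurable number of stages. The function names are illustrative, and the calculation assumes identical stage delays with no interconnect delay.

```python
def ring_frequency_hz(stage_delay_s: float, stages: int = 4) -> float:
    """Eq. 3-1 generalized to N stages: f = 1 / (2 * N * T)."""
    return 1.0 / (2 * stages * stage_delay_s)


def stage_delay_s(freq_hz: float, stages: int = 4) -> float:
    """Stage delay T required for a target frequency (inverse of Eq. 3-1)."""
    return 1.0 / (2 * stages * freq_hz)


# A four stage ring at 5 GHz needs about 25 ps per stage.
t = stage_delay_s(5e9)
print(f"T = {t * 1e12:.1f} ps, f = {ring_frequency_hz(t) / 1e9:.2f} GHz")
```

With differential logic the eight phases are spaced one stage delay apart, so the same 25 ps figure is also the spacing between successive phase edges at 5 GHz.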
Figure 3-1 Four stage VCO diagram
Frequency control is accomplished through variable delay elements
arranged in a ring with an odd number of inversions. The operating
frequency range is a function of the delay element range and the number
of stages in the ring.
[Block diagram: four delay stages A, B, C, and D, each with delay Τ, connected in a ring and producing phases ΦA, ΦB, ΦC, and ΦD.]
The schematic for the Simple CS stage is a buffer, described in Appendix B.4. on
page 162, with level two emitter followers. The differential circuit current source is
connected to the aVref circuit in order to control its current.
3.4. Basic Operation
Current starving VCOs control their frequency by varying the delay through each
stage of the ring. Each stage has a differential amplifier with one or many adjustable current
sources at the bottom of the tree. In this way, the stage is able to increase its delay with a
decrease in current. This effect is primarily a result of less current causing a decrease in
the fT of the transistor, as shown in Appendix A.2. on page 158. Even though the smaller
current has less capacitor charging ability, the associated smaller voltage swing compensates,
so these two effects produce no net change in delay.
Figure 3-2 Current Starving VCO frequency and gain response
The CS VCO’s usable frequency range is between a control voltage of -1.5 V to -1.0 V or higher. The lower range is limited by the small voltage
swing on the output. These simulation results were obtained with one
minimally sized buffer on each stage’s output. Interconnect parasitics
were not included.
[Plot: frequency response (GHz, left axis, 4.50 to 6.25) and gain (GHz/V, right axis, 0.0 to 3.5) versus control voltage (V, -1.8 to 0.0).]
Even though current starving is a simple technique for controlling delay, it has
numerous disadvantages. The first obvious problem is that at the limits of operation and
control voltage, undesirable conditions occur. At the minimum extreme, the current can be
decreased to the point that sustained oscillations can no longer occur, because the voltage
swing decreases and the gain drops below one. At the maximum, the transistor fT begins to
drop off the opposite side of the fT curve and the transistors begin to slow. This is
potentially disastrous when used in a phase lock loop because the VCO gain has gone
negative and the loop will become unstable.
Another problem is that the delay as a function of current is non-linear in nature.
Fig. 3-2 shows the basic frequency response for the Simple CS VCO excluding
interconnect parasitic effects. The gain varies from 3.0 GHz/V to 0.5 GHz/V along the
curve and is never constant. A non-linear gain makes phase locked loops difficult to design.
The output voltage swing is also a concern because as the current increases, the voltage
swing across the pull-up resistors also increases. This alters the load driving ability, and
creates a situation which is difficult to model analytically.
Another problem is that the single-ended nature of the control voltage does not
possess the common-mode noise immunity that is inherent in differential wiring. When
phase noise is a dominant design factor this architecture can be quite limiting.
There are benefits to this style of ring VCO, including its simplicity and a large tuning
range. The layout footprint is also quite small which minimizes interconnect delays.
3.4.1. Adjustable Voltage Reference
Figure 3-3 Adjustable Voltage Reference
The input voltage controls the total current through this circuit. In turn
this current is mirrored to all connected sources.
[Schematic labels: Vctrl, R1, Ir, aVref, Re, Vee.]
The active current sources in the CS stages are “mirrored” to a circuit that can vary
its current as a function of a single-ended input voltage, as depicted in Fig. 3-3. The current
through the reference circuit and its derivative with respect to the control input are defined
by the following equations:
$$ I_r = \frac{V_{ee} + V_{ctrl} - 3V_{be}}{R_1 + R_e} \tag{3-2} $$
$$ \frac{dI_r}{dV_{ctrl}} = \frac{1}{R_1 + R_e} \tag{3-3} $$
The emitter resistor, Re, is matched to the current sources’ emitter resistors so that the same
voltage exists across both. R1 determines the current gain of the circuit and the value is
selected based upon the input voltage swing, and the required output current swing. An
additional diode is added to decrease the voltage drop across R1 allowing a smaller resistor
size.
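The behavior of Eqs. 3-2 and 3-3 can be illustrated numerically. The sketch below is not taken from the design: the supply magnitude and resistor values are hypothetical placeholders, V_ee is assumed to enter Eq. 3-2 as a magnitude, and V_be is taken as 0.8 V only because the text associates the -0.8 V and -1.6 V control limits with one and two V_be drops.

```python
def reference_current(v_ctrl, v_ee_mag, r1, re, v_be=0.8):
    """Eq. 3-2: I_r = (V_ee + V_ctrl - 3*V_be) / (R_1 + R_e), in amperes."""
    return (v_ee_mag + v_ctrl - 3.0 * v_be) / (r1 + re)


def reference_gain(r1, re):
    """Eq. 3-3: dI_r/dV_ctrl = 1 / (R_1 + R_e), in A/V."""
    return 1.0 / (r1 + re)


# Hypothetical values, chosen only for illustration (not from the thesis).
V_EE_MAG = 5.0            # supply magnitude in volts
R1, RE = 1000.0, 200.0    # resistances in ohms

for v_ctrl in (-1.6, -1.2, -0.8):
    i_r = reference_current(v_ctrl, V_EE_MAG, R1, RE)
    print(f"Vctrl = {v_ctrl:+.1f} V -> Ir = {i_r * 1e3:.3f} mA")
print(f"gain = {reference_gain(R1, RE) * 1e3:.3f} mA/V")
```

Because the gain depends only on R1 + Re, a smaller R1 gives a larger current swing for the same control voltage swing, which is the trade-off the paragraph above describes.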
A common approach to designing a current mirror is to include base-current
compensation through a transistor located on the output (see Appendix B.3. on page 161).
This allows the current reference to drive more loads and lessen the current degradation
when more loads are added. The problem with this approach is that it limits the frequency
response of the circuit. For this reason it was not included in the design. The current driving
capability of the circuit without base-current compensation should be sufficient to drive a
single VCO with an equivalent of 8 µm of loading.
3.4.2. Final Implementation
The development of the transmitter and receiver played a defining role in the design
of this VCO. To meet a goal of 20 Gb/s with a quarter-rate architecture, a VCO centered at
5 GHz was needed. A control voltage range from -0.8 V to -1.6 V was chosen because of
the solid transfer characteristics, and because those limits correspond to one and two Vbe
drops. At the center of the control range a frequency of 5.75 GHz was achieved,
corresponding to a 15% safety margin.1
1. This safety margin was built in because parasitic simulations were not done prior to fabrication. It was
felt that a greater than 10% margin would adequately account for interconnect effects.
Symmetry was the leading motivation behind the layout of the Simple CS VCO
shown in Fig. 3-4. The four stages were laid out in a square with the inputs and outputs
facing the center. In this way the interconnect between stages could be limited to a small
region in the center of the design. Power and ground rails, as well as the two reference rails
(aVref, Vref), were placed in closed concentric LM rings around the top.
Figure 3-4 Layout of Simple CS VCO
Shown above is the layout for the Simple CS VCO. All inputs and
outputs face inward to minimize the effects of interconnect parasitics.
Symmetry was the most important design requirement.
[Layout image; dimension label: 102 µm.]
In addition to the CS VCOs in the transmitter and receiver, a separate test chip containing
CS VCOs was also made. This allowed a more straightforward measurement of the VCO’s
frequency and gain characteristics. This test chip also included an XOR phase multiplier
[3],[4],[20] tree in order to achieve frequencies double and quadruple the nominal 5 GHz.
The goal of the multipliers was only to see how high the technology could be pushed.
3.4.3. Testing Results
The plot in Fig. 3-5 shows the results from an ideal interconnect simulation, a
simulation with capacitive1 interconnect, and measured results from the fabricated circuits.
The 20% decrease in speed between the ideal simulation and the measured results is
immediately obvious. Unfortunately this was larger than the 15% safety margin and
resulted in a frequency range that did not meet the 5 GHz center frequency specification.
1. The IBM 1999B SiGe design kit does not include interconnect resistances correctly and typically simulates with a faster response than with capacitance only. Resistance values are also very small and can be
ignored for these localized wires. For these reasons, only capacitance was included.
Between a control voltage of -1.6 V and -1.4 V the measured VCO tracked very closely
to expectations, but above -1.4 V the VCO response becomes lethargic. This is likely due to
too much current in the tree, which causes a reduction in fT faster than the model
predicts.
Figure 3-5 Test data from Simple CS VCO
Simulation with and without interconnect parasitics, and measured
results are shown in this plot. Measured results track closely with the
parasitic simulation with low control voltages.
[Plot: frequency (GHz, 3.5 to 6.5) versus control voltage (V, -1.8 to 0.0); curves: Simulated, Parasitics, Measured.]
3.4.4. Optimization of Simple CS VCO (post-fabrication)
From Fig. 3-5 it is clear that the oscillator underperformed and missed the 5 GHz
target. This can be directly attributed to initial simulations that did not include resistive and
capacitive interconnect parasitics. Although the layout footprint of the VCO is very small
and designed to minimize wire lengths, parasitics still presented a significant influence on
speed.
The receiver VCO has a frequency range of 4.25 GHz to almost 4.9 GHz. Because
20 Gb/s is the target data rate, we would like 5 GHz to fall in the middle of the operating
range of both transmit and receive VCOs. Given that the initial design was slow, how can it
be ensured that the next version will meet specifications? Can the measured and simulated
results be used to maximize the likelihood of a successful design?
Each of the four VCO stages must be loaded by an identical buffer which then drives
subsequent circuitry. By using the smallest transistors, 1 µm, in the buffers, the loading on
the VCO will be minimized and its operating frequency will be maximized. Under such conditions the
easiest method for increasing frequency response is to increase the power of the delay
elements by using larger transistors. This has the immediate effect of reducing the effective
loading on each gate and increasing the frequency at a given control voltage. The devices
in the first design iteration had 2 µm emitter lengths and were slightly slow, so an increase
in emitter length should bring the VCO to within specifications. Fig. 3-6 shows the
relationship between frequency response and transistor size used in the delay stages of the
VCO. Because interconnect parasitic simulations require a complete layout, this simulation
uses ideal interconnects. As suspected, there is an increase in performance when larger
devices are used.
Figure 3-6 Frequency Response versus emitter length in delay elements
By increasing the emitter lengths and keeping the loading the same, the
effective loading is decreased and the performance improves. This
simulation does not include interconnect parasitics.
[Plot: frequency (GHz, 4 to 10) versus control signal (V, -1.8 to -1.0); curves for emitter lengths of 2u, 2.5u, 3u, 4u, 6u, and 10u.]
It can be seen that a relatively small increase in transistor size from 2 µm to 2.5 µm
achieves a 12% increase in speed at a control voltage of -1.5 V. The 2 µm and 2.5 µm delay
elements have an effective loading of 0.5 µm/µm and 0.4 µm/µm respectively, representing
a 20% decrease. Assuming that the interconnect parasitic effects stay the same or
decrease, the 2.5 µm delay elements should bring about a 12% increase in the VCO
response. From a range of 4.25 GHz to 4.9 GHz a 12% improvement yields a range of 4.76
GHz to 5.48 GHz, which is well within the specifications.
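The arithmetic in this paragraph can be reproduced with the short sketch below. It assumes, as the text does, that the simulated 12% speed-up applies uniformly across the measured range; the function names are illustrative.

```python
def effective_loading(load_um: float, driver_um: float) -> float:
    """Load emitter length per unit of driving emitter length (µm/µm)."""
    return load_um / driver_um


def scale_range(low_ghz: float, high_ghz: float, gain: float):
    """Apply a uniform fractional speed-up to a measured frequency range."""
    return low_ghz * (1.0 + gain), high_ghz * (1.0 + gain)


# 1 µm buffer load on 2 µm and 2.5 µm delay elements (values from the text).
l_2um = effective_loading(1.0, 2.0)
l_25um = effective_loading(1.0, 2.5)
print(f"loading: {l_2um:.2f} -> {l_25um:.2f} µm/µm "
      f"({(1 - l_25um / l_2um) * 100:.0f}% decrease)")

# Measured receiver VCO range scaled by the simulated 12% improvement.
low, high = scale_range(4.25, 4.9, 0.12)
print(f"projected range: {low:.2f} GHz to {high:.3f} GHz")
print("5 GHz inside projected range:", low < 5.0 < high)
```

The upper limit evaluates to 5.488 GHz, which the text quotes as 5.48 GHz; either way the 5 GHz target falls inside the projected range.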
3.5. Current Starving with Feed Forwarding
Some advantages of the four phase simple VCO circuit include: symmetric phases
minimizing phase differences, generation of rising edges every 25 ps at 5 GHz, and a large
frequency range. The motivation for a new VCO design is to enhance the frequency beyond
the limits of this simple design.
One method to do this is to use a delay cell that averages the signals from the last two
stages as shown in Fig. 3-7 [1],[13],[23], [24]. Stage C accepts inputs from stage B and
stage A, stage D accepts from C and B, and so on. The idea is that the average of the
previous two signals occurs earlier than just the previous signal.
Figure 3-7 Feed-forward CS VCO block diagram
Each stage in the VCO receives signals from the previous stage and the
stage preceding that one. Stage A can realize an effective decrease in
delay by utilizing the signal from stage C. The inversions to induce
oscillations are left out for clarity.
[Block diagram: stages A, B, C, and D with phases ΦA through ΦD; a timing inset marks the averaged signal (ΦA+ΦB)/2, the stage delay, and the delay savings.]
Mathematically, the nth element presents its output after the average of the (n-1)th and
(n-2)th element output times plus the delay of the nth element. Solving for the difference
between two consecutive stages yields
$$ t_n = \frac{t_{n-1} + t_{n-2}}{2} + T_i \tag{3-4} $$
$$ t_n - t_{n-1} = \frac{2}{3} T_i \tag{3-5} $$
which shows that the effective gate delay is reduced to two thirds of the intrinsic stage
delay, Ti. The intrinsic delay is defined as the delay of the stage if its inputs were tied
together and treated as a normal buffer.
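The step from Eq. 3-4 to Eq. 3-5 can be checked numerically by iterating the recurrence until the spacing between consecutive output times settles. The sketch below is an idealized timing model only: it treats the averaging as exact and ignores the inversions and any analog effects, and the function name is illustrative.

```python
def feed_forward_stage_delay(intrinsic_delay: float, steps: int = 60) -> float:
    """Iterate t_n = (t_{n-1} + t_{n-2}) / 2 + T_i and return t_n - t_{n-1}.

    The starting times are arbitrary; the stage-to-stage spacing converges
    to (2/3) * T_i because each step halves (and flips) the remaining error.
    """
    t_prev2, t_prev1 = 0.0, intrinsic_delay
    for _ in range(steps):
        t_next = (t_prev1 + t_prev2) / 2.0 + intrinsic_delay
        t_prev2, t_prev1 = t_prev1, t_next
    return t_prev1 - t_prev2


ratio = feed_forward_stage_delay(1.0)
print(f"effective delay / intrinsic delay = {ratio:.6f}")
```

Substituting a constant spacing d into Eq. 3-4 gives the same result analytically: d = T_i - d/2, so d = (2/3) T_i.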
Figure 3-8 Feed forward CS VCO frequency response and gain
The Feed Forward CS VCO was designed to achieve the highest
frequency possible. After optimization it operates at twice the speed of
the Simple CS VCO.
[Plot: frequency (GHz, left axis, 9.0 to 12.5) and gain (GHz/V, right axis, -1 to 6) versus control voltage (V, -1.8 to 0).]
3.5.1. Final Implementation
An important consideration in the design of the feed-forward delay element is its
higher complexity, having two inputs instead of one, which increases the delay. Also,
because there are twice as many wires between stages in the feed-forward design the layout
will be larger and more limited by interconnect parasitics. With this in mind, the
simplest averaging circuit that used a minimum number of additional
transistors and resistors was created. The final schematic is shown in Fig. 3-9.
A description of its operation is as follows: If Q2 and Q4 are on, Q1 and Q3 are off, and
signal b arrives first, then signal b will begin to turn Q3 on and Q4 off. This will start to draw
current through Rc1. If b were to completely switch then both Rc1 and Rc2 would carry the
same current: an undesirable condition in which the output is the average of a one and a
zero, which is undefined. The normal operating condition involves b partially switched
followed by the beginning of a switch in the a signal. When this occurs more current flows
through Rc1 and less current through Rc2. The effective switching input can be said to occur
between the two signals, a and b.
Figure 3-9 Feed-forward CS Delay Element
This circuit operates by averaging the a and b inputs through common
pull-up resistors. The aVref node is varied in order to control the total
current through the tree. Lower current corresponds to longer delay.
[Schematic labels: Rc1, Rc2, outputs z21 and z20, inputs a10, a11, b10, and b11, transistors Q1 through Q4, aVref, Vref.]
One important characteristic in the two current starving VCO circuits is the choice of
collector resistors which affects the output amplitude and the gate delay. An increase in
resistance causes an increase in amplitude and an increase in delay because the same
amount of current produces a larger voltage swing and a larger RC time delay. The simple
CS VCO was designed around an operating frequency of 5 GHz, so a resistance was chosen
so that there was a 200 mV - 400 mV swing around 5 GHz. The feed-forward CS VCO, on
the other hand, was designed to achieve the highest possible frequency response, so a
resistor small enough to maximize the frequency while leaving a 150 mV - 200 mV swing
was used. Fig. 3-8 shows the frequency response of the feed-forward CS VCO.
3.5.2. Testing results
The feed-forward CS VCO was not used in the first transmitter and receiver design
but was implemented in a test chip. It was configured with one load to achieve the smallest
loading effect and thus the highest frequency. The simulation and measured results are
plotted in Fig. 3-10.
Figure 3-10 Testing Data from feed-forward CS VCO
The implementation of the Feed Forward Current Starving VCO only
had a single load in order to achieve the highest frequency possible. The
measured results are only about 4% lower than simulations with
interconnect included.
[Plot: frequency (GHz, 8.0 to 12.5) versus control voltage (V, -1.8 to 0.0); curves: 1 Load Simulated, 4 Loads Simulated, 1 Load w/ Parasitics, 1 Load Measured.]
Simulations with one load and no parasitics show a peak frequency of 12 GHz. With
parasitics the frequency drops by 6% to 11 GHz which tracks very closely with the
measured results. The steep drop off of the measured results at the high end is likely due to
a high collector current causing a drop off in the transistor fT that is not accurately
accounted for in the models1.
1. This is supported by information gathered at a meeting at IBM in 1999 concerning measured results from
the DARPA 2 run. An IBM device modeler was quoted as saying that the fT curves drop off faster than the
models predict.
3.6. Conclusions and Future Work
The Current Starving VCOs presented in this section are compact and easy to
implement but they have some crucial deficiencies. Their performance was about 5% worse
than expected from simulations with interconnect parasitics. Feed forwarding allowed a
near doubling of speed at the expense of a slightly more complicated circuit. If
implemented correctly this additional speed could be traded off for a reduction of noise.
With an increase in power supplied to the VCO that was implemented, the desired
specifications should be achieved. However, future research into this VCO topology should
be limited because its response is difficult to model and it utilizes a delay strategy which is
poorly understood.
4. Feed Forward Interpolated VCO
4.1. Project History
The Feed Forward Interpolated VCO evolved from the Current Starving Feed
Forward VCO and replaced all instances of that VCO in the second serdes chip, submitted
in March 2000. Additional test structures were added to further exercise this VCO, and an
invention disclosure record was submitted to RPI in May, 2000. An RPI provisional patent
was awarded in September 2000.
4.2. The Evolution
The evolution of the Feed Forward Interpolated VCO (FFI VCO) began with the
Feed Forward Current Starving VCO (FFCS VCO) discussed in Chapter 3. Each stage of
the FFCS VCO averaged the output from the previous stage and the stage before that to
generate a signal with a smaller effective delay. The averaging was fixed and reduced the
delay to about 66% of the intrinsic stage delay (Eq. 3-5).
A common approach in the design of a standard ring oscillator stage without feed
forwarding is to use delay interpolation as shown in Fig. 4-1. The idea is to split the input
signal into a slow and fast path and create a weighted sum of the two to form the output.
Common pull-up resistors, level 3 control inputs, and emitter resistors for linearity make
this possible. The slow path need only delay the signal longer than the fast path and a simple
capacitor can do the trick. The benefits of this VCO stage include a uniform output voltage
swing, a fairly linear response, no limits of operation, and easy minimum frequency control
through the capacitor.
Figure 4-1 Schematic for Delay Interpolated VCO element
This VCO element linearly interpolates the input signal after traveling
through a fast and slow path. The slow path is created with the addition
of a capacitor.
The idea for the FFI VCO came from looking at the Delay Interpolated VCO
and realizing that the fast path could be implemented as the signal from the stage before
the previous stage and the slow path could be taken from the previous stage. This insight
immediately eliminated the need for the slow path capacitor, and nearly doubled the speed
of the VCO.
The FFI VCO is a delay interpolated VCO with the normal and delayed signals
created from different stages rather than from within each stage. This forces each stage to
have two inputs rather than one and eliminates the need for the slow path capacitor. The
schematic for the FFI stage can be found in Fig. 4-7 on page 44.
4.3. Basic Operation
On a block diagram level, the FFI VCO looks identical to the Feed Forward Current
Starving VCO shown in Fig. 4-2. The difference is in the method used to control the delay
through each stage. The FFCS VCO controls delay by varying the current through its buffer,
which is directly related to the delay through its gate. The feed forward technique simply
reduces the effective gate delay by about 33%. The FFI VCO, on the other hand, linearly
interpolates the signals received from the previous two stages. The current, which remains
the same through the tree, is gradually shifted between the two inputs, p and l, as shown in
Fig. 4-7. The p (previous) input arrives from one stage back, and the l (leap) input arrives
from the stage prior to that. The two signals are weighted by the control signal and summed
by the common pull-up resistors. The final result is the frequency response shown in Fig.
4-4.
Figure 4-2 Feed Forward VCO block diagram
Each stage in the VCO receives signals from the previous stage and the
stage preceding that one. Stage A can realize an effective decrease in
delay by utilizing the signal from stage C. (The inversions, to induce
oscillations, are left out for clarity.)
Figure 4-3 FFI VCO under boundary conditions
Diagram (a) shows the VCO running in the four stage configuration
with the control voltage set to a minimum value. Diagram (b) shows the
VCO in the two stage configuration, at the maximum control voltage.
The minimum operating frequency is defined by the oscillation of the system when
the leap signal is ignored, and only the previous signal is used. In this case, the system is
running as a four stage oscillator and has a frequency of about 3.9 GHz. When the control
voltage is switched in the other direction, the leap signal is used, and the previous stage’s
output is ignored. In this configuration the system is running as two separate two stage ring
oscillators with a frequency of approximately 7.9 GHz. These two cases are depicted in Fig.
4-3. It is useful to look at the system in terms of an effective delay for all control voltages
between the minimum and maximum values.
Figure 4-4 Feed-forward interpolated simulated response
(Plot of frequency in GHz versus control voltage in V, from -0.2 V to 0.2 V.)
The frequency response of the FFI VCO is linear across a large range
from 4.75 GHz to 7.00 GHz. System gain is flat across the operating
range.
The effective delay of a stage is defined to be the delay of a stage in a four stage
oscillator that has the same frequency as the feed forward oscillator. This parameter can be
found by setting the intrinsic delay of a stage to T, setting s equal to the weighting factor
between 0 and 1, and looking at the output transition times of stages n, n-1, and n-2. The
weighting factor is a constant that indicates how much of the leap signal is being used. Set
to 0 the ring acts as a normal 4 stage oscillator, and set to 1 the ring acts as a 2 stage
oscillator.
The edge time of stage n is given by
\[ t_n = T + s\,t_{n-2} + (1 - s)\,t_{n-1} \tag{4-1} \]
which is the intrinsic delay through the stage, plus the weighted sum of the previous two
stages. Solving for the time difference between two stages yields
\[ T_{\mathrm{eff}} = t_n - t_{n-1} = \frac{T}{1 + s} \tag{4-2} \]
\[ f_{\mathrm{vco}} = \frac{1}{8\,T_{\mathrm{eff}}} = \frac{1 + s}{8T} \tag{4-3} \]
which is the effective delay and the frequency of the VCO in terms of the effective and
intrinsic delay of each stage. The factor of eight is needed because it takes two complete
cycles through four stages to equal one period of the VCO.
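The step from (4-1) to (4-2) can be sketched as follows, assuming the ring has settled so that consecutive edges are separated by a constant interval. The symbol \Delta for that interval is introduced only for this illustration; substituting t_{n-1} = t_n - \Delta and t_{n-2} = t_n - 2\Delta into (4-1) gives:

```latex
\begin{align*}
t_n &= T + s\,(t_n - 2\Delta) + (1 - s)\,(t_n - \Delta) \\
    &= T + t_n - (1 + s)\,\Delta \\
\Rightarrow\quad \Delta &= \frac{T}{1 + s} = T_{\mathrm{eff}}
\end{align*}
```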
For s equal to 0, the effective delay is equal to the intrinsic delay of the stage. At the
other extreme, when s equals 1, the effective delay is one half of the intrinsic delay. This
makes sense because the system in this configuration has two stages rather than four. (4-3)
also shows that the Feed Forward CS VCO, where s is fixed at 0.5, has an effective delay
equal to (2/3)T, as was shown previously.
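As a rough numerical check of the limiting cases discussed above, the edge-time recurrence (4-1) can be iterated directly and compared with T/(1 + s). This is only a sketch: the function names, the starting edge times, and the normalized delay T = 1 are assumptions made for illustration and do not come from the simulations in this chapter.

```python
def effective_delay(T, s, n_edges=400):
    """Iterate t_n = T + s*t_{n-2} + (1-s)*t_{n-1}; return the settled edge spacing."""
    t_prev2, t_prev1 = 0.0, T  # two arbitrary starting edges (assumed)
    for _ in range(n_edges):
        t_now = T + s * t_prev2 + (1.0 - s) * t_prev1
        t_prev2, t_prev1 = t_prev1, t_now
    return t_prev1 - t_prev2


def vco_frequency(T, s):
    """Equation (4-3): f = (1 + s) / (8 T)."""
    return (1.0 + s) / (8.0 * T)


if __name__ == "__main__":
    T = 1.0  # normalized intrinsic stage delay (assumed)
    for s in (0.0, 0.25, 0.5, 0.75):
        # The iterated spacing should agree with the closed form T / (1 + s).
        print(s, effective_delay(T, s), T / (1.0 + s))
```

The case s = 1 is left out of the loop on purpose: with these starting edges the spacing alternates instead of settling, which corresponds to the stage decoupling behavior treated in Section 4.4.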
The benefits of the FFI VCO are numerous and represent many improvements over
the previously discussed designs. The use of feed forward techniques allows the VCO to
exceed the maximum frequency achievable by a simple four stage ring oscillator. This is
extremely important if a solid high speed eight phase VCO is required.
Fig. 4-4 shows a linear frequency response for control voltages from -0.2 V to 0.2 V. This linear range is very
important when designing phase locked loops, because linearity results in simple closed
form solutions. In addition, this VCO has a response with an obvious center and with limits
approaching an asymptotic minimum or maximum. In contrast, the CS VCO will stop
operating below a certain frequency. Although a control voltage would never be driven to
such extreme values as to cause malfunction, this can happen in PLLs during power up.
Often an integrator, or capacitor that is never guaranteed to have a specific voltage, will be
attached to the VCO control inputs. If it has a poor initial condition, which is maintained
by a non-oscillating VCO, then the system will become unstable. It is therefore important
to provide the largest control voltage range possible that will still allow the VCO to
oscillate.
Current through the FFI stage is linearly switched between the previous and feed
forward stages. This forces the total current running through the stage to remain constant.
This is important for keeping a constant voltage swing, which ensures consistent operation
in a system where a variation in voltage swing would cause a change in frequency. The
SNR is also dependent on the output voltage swing, which if varying, can complicate the
analysis. This is the problem encountered with the CS VCO described in Chapter 3.
Differential signaling is used for the control input and throughout the rest of this
design. This is crucial when designing for low noise operations since differential wires have
strong common-mode rejection.
One exciting feature of the FFI VCO, which will be examined in detail in the next
section, is the extraordinary capacity for customization of this circuit. First, by controlling
the linearity through emitter resistors, different frequency gains can be used (Fig. 4-8).
Second, a capacitor at the top of the tree controls the center frequency point (Fig. 4-9).
Third, resistors exist to limit the frequency range and prevent stage decoupling (Fig. 4-10).
One minor drawback to this design is the slightly larger layout footprint. The cascode
amplifiers introduce four additional transistors, and if a large capacitor is necessary then a
large amount of space may be required.
4.4. Stage Decoupling
A serious problem exists in the FFI VCO if the weighting factor is pushed to the
maximum value of 1. In this case, each stage, n, is only using the signal from the n-2nd stage
as depicted in Fig. 4-3(b). The VCO now appears and operates as two completely
independent oscillators. The phase difference between each consecutive stage is no longer
constant and may fluctuate wildly. This undesirable effect is called stage decoupling and
must be addressed in VCO design.
The model used to analyze this situation uses an ideal FFI VCO in which one stage
has a different delay. This modified delay represents the sum of maximum individual delay
excursions that may exist in the real VCO due to unbalanced loading effects, process
variations, and signal noise. The stage transfer functions are
\[ a_n = T + s\,c_{n-1} + (1 - s)\,d_{n-1} + N \tag{4-4} \]
\[ b_n = T + s\,d_{n-1} + (1 - s)\,a_n \tag{4-5} \]
\[ c_n = T + s\,a_n + (1 - s)\,b_n \tag{4-6} \]
\[ d_n = T + s\,b_n + (1 - s)\,c_n \tag{4-7} \]
with stage a receiving the additional delay of N. The time at an output change for each stage
is represented by a letter and a subscript where the letter is the stage and the subscript is the
nth output change from that stage. The output edges appear in the time order
\[ \{ a_0, b_0, c_0, d_0, a_1, \ldots, d_1, \ldots, d_n, \ldots \}. \tag{4-8} \]
The next step is to look at the time between successive outputs from any one stage,
\[ a_{n+1} - a_n = \frac{4T + N}{s + 1} \tag{4-9} \]
which is simply the sum of the effective delays of the four stages. (4-9) is the same for all
stages, even though N only occurs in stage a, under the condition that stage decoupling has
not occurred. Solving for the time difference between the outputs of consecutive stages,
using (4-4) through (4-9), yields
\[ a_n - d_{n-1} = \frac{T}{s + 1} + \frac{N}{1 - s^4} \tag{4-10} \]
\[ b_n - a_n = \frac{T}{s + 1} - \frac{s N}{1 - s^4} \tag{4-11} \]
\[ c_n - b_n = \frac{T}{s + 1} + \frac{s^2 N}{1 - s^4} \tag{4-12} \]
\[ d_n - c_n = \frac{T}{s + 1} - \frac{s^3 N}{1 - s^4} \tag{4-13} \]
which are the desired solutions.
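The closed forms (4-10) through (4-13) can be cross-checked numerically. The sketch below assumes steady state, so that subtracting neighboring edge times in (4-4) through (4-7) leaves four coupled equations in the inter-stage spacings, and solves them by simple fixed-point iteration. The function names and the normalized test values are illustrative assumptions only.

```python
def interstage_delays(T, N, s, iterations=2000):
    """Fixed-point solution of the steady-state form of (4-4) through (4-7).

    Returns (ad, ba, cb, dc): the spacings a_n - d_{n-1}, b_n - a_n,
    c_n - b_n and d_n - c_n.
    """
    ad = ba = cb = dc = T  # initial guess (assumed)
    for _ in range(iterations):
        ad = T + N - s * dc  # from (4-4) minus d_{n-1}
        ba = T - s * ad      # from (4-5) minus a_n
        cb = T - s * ba      # from (4-6) minus b_n
        dc = T - s * cb      # from (4-7) minus c_n
    return ad, ba, cb, dc


def closed_form(T, N, s):
    """Equations (4-10) through (4-13), in the same order as above."""
    base = T / (s + 1.0)
    k = N / (1.0 - s ** 4)
    return base + k, base - s * k, base + s ** 2 * k, base - s ** 3 * k


if __name__ == "__main__":
    T, N = 1.0, 0.1  # normalized delay and imbalance (assumed values)
    for s in (0.2, 0.5, 0.8):
        print(s, interstage_delays(T, N, s), closed_form(T, N, s))
```

The iteration only converges for s below 1, which matches the condition in the text that stage decoupling has not occurred.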
Figure 4-5 Delay versus weighting factor with single stage imbalance
(Plot of normalized effective delay versus weighting factor for the inter-stage delays ad, cb, ba, and dc, together with the N = 0 curve.)
With non-ideal delay stages used in the FFI VCO, stage decoupling
(effective delay goes to zero) can occur when the weighting factor is too
high. This is because the VCO acts as two independent 2 stage
oscillators instead of one 4 stage oscillator.
These equations are in the form of the effective delay plus a factor for the unbalanced
delay N. The delay between stages c and b; and between a and d increases rapidly as s
approaches 1, and the delay between stages d and c; and between b and a decreases rapidly
under the same condition. This divergence is expected because the sum of the four delays
follows very closely with the effective delay curve when there is no unbalanced delay. This
effect is plotted in Fig. 4-5. Also shown is the curve for all inter-stage delays when no extra
delay is introduced. The divergence between the nominal curve and each of the unbalanced
curves can be clearly seen.
Each stage is affected by the additional delay, but when analyzing stage decoupling
it is only necessary to look at bn - an. The delay ba is the most seriously affected of all the
delays because it is relative to the output of the stage with the additional delay included.
The condition when stage decoupling occurs is when ba goes to 0 and the output of stage
b coincides with the output of stage a. Although the equations are continuous at this point,
reasonable operation dictates that stage output times should be sequential.
Figure 4-6 Decoupling versus delay injection
(Plot of the weighting factor s at which stage decoupling occurs versus normalized additional delay N/T.)
When an unbalanced delay is injected into a single stage, decoupling
between stages occurs when the weighting factor reaches a specific
value.
Setting ba equal to 0 in (4-11) and solving for s yields the weighting factor for stage
decoupling for a specific value of N. This solution is shown in Fig. 4-6. As the injected
delay increases, the point at which stage decoupling occurs departs from the maximum
value of 1.
The effect of stage decoupling is clearly a problem and results in a VCO that operates
improperly. To avoid this problem, the weighting factor must be limited to a value less than
that given in Fig. 4-6, based upon a maximum expected delay injection from noise sources
and parameter variations. For example, if a maximum 10% deviation is expected
(extremely large value), then s must be kept below approximately 0.95. In practice this
VCO has a very large operating range which can be sacrificed to prevent stage decoupling.1
1. For the final implementation of this system s was kept below 0.8 to introduce a huge safety margin in
which no decoupling will occur.
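The decoupling limit can also be estimated numerically. Setting b_n - a_n = 0 in (4-11) and rearranging gives N/T = (1 - s)(1 + s^2)/s, which falls steadily as s goes from 0 to 1, so a bisection search recovers s for a given N/T. The sketch below is a minimal illustration under that rearrangement; the function names and tolerance are assumptions, not part of the original analysis.

```python
def normalized_delay_at_decoupling(s):
    """N/T at which b_n - a_n in (4-11) reaches zero."""
    return (1.0 - s) * (1.0 + s * s) / s


def decoupling_weight(n_over_t, tol=1e-12):
    """Bisection for the weighting factor s at which stage decoupling starts."""
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normalized_delay_at_decoupling(mid) > n_over_t:
            lo = mid  # stages still ordered here; the limit lies at a larger s
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for n_over_t in (0.05, 0.1, 0.5, 1.0):
        print(n_over_t, decoupling_weight(n_over_t))
```

For a 10% delay deviation this search returns a value close to the 0.95 limit quoted above.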
4.5. Circuit Implementation and Analysis
Figure 4-7 Schematic for FFI VCO element
This VCO element linearly interpolates, through the control voltage (c),
the signals from the previous buffer (p) and the buffer previous to that
(l). Rb limits the operating range of the VCO, Re adjusts the control
voltage range, and Cc defines the center of the operating range.
The circuit shown in Fig. 4-7 represents one element of the FFI VCO. It is a three
input pseudo-buffer, with emitter follower outputs. The control signal, c, is common
between all stages and must be on level 3. The input l (leap) and p (previous) signals are on
level 2 which is matched to the output level. Collector resistors, Rc, are set to generate a
250 mV voltage swing. The current sources were chosen to maximize the fT of all
transistors.
Transistor sizing is a very important parameter when designing such circuits and
further details are shown in Appendix D on page 173. Each stage in this VCO
drives two identical stages and the external circuitry, which typically consists of four
minimally sized buffers. For a VCO stage with x µm sized transistors, the external buffer
appears as a 1/x effective load, and is 1/(2x+1) the total load driven per stage. If 1 µm
transistors are used, the buffer becomes 33% of the load. If, however, 10 µm transistors are
used then the buffer becomes a nominal 4.8% of the load. So for larger VCO stages, the
external buffer becomes less visible, but the stage uses more power and physical space. A
compromise using 4 µm transistors per gate was chosen, for which the external load is 11%
of the total.
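The loading trade-off follows directly from the 1/(2x + 1) relation stated above, and a few lines are enough to tabulate it. This is a small sketch; the function name and the list of sizes are illustrative assumptions.

```python
def external_load_fraction(x_um):
    """Fraction of a stage's total load due to the external buffer, 1 / (2x + 1)."""
    return 1.0 / (2.0 * x_um + 1.0)


if __name__ == "__main__":
    for x_um in (1, 2, 4, 8):
        # Larger stage transistors make the external buffer a smaller share of the load.
        print(f"{x_um} um: {100.0 * external_load_fraction(x_um):.1f}% of total load")
```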
Another design challenge, for maximizing frequency response, is to size the
differential amplifier transistors independently of the emitter follower transistors. Please
see Appendix D on page 173 for a detailed analysis. This approach was not
deemed necessary because design specifications of 5 and 10 GHz were easily met without
optimization.
4.5.1. Cascode amplifiers
Above the level 2 differential amplifiers are cascode, or common base amplifiers.
They provide a low input load resistance to the common emitter differential amplifier and
act as an impedance transformer. Some delay is introduced by their presence but this is offset
by an increase in driving ability and an isolation from the capacitor, Cc. This isolation helps
to ensure a linear relationship between the increase in Cc and the increase in delay. The
cascode amplifiers also help to reduce phase noise by providing a low impedance output
which limits the effect noise has on the phase.
4.5.2. Emitter Resistor for linearity and gain adjustment
An ideal differential amplifier has infinite gain, is digital in nature, and requires only
that one input is greater than the other for switching. Real bipolar amplifiers are not ideal
but still possess a high gain approaching 6 (see Appendix C.1. on page 164). High gain is
undesirable when designing PLLs because the VCO will generate more noise and loop
filters will require smaller bandwidths. Without modification, a small change in the control
voltage would cause a large change in current. The solution is to include emitter degeneration
resistors, Re, which reduce the gain and produce a more linear transfer function. A complete
analysis of a differential amplifier with emitter resistors is presented in Appendix C.1. on
page 164.
The value of Re was chosen based upon the desired control voltage range of ±0.2 V,
the linearity across that range, and the frequency range. Fig. 4-8 shows the frequency
response of the VCO as a function of the emitter resistors. Values of Re below
approximately 300 Ω-µm are non-linear at the extremes and produce a gain which is
relatively large. Re values above 500 Ω-µm are quite linear but have a limited frequency
range, and produce a small gain. Unlike high gain, which adds noise, a small gain and its
limited frequency range limit the ability of the PLLs to reach target frequency specifications
under all environmental and processing conditions. A trade-off exists between a high and
low resistor value and depends on the needs of the circuit.
Figure 4-8 FFI VCO frequency versus emitter resistance
(Plot of frequency in GHz versus control voltage in V for Re = 0, 200, 400, 600, and 800 Ω-µm. Note: resistor values are normalized to the size of transistors in µm.)
By adjusting the emitter resistor, Re, the gain of the VCO can be
controlled. A higher resistance decreases the gain.
4.5.3. Center capacitor to control frequency range center
The capacitor, Cc, between the level 1 outputs is parasitic in nature and used only to
degrade the performance of the circuit. Increasing its size causes an increase in the delay
through the gate, which corresponds to a decrease in frequency. This component is very
useful in centering the frequency range to a given specification; simulation results are
shown in Fig. 4-9. The disadvantage of using this component arises when very low
frequencies are needed, because this requires a large capacitor. Large capacitive elements
require a significant amount of space, and because each of the four stages needs one, their size
can become prohibitive. Fortunately, for frequency centers from 2 GHz through 8 GHz the
component size is quite reasonable.
Figure 4-9 FFI VCO frequency versus centering capacitor
(Plot of frequency in GHz, on a logarithmic scale, versus control voltage in V for Cc = 0, 25, 50, 100, 150, and 250 fF/µm. Note: capacitor values are normalized to the size of transistors in µm.)
A frequency centering capacitor, Cc, is added to increase the delay of
the stage in order to move the frequency range to within specifications.
4.5.4. Bypass resistor to prevent stage decoupling
The last and perhaps most important elements to be discussed are the bypass resistors,
Rb.
Their purpose, discussed in Sec. 4.4. on page 40, is to prevent stage decoupling from
occurring by preventing a full switching of current in the tree. In addition to adding decoupling
stability to the VCO, these elements can also be used to limit the frequency range while
keeping the gain nearly constant. See Fig. 4-10 for the frequency response of the VCO
given different values of Rb.
The bypass resistor is tied to the collector of the control input transistors and the top
of the current source. Each node is kept at a nearly constant voltage because the bases from
the level above fix their emitter voltages. Since the voltage across the resistor is constant
the current through it will also be constant. This ensures that some current from the active
current source will always flow through both branches of the tree and thus prevent a
complete depletion of current through the branch. A smaller resistor will allow more
current to flow and, in the limit, the control transistors will be completely bypassed and
both branches will receive exactly equal current. A complete analysis of this effect is
detailed in Section C.2. on page 166.
Figure 4-10 FFI VCO frequency versus bypass resistance
(Plot of frequency in GHz versus control voltage in V for Rb = 1.6, 2.4, 4.0, and 8.0 kΩ-µm. Note: resistor values are normalized to the size of transistors in µm.)
By adjusting the bypass resistor, Rb, the maximum current through each
branch can be limited. This resistor prevents stage decoupling and
allows frequency range control.
4.6. System Analysis
The frequency profile of the FFI VCO is a function of the various circuit parameters
including nominal stage delay, To, Rb, Re, and Cc. If Rb is removed, Re is set to 0 and Cc is
set to center at 6.0 GHz then Fig. 4-11 shows the frequency response. The range is from 3.9
GHz to 7.9 GHz, which is a one octave range. The period of the VCO is governed by (4-3),
which yields 8T when s = 0 and 4T when s = 1, thus the octave range. The addition of the
other circuit components only decreases this range.
A more comprehensive look at the total system response requires an analysis of the
modified differential amplifier and the relationship between the weighting factor s, and the
current switching between branches. Fig. 4-12 shows a diagram of the VCO frequency
profile as a function of control voltage. The three primary curve parameters are: the
frequency range, the center frequency, and the gain at the center frequency. Mathematical
models describing each of these parameters can be found in the following sections.
Figure 4-11 FFI VCO Frequency Range
(Plot of frequency in GHz versus control voltage in V, from -0.2 V to 0.2 V.)
This is the response when Rb is removed, Re set to 0, and Cc is set to
give a 6 GHz center frequency.
Figure 4-12 FFI VCO System from control voltage to frequency
(Diagram of the frequency profile versus control voltage, marking the frequency range, the center frequency, and the gain.)
An analysis of the FFI system should incorporate a study of the circuit
response and the dynamics of the top-level architecture.
4.6.1. Branch current to frequency
Relating circuit parameters such as Rb and Re to the frequency profile involves a
circuit level description of the differential amplifier. Circuit level analyses are often
expressed as differential branch current output and as such do not relate to frequency.
Relating branch current to frequency is necessary to achieve the final transfer function.
From (4-3) we can find the frequency relative to the weighting factor s, which is
directly related to the current by
\[ s = \frac{1}{2}\left(1 + \frac{i_L - i_P}{I_o}\right) \tag{4-14} \]
\[ f = \frac{1}{16T}\left(3 + \frac{i_d}{I_o}\right) = \frac{1}{8T}\,(s + 1) \tag{4-15} \]
where T is the intrinsic stage delay, iL is the current through the branch that accepts input
from the “leaped” branch, iP is the current from the “previous” branch, and id is the
differential current. This relationship is confirmed in Fig. 4-13 where the simulated
frequency versus current are shown along with the results from (4-3) and (4-14).
Results at a weighting value of 0.5 show the largest slope difference between the
analytical model and simulation. This slope difference is important when analyzing the
frequency gain and a factor, α, is introduced to compensate. Taken directly from Fig. 4-13,
α has a value of 1.3.
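The mapping from branch currents to frequency in (4-14) and (4-15) can be sketched in a few lines. The code assumes the two branch currents add up to the tail current Io and leaves out the slope correction α; the function names and the numerical values in the example are hypothetical.

```python
def weighting_factor(i_leap, i_prev, i_total):
    """Equation (4-14): s = (1/2) * (1 + (iL - iP) / Io)."""
    return 0.5 * (1.0 + (i_leap - i_prev) / i_total)


def frequency_from_current(i_leap, i_prev, i_total, T):
    """Equation (4-15), written in terms of the differential current id = iL - iP."""
    i_d = i_leap - i_prev
    return (3.0 + i_d / i_total) / (16.0 * T)


if __name__ == "__main__":
    T = 30e-12      # intrinsic stage delay in seconds (assumed value)
    i_total = 1e-3  # tail current Io in amperes (assumed value)
    for i_leap in (0.0, 0.25e-3, 0.5e-3, 0.75e-3, 1e-3):
        i_prev = i_total - i_leap
        s = weighting_factor(i_leap, i_prev, i_total)
        f = frequency_from_current(i_leap, i_prev, i_total, T)
        print(f"s = {s:.2f}, f = {f / 1e9:.2f} GHz")
```

An equal current split corresponds to s = 0.5, the center of the tuning range.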
Figure 4-13 Simulated versus analytical response of the FFI Architecture
(Plot of frequency in GHz and effective delay in ps versus weighting factor s, for simulated and analytical results.)
The gray, dashed lines represent simulated frequency response for
varying branch currents, and the black continuous lines represent the
analytical expectation.
4.6.2. Center frequency and intrinsic stage delay
The center frequency is directly related to the intrinsic stage delay by (4-3) when s is
set to 0.5. The intrinsic delay can be accurately modeled by the results presented in
Appendix C.3. on page 171. The center frequency is modeled as
\[ f_c = \frac{3}{16\,\bigl(T_o + \ln(2)\,(2 R_c C_c)\bigr)} \tag{4-16} \]
and is validated in Fig. 4-14. Intrinsic stage delay is also plotted in Fig. 4-14 because these
values are needed for the frequency gain and frequency range models. The nominal delay,
To, found through simulation, is 21 ps.
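A short sketch of (4-16) shows how the centering capacitor pulls the center frequency down from its unloaded value. Only the 21 ps nominal delay comes from the text; the collector resistance and the capacitor values below are hypothetical placeholders, and the function names are chosen for illustration.

```python
import math


def intrinsic_delay(t_o, r_c, c_c):
    """Intrinsic stage delay used in (4-16): To + ln(2) * (2 * Rc * Cc)."""
    return t_o + math.log(2.0) * (2.0 * r_c * c_c)


def center_frequency(t_o, r_c, c_c):
    """Equation (4-16): fc = 3 / (16 * T), the s = 0.5 case of (4-3)."""
    return 3.0 / (16.0 * intrinsic_delay(t_o, r_c, c_c))


if __name__ == "__main__":
    t_o = 21e-12  # nominal delay from simulation, as stated in the text
    r_c = 100.0   # collector resistance in ohms (assumed value)
    for c_c in (0.0, 50e-15, 100e-15, 200e-15):  # capacitor values in farads (assumed)
        print(f"Cc = {c_c * 1e15:.0f} fF -> fc = {center_frequency(t_o, r_c, c_c) / 1e9:.2f} GHz")
```

With no added capacitance the model reduces to 3/(16 To), which is roughly 8.9 GHz for the 21 ps nominal delay.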
4.6.3. Frequency gain at the center frequency
Figure 4-14 Center frequency simulation and model
(Two plots: center frequency in GHz and intrinsic stage delay in ps, each versus normalized capacitance in fF/µm, with simulated and modeled curves.)
The modeled and simulated intrinsic stage delay and VCO center
frequency are shown here. The modeled results follow the simulated
results closely.
The analytical model for current gain as a function of Rb and Re is solved in Appendix
C.1. on page 164, and Appendix C.2. on page 166. To find input voltage to output
frequency gain, two elements are needed: the voltage to current gain and the current to
frequency gain. The former was solved in (C-12) on page 169, and the latter determined by
differentiating (4-15) and substituting the intrinsic delay equation (C-14) on page 171. The
result is
\[ \frac{df}{dv_d} = \frac{di_d}{dv_d} \cdot \frac{df}{di_d} = \left( \frac{1}{\dfrac{2 \gamma v_T R_b}{R_b I_o - 2 v_{be}} + R_e \parallel R_b} \right) \left( \frac{\alpha}{16\,\bigl(T_o + 0.7\,(2 R_c C_c)\bigr)\, I_o} \right) \tag{4-17} \]
which includes all circuit parameters: Rb, Re, Rc, Cc, Io, and the nominal stage delay To. α
is also included to compensate for the weighting factor and frequency gain difference
between the simulated and analytical results.
4.6.4. Frequency Range
The frequency range of the FFI VCO is mainly governed by the bypass resistor and
partially governed by the emitter resistor. Appendix C.2. on page 166 describes how these
parameters limit the differential current through each branch in the VCO stage. This current
is related to the maximum frequency, fmax, through (4-15), where id is replaced with id,max,
which is found in (C-5) on page 167. Taking this value, subtracting the center frequency fc,
and multiplying by two yields the frequency range, frange. Using the intrinsic delay
relationship from (C-14) on page 171 and (4-15) yields
\[ f_{\mathrm{range}} = 2\,(f_{\max} - f_c) = \frac{i_{d,\max}}{8 I_o T} = \frac{i_{d,\max}}{8 I_o \bigl(T_o + 0.7\,(2 R_c C_c)\bigr)}. \tag{4-18} \]
In the fully expanded form, id,max from (C-5) contributes a factor of (Rb + 2Re) to the
denominator and a numerator that depends on Io, Rb, Re, vbe, and the differential control
voltage vd.
vd should be set to the maximum differential voltage that is allowed during normal
operation of the VCO.
4.7. Phase Noise
The phase noise of an oscillator is an extremely important consideration during the
design phase. VCO phase noise and phase jitter directly affect system performance. In
serial communication circuits, a bit stream is generated with the variation in the time between
transitions defined by the jitter in the VCO and the PLL. The transport mechanism, which
includes the wire and buffering circuits, also introduces noise, which appears as phase jitter.
The larger the jitter at the receiver, the more difficulty the PLL will have tracking the data
and, consequently, the more data corruption will occur. It is therefore imperative to minimize
jitter at the source to ensure maximum data throughput [15].
4.7.1. The Impulse Sensitivity Function
Noise in circuits is typically related to thermal, device (shot and flicker), or external
effects. The relationship of these effects to phase noise can be quite complicated and difficult
to solve analytically. A straightforward method that involves an analytical foundation and
some simulation utilizes the impulse sensitivity function (ISF) [18]. It yields a closed form
solution relating circuit noise to phase noise.
Circuit noise appears as either amplitude or phase variations in the output of
oscillators. When dealing with “digital” ring oscillators, the amplitude variations are small
because of the limiting nature of the circuits. Phase variations, on the other hand, are
governed by
\[ \Delta\phi = \Gamma(\omega_o, t)\,\frac{\Delta q}{q_{\mathrm{swing}}} \tag{4-19} \]
where ∆q is a charge step applied to a specific node, qswing is the nominal charge swing on
that node (qswing = Cnode Vswing), and Γ(ωo,t) is the ISF.
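As a small worked illustration of (4-19), the phase step caused by a charge injection can be computed once the node capacitance and voltage swing are known. The 250 mV swing matches the value given for the collector resistors in Section 4.5; the node capacitance, the ISF value, and the injected charge below are hypothetical numbers chosen only to exercise the formula.

```python
def phase_step(isf_value, delta_q, c_node, v_swing):
    """Equation (4-19): phase change for a charge step delta_q at a given ISF value."""
    q_swing = c_node * v_swing  # nominal charge swing on the node
    return isf_value * delta_q / q_swing


if __name__ == "__main__":
    c_node = 20e-15   # node capacitance in farads (assumed value)
    v_swing = 0.25    # voltage swing in volts, from Section 4.5
    delta_q = 1e-15   # injected charge in coulombs (assumed value)
    for isf_value in (0.0, 0.1, 0.3):  # sample ISF values (assumed)
        print(isf_value, phase_step(isf_value, delta_q, c_node, v_swing))
```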
Γ(ωo,t) can be considered as the normalized phase response of the VCO given a
current pulse at a specific point in the output. The ISF is large when a current pulse causes
a large change in phase and small when the pulse causes a small phase change. Fig. 4-15
shows an example of the effect on phase for two current pulses of the same size but in
different positions. The case on the left applies the pulse during the rising edge, and
effectively increases the rise time and decreases the phase. The pulse applied to the flat
portion of the curve shows little or no phase change, because the circuit restores the initial
value before the edge arrives.
Figure 4-15 Current pulse effect on phase
(Waveform sketch: a current pulse on the rising edge has a large phase effect, while a pulse on the flat portion has a small phase effect.)
A current pulse, or charge step applied to a node in the VCO will have a
phase effect depending on the temporal location of the pulse.
Fig. 4-16 shows the simulated ISF for the FFI VCO and the values of the output at
the time that the current pulse is applied. The response appears as it should, with an increase
during the rising edge, a decrease during the falling edge and a zero when the output is
constant. This form is very similar to the derivative of the waveform function. The
important values garnered from these results are the dc and rms values of the ISF. The rms
value of 0.077 is used to determine the phase noise and the non-zero dc value of 0.001
shows the upconversion of low frequency noise to base band noise.
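The dc and rms values quoted above are simple averages over one period of the sampled ISF. The sketch below shows that bookkeeping on a stand-in waveform; the sinusoidal ISF, its amplitude, and its offset are purely illustrative assumptions and are not the simulated FFI result.

```python
import math


def isf_statistics(samples):
    """Return (dc, rms) of ISF samples taken uniformly over exactly one period."""
    n = len(samples)
    dc = sum(samples) / n
    rms = math.sqrt(sum(x * x for x in samples) / n)
    return dc, rms


def example_isf(n=1000, amplitude=0.1, offset=0.001):
    """Stand-in sinusoidal ISF with a small dc offset (hypothetical shape)."""
    return [offset + amplitude * math.sin(2.0 * math.pi * k / n) for k in range(n)]


if __name__ == "__main__":
    dc, rms = isf_statistics(example_isf())
    print(f"dc = {dc:.4f}, rms = {rms:.4f}")
```

In practice the samples would come from a series of charge-injection simulations, one per injection time across the period.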
The rms value of the ISF is only meaningful when compared against other similar
ring oscillators. Fig. 4-17 shows various oscillators and their associated rms values. The
single ended and differential points are CMOS rings tuned to maintain a constant frequency
that is independent of the number of stages. Their values drop with increasing N because
each stage’s transitions represent a smaller fraction of the total period and thus have smaller
effects on the ISF. The CS (Current Starving) oscillator shows a reasonable match with the
other differential oscillators. The FFI oscillator, on the other hand, shows a much lower ISF
when compared to systems with the same number of stages. This has important
ramifications in the total phase noise and is discussed further in Section 4.7.3.
Figure 4-16 Simulated ISF for FFI VCO and output waveform
(Plot of the ISF and the waveform voltage in V versus normalized time.)
The FFI VCO ISF is shown here along with the waveform at the point
that the pulse is applied.
Figure 4-17 ISF rms values for various ring oscillators
(Plot of the rms value of the ISF versus number of stages N.)
Shown in this plot are the rms values for the FFI, CS (Current Starving),
CMOS differential (DE), and CMOS single ended (SE) ring oscillators.
4.7.2. Solving for phase noise
Using the superposition integral, the phase response for any injected noise current i(t)
is equal to
\[ \phi(t) = \int_{-\infty}^{t} \frac{\Gamma(\omega_o, \tau)}{q_{\mathrm{swing}}}\, i(\tau)\, d\tau . \tag{4-20} \]
The single-sideband phase-noise spectrum due to a white-noise current source is given by
[18]
\[ L\{\omega_{\mathrm{off}}\} = \frac{\Gamma_{\mathrm{rms}}^2}{q_{\mathrm{swing}}^2} \cdot \frac{\overline{i_n^2} / \Delta f}{4\,\omega_{\mathrm{off}}^2} \tag{4-21} \]
where Γrms is the rms value of the ISF, \overline{i_n^2}/∆f is the single-sideband power spectral density
of the noise current source, and ωoff is the offset from the carrier.
Noise in the FFI circuit element shown in Fig. 4-7 is generated primarily by HBT shot
noise and resistor thermal noise. The nodes of interest, those generating the most noise and
the most sensitive to current fluctuations, are the level 1 outputs, z10 and z11. The level
2 outputs do introduce twice the shot noise but are less susceptible to current-induced phase
variations because of their low output resistance and strong restoring force.
The single-sideband power spectral density (PSD) for the resistor noise and the
collector shot noise is
\[
\frac{\overline{i_n^2}}{\Delta f} = 4kTG + 2 q_e I_c \tag{4-22}
\]
where G is the conductance of the pull-up resistors, and Ic is the current through the collector,
which is half the tail current. Further refinement of (4-21) and (4-22), and substitution of
values for temperature, resistance, and current for optimal operation, yields
\[
\mathcal{L}\{\omega_{off}\} = \frac{(N) \, l}{2 \, \omega_{off}^2} \cdot \frac{\Delta\phi_{rms}^2}{\Delta q^2} \cdot 161 \times 10^{-24} \ \frac{\mathrm{A}^2}{\mathrm{Hz}} \tag{4-23}
\]
where N is the number of stages, l is the length of transistors in µm, and ∆φrms is the rms
phase deviation with a simulated charge injection of ∆q.
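As a minimal numerical sketch of how (4-21) and (4-22) combine, the following Python fragment evaluates the phase noise contributed by a single noise source. Only the ISF rms value of 0.077 is taken from the text; the resistance, collector current, temperature, node capacitance, and voltage swing are illustrative assumptions, and the function names are hypothetical. Because it models one source with assumed values, it is not expected to reproduce the figures obtained from (4-23).

```python
import math

K_BOLTZMANN = 1.380649e-23     # J/K
Q_ELECTRON = 1.602176634e-19   # C


def noise_psd(resistance_ohm, collector_current_a, temperature_k=300.0):
    """Current-noise PSD of (4-22): 4kTG + 2*q_e*Ic, in A^2/Hz."""
    conductance = 1.0 / resistance_ohm
    return (4.0 * K_BOLTZMANN * temperature_k * conductance
            + 2.0 * Q_ELECTRON * collector_current_a)


def phase_noise_dbc(gamma_rms, q_swing_c, psd_a2_per_hz, offset_hz):
    """Single-source phase noise of (4-21), expressed in dBc/Hz."""
    w_off = 2.0 * math.pi * offset_hz
    ratio = (gamma_rms ** 2 / q_swing_c ** 2) * psd_a2_per_hz / (4.0 * w_off ** 2)
    return 10.0 * math.log10(ratio)


# Assumed values for illustration only (gamma_rms = 0.077 is from the text).
psd = noise_psd(resistance_ohm=100.0, collector_current_a=1.6e-3)
q_swing = 180e-15 * 0.25       # assumed node capacitance times assumed swing
at_1mhz = phase_noise_dbc(0.077, q_swing, psd, 1e6)
at_2mhz = phase_noise_dbc(0.077, q_swing, psd, 2e6)
print(f"{at_1mhz:.1f} dBc/Hz at 1 MHz, {at_2mhz:.1f} dBc/Hz at 2 MHz")
```

The 1/ωoff² dependence in (4-21) appears as a drop of about 6 dB for each doubling of the offset frequency.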
Using (4-23) at a frequency offset of 1 MHz, the FFI VCO has a phase noise value of
-93.0 dBc/Hz and the CS VCO has a phase noise value of -79.1 dBc/Hz. If cascode
amplifiers are added to the CS VCO to achieve a more accurate comparison, the phase noise
decreases to -85.1 dBc/Hz. Both VCOs have about the same center frequency¹ and both
consume the same amount of power.
4.7.3. Phase noise comparison between the FFI and CS VCOs
The benefit achieved by using the FFI architecture for VCO design, rather than a
standard ring VCO, is about 8 dB of noise reduction. This improvement is quite
compelling because it comes without the need for additional power.
There are two main factors which contribute to the noise reduction. The FFI VCO has
a higher frequency because of the incorporation of a novel architecture. This higher
frequency can be traded off for an increase in level one capacitance. Capacitance was added
to each stage to reduce its speed and bring it in line with that of a standard ring
oscillator. Additional capacitance helps to absorb current noise by decreasing the
bandwidth on the outputs. It essentially softens the voltage spike caused by an insertion of
charge at the output node. The CS VCO, for example, has a level one capacitance of 28 fF
and the FFI VCO has a capacitance of about 180 fF.
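The softening effect can be illustrated with the first-order relation ∆V = ∆q / C. The sketch below uses the two capacitance values quoted above; the 1 fC test charge is an arbitrary assumption, and the relation ignores any dynamic behavior of the stage.

```python
def voltage_step(charge_c, capacitance_f):
    """Voltage disturbance from a charge injected onto a node: dV = dq / C."""
    return charge_c / capacitance_f


C_CS = 28e-15      # level one capacitance of the CS VCO (from the text)
C_FFI = 180e-15    # level one capacitance of the FFI VCO (from the text)
DQ = 1e-15         # assumed test charge of 1 fC, for illustration only

dv_cs = voltage_step(DQ, C_CS)
dv_ffi = voltage_step(DQ, C_FFI)
ratio = dv_cs / dv_ffi
print(f"CS step: {dv_cs * 1e3:.1f} mV, FFI step: {dv_ffi * 1e3:.1f} mV, ratio: {ratio:.2f}")
```

Under this simple model the same injected charge produces a voltage step on the FFI node that is smaller by the ratio of the two capacitances.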
The second effect is a result of the averaging that occurs between the two inputs to
each gate. Any noise disturbance on one input is partially offset by averaging, so the resulting
change is 66% of the unaveraged result. At first it would appear that the effect should
be only 50%, but because the disturbance propagates through multiple averages, the
progression leads to 66%. This factor of two thirds corresponds to a 2.2 dB
decrease in the overall phase noise.
1. The center frequency of the CS VCO is actually about 70% that of the FFI VCO. If properly matched the
noise value gap between the two will only widen because of the larger capacitor required by the FFI.
4.8. Jitter
Jitter in a ring VCO is generated by four primary noise sources within each variable
delay element: thermal noise from the collector resistors, tail current noise, sampling of
input noise by switching of differential pairs, and noise at the VCO input [17], [18]. The jitter
constant κ is used as a time-domain figure of merit relating the standard deviation of a
transition time, σt, to the time interval ∆T over which it is measured:
\[
\kappa = \frac{\sigma_t}{\sqrt{\Delta T}} . \tag{4-24}
\]
Each noise source contributes to the total κ as described in detail in [19]. This equation is
valid for all time in the open loop case and valid for time less than the loop time constant
in the closed loop PLL case.
In this VCO, the noise generating sources in the delay element are frequency
independent due to the nature of the frequency control. Thermal noise from the collector
resistors remains constant because the capacitance and resistance remain constant. Noise
introduced by the degenerated tail current source also remains fixed. The input differential
pair noise is dependent on the amount of current through the pair, which is linearly
switched between the inputs. Since the total current remains constant, the total noise
contribution from each pair will remain approximately constant. For these reasons, the
jitter introduced by one stage remains constant over all frequencies.
Although noise induced jitter per stage remains the same, the total jitter per transition
depends strongly on the transition interpolating ability of the VCO. When the VCO is
operating in the four stage mode, the jitter in one period is a result of the jitter from all
four stages. However, as the weighting factor is shifted to favor the feed-forward signal,
the jitter introduced during a full period is only from two stage elements rather than four.
The result after including (4-2) is that κ varies according to
\[
\kappa = \frac{\sigma_t}{\sqrt{(1 + s) \, \Delta T \, \dfrac{\omega}{\omega_o}}} . \tag{4-25}
\]
The factor of ω/ωo is added to normalize in terms of transitions independent of the
frequency.
Using (4-3) and solving for s as a function of the frequency fraction gives
\[
s \approx \frac{3\omega}{2\omega_o} - 1 \tag{4-26}
\]
and substituting (4-26) in (4-25) yields
\[
\kappa \approx \sqrt{\frac{2}{3}} \, \frac{\omega_o}{\omega} \, \kappa_o \tag{4-27}
\]
where κο is the nominal jitter constant for an identical ring oscillator without feed-forward
interpolation, ωo is the center frequency and ∆T is the time over which the open loop jitter
is being measured. This equation is graphed in Fig. 4-26.
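The frequency dependence of κ can be sketched numerically as follows. The sketch assumes the square-root form of the jitter constant, κ = σt/√∆T, so that (4-27) reads κ ≈ √(2/3)·(ωo/ω)·κo; the function names are hypothetical and κo is normalized to 1.

```python
import math


def interpolation_weight(freq_hz, center_hz):
    """Weighting factor s from (4-26): s ~ 3w / (2 w_o) - 1."""
    return 1.5 * freq_hz / center_hz - 1.0


def kappa(freq_hz, center_hz, kappa_o):
    """Jitter constant from (4-27), assuming the square-root form
    kappa ~ sqrt(2/3) * (w_o / w) * kappa_o."""
    return math.sqrt(2.0 / 3.0) * (center_hz / freq_hz) * kappa_o


def rms_jitter(kappa_value, delta_t_s):
    """Open loop rms jitter after a delay delta_t: sigma_t = kappa * sqrt(delta_t)."""
    return kappa_value * math.sqrt(delta_t_s)


CENTER_HZ = 5e9    # assumed center frequency, for illustration
KAPPA_O = 1.0      # normalized nominal jitter constant

for fraction in (2.0 / 3.0, 1.0, 4.0 / 3.0):
    f = fraction * CENTER_HZ
    s = interpolation_weight(f, CENTER_HZ)
    print(f"w/w_o = {fraction:.3f}  s = {s:.2f}  kappa/kappa_o = {kappa(f, CENTER_HZ, KAPPA_O):.3f}")
```

In this form the weighting factor runs from s = 0 at two thirds of the center frequency to s = 1 at four thirds of it, and κ falls monotonically across that range.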
Using the derivation in [19] and the data in Table 4-1 yields a κο of 18 n√s. Through
calculation and simulation it was found that the largest contributors to overall jitter were
the input differential pairs and the emitter followers.
Table 4-1 Circuit parameters for calculating jitter.
Parameter    Value
Re           100 Ω
Rc           100 Ω
Iee          3.2 mA
Ko           5.5 GHz/V
en(vco)      4.6 nV/√Hz
Rbase        152 Ω x 8 (4 inputs, 4 followers)
4.9. Interconnect Parasitic Simulations
Interconnect parasitics are increasing in importance in the design of high speed
circuits. In slower, larger circuits the capacitance and resistance of the interconnect were
dwarfed by device parameters. Now, with very small devices, this is no longer true and
interconnect parameters are as large, or larger than device parameters. Also, with an
increase in operating frequencies, speed of light propagation time becomes a larger fraction
of the overall cycle time.
In general, the effect of non-ideal interconnect is an increase in delay through the
wires. This is crucial for ring oscillators, since the operation of the circuit requires stringent
control over the delay. If these effects are properly simulated and accounted for, an
underperforming VCO can be avoided. An oscillator that achieves significantly higher “ideal”
speeds than specified is required. It is not uncommon for interconnect to degrade speeds by
as much as 10% to 20%.
To ensure operation at 5 GHz, the FFI VCO was designed with a 20% safety margin.
To do this, the circuit was designed to run at 6 GHz without interconnect effects included.
This safety margin, in addition to the already large frequency range, assures proper
operation at 5 GHz. Only with a 20% interconnect effect and a 20% degradation from other
negative effects will the VCO fail to meet the specifications.
Fig. 4-18 shows the effect on the frequency response before and after adding
interconnect capacitance.¹ The performance drops a uniform 12%. Larger effects were
seen in the Current Starving VCO because of smaller transistor size and the resulting larger
percentage of interconnect to total capacitance.
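The margin arithmetic can be sketched as follows, using the 6 GHz design point, the 5 GHz target, and the 12% and 20% figures quoted above. Treating successive losses as multiplicative is an assumption made here for illustration, and the tuning range is deliberately left out.

```python
def derated_frequency(ideal_hz, fractional_losses):
    """Apply successive fractional speed losses to an ideal oscillator frequency."""
    freq = ideal_hz
    for loss in fractional_losses:
        freq *= (1.0 - loss)
    return freq


TARGET_HZ = 5e9    # required operating frequency (from the text)
IDEAL_HZ = 6e9     # design point without interconnect effects (from the text)

with_interconnect = derated_frequency(IDEAL_HZ, [0.12])     # 12% capacitive loss
worst_case = derated_frequency(IDEAL_HZ, [0.20, 0.20])      # two compounded 20% losses

print(f"with interconnect: {with_interconnect / 1e9:.2f} GHz")
print(f"worst case:        {worst_case / 1e9:.2f} GHz")
print("worst case below target center:", worst_case < TARGET_HZ)
```

In the worst case the untuned center falls below the target, so the remaining shortfall would have to be recovered from the tuning range.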
[Plot: Frequency (GHz) versus Control Voltage (V), with curves for No Parasitics and Capacitive Parasitics]
Figure 4-18 FFI with capacitive interconnect parasitics
The introduction of interconnect parasitics reduces the performance of
high speed circuits. When designing a ring oscillator it is absolutely
necessary to include these effects.
1. The IBM 1999B SiGe design kit does not account for interconnect resistances correctly and typically
shows a faster response than with capacitance only. Resistance values are also very small for these localized
wires. For these reasons, only capacitance was included.
4.10. HDL Model
A transistor level model of this ring oscillator includes 60 active devices and 12
devices for the required balancing loads. If a frequency divider is needed, such as the 1/8
in the transmitter frequency synthesizer, 54 additional devices are needed to represent the
entire VCO. The processing and time limitations imposed on simulating 126 devices are
prohibitive and limit design iteration.
The solution to this problem was to create an analog Hardware Description
Language (HDL) model of the VCO [40]. Spectre HDL, a Cadence package, was used
because it is tightly integrated into Cadence and is very similar to Verilog-A, which is the
leading analog HDL. The code for the VCO, shown in Appendix E.1 on
page 179, was modeled after the simulation data in Fig. 4-18. Input loading effects were
included in the model so that no additional circuit needed to be added. The output was
modeled, albeit inaccurately, as a sine wave and was passed through a small buffer to
transform the signal into something more representative of a real signal.
The time associated with simulating the transmitter PLL was reduced by about 60%
with very little effect on accuracy. The extra time allowed more frequent design iterations,
since each highly accurate simulation usually takes hours to run. Another benefit of the
HDL model is the ability to extract parameter values such as instantaneous frequency and
phase, which was extremely helpful in analyzing the PLL. With a transistor level
simulation these values are hidden.
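The idea behind such a behavioral model can be sketched in a few lines of Python. This is not the Spectre HDL code of Appendix E.1. It is a hypothetical phase-accumulating model that assumes a linear tuning characteristic, whereas the model described above was fitted to simulation data, and it borrows the 5.4 GHz center frequency and 5.5 GHz/V gain quoted elsewhere in this chapter only as plausible parameters. The amplitude is an arbitrary assumption.

```python
import math


class BehavioralVCO:
    """Phase-accumulating VCO with an assumed linear tuning law:
    f = f_center + gain * v_control."""

    def __init__(self, center_hz, gain_hz_per_v, amplitude_v=0.125):
        self.center_hz = center_hz
        self.gain_hz_per_v = gain_hz_per_v
        self.amplitude_v = amplitude_v
        self.phase_rad = 0.0

    def frequency(self, v_control):
        """Instantaneous frequency, directly observable in this model."""
        return self.center_hz + self.gain_hz_per_v * v_control

    def step(self, v_control, dt_s):
        """Advance the phase by one time step and return the sine output."""
        self.phase_rad += 2.0 * math.pi * self.frequency(v_control) * dt_s
        return self.amplitude_v * math.sin(self.phase_rad)


vco = BehavioralVCO(center_hz=5.4e9, gain_hz_per_v=5.5e9)
dt = 1e-12                                   # 1 ps time step
samples = [vco.step(0.05, dt) for _ in range(1000)]
print(f"frequency: {vco.frequency(0.05) / 1e9:.3f} GHz, phase: {vco.phase_rad:.2f} rad")
```

Because frequency and phase are explicit state in such a model, they can be read out at every time step, which is the property that made the HDL model useful for PLL analysis.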
4.11. Final Implementation
The final implementation of the FFI VCO was used in revision two (Serdes II) of the
transmitter and receiver and in the FFI VCO test chip. The specifications were based on the
goal of a 20 Gb/s communication system, and the architecture for that system.
4.11.1. Circuit Parameters
Both the transmitter (4-1 multiplexer core) and receiver (twice oversampling core)
required a quarter-frequency clock thereby forcing a VCO with a 5 GHz center frequency.
To remain conservative and ensure that the 5 GHz specification would be reached, 4 µm
transistors were used, and a centering capacitor, Cc, of 19 fF/µm, or 76 fF, was chosen (see
Fig. 4-14 on page 51). Under ideal simulation conditions this put the center frequency at 6.2
GHz, and when parasitics are included, at 5.4 GHz.
The specification for frequency range was partially dictated by the uncertainty of
achieving the 5 GHz center. Process variations, interconnect parasitics, model inaccuracies,
and other simulation difficulties necessitated a large range to ensure that any center
frequency deviation could still achieve 5 GHz. In addition, because the bypass resistors
intended to control stage decoupling also affect the frequency range, their effect must be
considered. The decision was made to maximize the frequency range (see Fig. 4-10 on
page 48) while having a conservative response to the stage decoupling problem. The value
of Rb was chosen to be 6.4 kΩ-µm, yielding a VCO with a large range and strong
decoupling prevention.
The gain of the VCO was chosen based upon the input control voltage swing and the
need to provide a linear response across all control values. Since a reasonable voltage swing
for CML circuits is 250 mV, as noted in Appendix C, a range corresponding to this swing
was chosen for the VCO. This yielded a value of Re equal to 400 Ω−µm.
In addition to the 5 GHz VCO a high speed 10 GHz VCO was also designed for test
within the Serdes II chip. It had no centering capacitor so that a maximum frequency could
be achieved. The ultimate goal was to see if this faster 10 GHz VCO could be used to design
a 40 Gb/s communication system.
4.11.2. Layout Considerations
A poor layout can result in an underperforming circuit; consequently, layout
preparation is an extremely important design concern. Proper layout of a ring oscillator
minimizes noise and interconnect parasitic effects. In addition, because these oscillators
generate considerable “digital” noise it is crucial to isolate them from nearby analog
circuits.
The first goal in the FFI layout (see Fig. 4-19) was to minimize the number of inter-stage
wires and make them symmetrical to guarantee uniform phase spacing. The solution
was to design a single compact stage and position four of the stages around a center with
inputs and outputs in the middle. This provided perfect symmetry and minimal interconnect
but required four unique orientations of the devices. Differing orientation introduces
directional process variations into the design, but symmetry appeared to be the more
important factor.
Substrate coupling¹ and power supply noise, although partially offset by the
differential nature of the circuit topology, are important to address. Substrate noise can
originate from external as well as internal circuits. Minimizing external substrate noise, and
internal switching effects on external circuits, involved the design of a deep trench moat with a
substrate contact ring along the inside, as shown in Fig. 4-20. This provided a ground
return path for the enclosed circuitry to the substrate contacts and minimized coupling
outside the ring due to the large path around the deep trench. This is critical for this VCO
because of its high frequency, multi-phase digital signals that are often near low-noise
analog loop filters in PLLs. The compact design also forces substrate noise to appear as one
common mode source, thus minimizing its influence.
[Layout image: labeled features are the deep trench moat, the grounded substrate ring, the centering capacitor, and the power and ground rails; the dimension labels are 225 µm and 171 µm]
Figure 4-19 FFI Layout
Shown here is the final layout of the FFI VCO. Outputs can be taken
from the center or the edges of the block.
1. Substrate noise in this SiGe technology is of particular importance because of the substrate’s lightly
doped nature.
[Cross-section diagram: internal and external circuitry at the silicon surface, separated by a deep trench (DT), with a substrate contact providing a short ground return path]
Figure 4-20 Reducing substrate coupling
By using a deep trench moat and substrate contacts, substrate coupling
can be minimized.
Minimizing the length of the supply lines to pads provides a low resistance ground
return path. Like substrate noise suppression, a compact design forces supplies to appear as
one common mode source. When laying out routes to external circuits where phase
uniformity was important, the signals were taken from the center of the VCO to ensure
constant length wires. In addition, dummy buffers were included when a VCO phase output
was not needed to maintain consistent loading.
4.12. Experimental Results
A test chip implementing a 5 GHz (Cc = 76 fF) and a 10 GHz (Cc = 0 fF) FFI VCO
was designed alongside the Serdes 2 chip. It placed the two VCOs in an environment that
is identical to that found in the transmitter and receiver. Two input pads with capacitor
bypass provided a differential input for each VCO. The remaining four high-frequency
pads were dedicated to a buffered and a 1/8 divided output of each VCO.
The slower VCO was used in the Serdes 2 transmitter and receiver and had a center
frequency target of 5 GHz. The higher speed VCO was designed to be used in the Serdes 3
project with a center frequency at 10 GHz.
Figure 4-21 FFI waveform at 5 GHz
This waveform was captured with a control voltage set to generate a 5
GHz output. The peak-to-peak swing is approximately 300 mV.
4.12.1. Frequency Response
The shape of the measured frequency response in Fig. 4-22 is nearly identical to the
simulated response. It is smooth, linear around zero, and monotonically increasing. The
differences are found in the frequency range and center. The center frequency, at 0 mV
control voltage, was expected to be 5.33 GHz but was measured 8% lower at 4.72 GHz.
The frequency range dropped 17% from 2.72 GHz to 2.27 GHz. In addition, the gain at
center decreased from 5.57 GHz/V to 4.98 GHz/V.
The measured offset between simulation and test results is likely due to a capacitance
on the level 1 nodes of the ring stages that was larger than anticipated. Base capacitance
modeling has always been a difficult issue, as capacitance can have a considerable effect
on the frequency. A capacitance increase of 50 fF yields a frequency change that would
match the frequency decrease.
Another possibility is the poor modeling of fT, which has a very dramatic effect on
frequency. Part of the effect can be seen in Fig. 4-24, where the supply voltage was
increased beyond the nominal voltage. This increased the current, and to a point increased
the frequency. Although the CML trees were optimally designed for maximum fT, clearly
more collector current results in a better response.
[Plot: Frequency (GHz) versus Control Voltage (mV), showing simulated (with parasitics) and measured curves for the two VCOs, without and with the centering capacitor Cc]
Figure 4-22 FFI VCO measured results
This plot shows results simulated with interconnect parasitics, and
measured results for the FFI VCO. The target of 5 GHz for the slower
VCO was achieved at a control voltage of 60 mV rather than the
expected -50 mV.
4.12.2. Common Mode Gain (5 GHz VCO)
The common mode gain represents the gain associated with a common mode change
in the input while the differential voltage is kept the same. As the common mode voltage is
decreased, the level 3 differential pair begins to press into the active current source below
it. Although the current should remain constant as the source’s collector voltage moves, the
Early effect produces a slight slope in the response (see Fig. A-3 on page 158). This has the
effect of decreasing the current as the collector to emitter voltage is decreased. At some
point the source transistor begins to saturate and the collector current drops more rapidly.
With higher common mode voltages the level three transistors are pulled away from the
active sources, which causes the same current effect discussed above. Although the level 3
transistors are pressing into the level 2 transistors, there is little effect because the active
source is maintaining a constant current. With a gain of 5 GHz/V from Fig. 4-22, and a
common mode gain of 0.5 MHz/mV, the common mode rejection ratio, CMRR, is 20 dB.
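The CMRR figure follows from the ratio of the two gains once both are expressed in the same units. A short check, with a hypothetical helper name:

```python
import math


def cmrr_db(differential_gain_hz_per_v, common_mode_gain_hz_per_v):
    """Common mode rejection ratio in dB from two gains in Hz/V."""
    return 20.0 * math.log10(differential_gain_hz_per_v / common_mode_gain_hz_per_v)


DIFF_GAIN = 5e9             # 5 GHz/V
CM_GAIN = 0.5e6 / 1e-3      # 0.5 MHz/mV converted to Hz/V

cmrr = cmrr_db(DIFF_GAIN, CM_GAIN)
print(f"CMRR = {cmrr:.1f} dB")
```

The two gains differ by a factor of ten, which corresponds to the 20 dB quoted above.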
[Plot: Frequency (GHz) and Common Mode Gain (MHz/mV) versus Common Mode Control Voltage (mV)]
Figure 4-23 FFI common mode response
The common mode response of the FFI is quite flat with only a 1%
deviation in frequency when the common mode is swept through ±100
mV.
4.12.3. Response versus supply voltage (5 GHz VCO)
The frequency of the VCO continues to increase, with decreasing supply voltages
down to -4.3 V. This can be attributed to an increasing transistor fT as the collector current
increases. Below that voltage the transistors begin to experience high current effects and
the fT drops. At the peak frequency supply voltage of -4.3 V the collector current is
approximately 1.1 mA, which is higher than the 0.8 mA expected for fastest operation. The
power supply gain at the nominal -3.3 V power supply is -600 kHz/mV.
[Plot: Center Frequency (GHz) versus Supply Voltage (V)]
Figure 4-24 FFI response versus supply voltage
At the nominal supply voltage of -3.4 V the center frequency is 4.6
GHz. Lower voltages show a quick decrease in frequency, while higher
voltages show an increase in frequency until -4.5 V. Above -4.5 V the
frequency drops quickly.
4.12.4. Phase noise measurements
Phase noise measurements, shown in Fig. 4-25, are very close to the ISF predictions
in Section 4.7.2. on page 56. At a 1 MHz offset from the carrier, the phase noise was
measured at -90 dBc/Hz and was calculated to be -93 dBc/Hz. The difference can best be
attributed to testing effects, probe and wiring losses, and higher temperatures than
anticipated.
Because of the high noise testing environment, a special differential input filter was
built to suppress signal noise on the differential input. The filter consisted of a differential
RC filter, with a very low bandwidth, and a non-electrolytic capacitor. In addition, because
supply noise was an important contributor to noise, batteries were used to supply power to
the chip.
[Plot: Phase Noise (dBc/Hz) versus Frequency (kHz)]
Figure 4-25 Open loop phase noise of FFI VCO
This plot shows the phase noise versus the carrier offset frequency. The
data was collected using a LabView program in conjunction with a
spectrum analyzer and special software supplied with the equipment.
4.12.5. Jitter measurements
The relationship between jitter and frequency is plotted in Fig. 4-26. The data was
collected with an open loop VCO circuit using an HP 11801C sampling oscilloscope with
∆T set to 50 ns. The model described by (4-27) accurately described the end points of the
jitter function but the results were off by as much as 20% in between. This can be attributed
to the fact that when the VCO operates more like a four stage oscillator it exhibits fast rise
times. During interpolation, however, the VCO favors a sine-wave output and the rise time
lengthens, increasing the jitter. As s is increased, and a two stage oscillator is approached,
the rise time is more representative of that indicated in the model. At the target operating
frequency of 5 GHz, κ is equal to 14.2, which is 36% lower than κ when operating as a
normal four stage oscillator.
[Plot: measured and analytical jitter constant versus Frequency (GHz)]
Figure 4-26 FFI VCO analytical and measured jitter
This plot shows how jitter is related to the frequency of oscillation. The
fact that the jitter improves at higher frequencies is a result of the system
operating with fewer stages.
5. Design of the Transmitter
5.1. Project History
The first transmitter was submitted to IBM for fabrication in February 1999 as a
stand-alone chip. It generated all 16 parallel data bits internally and had no mechanism to
accept externally supplied data. The bit rate specification of 20 Gb/s operating speed was
not achieved due to a VCO load imbalance.
The second prototype, submitted to Sierra Monolithics Inc. in April 2000, was a
unified transmitter-receiver chip. It contained improvements made to the first prototype
and was designed to be a fully working chip capable of being packaged or wafer tested. The
transmitter in this implementation easily reached the 20 Gb/s target data rate.
An invention disclosure record for the symmetric multiplexer was submitted in
February, 2000. RPI has subsequently stated that they are going to pursue a U.S. patent for
this invention.
5.2. Top Level Architecture Overview
The goal of the transmitter is to accept low speed parallel data and multiplex it to high
speed serial data. In some cases, it must first encode the data by adding extra bits for error
correction, byte alignment, word framing, or channel synchronizing. The encoded data is
then multiplexed from n parallel bits to a single bit stream. An additional stage, driven by
a very low noise PLL, may then be used to retime the data [42] to remove accumulated
noise. Finally, an amplifier is used to drive the external channel that carries the signal.
This Serdes project did not investigate data encoding due to limited time and
resources. Although a full featured chip may include data encoding, a system of this type
can still operate without one. Presumably the role of the encoder would be off-loaded to the
next level of hardware or software.
A 16-to-1 multiplexer was implemented as four 4-stage registers and one 4-1
multiplexer. The design revolved around a unique multiplexing scheme that required four
inputs and could run with a quarter frequency clock. The output data rate was 20
Gb/s, but the oscillator ran at 5 GHz. Since 16 external bits were to be supplied to the chip
and the multiplexing scheme required four bits, a front-end register that could be expanded
to meet a parallel data word of any width was designed.
Instead of adding an additional stage to perform symbol retiming, the retiming
function was pushed into the multiplexer. This necessitated a complete redesign of the
standard multiplexing CML gate, so that it could handle the stringent timing requirements
for transmission. The symmetric multiplexer evolved from this redesign process.
Like the retiming circuit, the channel amplifier was also incorporated into the
multiplexer. This involved ramping up transistor sizes and making a change in the output
stage of the multiplexer.
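[Block diagram: the transmitter, in which 16 parallel inputs at 1.25 Gb/s feed a 16-1 Mux driven by a PLL to produce one 20 Gb/s output; and the 16-1 multiplexer, in which four 4-bit shift registers (A, B, C, D) and the VCO feed a 4-1 multiplexer]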
Figure 5-1 Transmitter and multiplexer architecture
The top level transmitter design consists of a 16-1 multiplexer driven by
a 5 GHz PLL. Four 4-stage shift registers capture 16 bits of data every
800 ps. These then feed the 4-1 multiplexer in order to serialize the data.
5.3. 16-1 Multiplexer
Fig. 5-1 depicts the core of the transmitter, the multiplexer. It is divided into a
4 x 4 shift register bank and a 4-to-1 multiplexer, also shown in the same figure. The shift
register bank captures 16 bits of data every 800 ps, and the 4-to-1 multiplexer serializes
them to a stream of bits. The
width of each bit at 20 Gb/s is 50 ps.
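The timing figures in this section all follow from the 20 Gb/s bit rate, the 16-bit word, and the 4-to-1 final multiplexer. A short sketch with hypothetical constant names:

```python
BIT_RATE = 20e9      # serial output rate in b/s (from the text)
WORD_WIDTH = 16      # parallel input bits (from the text)
MUX_INPUTS = 4       # inputs of the final multiplexer (from the text)

bit_width_s = 1.0 / BIT_RATE                 # width of one serial bit
load_period_s = WORD_WIDTH * bit_width_s     # time between parallel loads
parallel_rate = BIT_RATE / WORD_WIDTH        # rate of each parallel input
clock_hz = BIT_RATE / MUX_INPUTS             # quarter-frequency clock

print(f"bit width:     {bit_width_s * 1e12:.0f} ps")
print(f"load period:   {load_period_s * 1e12:.0f} ps")
print(f"parallel rate: {parallel_rate / 1e9:.2f} Gb/s")
print(f"clock:         {clock_hz / 1e9:.0f} GHz")
```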
The shift registers consist of four cascaded MS-latches, each with a 2-to-1
multiplexer front-end. By selecting different inputs, the array of four latches can either load
external data, or accept data from the previous latch. Clocking the select line ensures that,
after 3 bits are shifted through, the next “shift” results in a load. Each load pulse is
separated by 16 times the bit width or 800 ps. The tail bit of the register shifts in a zero
because new data overwrites it before it ever makes it out of the head latch.
[Timing diagram: register outputs A, B, C, and D, the intermediate sequences BA and CD, and the output sequence CBAD (b0 a0 d0 c1 b1 a1 d1 c2 b2 a2 d2 c3), drawn against the 0° and 90° clock phases with time marks at 0 ps, 200 ps, and 400 ps]
Figure 5-2 Data timing for the 4-1 multiplexer
The multiplexer interleaves the incoming data by using a multi-phase,
quarter frequency clock. Timing of this circuit is critical because this
circuit also has the responsibility to retime the data.
The unique nature of the multiplexer requires data in registers A and D to be offset
by 100 ps from data in registers B and C. This offset was accomplished by clocking the
registers with two in-quadrature phases of the PLL.
Each of the four registers is connected to the 4-to-1 multiplexer as an input. A special
“shuffling” clocking scheme is used to multiplex the data. This alleviates the need for a 10
GHz clock that would typically be required to convert the final two 10 Gb/s signals into one
20 Gb/s signal. One single-frequency clock can control the shift registers and clock the
multiplexers.
Multiplexing is accomplished by offsetting registers A and D by 90° from registers
B and C (see Fig. 5-2). This creates the basic interleaving data sequences, BA and CD,
which are synchronized with the first stage of 2-to-1 multiplexers. Interleaving was not
necessary to create the sequences, but without it, coincident edges and timing glitches could
have been introduced.
Signals BA and CD arrive at the final multiplexer in phase with each other. The
phase of the select signal of this multiplexer is shifted exactly 90° from the previous
multiplexer’s select signal. This effectively cuts both BA and CD in half and combines
them to form a CBAD signal. Therefore, final output edges are created from two sources:
the final multiplexer select and the change of inputs during selection.
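The interleaving can be illustrated with an idealized discrete-time sketch in which each clock period is divided into four slots of one bit time each. The slot assignments below are assumptions chosen to reproduce the CBAD ordering of Fig. 5-2, and the function name is hypothetical. The model ignores gate delays and the register clocking offsets.

```python
def interleave(a, b, c, d):
    """Idealized 4:1 interleaver using quarter-period time slots.

    First-stage multiplexers (select = 0 degree clock):
        BA carries b in slots 0-1 and a in slots 2-3; CD does the same for c and d.
    Final multiplexer (select shifted by one slot, i.e. 90 degrees):
        passes BA in slots 1-2 and CD in slots 3 and 0.
    """
    stream = []
    for n in range(len(a)):
        for slot in range(4):
            ba = b[n] if slot < 2 else a[n]
            cd = c[n] if slot < 2 else d[n]
            stream.append(ba if slot in (1, 2) else cd)
    return stream


# Three words per register, labeled a0, a1, a2 and so on.
regs = {name: [f"{name}{n}" for n in range(3)] for name in "abcd"}
stream = interleave(regs["a"], regs["b"], regs["c"], regs["d"])
print(" ".join(stream))
```

Each output bit occupies one slot, and every second output edge comes from a change of the BA or CD input while it is selected rather than from the final select itself, which is the behavior described above.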
The phase difference between the 90° and 0° signal is critical in determining any
output transition offsets. Any mismatch between the phases directly correlates to a phase
offset between consecutive transitions in the bit stream. To guarantee a 90° phase
difference a delay which exactly matches the delay of the two 2-to-1 multiplexers is
introduced. The easiest way to do this involves using a matched multiplexer whose a input
is set to 0 and b input is set to 1. Although this technique consumes some power, its use is
necessary to significantly reduce phase mismatch.
5.3.1. The Case for the Symmetric Multiplexer
The 2-to-1 multiplexer is the final non-amplifying stage in most serial transmitter
circuits. It is, therefore, of utmost importance to study and understand the performance of
this gate and how its performance affects the data stream.
A typical 2-to-1 CML multiplexer utilizing levels 1 and 2 is shown in Fig. 5-3. Data
inputs a and b are on level 1 and the select input, s, is on level 2. In a clocked circuit the
important performance parameter is the delay from the input transition to the output
transition. The largest delay is taken from all of the possible combinations of inputs and
outputs. This parameter, in conjunction with other gate delays, ultimately determines the
maximum speed at which the circuit can be clocked.
The multiplexer performance metric, however, is very different in a
transmitter where the multiplexers perform the retiming. Delay through the gate is of
secondary importance, whereas the shape and aperture of the eye diagram is of critical
importance. Bit widths must remain consistent, and bit amplitudes must remain large
enough to be received when noise is present.
[Schematic: two-level CML multiplexer with transistors Q1-Q6, data inputs a0, a1, b0, b1, select inputs s0, s1, and outputs z0, z1]
Figure 5-3 CML Two Level Multiplexer
The level difference between the data inputs a and b and the select input s
produces a phase mismatch when a, b, and s are aligned by 90°.
The data and select signals arriving at the multiplexer are forced to a phase difference
of 90° by the VCO and overall circuit architecture. It is questionable whether an exact 90°
difference is appropriate for this gate because the inputs arrive on different levels. Is there
any inherent difference between their respective delays? Perhaps a better choice of phase
exists such that a more uniform output is generated? How does the difference in levels
affect the loading and driving from previous gates?
The circuit in Fig. 5-4 was designed and simulated in order to analyze and answer
these questions. Signals a and b are complements of each other and the select signal’s
phase, ∅, is varied around 90°. Ideally, the average value of the output will coincide with
the median when ∅ is equal to zero. This condition corresponds to an output with a 50%
duty cycle, in which each bit is of equal width.
The results of the analysis are shown in Fig. 5-5, and indicate that a phase offset of
13.5° (7.5 ps) is needed to maintain a 50% duty cycle. This effect is a result of the data
existing on level 1 and the select lines being on level 2. For a select change to propagate to
the output it must travel through two levels of logic, whereas a data change only needs to
travel through one. There is also a loading difference between the two logic levels. The
collectors on level 1 see the pull-up resistors and the base of the following gate. On level two the
collectors see two emitters from the level above.
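The equivalence between the 13.5° offset and 7.5 ps follows from the 200 ps period of the 5 GHz clock. A short check, with a hypothetical helper name:

```python
def phase_to_time_ps(phase_deg, clock_hz):
    """Convert a phase offset in degrees to a time offset in picoseconds."""
    period_ps = 1e12 / clock_hz
    return phase_deg / 360.0 * period_ps


CLOCK_HZ = 5e9             # quarter-frequency clock (from the text)
OFFSET_DEG = 13.5          # simulated phase offset (from the text)

offset_ps = phase_to_time_ps(OFFSET_DEG, CLOCK_HZ)
required_phase_deg = 90.0 + OFFSET_DEG

print(f"offset: {offset_ps:.2f} ps, required select phase: {required_phase_deg:.1f} degrees")
```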
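[Test setup: a 2:1 MUX with data inputs a (0°) and b (180°), select input s (90° + ∅), a load on output z, and an averaging stage; the waveforms are drawn with time marks at 0 ps, 200 ps, and 400 ps]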
Figure 5-4 Simulation Testing of CML 2:1 Multiplexer
By varying the select phase relative to the data phase and averaging the
output signal over time, a measurement showing ideal select and data
phase offsets can be made.
A 50% duty cycle is desired when the phase difference between data and select signals
is 90°, since both are driven off the VCO. The multiplexer, however, requires a 103.5°
phase difference for symmetric output. A delay element could be introduced to the data
lines to add 7.5 ps, but a better solution was invented: the symmetric multiplexer.
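The relation between the 13.5° offset and the 7.5 ps delay can be checked with a short calculation. This is only a sketch: the function name is illustrative, and it assumes the data and select signals run at the 5 GHz VCO rate (a 200 ps period), which is consistent with the time axis of Fig. 5-4.

```python
def phase_to_delay_ps(phase_deg: float, freq_hz: float) -> float:
    """Convert a phase offset in degrees into a time delay in picoseconds."""
    period_ps = 1e12 / freq_hz          # one full cycle, in ps
    return phase_deg / 360.0 * period_ps


CLOCK_HZ = 5e9  # assumption: select and data toggle at the 5 GHz VCO rate

offset_ps = phase_to_delay_ps(13.5, CLOCK_HZ)   # extra delay the data path would need
required_phase_deg = 90.0 + 13.5                # phase the two-level multiplexer actually wants

print(f"offset = {offset_ps:.2f} ps, required phase = {required_phase_deg:.1f} degrees")
```

Under that assumption the 13.5° offset corresponds to the 7.5 ps quoted in the text.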
The symmetric multiplexer accepts all inputs on the same level, has the same loading
per input, and ensures that any input (data or select) will propagate to the output in the same
amount of time. An implementation of the gate is shown in Fig. 5-6. The left hand side of
the multiplexer represents the OR condition $a \cdot s + b \cdot \bar{s}$, which generates the high
output, and the right hand side represents the inverse condition
$(\bar{a} + \bar{s}) \cdot (\bar{b} + s)$, which generates the low output. The four transistors,
Q1-Q4, in the center act as a shared differential amplifier. During all static conditions one
branch will have a high and a low level transistor and the other branch will have both
transistors in an intermediate state. The branch with the high level will carry all of the
current and produce the z output.
Figure 5-5 Simulation Results for CML 2:1 Multiplexer
(Plot: average output voltage (V), from -1.3 to -0.9, versus phase (degrees), from -180 to 180.)
The crossing point, or 50% duty cycle point, occurs at 13.5° (7.5 ps). This
shows an asymmetry between the select and data inputs.
Figure 5-6 CML Single Level Symmetric Multiplexer
(Schematic: two input stages on either side of an output stage formed by Q1-Q4, with
inputs a, b, and s and outputs z0 and z1.)
A novel implementation of a multiplexer with inputs all on level 1,
identical loading per input, and completely symmetric response.
Fig. 5-7 shows the state of each transistor based upon the input values. “H” represents
a high state, or the highest voltage and indicates which transistor will carry the current. The
Medium level falls halfway between the High and Low levels. To ensure proper noise
margins the voltage difference between the high and low levels is increased to 500 mV.
This places a 250 mV difference between the two top voltage levels.
Each of the transistors in the central tree of the multiplexer is driven by two
differential pairs. This allows for a reduction in the size of the 12 input transistors without
any loss of signal integrity, and also directly compensates for the doubled loading on each
input. A drawback is that each input requires a minimum of 2 µm of load, no matter the
output driving ability.
Power requirements for this circuit are also four times higher than those for a typical
level 1 output CML multiplexer. On the other hand, since this circuit only requires one level
of logic, the negative power supply can be reduced by at least 25%.
a  b  s  | Q1  Q2  Q3  Q4 | Z
0  0  0  | M   M   L   H  | 0
0  0  1  | M   M   H   L  | 0
0  1  0  | L   H   M   M  | 1
0  1  1  | M   M   H   L  | 0
1  0  0  | M   M   L   H  | 0
1  0  1  | H   L   M   M  | 1
1  1  0  | L   H   M   M  | 1
1  1  1  | H   L   M   M  | 1
Figure 5-7 Symmetric multiplexer transistor states
The states of transistors Q1-Q4 are defined to be high, low, and middle.
The transistor in the high state carries the current and dictates the output
value.
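The state table in Fig. 5-7 can be summarized by a simple rule, sketched below. The rule is an interpretation of the table rather than a circuit-level model: the output value picks the active branch (Q1/Q2 for a high output, Q3/Q4 for a low output), and the select input picks which transistor in that branch is high. The function name and the string encoding are illustrative only.

```python
# Transistor states transcribed from Fig. 5-7, keyed by (a, b, s).
# Each entry holds the Q1..Q4 states as a string, followed by the output Z.
FIG_5_7 = {
    (0, 0, 0): ("MMLH", 0),
    (0, 0, 1): ("MMHL", 0),
    (0, 1, 0): ("LHMM", 1),
    (0, 1, 1): ("MMHL", 0),
    (1, 0, 0): ("MMLH", 0),
    (1, 0, 1): ("HLMM", 1),
    (1, 1, 0): ("LHMM", 1),
    (1, 1, 1): ("HLMM", 1),
}


def mux_states(a: int, b: int, s: int) -> tuple:
    """Return (Q1..Q4 states, Z) for the symmetric multiplexer.

    Z follows a when s = 1 and b when s = 0. The active branch has one
    high (H) and one low (L) transistor; the idle branch sits at medium (M).
    """
    z = (a & s) | (b & (1 - s))
    if z:
        states = "HLMM" if s else "LHMM"   # left branch (Q1, Q2) carries the current
    else:
        states = "MMHL" if s else "MMLH"   # right branch (Q3, Q4) carries the current
    return states, z


# Check the rule against every row of the transcribed table.
matches = all(mux_states(*inputs) == row for inputs, row in FIG_5_7.items())
print("rule reproduces Fig. 5-7:", matches)
```

In every row exactly one transistor is in the high state, which agrees with the statement that the high transistor carries the current and dictates the output.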
5.3.2. Final Implementation and Simulation
Serdes I did not utilize the symmetric multiplexer and had a 15% phase error in
alternating edges, shown in the simulation in Fig. 5-8. Fig. 5-8(a) shows the eye diagram of
the standard CML multiplexer. The inputs were designed to exercise the circuit as much as
possible, i.e. using 50 ps input pulses, and differing a and b inputs when the select input
changes. At the center voltage of 125 mV, two distinct crossings can be seen, which result
from the input to output delay imbalance in the CML circuit. The time for a select transition
to reach the output is about 10 ps longer than for an a or b input to reach the output.
Fig. 5-8(b) shows a much cleaner eye diagram for the symmetric multiplexer. The
reason for this improvement lies in the circuit architecture, which was designed with
symmetry to ensure that any input change propagates to the output in the same amount of
time. As a result, the transmitter output benefits from a clean, low phase noise multiplexer
signal.
The 4-to-1 multiplexer with symmetric architecture in Serdes II also plays the role of
the line driver by driving the pads directly. The reasoning behind this design feature was
removing the noise that would be introduced by an additional line driver. By integrating the
two components, the total phase noise is smaller. In order to accomplish this, larger 12 µm
transistors, capable of sinking 9.6 mA, were used in the final multiplexer. In addition, a
cascode amplifier was added to the output stage to limit the loading on the differential pair.
Driving the final 12 µm output stage required ramping up of transistor sizes so that
the input stage of the final multiplexer was not loaded down. Starting with a 1 µm input
stage, two intermediate emitter followers were added of sizes 2 µm and 4 µm. This enabled
an output stage with 8 µm transistors, each capable of driving transistors of their own size
or larger. This output stage drives the final multiplexer which has an input of 4 µm. Once
again, two 6 µm and 8 µm emitter followers were added, followed by the 12 µm output
stage. This technique required a total current of 63 mA as compared to a 15.4 mA current
requirement for the standard CML multiplexer and the associated pad driver.
Figure 5-8 Multiplexer Eye Diagrams
(Plots (a) and (b): output voltage (V) versus time (ps).)
These plots are output eye diagrams for the standard CML multiplexer
(a), and the symmetric multiplexer (b). Both circuits received identical
20 Gb/s inputs and identical loading.
Figure 5-9 Multiplexer Layout for Serdes I and II
The transmitter 16-1 multiplexer consists of a 4x4 shift register and a 4-1 multiplexer. The
layouts for Serdes I (a), with its CML multiplexer, and Serdes II (b), with its 3 symmetric
multiplexers, are shown here.
5.4. Phase Locked Loop (Frequency Synthesizer)
When reducing phase noise in the transmitter becomes the most important design factor,
the transmitter phase locked loop, PLL, becomes the most important circuit in the system.
Its role is to generate a high frequency, extremely low noise clock from a low frequency,
noisy, externally supplied reference clock. For the transmitter PLL in this
design, the external reference is at 625 MHz, and the PLL clock output is at 5 GHz.
The standard linear model of a PLL, shown in Fig. 5-10, has a phase detector (PD),
a loop filter (LF), and a VCO. The phase detector subtracts the phase of the input signal
from the phase of the output signal. This gives a measure of the phase offset of the two
signals and is the mechanism that allows the phases to be locked together. The loop filter
filters the output of the phase detector in order to meet certain feedback characteristics,
such as output noise, pull-in range¹, and pull-in time². The VCO acts as an integrator,
converting a control signal to an oscillating signal represented as a phase. Finally, a 1/8
frequency divider is used to match the internal frequency to the external input frequency,
as required by the PD.
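Figure 5-10 Linear model of PLL
(Block diagram: input filter Y(s), phase detector Kd, loop filter F(s), VCO Ko/s, and a
feedback frequency divider; θi and θo are the input and output phases, and the VCO output
goes to the transmitter.)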
The PLL used in the transmitter consists of three primary parts: phase
detector, loop filter, and VCO. An input filter is added to reduce the
noise levels of the input signal.
The transmitter’s frequency synthesizer went through three major revisions during its
evolution. These revisions are depicted in Fig. 5-11. During the rapid development of the
first transmitter prototype, a PLL was designed that had minimal functionality and poor
performance. The goal was to quickly develop a clock multiplier without concern for phase
noise and jitter performance.
1. Pull-in range is the maximum range of frequencies for which the PLL can eventually acquire lock. This
PLL parameter is primarily a function of the PD implementation, but is also determined by the frequency
range of the VCO.
2. Pull-in time or acquisition time is the amount of time it takes the PLL to achieve lock from an initial frequency deviation that is within the pull-in range.
With more time and results from Serdes I, a highly improved Serdes II PLL evolved.
It possessed a 3-state PD, which improved the lock-in range¹ and acquisition time; an active
op-amp style LF, further improving key characteristics; and the FFI VCO, which reduced
noise and increased performance. An optimized bandwidth driven by previous results and
specifications was still missing from this design. Measuring the noise characteristics of the
VCO and gathering information about the noise spectrum of the input noise source were
key to bandwidth optimization.
Test data from the first two prototypes, better simulation techniques, and further
research yielded the final PLL design. VCO noise spectra allowed for a much better
bandwidth design, further minimizing PLL output phase noise. A smaller bandwidth
required frequency detection in the PD because of the much longer pull-in time. Another
improvement replaced the clumsy op-amp integrator with a high performance specialized
integrator which is also used in the receiver PLL.
1. The lock-in range, a function of the PD and the PLL bandwidth, is defined as the maximum frequency
deviation for which the PD will remain in lock, where the PD is in its linear range and does not slip.
Serdes I: XOR PD; type I passive LF (RC low pass filter); CS Simple VCO
Serdes II: input filter; 3-state PD; type II active LF (op-amp filter); FFI VCO
Serdes III: 3-state PD with frequency detector; type II active LF with specialized integrator and optimized bandwidth; FFI VCO
Figure 5-11 Frequency synthesizer evolution
The transmitter’s frequency synthesizer went through three major
evolutionary steps. The first had the most basic components and
provided minimal functionality. The second incorporated better
components to minimize noise and improve the acquisition range and
time. The third, unfabricated version, added advanced PLL components
and optimized key design variables based upon simulations and
measurements from the other prototypes.
5.4.1. Input Filter
An effective technique in reducing PLL phase noise is to drive it with a very clean
reference source¹. The PLL has the ability to lock a noisy VCO to a clean reference and
reduce the total output noise to a level below that of the VCO. With this in mind, an input
bandpass filter was designed and implemented in order to reduce the out-of-band noise of
the signal source. This technique was added to the Serdes II design but removed in the
subsequent design because a better input signal generator was acquired.
1. The signal source used in the Frisc testing lab is very old and very noisy. In practice, a very well controlled low phase noise signal generator would be used as a reference and an input filter would not be
needed.
(Schematic: a resistive input attenuator followed by a bandpass filter built around a CML
amplifier.)
Component  Value
R1         800 Ω
R2         224 Ω
R3         2 kΩ
C1         500 fF
C2         500 fF
Figure 5-12 Schematic for input filter
The input filter is a bandpass filter centered around the reference
frequency. It is intended to filter out low and high frequency noise
associated with this signal.
Fig. 5-12 depicts the schematic of the input filter, which consists of an input
attenuator and an active bandpass filter. The active component of the filter is simply a
high-gain two-stage buffer with level one and level two outputs. The first stage does not
affect the voltage gain of the amplifier and has Darlington pair inputs to reduce the input
current by a factor of β. Twenty-five percent larger pull-up resistors were used to increase
the total gain to approximately 5. The input resistor tree attenuator compensates for the
large total gain of the bandpass filter by reducing the input amplitude by 78%.
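The 78% attenuation and the resulting near-unity net gain can be reproduced with a short calculation. This is a sketch under two assumptions that are not stated explicitly in the text: the attenuator is treated as a plain resistive divider R2/(R1 + R2), and R1 = 800 Ω and R2 = 224 Ω are taken from the component values listed with Fig. 5-12.

```python
import math

R1_OHMS = 800.0    # assumed series resistor of the attenuator (Fig. 5-12 value)
R2_OHMS = 224.0    # assumed shunt resistor of the attenuator (Fig. 5-12 value)
AMP_GAIN = 5.0     # approximate total amplifier gain quoted in the text

# Assumption: the attenuator behaves as an unloaded voltage divider.
divider_ratio = R2_OHMS / (R1_OHMS + R2_OHMS)
reduction_pct = (1.0 - divider_ratio) * 100.0

# Net gain of attenuator plus amplifier at the filter peak.
net_gain = AMP_GAIN * divider_ratio
net_gain_db = 20.0 * math.log10(net_gain)

print(f"amplitude reduction = {reduction_pct:.1f} %")
print(f"net gain = {net_gain:.2f} ({net_gain_db:.2f} dB)")
```

With these values the divider removes about 78% of the input amplitude and the net gain comes out slightly above unity, in line with the description of Fig. 5-13.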
The frequency transfer function for the input filter is shown in Fig. 5-13. The peak
was designed to be at precisely 625 MHz with a bandwidth large enough to account for
parameter mismatches and frequency adjusting.
Because the final effect of this filter on the output phase noise of the PLL was not
known, a multiplexer was added after this circuit so that it could be bypassed if necessary.
This opens up the ability to determine the filter’s actual usefulness.
5.4.2. Phase Detector
A phase detector produces a signal that yields information about the difference
between the phases of its two inputs. Ideally it produces a perfectly linear response for all
phase differences and has an arbitrary gain. For real circuits, however, we must settle for
non-linear responses that may have regions where the gain becomes negative, where the
function is periodic in π/2 or π rather than 2π, and where the gain varies across the range.
5.4.2.1. Phase detector (Serdes I)
Figure 5-13 Input filter frequency response
(Plot: gain (dB), from -60 to 0, versus frequency (MHz), from 1 to 10000 on a logarithmic axis.)
At the reference frequency of 625 MHz the input filter achieves a
slightly greater than unity gain. All other frequencies are attenuated.
Two different phase detectors were investigated in Serdes I and Serdes II: the XOR,
or Gilbert Multiplier, and the 3-state, respectively. The schematics for the XOR PD, shown
in Fig. 5-14, consist of a single tree CML gate with emitter followers. At one extreme, the
inputs are in phase and the average value of the output is 0. When the inputs are 180° apart,
the other extreme, the output is 1. For the 3-state detector the output is taken
differentially across its two internal signals, VU and VD. These signals’ rising edges, which
are outputs from the two resettable MS-latches, coincide with the rising edges of the input
signals, Vi and Vo. The falling edges, on the other hand, are triggered together after both
have risen. This creates a wider pulse on the signal, VU or VD, whose associated input
arrives first.
The output of the XOR PD, shown in Fig. 5-15, has a linear response from -180° to
180°. Outside that range the output slope is negative and produces a temporarily unstable
PLL response before the phase detector output enters a positively sloped region again. The
gain is about 0.53 V/rad, which is relatively high. It is set by the large input control range
of the VCO used in Serdes I, the Simple Current Starving version of the VCO.
Figure 5-14 Phase detector schematics
(Schematics: (a) XOR phase detector with inputs vi and vo and output vd; (b) 3-state phase
detector with inputs vi and vo, latch outputs vU and vD, and output vd.)
The XOR detector (a) uses an XOR logic cell to perform phase detection.
The 3-state detector (b) utilizes two resettable MS latches and an AND
gate.
5.4.2.2. Phase detector (Serdes II)
Fig. 5-15 also shows the output of the 3-state PD. Its response is greatly improved
over that of the XOR PD. First, the slope is always positive and it extends across the entire
input phase difference range. This greatly improves the response of the PLL during lock
acquisition. This response will be discussed in Section 5.4.6. Another important
improvement appears when phase error is continuously increased above 180°, which is
common with larger frequency offsets. Although the plot shows that the output is -120 mV
above 180°, the output will step to 0 mV, and continue to rise beyond that phase. This effect
increases the pull-in range.
In order to implement the 3-state PD one significant hurdle related to the reset
feedback through the AND gate had to be resolved. Proper operation occurs when the
second output edge from the latches causes the AND to go high, reset both latches and bring
the AND low again. Simulation showed, however, that the very thin reset pulse was failing
to reset one of the latches. The problem was traced to the non-uniform loading of the output
latches and the asymmetry in the AND gate inputs. The solution was to use a single-ended
AND gate to provide symmetric loading, and matched input levels for both latches. This
ensured that both latches were uniformly reset, and alleviated all timing issues.
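The behavior of the 3-state detector can be illustrated with a small event-driven model. This is a behavioral sketch, not the CML circuit: it assumes an ideal, instantaneous reset (the very reset pulse that caused trouble in the real design is not modeled), two inputs at the same frequency, and time measured in reference periods. All names are illustrative.

```python
def three_state_pd_average(phase_deg: float, cycles: int = 100) -> float:
    """Average of (UP - DN) per period for a fixed phase offset.

    A positive phase_deg means the rising edges of vo trail those of vi.
    UP is set by a vi edge, DN by a vo edge, and both are cleared
    (ideal AND-gate reset) as soon as both are set.
    """
    offset = phase_deg / 360.0                       # offset in periods
    events = [(float(k), "i") for k in range(cycles)]
    events += [(k + offset, "o") for k in range(cycles)]
    events.sort(key=lambda e: e[0])                  # stable sort keeps vi first on ties

    up = dn = 0
    area = 0.0
    prev_t = events[0][0]
    for t, source in events:
        area += (up - dn) * (t - prev_t)             # integrate the differential output
        prev_t = t
        if source == "i":
            up = 1
        else:
            dn = 1
        if up and dn:                                # both latches set: reset both
            up = dn = 0
    return area / cycles


for phase in (-90, 0, 90, 270):
    print(phase, three_state_pd_average(phase))
```

In this idealized model the average output is proportional to the phase offset on both sides of zero, which is the always-positive slope described for the 3-state PD.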
Figure 5-15 Simulated phase detector responses
(Plot: XOR phase detector output (V), from -1.0 to 1.0, and 3-state phase detector output
(mV), from -150 to 150, versus phase difference (degrees), from -270 to 270.)
Plotted above is the average of the signal output of the two phase
detectors. The XOR phase detector has a valid range between 0° and
180°, and the 3-state detector output is valid for any phase difference.
These PDs are used in a frequency synthesizer which includes a divide-by-8
component. The nature of the PLL gain K, and the 3 dB bandwidth is such that they are
both reduced by a factor of N. This factor is incorporated into the PD gain which gives the
XOR PD an adjusted gain of 66.3 mV/rad and the 3-State PD an adjusted gain of 5.25
mV/rad.
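The adjusted gains follow from dividing the raw detector gain by the divider ratio. The sketch below reproduces the XOR figure from the 0.53 V/rad quoted earlier; the raw 3-state gain is not stated in the text, so the value computed for it is simply the quoted adjusted gain multiplied back by N, which is an inference.

```python
N_DIVIDE = 8                    # divide-by-8 in the feedback path

KD_XOR_RAW = 0.53               # V/rad, XOR PD gain quoted in the text
KD_3STATE_ADJUSTED = 5.25e-3    # V/rad, adjusted 3-state PD gain quoted in the text

# The divider reduces the effective PD gain by a factor of N.
kd_xor_adjusted = KD_XOR_RAW / N_DIVIDE

# Inferred (not stated in the text): raw 3-state gain before the divider.
kd_3state_raw = KD_3STATE_ADJUSTED * N_DIVIDE

print(f"XOR adjusted gain  = {kd_xor_adjusted * 1e3:.2f} mV/rad")
print(f"3-state raw gain   = {kd_3state_raw * 1e3:.1f} mV/rad (inferred)")
```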
The lock-in range of the PLL is (π/2)K using the XOR PD and πK using the 3-state PD.
The larger range of the 3-state PD provides higher resistance to cycle slips and yields a
shorter pull-in time when used with a frequency detector. The pull-in time of the XOR PD
is about four times larger than that of the 3-state PD with the same PLL bandwidth. The
pull-in range is also four times larger for the 3-state PD. The simulated figure of merit¹, M,
for the XOR gate is quite high, approaching 1 million. This was expected, because of the
very simple nature of the XOR gate. The 3-state PD, on the other hand, has a value of about
22, which is appropriate for a circuit of this complexity.
1. The figure of merit, M, for a PD is Vdo/Kd, where Vdo is the mean value of the PD output and Kd is the gain.
A low M value for a PD yields a small pull-in range.
5.4.2.3. Phase detector (Serdes III)
Research into Serdes III necessitated a decreased bandwidth in order to further
suppress spurious noise introduced by the PD. Side effects of a decrease are a reduction in
the pull-in range, and an increase in the pull-in time. A very effective way to counter these
negative effects is to add a frequency detector, FD, to the 3-state PD. This circuit is able to
detect cycle slips and provide a strong pull-in signal in response. A cycle slip occurs when
the phase error exceeds the bounds of the PD (0, 2π) and the output steps (See Fig. 5-15 on
page 88). This is indicative of a large frequency error and if the proper circuitry is added to
sense this event then a large change can be made to the loop filter integrator.
Figure 5-16 PLL frequency detector
(Schematic: 3-state PD, two slip detectors, and loop filter, with signals vi, vo, vU, vD,
vU’, vD’, and vd; each slip detector contains a delay element and a resettable latch.)
A frequency detector detects cycle slips from the PD and performs large
control voltage changes. This allows a much wider pull-in range, and
smaller pull-in times.
The schematic in Fig. 5-16 shows the implemented frequency detector that was added
to Serdes II’s design. The detector compares the input to the output of the PD. When a cycle
slip occurs, an output edge normally created on vU by vi’s rising edge is missing, and this is
sensed by the slip detector. The detector will then add or remove a fixed amount of charge
from the charge pump integrator. This causes a step change on the output of the integrator.
The key to implementing the FD is to ensure that the induced frequency step, ∆ωc,
does not exceed twice the lock-in range, ωL, which would force the frequency to oscillate
around ωL and never acquire lock. Typically ∆ωc is conservatively set to ωL so that
pull-in time is minimized and PLL lock is ensured.
5.4.3. The VCO
Serdes I utilized the Simple CS VCO with a gain of approximately 0.5 GHz/V. Its
highly variable gain, and non-linear frequency response made analytical modeling of the
PLL difficult. The second and third prototypes used the FFI VCO which has a consistent
gain of 6 GHz/V. Its linear response made analytical modeling much easier to perform.
5.4.4. Loop Filter
The loop filter in a PLL plays a critical role in determining the PLL bandwidth.
Usually the gains of the PD and the VCO, are fixed and therefore the loop filter is the only
component available to control the bandwidth. A high bandwidth corresponds to a strong
ability to track the input phase at high frequencies. This would be very useful for a receiver
that needs to track an incoming signal plagued with transmitter and line noise. This ability
will be discussed further in the following chapter. A small PLL bandwidth, on the other
hand, ignores phase variations on the input and performs very slow tracking. This is the
necessary situation for a transmitter since it needs to generate a very clean VCO signal,
independent of the noise introduced by the input reference signal and from the VCO.
Reducing the bandwidth too much, however, prevents the PLL from tracking out the VCO
phase noise. An optimum bandwidth for minimum total output phase noise does exist and
should be determined.
5.4.4.1. Serdes I Loop Filter
The transmitter PLL in the first prototype utilized a passive low pass filter¹. The filter
is a two stage RC ladder, and has two poles, but for the purpose of analysis, the higher
frequency pole can be ignored, since it only helps to reduce spurious modulation². The loop
type is considered a two pole loop: one pole in the loop filter and one pole in the VCO. The
poles are at 30 MHz (ωn) and 207 MHz, when the capacitance and resistance values are 2
pF and 1 kΩ, respectively. The decision was made to use two RC stages rather than one to
increase the high frequency signal rejection.
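The two pole frequencies can be estimated from the component values. The sketch below assumes an ideal, unloaded ladder of two identical RC sections, for which the transfer function is 1/(1 + 3sRC + (sRC)²) and the poles lie at (3 ± √5)/(2RC). Loading by the VCO control circuit is ignored, so the result is only expected to land close to the quoted values.

```python
import math

R_OHMS = 1e3        # resistor value quoted in the text
C_FARADS = 2e-12    # capacitor value quoted in the text

tau = R_OHMS * C_FARADS

# Poles of an unloaded two-section RC ladder with equal R and C:
# roots of (s*tau)^2 + 3*(s*tau) + 1 = 0, taken as positive magnitudes.
w_low = (3.0 - math.sqrt(5.0)) / (2.0 * tau)
w_high = (3.0 + math.sqrt(5.0)) / (2.0 * tau)

f_low_mhz = w_low / (2.0 * math.pi) / 1e6
f_high_mhz = w_high / (2.0 * math.pi) / 1e6

print(f"lower pole  = {f_low_mhz:.1f} MHz")
print(f"upper pole  = {f_high_mhz:.1f} MHz")
```

Under this idealization the poles come out within about one percent of the 30 MHz and 207 MHz quoted above.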
(Schematic and response sketch: two cascaded RC sections with R = 1 kΩ and C = 2 pF;
|F(jω)| (dB) versus log f, with the corner at ωn.)
Figure 5-18 Tx PLL passive loop filter
A second order low pass filter utilizing a two stage RC ladder
configuration.
The resistor and capacitor component values were maximized, for low bandwidth as
discussed above, based primarily on the proper operation of the PLL and on layout
limitations (capacitors consume large amounts of area). Since the PD output is differential
in nature, symmetric loading requires a duplication of the RC ladder. The four capacitors
were therefore limited to about 2 pF because they take up a large amount of layout space.
Resistor sizes, on the other hand, were reasonably small but values larger than 1 kΩ
introduced considerable loading effects because this RC circuit had to drive the VCO aVref
control circuit.
1. The design time for this critical Serdes I component was very limited, and effort was only put
into the PLL’s proper operation rather than optimization. In the end it worked well enough to drive the transmitter and allow collection of all desired data.
2. A common problem in frequency synthesizers is called spurious modulation and is a result of the normally much higher frequency output of the PD. A result of the frequency divider, these lower frequency signals are not adequately attenuated by the loop filter and are passed on to the VCO as unwanted phase noise.
5.4.4.2. Serdes II Loop Filter
Further research and design allowed for a much improved loop filter to be used in
Serdes II. The first important enhancement was the move to an active rather than passive
filter. The use of an integrator allowed a loop filter dc gain, F(0), approaching infinity to
be used in contrast to a passive filter’s dc gain of unity. From this, the PD static phase error,
\[ \theta_{eo} = \frac{V_{co}}{K_d F(0)} + \frac{-V_{do}}{K_d} \tag{5-1} \]
becomes approximately zero, when the PD offset voltage¹, Vdo, is zero, where Kd is the gain
of the PD, and where Vco is the static control voltage² of the VCO. Under these conditions
the input phase difference is kept near zero, when the PLL is in lock, which improves the
purity of the synthesized frequency [41] and aids acquisition.
(Schematic: a low pass filter (R1/2, C3) followed by an op-amp integrator (R2, C); the
op-amp consists of a FET front-end with high input impedance, an NPN differential
amplifier gain stage, and an output stage with low output impedance.)
Figure 5-19 Tx PLL active loop filter
This active loop filter incorporates a low pass front-end followed by an
integrator. The op-amp has a FET input stage to minimize loading, a
high gain NPN stage and a low impedance output stage.
Resistors R1 and R2, capacitor C, and the amplifier in Fig. 5-19 form the core
of the filter. These elements form an integrator with a zero at
\[ \omega_2 = \frac{1}{R_2 C} \tag{5-2} \]
and a gain of
\[ K_h = \frac{R_2}{R_1} \tag{5-3} \]
at frequencies above ω2. This choice of 6.4 MHz for the loop bandwidth was based loosely
on comparisons with other similar loops which have bandwidths of approximately 1 MHz
[41]. These similar loops, however, utilize a much cleaner LC VCO, so a larger bandwidth
was needed to compensate.
1. Vdo is the free running, or offset phase detector voltage. It represents the DC output voltage offset for the
PD and is a property of the PD alone.
2. The static control voltage, or Vco, is the control voltage applied to the VCO which matches the input and
output frequencies. It is related to the input signal and VCO properties.
The final design of the loop filter yielded values for R1, R2, and C equal to 16.7 kΩ,
6.67 kΩ, and 14.1 pF, respectively. ω2 was 1.7 MHz, Kh was 0.4, and the total loop gain and
bandwidth was 6.4 MHz. In addition, the low frequency gain, which is governed by the gain
of the amplifier, is about 5.
(Plot: gain (dB), from -100 to 20, versus frequency (Hz), from 1 kHz to 1 GHz, with ω2 and
ω3 marked.)
Figure 5-20 Active loop filter transfer function
The active loop has a 1.7 MHz zero which forces a high DC gain. A pole
at 21 MHz attenuates high frequencies to reduce spurious modulation.
The addition of a low pass filter, or pole, to minimize spurious modulation, is realized
through element C3 in Fig. 5-19, with a cut-off frequency at ω3. The frequency of the pole
is at 21 MHz and yields a capacitor value of 1.8 pF.
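The quoted corner frequencies and gain can be checked against the component values. This sketch takes the zero at 1/(R2 C) and the high frequency gain as R2/R1, which reproduces the 1.7 MHz and 0.4 given in the text. The expression used for ω3 is an assumption: it treats C3 as seeing the two R1/2 halves in parallel (R1/4), which is not spelled out in the text but happens to match the quoted 21 MHz.

```python
import math

R1_OHMS = 16.7e3     # quoted in the text
R2_OHMS = 6.67e3     # quoted in the text
C_FARADS = 14.1e-12  # integrator capacitor, quoted in the text
C3_FARADS = 1.8e-12  # low pass capacitor, quoted in the text

# Zero of the active filter and gain above the zero.
f2_hz = 1.0 / (2.0 * math.pi * R2_OHMS * C_FARADS)
k_h = R2_OHMS / R1_OHMS

# Assumption: C3 sees the two R1/2 resistors in parallel, i.e. R1/4.
f3_hz = 1.0 / (2.0 * math.pi * (R1_OHMS / 4.0) * C3_FARADS)

print(f"zero f2  = {f2_hz / 1e6:.2f} MHz")
print(f"gain Kh  = {k_h:.2f}")
print(f"pole f3  = {f3_hz / 1e6:.1f} MHz (under the R1/4 assumption)")
```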
The open loop frequency response is plotted in Fig. 5-20. Below the zero at ω2 the
integrator produces a -20 dB/dec slope, which is not realized at low frequencies due to the
non-infinite gain of 13.5 dB of the op-amp. Above ω2, the gain is Kh until the pole at ω3,
where the curve drops off at -20 dB/dec. An additional pole at approximately 100 MHz exists
within the op-amp for loop stability.
5.4.4.3. Serdes III Loop Filter
The implementation of the Serdes III loop filter utilizes a negative impedance
amplifier, NIA, charge pump [27]. Fig. 5-21 shows that the circuit has an RC filter which is
balanced or floated between a pull-up resistance and pull-down negative resistance. As
long as the sum of these resistances equates to zero, the filter nodes are allowed to float.
Any deviation from zero will result in a drift in the differential output voltage to infinity,
or to zero. To ensure a reasonable initial condition, the pull-up resistors should be slightly
smaller than the NIA resistance so that the differential voltage is slowly pulled toward zero.
The negative resistance is generated through a linearized CML feedback tree that is
very similar to the storage mechanism in an MS-latch. The current through one branch is
\[ i_a = \frac{I_o}{2} - \frac{v_0 - v_1}{R} \tag{5-4} \]
where Io is the total current through the tree, R is the value of the pull-up resistors, and v0
and v1 are the outputs and the nodes of the capacitor. Technically, the circuit acts as a
negative impedance
\[ \frac{v_0 - v_1}{i_0 - i_1} = -R_n \tag{5-5} \]
which is based upon a differential voltage and current. The end result is that the differential
voltage, v1-v0, is allowed to float at any value less than RIo. The resistance value of the NIA,
Rn, is the sum of the linearizing resistors and the emitter resistance, as described in
Appendix C.1.
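(Schematic labels: pull-up resistors Rp, filter components R, C1, and C2, nodes v0 and v1,
currents i0 and i1, outputs z0 and z1, inputs int0, int1, step0, step1, and ref, tail current
Io, and the negative impedance amplifier.)
Figure 5-21 Receiver III integrator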
The integrator used in Serdes III consists of a negative impedance
amplifier which essentially “floats” a capacitor and current trees to
move charge on and off each end.
The striking benefit of this negative impedance charge pump is that it allows charge
to be removed from either end of the capacitor while the differential center voltage is
maintained. Removal of capacitor charge through a CML tree causes a differential voltage
change, and when a constant current is drawn, the voltage will ramp accordingly, thus
showing the integration.
There are two methods for affecting the differential output voltage; each method is
handled by its own circuit. The first is a standard current source which uses a linearized
CML tree with inputs int0, and int1 to draw current from either side of the filter. The
amplifier gain, Ka, is approximately 1 mA/V. This value can be derived from the linearized
CML tree plot found in Fig. C-3 on page 165. The constant includes a factor of 1/2 because
the current is split between two paths, one directly through the pull-up resistor and one
through the filter.
The second method is a step input used in conjunction with the frequency detector in
the PD. In the case of a 3-state PD, a cycle slip detected by the FD will pulse one step input
or the other and cause a large charge change on the capacitor. The size of the step current
source dictates the amount of change.
Serdes III was the first design with a loop gain that was optimized for minimal output
phase noise based on measured and simulated phase spectra data from the FFI VCO
discussed in Section 4.12.4 on page 69. With this information and phase noise data on the
reference source, the noise spectrum plot shown in Fig. 5-22 can be created. It shows the
voltage spectral density for the FFI VCO and for a very low noise reference source. The
frequency at the point of intersection indicates the ideal value for loop bandwidth. Values
lower than this allow more VCO noise to propagate to the output, while values higher than
this allow more reference noise to propagate to the output and increase the spurious
modulation from the reference.
(Plot: Φ (dBc/Hz), from -140 to -40, versus frequency, from 1 kHz to 100 MHz. The VCO
curve falls at 20 dB/dec; the effective reference is offset from the reference source by
18 dB; the intersection marks the optimum loop BW for minimum noise.)
Figure 5-22 Voltage spectral density for optimal loop bandwidth
Shown above is the voltage spectral density of the VCO and the
reference source. The point where they intersect is to first order the
optimal place to define the loop bandwidth.
The reference source to be used is quoted as having a noise spectral density of -140
dB at frequencies below 1 GHz. This must then be raised by the PLL multiplication
factor of 8, or the equivalent of 18 dB. The VCO voltage spectral density was found
through simulation, analysis, and measurement, and has a value of -90.2 dBc/Hz at
1 MHz.
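The 18 dB figure and a first-order estimate of the crossing point can be worked out from the quoted numbers. The sketch below makes two idealizing assumptions that are not taken from the thesis data: the reference noise is flat at its effective level, and the VCO noise falls at exactly 20 dB/dec through the quoted -90.2 dBc/Hz point at 1 MHz. The real spectra are not straight lines, so the estimate is only expected to agree with the reported optimum in order of magnitude.

```python
import math

N_MULTIPLY = 8              # PLL multiplication factor
REF_NOISE_DB = -140.0       # quoted reference noise level
VCO_NOISE_AT_1MHZ = -90.2   # dBc/Hz, quoted VCO noise at 1 MHz
VCO_SLOPE_DB_PER_DEC = 20.0 # assumed ideal roll-off of the VCO noise

# Frequency multiplication by N raises phase noise by 20*log10(N).
multiplication_db = 20.0 * math.log10(N_MULTIPLY)
effective_ref_db = REF_NOISE_DB + multiplication_db

# Solve VCO_NOISE_AT_1MHZ - slope*log10(f / 1 MHz) = effective_ref_db for f.
decades_above_1mhz = (VCO_NOISE_AT_1MHZ - effective_ref_db) / VCO_SLOPE_DB_PER_DEC
f_cross_hz = 1e6 * 10.0 ** decades_above_1mhz

print(f"multiplication penalty = {multiplication_db:.2f} dB")
print(f"effective reference    = {effective_ref_db:.2f} dB")
print(f"estimated crossing     = {f_cross_hz / 1e6:.1f} MHz")
```

With these straight-line assumptions the crossing falls in the tens of megahertz, the same region as the optimum bandwidth reported in the text.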
The relatively high noise content of the VCO and the low noise content of the
reference source placed the optimal loop bandwidth, K, at 33 MHz. Suppressing spurious
modulation requires placing a pole at 4K, 132 MHz, far enough above K so that the PLL
response will not be affected. At a reference frequency of 625 MHz, this results in a 13
dB suppression of spurious noise which by
\[ \sigma_t = (50\,\mathrm{ps})\,\frac{\pi}{4}\left(\pi \delta N \frac{K^2}{f_r^2}\right) \tag{5-6} \]
is equivalent to a data rms jitter of 5 ps. The PD minimum duty cycle, δ, is approximately
0.03. This σt is one tenth of a bit width, which is unacceptable. Clearly the suppression of
spurious modulation is critical in minimizing jitter. A loop bandwidth of 6 MHz was
therefore used instead of 33 MHz. This yields an rms jitter due to spurious
modulation of 0.14 ps, which is considerably lower.
With K at 6 MHz, the PLL zero (ω2) is placed at K/4, or 954 kHz, to give a 13%
response overshoot, and the pole (ω3) at 4K, or 24 MHz. For a VCO gain, Ko, of 34.5
Grad/s/V, a PD gain, Kd, of 5.25 mV/rad, and a loop filter gain, Kt, of 1 mA/V, the high
frequency gain Kh must be set to 208 for K = KoKdKtKh = 2π(6 MHz). Solving for the loop
components from
\[ F(s) = K_h \frac{s + \omega_2}{s\left(1 + \dfrac{s}{\omega_3}\right)} \tag{5-7} \]
\[ K_h = \frac{C_1 R}{C_1 + C_2} \tag{5-8} \]
\[ \omega_2 = \frac{1}{C_1 R} \tag{5-9} \]
\[ \omega_3 = \left(\frac{C_1 C_2 R}{C_1 + C_2}\right)^{-1} \tag{5-10} \]
yields C1 = 802 pF, C2 = 53 pF, and R = 208 Ω.
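The quoted component values can be checked numerically. The sketch below is only an illustration: it evaluates Eqs. (5-8) to (5-10) with the values given in the text, and it assumes that Eq. (5-10) is the reciprocal of C1C2R/(C1 + C2), which is what dimensional consistency requires. The function and variable names are hypothetical.

```python
import math

# Component values quoted in the text.
C1 = 802e-12   # farads
C2 = 53e-12    # farads
R = 208.0      # ohms

# Gains quoted in the text.
KO = 34.5e9    # VCO gain, rad/s/V
KD = 5.25e-3   # PD gain, V/rad
KT = 1e-3      # loop filter gain, A/V
KH = 208.0     # high frequency gain

def loop_filter(c1, c2, r):
    """Evaluate Eqs. (5-8) to (5-10) for a given C1, C2, and R."""
    kh = c1 * r / (c1 + c2)
    w2 = 1.0 / (c1 * r)                 # zero, rad/s
    w3 = (c1 + c2) / (c1 * c2 * r)      # pole, rad/s (assumed reciprocal form)
    return kh, w2, w3

kh, w2, w3 = loop_filter(C1, C2, R)
k_loop = KO * KD * KT * KH              # loop gain K, rad/s

print(f"K / 2pi  = {k_loop / (2 * math.pi) / 1e6:.2f} MHz")
print(f"w2 / 2pi = {w2 / (2 * math.pi) / 1e3:.0f} kHz")
print(f"w3 / w2  = {w3 / w2:.1f}")
print(f"Kh (5-8) = {kh:.0f}")
```

With these rounded values the product of the gains reproduces K = 2π(6 MHz) and the zero lands at 954 kHz, while Eq. (5-8) evaluates to roughly 195 rather than the quoted 208, so the published components are presumably rounded or adjusted.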
The size of the stepping transistors can be found using
\[ I \le \frac{2 C \omega_L}{K_o} f_c \tag{5-11} \]
where C is the capacitor size, ωL is the lock-in range (πK = 18.8 MHz), Ko is the VCO gain
(34.5 Grad/s/V), and fc is the reference frequency (625 MHz). For this implementation the
calculated current is 3.4 mA, corresponding to a transistor size of 4 µm. The ref input is
used in conjunction with the step inputs and allows them to be driven single ended to save
power.
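The 3.4 mA figure can be reproduced from Eq. (5-11) under two assumptions that the text does not state explicitly: that C is the 802 pF loop filter capacitor C1, and that the 18.8 MHz lock-in range is converted to rad/s before use. The sketch below is a minimal check under those assumptions, with hypothetical names.

```python
import math

C = 802e-12            # farads; assumed to be C1 of the loop filter
LOCK_IN_HZ = 18.8e6    # lock-in range quoted as pi*K
KO = 34.5e9            # VCO gain, rad/s/V
F_REF = 625e6          # reference frequency, Hz

def max_step_current(c, lock_in_hz, ko, f_ref):
    """Upper bound on the stepping current from Eq. (5-11), in amperes."""
    omega_l = 2 * math.pi * lock_in_hz   # convert Hz to rad/s (assumption)
    return 2 * c * omega_l / ko * f_ref

i_max = max_step_current(C, LOCK_IN_HZ, KO, F_REF)
print(f"I <= {i_max * 1e3:.2f} mA")
```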
5.4.5. PLL Loop Response
The value of the PLL gain, K, is directly related to the 3 dB point, and its design is
based on two factors: the VCO noise response and the input noise level. Small values of K
yield strong input noise immunity, as the PLL is very slow to respond to input deviations,
but transmits all of the low frequency VCO noise to the output. A small bandwidth is also
effective at reducing spurious modulation. A large value of K, on the other hand, allows the
PLL to track the input very closely and attenuate a considerable portion of the low
frequency VCO noise, but means that any input noise is passed on to the output. K, as a
frequency, also has a direct proportional effect on the pull-in range, and an inverse
relationship with the pull-in time. Put simply, a larger K allows the PLL to lock in more
quickly over a larger frequency range.
The process of choosing K is affected by the output noise specifications for the PLL,
but no noise specifications were given for the design of this PLL, as it was meant for
short-haul communications, where noise does not play a crucial role. So instead, K was chosen
small enough to limit the effects of the input noise, but not so small as to adversely affect
the layout with large component sizes. Ensuring proper operation was also important, so design
limits were not pushed and instead a “center road” approach was taken.
The step response for the passive loop of Serdes I and the active loop of Serdes II is
shown in Fig. 5-23. Both responses are clean and non-oscillatory, which indicates
adequate choices for pole locations. Serdes II has a longer settling time due to
the smaller bandwidth and does not undershoot. From [41] the damping factor, ζ, is
calculated to be 0.47 and 0.65 for the PLL in Serdes I and Serdes II, respectively.
[Plot: PLL step output (rad), 0 to 0.14, versus time (ns), 0 to 200, showing the step input and the Serdes I and Serdes II responses.]
Figure 5-23 PLL simulated step responses
The above plots, simulated in MATLAB, show the step responses for
both PLLs in Serdes I and II. The longer settling time of PLL 2
corresponds to the smaller bandwidth. PLL 3 has nearly the same
response as PLL 2.
PLL phase noise in this case is realized as output phase noise of the transmitter. For
this reason, no direct PLL phase noise can be measured. Section 5.10. details the noise
results for the two transmitter designs. No simulation of phase noise in the PLL was done
for this particular design.
5.4.6. Lock Acquisition
Lock acquisition can be described by two factors: the pull-in time, Tp, and the pull-in range, ωp. The pull-in time represents the maximum amount of time the PLL takes to
acquire lock and track the input phase when started out of lock. The pull-in range is the
largest frequency error for which the PLL will acquire lock. Both items are important
metrics in describing the usefulness of the PLL, and ideally Tp will be zero, and ωp will
cover the entire frequency range of the VCO.
[Plot: control voltage (V), -1.1 to -1.7, versus time (ns), 0 to 160, with curves labeled 660 MHz to 730 MHz in 10 MHz steps.]
Figure 5-24 PLL I simulated acquisition plots
The above plots show the PLL in Serdes I during simulated acquisition
which is ideal and not equivalent to real life. This is also known as the
jellyfish plot.
5.4.6.1. Serdes I Simulated Acquisition
Since Serdes I used a passive loop filter, the pull-in range is restricted by and equal
to the frequency of the dominant pole ω3 at 30.3 MHz. This is a result of the -π/2 angle shift
introduced by the pole, which effectively nulls the pull-in voltage. If, for example, a -π
angle shift were introduced, then the PD output would be inverted, push-out would occur,
and the PLL would move further away from lock. The pull-in time is a complicated
parameter to derive; an expression and its derivation are presented on pages 186-187 of [41].
A rough approximation for pull-in time from simulation is 100 ns.
5.4.6.2. Serdes II Simulated Acquisition
Serdes II’s PLL simulated response is shown in Fig. 5-25. The pull-in time is about
four times that of Serdes I due to the smaller loop bandwidth and different phase detector
characteristics. With similar loop bandwidths and similar loop filters, the pull-in time for a
PLL with a 3-state PD versus an XOR PD is about 4 times smaller, and the pull-in range is
about 4 times larger. This is primarily due to the negative slope that exists in the XOR
response but not in the 3-state response, as shown in Fig. 5-15 on page 88.
[Plot: loop filter output (V), -0.25 to 0.25, versus time (ns), 0 to 500, with curves labeled 600 MHz to 900 MHz in 50 MHz steps.]
Figure 5-25 PLL II simulated acquisition plots
The above plots show PLL II during simulated acquisition, which is
fairly representative of actual acquisition; however, Spice has an
advantage in setting initial conditions, which can show a better response
than in real life. Here is the squid plot.
The simulated pull-in time for the Serdes II implementation is about 400 ns, and the
pull-in range is approximately 75% of the full range of the VCO (600 to 900 MHz). The
addition of the 3-state PD has greatly enhanced the pull-in range at the expense of pull-in
time. This is a very favorable trade-off since typical pull-in time specifications are on the
order of microseconds.
5.4.6.3. Serdes III Simulated Acquisition
The third prototype has characteristics very similar to the second prototype, including
similar loop bandwidth, pole and zero locations, phase detectors, VCOs, and gains.
Acquisition plots are, therefore, nearly identical to those shown in Fig.
5-25. See Section 5.4.6.2. for pull-in times and pull-in ranges. The FLL used in this
implementation does not have a considerable effect, but it does reduce the pull-in time by
about 10%.
5.4.7. 20 / 40 Gb/s Implementation
One area that was pursued in the development of the second prototype was an ability
to run the transmitter at either 20 or 40 Gb/s. Adding a second higher speed VCO,
multiplexers on the outputs, and an additional multiplexed divide-by-two circuit was rather
straightforward, as shown in Fig. 5-26. The primary difficulty arose when designing the
loop bandwidth to be appropriate for both VCOs. In the 5 GHz mode, the detector gain is
Kd/8 and in the 10 GHz mode it is Kd/16. This requires halving the loop pole
frequency so that stable operation is guaranteed for both situations. This reduction has
negative implications for the pull-in time, because pull-in time has an inverse relationship to
the pole frequency. Halving the frequency doubles the pull-in time.
[Block diagram: 625 MHz reference, 3-state PD, loop filter, 5 GHz VCO (4 phases), 10 GHz VCO, divide-by-2, and divide-by-8.]
Figure 5-26 5/10 GHz PLL implementation
Creating a 5 and 10 GHz PLL involved the addition of a 10 GHz VCO
and various multiplexers to select the correct phases and the proper
division circuit.
5.5. Clock Distribution
Clock distribution in the transmitter involves delivering the PLL signal outputs to the
shift registers, to the external circuitry for data loading, and to the multiplexers, with
maximum phase alignment. All prototype transmitters utilized the same scheme for
clocking.
A chain of buffer delays, whose inputs are the PLL 0° and PLL 90° signals from the
PLL, constitutes the majority of the clock distribution system (see Fig. 5-27). It ensures that
data and clock travel in the same direction and that delays in the shift registers, buffers, and
multiplexers are matched to delays in the delay chain.
The most critical path in the clock distribution circuitry is found between the PLL and
the 4-to-1 multiplexer. Here the PLL 0° and the PLL 90° signals must stay phase matched
to ensure alignment of bit edges on the output. Offsets in these signals directly translate to
phase jitter and more difficult signal reception. To ensure alignment, the delay chain was
designed to be symmetrically loaded, of minimal length, and perfectly balanced. Because
the 4-to-1 multiplexer was designed as a two stage multiplexer, and because of the critical
timing required by its architecture, a precise delay of one multiplexer was added to the 90°
line, guaranteeing perfect clock alignment at the multiplexers. Consequently the SEL 0°
and SEL 90° signals are offset by exactly one multiplexer gate delay.
The next most important timing event is the clocking of the four shift registers. The
90° branch of the delay chain and its inversion handle all four registers. Since loading from
the 8 latches (4 MS latches) was a concern, a driver buffer was added to the front of each
register. This forced the addition of an equivalent delay into the delay chain. The difference
in the total number of gate delays between the CLK AD input and the SEL 0° signal was
designed to be zero, to ensure maximum noise margin. The timing diagram, Fig. 5-28,
depicts the precise relationship between the signals.
Loading the 16 bits of parallel data requires a clock edge every 800 ps (50 ps x 16
bits), an interval four times longer than the PLL period, thus necessitating a load counter,
depicted in Fig. 5-29, which is essentially a frequency divider. Not only does the load
counter have to divide by four, it also has to create two load signals separated by 100 ps
because of the clock offset on registers A and D versus B and C. The load signals set the
multiplexer input on each bit to load mode rather than shift mode. When the next rising
clock edge arrives, data is latched into the register.
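The load timing arithmetic can be written out as a short sketch. It assumes a 20 Gb/s bit rate (50 ps per bit) and a 5 GHz PLL clock (200 ps period); the function name and the choice of which load signal comes first are hypothetical, introduced only for illustration.

```python
BIT_PERIOD_PS = 50        # one bit at 20 Gb/s
WORD_BITS = 16            # parallel word width
PLL_PERIOD_PS = 200       # period of the 5 GHz PLL clock
LOAD_OFFSET_PS = 100      # offset between the two load signals

load_interval_ps = BIT_PERIOD_PS * WORD_BITS        # time between loads
divide_ratio = load_interval_ps // PLL_PERIOD_PS    # load counter division

def load_edges(n_words, offset_ps=0):
    """Times (ps) at which a load pulse starts for the first n_words words."""
    return [offset_ps + k * load_interval_ps for k in range(n_words)]

first = load_edges(3)                     # hypothetical earlier load signal
second = load_edges(3, LOAD_OFFSET_PS)    # the other load signal, 100 ps later
print(load_interval_ps, divide_ratio, first, second)
```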
The final aspect of clock distribution is the generation of the signal that informs the
external circuitry that the transmitter is ready for new parallel data. The straightforward
solution is to use the LOAD AD signal. This guarantees that when both loads have completed,
the data has had a maximum amount of time to settle.
Although the use of a delay chain makes clock distribution straightforward and very
reliable, it does have one serious drawback. Since it lies between the PLL and the output
multiplexer, it contributes to the overall phase noise and jitter of the circuit. This noise is a
result of shot noise, thermal noise in the chain of buffers, fabrication mismatches between
the 0° and 90° phase lines, and coupling between the lines and substrate. Minimizing these
noise effects involved designing a symmetric and tight layout of the delay chain.
[Schematic: externally supplied parallel data (four groups of 4 bits) loaded into shift registers A, B, C, and D; multiplexer stages BA and CD feeding the serial output SO; CLK AD, LOAD AD, and LOAD CLK signals; the PLL 0° and 90° outputs driving the delay chain and the load counter.]
Figure 5-27 Clocking scheme for transmitter
The top level schematic for the transmitter clocking circuitry includes
the PLL as the clock generator, a delay chain for distribution, the
registers, and the 4-1 multiplexer.
[Timing diagram, time axis 0 to 800: PLL 0° and PLL 90° (each followed by 3 gates), CLK AD, SEL 0°, LOAD AD (pulse every 4th CLK 0° edge), A,D, B,C, BA, CD, SEL 90°, and SO.]
Figure 5-28 Transmitter clock timing
The timing of the transmitter revolves around the delay chain which
ensures that the data and the clock flow in the same direction. The
bottom three signals clearly show how the 4-1 multiplexer interleaves to
produce the output.
[Schematic: load counter built from D flip-flops clocked by LOAD CLK, producing LOAD AD and LOAD BC pulses 200 ps wide, offset by 100 ps, repeating every 800 ps.]
Figure 5-29 Load counter
The load counter divides the PLL signal by four and generates two 200
ps load pulses offset by 100 ps from each other.
5.6. Data Encoding
Data encoding is a general term for such techniques as:
encryption, compression, improved transition density, error
detection, channel alignment, byte alignment, DC voltage
balance, simplified clock recovery, and frame detection.
Typically, improved transition density and channel alignment are
performed on-chip although all could potentially be performed
off-chip. No encoding was performed in either Serdes I or Serdes II. See Section 5.11.1. on
page 118, for a brief study and recommendation of the 8B/10B encoding scheme.
5.7. Line Driver
The purpose of the line driver is to amplify the transmitter signal and drive the 50 Ω
output line. Depending on the specifications, this can either be a single-ended or differential
circuit [48], [36], [37]. At these speeds differential is usually the optimum choice. The
bandwidth of the circuit must be large enough so that it will not attenuate the high frequency
components and close the signal eye. Noise is also an issue since
any phase noise introduced by the line driver will be directly realized on the output.
The line driver in the Serdes I circuit utilized a simple pad driver circuit which was
not optimized for this purpose. In Serdes II, however, the line driver was integrated into the
final output multiplexer which limited the introduction of noise. The output voltage swing
was designed to be 400 mV.
5.8. Internal Testing Circuitry
5.8.1. Serdes I
Serdes I was designed without the ability to accept external parallel data. Instead, the
data was generated pseudo-randomly on chip, through a 16 bit linear feedback shift register
(LFSR).
Designing a true maximal length 16 bit LFSR would create a sequence 65,535 bits
long, and because 16 bits are transmitted then followed by a single shift and repeated, the
serialized length is greater than 1 million bits. This was determined to be too long for the
simple reason that it would be very difficult to determine whether the transmitter was
working correctly during testing. An oscilloscope can only capture so much information
and it would be nearly impossible to find the exact position within the sequence.
Instead, a four bit maximal length LFSR followed by a 12 bit shift register was
implemented. The circuit, shown in Fig. 5-30, has 16 MS-latches clocked through a buffer
tree, an XNOR gate for feedback, and an AND gate to create a synchronizing signal. The
synchronizing signal, SYNC, senses all zeros in the LFSR and was placed on an output pad
in order to detect the start of the sequence. The ZBIT is the final bit of the generator and
was also placed on a pad to analyze the operation of the circuit. A 4 input AND gate, not
shown in the figure, determines if the LFSR contains all ones and, if so, inverts the output of
the XNOR to force proper oscillation.
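[Schematic: 4 bit LFSR (stages 0 to 3) followed by a 12 bit shift register (stages 4 to 15), with CLOCK input and SYNC and ZBIT outputs. Successive register states:]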
0000111011001010
1000011101100101
0100001110110010
1010000111011001
0101000011101100
0010100001110110
1001010000111011
1100101000011101
0110010100001110
1011001010000111
1101100101000011
1110110010100001
0111011001010000
0011101100101000
0001110110010100
Figure 5-30 Serdes I LFSR
A 16 bit, on-chip pseudo-random pattern generator consists of a 4 bit
LFSR and a 12 bit shift register. The circuit used in the transmitter is
capable of generating a 240 bit serial stream.
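The register states listed in Fig. 5-30 can be reproduced with a short behavioral model. The sketch below is an illustration only: the feedback taps (XNOR of stages 0 and 3, fed into stage 0) are inferred from the listed states rather than taken from the schematic, the all-ones recovery gate is omitted because the all-ones state is never reached from the listed start state, and all names are hypothetical.

```python
def xnor(a, b):
    """XNOR of two bits."""
    return 1 - (a ^ b)

def step(word):
    """Shift the 16 bit register one place; stage 0 takes the XNOR feedback."""
    feedback = xnor(word[0], word[3])   # taps inferred from the Fig. 5-30 states
    return [feedback] + word[:-1]

def sequence(start):
    """Return the successive register states until the start state repeats."""
    states, word = [], list(start)
    while True:
        states.append("".join(map(str, word)))
        word = step(word)
        if word == list(start):
            return states

start = [int(c) for c in "0000111011001010"]   # first state listed in Fig. 5-30
states = sequence(start)
print(len(states), "words,", len(states) * 16, "serial bits")
for s in states:
    print(s)
```

Under these assumptions the model cycles through 15 distinct 16 bit words, which corresponds to the 240 bit serial stream quoted in the caption.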
5.8.2. Serdes II
Off-chip testing of this serial communication system requires testing equipment that
operates at the bandwidth of the transmitter and receiver. At the rates being designed for, no
such equipment exists and comprehensive testing must be done on-chip. The testing
scheme that was implemented feeds the transmitter serial output directly to the receiver and
the parallel data received back into the transmitter as shown in Fig. 5-31 [43]. A single bit
offset between the receiver outputs and the transmitter inputs allows data input on Tx pin
0 to travel through the loop 16 times, and then output on pin 15 of the Rx. By generating a
pseudo random sequence (see Fig. 5-30) at the input and verifying that sequence at the
output, the bit error rate (BER) can be measured. The verifying circuit generates a pulse
every time a good sequence is measured. A missing pulse indicates a bit error. A divider
was added at the output so that high BER measurements could be made without high
bandwidth test equipment.
With a 12 bit maximal length LFSR, a 4095 bit sequence can be generated. Since the
total sequence must traverse the loop 16 times, a minimum BER of 10^-5 can be detected
with this method. The maximum time is determined by the time length of the test.
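The order of magnitude of the detectable BER can be checked with a few lines of arithmetic. This sketch assumes that the smallest resolvable error rate corresponds to one bit error in one full pass of the sequence through the loop; that interpretation, and the variable names, are assumptions.

```python
LFSR_STAGES = 12
LOOP_PASSES = 16

sequence_bits = 2 ** LFSR_STAGES - 1          # length of a maximal length sequence
bits_checked = sequence_bits * LOOP_PASSES    # bits covered by one verification
ber_floor = 1 / bits_checked                  # one error per verified sequence

print(sequence_bits, bits_checked, f"{ber_floor:.2e}")
```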
[Block diagram: transceiver with Rx outputs 0 to 15 fed back to Tx inputs 0 to 15 with a one bit offset; an LFSR bit pattern generator at the Tx input; a bit pattern verification block on Rx bit 15 with "good pattern" and reset signals.]
Figure 5-31 True error rate detector
The TERD operates by feeding the transmitter output back into the
receiver and feeding the deserialized data back into the transmitter. A
one bit offset with an LFSR and verifier determines the BER.
The TERD requires proper channel alignment, which is accomplished through data
encoding and decoding. Since these circuits were not included in the second prototype, the
bit pattern generator was configured to feed directly into the transmitter through the pin
mapping shown in the top of Fig. 5-32. Various bits had to be duplicated, but after inversion
and separation the data is still sufficiently random.
[Schematic: 12 stage LFSR (stages 0 to 11) with CLOCK and reset inputs, and the pin mapping below.]
Tx input pins
9 12 14
8
7 3 11 13 5 0 10 6 2 4 15 1
LFSR output pins
0 1 2 3 4 5 6 7 8 9 10 11
Figure 5-32 Serdes II bit pattern generator
A 12 stage LFSR with feedback to three stages yields a maximal length
LFSR. A reset line was needed for use in the bit pattern verifying
circuit.
5.9. Implementation and Fabrication
5.9.1. Serdes I
A -4.5 V power supply was chosen for this chip. This left
plenty of room for the three levels of logic and the active current
sources. Power minimization was not a design goal so this voltage
was not optimized. Fig. 5-33 shows the artwork and fabricated
pictures of the first transmitter design, and Table 5-1 shows the pad connections.
The chip has two inputs: the 625 MHz reference clock and a full/half rate frequency
selector. Three outputs were included to diagnose problems with the PLL and delay chain.
Two pads output the LFSR sequence and another pad outputs when the LFSR is reset.
5.9.2. Serdes II
The goal for the second Serdes chip was to correct problems from the first iteration,
combine the transmitter and receiver into one chip, and make the chip packageable.
Correcting the problems involved redesign of the VCO and PLLs to meet the 20 Gb/s
specification. Combining the two systems allowed the development of an on-chip testing
circuit (TERD), which could perform full feedback testing. A drawback was that fewer
probe pads were available in the larger chip. Designing for packageability involved the use
of an array of C4 pads for flip-chip packaging. Pad drivers and receivers were developed
to accept and drive the 16 bits of parallel input and output data.
Table 5-1 Pin-out of Serdes I transmitter
Pin   I/O         Description
S0                not used
S1    RF input    reference clock (625 MHz)
S2    DC input    frequency select (20 Gb/s or 10 Gb/s)
S3    RF output   PLL output (5 GHz)
S4    RF output   delay chain output (/8) (625 MHz)
S5    RF output   delay chain output (5 GHz)
S6                not used
S7    RF output   LFSR: sequence reset pulse
S8    RF output   LFSR: sequence
S9    RF output   transmitter out
S10               not used
S11               not used
The east half of the chip contained the transmitter, as shown in Fig. 5-34. High
frequency probe pads T4 and T5 were used for the differential serial out signals. The 625
MHz reference input pad, T8, and the PLL clock output pad, T9, were required for testing.
An on-chip LFSR, which was part of the test system, could be selected through a DC pad,
C8, to drive the transmitter. Bit 3 of the LFSR was routed to output pad T1 to verify the
proper functioning of the test system. The transmitter utilized two VCOs, which could be
multiplexed through pad C11 into the clock synthesizer PLL. A selectable divide-by-2
circuit, driven by pad C10, was added to the output of the PLL for half frequency operation
of the transmitter. An input filter to help suppress high frequency phase noise from the
reference could be activated by pad C9.
[Layout artwork (left) and photograph of the fabricated chip (right), with pads S0 to S11 and the PLL, LFSR, test, delay chain, mux, and driver blocks labeled.]
Figure 5-33 Serdes I transmitter layout and photograph
On the left is the final artwork for the first transmitter design. On the
right is a microphotograph of the fabricated part.
The receiver, located on the west side of the chip, accepts differential serial data on
the two high frequency pads R4 and R5. The recovered clock, important for lock
verification, was routed to pad R8. By using pads C3 and C4, four different
demultiplexed bits could be analyzed on pad R9 for proper operation. The test source built
into the receiver was controlled through C1 and C2, enabling three different test patterns.
The true error rate detector circuit pulsed pad R0 when a bad packet was seen and toggled
R1 when a good packet was detected.
In order to reduce chip power, the circuits were optimized around a supply voltage of
-3.3 V. This represents a 25% power savings when compared to the Serdes I -4.5 V supply.
Table 5-2 Bondpad pin-out of Serdes II chip
Pin  I/O     Description                       Pin  I/O     Description
R0   RF out  TERD: bad packet seen             T0   RF out  duplicated data into Rx
R1   RF out  TERD: toggle every full packet    T1   RF out  LFSR: bit 3 into Tx
R2   Power   Vee (-3.3V)                       T2   Power   Vee (-3.3V)
R3   Power   Gnd                               T3   Power   Gnd
R4   RF in   differential serial in            T4   RF out  differential serial out
R5   RF in   differential serial in            T5   RF out  differential serial out
R6   Power   Gnd                               T6   Power   Gnd
R7   Power   Vee (-3.3V)                       T7   Power   Vee (-3.3V)
R8   RF out  receiver clock                    T8   RF in   ref clock (625 MHz)
R9   RF out  selected demuxed data             T9   RF out  PLL out (divided by 8)
C0   DC in   Rx test source control voltage    C6   Power   Vee (-3.3V)
C1   DC in   Rx test source select A           C7           not used
C2   DC in   Rx test source select B           C8   DC in   select Tx input source
C3   DC in   TERD: select A test bit           C9   DC in   enable Tx input filter
C4   DC in   TERD: select B test bit           C10  DC in   enable TX PLL divide-2
C5   Power   Gnd                               C11  DC in   select VCO (5/10 GHz)
[Layout of the full Serdes II chip with inset microphotograph: Rx half with pads R0 to R9, Tx half with pads T0 to T9, control pads C0 to C11, and the 16 bit input data and 16 bit output data pad arrays.]
Figure 5-34 Serdes II chip layout and microphotograph
Shown here is the full Serdes II chip including a microphotograph in the
bottom left corner. The testing pads are located around the perimeter.
5.10. Testing Results
5.10.1. Serdes I (transmitter test results)
An output waveform captured directly from the oscilloscope is shown in Fig. 5-35(a).
It shows the bit pattern expected from the on-chip LFSR testing circuitry. The ability of the
PLL to achieve lock was very poor and a narrow pull-in range of 420 MHz to 460 MHz was
measured. The hold-in range was larger, from 393 MHz to 490 MHz, equivalent to a data
bit rate of 12.6 Gb/s to 15.7 Gb/s. At a bit rate of 15.3 Gb/s, the rms phase jitter was
measured (see footnote 1) to be 6.3 ps, or about 10% of the bit width.
[Oscilloscope captures: (a) output waveform, (b) eye diagram.]
Figure 5-35 Transmitter waveform (Serdes I)
(a) The output waveform of the transmitter running at 15 Gb/s with a
350 mVp-p swing. The pseudo-random pattern matches the expected
pattern from simulations. (b) An eye diagram at 15 Gb/s showing the
relatively large phase noise and its effects on the closing of the eye.
Although the transmitter was designed to operate at 20 Gb/s, it reached only 15 Gb/s,
25% below the target, which can be attributed to two important factors. The first was a
result of the VCO loading environment, which ideally consists of equal loading with four
minimum sized buffers. It was instead loaded with two buffers on one stage, one on each of
two others, and none on the fourth (see footnote 2). The effect was a reduction in speed,
probably due to the double load on one stage, and a non-quadrature phase mismatch between
stages. The second factor was a result of simulations that did not adequately compensate for
interconnect parasitics. Resistive and capacitive effects at these frequencies can have a
profound effect on the overall speed of the chip. Lack of time and understanding for these
simulations produced slower than expected results.
1. Performing a true phase noise and jitter measurement requires a spectrum analyzer capable of an absolute reading. A time domain oscilloscope, such as the one used to collect this data, merely measures the jitter
between the signal and the trigger. If the trigger signal is correlated in time to the measurement signal then
the jitter measurement can be quite a bit less than the absolute jitter.
2. This was an oversight and was definitely not intended. The receiver, which was designed a few weeks
after this, had ideal loading characteristics. This improved its response and left the transmitter and receiver
with two non-overlapping frequency ranges.
Both of the issues discussed were addressed and solved in Serdes II. The loads on the
transmitter and receiver VCOs were carefully checked to make sure loading was balanced
and minimal. Interconnect simulations produced better designs in critical circuits such as
the VCO and PLL. A wide margin was introduced in the design of the VCO to account for
unknown effects.
5.10.2. Serdes II (transmitter test results)
The Serdes II design was successful in attaining the 20 Gb/s target bit rate. The
relevant eye diagram is shown in Fig. 5-36. The output voltage swing is 350 mV and the
eye is 30 ps wide and 200 mV high. This represents a big improvement over the original
design, which failed to meet the specifications. The eye diagram is also much cleaner and
more symmetric, with less total rms jitter.
Figure 5-36 Serdes II transmitter eye diagram
Shown here is an eye diagram at the target 20 Gb/s. It shows an opening
30 ps wide and 200 mV high.
The PLL has a wide pull-in range from 3.6 to 5.3 GHz (14.27 to 21.58 Gb/s), which
is more than 75% of the total frequency range of the FFI VCO. The hold-in range is
identical to the pull-in range, indicating a well balanced and nearly optimal PD. When using
the higher speed VCO the pull-in range changed to 5.4 to 7.6 GHz, yielding an upper data
rate of 30 Gb/s.
Jitter measures the accumulation of transition offsets over a given length of time. For
an open loop, without a PLL, a clock will have exponentially increasing jitter with respect
to time. When placed in a PLL, the jitter levels off and becomes constant after one
bandwidth time constant. For the Serdes II PLL, the jitter was measured with the time
domain oscilloscope at 4.3 ps with the reference signal and 2.9 ps without. This indicates
that considerable jitter was being introduced by the signal source.
Fig. 5-37 shows the phase noise spectra of the open loop VCO, the open loop
reference, the open loop reference plus 18 dB and the closed loop PLL. The reference plus
18 dB is the effective phase noise seen at the input to the PLL. The PLL closed loop phase
noise behaved as expected. First, at low frequencies the phase noise approached that of the
reference. This phase noise was expected since this was well below the loop bandwidth of
6.2 MHz and the PLL is able to track out the VCO leaving just the reference noise on the
output. The difference between the PLL and reference phase noise is likely from noise
introduced in the loop filter. Close to the loop bandwidth of 6.2 MHz, the sum of both the
reference and VCO noise contributed to the total noise. And above the loop bandwidth, the
phase noise should follow closer to the VCO phase noise and that is what was seen.
A more accurate way to measure jitter is in the frequency domain. This enables the
removal of the in-band low frequency jitter, which is easily removed by the receiver PLL,
from the rms jitter measurement. Integrating the PLL phase noise plot from 100 kHz to 100
MHz gives an rms jitter of 1.4 ps. This value is lower than the 4.3 ps found with the time
domain oscilloscope, which indicates that a larger amount of low frequency jitter can be
found in the reference signal.
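The frequency domain calculation can be sketched as follows. This is not the procedure or data used for the measurement above: it applies the common relation between single sideband phase noise L(f) and rms jitter, sigma_t = sqrt(2 * integral of L(f) df) / (2 * pi * f_carrier), to a hypothetical flat noise profile, and it assumes a 5 GHz carrier. The function name and sample values are illustrative only.

```python
import math

def rms_jitter(freqs_hz, noise_dbc_hz, carrier_hz):
    """Integrate single sideband phase noise L(f) into rms jitter in seconds.

    Uses the trapezoidal rule on the linear noise power and the relation
    sigma_t = sqrt(2 * integral of L(f) df) / (2 * pi * f_carrier).
    """
    linear = [10 ** (level / 10) for level in noise_dbc_hz]
    area = 0.0
    for i in range(1, len(freqs_hz)):
        width = freqs_hz[i] - freqs_hz[i - 1]
        area += 0.5 * (linear[i] + linear[i - 1]) * width
    return math.sqrt(2 * area) / (2 * math.pi * carrier_hz)

# Hypothetical flat profile, NOT the measured data of Fig. 5-37.
freqs = [1e5, 1e6, 1e7, 1e8]                 # 100 kHz to 100 MHz
noise = [-100.0, -100.0, -100.0, -100.0]     # dBc/Hz
sigma = rms_jitter(freqs, noise, carrier_hz=5e9)
print(f"{sigma * 1e12:.2f} ps")
```

For a measured spectrum, which is far from flat, a dense frequency grid would be needed for the trapezoidal rule to be accurate.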
The preliminary specification for OC-192 SONET indicates that the maximum
acceptable jitter must be less than 0.09 UI (Unit Interval) for 10^12 bits. Finding the
associated rms jitter involves integrating the Gaussian probability density function (pdf)
from x to infinity and setting the result equal to the bit error rate of 10^-12. The value of x is
about 7.5 standard deviations, yielding an rms jitter specification of 1.2 ps at 10 Gb/s.
Although the transmitter jitter of approximately 1.4 ps is larger than the SONET
specification of 1.2 ps, this circuit was not designed with SONET in mind. For short-haul
communications higher jitter is more acceptable.
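The conversion from the 0.09 UI limit to an rms value can be followed in a few lines. The sketch below takes the 7.5 standard deviation figure from the text as given and only evaluates the one-sided Gaussian tail at that point; the function and variable names are hypothetical.

```python
import math

def gaussian_tail(x):
    """One-sided tail probability of a unit Gaussian beyond x sigma."""
    return 0.5 * math.erfc(x / math.sqrt(2))

BIT_RATE = 10e9            # line rate used in the text, b/s
JITTER_LIMIT_UI = 0.09     # maximum jitter quoted in the text
SIGMAS = 7.5               # multiple of sigma quoted in the text

unit_interval_ps = 1e12 / BIT_RATE               # one UI in picoseconds
limit_ps = JITTER_LIMIT_UI * unit_interval_ps    # jitter limit in picoseconds
rms_spec_ps = limit_ps / SIGMAS                  # implied rms jitter

print(f"{rms_spec_ps:.2f} ps, tail at {SIGMAS} sigma = {gaussian_tail(SIGMAS):.1e}")
```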
[Plot: phase noise (dBc/Hz), -60 to -140, versus frequency (MHz), 0.1 to 100, with curves labeled "VCO open loop", "PLL closed loop", "ref - 18 dB", and "reference open loop".]
Figure 5-37 Tx PLL measured phase noise spectra
The PLL closed loop behaved as expected with the PLL tracking out the
VCO noise at low frequency and following the VCO noise at high
frequency.
5.11. Future Design
The extremely large scope of this project left a number of areas of research untouched
and undeveloped in the first two fabricated designs and the third simulated design. The
basic elements of the transmitter were designed with optimizations and research performed
only in specific areas. The remainder of this section describes key areas that are
recommended for future effort in order to establish these designs as highly functional,
useful, production-worthy designs.
5.11.1. 8B/10B Encoding
8B/10B encoding addresses such issues as transition density imbalance, error detection,
command insertion, and DC balancing [26], [35]. It does so by adding two bits of
information to every eight bit input, and it requires a 25% increase in speed
for the same information throughput.
The frequency of transitions in the data is a very important factor in the design of the
receiver. In general, the more transitions provided to the receiver, the better the PLL’s
ability to lock into the serial stream. 8B/10B encoding guarantees a maximum run length
of five bits, and a lowest transition density of 30 transitions per 100 bits. Defining a
minimum density makes it easier to model the data stream arriving at the receiver.
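Run length and transition density are easy to measure on any bit stream. The helper functions below are a generic sketch for checking a stream against such limits; they do not implement 8B/10B encoding, the sample stream is hypothetical rather than a real code word, and the names are assumptions.

```python
def max_run_length(bits):
    """Length of the longest run of identical consecutive bits (len >= 1)."""
    longest = current = 1
    for prev, cur in zip(bits, bits[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest

def transition_density(bits):
    """Fraction of adjacent bit pairs that differ (len >= 2)."""
    changes = sum(1 for prev, cur in zip(bits, bits[1:]) if cur != prev)
    return changes / (len(bits) - 1)

stream = "0011111010"      # hypothetical 10 bit sample
print(max_run_length(stream), transition_density(stream))
```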
Another feature of the encoded stream is an equal number of ones and zeros. This
allows all single bit errors to be detected. In addition, because of the much larger 10 bit
word space, the decoder can detect undefined words and flag them as errors.
The DC balance is the fraction of transmitted bits that are ones. For
high speed optical links, it is very desirable to have a DC balance of 0.5, which corresponds
to an equal number of ones and zeros. This stabilizes effects, such as heating in the optical
circuits, which can be a function of the sign of bits being sent. 8B/10B guarantees a DC
balance of 0.5 because it forces an equal number of ones and zeros per character.
Since data encoding occurs at the parallel data rate of 1.25 Gb/s, the necessary
circuitry can be designed completely in CMOS. This reduces power and space
consumption, and allows the use of powerful EDA tools for layout and design.
An additional role for 8B/10B encoding is for channel alignment, which guarantees
that the bit 0 of the Tx is connected to bit 0 of the Rx. This requires a 16 bit rotator with a
detection mechanism to rotate the streams until they match.
5.11.2. Transmitter data retiming
A technique that can be used to reduce the output phase jitter of the transmitter is to
clock the output signal directly from the PLL through an MS-latch. This retiming circuit
alleviates all the noise introduced by the multiplexers and provides the minimum signal
path between the transmitter serial output and the PLL.
A significant source of jitter on the output data is deterministic jitter, which is the
result of non-periodic, data-induced noise. Pull-up resistors at the top of CML trees are a
common source: as current flows through a resistor it heats up, and warmer resistors
produce higher rms noise. The ultimate effect is that the noise becomes dependent on the
data stream. A stream with a large number of zeros will have a higher noise component than
one with an equal number of ones and zeros.
The problem with data retiming is that it requires a latch that can operate at the
functional speed of the transmitter. In this case, that speed is 20 GHz, and if some encoding
is introduced then it can be as high as 25 GHz. Simulations show maximum operation of a
latch to be unreliable above 15 GHz. This is a result of the large delay through the two CML
tree gates and the feedback that is inherent in these circuits.
Although direct data retiming is unattainable unless a much faster latch is found,
other improvements can be made. Since the final 4-to-1 (symmetric) multiplexer defines
the output jitter, an improvement would be to drive the multiplexer directly by the PLL
rather than through the timing delay chain. This adds to design difficulty because the
timing of the entire transmitter is running opposite to the timing of the data. The primary
benefit of this method is the removal of the phase noise introduced by the five buffers of
the delay chain.
[Figure 5-38 diagram: (a) Current Method and (b) Proposed Method, each showing the PLL,
clock, transmitter, and data paths]
Figure 5-38 Data and clock timing
By moving the PLL to the input of the multiplexer (b), the clock must
run opposite the data. This creates timing difficulties but decreases the
output phase noise of the transmitter.
5.11.3. LC Oscillator
The primary drawback to using the FFI ring oscillator in the transmitter is its very
poor phase noise characteristics. LC oscillators have much higher quality factors and
considerably less phase noise and jitter [21],[22],[44],[45]. One problem with typical LC
VCOs is that they only produce a single phase clock, but the transmitter architecture in this
research requires a clock and its quadrature. A possible option, and an area for further
research, is the multiphase LC oscillator [46],[47], which offers the best of both worlds: low
phase noise and quadrature outputs.
6. Design of the Receiver
6.1. Project History
The first receiver (Serdes I) was designed for fabrication
in February 1999 and only had a 1-to-4 demultiplexer and clock
extractor. Various improvements and optimizations yielded
Serdes II, which was a more efficient design, capable of full 16 bit demultiplexing and
external data input.
6.2. Receiver Architecture
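[Figure 6-1 diagram: serial data enters the phase detector (PD), which passes 4 bits to the
demultiplexer (16 bits of demultiplexed data out) and drives the loop filter (PI control);
the loop filter controls the VCO, which supplies 8 phases]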
Figure 6-1 Top level receiver architecture
The receiver is a PLL with a PD, called a transition detector, a PI loop
filter, a VCO, and a demultiplexer to extract the NRZ bits from the
serial data.
The receiver is a PLL and demultiplexer that locks an internal VCO to externally
supplied data and extracts the non-return-to-zero (NRZ) bits from the data. Data arrives
serially as a differential signal and is buffered in preparation for driving the PD. The
information collected about transition phases is combined and fed into a proportional and
integral loop filter. The filtered signal is used to drive the VCO to a frequency which
matches the frequency of the external data. In addition to collecting timing information, the
PD also performs a 1-to-4 non-aligned demultiplexing of the data. Another circuit, also driven
by the VCO, finishes the demultiplexing and generates 16 bits of parallel data.
6.3. Receiver PLL
The receiver PLL is considered a clock and data recovery
(CDR) circuit and has the primary role of extracting the data bits
from the serial signal and ensuring that the extracted bits are not
corrupted. The process is made more difficult than in a standard
PLL, because random or pseudo-random data has no guaranteed
transition times. The 3-state and XOR PDs used in the transmitter
PLLs, for example, can only operate with periodic signals. A specialized PD that can
handle non-periodic information and allow a VCO to lock to the fundamental frequency of
the data is required. Merely locking the VCO to the data’s frequency is only half the
problem. The system must also sample, or extract the information contained within the data
stream, using the recovered clock.
The receiver designs for Serdes I through III all utilize a transition detector (TD) PD.
It twice oversamples the data signal and generates a digital measure of the phase difference
between this signal and the clock. It essentially indicates whether the clock is too fast or
too slow relative to the data. With this information, lock can be acquired and because of the
nature of the sampling, data can easily be extracted. The problem with this PD, which was
addressed in the third prototype, is the very small pull-in range of the PLL. Without an
analog measure of phase difference, the clock and data frequencies have to be very close
for the PLL to pull-in.
Fig. 6-2 depicts block diagrams for the three receiver prototypes. The first and second
designs differ in the integrator design and the VCO. The third adds an entirely new
loop to the PLL, one that is very good at acquiring frequency lock but poor at extracting
the data [14], [30], [51]. Together with the TD PD, this loop greatly increases the PLL’s
pull-in range without any sacrifice in performance.
[Figure 6-2 diagram, three panels.
Serdes I: data, transition detector (PD), FET charge pump, gain block, VCO.
Serdes II: data, transition detector (PD), negative impedance charge pump, gain block, VCO.
Serdes III: reference, 3-state PD, negative impedance charge pump, gain block 1, data,
transition detector (PD), gain block 2, VCO.]
Figure 6-2 Receiver PLL evolution
The receiver PLL has gone through two major improvements. The first
design utilized a FET charge pump which was replaced with a negative
impedance charge pump in the second design. The third prototype added
a referenced frequency detector which greatly improved the pull-in
range of the loop.
6.3.1. Phase Detector
6.3.1.1. Transition Detector (PD)
Data transitions provide the only means to measure the phase of the incoming serial
data. If the data were periodic then we could be assured of a transition at a specific time and
directly compare it with a coincident VCO transition, similar to the clock synthesizer PLL
in the transmitter. However, data is by definition non-periodic, and transition locations
cannot be assured at any time. For example, data containing ten ones followed by twelve
zeros, containing only two transitions, could be received. Since a transition between bits
cannot be guaranteed, there must be no action when no transitions are received and tracking
must be performed when transitions are received.
The aspect of the clock recovery circuit that had critical implications for its
development was the use of the same eight phase ring oscillator used in the transmitter. It
was felt that by matching the oscillators in the transmitter and receiver, they could be
ensured to operate at the same speeds and the development of only one VCO would be
required.
Running at 5 GHz, either the CS or the FFI VCO generates eight unique phases (0°, 45°,
90°, 135°, ...)¹, each separated by 25 ps. Serial data arriving at 20 Gb/s can be broken up
into bits 50 ps wide. Taking complete advantage of the multi-phase clock, the data is
sampled on every clock phase, resulting in a twice-oversampling receiver scheme. In other
words, for every bit, two samples of the signal will be taken.
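The timing arithmetic in this paragraph can be checked with a short sketch. The function name `sampling_plan` is hypothetical; the default values are the ones quoted in the text.

```python
def sampling_plan(vco_hz=5e9, phases=8, bit_rate=20e9):
    """Return (phase spacing in ps, bit width in ps, samples per bit)."""
    period_ps = 1e12 / vco_hz          # one VCO period
    spacing_ps = period_ps / phases    # time between consecutive clock phases
    bit_ps = 1e12 / bit_rate           # width of one serial bit
    return spacing_ps, bit_ps, bit_ps / spacing_ps


spacing_ps, bit_ps, samples_per_bit = sampling_plan()
```

With the defaults this gives a 25 ps phase spacing, a 50 ps bit, and two samples per bit.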
Sampling is handled by eight MS-latches whose clock inputs are tied to one of the
eight clock phases (see Fig. 6-3). In the locked and stable condition, four of the latches
sample at the center of the bits and return data information while the other four sample on
the transition and return timing information only. If the latches are labeled consecutively
by their clock phase inputs, W, X, Y, Z and their inverses, then the data latches are W, Y,
and their inverses, while the timing latches are X, Z, and their inverses.
1. Although the VCO has only four unique outputs, the inverse of each of them yields the remaining four
phases.
[Figure 6-3 diagram: the serial stream feeds eight sampling latches clocked by VCO phases
W, X, Y, Z and their inverses, spaced 25 ps apart over the 200 ps clock period. The data
latches produce dataA through dataD, and each phase slice has a transition detector with
slow (S) and fast (F) outputs. Adjacent samples are XORed as follows: W ⊗ X = FAST,
X ⊗ Y = SLOW, Y ⊗ Z = FAST, Z ⊗ W = SLOW, repeating for the inverse phases.]
Figure 6-3 Receiver topology
The receiver is made up of eight MS-latches, each tied to a unique phase
of the VCO. Since each phase is separated by 25 ps, the data is twice
oversampled, and thus, able to extract transition timing information
from all edges. FAST or SLOW in the diagram is a command to the
VCO.
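The FAST/SLOW assignment in Fig. 6-3 can be expressed as a small behavioral model. This is a sketch only: the function name `td_commands` is hypothetical, and it assumes the first sample is taken on a data phase (W), the next on a timing phase (X), and so on in alternation.

```python
def td_commands(samples):
    """Map consecutive half-bit samples to FAST/SLOW commands for the VCO.

    samples[0] is assumed to come from a data phase (W), samples[1] from a
    timing phase (X), and so on, alternating every 25 ps.
    """
    commands = []
    for n in range(1, len(samples)):
        if samples[n - 1] == samples[n]:
            commands.append(None)      # no transition in this phase slice
        elif n % 2 == 1:
            commands.append("FAST")    # data sample XOR following timing sample
        else:
            commands.append("SLOW")    # timing sample XOR following data sample
    return commands


early_edges = td_commands([0, 1, 1, 0, 0])   # edges seen just after data samples
late_edges = td_commands([0, 0, 1, 1, 0])    # edges seen just after timing samples
```

A run of identical samples produces no command at all, which matches the requirement that the loop take no action when no transitions are received.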
Fig. 6-4 shows a detailed look at the transition detector used in Serdes I. Data is
latched with L1 using Φn, the n-th buffered phase of the VCO. Φn and Φn+1 are consecutive
phases of the VCO, separated by 25 ps, or 45°, and Φn is equal to Φn+8. The sampled data,
sn, is XORed with the sample from the previous detector, sn-1, and retimed with L2. The
clock input to this latch comes six phases later, or after 150 ps, in order to allow the output
of the XOR to settle to the correct value. The output of L2, tn, indicates whether a transition
has occurred during this phase slice. The total time that the tn signal remains high is
dependent on the period of the VCO and whether additional transitions are detected in this
phase slice. With the VCO running at 5 GHz, the minimum time that tn is high is 200 ps.
This circuit is repeated eight times to collect transition information from every
phase slice.
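[Figure 6-4 diagram: MS-latch L1, clocked by Φn, samples the data to produce sn; sn and
sn-1 are XORed and the result is retimed by latch L2, clocked by Φn+6, to produce tn. A
phase plot marks Φn and Φn+6.]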
Figure 6-4 Transition detector in Serdes I
The first iteration of the transition detector had a latch to sample the
data. This sample and the sample from the previous detector are XORed
together and latched again to produce the transition detector signal.
The phase plot in Fig. 6-4 shows a transition detector on the X (45°) phase. It uses
samples from itself and from the previous detector to detect transitions within the shaded
region. The XOR of these signals is clocked six phases later.
One of the issues that defines the performance of this circuit is the time between when
the data is sampled and when the detected-transition signal changes. Assuming a 20 ps gate
delay, the approximate time is 170 ps. Since the detected-transition signal is high for
200 ps, the effect of a single transition lasts for a total of 370 ps after the sample, which is
equivalent to about 7 bits. This is important because during lock it is desirable to have the
frequency of the VCO adjust as quickly as possible after a transition is detected. The digital
nature of this circuit results in discrete changes to the VCO output, so oscillations are
natural when in lock. If the PD delay is large then these oscillations will also increase, as
the VCO’s frequency continuously overshoots and undershoots. A further analysis of this
phenomenon can be found in Sec. 6.3.2. on page 130.
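The latency figures above can be reproduced with a few lines of arithmetic. This sketch assumes that the 170 ps figure is the six-phase retiming delay (150 ps) plus one 20 ps gate delay; the function name `td_latency_ps` is hypothetical.

```python
PHASE_SPACING_PS = 25.0   # spacing between consecutive VCO phases
BIT_WIDTH_PS = 50.0       # one bit at 20 Gb/s


def td_latency_ps(retime_phases=6, gate_delay_ps=20.0, pulse_ps=200.0):
    """Return (detection delay, total duration of a transition's effect) in ps."""
    detect = retime_phases * PHASE_SPACING_PS + gate_delay_ps
    total = detect + pulse_ps
    return detect, total


detect_ps, total_ps = td_latency_ps()
bits_affected = total_ps / BIT_WIDTH_PS
```

Under these assumptions the detection delay is 170 ps, the total is 370 ps, and the effect spans 7.4 bit times.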
The motivating factor in the design of Serdes II’s TD, shown in Fig. 6-5, was to
reduce the delay through the detector. In the first prototype this time was 170 ps, which
directly affected the ability of the PLL to acquire and maintain lock. Improving on
that design required a look at the timing requirements of the XOR.
The two level nature of the XOR gate requires the level 2 input to precede the level
1 input by approximately 10 ps. The time between sampled data sn-1 and sn is equal to 25
ps, and with the additional 5 ps of delay introduced by the level 2 output of the MS-latch a
total of 30 ps is found between the level 2 input to the XOR gate and the sn-1 signal. When
40 ps of buffer delay is added to the sn-1 signal, a time delta of 10 ps between the inputs of
the XOR gate is realized.
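[Figure 6-5 diagram: a single MS-latch L1, clocked by Φn, samples the data to produce sn;
sn and sn-1 drive the level 1 and level 2 inputs of the XOR gate, whose output is tn.]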
Figure 6-5 Transition detector in Serdes II
Optimization of the transition detector allowed the removal of the
second MS-latch and reduced the total delay by 75%. This circuit is
simplified and requires a less complicated layout.
When the timing is optimized to this extent, the necessity of the second MS-latch, L2,
is removed. The same 200 ps pulse is created, but the total transition detector delay has been
reduced from 170 ps to 40 ps. An additional benefit is in the simplified layout of this circuit;
only one clock phase is required. In the Serdes I circuit, a complex routing scheme was
required because two phases were necessary.
The gain of the transition detector is not clearly defined because of the digital nature
of the circuit. When the phase difference is greater than zero, it will generate a slow pulse
and when less than zero, it will generate a fast pulse. There is no linear relationship
between phase and output. Instantaneous gain must therefore be defined to be infinite. The
average gain, however, is not infinite and can be found when a statistical distribution of
transitions or jitter is introduced.
A real data signal does not have perfect transition separation but instead has
transitions separated according to a constant plus a random Gaussian variable. This jitter
acts as “transition fuzz” which effectively gives the PD gain. The process of calculating this
gain is shown in Fig. 6-6 for both a uniform and a Gaussian distribution. Fundamentally, it
comes down to subtracting the two areas created by splitting the probability density
function (pdf) around zero, after setting a specific mean and standard deviation. For
Gaussian jitter an approximation is used: the gain is assumed linear, based upon a line that
passes through the point at one standard deviation.
[Figure 6-6 diagram: the instantaneous PD output switches between -A and A at θ = 0. For a
uniform transition distribution (pdf) with standard deviation σ, the average PD output
versus average phase error θe has slope Kd' = 0.58 A/σ. For a Gaussian transition
distribution (pdf) with standard deviation σ, Kd' = 0.68 A/σ.]
Figure 6-6 Gain of transition detector with data jitter
Solving for the gain of the transition detector must take into account the
fact that the data has jitter. This jitter spreads out the transitions
producing an average PD output.
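The 0.58 and 0.68 coefficients in Fig. 6-6 can be reproduced numerically. The sketch below assumes the average PD output is A times the difference between the probabilities of a late and an early transition; the function names are hypothetical.

```python
import math


def avg_pd_output_gaussian(theta, sigma, amplitude=1.0):
    """Average bang-bang output for Gaussian jitter: A * (P(late) - P(early))."""
    return amplitude * math.erf(theta / (sigma * math.sqrt(2.0)))


def avg_pd_output_uniform(theta, sigma, amplitude=1.0):
    """Average output for uniform jitter, which spans +/- sqrt(3) * sigma."""
    half_width = math.sqrt(3.0) * sigma
    return amplitude * max(-1.0, min(1.0, theta / half_width))


def gain_coefficients(sigma=1.0):
    """Return the slope coefficients (uniform, Gaussian) in units of A / sigma."""
    k_uniform = avg_pd_output_uniform(sigma, sigma) / sigma   # slope of the linear region
    k_gauss = avg_pd_output_gaussian(sigma, sigma) / sigma    # line through the 1-sigma point
    return k_uniform, k_gauss


k_uniform, k_gauss = gain_coefficients()
```

The uniform case evaluates to 1/√3 ≈ 0.577 and the Gaussian case to erf(1/√2) ≈ 0.683, which round to the values shown in the figure.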
In order to include the effect of the transition density (tpb = transitions per bit), Kd' is
multiplied by tpb. A factor of four must also be included to account for the fact that a
slow/fast pulse is carried across 4 bit widths. This yields the final transition detector gain:
\[ K_d = V_p \, \frac{0.68}{\sigma} \, 4\,(\mathrm{tpb}), \qquad \sigma = \sigma_t \, \frac{2\pi\ \mathrm{rad}}{100\ \mathrm{ps}} \]
(6-1)
In the Serdes I implementation with a pulse size, Vp, of 300 mV, a transition density of 1/4
and an rms jitter value, σt, of 4 ps, the detector gain equals 811 mV/rad. In the Serdes II
transition detector, the pulse size was reduced to 40 mV yielding a smaller gain of 108
mV/rad.
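Both quoted gains follow directly from (6-1), read as Kd = Vp (0.68/σ) 4 (tpb) with σ = σt (2π rad / 100 ps). A minimal sketch, with the hypothetical function name `td_gain_v_per_rad`:

```python
import math


def td_gain_v_per_rad(v_pulse, sigma_t_ps, tpb, period_ps=100.0, bits_per_pulse=4):
    """Transition detector gain from (6-1), in V/rad."""
    sigma_rad = sigma_t_ps * 2.0 * math.pi / period_ps   # rms jitter in radians
    return v_pulse * (0.68 / sigma_rad) * bits_per_pulse * tpb


kd_serdes1 = td_gain_v_per_rad(v_pulse=0.300, sigma_t_ps=4.0, tpb=0.25)
kd_serdes2 = td_gain_v_per_rad(v_pulse=0.040, sigma_t_ps=4.0, tpb=0.25)
```

The two results are approximately 0.81 V/rad and 0.108 V/rad, in line with the 811 mV/rad and 108 mV/rad stated above.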
6.3.1.2. NRZ Phase/Frequency Detector (PD/FD) (Hogge)
The digital nature of the transition detector PD and its phase response yield a very
poor pull-in range. When lock is acquired, however, this PD has very strong noise
immunity, and an inherent ability to extract data from the signal. The Hogge PD helps the
poor pull-in range but has no net effect on the TD PD properties. Its use, in conjunction with
the transition detector PD, was evaluated but not implemented for Serdes III.
The schematic of the Hogge PD is shown in Fig. 6-7 [52], [53]. It operates on the
NRZ data and generates an analog signal based upon the phase difference between the data
and the VCO. Data, vi, must arrive at half the frequency of the clock, vo, for the PD to operate
correctly. This is accomplished by dividing the input data signal down by 4, which has the
negative effect of removing three out of every four edges. The two latches and the va XOR
gate retime the data by creating pulses based on data transitions but timed to the clock
transitions. The vb XOR gate, on the other hand, produces a similar waveform but with edges
timed to the data transitions. The dc component of the difference between these two
signals yields a measure of the phase difference.
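[Figure 6-7 diagram: the data vi passes through two latches (outputs Q1 and Q2) clocked by
vo; XOR gates form vb and va, and their difference is vd. The critical delay is marked on
the schematic. Waveforms show vi, vo, Q1, Q2, vb, and va for a phase offset ∆θ, and a plot
shows vd versus ∆θ from −π to π for 50% transition density.]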
Figure 6-7 Phase detector for NRZ data
This circuit shows one technique for detecting phase for NRZ data in a
PLL. The bit rate of the data and frequency of the clock must be the
same. The output is taken differentially and yields a continuous analog
signal as a function of phase difference.
The most important aspect in implementing this PD was maximizing the figure of
merit. In this case it is defined by the range of pulse widths expressed in vb against the
constant width of va pulses. Ideally, the widths of vb would range from 0 to twice the width
of a va pulse. Finding this solution required a fine adjustment of the critical delay, which is
approximately the delay through an MS-latch. By minimizing the integral of the vd versus
∆θ plot over a full 2π radians, the figure of merit can be maximized.
The gain of this PD is a function of the transitions per bit (tpb) for the incoming data
stream. For a 11001100... stream, the tpb is equal to 0.5. From simulation, the gain was
found to be 80 mV/rad/tpb, which includes the divide-by-4 circuit.
Ultimately this PD was not used because it was exceedingly difficult to optimize the
delays in the circuit. Slowing down the clock and data was the only way to correct the
problem and as a result the pull-in range suffered. The Serdes III implementation addressed
the small pull-in problem by using an external reference signal.
6.3.2. The Loop Filter
The purpose of the loop filter is to take the digital
transition information from the eight transition detectors and
create an appropriate VCO signal. The transition detectors yield
relative information regarding the data and clock phase offset, so
an integrator is required. An integrator alone is insufficient in the
loop, so a proportional factor is summed with the integrator
output. Together the proportional and integral control comprise the PI loop filter.
Although the loop filter in Fig. 6-8 is expressed as an integral and proportional gain, it
can also be expressed by the pole-zero equation
\[ K_h \, \frac{s + \omega_2}{s}, \qquad K_h = K_P, \qquad \omega_2 = \frac{K_I}{K_P} \]
(6-2)
where ω2 is the loop zero and Kh is the high frequency gain.
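The equivalence between the parallel proportional-plus-integral form and the pole-zero form follows from factoring out the proportional gain. As a short worked step, using only the symbols already defined for (6-2):

```latex
\[
K_P + \frac{K_I}{s}
  = K_P \, \frac{s + K_I / K_P}{s}
  = K_h \, \frac{s + \omega_2}{s},
\qquad K_h = K_P, \quad \omega_2 = \frac{K_I}{K_P}.
\]
```

At frequencies well above ω2 the response approaches the constant Kh, which is why Kh is referred to as the high frequency gain.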
Unlike the frequency synthesizer in the transmitter, the integrator and proportional
gain components must operate at the frequency of the clock and accept four faster and four
slower signals. This necessitates the use of specialized circuits able to handle the much
higher frequency. The Serdes III design, although slightly more complicated, still contains
the basic components shown in Fig. 6-8.
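[Figure 6-8 diagram: the phase detector(s) supply 4 faster and 4 slower signals (8 in total)
to the loop filter, which consists of an integrator KI/s and a proportional gain KP; the
loop filter output drives the VCO, Ko.]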
Figure 6-8 Receiver loop filter
The receiver loop filter accepts eight “digital” signals from the
transition detectors and produces an analog control signal for the VCO.
6.3.2.1. FET Charge Pump / Proportional Control (Serdes I)
The charge pump integrator shown in Fig. 6-9 utilizes four field effect transistor
(FET) pairs to place and remove charge from the capacitor. Each FET can act
independently of the others, so one could be adding charge while another is removing it.
Careful consideration assured that the nFET and pFET sizes were chosen to have matching
currents.
Each FET draws on average 60 µA during one complete period of the clock. With a
300 mV input from the PD this corresponds to a gain of 0.0002 1/Ω from the FETs. With Cf
equal to 4 pF, a slow/fast pulse will change the capacitor voltage by ±3 mV. Dividing the
FET gain by the capacitance yields the integrator gain KI = 50 Mrad/s.
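These three figures can be checked with simple arithmetic. The sketch below assumes one clock period is 200 ps (the 5 GHz VCO rate quoted earlier in this chapter); the function name `fet_charge_pump` is hypothetical.

```python
def fet_charge_pump(i_avg=60e-6, v_in=0.300, c_f=4e-12, clock_hz=5e9):
    """Return (FET gain in 1/ohm, voltage step per pulse in V, integrator gain in rad/s)."""
    gain = i_avg / v_in          # average current per volt of PD output
    charge = i_avg / clock_hz    # charge moved during one clock period (assumed 200 ps)
    dv = charge / c_f            # resulting change in capacitor voltage
    k_i = gain / c_f             # integrator gain
    return gain, dv, k_i


gain, dv, k_i = fet_charge_pump()
```

With the stated values this yields 0.0002 1/Ω, 3 mV per pulse, and 50 Mrad/s.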
Proportional control, on the other hand, is handled through eight differential
switches, one for each fast and slow PD output, with one branch tied together to form a
single-ended “analog” signal (Fig. 6-10). By default, without any fast or slow signals, all
fast trees will pull 0.75 mA through the pull-up resistor Rcc and all slow trees will pull 0
mA as shown in Fig. 6-10. In this way, the voltage across Rcc will increase when a fast
signal is received and decrease when a slow signal is received. Rcc was set to 100 Ω, which
produces a 75 mV change for each input pulse. The emitter follower tied to Rcc only
introduces a DC offset to interface properly with the summing junction. Designed similarly
to the integrator, the proportional circuit inputs are all able to operate independently.
[Figure 6-9 diagram: four MOSFET pairs (S1 to S4 and F1 to F4), between Vcc and -2 V,
charge and discharge the capacitor Cf to produce Vint. An additional MOSFET is designed to
balance the current drawn from the base.
S: A slow signal places a charge packet on the capacitor.
F: A fast signal removes a charge packet from the capacitor.]
Figure 6-9 MOSFET charge pump integrator
The FET transistors in this circuit act as current switches removing and
adding charge to a capacitor. This action integrates the slow and fast
inputs.
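[Figure 6-10 diagram: differential switches driven by F1 and S1 share the pull-up resistor
Rcc; the result is combined with Vint at the summing junction to produce aVref (VCO). The
switch circuit is repeated 4 times, once for each S/F pair.]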
Figure 6-10 Proportional control and summing junction
This circuit provides the proportional gain for the loop filter and sums
the result with the signal from the charge pump integrator. This
ultimately drives the aVref control voltage for the VCO.
For each 300 mV input pulse, the output of the proportional control circuit changes
by 75 mV. This corresponds to a proportional gain, Kp, of 0.25. The summing junction
combines the outputs of the integrator and the proportional gain stage. It introduces an
additional gain of 0.286 into the total gain of the loop. Given the gains derived above, the
loop filter has a zero, ω2, at 32 MHz and a high frequency gain, Kh, of 71.5 m. Collecting
all the gains from this circuit and multiplying by the pulse period shows a ±0.7° phase
change of the VCO for every slow/fast pulse.
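The proportional gain, high frequency gain, and zero location quoted here can be reproduced as follows. The sketch assumes the summing-junction gain scales the integral and proportional paths equally, so that the zero depends only on KI/KP; the function name `pi_filter_params` is hypothetical.

```python
import math


def pi_filter_params(v_prop=75e-3, v_in=300e-3, k_sum=0.286, k_i=50e6):
    """Return (Kp, Kh, loop zero in Hz) for the Serdes I loop filter."""
    k_p = v_prop / v_in                      # proportional gain
    k_h = k_p * k_sum                        # high frequency gain after the summing junction
    zero_hz = (k_i / k_p) / (2.0 * math.pi)  # omega_2 = KI / KP, converted from rad/s to Hz
    return k_p, k_h, zero_hz


k_p, k_h, zero_hz = pi_filter_params()
```

The results are Kp = 0.25, Kh = 0.0715, and a zero near 31.8 MHz, which rounds to the 32 MHz given in the text.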
6.3.2.2. Negative Impedance Charge Pump (Serdes II)
The goal for the receiver in the Serdes II implementation was to replace the FET
charge pump and proportional control with a much simpler negative impedance charge
pump, while keeping all the PLL parameters the same. There were problems associated
with the FET pump including poor high frequency response, difficulty in matching pull-up
and pull-down components, high capacitance discharge, and significant complexity. The
negative impedance pump solved all of these problems with a smaller and simpler circuit.
Using the circuit in Fig. 5-21, equations (5-7)-(5-10), and the loop natural frequency,
zero, and pole of 25 MHz, 6.4 MHz, and 102 MHz, respectively, the component values are
C1 = 575 pF, C2 = 38 pF, and R = 43 Ω. A high frequency pole was added to reduce spurious
modulation and clock jitter; it had little effect on the overall loop response.
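As a rough consistency check on these component values, the corner frequencies can be estimated from R, C1, and C2. Equations (5-7)-(5-10) are not reproduced in this chapter, so the sketch below assumes the common second-order loop filter relations (zero set by R and C1, pole set by R and the series combination of C1 and C2); it is not necessarily the exact form used for the negative impedance pump.

```python
import math


def rc_corner_frequencies(r=43.0, c1=575e-12, c2=38e-12):
    """Estimate (zero, pole) frequencies in Hz under the assumed filter relations."""
    f_zero = 1.0 / (2.0 * math.pi * r * c1)
    c_series = c1 * c2 / (c1 + c2)             # C1 in series with C2
    f_pole = 1.0 / (2.0 * math.pi * r * c_series)
    return f_zero, f_pole


f_zero, f_pole = rc_corner_frequencies()
```

Under these assumptions the zero lands near 6.4 MHz and the pole near 104 MHz, close to the 6.4 MHz and 102 MHz targets.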
6.3.2.3. Mixed Loop (Serdes III)
The primary design goal of the third Serdes implementation was to improve the poor
pull-in range of the transition detector that was due to its non-linear nature. This resulted in
the serial data frequency being required to be very close to the nominal frequency of the
VCO for pull-in to occur. Given a specific bit-rate this can be very difficult to design across
all thermal, process, and implementation deviations.
An initial approach utilized a down-counted data signal fed into a separate Hogge-style
NRZ PD (Section 6.3.1.2. on page 129). The idea was to utilize a second PD that had
a larger pull-in range and could be coupled with the TD PD loop for a better overall pull-in
range. This NRZ PD proved to be difficult to design due to very strict delay requirements
and it did not significantly improve the pull-in range.
A second approach used an additional loop which accepts a reference at (bit
rate)/8 and was designed to respond identically to the loop in the transmitter (Section 5.4.
on page 82). The loop filter output is summed with the transition detector of the original
loop to create the VCO’s control voltage as shown in Fig. 6-2 on page 123. The purpose of
the new loop is to acquire frequency lock, which pulls the original PLL toward lock because of
the common integrator. The original TD loop is then able to acquire solid phase lock once
within its lock-in range and begin to extract data.
The parameters for the new loop are identical to those previously used. The only
remaining design choices are the gain of the TD PD, and its filter. Choosing an appropriate
gain for the transition detector involves a trade-off in bit error rate and the lock-in range.
At one extreme, a large gain will give the PLL a large lock-in range that is approximately
equal to the bandwidth of the loop. For instance, a doubling of the PD gain will result in a
doubling of the lock-in range. This higher gain however, results in a higher bit error rate
(BER) because of the large phase correction. On the other extreme, a small gain will limit
the bandwidth and the lock-in range, but reduce the error rate.
The effect of a large gain on BER results from consecutive transitions that are jittered
in one direction causing an accumulation of phase change. The mean frequency of the data
and of the clock are assumed to be constant, an assumption that is reasonable over the few
transitions needed in this analysis.
The BER of single bit errors is given by Q(jitter > 25 ps), which is equal to 3 × 10^-15
for an rms data jitter of 4 ps and a bit width of 50 ps. Q(x) is the integral from x to infinity
of the normalized Gaussian probability density function (pdf). If the BER introduced by the
TD is less than this value, then its effects can, in general, be ignored.
The TD introduces a ∆t ps phase change per transition. The worst case scenario for
an error is when enough phase changes bring the clock phase to 12.5 ps from consecutive
data jitter followed by a jitter of -12.5 ps in the other direction. In such a case the phase
difference between the clock and the data will be 25 ps. Solving for this is best done by an
example. Assume ∆t equals 5 ps.
Q(jitter > 0 ps)    = 5 × 10^-1    -- make 5 ps phase adjust
Q(jitter > 5 ps)    = 6 × 10^-2    -- jitter must be greater than 5 ps
Q(jitter > 10 ps)   = 9 × 10^-4    -- ... and so on
Q(jitter > 15 ps)   = 1 × 10^-6
Q(jitter < -10 ps)  = 9 × 10^-4    -- bit error!
--------------------------------
total probability   = 3 × 10^-14
For this example, there were four consecutive “jitters” in the positive direction,
causing a clock phase change of 25 ps. They were followed by a jitter of 10 ps in the
opposite direction. The probabilities of these individual events are multiplied together to find
the total probability for an error from this chain of events. For the same analysis, but with
∆t equal to 4 ps, the result is 7 × 10^-19. In conclusion, as long as ∆t is kept below about 4 ps,
the effect of accumulated jitter on phase will be smaller than the chance of a single bit
error, and can be ignored.
Without an integrator in the loop, the VCO control voltage cannot exceed the
maximum swing of the TD. Given a 1010 sequence at 20 Gb/s (tpb = 1), there would be four
overlapping pulses of magnitude ∆v, which, when multiplied by the VCO gain, yield the
frequency deviation. This defines the lock-in range of the TD loop and is equal to
\[ \omega_L = \Delta v \, K_o \, (4\,\mathrm{tpb}) \]
(6-3)
where ∆v is the magnitude of the voltage pulse from the TD. The factor of 4tpb takes into
account the fact that the TD has no effect on the frequency if there are no transitions. The
more transitions, the larger the potential frequency deviation. Relating a voltage change to
an associated time change yields
\[ \Delta v = \frac{\Delta t \, f_c^2}{K_o}. \]
(6-4)
Combining the previous two equations to find the lock-in range as a function of ∆t results in
\[ \omega_L = 2 \, \Delta t \, \omega_c^2 \, (4\,\mathrm{tpb}) \]
(6-5)
where ωc is the clock frequency.
Typical specifications for a receiver of this type provide for a reference signal which
is within 100 ppm of the frequency of the data. Using a more conservative value of 1000
ppm gives a maximum reference deviation of 20 MHz. Using this value in (6-5) gives a
minimum ∆t of 0.4 ps.
For the final implementation, a value of 0.6 ps was chosen for the phase correction
for every transition. The lock-in range is therefore 30 MHz at 0.25 transitions per bit. This
corresponds to a 4 mV pulse, which is generated within the TD by combining the eight slow and
fast signals through a common set of pull-up resistors. The resistors were set at 5 Ω with a
0.8 mA current source in each tree.
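The lock-in figures in this section can be reproduced numerically. The sketch below reads (6-5) as ω_L = 2 ∆t ω_c² (4 tpb) and assumes the clock frequency is entered as 5 GHz in hertz, which is the reading that matches the 20 MHz and 30 MHz values quoted in the text. The function names are hypothetical.

```python
def lock_in_range_hz(dt_s, f_clock_hz=5e9, tpb=0.25):
    """Lock-in range from (6-5), with the clock frequency assumed to be in Hz."""
    return 2.0 * dt_s * f_clock_hz ** 2 * 4.0 * tpb


def min_phase_step_s(range_hz, f_clock_hz=5e9, tpb=0.25):
    """Invert (6-5): smallest phase correction that covers a given frequency offset."""
    return range_hz / (2.0 * f_clock_hz ** 2 * 4.0 * tpb)


range_final = lock_in_range_hz(0.6e-12)   # chosen 0.6 ps correction
dt_min = min_phase_step_s(20e6)           # 1000 ppm of 20 GHz
pulse_v = 5.0 * 0.8e-3                    # 5 ohm pull-up with 0.8 mA per tree
```

This gives a 30 MHz lock-in range for the 0.6 ps correction, a minimum correction of 0.4 ps for a 20 MHz offset, and a 4 mV pulse across the pull-up resistor.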
6.3.3. PLL Loop Response
6.3.3.1. Serdes I (FET charge pump)
The total loop gain or bandwidth is found through a product of the VCO gain, Ko =
3.14 Grad/s/V; the PD gain, Kd = 811 mV/rad; and the loop filter gain, Kh = 71.5 m, and is
equal to 29 MHz. With the loop zero at 32 MHz this yields a damping factor
\[ \zeta = 0.5 \sqrt{\frac{K}{\omega_2}} \]
(6-6)
equal to 0.5, which is underdamped with an overshoot of 30%. For all higher transition rates
the PD gain will increase and improve the damping factor.
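The loop bandwidth and damping factor can be checked from the three gains. The sketch assumes the usual second-order relation ζ = 0.5 √(K/ω2) for a PI loop, with K and ω2 both expressed in hertz; the function name `loop_response` is hypothetical.

```python
import math


def loop_response(k_o=3.14e9, k_d=0.811, k_h=71.5e-3, zero_hz=32e6):
    """Return (loop bandwidth in Hz, damping factor) for the Serdes I receiver PLL."""
    k_rad = k_o * k_d * k_h                  # loop gain in rad/s
    bandwidth_hz = k_rad / (2.0 * math.pi)
    zeta = 0.5 * math.sqrt(bandwidth_hz / zero_hz)   # assumed second-order relation
    return bandwidth_hz, zeta


bandwidth_hz, zeta = loop_response()
```

The bandwidth evaluates to about 29 MHz and the damping factor to about 0.48, consistent with the rounded value of 0.5 given above.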
Fig. 6-11 depicts the Serdes I PLL locking into a 6.1 Gb/s (tpb = 0.25) data stream.
Using an AHDL program the data was given an rms jitter of 4 ps, which is approximately
the amount produced by the associated transmitter. Up until 5 ns the PLL is pulling in and
after 10 ns lock-in has occurred. The large deviations around 6.1 GHz are due to the
proportional control mechanism pulsing the frequency to cause a phase correction. During
the phase correction the integrator forces the average frequency to equal that of the data.
The non-linear “digital” nature of the PD results in a very limited pull-in range.
Simulation over various initial frequency offsets yields a range of about 2%. The hold-in
range, on the other hand, is quite large due to the integrator.
6.3.3.2. Serdes II (negative impedance charge pump)
Fundamentally, the Serdes II implementation was very similar to the Serdes I
version. The key parameters, including loop bandwidth, were kept the same though a
slightly different PD, an improved loop filter, and an improved VCO were used. Because
of this, the response is nearly identical to the Serdes I design shown in Fig. 6-11.
[Figure 6-11 plot: Frequency (GHz), 6.06 to 6.17, versus Time (ns), 0 to 40]
Figure 6-11 Serdes I loop locking in
This plot shows the Serdes I receiver VCO locking into 6.1 Gb/s, 4 ps
jitter data. Once frequency lock is established the proportional pulses
oscillate around the target frequency.
6.3.3.3. Serdes III (dual-loop / referenced loop)
The Serdes III implementation has two loops: one independent loop that dictates the
frequency, and a second dependent loop that phase locks to the incoming data. Fig. 6-12
shows the frequency loop locking in to a reference signal at 750 MHz, which corresponds to
a 6 GHz clock. Because the same PLL was used in the transmitter of the Serdes III
implementation, the acquisition plots shown in Section 5.4.6.3. on page 101 show behavior
identical to the operation of this frequency loop.
Also shown in Fig. 6-12 is the phase plot for the phase loop locking in to data with
tpb = 0.25. Lock-in occurs when the clock frequency is about 6.02 GHz, which is within 20
MHz of the locked clock frequency. It was expected that lock-in would occur when the clock was
within half of 30 MHz, or 15 MHz.
The noise seen on the locked-in phase plot is from 4 ps rms jitter added to the data
through an HDL model (Appendix E.5. on page 183). This enabled a more accurate and
faster simulation. The choice of jitter is directly related to the jitter produced by the
transmitter, with the assumption that the channel introduces little noise.
[Figure 6-12 plot: Clock Frequency and Sampling Phase (deg) versus Time (ns)]
Figure 6-12 Frequency and phase lock-in of Serdes III Rx PLL
The dual loop nature of the Serdes III Rx PLL allows an independent
referenced loop to frequency lock close to the data frequency. The
second loop phase locks when the data and reference frequencies are
within 0.3% of each other.
6.4. 4-16 Demultiplexing
The transition detector naturally performs 4-16 demultiplexing. It has eight sampling
circuits, four of which sample actual data. Each of the data bits is available sequentially and,
as such, all four are valid for only one bit time: 50 ps at 20 Gb/s. This can make timing
very difficult.
Serdes I was not capable of performing the 4-16 demultiplexing. It could only output the
four sampled bits directly off the detector.
The demultiplexer added to Serdes II is shown in Fig. 6-13. It uses four 4-bit MS-latches, each clocked by one of four phase-offset clocks. The clocks are generated with
a counter driven by a phase from the PLL. The latches simultaneously sample the 4-bit data
from the transition detector. The transition from the fourth bit, followed by the transition
from the first bit, dictates the window that the clock has to sample the data. Delays on the
clock lines had to be carefully balanced and tightly controlled to ensure that the bits were
sampled at the correct time.
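[Figure 6-13 diagram: the transition detector outputs da, db, dc, and dd feed latches clocked by Φ1 through Φ4; a final register clocked by Φ4+τ produces the demultiplexed data within the clock window]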
Figure 6-13 4-16 demultiplexer architecture
The demultiplexer accepts the set of four bits from the transition
detector and samples each set into four separate registers. Once 16 bits
are captured those registers are resampled by a 16 bit register to produce
the final output.
After all four latches contain a total of 16 bits, another bank of latches resamples all
the bits at once. This register uses the fourth clock, Φ4, plus a small delay. This delay should
be longer than the delay through the first register to capture the 4th bank correctly. The
delay must also be shorter than the time when the 1st bank is sampled. For a 20 Gb/s system,
the clock has a 200 ps window and was placed as close to the center as possible.
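The window arithmetic above can be checked with a short Python sketch. It is only an illustration: the function names are invented here, and the 30 ps first-register delay is a made-up placeholder, not a value from this design.

```python
def group_window_ps(bit_rate_gbps, bits_per_group=4):
    """Time for which one group of bits stays valid, in picoseconds."""
    bit_time_ps = 1000.0 / bit_rate_gbps
    return bits_per_group * bit_time_ps

def delay_is_valid(delay_ps, first_register_delay_ps, window_ps):
    """The resample delay must exceed the first-register delay and stay inside the window."""
    return first_register_delay_ps < delay_ps < window_ps

window = group_window_ps(20.0)   # four 50 ps bit times at 20 Gb/s
center = window / 2.0            # middle of the window
# 30.0 ps is a hypothetical first-register delay used only for illustration.
print(window, center, delay_is_valid(center, 30.0, window))
```

Placing the resample edge at the center leaves equal margin against both the first-register delay and the next sampling of the 1st bank, which is the motivation given in the text.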
6.5. Registers and Decoding
Often a First In First Out (FIFO) system is added to the
output of the demultiplexer. This reduces the timing constraint
on the circuit that reads the 16 bits of parallel data off the chip,
through the use of a separate load clock. A FIFO was not
included in either Serdes I or Serdes II, in which the output data is only latched in the 4-16
demultiplexer.
Data decoding is a general term for such techniques as decryption, decompression,
error detection, channel alignment, byte alignment [38], DC voltage balance, simplified
clock recovery, frame detection [33], and so on. No encoding was performed in either
Serdes I or Serdes II. See Section 5.11.1. on page 118, for a quick study and
recommendation of the 8B/10B encoding scheme.
6.6. Line Receiver
The line receiver accepts serial data at up to 20 Gb/s. Its
bandwidth must be wide enough, usually 50% higher than the 10
GHz fundamental, to ensure that the data is reproduced
accurately [14], [48], [36], [37], [49].
The Serdes I line receiver consists of a simple single-ended pad receiver, and is not optimized for bandwidth. The
Serdes II circuit is fully differential and consists of a 6 µm buffer with emitter followers
and 50 Ω termination resistors.
6.7. Test Circuitry
6.7.1. On-chip test pattern generation
Testing the receiver, by itself, at speed is impossible
without a 10 GHz differential signal generator to drive the data
inputs. In order to eliminate reliance on external testing
hardware, the necessary generator was added internally. This
was done in both fabricated Serdes chips by using a 5 GHz VCO
in three different configurations. The first signal was generated by multiplying separate
phases of the VCO to create a 10 GHz bit stream. The second was simply one phase of the
VCO for 5 GHz and the third signal was a phase divided by two for 2.5 GHz. A 4-to-1
multiplexer was added to select between these three generated signals and the fourth, external
data signal.
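The text does not say how the VCO phases were multiplied. As a hedged illustration, the Python sketch below assumes an XOR of two quadrature phases, one common way to double a square-wave clock, and counts edges to confirm that two 5 GHz phases give a 10 GHz signal. All names are hypothetical.

```python
def square_wave(t_ps, freq_ghz, phase_deg=0.0):
    """Ideal 50% duty-cycle square wave: returns 1 or 0 at time t_ps."""
    period_ps = 1000.0 / freq_ghz
    frac = ((t_ps / period_ps) + phase_deg / 360.0) % 1.0
    return 1 if frac < 0.5 else 0

def count_rising_edges(samples):
    """Count 0-to-1 transitions between consecutive samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a == 0 and b == 1)

# Sample 2 ns in 1 ps steps, offset by 0.5 ps so no sample lands on an edge.
times = [i + 0.5 for i in range(2000)]
i_phase = [square_wave(t, 5.0, 0.0) for t in times]
q_phase = [square_wave(t, 5.0, 90.0) for t in times]

# Assumed multiplication method: XOR of the two quadrature phases.
doubled = [a ^ b for a, b in zip(i_phase, q_phase)]

edges = count_rising_edges(doubled)
print(edges / 2.0, "GHz")   # rising edges per 2 ns
```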
6.7.2. True error rate detector (TERD)
The true error rate detection circuit operates between the transmitter and receiver. It
determines bit error rate through an LFSR matched to the transmitter LFSR. Its operation
was discussed in detail in Section 5.8.2. on page 107.
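The principle of comparing received data against a matched LFSR can be sketched in Python. This is not the TERD circuit itself: the 7-bit register, the tap positions, and the seed are assumptions chosen for illustration, and the sketch assumes the two registers start in the same state.

```python
def lfsr_bits(seed, n, taps=(7, 6), width=7):
    """Generate n bits from a Fibonacci LFSR; taps are 1-based bit positions."""
    state = seed
    out = []
    for _ in range(n):
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1
        out.append(fb)
        # Shift left and insert the feedback bit at the bottom.
        state = ((state << 1) | fb) & ((1 << width) - 1)
    return out

def count_errors(received, seed):
    """Compare received bits against a local LFSR started from the same seed."""
    expected = lfsr_bits(seed, len(received))
    return sum(r != e for r, e in zip(received, expected))

SEED = 0x5A                       # hypothetical non-zero starting state
tx = lfsr_bits(SEED, 1000)        # transmitter pattern
rx = list(tx)
for i in (10, 500, 999):          # inject three bit errors in the channel
    rx[i] ^= 1

errors = count_errors(rx, SEED)
print(errors, errors / len(rx))   # error count and bit error rate
```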
6.8. Implementation and Fabrication
6.8.1. Serdes I
As stated previously, the power supply for the Serdes I
chips was chosen to be -4.5 V. This left plenty of room for the
three levels of logic and the active current sources. Power
minimization was not a design goal, so this voltage was not
optimized. In addition, a -2.0 V supply was required for the bottom of
the CMOS charge pump. Table 6-1 shows the pin-outs of the receiver chip and Fig. 6-14
shows the final layout artwork and the microphotograph of the fabricated part.
The receiver in the Serdes I implementation was limited to testing pads only, so it did
not support the full 4-to-16 demultiplexer. Instead the sampled data from the transition
detector was fed directly to output pads. No additional circuitry was added to retime the
output data, so the four bits were not presented to the output at the same time.
In order to test the high speed operation of the receiver an on-chip data test source
was created. This circuit generated periodic signals at 10 GHz, 5 GHz, and 2.5 GHz. Two
DC pads, R0 and R1, were used to select between the three data source inputs and an
externally supplied input, and R3 was used as a control voltage for the test source VCO. The receiver
clock was connected to pad R5, and the output data was connected to pads R8 through R11.
To aid in testing, the capacitor from the charge pump was passed to pad R4 through a high
resistance path. This pad could confirm the proper operation of the charge pump while the
circuit was operating.
Table 6-1 Pin-out of Serdes I receiver
Pin   I/O      Description
R0    DC in    test source (SELECT A)
R1    DC in    test source (SELECT B)
R2    RF out   test source output
R3    DC in    control voltage for test source
R4    RF out   integrator voltage (capacitor)
R5    RF out   receiver clock
R6    Power    -2 V (FET charge pump)
R7    RF in    receiver input
R8    RF out   data 3
R9    RF out   data 2
R10   RF out   data 1
R11   RF out   data 0
[Figure 6-14 images: layout artwork and fabricated chip, with pads S0 through S11 and the test source, clock, transition detector, and charge pump blocks labeled]
Figure 6-14 Serdes I receiver layout artwork and photograph
On the left is the final artwork for the first receiver design. On the right
is a microphotograph of the fabricated part.
6.8.2. Serdes II
The full chip layout and pin-outs are shown and described in Section 5.9.2. on
page 109.
6.9. Testing Results
6.9.1. Serdes I (receiver test results)
The receiver circuit has a pull-in range of 18.7 to 18.9 Gb/s. This represents the range
of frequencies for which the PLL can acquire lock with the onset of new data. Once lock-in has occurred, the circuit can maintain lock for its hold-in range of 16.4 to 19.6 Gb/s. This
is an undesirable situation for two important reasons. First, the lock-in range dictates the
allowable range of data frequencies because the communication system cannot be expected
to initialize with a lower bit rate and then ramp up to the nominal bit rate. Second, the hold-in range did not meet the specification of 20 Gb/s.
The cause of the poor pull-in range is the non-linear nature of the transition detector.
It has a very high gain and saturates above a small phase deviation, limiting the ability to
adjust for phase differences. The low hold-in range is due to the lower than expected
frequency range of the current starving VCO, shown in Fig. 3-5 on page 27.
Fig. 6-15 shows the receiver locked to data at 19.4 Gb/s. (The oscilloscope is
triggered on the input signal.) Fig. 6-15(a) shows a locked condition with data arriving with
20 bits per transition (0.05 tpb) and (b) shows a locked condition with 10 bits per transition (0.1
tpb).
When the receiver is locked with data at 0.05 tpb (10 ones, 10 zeros), an rms phase
jitter of 2.64 ps is measured, as shown in Fig. 6-16. When the number of transitions is
decreased to 0.016 tpb (32 ones, 32 zeros), a jitter value of 8 ps is measured. Results indicate
that a locked condition can be maintained for a data stream with an edge every 300 bits
before the clock jitter becomes too large and lock is lost.
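The tpb values quoted here are numerically consistent with the reciprocal of the repeating pattern length in bits. This is an observation about the quoted numbers rather than a definition taken from the text, and the helper name below is hypothetical. A minimal Python check:

```python
def pattern_density(ones, zeros):
    """Reciprocal of the pattern period in bits (assumed meaning of tpb)."""
    return 1.0 / (ones + zeros)

# Patterns quoted in the text: 10 ones / 10 zeros and 32 ones / 32 zeros.
for ones, zeros in ((10, 10), (32, 32)):
    print(ones, zeros, round(pattern_density(ones, zeros), 3))
```

Note that under this reading a square pattern, which has both a rising and a falling edge per period, is counted as one transition per period.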
[Figure 6-15 oscilloscope captures: recovered clock and sampled data, panels (a) and (b)]
Figure 6-15 Serdes I receiver locked to data.
The above plots show the recovered clock and the sampled data for a
data rate of 19.4 Gb/s. (a) is fed with data with 20 bits per transition and
(b) is fed with 10 bits per transition.
Figure 6-16 Serdes I recovered clock showing jitter.
This plot shows a receiver locked to data with a 30% duty cycle. The
recovered clock has an rms jitter of 2.6 ps.
6.9.2. Serdes II (receiver test results)
The results from the second receiver iteration were very similar to the first, as
expected. The big difference was that the receiver integrator had a circuit glitch that
prevented it from operating as an integrator. Instead it operated like a low-pass filter. This
limited the hold-in range to that of the pull-in range, which was from 4.20 to 4.63 GHz or
16.8 to 18.5 Gb/s. Although this small hold-in range is a problem, a more serious concern is
the small pull-in range. The only way to solve this problem is to provide the receiver with
a reference signal very close to the frequency of the data. This solution was evaluated and
simulated in Serdes III.
Fig. 6-17 shows the receiver in lock with the data and the clock at 4.5 GHz. This was
achieved by using an external source running at the same frequency as the clock. The
internal source operated correctly with various combinations of frequencies. One included
the internal source VCO running at 3.7 GHz with the divide-by-2 enabled and a clock at
4.63 GHz. This corresponds to data with 5 ones and 5 zeros which also indicates that the
receiver is able to lock on both rising and falling data transitions.
[Figure 6-17 oscilloscope capture: data and clock traces]
Figure 6-17 Serdes II Rx locked to data
The plot captured from the oscilloscope shows input data and the
receiver clock locked to it. Both are at 4.5 GHz, and the data represents
a bit pattern of 1100 at 18 Gb/s.
One way to measure the performance of the receiver is to look at the phase noise of
the recovered clock relative to the transition density [14], [31]. Fig. 6-18 shows four
different phase noise measurements for varying lengths of periodic data streams. The data
was generated with the HP 8563 low phase noise signal source.
The curve for 100 bits represents a series of 50 ones followed by 50 zeros. As can
be seen in the plot, the fewer the transitions the higher the phase noise. At 1 MHz, a
transition density of 0.052 yields a phase noise value of -112 dBc/Hz and a density of
0.0064 yields a value of -88 dBc/Hz. As the clock phase noise increases so does the jitter,
which relates to a larger BER. In the minimum, and likely, worst case of 19 bits, integrating
the phase noise from 1 MHz to 1 GHz gives an rms jitter of approximately 2.0 ps.
[Figure 6-18 plot: Phase Noise (dBc/Hz) versus Frequency (MHz) for 19-bit, 76-bit, 100-bit, and 156-bit sequences]
Figure 6-18 Serdes II receiver clock phase noise
This plot shows the phase noise for various length bit sequences. The
sequence consists of a string of ones followed by a string of zeros with
a period indicated in the plot. As expected, the fewer transitions the
larger the phase noise.
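Converting a phase noise curve into rms jitter, as done for the 19-bit case, is commonly carried out with the relation σ_t = sqrt(2 ∫ 10^(L(f)/10) df) / (2π f0). The Python sketch below applies that standard relation with trapezoidal integration. The flat -100 dBc/Hz profile and the 5 GHz carrier are illustrative assumptions, not measured data from this work.

```python
import math

def rms_jitter_s(offsets_hz, noise_dbc_hz, carrier_hz):
    """Integrate single-sideband phase noise L(f) into rms jitter in seconds.

    Trapezoidal integration of 10**(L/10) over the offset frequencies,
    followed by sigma_t = sqrt(2 * integral) / (2 * pi * f0).
    """
    linear = [10 ** (level / 10.0) for level in noise_dbc_hz]
    area = 0.0
    for i in range(len(offsets_hz) - 1):
        df = offsets_hz[i + 1] - offsets_hz[i]
        area += 0.5 * (linear[i] + linear[i + 1]) * df
    return math.sqrt(2.0 * area) / (2.0 * math.pi * carrier_hz)

# Hypothetical flat profile: -100 dBc/Hz from 1 MHz to 101 MHz on a 5 GHz clock.
jitter = rms_jitter_s([1e6, 101e6], [-100.0, -100.0], 5e9)
print(jitter * 1e12, "ps")   # about 4.5 ps
```

A measured curve would be supplied as more closely spaced points; linear trapezoids over widely spaced offsets overestimate the area under a curve that falls steeply on a log scale.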
The final test of the receiver involved connecting the output of the transmitter back
into the receiver. This utilized the full potential of the built-in testing circuitry. The first
problem encountered was the inability to feed back a differential signal. This was because
two matched lines from the output of the Tx to the input of the Rx could not be guaranteed.
The probes, connectors, and cables introduce too much variation in length to work properly.
Even a few millimeters could offset the differential signals by a considerable amount. It
was concluded that for differential testing, the part would have to be packaged and placed
on a board.
Because differential testing was out of the question, the system was set up for single-ended testing. This was done by tying one end of the receiver input to a DC reference
voltage half-way between the high and low transmitter signal levels. This technique
destroyed the benefits of a differential signal and would not operate at either 20 or 10 Gb/s.
The feed-through pad showed a highly corrupted signal. The single-ended technique and/or
a bandwidth problem in the differential pad receiver prevented a full-test of the feedback
testing scheme.
6.10. Future Work
6.10.1. Sampling offset correction
One attribute of data arriving in a receiver, typically seen in optical systems, is bits
that are skewed toward one transition. This is usually an effect of the non-linear nature of
the light sensitive diode, but can be a result of the transmitter or from the channel itself. The
ramification is an increase in BER if samples are taken at the exact center of the bit. The
solution is to allow the data sampling points to be offset relative to the data transitions.
6.10.2. 40 Gb/s?
The first step in moving to a 40 Gb/s solution is to utilize a 10 GHz ring oscillator.
Given this possibility, the next problem is in the design of the receiver amplifier. This
amplifier will require at least a 20 GHz bandwidth and must be able to drive a significant
number of loads. It may be necessary to sacrifice phase detection of every transition and
just utilize every fourth edge to reduce the MS-latch loading effects. This solution still
requires four data latches, plus one transition latch which may still be too high. Another
solution would be to use a bang-bang phase detector that requires a clock and its quadrature
at half the baud rate [26], [32]. This solution requires only four MS-latches.
6.10.3. Demultiplexer improvements
A problem found during the testing of the Serdes II chip was in the 4-to-16
demultiplexer described in Section 6.4. on page 138. Due to stringent timing constraints
and excessive loading, the set of four 4-bit latches was failing to latch the data. Fig. 6-19
depicts an improved demultiplexer that operates in stages. The first stage latches the four
data bits from one of the PLL clock phases. The clock is then divided by two and used to
clock the next stage of eight latches. The clock is then divided again and the data is latched
into 16 latches. The final stage realigns all the data edges by latching the 16 bits
simultaneously.
[Figure 6-19 diagram: the transition detector outputs da, db, dc, and dd are latched by Φ1 (200 ps period) and then by successive toggle flip-flop divided clocks to produce the demultiplexed data]
Figure 6-19 Revised 4-to-16 demultiplexer
In order to reduce the timing requirements on the demultiplexer the data
is demultiplexed in stages. Each stage is successively clocked by a clock
of half the frequency from the previous stage.
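The staged approach can be illustrated with a purely behavioral Python sketch. It models only the data grouping at each stage, not the latch timing, and the function names are invented for this example.

```python
def group(items, n):
    """Split a list into consecutive full chunks of n items (partial tail dropped)."""
    return [items[i:i + n] for i in range(0, len(items) - n + 1, n)]

def staged_demux(bits):
    """4 -> 8 -> 16 bit staged demultiplexing, modeled as list regrouping."""
    stage1 = group(bits, 4)                          # latched on a PLL clock phase
    stage2 = [a + b for a, b in group(stage1, 2)]    # latched on the clock divided by 2
    stage3 = [a + b for a, b in group(stage2, 2)]    # latched on the clock divided by 4
    return stage3

def stage_clock_periods_ps(bit_rate_gbps):
    """Clock period at each stage: 4, 8, and 16 bit times."""
    bit_time_ps = 1000.0 / bit_rate_gbps
    return [4 * bit_time_ps, 8 * bit_time_ps, 16 * bit_time_ps]

words = staged_demux(list(range(32)))   # 32 numbered bits give two 16-bit words
print(words)
print(stage_clock_periods_ps(20.0))
```

Each stage only has to meet timing against a clock half as fast as the previous one, which is the relaxation the revised architecture aims for.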
Discussion & Conclusion
In conclusion, three 20 Gb/s communication systems were designed and two were
fabricated in IBM’s SiGe 5 HP process. Each design built on test results from the previous
implementations, and the third and final design was intended for future research and
development.
The second iteration was a unified transceiver chip possessing a transmitter and a
receiver. It had wirebond pads for wafer probe testing as well as C4 pads for flip-chip
packaging. Through the C4 pads, 16 bits of parallel data could be supplied to and extracted
from the chip. An internal testing circuit enabled complete testing of the chip without the
need for packaging.
The Feed Forward Interpolated VCO, a four stage ring oscillator that uses novel feed
forwarding techniques, was developed. Its very high frequency nature required the use of
capacitance to slow its frequency down to 5 GHz. Its flexibility makes it an excellent choice
for short-haul communication systems. Phase noise at 1 MHz was measured as -90.5
dBc/Hz which is one of the best numbers quoted for a ring oscillator at this speed. The
associated jitter is quite small and is an interesting function of the control voltage.
The transmitter in the second prototype had a very wide operating range of 14.27 to
21.58 Gb/s. A time domain sampling oscilloscope measured an rms clock jitter value of 4.3
ps or 0.086 UI. Using a spectrum analyzer, however, rms clock jitter from 100 kHz to 100
MHz was measured at 1.4 ps. The eye diagram was very symmetric, indicating that the
symmetric multiplexer and data interleaving scheme operated as expected.
The second receiver did not have an external reference and, therefore, had only the
high speed data stream to lock to. This limited the pull-in range to 16.8 to 18.5 Gb/s. Clock
jitter measured from the oscilloscope had an rms value of 2.0 ps. At very low transition
rates of 78 bits per transition, the receiver was still able to maintain lock. This is credited
to the phase detector which is able to use every transition for phase information.
A third prototype was developed, but not fabricated, using the data acquired from the
first two designs. The transmitter PLL bandwidth was further optimized and a negative
impedance amplifier loop filter was added. A frequency locked loop was added to the
receiver PLL to greatly enhance the pull-in range. The demultiplexer scheme was also
improved to minimize the timing constraints.
References
[1] R. C. Walker, K. Hsieh, T. A. Knotts, and C. Yen, “A 10 Gb/s Si-Bipolar TX/RX
Chipset for Computer Data Transmission,” IEEE International Solid-State Circuits
Conference, pp. 302-303, 1998.
[2] S. A. Steidl, “A 32-Word by 32-Bit Three-Port Bipolar Register File Implemented
Using a SiGe HBT BiCMOS Technology,” Candidacy document, Rensselaer Polytechnic Institute, Department of Electrical Engineering, May 1999.
[3] P. M. Campbell, H. J. Greub, A. Garg, S. A. Steidl, S. Carlough, M. Ernest, R. Philhower, C. Maier, R. P. Kraft, and J. F. McDonald, “A Very-Wide-Bandwidth Digital
VCO Using Quadrature Frequency Multiplication and Division Implemented in
AlGaAs/GaAs HBTs,” Proc. GaAs IC Symp., pp. 311-314, 1995.
[4] A. W. Buchwald, and K. W. Martin, “High-speed voltage-controlled oscillator with
quadrature outputs,” Electronics Letters, vol. 27, no. 4, pp. 309-310, February 1991.
[5] R. Walker, C. Stout, C-S. Yen, “A 2.488 Gb/s Si-Bipolar Clock and Data Recovery
IC with Robust Loss of Signal Detection,” IEEE International Solid-State Circuits
Conference, pp. 246-247, 1997.
[6] M. Ernest, T. W. Krawczyk, and J. F. McDonald, “Symmetric Multiplexer,” Invention Disclosure Record, Rensselaer Polytechnic Institute, February 2000.
[7] T. W. Krawczyk, and J. F. McDonald, “The Feed Forward Voltage Controlled Ring
Oscillator,” Invention Disclosure Record, Rensselaer Polytechnic Institute, May
2000.
[8] D. C. Ahlgren, G. Freeman, S. Subbanna, R. Groves, D. Greenberg, J. Malinowski,
D. Nguyen-Ngoc, S. J. Jeng, K. Stein, K. Schonenberg, D. Kiesling, B. Martin, S.
Wu, D. L. Harame, and B. Meyerson, “A SiGe HBT BiCMOS technology for mixed
signal RF applications,” Proceedings of the IEEE Bipolar/BiCMOS Circuits and
Technology Meeting, Minneapolis, MN, pp. 195-197, September 1997.
[9] K. Washio, E. Ohue, K. Oda, M. Tanabe, H. Shimamoto, and T. Onai, “95 GHz fT
Self-Aligned Selective Epitaxial SiGe HBT with SMI Electrodes,” IEEE International Solid-State Circuits Conference, pp. 312-313, 1998.
[10] L. Larson, M. Case, S. Rosenbaum, D. Rensch, P. MacDonald, M. Matloubian, M.
Chen, D. Harame, J. Malinowski, B. Meyerson, M. Gilbert, and S. Mass, “Si/SiGe
HBT Technology for Low-Cost Monolithic Microwave Integrated Circuits,” IEEE
International Solid-State Circuits Conference, pp. 80-81, 1996.
[11] J. R. Long, M. A. Copeland, S. J. Kovacic, D. S. Malhi, and D. L. Harame, “RF
Analog and Digital Circuits in SiGe Technology,” IEEE International Solid-State
Circuits Conference, pp. 82-83, 1996.
[12] K. Ismail, “Si/SiGe CMOS: Can it extend the lifetime of Si,” IEEE International
Solid-State Circuits Conference, pp. 116-117, 1997.
[13] L. Sun, T. Kwasniewski, and K. Iniewski, “A Quadrature Output Controlled Ring
Oscillator Based on Three-Stage sub-feedback Loops,” IEEE International Symposium on Circuits and Systems, vol. 2, pp. 176-179, 1999.
[14] R. Walker, C. Stout, and C-S. Yen, “A 2.488 Gb/s Si-Bipolar Clock and Data Recovery IC with Robust Loss of Signal Detection,” IEEE International Solid-State Circuits Conference, pp. 246-247, 1997.
[15] L. Dai, and R. Harjani, “Comparisons and Analysis of Phase Noise in Ring Oscillators,” IEEE International Symposium on Circuits and Systems, pp. 77-80, May
2000.
[16] A. Hajimiri, and Thomas H. Lee, “A General Theory of Phase Noise in Electrical
Oscillators,” IEEE Journal of Solid-State Circuits, vol. 33, no. 2, pp. 179-194, February 1998.
[17] J. A. McNeil, “Jitter in Ring Oscillators,” IEEE Journal of Solid-State Circuits, vol.
32, pp. 870-879, June 1997.
[18] A. Hajimiri, S. Limotyrakis, and T. H. Lee, “Jitter and Phase Noise in Ring Oscillators,” IEEE Journal of Solid-State Circuits, vol. 34, no. 6, pp. 790-804, June 1999.
[19] T. H. Lee, and A. Hajimiri, “Oscillator Phase Noise: A Tutorial,” IEEE Journal of
Solid-State Circuits, vol. 35, no. 3, pp. 326-335, March 2000.
[20] H. Matsuoka, and T. Tsukahara, “A 5-GHz Frequency-Doubling Quadrature Modulator with a Ring-Type Local Oscillator,” IEEE Journal of Solid-State Circuits, vol.
34, pp. 1345-1348, September 1999.
[21] J. Plouchart, H. Ainspan, M. Soyuer, and A. Ruehli, “A Fully-Monolithic SiGe Differential Voltage-Controlled Oscillator for 5 GHz Wireless Applications,” IEEE
Radio Frequency Integrated Circuits Symposium, pp. 57-60, 2000.
[22] M. Soyuer, J. N. Joachim, N. Burghartz, H. A. Ainspan, K. A. Jenkins, P. Xiao, A.
R. Shahani, M. S. Dolan, and D. L. Harame, “An 11-GHz 3-V SiGe Voltage Controlled Oscillator with Integrated Resonator,” IEEE Journal of Solid-State Circuits,
vol. 32, no. 9, pp. 1451-1454, September 1997.
[23] S. K. Enam and A. A. Abidi, “A 300-MHz Voltage-Controlled Ring Oscillator,”
IEEE Journal of Solid-State Circuits, vol. 25, no. 1, pp. 312-315, February 1990.
[24] S. Lee, B. Kim, and K. Lee, “A Novel High-Speed Ring Oscillator for Multiphase
Clock Generation Using Negative Skewed Delay Scheme,” IEEE Journal of Solid-State Circuits, vol. 32, no. 2, pp. 1451-1454, February 1997.
[25] D. C. Ahlgren, M. Gilbert, D. Greenberg, S. J. Jeng, J. Malinowski, D. Nguyen-Ngoc, K. Schonenberg, K. Stein, R. Groves, K. Walter, G. Hueckel, D. Colavito, G.
Freeman, D. Sunderland, D. L. Harame, and B. Meyerson, “Manufacturability demonstration of an integrated SiGe HBT technology for the analog and wireless marketplace,” IEEE International Electron Devices Meeting Technical Digest, San Francisco, CA, December 1996, pp. 859-862.
[26] J. F. Ewen, A. X. Widmer, M. Soyuer, K. R. Wrenner, B. Parker, and H. A. Ainspan,
“Single-Chip 1062 Mbaud CMOS Transceiver for Serial Data Communications,”
IEEE International Solid-State Circuits Conference, pp. 32-33, 1995.
[27] D. Friedman, M. Meghelli, B. Parker, H. Ainspan, and M. Soyuer, “Sub-picosecond
SiGe BiCMOS Transmit and Receive PLLs for 12.5 Gbaud Serial Data Communication,” Symposium on VLSI Circuits, pp. 132-135, 2000.
[28] R. Farjad-Rad, C. Yang, M. Horowitz, and T. Lee, “A 0.3-µm CMOS 8-Gb/s 4-PAM Serial Link Transceiver,” IEEE Journal of Solid-State Circuits, vol. 35, no. 5,
pp. 757-764, May 2000.
[29] H. Knapp, T. F. Meister, M. Wurzer, D. Zoschg, K. Aufinger, and L. Treitinger, “A
79 GHz Dynamic Frequency Divider in SiGe Bipolar Technology,” IEEE International Solid-State Circuits Conference, pp. 208-209, 2000.
[30] M. Meghelli, B. Parker, H. Ainspan, and M. Soyuer, “SiGe BiCMOS 3.3V Clock
and Data Recovery Circuits for 10Gb/s Serial Transmission Systems,” IEEE International Solid-State Circuits Conference, pp. 56-57, 2000.
[31] Y. M. Greshishchev, and P. Schvan, “SiGe Clock and Data Recovery IC with Linear-Type PLL for 10-Gb/s SONET Application,” IEEE Journal of Solid-State Circuits,
vol. 35, no. 9, pp. 1353-1359, September 2000.
[32] A. Pottbacker, U. Langmann, and H. Schreiber, “A Si Bipolar Phase and Frequency
Detector IC for Clock Extraction up to 8 Gb/s,” IEEE Journal of Solid-State Circuits, vol. 27, no. 12, pp. 1747-1751, December 1992.
[33] S. Shioiri, M. Soda, T. Monikawa, T. Hashimoto, F. Sato, and K. Emura, “A 10 Gb/s
SiGe Framer/Demultiplexer for SDH Systems,” IEEE International Solid-State Circuits Conference, pp. 202-203, 1998.
[34] Albert X. Widmer, “Method of Coding to Minimize Delay at a Communication
Node,” U.S. Patent 4665517, assigned to International Business Machines, 1987.
[35] M. Fukaishi, S. Nakamura, A. Tajima, Y. Kinoshita, Y. Suemura, H. Suzuki, T. Itani,
H. Miyamoto, N. Henmi, T. Yamazaki, and M. Yotsuyanagi, “A 2.125-Gb/s BiCMOS Fiber Channel Transmitter for Serial Data Communications,” IEEE Journal of
Solid-State Circuits, vol. 34, no. 9, pp. 1325-1330, September 1999.
[36] Y. M. Greshishchev, and P. Schvan, “A 60-dB Gain, 55-dB Dynamic Range, 10-Gb/s Broad-Band SiGe HBT Limiting Amplifier,” IEEE Journal of Solid-State Circuits, vol. 34, no. 12, pp. 1914-1920, December 1999.
[37] W. Pöhlmann, “A Silicon-Bipolar Amplifier for 10 Gbit/s with 45 dB Gain,” IEEE
Journal of Solid-State Circuits, vol. 29, no. 5, pp. 551-556, May 1994.
[38] K. Kawai, and H. Ichino, “A 0.6 W 10 Gb/s SONET/SDH Bit-Error-Monitoring
LSI,” IEEE International Solid-State Circuits Conference, pp. 54-55, 2000.
[39] S. Finocchiaro, G. Palmisano, R. Salerno, and C. Sclafani, “Design of Bipolar Ring
Oscillators,” IEEE International Symposium on Circuits and Systems, vol. 1, pp. 5-8,
1999.
[40] Y. Chen, S. Koneru, E. Lee, and R. Geiger, “Simulation of Random Jitter in Ring
Oscillators with SPICE,” IEEE International Symposium on Circuits and Systems,
vol. 2, pp. 1154-1157, 1997.
[41] Dan H. Wolaver, Phase-Locked Loop Circuit Design, Englewood Cliffs, NJ: Prentice Hall, 1991.
[42] T. Kuroda, T. Fujita, Y. Itabashi, S. Kabumoto, M. Noda, and A. Kanuma, “1.65
Gb/s 60 mW 4:1 Multiplexer and 1.8 Gb/s 80 mW 1:4 Demultiplexer ICs Using 2V
3-Level Series-Gated ECL Circuits,” IEEE International Solid-State Circuits Conference, pp. 36-37, 1995.
[43] D. Chen, R. Waldron, “A Single-Chip 266 Mb/s CMOS Transmitter/Receiver for
Serial Data Communications,” IEEE International Solid-State Circuits Conference,
pp. 100-101, 1993.
[44] M. Soyuer, K. A. Jenkins, J. N. Burghartz, H. A. Ainspan, F. J. Canora, S. Ponnapalli, J. F. Ewen, and W. E. Pence, “A 2.4 GHz Silicon Bipolar Oscillator with Integrated Resonator,” IEEE Journal of Solid-State Circuits, vol. 31, no. 2, pp. 268-270,
February 1996.
[45] F. Svelto, S Deantoni, and R. Castello, “A 1.3 GHz Low-Phase Noise Fully Tunable
CMOS LC VCO,” IEEE Journal of Solid-State Circuits, vol. 35, no. 3, pp. 356-361,
March 2000.
[46] J. J. Kim, and B. Kim, “A Low-Phase-Noise CMOS LC Oscillator with a Ring
Structure,” IEEE International Solid-State Circuits Conference, pp. 430-431, 2000.
[47] C. Wu, and H. Kao, “A 1.8 GHz CMOS Quadrature Voltage-Controlled Oscillator
(VCO) Using the Constant-Current LC Ring Oscillator Structure,” IEEE International Symposium on Circuits and Systems, vol. 4, pp. 378-381, 1998.
[48] J. Akagi, Y. Kuriyama, M. Asaka, T. Sugiyama, N. Lizuka, K. Tsuda, and M. Obara,
“Five AlGaAs/GaAs HBT ICs for a 20 Gb/s Optical Receiver,” IEEE International
Solid-State Circuits Conference, pp. 168-169, 1994.
[49] M. Soda, H. Tezuka, F. Sato, T. Hashimoto, S. Nakamura, T. Tatsumi, T. Suzaki, and
T. Tashiro, “Si-Analog ICs for 20 Gb/s Optical Receiver,” IEEE International Solid-State Circuits Conference, pp. 170-171, 1994.
[50] A. Rofougaran, J. Rael, M. Rofougaran, and A. Abidi, “A 900 MHz CMOS LC-Oscillator with Quadrature Outputs,” IEEE International Solid-State Circuits Conference, pp. 392-393, 1996.
[51] B. L. Thompson, and H. Lee, “A BiCMOS Receiver/Transmit PLL Pair for Serial
Data Communications,” IEEE Custom Integrated Circuits Conference, pp. 29.6.1-29.6.5, May 1992.
[52] C. R. Hogge, “A Self Correcting Clock Recovery Circuit,” IEEE Journal of Lightwave Technology, vol. LT-3, no. 6, pp. 1312-1314, December 1985.
[53] D. Y. Wu, A. C. Yen, D. Meeker, S. Beccue, K. Pedrotti, J. Penney, A. Price, and K.
C. Wang, “Two Phase Detectors for 2.5-10 Gb/s NRZ Data Operation: a Hogge and
a Balanced Mixer,” GaAs IC Symp., pp. 266-269, 1996.
Appendix A. IBM SiGe 5 HP
A.1. NPN Vbe characteristics
The SiGe npn transistor Vbe characteristics are important for various reasons. First, they
indicate the turn-on voltage of the transistor: the voltage below which the transistor is
considered off. Second, at a given operating collector current they can be used to find the
base-emitter voltage. Third, and perhaps most importantly, the derivative of the
collector current, Ic, with respect to the transistor’s Vbe is the transconductance. This
parameter is found in Fig. A-2 by taking the slope at half the peak fT current. This current
flows through an optimized differential pair when both inputs are biased identically.
[Figure A-1 plot: Normalized Ic (ln(mA/µm)) versus Vbe (V), showing simulated and analytical curves]
Figure A-1 Ic-Vbe characteristics for npn transistor
The above plot shows the collector current at a fixed Vce of 2 V versus
Vbe. The analytical approximation is accurate up to the operating point
of 0.7 mA/µm.
[Figure A-2 plot: Normalized Ic (mA/µm) versus Vbe (V), with the slope annotated as 120 Ω/µm and 8.33 m/Ω/µm]
Figure A-2 NPN transconductance
The transconductance is the slope at the point where the collector current is half the
maximum fT current.
Comparing the simulated transconductance to that found in
$$ r_e = \gamma \frac{v_T}{i_e}, \qquad g_m = \frac{1}{\gamma} \frac{i_e}{v_T} \tag{A-1} $$
yields a fudge factor, γ, of 1.65.
The analytical curve in Fig. A-1 is found from
$$ I_c = I_s e^{V_{be}/V_T} \tag{A-2} $$
where Is is graphically determined to be 30 fA.
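The relations in (A-1) can be evaluated with a short Python sketch. The thermal voltage of 25.85 mV (about 300 K) and the bias of 0.35 mA/µm are assumptions made here for illustration; the bias is simply half of a 0.7 mA/µm peak, and it happens to land near the 120 Ω annotated in Fig. A-2.

```python
GAMMA = 1.65      # fudge factor quoted in the text
V_T = 0.02585     # thermal voltage in volts near 300 K (assumed)

def emitter_resistance_ohm(i_e_amps, gamma=GAMMA, v_t=V_T):
    """r_e = gamma * v_T / i_e, per Eq. (A-1)."""
    return gamma * v_t / i_e_amps

def transconductance_s(i_e_amps, gamma=GAMMA, v_t=V_T):
    """g_m = i_e / (gamma * v_T), the reciprocal of r_e."""
    return i_e_amps / (gamma * v_t)

i_e = 0.35e-3     # amps per micron of emitter (assumed bias point)
r_e = emitter_resistance_ohm(i_e)
g_m = transconductance_s(i_e)
print(round(r_e, 1), "ohm,", round(g_m * 1e3, 2), "mS")
```

With a bias of 0.4 mA/µm instead, the same formula gives roughly 107 Ω, so the result is sensitive to which peak fT current is assumed.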
A.2. NPN Ic versus Vce characteristics
[Figure A-3 plot: Collector Current (mA/µm) versus Collector-Emitter Voltage (V) for base currents from 0 to 250 µA/µm in 50 µA/µm steps]
Figure A-3 Ic-Vce characteristics for npn transistor
The above plot shows the collector current response versus
collector-emitter voltage for different base currents. Breakdown occurs
at a Vce of 3 V.
The Ic versus Vce characteristics of the npn transistor reveal important design
parameters. The first is a breakdown voltage of 3 V which is the maximum voltage that can
be applied across the collector-emitter junction. Above this voltage the base current loses
control over the collector current and large amounts of current begin to flow. The Early
voltage, the voltage at which all backwards linear extrapolations of the curves meet, is
about 45 V. This parameter is related to the output resistance looking into the collector by
$$ r_o = \frac{V_A}{I_c} \tag{A-3} $$
where Ic is the collector current near the active region. The normalized value of ro is 80
kΩ-µm.
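As a rough consistency check, inferred here from the quoted numbers rather than stated in the original, the Early voltage and the normalized output resistance together imply a collector current density of

$$ I_c = \frac{V_A}{r_o} \approx \frac{45\ \mathrm{V}}{80\ \mathrm{k\Omega \cdot \mu m}} \approx 0.56\ \mathrm{mA/\mu m} $$

which lies within the range of collector currents plotted in Fig. A-3.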
A.3. NPN fT Curves
[Figure A-4 plot: Frequency (GHz) versus Normalized Collector Current (mA/µm) for transistor sizes of 1, 2.5, 5, 10, and 20 µm]
Figure A-4 fT vs Ic characteristics for npn transistor
The maximum transition frequency for the SiGe npn transistors occurs
at approximately 0.8 mA/µm. Above that current the fT drops off
rapidly and that range should be avoided during design.
The most important design parameter found in the fT curves in Fig. A-4 is the DC
collector current bias point for maximum operating frequency. Although this normalized
current increases slightly as larger transistors are used, a value of 0.8 mA/µm is reasonable
for all sizes. Also worth noting in this plot is that larger transistors, which draw
more current and thus more power, operate faster. The smallest transistor has a
peak fT of approximately 50 GHz and the largest transistor peaks at 62 GHz.
Appendix B. CML Logic Gates
B.1. CML Voltage Swing (non-linearized, digital)
The CML voltage swing is found by analyzing the collector current flow through
each of the two transistors in a differential pair with a DC differential voltage on the inputs.
The voltage swing must be large enough to ensure that the majority of current flows
through only one transistor. Fig. B-1 depicts how the current flow shifts from one transistor
to the other as the differential voltage changes. At about ±200 mV, at least 99% of the
current is flowing through one leg of the CML buffer. This is the assigned minimum
operating voltage swing and a more conservative 250 mV or greater was used throughout
this project.
[Plot: percentage of total current (logarithmic axis, 0.001% to 100%) versus differential voltage (-300 to 0 mV)]
Figure B-1 Current switching versus differential input voltage
The input to a differential pair controls the switching of current through
two branches. A critical current level must be reached to assure that the
digital gate has completely switched. For a 99.7% current level through
one branch, a minimum of 250 mV must be applied.
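The switching levels quoted above can be checked with a short calculation. The sketch below assumes the ideal exponential differential-pair model with the thermal voltage scaled by the fudge factor γ = 1.65 from (A-1), and a thermal voltage of 25.85 mV (about 300 K). Both the model and the temperature are assumptions made for illustration, and the function name is hypothetical.

```python
import math

# Assumed values for illustration only.
V_T = 0.02585   # thermal voltage kT/q near 300 K, in volts (assumption)
GAMMA = 1.65    # fudge factor reported with (A-1)


def on_branch_fraction(v_diff):
    """Fraction of the tail current carried by the conducting branch of an
    ideal differential pair for a differential input v_diff in volts."""
    return 1.0 / (1.0 + math.exp(-abs(v_diff) / (GAMMA * V_T)))


if __name__ == "__main__":
    for mv in (100, 200, 250):
        pct = 100.0 * on_branch_fraction(mv / 1000.0)
        print(f"{mv} mV -> {pct:.2f}% of the current in one branch")
```

Under these assumptions the model gives roughly 99.1% at 200 mV and 99.7% at 250 mV, in line with the figures quoted in this section.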
B.2. CML Signals
CML circuits possess important attributes called signal levels, which are necessary to
connect multiple gates together. The need to merge multiple differential pairs arises from
the small, but desirable voltage swing (Appendix B.1.), the large base to emitter voltage
(Appendix A.1.), and the technique used. Merging pairs together involves stacking them so
that current through one is a function of the state of another. In this way, different current
paths can be connected to the pull-up resistors, the output. Other techniques exist for
combining differential pairs, see Section 5.3.1. on page 75, but they are not by themselves
considered CML.
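[Schematic: two-level CML AND gate with pull-up resistors r1 and r2, inputs a0/a1 and b0/b1, transistor Q1, and outputs x0/x1, y0/y1, and z0/z1]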
Figure B-2 Simple AND CML Gate
This gate shows how multiple differential pairs can be merged to
produce a two level gate.
In Fig. B-2, the differential input a must be of higher potential, specifically one Vbe
higher, than input b, to ensure that transistor Q1 will not become saturated. Input a is said
to be on level 1 (0 mV, -250 mV) and b is said to be on level 2 (-900 mV, -1150 mV). A
supply voltage as low as -3.2 V allows up to three levels of inputs.
Level 1 outputs, x, are found at the bottom of the pull-up or collector resistors at the
top of the tree. Level 2, y, and 3, z, outputs are generated from emitter followers and a
diode.
The size of pull-up resistors r1, and r2 is based upon the current source, to produce a
nominal voltage swing of at least 250 mV. For 1 µm sized transistors biased at a current of
0.8 mA, the resistors are set to 400 Ω. In general the normalized resistor value is 400 Ω-µm.
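The normalized resistor value lends itself to a short worked example. The sketch below scales the pull-up resistor and bias current with emitter length using the 400 Ω-µm and 0.8 mA/µm figures given above. The function names are hypothetical, and the assumption is that both quantities scale linearly with device size.

```python
R_NORM_OHM_UM = 400.0     # normalized pull-up resistor from the text (ohm-um)
I_NORM_MA_PER_UM = 0.8    # bias current per um of emitter length from the text


def pull_up_resistor_ohm(size_um):
    """Pull-up resistor for a device of the given emitter length."""
    return R_NORM_OHM_UM / size_um


def bias_current_ma(size_um):
    """Bias current for a device of the given emitter length."""
    return I_NORM_MA_PER_UM * size_um


def swing_mv(size_um):
    """Voltage swing across the pull-up resistor (mA times ohm gives mV)."""
    return bias_current_ma(size_um) * pull_up_resistor_ohm(size_um)


if __name__ == "__main__":
    for size in (1, 2, 4):
        print(size, "um:", pull_up_resistor_ohm(size), "ohm,",
              bias_current_ma(size), "mA,", swing_mv(size), "mV swing")
```

Because the resistor shrinks as the current grows, the swing stays at about 320 mV for every size, above the 250 mV minimum.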
B.3. Voltage Reference
All CML gates require a current source to fix the current flow through the differential
pair switch. The simplest approach, a passive source, places a resistor at the bottom of the
tree which has a nearly constant voltage across it and is dependent only on the lowest
transistor pair. This technique has high common mode gain on the lowest differential pair
and often requires a large resistor.
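[Schematic: reference voltage generator between Vcc and Vee; labels R1, Q1, Q2, and Vref; device sizes 1x and 2x; emitter resistors of 200 Ω and 400 Ω; currents of 0.75 mA and 1.5 mA. R1 is 1.73 kΩ for Vee = -4.5 V and 0.87 kΩ for Vee = -3.2 V.]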
Figure B-3 Reference Voltage Generator
Active current sources configured in a current mirror require a reference
voltage to control the amount.
A more common approach is to use an active current source implemented as a current
mirror. Fig. B-3 shows the generating circuit producing a mirror current of 0.75 mA/µm.
This current was chosen based upon the current necessary to achieve the maximum
operating frequency of the transistors. See Appendix A.2.
The emitter degeneracy resistor typically has 0.4 V across it and is used to control
currents which are smaller or larger than the mirror current. For instance, if a 4 µm
transistor circuit requires 3.0 mA, then a 100 Ω emitter resistor will be used.
Transistor Q2 is used for base current compensation and supplies the base current to
all connected circuits. It allows a larger number of sources to be used and prevents current
degradation when adding sources.
The value of R1 is dependent on the supply voltage of the circuits. Designs with
different supplies need only change this resistor to ensure a fixed current throughout all.
B.4. Buffer with emitter follower outputs
A buffer accepts a single input and duplicates it on its output. Its many uses include:
impedance conversion (high input impedance and low output impedance), fixed delay
introduction, and level shifting. Buffers also form the foundation for more complicated
circuits.
The circuit in Fig. B-4 can accept input, a, on levels 1, 2, or 3, since it has only one
differential pair. Level 1 output, x, is taken from the bottom of the pull-up resistors, and
level 2 output, y, is taken from the output of the emitter follower.
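[Schematic: CML buffer between Vcc and Vee with inputs a0/a1, level 1 outputs x0/x1, and emitter followers Q1 and Q2 driving level 2 outputs y0/y1]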
Figure B-4 CML Buffer with emitter followers
A basic buffer with level 1 and level 2 outputs. It can accept input at
any level.
The emitter follower output provides a much higher driving ability than the level 1
output. This is because the driving current from the level 1 output is passively pulled-up
through the resistors, and actively pulled-down through the differential pair. As more loads
are added, the base current from each must be supplied through the passive resistors, which
causes a voltage drop and limits the voltage swing. The passive pull-up through the
resistors also limits the speed of the gate. The emitter followers, on the other hand, provide
a low impedance output through β amplification of current through transistors Q1 and Q2.
In this case, the output is actively pulled-up through the follower transistor and actively
pulled-down through the current source.
Appendix C. CML Circuit Details
C.1. Linearizing the differential amplifier
The differential amplifier is very effective in digital circuits because of its high
voltage gain. For analog circuits, where a linear response is needed, this gain must be
reduced to meet specifications. The preferred method for doing so is to include emitter
resistors to augment the emitter resistance, re, already present in the transistor.
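[Schematic: differential pair with inputs a0/a1, collector resistors Rc carrying currents i0 and i1, and emitter resistors Re]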
Figure C-1 Linearizing the differential amplifier with emitter resistors
The addition of emitter resistors augments the output resistance of the
differential pair transistors and decreases the total gain of the circuit.
The emitter resistance is defined as the resistance from the base to the emitter looking
into the emitter, and it is the inverse of the transconductance, gm. The normalized value
found through simulation in Appendix A.1. is about 120 Ω-µm. The inverse of the sum of
this value and the emitter resistor Re yields the gain
$$ A_d \approx \frac{1}{r_e + R_e}, \qquad r_e = \frac{V_T}{I_e} = \frac{1}{g_m} \tag{C-1} $$
of the circuit with output current and input voltage. In order to find the total voltage gain
Ad must be multiplied by the collector resistance Rc.
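As a worked instance of (C-1), the sketch below evaluates the transconductance and the voltage gain for the emitter resistor values considered in this section. It uses the normalized re of about 120 Ω-µm quoted above. The collector resistance of 400 Ω-µm is borrowed from Appendix B.2 as an assumption, and the function names are hypothetical.

```python
R_E_INTRINSIC = 120.0   # normalized r_e quoted in the text (ohm-um)
R_C_ASSUMED = 400.0     # assumed collector resistor, value from Appendix B.2 (ohm-um)


def transconductance_ma_per_v(re_ext):
    """A_d from (C-1) in mA/V per um, for an added emitter resistor re_ext."""
    return 1000.0 / (R_E_INTRINSIC + re_ext)


def voltage_gain(re_ext, rc=R_C_ASSUMED):
    """Total voltage gain, A_d multiplied by the collector resistance."""
    return rc / (R_E_INTRINSIC + re_ext)


if __name__ == "__main__":
    for re_ext in (0.0, 200.0, 400.0, 600.0, 800.0):
        print(f"Re = {re_ext:5.0f} ohm-um: "
              f"Ad = {transconductance_ma_per_v(re_ext):.2f} mA/V, "
              f"voltage gain = {voltage_gain(re_ext):.2f}")
```

With these values the transconductance falls from about 8.3 mA/V with no emitter resistor to about 1.1 mA/V at 800 Ω-µm, which illustrates how quickly the added resistance dominates re.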
A plot of currents, i0 and i1, versus differential input voltage, a0, and a1 is shown in
Fig. C-2. The plot with 0 Ω-µm represents the nominal transfer function for a digital gate.
The gain is high and an input voltage of 100 mV ensures a nearly complete switch of
current. For digital circuits, this allows for a high noise margin, and fast switching
characteristics. For analog circuits, on the other hand, the active, linear region of the curve
is very small: ±50 mV. It is clear that the addition of the emitter resistors is crucial in
reducing the gain and spreading out the linear region. The choice of resistor will be
determined by the output range needed and the gain at an input of 0 V.
[Plot: branch currents i0 and i1 (0 to 0.8 mA/µm) versus differential voltage (-0.4 to 0.4 V), one pair of curves per emitter resistor of 0, 200, 400, 600, and 800 Ω-µm]
Figure C-2 Branch current response for various emitter resistors
This plot shows the transfer of current from one branch to the other
when the differential inputs are changed. Each pair of curves has a fixed
emitter resistor
A comparison between (C-1) and the simulated results is plotted in Fig. C-3 and
shows a very good match.
[Plot: simulated and analytical gain (mA/V/µm) and inverse gain (V/mA-µm) versus normalized Re (0 to 800 Ω-µm)]
Figure C-3 Simulated / Analytical Gain
(C-1) follows the simulated results for the transconductance of a CML
buffer with emitter resistors shown here.
C.2. Current bypassing
In some situations it may be necessary to limit the extent of current switching in a
differential amplifier. For example, the FFI VCO requires a minimum current flow through
both branches, no matter the input. The solution is to include a bypass resistor which
ensures that some constant current flows in addition to the current defined by the
differential transistor pair.
[Schematic: differential pair with inputs a0/a1, branch currents i0 and i1, emitter resistors Re, and bypass resistors Rb]
Figure C-4 Limiting full current switching with bypass resistors
The addition of bypass resistors allows some current to always flow
around the differential pair. This prevents a complete switching of
current.
Two behaviors result from the addition of the bypass resistor. First, a full switch of
current through the tree is prevented, which is a desired result. Second, there is a relative
decrease in the gain of the circuit, because the decrease in collector current
negatively affects the transconductance. Each of these effects is modeled in this section and
compared to simulation results. In addition, two equations are derived which can be used as
design tools when specifications on gain and current range are provided.
The maximum current in a branch is a function of the total current, the bypass and
emitter resistors, and the input voltage. We start with the assumption that branch 1 has zero
emitter current, i.e. a0 is much higher than a1. The currents through each bypass resistor are
the same. It is assumed that there is a differential pair above this one with emitter voltages
at the same potential. We define equations
$$ I_o = i_{e1} + 2 i_b \tag{C-2} $$
$$ i_{e1} = \frac{v_o + \frac{v_d}{2} - v_{be}}{R_e} \tag{C-3} $$
$$ i_b = \frac{v_o}{R_b} \tag{C-4} $$
where Io is the total current through the tree, vd is the differential input voltage and vo is the
voltage across the bypass resistor. The value for vbe is found in Fig. A-2 on page 157.
Solving for the current through branch 0 yields
$$ I_{max} = I_o - i_b = \frac{I_o (R_e + R_b) - \left( v_{be} - \frac{v_d}{2} \right)}{R_b + 2 R_e} = i_{d,max} \tag{C-5} $$
$$ I_{max} \Big|_{R_e = 0} = I_o - \frac{v_{be} - \frac{v_d}{2}}{R_b}. \tag{C-6} $$
Fig. C-5 shows the analytical and simulated results for the maximum current as a fraction
of the total current for emitter resistors of value 0 Ω−µm and 400 Ω−µm, and a differential
input of 400 mV.
With large bypass resistor values, the circuit allows almost a full current switch
because less current is bypassed around the differential pair. Below about 10 kΩ-µm the
reduction becomes much larger, down to about 3 kΩ-µm, where Rb is too small and no
current switching takes place.
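The maximum current fraction can be evaluated directly from (C-5). The sketch below does so for a few bypass resistor values. The total current of 0.8 mA/µm and the base-emitter voltage of 0.9 V are assumed values chosen for illustration (the text refers to Appendix A for vbe), and the function name is hypothetical. The expression only applies while one branch is fully switched off, as assumed in the derivation.

```python
def max_current_fraction(rb, re, io=0.8e-3, v_be=0.9, v_d=0.4):
    """I_max / I_o from (C-5).

    rb, re : bypass and emitter resistors in ohm-um
    io     : total tree current in A/um (assumed value)
    v_be   : base-emitter voltage in volts (assumed value)
    v_d    : differential input voltage in volts
    """
    i_max = (io * (re + rb) - (v_be - v_d / 2.0)) / (rb + 2.0 * re)
    return i_max / io


if __name__ == "__main__":
    for rb_kohm in (5, 10, 20, 40):
        rb = rb_kohm * 1000.0
        print(f"Rb = {rb_kohm:2d} kohm-um: "
              f"Re=0 -> {max_current_fraction(rb, 0.0):.3f}, "
              f"Re=400 -> {max_current_fraction(rb, 400.0):.3f}")
```

With Re set to zero the function reduces to (C-6), which gives a quick consistency check on the algebra.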
[Plot: maximum current fraction (0.5 to 1.0) versus bypass resistor (0 to 40 kΩ-µm) at Vd = 400 mV, simulated and analytical curves for Re = 0 Ω-µm and Re = 400 Ω-µm]
Figure C-5 Current limiting effects of bypass resistor
The bypass resistor prevents current from being completely shut off in a
differential branch. The maximum current allowed to flow divided by
the total current is called the maximum current fraction.
The next step is to examine how the gain is affected by the addition of the bypass
resistor. The primary cause of the decrease in transconductance is the
decrease in collector current in the differential pair. Gain is directly related to
transconductance and emitter resistance. A second order effect results from an increase of
voltage, and current, across the bypass resistor when collector current increases through the
emitter resistor.
Solving for the gain can be broken up into separate pieces: how the emitter current
changes relative to the input voltage, and how the total current changes relative to the
emitter current.
$$ \frac{di}{dv} = \frac{di_e}{dv} \cdot \frac{di}{di_e} \tag{C-7} $$
shows this relationship. The next step is to solve for the bypass current relative to the
emitter current
$$ \frac{di_b}{di_e} = \frac{d}{di_e} \left( \frac{v_{be} + i_{e1} R_e}{R_b} \right) = \frac{R_e}{R_b}. \tag{C-8} $$
Since the sum of the bypass current and the emitter current is the total current i, then
it is possible to find the total current relative to the emitter current
$$ \frac{di}{di_e} = \frac{di_b}{di_e} + \frac{di_e}{di_e} = \frac{R_e}{R_b} + 1 \tag{C-9} $$
Next, the emitter current relative to the other parameters is determined
$$ i_e = \frac{R_b I_o - 2 v_{be}}{2 R_e + 2 R_b}. \tag{C-10} $$
From (C-1) on page 164 the derivative of emitter current with respect to input voltage is the
inverse of the sum of the emitter resistances, and (A-1) on page 157 yields the
transconductance. Using (C-7), (C-9), and
$$ \frac{di_e}{dv} = \frac{1}{r_e + R_e} = \frac{1}{\gamma v_T \frac{2 R_e + 2 R_b}{R_b I_o - 2 v_{be}} + R_e} \tag{C-11} $$
and simplifying the equation yields the desired result
$$ \frac{di}{dv} = \frac{di_d}{dv_d} = \frac{1}{\frac{2 \gamma v_T R_b}{R_b I_o - 2 v_{be}} + R_e \| R_b} \tag{C-12} $$
where id and vd are the differential current and differential voltage, respectively. Results
from this analysis compared to simulated results are shown in Fig. C-6.
The top plot in Fig. C-6 shows an upward slope: increasing Rb increases the
transconductance. The lower plot shows a very flat response because the gain, in this case,
is fixed by the emitter resistor and is not affected by the collector current (see Appendix
C.1. on page 164).
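The chain of steps from (C-7) to (C-12) can be verified numerically. The sketch below computes the gain once by multiplying (C-9) and (C-11), and once from the closed form (C-12), so that the two can be compared. The total current of 0.8 mA/µm, the base-emitter voltage of 0.9 V, and the thermal voltage of 25.85 mV are assumed values, and the function names are hypothetical.

```python
GAMMA = 1.65    # fudge factor reported with (A-1)
V_T = 0.02585   # thermal voltage near 300 K in volts (assumption)


def gain_stepwise_ma_per_v(rb, re, io=0.8e-3, v_be=0.9):
    """Gain built from the individual steps of the derivation."""
    i_e = (rb * io - 2.0 * v_be) / (2.0 * re + 2.0 * rb)   # (C-10)
    r_e = GAMMA * V_T / i_e                                  # (A-1)
    die_dv = 1.0 / (r_e + re)                                # (C-11)
    di_die = re / rb + 1.0                                   # (C-9)
    return 1000.0 * die_dv * di_die                          # (C-7)


def gain_closed_form_ma_per_v(rb, re, io=0.8e-3, v_be=0.9):
    """Gain from the closed form (C-12)."""
    parallel = re * rb / (re + rb)
    denom = 2.0 * GAMMA * V_T * rb / (rb * io - 2.0 * v_be) + parallel
    return 1000.0 / denom


if __name__ == "__main__":
    for rb in (5e3, 1e4, 2e4, 4e4):
        for re in (0.0, 400.0):
            print(f"Rb = {rb:7.0f}, Re = {re:5.0f}: "
                  f"{gain_stepwise_ma_per_v(rb, re):.3f} mA/V (stepwise), "
                  f"{gain_closed_form_ma_per_v(rb, re):.3f} mA/V (closed form)")
```

Under these assumptions the Re = 0 gain rises with Rb toward roughly 9 mA/V, while the Re = 400 Ω-µm gain changes much less, which matches the qualitative behavior described for the two plots.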
[Plot: gain (0 to 9 mA/V/µm) versus bypass resistor (0 to 40 kΩ-µm), simulated and analytical curves for Re = 0 Ω-µm and Re = 400 Ω-µm]
Figure C-6 Current gain effects of bypass resistor
The bypass resistor lowers the current through the differential pair,
which in turn decreases the transconductance, subsequently decreasing
the gain.
Fig. C-7 is a surface plot showing the relationship between current gain and emitter
and bypass resistors. This can be useful when designing a linearized differential amplifier
with bypass resistors.
[Surface plot: gain (mA/V/µm), in bands from 0-1 to 8-9, versus emitter resistor (0 to 800 Ω-µm) and bypass resistor (0 to 40 kΩ-µm)]
Figure C-7 Designing for gain with emitter and bypass resistors
This plot is useful for designing with bypass resistors when gain is
specified.
C.3. Increasing CML delay
It is sometimes necessary to increase the delay of a CML gate to meet certain timing
requirements. Such a need is found in a ring oscillator that must be centered at a frequency
that is lower than the free running frequency. The addition of a capacitor across the level 1
outputs degrades the rise time, and thus, increases the gate delay. This solution is easy to
implement and simple to model.
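[Schematic: collector resistors Rc with capacitor Cc across the level 1 outputs, and the equivalent circuit with two series capacitors of value 2Cc]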
Figure C-8 Collector Capacitor
A collector capacitor can be used to degrade the delay through a CML
gate by increasing the rise time.
Modeling the new gate delay first involves determining the gate delay without the
capacitor. This nominal delay is represented by To, and is approximately equal to 12.5 ps.
The extra delay is modeled as an RC charging circuit with a time constant of 2RcCc. The
factor of 2 arises from the equivalent circuit shown in Fig. C-8, where two series capacitors
have a value of twice the original. An additional factor of ln(2) multiplies the time constant
to account for the point at which the output is considered switched. This point is
$$ v_o = T_o + I_o R_c \, e^{-t / (R_c C_c)} \tag{C-13} $$
which is approximately when the differential voltage is 0 V. The total delay is equal to
$$ T = T_o + \ln(2) \, (2 R_c C_c). \tag{C-14} $$
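A short calculation shows the scale of delays that (C-14) produces. The sketch below uses the nominal delay of 12.5 ps from the text and assumes a normalized collector resistor of 400 Ω-µm, the value given in Appendix B.2. The function name is hypothetical. Because Rc is normalized in Ω-µm and the capacitance in fF/µm, their product does not depend on device size.

```python
import math

T0_PS = 12.5          # nominal gate delay from the text (ps)
RC_OHM_UM = 400.0     # assumed normalized collector resistor from Appendix B.2


def gate_delay_ps(cc_ff_per_um):
    """Total gate delay from (C-14) for a collector capacitor in fF/um."""
    # ohm-um times fF/um gives ohm*fF = 1e-15 s = 1e-3 ps
    tau_ps = 2.0 * RC_OHM_UM * cc_ff_per_um * 1e-3
    return T0_PS + math.log(2.0) * tau_ps


if __name__ == "__main__":
    for cc in (0, 100, 200, 300, 400, 500):
        print(f"Cc = {cc:3d} fF/um -> delay = {gate_delay_ps(cc):.1f} ps")
```

Under this assumption each 100 fF/µm of collector capacitance adds about 55 ps of delay.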
[Plot: gate delay (0 to 350 ps) versus capacitance (0 to 600 fF/µm), analytical and simulated curves]
Figure C-9 Delay Model with Collector Capacitor
The delay of a CML gate versus level 1 capacitance is derived in this
section and is consistent with simulated results.
Appendix D. Sizing Transistors to Minimize VCO Delay
The design of digital logic gates in SiGe technology always includes a consideration
of transistor size. Sizes range from an emitter length of 1 µm to a length of 20 µm and if
multiple fingered emitters are used, effective lengths up to 40 µm. Usually, the larger
transistors have smaller delay, but consume proportionally higher current. A trade-off
decision among power, layout space, and delay specifications needs to be made.
Logic gates can be extremely varied and may include such functions as multiplexed
XOR, and five input AND/OR cells. Delays through each of these will depend on the
number of inputs and outputs, the input and output levels and various other factors. An
in-depth analysis of all these factors would be very complicated, and the results difficult to
utilize. A more general solution, and the one followed in this appendix, is to consider
simple buffers with emitter followers driving other buffers. Although not a completely
accurate representation of most logic gates, the analysis conclusions are very useful in the
design of all gates. If a buffer is driving multiple receivers, this condition is reduced to a
case with only one receiver whose size is equal to the sum of the receivers. For instance, if
a driver has four 1 µm loads, they can be treated as one 4 µm load.
Also worth noting, is that the following analysis is extremely useful in the
optimization of ring oscillators. These circuits incorporate a ring of two or more buffers that
oscillate because of an odd number of inversions, and are very sensitive to gate delays. If a
buffer has a delay of 25 ps, then a 1-2 ps difference in delay can have a 4% or greater impact
on the final oscillation frequency. Consideration of the type of loads that will be driven by
the VCO is also important when choosing device sizes. For instance, if the VCO has buffers
with 1 µm devices, then a 1 µm load on each stage will introduce a proportionally huge
loading effect on the system.
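The sensitivity of a ring oscillator to gate delay can be illustrated numerically. The sketch below assumes the usual textbook relation f = 1/(2 N td) for a ring of N identical stages, which is not stated in this appendix, and a hypothetical four-stage ring. The function names are likewise hypothetical.

```python
def ring_frequency_ghz(stages, delay_ps):
    """Oscillation frequency of an ideal ring, f = 1 / (2 * N * t_d)."""
    return 1000.0 / (2.0 * stages * delay_ps)


def relative_shift(delay_ps, delta_ps):
    """Fractional frequency change when every stage delay changes by delta_ps."""
    return delay_ps / (delay_ps + delta_ps) - 1.0


if __name__ == "__main__":
    stages = 4  # hypothetical ring length
    print(f"nominal: {ring_frequency_ghz(stages, 25.0):.2f} GHz")
    for delta in (-2.0, -1.0, 1.0, 2.0):
        print(f"delay change {delta:+.0f} ps -> "
              f"{100.0 * relative_shift(25.0, delta):+.1f}% frequency change")
```

Since the stage count cancels in the ratio, a 1 ps change on a 25 ps buffer shifts the frequency by about 4% regardless of the number of stages.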
The assumption in this analysis is that the receiver circuit is fixed and design work
will be done on the driver. The data presented here, however, can be useful for the design
of the receiver as well.
[Surface plot: delay (bands from 10-12 ps to 28-30 ps) versus emitter follower size (1 to 10 µm) and receiver size (1 to 9 µm), with design points marked; the inset indicates that the delay is measured from the emitter follower to the amplifier]
Figure D-1 Delay from emitter follower to differential amplifier
In general, the larger the emitter follower, the more capable it is of driving
larger differential amplifiers. A rule of thumb in designing an emitter
follower to minimize delay and not use considerable power is to use 2
µm devices plus 1 µm per 5 µm of load.
Fig. D-1 shows the effect on the delay of using different sized emitter followers to
drive various receiver loads. The larger the emitter follower, the smaller the delay since the
higher powered follower has a lower output resistance. This, coupled with the receiver
input base capacitance, produces a smaller delay. The figure also shows the acceleration in
delay as the receiver size remains fixed and the emitter follower shrinks. The acceleration
occurs because the output resistance, and with it the delay, is inversely proportional to
follower size.
Also shown on this plot are design points which establish a good rule of thumb for
designing emitter followers based on receiver loading for less critical gates. Obviously, the
largest emitter followers used will yield the smallest delay, but there is a point where larger
devices do not yield substantial improvement. The design rule is to use followers of at least
2 µm and add an additional 1 µm per 5 µm of load. Following this rule yields very small
delays without huge power consumption.
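The sizing rule can be written as a one-line function. The sketch below combines it with the earlier observation that several receivers can be treated as a single load equal to the sum of their sizes. It assumes the extra 1 µm per 5 µm of load is applied proportionally rather than in whole steps, and the function name is hypothetical.

```python
def follower_size_um(receiver_sizes_um):
    """Emitter follower size from the rule of thumb: 2 um plus 1 um per
    5 um of total receiver load (applied proportionally, an assumption)."""
    total_load_um = sum(receiver_sizes_um)
    return 2.0 + total_load_um / 5.0


if __name__ == "__main__":
    print(follower_size_um([1, 1, 1, 1]))   # four 1 um loads treated as one 4 um load
    print(follower_size_um([10]))           # a single 10 um receiver
```

For less critical gates this gives a starting size. Critical paths such as the VCO stages are better served by the optimum-size plots later in this appendix.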
[Surface plot: delay (bands from 7-8 ps to 15-16 ps) versus emitter follower size (1 to 10 µm) and amplifier size (1 to 10 µm), with design points marked; the inset indicates that the delay is measured from the amplifier to the emitter follower]
Figure D-2 Delay from differential amp to emitter follower
Designing CML logic gates often requires designing an emitter follower
stage. The choice of follower is based on many factors, including the
specific differential amplifier driving the followers. In general, the
larger the follower, compared to the amplifier, the larger the delay
through the gate.
After choosing an emitter follower, the next step is to design the differential amplifier
that represents the core of the driver. Fig. D-2 shows the delay from the amplifier to the
emitter follower, given different sizes of each. Here the effect is opposite from the effect
demonstrated in the previous section; a larger follower size now increases the delay. This
is because the followers are now acting as loads on the amplifier and the larger transistors
add base capacitance. The ideal situation would be to have the smallest emitter followers
possible, but this is not an option after considering loading effects. A good rule is to use an
amplifier that is at least half the size of the emitter followers. This yields good delay and
driving properties.
From Fig. D-1 and Fig. D-2, it is clear that a trade-off exists when designing an
emitter follower to be placed between two differential amplifiers. An increase in follower
size allows for a better ability to drive loads, however, this increase inhibits the ability of
the first amplifier to drive the follower. A closer look at this situation yields Fig. D-3, which
shows the optimum follower size to use, given a driver and receiver amplifier size. For
instance, in a ring oscillator with 2 µm buffers each driving a 1 µm load, the optimal
follower to use is about 6 µm in size. From Fig. D-4 we find that the delay through the gate
will be about 23 ps.
[Contour plot: optimum emitter follower size (bands from 2-4 to 18-20) versus driver size (1 to 10 µm) and receiver size (1 to 10 µm), with feed forward VCO and ring VCO design points marked]
Figure D-3 Size of emitter follower between driver and receiver
When a gate needs to drive another gate on level 2 or lower, or when the
receiver is a large load, emitter followers are used. The optimal
transistor size to minimize delay through the driver and receiver gates,
is a function of the transistor sizes in the driver and the receiver.
Ring oscillators typically have a buffer of size x driving the next buffer, and a load.
Minimizing and balancing the external loading on each buffer forces each stage to have 1
µm buffers hanging on it. For standard ring VCOs, an emitter follower design line exists.
This is shown on Fig. D-3 and Fig. D-4. For the feed forward VCO, each stage of size x
must drive two inputs of size x, yielding a different design curve.
The final step is to justify the use of the emitter follower. Since it adds delay to the
buffer-follower-buffer system, it may be better (less delay) to remove the follower
completely. Fig. D-5 shows the difference in delay between a system with and without an
emitter follower. In almost all instances it is beneficial to include the follower unless the
receiver is much smaller than the driver.
[Contour plot: delay versus driver size (1 to 10 µm) and receiver size (1 to 10 µm), in bands up to 34-35 ps, with feed forward VCO and ring VCO design points marked]
Figure D-4 Delay when using optimized emitter follower
The plot above shows the minimum delay achievable between two
differential amplifiers when using an optimized emitter follower.
[Contour plot: delay difference (bands from -2.0-0.0 to 8.0-10.0) versus driver size (1 to 10 µm) and receiver size (1 to 10 µm)]
Figure D-5 Delay difference between circuit with follower and circuit without
An emitter follower between differential amplifiers introduces additional
delay, but in most cases reduces the overall delay of the system. Only in
cases with large drivers and smaller receivers does the emitter follower
increase the delay.
Appendix E. SpectreHDL models
E.1. FFI VCO
// Spectre AHDL for FFI VCO 4u, ahdl
//
// This cell emulates the functioning of the FFI VCO.
// It has 4 sine wave outputs each offset from each other
// by 45 degrees. Additional outputs give the instantaneous
// frequency and the phase relative to a fixed frequency
// source
//
// Thomas Krawczyk 7/00
//
#define PI 3.1415926535
module b_ffi5 ( w20, w21, x20, x21, y20, y21, z20, z21, Vref, s30, s31) (fc,offset,divider,mfreq)
node [V,I] w20; node [V,I] w21;
node [V,I] x20; node [V,I] x21;
node [V,I] y20; node [V,I] y21;
node [V,I] z20; node [V,I] z21;
node [V,I] s30; node [V,I] s31;
node [V,I] phase; node [V,I] freq;
node [V,I] Vref;
// Center frequency with 0 control voltage
parameter real fc = 5.96G ;
// DC voltage offset on terminal outputs
parameter real offset = -1.1 ;
// In PLL incorporate 1/8, 1/16 divider into model
parameter real divider = 1 from (0.25:64);
// Frequency with which to compare and determine phase offset
parameter real mfreq = 5 GHz;
{
table VCOdata;
real control_voltage, f, ph, mph;
// Arrays sized to hold all 15 table entries assigned below
real s[15], factor[15];
initial {
// Mapping data between input control voltage and output frequency collected
// from simulation. Must be positive so a 450m offset is introduced.
s[0] = 0.500; factor[0] = 0.733; s[1] = 0.600; factor[1] = 0.733;
s[2] = 0.700; factor[2] = 0.747; s[3] = 0.800; factor[3] = 0.805;
s[4] = 0.850; factor[4] = 0.849; s[5] = 0.900; factor[5] = 0.896;
s[6] = 0.950; factor[6] = 0.950; s[7] = 1.000; factor[7] = 1.000;
s[8] = 1.050; factor[8] = 1.046; s[9] = 1.100; factor[9] = 1.091;
s[10]= 1.150; factor[10]= 1.134; s[11]= 1.200; factor[11]= 1.168;
s[12]= 1.300; factor[12]= 1.218; s[13]= 1.400; factor[13]= 1.230;
s[14]= 1.500; factor[14]= 1.230;
VCOdata = $build_table(2, factor, s, 15);
}
analog {
control_voltage = V(s31,s30) + 450m;
// Find the frequency multiplier from the control voltage
f = $interpolate(VCOdata, control_voltage);
// Find the phase of the w20 phase
ph = 2*PI*integ(fc*f/divider,0);
// Find the phase of the signal whose frequency is being used for phase difference
mph= 2*PI*integ(mfreq,0);
// Generate the signals for each phase output
V(w20) <- offset + sin(2*PI* integ(fc*f/divider,0) );
V(w21) <- offset - sin(2*PI* integ(fc*f/divider,0) );
V(x20) <- offset + sin(2*PI* integ(fc*f/divider,0) +1*PI/4 );
V(x21) <- offset - sin(2*PI* integ(fc*f/divider,0) +1*PI/4 );
V(y20) <- offset + sin(2*PI* integ(fc*f/divider,0) +2*PI/4 );
V(y21) <- offset - sin(2*PI* integ(fc*f/divider,0) +2*PI/4 );
V(z20) <- offset + sin(2*PI* integ(fc*f/divider,0) +3*PI/4 );
V(z21) <- offset - sin(2*PI* integ(fc*f/divider,0) +3*PI/4 );
// Return the phase difference in degrees
V(phase) <- (ph-mph)/PI*180;
// Return the exact frequency in GHz
V(freq) <- fc*f/divider/1G;
}
}
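The control-voltage mapping in the listing above can be reproduced outside the simulator for quick checks. The Python sketch below copies the table, center frequency, and 450 mV offset from the model. It assumes piecewise-linear interpolation and clamping at the table ends, which may differ from what $interpolate actually does, and the function name is hypothetical.

```python
# Table values copied from the SpectreHDL model above.
S = [0.500, 0.600, 0.700, 0.800, 0.850, 0.900, 0.950, 1.000,
     1.050, 1.100, 1.150, 1.200, 1.300, 1.400, 1.500]
FACTOR = [0.733, 0.733, 0.747, 0.805, 0.849, 0.896, 0.950, 1.000,
          1.046, 1.091, 1.134, 1.168, 1.218, 1.230, 1.230]
FC_HZ = 5.96e9      # default fc parameter of the model
OFFSET_V = 0.450    # offset added to the differential control voltage


def vco_frequency_hz(v_control, divider=1.0):
    """Output frequency for a differential control voltage in volts.

    Linear interpolation and end clamping are assumptions of this sketch.
    """
    x = min(max(v_control + OFFSET_V, S[0]), S[-1])
    for i in range(len(S) - 1):
        if x <= S[i + 1]:
            t = (x - S[i]) / (S[i + 1] - S[i])
            factor = FACTOR[i] + t * (FACTOR[i + 1] - FACTOR[i])
            return FC_HZ * factor / divider
    return FC_HZ * FACTOR[-1] / divider


if __name__ == "__main__":
    for v in (0.25, 0.45, 0.55, 0.65, 0.85):
        print(f"V(s31,s30) = {v:.2f} V -> {vco_frequency_hz(v) / 1e9:.3f} GHz")
```

A differential control voltage of 0.55 V lands on the table entry with a factor of 1.000, so the sketch returns the center frequency at that point.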
E.2. 3-State PD
//
// Spectre AHDL for SERDES3, PD_3state, ahdl
//
// This module emulates the 3-state Phase Detector.
// It looks for rising transitions of the vi and vo inputs
// and forces the output to a +1 or -1 state depending on
// which input went high. When both eventually go high the
// output is reset. The slip outputs, although not implemented,
// give a pulse when the detector exceeds its max value.
//
// Thomas Krawczyk 9/27/00
//
module PD_3state ( vd0, vd1, vi_slip10, vi_slip11, vo_slip10, vo_slip11, Vref1, Vref2, vi20, vi21,
vo20, vo21) ()
node [V,I] vd0;
node [V,I] vd1;
node [V,I] vi_slip10;
node [V,I] vi_slip11;
node [V,I] vo_slip10;
node [V,I] vo_slip11;
node [V,I] Vref1; // Can ignore
node [V,I] Vref2; // Can ignore
node [V,I] vi20;
node [V,I] vi21;
node [V,I] vo20;
node [V,I] vo21;
{
real vo_center = -1.07; // Center output voltage
real vo_swing = 144m; // Swing either high or low
real i_rise = -1; // 0 = low 1 = transition 2 = high
real o_rise = -1;
real out0, out1;
analog {
// Make sure we get a time point at the input crossings.
$threshold( V(vi20)-V(vi21), 1 );
$threshold( V(vo20)-V(vo21), 1 );
if( V(vi20) > V(vi21)) {
if( i_rise < 2 ) i_rise++;
} else i_rise = 0;
if( V(vo20) > V(vo21)) {
if( o_rise < 2 ) o_rise++;
} else o_rise = 0;
// input vi positive transition?
if( i_rise == 1 && o_rise == 0 ) {
out0 = vo_center + vo_swing;
out1 = vo_center - vo_swing;
}
// input vo positive transition?
if( i_rise == 0 && o_rise == 1 ) {
out0 = vo_center - vo_swing;
out1 = vo_center + vo_swing;
}
// Both transitions detected
// reset output back to nominal values
if( i_rise >= 1 && o_rise >= 1 ) {
out0 = out1 = vo_center;
}
if( i_rise == -1 && o_rise == -1 ) {
out0 = out1 = vo_center;
}
// Give the output signals a rise time and 3 gate delays
V(vd0) <- $transition( out0, 60p, 20p, 20p );
V(vd1) <- $transition( out1, 60p, 20p, 20p );
// Frequency slip detectors are not implemented
V(vi_slip10) <- -1.5;
V(vi_slip11) <- -1.5;
V(vo_slip10) <- -1.5;
V(vo_slip11) <- -1.5;
}
}
E.3. Transition Detector PD
//
// Spectre AHDL for SERDES3, RxEdgeExtraction, ahdl
//
// This is a model for the Transition Phase Detector circuit.
// Clock inputs are w2 - z2.
// Data inputs are dw1 - dz1.
// Sampled outputs are da2 - dd2.
// Fast and slow commands to the VCO are f20 and s21.
// Each region is 25 ps wide.
//
//     \2|1/
//    3 \|/ 0
//    ---+---
//    4 /|\ 7
//     /5|6\
//
module RxEdgeExtraction ( da20, da21, db20, db21, dc20, dc21, dd20, dd21, f20, s21, dw10, dw11, dx10,
dx11, dy10, dy11, dz10, dz11, w20, w21, x20, x21, y20, y21, z20, z21, region) ()
node [V,I] da20; node [V,I] da21;
node [V,I] db20; node [V,I] db21;
node [V,I] dc20; node [V,I] dc21;
node [V,I] dd20; node [V,I] dd21;
node [V,I] f20; node [V,I] s21;
node [V,I] dw10; node [V,I] dw11;
node [V,I] dx10; node [V,I] dx11;
node [V,I] dy10; node [V,I] dy11;
node [V,I] dz10; node [V,I] dz11;
node [V,I] w20; node [V,I] w21;
node [V,I] x20; node [V,I] x21;
node [V,I] y20; node [V,I] y21;
node [V,I] z20; node [V,I] z21;
node [V,I] region; // AHDL output of the current sampling region
{
integer reg = 0; // 1-8 (0-45 = 0)
integer out[8]; // output array of detected transitions
                // per region to be summed at end
integer sum; // sum of output array
integer i; // index for summing loop
integer da, db, dc, dd; // Sampled outputs (0,1) map to (-1, 1)
integer data_val; // Last data value
real out_center = -1; // center of fast/slow output
real out_diff = 4m; // fast/slow differential output / edge
real data_center = -1.1; // Center of sampled data output
real data_amp = 150m; // Amplitude of sampled data output
analog {
if( V(w20) > V(w21) && reg == 7 ) {
if( V(dw10) > V(dw11) ) da = 1;
else da = -1;
reg = 0; out[reg] = 0;
}
if( V(x20) > V(x21) && reg == 0 ) {
reg = 1; out[reg] = 0;
}
if( V(y20) > V(y21) && reg == 1 ) {
if( V(dw10) > V(dw11) ) db = 1;
else db = -1;
reg = 2; out[reg] = 0;
}
if( V(z20) > V(z21) && reg == 2 ) {
reg = 3; out[reg] = 0;
}
if( V(w20) < V(w21) && reg == 3 ) {
if( V(dw10) > V(dw11) ) dc = 1;
else dc = -1;
reg = 4; out[reg] = 0;
}
if( V(x20) < V(x21) && reg == 4 ) {
reg = 5; out[reg] = 0;
}
if( V(y20) < V(y21) && reg == 5 ) {
if( V(dw10) > V(dw11) ) dd = 1;
else dd = -1;
reg = 6; out[reg] = 0;
}
if( V(z20) < V(z21) && reg == 6 ) {
reg = 7; out[reg] = 0;
}
// Look for transitions and insert
// 1 into output array of current region
if( (V(dw10) > V(dw11)) && data_val == 0 ) {
out[reg] = 1;
data_val = 1;
}
if( (V(dw10) < V(dw11)) && data_val == 1 ) {
out[reg] = 1;
data_val = 0;
}
// Sum the fast/slow regions
sum = -out[0]+out[1]-out[2]+out[3]-out[4]+out[5]-out[6]+out[7];
// Drive each sampled value onto its differential output pair
V(da20) <- data_center + da*data_amp;
V(da21) <- data_center - da*data_amp;
V(db20) <- data_center + db*data_amp;
V(db21) <- data_center - db*data_amp;
V(dc20) <- data_center + dc*data_amp;
V(dc21) <- data_center - dc*data_amp;
V(dd20) <- data_center + dd*data_amp;
V(dd21) <- data_center - dd*data_amp;
V(f20) <- $transition(out_center + out_diff/2*sum, 50p, 20p, 20p);
V(s21) <- $transition(out_center - out_diff/2*sum, 50p, 20p, 20p);
V(region) <- reg;
}
}
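As an illustration only, the alternating-sign sum that drives the f20/s21 outputs can be modeled outside the simulator. The Python sketch below is not part of the original AHDL. The function names are hypothetical, and the default values simply mirror out_center and out_diff from the listing above.

```python
def fast_slow_sum(out):
    """Alternating-sign sum of the eight per-region transition flags.

    Mirrors: sum = -out[0]+out[1]-out[2]+out[3]-out[4]+out[5]-out[6]+out[7]
    """
    if len(out) != 8:
        raise ValueError("expected one flag per sampling region (8 total)")
    # Even-numbered regions count negatively, odd-numbered regions positively.
    return sum(flag if i % 2 else -flag for i, flag in enumerate(out))


def fast_slow_outputs(out, out_center=-1.0, out_diff=4e-3):
    """Return the (f20, s21) target voltages before $transition shaping."""
    s = fast_slow_sum(out)
    return (out_center + out_diff / 2 * s,
            out_center - out_diff / 2 * s)
```

With a single transition flagged in region 1, the two outputs separate by out_diff (4 mV with the defaults); with no transitions flagged, both sit at out_center.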
E.4. Histogram generator
// Spectre AHDL for SERDES3, histogram, ahdl
// This cell allows the plotting of a histogram of voltages.
// It samples the "vin" signal and places it in one of "bins" bins.
// The "sweep" output signal sweeps across all bins while the "plot"
// output shows the current value of that bin.
// To create the histogram simply set "sweep" as the x axis and
// "plot" as the y axis.
//
// Thomas Krawczyk 9/26/00
//
module histogram ( plot, sweep, vin, mean, rms) ( bins, low_v, high_v, begin )
node [V,I] plot;
node [V,I] sweep;
node [V,I] vin;
node [V,I] mean;
node [V,I] rms;
parameter real bins = 16 from (1:1025);
parameter real low_v = 0;
parameter real high_v = 1;
parameter real begin = 1n from (0:inf);
{
integer bin[1024];
integer index;
integer s=0;      // Current sweep index
integer count=0;  // Total samples
real range;       // Difference between low_v and high_v
real mu, sigma;   // Mean and standard deviation
real sum, sq_sum; // The sum and the sum of square samples
initial {
range = high_v-low_v;
}
analog {
if( $time() > begin ) {
count++;
sum += V(vin);
sq_sum += V(vin)*V(vin);
mu = sum/(1.0*count);
sigma = sqrt(( sq_sum - 2*mu*sum + count*mu*mu )/(1.0*count));
index = (V(vin)-low_v)/range * bins;
if( index >= 0 && index < bins ) bin[index]++;
s++; if (s == bins) s=0;
}
V(mean) <- mu;
V(rms) <- sigma;
V(sweep) <- low_v + s/(1.0*bins)*range;
V(plot) <- bin[s];
}
}
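The running statistics in the histogram cell rely on the expansion sum((v - mu)^2) = sq_sum - 2*mu*sum + count*mu^2, so the standard deviation can be updated without storing the samples. The Python sketch below models that bookkeeping for checking results offline. It is not part of the original AHDL, the class name is hypothetical, and it uses floor() for the bin index, which may differ from the AHDL real-to-integer conversion for samples just below low_v.

```python
import math


class RunningHistogram:
    """Offline model of the histogram cell's binning and running statistics."""

    def __init__(self, bins=16, low_v=0.0, high_v=1.0):
        self.bins = bins
        self.low_v = low_v
        self.range = high_v - low_v   # Difference between low_v and high_v
        self.bin = [0] * bins
        self.count = 0                # Total samples
        self.sum = 0.0                # Sum of samples
        self.sq_sum = 0.0             # Sum of squared samples

    def sample(self, v):
        # Statistics include every sample, even those that fall outside the bins.
        self.count += 1
        self.sum += v
        self.sq_sum += v * v
        index = math.floor((v - self.low_v) / self.range * self.bins)
        if 0 <= index < self.bins:
            self.bin[index] += 1

    def mean(self):
        return self.sum / self.count

    def sigma(self):
        mu = self.mean()
        var = (self.sq_sum - 2 * mu * self.sum + self.count * mu * mu) / self.count
        return math.sqrt(max(var, 0.0))  # guard against tiny negative round-off
```

As in the listing, a sample outside [low_v, high_v) still contributes to the mean and standard deviation but is not counted in any bin.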
E.5. Jittered data source
// Spectre AHDL for PeteExp, datasource, ahdl
#define PI 3.1415926535
#define getbitnum(t) floor(t*Bps)
module datasource ( d0, d1, sweep, Jout ) (Offset, Vmag, Bps, Sigma)
node [V,I] d0;
node [V,I] d1;
node [V,I] sweep;
node [V,I] Jout;
parameter real Offset=-1.50e-1;
parameter real Vmag=-1.50e-1;
parameter real Bps=2.0e10 from (0:inf);
parameter real Sigma=1.0e-11;
{
// Local Variables
integer bitnumber,newbitnumber,cbnum,cbval;
real jitter;
real c_0,c_1,c_2,d_1,d_2,d_3,T,X,p;
initial {
bitnumber=0;
newbitnumber=0;
jitter=0.0;
c_0 = 2.515517 ;
c_1 = 0.802853 ;
c_2 = 0.010328 ;
d_1 = 1.432788 ;
d_2 = 0.189269 ;
d_3 = 0.001308 ;
}
analog {
newbitnumber=getbitnum($time());
// time*bps, but want the fractions
V(sweep) <- ($time()*Bps - newbitnumber);
if (newbitnumber!=bitnumber)
{
bitnumber=newbitnumber;
// Create jitterval for this new bit
jitter=$random();
if (jitter<=0.5) p=jitter; else p=1.0-jitter;
T = sqrt( ln(1.0/(p*p)) );
X = T-(c_0 + c_1*T + c_2*(T*T))/(1 + d_1*T + d_2*(T*T) + d_3*(T*T*T));
if (jitter>0.5)
{
jitter=-1.0*X*Sigma;
}
else
{
jitter=X*Sigma;
}
$break_point((1.0+newbitnumber)/Bps+jitter);
}
V(Jout) <- jitter;
// Get the current bit number, which may differ once the jitter is applied
cbnum=floor(($time()+jitter)*Bps);
// Convert bit number to bit value
cbval=cbnum % 2;
V(d0) <- $slew(Offset-Vmag*(2*cbval-1),3.0e10,-3.0e10);
V(d1) <- $slew(Offset+Vmag*(2*cbval-1),3.0e10,-3.0e10);
}
}
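The jitter value in the data source is produced by mapping a uniform random number to an approximately Gaussian one with a rational approximation to the inverse normal tail. The constants c_0..c_2 and d_1..d_3 appear to match the approximation tabulated in Abramowitz and Stegun (formula 26.2.23). The Python sketch below reproduces that mapping for checking outside the simulator. It is not part of the original AHDL, the function name is hypothetical, and it assumes the uniform input lies strictly between 0 and 1.

```python
import math

# Coefficients copied from the datasource listing.
C0, C1, C2 = 2.515517, 0.802853, 0.010328
D1, D2, D3 = 1.432788, 0.189269, 0.001308


def jitter_from_uniform(u, sigma=1.0e-11):
    """Map a uniform sample u in (0, 1) to an approximately Gaussian jitter.

    Follows the listing: the tail probability p is folded to (0, 0.5],
    X is the approximate normal quantile for that tail, and the sign is
    negative when u > 0.5.
    """
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    p = u if u <= 0.5 else 1.0 - u
    t = math.sqrt(math.log(1.0 / (p * p)))
    x = t - (C0 + C1 * t + C2 * t * t) / (1.0 + D1 * t + D2 * t * t + D3 * t ** 3)
    return -x * sigma if u > 0.5 else x * sigma
```

For example, jitter_from_uniform(0.025, sigma=1.0) returns a value close to 1.96, the familiar two-sided 95% point of the standard normal distribution, and the default sigma matches the Sigma parameter of the module (10 ps).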
Appendix F. Toplevel Chip Schematics
F.1. Serdes I Transmitter
F.2. Serdes I Receiver
F.3. Serdes II Transceiver