by
A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of
Master of Applied Science in
Electrical and Computer Engineering
Waterloo, Ontario, Canada, 2003
© David J. Rennie, 2003
I hereby declare that I am the sole author of this thesis.
I authorize the University of Waterloo to lend this thesis to other institutions or individuals for the purpose of scholarly research.
David J. Rennie
I authorize the University of Waterloo to reproduce this thesis by photocopying or other means, in total or in part, at the request of other institutions or individuals for the purpose of scholarly research.
David J. Rennie
The University of Waterloo requires the signatures of all persons using or photocopying this thesis. Please sign below, and give address and date.
Acknowledgements
It’s hard to sum up on one page all the debts I owe to so many people.
I would like to thank my supervisor Dr. Sachdev for encouraging me to do graduate studies, and for making the experience enjoyable. His advice and support were always greatly appreciated.
I’d like to thank Stephen Docking, Ryan Burns and Arun Sharma for keeping my spirits high and my inbox full. Thanks to Stephen for dragging me into this, you were right. Thanks to Ryan for being a great flatmate, always ready with a sarcastic remark and a glass of scotch. Thanks to Arun for all his advice. We’ll figure it out one of these days.
For keeping me company in the VLSI lab I must thank Nelson Lam, Zhinian Shu and Tamer Fahim. Thanks to Christian McArthur, for coffee at the C&D and beers at the grad house.
Thanks to Ronny Chan, for coming down and hanging out with the Waterloo crew so many times. Thanks to all my Flux friends, I always look forward to our gatherings. To all my other friends, thank you for the good memories.
A special word of thanks must go to Mary. Her support and encouragement meant more to me than I can properly convey.
And of course, where would one be without family? Thanks to my Mother and Father, for always being supportive of my academic choices, even though it meant rarely seeing me. Thanks to my sister for, well, for being a great sister. And to my little nephew Timmy, go into arts, ok?
The volume of data transported over networks has increased dramatically over the past twenty years. Fibre optic cables are often used to transport data, as they have a much higher bandwidth and much less loss than traditional copper media. However, the interfaces between the fibre optic cables and electrical systems are a limiting factor with respect to the amount of data which can be transmitted. A great deal of research has gone into improving this situation.
Circuits which accept high speed data signals, then recover and re-time the data are known as clock and data recovery (CDR) circuits. Most leading edge CDR circuits use bipolar transistors implemented in technologies such as GaAs, SiGe and InP. However, there is a growing desire to implement these circuits in standard CMOS processes, which offer many benefits in terms of integration, power and cost. CDR circuits are not generally implemented in CMOS processes because MOS transistors offer lower performance than bipolar transistors. Source coupled logic (SCL) is one method which helps enable a MOS based solution.
SCL gates can be implemented in CMOS processes and have been shown to operate at much higher frequencies than other logic families. However, few designers have the experience needed to implement circuits in SCL.
In this thesis a design methodology is introduced which provides a designer with a blueprint for a successful design of SCL gates to be used in a leading edge CDR circuit. The methodology gives the designer an understanding of the key relationships and tradeoffs between the different parameters in SCL circuits. In order to demonstrate the methodology, several gates common to
CDR circuits are optimized using the presented methodology.
As the goal of this thesis is to help enable integration of leading edge circuits into standard CMOS processes, a 5Gbit/s CDR circuit is presented. The CDR circuit is implemented in a 0.18 µm standard CMOS process and uses SCL logic. The SCL logic was optimized using the methodology introduced in this thesis. Simulations showed the CDR circuit operates well at 5Gbit/s.
Contents

1 Introduction
  1.1 Motivation
  1.2 Why use CMOS?
  1.3 Why use SCL?
  1.4 Contributions
  1.5 Thesis Organization
2 Clock and Data Recovery Circuits
  2.1 Introduction to CDR Circuits
  2.2 CDR Architecture
    2.2.1 Phase Detector
    2.2.2 Charge Pump
    2.2.3 Low Pass Filter
    2.2.4 Voltage Controlled Oscillator
  2.3 Jitter
    2.3.1 Random Jitter
    2.3.2 Deterministic Jitter
  2.4 Jitter Measurement
    2.4.1 Root Mean Squared Jitter
    2.4.2 Peak-to-Peak Jitter
    2.4.3 Bit Error Rate
3 Source-Coupled Logic
  3.1 Logic Styles
  3.2 Multi-Gbit/s CDR Circuits
  3.3 Basics of SCL
  3.4 SCL Buffer/Inverter
  3.5 SCL Gates Used in CDR Circuits
    3.5.1 XOR
    3.5.2 Multiplexor
    3.5.3 D Flip-Flop
    3.5.4 Current Source
4 Design Methodology
  4.1 Introduction
  4.2 SCL Application Areas
  4.3 Key Parameters
  4.4 Proposed Design Methodology
  4.5 Discussion
5 Optimization of SCL Gates
  5.1 Parameters
  5.2 Buffer/Inverter
  5.3 XOR
    5.3.1 Symmetric vs non-symmetric XOR/MUX
    5.3.2 Optimization of a Symmetric XOR
  5.4 MUX
  5.5 DFF
6 Linear CDR Example
  6.1 Introduction
  6.2 Architecture
    6.2.1 Hogge Phase Detector
    6.2.2 Charge Pump and Low Pass Filter
    6.2.3 LC-Tank Oscillator
  6.3 Layout
    6.3.1 Transistor Folding
    6.3.2 Differential Pair Layout
    6.3.3 Final CDR Layout
  6.4 Results
    6.4.1 Data
    6.4.2 Discussion
7 Conclusions
A Output of data from Cadence
B Matlab Code
  B.1 Delay and Gain Plots
    B.1.1 delay.m
    B.1.2 gain.m
List of Tables

6.1 Low pass filter characteristics
6.2 CDR jitter measurements
List of Figures

2.1 The top-level architecture of a PLL
2.2 The top-level architecture of a CDR
2.3 The transitions of the clock and data signals
2.4 Schematic of a phase-frequency detector
2.5 Gain of a binary and linear phase detector
2.6 A conceptual implementation of a charge pump
2.7 a) simple current steering charge pump b) differential current steering charge pump
2.8 Schematic of a 2nd order low pass filter
2.9 Top level architecture of a ring oscillator
2.10 Schematic of an LC-tank
2.11 Conversion of series parasitic resistances to a single parallel resistance
2.12 Addition of $-g_m$ to compensate for parasitic losses
2.13 a) phase noise b) jitter
2.14 Effect of impulse injected during peak and transition [1]
2.15 An illustration of random jitter on the eye diagram
2.16 An illustration of data dependent jitter
2.17 An illustration of pulse width distortion
2.18 The difference between RMS and peak-to-peak jitter measurements
3.1 Schematic of a static CMOS NAND gate
3.2 Schematic of a dynamic NAND gate
3.3 Voltage swing of differential and static logic
3.4 Schematic of an SCL buffer
3.5 The relationship between $V_{id}(t)$ and $I_{ds}$
3.6 Schematic of an XOR gate
3.7 Operation of an SCL XOR gate
3.8 Schematic of an SCL MUX
3.9 Example of the operation of the SCL MUX
3.10 The two D latches which make up a D flip-flop
3.11 Schematic of an SCL D latch
3.12 Wide-swing cascode current mirror
4.1 Values of $V_{swing}$, $I_{tail}$ and $W$ which give unity gain
4.2 Delay of a saturated differential pair varying $I_{tail}$ and $W$
4.3 Delay of a saturated differential pair varying $W$ and the output load
4.4 Delay of a saturated differential pair
4.5 Gain of a saturated differential pair
4.6 Current switching of a differential pair biased in the linear region
4.7 Interference on the output of the latch
4.8 Relationship between latch output voltage, current switching and $W$
5.1 Gain of an SCL buffer with varying load
5.2 Waveform of the output of an SCL XOR gate
5.3 Schematic of a symmetric SCL XOR gate
5.4 Waveform of the output of a symmetric SCL XOR gate
6.1 Top level architecture of the Hogge phase detector
6.2 Logical operation of a Hogge Phase Detector
6.3 Architecture of the LC-tank oscillator
6.4 MOS differential pair
6.5 MOS differential pair layout using common centroid geometry
6.6 MOS differential pair layout using 1D common centroid geometry
6.7 Layout of an 18 µm differential pair
6.8 Layout of the CDR circuit
6.9 Waveforms of the input and re-timed data
6.10 Simulated phase detector gain
Advances in fibre optics and optical devices over the past twenty years have led to a drastic increase in the amount of data which can be transported. In fact, the technology involved in transmitting data over fibre optic cables is at the point where the electrical interfaces are the limiting factor in a serial I/O system. Often the signal from the fibre optic cable must be optically demultiplexed before it can be interfaced with an electrical system. The electrical circuit used to recover the data from a high-speed serial data signal is a clock and data recovery (CDR) circuit.
As electrical systems are much cheaper to manufacture than optical systems, a great deal of research has been put into increasing the capabilities of CDR circuits.
Most leading edge CDR circuits are not implemented with standard CMOS, but rather in less mainstream processes, such as SiGe, GaAs or InP. While these processes have attractive features, they are also much more expensive, and lack the ease of integration inherent to CMOS processes.
Source coupled logic (SCL) gates provide a means to implement leading edge CDR circuits in CMOS processes.
But why does it matter whether a CMOS process is used? While CMOS provides a desirable design environment for digital circuits, it is not optimal for high-speed analog or mixed mode circuits. The higher gain of bipolar transistors makes them much more suitable for the type of circuits needed in leading edge CDR circuits. However, fabrication in a standard CMOS process means that it is possible to integrate the data processing capability on the same die as the CDR. Fabrication in a CMOS process is also much cheaper than fabrication in other processes. This allows the designer to reduce both the system cost and complexity. These factors make standard CMOS the process of choice. However, the drive for lower costs and higher integration does not negate the need for performance.
In a CDR circuit there are several logical operations which must be performed. There are many different logic styles which can be used to implement these gates in a standard CMOS process, including static CMOS, dynamic logic, pass-transistor logic, and current-steering logic [2]. SCL is the most common form of current-steering logic which uses MOS transistors [3]. Other MOS based current-steering logic styles include CMOS current-steering logic (CSL) [4], folded source-coupled logic (FSCL) [3] and enhanced folded source-coupled logic (EFSCL) [5]. SCL is the MOS equivalent to current-mode logic (CML) in bipolar processes. In fact, SCL is also referred to as MOS CML [6].
The primary benefit of using SCL over other types of logic is speed. Circuits implemented with SCL logic can operate at higher frequencies than is possible using strictly static CMOS or even dynamic logic. SCL circuits are also ideal for integration into low-noise environments, due to their differential nature and their lack of switching noise. While SCL circuits offer many benefits, they also have some downsides. As SCL gates are based around current steering, they suffer from high power dissipation due to the constant current draw. Another difficulty faced when designing SCL is a general lack of experience in mixed-signal circuits. While SCL circuits implement digital functions, very few digital designers understand how to properly design them. And while an analog designer would better understand the fundamental concepts of gain and delay, they too face a lack of experience in designing circuits meant to implement logical functions. In light of these problems, the research in this thesis seeks to provide a methodology to aid designers in optimizing SCL gates for use in leading edge clock and data recovery circuits. While this methodology can be used to design SCL gates for other application areas, it is aimed at CDR circuits.
The main contributions of this thesis are:

1. Methodology
This thesis introduces a design methodology, the purpose of which is to optimize SCL gates for use in leading edge CDR circuits. The methodology helps designers understand the relationships and tradeoffs between the different circuit parameters, which is the key to a successful design.

2. SCL Gate Optimization
In order to illustrate how the methodology works, four logic gates common to CDR circuits are implemented in SCL. The methodology was used to optimize these gates so they could be used in a CDR circuit with a data rate of 5Gbit/s.

3. 5Gbit/s CDR Circuit
A 5Gbit/s CDR is designed and implemented using SCL logic gates. The SCL gates used were optimized using the design methodology. The CDR circuit was designed and laid out, and back-annotated simulations were performed to test the operation.
The thesis is organized as follows. Chapter 2 provides background information on CDR circuits, examining them from both a system and component level. Methods used to measure CDR performance are also discussed. Chapter 3 provides background information on different logic styles, focussing on SCL. The different gates used in CDR circuits and their SCL architectures are introduced here. Chapter 4 introduces the design methodology. Chapter 5 uses the design methodology in order to optimize several gates common in CDR circuits. Chapter 6 uses these optimized gates in a CDR circuit based around a linear Hogge phase detector. The CDR circuit is taken through a complete design flow, from design to schematic simulation to layout and to back-annotated simulations. Chapter 7 provides a summary and concludes the thesis.
Clock and data recovery circuits are a critical link between the transfer and processing of information. While the ability of computers to process information has increased drastically, along with the ability to transfer vast quantities of information, the interface between the two worlds is the focus of much research.
The purpose of a clock and data recovery circuit is to take in a serial stream of data, recover the clock, and use that recovered clock signal to re-time the data. CDR circuits have been around for a long time, and are used in any situation where data entering an IC must be synchronized to an internal clock. The application area examined in this thesis is CDR circuits which operate at multi-Gbit/s data rates. Although there has been much research on multi-Gbit/s CDR circuits [7, 8, 9, 10], there has been little published on circuit techniques to ensure a robust design.
The fundamental architecture of a CDR is very similar to that of a phase-locked loop (PLL) [11]. The primary differences between the two are in the phase detector. The architecture of a PLL is shown in Figure 2.1.
Figure 2.1: The top-level architecture of a PLL
The function of a PLL is to synchronize the VCO with an external reference clock. The output of the VCO is fed back into the phase detector, along with the reference clock signal. The phase detector determines the phase offset between the two signals. The outputs of the phase detector are UP and DOWN signals which control the charge pump in such a way as to reduce the phase offset. The UP and DOWN signals control the current flow from the charge pump into the low pass filter (LPF). The voltage on the LPF is a control voltage which determines the VCO frequency. The frequency divider divides the VCO output such that the frequency of the signal at the output of the frequency divider is equal to that of the reference clock.
The architecture of a CDR is very similar to the PLL and is shown in Figure 2.2 [12]. The VCO output is similarly fed back into the phase detector. However, instead of the second input to the phase detector being a reference clock, it is a serial data stream. The phase detector, charge pump and LPF perform the same functions in the CDR as they do in the PLL; however, their architectures may well be different.
Figure 2.2: The top-level architecture of a CDR

In a CDR the phase detector performs the same function as in the PLL, namely determining the difference in phase between the two inputs. In a PLL, the phase detector sees a transition of the reference clock for every transition of the recovered clock. However, in a CDR the transitions of the data signal into the phase detector are random. This is illustrated in Figure 2.3.
Figure 2.3: The transitions of the clock and data signals
The fact that there is not a data transition for every transition of the clock leads to a phase detector design which is different from a phase detector for a PLL. The reason is that phase detectors used in PLLs trigger on every falling transition of each input. The most common form of phase detector in a PLL is called a phase-frequency detector (PFD). The schematic for a PFD is shown in Figure 2.4. In the PFD, if the output of the VCO lags the reference clock the UP signal will go high until the transition of the reference clock, which resets the flip flops. The UP signal causes the control voltage into the VCO to change, thereby correcting the phase error.
Figure 2.4: Schematic of a phase-frequency detector

If this type of phase detector were used in a CDR the system would not be able to properly recover the data. This is because the PFD triggers on every period of a clock input, but there would not necessarily be a corresponding data transition. The phase detector would interpret this simply as a very large phase error, which it would try to correct by means of a very large UP pulse. All of this would be done in order to correct a phase difference which does not exist. For this reason, the phase detector used in a CDR circuit must only become operational when there is a data transition.
There are two primary classes of phase detectors used in CDR circuits, linear phase detectors and binary phase detectors [13, 14, 15]. The difference between these phase detectors is primarily in how they deal with phase errors. This property is referred to as phase detector gain. A linear phase detector applies an error signal proportional to the size of the phase error; therefore, in a lock condition the output of the phase detector will be zero. A binary phase detector, however, applies the same correction whether the phase error is large or small. This leads to the binary phase detector constantly moving back and forth across the zero phase error point. The relationship between phase error and error signal magnitude for linear and binary phase detectors is shown in Figure 2.5.
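The difference between the two characteristics can be illustrated with a short behavioural sketch. It is not taken from the thesis: the function names, the unit gain and the simple first-order correction loop are assumptions chosen only to show how a linear detector settles at zero phase error while a binary detector keeps dithering around it.

```python
def linear_pd(phase_error, gain=1.0):
    """Linear detector: output proportional to the phase error."""
    return gain * phase_error


def binary_pd(phase_error, magnitude=1.0):
    """Binary (bang-bang) detector: fixed-size output, only the sign matters."""
    if phase_error > 0:
        return magnitude
    if phase_error < 0:
        return -magnitude
    return 0.0


def run_loop(detector, phase_error, step, n_steps):
    """Hypothetical first-order loop: subtract a scaled detector output each step."""
    history = []
    for _ in range(n_steps):
        phase_error -= step * detector(phase_error)
        history.append(phase_error)
    return history


# Same starting phase error for both detectors (arbitrary units).
linear_history = run_loop(linear_pd, phase_error=0.35, step=0.5, n_steps=20)
binary_history = run_loop(binary_pd, phase_error=0.35, step=0.1, n_steps=10)
```

In this toy loop the linear detector's error shrinks geometrically towards zero, whereas the binary detector ends up alternating in sign from one step to the next, which mirrors the back-and-forth behaviour described above.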
In a CDR circuit, the majority of the logical operations are performed in the phase detector.
Logic gates are used to perform the operations which determine the phase offset between the clock and data signals. The output of this logic controls the operation of the charge pump.
Figure 2.5: Gain of a binary and linear phase detector
A charge pump is a circuit which changes the voltage on the low pass filter by adding or removing charge. The charge pump is controlled via UP and DOWN signals generated in the phase detector.
A conceptual diagram of a charge pump is shown in Figure 2.6. If the UP signal is on, the charge pump will add charge to the LPF, causing the control voltage into the VCO to increase. If the DOWN signal is on, the charge pump will remove charge from the LPF, causing the control voltage into the VCO to decrease.
The circuit shown in Figure 2.6 is a very simple representation of a charge pump. Most multi-Gbit/s CDR circuits use a current steering charge pump, as shown in Figure 2.7. Current steering charge pumps operate at much higher frequencies and with greater accuracy than charge pumps based on static logic. As SCL gates are differential by nature, most charge pumps are also differential, which helps to reduce the effects of common-mode noise. Figure 2.7a shows a non-differential current steering charge pump, whereas Figure 2.7b shows a differential current steering charge pump.
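As a rough illustration of how UP and DOWN pulses move the control voltage, the sketch below treats the charge pump as an ideal current source or sink driving a single capacitor, so that the voltage step is the pump current times the pulse width divided by the capacitance. The single-capacitor filter, the function name and all component values are assumptions made for the example and are not taken from the design in this thesis.

```python
def control_voltage_step(i_pump, pulse_width, c_filter, up=False, down=False):
    """Ideal charge pump: change in control voltage for one pulse, dV = I * t / C."""
    if up and down:
        # Equal source and sink currents cancel in this idealized model.
        return 0.0
    delta_v = i_pump * pulse_width / c_filter
    if up:
        return delta_v      # charge added, control voltage rises
    if down:
        return -delta_v     # charge removed, control voltage falls
    return 0.0


# Hypothetical values: 100 uA pump current, 50 ps pulse, 10 pF capacitor.
dv_up = control_voltage_step(100e-6, 50e-12, 10e-12, up=True)
dv_down = control_voltage_step(100e-6, 50e-12, 10e-12, down=True)
```

With these assumed values each pulse moves the control voltage by 0.5 mV, upwards for UP and downwards for DOWN.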
Figure 2.6: A conceptual implementation of a charge pump
Figure 2.7: a) simple current steering charge pump b) differential current steering charge pump
The low pass filter (LPF) is a crucial part of a CDR. Changing the LPF allows a designer to alter the location of the poles and zeros in the transfer function of the CDR. This is important in controlling jitter, as will be described in the next section.
The most commonly used filter in a CDR is the second order low pass filter, shown in Figure 2.8.
Figure 2.8: Schematic of a 2nd order low pass filter
The LPF attenuates high-frequency noise on the VCO control line but allows the current flow from the charge pump. This current charges or discharges the capacitors in the LPF. This, in turn, changes their stored voltage. The voltage on the LPF is used to control the VCO. For this reason it is crucial that there be as little noise as possible on the LPF, as it directly translates to changes in frequency at the output of the VCO, which leads to jitter.
A voltage controlled oscillator (VCO) is a circuit that outputs a signal whose oscillation frequency is set by a control voltage [16]. The VCO is possibly the analog circuit which has been researched more than any other. There are countless configurations of VCOs and it is a circuit with a multitude of applications, including transmitters, receivers, PLLs, CDR circuits and clock generators. In both PLLs and CDR circuits the VCO plays an integral part, as it creates a clock signal which is phase aligned with an input signal. In the case of the PLL that input signal is a reference clock, whereas in a CDR it is a serial data stream. Much research has gone into VCOs, for example into designing for low power, large tuning range and good phase noise.
In a CDR circuit, the tuning range of the VCO indicates what data rates can be locked onto.
While there are designs which are able to lock to a wide range of frequencies [17], these are not the applications which are generally associated with SCL logic, so these will not be considered.
The applications which use SCL logic are generally very high performance circuits, aiming at a very particular data rate, hence the tuning range of the VCO is designed to be only enough to cover the parameter variations. VCOs with high tuning range have very high gain which amplifies noise on the VCO control line, leading to poor phase noise performance. The phase noise of a
VCO is of critical importance, as it is the frequency domain analog of jitter. Jitter is discussed in the next section. There are two basic architectures of VCOs which are commonly used in CDR circuits, the ring oscillator and the LC oscillator.
The ring oscillator achieves oscillation via a series of delay stages where the output is connected to the input so that the circuit is unstable and oscillates [18]. In order to oscillate, the combined phase shift of all the delay elements must equal $2\pi$. The architecture of a ring oscillator with four delay stages is shown in Figure 2.9. In this architecture each delay element must supply a phase shift of $\pi/4$, with the inversion in the feedback path providing the remaining $\pi$ phase shift. By varying the control voltage into each delay stage, the amount of delay can be controlled, which in turn controls the frequency at which the VCO oscillates. The frequency of oscillation of a ring oscillator is $f_{osc} = \left( 2N \cdot T_{delay} \right)^{-1}$, where $N$ is the number of delay elements and $T_{delay}$ is the delay through each delay element.
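The oscillation frequency expression and the per-stage phase shift can be checked numerically with a few lines of code. This is only a sketch: the function names and the 25 ps stage delay are illustrative assumptions, not values from this thesis.

```python
import math


def ring_oscillator_frequency(n_stages, t_delay):
    """Oscillation frequency f_osc = 1 / (2 * N * T_delay), in Hz."""
    return 1.0 / (2.0 * n_stages * t_delay)


def phase_shift_per_stage(n_stages):
    """Phase each stage supplies when the feedback inversion provides pi of the 2*pi total."""
    return math.pi / n_stages


# Hypothetical example: four stages, each with an assumed delay of 25 ps.
f_osc = ring_oscillator_frequency(n_stages=4, t_delay=25e-12)
stage_phase = phase_shift_per_stage(4)
```

For these assumed numbers the expression gives an oscillation frequency of about 5 GHz, and each of the four stages contributes a phase shift of $\pi/4$.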
The LC oscillator is based on the LC-tank. An LC-tank is composed of an inductor and a capacitor, as shown in Figure 2.10. Assuming the inductor and the capacitor are lossless devices, once energy is introduced to the system, it cycles without loss at a particular frequency.
Figure 2.9: Top level architecture of a ring oscillator
Figure 2.10: Schematic of an LC-tank
The transfer function of the LC-tank is

\begin{equation}
H(\omega) = \frac{LC}{1 - \omega^2 \cdot LC} \tag{2.1}
\end{equation}

An examination of the denominator of the transfer function shows that if the frequency of oscillation is $\omega = \frac{1}{\sqrt{LC}}$, the transfer function goes to infinity. This means that for an ideal LC-tank, there is an output at that frequency, even if there is no input. However, ideal inductors and capacitors do not exist, and this is especially true in a monolithic system. Both passive devices contain parasitic resistances which damp oscillations in the tank.
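A short numerical check makes the resonance condition concrete. The sketch below computes the resonant frequency of an ideal tank and confirms that the denominator of Equation 2.1 vanishes there. The 1 nH and 1 pF values and the function names are assumptions picked for illustration only.

```python
import math


def resonant_frequency_hz(inductance, capacitance):
    """Resonant frequency of an ideal LC-tank, f0 = 1 / (2 * pi * sqrt(L * C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * capacitance))


def transfer_function_denominator(omega, inductance, capacitance):
    """Denominator of Equation 2.1: 1 - omega^2 * L * C."""
    return 1.0 - omega ** 2 * inductance * capacitance


# Hypothetical tank values: 1 nH inductor, 1 pF capacitor.
L_TANK = 1e-9
C_TANK = 1e-12

f0 = resonant_frequency_hz(L_TANK, C_TANK)
omega0 = 2.0 * math.pi * f0
denominator_at_resonance = transfer_function_denominator(omega0, L_TANK, C_TANK)
```

With these assumed values the tank resonates at roughly 5.03 GHz, and the denominator evaluated at that angular frequency is zero to within floating-point rounding.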
The primary source for these parasitic resistances is in the inductor and varactors, and these are generally modeled as series resistances. These series resistances can be converted to parallel resistances using the relationship $R_p = Q^2 \cdot R_s$, where $Q$ is the quality factor of the device. This relationship is illustrated in Figure 2.11.
Figure 2.11: Conversion of series parasitic resistances to a single parallel resistance
Assuming that the parasitic resistances in the inductor and capacitor dominate the overall value of $R_p$, the final value for the parallel parasitic resistance is given as
$$R_p = \left( \frac{1}{Q_L^2 \cdot R_{sL}} + \frac{1}{Q_C^2 \cdot R_{sC}} \right)^{-1} \qquad (2.2)$$
In order to compensate for these parasitic resistances a negative transconductance is added to the tank, as shown in Figure 2.12.
Figure 2.12: Addition of − g m to compensate for parasitic losses
In order for the LC-tank to sustain oscillations, the transconductance which is added to the system must be large enough to cancel out the parasitic resistances. As such, when designing an LC-tank, the following condition must be met.
$$g_m > \frac{1}{R_p} \qquad (2.3)$$
R p in Equation 2.3 is a parallel representation of the parasitic resistances in the LC-tank. In the monolithic implementation of an LC-tank oscillator this transconductance is usually provided using a cross-coupled differential pair.
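The series-to-parallel conversion and the start-up condition can be combined into a short calculation. The sketch below is illustrative only: the function names and the quality factors and series resistances in the example are assumed values, not measurements from this work.

```python
def series_to_parallel(q: float, r_series: float) -> float:
    """Convert a series parasitic resistance to its parallel equivalent, R_p = Q^2 * R_s."""
    return q * q * r_series


def tank_parallel_resistance(q_l: float, r_sl: float, q_c: float, r_sc: float) -> float:
    """Combine the inductor and capacitor parallel resistances as in Equation 2.2."""
    r_pl = series_to_parallel(q_l, r_sl)
    r_pc = series_to_parallel(q_c, r_sc)
    return 1.0 / (1.0 / r_pl + 1.0 / r_pc)


def sustains_oscillation(g_m: float, r_p: float) -> bool:
    """Check the condition of Equation 2.3, g_m > 1 / R_p."""
    return g_m > 1.0 / r_p


# Hypothetical tank: inductor Q = 5 with 4 ohm series loss,
# varactor Q = 20 with 1 ohm series loss.
r_p = tank_parallel_resistance(5.0, 4.0, 20.0, 1.0)
print(f"R_p = {r_p:.1f} ohm, minimum g_m = {1e3 / r_p:.2f} mS")
```

In this example the low-Q inductor dominates the loss, which is why the inductor quality factor is usually the first thing a designer tries to improve.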
Jitter is the most important characteristic of a CDR. Jitter is a time domain measurement and is defined as the difference in time between when an event should have occurred, and when it actually did occur [19]. In the case of a CDR the events of interest are the zero-crossing points of our clock and data signals. Instantaneous jitter can be calculated via the equation
$$j[n] = t_E[n]_{ideal} - t_E[n]_{actual}, \qquad (2.4)$$
where $j[n]$ is the jitter at transition $n$, and $t_E[n]$ is the time of the $n$th event.
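The instantaneous jitter definition maps directly onto a per-event subtraction. The following Python sketch assumes the ideal and actual zero-crossing times are already available as two equal-length lists; the picosecond values are invented for illustration.

```python
def instantaneous_jitter(ideal_times, actual_times):
    """Return j[n] = t_ideal[n] - t_actual[n] for each event n."""
    if len(ideal_times) != len(actual_times):
        raise ValueError("both sequences must contain the same number of events")
    return [t_ideal - t_actual for t_ideal, t_actual in zip(ideal_times, actual_times)]


# Hypothetical zero-crossing times in picoseconds for a 200 ps bit period.
ideal = [0, 200, 400, 600]
actual = [2, 197, 401, 600]
print(instantaneous_jitter(ideal, actual))
```

With this sign convention a late transition gives negative jitter and an early transition gives positive jitter.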
Jitter is a time domain characterization, and is analogous to phase noise in the frequency domain [20]. This relationship is illustrated in Figure 2.13. Figure 2.13a shows the frequency domain representation of phase noise. While the dominant power spike is at the centre frequency, the sidebands indicate that there are frequency components at frequencies other than the centre frequency. Figure 2.13b illustrates the analogous time domain representation where there is a non-ideal zero-crossing time, which is jitter.
Figure 2.13: a) phase noise b) jitter
The purpose of a CDR is twofold. First the CDR recovers the clock. This means that the output of the VCO is phase aligned with the data in the serial data stream. Secondly the output of the VCO is used to re-time the data in an incoming serial data stream. Ideally the VCO output is perfectly aligned with the data stream and we re-time the data with no errors. However, due to noise in the channel, phase noise in the VCO, and other factors, errors occur in the recovered data stream. Jitter provides a measure of how accurately the VCO output’s zero crossings are aligned with zero crossing points of the incoming serial data stream. This, in turn, can be correlated to the number of errors in the recovered data stream. There are two basic categorizations of jitter, random and deterministic, each of which is considered separately.
Random jitter is jitter which has no determinable pattern. Random jitter has a Gaussian probability density function, which implies that there is theoretically no maximum limit to its magnitude.
The primary cause of random jitter is noise from the circuit, namely thermal noise and flicker noise. The circuit designer has no control over these noise sources; they are always present.
The interaction of random noise with circuit elements produces timing errors. In SCL circuits, the point of reference for a signal changing its logical value is the crossing point of the differential signal. When there is no transition, random noise has little effect on the logical value. However,
when there is noise during a transition, the zero crossing point is affected. This directly translates to jitter in a CDR system. This relationship is illustrated in Figure 2.14.
Figure 2.14: Effect of impulse injected during peak and transition [1]
In a CDR circuit, the ultimate effect of random jitter is to shrink the output data signal’s eye.
The eye diagram of a data signal indicates the area where the signal can be accurately sampled.
Random jitter acts to shrink the eye by moving the location of the zero-crossing points away from the ideal. An illustration of the effect of random jitter on the eye diagram is shown in Figure
2.15 [21].
Figure 2.15: An illustration of the effect of random jitter on the eye diagram
So where does random jitter come from? In a CDR circuit there are both intentional and parasitic resistances, all of which generate noise as a consequence of Brownian motion. The RMS noise voltage associated with a resistor is $v_{noise} = \sqrt{4kTR\Delta f}$ [16]. Another source of random noise in CMOS circuits is flicker noise. The actual causes of flicker noise remain something of a mystery, but it is believed that flicker noise in MOSFETs is caused by charge trapping [16]. Flicker noise in MOSFETs is a much greater problem than in bipolar devices, as MOSFETs are fabricated as surface devices.
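To give a sense of scale for resistor thermal noise, the sketch below evaluates the standard RMS expression $\sqrt{4kTR\Delta f}$. The 1 kΩ resistance, 300 K temperature and 1 GHz bandwidth are assumed example values, not figures from this design.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant, J/K


def thermal_noise_vrms(resistance_ohm: float, bandwidth_hz: float,
                       temperature_k: float = 300.0) -> float:
    """Return the RMS thermal noise voltage sqrt(4 * k * T * R * delta_f) in volts."""
    return math.sqrt(4.0 * BOLTZMANN_J_PER_K * temperature_k
                     * resistance_ohm * bandwidth_hz)


# Hypothetical case: 1 kohm resistor observed over a 1 GHz bandwidth at 300 K.
v_n = thermal_noise_vrms(1e3, 1e9)
print(f"v_noise = {v_n * 1e6:.1f} uV RMS")
```

The result is on the order of a hundred microvolts, which is small against a full logic swing but not negligible against the slope of a signal near its zero crossing.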
Care must be taken when measuring random jitter. As it is random and has no specific value, it is measured as a root-mean-square (RMS) value. An RMS measurement must be used, as random jitter has a theoretically infinite maximum value. A peak-to-peak measurement would not yield a proper measurement.
Deterministic jitter refers to jitter which has a non-Gaussian probability density function. Deterministic jitter has identifiable causes and is limited in amplitude, unlike random jitter. It is caused by crosstalk, electromagnetic interference, and simultaneous switching outputs. Deterministic jitter is measured using a peak-to-peak jitter measurement. This is acceptable, as deterministic jitter is inherently bounded in magnitude. There are three primary types of deterministic jitter: Pattern Dependent Jitter, Pulse Width Distortion and Bounded Uncorrelated Jitter.
1. Pattern Dependent Jitter
Pattern dependent jitter, which is also called data dependent jitter and inter-symbol interference, is jitter caused by the shape of the input signal being dependent on the input data stream. This leads to a different system response for bit sequences with consecutive transitions, as opposed to bit sequences with long runs of ‘1’ or ‘0’. The problem is at its worst when the data sequence constantly transitions, i.e., a data sequence of ‘10101010...’.
The problem of data dependent jitter is illustrated in Figure 2.16. While ideally the signal can be buffered such that this is not a problem, at very high bit rates it is not possible to get a perfect incoming data signal.
Figure 2.16: An illustration of data dependent jitter
2. Pulse Width Distortion
Pulse width distortion (PWD), which is also called duty cycle distortion, is caused by a difference in the rise and fall times of a signal. This difference leads to a difference in the width of a pulse representing logic ‘1’ and the width of a pulse representing logic ‘0’. Figure
2.17 illustrates the problem of pulse width distortion.
Figure 2.17: An illustration of pulse width distortion
Using differential logic (for example, SCL), the problem of pulse width distortion is eliminated. This is because the two differential signals are referenced with respect to each other.
Of primary concern in differential circuits are the zero crossing points of the signals, not their shape.
3. Bounded Uncorrelated Jitter
Bounded uncorrelated jitter is jitter which is bounded in amplitude, yet uncorrelated to the data pattern. It is primarily sinusoidal in nature, and is caused by interference from signal sources other than the data signal, either internal or external to the system. These signals can cause interference by way of capacitive or inductive coupling, or electromagnetic interference.
Jitter is the primary criterion used to measure the performance of a CDR circuit. Three measurements that are used to determine the amount of jitter in a signal are root mean squared (RMS) jitter, peak-to-peak jitter and bit error rate (BER). While RMS jitter and peak-to-peak jitter are distinct measurements of jitter, BER provides a relationship between the two, and relates that to an identifiable system performance.
The root mean squared (RMS) jitter calculation is used as a measure of the average amount of jitter in a signal. The following equations are used to compute RMS jitter.
$$\mu_j = \frac{1}{N} \sum_{n=1}^{N} j[n] \qquad (2.5)$$
$$\sigma_j = \sqrt{\frac{1}{N-1} \sum_{n=1}^{N} \left( j[n] - \mu_j \right)^2} \qquad (2.6)$$
In Equations 2.5 and 2.6, $\mu_j$ is the mean value of the jitter, $N$ is the number of jitter samples used in the calculations and $\sigma_j$ is the RMS average of the jitter.
While RMS jitter gives a measure of the average amount of jitter in a system, the peak-to-peak jitter measurement gives the worst case amount of jitter seen by the system over a given sample set. The equation for peak-to-peak jitter is
$$Jitter_{p-p} = \max(j[n]) - \min(j[n]) \qquad (2.7)$$
The size of the sample set is important, as it must be large enough to give an accurate value for the peak-to-peak jitter.
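The RMS and peak-to-peak jitter measurements can both be computed from the same list of jitter samples. The following Python sketch does so with the sample (N − 1) form of the standard deviation; the function name and the four-sample data set are hypothetical, and a real measurement would use far more samples.

```python
import math


def jitter_statistics(jitter):
    """Return (mean, rms, peak_to_peak) for a sequence of jitter samples j[n]."""
    n = len(jitter)
    if n < 2:
        raise ValueError("at least two jitter samples are required")
    mean = sum(jitter) / n
    # Sample variance with an N - 1 denominator, then the square root for RMS jitter.
    variance = sum((j - mean) ** 2 for j in jitter) / (n - 1)
    peak_to_peak = max(jitter) - min(jitter)
    return mean, math.sqrt(variance), peak_to_peak


# Hypothetical jitter samples in picoseconds.
mean, rms, pk_pk = jitter_statistics([-2, 3, -1, 0])
print(f"mean = {mean:.2f} ps, RMS = {rms:.2f} ps, peak-to-peak = {pk_pk} ps")
```

The peak-to-peak value depends only on the two most extreme samples, which is why it keeps growing with the sample set when random jitter is present, while the RMS value settles.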
Using an eye diagram it is possible to identify the different effects of random and deterministic jitter, and by implication, the difference in measuring RMS and peak-to-peak jitter. This difference is illustrated in Figure 2.18 [21].
Figure 2.18: The difference between RMS and peak-to-peak jitter measurements
The bit error rate (BER) is the probability of incorrectly identifying a data bit. In a CDR circuit it is a very important measurement used to determine whether the system meets the intended specifications. For example, if a CDR circuit has a requirement of BER < $10^{-9}$, this means that the circuit must have fewer than one error for every billion bits transmitted. While the value of the BER is fundamentally linked to the amount of jitter present in the system, it is difficult to get an accurate analytic relationship between the two.
In [22] an equation is presented to attempt to relate jitter and BER. In Equation 2.8 the relationship between RMS and peak-to-peak jitter is given as $\alpha$. In Equation 2.9, $\alpha$ is then related to the bit error rate using the complementary error function.
$$Jitter_{p-p} = \alpha \times Jitter_{RMS} \qquad (2.8)$$
$$BER = \frac{1}{2}\,\mathrm{erfc}\left( \sqrt{2} \times \alpha \right) \qquad (2.9)$$
It should be noted that several different relationships were found in different papers. Generally speaking, BER is not calculated, but rather measured. There are dedicated BER analyzers which are generally used to measure the BER of a particular circuit.
In order to understand the benefits of source-coupled logic it is valuable to take an overview of the logic styles available to a designer. There are many different ways to implement a logical function in a standard CMOS process. Each of these has different advantages and disadvantages.
The purpose of this work is not to suggest that all circuits be implemented using SCL. Each logic family has its own application area where its use is appropriate.
1. Static CMOS
Static logic is the most well known logic style for standard CMOS. In a static logic circuit, the output is connected to either the power or ground rail via a low resistance path. The only exception to this is while the output is switching. While the gate changes state, there is momentarily a path between the power and ground rails, which leads to a current spike. This current spike is the cause of switching noise in static CMOS circuits. A static CMOS NAND gate is shown in Figure 3.1. Note that this gate is fundamentally asynchronous: the output changes state as soon as the input changes.
Source-Coupled Logic 25
Figure 3.1: Schematic of a static CMOS NAND gate
2. Dynamic Logic
Dynamic logic is based on two different operations: precharge and evaluate [2]. A dynamic NAND gate is shown in Figure 3.2. When the clock is low, the output node is precharged to $V_{DD}$ by way of the PMOS transistor. When the clock transitions high, the lower NMOS transistor provides a path to ground, but only if the two inputs are true. The NMOS pull-down network defines the logic function. As can be seen, the dynamic NAND gate is different from its static CMOS counterpart in that the output of the dynamic NAND gate cannot change until the clock signal is logically high. Because of this, dynamic gates are fundamentally synchronous. The performance of dynamic logic is much higher than that of static logic. The primary reason for this is the small input load of dynamic gates. However, this performance comes at a price, namely higher power and reduced noise margins.
3. Current Steering Logic
Current steering logic is rarely implemented with MOS transistors. The most common implementation of current steering logic is in emitter coupled logic (ECL) circuits, which are implemented with bipolar transistors [23]. ECL circuits are very fast, due to both the switching properties of a differential pair and the high gain of bipolar transistors. The MOS analogue of ECL is source-coupled logic.
Figure 3.2: Schematic of a dynamic NAND gate
Current steering logic has benefits beyond speed. As it is differential, common-mode noise is almost eliminated. This means that if there is noise which couples to the differential lines, it will affect both lines equally. As the logical value of a differential signal is determined by one differential line taken with respect to the other, if there is equal noise on both lines it will cancel. Also, current steering logic lacks the switching noise characteristic of most other logic styles. This leads to cleaner power and ground rails, and less noise injected into the substrate. These factors make current steering logic an attractive choice in terms of integration with sensitive analog components such as voltage controlled oscillators.
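The cancellation of common-mode noise described for differential signalling can be shown with a trivial numerical example. This is an idealized sketch: it assumes the noise couples identically onto both lines, and the voltage levels and helper names are hypothetical.

```python
def differential_value(v_plus: float, v_minus: float) -> float:
    """Return the differential voltage, one line referenced to the other."""
    return v_plus - v_minus


def add_common_mode_noise(v_plus: float, v_minus: float, noise: float):
    """Add the same noise voltage to both lines of a differential signal."""
    return v_plus + noise, v_minus + noise


# Hypothetical levels: 1.8 V on the true line, 1.4 V on the complement,
# with 50 mV of noise coupled equally onto both lines.
clean = differential_value(1.8, 1.4)
noisy = differential_value(*add_common_mode_noise(1.8, 1.4, 0.05))
print(abs(noisy - clean) < 1e-12)  # the common-mode term cancels up to rounding
```

In practice the coupling is never perfectly matched, so the cancellation is large but not complete, which is consistent with noise being "almost" eliminated.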
Multi-Gbit/s CDR circuits pose different challenges as compared to their lower frequency counterparts. At a certain level of performance traditional CMOS circuit techniques are no longer able to offer the necessary performance. The primary problem is that transistors implemented in modern standard CMOS processes (e.g., 0.25 µm, 0.18 µm, and 0.13 µm) are not able to operate with sufficient performance at very high frequencies (i.e., > 5 GHz). CDR circuits implemented using static or dynamic logic cannot operate at the multi-GHz frequencies required.
In order to allow these circuits to operate in a standard CMOS process, another type of logic is required. As will be seen, circuits implemented using source-coupled logic are able to operate at higher frequencies than their static counterparts. This allows multi-Gbit/s CDR circuits to be integrated into larger systems, reducing overall system costs.
Even in situations where a CDR could be implemented using logic styles other than SCL, it may still be advantageous to use SCL. SCL circuits are much more immune to noise and add much less noise to the system. Because of this, there may be situations where the CDR could operate when implemented using static CMOS or dynamic logic, but could only achieve the desired performance if implemented using SCL.
Logic gates implemented using SCL are based around steering current in such a way as to implement a particular logic function. A current mirror is used to provide a constant supply of current.
The inputs to an SCL gate use differential pairs to steer the current, thus performing the logical operation. Each differential pair acts as a single comparison, steering the current one way if the input is ‘1’ and another if the input is ‘0’. The buffer and the inverter are the only two logic gates which have one set of differential inputs. All other gates have multiple inputs, and they require multiple comparisons in order to properly determine the appropriate output. For example, in a two input AND gate, we must determine if input A is high or low, and we must determine if input B is high or low. In an SCL implementation this is done by stacking the differential pairs.
Each input into the gate adds another level of differential pairs. This presents a limitation as voltage headroom becomes a problem. As the supply voltage is steadily decreasing with scaling, this problem is getting worse. Fortunately in a CDR system, there is very little need for logic gates with greater than two inputs.
SCL is fundamentally differential. In a differential circuit, the logical state of a signal is determined by taking one differential line referenced to its complementary differential line. This means that both the logical true and false outputs are available. With static logic, the output of the gate must be passed through an inverter to get the logically false output. Differential logic is also different from static logic in that differential signals do not use full swing. In static logic, when the logic is true the output is $V_{dd}$, and when it is false the output is zero. In an SCL gate, when the logic is true the output is $V_{dd}$, however when it is false the output is $V_{dd} - V_{swing}$. The difference between static and differential logic swings is illustrated in Figure 3.3.
Figure 3.3: Voltage swing of differential and static logic
The most basic gate is the buffer/inverter. The gate is both a buffer and an inverter due to the differential nature of SCL gates. A schematic of an SCL buffer is shown in Figure 3.4. If V out + and V out − were switched, the circuit would act as an inverter as opposed to a buffer.
Figure 3.4: Schematic of an SCL buffer
In order to determine the output voltage of the SCL gate the drain current through the transistors in the differential pair is examined. The differential input to the differential pair is defined as $V_{id}(t) = V_{in+}(t) - V_{in-}(t)$. The current through transistor Q1 is derived to be [24]
$$I_{ds+}(t) = \frac{I_{tail}}{2} + \frac{\mu_n C_{ox} \frac{W}{L}}{4} V_{id}(t) \sqrt{\frac{4 \cdot I_{tail}}{\mu_n C_{ox} \frac{W}{L}} - V_{id}^2(t)} \qquad (3.1)$$
In Equation 3.1, $I_{tail}$ is the tail current generated by the buffer’s current source, $W$ is the width of Q1, $L$ is the length of Q1 and $\mu_n$ and $C_{ox}$ are device parameters. Equation 3.1 is only valid while the current is switching in the differential pair. Once all of the current is going through either Q1 or Q2, this equation is no longer valid. The relationship between the input voltage and the drain current of Q1 is shown in Figure 3.5.
Figure 3.5: The relationship between $V_{id}(t)$ and $I_{ds}$
$+V_a$ is the differential input voltage where all of the current is flowing through Q1. Conversely $-V_a$ is the differential input voltage where all of the current is flowing through Q2. The range where Equation 3.1 is valid is $-V_a < V_{id} < +V_a$. The value of $V_a$ is
$$V_a = \sqrt{\frac{2 \cdot I_{tail}}{\mu_n C_{ox} \frac{W}{L}}} \qquad (3.2)$$
The single ended output voltages $V_{out+}$ and $V_{out-}$ are equal to:
$$V_{out+}(t) = V_{dd} - I_{ds+}(t) \times R \qquad (3.3)$$
$$V_{out-}(t) = V_{dd} - \left( I_{tail} - I_{ds+}(t) \right) \times R \qquad (3.4)$$
As can be seen from Figure 3.5, when the input voltage $V_{id}$ is zero, the amount of current through Q1 is $I_{tail}/2$. This means that $I_{tail}/2$ also goes through Q2. Intuitively (or using Equations 3.3 and 3.4) this leads to the output voltages $V_{out+}$ and $V_{out-}$ being equal. As the input voltage $V_{id}(t)$ increases, the drain current of transistor Q1 increases. As the current increases, the output voltage decreases, as the voltage drop across resistor R1 increases. When $V_{id}(t) = V_a$, all of the current $I_{tail}$ is flowing through transistor Q1. In this case $V_{out+} = V_{dd} - I_{tail} \times R$. However, as no current is flowing through transistor Q2, $V_{out-} = V_{dd}$.
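The static behaviour of the SCL buffer described above can be evaluated numerically from the drain current expression for Q1, the switching voltage $V_a$ and the two output voltage expressions. The Python sketch below is a DC, ideal square-law model only. The parameter `k` lumps $\mu_n C_{ox} W/L$ into one number, and all component values in the example are assumptions chosen so that $V_a$ comes out to a round 0.5 V.

```python
import math


def switch_voltage(i_tail: float, k: float) -> float:
    """Return V_a = sqrt(2 * I_tail / k), where k stands for mu_n * C_ox * W / L."""
    return math.sqrt(2.0 * i_tail / k)


def drain_current_q1(v_id: float, i_tail: float, k: float) -> float:
    """Return the Q1 drain current, clamped once the pair has fully switched."""
    v_a = switch_voltage(i_tail, k)
    if v_id >= v_a:
        return i_tail  # all of the tail current flows through Q1
    if v_id <= -v_a:
        return 0.0     # all of the tail current flows through Q2
    return i_tail / 2.0 + (k / 4.0) * v_id * math.sqrt(4.0 * i_tail / k - v_id ** 2)


def output_voltages(v_id: float, v_dd: float, i_tail: float, k: float, r: float):
    """Return (V_out+, V_out-) from the two single ended output expressions."""
    i_q1 = drain_current_q1(v_id, i_tail, k)
    return v_dd - i_q1 * r, v_dd - (i_tail - i_q1) * r


# Hypothetical buffer: 1 mA tail current, k = 8 mA/V^2, 400 ohm loads, 1.8 V supply.
for v_id in (-0.5, 0.0, 0.25, 0.5):
    v_p, v_m = output_voltages(v_id, 1.8, 1e-3, 8e-3, 400.0)
    print(f"V_id = {v_id:+.2f} V -> V_out+ = {v_p:.3f} V, V_out- = {v_m:.3f} V")
```

The clamping branches reflect the statement that the drain current expression is only valid between $-V_a$ and $+V_a$; outside that range the pair is fully switched.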
The SCL buffer/inverter is a very simple gate, however it effectively demonstrates the importance of the differential pair to SCL logic. Now SCL gates with more than one differential pair will be examined.
The basis of this thesis is designing SCL gates which can operate in leading-edge CDR circuits.
Other than the buffer, which was described above, the logic gates most commonly used in CDR circuits are flip-flops, XOR gates, and multiplexors (MUXes). The architecture of each of these gates will be briefly examined here. In Chapter 5, each of these gates will be optimized.
The logical operation of an XOR gate is to determine when one signal is high at the same time as a second signal is low. As with all SCL gates, the SCL XOR gate works by steering the current in order to execute the proper logic function. Here a two-input XOR gate is examined, hence two levels of differential pairs are needed. The schematic for an SCL XOR gate is shown in Figure 3.6.
Figure 3.6: Schematic of an XOR gate
Figure 3.7: Operation of an SCL XOR gate
An example is presented here to help understand the operation of an SCL XOR gate. In this example the inputs $V_{in1}$ and $V_{in2}$ are logically low. This case is illustrated in Figure 3.7. As $V_{in1}$ is low, the current in diff pair #1 will flow through transistor Q2. Assuming the differential pair is fully switched, no current flows through transistor Q1, hence diff pair #2 is ignored. As $V_{in2}$ is also low, the current in diff pair #3 will flow through transistor Q5. If the current flows through Q5, it must also flow through resistor R2, leading to a voltage drop across R2. Again, as it is assumed the differential pair has fully switched the current, no current flows through transistor Q6. Hence, $V_{out+}$ will be logically low, and $V_{out-}$ will be logically high, which is the proper XOR operation.
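The current steering in the XOR example can be captured in a small behavioural model. The sketch below assumes every differential pair switches fully, and it assumes that diff pair #2 is cross-connected relative to diff pair #3, which is what XOR operation requires but is not spelled out in the text. The supply and swing values are hypothetical.

```python
def scl_xor(in1: bool, in2: bool, v_dd: float = 1.8, v_swing: float = 0.4):
    """Return (V_out+, V_out-) for an idealized, fully switched SCL XOR gate."""
    # Level 1: diff pair #1 steers the tail current to pair #2 (in1 high)
    # or to pair #3 (in1 low).
    if in1:
        # Pair #2: assumed cross-connected, so a high in2 sends current through R2.
        current_in_r2 = in2
    else:
        # Pair #3: a low in2 sends the current through R2, as in the worked example.
        current_in_r2 = not in2
    # The output tied to the resistor carrying the current drops by V_swing.
    if current_in_r2:
        return v_dd - v_swing, v_dd
    return v_dd, v_dd - v_swing


for a in (False, True):
    for b in (False, True):
        v_p, v_m = scl_xor(a, b)
        print(f"in1={int(a)} in2={int(b)} -> output is {'high' if v_p > v_m else 'low'}")
```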
The purpose of a MUX is to provide a low impedance path between one of the inputs and the output. Control signals determine which input is connected to the output. Although much larger and more complicated MUXes can be designed in other logic types, only 2:1 MUXes will be considered here. In a 2:1 MUX one control signal is used to select which of two input signals will be connected to the output.
The schematic of the SCL MUX is virtually identical to that of the SCL XOR gate. However, the MUX has three inputs, as opposed to the XOR gate, which has but two inputs. The clock signal is the control signal in the SCL MUX, controlling which input signal is connected to the output. The schematic for the SCL MUX is shown in Figure 3.8.
Figure 3.8: Schematic of an SCL MUX
In the SCL implementation, the clock signal steers the current in such a way that either input $V_{in1}$ or input $V_{in2}$ controls the value at the output. An example of the function of an SCL multiplexer is shown in Figure 3.9.
Figure 3.9: Example of the operation of the SCL MUX
In the example, the clock signal is logically high, hence the current is steered to differential pair #2. This leads to signal $V_{out}$ being a buffered version of $V_{in1}$. It can be seen from this example that if differential pair #1 does not steer all the current to differential pair #2, some of the current will go to differential pair #3. This leads to noise at the output, as part of $V_{in2}$ is superimposed onto $V_{out}$.
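The effect of incomplete current steering in the MUX can be illustrated with a simple model of the differential output. This sketch assumes the upper differential pairs switch fully and that only the clock pair splits the tail current; the `fraction_to_pair2` parameter and the 0.4 V swing are hypothetical.

```python
def mux_differential_output(in1: bool, in2: bool, fraction_to_pair2: float,
                            v_swing: float = 0.4) -> float:
    """Return V_out+ - V_out- when the clock pair splits the tail current.

    fraction_to_pair2 is the share of the tail current steered to the pair
    driven by in1; the remainder goes to the pair driven by in2.
    """
    if not 0.0 <= fraction_to_pair2 <= 1.0:
        raise ValueError("fraction_to_pair2 must lie between 0 and 1")
    sign1 = 1.0 if in1 else -1.0
    sign2 = 1.0 if in2 else -1.0
    return v_swing * (fraction_to_pair2 * sign1 + (1.0 - fraction_to_pair2) * sign2)


# Clock high with ideal steering: the output follows in1 with the full swing.
print(mux_differential_output(True, False, 1.0))
# Clock high with 10 % of the current leaking to the other pair: in2 opposes in1
# and the differential output is reduced.
print(mux_differential_output(True, False, 0.9))
```

When the two inputs disagree, every percent of current that leaks to the unselected pair subtracts twice that percentage from the differential swing, which is why full switching of the clock pair matters.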
The D flip-flop (DFF) is the most important building block in a CDR circuit. It is used in almost all phase detectors, it is used to re-time the data signal, and it is used as a frequency divider.
A DFF is composed of two D latches connected in master/slave configuration, as illustrated in Figure 3.10. The clock signals into these two D latches are 180° out of phase.
Figure 3.10: The two D latches which make up a D flip-flop
When the clock into the DFF is high, the master latch samples the input signal, and the slave latch holds its value. When the clock is low, the master latch holds its value, which is sampled by the slave latch. Each D latch must have a sample stage and a hold stage. When the clock into the latch is high, the sample circuit is active, while the hold circuit is inactive. While the sample circuit is active, the input to the latch controls the value at the output of the latch. When the clock is low, the sampling stops and the hold circuit becomes active. While the hold circuit is active, the value of the input does not control the output, but rather the state of the data signal at the point of the clock transition is held. There are many different architectures of latches in static logic, however in SCL logic, the most common implementation is the one shown in Figure
3.11.
The sample circuit is composed of transistors Q3 and Q4, while the hold circuit is composed of transistors Q5 and Q6. The clock signal switches the current between these two circuits by way of the differential pair composed of Q1 and Q2. The sample circuit is a simple differential pair, where the input data signal steers the current to either R1 or R2. The hold circuit is a cross-coupled differential pair which is able to hold and regenerate a signal when it is active. It operates in a similar manner to two cross coupled CMOS inverters in a D latch implemented in a static CMOS technology.
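The sample and hold behaviour of the latch, and the master/slave arrangement of the flip-flop, can be summarized in a purely logical model. The Python sketch below ignores all analog behaviour (swing, regeneration time, delay); the class names are hypothetical.

```python
class DLatch:
    """Level-sensitive latch: samples D while the clock is high, holds otherwise."""

    def __init__(self) -> None:
        self.q = False

    def update(self, clk: bool, d: bool) -> bool:
        if clk:
            self.q = d  # sample circuit active: the input controls the output
        return self.q   # hold circuit active: the stored value is kept


class DFlipFlop:
    """Master/slave flip-flop built from two latches clocked 180 degrees apart."""

    def __init__(self) -> None:
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, clk: bool, d: bool) -> bool:
        master_q = self.master.update(clk, d)
        # The slave receives the inverted clock, so it samples while the master holds.
        return self.slave.update(not clk, master_q)


ff = DFlipFlop()
stimulus = [(True, True), (False, False), (True, False), (False, True)]
print([ff.update(clk, d) for clk, d in stimulus])
```

In this model the flip-flop output only changes when the clock goes low, and it takes the value that D had while the clock was high, regardless of what D does afterwards.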
Figure 3.11: Schematic of an SCL D latch
As the current source of an SCL gate supplies the tail current, it is very important that it be designed properly. For simplicity the schematics in this thesis have shown the current source as a single transistor, however the actual implementation is more complicated. In this thesis a wide-swing cascode current mirror is used in order to allow for a more accurate mirroring of the current [25].
The problem caused by using a single transistor as a current source is that the value of $V_{DS}$ across the mirror can change the value of the tail current by a significant amount. If a wide-swing cascode current mirror is used, the tail current sunk by the current mirror is much less dependent on the voltage across the mirror. The basic schematic of the wide-swing cascode current mirror is shown in Figure 3.12.
Figure 3.12: Wide-swing cascode current mirror
When designing the current mirror it is desirable to minimize non-linearities so that the mirrored current is as close as possible to the desired current. To facilitate this, the length of the MOSFETs in the current mirror is chosen to be larger than the minimum length. This reduces the non-linearities caused by channel length modulation. Increasing the length also improves the matching between the transistors, again reducing non-linearities. It is common to make the length two to five times greater than the minimum.
Before an optimal design can be created, a designer must first understand the relationships between the different parameters in the system. There are two primary ways to determine the relationship between circuit parameters; one can find an analytical solution, or one can determine the relationships via experimentation. While an analytic expression is an ideal way to determine the relationships and tradeoffs between parameters, this is only so if the expression is accurate
[26]. There have been several attempts to determine these relationships in SCL logic gates by way of an analytic expression [27] [28]. The problem when dealing with SCL circuits is that there are simply too many variables, and too many simplifications must be made in the course of deriving an analytic expression. This is especially true for circuits operating at multi-GHz frequencies, as many second-order effects become significant. The design methodology presented in this thesis uses sets of simulations to provide the designer with a good understanding of the relationships between design variables. Once that is done, the designer will be able to optimize SCL gates capable of operating at the desired data rate.
Design Methodology 39
In this thesis the application area for SCL gates is multi-Gbit/s CDR circuits. In these circuits, only a few simple operations need to be performed, however the performance requirements are often at the limits of what the process can handle. In [6] and [29] circuits are implemented in SCL, however the application area is using these gates as an alternative to dynamic logic styles. While these circuits could be implemented in a logic style other than SCL, in these cases SCL gates provide an advantage in terms of power dissipation and robustness. This is an important distinction to understand, as different application areas require very different design methodologies. The gates implemented in this thesis could never be used for anything other than a very simple circuit, as they are large and dissipate a lot of power. However, this is the only way they can achieve the necessary performance.
The reason this is stressed is that there are papers with methods to optimize SCL gates [30]
[28], however the application area is not multi-Gbit/s CDR circuits. As such, the results one would get using the design methodology presented in this thesis may well be very different from those analyses.
There are four fundamental parameters which need to be considered when designing SCL circuits: current, delay, voltage swing and transistor width. These parameters are all inter-related. The voltage swing is directly proportional to the current, in that $V_{swing} = I_{tail} \times R$. The amount of current that gets switched in a differential pair depends on the input voltage swing, the current and the transistor width, as can be seen in Equation 3.1. The delay is proportional to the swing in that the delay is dominated by the RC delay, and the value of R helps to determine the voltage swing. The delay is also proportional to the transistor width of the next stage, as that transistor’s gate capacitance dominates the C in the RC delay. What this methodology seeks to provide is a way to optimize all of these interrelated parameters.
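The coupling between swing, current and delay can be made concrete with a first-order calculation. The sketch below assumes a single-pole RC response and uses the ln 2 factor for the 50 % point of a step response; that factor, the function names and the component values are assumptions for illustration, not results from the simulations in this thesis.

```python
import math


def load_resistance(v_swing: float, i_tail: float) -> float:
    """Return the load resistance that gives V_swing = I_tail * R."""
    return v_swing / i_tail


def rc_delay(resistance_ohm: float, load_capacitance_f: float) -> float:
    """Return the 50 % delay of a single-pole RC step response, ln(2) * R * C."""
    return math.log(2.0) * resistance_ohm * load_capacitance_f


# Hypothetical gate: 0.4 V swing driving a 50 fF load.
for i_tail in (0.5e-3, 1e-3, 2e-3):
    r = load_resistance(0.4, i_tail)
    t_d = rc_delay(r, 50e-15)
    print(f"I_tail = {i_tail * 1e3:.1f} mA -> R = {r:.0f} ohm, delay = {t_d * 1e12:.1f} ps")
```

For a fixed swing, doubling the tail current halves R and therefore halves this first-order delay, at the cost of twice the static power. The load capacitance is not really fixed, since it is set by the width of the transistors in the next stage.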
With so many inter-related parameters, it is very difficult to assign one singular ‘perfectly optimized’ point. These parameters are very dependent on each other and on the loads they are driving. It is important to understand that the goal of this methodology is not to give one pat answer which will work in every situation, but rather to give the designer the appropriate sets of data and analyses which will allow them to understand the SCL circuits and apply the appropriate tradeoffs to create a successful design.
In the next section, simulations are performed in order to gather the data needed for the analyses. All simulations were done using 0.18 µm standard CMOS transistor models. All of the circuits were designed to be used in a 5 Gbit/s CDR circuit. All data presented in this thesis was gathered using Cadence (see Appendix A) and processed using Matlab (see Appendix B).
One important note which must be made relates to the testbench which is used for the simulations in the upcoming sections. One difficulty in simulating circuits operating at multi-GHz frequencies involves the appropriate approximation of the shape of the signals. At multi-GHz frequencies, the shape of a clock pulse bears little resemblance to a square wave. The rise and fall times approach half the period, leading to a signal which is much better approximated via a sinusoid or a ramp function. Also, if an LC-tank oscillator is used, the clock signal will be purely sinusoidal. Due to these considerations, the simulations referenced in the upcoming sections approximate inputs as sinusoidal in nature.
The voltage swing is an exceedingly important parameter in SCL circuits. Unlike static circuits, which operate with a full swing, SCL circuits operate in a differential mode. In a differential circuit, the state is considered to be a logical ‘1’ when $V_+ - V_- = V_{swing}$ and a logical ‘0’ when $V_+ - V_- = -V_{swing}$. SCL circuits theoretically operate faster with a smaller swing, as it takes less time to swing the voltage one way or the other. However, from Equation 3.1 it can be seen that the current in the differential pair will switch slower when the input has a smaller voltage swing, as a smaller swing inherently means that $V_{id}$ is smaller.
The frequency at which the SCL circuit operates is used as the starting point in determining the appropriate swing. The voltage swing must be small enough to allow the circuit to operate at the desired frequency, yet large enough to cause rapid switching of the current in the differential pairs. The relationship between the input voltage swing and the current switching is essentially gain. The amount of current switched will determine the output voltage swing. It is important for the SCL gates to have an output swing at least as large as the input voltage swing. Hence the gates will be designed to have a gain of at least one.
In order to better see the relationship between gain, $V_{swing}$, $I_{tail}$ and the width of the transistors in the differential pair, a simulation is performed. Using a simple SCL buffer (Figure 3.4), $V_{swing}$, $I_{tail}$ and the width of the transistors are varied. The length of the transistors is set to 0.18 µm in all simulations, as this is the smallest gate length allowed and it gives the greatest gain and smallest area. The resistance in the SCL buffer was set so that if the current fully switched, the buffer would have a gain of 1.2. As the circuits are designed to operate in a 5Gbit/s CDR circuit, a 5GHz clock signal is the input to the buffer.
Figure 4.1 shows the results of the simulation. What is being sought with this simulation is an understanding of conditions which lead to a buffer with at least unity gain.
It is desirable to have a gain of at least one with as little current and as small a transistor width as possible, as power is proportional to current and delay grows with transistor width. Figure 4.1 shows some interesting interplay between the different parameters. Getting gain is not as simple as increasing the tail current or making the transistors in the differential pair larger. The different parameters must be balanced in order to get the optimal design. It can be seen that as the swing increases, the amount of current it takes to get unity gain also increases. This makes sense, for if there is both a large swing and a low current, the resistance of the buffers must be high. This leads to a prohibitively large RC time constant for 5GHz operation. Figure 4.1 also shows that as the width of the transistors in the differential pair increases, the gain also increases. However, it is still desirable to select a transistor width as small as possible, as the transistor width determines the loading of the previous stage.
Figure 4.1: Values of $V_{swing}$, $I_{tail}$ and W which give unity gain
One very important point which must be emphasized is that the data in Figure 4.1 is based on a particular load. For a different load, the data will be different, hence a designer must be aware of this and use an appropriate load in the simulations. In this simulation the SCL buffer was loaded with two SCL buffers with transistors sized to 20 µm.
The goal in this step is to determine the best voltage swing. There is no single ‘right’ answer. In Figure 4.1 it can be seen that a voltage swing of around 0.6 V or 0.8 V provides the most flexibility in terms of the values of $I_{tail}$ and W which give unity gain. Hence a voltage swing of 0.7 V will be used for the rest of these simulations.
Having determined the voltage swing which is to be used, the next step is to determine the value of the tail current. An increase in the value of the tail current translates to an immediate increase in the power consumption of the circuit, as
$$P = I_{tail} \times V_{DD} \qquad (4.1)$$
Although power increases linearly with $I_{tail}$, an increase in the tail current leads to a decrease in the delay. This is primarily due to the relationship between $I_{tail}$, R and the voltage swing. As the SCL circuit has a constant voltage swing, the value of R is dependent on $I_{tail}$ alone.
$$R = \frac{V_{swing}}{I_{tail}} \qquad (4.2)$$
A good first order approximation for the delay of a buffer is the RC time constant, where R is the resistance of the SCL buffer, and C is the gate capacitance of the input transistor in the next stage. If $I_{tail}$ is increased, R decreases, hence the delay decreases. In order to better analyze this relationship, a simulation is performed. The delay of an SCL buffer is observed with respect to $I_{tail}$ and the width of the transistors in the differential pair.
The input voltage swing is 0.7 V, as was selected in Step 1. The results of the simulation are shown in Figure 4.2.
Figure 4.2: Delay of a saturated differential pair varying $I_{tail}$ and W
As can be seen, the delay is strongly correlated to the tail current. The width of the transistors in the differential pair is only slightly correlated to the delay. This slight increase in the delay is due to the fact that the drain capacitance of the transistors causes an increase in the RC time constant at the output node of the buffer. However, this increase is not of particular relevance.
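The trends implied by Equations 4.1 and 4.2 and the RC approximation can be sketched numerically. The following Python sketch is illustrative only: the supply voltage (1.8 V) and the load capacitance (60 fF) are assumed values, not figures taken from the simulations, and the function names are hypothetical.

```python
# First-order SCL buffer estimates (illustrative sketch, not the thesis testbench).
V_SWING = 0.7     # volts, swing chosen in Step 1
V_DD = 1.8        # volts, ASSUMED supply voltage for a 0.18 um process
C_LOAD = 60e-15   # farads, ASSUMED load capacitance (hypothetical value)


def load_resistance(i_tail, v_swing=V_SWING):
    """Equation 4.2: R = V_swing / I_tail."""
    return v_swing / i_tail


def static_power(i_tail, v_dd=V_DD):
    """Equation 4.1: P = I_tail * V_DD."""
    return i_tail * v_dd


def rc_delay(i_tail, c_load=C_LOAD, v_swing=V_SWING):
    """First-order delay estimate: the RC time constant at the output."""
    return load_resistance(i_tail, v_swing) * c_load


if __name__ == "__main__":
    for i_tail_ma in (1, 2, 4, 8):
        i_tail = i_tail_ma * 1e-3
        print(f"I_tail={i_tail_ma} mA  "
              f"R={load_resistance(i_tail):.0f} ohm  "
              f"P={static_power(i_tail) * 1e3:.1f} mW  "
              f"RC={rc_delay(i_tail) * 1e12:.1f} ps")
```

Under these assumptions, doubling the tail current doubles the power and halves the first-order RC delay, which is the tradeoff weighed in this step.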
The goal in this step is to determine the best value of the tail current. Again, there is no single right answer. What is desired is a tail current which gives a sufficiently low delay, sufficient gain, and reasonable power consumption. Based on Figures 4.1 and 4.2, the value of the tail current is selected as 4 mA. This will provide sufficient gain, while allowing a stage delay of less than 20 ps.
In this simulation, as in the simulations in Step 1, the results are based on the SCL buffer driving a constant load. As in Step 1, the SCL buffer is loaded by two SCL buffers with 20 µm transistors. It is noted that this is quite a large load, and examining the effects of varying the load provides some useful results. Another simulation is performed having one SCL buffer load another SCL buffer. The size of the driving differential pair is varied, as is the size of the load differential pair. The input voltage swing is 0.7 V and the tail current is 4 mA. The results of the simulation are shown in Figure 4.3.
Figure 4.3: Delay of a saturated differential pair varying W and the output load
As can be seen, changing the load has a large effect on the delay. The values of the delays in Figure 4.3 are quite a bit lower than in Figure 4.2, due to two factors. First, the load consists of only one SCL buffer, and secondly the width of the load buffer is not constant. Figure 4.3 shows that the width of the load differential pair has a larger effect on the delay than the width of the driver differential pair, due to the fact that the change in the gate capacitance of the load transistors is more significant than the change in the drain capacitance of the driver transistors. Figure 4.3 shows that load capacitance has a large, and in fact dominant, effect on the overall delay of the gate. Due to this one must take care that the overall load capacitance (including parasitics) does not lead to the circuit having a greater than desired delay.
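The dominance of the load capacitance can be illustrated with a first-order model in which the output node sees the drain capacitance of the driver plus the gate capacitance of each load transistor. In the Python sketch below, the per-micron capacitances are assumptions made purely for illustration (the gate value is scaled from the roughly 50 fF quoted later in this thesis for a 28 µm transistor, and the drain value is simply assumed to be one third of it); they are not extracted from the process or from Figure 4.3.

```python
R_LOAD = 0.7 / 4e-3                     # ohms, V_swing / I_tail from Steps 1 and 2
C_GATE_PER_UM = 50e-15 / 28.0           # F/um, ASSUMED (scaled from ~50 fF at 28 um)
C_DRAIN_PER_UM = C_GATE_PER_UM / 3.0    # F/um, ASSUMED ratio, for illustration only


def output_delay(w_driver_um, w_load_um, fanout=1):
    """First-order RC delay at the output node of the driving buffer."""
    c_drain = C_DRAIN_PER_UM * w_driver_um        # driver's own drain capacitance
    c_gate = C_GATE_PER_UM * w_load_um * fanout   # gate capacitance of the load
    return R_LOAD * (c_drain + c_gate)


if __name__ == "__main__":
    base = output_delay(20, 20)
    wider_load = output_delay(20, 30)
    wider_driver = output_delay(30, 20)
    print(f"extra delay, load +10 um:   {(wider_load - base) * 1e12:.2f} ps")
    print(f"extra delay, driver +10 um: {(wider_driver - base) * 1e12:.2f} ps")
```

With these assumed values, widening the load changes the delay more than widening the driver by the same amount, which matches the qualitative trend described above.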
In SCL circuits, differential pairs which operate in both the saturation region and the linear region are found. In this step, differential pairs biased in the saturation region are considered. In order for these differential pairs to be effective, they must have a gain of at least one.
Delay:
With an SCL gate, both input delay and output delay must be considered. A good first order approximation is that input delay is directly proportional to the width of the input transistor, and output delay is directly proportional to the value of the output resistor.
This is because the RC delay dominates, with the R coming primarily from the resistor, and the C coming primarily from the gate capacitance of the transistor. As logic gates are generally connected to each other, one gate provides the R and the other gate provides the C.
In order to minimize the output delay, the value of the output resistor is set such that $I_{tail} \times R = V_{swing} \times gain$. Now higher gain is good, as it will lead to a more robust circuit, however it leads to higher resistance, which in turn leads to greater delay. A gain of approximately 1.2 should be sufficient whilst still keeping the delay low. As $I_{tail}$ = 4 mA, $V_{swing}$ = 0.7 V and gain = 1.2, the value of the resistor is set at 210 Ω.
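As a quick check of the resistor value, the sizing relation can be evaluated directly. The short Python sketch below only rearranges the relation stated above; the function name is hypothetical.

```python
def output_resistor(i_tail, v_swing, gain):
    """Solve I_tail * R = V_swing * gain for R."""
    return v_swing * gain / i_tail


if __name__ == "__main__":
    r = output_resistor(i_tail=4e-3, v_swing=0.7, gain=1.2)
    print(f"R = {r:.0f} ohm")  # prints "R = 210 ohm"
    # A gain of exactly one would give the smaller resistance V_swing / I_tail.
    print(f"R at unity gain = {output_resistor(4e-3, 0.7, 1.0):.0f} ohm")
```

The comparison with the unity-gain case shows how much extra resistance, and hence extra RC delay, is accepted in exchange for the gain margin.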
To begin, the effect of transistor width on the output delay of the SCL gate is examined. The input swing used in this simulation is 0.7V, which was chosen in Step 1, and the tail current through the differential pair is 4 mA, as chosen in Step 2. The results of the simulation are shown in Figure 4.4. As can be seen, the width has little effect on the delay, which is dominated by the resistor and the input capacitance of the next stage.
Figure 4.4: Delay of a saturated differential pair
Gain:
In order for SCL gates to be connected together and operate properly, they must each have a gain of at least one. The current plays a large role in determining how fast an SCL gate operates, as it helps to determine how fast a differential pair can switch, as can be seen in Equation 3.2. The current also affects the output delay via the output resistance, as discussed in the previous section. The width of the transistors in the differential pair also affects how fast a differential pair can switch state. If the transistors are not large enough, they will not fully switch the current, and the gain will decrease.
In Step 2 a tail current of 4 mA was chosen. In order to get a better understanding of what differential pair widths provide a gain of at least one, the data from Figure 4.1 is narrowed to show only the results when $V_{swing}$ = 0.7 V and $I_{tail}$ = 4 mA. This data is presented in Figure 4.5.
Figure 4.5: Gain of a saturated differential pair
Figure 4.5 shows that the transistors in the differential pair must be at least 12 µm wide in order to provide unity gain. As the width of the transistors in the differential pair does not have a significant effect on the delay, delay is not a factor in this choice. It is wise to select a width beyond the bare minimum for unity gain, hence for a saturated differential pair a transistor width of 18 µm is chosen. Beyond this width, there is little gain to be achieved. The primary effect will be to increase the loading on the previous stage.
Step 3 looked at the effects when parameters in a saturated differential pair were varied. This step examines the effect of varying parameters in a differential pair biased in the linear region. It is important here to examine what region the different differential pairs are biased in. SCL XOR gates, MUXes and DFFs have architectures where differential pairs are stacked. The purpose of each differential pair is to steer the current based on the input, however the differential pairs which are stacked above the lower differential pair are under different biasing conditions. While the upper differential pairs are biased in the saturation region, the lower differential pairs are biased in the linear region.
In this step the relationship between the voltage swing of the input signal, transistor width and the amount of current switched is examined for differential pairs biased in the linear region. The data from the simulation is shown in Figure 4.6. In this simulation, an XOR structure was used and the tail current was set to 4mA. The plane is drawn to show where 90% of the current is switched. The area of the mesh above the plane shows the values of $V_{swing}$ and W which lead to at least 90% current switching with the input clock toggling at 5GHz.
Figure 4.6: Current switching of a differential pair biased in the linear region
This simulation gives a set of values of $V_{swing}$ and W where the differential pair will switch at least 90% of the current. Ideally the differential pair would switch 100% of the current, however as Figure 4.6 shows this is not a realistic expectation. The input voltage and transistor width would both have to be quite large for the differential pairs to completely switch the current. For a voltage swing of 0.7V, the transistors in the differential pair must have a width of at least 16 µm in order to switch 90% of the current, given a 5GHz input.
In order to ensure 90% of the current switches, these transistors will be sized to 20 µm.
It may be questioned why $V_{swing}$ is a variable in this simulation, as $V_{swing}$ was already set in Step 1. This is true, and for XOR gates this is the $V_{swing}$ which matters, however this is not the case for the DFF and MUX. In a DFF and a MUX, the input to this current switching differential pair is $V_{clock}$. The voltage swing of $V_{clock}$ is a separate variable which can be set in order to ensure greater current switching in the SCL MUX and DFF.
The latch is an essential part of the DFF. The latch must be able to keep the last sampled value once the current has switched. A problem arises, as was seen in Step 4, in that it is very difficult to completely switch the current. Hence, the latch must also contend with the noise caused by the sample circuit.
In Figure 4.7 the effect of interference from the sampling circuit can be seen. In this simulation, 85% of the current flows through the hold circuit, and 15% of the current flows through the sample circuit. The latch is able to hold the data, however, there are significant ripples on the output signal, caused by interference from the sample circuit.
The relationship between the ability of the latch to hold the data, the width of the transistors in the latch, and the percentage of the current switched to the hold circuit must be analyzed. To this end a simulation is run in order to see under what circumstances the latch is able to hold the data. Figure 4.8 shows the output voltage swing of the latch, with respect to the width of the latch, and the percentage of the tail current which flows through the latch. The value of the tail current is 4mA and the resistors in the latch are set to 200Ω.
Figure 4.7: Interference on the output of the latch
Figure 4.8: Relationship between latch output voltage, current switching and W
Figure 4.8 shows that the value of the output swing is easily corrupted if the transistor widths of the latch and the current switching differential pair are not properly taken into consideration. The plane is at $V_{swing}$ = 0.6 V, which is a large enough $V_{swing}$ to properly trigger the next stage. Based on this data, the width of the differential pairs is selected to be 24 µm. This is large enough to ensure that the data is properly latched, assuming at least 80% of the tail current is switched to the hold circuit. The width of the transistors in the latch circuit is selected to be as low as possible, as the gates of the latch are connected to the output of our latch. Making these transistors larger will increase their gate capacitance, and hence the amount of delay they add to the DFF.
At this point, all the information needed has been gathered. Using the results of the five steps outlined here, all of the gates needed for a high speed CDR can be designed. The actual design of those gates is described in Chapter 5. The key is to use the information generated in these steps to understand the relationships between the parameters and thus be able to design robust, efficient circuits with the appropriate tradeoffs in power, delay and area.
The steps described are meant to be process independent. It is key to note that several of the assumptions made are only valid for circuits running at high frequencies. The simulations presented here were performed on circuits implemented in a 0.18 µm process running at 5GHz.
A 5GHz circuit stresses the 0.18 µm technology, however on a 90 nm process that may not be the case. A static CMOS circuit implemented in a 90 nm technology might be able to perform a logical operation which would only have been possible using SCL logic in a 0.18 µm technology.
Therefore it is important that the designer have some familiarity with the process before deciding whether this design methodology is appropriate or not.
What does it really mean to optimize an SCL gate? In this case what is meant is the optimization of a number of different parameters which leads to SCL gates which operate at the desired bit rate and are still efficient in terms of area and power. The parameters considered are current, delay, voltage swing and transistor width. As in the previous chapter, the gates are being optimized for use in a 5Gbit/s CDR and are implemented in a 0.18 µm standard CMOS process.
The buffer is the simplest of the SCL gates, so it will be optimized first. In the previous chapter several simulations were performed on a simple SCL buffer in order to gain information on the interplay between delay, voltage swing, current and transistor width. Using this information, the different parameters are set.
In Step 1 it was decided that a voltage swing of 700mV was to be used. In Step 2 it was decided that a tail current of 4mA would be used. The value of the tail current was chosen as a balance of keeping the gain above unity while still minimizing both delay and power.
Optimization of SCL Gates 54
In Step 3 it was decided that a transistor width of 18 µm would provide sufficient gain with low enough delay. A buffer can be used to drive a wide variety of loads in a CDR circuit. In order to ensure that a buffer with these parameters is able to drive a variety of loads, a simulation is performed. In this simulation, the buffer drives two other SCL buffers, whose transistor widths are varied. The results of the simulation are shown in Figure 5.1.
Figure 5.1: Gain of an SCL buffer with varying load
From this simulation, it can be seen that the SCL buffer with a width of 18 µm is strong enough to drive a wide range of loads, with a gain of approximately one. This will become clearer later, but what this shows is that this gate could be used to buffer the input to an XOR gate or a DFF. Although the gain is below unity when the widths of the transistors in the next stage are larger than 28 µm (with a fanout of two), that is a large load, and this is an acceptable tradeoff. It can be seen from Figure 4.1 that even when the swing falls below 0.7V, it is still sufficiently large to drive the next SCL gate.
In multi-Gb/s CDR circuits, the XOR gate is stressed, especially when a linear phase detector is used. In most linear phase detectors the period of the signal at the output of the XOR gate is approximately half the period of the data. For example, in a 5Gbit/s CDR circuit with a linear phase detector, the period of the data pulse is 200ps, whereas the period of the signal at the output of the XOR gate is 100ps. When the clock and data are out of phase, this number can drop (see Chapter 6 for more information on linear phase detectors). It is very difficult to generate these outputs, hence the need for a highly optimized design.
The circuit for the SCL XOR gate is shown in Figure 3.6. The lower differential pair is a current steering differential pair, and the procedure for optimization is given in Step 4. From Step 1 the swing is known to be 0.7V. Based on the data in Figure 4.6, the width of the transistors in this differential pair is chosen to be 20 µm.
The upper two differential pairs are saturated differential pairs, and the optimization strategy is given in Step 3. As the tail current is 4 mA and the voltage swing is 0.7V, the width of the transistors in these differential pairs is chosen as 18 µm.
In order to test the SCL XOR gate sized above, a simulation is performed. The inputs to the XOR gate are two identical 5Gbit/s random data signals. The only difference is that one of the signals is delayed by one clock period. The differential output of the XOR gate is shown in Figure 5.2.
As can be seen in Figure 5.2, a problem is encountered in the XOR gate due to the different capacitive loads seen by the two inputs. $V_{in1}$ sees the capacitive load of one differential pair, however $V_{in2}$ sees the capacitive load of two differential pairs. Another problem is that $V_{in2}$ cannot start steering the current until it has first been steered by $V_{in1}$. These two factors lead to the nonsymmetric output shown in Figure 5.2.
Figure 5.2: Waveform of the output of an SCL XOR gate
This problem can be solved by connecting two SCL XOR gates in parallel with their inputs switched, as shown in Figure 5.3.
Figure 5.3: Schematic of a symmetric SCL XOR gate
As all the inputs are loaded the same, and the logical paths are all identical, the output signal of this gate is symmetric. While this symmetry is desirable, it comes at a price. This gate uses twice as many transistors, which means that the previous stage is loaded twice as much. Also, this gate uses twice as much current as the non-symmetric XOR gate. However, when operating a CDR near the limit of a particular process, the need for accurate output from the logic gates is still critical. In phase detectors, especially linear phase detectors, the XOR gate is a crucial component, and the price of extra power and area is one which must be paid.
In the optimization of a symmetric XOR, much the same strategy is employed as with a non-symmetric XOR, however there are some problems which must be taken into consideration. One problem which must be looked at is excessive loading. With a non-symmetric SCL XOR gate, one input sees a 20 µm transistor, while the other input sees two 18 µm transistors. If the sizings remain unchanged, each input to the symmetric XOR gate will see two 18 µm transistors and a 20 µm transistor. This could potentially add too much parasitic capacitance to the previous stage.
However, the only way to alleviate this is to reduce the widths of the transistors, which is not an optimal solution. It is preferable to buffer the inputs to the XOR gate.
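The extra loading of the symmetric gate can be put into rough numbers. The Python sketch below adds up the transistor widths seen by each input and converts them to a capacitance using an assumed per-micron gate capacitance (scaled from the approximately 50 fF quoted later in this thesis for a 28 µm transistor). The constant and the names are illustrative assumptions, not extracted process data.

```python
# ASSUMPTION: gate capacitance per micron of width, scaled from ~50 fF at 28 um.
C_GATE_PER_UM = 50e-15 / 28.0

# Transistor widths (um) seen by each input, as described in the text.
NONSYMMETRIC_INPUT_A = [20.0]          # lower current steering pair
NONSYMMETRIC_INPUT_B = [18.0, 18.0]    # two upper saturated pairs
SYMMETRIC_INPUT = [18.0, 18.0, 20.0]   # every input of the symmetric gate


def input_capacitance(widths_um):
    """Rough gate capacitance presented to the previous stage."""
    return C_GATE_PER_UM * sum(widths_um)


if __name__ == "__main__":
    for name, widths in (("non-symmetric A", NONSYMMETRIC_INPUT_A),
                         ("non-symmetric B", NONSYMMETRIC_INPUT_B),
                         ("symmetric", SYMMETRIC_INPUT)):
        print(f"{name}: {sum(widths):.0f} um, "
              f"{input_capacitance(widths) * 1e15:.0f} fF")
```

Under this assumption each input of the symmetric gate presents 56 µm of gate width to the previous stage, compared with 20 µm and 36 µm for the two inputs of the non-symmetric gate, which is the motivation for buffering the inputs.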
In order to test the symmetric SCL XOR gate a simulation is performed. As in the previous simulation, the inputs are two identical 5Gb/s random data signals, one delayed by a clock period. The results of the simulation are shown in Figure 5.4. As can be seen, the shapes of the positive and negative output voltages are identical.
Figure 5.4: Waveform of the output of a symmetric SCL XOR gate
As the SCL XOR and MUX are very similar, the analysis is almost identical. The only difference is the fact that a mirror SCL circuit is not needed in order to make the MUX symmetric. The two inputs $V_{in1}$ and $V_{in2}$ see the same input capacitance. Although the clock circuit sees a larger input capacitance, this is not a problem.
In order to optimize the MUX, the same operation is performed as on the basic XOR gate. The transistors in the lower differential pair must be sized such that as much of the current as possible is switched. As such, these are sized at 20 µm. This guarantees that at least 90% of the current is switched, which helps to eliminate interference from the opposite input. The transistors in the upper differential pairs must be sized in an identical manner. It is desirable that these differential pairs have a gain of at least one, while loading the inputs as little as possible. From Step 3, these transistors are sized to 18 µm.
There are three sub-circuits to the DFF: the current steering differential pair, the sample differential pair, and the hold latch. Each of these sub-circuits must be considered.
The clock is connected to the differential pair which steers the current to either the sample or the hold circuitry. This differential pair operates in the linear region, therefore the optimization strategy in Step 4 is examined. As the input to the DFF is the clock signal, the voltage may be higher than the 700mV swing assumed for the data signals. Referring back to Figure 4.6, it can be seen that if the transistor width is 20 µm, any swing greater than 700mV will lead to at least 90% of the current being switched.
The sample circuit is simply a differential pair. Using the optimization strategy discussed in Step 3, the transistors in this differential pair are sized to 18 µm. The hold circuit is a cross-coupled differential pair, and the optimization strategy for this circuit was discussed in Step 5. Using the analysis presented, the width of the transistors in the latch is set at 24 µm.
A design methodology for optimizing SCL gates has been presented. This methodology has been used to optimize individual gates. However, the aim of this thesis was to design SCL gates for use in leading edge clock and data recovery circuits. Hence, in this chapter, a 5Gbit/s CDR circuit is designed and simulated. The circuit is a linear CDR, based around the Hogge phase detector. The results of both schematic and back-annotated simulations are presented.
The block level architecture of a CDR circuit was given in Figure 2.2. From this, it can be seen that a phase detector, charge pump, low-pass filter, and LC-tank VCO must be designed. In the CDR presented here, a Hogge phase detector is used.
There are many different architectures of linear phase detectors for CDR circuits [31]. For this thesis, a phase detector architecture which uses a number of SCL gates is desirable, in order to properly demonstrate the methodology. A good choice is the Hogge phase detector [32]. The Hogge phase detector is one of the simplest linear phase detectors. Its architecture comprises two DFFs and two XOR gates, as shown in Figure 6.1.
Figure 6.1: Top level architecture of the Hogge phase detector
Virtually all of the SCL gates in this CDR are located in the Hogge phase detector. The logic is used to determine the phase difference between the clock and data signals. If the clock and data signals are in perfect phase the outputs of the two XOR gates will be equal. Hence, in this case, the UP and DOWN signals to the charge pump are of equal width. As the charge pump acts as an integrator, the output of the charge pump after the UP and DOWN pulses will be the same as before. Two examples showing the logical operation of the Hogge phase detector are shown in Figure 6.2. In Figure 6.2a the case where the clock and the data are perfectly in phase is illustrated. In this case the UP and DOWN pulses are of equal width. This means that the charge pump will equally charge and discharge the capacitor in the low pass filter, leading to no net change in the VCO control voltage. Figure 6.2b illustrates the case where the clock leads the data. In this case the width of the UP signal is smaller than the width of the DOWN signal. This will lead to a net discharge of the VCO control voltage in order to reduce the phase error.
Figure 6.2: Logical operation of a Hogge Phase Detector
In order for the Hogge phase detector to operate properly, the UP and DOWN signals must be symmetric. In order to achieve this, symmetric SCL XOR gates (as designed in Chapter 5) are used. The delay elements are simply buffers, which are used to compensate for the setup time into the DFFs. These delay elements ensure that when the clock and data signals are perfectly in phase, the UP and DOWN signals going into the charge pump are identical.
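The logical behaviour described above can be reproduced with a small discrete-time model. The Python sketch below assumes the commonly used Hogge arrangement, which is an assumption here since Figure 6.1 is not reproduced: the first DFF samples the data on the rising clock edge, the second DFF samples the first on the falling edge, UP is the XOR of the data and the first DFF output, and DOWN is the XOR of the two DFF outputs. Gates are ideal with zero delay, so the delay elements are not modelled, and all names are hypothetical.

```python
def hogge_pulse_widths(bits, samples_per_bit=20, clock_offset=10):
    """Return total (UP, DOWN) pulse widths in samples for an ideal Hogge detector.

    clock_offset is the position of the rising clock edge within each bit;
    samples_per_bit // 2 places the edge at the centre of the data eye, and a
    smaller value models a clock that leads the data.
    """
    half = samples_per_bit // 2
    q1 = q2 = bits[0]          # both flip-flops start at the first data value
    prev_clk = 0
    up_total = down_total = 0
    for t in range(len(bits) * samples_per_bit):
        data = bits[t // samples_per_bit]
        clk = 1 if ((t - clock_offset) % samples_per_bit) < half else 0
        if clk and not prev_clk:       # rising edge: first DFF samples the data
            q1 = data
        elif prev_clk and not clk:     # falling edge: second DFF samples the first
            q2 = q1
        up_total += data ^ q1          # UP   = data XOR Q1
        down_total += q1 ^ q2          # DOWN = Q1 XOR Q2
        prev_clk = clk
    return up_total, down_total


if __name__ == "__main__":
    pattern = [0, 1, 1, 0, 1, 0, 0, 1, 1]
    print("in phase:    ", hogge_pulse_widths(pattern, 20, 10))
    print("clock leads: ", hogge_pulse_widths(pattern, 20, 5))
```

In this idealized model the UP and DOWN totals are equal when the rising clock edge sits at the centre of the bit, and UP becomes narrower than DOWN when the clock leads the data, in line with the two cases of Figure 6.2.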
The charge pump is the circuit which takes the output of the phase detector and changes the voltage on the low pass filter by adding or removing charge. The charge pump architecture used in this thesis is based on current steering. The circuit is shown in Figure 2.7b and its operation was previously discussed in Section 2.3.2.
Many papers have been devoted to the optimization of the low pass filter’s characteristics [33, 34, 35]. For the design in this thesis, the procedure to optimize the values of the LPF components outlined in [36] was followed. The filter used was a second order low pass filter (Figure 2.8). The values for the filter components are given in Table 6.1.
Table 6.1: Low pass filter characteristics
Characteristic    Value
R                 1.57 kΩ
C1                2 nF
C2                200 pF
Many papers are also devoted to the optimization of LC-tank oscillators [37, 38]. The oscillator used in this design is not meant to be a perfectly optimized LC-tank oscillator; it was simply designed to oscillate at 5GHz with a reasonable phase noise. An LC-tank oscillator was chosen over a ring oscillator, as in the 0.18 µm technology used, the phase noise performance of a 5GHz ring oscillator is unacceptable.
The architecture of the LC-tank oscillator used in this thesis is shown in Figure 6.3. The cross-coupled transistors in the LC-tank oscillator provide the required negative $g_m$, as described in Section 2.3.4. In order for the LC-tank to oscillate, the transconductance of the cross-coupled differential pair must be large enough to satisfy the condition in Equation 2.3.
The oscillator must be able to oscillate over a range of frequencies. In the architecture shown in Figure 6.3, variable capacitors are used to alter the frequency, as the frequency of oscillation is based upon the relationship $\omega = \frac{1}{\sqrt{LC}}$. In this architecture there are two tuning voltages, $V_{coarse}$ and $V_{fine}$.
Figure 6.3: Architecture of the LC-tank oscillator
$V_{coarse}$ controls a large variable capacitor $C_1$, hence it can vary the frequency over a wider range. The purpose of $V_{coarse}$ is primarily to compensate for process variations. For this circuit, $V_{coarse}$ will be set manually, whereas $V_{fine}$ will be the voltage which is controlled by the CDR. $V_{fine}$ controls a much smaller variable capacitor, hence a much smaller frequency range.
It is desirable to have a small frequency range for the CDR, as noise on the control line of the oscillator will not lead to as much phase noise as compared to the case where the control line is connected to a large variable capacitance.
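The link between the size of the variable capacitance and the tuning range can be sketched from the resonance relationship alone. In the Python sketch below the inductance and capacitance values are hypothetical, chosen only so that the tank resonates near 5 GHz; they are not the component values of the oscillator in Figure 6.3, and the function names are illustrative.

```python
import math


def resonant_frequency(inductance, capacitance):
    """f = 1 / (2*pi*sqrt(L*C)), which follows from omega = 1/sqrt(LC)."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * capacitance))


def tuning_range(inductance, c_min, c_max):
    """Fractional tuning range relative to the centre of the frequency span."""
    f_high = resonant_frequency(inductance, c_min)
    f_low = resonant_frequency(inductance, c_max)
    return (f_high - f_low) / (0.5 * (f_high + f_low))


if __name__ == "__main__":
    L_TANK = 1e-9  # henries, HYPOTHETICAL tank inductance
    print(f"f0 = {resonant_frequency(L_TANK, 1.0132e-12) / 1e9:.2f} GHz")
    # A large varactor swing (coarse) versus a small one (fine), both hypothetical.
    print(f"coarse: {tuning_range(L_TANK, 0.9e-12, 1.1e-12) * 100:.1f} %")
    print(f"fine:   {tuning_range(L_TANK, 1.0e-12, 1.02e-12) * 100:.1f} %")
```

Because the frequency depends on the square root of the total capacitance, a small variable capacitor moves the frequency only slightly, which is why the fine control line is less sensitive to noise than the coarse one.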
The designer can control the $g_m$ of a transistor using a combination of the $W/L$ ratio and the current through the transistor. Increasing the current produces another benefit in that it increases the output voltage swing, due to the fact that the output voltage is controlled by current switching across the inductor ($V = L \cdot \frac{di}{dt}$). However, increasing the current also increases the power consumption, hence there are limits to increasing the current. Increasing the width of the transistor increases the transconductance, however it also increases the parasitic capacitance seen at the gate of the transistor. This parasitic capacitance decreases the tuning range of the oscillator, hence must be kept small enough to allow for the desired tuning range.
The oscillator designed for this CDR has a coarse tuning range of 22.5% and a fine tuning range of 1.2%. The coarse tuning range should be sufficient to account for process variations, and the fine tuning range is sufficient to ensure the CDR circuit is able to track phase errors.
The CDR was laid out in a 0.18 µm six metal layer process. The lower metal layers were used to route the signals, while the upper metal layers were used to route power, ground and bias signals. Back-annotation with parasitics was performed in order to more accurately simulate the actual silicon performance of the CDR circuit.
Layout plays a huge role in the overall performance of a circuit. When designing an SCL circuit for operation at high frequency special care must be taken to ensure an optimal layout.
Two techniques which will be briefly looked at are transistor folding and common-centroid geometry.
Transistor folding is a process where a large transistor is divided into smaller transistors which are connected in parallel. Here the relationship between transistor folding and gate resistance is examined. At frequencies above a few GHz, the gate resistance of a transistor can have a noticeable effect on the operation of the circuit. The problem is fundamentally one of delay.
The gate oxide of a transistor adds a capacitive load on a circuit. When we refer to the input of a subsequent stage as a capacitive load, this load is most often the input to the gate of a transistor.
However, the gate also has an intrinsic resistance. The gate resistance of a transistor is always there, and it combines with the gate capacitance of a transistor to form an RC time constant.
The value of the gate capacitance of a MOS transistor is approximately
$$C_{gate} = W \cdot L \cdot \frac{\varepsilon_{ox}}{t_{ox}}, \qquad (6.1)$$
and the value of the gate resistance is approximately
$$R_{gate} = \frac{W}{3 \cdot L} \times \frac{R_{sheet}}{Fold^{2} \cdot Contact^{2}}, \qquad (6.2)$$
where Fold refers to the number of transistor folds and Contact refers to the number of gate contacts the transistor has.
It is desirable to have the delay seen looking into the gate of the transistor be as low as possible, hence the RC time constant should be as low as possible. Once the design is completed, the designer has no control over the gate capacitance of a transistor. The oxide capacitance dominates, and this is fixed. However, an examination of Equation 6.2 shows ways of reducing the resistance seen looking into the gate. In the layout of the transistor, the designer has no control over $W$, $L$ or $R_{sheet}$; however, the designer does have control over the number of folds and the number of gate contacts. Because the differential pairs are laid out using common-centroid geometry (which will be discussed in the next section), changing the number of gate contacts is not practical. However, transistor folding is very practical and should be used.
In order to ensure that the delay into the gate does not affect the operation of the circuit, the delay should be very small. There is no firm rule as to exactly how small it should be; however, it should be an insignificant portion of the period. As a rule of thumb, the designs in this thesis aim to have the RC delay into any transistor be less than 1% of the period.
In order to illustrate this, take the example of a transistor of width 28 µm and length 0.18 µm. Here the capacitance seen into the gate is approximately 50 fF. If no folding is used, using Equation 6.2, the delay into the gate (assuming an $R_{sheet}$ of 10 Ω/□) would be

$$\tau_{RC} = 50\,\mathrm{fF} \times \frac{28\,\mu\mathrm{m}}{3 \cdot 0.18\,\mu\mathrm{m}} \times \frac{10}{1^{2} \cdot 1^{2}} = 26\,\mathrm{ps} \tag{6.3}$$
26 ps is a significant fraction of the 200 ps period in a system designed to operate at 5 GHz, and this amount of delay is unacceptable. However, if the transistors are folded, this can be drastically reduced. For example, if the transistor is folded seven times, such that there are seven transistors of width 4 µm in parallel, the delay seen into the gate reduces to

$$\tau_{RC} = 50\,\mathrm{fF} \times \frac{28\,\mu\mathrm{m}}{3 \cdot 0.18\,\mu\mathrm{m}} \times \frac{10}{7^{2} \cdot 1^{2}} = 530\,\mathrm{fs} \tag{6.4}$$

530 fs is an insignificant percentage of the 200 ps period, hence this layout is acceptable.
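The folding arithmetic above can be checked with a short script. The following is a minimal Python sketch, not part of the original design flow; the function names `gate_rc_delay` and `min_folds` are hypothetical, and the 50 fF gate capacitance and 10 Ω/□ sheet resistance are simply the values quoted in the example.

```python
import math


def gate_rc_delay(width_um, length_um, c_gate_f, r_sheet_ohm_sq,
                  folds=1, contacts=1):
    """RC delay into the gate, using the form of Equation 6.2:
    R_gate = (W / 3L) * R_sheet / (folds^2 * contacts^2)."""
    r_gate = (width_um / (3.0 * length_um)) * r_sheet_ohm_sq
    r_gate /= (folds ** 2) * (contacts ** 2)
    return c_gate_f * r_gate


def min_folds(width_um, length_um, c_gate_f, r_sheet_ohm_sq,
              period_s, fraction=0.01, contacts=1):
    """Smallest number of folds that keeps the RC delay below
    `fraction` of the period (the 1% rule of thumb)."""
    unfolded = gate_rc_delay(width_um, length_um, c_gate_f,
                             r_sheet_ohm_sq, 1, contacts)
    return max(1, math.ceil(math.sqrt(unfolded / (fraction * period_s))))


if __name__ == "__main__":
    # Values from the worked example: W = 28 um, L = 0.18 um, C = 50 fF
    for folds in (1, 7):
        tau = gate_rc_delay(28.0, 0.18, 50e-15, 10.0, folds=folds)
        print(f"folds={folds}: tau = {tau * 1e12:.2f} ps")
    print("minimum folds for 1% of 200 ps:",
          min_folds(28.0, 0.18, 50e-15, 10.0, 200e-12))
```

Under these assumptions the script reproduces the 26 ps and 530 fs figures, and it suggests that four folds would already meet the 1% rule of thumb for this transistor, so seven folds leaves some margin.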
One of the largest differences between the layout of standard CMOS logic and SCL layout is the importance of transistor matching. If transistors are matched, it means that they are identical in all their characteristics. If the transistors in a differential pair are not matched, the current will not switch at the zero-crossing point of the input signals, but rather at some offset. As SCL circuits are based around the differential pairs switching at the zero-crossing points, steps must be taken to ensure transistor matching.
In order to make the layout as insensitive as possible to process variations, common-centroid geometry is used. Common-centroid geometry is a way of laying out a differential pair such that any variation in process parameters will affect both transistors in the same manner. This provides an assurance of greater linearity in silicon.
Common-centroid geometry works in parallel with transistor folding. In a differential pair there are two transistors with their sources connected together, as shown in Figure 6.4.
Figure 6.4: MOS differential pair
As an example, a transistor is folded 16 times and laid out using common centroid geometry. The block level layout is shown in Figure 6.5. As can be seen from Figure 6.5 any doping gradient would affect transistor A and transistor B the same way. Because of this, the transistors in the differential pair will remain matched, even if their characteristics are changed somewhat.
Figure 6.5: MOS differential pair layout using common centroid geometry
In the case of the SCL gates designed in this thesis, two dimensional common centroid geometry was not used. It can be seen from Figure 6.5 that a large amount of routing is needed for two dimensional common centroid geometry. In order to reduce the amount of routing needed, the number of rows is reduced to one. Figure 6.6 shows the block level layout of a differential pair whose transistors have been folded six times and laid out using one dimensional common centroid geometry.
Figure 6.6: MOS differential pair layout using 1D common centroid geometry
In order to make the layout as compact as possible, the drain and source connections were shared between neighboring transistors. Any process variations affect both transistors in the same way, hence the transistors in the differential pair will remain matched. Figure 6.7 shows the final layout of an 18 µm differential pair.
Figure 6.7: Layout of an 18 µm differential pair
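The matching argument behind one-dimensional common-centroid layout can be illustrated numerically. The following Python sketch is illustrative only: it assumes an ABBA finger ordering, which is one common arrangement and is not necessarily the ordering drawn in Figure 6.6, and the function names are hypothetical. It checks that both transistors share the same centroid, so a linear gradient along the row shifts both by the same average amount.

```python
def one_d_common_centroid(folds):
    """Finger order for transistors A and B, each split into `folds`
    fingers, tiled as ABBA ABBA ... (assumed ordering, folds must be even)."""
    if folds < 2 or folds % 2 != 0:
        raise ValueError("this sketch needs an even number of folds")
    return "ABBA" * (folds // 2)


def centroid(pattern, device):
    """Mean finger position of one device along the row."""
    positions = [i for i, d in enumerate(pattern) if d == device]
    return sum(positions) / len(positions)


def gradient_mismatch(pattern, gradient_per_finger):
    """Difference in the average parameter value seen by A and B when the
    parameter varies linearly along the row."""
    return gradient_per_finger * (centroid(pattern, "A") - centroid(pattern, "B"))


if __name__ == "__main__":
    interleaved = one_d_common_centroid(6)   # six folds per transistor
    side_by_side = "A" * 6 + "B" * 6         # no common centroid
    for name, pattern in (("interleaved", interleaved),
                          ("side by side", side_by_side)):
        print(name, pattern, "mismatch:", gradient_mismatch(pattern, 1.0))
```

With the interleaved ordering the mismatch under a linear gradient is zero, whereas placing the two transistors side by side leaves a mismatch proportional to the distance between their centroids.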
Using these techniques, as well as common layout techniques to keep parasitics low and the circuit well matched, the CDR circuit was laid out. The final layout is shown in Figure 6.8 (most power, ground and bias signals are not shown for clarity).
Figure 6.8: Layout of the CDR circuit
In order to test the CDR a testbench was created. The testbench generates a random data signal at 5 Gbit/s, which is used as the input to the CDR. All bias signals were controlled externally, namely the coarse tuning of the VCO and the biasing current for the current mirrors. The CDR was able to lock onto the data signal and re-time the data. Figure 6.9 shows the input and re-timed data signals. As can be seen, the CDR has accurately re-timed the data.
Figure 6.9: Waveforms of the input and re-timed data
The jitter characteristics of the CDR are the most useful property in determining its performance. The data was gathered by simulating the CDR circuit for 20 µs. Both schematic and back-annotated simulations were performed. The data was collected from Cadence and analyzed using Matlab. The results of the jitter calculations are shown in Table 6.2.
Table 6.2: CDR jitter measurements

                       Schematic      Back-Annotated
RMS jitter             0.95330 ps     0.40105 ps
Peak-to-peak jitter    7.3837 ps      3.8541 ps
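The thesis does not list the script used to compute these jitter figures. As a rough illustration of how RMS and peak-to-peak jitter can be obtained from a list of clock or data crossing times, the following Python sketch compares each crossing with the nearest point on an ideal grid. The function name `jitter_stats` is hypothetical, and the sketch assumes the static phase offset plus the jitter stays well below half a bit period.

```python
import math


def jitter_stats(crossing_times, bit_period):
    """Return (rms_jitter, peak_to_peak_jitter) of the crossing times,
    measured against the nearest multiple of `bit_period` and with the
    mean (static) offset removed."""
    errors = []
    for t in crossing_times:
        nearest_edge = round(t / bit_period) * bit_period
        errors.append(t - nearest_edge)
    mean_error = sum(errors) / len(errors)
    centered = [e - mean_error for e in errors]
    rms = math.sqrt(sum(e * e for e in centered) / len(centered))
    return rms, max(centered) - min(centered)


if __name__ == "__main__":
    period = 200e-12  # 5 Gbit/s bit period
    # Synthetic crossings that alternate 1 ps early and 1 ps late
    times = [k * period + (1e-12 if k % 2 else -1e-12) for k in range(1, 9)]
    rms, pp = jitter_stats(times, period)
    print(f"RMS jitter = {rms * 1e12:.3f} ps, p-p jitter = {pp * 1e12:.3f} ps")
```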
Another important characteristic of the CDR is the phase detector gain. Ideally the phase detector will have perfectly linear gain characteristics, however the ideal is never actually reached.
Figure 6.10 shows the actual gain of the Hogge phase detector. In order to get this data, phase errors were fed into the phase detector and the widths of the UP and DOWN signals were measured. It should be noted that the Error signal in Figure 6.10 behaves poorly for a phase offset greater than 90°. This is because the XOR gate is being asked to generate extraordinarily narrow pulses, which it is unable to do. However, even in the range of phase error where the phase detector loses linearity, it still applies the correct phase correction.
While the results of the jitter calculations certainly look impressive, they are not actual silicon results. As such they should not be looked upon as an accurate representation of the performance of the CDR. The important thing to get from these results is that the CDR is able to accurately lock to the data stream. If we take the jitter values in Table 6.2 and use Equation 2.9 to try to calculate the bit error rate, we end up with a BER in the range of $10^{-50}$. A typical value of BER is in the range of $10^{-12}$, hence the calculated value is very unrealistic, and indicates the difficulty of comparing jitter results from simulation to reality. All simulation results rely on models and it is impossible for them to take into account all non-idealities. In order to get realistic values for the jitter, silicon results are needed. For this reason the jitter values have not been compared to other work in literature, but are presented instead as an indication of the ability of the CDR to function properly.

Figure 6.10: Simulated phase detector gain
The performance of the SCL logic can be best seen in the characteristics of the phase detector. It can be seen from Figure 6.10 that the phase detector maintains its linear characteristic over a large part of the clock period. It breaks down at +90° and −90° because the DFF is operating in the region where the transitions of the data signal occur at the same time as the DFF switches from sample mode to hold mode. It is very difficult for the SCL DFF to accurately sample and hold a signal which is switching while the DFF is switching states itself. For this reason the DFFs are meant to operate well away from this region.
As the drive for integration moves forward, the demand for clock and data recovery (CDR) circuits implemented in standard CMOS which are able to operate at multi-Gbit/s data rates is increasing. Integration of the serial I/O onto the same die as the processing circuitry reduces both system complexity and system costs. While there is a desire to integrate CDR circuits using standard CMOS processes, MOS transistors are not ideal for designing the blocks needed for leading edge CDR circuits. Typically CDR circuits are implemented in processes such as SiGe or GaAs, which use bipolar transistors with higher gain than MOS transistors. A promising technology which can aid integration is source coupled logic (SCL). Circuits implemented using source coupled logic are able to operate at higher frequencies than their static counterparts, and yet can still be implemented in a standard CMOS process.
While SCL logic shows much potential, few designers have the experience needed to design optimized SCL circuits. For this reason a methodology is needed in order to provide the designer with a good understanding of the relationships and tradeoffs between the different parameters.
In this thesis, such a design methodology is proposed. The methodology is then used to optimize several SCL gates common to CDR circuits. Finally, a full CDR circuit was designed and laid out, and back-annotated simulations were performed in order to verify its performance.
The contributions of this thesis have been:
1. A design methodology has been introduced which can be used to optimize SCL gates for leading edge CDR circuits. This design methodology provides the designer with the information needed to effectively create circuits using SCL.
2. Using the design methodology, four logic gates implemented in SCL were individually optimized. These gates all operated well at 5GHz.
3. A 5 Gbit/s CDR has been designed and implemented using SCL logic gates. The SCL gates were optimized using the proposed design methodology. The CDR is based on a linear Hogge phase detector and was designed and laid out in a 0.18 µm standard CMOS process. Simulation results show the CDR effectively locks to an incoming 5 Gbit/s data signal.
In order to plot the figures and perform the analysis described in this thesis, the designer must be able to perform parametric simulation and output the results to a text file. As running of parametric simulations in Cadence is a standard procedure, it is assumed the designer is able to do this. However, outputting simulation results to a text file is something which is not normally done, hence the method used by the author is described here.
In order to output the simulation data to a text file, the ocnPrint command is used. The ocnPrint command is entered into the icfb command line interface. An example of the ocnPrint command is shown below.
ocnPrint( VT("/Vclock+") ?output "/work/Vclock+.txt" ?precision 16 ?numberNotation 'none )
This command outputs the data collected for the signal Vclock+. The data is output to the file /work/Vclock+.txt. The ?precision 16 option means that the data is output to a precision of 16 decimal places. The final option, ?numberNotation 'none, is used as it is the most rapid output available under Cadence. There are other options for the ocnPrint command, and more detailed information is available through the Cadence help files.
Matlab was used to analyze the data gathered from Cadence. The delay is calculated by comparing the zero crossing times between the input and output signals. The gain is calculated by finding the output swing and dividing this by the input swing.
The code in this section accepts data from Cadence files, then calculates and plots the gain and delay.
% This code reads two files from Cadence, uses the data to find the delay,
% and then plots the data on a 3D plot.
% It assumes a parametric simulation has been performed with two variables
% being varied, i.e. width and current.
samples = 15*24;   % depends on the number of parametric simulations performed

% File I/O
% These commands open the files which contain the data from Cadence
fid_Voutp = fopen('Vout+.txt');
fid_Voutn = fopen('Vout-.txt');

average_delay = zeros(1, samples);

% Each iteration of this FOR loop gathers data from one parametric simulation
for a0 = 1:samples
    % The fgets calls skip the five header lines Cadence writes before each data block
    for k = 1:5
        tline = fgets(fid_Voutp);
    end
    Voutp_in = fscanf(fid_Voutp, '%f %f', [2 inf]);
    Voutp = Voutp_in';
    for k = 1:5
        tline = fgets(fid_Voutn);
    end
    Voutn_in = fscanf(fid_Voutn, '%f %f', [2 inf]);
    Voutn = Voutn_in';

    % Clear the per-simulation results so stale values are not averaged
    zero_crossing_time = [];
    delay_times = [];
    % count keeps track of the number of zero-crossing points
    count = 1;

    % This FOR loop searches the second half of the output voltage signals for
    % the differential crossing points on both rising and falling edges.
    % Once it finds a crossing point it estimates the time of the crossing
    % between the two samples.
    for a1 = floor(length(Voutp(:,1))/2):(length(Voutp(:,1))-1)
        dt = Voutp(a1+1,1) - Voutp(a1,1);

        % Rising edge of Vout+: Vout+ < Vout- at time t0, but Vout+ >= Vout- at time t1
        if (Voutp(a1,2) < Voutn(a1,2))
            if (Voutp(a1+1,2) > Voutn(a1+1,2))
                % A is the ratio of the differential voltage before the crossing
                % to that after it; the crossing time is estimated piecewise from A
                A = (Voutp(a1,2)-Voutn(a1,2))/(Voutn(a1+1,2)-Voutp(a1+1,2));
                if A <= 1
                    delta_time = 0.5*A*dt;
                else
                    delta_time = dt - 0.5*dt/A;
                end
                zero_crossing_time(count) = Voutp(a1,1) + delta_time;
                count = count + 1;
            % The crossing falls exactly on the sampled time (this is very rare)
            elseif (Voutp(a1+1,2) == Voutn(a1+1,2))
                zero_crossing_time(count) = Voutp(a1+1,1);
                count = count + 1;
            end
        end

        % Falling edge of Vout+: Vout+ > Vout- at time t0, but Vout+ <= Vout- at time t1
        if (Voutp(a1,2) > Voutn(a1,2))
            if (Voutp(a1+1,2) < Voutn(a1+1,2))
                A = (Voutn(a1,2)-Voutp(a1,2))/(Voutp(a1+1,2)-Voutn(a1+1,2));
                if A <= 1
                    delta_time = 0.5*A*dt;
                else
                    delta_time = dt - 0.5*dt/A;
                end
                zero_crossing_time(count) = Voutp(a1,1) + delta_time;
                count = count + 1;
            % The crossing falls exactly on the sampled time (this is very rare)
            elseif (Voutp(a1+1,2) == Voutn(a1+1,2))
                zero_crossing_time(count) = Voutp(a1+1,1);
                count = count + 1;
            end
        end
    end

    % This FOR loop finds the offsets of the simulated zero crossing times
    % from the known ideal times
    for a2 = 1:length(zero_crossing_time)
        ideal_time = a2*100e-12 + 900e-12;
        delay_times(a2) = zero_crossing_time(a2) - ideal_time;
    end

    % Next the average delay is found for this iteration of the parametric simulation
    average_delay(a0) = mean(delay_times);
end

% This ends the file I/O section
fclose('all');

% The delay times must be put into a matrix (in ps) so that they can be plotted
delay_matrix = zeros(15, 24);
count = 1;
for a = 1:15
    for b = 1:24
        delay_matrix(a,b) = average_delay(count)*1e12;
        count = count + 1;
    end
end

% This code plots the values of the delays on a 3D plot
mesh(delay_matrix)
axis([1 24 1 15 0 12])
set(gca, 'XTick', [4; 8; 12; 16; 20; 24])
set(gca, 'XTickLabel', {'8', '16', '24', '32', '40', '48'})
set(gca, 'YTick', [1; 3; 5; 7; 9; 11; 13; 15])
set(gca, 'YTickLabel', {'1', '2', '3', '4', '5', '6', '7', '8'})
set(gca, 'projection', 'perspective')
axis tight;
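The zero-crossing search used for the delay calculation can also be sketched compactly in Python. This is an illustrative sketch rather than a translation of the Matlab listing: it uses plain linear interpolation between the two samples that straddle the crossing, which is a slightly different estimate from the piecewise rule in the listing, and the function names are hypothetical.

```python
def differential_zero_crossings(times, v_pos, v_neg):
    """Times at which the differential signal (v_pos - v_neg) crosses zero,
    on rising or falling edges, found by linear interpolation."""
    crossings = []
    for i in range(len(times) - 1):
        d0 = v_pos[i] - v_neg[i]
        d1 = v_pos[i + 1] - v_neg[i + 1]
        if d0 == 0.0:
            continue  # already counted when this sample was the later one
        if d1 == 0.0:
            crossings.append(times[i + 1])  # crossing exactly on a sample
        elif d0 * d1 < 0.0:
            fraction = d0 / (d0 - d1)       # position between the samples
            crossings.append(times[i] + fraction * (times[i + 1] - times[i]))
    return crossings


def mean_delay(crossings, ideal_times):
    """Average offset of the measured crossings from the ideal crossing times."""
    offsets = [c - t for c, t in zip(crossings, ideal_times)]
    return sum(offsets) / len(offsets)


if __name__ == "__main__":
    t = [0.0, 1.0, 2.0, 3.0]
    vp = [-1.0, 1.0, 1.0, -3.0]
    vn = [0.0, 0.0, 0.0, 0.0]
    found = differential_zero_crossings(t, vp, vn)
    print("crossings:", found)
    print("mean delay:", mean_delay(found, [0.25, 2.0]))
```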
% This code reads two files from Cadence, uses the data to find the gain,
% and then plots the data on a 3D plot.
% It assumes a parametric simulation has been performed with two variables
% being varied, i.e. width and current.
samples = 15*24;   % depends on the number of parametric simulations performed

% File I/O
% These commands open the files which contain the data from Cadence
fid_Voutp = fopen('Vout+.txt');
fid_Voutn = fopen('Vout-.txt');

Voltage_swing = zeros(1, samples);

% Each iteration of this FOR loop gathers data from one parametric simulation
for a0 = 1:samples
    % The fgets calls skip the five header lines Cadence writes before each data block
    for k = 1:5
        tline = fgets(fid_Voutp);
    end
    Voutp_in = fscanf(fid_Voutp, '%f %f', [2 inf]);
    Voutp = Voutp_in';
    for k = 1:5
        tline = fgets(fid_Voutn);
    end
    Voutn_in = fscanf(fid_Voutn, '%f %f', [2 inf]);
    Voutn = Voutn_in';

    % This code fragment calculates the output voltage swing.
    % Only the samples from the 75% point onward are examined, in order to
    % let all transients die out.
    V_p = Voutp(:,2);
    V_n = Voutn(:,2);
    length_p = length(V_p);
    length_n = length(V_n);
    Voltage_swing(a0) = max(V_p(floor(length_p*0.75):length_p)) - min(V_n(floor(length_n*0.75):length_n));
end

% This ends the file I/O section
fclose('all');

% The output voltage swings are converted to gain values (output swing divided
% by input swing) and put into a matrix so that they can be plotted
input_swing = 0.7;
gain_matrix = zeros(15, 24);
count = 1;
for a = 1:15
    for b = 1:24
        gain_matrix(a,b) = Voltage_swing(count)/input_swing;
        count = count + 1;
    end
end

% This code plots the values of the gain on a 3D plot
mesh(gain_matrix)
axis([1 24 1 15 0 12])
set(gca, 'XTick', [4; 8; 12; 16; 20; 24])
set(gca, 'XTickLabel', {'8', '16', '24', '32', '40', '48'})
set(gca, 'YTick', [1; 3; 5; 7; 9; 11; 13; 15])
set(gca, 'YTickLabel', {'1', '2', '3', '4', '5', '6', '7', '8'})
set(gca, 'projection', 'perspective')
axis tight;
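The gain calculation described at the start of this appendix (output swing divided by input swing, arranged on the grid of swept parameters) can be summarized in a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the settling fraction of 0.75 mirrors the index used in the listing, and the sweep results are assumed to be ordered row by row.

```python
def output_swing(v_pos, v_neg, settle_fraction=0.75):
    """Swing measured over the samples after `settle_fraction` of the
    record, so that start-up transients are ignored."""
    start = int(len(v_pos) * settle_fraction)
    return max(v_pos[start:]) - min(v_neg[start:])


def gain_grid(swings, input_swing, rows, cols):
    """Convert a flat list of output swings (one per parametric run,
    ordered row by row) into a rows x cols grid of gain values."""
    if len(swings) != rows * cols:
        raise ValueError("expected one swing per parametric simulation")
    return [[swings[r * cols + c] / input_swing for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    swing = output_swing([0.0, 0.0, 0.0, 1.2], [5.0, 5.0, 5.0, 0.5])
    print("swing:", swing)
    print("gain grid:", gain_grid([0.7, 1.4, 2.1, 2.8], 0.7, 2, 2))
```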
[1] A. Hajimiri, Sotirios Limotyrakis, and T. H. Lee. “Jitter and Phase Noise in Ring Oscillators”.
IEEE J. Solid-State Circuits , 34:790–804, June 1996.
[2] Jan M. Rabaey.
“Digital Integrated Circuits: A Design Perspective” . Prentice Hall, 1996.
[3] S. Kiaei, S.-H Chee, and D. Allstot. “CMOS Source-Coupled Logic for Mixed-Mode VLSI”.
Custom Integrated Circuits Conference , pages 1608 –1611, May 1990.
[4] D.J. Allstot, G. Liang, and H.C. Yang. “Current-Mode Logic Techniques for CMOS Mixed-
Mode ASICs”.
IEEE International Symposium on Circuits and Systems , pages 25.2/1–
25.2/4, May 1991.
[5] J. Kundan and S.M.R. Hasan. “Enhanced Folded Source-Coupled Logic Technique for Low-
Voltage Mixed-Signal Integrated Circuits”.
IEEE Transactions on Circuits and Systems II ,
47:810 –817, 2000.
[6] M. Yamashina and H. Yamada. “An MOS Current Mode Logic (MCML) Circuit for Low-
Power Sub-Gigahertz Processors”.
IEICE Trans. Electron.
, E75-C:11811187, October 1992.
[7] S. Butala Anand and B. Razavi. “A CMOS Clock Recovery Circuit for 2.5-Gb/s NRZ data”.
IEEE J. Solid-State Circuits , 36:432–439, March 2001.
83
Bibliography 84
[8] L. Wu, H. Chen, S. Nagavarapu, R. Geiger, E. Lee, and W. Black. “A Monolithic 1.25Gbits/sec CMOS Clock/Data Recovery Circuit for Fibre Channel Transceiver”.
Proc IEEE Int.
Symposium on Circuits and Systems , 2:565–568, 1999.
[9] H. Nosaka, K. Ishii, T. Enoki, and T. Shibata. “A 10-Gb/s Data-Pattern Independent Clock and Data Recovery Circuit With a Two-Mode Phase Comparator”.
IEEE J. of Solid-State
Circuits , 38(2):192–197, February 2003.
[10] Joonsuk Lee and Beomsup Kim. “A Low-Noise Fast-Lock Phase-Locked Loop With Adaptive
Bandwidth Control”.
IEEE J. of Solid-State Circuits , 35(8):1137–1145, August 2000.
[11] ed. B. Razavi.
“Monolithic Phase-Locked Loops and Clock Recovery Circuits” . IEEE Press,
Piscataway, NJ, 1996.
[12] J. Savoj and B. Razavi.
“High-Speed CMOS Circuits for Optical Receivers” . Kluwer, 2001.
[13] Jafar Savoj and B. Razavi. “Design of Half-Rate Clock and Data Recovery Circuits for
Optical Communication Systems”. In Design Automation Conference , pages 121–126, 2001.
[14] J. D. H. Alexander. “Clock Recovery from Random Binary Data”.
Electronics Letters ,
11:541–542, October 1975.
[15] Rick Walker. “Clock and Data Recovery for Serial Digital Communication”.
Hewlett-Packard
Company .
[16] T. Lee.
“The Design of CMOS Radio Frequency Integrated Circuits” . Cambridge University
Press, 1998.
[17] J.C. Scheytt, G. Hanke, and U. Langman. “A 0.155, 0.622, and 2.488 Gb/s Automatic Bit
Rate Selecting Clock and Data Recovery IC for Bit Rate Transparent SDH-Systems”.
IEEE
International Solid-State Circuits Conference , pages 348–349, February 1999.
Bibliography 85
[18] T. Weigandt.
“Low-Phase-Noise, Low-Timing-Jitter Design Techniques for Delay Cell Based
VCOs and Frequency Synthesizers” . PhD thesis, UC Berkeley, 1998.
[19] Justin Redd. “Synch and Clock Recovery - An Analog Guru Looks at Jitter”.
2001 , www.planetanalog.com/story/OEG20010827S0037.
[20] T. H. Lee and A. Hajimiri. “Oscillator Phase Noise: A tutorial”.
IEEE J. Solid-State
Circuits , 35:326–336, March 2000.
[21] Agilent Technologies.
“Jitter Analysis Techniques for High Data Rates”.
2003 , cp.literature.agilent.com/litweb/pdf/5988-8425EN.pdf.
[22] Maxim Integrated Products. “Converting between RMS and Peak-to-Peak Jitter at a Specified BER”.
2000 , pdfserv.maxim-ic.com/arpdf/AppNotes/3hfan402.pdf.
[23] J. Hughes, J. Coughlin, R. Harbott, T. van Den Hurk, and B. van de Bergh. “A 3Gb ECL multiplexer”.
IEEE International Solid-State Circuits Conference , 22:40–41, February 1979.
[24] P. Gray, P. Hurst, S. Lewis, and R. Meyer.
“Analysis and Design of Analog Integrated
Circuits” . John Wiley Sons, New York, 2001.
[25] Johns and Martin.
“Analog Integrated Circuit Design” . John Wiley Sons, New York, 1997.
[26] Stephen Docking. “A Method to Derive an Equation for the Oscillation Frequency of a Ring
Oscillator”. Master’s thesis, University of Waterloo, 2002.
[27] M. Alioto, G. Palumbo, and S. Pennisi. “Delay estimation of SCL gates with output buffer”.
In IEEE International Conference on Circuits and Systems , volume 2, pages 719 –722,
September 2001.
[28] Massimo Alioto and Gaetano Palumbo. “Design Strategies for Source Coupled Logic Gates”.
IEEE Transactions on Circuits and SystemsI , 50:640–654, May 2003.
Bibliography 86
[29] M. Anis, M. Allam, and M. Elmasry. “Impact of technology scaling on CMOS logic styles”.
Circuits and Systems II , 49(8):577–588, August 2002.
[30] M. Anis and M. Elmasry. “Power reduction via an MTCMOS implementation of MOS current mode logic”.
ASIC/SOC Conference , pages 193–197, September 2002.
[31] B. Razavi. “Challenges in the Design of High-Speed Clock and Data Recovery Circuits”.
IEEE Communications Magazine , 40(8):94–101, August 2002.
[32] C. Hogge. “A Self Correcting Clock Recovery Circuit”.
Journal of Lightwave Technology ,
3(6):1312–1314, December 1985.
[33] Hanjun Jiang, Chengming He, Degang Chen, and R. Geiger. “Optimal Loop Parameter
Design of Charge Pump PLLs for Jitter Transfer Characteristic Optimization”.
Midwest
Symposium on Circuits and Systems , 1:344–347, 2002.
[34] K. Kishine, N. Ishihara, K. Takiguchi, and H. Ichino. “A 2.5-Gb/s Clock and Data Recovery
IC With Tunable Jitter Characteristics for use in LANs and WANs”.
IEEE J. Solid-State
Circuits , 34:805–812, June 1999.
[35] M. Ramezani and C.A.T. Salama. “Jitter Analysis of a PLL-based CDR With a Bang-Bang
Phase Detector”.
Midwest Symposium on Circuits and Systems , 3:393–369, 2002.
[36] Yuriy M. Greshishchev. “Clock and Data Recovery ICs for SONET Application”.
VLSI
Circuits Symposium Short Course , June 2000.
[37] D. Ham and A. Hajimiri. “Concepts and Methods in Optimization of Integrated LC VCOs”.
IEEE J. Solid-State Circuits , 36:896–909, June 2001.
[38] M. Tiebout. “Low-power low-phase-noise differentially tuned quadrature VCO design in standard CMOS”.
IEEE J. Solid-State Circuits , 36:1018–1024, 2001.