Serializer and Deserializer (SerDes) for High Speed Serial

SerDes Transceivers for High-speed Serial Communications
Dianyong Chen, Shoujun Wang, and Tad Kwasniewski
1 Introduction
In order to process and redistribute digital information, data
are constantly exchanged between different systems and also
between different functional blocks inside a system. Serial
communications and parallel communications currently and
historically coexist and serve various requirements of
intrasystem and intersystem data exchange. In parallel
communications several symbols are sent at one time over a
communications link, while in serial communications only one
symbol is sent at one time. The choice of one method over
another is usually a tradeoff on factors such as speed, cost of
materials, power consumption, and difficulty of physical
realization. In modern telecommunication systems and
computer systems, parallel communications and serial
communications are often used simultaneously. Therefore,
serializing and deserializing data streams is an important task.
2 SerDes Transceiver for High-Speed Serial Communications
In principle, parallel communications are intrinsically faster
than serial communications, because the speed of a parallel
data link is equal to the number of symbols sent in parallel
times the symbol rate of each individual path; doubling the
number of symbols sent at once doubles the data rate. For this
reason, parallel communications are widely used in internal
buses of integrated circuits and short-distance chip-to-chip
links. However, counterintuitively, parallel
communications are being replaced by serial communications
in high-speed data links. These links include chip-to-chip
communications on backplanes, computer networks, computer
peripheral buses, and long-haul communications. A
conventional reason to choose serial communications instead
of parallel communications is cost. In long-haul
communication systems, copper cables or optical fibers are
expensive; doubling the number of cables or fibers doubles the
cost. In chip-to-chip communications, parallel data links
require more pins, which increases the cost of packaging.
According to [1], packaging already represents 25% of the total
system cost in some electronic products. Nevertheless, the
main challenges that work against parallel communications in
these applications are clock skew [2], [3], data skew, and
crosstalk. Skew is the difference in arrival time of symbols
transmitted at the same time. Symbols are basically
electromagnetic pulses. Because no electromagnetic wave can
travel faster than light in free space, the time it takes for a
signal to travel from the transmitter to the receiver is
determined by the length of the electrical or optical trace
and the group velocity of the signal. Although the difference
in arrival time of signals along different paths is usually very
small, it can lead to a considerable phase difference in
high-speed data links, since the frequency is very high. For example,
a 1 cm path difference causes a 240 degree phase
difference for a 10 GHz clock signal traveling at half the
free-space speed of light. Capacitive coupling,
component delay, and process, voltage, and temperature (PVT)
variation worsen clock skew and data skew. Because a clock
signal is periodic, clock skew can be corrected by a
delay-locked loop (DLL) composed of a variable delay line (VDL)
and a control loop [4]. In principle, data
skew can also be corrected. However, due to the large number
of links and the analog nature of the received signals, data skew is
much more troublesome in parallel data links. As a
consequence, the system has to slow down to wait for the path
with the largest delay. Crosstalk is the interference between
adjacent data links. When the data rate and the number of links
increase, crosstalk also tends to increase. In addition,
connectors and vias break the continuity of electromagnetic
fields and increase the chance of crosstalk [5].
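The skew example in this paragraph can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function name `skew_phase_degrees` is hypothetical, and the numbers are the ones quoted in the text (1 cm, 10 GHz, half the free-space speed of light).

```python
C0 = 299_792_458.0  # speed of light in free space, m/s


def skew_phase_degrees(path_difference_m, frequency_hz, velocity_m_per_s):
    """Phase difference in degrees caused by a path-length mismatch."""
    delay_s = path_difference_m / velocity_m_per_s  # extra time of flight
    return 360.0 * frequency_hz * delay_s           # delay as a fraction of one period


# 1 cm mismatch, 10 GHz clock, propagation at half the free-space speed of light
phase = skew_phase_degrees(0.01, 10e9, 0.5 * C0)
print(f"phase difference: {phase:.0f} degrees")
```

Because the phase grows linearly with both frequency and path mismatch, the same 1 cm error that is harmless at 100 MHz (about 2.4 degrees) consumes two thirds of a clock period at 10 GHz.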
Therefore, in high-speed data links, serial communications
are rapidly replacing parallel communications. High-speed
serial data links include backplane links such as PCI Express,
computer networking such as Ethernet, computer-to-peripheral
links such as USB, multimedia interfaces such as HDMI,
computer-to-storage interfaces such as Serial ATA and Serial
Attached SCSI, and high-speed telecommunications such as
SONET and SDH.
On the other hand, internal buses of integrated circuits and
short-distance chip-to-chip data links use parallel
communications to increase data transfer rate and signal
processing speed. In addition, large amounts of data are usually
stored in slow devices such as RAM, which have to be accessed in
parallel to achieve high speed. A SerDes transmitter serves to
transmit those parallel data to the receiver through a high-speed
serial data link; the SerDes receiver receives data from
the serial data link and delivers parallel data to the next-stage
electronic circuits for further signal processing. A simplified
SerDes transceiver is shown in Fig.2.1.
[Figure: transmit path with Data Source, Source Encoder, Channel Encoder, PISO, Tx Buffer, and Physical Channel; receive path with Physical Channel, Rx Buffer, Bit Detection, SIPO, Channel Decoder, Source Decoder, and Data Sink; a Clock block on each side]
Fig. 2.1. A block diagram of a simplified SerDes transceiver
The data source is basically a set of user information to be
transmitted. It may be files, audio or video streams, etc. The
data source usually has some kind of parallelism, and the unit
is often a byte, a word, or a double word. Before transmission,
source encoding is usually performed. Tasks in source
encoding include framing, synchronization (SYNC) patterning,
forward error correction (FEC) encoding, etc. These tasks
sometimes involve very complicated algorithms, and
parallelism must be used to increase processing speed. The
channel encoding is usually arranged in such a way that after
channel encoding the data spectrum becomes a better fit to the
physical channel and the bit detection in the receiving end
becomes easier. For example, the prevailing 8B/10B channel
coding achieves a maximum run length of 5 and maximum
digital sum variation of 6 [6]. The output of the channel
encoder is then fed to a parallel-in-serial-out (PISO) block to
generate a serial symbol stream. Symbols in the stream may be
binary or multilevel. The representation of symbols is
sometimes called line code, or signaling, or channel pulse
modulation. Typical binary line codes are non-return-to-zero
(NRZ), non-return-to-zero-inverted (NRZI), Manchester code
(return-to-zero), etc. Typical multilevel line codes are duobinary (3 levels), 4 level pulse amplitude modulation (PAM-4),
etc. The serial symbol stream is sent to a transmitter driver, or
Tx buffer, which converts it to proper electrical or optical
pulses that can travel through the physical channel to the
receiving end. The PISO and Tx buffer are usually controlled
by a symbol-rate (or baud-rate) clock. In very high-speed serial
data links, this clock is generated by a clock multiplier unit
(CMU) from a crystal oscillator. In some applications, a Tx
equalizer (pre-emphasis) is implemented before the Tx buffer
to counteract some channel impairments.
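As an illustration of the binary line codes mentioned above, the following sketch encodes and decodes NRZI and measures the longest run of identical levels, which is the quantity a channel code such as 8B/10B bounds. It assumes the "toggle on 1" NRZI convention (some standards toggle on 0 instead), and all function names are hypothetical.

```python
def nrzi_encode(bits, initial_level=0):
    """NRZI, assuming the line level toggles for a 1 and holds for a 0."""
    level = initial_level
    levels = []
    for bit in bits:
        if bit == 1:
            level ^= 1          # a 1 is sent as a transition
        levels.append(level)    # a 0 keeps the previous level
    return levels


def nrzi_decode(levels, initial_level=0):
    """Recover bits by comparing each level with the previous one."""
    previous = initial_level
    bits = []
    for level in levels:
        bits.append(1 if level != previous else 0)
        previous = level
    return bits


def max_run_length(levels):
    """Longest run of identical consecutive levels (0 for an empty stream)."""
    if not levels:
        return 0
    longest = run = 1
    for a, b in zip(levels, levels[1:]):
        run = run + 1 if a == b else 1
        longest = max(longest, run)
    return longest


bits = [1, 0, 1, 1, 0, 0, 0, 1]
levels = nrzi_encode(bits)
print(levels, nrzi_decode(levels) == bits, max_run_length(levels))
```

Long runs carry no transitions, so the run-length figure is a direct measure of how long the receiver must hold its timing without new edge information.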
There are many sources of impairments in the physical
channel. Those impairments include frequency dependent
attenuation, frequency dependent delay (dispersion), crosstalk,
reflection, and noise. Those impairments may be nonlinear
and time-variant. However, in engineering practice, a linear
time-invariant mathematical model describing the transceiver is
still very useful. Fig.2.2 shows a linear mathematical model of
SerDes transceivers.
[Figure: input sequence ak, pulse modulator c(t), transmission line f(t), summing node adding noise n(t), received signal r(t), sampled sequence rk]
Fig. 2.2. A linear mathematical model describing the transceiver
The input data stream ak is the output of the PISO. It is a
digital sequence which is discrete both in time and amplitude.
When NRZ signaling is used, ak ∈ {0,1} or ak ∈ {-1,+1}. The
sampling rate is the baud rate fbaud or clock frequency. The
received signal r(t) is a continuous-time analog signal:
$$r(t) = \sum_{i=-\infty}^{\infty} a_i \, h(t - iT) + n(t) \tag{2.1}$$
where h(t) is the time domain channel impulse response. Here
the channel includes the pulse shaping modulator of the
transmitter c(t), the physical channel response f(t), and the Tx
equalizer response and Rx equalizer response if they are
involved. The receiver uses a clock to sample the received
signal and decide which symbol was received:
$$r_k = r(kT + \tau) = \sum_{i=-\infty}^{\infty} a_i \, h\big((k - i)T + \tau\big) + n(kT + \tau) \tag{2.2}$$
The input data sequence ak is usually quite random. Assuming
there is no additive noise n(t), the received signal rk can be
rewritten as
$$r_k = a_k \, h(\tau) + \left( \sum_{i=-\infty}^{k-1} + \sum_{i=k+1}^{\infty} \right) a_i \, h\big((k - i)T + \tau\big) \tag{2.3}$$
It can be seen from equation (2.3) that the sampling phase τ
is very important. If we sampled at a phase where h(τ) is zero,
it would be very difficult to recover data. It is desirable to
sample where h(τ) is at its maximum. The second part of
equation (2.3) depends on the sequence before and after ak. It
is rather unpredictable because the input sequence is random.
This part is usually called inter-symbol interference (ISI). It
has been shown that if the channel response is a Nyquist N-1
pulse, the ISI can be eliminated completely [7]. A Nyquist N-1
pulse is defined as
$$h(nT + \tau) = \begin{cases} 1, & \text{when } n = 0 \\ 0, & \text{when } n \neq 0 \end{cases} \tag{2.4}$$
where n is an integer. However, the response of a physical
channel is usually not a Nyquist N-1 pulse, especially when
there is little excess bandwidth. The task of an equalizer is to
shape the combined channel response into a Nyquist N-1
pulse. Unfortunately, this is not always possible if noise is
taken into account. A Nyquist N-1 pulse has a flat spectrum
across the passband. The additive noise is approximately white. A
physical channel response, however, tends to have large
attenuation at high frequencies. It may even have zeros at some
frequencies because of reflection and resonance of parasitics.
If the target channel response were a Nyquist N-1 pulse, the
equalizer would need enormous gain at frequencies with large
attenuation (or even infinite gain at zeros). As a consequence,
the noise would be boosted much more than the signal at those
frequencies. This is called noise enhancement. In practice,
linear least-mean-square (LMS) equalization or
nonlinear decision feedback equalization (DFE) is used.
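The split of equation (2.3) into a wanted term and an ISI term can be illustrated numerically. The sketch below works on the sampled pulse response only, with `taps[cursor + n]` standing for h(nT + τ); the tap values are invented for illustration, noise is ignored, and symbols outside the finite sequence are treated as zero.

```python
def received_samples(symbols, taps, cursor):
    """r_k = sum_i a_i * h((k - i)T + tau), with taps[cursor + n] = h(nT + tau)."""
    samples = []
    for k in range(len(symbols)):
        total = 0.0
        for i, a in enumerate(symbols):
            index = cursor + (k - i)
            if 0 <= index < len(taps):      # taps outside the list are zero
                total += a * taps[index]
        samples.append(total)
    return samples


def isi_terms(symbols, taps, cursor):
    """Second part of equation (2.3): r_k minus the wanted term a_k * h(tau)."""
    samples = received_samples(symbols, taps, cursor)
    return [r - a * taps[cursor] for r, a in zip(samples, symbols)]


def satisfies_eq_2_4(taps, cursor, tol=1e-12):
    """True if the sampled pulse is 1 at n = 0 and 0 at every other n."""
    return all(abs(t - (1.0 if j == cursor else 0.0)) <= tol
               for j, t in enumerate(taps))


symbols = [+1, -1, -1, +1, +1]   # NRZ symbols a_k
ideal = [0.0, 1.0, 0.0]          # meets equation (2.4)
lossy = [0.1, 0.7, 0.3]          # assumed pre-cursor, cursor, post-cursor values

print(isi_terms(symbols, ideal, cursor=1))   # all zero
print(isi_terms(symbols, lossy, cursor=1))   # data-dependent ISI
```

With the lossy pulse the ISI at each instant depends on the neighbouring symbols, which is why it looks random for random data.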
Ironically, there is no signal power at the clock frequency,
although the sampling clock is so important to data recovery.
Most high-speed serial data links are baseband systems despite
the very high baud rate. Fiber-optic systems such as
SONET are indeed modulated systems. However, their
modulation and demodulation methods are non-coherent,
so they are usually treated as baseband systems. In Fig.2.2
the pulse shaping modulator is usually a hold function of one
symbol period. The power spectral density (PSD) of the
received signal is expressed as
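$$P_r = A(e^{j\omega T}) \cdot \left| C(e^{j\omega T}) \cdot F(e^{j\omega T}) \right|^2 \tag{2.5}$$
where A is the PSD of the input sequence, and C and F are the
Fourier transforms of c(t) and f(t). Since c(t) is just a hold
function of one symbol period, its Fourier transform is
$$C(e^{j\omega T}) = T \cdot \mathrm{sinc}(\pi f T) = \frac{1}{f_{baud}} \cdot \mathrm{sinc}\!\left( \frac{\pi f}{f_{baud}} \right) \tag{2.6}$$
where sinc(x) = sin(x)/x.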
Equations (2.5) and (2.6) show that the received signal power
is zero at nonzero integer multiples of the clock frequency. Therefore,
it is not possible to recover the clock information with a linear
method. However, in high-speed serial data links, there is no
dedicated clock signal. The clock signal has to be derived from
the received data waveform.
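The spectral nulls claimed here follow directly from the sinc shape of the hold pulse. A minimal check, assuming sinc(x) = sin(x)/x and an illustrative 10 Gbaud link (the function name is hypothetical):

```python
import math


def hold_pulse_spectrum(f_hz, f_baud_hz):
    """|C(f)| for a one-symbol hold pulse: T * |sin(pi f T) / (pi f T)|."""
    T = 1.0 / f_baud_hz
    x = math.pi * f_hz * T
    if x == 0.0:
        return T                    # limit of sin(x)/x as x -> 0
    return T * abs(math.sin(x) / x)


F_BAUD = 10e9  # assumed baud rate, Hz

for multiple in (0.0, 0.5, 1.0, 1.5, 2.0):
    magnitude = hold_pulse_spectrum(multiple * F_BAUD, F_BAUD)
    print(f"f = {multiple:3.1f} x f_baud: |C| / T = {magnitude * F_BAUD:.3e}")
```

The values at 1 x and 2 x the baud rate are zero up to floating-point rounding, while the spectrum is still substantial at half the baud rate.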
A clock and data recovery (CDR) circuit is a nonlinear
circuit that can recover the clock from the received data
waveform, which has zero power at the clock frequency. The
tasks of a CDR are roughly as follows [8]:
• Generate a clock whose frequency is the baud rate.
• Adjust the phase of the clock so that it samples the
received data waveform at the time instants when the
waveform has maximum signal-to-noise ratio (SNR).
• Recover data with high accuracy in the presence of noise
and ISI.
Since there is no signal power at the clock frequency and its
harmonics, a CDR can lock to the clock frequency or to one of
its harmonics. It cannot tell which one is correct without
some prior knowledge of the data sequence. One method to
prevent this is standardization. For example, a single-rate CDR
for SONET STS-48 should lock to 2488.32 MHz. Therefore, a
predefined reference clock can be used. Another method is to
use prior knowledge of the shortest run length and the longest
run length of the data source. A counter is used to detect the
run length of a sequence. If the detected shortest run length is
shorter than the predefined shortest run length, the clock is
slower than the data rate; if the detected longest run length is
longer than the predefined longest run length, the clock is
faster than the data rate [9].
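The run-length rule can be sketched as a small behavioural model. This is not the circuit of [9]; it is a simplified illustration in which the data are ideal, the clock-to-data frequency ratio is an input, and the run-length limits (2 and 5) and all function names are assumptions. The first and last runs are discarded because they may be truncated by the observation window.

```python
def bits_from_runs(run_lengths, first_level=1):
    """Build a bit stream from a list of run lengths with alternating levels."""
    bits, level = [], first_level
    for n in run_lengths:
        bits.extend([level] * n)
        level ^= 1
    return bits


def sample_with_clock(bits, clock_to_data_ratio):
    """Sample an ideal data waveform with a clock of the given relative frequency."""
    n_samples = int(len(bits) * clock_to_data_ratio)
    last = len(bits) - 1
    return [bits[min(int(k / clock_to_data_ratio), last)] for k in range(n_samples)]


def interior_run_lengths(samples):
    """Run lengths of the sampled stream, without the first and last runs."""
    runs, count = [], 1
    for previous, current in zip(samples, samples[1:]):
        if current == previous:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs[1:-1]


def classify_clock(samples, min_run, max_run):
    """Apply the rule from the text: short runs mean a slow clock, long runs a fast one."""
    runs = interior_run_lengths(samples)
    if not runs:
        return "unknown"
    if min(runs) < min_run:
        return "slow"
    if max(runs) > max_run:
        return "fast"
    return "consistent"


data = bits_from_runs([2, 5, 3, 2, 4, 5, 2, 3])   # run lengths between 2 and 5
for ratio in (0.5, 1.0, 2.0):
    print(ratio, classify_clock(sample_with_clock(data, ratio), min_run=2, max_run=5))
```

With a small frequency offset the violation appears only after many symbols, so a practical detector has to observe a long sequence before deciding.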
Implementing a SerDes transceiver in a monolithic
integrated circuit involves many more considerations
than the issues mentioned above. In this chapter, we review
representative implementations of the main building blocks of
SerDes transceivers for high-speed serial data links.
3 Design challenges
It is desirable to implement SerDes transceivers in
mainstream CMOS technologies because of their low cost and
low power consumption. CMOS circuits are typically slower
than circuits implemented in non-mainstream technologies
[10]. Although CMOS circuits keep getting faster
and their power consumption keeps falling
as technology scales down, new circuit styles and architectures
enabling low power and high speed are still critical for
high-speed serial data links.
SerDes transceivers are predominantly mixed signal circuits.
With the scaling down of feature size, the supply voltage and
threshold voltage drop accordingly, leaving less voltage
headroom for signal processing. In the meantime, substrate
noise tends to increase [11]. Furthermore, most recent
technologies are usually optimized for digital baseband signal
processing and lack accurate models for analog signal
processing. Therefore, mixed-signal processing is usually
vulnerable to PVT variations. Many recent SerDes
transceiver designs attempt to replace analog blocks with digital
ones to make the transceiver more robust against PVT
variations and noise.
4 Circuit Styles
A high-speed SerDes transceiver is a mixed-signal system as
shown in Fig.2.2. On the one hand, it takes a digital sequence
ak from the host and passes a digital sequence rk to the client,
performing digital signal processing when necessary; on
the other hand, it must be able to drive the physical channel.
The physical channel distorts the digital signals that travel
through it and adds noise to them. The received signals become
analog. Data recovery relies on a locally regenerated clock and
proper sampling. In addition, a high-speed SerDes transceiver
is usually a subsystem of a larger system and/or is used in
portable devices, so low power consumption is critical.
The circuit style must therefore be tailored to meet these
requirements.
4.1 High-Speed and Low-Power Digital Circuits
In a high-speed SerDes transceiver, not only source
encoder/decoder and channel encoder/decoder are digital
circuits, but also phase detector, phase frequency detector, and
frequency divider are usually digital ones. Some of these
digital circuits run at baud-rate. Therefore, digital circuits
must be sufficiently fast while keeping power consumption
sufficiently low.
The main factor that limits the speed of a digital circuit is the
capacitive load, including parasitic capacitance. The voltage
change on a capacitor can be written as
$$\Delta V = \frac{\Delta Q}{C} = \frac{1}{C} \int_{t_0}^{t_0 + T} i(t)\, dt = \frac{I_a \cdot T}{C} \tag{4.1}$$
where C is the total load capacitance, T is the time it takes to
change the voltage on the load capacitor by an amount
ΔV, and Ia is the average current that charges or discharges the
load capacitor. To make a circuit fast, the time T must be
sufficiently small.
$$T = \frac{\Delta V \cdot C}{I_a} \tag{4.2}$$
It is quite straightforward to make a circuit faster:
reduce the voltage swing ΔV, and/or make the load
capacitance C smaller, and/or increase the charging current Ia.
Many logic families exploit these fundamental methods to
make digital circuits faster. For example, pseudo-nMOS and
domino logic exclude pMOS capacitance from the input,
because pMOS input capacitance is usually 2 to 3 times as large
as nMOS input capacitance if they provide the same current.
Technology scaling reduces capacitance as well.
4.1.1 Static Rail-to-Rail CMOS Logic
Static CMOS rail-to-rail logic is by far the most commonly
used type of logic circuit. Despite the very high speeds involved,
CMOS rail-to-rail logic is still extensively used in some high-speed
SerDes transceivers. The reasons are technology scaling,
reduced power supply voltage, and the simplicity, maturity, and
robustness of static CMOS logic. Static CMOS logic has a
pull-up network and a pull-down network. At any time except
during transitions, either the pull-up network is turned on to pull
the output to the power supply voltage or the pull-down network is
turned on to pull the output down to ground. Since the pull-up and
pull-down networks cannot be turned on simultaneously except during
transitions, in principle static CMOS logic consumes zero static
power. Therefore, static CMOS logic exhibits extremely low
power consumption in low-frequency applications.
The speed and power consumption of static CMOS logic in
high-speed applications can be roughly estimated by using two
serially connected inverters as shown in Fig.4.1.
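[Figure: two cascaded inverters (P1/N1 driving P2/N2) supplied from VDD, with nodes Vin, Vo1, and Vo2; equivalent circuit of the first stage with IP1, RP1, IN1, RN1, Cgp2, Cgn1, Parasitic 1, and Parasitic 2]
Fig.4.1. Two serially connected inverters and the equivalent circuit of the first stage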
Assume the initial input signal Vin is high, so that pMOS
transistor P1 is switched off, nMOS transistor N1 is switched
on, and the voltage Vo1 is low. Let us further assume the input
signal has very sharp edges and sufficiently large driving
capability, so that when Vin jumps from high to low, the time it
takes to turn on P1 and turn off N1 is negligible. The voltage
Vo1 is pulled up to VDD through transistor P1. However, it
cannot change abruptly, as it has to charge the gate capacitance of
P2 and N2, as well as the parasitic capacitance from the four
transistors. As Vo1 increases, the magnitude of VDS of transistor
P1 decreases until the channel is no longer pinched off. Transistor
P1 then enters the triode region and the charging current
decreases. When Vo1
reaches VDD, the energy (W) stored in the gate capacitors and
parasitic capacitors (Ctotal) is
$$W = \frac{1}{2} C_{total} \cdot V_{DD}^2 = \frac{1}{2} \cdot \frac{(I_a \cdot T)^2}{C_{total}} \tag{4.3}$$
Since VDD charges the capacitors through the channel of P1,
some energy is dissipated in the channel resistance. When
the input Vin goes from low to high, pMOS transistor P1 is
switched off, and nMOS transistor N1 is switched on. Assume
this process is sufficiently fast that the power supply VDD is cut
off from the capacitors abruptly, so that the power
consumption caused by the short-circuit effect is negligible and
the supply provides no energy while the capacitors are
discharging. However, the stored energy W is completely
dissipated in the channel resistance of transistor N1 when the
capacitors are discharged to ground. The energy
consumption for an input cycle is the sum of W and the energy
dissipated in the charging process. The average power
consumption of an inverter can be estimated as
$$P = V_{DD} \cdot I_a \cdot T \cdot f = C_{total} \cdot V_{DD}^2 \cdot f \tag{4.4}$$
Some conclusions can be drawn from this simple
analysis. Firstly, static CMOS logic has to drive the input
capacitance of the pull-up network and the input capacitance
of the pull-down network simultaneously. The pull-up
network is composed of pMOS transistors and has larger
capacitance. According to equation (4.2), this slows down the
circuit because of the large capacitance. Secondly, static
CMOS realizes a rail-to-rail output. According to equation
(4.2), this also slows down the circuit because of the large
swing. According to equation (4.4), it also greatly increases the
power consumption because of the large swing. Thirdly,
according to equation (4.4), static CMOS logic consumes
much power at high frequencies because the power
consumption is proportional to the switching frequency. Last but
not least, static CMOS logic is sensitive to common-mode
noise because it is not differential. Therefore, high-speed
CMOS digital design favors current mode logic (CML).
An important observation from the two serially connected
inverters is that the output of the first inverter has a finite slew
rate. This differs from our previous assumption that the
input to an inverter has very sharp edges and sufficiently large
driving capability. Therefore, the second inverter will not switch
instantaneously, and additional delay is added. This realistic
consideration applies to all digital circuits.
4.1.2 CML Logic
CMOS CML logic is based on the differential pair
shown in Fig.4.2. At first glance, there are no
pMOS transistors. Therefore, it is potentially faster than static
CMOS logic. It is fully differential. Therefore, it has excellent
immunity to common-mode noise. When the input voltage Vin
is sufficiently large, one of the two branches is switched
off, while the other takes all the tail current I0. The minimum
input voltage can be derived using the following equations.
$$I_{1,2} = \frac{\mu C_{ox}}{2} \cdot \frac{W}{L} \cdot \left( V_{gs1,2} - V_{th} \right)^2 \tag{4.5}$$
$$I_1 + I_2 = I_0 \tag{4.6}$$
$$V_{in} = V_{gs1} - V_{gs2} \tag{4.7}$$
Solving equations (4.5) to (4.7) leads to an expression for I1 (or
I2) [F.3]. The minimum voltage that fully switches the
differential pair is the input voltage at which this current equals I0:
$$\min(|V_{in}|) = \sqrt{\frac{2 I_0}{\mu C_{ox} (W/L)}} \tag{4.8}$$
The voltage swing is
$$\Delta V = V(i = 0) - V(i = I_0) = R I_0 \tag{4.9}$$
That is, the swing is the product of the load resistance and
the tail current. Therefore, it is possible to reduce the voltage
swing to improve the speed of the circuit. However, excessive
reduction of the voltage swing reduces the noise margin. In
addition, the output may no longer be able to fully switch the
next-stage differential pair.
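The interaction between equations (4.8) and (4.9) can be made concrete with assumed device values: a 1 mA tail current, a process transconductance µCox of 200 µA/V², W/L = 40, and a 400 Ω load. None of these numbers come from the text, and the function names are hypothetical.

```python
import math


def min_switching_voltage(i_tail, mu_cox, w_over_l):
    """Equation (4.8): smallest |Vin| that steers all of I0 into one branch."""
    return math.sqrt(2.0 * i_tail / (mu_cox * w_over_l))


def cml_swing(r_load, i_tail):
    """Equation (4.9): output swing of a fully switched stage."""
    return r_load * i_tail


v_min = min_switching_voltage(1e-3, 200e-6, 40)
swing = cml_swing(400.0, 1e-3)
print(f"min |Vin| = {v_min:.2f} V, swing = {swing:.2f} V")
print("fully switches an identical next stage:", swing >= v_min)
```

With these assumed numbers the swing falls short of the minimum switching voltage, which is exactly the situation the paragraph warns about; a wider device or a larger load resistance would restore full switching.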
[Figure: differential pair N1, N2 with load resistors R connected to VDD, tail current source I0, input Vin, and output Vout; plot of branch current versus Vin]
Fig.4.2. Differential pairs in CMOS CML Logic
Similar to static CMOS logic, the speed and power
consumption of CMOS CML logic can be estimated by using
two serially connected inverters. We again assume the input to
the first inverter has very sharp edges and sufficient driving
capability. Then, to a first-order approximation, the output of
the first inverter is essentially the step response of charging or
discharging a capacitor with a current source of finite internal
impedance. The change of the output voltage in a branch is
written as
$$\Delta V_{o1,2}(t) = \pm R I_0 \cdot \left( 1 - e^{-\frac{t}{RC}} \right) \tag{4.10}$$
where R is the load resistance and C is the load capacitance of
the first inverter. Fast switching relies only on a small RC.
However, to maintain the voltage swing when R is reduced, the
tail current I0 has to increase. The speed of differential pairs
can be enhanced by using the inductive peaking technique.
Physically this is quite
straightforward. When all the tail current is switched to one arm,
the additional current is provided by the load resistor and the
load capacitor. Since the voltage on a capacitor cannot change
abruptly, at the very beginning the additional current is almost
solely provided by the load capacitor and the output voltage
changes quickly. However, as the output voltage drops, the
current provided by the load resistor increases and the current
provided by the load capacitor decreases. The slew rate of the
output voltage drops. The inductive peaking technique connects an
inductor in series with the load resistor. The nature of an
inductor is that it always opposes a change of
current. Therefore, the current provided by the load resistor
decreases and the current provided by the load capacitor
increases, which helps to quickly drain the charge stored
on the load capacitor, leading to a fast change of the output
voltage.
The power consumption of a CMOS CML inverter can be estimated
to first order in a quite simple way:
$$P = V_{DD} \cdot I_0 \tag{4.11}$$
Obviously CML logic consumes static power. However, to a
first-order estimation, the power consumption is independent
of frequency. Therefore, CML is suitable for high-frequency
applications in terms of speed and power consumption.
4.2 Driving Circuits and Impedance Matching
Generally a high-speed SerDes transceiver must drive a
high-speed channel that is usually much longer than the
representative wavelength of the signal. If the impedance of
the driver does not match the characteristic impedance of the
channel, the driver is unable to deliver maximum power to the
channel because of reflection at the transmitter side; if the
characteristic impedance of the channel does not match the
impedance of the termination, the channel is unable to deliver
maximum power to the termination because of reflection at the
receiver side. If there is mismatch at both sides, then some
energy reflected from the termination will experience another
reflection at the transmitter side. It takes some time (T) for
this energy to complete the round trip, during which it suffers
some loss. When it comes back, it adds to the signal that is sent
a time T later. Therefore, impedance matching is important to
high-speed SerDes transceivers. However, this increases power
consumption because practical channels have low impedance.
Industrial efforts on these topics have led to many standards. Low
voltage differential signaling (LVDS) and CML are the two
most popular standards.
[Figure: LVDS driver supplied from VDD with current sources I0 and Ib, differential lines V+ and V-, and receiver]
Fig.4.3. A diagram of representative LVDS signaling
The main advantage offered by LVDS is its low voltage
swing of 250–400 mV, which allows high-speed interface
operation at a very low level of power consumption. In
addition, true differential signalling increases the interface’s
tolerance to ground mismatch between transmitter and
receiver. It also improves EMI immunity and
compliance [E.1]. Fig.4.3 shows a representative LVDS
configuration. At the transmitter side, the driver is configured
in a push-pull topology. A matched impedance is added in front
of the receiver buffer. In high-speed SerDes transceivers, the
signals traveling through the channels are broadband signals.
It becomes increasingly hard to achieve full-band impedance
matching when parasitics are considered. Therefore,
impedance matching at only one side cannot guarantee small
reflections at very high speeds. For this reason, impedance
matching is desirable at both the transmitter side and the
receiver side, and the topology of LVDS signaling becomes
very similar to that of CML signaling.
A representative configuration of CML signaling is shown
in Fig.4.4. Impedance matching is added to both the
transmitter and the receiver. Since CML consumes static
power, it is quite common to switch off the driver when it is not in use.
[Figure: CML driver and receiver with resistors R and RL tied to VDD, input Vin, and current sources I0 and I1]
Fig.4.4 A block diagram of CML signaling
If channel impairments are negligible and clock and data can
be recovered without receiver equalization, high-speed SerDes
receiver buffers can be constructed using nonlinear amplifiers
such as limiting amplifiers and sense amplifiers. If receiver
equalization is needed, then linear amplifiers are required.
5 PISO and SIPO
As discussed in section 2, at the transmitter side user data
are in parallel, so a PISO block is needed to convert these parallel
data into serial data that can be transmitted via a
high-speed channel. At the receiving end, a SIPO block is
needed to reduce the bit rate so that the back-end circuit can
perform further signal processing. Naturally,
SIPO has a tree-type structure and PISO has a reversed tree
structure.
[Figure: three PISO/SIPO topologies, each with a control logic block]
Fig.5.1 Typical PISO and SIPO topologies: (a) one stage, (b) heterogeneous, (c) binary tree
Typical topologies of PISO and SIPO are shown in Fig.5.1.
The one-stage structure is slow due to the large parasitic
capacitance at the converging node. The heterogeneous structure
is faster, and the binary-tree topology is the fastest. For this reason,
2:1 multiplexers (MUX) and 1:2 demultiplexers (DEMUX) are
important elements of high-speed SerDes transceivers.
In high-speed SerDes transceivers, there are usually tens of
input ports on the PISO and tens of output ports on the SIPO.
Therefore, in a binary-tree topology, some stages require very
high-speed circuits, while others do not.
For those stages requiring very
high speed, CML logic circuits are used, while static CMOS
logic circuits can be used for those stages not requiring very
high speed.
Shift registers are very useful for PISO and SIPO with a large
ratio. In a PISO, parallel data are loaded into the shift registers
when a select signal is enabled. The data are then shifted out
every baud clock cycle while the select signal is disabled.
In a SIPO, serial incoming data are sampled and shifted in on every
baud-rate clock cycle. A lower-frequency clock is used to sample the
output of each register. Therefore, the outputs of all registers
are synchronized.
In order to make PISO and SIPO work, a high-frequency
clock (the baud-rate clock) and a low-frequency clock (the clock for
parallel data) are needed. Therefore, a frequency divider and a
clock multiplier are needed.
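The shift-register and binary-tree serializers described in this section can be modelled behaviourally in a few lines. This is a bit-level sketch, not a circuit description: it assumes MSB-first ordering and a power-of-two number of lanes, and all function names are hypothetical.

```python
def piso(words, width):
    """Shift-register PISO: load one word, then shift it out MSB first."""
    stream = []
    for word in words:
        register = [(word >> (width - 1 - i)) & 1 for i in range(width)]  # parallel load
        for _ in range(width):
            stream.append(register.pop(0))   # one bit per baud-rate clock
    return stream


def sipo(stream, width):
    """Shift-register SIPO: shift bits in, latch a word every `width` baud clocks."""
    words, register = [], []
    for bit in stream:
        register.append(bit)
        if len(register) == width:           # edge of the low-frequency parallel clock
            value = 0
            for b in register:
                value = (value << 1) | b
            words.append(value)
            register = []
    return words


def interleave(a, b):
    """2:1 MUX: alternate between two half-rate streams."""
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def tree_piso(lanes):
    """Binary-tree PISO built from 2:1 MUXes; len(lanes) must be a power of two."""
    if len(lanes) == 1:
        return list(lanes[0])
    return interleave(tree_piso(lanes[0::2]), tree_piso(lanes[1::2]))


words = [0xA5, 0x3C]
assert sipo(piso(words, 8), 8) == words      # round trip through the serial link

nibbles = [0b1010, 0b0111]
lanes = [[(w >> (3 - i)) & 1 for w in nibbles] for i in range(4)]  # lane i carries bit i
print(tree_piso(lanes) == piso(nibbles, 4))
```

In the tree version only the final 2:1 stage toggles at the full baud rate; each earlier stage runs at half the rate of the one after it, which is why the slower stages can use static CMOS while the last ones need CML.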
6 Clock Multiplier Unit
In a high-speed SerDes transceiver, the high-speed clock is very
important. Usually at the transmitter side, each symbol is
generated under the control of a baud-rate clock; at the
receiver side, a clock whose frequency is the baud rate is
needed to sample the received data where the SNR is at its
maximum. Real implementations may vary in some aspects.
For example, the clock frequency can be lower than the baud
rate if a multi-phase clock is used. Even so, a very high
frequency clock is still needed in a multi-gigahertz
transceiver. Quite often the transmitter and the
receiver share a high-speed clock. The phase of this clock must
be adjusted at the receiver side, because the delay of the
channel is usually unknown a priori, and timing jitter and noise
are added to the received data through the channel, making the
sampling phase very critical. The quality of the high-speed
clock greatly influences the transceiver performance.
Therefore, it should be clean and accurate. A free-running
integrated oscillator cannot meet these
requirements. Therefore, the high-speed clock is synthesized
from an accurate low-frequency oscillator such as a quartz
oscillator whose frequency accuracy is within, e.g., 20 ppm.
Usually this frequency synthesizer is not required to generate a
clock of any frequency within a frequency band. Instead a
clock of a fixed frequency or a clock that can be programmed
to a few discrete frequencies is wanted. This synthesizer is called a
clock multiplier unit (CMU) in a high-speed SerDes transceiver. A
CMU can be a common integer-N frequency synthesizer. The
dominant and mature technique used in the design of a
CMU is the phase-locked loop (PLL).
6.1 Basic PLL-Based CMU Structure and Performance
A representative structure of a PLL-based CMU is shown in
Fig.6.1 (a). It is composed of a phase-detector, a charge-pump,
a loop filter, a voltage controlled oscillator (VCO), and a
divider. The charge-pump provides the necessary loop filter
action. In a classic PLL, the combination of the charge-pump
and the RC network is usually replaced with an operational
amplifier (op-amp). Although the loop is highly nonlinear in
practice, it is customary to assume linearity when analyzing
loops that have achieved lock [H.14]. A linearized model is
shown in Fig.6.1 (b). In the model all variables are phases rather
than the actual inputs and outputs.
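[Figure: (a) reference input, phase detector, charge pump and loop filter, VCO, clock output, and a ÷N divider in the feedback path; (b) linear model with input phase θin, phase error θe, loop filter term (τz s + 1)/s, VCO term KO/s, output phase θout, and ÷N feedback]
Fig.6.1 (a) a block diagram of a PLL-based CMU (b) a linear model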
6.1.1
Second-Order PLL Dynamics
A PLL circuit is a feedback system designed to bring the phase error signal $\phi_e$ to zero. A second-order PLL is a good choice for several reasons. The first is that, theoretically, a second-order PLL is unconditionally stable [F.1], whereas a higher-order PLL may become unstable. In practice, a second-order PLL is not absolutely stable, because a practical phase-detector and divider are sampled systems [H.14]. In addition, there are many parasitic poles. The second reason is that a first-order PLL cannot reduce the phase error $\phi_e$ to zero unless the forward loop gain is infinitely large. The closed-loop transfer function of a second-order PLL-based CMU is written as:
$$F(s) = \frac{\phi_{out}}{\phi_{in}} \cdot \frac{1}{N} = \frac{\tau_z s + 1}{s^2/(K_D K_O/N) + \tau_z s + 1}$$ (6.1)
where $1/\tau_z$ is the frequency of the loop zero. For convenience of analysis, we can omit N so that equation (6.1) is the closed-loop transfer function of a classic second-order PLL. The phase error can be written as:
$$\phi_e = \frac{(s/\omega_n)^2}{(s/\omega_n)^2 + s/\omega_z + 1}\,\phi_{in}$$ (6.2)
where $\omega_z = 1/\tau_z$ is the frequency of the loop zero and the natural frequency $\omega_n$ is defined as:
$$\omega_n = \sqrt{K_D K_O / N}$$ (6.3)
We can look into the properties of a second-order PLL by investigating its frequency-domain response and time-domain response.
Response to Input Variations and Input Noise
Equation (6.2) has two poles and a double zero at the origin, which represents a high-pass filter. The phase error $\phi_e$ is small if the frequency of the input variations is significantly lower than the natural frequency $\omega_n$. In other words, the difference between the input phase $\phi_{in}$ and the output phase $\phi_{out}$ is small, so the loop tracks the input variations. At frequencies much higher than the natural frequency, the phase error $\phi_e$ approaches the input phase $\phi_{in}$, which means the loop does not respond to the input; it rejects the input variations almost completely. A fundamental property of a second-order PLL is therefore that it tracks changes of the input at low frequencies but rejects them at high frequencies.
Jitter Peaking
Phase noise or timing jitter is an unwanted input variation. We generally want it to be attenuated by the loop. However, this contradicts the original purpose of forcing the VCO to track the input (reference) at low frequencies, so there is a tradeoff between the two. In fact, the design of a PLL is a tradeoff among many conflicting requirements. A desirable feature is that a PLL faithfully copies the input to the output at low frequencies but rejects the input at high frequencies. However, jitter peaking is an unwanted feature of a second-order PLL. The closed-loop transfer function of equation (6.1) has two poles and one zero. In a Bode plot, the magnitude rises with a slope of 20 dB per decade after the zero and exceeds unity if both poles are located at higher frequencies than the zero. After the first pole, which contributes -20 dB/dec, the magnitude flattens at a level above 0 dB. After the second pole the magnitude falls with a slope of -20 dB/dec and eventually drops below 0 dB. Jitter peaking appears in the frequency range where the magnitude exceeds one: input jitter within this range is amplified at the output.
Rejection of Noise
Input jitter (reference noise) is only one noise source of a second-order PLL. Every component in a real circuit generates noise. In many linear models, noise is treated as additive. It is useful to look at the transfer function from each noise source to the output to see whether the PLL attenuates or amplifies it. As equation (6.1) shows, the transfer function of the reference noise is of a lowpass nature but may exhibit jitter peaking in a specific frequency range. The transfer functions of phase-detector noise and charge-pump noise are similar to equation (6.1). The transfer function of the loop filter noise is
$$\frac{\phi_{out}}{\phi_{in}} = \frac{s/K_D}{s^2/(K_D K_O) + \tau_z s + 1}$$ (6.4)
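The frequency-domain behavior discussed in this section can be checked numerically. The sketch below evaluates the magnitude of equation (6.1), with N folded into the natural frequency, and of the loop-filter noise transfer function on a logarithmic frequency sweep. The natural frequency, the damping ratio, and the split of the loop gain between the two gain factors are assumed example values, and the function names are hypothetical.

```python
import math

def closed_loop_gain(w, wn, tau_z):
    """Magnitude of equation (6.1) at s = jw, with the divider ratio N folded into wn."""
    s = 1j * w
    return abs((tau_z * s + 1) / ((s / wn) ** 2 + tau_z * s + 1))

def loop_filter_noise_gain(w, kd, ko, tau_z):
    """Magnitude of the loop-filter noise transfer function (6.4) at s = jw."""
    s = 1j * w
    return abs((s / kd) / (s ** 2 / (kd * ko) + tau_z * s + 1))

# Assumed example values (not from the paper).
wn = 2 * math.pi * 1e6        # natural frequency, rad/s
zeta = 1.0                    # damping ratio
tau_z = 2 * zeta / wn         # from zeta = wn * tau_z / 2
kd, ko = 1.0, wn ** 2         # any split with kd * ko = wn**2

# Log-spaced sweep, three decades either side of wn.
sweep = [wn * 10 ** (k / 50) for k in range(-150, 151)]
peak_gain = max(closed_loop_gain(w, wn, tau_z) for w in sweep)
print(f"jitter peaking: {20 * math.log10(peak_gain):.2f} dB")
for w in (wn / 100, wn, wn * 100):
    print(f"loop-filter noise gain at {w / wn:g} x wn: "
          f"{loop_filter_noise_gain(w, kd, ko, tau_z):.3g}")
```

With these assumed values the closed-loop gain rises slightly above unity near the natural frequency, which is the jitter peaking described in the text, while the loop-filter noise gain is largest near the natural frequency and falls off on both sides.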
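The loop-filter noise transfer function exhibits a band-pass nature. A time-domain step response can reveal much information. Assuming this noise causes a frequency error of $\Delta\omega_i$, the time-domain phase error can be expressed as [H.14]
$$\phi_e(t) = \frac{\Delta\omega_i}{\omega_n\sqrt{\zeta^2-1}}\,\exp(-\zeta\omega_n t)\,\sinh\!\left(\omega_n\sqrt{\zeta^2-1}\,t\right)$$ (6.5)
where $\zeta$ is the damping ratio defined as
$$\zeta = \frac{\omega_n \tau_z}{2}$$ (6.6)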
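The maximum phase error $\phi_{max}$ appears at time $t_{max}$:
$$\phi_{max} = \frac{\Delta\omega_i}{\omega_n\sqrt{\zeta^2-1}}\,\exp\!\left(-\frac{\zeta}{\sqrt{\zeta^2-1}}\tanh^{-1}\frac{\sqrt{\zeta^2-1}}{\zeta}\right)\sinh\!\left(\tanh^{-1}\frac{\sqrt{\zeta^2-1}}{\zeta}\right)$$ (6.7)
$$t_{max} = \frac{1}{\omega_n\sqrt{\zeta^2-1}}\tanh^{-1}\frac{\sqrt{\zeta^2-1}}{\zeta}$$ (6.8)
In the case of high damping, they can be simplified to
$$\phi_{max} \approx \frac{\Delta\omega_i}{\omega_c}$$ (6.9)
$$t_{max} \approx \frac{2\ln(2\zeta)}{\omega_c}$$ (6.10)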
where $\omega_c$ is the crossover frequency (the frequency at which the open-loop transfer function is 0 dB). Intuitively, the VCO disturbance (jitter) is integrated over a period of time before the loop acts to correct it. Jitter integration is inversely proportional to the loop bandwidth (roughly the crossover frequency). Therefore, rejection of loop filter noise requires a high loop bandwidth. The transfer function for VCO noise is of a high-pass nature and is expressed analytically as
$$\frac{\phi_{out}}{\phi_{in}} = \frac{(s/\omega_n)^2}{(s/\omega_n)^2 + \tau_z s + 1}$$ (6.11)
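The transient expressions (6.5) to (6.10) can be compared numerically. The sketch below evaluates the exact peak of the phase error and the high-damping approximation. It assumes that for large damping the crossover frequency is approximately twice the damping ratio times the natural frequency, which is not stated in the text, and all numeric values and function names are illustrative only.

```python
import math

def phase_error(t, dw, wn, zeta):
    """Equation (6.5): phase error after a frequency step dw (overdamped, zeta > 1)."""
    r = math.sqrt(zeta ** 2 - 1)
    return dw / (wn * r) * math.exp(-zeta * wn * t) * math.sinh(wn * r * t)

def peak_exact(dw, wn, zeta):
    """Equations (6.7) and (6.8): peak phase error and the time at which it occurs."""
    r = math.sqrt(zeta ** 2 - 1)
    a = math.atanh(r / zeta)
    t_max = a / (wn * r)
    phi_max = dw / (wn * r) * math.exp(-zeta * a / r) * math.sinh(a)
    return phi_max, t_max

def peak_high_damping(dw, wn, zeta):
    """Equations (6.9) and (6.10), assuming wc ~= 2 * zeta * wn for zeta >> 1."""
    wc = 2 * zeta * wn
    return dw / wc, 2 * math.log(2 * zeta) / wc

# Assumed example: 1 kHz frequency step, 1 MHz natural frequency, damping ratio 5.
dw, wn, zeta = 2 * math.pi * 1e3, 2 * math.pi * 1e6, 5.0
print("exact      :", peak_exact(dw, wn, zeta))
print("high damping:", peak_high_damping(dw, wn, zeta))
```

For the assumed damping ratio the two results agree to within a few percent, and both the peak error and the time to reach it shrink as the crossover frequency increases, in line with the bandwidth argument above.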
The noise of a VCO is not white. In an LC resonator, the broadband thermal noise of the passive components is shaped by the resonator Q, and the normalized phase noise is inversely proportional to the square of the frequency offset $\Delta\omega$. When the frequency offset is sufficiently large, the noise spectrum flattens due to active elements (such as buffers). At sufficiently small frequency offsets, the phase noise spectrum possesses a $1/(\Delta\omega)^3$ region. Leeson has proposed an empirical equation for VCO phase noise [F.2]
$$L(\Delta\omega) = 10\log\left[\frac{2FkT}{P_{sig}}\left(1 + \left(\frac{\omega_0}{2Q\,\Delta\omega}\right)^2\right)\left(1 + \frac{\Delta\omega_{1/f^3}}{|\Delta\omega|}\right)\right]$$ (6.12)
Control Line Ripple and Higher-Order Poles
A real PLL circuit is not linear. A linearized second-order PLL model therefore fails to represent a real circuit in some important aspects. One issue that needs to be addressed is the ripple on the VCO control line. In the linearized model, the phase-detector is assumed to be a linear subtractor. Under lock condition the phase error is zero, so the VCO remains undisturbed when the loop is locked. In a real circuit, however, the phase-detector, or the combination of charge-pump and phase-detector, may be highly nonlinear. For example, if the phase-detector is of the multiplier type, there will be higher-order mixing products; if it is of the bang-bang type, there are always pulses at the phase-detector output. Therefore, in some PLL circuits there are ripple components on the VCO control line even under lock condition. In general, the ripple components are at higher frequencies than the reference. Reducing the bandwidth increases the attenuation at those frequencies and thus can be helpful. However, rejection of VCO noise and fast acquisition require a high bandwidth. A better solution is to introduce additional poles. Determining how many poles to add and where to place them needs careful design, since too many poles can easily degrade the phase margin. It has been shown that one or two higher-order poles can attenuate the ripple by about an order of magnitude or more if they are placed a factor of about 4-7 above crossover [F.4]. Fig.6.2 (a) shows a charge-pump loop filter with one extra pole. Fig.6.2 (b) shows a classic loop filter with an extra pole.
Charge sharing, current mismatch, and reference feedthrough of charge-pump can cause spurs in the PLL output
[F.3]. The spurs directly go into the jitter budget of VCO.
Therefore, they need careful design.
Acquisition Time
A rough definition of acquisition time is the time it takes for a free-running VCO to lock to the input. In some of the literature, acquisition time is divided into frequency acquisition time and phase acquisition time. Assume a free-running VCO is running at angular frequency $\omega$ and its phase is zero at time $t=0$. The input signal has a frequency of $\omega+\Delta\omega_i$ and a phase of $\phi_0$ at $t=0$. According to the linearized second-order PLL model, the time-domain expression of the phase error is
$$\phi_e(t) = \exp(-\zeta\omega_n t)\left[\frac{\Delta\omega_i - \zeta\omega_n\phi_0}{\omega_n\sqrt{\zeta^2-1}}\sinh\!\left(\omega_n\sqrt{\zeta^2-1}\,t\right) + \phi_0\cosh\!\left(\omega_n\sqrt{\zeta^2-1}\,t\right)\right]$$ (6.13)
Equation (6.13) is similar to equation (6.5), and a similar conclusion can be drawn: the acquisition time is inversely proportional to the loop bandwidth. Fast acquisition therefore generally requires a high bandwidth.
In equation (6.12), $L$ is the normalized phase noise, $F$ is an empirical factor, $\Delta\omega_{1/f^3}$ is a fitting parameter, $Q$ is the quality factor of the resonator, and $P_{sig}$ is the signal power.
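Leeson's equation (6.12) is easy to evaluate numerically. The sketch below does so for an assumed 5 GHz LC oscillator. The empirical factor, the corner of the flicker-dominated region, the resonator Q, and the signal power are illustrative guesses rather than values from the paper, and `leeson_dbc_hz` is a hypothetical helper name.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def leeson_dbc_hz(offset_hz, f0_hz, q, p_sig_w, f_factor, corner_hz, temp_k=300.0):
    """Evaluate equation (6.12); frequency ratios are the same in Hz and rad/s."""
    thermal = 2 * f_factor * K_BOLTZMANN * temp_k / p_sig_w
    resonator = 1 + (f0_hz / (2 * q * offset_hz)) ** 2
    flicker = 1 + corner_hz / abs(offset_hz)
    return 10 * math.log10(thermal * resonator * flicker)

# Assumed oscillator: 5 GHz, Q = 10, 1 mW signal power, F = 2, 100 kHz corner.
params = dict(f0_hz=5e9, q=10, p_sig_w=1e-3, f_factor=2.0, corner_hz=100e3)
for offset in (1e4, 1e5, 1e6, 1e7):
    print(f"{offset:>10.0f} Hz offset: {leeson_dbc_hz(offset, **params):7.1f} dBc/Hz")
```

Between 1 MHz and 10 MHz offset the result falls by roughly 20 dB per decade, the inverse-square region described in the text, while below the assumed corner the slope steepens towards 30 dB per decade.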
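[Figure 6.2: two PLL block diagrams, each with the reference input, phase detector, charge pump and loop filter with one extra pole, VCO, ÷N divider, and clock output.]
Fig.6.2 Loop filter with one extra pole: (a) charge-pump (b) active filter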
6.1.2
Divider Delay and Phase-Detector Delay
A CMU based on an integer-N frequency synthesizer needs a frequency divider in order to synthesize a high-frequency clock from a relatively low-frequency reference. The divider implies that the phase detector is digital in nature. In fact, phase detectors and phase-frequency detectors for high-speed SerDes transceivers are almost always digital. Although the loop filter and the VCO may be analog, continuous-time circuits, knowledge about the phase error is available to the loop only at discrete instants. Converting a continuous-time signal into a discrete-time signal usually involves a sample-and-hold (S&H) operation. A zero-order hold (ZOH) has a transfer function given by
$$H(s) = \int_0^\infty \left[u(t) - u(t-T)\right] e^{-st}\,dt = \frac{1 - e^{-sT}}{s}$$ (6.14)
Its frequency response and phase are
$$H(j\omega) = e^{-j\omega T/2}\cdot T\cdot \mathrm{sinc}\!\left(\frac{\omega T}{2}\right), \qquad \angle H(j\omega) = -\frac{\omega T}{2}$$ (6.15)
where $\mathrm{sinc}(x) = \sin(x)/x$. The ZOH adds phase lag (delay) to the loop transfer function. The period of the divider output is usually much larger than the period of the VCO clock. For this reason, its delay is more critical. Divider delay and phase-detector delay erode the phase margin of a PLL loop. As a consequence, the loop bandwidth must be reduced to preserve stability. However, a reduced bandwidth may negatively influence settling time and noise performance. In practical implementations, the phase comparison rate is set to about 10 times the crossover frequency [H.14].
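To see how much phase margin the sampling costs under the rule of thumb above, the following sketch evaluates the zero-order-hold response of equation (6.14) at the crossover frequency when the comparison rate is ten times the crossover frequency. The 100 MHz comparison rate is an assumed example value and `zoh_response` is a hypothetical helper name.

```python
import cmath
import math

def zoh_response(w, t_hold):
    """H(jw) = (1 - exp(-jwT)) / (jw): equation (6.14) evaluated at s = jw."""
    return (1 - cmath.exp(-1j * w * t_hold)) / (1j * w)

f_compare = 100e6             # assumed phase-comparison rate
f_cross = f_compare / 10      # crossover at one tenth of the comparison rate
t_hold = 1 / f_compare
w_c = 2 * math.pi * f_cross

lag_deg = math.degrees(cmath.phase(zoh_response(w_c, t_hold)))
print(f"ZOH phase lag at crossover: {lag_deg:.1f} degrees")
```

The lag equals minus half the product of angular frequency and hold time, so a ratio of ten between comparison rate and crossover frequency costs about 18 degrees of phase margin regardless of the absolute frequencies chosen.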
6.1.3
Granularity Problems
Since the PLL loop operates on a sampled basis and not as a straightforward continuous-time circuit, it has more stability problems than a continuous-time system. In particular, an analog second-order PLL is unconditionally stable for any value of loop gain, but the sampled equivalent becomes unstable if the gain is made too large. Even a first-order digital PLL can be unstable [7]. It has been shown in reference [F.1] that a second-order PLL based on a classic tristate phase-detector and a charge-pump has the stability limit
$$K' < \frac{1}{\dfrac{\pi}{\omega_i\tau}\left(1 + \dfrac{\pi}{\omega_i\tau}\right)}$$ (6.16)
where $\omega_i$ is the angular frequency of the reference, $\tau = RC$ is the loop filter time constant, and $K'$ is
$$K' = \frac{K_O I_p R^2 C}{2\pi}$$ (6.17)
where $I_p$ is the charge-pump current, $K_O$ is the VCO gain, R is a resistor in the loop filter, and C is a capacitor in the loop filter. For a first-order digital PLL, the loop gain should be smaller than 2. However, if a loop delay of M symbol intervals is introduced, the stability range is reduced to [7]
$$0 < K < 2\sin\!\left(\frac{\pi}{2(2M+1)}\right)$$ (6.18)
6.1.4
Digital PLL
A critical problem of a conventional PLL-based CMU is its sensitivity to process variations and to noise from the power supply and substrate. Another problem is the limited voltage headroom associated with low-power, deep sub-micrometer CMOS processes. In addition, the loop capacitor consumes chip area. A digital PLL is a solution to these problems. In a digital PLL, a digital accumulator replaces the loop capacitor (integrator), and a digitally controlled oscillator (DCO) replaces the VCO. In general, LC oscillators have better phase noise performance than ring oscillators. However, it is difficult to make LC oscillators digital. A solution has been proposed in [F.5]. The DCO is a typical differential negative-resistance LC oscillator. Instead of one varactor, many varactors are used. The varactors are arranged in several banks and connected in parallel. CMOS varactors made in low-voltage deep sub-micrometer technologies exhibit a very narrow linear tuning range. Interestingly, the capacitance versus tuning-voltage curve looks like the input-output curve of a CMOS inverter. The varactors can therefore be made “digital” by setting the tuning voltage to one of two proper values. The differential varactor can be as small as a few attofarads (aF) [F.6]. Good frequency resolution can be achieved by switching a unit varactor on or off. Finer frequency resolution can be achieved by applying sigma-delta modulation to the unit varactor. The DCO enables true phase-domain signal processing, so spurs due to nonlinearity are greatly suppressed [F.7]. In addition, the whole digital PLL can be retimed to the VCO clock. For this reason, the digital switching noise is mixed down to a DC offset. The asynchronism between the VCO oscillation and the system reference clock is compensated by using a time-to-digital converter (TDC).
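The frequency resolution obtained by switching one unit varactor, and the finer average resolution obtained by dithering it, can be estimated with a short sketch. The tank inductance, the total capacitance, and the 40 aF unit are assumed values chosen only for illustration, and the first-order accumulator below is a generic sigma-delta modulator, not the specific circuit of [F.5].

```python
import math

def lc_frequency(l_h, c_f):
    """Resonant frequency of an LC tank."""
    return 1.0 / (2 * math.pi * math.sqrt(l_h * c_f))

def dco_step(l_h, c_total_f, c_unit_f):
    """Frequency change when one unit varactor adds c_unit_f to the tank."""
    return lc_frequency(l_h, c_total_f + c_unit_f) - lc_frequency(l_h, c_total_f)

def sigma_delta_bits(fraction, n):
    """First-order sigma-delta stream whose average approaches `fraction` (0 to 1)."""
    acc, bits = 0.0, []
    for _ in range(n):
        acc += fraction
        if acc >= 1.0:
            bits.append(1)
            acc -= 1.0
        else:
            bits.append(0)
    return bits

# Assumed tank: 1 nH and 1 pF (about 5 GHz), with a 40 aF unit varactor.
L_TANK, C_TANK, C_UNIT = 1e-9, 1e-12, 40e-18
step = dco_step(L_TANK, C_TANK, C_UNIT)
bits = sigma_delta_bits(0.25, 1000)
average_step = step * sum(bits) / len(bits)
print(f"unit step {step / 1e3:.1f} kHz, dithered average {average_step / 1e3:.1f} kHz")
```

Under these assumptions one unit varactor moves the oscillator by roughly 100 kHz, and toggling it with a 25 % duty stream gives an average shift of a quarter of that step.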
Tradeoffs in PLL-Based CMU Design
A PLL-based CMU faces many conflicting requirements. The following table shows some tradeoffs of a PLL-based CMU design.
Requirement                                                  High Bandwidth   Low Bandwidth
Rejection of input noise, PD noise, and charge pump noise                          ✓
Rejection of loop filter noise and VCO noise                       ✓
Fast acquisition                                                   ✓
Reduction of jitter integration                                    ✓
Rejection of ripple on VCO control line                                            ✓
Improved loop stability against parasitic poles                                    ✓
6.2
Basic DLL-Based CMU Structure and
Performance
In PLL-based CMUs, the output clock is directly derived
from the VCO oscillation, and the loop filter has lowpass
filtering effect for the input. If the reference is noisier than
VCO oscillation, there is an obvious advantage. However, in
practice, the reference is much cleaner than the VCO
oscillation in high-speed SerDes transceivers. Therefore, it is
desirable to directly derive the output clock from the reference.
This idea is a fundamental concept behind DLL-based CMU
design. Basic DLL-based CMU structures are edge-combiners
and cyclic reference injection ring oscillators.
6.2.1
Tunable Delay Cells
The intrinsic delay of logic gates can be used in a DLL. If N identical gates are connected in series, the total delay is N times the delay of a unit cell. Adding or removing one gate changes the total delay. In high-speed SerDes transceivers, the delay of such a unit cell is not negligible compared with a symbol period. In addition, the delay is process dependent. Therefore, this method cannot provide very fine phase resolution and can only be used for coarse tuning [F.8], [F.9]. The beauty of this method is that it provides a kind of “digital delay”. If the unit delay provided sufficiently fine phase resolution, the problems of unit cell mismatch and process dependence could be solved by digital signal processing techniques such as calibration. Fig.6.3 shows a CMU based on digitally tuned delay cells [H.31].
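The claim that a gate delay is coarse relative to a symbol period can be made concrete with a small calculation. The 20 ps gate delay and the 10 Gbaud symbol rate below are assumed example figures, and `delay_line_resolution` is a hypothetical helper.

```python
def delay_line_resolution(unit_delay_s, symbol_rate_baud):
    """Phase step of one gate delay, in unit intervals (UI) and in degrees."""
    ui = unit_delay_s * symbol_rate_baud
    return ui, 360.0 * ui

# Assumed example: 20 ps gate delay at 10 Gbaud (100 ps symbol period).
ui, degrees = delay_line_resolution(20e-12, 10e9)
print(f"one gate = {ui:.2f} UI = {degrees:.0f} degrees; "
      f"{round(1 / ui)} gates span one symbol")
```

With these numbers a single gate shifts the phase by a fifth of a symbol, which is why such a delay line is suitable only for coarse tuning.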
[Figure 6.3: block diagram with the reference, a phase detector, a time-to-digital converter, a digital integrator, a divider, and a DCO whose delay-cell taps C[0], C[1], ..., C[N] and C[m], C[m+1], ..., C[m+N] feed an edge combiner.]
Fig.6.3. A CMU based on digitally tuned delay cells
Delay is a physical process. It can be roughly classified into two categories. The first is caused by charging or discharging a capacitive load. The second is caused by finite propagation speed. A representative of the second type is a piece of transmission line; however, its delay is hard to tune. The first type provides some freedom to tune the delay. Since the time it takes to charge or discharge a capacitive load is determined by the current, the load capacitance, and the voltage swing, we can tune the delay by adjusting the current and/or the capacitance if the voltage swing is fixed.
[Figure 6.4: three delay-cell schematics, (a), (b), and (c), with supply VDD, bias currents I, control voltage Vctr, inputs Vin (Vin+), and outputs Vout (Vout+, Vout-).]
Fig.6.4 Some schemes to tune unit delay cells
Fig.6.4 shows some schemes to tune the delay of a unit cell [F.10], [F.11]. Fig.6.4 (a) is a current-starved inverter. The control voltage Vctr controls the current that flows through the inverter to the load capacitor and thereby tunes the delay. Fig.6.4 (b) is an RC delay cell. Vctr controls the on-state resistance of the nMOS transistor that connects the capacitor. Fig.6.4 (c) is a differential delay cell. The control voltage controls the current that flows through the positive-feedback pMOS transistors. The idea of controlling delay through current is also widely used in phase interpolators and phase mixers [F.12], [F.13].
6.2.2
Edge Combining DLL-Based CMU
A DLL can only generate one or more delayed versions of the reference. A DLL-based CMU must include additional circuits to generate an output clock whose frequency is an integer multiple of the reference frequency. An edge combiner serves this purpose.
Fixed-Ratio Edge Combining CMU
In a fixed-ratio edge combining CMU, the reference is delayed by an N-stage voltage-controlled delay line (VCDL). A phase detector detects the phase difference between the reference and the output of the last unit delay cell. When the VCDL has a total delay of $2\pi$, the phase error is zero and the DLL loop is locked. Assuming the unit delay cells are identical, each stage has a delay of $2\pi/N$. The edge combining logic combines the edges of each stage. The highest output clock frequency is N/2 times that of the reference [F.14]. A fixed-ratio edge combining CMU is shown in Fig.6.5.
[Figure 6.5: a phase detector compares the reference with the delay-line output and produces Up and Down signals; the delay-line taps $\phi_0$ to $\phi_5$ are combined to form Clock Out.]
Fig.6.5. Fixed-Ratio edge combining CMU
Despite its simplicity, the fixed-ratio edge combining CMU has a few problems. The first is its susceptibility to false lock to harmonics: because the reference is a periodic signal, a delay of $2k\pi$ is equivalent to a delay of zero, and the unit delay cell usually has a large tuning range. The second problem is the fixed ratio between input and output. The third problem is sensitivity to mismatch. In practice it is impossible to make the unit delay cells identical. As a consequence, the output clock has strong pattern-dependent jitter and duty-cycle mismatch.
A number of solutions can be used to prevent false lock to harmonics. One method is to use a lock detector [F.15]. Lock to a harmonic can be detected if the phase information of each unit delay cell is used.
Programmable Edge Combining CMU
A programmable edge combining CMU extends the applications of the edge combining CMU. A straightforward method is to build logic circuits that selectively feed the delayed clocks ($\phi_0, \phi_1, \ldots, \phi_N$) to the edge combiner [F.16], [F.17]. In [F.17], N identical delay cells are connected in cascade to form a VCDL. A multiplication factor controller selects M of the delay cells and masks the rest. The output of the last delay cell is fed to the phase detector. Therefore, under lock condition, the delay of each cell is $2\pi/M$. At the rising edge of the output of any of the M delay cells, the edge combiner toggles its output. Therefore, the frequency of the edge combiner output is M/2 times the frequency of the reference. Since M can be programmed, the edge combining CMU is programmable.
There are many challenges in designing a programmable edge combining CMU. Firstly, it usually requires complicated logic circuits that are slow and power consuming. Secondly, the problems of an ordinary edge combining CMU remain, including harmonic locking and susceptibility to mismatch.
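The edge-combining principle and its sensitivity to mismatch can be illustrated with a behavioral sketch. It models a locked delay line whose M tap delays sum to one reference period and lets an ideal combiner toggle at every tap rising edge. The 100 MHz reference, M = 8, and the alternating 5 % mismatch pattern are assumptions for illustration, and the function names are hypothetical.

```python
def combiner_toggle_times(t_ref, cell_delays, n_periods):
    """Times at which an ideal edge combiner toggles: one toggle per tap rising edge."""
    times = []
    for k in range(n_periods):
        t = k * t_ref                  # reference rising edge enters the delay line
        for d in cell_delays:
            t += d                     # rising edge appears at the next tap
            times.append(t)
    return times

def average_frequency(times):
    """Average output frequency: two toggles make one output period."""
    return (len(times) - 1) / (2 * (times[-1] - times[0]))

def interval_spread(times):
    """Difference between the longest and shortest toggle-to-toggle interval."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return max(gaps) - min(gaps)

T_REF = 10e-9                          # assumed 100 MHz reference
M = 8                                  # number of selected delay cells
unit = T_REF / M                       # locked: total delay is one reference period
ideal = [unit] * M
skewed = [unit * (1.05 if i % 2 == 0 else 0.95) for i in range(M)]

for name, delays in (("ideal", ideal), ("5% mismatch", skewed)):
    t = combiner_toggle_times(T_REF, delays, n_periods=4)
    print(f"{name}: {average_frequency(t) / 1e6:.1f} MHz, "
          f"interval spread {interval_spread(t) * 1e12:.1f} ps")
```

In the ideal case the output runs at M/2 times the reference with uniform intervals. With mismatched cells the average frequency is essentially unchanged, but the toggle intervals alternate between long and short values in a pattern that repeats every reference period, which is the deterministic jitter and duty-cycle error mentioned in the text.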
6.2.3
Cyclic Reference Injection DLL-based
CMU
Mismatch is not easy to solve in high-speed, low-power, deep sub-micrometer CMOS processes. Edge combining CMUs therefore tend to have high spurs in the output. Conventional ring oscillators are not susceptible to mismatch, because the oscillation circulates through all delay cells. Conventional ring oscillators can also be made programmable by using a programmable divider. However, conventional ring-oscillator-based CMUs suffer from jitter integration. Cyclic reference injection can solve the problem [H.1]. Fig.6.6 shows a block diagram of a cyclic reference injection DLL-based CMU. The circuit can work in a ring oscillator mode or a direct delay line mode, depending on the switch (MUX). When the delay cells are connected in the form of a ring oscillator, the CMU becomes a conventional PLL. It suffers from jitter accumulation, but not from mismatch. However, the jitter accumulation persists for only several periods and is eliminated periodically by the injected clean reference. The difficulty is that it is very challenging to align the edge of the reference with an edge of the ring oscillation. This usually leads to high spurs in the output.
[Figure 6.6: block diagram with the reference, a phase detector (PD) producing Up and Down signals, a select signal (Sel) for the MUX, a ÷M divider, and Clock Out.]
Fig.6.6 A block diagram of cyclic reference injection CMU
6.2.4
Reduction of Phase Noise
In a DLL-based CMU, there are many sources of phase noise and timing jitter. The main sources include the in-lock error due to mismatch between the charging and discharging current sources in the charge pump, the mismatch of the phase detector outputs, the phase noise due to mismatch among the delay stages and among the edge combining cells in edge-combiner-based CMUs, and the re-alignment error caused by reference injection in cyclic reference injection multipliers. All of these errors can be considered systematic in-lock errors.
A number of techniques can be applied to mitigate some of the problems. In [F.18], the static phase error due to mismatches in the PFD/CP is compensated by adding a second, low-bandwidth loop. The compensation loop is digital. It comprises a bang-bang PD and an accumulator, which implement an integrator with infinite DC gain; the output of the integrator controls a current digital-to-analog converter (DAC) that leaks current from either side of the charge pump. The harmonic-locking problem of a cyclic reference injection CMU is solved in [F.19] by adding a logic circuit that dynamically controls the switch and the divider. In [F.20], chopping, auto-zeroing, and various other circuit techniques are employed to reduce static phase offset and crosstalk between the reference and the output clock.
6.3
Comparison between PLL-Based
CMUs and DLL-Based CMUs
According to Mark A. Horowitz, in CMU design one can get either a DLL or a PLL wrong, because each has its own strengths and weaknesses. If designed correctly, either works well, and jitter will be dominated by other sources [F.21]. Simple comparisons are made in the following paragraph in terms of ease of implementation, jitter accumulation, jitter transfer, stability, and acquisition time.
If the reference is much cleaner than the VCO, DLL-based CMUs have an advantage over PLL-based CMUs in terms of jitter performance. Conversely, if the reference is noisier than the VCO, PLL-based CMUs may be better than DLL-based CMUs. Both PLLs and DLLs exhibit filtering and jitter peaking with respect to input jitter. A PLL always acts as a lowpass filter on input jitter; a DLL acts as an allpass filter, but its transfer function can be changed to a lowpass one if techniques such as loop filtering and phase filtering are used [F.22]. Jitter accumulation is generally a more serious problem for PLLs than for DLLs, while spurs are commonly a more severe problem for DLL-based CMUs than for PLL-based integer-N CMUs. PLLs and DLLs are usually modeled as second-order and first-order linear feedback systems, respectively. However, both can be unstable, since they are sampled systems and parasitic poles exist.
7
Equalization
In the past years, the transfer rate of high-speed serial data links has been increasing steadily. Meanwhile, cheap, low-quality transmission lines are still used extensively in many applications to save cost. With the increased data rate, various impairments become more and more severe. For example, Fig.7.1 shows a measured backplane transfer function [G.1].
Fig.7.1. Measured performance of a Tyco backplane
The physical channel exhibits considerable phase distortion and amplitude attenuation at frequencies above 4 GHz. The time-domain impulse response of this channel decays considerably within one symbol period after the amplitude peak if the baud rate is low, for example below 1 Gbps. In this case, ISI is not a severe problem if the maximum run length is constrained. However, if the baud rate is high, for example 10 Gbps, the impulse response does not decay sufficiently even several symbol periods after the amplitude peak. In this case, some zero crossings may be missing and some samples taken at the data center change their polarity. The eye is completely closed, and clock and data recovery is impossible without proper equalization.
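The dependence of ISI on baud rate can be illustrated with a deliberately simple channel model. The sketch below replaces the measured backplane with a single-pole low-pass channel of 1 GHz bandwidth, which is an assumption made purely for illustration, and computes the worst-case vertical eye opening for NRZ data with levels of plus and minus one, sampled at the end of each symbol.

```python
import math

def worst_case_eye(bit_rate, f3db, n_post=50):
    """Worst-case sample amplitude for +/-1 NRZ data through a single-pole channel.

    The main cursor is the pulse response at the end of the symbol; every
    post-cursor is assumed to carry the opposite polarity (worst-case pattern).
    """
    t_sym = 1.0 / bit_rate
    tau = 1.0 / (2 * math.pi * f3db)
    a = math.exp(-t_sym / tau)               # decay of the response over one symbol
    main = 1 - a
    post = [main * a ** k for k in range(1, n_post + 1)]
    return main - sum(post)

F3DB = 1e9                                    # assumed channel bandwidth
for rate in (1e9, 10e9):
    eye = worst_case_eye(rate, F3DB)
    state = "open" if eye > 0 else "closed"
    print(f"{rate / 1e9:.0f} Gbps: worst-case eye {eye:+.3f} ({state})")
```

In this toy model the response has almost fully decayed after one symbol at 1 Gbps, so the eye stays open, whereas at 10 Gbps the accumulated post-cursors exceed the main cursor and the worst-case eye is closed.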
In principle, the impairments can be reduced considerably by replacing the low-quality transmission lines with high-quality ones or by equalization [G.2]. High-quality transmission lines tend to increase the cost greatly, while sub-micrometer CMOS technologies and equalization usually provide excellent low-cost solutions. The impairments of the physical channel also depend strongly on the length of the transmission lines. A short transmission line may not need any equalization. In some applications, the length of the transmission lines may vary considerably and their properties are not time invariant. Therefore, considerable effort in the design of high-speed serial data links is devoted to adaptivity.
7.1
A catalogue of equalization schemes
Equalization methods can be linear or non-linear. An equalizer can be implemented on the transmitter side, the receiver side, or both. In microelectronic circuit implementations, it can be either continuous time (un-sampled) or discrete time (sampled). The signal amplitude can be discrete (digital) or continuous (analog). The equalizer can be either adaptive or fixed, and the adaptive algorithm can be zero forcing (ZF), LMS (or minimum mean square error, MMSE), or some nonlinear approach. In addition, the equalizer’s target response can be either full response or partial response. A sampling equalizer can be baud-rate sampled or over-sampled. The filter design can be either FIR or IIR. There is therefore quite a large set of combinations of the above equalization schemes. Each scheme has its pros and cons with regard to a specific application or a specific history. The equalization schemes interact profoundly with the CDR structure.
7.2
Tx equalization
Tx equalization is usually called pre-emphasis, as it tries to emphasize the frequencies where the attenuation is high, or de-emphasize the frequencies where the attenuation is low, to obtain a flat frequency response across the passband. Tx equalizers are usually simpler than Rx equalizers, since the clock is readily available on the transmitter side, the inputs to the equalizer are usually binary data, and the noise from the channel does not play an important role. However, Tx equalizers cannot be adaptive unless a backchannel is added. In addition, a Tx equalizer is constrained by the peak power of the transmitter, as the power is actually consumed by the load resistors [G.3].
A Tx equalizer can be implemented in continuous-time analog circuits [G.4]. This kind of equalizer is simply a continuous-time analog high-pass filter. It has a limited tuning range, and a constant group delay is difficult to achieve. Tx equalizers are almost exclusively discrete-time finite impulse response (FIR) filters [G.5], since the clock is available and the input to the equalizer is digital. Usually FIR Tx equalizers are implemented in the symbol-spaced current domain [G.6], [G.7], [G.8], [G.9]. Such an equalizer does not help the loss at the Nyquist frequency. This scenario is shown in Fig.7.2. The tap delay is achieved by a D-type flip-flop (DFF), and the tap coefficients are controlled by bias currents. The bias current is set by a digital-to-analog converter (DAC). The control of the bias current usually involves linear transconductors and current mirrors.
$$\frac{C(i)}{C(0)} = \frac{I_i}{I_0}$$ (7.1)
$$D_{out}(k) = \sum_{i=0}^{N} C(i)\,D_{in}(k-i)$$ (7.2)
[Figure 7.2: schematic with differential data inputs DinP/DinN, cascaded DFFs driven by Clock, DFF outputs QP/QN, tap bias currents Io, I1, ..., IN, and differential outputs DoutP/DoutN.]
Fig. 7.2. A current domain Tx equalizer
A Tx equalizer can also be implemented in the time domain by utilizing pulse width modulation [G.10]. In this scheme, the baseband shaping pulse c(t) shown in Fig.2.2 is not a hold function for one symbol period, but a biphase or Manchester code pulse whose duty cycle is manipulated to shape the combined channel impulse response. This may be beneficial in deep sub-micrometer CMOS circuits, because their time resolution is better than their voltage resolution. Furthermore, this solution is less constrained by the voltage headroom. However, it may not work well when the transfer function of the physical channel is complicated due to reflections and parasitic resonances: duty cycle manipulation offers too few degrees of freedom to match channels with complicated transfer functions.
7.3
Rx continuous time analog linear
equalizer
Rx equalizer has much more varieties of implementations than
Tx equalizer, and adaptivity can be realized in Rx equalizer.
Physical channel of high-speed serial data links is usually of a
low-pass nature. A passive high-pass filter (HPF) or active
HPF is able to flatten the joint frequency domain channel
response and reduce ISI. The HPF can be continuous time
analog filter composed of passive RLC network [G.11]. It can
also be active filters based on operational power amplifier
(Opamp) [G.12] or transconductor-capacitor (Gm-C) [G.13.].
13
Butterworth polynomials work very well up to gigahertz
[G.19]. Inverter-based delay units with active inductor load
(INV-AIL) are reported in fractionally spaced equalizer up to
2.5 Gbps [G.20]. In very high baud rate such as 40 Gbps,
passive LC network or transmission line is usually used as
delay cell [G.21]. The continuous time analog delay cells are
not affected by the clock jitter. They enable Rx continuous
time analog FIR equalizer. However, they also suffer from the
challenges for high-speed analog CMOS design.
Fig. 7.3. A high HPF cell and its transfer function
CML buffer is a kind of “natural” equalizer. This has been
exploited in [G.6] and [G.14]. As shown in Fig.7.3, the low
frequency gain can be tuned by a MOS resistor M1 and the
high frequency gain can be tuned by varactors Cd1 and Cd2.
The high frequency gain can be further boosted by an on-chip
inductor [G.15.]. In addition, the HPF cells are usually
connected in cascade to give more gain at high frequencies.
Rx continuous time analog linear equalizers are not sampled.
Therefore, clock jitter does not affect their performance.
However, there are many challenges for this kind of equalizer.
Some challenges are listed as follows:
• It has limited tuning range and rarely matches the channel,
especially when there are both frequency dependent
attenuation and frequency dependent delay.
• Linearity is a challenge, especially when input swings
vary greatly in amplitude.
• It is limited by the gain-bandwidth of each differential-pair
stage.
• It is sensitive to PVT variations.
• It is sensitive to device mismatch and non-linearity.
• Offset cancellation and calibration are difficult.
• Multiple stages can achieve high gain, but they can also lead
to clipping.
Continuous time analog Rx linear equalizers are sometimes
used as pre-equalizer for decision feedback equalizer. The task
is to make the impulse response causal, with most of its
energy concentrated in the time origin (with some fixed delay).
It is also desirable to have a noise whitening filter
functionality so that the DFE works best [7].
7.4
Rx FIR equalizer
The basic structure of an Rx FIR equalizer is shown in
Fig.7.4. The main building blocks are delay cell, multiplier
and adder. There are many variations in practical
implementations, since each block can be either continuous
time or discrete time, either analog or digital.
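As a minimal numerical sketch of the structure in Fig.7.4 (delay cells, multipliers and an adder), the following Python model may help. The function name `rx_fir_equalize`, the two-tap channel and the three equalizer taps are illustrative assumptions, not values from any cited design.

```python
def rx_fir_equalize(samples, taps):
    """Tapped-delay-line FIR: y(k) = sum over n of taps[n] * samples[k - n]."""
    delay_line = [0.0] * len(taps)  # models the chain of delay cells
    out = []
    for x in samples:
        # Shift the newest sample into the delay line.
        delay_line = [x] + delay_line[:-1]
        # Multiply every tap output by its coefficient and add the products.
        out.append(sum(c * d for c, d in zip(taps, delay_line)))
    return out


# Hypothetical channel with a single postcursor: h = [1.0, 0.5].
symbols = [1, -1, -1, 1, 1, -1, 1, -1]
received = rx_fir_equalize(symbols, [1.0, 0.5])  # the same FIR models the channel

# Truncated inverse of 1 + 0.5 z^-1, i.e. 1 - 0.5 z^-1 + 0.25 z^-2 (assumed taps).
equalized = rx_fir_equalize(received, [1.0, -0.5, 0.25])
```

With these assumed values the combined response is 1 + 0.125 z^-3, so each equalized sample differs from the transmitted symbol by at most 0.125, which is the residual ISI left by truncating the equalizer to three taps.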
The delay cell can be implemented in continuous time
analog circuits. An ideal delay cell is an all-pass filter whose
group delay is constant but tunable. An ideal all-pass filter is
not realizable. In practice, low pass filter is used to
approximate it. Active Opamp-MOSFET-C filter using Bessel
type polynomials [G.16], [G.17], [G.18] or Gm-C filter using
Delay cells can also be implemented in a discrete time manner.
The tapped delay cannot be a simple DFF, which is very
different from the Tx FIR equalizer, because the received signals
are basically analog. The tapped delay usually involves sample
and hold.
Fig. 7.4. Schematic diagram of Rx FIR equalizer (labels: Data In, Delay cells, tap coefficients C0 to CN, Data Out)
The multiplier is usually implemented in the current domain.
A linear voltage to current (V-I) converter (transconductor) is
needed to convert the voltage of each tap to current [G.19],
[G.22]. The coefficients of the equalizer are realized by
weighting the current. The coefficients are first normalized so
that they do not exceed one. The current is led to a network of
differential CMOS pairs. At any time one transistor of the
differential pair is in the triode region while the other is off.
The drains of all differential pairs are connected to the output
of the V-I converter. In each pair, the source of one transistor
is connected to the ground, while the source of the other
transistor is connected to a shared output resistor. The
differential pairs are controlled by weight setting logic circuits.
When all transistors connected to the ground are on, there is
no current passing through the shared resistor and the weight
is zero. If they are all off, the weight is one. The weighted
current is mirrored. The current adder can be simply realized
by connecting mirrored current together.
An Rx FIR equalizer is difficult to implement. If it is continuous
time, the delay cells are difficult, with little flexibility and
limited tuning range. If it is discrete time, it is susceptible to
clock jitter. If it is digital, it is very challenging at high speed.
In addition, if the discrete time equalizer is symbol-spaced,
the output only contains samples at the data centers.
Additional efforts are needed to find the zero-crossing points
if a threshold type CDR is used. Furthermore, Rx FIR equalizers
are linear equalizers which tend to amplify noise and crosstalk.
Therefore, in practice, Tx FIR equalizer and Rx decision
feedback equalizer are more commonly used in SerDes
transceivers.
7.5
Decision feedback equalizer
A decision feedback equalizer (DFE) has many advantages over
linear equalizers in signal processing and microelectronic
circuit implementations. A simple DFE diagram is shown in
Fig.7.5. The feedback equalizer has the same structure as a Tx
FIR equalizer, thus the circuits and techniques for the Tx FIR
equalizer directly apply to the feedback filter. From the
system’s point of view, a DFE does not need to have a flat
spectrum across the passband, and thus does not enhance noise
and crosstalk. In addition, although noise still injects into the
feedback filter through each tap, the noise level is reduced by
the nonlinear decision circuit.
The impulse response of the physical channel is rarely
minimum phase or with most of its energy concentrated near
the time origin. Suppose the maximum amplitude appears at
n2T+τ (τ<T) and the first non-zero point emerges after n1T+τ.
Since DFE discards all data power that stems from past data
symbols, it is desirable to make (n2+1)T+τ the first tap of the
feedback equalizer to achieve the maximum SNR. Because some
of the symbol energy is removed, DFE is still
suboptimum even when additive noise is white [7].
Fig. 7.5. Schematic diagram of DFE FIR equalizer (labels: Data In, summing node, Decision, Data Out, Feedback Filter with delays T and coefficients C0 to CN)
Mathematically, the DFE can be written as
$$ \hat{a}(kT+\tau) = a(k)\,h(n_2T+\tau) + \sum_{i=n_1}^{n_2-1} a(k-i)\,h(iT+\tau) + \sum_{i=n_2+1}^{n_2+M} a(k-i)\,\bigl[h(iT+\tau) - c(iT+\tau)\bigr] + \sum_{i=n_2+M+1}^{\infty} a(k-i)\,h(iT+\tau) \qquad (7.3) $$
where the second item is the precursor ISI; the third item is the
postcursor ISI that can be cancelled if the feedback filter
matches the channel; and the fourth item is the residual ISI due
to the finite number of taps if the feedback equalizer is FIR. When
n1>=n2-1, the precursor ISI is effectively zero. The residual ISI
is zero if the channel impulse response only lasts for a period
shorter than the span of the feedback filter taps. As a physical
channel usually does not satisfy these requirements, a pre-filter
is needed to reshape the channel response to effectively reduce
precursor ISI and reduce the number of taps of the feedback
equalizer. This can be done with a Tx equalizer and/or a prefilter
(feed forward equalizer, FFE) in the receiving end [G.23], [G.24],
[G.25], [G.26]. The FFE can be implemented in either
continuous time [G.27] or discrete time.
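The postcursor cancellation described above can be sketched in a few lines of Python. This is a noise-free toy model under stated assumptions: the channel [1.0, 0.6, 0.5] has no precursor (n1 = n2 = 0), the feedback taps match the channel exactly, and the names `channel_output` and `dfe` are hypothetical.

```python
def channel_output(symbols, h):
    """Convolve the symbol sequence with the channel impulse response h."""
    return [
        sum(h[i] * symbols[k - i] for i in range(len(h)) if k - i >= 0)
        for k in range(len(symbols))
    ]


def dfe(received, fb_taps):
    """Subtract ISI rebuilt from past decisions, then slice to +1 or -1."""
    history = [0] * len(fb_taps)  # most recent decision first
    decisions = []
    for x in received:
        isi = sum(c * d for c, d in zip(fb_taps, history))
        d = 1 if x - isi >= 0 else -1
        decisions.append(d)
        history = [d] + history[:-1]
    return decisions


symbols = [1, 1, -1, -1, 1, -1, 1, 1, -1, 1]
# Postcursors sum to 1.1 > 1, so the eye is closed without equalization.
received = channel_output(symbols, [1.0, 0.6, 0.5])
recovered = dfe(received, [0.6, 0.5])
sliced_only = [1 if x >= 0 else -1 for x in received]
```

In this sketch `recovered` equals `symbols`, while `sliced_only` contains errors because the postcursor ISI exceeds the main cursor for some data patterns. A wrong decision would be fed back and could corrupt later decisions, which is the error propagation discussed in the text.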
DFE can be implemented in either continuous time [G.28] or
in discrete time. When it is implemented in continuous time, it
is not affected by clock jitter. From equation (7.3) we can see
that DFE is much less sensitive to clock jitter than linear FIR
equalizer. Nevertheless, a direct implementation of DFE can
consume significant power, area, and complexity since it
involves resolving the previous data and using them to add an
analog value to the input within the next symbol period [G.24].
Loop unrolling can relax the requirements. Fig.7.6 shows a
one-tap unrolled DFE.
Fig. 7.6. Schematic diagram of an unrolled DFE (labels: x(n), offset a, Mux, D, Q)
The received signal x(t) is sampled at time nT+τ and the
sampled signal is denoted as x(n). If the first two taps of the
impulse response of the channel are 1 and a, the signal to the
decision circuit is
$$ y(k) = x(k) - f(k) = x(k) - a\,a(k-1) = \begin{cases} x(k) - a, & \text{when } a(k-1) = +1 \\ x(k) + a, & \text{when } a(k-1) = -1 \end{cases} \qquad (7.4) $$
The decision circuit will make a decision based on y(k). As
we see from equation (7.4), the decision can be made
according to the comparison between x(k) and +a or x(k) and -a.
The previous symbol a(k-1) is used to choose +a or -a.
This unrolls the DFE into the structure shown in Fig.7.6,
where no multiplier or summation is needed. The unrolling
method can be extended to partial response DFE and PAM-4
DFE. Unfortunately, the complexity of the unrolled DFE grows
as 2^N (N is the number of taps) and calibration of the offset
levels is needed. In addition, the probability of error propagation
goes up when the number of taps increases. In practice, Tx FIR
pre-emphasis and FFE are used together with DFE. Those linear
equalizers cancel precursor ISI and the DFE cancels postcursor ISI.
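The equivalence between the direct and the unrolled one-tap DFE can be checked with a short Python sketch. The function names and the tap value a = 0.4 are assumptions chosen for illustration only.

```python
def direct_dfe_decision(x, a, prev):
    """Direct form: subtract a * a(k-1) from x(k), then compare with zero."""
    return 1 if x - a * prev > 0 else -1


def unrolled_dfe_decision(x, a, prev):
    """Unrolled form: two speculative comparisons and a multiplexer."""
    decision_if_prev_high = 1 if x > a else -1   # comparator with threshold +a
    decision_if_prev_low = 1 if x > -a else -1   # comparator with threshold -a
    # The previous decision only steers the multiplexer.
    return decision_if_prev_high if prev == 1 else decision_if_prev_low


test_inputs = [-1.5, -0.7, -0.3, 0.0, 0.3, 0.7, 1.5]
agree = all(
    direct_dfe_decision(x, 0.4, p) == unrolled_dfe_decision(x, 0.4, p)
    for x in test_inputs
    for p in (1, -1)
)
```

Both comparisons run in parallel, so the feedback path shrinks to a multiplexer selection. With N unrolled taps the number of comparators becomes 2^N, which is the growth in complexity noted above.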
7.6
Digital equalizer
Digital equalizers have many advantages over analog
equalizers. They do not inject noise from each tap node, which
is a problem for FIR based analog equalizers, and they are not
sensitive to PVT variations or to power and substrate noise. In
addition, it is possible to implement much more sophisticated
equalization methods in the digital domain. Nevertheless, digital
equalizers require a very fast ADC, which is very challenging in
high-speed serial data links. However, with the scaling down
of CMOS processes, a baud rate digital equalizer has been
reported using 65 nm CMOS technology [G.29]. According
to the Nyquist sampling theorem, a sampling frequency higher
than the baud rate is needed if there is frequency offset or phase
drift in the incoming data. A sample rate converter is
needed when the sampling clock’s frequency is not the baud
rate [G.30].
7.7
Bandwidth compression
Full response is aimed to eliminate all ISI at data centers,
which requires a flat (folded) spectrum across the passband.
However, a physical channel always exhibits a strong lowpass
nature. In other words, in many applications, full response
equalization does not match the channel response well. Partial
response [G.31] is not aimed to eliminate all ISI, but rather to
utilize the controlled ISI or constructive ISI. Therefore, it
requires fewer boosts for high frequencies, has less noise
enhancement and matches the channel better. Duobinary is a
widely used partial response method [G.32]. Duobinary has
the advantage to eliminate ISI at data transitions. For this
reason, an equalizer aiming at a target response as duobinary
is also called edge equalizer [G.33]. This is beneficial for
clock recovery. However, the received signal has 3 levels
because of ISI. Recovery of one bit datum becomes dependent
on the current received symbol and its previous symbol. This
leads to possible error propagation. Error propagation can be
circumvented by a precoding technique, such as Tomlinson-Harashima precoding [G.34], [G.35]. The precoding technique
also applies to DFE. Two-level signaling is also possible if the
shortest transitions are removed [G.33].
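The three-level nature of duobinary and the role of precoding can be illustrated with a small Python sketch. It uses the classic modulo-2 duobinary precoder, a simple relative of the Tomlinson-Harashima scheme cited above and not that scheme itself; the function names and bit pattern are assumptions.

```python
def duobinary_precode(bits):
    """Precoder: p(k) = a(k) XOR p(k-1), with p(-1) = 0."""
    prev, out = 0, []
    for b in bits:
        prev ^= b
        out.append(prev)
    return out


def duobinary_encode(precoded):
    """Controlled ISI: level(k) = p(k) + p(k-1), giving levels 0, 1 or 2."""
    prev, out = 0, []
    for p in precoded:
        out.append(p + prev)
        prev = p
    return out


def duobinary_decode(levels):
    """Memoryless decision: outer levels (0, 2) mean 0, middle level means 1."""
    return [lv % 2 for lv in levels]


bits = [1, 0, 1, 1, 0, 0, 1]
levels = duobinary_encode(duobinary_precode(bits))
decoded = duobinary_decode(levels)

# Corrupt one received level: only one decoded bit is affected.
corrupted = list(levels)
corrupted[2] = 2
bit_errors = sum(a != b for a, b in zip(duobinary_decode(corrupted), bits))
```

Because each bit is recovered from a single received level, a level error does not propagate to later bits in this sketch.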
Strictly speaking, duobinary is not a bandwidth compression
method. Duobinary signaling has the same symbol rate and bit
rate as NRZ. If there is a hard cutoff frequency, the
maximum bit rate that duobinary achieves is the same as NRZ.
In reality, duobinary is usually regarded as a bandwidth
compression method because it does not require a flat
spectrum as target response. A true bandwidth compression
method is based on multi-level signaling such as PAM-4. Equalization
methods for a multi-level transceiver are basically the same as
for a binary system. However, the microelectronic implementation
is more complicated because of the additional levels [G.26].
Multi-level transceivers are useful for highly lossy channels.
7.8
Adaptation
In high-speed serial data links, it is desirable for equalizers
to have adaptivity. First, there is an ensemble of fixed
channels, but which one is in use is unknown a priori. For
example, the length of a high-speed Ethernet cable may vary;
the properties of backplane channels are different due to
fabrication variations. Second, a physical channel may not be
time invariant. Adaptive equalizer is usually implemented in
the receiving end, because the channel properties can be
detected from the receiving signals. Adaptation is not possible
in the transmission side unless a back channel is specified.
Fig.7.7. A block diagram of an adaptive linear equalizer (labels: ak, h(t), noise n(t), r(t), r(k), c(k), y(k), delay, e(k), minimize [e(k)]^2)
Fig.7.7 shows a block diagram of an adaptive linear
equalizer. The sampled received signal r(k) contains various
impairments such as ISI and noise. The linear filter c(k) is able
to eliminate much ISI if the coefficients are correctly set to
match the channel. The difference between the output of the
equalizer and the original data a(k) is
$$ e(k) = y(k) - \tilde{a}(k) = \sum_{n=0}^{N} r(k-n)\,c(n) - \tilde{a}(k) \qquad (7.5) $$
where ã(k) is a delayed version of the original data sequence
a(k) to match the channel delay. In practice we do not know
the original data sequence. Otherwise, we would not need any
equalization or adaptation, since the original data sequence
would already be known. However, the decision output ā(k) can
be an exact replica of ã(k) if correct decisions have been made.
If the coefficients have been correctly set, y(k) should be
equal to ã(k), and e(k) should always be zero. In practice,
we inevitably have noise and residual ISI. Therefore, our cost
function (error) is defined as the square of the e(k). To
optimize the system, we need to bring the error to its
minimum. Conventionally, the gradient descent method is used
to find the minimum. It is done by iteration:
$$ \hat{c}_{n+1} = \hat{c}_n - \mu\,\nabla\left[e^2(k)\right] \qquad (7.6) $$
where ĉ is the coefficient vector, and μ is the step. Since we
have assumed a linear filter, the gradient can be derived as
$$ \nabla\left[e^2(k)\right] = 2\,e(k)\,\check{r}(k) \qquad (7.7) $$
where ř(k) is a vector defined as
$$ \check{r}(k) = \left[\, r(k) \;\; r(k-1) \;\; \ldots \;\; r(k-N) \,\right] \qquad (7.8) $$
Equation (7.7) shows that the gradient is reduced to the
product of e(k) and ř(k). Even so, in practice it is still very
challenging to design an analog multiplier to meet the very high
speed of SerDes transceivers. To simplify the iteration method,
the step μ is made very small and the gradient is replaced with
the product of the sign of e(k) and the sign of ř(k), which gives
$$ \hat{c}_{n+1} = \hat{c}_n - \mu\,\mathrm{sign}\left[e(k)\right]\,\mathrm{sign}\left[\check{r}(k)\right] \qquad (7.9) $$
This approach will be very sensitive to data patterns and
noise distribution if the coefficients are updated at the baud rate.
Therefore, instead of minimizing the instantaneous error, we
minimize the expectation of the error. In practical SerDes
receivers, the effectiveness of equalization is dependent on the
sampling phase of the clock, and the recovered clock depends
on equalization too. Therefore the equalizer adaptation loop
interacts with the CDR loop. To keep the two loops from fighting
each other, their bandwidths must be very different.
In practical design, the CDR loop has a much wider bandwidth
than the equalizer adaptation loop. We can write the cost
function as
$$ \mathrm{error} = E\left\{\left[\tilde{a}(k) - \sum_{n=0}^{N} r(k-n)\,c(n)\right]^2\right\} \rightarrow \min \qquad (7.10) $$
For this reason, the adaptation algorithm is called least mean
square (LMS). The algorithm according to equation (7.9) is
called sign-sign LMS. LMS may be the most widely used
algorithm in equalizer adaptation.
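A minimal Python sketch of the sign-sign update in equation (7.9) follows. It is a training-mode toy model under stated assumptions: the reference ã(k) is taken to be the known transmitted data, the channel is a hypothetical two-tap response [1.0, 0.4] without noise, and the step size, tap count and function names are illustrative.

```python
import random


def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fir_output(taps, window):
    return sum(c * x for c, x in zip(taps, window))


def sign_sign_lms(received, reference, n_taps, mu):
    """Adapt FIR taps with c <- c - mu * sign(e(k)) * sign(r(k - n))."""
    taps = [1.0] + [0.0] * (n_taps - 1)
    window = [0.0] * n_taps  # r(k), r(k-1), ..., newest first
    for r, a in zip(received, reference):
        window = [r] + window[:-1]
        e = fir_output(taps, window) - a
        taps = [c - mu * sign(e) * sign(x) for c, x in zip(taps, window)]
    return taps


def mean_square_error(received, reference, taps):
    window = [0.0] * len(taps)
    total = 0.0
    for r, a in zip(received, reference):
        window = [r] + window[:-1]
        total += (fir_output(taps, window) - a) ** 2
    return total / len(reference)


rng = random.Random(1)  # fixed seed so the run is repeatable
data = [rng.choice((-1, 1)) for _ in range(20000)]
received = [data[k] + 0.4 * (data[k - 1] if k else 0) for k in range(len(data))]

adapted_taps = sign_sign_lms(received, data, n_taps=3, mu=0.001)
mse_before = mean_square_error(received, data, [1.0, 0.0, 0.0])
mse_after = mean_square_error(received, data, adapted_taps)
```

Only signs are multiplied, so no analog multiplier is needed in the update path. The small step trades convergence speed for low sensitivity to individual data patterns, in line with the averaging argument above.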
Fig.7.8. A block diagram of adaptive decision feedback equalizer (labels: ak, h(t), noise n(t), r(t), z(t), y(t), y(k), ā(k), ã(k), b(k), c(k), delay, e(k), minimize [e(k)]^2)
A block diagram of adaptive DFE is shown in Fig.7.8. The
cost function is
$$ \mathrm{error} = E\left\{\left[z(k) - \tilde{a}(k) - \sum_{n=1}^{N} \tilde{a}(k-n)\,c(n)\right]^2\right\} \rightarrow \min \qquad (7.11) $$
Equation (7.11) shows that the DFE is mimicking the
channel, since [1, c(1), c(2), …] is actually the normalized
channel impulse response if the error is zero. However, in
practice the normalization factor is unknown a priori, or z(k) is
not normalized. For this reason, a practical design may add one
more adaptive parameter. It may be an adaptive reference level
or a variable gain.
8
Data and Clock Recovery
In high-speed SerDes transceivers where there are strong
impairments, ISI is severe and the eye is usually closed. After
equalization, the impairments are compensated to a certain
degree. ISI is reduced to an acceptable level and the eye is open. Since
most equalization schemes are aimed to eliminate the ISI at
data centers and the system impulse response is at its
maximum at those positions, it is desirable to sample the
incoming waveform at the data centers because they have the
maximum SNR. An abstract eye diagram is shown in Fig.8.1
[H.1]. A practical transition probability density curve may
look quite different from the one shown in Fig.8.1.
Fig.8.1 An abstract eye diagram (labels: transition probability density, T/2, tu, tr, th, tm; Data In, Clock Recovery, Clock, Resample, Data Out)
As mentioned in section 2, there is no exclusively allocated
clock signal in high-speed SerDes transceivers. However, a
clock signal whose frequency is the baud rate and whose
phase is aligned to the data centers is indispensable for correct
data recovery. As shown in Fig.8.1, a nonlinear clock recovery
circuit extracts clock information from the incoming data
waveform, and uses this regenerated clock to resample the
data waveform to recover the data.
8.1
Design Considerations for CDR
Since the signal power at the clock frequency or the baud rate is
essentially zero in the incoming data waveform, the clock signal
has to be generated locally in the receiving end through
nonlinear methods. A microwave filter type clock recovery
scheme, more commonly called the “nonlinear spectral line
method”, usually involves a nonlinear device such as a rectifier,
and a bandpass filter [H.2]. The nonlinear device generates
signal power at the clock frequency and the filter eliminates
spectra at other frequencies. Nevertheless, it is hard to achieve
a high-Q bandpass filter with wide tuning range in
microelectronic circuits. A more commonly used method is to
have a local oscillator generate a clock signal. The phase
error and frequency error between the local oscillation and the
incoming data waveform can be detected. The detected errors
are fed to control loops that minimize those errors.
This type of clock recovery scheme usually involves a phase
locked loop. Practical microelectronic implementation of
CDR is a tradeoff among many design considerations. Those
design considerations are also called figures of merit (FOM).
8.1.1
Acquisition Time
The first two tasks for a clock recovery circuit are locking
the clock frequency to the baud rate of the incoming data and
locking the phase of the clock to the data centers. However,
there are initial frequency offset and phase offset due to the
uncertainty of the data rate and VCO frequency. For example,
due to PVT variations, a free running VCO has frequency
offset from the nominal oscillation frequency and its phase is
a priori unknown after power-on. On the other hand, the data
rate may change when switching from one application to
another. The time it takes for a CDR to lock to the desired
clock frequency and clock phase is called acquisition time. It
is desirable to use separate loops for frequency tracking and
phase tracking so that frequency tracking can achieve fast
frequency acquisition and phase tracking can achieve good
jitter performance.
Frequency acquisition can be achieved by using frequency
sweep [H.3], or frequency discriminator [H.4], [H.5], [H.6],
[H.7], or frequency reference. In high-speed SerDes
transceiver, the most commonly used solution is frequency
reference, since the data rate of an application is predefined by
standardization organizations. Frequency acquisition can be
achieved by locking the locally regenerated clock frequency to
the reference before real data transfer starts. In practice, CDR
can be classified as continuous-rate CDR, multi-rate CDR, and
single-rate CDR. A continuous-rate CDR accepts input data
stream of any rate within a certain range. A multi-rate CDR
can lock to input data stream whose rate is a priori unknown
but is an integer multiple of a known base rate. Frequency
reference helps fast frequency acquisition for single-rate CDR
and multi-rate CDR, but is not so useful for continuous-rate
CDR. In a high-speed SerDes transceiver, the transmitter
clock is usually generated by a CMU that synthesizes a
high-frequency clock from a low-frequency but very accurate
oscillator such as a quartz oscillator [H.8] or even an atomic
clock [H.54]. When frequency is locked, the frequency lock
loop (FLL) is usually frozen (or switched off), and the PLL
begins to align the clock phase to the incoming data. Usually
the PLL can only tolerate a very small amount of frequency
offset. As a consequence, the FLL must reduce the difference
between the regenerated clock frequency and the baud rate to
a very small value, for example, within 250 ppm, before handing
control to the PLL. The dual-loop CDR structure embodies the
idea of fine loop and coarse loop control.
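To give a feel for what a residual frequency offset means for the phase loop, the short Python sketch below converts an offset in ppm into the number of unit intervals (UI) after which an untracked clock would slip by one full UI. The function name `ui_until_slip` is hypothetical, and the 250 ppm figure is simply the example value from the text.

```python
def ui_until_slip(offset_ppm, slip_ui=1.0):
    """Number of UIs until an untracked offset accumulates slip_ui of phase."""
    drift_per_ui = offset_ppm * 1e-6  # phase drift, in UI, per symbol period
    return slip_ui / drift_per_ui


symbols_to_slip = ui_until_slip(250)  # about 4000 UI for a 250 ppm offset
```

Seen from the phase loop, a frequency offset is a phase ramp of 250e-6 UI per symbol in this example, which the PLL must track continuously after the handover.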
8.2
Phase Selection (Picking)
After FLL locks, the main task of a CDR circuit is to adjust
the phase of the clock to sample the incoming data at the
optimum sampling time instants. In a wide sense, this process
is phase selection or phase picking. Within a clock period, the
number of available phases may be infinite or finite,
dependent on microelectronic implementations. Conventional
notion of phase picking or phase selection may exclusively
refer to finite available phases or discrete phases.
A wide sense phase selection can be achieved indirectly by
adjusting the clock frequency or directly by adjusting the phase
of the clock. The indirect phase selection usually involves a PLL
and the control of VCO frequency, while direct phase
selection usually involves a DLL and control of delay. Since phase
is the integration of frequency over time, in principle the
indirect way of phase selection can give infinitely fine phase
resolution and an infinitely large number of available phases.
However, in the case that the clock frequency is already equal to
the baud rate, phase error can only be compensated by first
forcing the clock frequency to deviate from the baud rate and later
pulling it back, which usually leads to a long acquisition time
and jitter integration. The direct way is desirable because it
can adjust the sampling phase without disturbing the clock
frequency, and can thus achieve faster acquisition and lower
jitter integration. One disadvantage is that reference input jitter
can directly propagate to the regenerated clock. Another
disadvantage is that it can tolerate only a very small frequency
offset. Nevertheless, from a wide-sense phase-selection point of
view, PLL and DLL are just two methods for phase selection.
Although their differences have been much debated, good
designs using either of them can satisfy most applications.
8.3
Jitter Performance
As shown in Fig.8.1, in order to correctly recover the
incoming data, a clock period T must meet
$$ T = T_u + T_r + T_h + T_m + T_{cjitter} \qquad (8.1) $$
where Tu is the uncertainty of zero-crossing, Tr is the time it
takes to cross the threshold (vm), Th is time that the incoming
waveform must be held above or below the threshold, Tcjitter is
the clock jitter, and Tm is the time margin. When the sum of Tu,
Tr, Th, and Tcjitter is larger than a clock period, the time margin
disappears and bit detection errors increase drastically.
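The timing budget can be checked with a few lines of Python that solve the budget for the time margin. All numbers below are hypothetical picosecond values chosen for illustration, and `timing_margin` is an assumed helper name.

```python
def timing_margin(period, t_u, t_r, t_h, t_cjitter):
    """Time margin Tm left in one clock period after the other terms."""
    return period - (t_u + t_r + t_h + t_cjitter)


# Hypothetical budget in picoseconds for a 320 ps clock period.
margin_ps = timing_margin(period=320, t_u=100, t_r=40, t_h=30, t_cjitter=50)

# A worse case: the sum of the terms exceeds the period, so the margin is gone.
worse_margin_ps = timing_margin(period=320, t_u=150, t_r=60, t_h=60, t_cjitter=80)
```

A negative result corresponds to the situation described above, where the margin disappears and bit detection errors rise sharply.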
8.3.1
Jitter Generation and Jitter Transfer
There are basically two types of jitter. The first type is
pattern dependent jitter. Although ISI at data centers can be
suppressed greatly by equalization, ISI may still be severe at
zero-crossing points, especially when there is little excess
bandwidth. In addition, DC offset and duty cycle mismatch in
the transmitter clock also contribute to pattern dependent jitter.
An analysis of data-induced zero-crossing jitter and the
consequential pattern-dependent jitter can be found in [7],
[H.9] and [H.10]. For symmetric system response, these
papers show that zero-crossing jitter has a power spectral
density with a depression near DC. The second type is
noise-induced jitter. Noise can cause zero-crossing points to shift. In
some literature [H.16], jitter is further classified as random
jitter (RJ), deterministic jitter (DJ), ISI, periodic jitter (PJ),
duty cycle distortion (DCD), etc. Because CDR is a highly
nonlinear circuit, jitter performance exhibits high nonlinearity.
Modeling of jitter in CDR can be found in [H.11], [H.12], and
[H.16]. Fig.8.2 shows various sources of jitter.
Fig.8.2 Noise sources in a conventional CDR (labels: Input, Input Jitter, Phase Detector with PD noise, Charge Pump with Charge Pump Noise, Loop Filter with Filter Noise, VCO with VCO Noise, Clock Out; Decision with Sampling Noise, Data Out)
Traditionally the input jitter is separated from the jitter
generated by CDR. Jitter transfer is the ratio of the output
jitter to the input jitter over jitter frequencies, which mainly
describes the attenuation or amplification of the input jitter by
CDR [H.13]. In Fig.8.2, the jitter transfer function can be
approximated by a linear model as:
$$ H(f) = \frac{\phi_{out}}{\phi_{in}} = \frac{F(f)}{1 + F(f)} \qquad (8.2) $$
where F(f) is a linear approximation of the forward loop
transfer function. F(f) is usually of a lowpass nature and has
large gain near DC. Therefore, the jitter transfer function is
approximately unity at low frequency and smaller than unity
at high frequency. However, depending on the zeros and poles
of the forward loop transfer function, there might be a
frequency range where the jitter transfer function is larger than
one. This is called jitter peaking. Jitter peaking is not wanted
as the result is to amplify the input jitter. It has been shown in
[H.14] that jitter peaking can be totally eliminated by using a
voltage controlled delay element.
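The behavior of equation (8.2) can be explored numerically. The Python sketch below assumes a particular forward transfer function, F(s) = K(1 + s/ωz)/s², which is a common linear model of a charge-pump loop with a series RC filter; this choice, the 1 MHz natural frequency, the damping factor of 0.7 and all names are assumptions for illustration and do not come from the cited references.

```python
import math


def jitter_transfer(f, loop_gain, zero_freq):
    """|H(f)| = |F / (1 + F)| for the assumed F(s) = K (1 + s/wz) / s^2."""
    s = 1j * 2 * math.pi * f
    wz = 2 * math.pi * zero_freq
    forward = loop_gain * (1 + s / wz) / (s * s)
    return abs(forward / (1 + forward))


NATURAL_FREQ = 1e6   # Hz, hypothetical loop natural frequency
DAMPING = 0.7        # hypothetical damping factor
LOOP_GAIN = (2 * math.pi * NATURAL_FREQ) ** 2
ZERO_FREQ = NATURAL_FREQ / (2 * DAMPING)

low = jitter_transfer(1e3, LOOP_GAIN, ZERO_FREQ)     # close to 1: jitter tracked
peak = jitter_transfer(0.7e6, LOOP_GAIN, ZERO_FREQ)  # above 1: jitter peaking
high = jitter_transfer(1e8, LOOP_GAIN, ZERO_FREQ)    # well below 1: jitter filtered
```

In this assumed model the loop zero causes a region slightly below the natural frequency where the magnitude exceeds one, which is the jitter peaking described above.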
After separation of the input jitter, jitter generation is
defined as the amount of jitter added to the data signal or
clock signal by the CDR circuit when the input data stream is
essentially clean. In an analog CDR, an important source of
jitter generation is the VCO. In digital CDR, spur is another
main source of jitter generation due to the finite numbers of
available phases (discrete phases).
8.3.2
Jitter Tolerance and Input Jitter
Tolerance Mask
A CDR circuit must work properly under the disturbance of
noise. Jitter tolerance is defined as the capability of a CDR
circuit to achieve a specified bit error rate (BER) under the
worst-case jitter conditions. Jitter tolerance is a function of
frequency. At high frequencies
comparable with the symbol rate, jitter tolerance is small,
while at low frequencies the jitter tolerance is large. At high
frequency the loop gain is small, and jitter is rejected by
averaging rather than tracking. Jitter larger than 1 UI can lead
to a decision error. At low frequency, in contrast, the loop gain is
large and jitter is effectively tracked and compensated by
moving the clock phase accordingly. Very slow frequency
drift such as baseband wandering can lead to enormous phase
error (jitter), if the observation time is very long and the phase
tracking loop is switched off. Luckily this jitter can be
tolerated, since the frequency drift only causes a small phase
error in one symbol period. This error can be effectively
tracked and compensated by the PLL loop. Different
applications require different jitter tolerance. For a specific
application, its input jitter tolerance mask is usually defined by
standardization. In many cases, jitter transfer exhibits a
lowpass characteristic; while jitter generation exhibits a highpass characteristic. When jitter generation is taken into
account, the input jitter mask has some turning points and
looks like what is shown in Fig.8.3, which is actually an input
jitter tolerance mask of SONET OC-12.
decompose the VCO control into fine and coarse inputs,
allowing the latter to remain quiet after the system is phase-locked [8]. This concept leads to dual-loop topologies.
Referenceless Dual-Loop Topology
The referenceless dual-loop topology shown in Fig.8.4 is
useful for continuous-rate CDR [9], [H.7]. In this topology the
frequency-lock-loop is designed to lock the VCO frequency
and the phase-lock-loop is designed to lock the VCO phase.
The frequency detector usually involves quadrature phase
detectors [H.29]. In order to prevent locking to harmonics,
a harmonic detector (HD) based on special data patterns is
usually used. Careful design is needed to prevent the two loops
from fighting each other. It is desirable to hand over the
control of the VCO from the FLL to the PLL after frequency lock
is acquired and switch back to the FLL when the PLL loses lock.
Usually a loss-of-lock-detector (LOLD) and a lock-detector
(LD) are designed to serve these purposes.
Fig.8.4. A block diagram of a referenceless CDR (labels: Serial data stream, Decision Circuit, Recovered Data, Phase Detector, Frequency Detector, Loop Filters, Divider, Multi-band Selection, VCO, Recovered Clock)
Referenced Dual-Loop Topology
Fig.8.3. Sinusoidal input jitter tolerance mask of OC-12.
Fig.8.3 only shows the sinusoidal jitter tolerance or jitter
tolerance in the frequency domain. Apart from the sinusoidal
part, the total jitter is also composed of random jitter and
deterministic jitter. In the case of XAUI, the tolerance should
be at least 0.37UI for deterministic jitter, 0.1UI for sinusoidal
jitter, and 0.18 UI for random jitter. All together this amounts
to 0.65UI total jitter. Therefore, a XAUI receiver has to work
properly with only 112-ps (35%) stable data within a 320-ps
data cycle [H.17]. For this reason, sometimes time-domain
jitter tolerance such as peak-to-peak jitter or root-mean-square
(rms) jitter is used. The equivalence of timing jitter and
phase jitter has been debated in some literature [H.18].
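The XAUI numbers quoted above can be verified with simple arithmetic. The Python sketch below works in hundredths of a UI to avoid rounding; it assumes the XAUI lane rate of 3.125 Gbaud, which is what gives the 320 ps data cycle, and the variable names are illustrative.

```python
# Unit interval for a 3.125 Gbaud lane, in picoseconds.
ui_ps = 1e12 / 3.125e9

# Jitter tolerance components from the text, in hundredths of a UI.
jitter_centi_ui = {"deterministic": 37, "sinusoidal": 10, "random": 18}
total_centi_ui = sum(jitter_centi_ui.values())

# Stable data window left inside one data cycle.
stable_fraction_percent = 100 - total_centi_ui
stable_ps = ui_ps * stable_fraction_percent / 100
```

The total comes to 0.65 UI, which leaves 35% of the 320 ps cycle, or 112 ps, of stable data, matching the figures in the text.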
8.4
Basic CDR Topologies
As mentioned before, basic tasks of a CDR circuit must
include frequency acquisition and phase acquisition to ensure
BER performance despite PVT variations and added noise to
the circuit. A critical difficulty in modern CDR circuits stems
from the use of low supply voltages. The gain of VCOs must
increase as the supply is scaled down because the tuning range
must remain a constant percentage of the center frequency. As
a result, for a given ripple on its control line, the VCO suffers
from greater jitter. A method to alleviate this issue is to
It is rare in practical applications to have a continuous-rate
CDR circuit. In most applications, the data rate is well defined in various standards. For instance, the data rate of SONET
is defined on the basis of OC-1, which is 51.840 megabits per
second (Mbps). OC-X is X times OC-1. The data rate of
10GBASE Ethernet is 10 gigabits per second (Gbps). The
nominal data rate of 1X Blu-ray Disc (BD) is 66 Mbps and X
speed BD has a data rate of 66X Mbps. The accuracy and
stability of the data rate varies from 1 in 10^11 for primary
reference clock (PRC) in SONET and SDH to a fraction of a
percent in optical disk drives. In most high-speed SerDes
transceiver applications, the rate stability is as good as tens of
ppm. Therefore, it is advantageous to use reference for single
rate CDR or multi-rate CDR. A block diagram of this topology
is shown in Fig.8.5.
Fig.8.5. A block diagram of a dual-loop CDR with a reference (labels: Serial data stream, Decision Circuit, Recovered Data, PD, PFD, Loop Filters, Reference, CMU, VCO, Recovered Clock)
The frequency acquisition is achieved by locking the VCO
frequency to the reference. It can start before actual data
transfer takes place. Once transfer starts, acquisition can be
fast because only phase error needs to be dealt with.
Shared Reference Topology
In some multi-channel receiver applications, the
conventional dual-loop topology is not recommended. If each
channel were allocated a VCO, there would be many VCOs.
These VCOs could interact with each other through power
supply and substrate. Thus, the architecture would suffer from
inter-channel crosstalk problems [H.30]. To avoid this
problem, a global reference is shared by all channels. A block
diagram of shared reference topology is shown in Fig.8.6. A
shared global reference also helps to reduce power
consumption and chip area. In the shared reference topology,
frequency offset between the bit-rate of each channel and the
reference clock is dynamically compensated by manipulating
the phase of the clock. This may involve a phase interpolator,
phase shifter, or phase rotator. The shared reference topology
also applies to single channel applications because of its
potential to provide a better immunity to supply and substrate
noise and PVT variations [H.8]. Many digital CDR circuits
have this topology.
Fig.8.6 A block diagram of shared reference topology (labels: Reference, CMU, Multiphase clock generator; per channel: Serial data stream 1 to k, Decision, PD, Digital Filter or Loop Filter, Phase Selection Control, Data, Clock)
In practical microelectronic implementations, there are many
variations of each topology. For example, the phase lock loop
in a dual-loop CDR can be a conventional PLL or a phase
picking PLL [H.7]. The VCO can be LC oscillator or ring
oscillator. The VCO can also be analog or digital [H.30],
[H.31], [H.32]. In the following section, some representative
microelectronic implementations are discussed.
8.5
Hogge Phase-detector
A simple way to avoid misinterpreting missing edges is to
generate phase errors only at data transitions. The design of a
Hogge PD reflects this idea [H.33]. The phase error signal e
is generated by XORing the input datum and its delayed (retimed)
replica. An XOR gate outputs "0" when its inputs are equal
and outputs "1" when its inputs are different. Therefore, e is
generated only at transitions. The width of e is linearly
proportional to the phase difference between the input data
and the regenerated clock. A reference signal with a width of a
half clock period is introduced to facilitate microelectronic
implementation. As a consequence, the phase-detector output
is e minus the reference. Under lock condition, e and the
reference are equally wide, and the average of the phase-detector
output is zero. A Hogge PD is sensitive to duty cycle
mismatch because of the 0.5 clock period reference. In
addition, a Hogge PD is also sensitive to transition density
[H.14]. To understand why, we have to take the loop integrator
into account. Under lock condition, the output
of a Hogge PD is composed of a positive rectangular pulse and
a negative rectangular pulse at each data transition, and the two
pulses are equally a half clock period wide. Nevertheless, the
output of the loop integrator has positive net area. The presence or
absence of such a pulse affects the average output of the loop
integrator. The data-dependent jitter thus introduced is often
large enough to be objectionable. A modified triwave PD,
which is shown in Fig.8.7 (b), solves this problem by forcing
the net area of the output of the loop integrator to zero at each
data transition under the lock condition.
Representative CDR Architectures
There are a number of clock and data recovery schemes, for
example, maximum-likelihood clock recovery, nonlinear
spectral line clock recovery, zero forcing clock recovery,
MMSE clock recovery and etc [7]. In high-speed SerDes
transceiver, threshold-crossing clock recovery is the most
common scheme because of its simplicity and suitability for
high-speed microelectronic implementation. In the following
sections, some representative CDR architectures are briefly
discussed.
8.5.1
in a frequency syntheisizer is strictly periodic; but the input
data stream in a CDR circuit is random. Therefore, some
phase-detectors and phase frequency detectors that work well
for frequency synthesizer do not work for CDR at all. A
general problem is that they usually misinterpret missing
edges as frequency errors. Therefore, phase-detectors and
phase frequency detectors are very importance to CDR circuits.
In high-speed SerDes transceivers, phase-detectors are
predominately logic circuits. If a threshold-crossing timing
scheme is used, phase errors are generated by comparing data
transition edges with clock transition edges. Therefore,
exclusive-or (XOR) gate is a fundamental building block of
phase-detectors.
Phase-detector and Phase Frequency
Detector
Clock recovery circuits in high-speed SerDes transceivers
are similar to frequency synthesizers. However, the reference
(a) A Hogge PD
(b) A modified triwave PD
Fig.8.7 A Hogge PD and a modified triwave PD
Under lock condition, the DC content of the output of a
Hogge PD is zero. Therefore, CDRs with Hogge PDs are
generally less noisy than those with bang-bang PDs. However,
linear phase-detectors used in CDRs suffer from a very small
frequency locking range and slow acquisition. These
disadvantages are due to the fact that linear phase-detectors do
not respond correctly in the presence of a frequency difference
between the clock and data [H.34].
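The linear behaviour of a Hogge PD and its dependence on transition density can be illustrated with a small behavioural model. This is only a sketch, not a circuit from the cited references: it assumes ideal rectangular pulses, an error pulse of width 0.5 UI plus the phase error at every data transition, and a fixed 0.5 UI reference pulse. The function name and the sign convention are assumptions.

```python
def hogge_pd_average(bits, phase_error_ui):
    """Average Hogge PD output per bit interval, in UI of net pulse width.

    Assumed model: at every data transition the XOR of the datum and its
    retimed replica yields an error pulse of width 0.5 UI + phase error,
    and a fixed reference pulse of 0.5 UI is subtracted from it.
    """
    net_width = 0.0
    for previous, current in zip(bits, bits[1:]):
        if previous != current:              # pulses exist only at transitions
            error_width = 0.5 + phase_error_ui
            reference_width = 0.5
            net_width += error_width - reference_width
    return net_width / (len(bits) - 1)


dense = [0, 1, 0, 1, 0, 1, 0, 1]    # a transition in every bit interval
sparse = [0, 0, 1, 1, 0, 0, 1, 1]   # a transition in 3 of 7 intervals

print(hogge_pd_average(dense, 0.0))    # locked: zero average output
print(hogge_pd_average(dense, 0.1))
print(hogge_pd_average(sparse, 0.1))   # same phase error, smaller output
```

In this model the average output scales with the fraction of bit intervals that contain a transition, so the effective detector gain depends on the data pattern.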
Bang-Bang Phase-detector
Bang-bang phase-detectors (BBPD) are also called
binary phase-detectors or early-late phase-detectors.
Compared with linear phase-detectors such as the Hogge PD,
bang-bang phase-detectors are simpler and easier to
implement in monolithic microelectronic circuits. A
binary bang-bang PD can be a simple DFF [H.35], [H.38] or a
sample-and-hold (S&H) cell [H.29]. In the bang-bang PD
shown in Fig.8.8, the input data signal is used to trigger the
DFF. When the clock is leading the data, a “down” signal is
generated. Conversely, when the clock is lagging the data, an
“up” signal is generated. The output of this binary PD is
independent of transition density. Despite this advantage
and its simplicity, this type of binary PD has two main
disadvantages. First, an additional data recovery circuit is
needed. Second, it is sensitive to consecutive identical digits
(CID). A CID pattern fixes the binary phase comparator
output until the next data pulse appears. This causes clock
phase drift and jitter generation [H.38].
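The sensitivity of a binary PD to CID can be seen in a short behavioural sketch. The model below is an assumption made for illustration: the detector output is updated only when a data edge occurs and is held otherwise, and the per-bit clock lag is supplied as an input rather than derived from waveforms.

```python
def binary_pd(bits, clock_lag_ui):
    """Binary (bang-bang) PD output per bit interval: +1 = "up", -1 = "down".

    clock_lag_ui[i] > 0 means the clock lags the data at bit i (assumed input).
    The output is updated only at data transitions and held otherwise.
    """
    outputs = []
    state = 0                                # undefined until the first transition
    for i in range(1, len(bits)):
        if bits[i] != bits[i - 1]:           # a data edge triggers the DFF
            state = 1 if clock_lag_ui[i] > 0 else -1
        outputs.append(state)
    return outputs


bits = [0, 1, 1, 1, 1, 0]                    # four consecutive identical digits
lag = [0.0, 0.1, -0.1, -0.1, -0.1, -0.1]     # clock becomes early during the CID run
print(binary_pd(bits, lag))                  # [1, 1, 1, 1, -1]
```

The detector keeps issuing “up” throughout the CID run even though the clock has already become early, which is the drift mechanism described above.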
Fig.8.8. A simple binary phase-detector and a timing chart
Alexander Phase-detector
An Alexander PD [H.36] is a kind of tristate bang-bang PD. It is
not as sensitive to CID as a conventional binary PD,
since it outputs neither “up” nor “down” when there is no
transition; that is, it has a third, “do nothing” state.
In such a bang-bang PD the incoming data is oversampled, and
transitions are found where the polarity changes.
Each transition is compared with an ideal transition to give the
phase error [H.37]. This idea is shown in Fig.8.9.
A  B  C  UP  DOWN  Remark
0  0  0  0   0     No transition
0  0  1  0   1     Clock is early
0  1  1  1   0     Clock is late
0  1  0  -   -     Out of range
1  1  0  0   1     Clock is early
1  1  1  0   0     No transition
1  0  1  -   -     Out of range
1  0  0  1   0     Clock is late
Fig.8.9. PD outputs based on three successive sampling points
If the oversampling rate is 2, the oversampled bang-bang PD
is an Alexander PD. An implementation based on DFFs can be
easily derived from the truth table and is shown in Fig.8.10.
Since the input is buffered, the output of an Alexander PD
changes only at clock edges.
Fig.8.9. An Alexander PD based on DFFs
It can be seen from the truth table that the output under the
lock condition is not defined. As a consequence, an Alexander
PD generates phase error signals even when there is actually
no phase difference between data and clock. For this reason,
CDRs with Alexander PDs are generally regarded as noisier
than those with linear PDs. Theoretically the gain of an
Alexander PD is infinite around zero input phase error.
Therefore, CDRs with Alexander PDs exhibit fast
acquisition and a wide frequency locking range. In practice,
there is a linear region, as shown on the right side of Fig.8.9,
because of metastability [H.55]. Therefore, linear
methods of analyzing the loop dynamics are still widely used. An
Alexander PD is also sensitive to transition density because it
does not generate a phase error signal if there is no data
transition, and the net area is not zero under lock condition.
In [H.38] a two-mode bang-bang PD is proposed to improve
the performance against CID and transition density. It has a
pulse stretcher working as a CID detector. The PD works as a
binary PD when the input data CID is shorter than the
stretcher time constant (t), and as a ternary PD when the data
CID is longer than the stretcher time constant.
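The truth table of the three-sample PD can be written as a small decision function. The sketch below is an illustration only; it assumes that A and C are consecutive data samples and B is the sample taken between them, and it returns `None` for the two patterns the table marks as out of range. The function name is hypothetical.

```python
def alexander_pd(a, b, c):
    """Return (up, down) for three successive samples A, B, C.

    A and C are taken as consecutive data samples and B as the sample
    between them (an assumption consistent with 2x oversampling).
    Returns None for the out-of-range patterns 010 and 101.
    """
    if a == c:
        if a != b:
            return None                      # 010 or 101: out of range
        return (0, 0)                        # 000 or 111: no transition
    if a == b:
        return (0, 1)                        # 001 or 110: clock is early
    return (1, 0)                            # 011 or 100: clock is late


patterns = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0),
            (1, 1, 0), (1, 1, 1), (1, 0, 1), (1, 0, 0)]
for pattern in patterns:
    print(pattern, alexander_pd(*pattern))
```

Because the function returns (0, 0) when A, B and C are equal, a CID run leaves the loop untouched instead of holding the last decision.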
Quadrature Correlation Phase Frequency
Detector
Phase-detectors usually have a very small frequency pull-in
range. They do not work properly when there is a large
frequency difference between the input data and the regenerated
clock. Therefore, frequency detectors (FD) or phase frequency
detectors (PFD) are sometimes needed. In conventional
frequency synthesizer applications, quadricorrelator FDs and
rotational FDs are widely used [H.39], [H.40]. The special
challenge for FDs in CDRs is that the input data are of a
random nature. In [H.29] a quadrature correlation phase
frequency detector was introduced. Fig.8.10 illustrates how it
works. Whenever a transition of Q1 occurs while Q2 is low, the
sign of the transition corresponds to the sign of the frequency
difference between the clock and the bit rate. Therefore, at a
rising edge of Q1 the FD output is set to “down” and at a
falling edge to “up”, thus reducing the frequency difference.
At any transition of Q1 while Q2 is high, the “up” and “down”
can be reset. This extends the frequency pull-in range only
within a limited range, which is set by two conditions.
First, it cannot be larger than half the hold-in range, due to
the 50% duty cycle of the frequency detector output. Second,
at least one transition of the NRZ signal should occur within
one quarter of the beat note period. The latter condition
ensures that the phase positions of Q1 and Q2 can be analyzed
correctly by the FD.
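The set and reset rules of the quadrature correlation frequency detector can be summarised as a small state machine. The sketch below works on already-sampled Q1 and Q2 sequences and is only a behavioural illustration; the function name and the encoding of “up” as +1 and “down” as -1 are assumptions.

```python
def quadrature_fd(q1, q2):
    """Frequency detector state per sample: +1 = "up", -1 = "down", 0 = reset."""
    outputs = []
    state = 0
    for i in range(1, len(q1)):
        if q1[i] != q1[i - 1]:               # a transition of Q1
            if q2[i] == 0:
                # Q2 low: rising edge of Q1 sets "down", falling edge sets "up"
                state = -1 if q1[i] == 1 else 1
            else:
                state = 0                    # Q2 high: reset "up" and "down"
        outputs.append(state)
    return outputs


q1 = [0, 1, 1, 0, 0, 1]
q2 = [0, 0, 0, 1, 1, 0]
print(quadrature_fd(q1, q2))                 # [-1, -1, 0, 0, -1]
```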
Fig.8.10 A quadrature correlation PFD
If the shortest runlength and the longest runlength of the
input data are known, more powerful and harmonic-free
frequency detectors can be devised [9], [H.6]. Those
frequency detectors usually involve a counter to count the
number of clock cycles between two consecutive transitions.
Phase-detectors Working Under Reduced Rate
At very high speeds, it may be difficult to design oscillators
that provide an adequate tuning range with reasonable jitter.
For this reason, CDR circuits may sense the input random data
at full rate but utilize a VCO running at a reduced rate. This
technique also relaxes the speed requirements for
phase-detectors and, in some CDR configurations, the frequency
dividers [8]. This concept results in the design of half-rate PDs
[H.41], quarter-rate PDs [H.42], and 1/8-rate PDs [H.43], [H.44].
The basic idea of a reduced-rate PD was described in [8] and
[H.41]. Fig.8.11 shows a linear quarter-rate PD and part of a
timing chart. The basic requirements are still the same as for a
full-rate linear PD. Each data transition must produce an
“error” pulse whose width is equal to the phase difference. The
PD must also generate a “reference” pulse with a fixed width. The
rising edge of an “error” pulse is triggered by a data transition
and the falling edge is triggered by a clock transition. The pulse
width increases if the recovered clock transition falls behind the
center of the input data and decreases otherwise. For “reference”
pulses, however, both rising and falling edges are triggered by
clock transitions and thus the pulse width is always kept
constant. The speed requirements for some parts of this linear
PD are relaxed due to the reduced frequency of the multiphase
clock. However, the speed requirements for the whole
PD are not reduced dramatically, for two reasons.
First, “error” pulses become very narrow when the phase
error is close to -π. Second, at some node in the circuit, the 4
phase error signals must be combined and fed to a single loop
filter. The second challenge can be relaxed by using parallel
charge-pumps [H.45].
Fig.8.11 A block diagram of a quarter-rate PD
Reduced-rate PDs are not limited to linear PDs. A half-rate
bang-bang PD and a quarter-rate bang-bang PD are reported in
[H.46] and [H.47], respectively. In general, reduced-rate
PDs share the basic design concepts with their full-rate
counterparts. However, the greater the rate reduction, the
more complicated the circuits needed to deliver the phase error
signals.
8.5.2
Analog CDR
CDRs may be implemented in different technologies, such as
silicon CMOS, SiGe, GaAs, or HBT, to meet different
requirements. Even in mainstream CMOS technologies,
numerous CDRs have been reported. Those CDRs can be
classified into a few categories according to different criteria.
The first criterion could be how the sampling phase is adjusted.
According to this, CDRs can be classified into
continuous-phase CDRs and discrete-phase CDRs. The second
criterion might be the structure through which the sampling
phase is obtained. According to this, CDRs can be classified into
PLL-based CDRs and DLL-based CDRs. Another criterion would
be how much digital circuitry is used in a CDR. According to
this, CDRs can be classified as analog CDRs, semi-analog
CDRs, and digital CDRs. CDRs for high-speed SerDes
transceivers are predominantly mixed-signal circuits. Pure
analog CDRs and pure digital CDRs are rare. Even in those
pure digital CDRs, analog properties are important and static
rail-to-rail CMOS logic circuits are seldom used. We roughly
divide CDR structures into analog ones and digital ones only
for the convenience of discussion.
Referenceless Analog CDR
A representative analog CDR structure is the referenceless
analog CDR. Some emerging telecommunication applications
require CDRs to operate over a continuous range of frequencies
[H.7]. In those CDRs, a reference can be used [H.5]. However,
a referenceless structure is more common, since the rate of the
incoming data is unknown prior to receiving [9], [H.6], [H.7].
Fig.8.12 shows a block diagram of such a CDR. The CDR needs a
wide tuning range VCO whose tuning range is at least 2/3 of
its center frequency. It is very challenging for a
conventional LC-based VCO to realize such a wide tuning
range. The VCOs reported in [9], [H.6] are ring oscillators. Ring
oscillators generally have inferior phase noise performance to
LC-based oscillators and tend to increase the jitter generation of
CDRs. Therefore, an LC-based VCO is still desirable. To extend
the tuning range, two LC-tank VCOs with switched capacitors
to select subbands are proposed in [H.7]. An LC-based digitally
controlled oscillator (DCO) with a wide tuning range was
proposed in [H.32], [H.48]. It may be a good candidate to
improve the performance against PVT variations. Special care
is needed to deal with frequency detection and locking to
harmonics because of the wide tuning range of the VCO. The
locking range of a conventional quadrature correlation phase
frequency detector (QPFD) is only 20% for full-rate systems
[H.49] and 15% for half-rate systems [9]. For a QPFD,
harmonic locking will occur when the frequency range of the
VCO exceeds twice the bit rate of the data. An additional auxiliary
circuit is required to avoid harmonic locking. In [H.6] a
harmonic-free frequency tracing circuit (FTC) is proposed. A
weakness of the FTC is that it does not work if the maximum
runlength and the minimum runlength are unknown. In [H.7] a
coarse frequency detector (CFD) is proposed for which prior
knowledge of the runlength is not needed. The CFD uses a
frequency sweeping technique. The VCO frequency is initially
set to the minimum. When the presence of a single NRZ pulse
within a single clock period is detected, the clock is slow,
and the VCO band is incremented. To eliminate sensitivity to
bimodal jitter, the implemented CFD detects the presence of
two consecutive pulses within two clock periods. The CFD can
reduce the frequency error to typically less than 10%, which is
within the locking range of a conventional QPFD or a
rotational frequency detector (RFD) [H.40], [H.50], [H.51].
The RFD or QPFD is able to reduce the remaining frequency error
to an accuracy of better than 250 ppm. This frequency error is
within the pull-in range of a conventional PLL or DLL. In
[H.7] a delay and phase locked loop (D/PLL) was proposed to
decouple the jitter transfer and jitter tolerance corner
frequencies, since in a conventional second order PLL, good
jitter transfer performance and good jitter tolerance
performance are not compatible.
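The frequency sweeping idea of a coarse frequency detector can be sketched in a few lines. The model below is a strong simplification and an assumption on our part: the clock is judged slow whenever a one-bit-wide NRZ pulse is shorter than one clock period, and the band frequencies are hypothetical values rather than those of the cited implementation.

```python
def coarse_frequency_sweep(bit_rate_hz, band_frequencies_hz):
    """Return the index of the first VCO band that is not detected as slow.

    Assumed detection model: a complete NRZ pulse (one bit wide) can fall
    inside a single clock period only if the clock period is longer than
    the bit period, i.e. the clock is slower than the bit rate.
    """
    bit_period = 1.0 / bit_rate_hz
    for band, frequency in enumerate(band_frequencies_hz):   # ascending bands
        clock_period = 1.0 / frequency
        clock_is_slow = bit_period < clock_period
        if not clock_is_slow:
            return band
    return len(band_frequencies_hz) - 1      # bit rate above the top band


bands = [1.0e9, 1.5e9, 2.0e9, 2.5e9, 3.0e9]  # hypothetical band centre frequencies
print(coarse_frequency_sweep(2.2e9, bands))  # 3
```

With finer band spacing, the residual error after such a sweep becomes small enough for a finer frequency detector to take over.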
Fig.8.12 A block diagram of an analog continuous-rate CDR
The continuous-rate CDRs are generally analog-intensive
circuits requiring careful design of the VCO and control loops. In
practical applications, additional control logic, such as
lock detectors and loss-of-lock detectors, is implemented to
prevent the loops from fighting each other.
Although referenceless analog CDRs are able to lock to data
streams of any bit rate within a frequency range, the
acquisition time is usually too long. In [H.7] the acquisition
time is 1 ms for a 2.5 Gb/s input, which is equal to 2.5 million
symbol periods. In practice, the rate is usually predefined, and
spending so long a time to find the rate is wasteful.
Therefore, CDRs with a reference are much more common
than referenceless CDRs.
Full-Rate Edge Tracking CDR With A Reference
A reference is useful as an acquisition aid. It is especially useful
for single-rate CDRs, because the VCO frequency can be locked
to the reference before data transmission actually starts.
Therefore, the main phase-locked loop can achieve fast
acquisition once data transmission begins. Fig.8.13 shows a
typical structure of a referenced full-rate edge-tracking CDR.
Edge tracking is a method in which phase information is extracted
from data transition edges and data sampling is performed 0.5
unit interval (UI) away from clock edges.
Fig.8.13 A full-rate edge tracking CDR with a reference clock
The CDR employs a dual-loop architecture where the FLL
acts as an acquisition aid to the PLL and is disengaged during
normal operation, when the CDR is locked to the serial input
data. Thus the structure realizes a large frequency acquisition
range while maintaining precise control of phase
alignment. This structure is suitable for single-rate CDRs and,
if the divider is programmable, for multi-rate CDRs.
Data Eye Tracking CDR
In the above-mentioned CDRs, phase information is
extracted from data transition edges, and data is sampled 0.5
UI away from the regenerated clock edges. However, in a
severe and asymmetric jitter environment, it is difficult to
maintain the optimum BER performance. In [H.35] a data eye
tracking CDR is proposed to improve the performance. The
concept of data eye tracking can be understood through
Fig.8.14. If we assume the optimum sampling point is DCLK,
the transition probability and bit error rate at this point
are theoretically at their minimum. Away from this point to either
side, the transition probability and bit error rate increase.
For each DCLK, two sampling points, LCLK and RCLK, are
added to its left side and right side. On the one hand,
under lock condition, the transition probabilities on both sides
should be equal. On the other hand, if the transition
probabilities on both sides are equal, DCLK should be very close
to the optimum sampling point, since the transition probability
is proportional to the jitter histogram area whose origin is the
optimum sampling point. When DCLK strays away from the
optimum position, the transition probability on one side
becomes larger and that on the other side becomes smaller.
Fig.8.14 Phase detection concept of data eye tracking
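The balance argument behind data eye tracking can be checked numerically. The sketch below assumes Gaussian edge jitter around the two bit boundaries at -0.5 UI and +0.5 UI from the eye centre; the jitter model, the function names, and the numerical values are assumptions for illustration and do not come from [H.35].

```python
import math


def _normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def side_transition_probabilities(dclk_offset_ui, window_ui, jitter_sigma_ui):
    """Probability that a data edge falls in [LCLK, DCLK] and in [DCLK, RCLK].

    dclk_offset_ui is the distance of DCLK from the optimum sampling point;
    LCLK and RCLK sit window_ui to the left and right of DCLK.
    """
    def edge_probability(start, stop):
        total = 0.0
        for nominal_edge in (-0.5, 0.5):     # bit boundaries around the eye centre
            total += (_normal_cdf((stop - nominal_edge) / jitter_sigma_ui)
                      - _normal_cdf((start - nominal_edge) / jitter_sigma_ui))
        return total

    lclk = dclk_offset_ui - window_ui
    rclk = dclk_offset_ui + window_ui
    return (edge_probability(lclk, dclk_offset_ui),
            edge_probability(dclk_offset_ui, rclk))


print(side_transition_probabilities(0.0, 0.3, 0.1))   # centred: both sides equal
print(side_transition_probabilities(0.1, 0.3, 0.1))   # shifted right: right side larger
```

When DCLK is centred the two probabilities match, and any offset unbalances them in the direction of the offset, which is the error signal the loop uses.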
Fig.8.15 shows a block diagram of a data eye-tracking CDR.
It is composed of three loops. The reference loop is a
frequency acquisition aid. The second loop is an edge tracking
loop. The third loop is an eye measuring loop.
Fig.8.15 A block diagram of a data eye-tracking CDR
Whenever there is a transition between LCLK and DCLK,
the sampling position is considered too close to the left, the
clock is assumed to be too fast, and fDN is asserted high.
If there is a transition between DCLK and RCLK, the
sampling position is considered too close to the right, the
clock is assumed to be too slow, and fUP is asserted high.
In the case of a very small eye opening, or when LCLK and DCLK
are too far apart, both fDN and fUP can be high. The tracking
loop uses fDN and fUP as phase error signals. Under lock
condition, fDN and fUP have the same expected value. When
LCLK and RCLK are too close, CDR jitter performance may
deteriorate, because data transitions happening before LCLK
and after RCLK do not contribute to the detection of phase error.
Therefore, the eye measuring loop needs careful design to
control the interval between LCLK and RCLK. In [H.35],
when both fUP and fDN are low for n UIs, dUP is asserted
high; otherwise dDN is asserted high. The requirements are
mathematically expressed as:
$$ E[f_{UP}] = E[f_{DN}] \tag{8.3} $$
$$ E[d_{DN}] = (1 - E[f_{UP}])^{n} (1 - E[f_{DN}])^{n} = 0.5 \tag{8.4} $$
If n is 4, E[fUP] is 0.083 and E[dDN] is 0.5, since
(1 - 0.083)^8 is approximately 0.5. Since there are
three loops, attention must be paid to avoid instability. The
bandwidth of the tracking loop can be much larger than that of
the eye measuring loop. Therefore, the phase of DCLK is
adjusted by the tracking loop and will not be affected by the
operation of the eye measuring loop.
Delay Cell Based Analog CDR
The VCO is a very important component of CDRs. In many
implementations, the VCO is an LC oscillator whose oscillation
frequency can be tuned by adjusting the voltage applied to the
varactor. In some applications, such as a quadricorrelator PFD,
the VCO is required to deliver quadrature clocks. Design options
to generate quadrature signals include:
• Combination of VCO, polyphase filter (or R-C C-R filter),
and output buffers (or limiters) as used in, e.g., [H.56],
[H.57].
• VCO at double frequency followed by master–slave flip-flops.
• Two cross-coupled VCOs as proposed in [H.58], [H.59].
It is also straightforward to delay the output of a VCO by 90°
so that a quadrature clock is obtained. This can be achieved by
using a delay cell. A delay cell can be a piece of transmission
line. In the gigahertz range the transmission line may be too
long to fit in an integrated circuit. Another issue is that a piece
of transmission line can only provide a 90° phase delay at one
frequency. Capacitor-based delay cells are more advantageous,
as the capacitor or the charging current can be tuned to deliver
a variable delay. Therefore, a 90° phase delay can be achieved
over a wide range of frequencies. A ring oscillator composed of
two or more stages of differential inverters can realize a very
wide tuning range [H.60].
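A quadrature clock can also be viewed as the original clock delayed by a quarter of its period. As a rough sketch of what that means for a delay element, the code below computes the quarter-period delay and the length of a transmission line that would provide it. The relative permittivity of 4.0 (roughly that of silicon dioxide) and the function names are assumptions for illustration only.

```python
def quarter_period_delay_s(frequency_hz):
    """Delay that corresponds to a 90 degree phase shift at frequency_hz."""
    return 1.0 / (4.0 * frequency_hz)


def line_length_m(delay_s, relative_permittivity=4.0):
    """Length of an ideal transmission line with the given delay.

    Assumes the propagation velocity is c / sqrt(relative_permittivity).
    """
    speed_of_light = 299_792_458.0           # m/s
    return delay_s * speed_of_light / relative_permittivity ** 0.5


for frequency in (1e9, 2.5e9, 10e9):
    delay = quarter_period_delay_s(frequency)
    print(f"{frequency / 1e9:.1f} GHz: {delay * 1e12:.0f} ps, "
          f"{line_length_m(delay) * 1e3:.1f} mm")
```

Under these assumptions a line for a few gigahertz is on the order of a centimetre long, and its delay equals a quarter period at only one frequency.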
8.5.3
Digital CDR
Digital CDR has many advantages over analog CDR [H.19].
First, digital processing units are noise-immune. The only
remaining noise-sensitive components are the clock generator
and the samplers, and clock-generating PLLs are generally less
sensitive to self-induced noise (e.g., the phase noise of the
voltage-controlled oscillator) than clock recovery PLLs. This
is because the reference clock is likely to have less noise than
the random NRZ data stream. Second, digital circuits are easy
to port to another process. This is an important criterion, as the
SerDes is becoming a frequently demanded intellectual
property (IP) component in system-on-chip (SoC)
development. Third, instantaneous phase acquisition is
possible if the samples are stored until the phase decision is
made. This is useful, for instance, in switch network applications
where the CDR must acquire lock within a few bytes of the
preamble period. Digital CDRs can be classified as baud-rate
sampling CDRs [H.20], [7] and oversampling CDRs according
to the sampling rate [H.21]. They can also be classified as
asynchronous sampling CDRs and synchronous sampling CDRs
according to whether the sampling clock is synchronized to the
incoming data.
Asynchronous (blind) oversampling CDR
Fig.8.4 shows the topology of a blind oversampling CDR
[H.19]. This type of CDR is mainly composed of three blocks:
a multi-phase clock generator, parallel samplers, and phase
decision logic. The sampling is inevitably asynchronous,
because the sampling phases are defined by the reference clock.
The samples are stored in the sample storage. The input data
stream is binary after a limiting amplifier or a slicer. The
working principle of blind oversampling is as follows.
If there is a polarity change between directly adjacent samples,
there is a transition between these two samples. If the system
has a symmetrical response, the central point between two
adjacent transitions is the best sampling instant, and it is
regarded as the nominal sampling phase of the multi-phase
clock. Accordingly, the nominal transition position is
between phase 0 and phase 1. As seen from Fig.8.1, the
probability density function has its maximum at (M+1)/2 away
from the nominal sampling phase in an M-times oversampling
system. A watching window of W bits is used to decide the proper
sampling phase. Assume that the best sampling phase
changes to (M+3)/2 due to low frequency jitter, frequency
offset, or any other cause. The most probable transition position
then moves to between phase 1 and phase 2. A simple majority
voting method can be applied, for example as follows. Assuming
the most probable bit boundary is
between phase 0 and phase 1, we count the number of actual
transitions between phase 0 and phase 1 as N1. Assuming the
most probable bit boundary is between phase 1 and phase 2,
we count the actual transitions as N2. We continue until
NM. We then compare the numbers N1, N2, …, NM and find that
N2 is the maximum. Therefore, we update the nominal sampling
phase to (M+3)/2. The length
of the watching window W and the oversampling rate M
determine the jitter tolerance of the system. The high frequency
jitter is averaged out and the low frequency jitter is tracked
because the nominal sampling phase is updated every W bits. It
has been shown in [H.19] that WM should be very large in
order to achieve a reasonable jitter tolerance and low bit error
rate. M is usually selected to be 3, 5, or 7, while W is usually a
few thousand. Therefore, the sample storage is costly in chip
area and power consumption. This method is sometimes called
phase picking or phase selection because it selects the best
phase of a multi-phase clock to sample the incoming data.
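The majority voting step can be sketched as follows. This is a minimal illustration under stated assumptions: the oversampling rate is odd, the samples are ideal binary values, and the helper `oversample` as well as the short 8-bit window are hypothetical and serve only to generate test data.

```python
def oversample(bits, m, offset):
    """Ideal m-times oversampling; bit boundaries fall just before phase `offset`."""
    last = len(bits) - 1
    return [bits[min(max((k - offset) // m, 0), last)]
            for k in range(len(bits) * m)]


def pick_sampling_phase(samples, m):
    """Majority vote over the window: returns (sampling phase, transition counts)."""
    counts = [0] * m                         # counts[i]: transitions between phase i and i+1
    for k in range(len(samples) - 1):
        if samples[k] != samples[k + 1]:
            counts[k % m] += 1
    boundary = max(range(m), key=lambda i: counts[i])
    return (boundary + (m + 1) // 2) % m, counts


window = [0, 1, 0, 0, 1, 1, 0, 1]            # W = 8 bits, illustrative only
samples = oversample(window, m=5, offset=2)  # boundary between phase 1 and phase 2
samples[7] = samples[6]                      # one edge displaced by jitter
print(pick_sampling_phase(samples, 5))       # (4, [0, 4, 1, 0, 0])
```

With M = 5 and the bit boundary between phase 1 and phase 2, the vote selects phase 4, which is (M+3)/2, even though one edge has been displaced.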
Fig.8.4. A block diagram of a blind oversampling CDR.
The above-mentioned method usually implies a binary
input stream. However, the incoming data stream for an
asynchronously sampled CDR does not need to be binary, and the
sampling rate need not be oversampled or an integer multiple
of the baud rate. It has been shown in [7] and [H.22] that
asynchronous-to-synchronous conversion can be performed with
the aid of an interpolator [H.23] under the control of a
numerically controlled oscillator (NCO). A block diagram of this
type of asynchronously sampled CDR is shown in Fig.8.5.
Fig.8.5 A block diagram of digital CDR with an interpolator
This type of CDR involves a very high-speed ADC and a
digital signal processing block, which are very challenging to
implement in a very high-speed SerDes transceiver.
The recovered clock of a digital CDR usually contains many
spurs due to the discrete phase. This is generally not a big
issue because once the data is correctly detected, another clock
can be used for jitter-free transmission [H.23]. For example,
an elastic buffer and a system clock can be used.
Synchronous oversampling CDR
A synchronous oversampling CDR can detect the phase error
and adjust the sampling phase promptly according to the phase
error. In this way, it does not need a large sample storage and
consumes less chip area and power than asynchronous ones.
Fig.8.6 shows the block diagram of a synchronously sampled
CDR [H.24].
8.5.4
Asynchronously sampled digital CDRs usually incur an
implementation that is costly in area and power consumption.
Synchronously sampled digital CDRs are advantageous in these
aspects.
Fig.8.6. A block diagram of a synchronously sampled CDR
The phase error between the incoming data and the clock is
expressed in the phasor diagram. When the phasor is greater
than 2π, it is reduced modulo 2π, so the remainder falls back
into the range 0 to 2π. However, as a consequence, two adjacent
nominal sampling phases may be within one clock period or beyond
two periods if there is frequency offset or large low frequency
jitter. This phenomenon is a kind of cycle slip. This is
generally not a big issue, as low frequency jitter or a small
frequency offset can only cause a very small phase shift
during one symbol period. When updating the sampling phase
to a new one, the old sampling phase can still be kept without
causing a big problem if the number of sampling phases is
sufficient. Nevertheless, some logic circuits have to be
developed in order not to confuse the phase selection control.
In [H.25] a flywheel method is proposed to solve this problem.
The low frequency jitter tolerance and frequency offset
tolerance are decided by the number of available phases and the
CDR loop bandwidth. A treatise on the tolerance of frequency
offset is [H.26]. In [H.27], a detailed implementation of this
type of digital CDR is discussed. The phase-detector is a
bang-bang PD and the digital loop filter is derived from an analog
loop filter by using a backward difference substitution of s
with z:
$$ e^{-sT} = z^{-1} \;\Rightarrow\; s \approx \frac{1 - z^{-1}}{T} \tag{8.3} $$
The output of the digital filter is fed to a digital-to-phase
converter (DPC) to select the proper phase for sampling. In
multi-gigahertz SerDes transceivers, it is very difficult to
make the digital filter work at baud rate. Therefore, the output
of the bang-bang PD is decimated before it enters the digital
loop filter. Nevertheless, since the implementation simply
replaces analog blocks with their digital counterparts, some
problems that trouble analog CDRs, such as slow acquisition,
still exist. Furthermore, the power consumption is expected to
be high due to the large amount of high-resolution digital
processing.
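As an illustration of the backward difference substitution s ≈ (1 - z^-1)/T, the sketch below discretises a proportional-plus-integral loop filter H(s) = kp + ki/s, which becomes H(z) = kp + ki·T/(1 - z^-1). The choice of a PI filter, the coefficient values, and the function name are assumptions; the source does not state which analog filter [H.27] starts from.

```python
def make_pi_loop_filter(kp, ki, sample_period):
    """Discrete-time PI filter from H(s) = kp + ki/s with s -> (1 - z^-1)/T."""
    integrator = 0.0

    def step(phase_error):
        nonlocal integrator
        # ki*T / (1 - z^-1) is a running sum of ki*T*x[n]
        integrator += ki * sample_period * phase_error
        return kp * phase_error + integrator

    return step


loop_filter = make_pi_loop_filter(kp=0.5, ki=2.0, sample_period=0.25)
# Feed a few (decimated) bang-bang PD decisions through the filter.
print([loop_filter(e) for e in (1, 1, 1, -1)])   # [1.0, 1.5, 2.0, 0.5]
```

The integrator state is the part that tracks frequency offset, while the proportional term reacts to each individual PD decision.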
An improved synchronous digital CDR is proposed in [H.13],
and its block diagram is shown in Fig.8.7. It replaces the
high-resolution digital loop filter with a logic decision circuit,
which reduces power and chip area and speeds up
acquisition. Furthermore, instead of using a large number
of sampled data to decide the proper sampling phase, it
decides the proper sampling phase by comparing the current
nominal phase with a stored phase pointer. The phase pointer
can be regarded as a historically decided phase based on a
group of past sampled data. Therefore, it does not require
large sample storage. In addition, the chance of cycle slip is
much reduced, since it oversamples the incoming data and
decides the correct sampling phase by using a watching
window. A similar concept is explored in [H.28].
Fig. 8.7. A synchronous oversampling CDR
The theoretical high frequency jitter tolerance of the digital
CDR shown in Fig.8.7 can be derived as:
$$ JT_{p\text{-}p}(UI) = 1 - \frac{k}{N} \tag{8.4} $$
where N is the oversampling factor and k is the threshold
distance between a transition edge and its adjacent nominal
data center. The low frequency jitter tolerance is
$$ JT_{p\text{-}p}(UI) = \frac{1}{M \times N \times \pi \times F_j} \tag{8.5} $$
where M is the longest runlength of the input data stream or
continuous identical data (CID) and Fj is the jitter frequency
normalized to baud rate. The theoretical frequency offset
tolerance is
$$ \frac{\Delta f}{f_0} = \frac{1}{M \times N} \tag{8.6} $$
When jitter tolerance is considered, the frequency tolerance
becomes much smaller than given by equation (8.6). Nevertheless,
in most high-speed SerDes transceivers, both the transmitter
reference frequency and the receiver reference frequency are
generated by very stable oscillators with a stability of 100 ppm
or better [H.8]. Therefore digital CDR is very suitable for
high-speed single-rate or multi-rate SerDes transceivers.
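The tolerance expressions can be evaluated numerically for a feel of the magnitudes. The sketch below reads the high frequency tolerance as 1 - k/N, the low frequency tolerance as 1/(M·N·π·Fj), and the frequency offset tolerance as 1/(M·N); these readings, the function names, and the example values (N = 5, k = 1, M = 5, Fj = 0.001) are assumptions made for illustration.

```python
import math


def hf_jitter_tolerance_ui(k, n):
    """High frequency jitter tolerance, peak-to-peak, in UI."""
    return 1.0 - k / n


def lf_jitter_tolerance_ui(run_length, n, jitter_freq):
    """Low frequency jitter tolerance; jitter_freq is normalized to baud rate."""
    return 1.0 / (run_length * n * math.pi * jitter_freq)


def frequency_offset_tolerance(run_length, n):
    """Relative frequency offset tolerance, delta f over f0."""
    return 1.0 / (run_length * n)


N, k, M = 5, 1, 5        # hypothetical oversampling factor, threshold, runlength
print(hf_jitter_tolerance_ui(k, N))
print(lf_jitter_tolerance_ui(M, N, 1e-3))
print(frequency_offset_tolerance(M, N) * 1e6, "ppm")
```

Even with these modest example values the theoretical offset tolerance is far larger than the 100 ppm stability of typical reference oscillators.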
8.6
Phase Rotating Control Logic
Transition Detection
Digital Threshold Detection
9
Interaction with equalizer
Reference
[1]. Adam, J.; Chi Shih Chang; Stankus, J.J.; Iyer, M.K.;
Chen, W.T., “Addressing packaging challenges,” IEEE
Circuits and Devices Magazine, vol. 18, Issue 4, July
2002, Pages:40 – 49.
[2]. Friedman, E.G., ed., Clock Distribution Networks in
VLSI Circuits and Systems, IEEE Press, 1995.
[3]. Maheshwari, N., and Sapatnekar, S.S., Timing Analysis
and Optimization of Sequential Circuits, Kluwer, 1999.
[4]. Lee, T.H.; Bulzacchelli, J.F.; “A 155-MHz clock
recovery delay- and phase-locked loop,” IEEE Journal
of Solid-State Circuits, vol. 27, Issue 12, Dec. 1992,
Page(s):1736 – 1746.
[5]. Chia, B.; Kollipara, R.; Oh, D.; Yuan, C.; Boluna, L.S.;
“Study of PCB trace crosstalk in backplane connector
pin field,” IEEE Conference on Electrical Performance
of Electronic Packaging, Oct. 2006, Page(s):281 – 284.
[6]. A. X. Widmer and P. A. Franaszek, "A DC-balanced,
partitioned-block, 8B/10B transmission code," IBM J.
Res. Develop., vol. 27, No. 5, September 1983, page(s)
440-451.
[7]. J. W. M. Bergmans, Digital Baseband Transmission and
Recording, Kluwer Academic Publishers, 1996, ISBN 0-7923-9775-4.
[8]. Razavi, B.; “Challenges in the design of high-speed clock
and data recovery circuits,” IEEE Communications
Magazine, vol. 40, Issue 8, Aug. 2002, Page(s):94 – 101.
[9]. Rong-Jyi Yang; Kuan-Hua Chao; Sy-Chyuan Hwu;
Chuan-Kang Liang; Shen-Iuan Liu; “A 155.52 Mbps-3.125 Gbps continuous-rate clock and data recovery
circuit,” IEEE Journal of Solid-State Circuits, vol. 41,
issue 6, June 2006 Page(s):1380 – 1390.
[10]. Horowitz M.; Chih-kong Ken Yang; S. Sidiropoulos,
“High-speed electrical signaling: Overview and
limitations,” IEEE Micro, vol. 18, issue 1, Jan.-Feb. 1998,
Page(s):12-24.
[11]. K. Muhammad and R. B. Staszewski, “Direct RF
sampling mixer with recursive filtering in charge
domain,” in Proceedings of the International Symposium
on Circuits and Systems (ISCAS ’04), vol. 1, pp. I-577–
I-580, Vancouver, BC, Canada, May 2004, sec. ASPL29.5.
[12]. [G.1.] P. J. Pupalaikis, "Advanced tools for high-speed
serial data measurements: equalizer emulation and
virtual probing," DesignCon 2007.
[13]. [G.2.] Sinsky, J.H.; Duelk, M.; Adamiecki, A.; “High-speed
electrical backplane transmission using duobinary
signaling,” IEEE Transactions on Microwave Theory and
Techniques, Volume 53, Issue 1, Jan. 2005 Page(s):152 – 160.
[14]. [G.3.] B. Casper, P. Pupalaikis, and J. Zerbe, “Serial
equalization,” DesignCon 2007 TechForum.
[15]. [G.4.] Zhilong Tang, "A 100 MHz Gm-C analog
equalizer for 100Base-TX application," Proceedings of
6th International Conference on Solid-State and
Integrated-Circuit Technology, 2001, vol. 1, 22-25 Oct.
2001 Page(s): 240-242.
[16]. [G.5.] T. M. Hollis, D. J. Comer, D. T. Comer, and B.
Young, “Self-calibrating continuous-time equalization
targeting Inter-symbol Interference,” IEEE North-East
Workshop on Circuits and Systems, June 2006,
Page(s):109 – 112.
[17]. [G.6.] Miao Li; Kwasniewski, T.; Shoujun Wang;
Yuming Tao; “A 10Gb/s transmitter with multi-tap FIR
pre-emphasis in 0.18 µm CMOS technology,”
Proceedings of Asia and South Pacific Design
Automation Conference, vol. 2, 18-21 Jan. 2005,
Page(s):679 – 682.
[18]. [G.7.] A. Fiedler et al., “A 1.0625 Gb/s transceiver with
2X-oversampling and transmit signal pre-emphasis,” in
IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers,
Feb. 1997, pp. 238–239.
[19]. [G.8.] W. J. Dally and J. Poulton, “Transmitter
equalization for 4-Gb/s signaling,” IEEE Micro, vol. 17,
pp. 48–56, Jan./Feb. 1997.
[20]. [G.9.] R. Farjad-Rad, C.-K. Yang, M. Horowitz, and T.
H. Lee, “A 0.4-µm CMOS 10-Gb/s 4-PAM pre-emphasis
serial link transmitter,” IEEE J. Solid-State Circuits, vol.
34, pp. 580–585, May 1999.
[21]. [G.10] J. H. R. Schrader, E. A. M. Klumperink, J. L.
Visschers, and B. Nauta, “Pulse-Width Modulation
Pre-Emphasis Applied in a Wireline Transmitter, Achieving
33 dB Loss Compensation at 5-Gb/s in 0.13-µm CMOS,”
IEEE JOURNAL OF SOLID-STATE CIRCUITS, vol.
41, no. 4, APRIL 2006, pages: 990-999.
[22]. [G.11.] Ruifeng Sun; Jaejin Park; O'Mahony, F.; Yue,
C.P.; “A low-power, 20-Gb/s continuous-time adaptive
passive equalizer,” IEEE International Symposium on
Circuits and Systems, Vol.2, 23-26 May 2005,
Page(s):920 – 923.
[23]. [G.12.] P. Amini and O. Shoaei, "A low-power gigabit
ethernet analog equalizer," Circuits and Systems, 2001.
ISCAS 2001. The 2001 IEEE International Symposium
on, Volume 1, 6-9 May 2001 Page(s):176 - 179 vol. 1,
Digital Object Identifier 10.1109/ISCAS.2001.921819
[24]. [G.13.] Aliparast, P.; Khoei, A.; Hadidi, KH.; “A Novel
Fully-Differential Gm-C Filter Structure for
Communication Channel Equalizer,” 14th International
Conference on Mixed Design of Integrated Circuits and
Systems, 21-23 June 2007, Page(s):209 – 214.
[25]. [G.14.] J. S. Choi, M. S. Hwang, and D. K. Jeong, "A
0.18-µm CMOS 3.5-Gb/s Continuous-Time Adaptive
Cable Equalizer Using Enhanced Low-Frequency Gain
Control Method," IEEE JOURNAL OF SOLID-STATE
CIRCUITS, VOL. 39, NO. 3, MARCH 2004, page(s):
419-425.
[26]. [G.15.] Mohan, S.S.; Hershenson, M.D.M.; Boyd, S.P.;
Lee, T.H.; “Bandwidth extension in CMOS with
optimized on-chip inductors,” IEEE Journal of Solid-State Circuits, Volume 35, Issue 3, March 2000,
Page(s):346 - 355
[27]. [G.16.] Groenewold, G.; "Low-power MOSFET-C 120
MHz Bessel allpass filter with extended tuning range,"
IEE Proceedings on Circuits, Devices and Systems,
Volume 147, Issue 1, Feb. 2000, Page(s):28 – 34.
[28]. [G.17.] Johnson, J.; Johnson, D.; Boudra, P.; Stokes, V.;
“Filters using Bessel-type polynomials,” IEEE
Transactions on Circuits and Systems, Volume 23, Issue
2, Feb 1976, Page(s):96 – 99.
[29]. [G.18.] Sands, N.P., Hauser, M.W., Liang, G.,
Groenewold, G., Lam, S., Lin, C.H., Kuklewicz, J., Lang,
L., and Dakshinamurthy, R., “A 200Mb/s analog DFE
read channel,” Proceedings of the ISSCC, 1996.
[30]. [G.19] Burlingame, E.; Spencer, R.; "An analog CMOS
high-speed continuous-time FIR filter," Proceedings of
the 26th European Solid-State Circuits Conference, 19-21 Sept. 2000 Page(s):288 – 291.
[31]. [G.20] Xiaofeng Lin; Hoi Lee; Jin Liu; "A continuous-time adaptive FIR equalizer with LNV-AIL delay line
for 2.5Gb/s data communication," Proceedings of the
IEEE Custom Integrated Circuits Conference, 18-21 Sept.
2005 Page(s):413 – 416.
[32]. [G.21] Sewter, J.; Carusone, A.C.; “A CMOS finite
impulse response filter with a crossover traveling wave
topology for equalization up to 30 Gb/s,” IEEE Journal of
Solid-State Circuits, Volume 41, Issue 4, April 2006
Page(s):909 – 917.
[33]. [G.22] J. E. Jauss, S. R. Mooney, “Discrete-time analog
filter,” US patent 6,791,399, Sep. 14, 2004.
[34]. [G.23] Balan, V.; Caroselli, J.; Chern, J.-G.; Chow, C.;
Dadi, R.; Desai, C.; Fang, L.; Hsu, D.; Joshi, P.; Kimura,
H.; Liu, C.Y.; Tzu-Wang Pan; Park, R.; You, C.; Yi
Zeng; Zhang, E.; Zhong, F.; “A 4.8-6.4-Gb/s serial link
for backplane applications using decision feedback
equalization,” IEEE Journal of Solid-State Circuits,
Volume 40, Issue 9, Sept. 2005 Page(s):1957 – 1967.
[35]. [G.24] Krishna, K.; Yokoyama-Martin, D.A.; Caffee, A.;
Jones, C.; Loikkanen, M.; Parker, J.; Segelken, R.;
Sonntag, J.L.; Stonick, J.; Titus, S.; Weinlader, D.;
Wolfer, S.; “A multigigabit backplane transceiver core in
0.13-µm CMOS with a power-efficient equalization
architecture,” IEEE Journal of Solid-State Circuits,
Volume 40, Issue 12, Dec. 2005 Page(s):2658 – 2666.
[36]. [G.25] Payne, R.; Landman, P.; Bhakta, B.; Ramaswamy,
S.; Song Wu; Powers, J.D.; Erdogan, M.U.; Yee, A.-L.;
Gu, R.; Lin Wu; Yiqun Xie; Parthasarathy, B.; Brouse,
K.; Mohammed, W.; Heragu, K.; Gupta, V.; Dyson, L.;
Wai Lee; “A 6.25-Gb/s binary transceiver in 0.13-µm
CMOS for serial data transmission across high loss
legacy backplane channels,” IEEE Journal of Solid-State
Circuits, Volume 40, Issue 12, Dec. 2005 Page(s):2646
– 2657.
[37]. [G.26] Zerbe, J.L.; Werner, C.W.; Stojanovic, V.; Chen,
F.; Wei, J.; Tsang, G.; Kim, D.; Stonecypher, W.F.; Ho,
A.; Thrush, T.P.; Kollipara, R.T.; Horowitz, M.A.;
Donnelly, K.S.; "Equalization and clock recovery for a
2.5-10Gb/s 2-PAM/4-PAM backplane transceiver cell,"
IEEE Journal of Solid-State Circuits, Volume 38, Issue
12, Dec 2003 Page(s):2121 – 2130.
[38]. [G.27] Momtaz, A.; Chung, D.; Kocaman, N.; Jun Cao;
Caresosa, M.; Bo Zhang; Fujimori, I.; “A Fully
Integrated 10-Gb/s Receiver With Adaptive Optical
Dispersion Equalizer in 0.13-µm CMOS,” IEEE Journal
of Solid-State Circuits, Volume 42, Issue 4, April 2007
Page(s):872 – 880.
[39]. [G.28] Mark J. Marlett, and Mark D. Rutherford,
“Continuous-time decision feedback equalizer,” US
patent 2006/0239341, Oct. 26, 2006.
[40]. [G.29] M. Harwood, N. Warke, R. Simpson, T. Leslie, A.
Amerasekera, S. Batty, D. Colman, E. Carr, V.
Gopinathan, S. Hubbins, P. Hunt, A. Joy, P. Khandelwal,
B. Killips, T. Krause, S. Lytollis, A. Pickering, M.
Saxton, D. Sebastio, G. Swanson, A. Szczepanek, T.
Ward, J. Williams, R. Williams, T. Willwerth, “A
12.5Gb/s SerDes in 65nm CMOS Using a Baud-Rate
ADC with Digital Receiver Equalization and Clock
Recovery,” IEEE International Solid-State Circuits
Conference, 2007, page(s): 436-613.
[41]. [G.30] E.F. Stikvoort and J.A.C. v. Rens, “An all-digital
bit detector for compact disc players,” IEEE J. Selected
Areas Commun., vol. SAC-10, No. 1, Jan. 1992, page(s)
191-200.
[42]. [G.31] P. Kabal and S. Pasupathy, “Partial-response
signaling,” IEEE Trans. Commun., vol. COM-23, No. 9,
Sept. 1975, pages:921-934.
[43]. [G.32] A. Lender, “The duobinary technique for high-speed data transmission,” IEEE Trans. Commun.
Electron., vol.82, May 1963, pages: 214-218.
[44]. [G.33] Yamaguchi, K.; Sunaga, K.; Kaeriyama, S.;
Nedachi, T.; Takamiya, M.; Nose, K.; Nakagawa, Y.;
Sugawara, M.; Fukaishi, M.; “12Gb/s duobinary
signaling with ×2 oversampled edge
equalization,” Digest of Technical Papers, IEEE
International Solid-State Circuits Conference, 6-10 Feb.
2005 Page(s):70 –71.
[45]. [G.34] M. Tomlinson, “New automatic equalizer
employing modulo arithmetic,” Electron. Lett, vol.7, Nos
5/6, March 1971, pp.138-139.
[46]. [G.35] H. Harashima and H. Miyakawa, “Matched-transmission technique for channels with intersymbol
interference,” IEEE Trans. Commun. Technol., vol. COM-20, Aug. 1972, pp. 774-780.
[47]. [H.1] Farjad-Rad, R.; Dally, W.; Hiok-Tiaq Ng;
Senthinathan, R.; Lee, M.-J.E.; Rathi, R.; Poulton, J; “A
low-power multiplying DLL for low-jitter multigigahertz
clock generation in highly integrated digital chips,”
IEEE Journal of Solid-State Circuits, Volume 37, Issue
12, Dec. 2002 Page(s):1804 – 1812.
[48]. [H.2] W. R. Bennett, “Statistics of regenerative data
transmission,” Bell Syst. Tech. J., Vol.37, pp.1501-1542,
Nov. 1958.
[49]. [H.3] F. M. Gardner, Phase Lock Techniques, New
York: Wiley, 2nd edition, 1979.
[50]. [H.4] F. M. Gardner, “Properties of frequency difference
detectors,” IEEE Trans. Commun, vol. COM-33, No. 3,
Feb. 1985, pages: 131-138.
[51]. [H.5] Frambach, J.-P.; Heijna, R.; Krosschell, R.; “Single
reference continuous rate clock and data recovery from
30 Mbit/s to 3.2 Gbit/s,” Proceedings of the IEEE
Custom Integrated Circuits Conference, 12-15 May 2002,
Page(s):375 – 378.
[52]. [H.6] Rong-Jyi Yang; Kuan-Hua Chao; Shen-Iuan Liu;
“A 200-Mbps~2-Gbps continuous-rate clock-and-data-recovery circuit,” IEEE Transactions on Circuits and
Systems I: Regular Papers, Volume 53, Issue 4, April
2006 Page(s): 842 – 847.
[53]. [H.7] Dalton, D.; Kwet Chai; Evans, E.; Ferriss, M.;
Hitchcox, D.; Murray, P.; Selvanayagam, S.; Shepherd,
P.; DeVito, L.; “A 12.5-Mb/s to 2.7-Gb/s continuous-rate
CDR with automatic frequency acquisition and data-rate
readback,” IEEE Journal of Solid-State Circuits, Volume
40, Issue 12, Dec. 2005 Page(s): 2713 – 2725.
[54]. [H.8] C. Kromer, G. Sialm, C. Menolfi, M. Schmatz, F.
Ellinger, and H. Jackel, "A 25-Gb/s CDR in 90-nm
CMOS for high-density interconnects," IEEE Journal of
Solid-State Circuits, vol.41, No.12, Dec. 2006, pages:
2921-2929.
[55]. [H.9] E. Panayirci, “Jitter analysis of a phase-locked
digital timing recovery system,” IEE Proc.-I, vol.139,
No.3, June 1992, pp.267-275.
[56]. [H.10] B. R. Saltzberg, “Timing recovery for digital
synchronous data transmission,” Bell syst. Tech. J.,
vol.46, pp.593-622, March 1967.
[57]. [H.11] Jri Lee; Kundert, K.S.; Razavi, B.; "Modeling of
jitter in bang-bang clock and data recovery circuits,"
Proceedings of the IEEE Custom Integrated Circuits
Conference, 21-24 Sept. 2003 Page(s):711 – 714.
[58]. [H.12] Buckwalter, J.F.; Hajimiri, A.; "Analysis and
equalization of data-dependent jitter," IEEE Journal of
Solid-State Circuits, Volume 41, Issue 3, March 2006
Page(s):607 – 620.
[59]. [H.13] Qingjing Du, “A Low-Power, High-Jitter-Tolerance, All-Digital CDR and a Programmable, Anti-Harmonic DLL Clock Multiplier with a Period Error
Compensation Loop,” Ph.D. dissertation of Carleton
University, 2007.
[60]. [H.14] Thomas H. Lee, The design of CMOS Radio-Frequency Integrated Circuits, Cambridge University
Press, 1998.
[61]. [H.15]
[62]. [H.16] Ou, N.; Farahmand, T.; Kuo, A.; Tabatabaei, S.;
Ivanov, A.; “Jitter models for the design and test of
Gbps-speed serial interconnects,” IEEE Design & Test of
Computers, Volume 21, Issue 4, July-Aug. 2004
Page(s):302 – 313.
[63]. [H.17] Jitter Fundamentals: Jitter Tolerance Testing with
Agilent 81250 ParBERT, available at
http://cp.literature.agilent.com/litweb/pdf/5989-0223EN.pdf
[64]. [H.18] Chin, J.; Cantoni, A.; “Phase jitter ≡ timing
jitter?” IEEE Communications Letters, Volume 2, Issue
2, Feb. 1998 Page(s):54 – 56.
[65]. [H.19] Jaeha Kim; Deog-Kyoon Jeong; “Multi-gigabit-rate clock and data recovery based on blind
oversampling,” IEEE Communications Magazine,
Volume 41, Issue 12, Dec. 2003 Page(s):68 – 74.
[66]. [H.20] Harwood, M.; Warke, N.; Simpson, R.; Leslie, T.;
Amerasekera, A.; Batty, S.; Colman, D.; Carr, E.;
Gopinathan, V.; Hubbins, S.; Hunt, P.; Joy, A.;
Khandelwal, P.; Killips, B.; Krause, T.; Lytollis, S.;
Pickering, A.; Saxton, M.; Sebastio, D.; Swanson, G.;
Szczepanek, A.; Ward, T.; Williams, J.; Williams, R.;
Willwerth, T.; “A 12.5Gb/s SerDes in 65nm CMOS
Using a Baud-Rate ADC with Digital Receiver
Equalization and Clock Recovery,” IEEE International
Solid-State Circuits Conference, Digest of Technical
Papers. 11-15 Feb. 2007 Page(s):436 – 591.
[67]. [H.21] Hyung-Rok Lee; Moon-Sang Hwang; Bong-Joon
Lee; Young-Deok Kim; Dohwan Oh; Jaeha Kim; Sang-Hyun Lee; Deog-Kyoon Jeong; Kim, W.; “A 1.2-V-only
900-mW 10 gb ethernet transceiver and XAUI interface
with robust VCO tuning technique,” IEEE Journal of
Solid-State Circuits, Volume 40, Issue 11, Nov. 2005
Page(s):2148 – 2158.
[68]. [H.22] G. Ascheid, M. Oerder, J. Stahl, and H. Meyr,
“An all digital receiver architecture for bandwidth
effective transmission at high data rates,” IEEE Trans.
Commun., vol. COM-37, No.8, Aug. 1989, pages: 804-813.
[69]. [H.23] F. M. Gardner, “Interpolation in digital modems –
part I: fundamentals,” IEEE Trans. Commun., vol. COM-41, No.3, March 1993, pp.501-507.
[70]. [H.24] Demir, A.; Feldmann, P.; “Stochastic modeling
and performance evaluation for digital clock and data
recovery circuits,” Proceedings of Design, Automation
and Test in Europe Conference and Exhibition, 27-30
March 2000, Page(s):340 – 344.
[71]. [H.25] W. Rhee, S. V. Rylov, and D. Friedman,
“Semidigital delay-locked loop using an analog-based
finite state machine,” United States Patent 6927611, Aug.
9, 2005.
[72]. [H.26] A. L. Coban, M. H. Koroglu, and K. A. Ahmed,
“A 2.5–3.125-Gb/s Quad Transceiver With Second-Order Analog DLL-Based CDRs,” IEEE JOURNAL OF
SOLID-STATE CIRCUITS, VOL. 40, NO. 9,
SEPTEMBER 2005, Pages: 1940-1947.
[73]. [H.27] Sonntag, J.L.; Stonick, J.; “A Digital Clock and
Data Recovery Architecture for Multi-Gigabit/s Binary
Links, ” IEEE Journal of Solid-State Circuits, Volume 41,
Issue 8, Aug. 2006 Page(s):1867 – 1875.
[74]. [H.28] Y. Miki, T. Saito, H. Yamashita, F. Yuki, T. Baba,
A. Koyama, M. Sonehara, “A 50-mW/ch 2.5-Gb/s/ch
Data Recovery Circuit for the SFI-5 Interface With
Digital Eye-Tracking”, IEEE Journal of Solid State
Circuits, Vol. 39, No.4 , April 2004.
[75]. [H.29] Pottbacker, A.; Langmann, U.; Schreiber, H.-U.;
“A Si bipolar phase and frequency detector IC for clock
extraction up to 8 Gb/s,” IEEE Journal of Solid-State
Circuits, Volume 27, Issue 12, Dec. 1992 Page(s):1747
– 1751.
[76]. [H.30] Kreienkamp, R.; Langmann, U.; Zimmermann,
C.; Aoyama, T.; Siedhoff, H.; “A 10-gb/s CMOS clock
and data recovery circuit with an analog phase
interpolator,” IEEE Journal of Solid-State
Circuits, Volume 40, Issue 3, March 2005 Page(s):736 –
743.
[77]. [H.30] Oh, Do-Hwan; Kim, Deok-Soo; Kim, Suhwan;
Jeong, Deog-Kyoon; Kim, Wonchan; “A 2.8 Gb/s All-Digital CDR with a 10b Monotonic DCO,” Digest of
Technical Papers of IEEE International Solid-State
Circuits Conference, 11-15 Feb. 2007 Page(s):222 – 598.
[78]. [H.31] Olsson, T.; Nilsson, P.; “A digitally controlled
PLL for SoC applications,” IEEE Journal of Solid-State
Circuits, Volume 39, Issue 5, May 2004 Page(s):751 –
760.
[79]. [H.32] Staszewski, R.B.; Chih-Ming Hung; Barton, N.;
Meng-Chang Lee; Leipold, D.; “A digitally controlled
oscillator in a 90 nm digital CMOS process for mobile
phones,” IEEE Journal of Solid-State Circuits, Volume
40, Issue 11, Nov. 2005 Page(s):2203 – 2211.
[80]. [H.33] C. Hogge, “A Self-Correcting Clock Recovery
Circuit,” IEEE Journal of Lightwave Technology, vol.
LT-3, pp.1312-1314, December 1985.
[81]. [H.34] Xinyu Chen; Green, M.M.; “A CMOS 10 Gb/s
clock and data recovery circuit with a novel adjustable
Kpd phase-detector,” Proceedings of the 2004
International Symposium on Circuits and Systems,
Volume 4, 23-26 May 2004 Page(s): IV - 301-304.
[82]. [H.35] Rennie, David; Sachdev, Manoj; “Comparative
Robustness of CML Phase-detectors for Clock and Data
Recovery Circuits,” 8th International Symposium on
Quality Electronic Design, 26-28 March 2007
Page(s):305 – 310.
[83]. [H.36] J. D. H. Alexander, “Clock recovery from random
binary signals,” Electron. Lett., vol. 11, pp. 541–542, Oct.
1975.
[84]. [H.37] Youngdon Choi, Deog-Kyoon Jeong, and
Wonchan Kim,"Jitter Transfer Analysis of Tracked
Oversampling Techniques for Multigigabit Clock and
Data Recovery," IEEE TRANSACTIONS ON
CIRCUITS AND SYSTEMS—II: ANALOG AND
DIGITAL SIGNAL PROCESSING, VOL. 50, No. 11,
NOVEMBER 2003, pages: 1573-1580.
[85]. [H.38] Nosaka, H.; Ishii, K.; Enoki, T.; Shibata, T.; “A
10-Gb/s data-pattern independent clock and data
recovery circuit with a two-mode phase comparator,”
IEEE Journal of Solid-State Circuits, Volume 38, Issue
2, Feb. 2003 Page(s):192 – 197.
[86]. [H.39] Soyuer, M.; Meyer, R.G.; “Frequency limitations
of a conventional phase-frequency detector,” IEEE
Journal of Solid-State Circuits, Volume 25, Issue 4,
Aug. 1990 Page(s):1019 – 1022.
[87]. [H.40] Messerschmitt, D.; “Frequency Detectors for PLL
Acquisition in Timing and Carrier Recovery,” IEEE
Transactions on Communications, Volume 27, Issue 9,
Sep 1979 Page(s):1288 – 1295.
[88]. [H.41] Savoj, J.; Razavi, B.; “A 10-Gb/s CMOS clock
and data recovery circuit with a half-rate linear phase-detector,” IEEE Journal of Solid-State Circuits, Volume
36, Issue 5, May 2001 Page(s):761 – 768.
[89]. [H.42] Byun, S.; Lee, J.C.; Shim, J.H.; Kim, K.; Yu, H.K.; “A 10-Gb/s CMOS CDR and DEMUX IC With a
Quarter-Rate Linear Phase-detector,” IEEE Journal of
Solid-State Circuits, Volume 41, Issue 11, Nov. 2006
Page(s):2566 – 2576.
[90]. [H.43] T. Toifl, C. Menolfi, P. Buchmann, C. Hagleitner,
M. Kossel, T. Morf, J. Weiss, M. Schmatz, "A 72mW
0.03mm2 Inductorless 40Gb/s CDR in 65nm SOI
CMOS," ISSCC 2007, session 12, pages: 226-227, 598.
[91]. [H.44] Seong-Jun Song; Sung Min Park; Hoi-Jun Yoo;
“A 4-Gb/s CMOS clock and data recovery circuit using
1/8-rate clock technique, ” IEEE Journal of Solid-State
Circuits, Volume 38, Issue 7, July 2003 Page(s):1213 –
1219.
[92]. [H.45] Ohtomo, Y.; Nishimura, K.; Nogawa, M.; "A
12.5-Gb/s Parallel Phase Detection Clock and Data
Recovery Circuit in 0.13-µm CMOS,” IEEE Journal of
Solid-State Circuits, Volume 41, Issue 9, Sept. 2006
Page(s):2052 – 2057.
[93]. [H.46] Mehrdad Ramezani and C. Andre T. Salama,
"Analysis of a Half-Rate Bang–Bang Phase-Locked-Loop," IEEE TRANSACTIONS ON CIRCUITS AND
SYSTEMS—II: ANALOG AND DIGITAL SIGNAL
PROCESSING, VOL. 49, NO. 7, JULY 2002, pages:
505-509.
[94]. [H.47] Jri Lee, and Behzad Razavi, "A 40-Gb/s Clock
and Data Recovery Circuit in 0.18-µm CMOS
Technology," IEEE JOURNAL OF SOLID-STATE
CIRCUITS, VOL. 38, NO. 12, DECEMBER 2003,
Pages: 2181-2190.
[95]. [H.48] Staszewski, R.B.; Leipold, D.; Muhammad, K.;
Balsara, P.T.; “Digitally controlled oscillator (DCO)-based architecture for RF frequency synthesis in a deep-submicrometer CMOS Process,” IEEE Transactions on
Circuits and Systems II: Analog and Digital Signal
Processing, Volume 50, Issue 11, Nov. 2003
Page(s):815 – 828.
[96]. [H.49] B. Stilling, “Bit rate and protocol independent
clock-and-data-recovery,” Electron. Lett., vol. 36, pp.
824–825, Apr. 2000.
[97]. [H.50] L. DeVito, “A versatile clock recovery
architecture and monolithic implementation,” in
Monolithic Phase-Locked Loops and Clock Recovery
Circuits, B. Razavi, Ed. Piscataway, NJ: IEEE Press,
1996, pp.405–420.
[98]. [H.51] L. DeVito et al., “A 52MHz and 155 MHz clock
recovery PLL,” in IEEE Int. Solid-State Circuits Conf.
(ISSCC) Dig. Tech. Papers, San Francisco, CA, Feb.
1991, pp. 142–143.
[99]. [H.52] J. Cao, M. Green, A. Momtaz, K. Vakilian, D.
Chung, K.-C. Jen, et al, “OC-192 Transmitter and
Receiver in Standard 0.18-um CMOS”, IEEE Journal of
Solid State Circuits, Vol. 37, No. 12, December 2002, pp.
1964 - 1973.
[100]. [H.53] S.-H. Lee, M.-S. Hwang, Y. Chor, S. Kim, Y.
Moon, B.-J. Lee, D.-K. Jeong, W. Kim, Y.-J. Park and G.
Ahn, “A 5-Gb/s 0.25-um CMOS Jitter-Tolerant
Variable-Interval Oversampling Clock/Data Recovery
Circuit”, IEEE Journal of Solid State Circuits, Vol.37,
No.12, December 2002.
[101]. [H.54] International Engineering Consortium,
“SONET Tutorial”, available at
http://www.iec.org/online/tutorials/sonet/index.html
[102]. [H.55] J. Lee, K. S. Kundert, and B. Razavi,
“Analysis and modeling of bang-bang clock and data
recovery circuits,” IEEE Journal of Solid-State Circuits,
Vol.39, No.9, Sept. 2004, pp.1571-1580.
[103]. [H.56] J. Craninckx and M. S. J. Steyaert, “A fully
integrated CMOS DCS-1800 frequency synthesizer,”
IEEE J. Solid-State Circuits, vol. 33, pp. 2054–2065,
Dec. 1998.
[104]. [H.57] M. Borremans and M. Steyaert, “A CMOS 2-V quadrature direct up-converter chip for DCS-1800
integration,” in Proc. 26th Eur. Solid-State Circuits Conf.
(ESSCIRC), Stockholm, Sweden, Sept. 2000, pp. 88–91.
[105]. [H.58] M. Rofougaran, A. Rofougaran, J. Rael, and A.
A. Abidi, “A 900-MHz CMOS LC-oscillator with
quadrature outputs,” in Proc. IEEE Int. Solid-State
Circuits Conf., New York, NY, 1996, p. 392.
[106]. [H.59] Tiebout, M.; “Low-power low-phase-noise
differentially tuned quadrature VCO design in standard
CMOS,” IEEE Journal of Solid-State Circuits, Volume
36, Issue 7, July 2001 Page(s):1018 – 1024.
[107]. [H.60] Grozing, M.; Phillip, B.; Berroth, M.; “CMOS
ring oscillator with quadrature outputs and 100 MHz to
3.5 GHz tuning range,” Proceedings of the 29th
European Solid-State Circuits Conference, 16-18 Sept.
2003 Page(s):679 – 682
[108]. [F.1] Gardner, F.; “Charge-Pump Phase-Lock
Loops,” IEEE Transactions on Communications, Volume
28, Issue 11, Nov 1980 Page(s):1849 – 1858
[109]. [F.2] D. B. Leeson, “A simple model of feedback
oscillator noise spectrum,” Proc. IEEE, vol. 54, Feb.
1966, pp.329-330.
[110]. [F.3] J. Rogers, C. Plett, and F. Dai, Integrated circuit
design for high-speed frequency-synthesis, Artech House,
2006.
[111]. [F.4] J. Bulzacchelli, “A delay-locked loop for clock
recovery and data synchronization,” Master’s thesis,
Massachusetts Institute of Technology, 1990.
[112]. [F.5] Staszewski, R.B.; Leipold, D.; Muhammad, K.;
Balsara, P.T.; “Digitally controlled oscillator (DCO)-based architecture for RF frequency synthesis in a deep-submicrometer CMOS Process,” IEEE Transactions on
Circuits and Systems II, Volume 50, Issue 11, Nov.
2003 Page(s):815 – 828
[113]. [F.6] Waheed, K.; Staszewski, R.B.;
“Characterization of deep-submicron varactor
mismatches in a digitally controlled oscillator,”
Proceedings of the IEEE Custom Integrated Circuits
Conference, 18-21 Sept. 2005 Page(s):605 – 608.
[114]. [F.7] Staszewski, R.B.; Balsara, P.T.; “Phase-domain
all-digital phase-locked loop,” IEEE Transactions on
Circuits and Systems II, Volume 52, Issue 3,
March 2005 Page(s):159 – 163
[115]. [F.8] C. Chung and C. Lee, “An all-digital phase-locked loop for high-speed clock generation,” IEEE J.
Solid-State Circuits, vol. 38, pp. 347–351, Feb. 2003.
[116]. [F.9] T. Hsu, C. Wang, and C. Lee, “Design and
analysis of a portable high speed clock generator,” IEEE
Trans. Circuits Systems II: Analog and Digital Signal
Processing, vol. 48, pp. 367–375, Apr. 2001.
[117]. [F.10] Johnson, M.G.; Hudson, E.L.; “A variable
delay line PLL for CPU-coprocessor synchronization,”
IEEE Journal of Solid-State Circuits, Volume 23, Issue
5, Oct. 1988 Page(s):1218 – 1223.
[118]. [F.11] Chen C.-C.; Chang J.-Y.; Liu S.-I.; “DLL-Based Variable-Phase Clock Buffer,” IEEE Transactions
on Circuits and Systems II, accepted for future
publication, Volume PP, Issue 99, 2007 Page(s):1 – 1
[119]. [F.12] Sidiropoulos, S.; Horowitz, M.A.; “A
semidigital dual delay-locked loop,” IEEE Journal of
Solid-State Circuits, Volume 32, Issue 11, Nov. 1997
Page(s):1683 – 1692.
[120]. [F.13] Lee, T.H.; Donnelly, K.S.; Ho, J.T.C.; Zerbe,
J.; Johnson, M.G.; Ishikawa, T.; “A 2.5 V CMOS delay-locked loop for 18 Mbit, 500 megabyte/s DRAM,” IEEE
Journal of Solid-State Circuits, Volume 29, Issue 12,
Dec. 1994 Page(s):1491 – 1496
[121]. [F.14] G. Chien and P. R. Gray, “A 900-MHz Local
Oscillator Using a DLL-Based Frequency Multiplier
Technique for PCS Applications”, IEEE Journal of Solid
State Circuits, Vol. 35, No. 12, December 2000.
[122]. [F.15] D. J. Foley and M. P. Flynn, “CMOS DLL-Based Synthesizer and Temperature-Compensated
Tunable Oscillator”, IEEE Journal of Solid State Circuits, Vol. 36, No. 3, March 2001.
[123]. [F.16] C.-C. Wang, H.-C. She and R. Hu, “A 1.2
GHz Programmable DLL-Based Frequency Multiplier
for Wireless Application”, Transaction on VLSI, 2002.
[124]. [F.17] J.-H. Kim, Y.-H. Kwak, S.-R. Yoon, M.-Y.
Kim, S.-W. Kim, C. Kim, “A CMOS DLL-Based
120MHz to 1.8GHz Clock Generator for Dynamic
Frequency Scaling”, IEEE ISSCC, 2005, pages: 516-517.
[125]. [F.18] G.-Y. Wei, J. T. Stonick, D. Weinlader, J.
Sonntag, S. Searles, “A 500MHz MP/DLL Clock
Generator for a 5Gb/s Backplane Transceiver in 0.25 um
CMOS”, IEEE ISSCC, February 2003.
[126]. [F.19] Qingjin Du; Jingcheng Zhuang; Kwasniewski,
T.; “A Low-Phase Noise, Anti-Harmonic Programmable
DLL Frequency Multiplier With Period Error
Compensation for Spur Reduction,” IEEE Transactions
on Circuits and Systems II, Volume 53, Issue 11, Nov.
2006 Page(s):1205 – 1209.
[127]. [F.20] Maulik, P. C.; Mercer, D. A.; “A DLL-Based
Programmable Clock Multiplier in 0.18-µm CMOS …
[131]. … µm CMOS,” IEEE Journal of Solid-State Circuits,
Vol.36, No.4, April 2001, pages: 706-771.
[132].