IEEE 2006 Custom Integrated Circuits Conference (CICC)
Phase Mismatch Detection and Compensation for
PLL/DLL Based Multi-Phase Clock Generator
Amber Han-Yuan Tan and Gu-Yeon Wei
Division of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138
Abstract—Device mismatch and systematic imbalances in the
physical design can cause static phase mismatch in a PLL/DLL
based multi-phase clock generator and degrade performance.
This problem gets worse in deep sub-micron technologies.
Interleaved transceiver architectures require precise clocking
to maximize data rate and minimize bit errors. In this paper, a
static phase mismatch compensation scheme for multiple
sampling clocks is proposed and tested in an adaptive-bandwidth
mixing PLL/DLL based multi-phase clock
generator. The proposed charge pump compensator and power
efficient phase-averaging network together reduce the static
phase mismatch standard deviation by 37% when operating in
DLL mode. A simple and robust duty-cycle correction circuit
exhibits a small residual error of 0.65% across a wide range
(36% to 49%) of input clock duty-cycle values.
1-4244-0076-7/06/$20.00 ©2006 IEEE
I. INTRODUCTION
The popular interleaved transceiver architecture for
multi-gigabit chip-to-chip and backplane communication
links requires precise clocking to improve the data rate [1].
The clock timing uncertainty can result in receiver timing
margin degradation and sub-optimal sampling positions,
which can both degrade performance. The clock timing
uncertainty can be categorized into two types: dynamic and
static. This paper focuses on mitigating timing uncertainty
arising from static sources such as device mismatch and
systematic imbalances. The resulting static phase mismatch
can be calibrated using a phase mismatch detector and
several compensation schemes on chip.
In this paper, we focus on static phase mismatch
compensation schemes for an adaptive-bandwidth mixing
PLL/DLL (MX-PDLL) based multi-phase clock generator.
This system is described more thoroughly in [2]. In
particular, we focus on three parts of the clock generator.
Delay mismatch through the phase frequency detector (PFD)
and current mismatch in the charge pump (CP) can lead to a
static phase offset when locked, which is especially
problematic for DLLs. A CP compensation scheme is proposed
to correct for this offset. Device mismatch and systematic
imbalances in the physical design can lead to uneven phase
spacing in the multi-phase clock generator. A passive
resistor-ring based phase-averaging network (PAN) is
presented to mitigate uneven phase spacing. Lastly, a
duty-cycle correction (DCC) circuit, which works in
conjunction with a divide-by-2 (%2) circuit, is presented to
generate 90-degree phase-shifted clocks for a phase rotator.
The next section describes the architecture of the
proposed dual-loop CDR that utilizes an adaptive-bandwidth
MX-PDLL based multi-phase clock generator as the core
clock generator loop. It also describes various sources of
mismatch that can cause timing uncertainty in the system and
their impact on performance. Section III then describes the
detailed design of the following blocks that detect the phase
mismatches and compensate for them: the phase mismatch
detection sampler and XOR array, CP compensator, PAN,
and DCC. Lastly, measurement results are presented in
Section IV.
II. DUAL-LOOP CDR ARCHITECTURE
Figure 1. Block diagram of the proposed dual-loop CDR architecture.
This paper investigates techniques for phase mismatch
detection, compensation, and duty cycle correction in the
context of a dual-loop CDR proposed in [2]. Fig. 1 presents a
block diagram of the dual-loop architecture, which utilizes an
MX-PDLL based multi-phase clock generator as the core
clock generator loop. The MX-PDLL can be configured as a
PLL, a DLL, or a mixture of the two adaptively. A
digitally-controlled phase rotator, which advances or retards
the phase of the reference clock (refclk) that feeds into the
MX-PDLL, can adjust the phase of the output clock (bclk) as
well as cover small frequency differences between bclk and the
external reference clock (xclk). The duty-cycle corrected xclk
is used to generate four 90° phase-shifted clocks (i, q, i_b,
and q_b) after the %2 circuit. A 4:2 mux selects a pair of
adjacent clock phases (quadrant selection) to be interpolated
by the subsequent interpolator for fine phase adjustment. The
MX-PDLL generates 16 sampling clock phases for the PAN,
data samplers, and phase detection XOR array. In this paper,
we will cover the designs of the XOR array, CP compensator
inside the MX-PDLL, PAN, and DCC.
The static phase offset between bclk and refclk, due to
mismatches in the CP and PFD, can manifest itself as uneven
phase spacing between the multi-phase clocks being
generated. Consider the MX-PDLL operating as a DLL. If
there is a phase offset at the DLL input (between refclk and
bclk), the delay of the first cell in the voltage-controlled
delay line (VCDL) will be different from that of the rest of
the delay cells. This is due to the fresh refclk edge that drives
the VCDL every reference cycle. As a result, this static phase
offset can lead to uneven phase spacing in the multi-phase
clock generator, when operating in DLL mode. PLL-mode
operation obviates this concern. However, process variations
and systematic imbalances in the physical design can also
cause delay mismatch between the delay cells when
configured either as a VCO or VCDL. Both of the above
sources of phase mismatch can cause timing uncertainty in
the sampling clocks and increase bit errors. The next section
describes two techniques to mitigate phase error via a CP
compensator and a PAN.
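The DLL-mode effect described above can be illustrated with a simplified numerical model. The sketch below assumes identical delay cells, a lock point that leaves a static offset between bclk and refclk, and a fresh refclk edge launching the delay line every cycle. The function name `dll_phase_spacings`, the sign convention (a positive offset means the line locks to one period plus the offset), and the 40 ps offset are illustrative assumptions, not values from the design in [2]. Under these assumptions the whole offset appears in a single spacing, the one that wraps from the last phase of one cycle to the first phase of the next.

```python
def dll_phase_spacings(period_ps, offset_ps, n_phases=16):
    """Return the n_phases spacings (ps) between adjacent clock phases over one cycle.

    Simplified model (an assumption for illustration): the delay line is locked
    so that its total delay is period_ps + offset_ps, all cells are identical,
    and a fresh reference edge restarts the line every cycle.
    """
    cell_delay = (period_ps + offset_ps) / n_phases
    # n_phases - 1 spacings inside a cycle are one cell delay each.
    spacings = [cell_delay] * (n_phases - 1)
    # The wrap-around spacing absorbs whatever is left of the reference period.
    spacings.append(period_ps - cell_delay * (n_phases - 1))
    return spacings


if __name__ == "__main__":
    period_ps = 1e6 / 750.0  # 750 MHz bclk, the frequency used in the measurements
    spacings = dll_phase_spacings(period_ps, offset_ps=40.0)
    print("ideal spacing: %.2f ps" % (period_ps / 16))
    print("in-cycle spacing: %.2f ps" % spacings[0])
    print("wrap-around spacing: %.2f ps" % spacings[-1])
```

In this model the odd spacing differs from the others by exactly the static offset, and the sign of the offset decides whether it is larger or smaller than its neighbors.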
The dual-loop CDR proposed in [2] relies on a phase
rotator for phase and frequency tracking. Duty-cycle
mismatch on xclk can lead to quadrant mismatch between i
and q clocks after the %2 circuit. This quadrant mismatch
degrades the linearity of the phase rotator. Measured results
show that the linearity has a bigger impact on the output jitter
than the resolution does [2]. To address this issue, we present
a feedback-loop-based DCC to minimize mismatch between
quadrants by integrating the phase error between the four 90°
phase-shifted clocks.
In the next section, we describe in detail the techniques
and circuits that detect and compensate for the above three
main mismatch sources in the CDR. First, we describe the
phase detection scheme that is used to detect phase mismatch
coming from all of the different mismatch sources.
Second, we present a CP compensator that can overcome the
static phase offset caused by mismatch in the CP and PFD.
In addition, we describe a power-efficient phase-averaging
network (PAN) that reduces delay mismatch between delay
cells in the MX-PDLL resulting from process variations and
systematic imbalances in the physical design. Lastly, a DCC
circuit that enables evenly spaced i and q clocks is presented.
III. CIRCUIT DESIGN
A. Phase Detection XOR Array
In order to facilitate mismatch compensation, we rely on
an XOR-based phase detection circuit. The basic idea is to
measure phase mismatch by sampling an external clock
signal (sam_clk), which has a frequency that is very close
(but not equal) to the sampling clock frequency. The sam_clk
edge transitions (both rising and falling) can be determined
by taking two adjacent sampler outputs and feeding them
into an XOR gate as shown in Fig. 2. This is essentially a
simple implementation of a bang-bang phase detector. The
pulse duration of the XOR output (tr1,…, tr16) is
proportional to the phase difference between two adjacent
sampling clocks. Since we can choose the sam_clk frequency
to be very close to the sampling clock frequency, the pulse
duration can be made long for very high resolution
measurements. A 16:2 mux selects any two of the 16 pulses
at one time to reduce I/O pin count. By measuring the width
of all 16 pulses, the static phase mismatch between the
sampling clocks can be measured.
Figure 2. XOR-based phase detection circuit.
B. Charge Pump Compensator
Figure 3. Charge pump compensation scheme: (a) Schematic of the CP
compensator. (b) Schematic of the pseudo-differential CP.
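As background for the compensation schemes, the XOR-based measurement of Section III-A can be sketched numerically. The model below is an illustration under stated assumptions: an ideal 50% duty-cycle sam_clk, ideal samplers, and hypothetical names (`sample_square`, `xor_high_samples`, `estimate_spacing_ps`) and clock periods that are not taken from the paper. Two samplers spaced dt apart disagree for roughly dt divided by the period difference, in samples, at each sam_clk transition, so a small period difference stretches a picosecond-scale spacing into a long, easily measured pulse.

```python
def sample_square(t_ps, period_ps):
    """Value of an ideal 50% duty-cycle square wave at time t_ps."""
    return 1 if (t_ps % period_ps) < period_ps / 2 else 0


def xor_high_samples(dt_ps, t_clk_ps, t_sam_ps, n_samples):
    """Count samples where two samplers spaced dt_ps apart disagree (XOR = 1)."""
    count = 0
    for k in range(n_samples):
        a = sample_square(k * t_clk_ps, t_sam_ps)
        b = sample_square(k * t_clk_ps + dt_ps, t_sam_ps)
        count += a ^ b
    return count


def estimate_spacing_ps(high_samples, t_clk_ps, t_sam_ps):
    """Convert the XOR-high sample count over one beat period into a spacing.

    One beat period contains two pulses (rising and falling sam_clk edges),
    and each sample step slides the sampling point by |t_sam_ps - t_clk_ps|.
    """
    return 0.5 * high_samples * abs(t_sam_ps - t_clk_ps)


if __name__ == "__main__":
    t_clk, t_sam, dt = 1000.0, 1001.0, 62.5  # illustrative values in ps
    beat_samples = round(t_sam / abs(t_sam - t_clk))  # samples per beat period
    high = xor_high_samples(dt, t_clk, t_sam, beat_samples)
    print("XOR-high samples per beat:", high)
    print("estimated spacing (ps):", estimate_spacing_ps(high, t_clk, t_sam))
```

The resolution of the estimate is set by the period difference, which is why choosing sam_clk very close to the sampling clock frequency gives a finer measurement at the cost of a longer observation time.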
Given the ability to detect phase mismatch, we propose a
CP compensation scheme to purposely skew the CP current
to compensate for the static offset between refclk and bclk.
The charge pump compensator comprises a pair of 7-bit
current DACs to generate the compensation bias voltages
(upb_comp and dnb_comp) that feed the charge pump as
shown in Fig. 3(a). A control signal, comp_ctrl, is used to set
the resolution of the digitally controlled output. When
comp_ctrl is set high, the output voltage has higher
resolution but smaller compensation range. Two charge
pump compensators are used to generate upb_comp and
dnb_comp separately according to the control bits. The
pseudo-differential CP has two auxiliary NMOS current sink
devices that are controlled by CP compensation bias voltages
in addition to the main pull-down current paths that are
controlled by CMFB as shown in Fig. 3(b). Therefore, the
differential CP current can be skewed via digital
compensation codes to cancel out the static phase offset
between refclk and bclk.
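To make the digital compensation concrete, the sketch below picks a pair of 7-bit codes that skews the differential CP current by a requested amount. It is only an illustration: the LSB currents, the mapping of comp_ctrl to a finer step, the sign convention, and the function name `choose_comp_codes` are all assumptions and do not describe the actual DAC weights of Fig. 3(a).

```python
MAX_CODE = 127  # 7-bit current DAC, codes 0..127


def choose_comp_codes(skew_ua, fine_mode, coarse_lsb_ua=1.0, fine_lsb_ua=0.25):
    """Return (up_code, dn_code, residual_ua) for a requested current skew.

    skew_ua > 0 is assumed to call for the 'up' compensator and skew_ua < 0
    for the 'dn' compensator. The LSB sizes are hypothetical placeholders.
    """
    lsb = fine_lsb_ua if fine_mode else coarse_lsb_ua
    code = min(MAX_CODE, round(abs(skew_ua) / lsb))  # clip to the DAC range
    applied = code * lsb if skew_ua >= 0 else -code * lsb
    up_code, dn_code = (code, 0) if skew_ua >= 0 else (0, code)
    return up_code, dn_code, skew_ua - applied


if __name__ == "__main__":
    print(choose_comp_codes(10.0, fine_mode=False))
    print(choose_comp_codes(-40.0, fine_mode=True))
```

In fine mode the same code range covers a smaller skew, mirroring the resolution versus range tradeoff selected by comp_ctrl. In practice the codes would be found by sweeping them while observing the offset reported by the phase detection circuit.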
C. Phase-Averaging Network
To mitigate uneven phase spacing between multi-phase
clocks in the MX-PDLL, a phase-averaging network (PAN)
is proposed. Fig. 4 presents a schematic of the PAN. The 16
clock phases out of the MX-PDLL are uniformly connected
to two layers of interconnected resistors (R-ring) [3]. The
R-ring has the benefit of smearing or averaging the voltage
transitions, which reduces the phase errors caused by
mismatch along the different clock paths. Phase averaging is
achieved through the RC low-pass filtering between all of
the clock phases, reducing phase spacing offsets. The resistor
is implemented with a transmission gate, which can be
enabled by signals ph_ctrl and ph_ctrl_b. The filtering
capacitance comes from the parasitic capacitance of the
internal nodes. The sizing of the transmission gate can be
tricky. If the transistors are too small, the corresponding
resistance is too large to provide sufficient voltage averaging,
such that the reduction of phase offsets is negligible. On the
other hand, if the sizes are too large, the correspondingly
small resistance causes too much active current, significantly
reducing the voltage swing and increasing power
consumption. Hence, there is a tradeoff between the amount
of reduction in phase spacing mismatch and the power
consumption, with respect to transmission gate sizing. Fig. 5
plots the ratio of phase mismatch reduction over power
consumption vs. transistor size for three different P/N ratios.
A P/N ratio of 3 was chosen to achieve the maximum
reduction rate without significant power penalty.
Figure 4. Schematic of phase-averaging network.
Figure 5. Power efficiency vs. transmission gate sizing. [Plot: phase mismatch reduction rate / power consumption (%/mW) vs. NMOS width in the transmission gate (um), one curve per P/N ratio; the most power-efficient NMOS widths are marked.]
D. Duty-Cycle Corrector
Figure 6. Block diagram of DCC.
As shown in Fig. 6, the DCC block consists of a duty-cycle
tuning block (DCC_tune) that drives a divide-by-2
circuit (%2), which subsequently generates the quadrature
clock phases. The %2 circuit is a simple divide-by-2 counter.
A set of XOR and XNOR gates generate output pulses that
correspond to the phase spacing between the i and q clock
signals. A pair of passive RC circuits filters the pulses to
generate a pair of pseudo-differential duty-cycle tuning
signals, ptune and ntune. Through negative feedback, the
tuning signals feed into the DCC_tune block to compensate for
any duty-cycle mismatch on xclk to create evenly-spaced
quadrature phases. Given the %2 circuit, this corresponds to
a 50% duty-cycle on divclk. In the RC filter, a small C can be
used by implementing a large R to reduce layout area. While
this DCC utilizes very simple phase detection and filter
circuitry, one drawback is the small non-zero static-state
offset resulting from a finite DC gain.
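The residual offset caused by finite DC gain can be illustrated with a first-order loop model. The sketch below is an abstraction under stated assumptions rather than the circuit of Fig. 6: `dc_gain`, `step`, and the function name `settle_dcc_loop` are hypothetical, and the duty-cycle error is treated as a single scalar that the tuning signal subtracts from. In such a loop the error settles to the initial error divided by (1 + DC gain), so it shrinks as the gain grows but never reaches zero.

```python
def settle_dcc_loop(initial_error, dc_gain, step=0.01, n_iter=2000):
    """Iterate a first-order negative-feedback loop and return the residual error.

    Each iteration moves the correction toward dc_gain * error, modelling an
    RC-filtered tuning signal with finite DC gain (hypothetical abstraction).
    Requires step * (1 + dc_gain) < 2 for the iteration to converge.
    """
    correction = 0.0
    for _ in range(n_iter):
        error = initial_error - correction
        correction += step * (dc_gain * error - correction)
    return initial_error - correction


if __name__ == "__main__":
    for gain in (5.0, 15.0, 50.0):
        residual = settle_dcc_loop(initial_error=1.0, dc_gain=gain)
        print("gain %.0f -> residual %.4f (closed form %.4f)"
              % (gain, residual, 1.0 / (1.0 + gain)))
```

The model only captures the steady-state behavior; it says nothing about the loop bandwidth or stability of the actual circuit.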
Figure 7. Schematic of DCC_tune.
The DCC_tune block consists of two tuning stages with
four NMOS tuning devices that are controlled by the
pseudo-differential tuning signals (ptune and ntune), shown in Fig. 7.
The NMOS tuning devices can delay either the rising or
falling transition of xclk according to ptune and ntune. Since
only NMOS devices are controlled by the tuning signal, only
the rising edges on the internal nodes are delayed. Four
additional feed-forward NMOS pull-up devices are
added to balance the discharging currents from the tuning
devices and also provide some duty-cycle correction. When
the DCC feedback loop is in lock, ptune and ntune settle to
values which guarantee that the phase difference between i
and q (detected by XOR) and the phase difference between q
and i_b (detected by XNOR) are the same.
IV. MEASUREMENT RESULTS
In this section, we present the measured results of the CP
compensator, PAN, and DCC. The bclk clock frequency is
750MHz for all the following measurement results.
A. Charge Pump Compensator
The uncompensated CP in DLL mode shows a large
delay in the first delay cell (clock phase 16) due to the static
phase offset issue that was discussed previously. Measured
results in Fig. 8 show that after CP compensation, this big
delay difference is removed, and the overall phase spacing
mismatch is also slightly reduced.
Figure 8. Measured static phase spacing mismatch in DLL mode. [Plot: phase difference (ps) vs. clock phase for the uncompensated and compensated CP in DLL mode; the outlier is annotated "Mismatch due to the rclk being muxed in at DLL mode".]
B. Phase-Averaging Network
Fig. 9 presents the benefits of using the PAN along with
the CP compensator. CP compensation reduces the standard
deviation (STD) of phase spacing mismatch among all 16
clock phases from 18.72ps to 13.31ps when operating in
DLL mode. PAN further helps reduce the STD from 13.31ps
to 11.74ps when operating in DLL mode and from 7.28ps to
6.71ps in PLL mode.
Figure 9. Measured CP compensator and PAN static phase spacing
mismatch results in PLL and DLL modes. [Plot: phase mismatch (ps) vs. clock phase; curves: DLL-No compensation, DLL-CP, DLL-CP & PAN, PLL-No compensation, PLL-CP, PLL-CP & PAN.]
C. Duty-Cycle Corrector
Fig. 10(a) plots the measured phase range out of the
phase rotator for quadrants I and II given a 36%-duty-cycle
xclk at the input. The DCC reduces the quadrant mismatch
from ±1.59% to ±0.1%. Fig. 10(b) plots the measured
quadrant spacing with respect to xclk duty-cycle swept from
36% to 49%. The residual mismatch is less than 0.65% of the
desired 25% spacing.
Figure 10. Measured DCC performance: (a) Phase range out of the phase
rotator for quadrants I and II w/ DCC and w/o DCC given a 36% duty-cycle
xclk. (b) Quadrant spacing vs. external clock (xclk) duty-cycle.
[Plots: (a) phase (degree) vs. phase rotator code; (b) quadrant spacing (%) vs. external clock (xclk) duty-cycle (%). Annotated values: w/ DCC, Quadrant I: 25.1%, Quadrant II: 24.9%; w/o DCC, Quadrant I: 23.41%, Quadrant II: 26.59%.]
V. CONCLUSIONS
Precise generation of evenly spaced clocks is needed for
high-performance multi-gigabit links. We have presented
three circuits to compensate for static phase mismatch
coming from three different mismatch sources. The proposed
CP compensator offers over 28% reduction of phase spacing
mismatch. The combined reduction by using both the CP
compensator and PAN is over 37%. A simple and robust
duty-cycle corrector design ensures even quadrant spacing to
improve the linearity of the phase rotator.
REFERENCES
[1] J. Jaussi, et al., “A 20Gb/s Embedded Clock Transceiver in 90nm
CMOS,” ISSCC Dig. of Tech. Papers, pp. 340–341, Feb. 2006.
[2] A. H.-Y. Tan and G.-Y. Wei, “Adaptive-Bandwidth Mixing
PLL/DLL Based Multi-Phase Clock Generator for Optimal Jitter
Performance,” unpublished. Submitted to CICC 2006.
[3] J.-M. Chou, Y.-T. H., and J.-T. Wu, “A 125 MHz 8b Digital-to-Phase
Converter,” ISSCC Dig. of Tech. Papers, pp. 436–437, Feb. 2003.