IEEE 2006 Custom Integrated Circuits Conference (CICC)

Phase Mismatch Detection and Compensation for PLL/DLL Based Multi-Phase Clock Generator

Amber Han-Yuan Tan and Gu-Yeon Wei
Division of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138
1-4244-0076-7/06/$20.00 ©2006 IEEE

Abstract: Device mismatch and systematic imbalances in the physical design can cause static phase mismatch in a PLL/DLL based multi-phase clock generator and degrade performance. This problem gets worse in deep sub-micron technologies. Interleaved transceiver architectures require precise clocking to maximize data rate and minimize bit errors. In this paper, a static phase mismatch compensation scheme for multiple sampling clocks is proposed and tested in an adaptive-bandwidth mixing PLL/DLL based multi-phase clock generator. The proposed charge pump compensator and power-efficient phase-averaging network together reduce the standard deviation of the static phase mismatch by 37% when operating in DLL mode. A simple and robust duty-cycle correction circuit exhibits a small residual error of 0.65% across a wide range (36% to 49%) of input clock duty-cycle values.

I. INTRODUCTION

The popular interleaved transceiver architecture for multi-gigabit chip-to-chip and backplane communication links requires precise clocking to improve the data rate [1]. Clock timing uncertainty can degrade the receiver timing margin and cause sub-optimal sampling positions, both of which degrade performance. Clock timing uncertainty can be categorized into two types: dynamic and static. This paper focuses on mitigating timing uncertainty arising from static sources such as device mismatch and systematic imbalances. The resulting static phase mismatch can be calibrated on chip using a phase mismatch detector and several compensation schemes.

In this paper, we focus on static phase mismatch compensation schemes for an adaptive-bandwidth mixing PLL/DLL (MX-PDLL) based multi-phase clock generator. This system is described more thoroughly in [2]. In particular, we focus on three parts of the clock generator. Delay mismatch through the phase frequency detector (PFD) and current mismatch in the charge pump (CP) can lead to a static phase offset when locked, which is especially problematic for DLLs. A CP compensation scheme is proposed to correct for this offset. Device mismatch and systematic imbalances in the physical design can lead to uneven phase spacing in the multi-phase clock generator. A passive resistor-ring based phase-averaging network (PAN) is presented to mitigate uneven phase spacing. Lastly, a duty-cycle correction (DCC) circuit, which works in conjunction with a divide-by-2 (%2) circuit, is presented to generate 90-degree phase-shifted clocks for a phase rotator.

The next section describes the architecture of the proposed dual-loop CDR, which uses an adaptive-bandwidth MX-PDLL based multi-phase clock generator as the core clock generator loop. It also describes various sources of mismatch that can cause timing uncertainty in the system and their impact on performance. Section III then describes the detailed design of the blocks that detect the phase mismatches and compensate for them: the phase mismatch detection sampler and XOR array, the CP compensator, the PAN, and the DCC. Lastly, measurement results are presented in Section IV.

II. DUAL-LOOP CDR ARCHITECTURE

Figure 1. Block diagram of the proposed dual-loop CDR architecture (blocks: duty-cycle corrector and %2 on xclk producing i, q, i_b, and q_b; 4:2 mux and interpolator forming the phase rotator under off-chip digital CDR control, Control[6:0]; mixing PLL/DLL driven by refclk and producing bclk and multi-phase clocks [15:0]; phase-averaging network; sampler and XOR array fed by sam_clk for phase mismatch detection).

This paper investigates techniques for phase mismatch detection, compensation, and duty-cycle correction in the context of a dual-loop CDR proposed in [2].
Fig. 1 presents a block diagram of the dual-loop architecture, which uses an MX-PDLL based multi-phase clock generator as the core clock generator loop. The MX-PDLL can be configured as a PLL, a DLL, or an adaptive mixture of the two. A digitally controlled phase rotator, which advances or retards the phase of the reference clock (refclk) that feeds the MX-PDLL, can adjust the phase of the output clock (bclk) and also cover small frequency differences between bclk and the external reference clock (xclk). The duty-cycle corrected xclk passes through the %2 circuit to generate four 90° phase-shifted clocks (i, q, i_b, and q_b). A 4:2 mux selects a pair of adjacent clock phases (quadrant selection), which the subsequent interpolator blends for fine phase adjustment. The MX-PDLL generates 16 sampling clock phases for the PAN, the data samplers, and the phase detection XOR array. In this paper, we cover the designs of the XOR array, the CP compensator inside the MX-PDLL, the PAN, and the DCC.

The static phase offset between bclk and refclk, due to mismatches in the CP and PFD, can manifest itself as uneven phase spacing between the generated multi-phase clocks. Consider the MX-PDLL operating as a DLL. If there is a phase offset at the DLL input (between refclk and bclk), the delay of the first cell in the voltage-controlled delay line (VCDL) will differ from that of the remaining delay cells, because a fresh refclk edge drives the VCDL every reference cycle. As a result, this static phase offset leads to uneven phase spacing in the multi-phase clock generator when operating in DLL mode. PLL-mode operation obviates this concern. However, process variations and systematic imbalances in the physical design can also cause delay mismatch between the delay cells, whether they are configured as a VCO or as a VCDL. Both of these sources of phase mismatch cause timing uncertainty in the sampling clocks and increase bit errors.
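The first-cell effect described above can be made concrete with a minimal behavioral sketch. The sketch rests on simplifying assumptions that are ours, not the paper's. First, all n VCDL cells share one control voltage and therefore one delay d. Second, at lock the last tap (bclk) fires period + offset after the refclk edge that launched it, so n·d = period + offset. Because a fresh refclk edge restarts the line every cycle, the spacing from bclk to the first tap of the next cycle becomes d - offset, while every other spacing is d. The function name `dll_tap_spacings` and the -20 ps offset used in the example are hypothetical.

```python
def dll_tap_spacings(n_taps: int, period: float, offset: float) -> list[float]:
    """Tap-to-tap spacings of an idealized DLL (illustrative model, not the paper's circuit).

    Assumptions: all n_taps delay cells share one control voltage and thus one delay d,
    and at lock the last tap (bclk) fires `period + offset` after the refclk edge that
    launched it (offset > 0: bclk lags refclk; offset < 0: bclk leads refclk).
    Returns [first-cell spacing, spacing 2, ..., spacing n_taps].
    """
    d = (period + offset) / n_taps                  # lock condition: n_taps * d = period + offset
    taps = [k * d for k in range(1, n_taps + 1)]    # tap k fires k*d after the refclk edge at t = 0
    # The next refclk edge (t = period) restarts the line, so tap 1 of the next cycle
    # fires at period + d; its distance from bclk (taps[-1]) is d - offset.
    first = (period + taps[0]) - taps[-1]
    inner = [taps[k] - taps[k - 1] for k in range(1, n_taps)]
    return [first] + inner


if __name__ == "__main__":
    period_ps = 1e12 / 750e6                        # 750 MHz bclk, as used in Section IV
    spacings = dll_tap_spacings(16, period_ps, offset=-20.0)   # hypothetical -20 ps offset
    print(f"ideal spacing: {period_ps / 16:.2f} ps")
    print(f"first cell: {spacings[0]:.2f} ps, other cells: {spacings[1]:.2f} ps")
```

In this example, at 750 MHz the hypothetical -20 ps offset gives a first-cell spacing of about 102.1 ps. The other fifteen spacings are about 82.1 ps, against an ideal of 83.3 ps. The result is a single outlier, similar in shape to the large first-cell delay of the uncompensated CP in Fig. 8. A zero offset returns sixteen equal spacings.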
The next section describes two techniques to mitigate phase error: a CP compensator and a PAN.

The dual-loop CDR proposed in [2] relies on a phase rotator for phase and frequency tracking. A duty-cycle error on xclk can lead to quadrant mismatch between the i and q clocks after the %2 circuit. This quadrant mismatch degrades the linearity of the phase rotator, and measured results show that linearity has a bigger impact on the output jitter than resolution does [2]. To address this issue, we present a feedback-loop-based DCC that minimizes mismatch between quadrants by integrating the phase error between the four 90° phase-shifted clocks.

In the next section, we describe in detail the techniques and circuits that detect and compensate for the three main mismatch sources in the CDR described above. First, we describe the phase detection scheme used to detect phase mismatch arising from all of these sources. Second, we present a CP compensator that overcomes the static phase offset caused by mismatch in the CP and PFD. In addition, we describe a power-efficient phase-averaging network (PAN) that reduces delay mismatch between the delay cells in the MX-PDLL caused by process variations and systematic imbalances in the physical design. Lastly, we present a DCC circuit that produces evenly spaced i and q clocks.

III. CIRCUIT DESIGN

A. Phase Detection XOR Array

To facilitate mismatch compensation, we rely on an XOR-based phase detection circuit. The basic idea is to measure phase mismatch by sampling an external clock signal (sam_clk) whose frequency is very close (but not equal) to the sampling clock frequency. The sam_clk edge transitions (both rising and falling) can be located by feeding two adjacent sampler outputs into an XOR gate, as shown in Fig. 2. This is essentially a simple implementation of a bang-bang phase detector. The pulse duration of each XOR output (tr1, ..., tr16) is proportional to the phase difference between two adjacent sampling clocks. Because the sam_clk frequency can be chosen very close to the sampling clock frequency, each sam_clk edge drifts only slightly relative to the sampling instants per cycle and stays between two adjacent sampling instants for many cycles. The pulses can therefore be made long, enabling very high-resolution measurements. A 16:2 mux selects any two of the 16 pulses at a time to reduce I/O pin count. By measuring the widths of all 16 pulses, the static phase mismatch between the sampling clocks can be determined.

Figure 2. XOR-based phase detection circuit (a sampler array clocked by the 16 sampling clocks from the PAN samples sam_clk/sam_clk_b; the 16 sampler outputs sa1, ..., sa16 feed the XOR array, whose outputs tr1, ..., tr16 are routed off chip through a 16:2 mux).

B. Charge Pump Compensator

Figure 3. Charge pump compensation scheme: (a) Schematic of the CP compensator (a pair of 7-bit current DACs with devices sized 1x to 64x, driven by comp_ctrl_up[6:0] and comp_ctrl_dn[6:0], with comp_ctrl selecting fine or coarse resolution, producing upb_comp and dnb_comp for the charge pump). (b) Schematic of the pseudo-differential CP (CMFB, outputs Vcp1 and Vcp2 to the loop filter).

Given the ability to detect phase mismatch, we propose a CP compensation scheme that purposely skews the CP current to compensate for the static offset between refclk and bclk. The charge pump compensator comprises a pair of 7-bit current DACs that generate the compensation bias voltages (upb_comp and dnb_comp) feeding the charge pump, as shown in Fig. 3(a). A control signal, comp_ctrl, sets the resolution of the digitally controlled output: when comp_ctrl is high, the output voltage has finer resolution but a smaller compensation range. Two charge pump compensators generate upb_comp and dnb_comp separately according to their control bits.
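The CP compensation codes, like the results in Section IV, depend on the XOR-based mismatch measurement of Section III-A. Below is a minimal behavioral sketch of that measurement. It uses assumptions of our own:

- ideal samplers with no jitter or metastability;
- an ideal 50%-duty sam_clk;
- integer time units, with t_clk = 1600 units per sampling-clock cycle (at the 750 MHz bclk of Section IV, one unit would be about 0.83 ps) and t_sam = 1602;
- a wrap-around XOR that compares the last phase with the first phase of the next cycle.

The sam_clk edges are t_sam/2 apart and slip slowly past the samplers. As a result, the fraction of cycles in which an XOR output is high, averaged over whole beat periods, equals the spacing divided by t_sam/2. The names `sam_clk_level` and `xor_spacings` are hypothetical.

```python
import math


def sam_clk_level(t: int, t_sam: int) -> int:
    """Ideal 50%-duty sam_clk (t_sam must be even): 1 in the first half of each period."""
    return 1 if t % t_sam < t_sam // 2 else 0


def xor_spacings(phases: list[int], t_clk: int, t_sam: int, n_beats: int = 1) -> list[float]:
    """Estimate adjacent sampling-clock spacings from the duty of the XOR outputs.

    phases: sampling instants of the clock phases within one t_clk cycle (integer units).
    t_sam:  sam_clk period, slightly longer than t_clk so its edges slip slowly past
            the sampling instants (by t_sam - t_clk units per cycle).
    An XOR of two adjacent sampler outputs is high when a sam_clk edge (rising or
    falling) lies between the two sampling instants. Edges are t_sam/2 apart, so over
    whole beat periods the high fraction equals spacing / (t_sam / 2).
    """
    # Cycles until the sampling pattern repeats (one beat period of the two clocks).
    beat = t_sam // math.gcd(t_sam, t_sam - t_clk)
    n_cycles = beat * n_beats
    n = len(phases)
    high = [0] * n
    for c in range(n_cycles):
        base = c * t_clk
        samples = [sam_clk_level(base + p, t_sam) for p in phases]
        # Assumed wrap-around: the last phase is compared with the first phase of the next cycle.
        samples.append(sam_clk_level(base + t_clk + phases[0], t_sam))
        for k in range(n):
            high[k] += samples[k] ^ samples[k + 1]  # XOR of adjacent sampler outputs
    return [h * t_sam / (2 * n_cycles) for h in high]


if __name__ == "__main__":
    # Hypothetical mismatch: the second phase is sampled 20 units late (ideal spacing 100 units).
    phases = [0, 120] + [100 * k for k in range(2, 16)]
    for k, dt in enumerate(xor_spacings(phases, t_clk=1600, t_sam=1602)):
        print(f"window {k + 1}: {dt:.1f} units")
```

Here t_sam - t_clk = 2 units, so a sam_clk edge takes about 50 cycles to cross a 100-unit window. Each XOR pulse is therefore stretched to roughly 50 clock cycles, which is the lengthening that makes off-chip pulse-width measurement through the 16:2 mux practical. With these integer choices, the estimate is exact. With other periods or phases, it is quantized to roughly the per-cycle slip t_sam - t_clk.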
The pseudo-differential CP has two auxiliary NMOS current-sink devices, controlled by the CP compensation bias voltages, in addition to the main pull-down current paths controlled by the CMFB, as shown in Fig. 3(b). Therefore, the differential CP current can be skewed via digital compensation codes to cancel the static phase offset between refclk and bclk.

C. Phase-Averaging Network

Figure 4. Schematic of the phase-averaging network (inputs ph1_in, ..., ph16_in; outputs ph1_out, ..., ph16_out; transmission-gate resistors enabled by ph_ctrl and ph_ctrl_b).

Figure 5. Power efficiency vs. transmission gate sizing: phase mismatch reduction rate/power consumption (%/mW) vs. NMOS width in the transmission gate (µm) for three P/N ratios, with the most power-efficient NMOS widths marked.

To mitigate uneven phase spacing between the multi-phase clocks in the MX-PDLL, we propose a phase-averaging network (PAN). Fig. 4 presents a schematic of the PAN. The 16 clock phases out of the MX-PDLL are uniformly connected to two layers of interconnected resistors (R-ring) [3]. The R-ring smears, or averages, the voltage transitions, which reduces the phase errors caused by mismatch along the different clock paths. Phase averaging is achieved through RC low-pass filtering between all of the clock phases, which reduces phase spacing offsets. Each resistor is implemented with a transmission gate that can be enabled by the signals ph_ctrl and ph_ctrl_b. The filtering capacitance comes from the parasitic capacitance of the internal nodes.

Sizing the transmission gate is tricky. If the transistors are too small, the corresponding resistance is too large to provide sufficient voltage averaging, and the reduction of phase offsets is negligible. On the other hand, if the transistors are too large, the correspondingly small resistance draws too much active current, significantly reducing the voltage swing and increasing power consumption. Hence, transmission gate sizing trades the reduction in phase spacing mismatch against power consumption. Fig. 5 plots the ratio of phase mismatch reduction to power consumption versus transistor size for three different P/N ratios. A P/N ratio of 3 was chosen to achieve the maximum reduction rate without a significant power penalty.

D. Duty-Cycle Corrector

Figure 6. Block diagram of the DCC (xclk/xclk_b drive DCC_tune, whose outputs divclk/divclk_b feed the %2 circuit that produces i, q, i_b, and q_b; XOR and XNOR gates followed by RC filters generate the tuning signals ptune and ntune).

As shown in Fig. 6, the DCC block consists of a duty-cycle tuning block (DCC_tune) that drives a divide-by-2 (%2) circuit, which in turn generates the quadrature clock phases. The %2 circuit is a simple divide-by-2 counter. A set of XOR and XNOR gates generates output pulses that correspond to the phase spacing between the i and q clock signals. A pair of passive RC circuits filters the pulses to generate a pair of pseudo-differential duty-cycle tuning signals, ptune and ntune. Through negative feedback, the tuning signals drive the DCC_tune block to compensate for any duty-cycle error on xclk and create evenly spaced quadrature phases. Given the %2 circuit, this corresponds to a 50% duty cycle on divclk. In the RC filter, a large R allows a small C to be used, reducing layout area. While this DCC uses very simple phase detection and filter circuitry, one drawback is a small nonzero steady-state offset resulting from the loop's finite DC gain.

Figure 7. Schematic of DCC_tune (inputs xclk and xclk_b; outputs divclk and divclk_b; tuning inputs ptune and ntune).

The DCC_tune block consists of two tuning stages with four NMOS tuning devices that are controlled by the pseudo-differential tuning signals (ptune and ntune), as shown in Fig. 7. The NMOS tuning devices can delay either the rising or the falling transition of xclk according to ptune and ntune. Because only NMOS devices are controlled by the tuning signals, only the rising edges on the internal nodes are delayed. Four additional feed-forward NMOS pull-up devices balance the discharging currents of the tuning devices and also provide some duty-cycle correction. When the DCC feedback loop is locked, ptune and ntune settle to values that guarantee that the phase difference between i and q (detected by the XOR) equals the phase difference between q and i_b (detected by the XNOR).

IV. MEASUREMENT RESULTS

In this section, we present the measured results of the CP compensator, PAN, and DCC. The bclk frequency is 750 MHz for all of the following measurements.

A. Charge Pump Compensator

Figure 8. Measured static phase spacing mismatch in DLL mode: phase difference (ps) vs. clock phase for the uncompensated and compensated CP (annotation: mismatch due to the reference clock being muxed in at DLL mode).

The uncompensated CP in DLL mode shows a large delay in the first delay cell (clock phase 16) due to the static phase offset issue discussed in Section II. Measured results in Fig. 8 show that CP compensation removes this large delay difference and also slightly reduces the overall phase spacing mismatch.

B. Phase-Averaging Network

Figure 9. Measured static phase spacing mismatch results of the CP compensator and PAN in PLL and DLL modes: phase mismatch (ps) vs. clock phase for no compensation, CP compensation, and CP compensation with PAN in each mode.

Fig. 9 presents the benefits of using the PAN along with the CP compensator. CP compensation reduces the standard deviation (STD) of the phase spacing mismatch among all 16 clock phases from 18.72 ps to 13.31 ps when operating in DLL mode. The PAN further reduces the STD from 13.31 ps to 11.74 ps in DLL mode and from 7.28 ps to 6.71 ps in PLL mode.

C. Duty-Cycle Corrector

Figure 10. Measured DCC performance: (a) Phase (degree) out of the phase rotator vs. phase rotator code for quadrants I and II, with and without DCC, given a 36%-duty-cycle xclk (w/ DCC: quadrant I 25.1%, quadrant II 24.9%; w/o DCC: quadrant I 23.41%, quadrant II 26.59%). (b) Quadrant spacing (%) vs. external clock (xclk) duty cycle (%).

Fig. 10(a) plots the measured phase range out of the phase rotator for quadrants I and II given a 36%-duty-cycle xclk at the input. The DCC reduces the quadrant mismatch from ±1.59% to ±0.1%. Fig. 10(b) plots the measured quadrant spacing as the xclk duty cycle is swept from 36% to 49%. The residual mismatch is less than 0.65% of the desired 25% spacing.

V. CONCLUSIONS

Precise generation of evenly spaced clocks is needed for high-performance multi-gigabit links. We have presented three circuits that compensate for static phase mismatch arising from three different mismatch sources. In DLL mode, the proposed CP compensator reduces the STD of the phase spacing mismatch by over 28%, and the combination of the CP compensator and the PAN reduces it by over 37%. A simple and robust duty-cycle corrector ensures even quadrant spacing, improving the linearity of the phase rotator.

REFERENCES

[1] J. Jaussi, et al., “A 20Gb/s Embedded Clock Transceiver in 90nm CMOS,” ISSCC Dig. of Tech. Papers, pp. 340–341, Feb. 2006.
[2] A. H.-Y. Tan and G.-Y. Wei, “Adaptive-Bandwidth Mixing PLL/DLL Based Multi-Phase Clock Generator for Optimal Jitter Performance,” unpublished. Submitted to CICC 2006.
[3] J.-M. Chou, Y.-T. H., and J.-T. Wu, “A 125 MHz 8b Digital-to-Phase Converter,” ISSCC Dig. of Tech. Papers, pp. 436–437, Feb. 2003.