422 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 58, NO. 7, JULY 2011

Clock- and Data-Recovery Circuit With Independently Controlled Eye-Tracking Loop for High-Speed Graphic DRAMs

Jun-Yong Song, Student Member, IEEE, and Oh-Kyong Kwon, Member, IEEE

Abstract—An independently controlled eye-tracking clock- and data-recovery (CDR) circuit that achieves enhanced high-frequency jitter tolerance is presented in this brief. In the proposed CDR, a data-tracking loop compensates interchannel timing skews and rejects low-frequency jitter of the data, and an eye-tracking loop tracks the asymmetric jitter distribution and high-frequency jitter of the data to enhance high-frequency jitter tolerance. This is achieved by independently controlling the two loops in the digital domain. The CDR is implemented in a 0.18-μm CMOS process, and a bit error rate of less than $10^{-12}$ was achieved for data rates up to 5.8 Gb/s using a $2^{31}-1$ pseudorandom binary-sequence input.

Index Terms—Bang-bang phase detector (PD), clock and data recovery (CDR), complementary metal–oxide–semiconductor (CMOS), dynamic random access memory (DRAM), eye tracking, jitter tolerance.

I. INTRODUCTION

The demand for high-bandwidth dynamic random access memory (DRAM) for graphic memory has recently increased because 3-D graphics must process large amounts of multimedia data [1]. Source-synchronous multichannel links such as DRAM input/output have large interchannel timing skews and poor receiver timing margins [2]. In addition, as the data rate between the memory controller and the memory module reaches several gigabits per second, the received data suffer from intersymbol interference and reflection noise, and the total jitter of the transmitted data has an asymmetric distribution [3]. Consequently, it is difficult to achieve a low bit error rate (BER) with a delay-locked loop (DLL)-based DRAM receiver because the sampling clock is simply shifted by 90° using the DLL at the receiver.
A clock- and data-recovery (CDR) circuit can compensate interchannel timing skews and track the jitter distribution of the received data. Conventional CDRs have a single loop for the data-tracking operation and can have a high BER when the jitter distribution of the received data is asymmetric. A CDR with variable-interval oversampling can measure the data eye width and achieve a low BER with an asymmetric jitter distribution [3]. The CDR in [3] has a reference loop, a tracking loop, and an eye-measuring loop. Because the reference loop and the tracking loop share the voltage-controlled oscillator (VCO), the architecture is difficult to adapt to a multichannel link. The tracking loop and the eye-measuring loop are tightly coupled, and the eye-measuring loop has a relatively low jitter-tracking bandwidth to ensure stable operation.

Manuscript received July 22, 2010; revised November 24, 2010 and January 31, 2011; accepted April 4, 2011. Date of publication July 5, 2011; date of current version July 20, 2011. This work was supported in part by Hynix Semiconductor Inc. and the IC Design Education Center. This work was recommended by Associate Editor H.-J. Yoo.
The authors are with the Department of Electronics and Computer Engineering, Hanyang University, Seoul 133-791, Korea (e-mail: okwon@hanyang.ac.kr).
Digital Object Identifier 10.1109/TCSII.2011.2158254

Fig. 1. CDR architecture with DTL and ETL for multichannel link.

This brief presents a CDR with an independently controlled eye-tracking loop (ETL) for high-speed graphic DRAM. It adopts a shared phase-locked loop (PLL) for the multichannel DRAM architecture [4] and two loops for phase tracking and eye tracking. The two loops are independently controlled in the digital domain so that the bandwidth of each loop can be optimized separately in a small area. Section II describes the overall architecture of the CDR and each loop. In Section III, the CDR building blocks for phase detection and eye tracking are presented.
The macromodeling results of the CDR are presented in Section IV. The experimental results of the CDR are presented in Section V, followed by the conclusion in Section VI.

1549-7747/$26.00 © 2011 IEEE

II. CDR ARCHITECTURE

Fig. 1 shows the block diagram of the proposed CDR. The CDR operates at 1/4 rate to reduce the operating speed of the digital logic and to maximize the data rate achievable in a DRAM process. The PLL multiplies the frequency and reduces the jitter of the reference clock. The PLL synthesizes the four-phase 1/4-rate clocks (CLK$_{0-3}$) from the reference clock (REF_CLK). A data-tracking loop (DTL) compensates skews between the data and the sampling clocks at each channel and tracks low-frequency jitter of the data. The DTL aligns the edge-sampling clocks ($\phi_{0-3}$) to the averaged transition positions of the 4-bit data by controlling the phase of CLK$_{0-3}$.

An ETL takes over $\phi_{0-3}$, which carry the phase information of the data, from the DTL. The ETL generates two eye-monitoring clocks and one data-sampling clock per data bit by phase interpolating $\phi_i$ and $\phi_{i+1}$. The phase of the data-sampling clock is designed to sit at the center of the data eye, where the BER is lowest. The bandwidth of the ETL is designed to be higher than that of the DTL so that the ETL can track high-frequency jitter of the received data. This is done by separating the control loop of the ETL from that of the DTL and by controlling the two eye-monitoring clocks independently.

Fig. 2. Block diagram of the PLL.
Fig. 3. Block diagram of the DTL.
Fig. 4. Phase diagram of the DTL.

A. PLL

The shared PLL provides two equally spaced 1/4-rate differential clocks (CLK$_{0-3}$) from REF_CLK, as shown in Fig. 2. The differential delay cells of the VCO are implemented using symmetric loads [5] for high supply-noise rejection and compatibility with the DRAM process. The frequency of REF_CLK is 1/8 of the data rate, and that of CLK$_{0-3}$ is 1/4 of the data rate.
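As a worked example of these rate relationships, the sketch below derives the clock frequencies and the unit interval from a given data rate. The function name `clock_plan` and the dictionary keys are hypothetical; the 5.8-Gb/s input is the maximum data rate quoted in the abstract, and the 90° phase step simply assumes four equally spaced phases over one 1/4-rate clock period.

```python
def clock_plan(data_rate_bps: float) -> dict:
    """Derive clock frequencies and the unit interval (UI) from the data rate.

    Illustrative helper only: REF_CLK runs at 1/8 of the data rate and
    CLK0-3 at 1/4 of the data rate, as stated in the text.
    """
    clk_hz = data_rate_bps / 4.0      # four-phase 1/4-rate clocks
    ref_clk_hz = data_rate_bps / 8.0  # reference clock fed to the PLL
    return {
        "ui_ps": 1e12 / data_rate_bps,             # one data bit period
        "clk_hz": clk_hz,
        "ref_clk_hz": ref_clk_hz,
        "pll_multiplication": clk_hz / ref_clk_hz,  # frequency ratio of the PLL
        "phase_step_deg": 360.0 / 4,                # spacing of four equal phases
    }


plan = clock_plan(5.8e9)
print(f"UI      = {plan['ui_ps']:.1f} ps")
print(f"CLK0-3  = {plan['clk_hz'] / 1e9:.2f} GHz")
print(f"REF_CLK = {plan['ref_clk_hz'] / 1e6:.0f} MHz")
```

Because one period of the 1/4-rate clock spans four data bits, a 90° step of that clock corresponds to one UI.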
The phase difference between CLK$_i$ and CLK$_{i+1}$ is one data unit interval (UI). CLK$_{0-3}$ is distributed to the CDR with current-mode logic buffers.

B. DTL

Fig. 3 shows the block diagram of the DTL. The DTL adopts a digitally controlled phase-tracking loop with a 1/4-rate twice-oversampling bang-bang phase detector (PD) and a digital phase rotator (PR) controller. $\phi_{0-3}$ is aligned to the averaged transition position of the data by adjusting the phase of CLK$_{0-3}$ using the PR and is sent to the ETL. Phase interpolators (PIs) generate two kinds of sampling clocks ($E_{0-3}$ and $D_{0-3}$) for bang-bang phase detection. $D_i$ is generated by phase interpolating $\phi_i$ and $\phi_{i+1}$ with the same weight, and $E_i$ is generated by buffering $\phi_i$ using a PI to match the skews between $D_i$ and $E_i$ [6]. The phase relationship is shown in Fig. 4. The phase difference between $\phi_i$ and $\phi_{i+1}$ is 90° of the 1/4-rate clock, and the phase difference between $E_i$ and $D_i$ is 45° of the 1/4-rate clock.

The DTL PD adopts bang-bang phase detection to align $E_i$ to the edges of the data. A majority vote circuit determines the state of UP_DN0 once per clock cycle by comparing the number of HIGHs in UP[0:3] with that in DN[0:3]. A DTL digital loop filter accumulates UP_DN0 at an update frequency to filter out the high-frequency jitter of the received data. The accumulated result generates UP_DN1 to control the phase of $\phi_{0-3}$ using a PR controller. The PR controller determines the phase of $\phi_{0-3}$ with a 30-bit thermometer code TW[0:29]. The PR controller is implemented with a bidirectional shift register and an area-optimized 5-bit binary-to-thermometer decoder to increase the linearity of the PR.

Fig. 5. Block diagram of the ETL.

C. ETL

The ETL adopts a triple-oversampling eye-monitoring scheme [3] with two digitally controlled eye-monitoring clocks ($L_{0-3}$ and $R_{0-3}$) and a data-sampling clock ($C_{0-3}$). The ETL generates $L_i$, $R_i$, and $C_i$ by phase interpolating $\phi_i$ and $\phi_{i+1}$, which carry the phase information of the data.
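The phase interpolation used to derive sampling clocks from two adjacent edge clocks can be sketched with an idealized, linear model. This is a minimal illustration under stated assumptions, not the circuit itself: the helper names `thermometer` and `interpolate_phase` are hypothetical, the interpolator is assumed to be perfectly linear, a 30-bit thermometer code is assumed to set the weight, and each HIGH bit is assumed to steer one unit of weight toward the earlier clock.

```python
PI_UNITS = 30  # assumption: one weight unit per bit of a 30-bit thermometer code


def thermometer(n_high: int, width: int = PI_UNITS) -> list[int]:
    """Thermometer code with n_high leading HIGH bits (hypothetical helper)."""
    if not 0 <= n_high <= width:
        raise ValueError("n_high out of range")
    return [1] * n_high + [0] * (width - n_high)


def interpolate_phase(phi_i_deg: float, phi_next_deg: float, code: list[int]) -> float:
    """Idealized linear phase interpolation between two adjacent clock phases.

    Assumed polarity: every HIGH bit adds one unit of weight to phi_i (the
    earlier phase); every LOW bit adds one unit to phi_next.
    """
    if len(code) != PI_UNITS:
        raise ValueError("code must have PI_UNITS bits")
    weight_i = sum(code) / PI_UNITS
    return weight_i * phi_i_deg + (1.0 - weight_i) * phi_next_deg


# Adjacent edge clocks are 90 degrees apart at the 1/4-rate clock.
edge_clock = interpolate_phase(0.0, 90.0, thermometer(30))  # full weight on phi_i
data_clock = interpolate_phase(0.0, 90.0, thermometer(15))  # equal weights
print(edge_clock, data_clock)
```

With equal weights the output lands halfway between the two inputs, which is the 45° offset between the edge-sampling and data-sampling clocks of the DTL; the ETL clocks use the same mechanism with digitally adjustable weights.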
The ETL also independently controls $L_i$, $R_i$, and $C_i$ to track high-frequency jitter of the data using an ETL PD, a digital loop filter, a PI controller, and PIs, as shown in Fig. 5. Fig. 6(a) shows a phase diagram of the ETL when the data eye opening leans to the right side of the UI, and Fig. 6(b) shows the phases of the clocks together with PDF$_{jit}$, where PDF$_{jit}$ denotes the probability density function (PDF) of the jitter. In the locked state, $L_i$ and $R_i$ are positioned at the edges of the data eye, and $C_i$ is positioned at the center of the data eye, which is the lowest BER position. Consequently, the data sampled with $C_i$ is the recovered data (RE_D[i]), and $C_0$ is the recovered clock (RE_CLK). Additionally, the data eye width can be measured with the digital control signals of $L_i$ and $R_i$.

The ETL PD generates DN_UP_L0 and UP_DN_R0 to independently control the phases of $L_i$ and $R_i$ and align them to the edges of the data eye. The ETL digital loop filter updates DN_UP_L1 and UP_DN_R1 by sampling DN_UP_L0 and UP_DN_R0 at an update frequency. The update frequency of the ETL is higher than that of the DTL to track high-frequency jitter of the data. Because there is no feedback from the ETL to the DTL, the bandwidth of each loop can be independently optimized. An ETL PI controller generates two 15-bit thermometer codes (TWL[0:14] and TWR[0:14]) to control the phases of $L_i$, $R_i$, and $C_i$ using each PI.

Fig. 6. (a) Phase diagram of the ETL and (b) phases of clocks with PDF$_{jit}$.

III. CDR BUILDING BLOCKS

A. PD

The DTL PD is composed of samplers [7] for data sampling, a retimer to align the sampled data in phase, and XOR gates for bang-bang phase detection, as shown in Fig. 7(a). The states of UP[i] and DN[i] are determined by the phase of $E_i$ relative to the data UI.

The ETL PD adopts a triple-oversampling eye-monitoring scheme [3], as shown in Fig. 7(b). If there is any transition between $L_i$ and $C_i$, $L_i$ leads the left edge of the data eye, and DN_UP_L0 becomes HIGH to delay $L_i$. Likewise, if there is any transition between $C_i$ and $R_i$, $R_i$ lags the right edge of the data eye, and UP_DN_R0 becomes LOW to advance $R_i$.

Fig. 7. (a) Schematic diagram of the DTL PD and (b) schematic diagram of the ETL PD.

B. 5-bit Binary-to-Thermometer Decoder

The 5-bit binary-to-thermometer decoder generates a thermometer code from the binary 5-bit PR control code W[0:4] at the PR controller of the DTL. The decoder is implemented with a 4-bit binary-to-thermometer decoder and MUXs to balance the resolution and the area of the PR controller, as shown in Fig. 8. The most significant bit of the 5-bit PR control code is used as the MUX selection signal. When W[4] is HIGH, TW[0:14] is all HIGH, and TW[15:29] is determined by W[0:3]. Accordingly, the 5-bit binary-to-thermometer decoder has 31 steps.

Fig. 8. Schematic diagram of the 5-bit binary-to-thermometer decoder.

C. PI

The PIs have two roles in the proposed CDR. First, the PIs generate multiphase clocks to oversample the received data. Second, the PIs control the phases of the sampling clocks. Because the inputs of the PI (VID0 and VID1) are differential signals generated by the symmetric-load-based VCO, the PIs are also implemented with symmetric loads for constant input and output voltage swings, as shown in Fig. 9(a). A PI current source (PICS) is composed of 60 PI control units (PICUs) and is controlled with two 30-bit thermometer code signals (TW[0:29] and TWb[0:29]). Each PICU consists of a switch for the digital control signal (TW[n] or TWb[n]) and a current source with a bias voltage of VBN. Because the PICUs are connected in parallel between a common source node (CS0 or CS1) and VSS, two adjacent PICUs share CSi or VSS to minimize the area, as shown in Fig. 9(b). According to simulation results, the integral nonlinearity of the PI is ±0.25 LSB due to process variations.

Fig. 9. (a) Schematic diagram of the PI and (b) the PICS.

D. ETL PI Controller

The ETL PI controller generates 15-bit thermometer code signals (TWL[0:14] and TWR[0:14]) to control the phases of $L_i$ and $R_i$. The phase control signal for $C_i$ (TWC[0:14]) is composed of TWL[2m] (m = 0 to 7) and TWR[2m + 1] (m = 0 to 6), both to position $C_i$ at the center of $L_i$ and $R_i$ and to minimize the area. The phase ranges of $L_i$, $R_i$, and $C_i$ are limited to half of the UI, as shown in Fig. 10, to simplify the PI controller logic by assuming that the left or right edge of the data eye does not cross the center of the UI. Consequently, the ETL PI has 5-bit resolution and a 4-bit phase control range.

Fig. 11 depicts a floor plan of the six ETL PICSs with control signals. There are two PICSs for each of $L_i$, $R_i$, and $C_i$. To minimize process variation caused by pattern density, dummy PICSs are placed on both sides of the six PICSs. To limit the phase range of $L_i$ to the left side and that of $R_i$ to the right side of the UI, the fixed control signals in the left and right PICSs are connected to HIGH and LOW, respectively, as shown in Fig. 11. The center PICSs have eight fixed control signals tied HIGH and seven fixed control signals tied LOW. At the initial state, in order to position $L_i$, $R_i$, and $C_i$ at the center of $\phi_i$ and $\phi_{i+1}$, TWL[0:14] and TWR[0:14] are reset to HIGH and LOW, respectively.

Fig. 10. Phase ranges of $L_i$, $C_i$, and $R_i$.
Fig. 11. Floor plan of the six ETL PICSs with control signals.
Fig. 12. Macromodeling results with an input data of 0.66-UI peak-to-peak jitter. (a) Control signal for $\phi_i$. (b) Control signal for $L_i$. (c) Control signal for $R_i$.

IV. MACROMODELING RESULTS OF CDR
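As a recap of the thermometer-code logic in the building blocks, the sketch below models the 5-bit binary-to-thermometer decoder of the PR controller and the composition of the center control code TWC in the ETL PI controller. It is an illustrative model under stated assumptions, not the authors' logic: the function names are hypothetical, the thermometer codes are assumed to fill from index 0 upward, the decoder output for W[4] LOW (TW[0:14] from W[0:3], TW[15:29] all LOW) is inferred by symmetry because the text only describes the W[4] HIGH case, and each interleaved bit of TWC is assumed to keep its original index.

```python
def bin4_to_thermo15(value: int) -> list[int]:
    """4-bit binary to 15-bit thermometer code (fills from index 0)."""
    if not 0 <= value <= 15:
        raise ValueError("value must fit in 4 bits")
    return [1 if k < value else 0 for k in range(15)]


def bin5_to_thermo30(w: int) -> list[int]:
    """5-bit code W[0:4] to 30-bit TW[0:29] using one 4-bit decoder plus a MUX.

    W[4] HIGH: TW[0:14] all HIGH, TW[15:29] from W[0:3] (as in the text).
    W[4] LOW:  TW[0:14] from W[0:3], TW[15:29] all LOW (assumed).
    """
    if not 0 <= w <= 31:
        raise ValueError("w must fit in 5 bits")
    low_part = bin4_to_thermo15(w & 0xF)
    if (w >> 4) & 1:
        return [1] * 15 + low_part
    return low_part + [0] * 15


def center_code(twl: list[int], twr: list[int]) -> list[int]:
    """TWC[0:14]: even positions taken from TWL, odd positions from TWR."""
    if len(twl) != 15 or len(twr) != 15:
        raise ValueError("TWL and TWR must be 15 bits each")
    return [twl[k] if k % 2 == 0 else twr[k] for k in range(15)]


# Count the distinct phase levels reachable by the 32 input codes.
levels = {sum(bin5_to_thermo30(w)) for w in range(32)}
print(len(levels))

# TWC takes eight bits from TWL and seven bits from TWR.
print(sum(center_code([1] * 15, [0] * 15)), sum(center_code([0] * 15, [1] * 15)))
```

Under these assumptions, the codes 15 and 16 both produce 15 HIGH bits, so the 32 input codes map onto 31 distinct levels, which agrees with the 31 steps stated for the decoder. Because TWC samples alternate bits of TWL and TWR, its number of HIGH bits follows roughly half of each, which is how a single wiring pattern can keep the center clock between the two eye-monitoring clocks without extra logic.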
We verified the tracking characteristics of the proposed eye-tracking method and designed the circuit using a macromodel of the channel and the circuit. The transition probability and jitter characteristics of the data are modeled using a $2^{31}-1$ pseudorandom binary-sequence (PRBS) input, white noise, and the PCB channel. The channel length was varied to change the magnitude of the jitter. The length of observation for the data eye opening is determined by the bandwidth of the ETL.

Fig. 12 shows macromodeling results of the CDR when the input data has 0.66-UI peak-to-peak jitter and the reference clock leads the input data. The bandwidth of the ETL is set to four times that of the DTL. The weights of the control signals for $\phi_i$, $L_i$, and $R_i$ are presented in Fig. 12; a high weight means a leading phase. The DTL positions the phase of $\phi_i$ at the edge of the received data, and at the same time, the ETL positions the phases of $L_i$ and $R_i$ at the edges of the data eye. During locking, $L_i$ approaches the left edge of the data eye, and $R_i$ is quickly delayed toward the right edge of the data eye because of the high bandwidth of the ETL. After the DTL is locked to the data UI, $L_i$ and $R_i$ quickly track the jitter of the data, whereas $C_i$ stays at the center of the data eye for low BER.

V. EXPERIMENT RESULTS OF CDR

The CDR circuit was fabricated in 0.18-μm standard CMOS technology. The microphotograph of the chip is shown in Fig. 13. The core of the CDR circuit is 0.90 mm × 0.55 mm. It consumes a power of 109.8 mW at a 1.8-V supply voltage. The area and power consumption are not optimized because a large design margin was kept. The chip has been tested with a chip-on-board assembly. With the 5.8-Gb/s $2^{31}-1$ PRBS data input, the peak-to-peak jitter of the received data is 25 ps, caused by impedance mismatch, and the measured BER is less than $10^{-12}$. The jitter histogram of the recovered clock is shown in Fig. 14, where the rms and peak-to-peak jitter are 13.5 and 121.2 ps, respectively. Fig. 15 shows the recovered parallel data at 1/4-rate 1.45 Gb/s. The maximum data rate and jitter tolerance of the CDR are limited by the large peak-to-peak jitter of the sampling clock because the supply noise of the digital buffers is transferred to the clock as large jitter through the low bandwidth of the PLL. The measured performance summary of the CDR and comparisons with other works are listed in Table I.

TABLE I. PERFORMANCE SUMMARY AND COMPARISONS

Fig. 13. Chip microphotograph.
Fig. 14. Recovered 1/4-rate 1.45-GHz clock.
Fig. 15. Eye diagram of the 1/4-rate 1.45-Gb/s recovered data.

VI. CONCLUSION

A CDR circuit with an independently controlled ETL for high-speed graphic DRAM has been presented. The ETL independently controls two eye-monitoring clocks to monitor the data eye and to track high-frequency jitter of the data. In addition, digital control enables simple eye-opening measurement. To achieve area efficiency, a modified 5-bit binary-to-thermometer decoder and PI controller have been presented. The CDR is implemented in 0.18-μm CMOS technology, the maximum data rate is 5.8 Gb/s, and the BER is less than $10^{-12}$ with $2^{31}-1$ PRBS data. The CDR is applicable as a high-speed graphic DRAM receiver with high jitter tolerance.

ACKNOWLEDGMENT

The authors would like to thank K.-S. Kwak, S.-J. Ahn, M.-S. Shin, E.-J. Kim, H.-R. Choi, and Y.-J. Kim for their useful discussion and feedback.

REFERENCES

[1] H. Lee, K.-Y. K. Chang, J.-H. Chun, T. Wu, Y. Frans, B. Leibowitz, N. Nguyen, T. J. Chin, K. Kaviani, J. Shen, X. Shi, W. T. Beyene, S. Li, R. Navid, M. Aleksic, F. S. Lee, F. Quan, J. Zerbe, R. Perego, and F. Assaderaghi, “A 16 Gb/s/link, 64 GB/s bidirectional asymmetric memory interface,” IEEE J. Solid-State Circuits, vol. 44, no. 4, pp. 1235–1247, Apr. 2009.
[2] E. Yeung and M. A. Horowitz, “A 2.4 Gb/s/pin simultaneous bidirectional parallel link with per-pin skew compensation,” IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 1619–1628, Nov. 2000.
[3] S.-H. Lee, M.-S. Hwang, Y. Choi, S. Kim, Y. Moon, B.-J. Lee, D.-K. Jeong, W. Kim, Y.-J. Park, and G. Ahn, “A 5-Gb/s 0.25 μm CMOS jitter-tolerant variable-interval oversampling clock/data recovery circuit,” IEEE J. Solid-State Circuits, vol. 37, no. 12, pp. 1822–1830, Dec. 2002.
[4] Y.-S. Seo, J.-W. Lee, H.-J. Kim, C. Yoo, J.-J. Lee, and C.-S. Jeong, “A 5-Gb/s clock- and data-recovery circuit with 1/8-rate linear phase detector in 0.18 μm CMOS technology,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 1, pp. 6–10, Jan. 2009.
[5] J. G. Maneatis, “Low-jitter process-independent DLL and PLL based on self-biased techniques,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1723–1732, Nov. 1996.
[6] S. Sidiropoulos and M. A. Horowitz, “A semidigital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, no. 11, pp. 1683–1692, Nov. 1997.
[7] P. K. Hanumolu, G.-Y. Wei, and U.-K. Moon, “A wide-tracking range clock and data recovery circuit,” IEEE J. Solid-State Circuits, vol. 43, no. 2, pp. 425–439, Feb. 2008.
[8] A. Agrawal, A. Liu, P. K. Hanumolu, and G.-Y. Wei, “An 8 × 5 Gb/s parallel receiver with collaborative timing recovery,” IEEE J. Solid-State Circuits, vol. 44, no. 11, pp. 3120–3130, Nov. 2009.