A Fast-Locked All-Digital Phase-Locked Loop for Dynamic Frequency Scaling
Dian Huang and Ying Qiao

Abstract— The all-digital phase-locked loop (ADPLL) has been widely studied as a replacement for the traditional analog PLL in digital-system clock generation as process technology enters the nanometer regime. Much research has aimed to reduce the frequency/phase locking time while mitigating output clock jitter. This paper presents a novel successive-approximation frequency-search algorithm that centers the starting frequency of the digitally controlled oscillator (DCO) on the target frequency while aligning the phase of the output clock to that of the reference clock. With small design-complexity overhead, the ADPLL implemented with the 45nm Nangate Open Cell Library shows, in simulation, a period jitter of 4.56ps peak-to-peak and 1.32ps rms, with a 790ns locking time at 4.5GHz under a 1.0V supply.

Index Terms— all-digital phase-locked loop (ADPLL), bang-bang phase detector (BBPHD), successive approximation register (SAR)

I. INTRODUCTION

A digital system such as a system-on-chip microprocessor generally requires a phase-locked loop (PLL) for clock generation. However, a traditional analog PLL typically contains components that are not included in a static CMOS standard-cell library, such as resistors and low-leakage capacitors, which makes it difficult to integrate into a digital system and to remap when porting to new technologies [1]. Furthermore, the yield and performance spreads of the analog PLL may not correlate well with those of the digital designs on the same chip. In recent years, the benefits of IC process technology scaling have enabled the practical implementation of the all-digital phase-locked loop (ADPLL). At the same time, dynamic per-core clock frequency scaling [2] has become critical for power management in state-of-the-art multi-core microprocessors.
Because a microprocessor commonly pauses computation and waits for the clock to become ready during dynamic frequency scaling, fast-locking ADPLLs have drawn growing attention from design houses, in addition to the conventional focus on jitter performance. Various fast-locking ADPLLs have been proposed for microprocessor clock generation under dynamic frequency scaling. These designs fall into two categories: simultaneous and decoupled frequency and phase locking. ADPLL designs with decoupled frequency and phase locking [9] use separate schemes for frequency acquisition and phase locking. In the frequency-tracking stage, the bang-bang phase detector (BBPHD) detects the phase error between the reference clock and the output clock and outputs an UP/DN signal that adjusts the DCO output frequency accordingly. When the BBPHD outputs an UP-DN pair, the frequency has settled to within the current step size of the DCO control code, and the search moves on to the next less significant bit; this is known as the binary search algorithm (BSA). During the phase-tracking stage, once the frequency has settled, a time-to-digital converter (TDC) senses the remaining phase error and generates a DCO control code that adjusts the period of several DCO output cycles to align the phase; the frequency then returns to the previously settled value once the phase error is eliminated. Although BSA is easy to implement, it forces a tradeoff between locking time and frequency range, because t_lock is proportional to log2(f_range). Moreover, a high-resolution TDC suffers from high design complexity and large area.

One of the key issues that slows down PLL locking is that even when the PLL already outputs the correct frequency, a large phase error still forces it to move away from that frequency. A modified bang-bang algorithm [7] has been proposed to resolve this issue.
When the output frequency is close to the locking frequency, a conventional ADPLL still has to move its frequency away from the target to cancel the phase error, which dramatically increases the locking time. To correct a large phase error quickly, the algorithm of [7] stores the proportional-path code into the integral path. The DCO frequency then changes substantially and corrects the large phase error; as soon as the bang-bang phase detector outputs an UP-DN pair, the phase adjustment ends, the proportional-path code stored in the integral path is cleared, and the ADPLL returns to the original frequency. When the phase error is small enough, the algorithm is disabled and the ADPLL behaves like a conventional BBPHD ADPLL. This algorithm does improve the locking time; however, the ADPLL still suffers from the tradeoff between locking time and jitter, because the algorithm works only when the DCO frequency is already close to the target frequency. It is therefore not suitable for microprocessors that demand a wide frequency-scaling range. To overcome this issue, an adaptive loop-gain fast-locking algorithm is proposed in [8]. In this algorithm, the BBPHD outputs are averaged by leaky integrators, which form a filter that dynamically adjusts the loop gain of the system. When the frequency error is large, the gain is increased to shorten the locking time; when the error is small, the loop gain is reduced to a very small value for low jitter. However, the actual DCO gain must be estimated properly a priori, which is unreliable under process and temperature variations. Although this issue is resolved in [5] with a dynamically reconfigured digital loop filter, that approach significantly increases the design complexity.
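The gain-adaptation idea of [8] can be illustrated with a small behavioral model. The sketch below is only an illustration under stated assumptions, not the controller of [8]: the smoothing factor `lam`, the gain limits `g_min`/`g_max`, and the linear mapping from the averaged BBPHD output to the loop gain are all hypothetical. It shows why a leaky average of the BBPHD sign tracks the frequency error: long runs of identical UP (or DN) decisions push the average toward ±1, while the alternating UP/DN pattern near lock keeps it close to zero.

```python
def adaptive_gain_trace(bbphd, lam=0.125, g_min=1.0, g_max=16.0):
    """Return the loop gain chosen on each cycle from a leaky average of
    BBPHD decisions (+1 = UP, -1 = DN). Hypothetical model, not [8]'s design."""
    avg = 0.0
    gains = []
    for e in bbphd:
        # Leaky integrator: older decisions decay by (1 - lam) every cycle.
        avg = (1.0 - lam) * avg + lam * e
        # Persistent same-sign decisions (|avg| -> 1) indicate a large
        # frequency error, so use a large gain; UP/DN dithering
        # (|avg| -> 0) indicates near-lock, so use a small gain.
        gains.append(g_min + (g_max - g_min) * abs(avg))
    return gains


# Far from lock: the detector keeps reporting UP.
far = adaptive_gain_trace([+1] * 64)
# Near lock: the detector alternates UP and DN.
near = adaptive_gain_trace([+1, -1] * 32)
print(round(far[-1], 2), round(near[-1], 2))
```

For the alternating input, the average settles to a magnitude of lam/(2 - lam) = 1/15, so the gain settles near g_min + (g_max - g_min)/15 = 2, while the UP-only run drives the gain toward g_max. The sketch also hints at the weakness noted above: the gain limits are fixed numbers that implicitly assume a known DCO gain.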
In this work, we propose a BBPHD-based ADPLL with a novel successive-approximation frequency-search algorithm that not only centers the starting frequency of the DCO on the desired frequency but also aligns the phase of the output clock to that of the reference clock. The algorithm first determines how much delay must be added to the reference clock to align the falling edge of the divider output clock with that of the reference clock; it then searches for the center frequency with successive-approximation registers (SAR). Finally, a conventional ADPLL loop resolves the remaining frequency and phase error.

The paper is organized as follows. Section II discusses the design considerations of the bang-bang ADPLL. Section III presents the proposed ADPLL architecture. Section IV shows the simulation results of the implemented ADPLL, and Section V concludes the paper.

II. FAST-LOCKED ADPLL DESIGN CONSIDERATIONS

A conventional bang-bang ADPLL, whose architecture is shown in Figure 1, has to adjust the output frequency and phase through the binary UP/DN bang-bang code. As a result, it suffers from long locking time and may have stability problems if the output frequency is far from the target frequency. In general, the locking time of a conventional bang-bang ADPLL depends strongly on the system bandwidth, because the system gain determines how much phase error the PLL can correct in each cycle [4]. For a first-order system, the locking time can be expressed as:

$$t_{lock} = \frac{\pi}{2\pi(\beta k_{vco} - f_{off})/f_{ref}} \cdot \frac{1}{f_{ref}} \qquad (1)$$

where f_ref is the reference clock frequency, β is the proportional gain, and f_off is the initial frequency error. Equation (1) divides the worst-case initial phase error of π by the phase corrected per reference cycle, 2π(βk_vco − f_off)/f_ref, and converts the resulting number of cycles into time. When βk_vco, the gain of the system, is smaller than the initial frequency error, the system becomes unstable and can never settle. Because of this lower bound on the proportional gain of a first-order bang-bang ADPLL, a large gain is favored to prevent instability. However, a larger β induces a wider frequency step of the DCO, and the jitter is proportional to this step. An integral path with a smaller gain is therefore added to decouple the conflict, as shown in Figure 1. However, if the CPU switches to a much lower or higher frequency during frequency scaling, a large α is needed to quickly adjust the integral-path code, which stores the center-frequency code of the DCO. The integral path also contributes to the output jitter when a large α is applied; as shown in [6], the peak-to-peak output jitter depends on both the integral-path gain and the proportional-path gain:

$$\Delta t_{pp} = \frac{N k_{vco}}{4q^{2}}\left[(1+D)^{4}\alpha^{3} + 4(1+D)^{3}\alpha^{2}q + 8(1+D)^{2}\alpha q^{2} + 8(1+D)q^{3}\right] \qquad (2)$$

$$q = \beta - \frac{\alpha(1+2D)}{2} \qquad (3)$$

where N is the division ratio, D is the loop delay in reference cycles, and α is the integral-path gain. As shown in Figure 2, the output jitter is directly related to q. As a result, there is a tradeoff between the locking time, which determines the number of stalled instructions, and the clock jitter performance.

Figure 1 Conventional bang-bang ADPLL architecture
Figure 2 Relationship between output jitter Δt_pp and path gains

III. SAR-BASED ADPLL

As discussed in Section I, recently reported ADPLLs share common tradeoffs. The major issue is that while the ADPLL is correcting the frequency discrepancy, it is not reducing its phase error; conversely, it cannot reduce its phase error without disturbing the locked frequency. Moreover, if the frequency error is large, cycle slipping occurs: the system then corrects neither the frequency nor the phase error, and it finds the center frequency only by averaging many cycles of the UP/DN signal from the BBPHD.
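The lock-time/jitter tradeoff noted above can be quantified by evaluating Equations (1)-(3) of Section II directly. The Python sketch below is only a numerical aid under stated assumptions: the function names and the example frequencies are hypothetical, βk_vco in Eq. (1) is treated as a frequency in Hz, and the jitter of Eq. (2) is returned in units of N·k_vco because the text does not state the units of k_vco.

```python
import math


def lock_time(beta_kvco_hz, f_off_hz, f_ref_hz):
    """Eq. (1): reference cycles needed to remove a worst-case pi-rad phase
    error, converted to seconds. Returns math.inf when beta*k_vco <= f_off,
    i.e., when the loop cannot settle."""
    excess = beta_kvco_hz - f_off_hz
    if excess <= 0:
        return math.inf
    rad_per_cycle = 2 * math.pi * excess / f_ref_hz  # phase corrected per ref cycle
    return (math.pi / rad_per_cycle) / f_ref_hz


def jitter_pp_norm(alpha, beta, d):
    """Eqs. (2)-(3): peak-to-peak jitter in units of N * k_vco."""
    q = beta - alpha * (1 + 2 * d) / 2
    if q <= 0:
        raise ValueError("Eq. (2) assumes q > 0")
    a = 1 + d
    bracket = (a**4 * alpha**3 + 4 * a**3 * alpha**2 * q
               + 8 * a**2 * alpha * q**2 + 8 * a * q**3)
    return bracket / (4 * q**2)


# Hypothetical operating point: 100 MHz reference, 5 MHz initial offset.
for beta_kvco in (10e6, 20e6, 40e6):
    t = lock_time(beta_kvco, 5e6, 100e6)
    print(f"beta*k_vco = {beta_kvco / 1e6:.0f} MHz -> t_lock = {t * 1e9:.1f} ns")
for beta in (1, 2, 4):
    j = jitter_pp_norm(0.25, beta, 1)
    print(f"beta = {beta}, alpha = 0.25, D = 1 -> dt_pp = {j:.2f} x N*k_vco")
```

Raising β shortens t_lock (roughly halving it once βk_vco is well above f_off), while the normalized jitter grows with β; with α = 0 it reduces to 2(1 + D)β, the familiar limit cycle of a first-order bang-bang loop with loop delay D.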
To circumvent the conflict between frequency correction and phase correction, this work proposes a novel successive-approximation frequency-search algorithm that not only centers the starting frequency of the DCO on the desired frequency but also aligns the phase of the output clock to that of the reference clock. The architecture of the proposed ADPLL is illustrated in Figure 3. It is composed of a bang-bang phase detector (BBPHD), a PI digital loop filter, a divider, a five-stage digitally controlled oscillator (DCO), a SAR delay-search block that determines the amount of delay needed for phase alignment, and a SAR frequency-search block that determines the initial control code of the DCO.

Figure 3 Top-level block diagram of the proposed ADPLL

A. SAR-based Frequency-Search

In dynamic frequency scaling, when the microprocessor starts the adjustment, it generally (though not always) stalls instruction execution and waits for the ADPLL to relock, because during the adjustment the clock undergoes rapid frequency changes and exhibits large jitter. Thus, when the CPU begins frequency scaling, it sends a trigger signal that resets the ADPLL and starts its frequency initialization. In this mode, MUX2 selects the output code from the frequency-search block instead of the PI digital filter. The frequency-search block first sets the most significant bit (MSB) of the DCO binary code to 1 and keeps the remaining bits at 0. While the reference clock is high, the DCO, which consists of tri-state buffers, is disabled with all buffers turned off, and the divider is reset to output Vdd. At the falling edge of the reference clock, the divider is activated and its output falls from Vdd.
MUX2 then passes the DCO thermometer code, which also comes from the frequency-search block, and enables the DCO. As shown in Figure 5, with the divide-by-16 divider, the divider output starts rising after 8 DCO output cycles. By identifying whether the divider clock rises earlier than the reference clock, the BBPHD decides whether the MSB should be 1 or 0. When the reference clock goes high again, the divider and DCO are reset, and the frequency-search block sets the MSB according to this BBPHD decision. In the next round, the frequency-search block sets the next bit, MSB-1, to 1 and repeats the process to decide whether MSB-1 should be 1 or 0. With this successive-approximation frequency search, illustrated in Figure 4, the frequency-search block approximates the final DCO control code after N reference cycles, where N is the DCO binary code word length (one cycle per bit). The hardware that implements the algorithm consists mainly of shift registers, so its digital complexity is low. When the algorithm finishes, the code is passed to the integral path as the initial frequency code. After the least significant bit (LSB) has been resolved, the remaining phase error between the divider output clock and the reference clock is tiny, because the two clocks fall almost simultaneously and their frequencies are close to each other. As a result, the ADPLL can lock in a short time.

Figure 4 Flowchart of DCO frequency-search based on SAR algorithm
Figure 5 Timing diagram of SAR-based frequency-search. The DCO is activated when the reference clock is low; the BBPHD detects the rising edges of the divider and reference clocks
Figure 6 Frequency vs. time during frequency-search. The first two cycles detect the delay; the remaining cycles detect the frequency
Figure 7 Locking procedure of the proposed ADPLL

B. SAR-based Delay-Search

The SAR-based frequency-search works well for a fast-locking ADPLL once one issue, discussed in this subsection, is resolved. Because the DCO and divider are re-activated when the reference clock falls, a real implementation has a delay between the falling edge of the reference clock and this re-activation trigger, as well as a delay between the trigger and the divider output clock. Moreover, the DCO ring oscillator needs time to start up, so the divider output clock always falls several hundred picoseconds after the reference clock. Since the BBPHD makes its decision based only on the rising edges of the two clocks and ignores any difference between their falling edges, this offset causes a significant frequency error in the frequency-search stage. To compensate for this delay and determine the final DCO frequency accurately, extra delay is added to the reference clock. In the implementation, during frequency-search mode, the BBPHD actually compares a delayed version of the reference clock, REF_CLKD in Figure 8, instead of the reference clock itself, REF_CLK in Figure 8. Because this delay is sensitive to process and temperature variations, a digitally controlled delay line (DCDL) is used to adjust the amount of delay needed to align the falling edge of the reference clock with that of the divider output clock.
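Both the frequency search of Section III-A and the delay search that calibrates the DCDL resolve one bit per reference cycle with the same successive-approximation loop. The Python sketch below models that loop at a behavioral level; it is not the authors' RTL. The linear DCO tuning curve between the quoted 0.42GHz and 12GHz limits, the 100ps DCDL step, the 230ps start-up latency, and all function names are hypothetical. The decision rules follow the text: a frequency bit under test stays 1 while the divider edge still arrives no earlier than the reference edge (the DCO is not yet too fast), and a delay bit stays 1 while the delayed reference still falls no later than the divider output.

```python
from typing import Callable

F_MIN_HZ, F_MAX_HZ = 0.42e9, 12e9  # DCO range quoted in Section III-C
DCO_BITS = 10                      # DCO control-word length quoted in Section IV
DIV = 16                           # divide-by-16 feedback divider
DCDL_STEP_S = 100e-12              # hypothetical delay added per DCDL path step


def sar_search(nbits: int, keep_bit: Callable[[int], bool]) -> int:
    """Resolve an nbits-wide code MSB first, one trial per reference cycle."""
    code = 0
    for bit in reversed(range(nbits)):
        trial = code | (1 << bit)  # tentatively set the bit under test
        if keep_bit(trial):        # phase-detector decision for this trial
            code = trial
    return code


def dco_freq(code: int) -> float:
    # Hypothetical linear tuning curve; the real tri-state DCO is nonlinear.
    return F_MIN_HZ + code * (F_MAX_HZ - F_MIN_HZ) / (2**DCO_BITS - 1)


def freq_decision(f_target_hz: float) -> Callable[[int], bool]:
    # The reference clock (f_target / 16) stays low for half its period.
    t_ref_rise = (DIV / f_target_hz) / 2

    def keep_bit(code: int) -> bool:
        # The divider output rises after DIV / 2 = 8 DCO cycles.
        t_div_rise = (DIV // 2) / dco_freq(code)
        # Divider not earlier than the reference -> DCO not too fast -> keep 1.
        return t_div_rise >= t_ref_rise

    return keep_bit


def delay_decision(startup_latency_s: float) -> Callable[[int], bool]:
    def keep_bit(code: int) -> bool:
        # Keep the bit while the delayed reference still falls no later
        # than the divider output, which lags by the start-up latency.
        return code * DCDL_STEP_S <= startup_latency_s

    return keep_bit


delay_code = sar_search(2, delay_decision(230e-12))     # 2 reference cycles
freq_code = sar_search(DCO_BITS, freq_decision(4.5e9))  # 10 reference cycles
print(delay_code, freq_code, f"{dco_freq(freq_code) / 1e9:.3f} GHz")
```

With a 2-bit delay word and a 10-bit DCO word, the search takes 2 + 10 = 12 reference cycles, in line with the cycle counts given in Section IV. Because the modeled tuning curve is monotonic, the SAR result is the largest code whose frequency does not exceed the target, which a brute-force scan over all 1024 codes confirms.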
As shown in Figure 8, the DCDL is implemented with NAND gates and consists of four delay paths selected by the SAR-based delay search. Similar to the SAR algorithm used in the frequency search, a conventional SAR delay-search block is implemented here. When the reference clock is high, the block sets the MSB to 1 and resets the divider and the oscillator. When the reference clock is low, a second detector, BBPHD2, compares the falling edges of the delayed reference clock and the divider clock to determine the MSB. The block then resolves the remaining bits with the same algorithm and starts the SAR-based frequency search when it is done.

Figure 8 Delayed reference clock design based on SAR algorithm

This extra delay line does add phase noise to the clock, which can be partly filtered out by the low-pass response of the ADPLL. In the proposed architecture shown in Figure 3, MUX1 selects whether the input clock is the original reference clock or its delayed version after both the delay search and the frequency search are complete. If the ADPLL bandwidth is large and the CPU clock-jitter requirement is strict, the original reference clock should be selected as the input clock. In this case, the ADPLL still needs several cycles to compensate the phase error caused by the divider and oscillator delay before it resolves the remaining frequency error. On the other hand, if the ADPLL bandwidth is small and the CPU demands fast locking during frequency scaling to improve its power-performance, the delayed reference clock should be selected: once the frequency search is done, the phase error between the divider and the reference clock is almost negligible, and the ADPLL can correct the remaining frequency error immediately.

C. Digitally-controlled Oscillator Design

In [3], a three-stage ring oscillator is implemented with tri-state buffers and achieves a wide frequency range: its maximum frequency is determined by the delay of a single tri-state buffer when all the tri-state buffers are on, while its minimum frequency is determined by the number of tri-state buffers on the ring, because buffers that are turned off act as capacitive loads on the oscillator path and lower the frequency. However, this type of three-stage ring oscillator does not reach full swing at its output when it drives capacitive loads such as a divider or even an inverter. Therefore, a five-stage tri-state-buffer ring oscillator is implemented, as shown in Figure 9, consisting of 960 tri-state buffers controlled by a thermometer code. To reduce the complexity of the binary-to-thermometer decoder, the buffers are grouped into 64 rows of 15 tri-state buffers each. A 64-bit row thermometer code, converted from a 6-bit binary code, selects the rows, and within each row group the row decoder determines, based on a 16-bit thermometer code, whether the group is fully on, fully off, or partially on. When the ring oscillator is off, its internal nodes are undriven and, because of leakage, may not stay at either Vdd or ground, which slows down the DCO start-up. Therefore, five extra tri-state buffers are added to drive each node to either 1 or 0 during reset. The top 4 rows of the DCO are always on after the frequency search, which sets the minimum frequency and the worst-case resolution. Since the maximum frequency is limited only by the delay of the tri-state buffer cells, and the minimum frequency is limited by the always-on buffers in the top rows driving all the other buffers, this DCO achieves a wide frequency range (0.42-12GHz).

Figure 9 Five-stage DCO using tri-state buffers

D. PI Controller

After the SAR frequency search, both the frequency and phase errors are small, which allows the ADPLL to use much smaller proportional- and integral-path gains to reduce the jitter caused by quantization noise. However, the DCO resolution sets a floor on this quantization noise. For stability, the ratio of the proportional-path gain (β) to the integral-path gain (α) must exceed a certain value. If α is set to one DCO resolution step, β must be several times larger, which results in large quantization noise. Keeping the DCO resolution small, in turn, requires a large number of tri-state buffers to be on, which results in high power consumption. To operate close to the minimum resolution, the integral-path code in this ADPLL increments or decrements by 1 only when its accumulated count reaches a multiple of 4, which creates an equivalent integral-path gain of α = 1/4. The proportional-path gain is chosen as β = 2. As a result, the largest step occurs when the integral path increments by 1 and the proportional path contributes 2, i.e., 3K_DCO, where K_DCO is the DCO resolution.

IV. SIMULATION RESULTS

The proposed ADPLL is implemented with the Nangate Open Cell Library in 45nm CMOS technology. It consumes 16mW when operating at 4.5GHz with a 1.0V supply. The simulated locking procedure is shown in Figures 6 and 7. As shown in Figure 6, the first two cycles are used for the SAR delay search; since the DCO control word is 10 bits long, the next 10 cycles are spent searching for the frequency.
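Once the search cycles end, tracking is handed to the PI controller of Section III-D. The Python sketch below is a behavioral model of that controller under one interpretation of the text: α = 1/4 is realized with two fractional accumulator bits, so the integral code moves by 1 only when the accumulator crosses a multiple of 4. The class name `BangBangPI` and the starting code of 360 are hypothetical; the gains β = 2 and α = 1/4 come from the text.

```python
class BangBangPI:
    """Hypothetical PI loop filter with beta = 2 and alpha = 1/4.

    alpha = 1/4 is modeled with two fractional accumulator bits, so the
    integral code changes by 1 only when the accumulator crosses a
    multiple of 4.
    """

    FRAC_BITS = 2  # alpha = 2**-FRAC_BITS = 1/4
    BETA = 2       # proportional-path gain

    def __init__(self, sar_code: int) -> None:
        # The SAR frequency-search result seeds the integral path.
        self.acc = sar_code << self.FRAC_BITS

    def step(self, e: int) -> int:
        """Advance one reference cycle; e = +1 (UP) or -1 (DN).
        Returns the DCO control code."""
        if e not in (1, -1):
            raise ValueError("BBPHD output must be +1 or -1")
        self.acc += e                           # integral path: 1/4 code per cycle
        integral = self.acc >> self.FRAC_BITS   # floor division by 4
        return integral + self.BETA * e         # add the proportional path


pi = BangBangPI(sar_code=360)
print([pi.step(+1) for _ in range(4)], pi.step(-1))  # [362, 362, 362, 363] 358
```

In this model the proportional path dominates the code change from one cycle to the next, while the integral code drifts by only one DCO step per four same-direction decisions, which keeps the center-frequency code stable once the SAR search has placed it close to the target.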
After this search, as shown in Figure 7, the ADPLL corrects the remaining frequency and phase error and finally locks at 790ns. Figure 10 shows the jitter histogram observed over 5000 cycles. The root-mean-square and peak-to-peak jitter values are 1.32ps and 4.56ps, respectively. Table 1 summarizes the process technology, locking time, rms jitter, core area, and power consumption of recently reported ADPLL designs and compares them with the proposed SAR-based ADPLL implemented with the 45nm Nangate Open Cell Library. With the proposed SAR-search algorithm, this work locks faster than the designs in Table 1 that report a locking time (3.5µs in [8] and an estimated 46µs in [2]).

Figure 10 Output clock jitter histogram

Table 1 Performance Comparisons of Various ADPLL Implementations
Design | CMOS Process | Core Area | Power | Output Range | Locking Time | Jitter RMS | Jitter Peak-to-Peak
[10] Hsu | 0.18µm | 0.14mm² | 26.7mW @ 600MHz | 62-616MHz | N/A | 7.28ps @ 600MHz | 56ps @ 600MHz
[8] Kim | 0.13µm | 0.2mm² | 16.5mW @ 1.35GHz | 0.3-1.4GHz | 3.5µs | 3.7ps @ 1.35GHz | 32ps @ 1.35GHz
[9] Chung | 65nm | 0.07mm² | 1.81mW @ 520MHz | 90-527MHz | N/A | 8.64ps @ 527MHz | N/A
[2] Tierno | 45nm | 0.07mm² | N/A | 0.8-12GHz | 46µs* | 1ps @ 5GHz | N/A
This Work | 45nm | N/A | 16mW @ 4.5GHz | 0.42-12GHz | 790ns | 1.32ps @ 4.5GHz | 4.56ps @ 4.5GHz
*Estimated from plot

V. CONCLUSION

In this work, we proposed a novel successive-approximation algorithm that improves the frequency and phase locking time of a BBPHD-based ADPLL without much sacrifice in output clock jitter. The 790ns locking time demonstrates that the design is suitable for microprocessors that need to switch frequency during dynamic frequency scaling. Although this work achieves fast locking, the CPU still has to stall during the locking process, because the oscillator is reset and the frequency changes dramatically during the SAR search. Future work will therefore target an ADPLL that achieves both fast locking and a smooth frequency transition in the DCO.

REFERENCES
[1] R. B. Staszewski et al., "All-digital PLL and transmitter for mobile phones," IEEE Journal of Solid-State Circuits, vol. 40, pp. 2469-2482, Dec. 2005.
[2] J. Tierno et al., "A DPLL-based per Core Variable Frequency Clock Generator for an Eight-Core POWER7™ Microprocessor," Symposium on VLSI Circuits Digest of Technical Papers, pp. 85-86, Jun. 2010.
[3] A. Rylyakov et al., "A Wide Tuning Range (1GHz-to-15GHz) Fractional-N All-Digital PLL in 45nm SOI," Proc. CICC, pp. 431-434, Jun. 2008.
[4] M. Chan and A. Postula, "Transient analysis of bang-bang phase locked loops," IET Circuits, Devices & Systems, vol. 3, pp. 76-82, 2009.
[5] S. Y. Yang et al., "A 7.1mW, 10GHz All Digital Frequency Synthesizer With Dynamically Reconfigured Digital Loop Filter in 90 nm CMOS Technology," IEEE Journal of Solid-State Circuits, vol. 45, pp. 578-586, Mar. 2010.
[6] N. Da Dalt, "A Design-Oriented Study of Nonlinear Dynamics of Digital Bang-Bang PLLs," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 2, pp. 181-185, Feb. 2007.
[7] C. C. Hung et al., "A 40-GHz Fast-Locked All-Digital Phase-Locked Loop Using a Modified Bang-Bang Algorithm," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 58, pp. 321-325, Jun. 2011.
[8] D.-S. Kim et al., "A 0.3-1.4 GHz All-Digital Fractional-N PLL With Adaptive Loop Gain Controller," IEEE Journal of Solid-State Circuits, vol. 45, pp. 2300-2311, Nov. 2011.
[9] C.-C. Chung et al., "A Fast Tracking ADPLL for Video Pixel Clock Generation in 65nm CMOS Technology," IEEE Journal of Solid-State Circuits, vol. 40, pp. 2300-2311, Oct. 2011.
[10] H.-J. Hsu et al., "A Low-Jitter ADPLL via a Suppressive Digital Filter and an Interpolation-Based Locking Scheme," IEEE Trans. on VLSI Systems, vol. 19, pp. 165-170, Jan. 2011.