A Fast-Locked All-Digital Phase-Locked Loop for
Dynamic Frequency Scaling
Dian Huang and Ying Qiao
Abstract— All-Digital Phase-Locked Loop (ADPLL) for digital
system clock generation is widely studied to replace the
traditional analog PLL as the process technology enters the
nanometer regime. Much research has been devoted to
reducing the frequency/phase locking time and mitigating the
output clock jitter. This paper presents a novel
successive-approximation frequency-search algorithm that centers
the starting frequency of the digitally controlled oscillator
(DCO) on the target frequency while aligning the phase of the
output clock to that of the reference clock. With small design
complexity overhead, the ADPLL implemented with the 45nm Nangate
open cell library shows a period jitter of 4.56ps (peak-to-peak)
and 1.32ps (rms), with a 790ns locking time at 4.5GHz under a
1.0V supply voltage in simulation.
Index Terms— all-digital phase-locked loop (ADPLL), bang-bang
phase detector (BBPHD), successive approximation registers (SAR)
I. INTRODUCTION
A digital system such as a system-on-chip microprocessor
generally requires a phase-locked loop (PLL) for clock
generation. However, a traditional analog PLL typically
contains several important components not included in a static
CMOS standard cell library, such as resistors and low-leakage
capacitors, which makes it difficult to integrate into a
digital system and to remap when porting to new
technologies [1]. Furthermore, the yield and performance
spreads of the analog PLL may not be well correlated with those
of the digital designs on chip. In recent years, IC process
technology scaling has therefore enabled the practical
implementation of the all-digital phase-locked loop (ADPLL).
On the other hand, dynamic per-core clock frequency
scaling [2] has become critical for power management in
state-of-the-art multi-core microprocessors. Because the
microprocessor commonly pauses computing and waits for the clock
to be ready during dynamic frequency scaling, fast-locked ADPLLs
have received growing attention among design houses, in
addition to the conventional focus on jitter performance.
Various approaches to fast-locked ADPLLs have been
proposed for microprocessor clock generation under dynamic
frequency scaling. These designs can be divided into two
categories: simultaneous versus detangled frequency and phase
locking.
ADPLL designs with detangled frequency and phase
locking [9] use separate schemes for frequency acquisition and
phase locking. In the frequency tracking stage, the bang-bang
phase detector (BBPHD) detects the phase error between the
reference clock and the output clock and outputs an UP/DN signal
that adjusts the DCO output frequency accordingly.
When the BBPHD outputs an UP-DN pair, the frequency has
settled within the current step size of the DCO control
code, and the search then moves to the next significant bit;
this is known as the binary search algorithm (BSA).
During the phase tracking stage, once the frequency has
settled, a time-to-digital converter (TDC) senses the remaining
phase error and generates a DCO control code that adjusts the
period of several DCO output cycles to align the phase. The
frequency then returns to the originally settled value once the
phase error is eliminated.
However, although the BSA is easy to implement, it forces a
tradeoff between locking time and frequency range, because
tlock is proportional to log2(frange). Moreover, a
high-resolution TDC suffers from design complexity and large area.
One of the key issues that slows down PLL locking is that,
even when the PLL already outputs the correct frequency, a
large phase error still forces it to adjust this frequency.
That is, when the output frequency is close to the locking
frequency, the ADPLL must move its frequency away from the
target in order to cancel the phase error, which dramatically
increases the locking time. A modified bang-bang algorithm [7]
has been proposed to resolve this issue. In order to correct a
large phase error quickly, this algorithm stores the
proportional path code into the integral path, so that the DCO
frequency changes by a large amount. As soon as the bang-bang
phase detector outputs an UP-DN pair, the phase adjustment
ends, the proportional path code stored in the integral path is
cleared, and the ADPLL returns to the original frequency. When
the phase error is small enough, the algorithm is disabled and
the ADPLL works the same as a conventional BBPHD ADPLL.
This algorithm does improve the locking time. However, the
ADPLL still suffers from the tradeoff between locking time
and jitter, because the algorithm only works when the DCO
frequency is close to the target frequency. It is therefore not
suitable for microprocessors that demand a wide range of
frequency scaling.
To overcome this issue, an adaptive loop-gain fast-locked
algorithm is proposed in [5]. In this algorithm, the average
values of the BBPHD signals are passed through leaky
integrators, which form a filter that dynamically adjusts the
loop gain of the system. When the frequency error is
large, the gain is increased for a shorter locking time; when
the error is small, the loop gain is reduced to a very small
value for low jitter. However, the actual DCO gain needs
to be estimated properly a priori, which is unreliable under
process and temperature variations. Even though this issue is
resolved in [5] with a dynamically reconfigured digital loop
filter, that filter significantly increases the design complexity.
In this work, we propose a BBPHD-based ADPLL with a
novel successive-approximation frequency-search algorithm that
not only centers the starting frequency of the DCO on the desired
frequency but also aligns the phase of the output clock to that
of the reference clock. The algorithm first determines how
much delay on the reference clock is needed to align the
falling edge of the divider output clock with that of the
reference clock; it then searches for the center frequency with
successive-approximation registers (SAR). Finally, the
conventional ADPLL technique resolves the remaining frequency
and phase error.
This paper is organized as follows. Section II discusses
the design considerations of the bang-bang ADPLL.
The proposed ADPLL architecture is presented in Section III.
Section IV shows the simulation results of the implemented
ADPLL, followed by conclusions in Section V.
II. FAST-LOCKED ADPLL DESIGN CONSIDERATIONS
A conventional bang-bang ADPLL, whose architecture is
shown in Figure 1, has to adjust the output frequency and
phase through the binary UP/DN bang-bang code. This leads to a
long locking time and may cause stability problems if the
output frequency is far from the center reference frequency.
In general, the locking time of a conventional bang-bang
ADPLL depends strongly on the system bandwidth, because
the gain of the system determines how much phase error the
PLL can correct in each cycle [4]. For a first-order system,
the locking time can be expressed as:
$$t_{lock} = \frac{\pi}{2\pi \times \dfrac{\beta k_{vco} - f_{off}}{f_{ref}}} \times \frac{1}{f_{ref}} \tag{1}$$
where f_ref is the reference clock frequency, β is the
proportional gain, and f_off is the initial frequency error.
When βk_vco, the gain of the system, is smaller than the initial
frequency error, the system becomes unstable and can never
settle. Given this lower bound on the proportional gain of a
first-order bang-bang ADPLL, a large gain is favored to prevent
instability. However, a larger β induces a wider
frequency step of the DCO, which is proportional to the jitter
value. Thus, an integral path with smaller gain is added to
decouple the conflict, as shown in Figure 1. However, if the
CPU wants to switch to a much lower or higher frequency
during frequency scaling, a large α value is needed to adjust
the integral path code, which stores the center frequency code
of the DCO, quickly. This integral path also contributes
to the output jitter when a large α value is applied. As shown
in [6], the peak-to-peak output jitter is related to both the
integral path gain and the proportional path gain:
$$\Delta t_{pp} = \frac{N k_{vco}}{4q^{2}} \left( (1+D)^{4}\alpha^{3} + 4(1+D)^{3}\alpha^{2} q + 8(1+D)^{2}\alpha q^{2} + 8(1+D) q^{3} \right) \tag{2}$$
$$q = \beta - \frac{\alpha (1+2D)}{2} \tag{3}$$
where N is the division ratio, D is the loop delay in cycles,
and α is the integral path gain. As shown in Figure 2, the
output jitter is directly related to the q value above.
Figure 1 Conventional bang-bang ADPLL Architecture
Figure 2 Relationship of output jitter Δtpp and path gains
III. SAR-BASED ADPLL
As a result, there is a tradeoff between the locking time,
which determines the number of stalled instructions, and the
clock jitter performance. As discussed in Section I,
recently reported ADPLLs share common tradeoffs. The
major issue is that while the ADPLL is correcting the frequency
discrepancy, it is not reducing its phase error; conversely, it
cannot reduce its phase error without changing the locked
frequency. Moreover, if the frequency error is large, cycle
slipping occurs. In that case the system corrects neither the
frequency nor the phase error directly; it finds the center
frequency only by averaging many cycles of the UP/DN signal
from the BBPHD. Therefore, to circumvent these inherent
conflicts, this work proposes a novel successive-approximation
frequency-search algorithm that not only centers the starting
frequency of the DCO on the desired frequency but also aligns
the phase of the output clock to that of the reference clock.
The architecture of the proposed ADPLL is illustrated in
Figure 3. It is composed of a bang-bang phase detector
(BBPHD), a PI digital loop filter, a divider, a five-stage
digitally controlled oscillator (DCO), a SAR delay-search block
that determines the amount of delay needed for phase alignment,
and a SAR frequency-search block that determines the initial
control code of the DCO.
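To make the jitter side of this tradeoff concrete, the sketch below evaluates Equations (2) and (3) of Section II numerically. It is only an illustration under stated assumptions: the function name `normalized_pp_jitter` is ours, the result is normalized by N·k_vco (so it is a relative number, not picoseconds), and the loop delay D = 1 used in the loop is an assumed value, not one reported in this paper.

```python
def normalized_pp_jitter(alpha, beta, delay):
    """Return Delta t_pp / (N * k_vco) as read from Equations (2) and (3).

    alpha: integral path gain, beta: proportional path gain,
    delay: loop delay D in cycles (assumed value supplied by the caller).
    """
    q = beta - alpha * (1 + 2 * delay) / 2          # Equation (3)
    if q == 0:
        raise ValueError("q must be nonzero, it appears in the denominator")
    d1 = 1 + delay
    poly = (d1**4 * alpha**3
            + 4 * d1**3 * alpha**2 * q
            + 8 * d1**2 * alpha * q**2
            + 8 * d1 * q**3)
    return poly / (4 * q**2)                        # Equation (2) without N * k_vco


# Sweep the proportional gain at a fixed integral gain (alpha = 1/4, D = 1 assumed).
for beta in (2, 4, 8):
    print(beta, normalized_pp_jitter(alpha=0.25, beta=beta, delay=1))
```

With α = 0 and D = 0 the expression reduces to 2β, the peak-to-peak swing of a purely proportional bang-bang loop, which is a quick sanity check on the reconstruction. Larger β values in the sweep give larger normalized jitter, which is why a small β is only usable once the starting frequency is already close to the target.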
[Figure 3 block labels: Ref_clk, DCDL, Ref_clkd, MUX1, MUX2, MUX3, BBPHD, PI controller (α, β), DCO, divide by 16, SAR delay-search, SAR frequency-search, Clock_out]
Figure 3 Top level block diagram of the proposed ADPLL
A. SAR-based Frequency-Search
In dynamic frequency scaling applications, when the
microprocessor starts the adjustment, it generally (but not
always) stalls the instructions and waits for the ADPLL to
relock. This is because, during the adjustment stage, the clock
frequency changes quickly and exhibits large jitter. Thus,
when the CPU undergoes frequency scaling, it sends a trigger
signal that resets the ADPLL and starts its frequency
initialization. In this mode, MUX2 selects the output
code from the frequency-search block instead of the PI digital
filter.
The frequency-search block first initializes the most
significant bit (MSB) of the DCO binary code to 1 and
keeps the remaining bits at 0. When the reference clock is
high, the DCO, consisting of tri-state buffers, is disabled
with all buffers turned off, and the divider is reset to output
Vdd. At the falling edge of the reference clock, the divider is
activated and starts falling from Vdd. MUX2 then passes the
DCO thermometer code, which also comes from the
frequency-search block, and enables the DCO. As shown in
Figure 5, with divide-by-16 circuitry, the divider output signal
starts rising after 8 cycles of DCO output. By identifying
whether the divider clock signal rises earlier than the
reference, the BBPHD can decide whether the MSB should be 1
or 0. When the reference clock is high again, the divider and
DCO are reset again, and the frequency-search block sets the
MSB based on this BBPHD decision. In the next round, the
frequency-search block sets the next bit [MSB-1] to 1 and
repeats the above process to decide whether [MSB-1] should be
1 or 0.
Therefore, with the frequency search based on this
successive-approximation algorithm, illustrated in Figure 4, the
frequency-search block approximates the final DCO control code
after N reference cycles, where N is the DCO binary code
word length (that is, log2 of the number of DCO codes). The
hardware that implements this algorithm is mainly composed of
shift registers, so its digital complexity is low. After the
algorithm finishes, the code is passed to the integral path as
the initial frequency code. Once the least significant bit (LSB)
has been resolved, the remaining phase error between the divider
output clock and the reference clock is tiny, because they
fall at almost the same time with frequencies close to each
other. As a result, the ADPLL can lock in a short amount of
time.
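The bit-by-bit search described above can be sketched as a small behavioral model. This is a minimal sketch and not the authors' RTL: `sar_frequency_search` and `linear_dco` are hypothetical names, the DCO is assumed to have a linear, monotonically increasing code-to-frequency curve over 0.42-12GHz, and the BBPHD decision is replaced by a direct comparison of the divided DCO frequency against the reference frequency.

```python
def sar_frequency_search(n_bits, dco_freq, f_ref, div_ratio=16):
    """Return the DCO code found by an MSB-first SAR search.

    dco_freq: hypothetical callable mapping an integer code to a DCO
    frequency in Hz, assumed to increase monotonically with the code.
    """
    code = 0
    for bit in reversed(range(n_bits)):   # one reference cycle per bit
        trial = code | (1 << bit)         # tentatively set the current bit
        # Stand-in for the BBPHD decision: the divider output rises
        # earlier than the reference when the divided clock is faster.
        divider_early = dco_freq(trial) / div_ratio > f_ref
        if not divider_early:             # DCO not too fast: keep the 1
            code = trial
    return code


# Toy linear DCO with a 10-bit code (assumed transfer curve).
F_MIN, F_MAX, N_BITS = 0.42e9, 12e9, 10
STEP = (F_MAX - F_MIN) / (2**N_BITS - 1)


def linear_dco(code):
    return F_MIN + code * STEP


target = 4.5e9
found = sar_frequency_search(N_BITS, linear_dco, f_ref=target / 16)
print(found, linear_dco(found))
```

In this model the search uses one comparison per bit, so a 10-bit code is resolved in 10 trials, and the returned code is the largest one whose frequency does not exceed the target. The residual error is therefore below one DCO step, which is what lets the PI loop take over with small gains.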
[Figure 4 flowchart steps: set DCO[MSB]=1; when ref_clk=0, activate DCO & divider; the decision Tref_clk > Tdivider? selects 1->MSB or 0->MSB; when ref_clk=1, deactivate DCO & divider; set DCO[MSB-1]=1 and repeat the same decision for [MSB-1], and so on down to [LSB]; frequency search done]
Figure 4 Flowchart of DCO frequency-search based on SAR algorithm
Figure 5 Timing diagram of SAR-based frequency-search
DCO is activated when reference clock is low; BBPHD detects the rising edge of divider and reference clock
Figure 6 Frequency vs Time during frequency-search.
The first two cycles detect delay; the remaining cycles detect frequency
Figure 7 Locking procedure of the proposed ADPLL
B. SAR-based Delay-Search
The SAR-based frequency search works well for a fast-locked
ADPLL provided that one issue, discussed in this subsection, is
solved first. Since the DCO and divider are re-activated when
the reference clock falls, in a real implementation there is a
delay between the falling edge of the reference clock and the
re-activation trigger signal, as well as a delay between this
trigger signal and the divider output clock. In addition, the
DCO ring oscillator needs a certain amount of time to start up,
so the divider output clock always falls several hundred
picoseconds after the reference clock. However, since the BBPHD
makes its decision based only on the rising edges of the two
clocks and ignores the difference between their falling edges,
this offset causes a significant frequency error in the
frequency-search stage. Therefore, to compensate for this delay
and determine the final DCO frequency accurately, extra delay
is added to the reference clock.
In the implementation, during frequency-search mode, the
BBPHD actually compares against a delayed version of the
reference clock, REF_CLKD in Figure 8, instead of the
reference clock itself, REF_CLK in Figure 8. Since this delay
is sensitive to process and temperature variations, a digitally
controlled delay line (DCDL) is employed to adjust the
amount of delay needed to align the falling edge of the
reference clock with that of the divider output clock.
As shown in Figure 8, this DCDL is implemented with
NAND gates and consists of four different delay paths
controlled by the SAR-based delay search. Similar to the SAR
algorithm used in the frequency search, a conventional SAR
delay-search block is implemented here. When the reference
clock is high, it sets the MSB to 1 and resets the divider and
oscillator. When the reference clock is low, another detector,
BBPHD2, compares the falling edges of the delayed reference
clock and the divider clock to determine the MSB. It then
resolves the remaining bits with the same algorithm, and the
SAR-based frequency search begins once the delay search is
done. This extra delay line does generate additional phase
noise on the clock, which can be filtered out to some extent by
the low-pass characteristic of the ADPLL.
In the proposed architecture shown in Figure 3, MUX1 is
used to select whether the input clock after both delay-search
and frequency-search should be the original reference clock or
its delayed version. If the ADPLL system bandwidth is
large and the CPU clock jitter requirement is strict, then the
original reference clock should be selected as the input clock.
In this situation, the ADPLL still needs several cycles to
compensate for the phase error due to divider and oscillator
delay before it resolves the remaining frequency error. On the
other hand, if the ADPLL system bandwidth is small and the CPU
demands fast locking during frequency scaling to improve its
power performance, then the delayed version of the reference
clock should be selected. In this case, once the frequency
search is done, the phase error between the divider and the
reference clock is almost negligible, and the ADPLL can correct
the remaining frequency error right away.
[Figure 8 signal labels: REF_CLK, REF_CLKD, DELAY[0] to DELAY[3], DELAYN[0] to DELAYN[2], Vdd]
Figure 8 Delayed reference clock design based on SAR algorithm
C. Digitally-controlled Oscillator Design
In [3], a three-stage ring oscillator is implemented with
tri-state buffers. It achieves a wide frequency range because
its maximum frequency is determined by the delay of a single
tri-state buffer when all the tri-state buffers are on. Its
minimum frequency is determined by the number of tri-state
buffers on the ring oscillator, because buffers that are off
act as on-path capacitances that lower the oscillation
frequency.
However, this type of three-stage ring oscillator does not
reach full swing at the output when it drives capacitive gates
such as a divider or even an inverter. Therefore, a five-stage
tri-state-buffer-based ring oscillator is implemented as shown
in Figure 9, which consists of 960 tri-state buffers controlled
by thermometer code. In order to reduce the complexity of the
binary-to-thermometer decoder, the buffers are grouped into
rows of 15. The rows are selected by a 64-bit thermometer code
converted from a 6-bit binary code, and within a row the row
decoder controls whether the group is all on, all off, or
partially on based on a 16-bit thermometer code. When the ring
oscillator is off, the internal nodes are not driven and may
not stay at either Vdd or gnd due to leakage, which would
reduce the DCO start-up speed. Therefore, five extra tri-state
buffers are added to drive each node to either 1 or 0
during reset. The top 4 rows of this DCO are always on
after frequency-search, which sets the minimum frequency
and worst-case resolution. Since the maximum frequency is
limited only by the delay of the tri-state buffer cells, and the
minimum frequency is limited by the always-on buffers in the
top rows driving all the other buffers, this DCO achieves a wide
frequency range (0.42-12GHz).
[Figure 9 labels: row control, 64 bits]
Figure 9 Five-stage DCO using tri-state buffers
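The two-level thermometer decoding of the DCO in Section III-C can be illustrated with a short sketch. The split used here is an assumption and not stated in the text: the upper 6 bits of a 10-bit code select how many of the 64 rows are fully on, and the lower 4 bits select how many of the 15 buffers in the next row are on. The names `decode` and `buffers_on` are hypothetical, and the in-row code is modeled with one bit per buffer (15 bits) rather than the 16-bit code mentioned in the text.

```python
ROWS, BUFFERS_PER_ROW = 64, 15   # 64 x 15 = 960 tri-state buffers


def decode(code):
    """Split a 10-bit DCO code into row and in-row thermometer codes.

    Assumed split (not taken from the paper): upper 6 bits give the
    number of fully-on rows, lower 4 bits give the number of buffers
    turned on in the next, partially-on row.
    """
    if not 0 <= code < 1024:
        raise ValueError("code must fit in 10 bits")
    full_rows, partial = code >> 4, code & 0xF
    row_thermo = [1] * full_rows + [0] * (ROWS - full_rows)
    col_thermo = [1] * partial + [0] * (BUFFERS_PER_ROW - partial)
    return row_thermo, col_thermo


def buffers_on(code):
    """Count the tri-state buffers enabled by a given DCO code."""
    row_thermo, col_thermo = decode(code)
    return sum(row_thermo) * BUFFERS_PER_ROW + sum(col_thermo)


print(buffers_on(0), buffers_on(0x25), buffers_on(1023))
```

Under these assumptions the decoder needs only 64 row lines plus 15 in-row lines instead of one line per buffer, while the number of enabled buffers still never decreases as the code increases. The always-on top rows described in the text are not modeled here.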
D. PI Controller
After the SAR frequency search, both frequency and phase
errors are small, which allows this ADPLL to use much
smaller proportional and integral path gains to reduce
the jitter due to quantization noise. However, the resolution of
the DCO sets a limit on this quantization noise. For stability,
the ratio of the proportional path gain (β) to the
integral path gain (α) needs to be greater than a certain value.
If α is set to the DCO resolution, then β needs to be several
times larger, which results in large quantization noise. If the
DCO resolution is kept small, a large number of tri-state
buffers need to be on, resulting in high power consumption. In
order to operate close to the minimum resolution, the integral
path code in this ADPLL increments or decrements by 1 only when
its underlying count reaches a multiple of 4, which creates an
equivalent integral path gain of α=¼. The
proportional path gain is chosen to be β=2. As a result, the
largest step occurs when the integral path increments by 1 and
the proportional path contributes 2, which gives 3KDCO, where
KDCO is the resolution of the DCO.
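One way to realize the fractional integral gain described above is a small accumulator in front of the integral code. The sketch below is a behavioral model under our own reading of the text, not the authors' implementation: the class name `BangBangPI`, the `acc` counter, and the rule that the counter resets after every fourth net UP or DN decision are all assumptions.

```python
class BangBangPI:
    """Behavioral PI loop filter with beta = 2 and an effective alpha = 1/4."""

    def __init__(self, initial_code, beta=2, alpha_div=4):
        self.integral = initial_code   # seeded by the SAR frequency search
        self.acc = 0                   # sub-LSB counter for the integral path
        self.beta = beta
        self.alpha_div = alpha_div

    def update(self, up):
        """Consume one BBPHD decision (True = UP, False = DN); return the DCO code."""
        sign = 1 if up else -1
        self.acc += sign
        if abs(self.acc) == self.alpha_div:   # every 4th net decision moves the integral code
            self.integral += sign
            self.acc = 0
        return self.integral + sign * self.beta


pi = BangBangPI(initial_code=360)
codes = [pi.update(True) for _ in range(4)]
print(codes, pi.integral)
```

In this model, four consecutive UP decisions starting from code 360 produce the codes 362, 362, 362, 363: the last value is three DCO steps above the starting integral code, matching the 3KDCO largest step stated in the text. An alternating UP/DN sequence leaves the integral code unchanged, so only the proportional path dithers around lock.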
IV. SIMULATION RESULTS
The proposed ADPLL is implemented with the Nangate Open
Cell Library in 45nm CMOS technology. It consumes
16mW when operating at 4.5GHz with a 1.0V supply voltage. The
simulated locking procedure is shown in Figure 6 and Figure 7.
As shown in Figure 6, the first two cycles are used for the SAR
delay search. Since the word length of the DCO code is 10, the
ADPLL then spends 10 cycles searching for the frequency. After
this search is done, as shown in Figure 7, it corrects the
remaining frequency and phase error and finally locks at 790ns.
Figure 10 shows the jitter histogram observed over 5000 cycles.
The root-mean-square and peak-to-peak jitter values are 1.32ps
and 4.56ps, respectively.
Table 1 summarizes the process technology, locking time, RMS
jitter, core area and power consumption of recently reported
ADPLL designs. These data are compared with our proposed
SAR-based ADPLL implementation using the 45nm Nangate
Open Cell Library. With the proposed SAR-search algorithm,
this work locks faster than the designs in Table 1 that report
a locking time (790ns versus 3.5µs for [8] and an estimated
46µs for [2]).
V. CONCLUSION
In this work, we proposed a novel successive-approximation
algorithm that improves the frequency and phase
locking time of a BBPHD-based ADPLL without sacrificing much
output clock jitter. The 790ns locking time
demonstrates that such a design is suitable for microprocessors
that need to switch to a different frequency during dynamic
frequency scaling.
Although this work achieves fast locking, it still needs to
stall the CPU during the locking process, because the oscillator
is reset and the frequency changes dramatically during the SAR
search. Therefore, future work is to design an ADPLL
that achieves both fast locking and a smooth frequency
change in the DCO.
Figure 10 Output clock jitter histogram
Table 1 Performance Comparisons of Various ADPLL Implementations
                    | [10] Hsu       | [8] Kim         | [9] Chung      | [2] Tierno | This Work
--------------------+----------------+-----------------+----------------+------------+----------------
CMOS Process        | 0.18µm         | 0.13µm          | 65nm           | 45nm       | 45nm
Core Area           | 0.14 mm2       | 0.2 mm2         | 0.07 mm2       | 0.07 mm2   | N/A
Power               | 26.7mW@600MHz  | 16.5mW@1.35GHz  | 1.81mW@520MHz  | NA         | 16mW@4.5GHz
Output Range        | 62~616MHz      | 0.3~1.4GHz      | 90~527MHz      | 0.8~12GHz  | 0.42~12GHz
Locking Time        | NA             | 3.5µs           | NA             | *46µs      | 790ns
Jitter RMS          | 7.28ps @600MHz | 3.7ps @1.35GHz  | 8.64ps @527MHz | 1ps @5GHz  | 1.32ps @4.5GHz
Jitter peak-to-peak | 56ps @600MHz   | 32ps @1.35GHz   | NA             | NA         | 4.56ps @4.5GHz
*Estimated from plot
REFERENCES
[1] R. B. Staszewski et al., “All-digital PLL and transmitter for mobile
phones,” IEEE Journal of Solid-State Circuits, vol. 40, pp. 2469-2482,
Dec. 2005.
[2] J. Tierno et al., “A DPLL-based per Core Variable Frequency Clock
Generator for an Eight-Core POWER7™ Microprocessor,” Symposium
on VLSI Circuit Digest of Tech Papers, pp. 85-86, Jun. 2010.
[3] A. Rylyakov et al., “A Wide Tuning Range (1GHz-to-15GHz)
Fractional-N All-Digital PLL in 45nm SOI,” Proc. CICC, pp. 431-434,
Jun. 2008.
[4] M. Chan, A. Postula, “Transient analysis of bang-bang phase locked
loops,” IET Circuits, Devices & Systems, vol. 3, pp. 76-82, 2009.
[5] S. Y. Yang et al., “A 7.1mW, 10GHz All Digital Frequency Synthesizer
With Dynamically Reconfigured Digital Loop Filter in 90 nm CMOS
Technology,” IEEE Journal of Solid-State Circuits, vol. 45, pp. 578-586,
Mar. 2010.
[6] N. D. Dalt, “A Design-Oriented Study of Nonlinear Dynamics of Digital
Bang-Bang PLLs,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54,
no. 2, pp. 181-185, Feb. 2007.
[7] C. C. Hung et al., “A 40-GHz Fast-Locked All-Digital Phase-Locked
Loop Using a Modified Bang-Bang Algorithm,” IEEE Transactions on
Circuits and Systems II: Express Briefs, vol. 58, pp. 321-325, June 2011.
[8] D.-S. Kim et al., “A 0.3-1.4 GHz All-Digital Fractional-N PLL With
Adaptive Loop Gain Controller,” IEEE Journal of Solid-State Circuits,
vol. 45, pp. 2300-2311, Nov. 2011.
[9] C.-C. Chung et al., “A Fast Tracking ADPLL for Video Pixel Clock
Generation in 65nm CMOS Technology,” IEEE Journal of Solid-State
Circuits, vol. 40, pp. 2300-2311, Oct. 2011.
[10] H.-J. Hsu et al., “A Low-Jitter ADPLL via a Suppressive Digital Filter
and an Interpolation-Based Locking Scheme,” IEEE Trans. on VLSI
Systems, vol. 19, pp. 165-170, Jan. 2011.