A DPLL-based per Core Variable Frequency Clock Generator for an Eight-Core POWER7™ Microprocessor
Jose Tierno¹, Alexander Rylyakov¹, Daniel Friedman¹, Ann Chen², Anthony Ciesla², Timothy Diemoz², George English², David Hui²,
Keith Jenkins¹, Paul Muench², Gaurav Rao², George Smith III², Michael Sperling², Kevin Stawiasz¹
¹IBM Research, ²Systems & Technology Group, IBM
IBM T. J. Watson Research Center
Yorktown Heights, NY 10598 USA
tierno@us.ibm.com, sasha@us.ibm.com, dfriedmn@us.ibm.com
Abstract
A per-core clock generator for the eight-core POWER7™
processor is implemented with a digital PLL. This frequency
generator is capable of smooth, controlled frequency slewing,
minimizing the impact of di/dt. Frequency can be dynamically
adjusted while the clock is running, and without skipping any
cycles, thus enabling aggressive power management
techniques.
Introduction
As the number of cores in modern processors keeps
growing, power management at the core level is becoming
increasingly important. One common method used to trade off
power against performance is changing the clock frequency of
the processor [1]. Because load varies over time, both within a
single core and across multiple cores, dynamically adjustable
per-core clock generators are highly desirable. To be practical
in a real system environment, such circuits must have low
area and power overhead. Moreover, to extract the full value
of per-core dynamic frequency synthesis, it must be
possible to change the frequency while the core is executing
code. This drives a set of requirements for the variable
frequency generator: a near-continuous set of achievable
output frequencies; a controllable frequency slew rate with no
skipped cycles, limiting the power supply drop caused by di/dt;
and no short cycles, preventing timing hazards. In this paper
we present a variable frequency generation circuit integrated in
each of the eight cores in the POWER7™ processor [2], which
meets all the requirements given above.
Architecture
The variable frequency generator is built around a
fractional-N DPLL [3]. Figure 1 shows the top-level block
diagram of the circuit.
The key components of the
DPLL-based generator include a bang-bang, self-timed phase
and frequency detector (PFD), a digital loop filter, a digitally
controlled oscillator (DCO), prescalers, a multi-modulus
feedback divider together with a delta-sigma modulator, and a
multiplier filter.
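To make the fractional-N principle concrete, the following Python sketch models how a delta-sigma modulator can dither the integer modulus of a multi-modulus divider. The paper does not state the modulator order, so a first-order (single-accumulator) modulator is assumed here, and the function name `mash1_moduli` and its parameters are hypothetical. Each reference cycle the fractional word is added to an accumulator; an overflow selects modulus N+1 instead of N, so the long-run average modulus equals the fractional target.

```python
def mash1_moduli(n_int: int, frac: int, frac_bits: int, cycles: int) -> list[int]:
    """Hypothetical first-order delta-sigma modulator for a multi-modulus divider.

    Returns one divider modulus per reference cycle. The average modulus is
    n_int + frac / 2**frac_bits (requires 0 <= frac < 2**frac_bits).
    """
    modulo = 1 << frac_bits
    if not 0 <= frac < modulo:
        raise ValueError("frac must satisfy 0 <= frac < 2**frac_bits")
    acc = 0
    moduli = []
    for _ in range(cycles):
        acc += frac                       # accumulate the fractional word
        carry = acc >= modulo             # overflow of the frac_bits-wide accumulator
        if carry:
            acc -= modulo                 # wrap, as a fixed-width register would
        moduli.append(n_int + int(carry)) # divide by N+1 on overflow, else by N
    return moduli


if __name__ == "__main__":
    seq = mash1_moduli(20, 64, 8, 8)      # fractional part 64/256 = 0.25
    print(seq, sum(seq) / len(seq))
```

For example, `mash1_moduli(20, 64, 8, 8)` yields the pattern 20, 20, 20, 21 twice, an average of 20.25. Higher-order modulators are commonly preferred because they push the resulting quantization noise to higher frequencies, where the loop filter attenuates it; which order this design uses is not specified.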
The output frequency of the DPLL can be changed
dynamically by controlling the modulus of the feedback
divider. The multiplier filter’s function is to generate a
sequence of modulus values resulting in a controlled,
programmable frequency slew rate. The multiplier filter block
diagram is shown in Figure 2. When a new target frequency
multiplier Mult_in and frequency slew rate Mult_slew are
programmed into the filter, a linear sequence of multipliers
Mult_out is generated by incrementing or decrementing the
internal Mult_acc register until its value matches the new
target multiplier. The slew rate of the Mult_out multiplier is
controlled by dividing down the clock applied to the multiplier
filter, according to the Mult_slew value.
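The ramp behavior described above can be sketched at the cycle level as follows. The sketch assumes that Mult_acc moves by one LSB per tick of the divided-down filter clock and that the divide ratio equals Mult_slew; the actual step size, register widths, and mapping from Mult_slew to the divide ratio are not given in the text, and `multiplier_ramp` is a hypothetical name.

```python
def multiplier_ramp(mult_acc: int, mult_in: int, mult_slew: int, clocks: int) -> list[int]:
    """Hypothetical model of the multiplier filter (ramp generator).

    Every mult_slew filter-clock cycles (the assumed divided-down clock),
    Mult_acc steps one LSB toward the target Mult_in. Returns Mult_out
    after each filter-clock cycle.
    """
    if mult_slew < 1:
        raise ValueError("mult_slew must be >= 1")
    out = []
    div_count = 0
    for _ in range(clocks):
        div_count += 1
        if div_count == mult_slew:        # tick of the divided filter clock
            div_count = 0
            if mult_acc < mult_in:        # comparator: a < b -> increment
                mult_acc += 1
            elif mult_acc > mult_in:      # comparator: a > b -> decrement
                mult_acc -= 1
        out.append(mult_acc)              # Mult_out follows Mult_acc
    return out


if __name__ == "__main__":
    print(multiplier_ramp(100, 104, 3, 15))
```

Under these assumptions the output slew rate is the output-frequency change produced by one multiplier LSB divided by Mult_slew filter-clock periods, so doubling Mult_slew halves the ramp rate. Once Mult_acc equals Mult_in, neither increment nor decrement is applied and the output holds.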
To increase the frequency tracking bandwidth of the clock
generator, and hence its tracking capability, the bang-bang
PFD must be modified. The modified
PFD is shown in Figure 3. The key new feature of the circuit is
its ability to detect cycle slips between the reference clock
Ref_Clk and the feedback clock FB_Clk. The cycle-slip
detector creates signals Ref_Faster / FB_Faster with duty
cycle proportional to the frequency difference between
reference and feedback clock. These signals are multiplied by
a very large gain in the loop filter, which allows the modified
DPLL to achieve frequency lock very quickly. These signals
are always de-asserted around phase lock, and therefore do not
perturb the stability of the locked clock generator.
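The loop-filter behavior described above can be sketched as a proportional-integral (PI) filter with an extra, much larger integral gain applied to the cycle-slip pulses. The gains `kp`, `ki`, and `kf`, the class name `PILoopFilter`, the sign convention (a larger control word means a faster DCO), and the choice to feed the slip pulses into the integrator are all assumptions for illustration; the text states only that the Faster signals are multiplied by a very large gain in the loop filter.

```python
from dataclasses import dataclass


@dataclass
class PILoopFilter:
    """Hypothetical PI loop filter: bang-bang phase input plus cycle-slip input."""
    kp: int        # proportional gain on the early/late decision (assumed)
    ki: int        # integral gain on the early/late decision (assumed)
    kf: int        # large integral gain on cycle-slip pulses (assumed, kf >> ki)
    integ: int = 0 # integrator state

    def step(self, early_late: int, ref_faster: bool, fb_faster: bool) -> int:
        """Process one PFD decision and return the DCO control word.

        early_late: +1 if the reference edge came first (speed up the DCO),
                    -1 if the feedback edge came first (slow it down).
        """
        slip = int(ref_faster) - int(fb_faster)  # +1, -1, or 0
        self.integ += self.ki * early_late + self.kf * slip
        return self.integ + self.kp * early_late


if __name__ == "__main__":
    f = PILoopFilter(kp=4, ki=1, kf=64)
    print(f.step(+1, False, False), f.step(+1, True, False), f.step(-1, False, False))
```

Near phase lock both slip inputs stay at zero, so only `kp` and `ki` set the loop dynamics in this model, consistent with the statement that the Faster signals do not perturb the locked loop.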
Bang-bang operation of the modified PFD is similar to that
reported in [5]. In the modified PFD, unlike the previously
reported BB-PFD design, cycle slips are detected for
downstream use. Note that when a cycle slip occurs, a second
edge of one of the clocks arrives without an intervening edge
of the other clock. In this case, within the modified PFD, the
second edge of the fast clock arrives when the corresponding
edge detector latch is still set, as Reset can only be asserted
after both clock edges have arrived. The corresponding Faster
signal will then be set, indicating which clock is running faster.
Because the Faster signals are asserted every time a
cycle slip happens, they are asserted with a frequency that is
proportional to the difference in frequency between the two
clocks. Once the DPLL is locked, the two clock frequencies
are the same, and the Faster signals are automatically
de-asserted.
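The cycle-slip mechanism can be checked with an event-level behavioral model. This sketch, with the hypothetical function `count_cycle_slips`, treats each edge-detector latch as a flag, clears both flags once both are set (the Reset condition), and records a slip when an edge arrives while its own latch is still set. It ignores the Mutex arbitration and all circuit delays, and it breaks exact ties by processing the feedback edge first, an arbitrary choice.

```python
def count_cycle_slips(ref_edges, fb_edges):
    """Hypothetical behavioral model of the cycle-slip detector.

    ref_edges, fb_edges: rising-edge times of Ref_Clk and FB_Clk.
    Returns (ref_faster_count, fb_faster_count).
    """
    # Merge both edge streams in time order; at equal times "fb" sorts first.
    events = sorted([(t, "fb") for t in fb_edges] + [(t, "ref") for t in ref_edges])
    latch = {"ref": False, "fb": False}  # edge-detector latches
    slips = {"ref": 0, "fb": 0}          # Ref_Faster / FB_Faster assertions
    for _, src in events:
        if latch[src]:
            # Second edge of this clock with no edge of the other clock in
            # between: its latch is still set, so its Faster signal fires.
            slips[src] += 1
        else:
            latch[src] = True
        if latch["ref"] and latch["fb"]:
            # Both edges have arrived: Reset clears both latches.
            latch["ref"] = latch["fb"] = False
    return slips["ref"], slips["fb"]


if __name__ == "__main__":
    ref = list(range(0, 11000, 10))  # period 10 time units
    fb = list(range(0, 11000, 11))   # period 11 time units
    print(count_cycle_slips(ref, fb))
```

With reference and feedback periods of 10 and 11 time units observed over 11,000 units, the model reports about 100 Ref_Faster events and none for FB_Faster, matching the frequency difference (1/10 - 1/11) × 11,000 = 100 slips; with equal periods it reports no slips at all.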
Measurements
Figure 4 shows measurements of the well-controlled
frequency transients of the variable clock generator
integrated within a POWER7™ core, for three different settings
of the slew rate. Figure 5 shows the final portion of a transient,
where three distinct regions are clearly identifiable: frequency
acquisition, phase acquisition, and phase lock. In the
frequency acquisition region, the generator output follows the
frequency multiplier ramp. Every time a cycle slip occurs
between the reference and the feedback clock, the output
frequency changes by about 30 MHz. Once the frequency
multiplier is stable, and the generator output is close to the
target frequency, cycle slips stop occurring. In this phase
acquisition region, the final phase and frequency are attained
using the integrator in the DPLL loop filter. Proper selection
of DPLL loop filter constants ensures that no overshoot or
undershoot in cycle time is present in the transient. The
demonstrated combination of a controlled frequency slew
rate and the absence of short cycles shows that the proposed
DPLL-based variable frequency generator can be used to
dynamically adjust the frequency while the core is executing
code.
The DCO has a measured tuning range from 800 MHz to
12 GHz over PVT. The DCO output is divided by two to
improve the duty cycle of the clock provided to the processor
core logic. RMS period jitter is 1 ps at 5 GHz and 6 ps at
1 GHz. The DPLL has an area of 200 μm x 350 μm, half of
which is occupied by the voltage regulator for the DCO.
Figure 6 shows a micrograph of the POWER7™ core, with the
size and position of the frequency generator outlined.
Acknowledgment: This material is based upon work
supported by the Defense Advanced Research Projects Agency
under its Agreement No. HR0011-07-9-0002.
References
[1] A. Allen, J. Desai, F. Verdico, F. Anderson, D. Mulvhill,
D. Kruger, "Dynamic Frequency-Switching Clock System on
a Quad-Core Itanium® Processor," in Proc. IEEE International
Solid-State Circuits Conference, Feb. 2008.
[2] R. Kalla, "POWER7: IBM's Next Generation POWER,"
Hot Chips 2009 Tech. Digest, Aug. 2009.
[3] J. A. Tierno, A. V. Rylyakov, D. J. Friedman, "A Wide
Power Supply Range, Wide Tuning Range, All Static
CMOS All Digital PLL in 65 nm SOI," IEEE Journal of
Solid-State Circuits, vol. 43, no. 1, pp. 42-51, Jan. 2008.
Figure 3 Self-timed bang-bang PFD with cycle-slip detector
Figure 1 Top-level block diagram of the DPLL
Figure 2 Frequency multiplier filter / ramp generator
Figure 4 Measured frequency transients (Frequency Ramp: frequency in GHz versus time in µs for slew-rate settings of 17.5, 70, and 140 MHz/µs)
Figure 5 Detailed measured transient showing cycle time (Tcycle, in ns) against time (in µs) at the end of a frequency ramp, with the frequency acquisition and phase acquisition regions and a DCO Tcycle step of about 2 ps marked
Figure 6 Micrograph of the IBM POWER7 processor core, showing the frequency generator (DPLL, 350 μm x 200 μm) in the lower left corner