A DPLL-based per Core Variable Frequency Clock Generator for an Eight-Core POWER7™ Microprocessor

Jose Tierno¹, Alexander Rylyakov¹, Daniel Friedman¹, Ann Chen², Anthony Ciesla², Timothy Diemoz², George English², David Hui², Keith Jenkins¹, Paul Muench², Gaurav Rao², George Smith III², Michael Sperling², Kevin Stawiasz¹
¹IBM Research, ²Systems & Technology Group, IBM
IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 USA
tierno@us.ibm.com, sasha@us.ibm.com, dfriedmn@us.ibm.com

Abstract
A per-core clock generator for the eight-core POWER7™ processor is implemented with a digital PLL. The frequency generator supports smooth, controlled frequency slewing, which minimizes the impact of di/dt. The frequency can be adjusted dynamically while the clock is running, without skipping any cycles, enabling aggressive power management techniques.

Introduction
As the number of cores in modern processors grows, power management at the core level becomes increasingly important. A common way to trade off power against performance is to change the clock frequency of the processor [1]. Because the load varies over time, both within a single core and across cores, dynamically adjustable per-core clock generators are highly desirable. To be practical in a real system, such circuits must have low area and power overhead. Moreover, to extract the full value of per-core dynamic frequency synthesis, it must be possible to change the frequency while the core is executing code. This imposes three requirements on the variable frequency generator: a near-continuous set of achievable output frequencies; a controllable frequency slew rate with no skipped cycles, which limits the supply voltage drop caused by di/dt; and no short cycles, which prevents timing hazards.
In this paper we present a variable frequency generation circuit, integrated in each of the eight cores of the POWER7™ processor [2], that meets all of the requirements listed above.

Architecture
The variable frequency generator is built around a fractional-N DPLL [3]. Figure 1 shows the top-level block diagram of the circuit. The key components of the DPLL-based generator are a self-timed bang-bang phase and frequency detector (PFD), a digital loop filter, a digitally controlled oscillator (DCO), prescalers, a multi-modulus feedback divider with a delta-sigma (ΔΣ) modulator, and a multiplier filter. The output frequency of the DPLL can be changed dynamically by controlling the modulus of the feedback divider.

The multiplier filter generates the sequence of modulus values that produces a controlled, programmable frequency slew rate. Its block diagram is shown in Figure 2. When a new target frequency multiplier Mult_in and frequency slew rate Mult_slew are programmed into the filter, the filter generates a linear sequence of multipliers Mult_out by incrementing or decrementing the internal Mult_acc register until its value matches the new target. The slew rate of Mult_out is set by dividing down the clock applied to the multiplier filter according to the Mult_slew value.

To increase the frequency tracking bandwidth, and hence the tracking capability, of the clock generator, the bang-bang PFD must be modified. The modified PFD is shown in Figure 3. Its key new feature is the ability to detect cycle slips between the reference clock Ref_Clk and the feedback clock FB_Clk. The cycle-slip detector generates signals Ref_Faster / FB_Faster whose duty cycle is proportional to the frequency difference between the reference and feedback clocks. These signals are multiplied by a very large gain in the loop filter, which allows the modified DPLL to achieve frequency lock very quickly.
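The multiplier filter, the ΔΣ-dithered divider modulus and the cycle-slip detector described above can be illustrated with a minimal behavioral sketch in Python. It is not the POWER7 implementation: the function names, the model of the filter clock divider as "one Mult_acc step every Mult_slew ticks", the fixed-point multiplier format, and the choice of a first-order ΔΣ modulator are all assumptions made for illustration (the text does not give the modulator order). Only the signal names Mult_in, Mult_slew, Mult_acc, Mult_out, Ref_Faster and FB_Faster come from the text.

```python
"""Behavioral sketch only; not the POWER7 implementation.

Hypothetical helpers illustrating three blocks described in the text:
the multiplier filter (ramp generator), a delta-sigma dither of the
feedback-divider modulus, and cycle-slip counting in the modified PFD.
"""


def ramp_multiplier(mult_start, mult_in, mult_slew, n_ticks):
    """Return Mult_out on every tick of the (undivided) filter clock.

    Assumption: the filter clock is divided by mult_slew, so Mult_acc moves
    one step toward the target Mult_in once every mult_slew ticks.
    """
    if mult_slew < 1:
        raise ValueError("mult_slew must be >= 1")
    mult_acc = mult_start
    mult_out = []
    for tick in range(n_ticks):
        if (tick + 1) % mult_slew == 0:   # edge of the divided-down clock
            if mult_acc < mult_in:
                mult_acc += 1             # ramp up toward the target
            elif mult_acc > mult_in:
                mult_acc -= 1             # ramp down toward the target
        mult_out.append(mult_acc)
    return mult_out


def delta_sigma_moduli(mult, frac_bits, n_cycles):
    """First-order delta-sigma (accumulator) dither of a fixed-point multiplier.

    Assumption: mult has frac_bits fractional bits. Each cycle the divider gets
    the integer part, plus 1 on accumulator overflow, so the average modulus
    equals mult / 2**frac_bits.
    """
    integer, frac = divmod(mult, 1 << frac_bits)
    acc = 0
    moduli = []
    for _ in range(n_cycles):
        carry, acc = divmod(acc + frac, 1 << frac_bits)  # carry is 0 or 1
        moduli.append(integer + carry)
    return moduli


def count_cycle_slips(t_ref, t_fb, duration):
    """Count cycle slips between two ideal clocks with integer periods.

    A slip is a second edge of one clock while its edge-detector latch is
    still set, i.e. with no intervening edge of the other clock.
    Coincident edges are processed FB first (an arbitrary tie-break here).
    Returns (ref_faster, fb_faster) slip counts.
    """
    events = sorted(
        [(t, 1) for t in range(t_ref, duration + 1, t_ref)]   # 1: Ref_Clk edge
        + [(t, 0) for t in range(t_fb, duration + 1, t_fb)]   # 0: FB_Clk edge
    )
    ref_set = fb_set = False              # edge-detector latches
    ref_faster = fb_faster = 0            # slip counters
    for _, is_ref in events:
        if is_ref:
            if ref_set:
                ref_faster += 1           # Ref edge while its latch is still set
            ref_set = True
        else:
            if fb_set:
                fb_faster += 1            # FB edge while its latch is still set
            fb_set = True
        if ref_set and fb_set:
            ref_set = fb_set = False      # Reset only after both edges arrive
    return ref_faster, fb_faster


if __name__ == "__main__":
    print(ramp_multiplier(10, 13, 2, 8))            # [10, 11, 11, 12, 12, 13, 13, 13]
    print(delta_sigma_moduli(10 * 16 + 4, 4, 8))    # [10, 10, 10, 11, 10, 10, 10, 11]
    print(count_cycle_slips(100, 101, 1_010_000))   # about 100 Ref_Faster slips, 0 FB_Faster
```

With reference and feedback periods of 100 and 101 time units, the counter reports about 100 Ref_Faster events over 1,010,000 units and no FB_Faster events, i.e. a slip rate close to |f_ref - f_fb|, which is the property the loop filter exploits for fast frequency acquisition. The FB-first tie-break for coincident edges stands in for the arbitration of simultaneous edges in the real circuit and is an assumption of this model.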
The Ref_Faster / FB_Faster signals are always de-asserted around phase lock and therefore do not perturb the stability of the locked clock generator. Bang-bang operation of the modified PFD is similar to that reported in [5]; unlike that previously reported BB-PFD design, however, the modified PFD also detects cycle slips for downstream use. When a cycle slip occurs, a second edge of one clock arrives without an intervening edge of the other clock. In the modified PFD, this second edge of the faster clock arrives while the corresponding edge-detector latch is still set, because Reset can only be asserted after both clock edges have arrived. The corresponding Faster signal is then set, indicating which clock is running faster. Because a Faster signal is asserted every time a cycle slip happens, it is asserted at a rate proportional to the frequency difference between the two clocks. Once the DPLL is locked, the two clock frequencies are equal and the Faster signals are automatically de-asserted.

Measurements
Figure 4 shows measurements of the well-controlled frequency transient of the variable clock generator integrated within a POWER7™ core, for three different slew-rate settings. Figure 5 shows the final part of the transient, where three distinct regions are clearly identifiable: frequency acquisition, phase acquisition, and phase lock. In the frequency acquisition region, the generator output follows the frequency multiplier ramp; every time a cycle slip occurs between the reference and feedback clocks, the output frequency changes by about 30 MHz. Once the frequency multiplier is stable and the generator output is close to the target frequency, cycle slips stop occurring. In this phase acquisition region, the final phase and frequency are attained using the integrator in the DPLL loop filter. Proper selection of the DPLL loop filter constants ensures that the transient shows no overshoot or undershoot in cycle time. The demonstrated combination of a controlled frequency slew rate and the absence of short cycles shows that the proposed DPLL-based variable frequency generator can be used to adjust the frequency dynamically while the core is executing code.

The DCO has a measured tuning range of 800 MHz to 12 GHz over PVT. The DCO output is divided by two to improve the duty cycle of the clock provided to the processor core logic. RMS period jitter is 1 ps at 5 GHz and 6 ps at 1 GHz. The DPLL occupies an area of 200 μm x 350 μm, half of which is taken up by the voltage regulator for the DCO. Figure 6 shows a micrograph of the POWER7™ core, with the size and position of the frequency generator outlined.

Acknowledgment: This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002.

References
[1] A. Allen, J. Desai, F. Verdico, F. Anderson, D. Mulvhill, D. Kruger, "Dynamic Frequency-Switching Clock System on a Quad-Core Itanium® Processor," in Proc. IEEE Solid-State Circuits Conference, Feb. 2008.
[2] R. Kalla, "POWER7: IBM's Next Generation POWER," Hot Chips 2009 Tech. Digest, Aug. 2009.
[3] J. A. Tierno, A. V. Rylyakov, D. J. Friedman, "A Wide Power Supply Range, Wide Tuning Range, All Static CMOS All Digital PLL in 65 nm SOI," IEEE Journal of Solid-State Circuits, vol. 43, no. 1, pp. 42-51, Jan. 2008.

Figure 3. Self-timed bang-bang PFD with cycle-slip detector.
Figure 1. Top-level block diagram of the DPLL.
Figure 2. Frequency multiplier filter / ramp generator.
Figure 4. Measured frequency transients ("Frequency Ramp": frequency in GHz vs. time in μs) for slew rates of 17.5 MHz/μs, 70 MHz/μs and 140 MHz/μs.
Figure 5. Detailed measured transient showing cycle time (Tcycle, ns) against time (μs) at the end of a frequency ramp, with the frequency acquisition, phase acquisition and lock regions marked; one DCO Tcycle step is approximately 2 ps.
Figure 6. Micrograph of the IBM POWER7 processor core, showing the frequency generator (DPLL, 350 μm x 200 μm) in the lower left corner.
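A quick first-order check relates the two step sizes quoted for the measured transient. Assuming, for illustration, that the ~2 ps DCO Tcycle step marked in Figure 5 applies at the roughly 0.25 ns cycle time shown on that plot (about 4 GHz), the corresponding frequency step is:

```latex
f = \frac{1}{T_{\mathrm{cycle}}}
\quad\Rightarrow\quad
|\Delta f| \approx \frac{\Delta T_{\mathrm{cycle}}}{T_{\mathrm{cycle}}^{2}}
= \frac{2\times10^{-12}\,\mathrm{s}}{\left(0.25\times10^{-9}\,\mathrm{s}\right)^{2}}
= 3.2\times10^{7}\,\mathrm{Hz} \approx 32\,\mathrm{MHz}
```

This is consistent with the change of about 30 MHz per cycle slip reported for the frequency acquisition region.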