

Circuit Technologies for Multi-Core Processor Design
Stefan Rusu
Intel Corporation, Santa Clara, CA
stefan.rusu@intel.com

Xeon, Itanium, and Core are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others.
Copyright © 2006, Intel Corporation

Outline
• Dual-core architectural directions
• Interconnect trends
• Power and leakage reduction
  – Cache Sleep and Shut-off Modes
  – Long-Le Transistor Usage
• Voltage domains
• Clock distribution
• Package details
• DFT/DFM features
• Thermal management
• Summary

Dual Core Processors Everywhere!

Laptop: "Intel tips Viiv, Yonah in consumer push," Mark LaPedus, 01/06/2006. SAN JOSE, Calif. - Making a big splash in the consumer market, Intel Corp. on Thursday unveiled a dual-core processor, PC platform and several content alliances that are said to provide the foundation for digital entertainment and wireless laptops.

Server: "Intel unwraps dual-core Xeon server processors," Tom Krazit, 10/10/2005. At an event Monday in San Francisco, Intel unveiled its first dual-core Xeon chips for two-processor and four-processor servers, previously known by the Paxville code name.

Desktop: "Intel ready to ship dual-core processors," Daniel A. Begun, 3/15/2005. Earlier this year, Intel announced that desktop processors using dual-core technology will be available by the end of June, but the company recently hinted at March's Intel Developer Forum (IDF) that dual-core processors could be available even sooner than that.

Tulsa – Xeon® Processor
[Block diagram: Core 0 and Core 1, each with a 1MB L2, a shared 16MB L3 cache, and the bus interface]
[Die photo labels: FSB TOP, Core 1, 1MB L2, TAG, Control Logic, 1MB L2, Core 0, TAG, FSB BOT, 16MB L3]
• Process technology: 65 nm, 8 Cu interconnect layers
• Transistor count: 1.328 Billion
• Die area: 435 mm²

Montecito – Itanium® 2 Processor
[Block diagram: Core 0 and Core 1, each with a 1MB L2I, a 256kB L2D, and a 12MB L3]
[Die photo labels: 12MB L3, 1MB L2I, Core 1, Bus Interface, Core 0, 1MB L2I, 12MB L3]
• Process technology: 90 nm, 7 Cu interconnect layers
• Transistor count: 1.72 Billion
• Die area: 596 mm²

Yonah – Mobile/Desktop/Blade Server
[Block diagram: Core 0 and Core 1 with a shared 2MB L2 cache and the bus interface]
[Die photo labels: Core 0, Core 1, Bus, 2MB L2 Cache]
• Process technology: 65 nm, 8 Cu interconnect layers
• Transistor count: 151 Million
• Die area: 90.3 mm²

Moore’s Law for Multi-Core Processors
[Figure: log-scale plot (1.E+05 to 1.E+10) versus year (1980 to 2010), with points labeled 286, 386, 486, Pentium®, Pentium® II, Pentium® III, Pentium® 4, Itanium®, Itanium® 2, Dual-core Itanium® 2, and Dual-core Xeon®]
Process technology scaling enables the number of cores to double every two years

Server Processors L3 Cache Trend
[Figure: on-die L3 cache size [MB] (0 to 28) versus process generation (180nm, 130nm, 90nm, 65nm) for Itanium® and Xeon® processors]
Cache size doubles with every process generation

SRAM Cell Size Scaling
[Figure: SRAM cell size on a log scale (0.1 to 100) versus process generation]

Process      Cell size (µm²)
0.5 µm       44
0.35 µm      21
0.25 µm      10.6
0.18 µm      5.6
0.13 µm      2.4
90 nm        1.0
65 nm        0.57

SRAM cell size scales ~0.5x per generation
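The ~0.5x claim can be checked against the data points on the slide. The sketch below is only an illustration: the pairing of each cell size with a process node follows the order of the axis labels, and the µm² unit is an assumption, since the extracted slide does not state it.

```python
# Cell sizes read from the slide, oldest to newest process generation.
# Assumption: values are in square micrometres and map to nodes in axis order.
CELL_SIZES = [
    ("0.5um", 44.0), ("0.35um", 21.0), ("0.25um", 10.6), ("0.18um", 5.6),
    ("0.13um", 2.4), ("90nm", 1.0), ("65nm", 0.57),
]

def scaling_ratios(sizes):
    """Return (node, ratio) pairs: each cell size divided by the previous one."""
    return [(sizes[i][0], sizes[i][1] / sizes[i - 1][1])
            for i in range(1, len(sizes))]

def geometric_mean(values):
    """Geometric mean, the natural average for per-generation ratios."""
    product = 1.0
    for v in values:
        product *= v
    return product ** (1.0 / len(values))

ratios = scaling_ratios(CELL_SIZES)
for node, ratio in ratios:
    print(f"{node}: {ratio:.2f}x the previous generation")
print(f"geometric mean: {geometric_mean([r for _, r in ratios]):.2f}x")
```

Every step falls between 0.4x and 0.6x, and the geometric mean comes out just under 0.5x, which is consistent with the slide's summary line.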

Server Processors L3 Cache Trend

Dual-core Processor   Process   L3 cache per core [MB]   L3 cache per thread [MB]
Itanium®              90nm      12                       6
Xeon®                 65nm      8                        4

Large per-core and per-thread caches enable server-class performance

On-chip Interconnect Trend
[Figure: relative delay (log scale, 0.1 to 100) versus feature size (250, 180, 130, 90, 65, 45, 32 nm) for four curves: global interconnect without repeaters, global interconnect with repeaters, local interconnect (M1,2), and gate delay (FO4). Source: ITRS]
• Local interconnects scale with gate delay
• Global interconnects do not scale

Layout Techniques to Reduce Coupling
• Interleave bussed signals with other busses that switch at a different time
• Eliminates capacitive coupling and reduces inductive noise
[Figure: Bus_A and Bus_B waveforms, V versus t]
• Switch bit order of bussed signals at every turn
• Noise will not be additive across the entire bus route
[Figure: swizzled bus route, showing bit orders 3 0 4 1 5 2 and 0 1 2 3 4 5]
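A minimal sketch of the bit swizzle, assuming the bus enters a turn in order 0 1 2 3 4 5 and leaves it in order 3 0 4 1 5 2 (the two orders on the slide; which is "before" and which is "after" is an assumption). The helper names `swizzle` and `neighbor_pairs` are hypothetical. The check at the end shows why coupling noise does not add up along the route: no two bits are neighbors on both segments.

```python
def swizzle(order):
    """Interleave the upper half of the bit order with the lower half.

    For [0, 1, 2, 3, 4, 5] this returns [3, 0, 4, 1, 5, 2].
    """
    half = len(order) // 2
    lower, upper = order[:half], order[half:]
    result = []
    for hi, lo in zip(upper, lower):
        result.extend([hi, lo])
    # Odd-width bus: the last element of the upper half is left over.
    result.extend(upper[len(lower):])
    return result

def neighbor_pairs(order):
    """Set of unordered pairs of bits routed on adjacent tracks."""
    return {frozenset(pair) for pair in zip(order, order[1:])}

before = [0, 1, 2, 3, 4, 5]
after = swizzle(before)
shared = neighbor_pairs(before) & neighbor_pairs(after)
print(after)        # [3, 0, 4, 1, 5, 2]
print(len(shared))  # 0: no pair of bits is adjacent on both segments
```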

Layout Techniques (Cont)
• Interleave narrow Vcc/Vss lines through bus
[Figure: bus wires with Vcc and Vss lines interleaved]
• Use staggered inverting buffers [7]

L3 Cache Sleep and Shut-off Modes
[Figure: one sub-array in each of Active, Sleep, and Shut-off modes, with Block Select, Sleep Bias, and Shut off controls on the Virtual VSS node; supply labeled 1.1V]
• Active mode: virtual VSS at 0V
• Sleep mode: virtual VSS at 250mV, 2x lower leakage
• Shut-off mode: virtual VSS at 520mV, 2x lower leakage
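Raising the virtual VSS node shrinks the voltage left across the sub-array, which is what lowers leakage. The arithmetic below is a sketch that assumes the "1.1V" label on the slide is the array supply; the slide itself does not state the resulting cell voltages.

```python
SUPPLY_MV = 1100  # "1.1V" label on the slide, assumed to be the array supply

# Virtual VSS level per mode, as labeled on the slide.
VIRTUAL_VSS_MV = {"active": 0, "sleep": 250, "shut-off": 520}

def cell_voltage_mv(mode):
    """Voltage left across the sub-array once virtual VSS is raised."""
    return SUPPLY_MV - VIRTUAL_VSS_MV[mode]

for mode in VIRTUAL_VSS_MV:
    print(f"{mode}: {cell_voltage_mv(mode)} mV across the sub-array")
```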

Leakage Shut-off Infrared Images
[Infrared images: 16MB SKU with all 16MB in sleep mode; 8MB SKU with 8MB in sleep mode and 8MB in shut-off mode]
Shut-off feature reduces the leakage of the 8MB disabled sub-arrays by about 3W

Dynamic Intel® Smart Cache Sizing
• First implementation in Yonah dual-core mobile processor
  – Dynamic implementation of the shut-off mode
• HW based algorithm predicts cache usage requirements
  – Considers the % of time the CPU is in Active state compared to the various sleep states
• During periods of low activity or inactivity the processor dynamically adapts its effective cache size
  – Cache content is gradually flushed to system memory
  – Cache ways are gradually turned off (physically as well as logically), thus reducing power
• Cache ways are re-powered on demand to deliver full performance when needed
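The behavior described above can be pictured with a toy model. This is not Intel's hardware algorithm, which the talk does not detail; the class name `CacheSizer`, the way count, the activity threshold, and the choice to re-power all ways at once on demand are all illustrative assumptions.

```python
class CacheSizer:
    """Toy model of gradual way shut-off; not the actual Yonah algorithm."""

    def __init__(self, total_ways=8, low_activity=0.2, min_ways=1):
        # total_ways, low_activity and min_ways are illustrative values only.
        self.total_ways = total_ways
        self.low_activity = low_activity
        self.min_ways = min_ways
        self.ways_on = total_ways
        self.flushed = []  # ways whose contents were written back to memory

    def step(self, active_fraction, demand=False):
        """Advance one interval given the fraction of time in Active state."""
        if demand:
            # Re-power every way so full performance is available.
            self.ways_on = self.total_ways
        elif (active_fraction < self.low_activity
              and self.ways_on > self.min_ways):
            # Flush one way to system memory, then power it off.
            self.flushed.append(self.ways_on - 1)
            self.ways_on -= 1
        return self.ways_on

sizer = CacheSizer()
for activity in [0.9, 0.1, 0.05, 0.05]:
    sizer.step(activity)
print(sizer.ways_on)                 # 5 ways left on after three idle intervals
print(sizer.step(0.8, demand=True))  # 8 ways again once demand returns
```

Shrinking one way per interval mirrors the "gradually" wording on the slide; a real controller would also have to weigh the cost of the flush traffic.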

Leakage Mitigation: Long-Le Transistors
[Figure: nominal-Le device next to a long-Le device (Nom+10%)]
• All transistors can be either nominal or long-Le
• Most library cells are available in both flavors
• Long-Le transistors are about 10% slower, but have 3x lower leakage
• All paths with timing slack use long-Le transistors
• Initial design uses only long channel devices
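A minimal sketch of the selection rule, assuming a simplified per-gate view of slack (real timing slack is shared along a path, so a production flow would iterate with a timing analyzer). The function `assign_devices` and the example gate list are hypothetical; only the 10% delay penalty comes from the slide.

```python
LONG_LE_EXTRA_DELAY = 0.10  # long-Le is about 10% slower (figure from the slide)

def assign_devices(gates):
    """Pick 'long' or 'nominal' per gate.

    gates: list of (name, nominal_delay_ps, slack_ps), with slack measured
    using the nominal device.
    """
    choice = {}
    for name, delay_ps, slack_ps in gates:
        extra_ps = delay_ps * LONG_LE_EXTRA_DELAY
        # Start from long-Le; fall back to nominal only if timing would fail.
        choice[name] = "long" if extra_ps <= slack_ps else "nominal"
    return choice

# Made-up gates for illustration: (name, nominal delay in ps, slack in ps).
gates = [("g1", 100, 20), ("g2", 80, 0), ("g3", 50, 12), ("g4", 120, 3)]
choice = assign_devices(gates)
long_share = sum(kind == "long" for kind in choice.values()) / len(choice)
print(choice)
print(long_share)
```

Defaulting to long-Le and swapping in nominal devices only where timing requires it matches the slide's note that the initial design uses only long channel devices.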

Long-Le Transistors Usage Map
[Figure: die map of long-Le usage, color scale 0% to 100%, over Core 1, Control, Core 0, and L3 Cache]

Long-Le Transistors Summary
Percentage of Long-Le device width excluding RAM arrays:

           Nominal   Long-Le
Cores      46%       54%
Uncore     24%       76%

To reduce sub-threshold leakage, most devices will be slower and only a handful of transistors will be fast
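Combining these width percentages with the roughly 3x leakage advantage quoted earlier gives a rough estimate of the saving. The sketch assumes leakage is proportional to device width and that every long-Le device leaks exactly one third as much as a nominal one; neither assumption is stated in the talk.

```python
LEAKAGE_RATIO = 3.0  # long-Le leaks about 3x less (figure from the talk)

# Share of device width that is long-Le, excluding RAM arrays (from the slide).
LONG_LE_SHARE = {"cores": 0.54, "uncore": 0.76}

def relative_leakage(long_share, ratio=LEAKAGE_RATIO):
    """Leakage relative to an all-nominal design of the same total width."""
    return (1.0 - long_share) + long_share / ratio

for block, share in LONG_LE_SHARE.items():
    print(f"{block}: {relative_leakage(share):.2f} of all-nominal leakage")
```

Under these assumptions the cores leak about 0.64 and the uncore about 0.49 of what an all-nominal design would.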

Stack Forcing
[Figure: a single transistor of width w next to a two-transistor stack with upper width w_u and lower width w_l; equal loading requires w_u + w_l = w, shown with w_u = w_l = ½ w]
[Figure: leakage reduction versus performance loss (axis ticks 1 and 10) for two-stack high-Vt and two-stack low-Vt devices with w_u = w_l]
[Narendra, et al – ISLPED 2001]
• Force one transistor into a two transistor stack with the same input load
• Can be applied to gates with timing slack
• Trade-off between transistor leakage and speed

Montecito Voltage Domains [2]
[Figure: die floorplan with voltage domains labeled as follows]
• Core 0: frequency tracks voltage, 40W
• Core 1: frequency tracks voltage, 40W
• Bus Arbiter: fixed frequency, 2.5W
• Foxton Control: 1GHz fixed frequency

Tulsa Voltage Domains
[Figure: die floorplan (FSB TOP, Core 1, 1MB L2, TAG, Control Logic, 1MB L2, Core 0, TAG, FSB BOT, 16MB L3) with a voltage profile along a cut line. Legend: Core, PLL, Uncore, I/O]
• Cores: 1.25V
• Ctrl + Tag: 1.10V
• 16MB array: virtual VSS at 0.25V

Tulsa Clock Domains [4]
[Figure: die floorplan (FSB TOP, Core 1, 1MB L2, TAG, 16MB L3, 1MB L2, TAG, Core 0, FSB BOT) showing the clock domains and the System Clock (BCLK) input. Legend: Core, PLL, Uncore, I/O]

Tulsa Uncore Clock Distribution [4]
[Figure: uncore clock network with labels: Un-Core pre-global ZCLK spine, Un-Core sparse SCLK grid, Un-Core pre-global MCLK spine, horizontal clock spines]

Montecito Clock System [5]
[Block diagram labels: Variable Supply, Fixed Supply, Pins, Fuses, Frequency Translation Table, Divisors, Bus Clock, PLL, 1/M, 1/1, 1/N, Core0, Core1, Foxton, I/Os, Bus Logic, DFD, RVD, RAD, SLCB, CVD, Gater, Phase Aligner, Balanced Tree Clock Distribution]

Montecito Clock Distribution [6]
[Figure: clock distribution from the Bus Clock and PLL through the L0, L1, L2, and L3 routes; blocks labeled DFD, RAD, SLCB, repeaters, CVD, gaters, and latches, serving core0, core1, Foxton, IOs, and Bus Logic]
• Fixed frequency: low voltage swings, differential
• Variable frequency: full rail transitions, single ended

Single-Ended vs. Differential Clocks
[Figure: routing cross-sections of a differential clock pair and a single-ended CLK wire among VDD and GND wires]
• Differential clock
  – Lower skew
  – High power
  – Longer distance between repeaters
• Single-ended clock
  – Lower power
  – Need sharp edges to control skew

Dense vs. Sparse Grid Tiles
[Figure: three clock grid tiles: Un-Core Dense Tile, Core Full Grid Tile, Un-Core Sparse Tile]

Xeon® Processor Package
• 12-layer organic package (53.3 mm/side)
• 4-4-4 stacking
• Integrated heat spreader (38.5 mm/side)
• 604 total pins
• 366 signal I/Os
• System management components and decoupling capacitors on package

Itanium® 2 Processor Package
[Photo labels: power delivery connector, heat spreader, decoupling caps, system management components]

Intel® Core™ Duo Processor Package

Design for Test and Debug Features

Die-level DFT/DFM
• Parallel structural core test with XOR
• Scan and observability registers (scan-out)
• Three TAP controllers (core0, core1, uncore)
• Within-die process monitors
• On-die clock shrink

L3 cache DFT/DFM
• Built-in pattern generator (PBIST)
• Programmable weak-write test
• Low-yield analysis
• Stability test mode
• 32-entry cache line disable (Pellston)

FSB DFT/DFM
• I/O loopback
• I/O test generator

Itanium® 2 Thermal Map
• Two thermal sensors per core
• Mux thermal diodes into VCOs to measure temp

Xeon® Infra-red Emission Image
[Image labels: temperature sensors, thermal diode]

Potential Multi-Core Thermal Control
[Figure: floorplan with two rows of four cores, each core flanked by two thermal sensors, with cache blocks, I/O on both sides, and a central TMU. Legend: TS = Thermal Sensor, TMU = Thermal Management Unit]
• Multiple core designs require a central thermal management unit and an efficient mechanism for transmitting thermal sensor measurements [8]
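One way to picture the role of a central thermal management unit is as a reducer over per-core sensor readings. The sketch below is purely illustrative: the `SensorReading` type, the 90 °C limit, and the idea of flagging cores for throttling are assumptions, not details from the talk or from the cited patent.

```python
from dataclasses import dataclass

@dataclass
class SensorReading:
    core: int      # core index
    sensor: int    # sensor index within the core
    temp_c: float  # measured temperature in degrees Celsius

def hottest_per_core(readings):
    """Reduce all sensor readings to the maximum temperature per core."""
    hottest = {}
    for r in readings:
        hottest[r.core] = max(hottest.get(r.core, float("-inf")), r.temp_c)
    return hottest

def cores_to_throttle(readings, limit_c=90.0):
    """Return the sorted cores whose hottest sensor exceeds limit_c.

    limit_c is an illustrative threshold, not a value from the talk.
    """
    return sorted(core for core, t in hottest_per_core(readings).items()
                  if t > limit_c)

# Made-up readings: two cores with two sensors each.
readings = [
    SensorReading(0, 0, 71.5), SensorReading(0, 1, 93.0),
    SensorReading(1, 0, 68.0), SensorReading(1, 1, 70.2),
]
print(cores_to_throttle(readings))  # [0]
```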

Summary
• Dual-core processors cover the entire compute spectrum, from laptops to desktops and servers
• Moore’s Law applied to multi-core processors:
  – Process technology scaling enables number of cores and cache size to double every two years
• Power and leakage reduction circuit techniques are essential for multi-core processors
  – Massive Long-Le usage: many slow transistors and a few fast devices
  – Cache sleep and shut-off modes (static and dynamic)
• Multiple on-die clock and voltage domains are required to control active power and leakage
  – Need new verification tools and automated checks

References
[1] S. Rusu, et al., “A Dual-Core Multi-Threaded Xeon® Processor with 16MB L3 Cache,” ISSCC Dig. Tech. Papers, Paper 5.3, Feb. 2006.
[2] S. Naffziger, et al., “The Implementation of a 2-Core, Multi-Threaded Itanium® Family Processor,” IEEE J. Solid-State Circuits, pp. 197-209, Jan. 2006.
[3] R. Korner, “Yonah and Sossaman Processor Briefing,” Intel Developer Forum, Fall 2005.
[4] S. Tam, et al., “Clock Generation and Distribution of a Dual-Core Xeon® Processor with 16MB L3 Cache,” ISSCC Dig. Tech. Papers, Paper 21.2, 2006.
[5] T. Fischer, et al., “A 90-nm variable frequency clock system for a power managed Itanium® architecture processor,” IEEE J. Solid-State Circuits, pp. 217–227, Jan. 2006.
[6] P. Mahoney, et al., “Clock Distribution on a Dual-core Multi-threaded Itanium™-Family Microprocessor,” ISSCC Dig. Tech. Papers, 2005.
[7] S. Rusu, “Timing analysis of high-speed VLSI designs - Trends and challenges,” International Workshop on Timing Issues (TAU), 1997.
[8] S. Rusu and S. Tam, “Apparatus for thermal management of multiple core microprocessors,” US patent 6,908,272, issued 7/21/2005.
