Multi-core systems
System Architecture COMP25212
Daniel Goodman
Advanced Processor Technologies Group

Multi-Cores are Coming (here?)
Many processors in normal desktops/laptops are ‘dual core’ or ‘quad core’
• What does this mean?
• Why is it happening?
• How are they different?
• Where are they going?
• Do they change anything?

Moore’s Law
45nm Fun Facts
• A human hair = 90000nm
• Bacteria = 2000nm
• Silicon atom = 0.24nm

The need for Multi-Core
For over 30 years the performance of processors has doubled every 2 years
Driven mainly by shrinkage of circuits
Smaller circuits
• more transistors per chip
• shorter connections
• lower capacitance
Smaller circuits go faster
In the early 2000s the rate started to decrease

Motivation

Is cooling a problem?
Intel Nehalem: if not all the cores are being used, the unused cores can be shut down, allowing the remaining cores to use the spare resources and speed up.

The Memory Wall
Memory speed is failing to keep up with processor speed. Why?
Processor utilization (15%-25%)

The End of “Good Times”
Slowdown for several reasons
• Power density increasing (more watts per unit area): cooling is a serious problem
• Small transistors have less predictable characteristics
• Architectural innovation hitting design complexity problems (limited ILP)
• Memory does not get faster at the same rate as processors

A solution is replication
Put multiple CPUs (cores) on a single integrated circuit (chip)
Use them in parallel to achieve higher performance
Simpler to design than a more complex single processor
Need more computing power – just add more cores?

How to Connect Them?
Could have independent processor/store pairs with an interconnection network
At the software level the majority opinion is that shared memory is the right answer for a general purpose processor
But, when we consider more than a few cores, shared memory becomes more difficult to implement

Can We Use Multiple Cores?
Small numbers of cores can be used for separate tasks – e.g. run a virus checker on one core and Word on another
If we want increased performance on a single application we need to move to parallel programming
General purpose parallel programming is known to be hard – consensus is that new approaches are needed

There Are Problems
We don’t know how to engineer extensible memory systems
We don’t know how to write general purpose parallel programs
If we develop new approaches to parallel programming do they fit with existing serial processor designs?

Intel Core i7 (Nehalem)
2-way Simultaneous Multi-Threading per core

Traditional Structure – “Historical View”
(Processor, Front Side Bus, North Bridge, South Bridge)
[Diagram labels: Processor and Cache (single die/chip SRAM); Front Side Bus; Graphics Card; North Bridge Chip; Memory Controller; Main Memory (DRAM); South Bridge Chip; Motherboard; … I/O Buses (PCIe, USB, Ethernet, SATA HD)]

Typical Multi-core Structure
[Diagram labels: two cores, each with L1 Inst, L1 Data and L2 Cache; L3 Shared Cache; Memory Controller; Main Memory (DRAM); On Chip; QPI or HT; PCIe; Input/Output Hub; PCIe; Graphics Card; Input/Output Controller; Motherboard; … I/O Buses (PCIe, USB, Ethernet, SATA HD)]

Simplified Multi-Core Structure
[Diagram labels: four cores, each with L1 Inst and L1 Data; Shared Bus; Level 2 Cache; On Chip; Main Memory]

Nehalem Caches
Private L1: split D$ & I$, 32KB each, 4-way (I$) & 8-way (D$) set associative, approx. LRU, block size 64 bytes, write-back & write-allocate
Private L2: 8-way set associative, idem.
Shared L3: 16-way set associative, idem.

Cache Coherence?

Summary
Multi-core systems are here to stay
• Physical limitations
• Design costs
The industry did not want to come but there is no current alternative
One of the biggest changes for our field
• General Purpose Parallel Programming must be made tractable

For further reading
Patterson and Hennessy 4th Edition Chapter 1
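
Worked example: Nehalem L1 cache geometry
The figures on the Nehalem Caches slide are enough to work out how many sets each private L1 cache has and how an address divides into tag, index and offset. The sketch below is an illustration only, not Intel's implementation. It assumes that the 4-way figure belongs to the I$ and the 8-way figure to the D$, and that all sizes are powers of two. The names CacheGeometry, L1_DATA and L1_INST are made up for this example. The L2 and L3 capacities are not given on the slide, so they are left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical helper: geometry of a set-associative cache (powers of two assumed)."""
    size_bytes: int
    ways: int
    block_bytes: int

    @property
    def sets(self) -> int:
        # capacity = sets * ways * block size
        return self.size_bytes // (self.ways * self.block_bytes)

    @property
    def offset_bits(self) -> int:
        # bits needed to select a byte within a block
        return (self.block_bytes - 1).bit_length()

    @property
    def index_bits(self) -> int:
        # bits needed to select a set
        return (self.sets - 1).bit_length()

    def split(self, address: int) -> Tuple[int, int, int]:
        """Return (tag, set index, block offset) for a byte address."""
        offset = address & (self.block_bytes - 1)
        index = (address >> self.offset_bits) & (self.sets - 1)
        tag = address >> (self.offset_bits + self.index_bits)
        return tag, index, offset


# Values from the slide: 32KB each, 64-byte blocks.
# Assumption: 8-way is the D$, 4-way is the I$.
L1_DATA = CacheGeometry(size_bytes=32 * 1024, ways=8, block_bytes=64)
L1_INST = CacheGeometry(size_bytes=32 * 1024, ways=4, block_bytes=64)

if __name__ == "__main__":
    for name, cache in (("L1 D$", L1_DATA), ("L1 I$", L1_INST)):
        print(name, "sets:", cache.sets,
              "index bits:", cache.index_bits,
              "offset bits:", cache.offset_bits)
    print("0x12345 in L1 D$ (tag, index, offset):", L1_DATA.split(0x12345))
```

Under these assumptions the D$ has 32768 / (8 × 64) = 64 sets and the I$ has 32768 / (4 × 64) = 128 sets, so both use 6 offset bits while the D$ uses 6 index bits and the I$ uses 7.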