COMPUTER ARCHITECTURE
Assoc. Prof. Stasys Maciulevičius
Computer Dept.
stasys.maciulevicius@ktu.lt

Development of processor architecture
The main processor development and production companies, creating new processors for various market segments, seek to:
- enhance performance; to reach this goal they:
  - increase clock frequency
  - use a variety of microarchitecture enhancements
  - move to multi-core microarchitectures
- reduce energy consumption

2014 ©S.Maciulevičius

Word length: from 32 to 64 bits
- A 32-bit processor can represent 2^32, or about 4.3 billion, distinct integer values
- A 64-bit processor reaches 2^64, or about 18.4 quintillion (18,400,000,000,000,000,000)
- 32-bit processors and operating systems can support up to 4 gigabytes of memory, of which typically only 2 gigabytes are available to applications
- For CAD/CAM and scientific calculations this is no longer enough

Data in processors
Data type                Register set   Functional unit   x86 word   x86-64 word
Integers                 GPR            ALU               32         64
Addresses                GPR            ALU or AGU        32         64
Floating point numbers   FPR            FPU               64         64
Vectors                  VR             VPU               128        128
As can be seen, only the length of integers and addresses differs

x86-64 specification
- The x86-64 specification was designed by Advanced Micro Devices (AMD) as an extension of the x86 instruction set
- It allows far larger virtual and physical address spaces than x86, doubles the width of the integer registers from 32 to 64 bits, increases the number of integer registers, and provides other enhancements

Intel® EM64T
- Intel released its own “64-bit technology” in order to compete with AMD’s 64-bit technology
- Intel EM64T improves system performance by enabling access to more than 4 GB of memory
- Intel EM64T supports:
  - 64-bit virtual address space
  - 64-bit pointers
  - 64-bit general purpose registers
  - 64-bit integers

[Figure: EM64T (and x86-64) registers]

Multi-core processors
- Raising the clock frequency to gain performance is becoming
  more and more difficult
- Instead, companies have focused their efforts on increasing parallelism: they developed dual-core processors and later moved to multi-core processors
- Intel, AMD, Motorola, Sun and other companies follow this path

[Figure: Intel Core microarchitecture summary]

Intel Nehalem microarchitecture
- Nehalem is the codename for an Intel processor microarchitecture, successor to the Core microarchitecture
- The first processor released with the Nehalem architecture was the desktop Core i7, released in November 2008
- Nehalem differs radically from Netburst
- Nehalem-based microprocessors run at higher clock speeds and are more energy-efficient than their Penryn predecessors
- Hyper-threading is reintroduced, along with a reduction in L2 cache size and an enlarged L3 cache that is shared by all cores

Intel Nehalem microarchitecture
- 64 KB L1 cache per core (32 KB L1 data + 32 KB L1 instruction) and 256 KB L2 cache per core
- 4–12 MB L3 cache
- Native (all processor cores on a single die) quad- and octa-core processors
- Intel QuickPath Interconnect in high-end models, replacing the legacy front side bus
- Integration of PCI Express and DMI into the processor, replacing the northbridge
- Integrated memory controller supporting two or three memory channels of DDR3 SDRAM or four FB-DIMM2 channels
- Second-generation Intel Virtualization Technology

Some Intel Nehalem processors (with Core 2 Quad for comparison)
Processor          Core i7 (LGA 1366)   Core i7 (LGA 1156)   Core i5              Core 2 Quad
Interface          LGA 1366             LGA 1156             LGA 1156             LGA 775
Number of Cores    4                    4                    4                    4
Turbo Boost        Yes                  Yes                  Yes                  No
Hyper-Threading    Yes                  Yes                  No                   No
L1 Cache           32KB/32KB per core   32KB/32KB per core   32KB/32KB per core   32KB/32KB per core
L2 Cache           256KB per core       256KB per core       256KB per core       Up to 12MB shared
L3 Cache           8MB shared           8MB shared           8MB shared           No
Memory Channels    3                    2                    2                    2
Max. Memory Rate   DDR3-1066            DDR3-1333            DDR3-1333            DDR3-1600
Chipset            X58                  P55                  P55                  X48
Price              $284-$999            $285-$555            $199                 $163-$316

Intel’s strategy
- Intel introduces a new microprocessor architecture every 2 years as part of its “Tick-Tock” strategy
- A “tick” shrinks the manufacturing process; a “tock” introduces a new microarchitecture

Intel’s Sandy Bridge
- Sandy Bridge is the codename for a microarchitecture that Intel began developing in 2005 to replace the Nehalem microarchitecture
- It was designed for the full range of applications, from mobile devices, laptop and desktop computers to large enterprise servers
- Intel demonstrated a Sandy Bridge processor in 2009 and released the first products based on the architecture in January 2011

Intel’s Sandy Bridge
Sandy Bridge main features:
- 32 nm fabrication process
- CPU clock rate 1.4–3.4 GHz, graphics clock rate 350–850 MHz (for different models)
- Turbo Boost 2.0 technology raises the clock rates to as much as 3.8 GHz (CPU) and 1350 MHz (graphics)
- 32 kB data + 32 kB instruction L1 cache (3 clocks) and 256 kB L2 cache (8 clocks) per core
- Shared L3 cache: 3–8 MB (25 clocks)

Intel’s Sandy Bridge
- Sandy Bridge has an integrated graphics controller and a specialized accelerator, which significantly speeds up multimedia content processing
- Sandy Bridge supports DirectX 10.1 and OpenCL 1.1; its performance far exceeds that of the first-generation Core processors
- Advanced Vector Extensions (AVX): a 256-bit
  instruction set with wider vectors, new extensible syntax and rich functionality

Intel’s Sandy Bridge
- Decoded micro-operation cache and enlarged, optimized branch predictor
- 256-bit/cycle ring bus interconnect between cores, graphics, cache and System Agent Domain
- Intel Quick Sync Video: hardware support for video encoding and decoding
- Up to 8 physical cores, or 16 logical cores through Hyper-threading
- TDP of desktop CPUs is 35–95 W; of mobile CPUs, 17–55 W

[Figure: Intel’s Sandy Bridge caches]

[Figure: Sandy Bridge microarchitecture]

[Figure: Sandy Bridge: L0 cache]

Sandy Bridge: ring bus
- Each core, each slice of L3 (LLC) cache, the on-die GPU, the media engine and the system agent all have a stop on the ring bus
- The bus is made up of four independent rings: a data ring, a request ring, an acknowledge ring and a snoop ring
- Each stop for each ring can accept 32 bytes of data per clock

Intel’s Ivy Bridge
- Ivy Bridge is the first chip to use Intel's 22 nm tri-gate transistors, which help scale frequency and reduce power consumption
- At a high level Ivy Bridge looks a lot like Sandy Bridge
- Ivy Bridge is considered a tick from the CPU perspective but a tock from the GPU perspective

[Figure: Intel’s Ivy Bridge]

Intel’s Ivy Bridge
- Ivy Bridge introduces configurable TDP (cTDP), which allows the platform to increase the CPU's TDP if given additional cooling, or decrease the TDP to fit into a smaller form factor

Ivy Bridge Configurable TDP
                 cTDP Down   Nominal   cTDP Up
Ivy Bridge ULV   13W         17W       33W
Ivy Bridge XE    45W         55W       65W

Intel’s Ivy Bridge
- Sandy Bridge brought a completely redesigned GPU core onto the processor die itself
- With Ivy Bridge the GPU remains on die, but it grows more than the CPU does this generation
- Ivy Bridge GPU adds support for OpenCL 1.1, DirectX 11
  and OpenGL 3.1

[Figure: From Nehalem to Haswell]

Intel’s Haswell
- Haswell is the codename for the processor microarchitecture that succeeds the Ivy Bridge architecture
- It uses the 22 nm process; Intel released the first CPUs based on this microarchitecture in June 2013
- With Haswell, Intel introduces a new low-power processor designed for convertible or 'hybrid' Ultrabooks

Intel’s Haswell
- The Haswell architecture is specifically designed to optimize power savings and performance
- Haswell launches in three major forms:
  - Desktop version (LGA1150 socket): Haswell-DT
  - Mobile/Laptop version (PGA socket): Haswell-MB
  - BGA version:
    - 47W and 57W TDP classes: Haswell-H (for "all-in-one" systems, Mini-ITX form factor motherboards, and other small-footprint formats)
    - 13.5W and 15W TDP classes (SoC): Haswell-ULT (for Intel's UltraBook platform)
    - 10W TDP class (SoC): Haswell-ULX (for tablets and certain UltraBook-class implementations)

Intel’s Haswell
Performance compared to Ivy Bridge:
- Twice the vector processing performance
- At least 10% sequential CPU performance increase (8 execution ports per core versus 6)
- Up to double the performance of the integrated GPU

[Figure: Intel’s Haswell]

[Figure: CPU Idle Power]

[Figure: AVX2 – FMA]

Some models
CPU             Freq.     Turbo Boost   Cache Memory   Cores / Threads   TDP
Core i7-4770K   3.5 GHz   3.9 GHz       8 MB           4/8               84 W
Core i7-4770    3.4 GHz   3.9 GHz       8 MB           4/8               84 W
Core i7-4770S   3.1 GHz   3.9 GHz       8 MB           4/8               65 W
Core i7-4770T   2.5 GHz   3.7 GHz       8 MB           4/8               45 W
Core i7-4765T   2.0 GHz   3.0 GHz       8 MB           4/8               35 W
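
The "twice the vector processing performance" claim for Haswell can be checked with back-of-envelope arithmetic. The sketch below is illustrative and not taken from the slides. It assumes single-precision data and the commonly cited per-core vector resources: one 256-bit multiply unit plus one 256-bit add unit on Ivy Bridge, and two 256-bit FMA units on Haswell. The function and constant names are hypothetical; the core count and base clock of the Core i7-4770K come from the models table.

```python
# Back-of-envelope peak single-precision throughput per core.
# Assumptions (not stated in the slides):
#   - Ivy Bridge: one 256-bit multiply unit + one 256-bit add unit per core
#   - Haswell:    two 256-bit FMA units per core (AVX2/FMA)
# All names below are hypothetical and exist only for this illustration.

VECTOR_BITS = 256                    # AVX / AVX2 register width
FLOAT_BITS = 32                      # single-precision element width
LANES = VECTOR_BITS // FLOAT_BITS    # 8 floats per vector register


def flops_per_cycle(vector_units: int, flops_per_lane: int) -> int:
    """Peak floating-point operations per clock cycle for one core."""
    return vector_units * LANES * flops_per_lane


# A plain multiply or add counts as 1 operation per lane.
IVY_BRIDGE_FLOPS = flops_per_cycle(vector_units=2, flops_per_lane=1)
# A fused multiply-add counts as 2 operations per lane (multiply + add).
HASWELL_FLOPS = flops_per_cycle(vector_units=2, flops_per_lane=2)


def peak_gflops(cores: int, clock_ghz: float, per_cycle: int) -> float:
    """Theoretical peak in GFLOP/s; sustained rates are lower in practice."""
    return cores * clock_ghz * per_cycle


if __name__ == "__main__":
    print("Ivy Bridge FLOPs/cycle/core:", IVY_BRIDGE_FLOPS)
    print("Haswell FLOPs/cycle/core:   ", HASWELL_FLOPS)
    print("Haswell / Ivy Bridge ratio: ", HASWELL_FLOPS / IVY_BRIDGE_FLOPS)
    # Core i7-4770K: 4 cores at 3.5 GHz base clock (from the models table)
    print("i7-4770K peak GFLOP/s:      ", peak_gflops(4, 3.5, HASWELL_FLOPS))
```

Under these assumptions the per-core peak doubles from 16 to 32 operations per cycle, which matches the factor of two quoted for Haswell. The figure is a theoretical ceiling: it counts each FMA as two operations and ignores memory bandwidth and Turbo Boost.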