IBM Marketing

AMD Multi-Core Processor Applications and Development Trends
Kevin Lai, Commercial Marketing Manager
Oct. 30, 2007
IBM Confidential © 2006 IBM Corporation

• AMD Opteron™ Processor Update: “Barcelona” Changes the Game
• AMD Commercial Ecosystem

The AMD “Barcelona” Advantage: More Than Just Four Cores
• Common Core Strategy
• Independent Dynamic Core Technology
• Same Socket Infrastructure
• Dual Dynamic Power Management™
• Cool Core™ Technology
• Direct Connect Architecture
• Memory Optimizer Technology
• Rapid Virtualization Indexing
• AMD Balanced Smart Cache
• AMD-V™ Extended Migration
• Wide Floating Point Accelerator

Application Cycles Favor AMD Platforms
AMD’s common core strategy and longer lifecycles are a better match for customers deploying enterprise apps.
[Timeline: 0, 6, 12, 18, 24, 30, 36, 42; phases: Evaluate, Deploy, Manage and maintain]
• Applications need long lifecycles
• Socket F (1207) Platform: customers who deployed the original Rev F platforms have a stable lifecycle for their applications
• “Tick Tock”: customers who deployed with Intel may suffer an inconsistent application platform through Intel’s “Tick Tock” strategy

Quad-Core Upgradeability
Customers with existing Socket F (1207) systems* should be able to easily upgrade to quad core:
  Socket F (1207) systemboard*
  + Updated BIOS with quad-core support
  + Quad-Core AMD Opteron processors
  + Existing thermal solution (heatsink/fans)
* Systemboard must adhere to AMD design guidelines

AMD Power Efficiency Innovation
Same Power And Thermal Envelopes As Dual-Core!
• Independent Dynamic Core Technology
• AMD CoolCore™ Technology
• Dual Dynamic Power Management™
• Low-Power DDR2 Memory

Improving Processor Power Management with Enhanced AMD PowerNow!™
[Diagram: example per-core utilization of 75%, 35%, 10%, and 1% on Cores 0 to 3, compared across three designs]
• Dual-core: MHz and voltage are locked to the highest-utilized core’s P-state
• Multi-chip Module: MHz is set to the highest-utilized core’s P-state within each dual-core die. Voltage is locked to the highest-utilized core’s P-state in the package
• Native Quad-core: MHz is adjusted independently per core. Voltage is locked to the highest-utilized core’s P-state
Native Quad-Core technology enables enhanced power management across all four cores

AMD CoolCore™ Technology: Turns off Blocks of CPU When Not in Use
• Coarse Control (Core)
  – e.g., FPU (hottest part of die)
• Fine Control (Core)
  – Incrementally Smaller Sections
• Memory Controller
  – Reads (turn off write logic)
  – Writes (turn off read logic)
[Diagram: Cores 1 to 4 with FPU, L1, and L2; L3; Memory Controller. Example only: does not reflect actual areas of clock gating]
AMD CoolCore™ is Automatic: No Drivers Needed!

Dual Dynamic Power Management™ (DDPM)
Separate power planes for cores and memory controller for:
  – Optimum power consumption: enables cores to operate at reduced power consumption levels while memory controller continues to run at full speed
  – Increased performance: memory controller can operate at higher frequency for increased bandwidth and performance
[Diagram: Unified Plane Systemboard vs. DDPM Systemboard]

Memory Power Measurements
• Enormous power penalties using FBDIMM at higher capacities
DDR2 vs. FBDIMM: Average Power Consumption for 8x DIMMs (1GB DDR2 vs. 1GB FBDIMM)
• FBDIMMs consume over 100 watts at the highest measured LOAD vs. only ~37 watts for DDR2
• FBDIMM consumes ~83 watts during IDLE; DDR2 consumes ~60 watts less!
[Chart: y-axis 0w to 120w. Values in watts, 8x DDR2 (AMD) / 8x FBDIMM (Intel)]
  IDLE Power:        14.32 / 83.34
  SPECcpu2000(1) FP:  33.68 / 95.49
  SPECcpu2000(1) INT: 29.24 / 90.21
  SPECjbb2005:        36.94 / 101.2
1. SPECcpu2006-based results are in development
1GB DDR2-667 DIMM: Brand: Micron; Model: MT18HTF12872Y-667D6
1GB 667 FB-DIMM: Brand: ATP; Model: AP28K72S8BHE6S

Performance-Per-Watt Scalability
[Chart: CPU Watts, Performance, and Performance-Per-Watt for Single Core (2003), Dual Core (2005), and Quad Core (2007)]
Consistent power and thermals help deliver better performance per watt

“Barcelona” Sets New Performance-Per-Watt Standard
RESULT: 25% performance advantage; more than 30% performance-per-watt advantage
[Chart: SPECfp_rate2006 (peak score on Linux), y-axis 0 to 150]
  Quad-Core AMD Opteron™ Processor 2GHz*: 69.5; Perf/Watt 1.3
  Xeon 5345: 54; Perf/Watt 1.0
2P servers: Barcelona (2.0 GHz, 95-watt) vs. Xeon 5345 (2.33 GHz, 1333 MHz FSB, 80-watt); 8 DIMMs of memory; SUSE Linux Enterprise Server 10
SPEC and the benchmark name SPECfp_rate2006 are registered trademarks of the Standard Performance Evaluation Corporation. Results for Xeon 5345 are valid as of July 19, 2007. For latest scores visit www.spec.org
*Performance based on AMD estimated results of Quad-Core AMD Opteron™ processor at 2 GHz

The AMD Platform Power Estimator
www.amd.com/powercalculator

AMD Performance Innovation
Comprehensive Performance Enhancements!
[Chart: bars at 100% and ~150%]
• AMD Wide Floating-Point Accelerator
• AMD Memory Optimizer Technology
• Dual Dynamic Power Management™
• AMD Balanced Smart Cache
[Chart bar labels: Dual-Core, Quad-Core]

AMD Wide Floating-Point Accelerator
Significantly Improved Floating-Point Performance vs. Rev F; Very Competitive vs. Intel ‘Core’ Architecture
• SSE Execution Width: 128-bits
• Instruction Fetch Bandwidth: 32 bytes/cycle + misaligned Ops
• Data Cache Load Bandwidth: 2x 128-bits loads/cycle
• L2/NB Bandwidth: 128-bits/cycle
• Floating-Point Schedule Depth: 36 dedicated 128-bit Ops
Comparison callouts on the slide: “2x Rev F” (four times); “2x ‘Core’” (twice); “Core has 32 entry shared w/Integer”

AMD Memory Optimizer Technology
Comprehensive Updates to our Integrated Memory Controller, Designed for Quad-Core Performance
[Chart: Increasing Memory Bandwidth. Dual-Core AMD Opteron™ Processor with DDR2 = 100%; “Barcelona” at ~140% and ~150%]
Independent Memory Channels (2x More)1
• 2x available memory controllers for more bandwidth
Larger Memory Buffers (~2-4x More)1
• Better optimized for DDR2 data rates
Write Bursting
• Reduced Read/Write transition = greater bandwidth
Better Optimized DRAM Paging
• Smarter algorithm helps improve bandwidth
DRAM Prefetcher
• Intelligently predicts and fetches data needed from main memory; doesn’t pollute cache hierarchy
Core Prefetchers
• Data fetched directly to L1 cache; ~5ns lower latency1 and spares L2 bandwidth
1. Compared to same-frequency Second-Generation AMD Opteron processors.
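As a rough worked example, the per-cycle widths quoted on the Wide Floating-Point Accelerator slide can be converted into peak bytes per second. This is only a sketch under assumptions the slide does not state: the 2.0 GHz clock is borrowed from the 2 GHz “Barcelona” parts cited elsewhere in this deck, each width is treated as per core and per core clock (the L2/NB path may well run on a different clock), and the helper name `peak_gb_per_s` is illustrative. Sustained rates would be lower than these peaks.

```python
# Hypothetical helper: convert a per-cycle width into a peak transfer rate.
def peak_gb_per_s(bits_per_cycle: int, clock_ghz: float) -> float:
    """Peak rate in GB/s, taking 1 GB as 1e9 bytes."""
    bytes_per_cycle = bits_per_cycle / 8
    # bytes/cycle * (clock_ghz * 1e9 cycles/s) / 1e9 bytes per GB
    return bytes_per_cycle * clock_ghz

CLOCK_GHZ = 2.0  # assumption, not stated on the slide

# Widths as quoted on the slide, expressed in bits per cycle.
widths_bits = {
    "Instruction fetch (32 bytes/cycle)": 32 * 8,
    "Data cache load (2 x 128-bit loads/cycle)": 2 * 128,
    "L2/NB (128 bits/cycle)": 128,
}

for name, bits in widths_bits.items():
    print(f"{name}: {peak_gb_per_s(bits, CLOCK_GHZ):.0f} GB/s peak")
```

Keeping the clock as a parameter makes it easy to redo the arithmetic for another frequency, such as the 2.5 GHz Model 2360 SE mentioned later in the deck.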
Improved Performance with Dual Dynamic Power Management™
Increases Memory Bandwidth for Better Performance
Without Dual Dynamic Power Management (power shared):
  Barcelona at Standard Power: Cores 2.0GHz, Memory Controller 1.6GHz
  • Power delivery must be shared between the cores and the memory controller
  • Doesn’t allow voltage changes for the cores
With Dual Dynamic Power Management (power dedicated to each):
  Barcelona at Standard Power: Cores 2.0GHz, Memory Controller 1.8GHz
  • Power delivery is dedicated to the cores, which allows for voltage changes
  • Dedicated current for memory controller allows another 200MHz for increased bandwidth and performance

AMD Balanced Smart Cache
Better Support for Multi-threaded Environments
Design 1 [Diagram: Cores 1 to 4, each with its own L1 and L2; shared L3; Integrated Memory Controller]
• Core 1 is running a large workload (>4MB), so it needs the whole L3 cache and access to main memory
• But Cores 2, 3, and 4 are still able to run smaller workloads
Design 2 [Diagram: Cores 1 to 4, each with its own L1; one L2 shared per pair of cores; Front Side Bus; External Memory Controller]
• Core 1 is running a large workload (>4MB), so it needs the whole L2 cache and access to main memory
• So Core 2 can’t do any work (this is called “thrashing”)
• Same can happen between cores 3 and 4 (more thrashing)

AMD Virtualization™ Leadership
Host More Virtual Machines per System!
• High Performing: Direct Connect Architecture; Rapid Virtualization Indexing; Tagged TLB
• Highly Secure: DEV (Device Exclusion Vector)
• Supported in Software: AMD-V™ Extended Migration; Unmodified Guest OS Support; Robust Software Ecosystem

AMD Virtualization™ versus Intel VT
Intel VT [Diagram: VMs on CPUs sharing one Memory Controller Hub]: Shared memory can create bottlenecks
• Shared front-side bus can decrease application performance within a virtual machine
• Untagged TLB means less efficient switching between virtual machines
• Software-based memory management and security (via external Memory Controller Hub) can reduce overall virtualization performance and efficiency
AMD Virtualization™ [Diagram: VMs on CPUs, each CPU with its own Memory Controller]: Dedicated memory for scalability
• Direct Connect Architecture helps improve application performance within a virtual machine
• Tagged TLB means more efficient switching between virtual machines
• Hardware-based memory management and security (Integrated memory controller with DEV) can improve overall virtualization performance and efficiency

Device Exclusion Vector (DEV)
[Diagram: VM 1 to VM 6; Hypervisor (VMM); Core 1 and Core 2; Memory Controller with DEV Table; HT 1, HT 2, HT 3]
• DEV lets the Hypervisor (VMM) know if a device is allowed to access a page of memory or not
• So DEV improves virtualization security by denying memory accesses for unauthorized requests
• For example:
  Pages Owned, one set per VM (VM 1 to VM 6):
    3, 9, 15, 20, 27
    4, 7, 13, 22, 25
    8, 12, 19, 21, 30
    2, 10, 16, 23, 26
    6, 14, 17, 18, 24
    1, 5, 11, 28, 29
  – Requests page 25: access is granted … quickly
  – VM 5 requests page 28: access is denied … quickly
• Xeon can do this, but it happens in software … so it happens slower
• Only processors with an Integrated Memory Controller offer this benefit

Tagged Translation Look-aside Buffer (TLB)
• TLB is a table in the CPU that contains cross-references between the virtual and real addresses of recently referenced pages of memory
• “Tagged” means the CPU knows which data belong to a virtual machine
• So, for example:
AMD Opteron, Tagged TLB [Diagram: VM 1 to VM 6; Hypervisor (VMM); cache lines tagged VM 1 and VM 3; HT 1, HT 2, HT 3]
  – VM 1 runs on the CPU and loads additional data from memory
  – As VM 3 takes control and loads its data, other TLB data remains
  – So when VM 1 takes control back, the data it needs is there … resulting in better performance
Intel Xeon, Un-Tagged TLB [Diagram: Hypervisor (VMM); cache lines; Front-side Bus; Memory Controller]
  – VMM Control: VM 4 (Fill), VM 6 (Flush, Fill), VM 4 (Flush, Fill)
• Xeon’s VMM must flush the TLB every switch, hurting performance

Rapid Virtualization Indexing
Translating Virtual to Physical Memory
[Diagram: Without Virtualization, Virtual Memory maps to Physical Memory. With Virtualization, VM1 (Virtual Memory 1) and VM2 (Virtual Memory 2) map to Physical Memory]
Translations take place in:
  – Without Virtualization: Hardware (in CPU silicon)
  – Shadow Page Tables: Software (in Hypervisor)
  – Rapid Virtualization Indexing: Hardware (in CPU silicon)
Translations are stored in:
  – Without Virtualization: Hardware (in TLB)
  – Shadow Page Tables: Virtual Memory (DRAM or disk)
  – Rapid Virtualization Indexing: Hardware (in guest TLB)

AMD Virtualization Advantages
AMD leadership in all relevant aspects of virtualization
[Chart: Intel (VT) and AMD (AMD-V™) rated Good, Better, or Best on Performance, Security, and Software Support]
Callouts: Rapid Virtualization Indexing; Direct Connect Architecture; Tagged TLB; DEV
• Live migration of latest dual-core to quad-core supported
• Live migration of single-core to two generations of dual-core supported

Database Performance on VMware® ESX™ Server
Quad-Core AMD Opteron™ Processor (Barcelona) (2GHz) vs. AMD Opteron™ Processor Model 2222 SE (3GHz)
Configuration:
Quad-Core AMD Opteron™ Processor (Barcelona) Platform: 2 2GHz Quad-Core processors (4x512kB L2, 2MB-L3) in a 2 Socket AMD Reference Server with 16GB (8x2GB) Micron DDR2-667 on an experimental version of VMware® ESX™ Server
Dual-Core AMD Opteron™ Processor Platform: 2 AMD Opteron™ Processor Model 2222 SE (3GHz/2x1MB-L2) processors in a 2 Socket AMD Reference Server with 16GB (8x2GB) Micron DDR2-667 on VMware® ESX™ Server 3.0.1
Each Platform also contained:
  – 1 HP MSA1500 with 2 controllers and 28 HP 72GB 15kRPM Ultra320 SCSI drives
  – 1 Dual-port 4Gb Fibre Channel QLogic QLA2432
  – 1 10/100/1000 Gigabit Ethernet Intel EXPI9402PT PCI-e NIC
  – 1 Internal HP 73GB 15kRPM SAS drive for VMware® ESX™ Server
Workload: 4 2P SLES10 VMs of MySQL/SysBench (2.5GB Database per VM)

Dual-Core to Quad-Core Uplift
Dual-Core AMD Opteron™ 2200 Series vs. Quad-Core AMD Opteron Model 2350
2 Socket Performance Scaling
[Chart: x-axis 50 to 250; 100% = Dual-Core AMD Opteron Processor Performance]
  Benchmarks: VMmark (>124%), SPECint_rate2006, SPECfp_rate2006, Stream memory bandwidth, SPECompMbase2001
  Other uplift values shown: 59%, 57%, 49%, 23%, 17%
  Average Performance Increase: 54%
SPEC and the benchmark names SPECint, SPECfp and SPECOMPM are registered trademarks of the Standard Performance Evaluation Corporation. Benchmark results stated above for Dual-Core AMD Opteron™ processor Model 2222 reflect results published on www.spec.org as of Sep 9, 2007. The comparison presented above is based on results for Quad-Core AMD Opteron processor Model 2350 under submission to SPEC as of Sep 9, 2007. For the latest results visit http://www.spec.org/cpu2006/results/ and http://www.spec.org/omp/results/. Stream and VMmark results based on internal measurements at AMD performance labs.
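The “Average Performance Increase” on the uplift slide appears to be the arithmetic mean of the five benchmark uplifts. The sketch below checks that reading. The pairing of values to benchmark names is an assumption, because the chart layout did not survive in this copy; 59% also appears on the slide, and using it in place of 57% still rounds to 54%. VMmark is quoted as “>124%”, so the result is best read as a lower bound.

```python
from statistics import mean

# Uplift over dual-core, in percent. The value-to-benchmark pairing is
# assumed for illustration; the slide text does not confirm it.
uplift_pct = {
    "VMmark": 124,  # slide says ">124%", so this is a lower bound
    "SPECint_rate2006": 57,
    "SPECfp_rate2006": 49,
    "Stream memory bandwidth": 23,
    "SPECompMbase2001": 17,
}

average = mean(uplift_pct.values())
print(f"Average uplift: at least {average:.0f}%")

# On the chart's scale, 100% is dual-core performance.
for name, pct in uplift_pct.items():
    print(f"{name}: {100 + pct}% of dual-core")
```

Because the mean is dominated by the VMmark figure, the other four benchmarks on their own average noticeably lower, which is worth keeping in mind when quoting the single 54% number.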
EMBARGOED UNTIL SEPTEMBER 10

Rapid Virtualization Indexing Uplift
Quad-Core AMD Opteron™ Processor Model 2350
[Chart: y-axis 90 to 200; 100% = Without Rapid Virtualization Indexing; series: OLTP, Terminal Services; groups: VMware 3.5 Experimental, RHEL 5.1/Xen; uplift values shown: 94%, 23%, 14%]
Under Embargo until 12:01 am EDT, Sept. 10, 2007

Performance-Per-Watt Leadership
Quad-Core AMD Opteron™ Processor Model 2350 (75 Watt) vs. Intel Xeon 5345 (80 Watt, without Additional Watts of Memory Controller and FBDIMM)
[Chart: x-axis 50 to 190; 100% = Intel Xeon 5345]
  – Fluent 6.4.3 (sedan_4m): 67%
  – SPECompMBase2001: 36%
  – SPECfp_rate_base2006 (both on gcc): 30%
  – SPECfp_rate2006 (Intel compiler vs. PGI compiler): 27%
  – LSDyna 3 Vehicle Collision, SPECint_rate_base2006 (both on gcc), and SPECint_rate2006 (Intel compiler vs. PGI compiler): remaining values shown are 9%, 12%, and -5%
  – Average Performance Increase: 26%
SPEC and the benchmark names SPECint, SPECfp and SPECOMPM are registered trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of Sep 9, 2007. The comparison presented above is based on results for Quad-Core AMD Opteron processor Model 2350 and Xeon 5345 (specint_rate2006 gcc and SPECompM2001 base) under submission to SPEC as of Sep 9, 2007. For the latest results visit http://www.spec.org/cpu2006/results/. Fluent and LSDyna results based on internal measurements at AMD performance labs.

Quad-Core Frequency Scaling
Quad-Core AMD Opteron™ Processor Model 2360 SE (2.5 GHz) vs. Model 2350 (2 GHz)
[Chart: y-axis 90 to 120; 2 GHz = 100; groups: 2P, 4P; bars: SPECint_rate2006, SPECint_rate2006*, SPECfp_rate2006, SPECfp_rate2006*, VMmark; values shown: 18%, 18%, 18%, 12%, 13%]
* On PGI compiler
SPEC and the benchmark names SPECint and SPECfp are registered trademarks of the Standard Performance Evaluation Corporation. The comparison presented above is based on results for Quad-Core AMD Opteron™ processor Model 2350, Model 2360 SE, Model 8350, and Model 8360 SE under submission to SPEC as of Sep 9, 2007. For the latest results visit http://www.spec.org/cpu2006/results/. VMmark based on internal measurements at AMD performance labs.

• AMD Opteron™ Processor Update: “Barcelona” Changes the Game
• AMD Commercial Ecosystem

Expanding Ecosystem
Leading OEM Platforms… …regional choices… …the best in software partners… …and integration partners to put it all together

Platform Readiness 2003 vs. 2007
[Diagram labels: 1st Generation AMD Opteron™; AMD Validated Solutions]
Platforms shown: SC 1435, PowerEdge 2970, PowerEdge 6950, X4600, DL585, DL385, DL365, X2200, DL145, X4500, IBM eServer 325, BL465c, X4200/4100, BL685c, X2100, xw9400, U20 & U40 WS, Blade 6000, E-9422R, Blade 8000, E-9522R, E-9722R, X3455, E-9222T, X3655, X3755, XT3™, LS21, XT4, LS41, X630 S2, BladeFrame ES and EX, G5450

Engaging the AMD Software Ecosystem
AMD collaborates to ensure “Barcelona” compatibility at launch…
• AMD works with 300+ software and open source providers to develop compilers, tools and OSes optimized for our new generation of processors, and optimized drivers for our new commercial graphics
• Hundreds of software infrastructure providers now plan product roadmaps in collaboration with AMD
[Logo captions: 11G, FORTRAN, C++, R3, DX10, DB2, c Libraries, Graphics Software]
…while making it easy to optimize for full “Barcelona” benefit

Quad-Core AMD Opteron™ Processors
More than just four cores
  – Significant CPU Core Enhancements
  – Significant Cache Enhancements
Outstanding Performance
  – Native Quad-Core
    • For faster data sharing between cores
Optimal Virtualization
  – AMD Virtualization™ technology
    • Now with Rapid Virtualization Indexing for virtual environments
Investment Protection
  – Stable Platform
    • Socket F (1207) compatibility
    • Leverage existing platform infrastructures
    • Consistent thermal envelopes
Power Efficient
  – Performance/Watt leadership
    • Performance enhancements without increased power consumption
    • Unique power management innovations

Thank You! (English)
Dziękuję (Polish), Obrigado (Portuguese), Gracias (Spanish), Grazie (Italian), Merci (French), Danke (German)
Also labeled on the slide: Hebrew, Russian, Arabic, Traditional Chinese, Japanese, Korean, Simplified Chinese, Thai

Questions???
Trademark Attribution
AMD, the AMD Arrow logo, AMD PowerNow, AMD CoolCore, AMD Dual Dynamic Power Management, AMD Opteron and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners.
©2007 Advanced Micro Devices, Inc. All rights reserved.
July 26, 2007