IBM Marketing
AMD Multi-Core Processor Applications and Development Trends
Kevin Lai
Commercial Marketing Manager
Oct. 30, 2007
© 2006 IBM Corporation
AMD Opteron™ Processor Update
“Barcelona” Changes the Game
AMD Commercial Ecosystem
IBM Confidential
The AMD “Barcelona” Advantage
More Than Just Four Cores
•Common Core Strategy
•Independent Dynamic Core Technology
•Same Socket Infrastructure
•Dual Dynamic Power Management™
•CoolCore™ Technology
•Direct Connect Architecture
•Memory Optimizer Technology
•Rapid Virtualization Indexing
•AMD Balanced Smart Cache
•AMD-V™ Extended Migration
•Wide Floating Point Accelerator
Application Cycles Favor AMD Platforms
• AMD’s common core strategy and longer lifecycles are a better match for customers deploying enterprise apps
[Timeline, months 0 to 42: Evaluate, Deploy, Manage and maintain]
Applications need long lifecycles
Socket F (1207) Platform: Customers who deployed the original Rev F platforms have a stable lifecycle for their applications
Tick, Tock, Tick, Tock: Customers who deployed with Intel may suffer an inconsistent application platform through Intel’s “Tick Tock” strategy
Quad-Core Upgradeability
• Customers with existing Socket F (1207) systems* should be able to easily upgrade to quad core
Socket F (1207) systemboard* + updated BIOS with quad-core support + Quad-Core AMD Opteron processors + existing thermal solution (heatsink/fans)
* Systemboard must adhere to AMD design guidelines
AMD Power Efficiency Innovation
Independent Dynamic Core Technology
AMD CoolCore™ Technology
Dual Dynamic Power Management™
Low-Power DDR2 Memory
Same Power And Thermal Envelopes As Dual-Core!
Improving Processor Power Management
with Enhanced AMD PowerNow!™
Dual-core (example: CORE 0 at 75%, CORE 1 at 35%)
• MHz and voltage are locked to the highest utilized core’s p-state
Multi-chip Module
• MHz is set to the highest utilized core’s p-state within each dual-core die. Voltage is locked to the highest utilized core’s p-state in the package
Native Quad-core (example: CORE 0 at 75%, CORE 1 at 35%, CORE 2 at 10%, CORE 3 at 1%)
• MHz is adjusted independently per core. Voltage is locked to the highest utilized core’s p-state
Native Quad-Core technology enables enhanced power
management across all four cores
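The three policies can be compared with a small sketch. This is a toy model, not AMD's implementation: the utilization thresholds, the p-state numbering (0 = fastest) and all function names are assumptions made for illustration.

```python
# Toy model of the three frequency/voltage policies described on the slide.
# Thresholds and p-state numbering (0 = fastest) are illustrative assumptions.

def pstate_for(utilization):
    """Map a core's utilization (0-100) to a hypothetical p-state."""
    if utilization >= 60:
        return 0
    if utilization >= 25:
        return 1
    if utilization >= 5:
        return 2
    return 3

def dual_core(utils):
    """Frequency and voltage both follow the busiest core."""
    top = min(pstate_for(u) for u in utils)
    return {"freq": [top] * len(utils), "volt": top}

def multi_chip_module(utils):
    """Frequency follows the busiest core on each dual-core die;
    voltage follows the busiest core in the whole package."""
    freq = []
    for i in range(0, len(utils), 2):
        die = utils[i:i + 2]
        die_top = min(pstate_for(u) for u in die)
        freq.extend([die_top] * len(die))
    return {"freq": freq, "volt": min(freq)}

def native_quad_core(utils):
    """Frequency is chosen per core; voltage follows the busiest core."""
    freq = [pstate_for(u) for u in utils]
    return {"freq": freq, "volt": min(freq)}

utils = [75, 35, 10, 1]
print("multi-chip module:", multi_chip_module(utils))
print("native quad-core: ", native_quad_core(utils))
```

In this model only the native quad-core policy lets the nearly idle cores drop to their own slower p-states, while the voltage stays tied to the busiest core in every case.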
AMD CoolCore™ Technology
Turns off Blocks of CPU When Not in Use
• Coarse Control (Core)
– Ex, FPU (hottest part of die)
• Fine Control (Core)
– Incrementally Smaller Sections
• Memory Controller
– Reads (turn off write logic)
– Writes (turn off read logic)
[Diagram: Core 1 to Core 4 with FPU, L1 and L2 blocks, plus L3 and Memory Controller]
Example only: does not reflect actual areas of clock gating
AMD CoolCore™ is Automatic – No Drivers Needed!
Dual Dynamic Power Management™ (DDPM)
• Separate power planes for cores and memory controller for:
– Optimum power consumption - Enables cores to operate at reduced power consumption levels while memory controller continues to run at full speed
– Increased performance - Memory controller can operate at higher frequency for increased bandwidth and performance
Unified Plane Systemboard
DDPM Systemboard
Memory Power Measurements
• Enormous power penalties using FBDIMM at higher capacities
DDR2 vs. FBDIMM Average Power Consumption for 8x DIMMs (1GB DDR2 vs. 1GB FBDIMM)

Workload            8x DDR2 (AMD)    8x FBDIMM (Intel)
IDLE Power          14.32 W          83.34 W
SPECcpu2000¹ FP     33.68 W          95.49 W
SPECcpu2000¹ INT    29.24 W          90.21 W
SPECjbb2005         36.94 W          101.2 W

• FBDIMMs consume over 100 watts at the highest measured LOAD vs. only ~37 watts for DDR2
• FBDIMM consumes ~83 watts during IDLE
• DDR2 consumes ~60 watts less!
1. SPECcpu2006-based results are in development
1GB DDR2-667 DIMM: Brand: Micron; Model: MT18HTF12872Y-667D6
1GB 667 FB-DIMM: Brand: ATP; Model: AP28K72S8BHE6S
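The "~60 watts less" claim can be checked against the measured values with a short sketch. The wattages are copied from the slide; the pairing of the two SPECcpu2000 rows with FP and INT is an assumption, since the chart layout did not survive intact.

```python
# Average power for 8 DIMMs in watts, copied from the slide's measurements.
# Each entry is (DDR2, FBDIMM). The FP/INT row assignment is an assumption.
measurements = {
    "IDLE": (14.32, 83.34),
    "SPECcpu2000 FP": (33.68, 95.49),
    "SPECcpu2000 INT": (29.24, 90.21),
    "SPECjbb2005": (36.94, 101.2),
}

def savings(ddr2_watts, fbdimm_watts):
    """Watts saved by DDR2 relative to FBDIMM for one workload."""
    return round(fbdimm_watts - ddr2_watts, 2)

deltas = {name: savings(*pair) for name, pair in measurements.items()}
for name, delta in deltas.items():
    print(f"{name}: DDR2 draws {delta} W less than FBDIMM")
```

Every workload in the table shows a gap of at least 60 W, with the largest gap at idle.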
Performance-Per-Watt Scalability
[Chart: Power (CPU Watts), Performance and Performance-Per-Watt for Single Core (2003), Dual Core (2005) and Quad Core (2007)]
Consistent power and thermals help deliver better performance per watt
“Barcelona” Sets New
Performance-Per-Watt Standard
• RESULT: 25% performance advantage; more than 30% performance-per-watt advantage
[Chart: SPECfp_rate2006 peak score on Linux, y-axis 0 to 150: Quad-Core AMD Opteron™ Processor 2GHz* 69.5 vs. Xeon 5345 54; relative Perf/Watt 1.3 vs. 1.0]
2P servers: Barcelona (2.0 GHz, 95-watt) vs. Xeon 5345 (2.33 GHz, 1333 MHz FSB, 80-watt); 8 DIMMs of memory; SUSE Linux Enterprise Server 10
SPEC and the benchmark name SPECfp_rate2006 are registered trademarks of the Standard Performance Evaluation Corporation. Results for Xeon 5345 are valid as of July 19, 2007. For latest scores visit www.spec.org
*Performance based on AMD estimated results of Quad-Core AMD Opteron™ processor at 2 GHz
The AMD Platform Power Estimator
www.amd.com/powercalculator
AMD Performance Innovation
AMD Wide Floating-Point Accelerator
AMD Memory Optimizer Technology
Dual Dynamic Power Management™
AMD Balanced Smart Cache
Comprehensive Performance Enhancements!
[Chart: Dual-Core 100%, Quad-Core ~150%]
AMD Wide Floating-Point Accelerator
Significantly Improved Floating-Point Performance vs. Rev F
Very Competitive vs. Intel ‘Core’ Architecture
• SSE Execution Width: 128-bits (2x Rev F)
• Instruction Fetch Bandwidth: 32 bytes/cycle + misaligned Ops (2x Rev F, 2x ‘Core’)
• Data Cache Load Bandwidth: 2x 128-bits loads/cycle (2x Rev F, 2x ‘Core’)
• L2/NB Bandwidth: 128-bits/cycle (2x Rev F)
• Floating-Point Schedule Depth: 36 dedicated 128-bit Ops (Core has 32 entry shared w/Integer)
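To put the per-cycle figures above in more familiar units, the sketch below converts bits per cycle into peak GB/s. The 2.0 GHz clock is an assumption, borrowed from the 2 GHz "Barcelona" parts quoted elsewhere in this deck. These are theoretical peaks per core, not measured results.

```python
# Convert per-cycle widths from the slide into peak bandwidth.
# CLOCK_HZ is an assumed 2.0 GHz core clock, not a figure from this slide.
CLOCK_HZ = 2.0e9

def gb_per_s(bits_per_cycle, clock_hz=CLOCK_HZ):
    """Peak bandwidth in GB/s for a path moving bits_per_cycle every cycle."""
    bytes_per_cycle = bits_per_cycle / 8
    return bytes_per_cycle * clock_hz / 1e9

data_cache_load = gb_per_s(2 * 128)   # two 128-bit loads per cycle
instruction_fetch = gb_per_s(32 * 8)  # 32 bytes per cycle
l2_nb = gb_per_s(128)                 # 128 bits per cycle

print(f"Data cache load:   {data_cache_load:.0f} GB/s")
print(f"Instruction fetch: {instruction_fetch:.0f} GB/s")
print(f"L2/NB:             {l2_nb:.0f} GB/s")
```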
AMD Memory Optimizer Technology
Comprehensive Updates to our Integrated Memory Controller
Designed for Quad-Core Performance
[Chart: Increasing Memory Bandwidth. Dual-Core AMD Opteron™ Processor with DDR2 = 100%; “Barcelona” ~140% to ~150%]
Independent Memory Channels (2x More)1
• 2x available memory controllers for more bandwidth
Larger Memory Buffers (~2-4x More)1
• Better optimized for DDR2 data rates
Write Bursting
• Reduced Read/Write transition = greater bandwidth
Better Optimized DRAM Paging
• Smarter algorithm helps improve bandwidth
DRAM Prefetcher
• Intelligently predicts and fetches data needed from main memory; doesn’t pollute cache hierarchy
Core Prefetchers
• Data fetched directly to L1 cache; ~5ns lower latency1 and spares L2 bandwidth
1. Compared to same-frequency Second-Generation AMD Opteron processors.
Improved Performance
with Dual Dynamic Power Management™
Increases Memory Bandwidth for Better Performance
Without Dual Dynamic Power Management (shared power)
Barcelona: Cores 2.0GHz, Memory Controller 1.6GHz, Standard Power
•Power delivery must be shared between the cores and the memory controller
•Doesn’t allow voltage changes for the cores
With Dual Dynamic Power Management (dedicated power)
Barcelona: Cores 2.0GHz, Memory Controller 1.8GHz, Standard Power
•Power delivery is dedicated to the cores, which allows for voltage changes
•Dedicated current for memory controller allows another 200MHz for increased bandwidth and performance
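As a quick worked example, the extra 200MHz can be expressed as a relative memory controller clock increase. The frequencies come from the slide; treating bandwidth as proportional to clock would be an assumption, so the sketch reports only the clock uplift.

```python
# Memory controller clocks from the slide, in MHz.
shared_plane_mhz = 1600     # without Dual Dynamic Power Management
dedicated_plane_mhz = 1800  # with Dual Dynamic Power Management

extra_mhz = dedicated_plane_mhz - shared_plane_mhz
uplift_pct = 100 * extra_mhz / shared_plane_mhz

print(f"Extra memory controller clock: {extra_mhz} MHz ({uplift_pct}% higher)")
```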
AMD Balanced Smart Cache
Better Support for Multi-threaded Environments
[Diagram: Cores 1 to 4, each with its own L1 and L2, sharing an L3 and an Integrated Memory Controller]
• Core 1 is running a large workload (>4MB), so it needs the whole L3 cache and access to main memory
• But Cores 2, 3, and 4 are still able to run smaller workloads
[Diagram: Cores 1 to 4, each with its own L1, each pair sharing an L2, connected over a Front Side Bus to an External Memory Controller]
• Core 1 is running a large workload (>4MB), so it needs the whole L2 cache and access to main memory
• So Core 2 can’t do any work (this is called “thrashing”)
• Same can happen between cores 3 and 4 (more thrashing)
AMD Virtualization™ Leadership
High Performing
• Direct Connect Architecture
• Rapid Virtualization Indexing
• Tagged TLB
Highly Secure
• DEV (Device Exclusion Vector)
Supported in Software
• AMD-V™ Extended Migration
• Unmodified Guest OS Support
• Robust Software Ecosystem
Host More Virtual Machines per System!
AMD Virtualization™ versus Intel VT
Intel VT: Shared memory can create bottlenecks
[Diagram: four CPUs, each hosting VMs, sharing one Memory Controller Hub]
• Shared front-side bus can decrease application performance within a virtual machine
• Untagged TLB means less efficient switching between virtual machines
• Software-based memory management and security (via external Memory Controller Hub) can reduce overall virtualization performance and efficiency
AMD Virtualization™: Dedicated memory for scalability
[Diagram: four CPUs, each hosting VMs, each with its own Memory Controller]
• Direct Connect Architecture helps improve application performance within a virtual machine
• Tagged TLB means more efficient switching between virtual machines
• Hardware-based memory management and security (Integrated memory controller with DEV) can improve overall virtualization performance and efficiency
Device Exclusion Vector (DEV)
[Diagram: VM 1 to VM 6 run on the Hypervisor (VMM) above Core 1 and Core 2; the Memory Controller holds the DEV Table; HT 1, HT 2 and HT 3 links]
• DEV lets the Hypervisor (VMM) know if a device is allowed to access a page of memory or not
• So DEV improves virtualization security by denying memory accesses for unauthorized requests
• For example:
DEV Table, Pages Owned (one row per VM, VM 1 to VM 6):
3, 9, 15, 20, 27
4, 7, 13, 22, 25
8, 12, 19, 21, 30
2, 10, 16, 23, 26
6, 14, 17, 18, 24
1, 5, 11, 28, 29
– Requests page 25, access is granted … quickly
– VM 5 requests page 28, access is denied … quickly
• Xeon can do this, but it happens in software … so it happens slower
• Only processors with an Integrated Memory Controller offer this benefit
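The grant/deny example can be sketched as a simple ownership lookup. This is a toy model of the idea only, not the hardware mechanism. The page sets are copied from the slide, but assigning them to VM 1 through VM 6 in the order listed is an assumption, as is the choice of VM 2 as the requester for page 25.

```python
# Page sets copied from the slide. Mapping them to VM 1..6 in listed
# order is an assumption; the extracted slide does not confirm it.
PAGES_OWNED = {
    "VM 1": {3, 9, 15, 20, 27},
    "VM 2": {4, 7, 13, 22, 25},
    "VM 3": {8, 12, 19, 21, 30},
    "VM 4": {2, 10, 16, 23, 26},
    "VM 5": {6, 14, 17, 18, 24},
    "VM 6": {1, 5, 11, 28, 29},
}

def access_allowed(requester, page):
    """Grant access only if the requester owns the page."""
    return page in PAGES_OWNED.get(requester, set())

for requester, page in [("VM 2", 25), ("VM 5", 28)]:
    verdict = "granted" if access_allowed(requester, page) else "denied"
    print(f"{requester} requests page {page}: access is {verdict}")
```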
Tagged Translation Look-aside Buffer (TLB)
• TLB is a table in the CPU that contains cross-references between the virtual
and real addresses of recently referenced pages of memory
• “Tagged” means the CPU knows which data belong to a virtual
machine
• So, for example:
AMD Opteron (Tagged TLB)
– VM 1 runs on the CPU and loads additional data from memory
– As VM 3 takes control and loads its data, other TLB data remains
– So when VM 1 takes control back the data it needs is there … resulting in better performance
Intel Xeon (Un-Tagged TLB, Front-side Bus, external Memory Controller)
• Xeon’s VMM must flush the TLB on every switch, hurting performance
– VMM Control: VM 4: Fill; VM 6: Flush, Fill; VM 4: Flush, Fill
[Diagram: VM 1 to VM 6 on the Hypervisor (VMM); TLB cache lines; HT 1, HT 2 and HT 3 links]
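The fill/flush sequence can be illustrated with a toy simulation that counts TLB misses for the same VM schedule with and without tagging. It assumes an unbounded TLB and made-up working sets, so the counts show the effect, not real hardware behavior.

```python
def count_misses(schedule, working_set, tagged):
    """Count TLB misses across a sequence of VM switches.

    tagged=True keeps every VM's entries across switches;
    tagged=False flushes the whole TLB whenever a different VM takes over.
    """
    tlb = set()
    misses = 0
    current = None
    for vm in schedule:
        if vm != current and not tagged:
            tlb.clear()  # untagged: the VMM must flush on a switch
        current = vm
        for page in working_set[vm]:
            entry = (vm, page)
            if entry not in tlb:
                misses += 1  # fill the entry
                tlb.add(entry)
    return misses

# Hypothetical schedule and working sets, mirroring VM 4 -> VM 6 -> VM 4.
schedule = ["VM 4", "VM 6", "VM 4"]
working_set = {"VM 4": [1, 2, 3], "VM 6": [7, 8]}

tagged_misses = count_misses(schedule, working_set, tagged=True)
untagged_misses = count_misses(schedule, working_set, tagged=False)
print(f"tagged TLB misses:   {tagged_misses}")
print(f"untagged TLB misses: {untagged_misses}")
```

When VM 4 runs the second time, the tagged TLB still holds its entries, so only the untagged case pays for refilling them.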
Rapid Virtualization Indexing
Translating Virtual to Physical Memory
Without Virtualization (Virtual Memory to Physical Memory)
– Translations take place in: Hardware (in CPU silicon)
– Translations are stored in: Hardware (in TLB)
With Virtualization (VM1: Virtual Memory 1, VM2: Virtual Memory 2, to Physical Memory)
Shadow Page Tables
– Translations take place in: Software (in Hypervisor)
– Translations are stored in: Virtual Memory (DRAM or disk)
Rapid Virtualization Indexing
– Translations take place in: Hardware (in CPU silicon)
– Translations are stored in: Hardware (in guest TLB)
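The two approaches can be contrasted in a small sketch. With virtualization, each address needs two mappings: guest virtual to guest physical, then guest physical to host physical. The tables, page numbers and function names below are hypothetical, and plain dictionaries stand in for what is really done by the hypervisor or the CPU.

```python
# Hypothetical page mappings for two VMs.
# guest virtual page -> guest physical page (maintained by each guest OS)
guest_page_table = {
    "VM1": {0: 10, 1: 11},
    "VM2": {0: 10, 1: 12},
}
# guest physical page -> host physical page (maintained by the hypervisor)
host_page_table = {
    "VM1": {10: 100, 11: 101},
    "VM2": {10: 200, 12: 201},
}

def translate_two_level(vm, guest_virtual_page):
    """Walk both mappings in turn, as hardware-assisted translation would."""
    guest_physical = guest_page_table[vm][guest_virtual_page]
    return host_page_table[vm][guest_physical]

def build_shadow_table(vm):
    """Precompute guest virtual -> host physical, as a hypervisor does in
    software when it maintains shadow page tables."""
    return {
        gv: host_page_table[vm][gp]
        for gv, gp in guest_page_table[vm].items()
    }

print(translate_two_level("VM1", 1))
print(build_shadow_table("VM2"))
```

Both paths give the same answer. The difference the slide highlights is where the work happens: the shadow table must be rebuilt in software whenever a guest changes its mappings, while the two-level walk is done in hardware.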
AMD Virtualization Advantages
[Comparison graphic rating Intel and AMD as Good, Better or Best on Performance, Security and Software Support]
Intel
• VT
AMD
• Performance: AMD-V™, Direct Connect Architecture, Tagged TLB, Rapid Virtualization Indexing
• Security: DEV
• Software Support: live migration of single-core to two generations of dual-core supported; live migration of latest dual-core to quad-core supported
AMD leadership in all relevant aspects of virtualization
Database Performance on VMware® ESX™ Server
Quad Core AMD Opteron™ Processor (Barcelona) (2GHz)
vs. AMD Opteron™ Processor Model 2222 SE (3GHz)
Configuration:
Quad-Core AMD Opteron™ Processor (Barcelona) Platform:
2 2GHz Quad-Core processors (4x512kB L2, 2MB-L3) in a 2 Socket AMD Reference Server with 16GB (8x2GB) Micron DDR2-667
on an experimental version of VMware® ESX™ Server
Dual-Core AMD Opteron™ Processor Platform:
2 AMD Opteron™ Processor Model 2222 SE (3GHz/2x1MB-L2) processors in a 2 Socket AMD Reference Server with 16GB (8x2GB) Micron DDR2-667
on VMware® ESX™ Server 3.0.1
Each Platform also contained:
1 HP MSA1500 with 2 controllers and 28 HP 72GB 15kRPM Ultra320 SCSI drives
1 Dual-port 4Gb Fibre Channel QLogic QLA2432
1 10/100/1000 Gigabit Ethernet Intel EXPI9402PT PCI-e NIC
1 Internal HP 73GB 15kRPM SAS drive for VMware® ESX™ Server
Workload: 4 2P SLES10 VMs of MySQL/SysBench (2.5GB Database per VM)
Dual-Core to Quad-Core Uplift
Dual-Core AMD Opteron™ 2200 Series vs. Quad-Core AMD Opteron Model 2350
2 Socket Performance Scaling
[Chart: quad-core uplift over dual-core, x-axis 50 to 250, 100% = Dual-Core AMD Opteron Processor Performance. Data labels: >124% (VMmark), 57% and 59% (SPECint_rate2006), 49% (SPECfp_rate2006), 23% (Stream memory bandwidth), 17% (SPECompMbase2001); 54% Average Performance Increase]
SPEC and the benchmark name SPECint, SPECfp and SPECOMPM are registered trademarks of the Standard Performance Evaluation Corporation. Benchmark results stated above for Dual-Core AMD Opteron™ processor Model 2222 reflect results published on www.spec.org as of Sep 9, 2007. The comparison presented above is based on results for Quad-Core AMD Opteron processor Model 2350 under submission to SPEC as of Sep 9, 2007. For the latest results visit http://www.spec.org/cpu2006/results/ and http://www.spec.org/omp/results/. Stream and VMmark results based on internal measurements at AMD performance labs.
EMBARGOED UNTIL SEPTEMBER 10
Rapid Virtualization Indexing Uplift
Quad-Core AMD Opteron™ Processor Model 2350
[Chart: uplift with Rapid Virtualization Indexing for OLTP and Terminal Services workloads on VMware 3.5 Experimental and RHEL 5.1/Xen, y-axis 90 to 200, 100% = Without Rapid Virtualization Indexing. Data labels: 94%, 23%, 14%]
Under Embargo until 12:01 am EDT, Sept. 10, 2007
Performance-Per-Watt Leadership
Quad-Core AMD Opteron™ Processor Model 2350 (75 Watt) vs. Intel Xeon 5345
(80 Watt, without Additional Watts of Memory Controller and FBDIMM)
[Chart: Model 2350 relative to Intel Xeon 5345, x-axis 50 to 190, 100% = Intel Xeon 5345. Benchmarks: Fluent 6.4.3 (sedan_4m); SPECompMbase2001; SPECfp_rate_base2006 (both on gcc); SPECfp_rate2006 (Intel compiler vs. PGI compiler); LSDyna 3 Vehicle Collision; SPECint_rate_base2006 (both on gcc); SPECint_rate2006 (Intel compiler vs. PGI compiler). Data labels: 67%, 36%, 30%, 27%, 9%, 12%, -5%; 26% Average Performance Increase]
SPEC and the benchmark name SPECint, SPECfp and SPECOMPM are registered trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of Sep 9, 2007. The comparison presented above is based on results for Quad-Core AMD Opteron processor Model 2350 and Xeon 5345 (SPECint_rate2006 gcc and SPECompM2001 base) under submission to SPEC as of Sep 9, 2007. For the latest results visit http://www.spec.org/cpu2006/results/. Fluent and LSDyna results based on internal measurements at AMD performance labs.
Quad-Core Frequency Scaling
Quad-Core AMD Opteron™ Processor Model 2360 SE (2.5 GHz) vs. Model 2350 (2 GHz)
[Chart: Model 2360 SE uplift over Model 2350, y-axis 90 to 120, 2 GHz = 100, 2P and 4P results. Benchmarks: SPECint_rate2006, SPECint_rate2006*, SPECfp_rate2006, SPECfp_rate2006*, VMmark. Data labels: 18%, 18%, 18%, 12%, 13%]
* On PGI compiler
SPEC and the benchmark name SPECint and SPECfp are registered trademarks of the Standard Performance Evaluation Corporation. The comparison presented above is based on results for Quad-Core AMD Opteron™ processor Model 2350, Model 2360 SE, Model 8350, and Model 8360 SE under submission to SPEC as of Sep 9, 2007. For the latest results visit http://www.spec.org/cpu2006/results/. VMmark based on internal measurements at AMD performance labs.
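One way to read these uplifts is as scaling efficiency: the fraction of the clock increase that shows up as benchmark gain. The sketch below uses the two clock speeds and the chart's data labels. Which label belongs to which benchmark is not assumed here, and "scaling efficiency" is a term introduced for this illustration rather than one used in the deck.

```python
# Clock speeds from the slide title.
base_ghz = 2.0  # Model 2350
fast_ghz = 2.5  # Model 2360 SE
freq_increase_pct = 100 * (fast_ghz - base_ghz) / base_ghz

# Uplift labels as they appear in the chart, in percent.
chart_labels_pct = [18, 18, 18, 12, 13]

# Fraction of the clock increase realized as performance gain.
efficiency = [round(u / freq_increase_pct, 2) for u in chart_labels_pct]

print(f"Clock increase: {freq_increase_pct}%")
for uplift, eff in zip(chart_labels_pct, efficiency):
    print(f"{uplift}% uplift -> {eff:.2f} of the clock increase")
```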
AMD Opteron™ Processor Update
“Barcelona” Changes the Game
AMD Commercial Ecosystem
Expanding Ecosystem
Leading OEM Platforms…
…regional choices…
…the best in software partners…
…and integration partners to put it all together
Platform Readiness
2003 vs. 2007
SC 1435
PowerEdge
2970
PowerEdge
6950
X4600
DL585
DL385
DL365
X2200
DL145
X4500
IBM eServer 325
BL465c
X4200/4100
BL685c
X2100
xw9400
U20 & U40 WS
Blade 6000
E-9422R
Blade 8000
E-9522R
E-9722R
X3455
E-9222T
X3655
X3755
XT3™
LS21
1st Generation
AMD Opteron™
XT4
LS41
AMD
Validated
Solutions
X630 S2
BladeFrame
ES and EX
G5450
Engaging the AMD Software Ecosystem
AMD collaborates to ensure “Barcelona” compatibility at launch…
AMD works with 300+ software and open source providers to develop compilers, tools and OSes optimized for our new generation of processors, and optimized drivers for our new commercial graphics
~Hundreds of software infrastructure providers now plan product roadmaps in collaboration with AMD
[Graphic labels: 11G, FORTRAN, C++, R3, DX10, DB2, c, Libraries, Graphics, Software]
…while making it easy to optimize for full “Barcelona” benefit
Quad-Core AMD Opteron™ Processors
• More than just four cores
– Significant CPU Core Enhancements
– Significant Cache Enhancements
• Outstanding Performance
– Native Quad-Core
  • For faster data sharing between cores
• Optimal Virtualization
– AMD Virtualization™ technology
  • Now with Rapid Virtualization Indexing for virtual environments
• Investment Protection
– Stable Platform
  • Socket F (1207) compatibility
  • Leverage existing platform infrastructures
  • Consistent thermal envelopes
• Power Efficient
– Performance/Watt leadership
  • Performance enhancements without increased power consumption
  • Unique power management innovations
Thank You! (English)
Dziękuję (Polish)
Obrigado (Portuguese)
Gracias (Spanish)
Grazie (Italian)
Merci (French)
Danke (German)
[Also shown in Hebrew, Russian, Arabic, Traditional Chinese, Simplified Chinese, Japanese, Korean and Thai]
Questions???
Trademark Attribution
AMD, the AMD Arrow logo, AMD PowerNow, AMD CoolCore, AMD Dual Dynamic Power Management, AMD Opteron and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners.
©2007 Advanced Micro Devices, Inc. All rights reserved.
July 26, 2007