III. Multicore Processors (5)

Dezső Sima
Spring 2007
(Ver. 2.0)
 Dezső Sima, 2007
10.3 IBM’s MC processors
• 10.3.1 POWER line
• 10.3.2 Cell BE
10.3.1 POWER line
• POWER4 (10/2001, 180 nm)
• POWER4+ (11/2002, 130 nm)
• POWER5 (5/2004, 130 nm)
• POWER5+ (10/2005, 90 nm)
• POWER6 (2007, 65 nm)
10.3.1 Evolution of IBM’s major RISC lines
Figure: The evolution of IBM’s major RISC lines (timeline 1988 to 2007)
Commercial computing (AS/400 line, later e-Server iSeries; OS/400):
  architectures: IMPI/48 (scalar CISC), PowerPC AS/64, PowerPC AS/64 ext.
  processors shown: A10, A30, A50, Northstar, Pulsar, SStar
Technical computing (RS/6000, later e-Server pSeries; AIX):
  POWER/32: POWER, PSC, POWER2, P2SC
  PowerPC/32: 601, 604, 604e
  PowerPC/64: Power3, Power3-II
PowerPC/64 ext.: POWER4, POWER4+, POWER5, POWER5+, POWER6
The figure annotates the lines with their superscalar generation (1st to 3rd).
Legend: Derived from; Upwards binary compatible extension; Transition
10.3.1 POWER4 (1)
Figure: POWER4 chip logical view
(labels recovered from the figure: Core Interface Unit (crossbar), Non-Cacheable Unit, Service Processor, Power-On Reset, Built-In Self-Test, Multi-Chip Module)
Source: Tendler J.M., Dodson S., Fields S., Le H., Sinharoy B., „Power4 System Microarchitecture,”
IBM J. Res. & Dev. Vol. 46, No. 1, Jan. 2002, pp. 5-25,
http://www.research.ibm.com/journal/rd/461/tendler.pdf
10.3.1 POWER4 (2)
Figure: Logical view of the L3 controller
Source: Power4 System Microarchitecture, Technical White Paper, 2001, IBM Corp.,
http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power4.pdf
10.3.1 POWER4 (3)
Figure: The memory controller of the POWER4
Source: Power4 System Microarchitecture, Technical White Paper, 2001, IBM Corp.,
http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power4.pdf
10.3.1 POWER4 (4)
Figure: I/O controller of the POWER4 (label recovered from the figure: Fabric Controller)
Source: Power4 System Microarchitecture, Technical White Paper, 2001, IBM Corp.,
http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power4.pdf
10.3.1 POWER4 (5)
Figure: POWER4 chip
Source: R. Kalla, B. Sinharoy, J. Tendler: Simultaneous Multi-threading Implementation in Power5 –
IBM’s Next Generation POWER Microprocessor, 2003
http://www.hotchips.org/archives/hc15/3_Tue/11.ibm.pdf
10.3.1 POWER4 (6)
POWER line            POWER4
Dual/Quad-Core        DC
Introduced            10/2001
Technology            180 nm
Die size              412 mm²
Nr. of transistors    174 million
fc [GHz]              1.3
L2 size/allocation    1.44 MB/shared
L2 implementation     On-chip
L3 size               32 MB
L3 implementation     Tags on-chip, data off-chip
Mem. contr.           Off-chip
TDP [W]               115/125
Packaging             SCM¹/MCM²
Dual threaded
Power management

¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management
Table: Main features of IBM’s dual-core POWER line
10.3.1 POWER4+ (1)
Figure: New features of the POWER4+
Source: Grassl C., „New IBM Components for HPCx”, Dec. 2003,
http://www.hpcx.ac.uk/about/events/annual2003/Grassl.pdf
10.3.1 POWER4+ (2)
POWER line            POWER4          POWER4+
Dual/Quad-Core        DC              DC
Introduced            10/2001         11/2002
Technology            180 nm          130 nm
Die size              412 mm²         380 mm²
Nr. of transistors    174 million     184 million
fc [GHz]              1.3             1.7
L2 size/allocation    1.44 MB/shared  1.5 MB/shared
L2 implementation     On-chip         On-chip
L3 size               32 MB           32 MB
L3 implementation     Tags on-chip, data off-chip
Mem. contr.           Off-chip        On-chip
TDP [W]               115/125         70
Packaging             SCM¹/MCM²       SCM¹/MCM²
Dual threaded
Power management

¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management
Table: Main features of IBM’s dual-core POWER line
10.3.1 POWER5 (1)
Figure 5.14: Contrasting POWER4 and POWER5 system structures
Source: Barney B., „IBM POWER Systems Overview”, Livermore Computing, 2006,
http://www.llnl.gov/computing/tutorials/ibm_sp/
10.3.1 POWER5 (2)
Figure: Block diagram of the POWER5 (1)
Source: Barney B., „IBM POWER Systems Overview”, Livermore Computing, 2006,
http://www.llnl.gov/computing/tutorials/ibm_sp/
10.3.1 POWER5 (3)
Figure: Block diagram of the POWER5 (2)
Source: Krewell K., „POWER5 Tops on Bandwidth”, Microprocessor Report, Dec. 2003,
http://studies.ac.upc.edu/ETSETB/SEGPAR/microprocessors/power5%20(2)%20(mpr).pdf
10.3.1 POWER5 (4)
Figure: Floorplan of the POWER5
Source: Sinharoy B., Kalla R.N., Tendler J.M., Eickenmeyer R.J., Joyner J.B., „POWER5 system microarchitecture,”
IBM J. R&D, Vol. 49, No. 4/5, 2005, pp. 505-521
10.3.1 POWER5 (6)
POWER4
180 nm, 412 mm2
POWER5
130 nm, 389 mm2 (enlarged)
Figure: Contrasting the floor plans of the POWER4 and POWER5 dies
Sources: R. Kalla, B. Sinharoy, J. Tendler: Simultaneous Multi-threading Implementation in Power5 –
IBM’s Next Generation POWER Microprocessor, 2003, http://www.hotchips.org/archives/hc15/3_Tue/11.ibm.pdf
Sinharoy B., Kalla R.N., Tendler J.M., Eickenmeyer R.J., Joyner J.B., „POWER5 system microarchitecture,”
IBM J. R&D, Vol. 49, No. 4/5, 2005, pp. 505-521
10.3.1 POWER5 (7)
POWER5+
Dual-Core Module
Figure: Packaging alternatives of the POWER4/5 processors
Source: Partridge R. and Ghatpande S., „IBM Introduces POWER5+ and Quad-Core Modules in System p5,”
Tech Trends Monthly, Nov./Dec. 2005
10.3.1 POWER5 (8)
POWER4 MCM Photo
32-way System Showing 4 MCMs and L3 Cache
Figure: Quad–Chip POWER4 module (MCM) and a 32-way POWER4 system
Source: Barney B., „IBM POWER Systems Overview”, Livermore Computing, 2006,
http://www.llnl.gov/computing/tutorials/ibm_sp/
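The "32-way" figure follows from the packaging hierarchy shown on this slide: each POWER4 chip is dual-core, an MCM carries four chips, and the pictured system holds four MCMs. A minimal Python sketch of that arithmetic (the function and parameter names are illustrative and do not come from the source):

```python
def system_ways(mcms: int, chips_per_mcm: int = 4, cores_per_chip: int = 2) -> int:
    """Number of processor cores ("ways") in a system built from MCMs."""
    return mcms * chips_per_mcm * cores_per_chip


# Four quad-chip MCMs, each chip being a dual-core POWER4
print(system_ways(4))  # 32
```

A single MCM on its own forms an 8-way building block under the same assumptions.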
10.3.1 POWER5 (9)
Figure: Interpretation of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCM) of the POWER5
Source: Barney B., „IBM POWER Systems Overview”, Livermore Computing, 2006,
http://www.llnl.gov/computing/tutorials/ibm_sp/
10.3.1 POWER5 (10)
Figure: Photos of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCM) of the POWER5
Source: Barney B., „IBM POWER Systems Overview”, Livermore Computing, 2006,
http://www.llnl.gov/computing/tutorials/ibm_sp/
10.3.1 POWER5 (11)
Figure: The Multi-chip module of the POWER5
Source: Kalla R., „IBM’s POWER5 Microprocessor Design and Methodology,” 2003,
www-csl.csres.utexas.edu/users/billmark/teach/cs352-05-spring/lectures/Lecture22-RonKallaIBM.pdf
10.3.1 POWER5 (12)
POWER line            POWER4          POWER4+         POWER5
Dual/Quad-Core        DC              DC              DC
Introduced            10/2001         11/2002         5/2004
Technology            180 nm          130 nm          130 nm
Die size              412 mm²         380 mm²         389 mm²
Nr. of transistors    174 million     184 million     276 million
fc [GHz]              1.3             1.7             1.65/1.9
L2 size/allocation    1.44 MB/shared  1.5 MB/shared   1.9 MB/shared
L2 implementation     On-chip         On-chip         On-chip
L3 size               32 MB           32 MB           36 MB
L3 implementation     Tags on-chip, data off-chip
Mem. contr.           Off-chip        On-chip         On-chip
TDP [W]               115/125         70              80 (est)
Packaging             SCM¹/MCM²       SCM¹/MCM²       DCM³/MCM²
Dual threaded
Power management                                      DPM⁶

¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management
Table: Main features of IBM’s dual-core POWER line
10.3.1 POWER5+ (1)
Figure: Block diagram of the POWER5+
Source: Vetter S. et al., „IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper,
IBM Corp. 2006, http://www.redbooks.ibm.com/redpapers/pdfs/redp4150.pdf
10.3.1 POWER5+ (2)
Figure: Dual-Core Modules (DCMs) and Quad-Core Modules (QCM) of the POWER5+
Source: Vetter S. et al., „IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper,
IBM Corp. 2006, http://www.redbooks.ibm.com/redpapers/pdfs/redp4150.pdf
10.3.1 POWER5+ (3)
POWER line            POWER4          POWER4+         POWER5          POWER5+
Dual/Quad-Core        DC              DC              DC              DC
Introduced            10/2001         11/2002         5/2004          10/2005
Technology            180 nm          130 nm          130 nm          90 nm
Die size              412 mm²         380 mm²         389 mm²         230 mm²
Nr. of transistors    174 million     184 million     276 million     276 million
fc [GHz]              1.3             1.7             1.65/1.9        1.92
L2 size/allocation    1.44 MB/shared  1.5 MB/shared   1.9 MB/shared   1.9 MB/shared
L2 implementation     On-chip         On-chip         On-chip         On-chip
L3 size               32 MB           32 MB           36 MB           36 MB
L3 implementation     Tags on-chip, data off-chip
Mem. contr.           Off-chip        On-chip         On-chip         On-chip
TDP [W]               115/125         70              80 (est)        70
Packaging             SCM¹/MCM²       SCM¹/MCM²       DCM³/MCM²       DCM⁴/QCM⁵
Dual threaded
Power management                                      DPM⁶            DPM⁶

¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management
Table: Main features of IBM’s dual-core POWER line
10.3.1 POWER6 (1)
POWER6
POWER5+
Hardware support of decimal arithmetic
Figure: Contrasting the block diagrams of the POWER5 and POWER6 processors
Source: Kanter D., „IBM Previews the Power6,” Oct. 2006, dkanter@realwordtech.com
10.3.1 POWER6 (2)
POWER line            POWER4          POWER4+         POWER5          POWER5+         POWER6
Dual/Quad-Core        DC              DC              DC              DC              DC
Introduced            10/2001         11/2002         5/2004          10/2005         2007
Technology            180 nm          130 nm          130 nm          90 nm           65 nm
Die size              412 mm²         380 mm²         389 mm²         230 mm²         341 mm²
Nr. of transistors    174 million     184 million     276 million     276 million     750 million
fc [GHz]              1.3             1.7             1.65/1.9        1.92            4-5
L2 size/allocation    1.44 MB/shared  1.5 MB/shared   1.9 MB/shared   1.9 MB/shared   2*4 MB/private
L2 implementation     On-chip         On-chip         On-chip         On-chip         On-chip
L3 size               32 MB           32 MB           36 MB           36 MB           64 MB?
L3 implementation     Tags on-chip, data off-chip
Mem. contr.           Off-chip        On-chip         On-chip         On-chip
TDP [W]               115/125         70              80 (est)        70              ~100
Packaging             SCM¹/MCM²       SCM¹/MCM²       DCM³/MCM²       DCM⁴/QCM⁵       n.a.
Dual threaded
Power management                                      DPM⁶            DPM⁶            n.a.

¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management
Table: Main features of IBM’s dual-core POWER line
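One way to read the die-size and transistor-count rows together is as a transistor density per generation. The following Python sketch computes it from the table's values; the dictionary layout and the names `power_line` and `density` are illustrative choices, not part of the source.

```python
# (die size in mm^2, transistor count in millions), values taken from the table
power_line = {
    "POWER4": (412, 174),
    "POWER4+": (380, 184),
    "POWER5": (389, 276),
    "POWER5+": (230, 276),
    "POWER6": (341, 750),
}


def density(die_mm2: float, transistors_millions: float) -> float:
    """Million transistors per square millimetre of die area."""
    return transistors_millions / die_mm2


densities = {name: density(*values) for name, values in power_line.items()}

for name, d in densities.items():
    print(f"{name:8s} {d:.2f} million transistors per mm^2")
```

With these figures the density rises with every generation, roughly fivefold from POWER4 to POWER6. The step from POWER5 to POWER5+ is a pure shrink: the transistor count is unchanged while the die area falls from 389 to 230 mm².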
10.3.2 Cell BE
• Cell BE (2/2006, 90 nm)
10.3.2 Cell BE (1)
Figure: The history and development cost of the Cell BE
Sources: Brochard L., „A Cell History,” Cell Workshop, April 2006,
http://www.irisa.fr/orap/Constructeurs/Cell/Cell%20Short%20Intro%20Luigi.pdf
Hofstee H. P., „Cell today and tomorrow,” 2005, http://www.stanford.edu/class/ee380/Abstracts/Cell_060222.pdf
10.3.2 Cell BE (2)
AUC: Atomic Update Cache
BIC: Bus Interface Contr.
EIB: Element Interface Bus
LS: Local Store of 256 KB
MFC: Memory Flow Controller
MIC: Memory Interface Contr.
PPE: Power Processing Element
PXU: POWER Execution Unit
SMF: Synergistic Memory Flow Unit
SPU: Synergistic Processor Unit
SXU: Synergistic Execution Unit
XDR: Rambus DRAM
Figure: Block diagram of the Cell BE
Source: Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006,
http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf
10.3.2 Cell BE (3)
Design parameters of the Cell BE:
PPE: dual-threaded
> 200 GFLOPS (SP)
> 20 GFLOPS (DP)
> 25 GB/s memory BW
> 75 GB/s I/O BW
> 300 GB/s EIB BW
fc > 4 GHz (lab)
Figure: Main design parameters of the Cell BE
Source: IBM „Cell Broadband Engine Overview,” Course Code L1T1H1-02, May 2006
publib.boulder.ibm.com/.../stgv1r0/topic/com.ibm.iea.cbe/cbe/1.0/Overview/L1T1H1_02_CellOverview.pdf
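The single-precision figure of more than 200 GFLOPS can be checked with a short peak-rate calculation. The sketch below is an estimate under stated assumptions, not IBM's own derivation: it counts only the 8 SPEs, takes 4 single-precision lanes per 128-bit SIMD operation, counts a fused multiply-add as 2 floating-point operations per lane per cycle, and uses the 3.2 GHz product clock. The function and parameter names are illustrative.

```python
def peak_sp_gflops(spes: int = 8,
                   simd_lanes: int = 4,       # 128-bit SIMD / 32-bit floats
                   flops_per_lane: int = 2,   # assumption: one fused multiply-add per cycle
                   clock_ghz: float = 3.2) -> float:
    """Theoretical peak single-precision rate of the SPEs in GFLOPS."""
    return spes * simd_lanes * flops_per_lane * clock_ghz


print(f"{peak_sp_gflops():.1f} GFLOPS at 3.2 GHz")
print(f"{peak_sp_gflops(clock_ghz=4.0):.1f} GFLOPS at 4.0 GHz")
```

Under these assumptions the SPEs alone reach about 204.8 GFLOPS at 3.2 GHz, which is consistent with the "> 200 GFLOPS (SP)" design parameter; the PPE's contribution is ignored here.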
10.3.2 Cell BE (4)
Figure 5.16: Cell SPE architecture
Source: Blachford N.: „Cell Architecture Explained Version 2”,
http://www.blachford.info/computer/Cell/Cell1_v2.html
10.3.2 Cell BE (5)
Figure: Block diagram of the SPE
Source: Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006,
http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf
10.3.2 Cell BE (6)
Figure: Pipeline stages of the Cell BE
Source: Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006,
http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf
10.3.2 Cell BE (7)
Figure: Floor plan of a single SPE
Source: Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006,
http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf
10.3.2 Cell BE (8)
Principle of operation of the Element Interface Bus (EIB)
Source: Keable C., „And we also have hardware...” 17th Machine Evaluation Workshop, Dec. 2006,
http://www.cse.clrc.ac.uk/disco/mew17/talks/Keable_IBM_MEW17.pdf
10.3.2 Cell BE (9)
Figure: The Element Interface Bus (EIB)
Source: Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006,
http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf
10.3.2 Cell BE (10)
Figure: The Synergistic Memory Flow unit (SMF)
Source: Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006,
http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf
10.3.2 Cell BE (11)
235 mm2
241 mtrs
Figure: Floor plan of the Cell BE processor
Source: Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006,
http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf
10.3.2 Cell BE (12)
Series                    Cell BE
Implementation            Heterogeneous, 1xPPE, 8*SPE
Architecture              PowerPC 2.02
Cores                     PPE: 64-bit RISC
                          SPE: Dual-issue 32-bit SIMD with 128 bit capability
Introduction              9/2006 (in the QS20 BladeCenter)
Technology                90 nm
Die size                  221 mm²
Nr. of transistors        234 million
fc [GHz]                  3.0/3.2
L2                        PPE: 512 KB
                          SPE: 256 KB Local Store (128*128 bit)
L3
Memory controller         On-chip
Interconnection network   Ring based
Memory bandwidth          25 GB/s
I/O bandwidth             Up to 75 GB/s
TDP [W]                   95 @ 3 GHz
Multithreading            PPE: 2-way
                          SPE:
Table: Main features of IBM’s Cell BE
10.3.2 Cell BE (13)
Figure: Cell BE Blade Roadmap
Source: Brochard L., „A Cell History,” Cell Workshop, April 2006,
http://www.irisa.fr/orap/Constructeurs/Cell/Cell%20Short%20Intro%20Luigi.pdf
10.3.2 Cell BE (14)
Figure: Roadmap of the Cell BE
Source: Hofstee H. P., „Real-time Supercomputing and Technology for Games and Entertainment,” 2006,
http://www.cercs.gatech.edu/docs/SC06_Cell_111606.pdf
10.3 Literature (1)
POWER4, POWER4+
Barney B., „IBM POWER Systems Overview”, Livermore Computing, 2006,
http://www.llnl.gov/computing/tutorials/ibm_sp/
DeMone P., „Sizing Up the Super Heavyweights,” Real World Technologies, Sept. 2004,
http://h21007.www2.hp.com/dspp/files/unprotected/Itanium/sizingsuperheavys.pdf
Grassl C., „New IBM Components for HPCx”, Dec. 2003,
http://www.hpcx.ac.uk/about/events/annual2003/Grassl.pdf
Krewell K., „IBM’s POWER4 Unveiling Continues”, Microprocessor Report, Nov. 20, 2000, pp. 1-4
Tendler, J.M., Dodson, S., Fields S., Le H., Sinharoy B.: Power4 System Microarchitecture, IBM Server,
Technical White Paper, October 2001
http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power4.pdf
Tendler, J.M., Dodson, S., Fields S., Le H., Sinharoy B.: Power4 System Microarchitecture,
IBM J. Res. & Dev. Vol. 46, No. 1, Jan. 2002, pp. 5-25,
http://www.research.ibm.com/journal/rd/461/tendler.pdf
POWER5, POWER5+
Barney B., „IBM POWER Systems Overview”, Livermore Computing, 2006,
http://www.llnl.gov/computing/tutorials/ibm_sp/
DeMone P., „Sizing Up the Super Heavyweights,” Real World Technologies, Sept. 2004,
http://h21007.www2.hp.com/dspp/files/unprotected/Itanium/sizingsuperheavys.pdf
Grassl C., „New IBM Components for HPCx”, Dec. 2003,
http://www.hpcx.ac.uk/about/events/annual2003/Grassl.pdf
Kalla R., „IBM’s POWER5 Microprocessor Design and Methodology,” 2003,
www-csl.csres.utexas.edu/users/billmark/teach/cs352-05-spring/lectures/Lecture22-RonKallaIBM.pdf
10.3 Literature (2)
POWER5, POWER5+ (cont.)
Kalla R., Sinharoy B., Tendler J.: Simultaneous Multi-threading Implementation in Power5 –
IBM’s Next Generation POWER Microprocessor, 2003
http://www.hotchips.org/archives/hc15/3_Tue/11.ibm.pdf
Krewell K., „POWER5 Tops on Bandwidth”, Microprocessor Report, Dec. 2003
http://studies.ac.upc.edu/ETSETB/SEGPAR/microprocessors/power5%20(2)%20(mpr).pdf
Sinharoy B., Kalla R.N., Tendler J.M., Eickenmeyer R.J., Joyner J.B., „POWER5 system microarchitecture,”
IBM J. R&D, Vol. 49, No. 4/5, 2005, pp. 505-521
Vetter S. et al., „IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper,
IBM Corp. 2006, http://www.redbooks.ibm.com/redpapers/pdfs/redp4150.pdf
POWER6
Kanter D., „IBM Previews the Power6,” Oct. 2006, dkanter@realwordtech.com
Cell BE
Blachford N.: „Cell Architecture Explained Version 2”,
http://www.blachford.info/computer/Cell/Cell1_v2.html
Brochard L., „A Cell History,” Cell Workshop, April 2006,
http://www.irisa.fr/orap/Constructeurs/Cell/Cell%20Short%20Intro%20Luigi.pdf
Day M. and Hofstee P., „Hardware and Software Architectures for the Cell Broadband Engine processor,”
CODES, Sept. 2006, http://www.casesconference.org/cases2005/pdf/Cell-tutorial.pdf
Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006,
http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf
10.3 Literature (3)
Cell BE (cont.)
Gschwind M., Hofstee H. P., Flachs B. K., Hopkins M., Watanabe Y., Yamazaki T.,
„Synergistic Processing in Cell's Multicore Architecture,” IEEE Micro, Vol. 26, No. 2, 2006, pp. 10-24
Hofstee H. P., „Real-time Supercomputing and Technology for Games and Entertainment,” 2006,
http://www.cercs.gatech.edu/docs/SC06_Cell_111606.pdf
Keable C., „And we also have hardware...” 17th Machine Evaluation Workshop, Dec. 2006,
http://www.cse.clrc.ac.uk/disco/mew17/talks/Keable_IBM_MEW17.pdf
Krolak D., „Unleashing the Cell Broadband Engine Processor,” MPR Fall Proc. Forum, Nov. 2005,
http://www-128.ibm.com/developerworks/power/library/pa-fpfeib/?ca=dgr-lnxwCellConnects
Krewell K., „Cell Moves Into The Limelight,” Microprocessor Report, Febr. 14 2005, pp. 1-9
Solie, D., „Technology Trends Presentation,” Power Symposium, Aug. 2006,
http://www-03.ibm.com/procurement/proweb.nsf/objectdocswebview/
file14+-+darryl+solie+-+ibm+power+symposium+presentation/$file/
14+-+darryl+solie-ibm-power+symposium+presentation+v2.pdf
- „Cell Broadband Engine processor-based systems,” White Paper, IBM Corp., 2006