III. Multicore Processors (5)
Dezső Sima
Spring 2007 (Ver. 2.0)

10.3 IBM’s MC processors
• 10.3.1 POWER line
• 10.3.2 Cell BE

10.3 IBM’s MC processors
10.3.1 POWER line
• POWER4    10/2001   180 nm
• POWER4+   11/2002   130 nm
• POWER5    5/2004    130 nm
• POWER5+   10/2005   90 nm
• POWER6    2007      65 nm

10.3.1 Evolution of IBM’s major RISC lines
Figure: The evolution of IBM’s major RISC lines
(Timeline 1988-2007. Commercial computing: AS/400 line under OS/400, later e-Server iSeries. Technical computing: RS/6000 under AIX, later e-Server pSeries. Architectures shown: IMPI/48 (scalar CISC), PowerPC AS/64, PowerPC AS/64 ext., POWER/32, PowerPC/32, PowerPC/64, PowerPC/64 ext. Processors shown: A10, A30, A50, Northstar, Pulsar, SStar, POWER, PSC, POWER2, P2SC, 601, 604, 604e, Power3, Power3-II, POWER4, POWER4+, POWER5, POWER5+, POWER6. Legend: derived from; upwards binary compatible extension; transition.)

10.3.1 POWER4 (1)
Figure: POWER4 chip logical view
Labels: Service Processor; Core Interface Unit (crossbar); Power On Reset; Built-In Self-Test; Non-Cacheable Unit; Multi-Chip Module
Source: Tendler J.M., Dodson S., Fields S., Le H., Sinharoy B., “Power4 System Microarchitecture,” IBM J. Res. & Dev., Vol. 46, No. 1, Jan. 2002, pp. 5-25, http://www.research.ibm.com/journal/rd/461/tendler.pdf

10.3.1 POWER4 (2)
Figure: Logical view of the L3 controller
Source: “Power4 System Microarchitecture,” Technical White Paper, 2001, IBM Corp., http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power4.pdf

10.3.1 POWER4 (3)
Figure: The memory controller of the POWER4
Source: “Power4 System Microarchitecture,” Technical White Paper, 2001, IBM Corp., http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power4.pdf

10.3.1 POWER4 (4)
Figure: I/O controller of the POWER4
Label: Fabric Controller
Source: “Power4 System Microarchitecture,” Technical White Paper, 2001, IBM Corp., http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power4.pdf

10.3.1 POWER4 (5)
Figure: POWER4 chip
Source: Kalla R., Sinharoy B., Tendler J., “Simultaneous Multi-threading Implementation in Power5 – IBM’s Next Generation POWER Microprocessor,” 2003, http://www.hotchips.org/archives/hc15/3_Tue/11.ibm.pdf

10.3.1 POWER4 (6)
Table: Main features of IBM’s dual-core POWER line

POWER line              POWER4
Dual/Quad-Core          DC
Introduced              10/2001
Technology              180 nm
Die size                412 mm²
Nr. of transistors      174 mtrs
fc [GHz]                1.3
L2 size/allocation      1.44 MB/shared
L2 implementation       On-chip
L3 size                 32 MB
L3 implementation       Tags on-chip, data off-chip
Mem. contr.             Off-chip
TDP [W]                 115/125
Packaging               SCM(1)/MCM(2)
Dual threaded
Power management

Also on the slide: L3 impl.: Tags on-chip; L3 size: 32 MB

(1) SCM: Single Chip Module
(2) MCM: Multi Chip Module
(3) DCM: Dual Chip Module
(4) DCM: Dual Core Module
(5) QCM: Quad Core Module
(6) DPM: Dynamic Power Management

10.3.1 POWER4+ (1)
Figure: New features of the POWER4+
Source: Grassl C., “New IBM Components for HPCx,” Dec. 2003, http://www.hpcx.ac.uk/about/events/annual2003/Grassl.pdf

10.3.1 POWER4+ (2)
Table: Main features of IBM’s dual-core POWER line

POWER line              POWER4          POWER4+
Dual/Quad-Core          DC              DC
Introduced              10/2001         11/2002
Technology              180 nm          130 nm
Die size                412 mm²         380 mm²
Nr. of transistors      174 mtrs        184 mtrs
fc [GHz]                1.3             1.7
L2 size/allocation      1.44 MB/shared  1.5 MB/shared
L2 implementation       On-chip         On-chip
L3 size                 32 MB           32 MB
L3 implementation       Tags on-chip, data off-chip
Mem. contr.             Off-chip        On-chip
TDP [W]                 115/125         70
Packaging               SCM(1)/MCM(2)   SCM(1)/MCM(2)
Dual threaded
Power management

Also on the slide: L3 impl.: Tags on-chip; L3 size: 32 MB

10.3.1 POWER5 (1)
Figure 5.14: Contrasting POWER4 and POWER5 system structures
Source: Barney B., “IBM POWER Systems Overview,” Livermore Computing, 2006, http://www.llnl.gov/computing/tutorials/ibm_sp/

10.3.1 POWER5 (2)
Figure: Block diagram of the POWER5 (1)
Source: Barney B., “IBM POWER Systems Overview,” Livermore Computing, 2006, http://www.llnl.gov/computing/tutorials/ibm_sp/

10.3.1 POWER5 (3)
Figure: Block diagram of the POWER5 (2)
Source: Krewell K., “POWER5 Tops on Bandwidth,” Microprocessor Report, Dec. 2003, http://studies.ac.upc.edu/ETSETB/SEGPAR/microprocessors/power5%20(2)%20(mpr).pdf

10.3.1 POWER5 (4)
Figure: Floorplan of the POWER5
Source: Sinharoy B., Kalla R.N., Tendler J.M., Eickenmeyer R.J., Joyner J.B., “POWER5 system microarchitecture,” IBM J. R&D, Vol. 49, No. 4/5, 2005, pp. 505-521

10.3.1 POWER5 (6)
Figure: Contrasting the floor plans of the POWER4 and POWER5 dies
Labels: POWER4: 180 nm, 412 mm²; POWER5: 130 nm, 389 mm² (enlarged)
Sources:
Kalla R., Sinharoy B., Tendler J., “Simultaneous Multi-threading Implementation in Power5 – IBM’s Next Generation POWER Microprocessor,” 2003, http://www.hotchips.org/archives/hc15/3_Tue/11.ibm.pdf
Sinharoy B., Kalla R.N., Tendler J.M., Eickenmeyer R.J., Joyner J.B., “POWER5 system microarchitecture,” IBM J. R&D, Vol. 49, No. 4/5, 2005, pp. 505-521

10.3.1 POWER5 (7)
Figure: Packaging alternatives of the POWER4/5 processors
Label: POWER5+ Dual-Core Module
Source: Partridge R. and Ghatpande S., “IBM Introduces POWER5+ and Quad-Core Modules in System p5,” Tech Trends Monthly, Nov./Dec. 2005

10.3.1 POWER5 (8)
Figure: Quad-Chip POWER4 module (MCM) and a 32-way POWER4 system
Labels: POWER4 MCM photo; 32-way system showing 4 MCMs and L3 cache
Source: Barney B., “IBM POWER Systems Overview,” Livermore Computing, 2006, http://www.llnl.gov/computing/tutorials/ibm_sp/

10.3.1 POWER5 (9)
Figure: Interpretation of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCMs) of the POWER5
Source: Barney B., “IBM POWER Systems Overview,” Livermore Computing, 2006, http://www.llnl.gov/computing/tutorials/ibm_sp/

10.3.1 POWER5 (10)
Figure: Photos of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCMs) of the POWER5
Source: Barney B., “IBM POWER Systems Overview,” Livermore Computing, 2006, http://www.llnl.gov/computing/tutorials/ibm_sp/

10.3.1 POWER5 (11)
Figure: The Multi-Chip Module of the POWER5
Source: Kalla R., “IBM’s POWER5 Microprocessor Design and Methodology,” 2003, www-csl.csres.utexas.edu/users/billmark/teach/cs352-05-spring/lectures/Lecture22-RonKallaIBM.pdf

10.3.1 POWER5 (12)
Table: Main features of IBM’s dual-core POWER line

POWER line              POWER4          POWER4+         POWER5
Dual/Quad-Core          DC              DC              DC
Introduced              10/2001         11/2002         5/2004
Technology              180 nm          130 nm          130 nm
Die size                412 mm²         380 mm²         389 mm²
Nr. of transistors      174 mtrs        184 mtrs        276 mtrs
fc [GHz]                1.3             1.7             1.65/1.9
L2 size/allocation      1.44 MB/shared  1.5 MB/shared   1.9 MB/shared
L2 implementation       On-chip         On-chip         On-chip
L3 size                 32 MB           32 MB           36 MB
L3 implementation       Tags on-chip, data off-chip
Mem. contr.             Off-chip        On-chip         On-chip
TDP [W]                 115/125         70              80 (est)
Packaging               SCM(1)/MCM(2)   SCM(1)/MCM(2)   DCM(3)/MCM(2)
Dual threaded
Power management                                        DPM(6)

Also on the slide: L3 impl.: Tags on-chip, Tags on-chip; L3 size: 32 MB, 36 MB

10.3.1 POWER5+ (1)
Figure: Block diagram of the POWER5+
Source: Vetter S. et al., “IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper, IBM Corp., 2006, http://www.redbooks.ibm.com/redpapers/pdfs/redp4150.pdf

10.3.1 POWER5+ (2)
Figure: Dual-Core Modules (DCMs) and Quad-Core Modules (QCMs) of the POWER5+
Source: Vetter S. et al., “IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper, IBM Corp., 2006, http://www.redbooks.ibm.com/redpapers/pdfs/redp4150.pdf

10.3.1 POWER5+ (3)
Table: Main features of IBM’s dual-core POWER line

POWER line              POWER4          POWER4+         POWER5          POWER5+
Dual/Quad-Core          DC              DC              DC              DC
Introduced              10/2001         11/2002         5/2004          10/2005
Technology              180 nm          130 nm          130 nm          90 nm
Die size                412 mm²         380 mm²         389 mm²         230 mm²
Nr. of transistors      174 mtrs        184 mtrs        276 mtrs        276 mtrs
fc [GHz]                1.3             1.7             1.65/1.9        1.92
L2 size/allocation      1.44 MB/shared  1.5 MB/shared   1.9 MB/shared   1.9 MB/shared
L2 implementation       On-chip         On-chip         On-chip         On-chip
L3 size                 32 MB           32 MB           36 MB           36 MB
L3 implementation       Tags on-chip, data off-chip
Mem. contr.             Off-chip        On-chip         On-chip         On-chip
TDP [W]                 115/125         70              80 (est)        70
Packaging               SCM(1)/MCM(2)   SCM(1)/MCM(2)   DCM(3)/MCM(2)   DCM(4)/QCM(5)
Dual threaded
Power management                                        DPM(6)          DPM(6)

Also on the slide: L3 impl.: Tags on-chip, Tags on-chip, Tags on-chip; L3 size: 32 MB, 36 MB, 36 MB

10.3.1 POWER6 (1)
Figure: Contrasting the block diagrams of the POWER5 and POWER6 processors
Labels: POWER6; POWER5+; Hardware support of decimal arithmetic
Source: Kanter D., “IBM Previews the Power6,” Oct. 2006, dkanter@realwordtech.com

10.3.1 POWER6 (2)
Table: Main features of IBM’s dual-core POWER line

POWER line              POWER4          POWER4+         POWER5          POWER5+         POWER6
Dual/Quad-Core          DC              DC              DC              DC              DC
Introduced              10/2001         11/2002         5/2004          10/2005         2007
Technology              180 nm          130 nm          130 nm          90 nm           65 nm
Die size                412 mm²         380 mm²         389 mm²         230 mm²         341 mm²
Nr. of transistors      174 mtrs        184 mtrs        276 mtrs        276 mtrs        750 mtrs
fc [GHz]                1.3             1.7             1.65/1.9        1.92            4-5
L2 size/allocation      1.44 MB/shared  1.5 MB/shared   1.9 MB/shared   1.9 MB/shared   2*4 MB/private
L2 implementation       On-chip         On-chip         On-chip         On-chip         On-chip
L3 size                 32 MB           32 MB           36 MB           36 MB           64 MB?
L3 implementation       Tags on-chip, data off-chip
Mem. contr.             Off-chip        On-chip         On-chip         On-chip
TDP [W]                 115/125         70              80 (est)        70              ~100
Packaging               SCM(1)/MCM(2)   SCM(1)/MCM(2)   DCM(3)/MCM(2)   DCM(4)/QCM(5)   n.a.
Dual threaded
Power management                                        DPM(6)          DPM(6)          n.a.

Also on the slide: L3 impl.: Tags on-chip, Tags on-chip, Tags on-chip, Tags on-chip; L3 size: 32 MB, 36 MB, 36 MB, 32 MB

10.3 IBM’s MC processors
10.3.2 Cell BE
• Cell BE   2/2006   90 nm

10.3.2 Cell BE (1)
Figure: The history and development cost of the Cell BE
Sources:
Brochard L., “A Cell History,” Cell Workshop, April 2006, http://www.irisa.fr/orap/Constructeurs/Cell/Cell%20Short%20Intro%20Luigi.pdf
Hofstee H. P., “Cell today and tomorrow,” 2005, http://www.stanford.edu/class/ee380/Abstracts/Cell_060222.pdf

10.3.2 Cell BE (2)
AUC: Atomic Update Cache
BIC: Bus Interface Controller
EIB: Element Interface Bus
LS: Local Store of 256 KB
MFC: Memory Flow Controller
MIC: Memory Interface Controller
PPE: Power Processing Element
PXU: POWER Execution Unit
SMF: Synergistic Memory Flow Unit
SPU: Synergistic Processor Unit
SXU: Synergistic Execution Unit
XDR: Rambus DRAM
Figure: Block diagram of the Cell BE
Source: Gschwind M., “Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006, http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf

10.3.2 Cell BE (3)
Design parameters of the Cell BE:
• PPE: dual-threaded
• > 200 GFLOPS (SP)
• > 20 GFLOPS (DP)
• > 25 GB/s memory BW
• > 75 GB/s I/O BW
• > 300 GB/s EIB BW
• fc > 4 GHz (lab)
Figure: Main design parameters of the Cell BE
Source: IBM, “Cell Broadband Engine Overview,” Course Code L1T1H1-02, May 2006, publib.boulder.ibm.com/.../stgv1r0/topic/com.ibm.iea.cbe/cbe/1.0/Overview/L1T1H1_02_CellOverview.pdf

10.3.2 Cell BE (4)
Figure 5.16: Cell SPE architecture
Source: Blachford N., “Cell Architecture Explained Version 2,” http://www.blachford.info/computer/Cell/Cell1_v2.html

10.3.2 Cell BE (5)
Figure: Block diagram of the SPE
Source: Gschwind M., “Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006, http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf

10.3.2 Cell BE (6)
Figure: Pipeline stages of the Cell BE
Source: Gschwind M., “Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006, http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf

10.3.2 Cell BE (7)
Figure: Floor plan of a single SPE
Source: Gschwind M., “Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006, http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf

10.3.2 Cell BE (8)
Principle of operation of the Element Interface Bus (EIB)
Source: Keable C., “And we also have hardware...,” 17th Machine Evaluation Workshop, Dec. 2006, http://www.cse.clrc.ac.uk/disco/mew17/talks/Keable_IBM_MEW17.pdf

10.3.2 Cell BE (9)
Figure: The Element Interface Bus (EIB)
Source: Gschwind M., “Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006, http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf

10.3.2 Cell BE (10)
Figure: The Synergistic Memory Flow unit (SMF)
Source: Gschwind M., “Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006, http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf

10.3.2 Cell BE (11)
Figure: Floor plan of the Cell BE processor
Labels: 235 mm²; 241 mtrs
Source: Gschwind M., “Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006, http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf

10.3.2 Cell BE (12)
Table: Main features of IBM’s Cell BE

Series                     Cell BE
Cores: implementation      Heterogeneous, 1xPPE, 8*SPE
Cores: architecture        PowerPC 2.02
                           PPE: 64-bit RISC
                           SPE: Dual-issue 32-bit SIMD with 128 bit capability
Introduction               9/2006 (in the QS20 BladeCenter)
Technology                 90 nm
Die size                   221 mm²
Nr. of transistors         234 mtrs
fc [GHz]                   3.0/3.2
L2                         PPE: 512 KB
                           SPE: 256 KB Local Store (128*128 bit)
L3
Memory controller          On-chip
Interconnection network    Ring based
Memory bandwidth           25 GB/s
I/O bandwidth              Up to 75 GB/s
TDP [W]                    95 W @ 3 GHz
Multithreading             PPE: 2-way
                           SPE:

10.3.2 Cell BE (13)
Figure: Cell BE Blade Roadmap
Source: Brochard L., “A Cell History,” Cell Workshop, April 2006, http://www.irisa.fr/orap/Constructeurs/Cell/Cell%20Short%20Intro%20Luigi.pdf

10.3.2 Cell BE (14)
Figure: Roadmap of the Cell BE
Source: Hofstee H. P., “Real-time Supercomputing and Technology for Games and Entertainment,” 2006, http://www.cercs.gatech.edu/docs/SC06_Cell_111606.pdf

10.3 Literature (1)

POWER4, POWER4+
Barney B., “IBM POWER Systems Overview,” Livermore Computing, 2006, http://www.llnl.gov/computing/tutorials/ibm_sp/
DeMone P., “Sizing Up the Super Heavyweights,” Real World Technologies, Sept. 2004, http://h21007.www2.hp.com/dspp/files/unprotected/Itanium/sizingsuperheavys.pdf
Grassl C., “New IBM Components for HPCx,” Dec. 2003, http://www.hpcx.ac.uk/about/events/annual2003/Grassl.pdf
Krewell K., “IBM’s POWER4 Unveiling Continues,” Microprocessor Report, Nov. 20, 2000, pp. 1-4
Tendler J.M., Dodson S., Fields S., Le H., Sinharoy B., “Power4 System Microarchitecture,” IBM Server, Technical White Paper, October 2001, http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power4.pdf
Tendler J.M., Dodson S., Fields S., Le H., Sinharoy B., “Power4 System Microarchitecture,” IBM J. Res. & Dev., Vol. 46, No. 1, Jan. 2002, pp. 5-25, http://www.research.ibm.com/journal/rd/461/tendler.pdf

POWER5, POWER5+
Barney B., “IBM POWER Systems Overview,” Livermore Computing, 2006, http://www.llnl.gov/computing/tutorials/ibm_sp/
DeMone P., “Sizing Up the Super Heavyweights,” Real World Technologies, Sept. 2004, http://h21007.www2.hp.com/dspp/files/unprotected/Itanium/sizingsuperheavys.pdf
Grassl C., “New IBM Components for HPCx,” Dec. 2003, http://www.hpcx.ac.uk/about/events/annual2003/Grassl.pdf
Kalla R., “IBM’s POWER5 Microprocessor Design and Methodology,” 2003, www-csl.csres.utexas.edu/users/billmark/teach/cs352-05-spring/lectures/Lecture22-RonKallaIBM.pdf

10.3 Literature (2)

POWER5, POWER5+ (cont.)
Kalla R., Sinharoy B., Tendler J., “Simultaneous Multi-threading Implementation in Power5 – IBM’s Next Generation POWER Microprocessor,” 2003, http://www.hotchips.org/archives/hc15/3_Tue/11.ibm.pdf
Krewell K., “POWER5 Tops on Bandwidth,” Microprocessor Report, Dec. 2003, http://studies.ac.upc.edu/ETSETB/SEGPAR/microprocessors/power5%20(2)%20(mpr).pdf
Sinharoy B., Kalla R.N., Tendler J.M., Eickenmeyer R.J., Joyner J.B., “POWER5 system microarchitecture,” IBM J. R&D, Vol. 49, No. 4/5, 2005, pp. 505-521
Vetter S. et al., “IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper, IBM Corp., 2006, http://www.redbooks.ibm.com/redpapers/pdfs/redp4150.pdf

POWER6
Kanter D., “IBM Previews the Power6,” Oct. 2006, dkanter@realwordtech.com

Cell BE
Blachford N., “Cell Architecture Explained Version 2,” http://www.blachford.info/computer/Cell/Cell1_v2.html
Brochard L., “A Cell History,” Cell Workshop, April 2006, http://www.irisa.fr/orap/Constructeurs/Cell/Cell%20Short%20Intro%20Luigi.pdf
Day M. and Hofstee P., “Hardware and Software Architectures for the Cell Broadband Engine processor,” CODES, Sept. 2006, http://www.casesconference.org/cases2005/pdf/Cell-tutorial.pdf
Gschwind M., “Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006, http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf

10.3 Literature (3)

Cell BE (cont.)
Gschwind M., Hofstee H. P., Flachs B. K., Hopkins M., Watanabe Y., Yamazaki T., “Synergistic Processing in Cell's Multicore Architecture,” IEEE Micro, Vol. 26, No. 2, 2006, pp. 10-24
Hofstee H. P., “Real-time Supercomputing and Technology for Games and Entertainment,” 2006, http://www.cercs.gatech.edu/docs/SC06_Cell_111606.pdf
Keable C., “And we also have hardware...,” 17th Machine Evaluation Workshop, Dec. 2006, http://www.cse.clrc.ac.uk/disco/mew17/talks/Keable_IBM_MEW17.pdf
Krolak D., “Unleashing the Cell Broadband Engine Processor,” MPR Fall Proc. Forum, Nov. 2005, http://www-128.ibm.com/developerworks/power/library/pa-fpfeib/?ca=dgr-lnxwCellConnects
Krewell K., “Cell Moves Into The Limelight,” Microprocessor Report, Feb. 14, 2005, pp. 1-9
Solie D., “Technology Trends Presentation,” Power Symposium, Aug. 2006, http://www-03.ibm.com/procurement/proweb.nsf/objectdocswebview/file14+-+darryl+solie+-+ibm+power+symposium+presentation/$file/14+-+darryl+solie-ibm-power+symposium+presentation+v2.pdf
“Cell Broadband Engine processor-based systems,” White Paper, IBM Corp., 2006
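
Note on the Cell BE design parameters (10.3.2 Cell BE (3)): the single-precision figure of > 200 GFLOPS can be cross-checked against the clock rates in the Cell BE feature table with a back-of-the-envelope calculation. The sketch below is only an estimate under stated assumptions: it assumes each SPE completes one 4-wide single-precision fused multiply-add per cycle (four 32-bit lanes in a 128-bit register, two floating-point operations per lane), which the slides do not state explicitly, and it ignores the PPE. The function name and its parameters are hypothetical, chosen for illustration.

```python
def peak_sp_gflops(num_spes: int, clock_ghz: float,
                   simd_lanes: int = 4, flops_per_lane: int = 2) -> float:
    """Estimate the peak single-precision GFLOPS of the SPE array.

    Assumptions (not taken from the slides): one SIMD fused multiply-add
    per SPE per cycle, i.e. simd_lanes * flops_per_lane operations.
    The PPE's contribution is not counted.
    """
    return num_spes * simd_lanes * flops_per_lane * clock_ghz


if __name__ == "__main__":
    # 3.0 and 3.2 GHz are the product clock rates from the feature table;
    # 4.0 GHz is the lab figure from the design-parameter list.
    for clock in (3.0, 3.2, 4.0):
        print(f"{clock:.1f} GHz: {peak_sp_gflops(8, clock):.1f} GFLOPS")
```

Under these assumptions the eight SPEs exceed 200 GFLOPS at 3.2 GHz but not at 3.0 GHz, so the quoted design parameter appears to refer to the higher clock rates.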