Dual/multi-core processors

1. Overview
  Figure 1.1: Processor power density trends
  Figure 1.2: Dual/multi-core processors (1)
  Figure 1.3: Dual/multi-core processors (2)
  Figure 1.4: Main aspects of the design space of the macro architecture of dual/multi-core processors
2. Attaching L2 caches
  Figure 2.1: Main design space dimensions of laying out L2 caches for dual/multi-core processors
  Figure 2.2: Allocation of L2 caches to the cores and their use by instructions and data in dual/multi-core processors
  Figure 2.3: Integration of L2 caches to the processor chip in dual/multi-core processors
  Figure 2.4: Examples for using private L2 caches in dual/multi-core processors
  Figure 2.5: Examples for using shared L2 caches in dual/multi-core processors
3. Attaching L3 caches
  Figure 3.1: Main design space dimensions of laying out L3 caches for dual/multi-core processors
  Figure 3.2: Inclusion policy and allocation of L3 caches in dual/multi-core processors
  Figure 3.3: Use of L3 caches in dual/multi-core processors
  Figure 3.4: Exclusive cache architecture
  Figure 3.5: Examples for using inclusive L3 caches in dual/multi-core processors
  Figure 3.6: Examples for using exclusive L3 caches in dual/multi-core processors
  Figure 3.7: Integration of L3 caches to the processor chip in dual/multi-core processors
4. Connecting memory and I/O
  Figure 4.1: Main design space dimensions of laying out the I/O and memory architecture in dual/multi-core processors
  Figure 4.2: Connection policy of I/O and memory in dual/multi-core processors
  Figure 4.3: Examples for connecting both I/O and memory via the system bus in dual core processors
  Figure 4.4: Examples for dedicated connection of I/O and memory in dual/multi-core processors
  Figure 4.5: Integration of the memory controller to the processor chip in dual/multi-core processors
5. Selected implementations
  Intel’s dual/multi-core processors
    Figure 5.1: The move to Intel multi-core
    Figure 5.2: Processor specifications of Intel’s Pentium D family
    Figure 5.3: Specifications of Intel’s Pentium Processor Extreme Edition, models 840/955/965
    Figure 5.4: Processor specifications of Intel’s Yonah Duo (Core Duo) family
    Figure 5.5: Future 65 nm processors (overview)
    Figure 5.6: Future 45 nm processors (overview)
  AMD’s Athlon 64 X2
    Figure 5.7: AMD Athlon 64 X2 dual-core processor architecture
  Sun’s UltraSPARC IV family
    Figure 5.8: UltraSPARC IV (Jaguar)
    Figure 5.9: UltraSPARC IV+ (Panther)
  IBM’s POWER4/POWER5
    Figure 5.10: POWER4 chip logical view
    Figure 5.11: POWER4 chip
    Figure 5.12: POWER4 and POWER5 system structures

Figure 1.1: Processor power density trends
Source: D. Yen: Chip Multithreading Processors Enable Reliable High Throughput Computing, http://www.irps.org/05-43rd/IRPS_Keynote_Yen.pdf

Superscalar processors, RISC (entries give feature size in µ / transistor count, mtrs. = million transistors)
  Dual core, single threaded
    IBM: Power4 (2001), 0.18 µ / 184 mtrs.
    Sun: UltraSPARC IV (Jaguar) (2004), 2*USIII, 0.13 µ / 66 mtrs.
    Sun: UltraSPARC IV+ (Panther) (2005), 0.09 µ / 295 mtrs.
    Sun: Gemini (2004), 2*USIII, 0.13 µ / 80 mtrs.
    HP: PA 8800 (Mako) (2004), 2*PA8700, 0.13 µ / 300 mtrs.
    HP: PA 8900 (Shortfin) (5/2005), 0.13 µ / 317 mtrs.
  Dual core, dual threaded
    IBM: Power5 (2004), 0.13 µ / 276 mtrs.
  Multi core, multi threaded
    Sun: UltraSPARC T1 (Niagara) (2006?), 8 cores/4T, 0.09 µ
Figure 1.2: Dual/multi-core processors (1)

Superscalar processors, CISC
  Dual core, single threaded
    Intel: Pentium D 820-840 (Smithfield) (4/2005), 0.09 µ / 230 mtrs.
    Intel: Pentium M Core Duo (Yonah) (2005), 0.065 µ / 151 mtrs.
    AMD: Athlon 64 X2 (6/2005), 0.09 µ / 233 mtrs.
  Dual core, dual threaded
    Intel: Pentium EE 840 (4/2005), 0.09 µ / 230 mtrs.
    Intel: Pentium EE 955 (Presler) (4/2005), 0.065 µ / 2*188 mtrs.
VLIW processors
  Intel: Montecito (2006?), 2*Itanium 2 (Madison), 0.09 µ / 1730 mtrs.
Figure 1.3: Dual/multi-core processors (2)

Macro architecture of dual/multi-core processors
  Layout of the cores
  Layout of the L2 cache(s)
  Layout of the L3 caches (if any)
  Layout of the I/O and memory architecture
Figure 1.4: Main aspects of the design space of the macro architecture of dual/multi-core processors

Layout of the L2 cache(s) for dual/multi-core processors
  Allocation to the cores: private L2 caches for each core, or shared L2 cache for all cores
  Use by instructions/data: unified instr./data cache(s), or split instr./data cache(s)
  Integration to the proc. chip: on-chip L2 tags/contr. with off-chip data, or entire L2 on-chip
  Banking policy: single-banked implementation, or multi-banked implementation
  Inclusion policy: inclusive cache, or exclusive cache
Figure 2.1: Main design space dimensions of laying out L2 caches for dual/multi-core processors

Allocation of L2 caches to the cores and use by instr./data
  Private L2 caches for each core, split instr./data caches: Montecito (2006?)
  Private L2 caches for each core, unified instr./data cache: UltraSPARC IV (2004), Smithfield (2005), Athlon 64 X2 (2005)
  Shared L2 cache for all cores, unified instr./data cache: UltraSPARC T1 (2005), Yonah Duo (2006), Core (2006), POWER4 (2001), POWER5 (2005)
  The original figure also marks the expected trend.
Figure 2.2: Allocation of L2 caches to the cores and their use by instructions and data in dual/multi-core processors

Integration of the L2 to the processor chip
  On-chip L2 tags/contr., off-chip data: UltraSPARC IV (2004)
  Entire L2 on-chip: UltraSPARC V (2005), POWER4 (2001), POWER5 (2005), Smithfield (2005), Presler (2005), Athlon 64 X2 (2005)
  The original figure also marks the expected trend.
Figure 2.3: Integration of L2 caches to the processor chip in dual/multi-core processors

Private L2 caches for each core
  Unified instruction/data caches
    On-chip L2 tags/contr., off-chip data: UltraSPARC IV (2004)
    Entire L2 on-chip: Smithfield (2005), Presler (2005), Athlon 64 X2 (2005) (exclusive L2)
  Split instruction/data caches
    On-chip L2 tags/contr., off-chip data
    Entire L2 on-chip: Montecito (2006?)
  [Block diagrams of the example processors are not reproduced here.]
Figure 2.4: Examples for using private L2 caches in dual/multi-core processors

Shared L2 cache for all cores
  Dual core/single banked L2: Yonah Duo (2006), Core (2006)
  Dual core/multi banked L2: POWER4 (2001), POWER5 (2005)
    Mapping of addresses to the banks: the 128-byte long L2 cache lines are hashed across the 3 modules. Hashing is performed by modulo 3 arithmetic applied on a large number of real address bits.
  Multi core/multi banked L2: UltraSPARC T1 (Niagara) (2005) (8 cores/4 L2 banks)
    Mapping of addresses to the banks: the four L2 modules are interleaved at 64-byte blocks.
  [Block diagrams of the example processors are not reproduced here.]
Figure 2.5: Examples for using shared L2 caches in dual/multi-core processors

Layout of the L3 cache(s) for dual/multi-core processors
  Allocation to the L2 cache(s)/cache banks: private L3 caches for each L2 cache/cache bank, or shared L3 cache for all L2 caches/cache banks
  Use by instructions/data: unified instr./data cache(s), or split instr./data cache(s)
  Integration to the proc. chip: on-chip L3 tags/contr. with off-chip data, or entire L3 on-chip
  Banking policy: single-banked implementation, or multi-banked implementation
  Inclusion policy: inclusive cache, or exclusive cache
Figure 3.1: Main design space dimensions of laying out L3 caches for dual/multi-core processors

Inclusion policy (of L3 caches) and allocation to the L2 cache(s)/cache banks
  Shared L3 cache for all L2 caches/cache banks
    Inclusive cache: POWER4 (2001)
    Exclusive cache: UltraSPARC IV+ (2005)
  Private L3 cache for each L2 cache/cache bank
    Inclusive cache: Montecito (2006?)
    Exclusive cache: POWER5 (2005)
  The original figure also marks the expected trend.
Figure 3.2: Inclusion policy and allocation of the L3 cache in dual/multi-core processors

Use of L3 caches
  Inclusive L3 cache (in-path cache between the L2 and the memory): L2, L3, memory in series.
    Examples: Montecito (2006?), POWER4 (2001)
  Exclusive L3 cache (victim cache of the L2): the L3 sits beside the path between the L2 and the memory.
    Lines replaced (victimized) in the L2 are written into the L3.
    References to data in the L3 initiate reloading that cache line into the L2.
    The L3 usually operates as a write-back cache: only modified data that is replaced in the L3 is written back to the memory; unmodified data that is replaced in the L3 is deleted.
    Examples: POWER5 (2005), UltraSPARC IV+ (2005)
Figure 3.3: Use of L3 caches in dual/multi-core processors

Figure 3.4: Exclusive cache architecture
Source: Zheng, Y., Davis, B.T., Jordan, M.: “Performance evaluation of exclusive cache hierarchies”, 2004 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2004, pp. 89-96.

Inclusive L3 cache (in-path cache between the L2 and the memory)
  Private L3 caches for each L2 cache bank
    L3 tags/contr. on-chip, data off-chip
    Entire L3 on-chip: Montecito (2006?)
  Shared L3 cache for all cache banks
    L3 tags/contr. on-chip, data off-chip: POWER4 (2001)
    Entire L3 on-chip
  [Block diagrams of the example processors are not reproduced here.]
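The address-to-bank mappings noted with Figure 2.5 and the victim-cache behaviour described with Figure 3.3 can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not a description of the actual hardware: the function and class names are hypothetical; the modulo-3 mapping is applied directly to the cache-line index, whereas the slides say the real hashing uses a large number of real address bits; LRU replacement and the removal of a line from the L3 when it is reloaded into the L2 are assumptions chosen to keep the hierarchy exclusive.

```python
from collections import OrderedDict


def mod3_bank(addr: int, line_bytes: int = 128, banks: int = 3) -> int:
    """Bank for a 128-byte line spread over 3 modules.

    Assumption: bank = line index mod 3 (a simplification of the hashing).
    """
    return (addr // line_bytes) % banks


def interleaved_bank(addr: int, block_bytes: int = 64, banks: int = 4) -> int:
    """Bank for four modules interleaved at 64-byte blocks."""
    return (addr // block_bytes) % banks


class ExclusiveHierarchy:
    """Toy L2 plus victim L3, tracked per cache line (hypothetical model)."""

    def __init__(self, l2_lines: int, l3_lines: int) -> None:
        self.l2 = OrderedDict()  # line -> dirty flag, oldest entry first (LRU)
        self.l3 = OrderedDict()
        self.l2_lines = l2_lines
        self.l3_lines = l3_lines
        self.writebacks = 0      # modified lines written back to memory

    def access(self, line: int, write: bool = False) -> str:
        """Reference a line; return where it was found."""
        if line in self.l2:
            dirty, where = self.l2.pop(line), "L2"
        elif line in self.l3:
            # A reference to data in the L3 reloads the line into the L2.
            dirty, where = self.l3.pop(line), "L3"
        else:
            dirty, where = False, "memory"
        self._install_l2(line, dirty or write)
        return where

    def _install_l2(self, line: int, dirty: bool) -> None:
        if len(self.l2) >= self.l2_lines:
            # The line replaced (victimized) in the L2 is written into the L3.
            victim, victim_dirty = self.l2.popitem(last=False)
            self._install_l3(victim, victim_dirty)
        self.l2[line] = dirty

    def _install_l3(self, line: int, dirty: bool) -> None:
        if len(self.l3) >= self.l3_lines:
            # Write-back L3: only modified data goes to memory, clean data is dropped.
            _, old_dirty = self.l3.popitem(last=False)
            if old_dirty:
                self.writebacks += 1
        self.l3[line] = dirty
```

With these definitions, consecutive 128-byte lines rotate through banks 0, 1, 2 under `mod3_bank`, and consecutive 64-byte blocks rotate through banks 0 to 3 under `interleaved_bank`. Calling `access` repeatedly on an `ExclusiveHierarchy` shows that a line is never held in both levels at once and that `writebacks` only grows when a modified line leaves the L3.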
Figure 3.5: Examples for using inclusive L3 caches in dual/multi-core processors

Exclusive L3 cache (victim cache of the L2)
  Private L3 caches for each L2 cache bank
    L3 tags/contr. on-chip, data off-chip: POWER5 (2005)
    Entire L3 on-chip
  Shared L3 cache for all cache banks
    L3 tags/contr. on-chip, data off-chip: UltraSPARC IV+ (2005)
    Entire L3 on-chip
  [Block diagrams of the example processors are not reproduced here.]
Figure 3.6: Examples for using exclusive L3 caches in dual/multi-core processors

Integration of L3 caches to the processor chip
  On-chip L3 tags/contr., off-chip data: UltraSPARC IV+ (2005), POWER4 (2001), POWER5 (2005)
  Entire L3 on-chip: Montecito (2006?)
  The original figure also marks the expected trend.
Figure 3.7: Integration of L3 caches to the processor chip in dual/multi-core processors

Layout of the I/O and memory architecture in dual/multi-core processors
  Connection policy of I/O and memory
    Connecting both I/O and memory via the system bus
    Dedicated connection of I/O and memory
      Symmetric connection of I/O and memory
      Asymmetric connection of I/O and memory
  Integration of the memory controller to the processor chip
    Off-chip memory controller
    On-chip memory controller
Figure 4.1: Main design space dimensions of laying out the I/O and memory architecture in dual/multi-core processors

Connection policy of I/O and memory
  Connecting both I/O and memory via the system bus: PA-8800 (2004), PA-8900 (2005), Smithfield (2005), Presler (2005), Yonah Duo (2006), Core (2006), Montecito (2006?)
  Dedicated connection of I/O and memory, asymmetric connection:
    POWER4 (2001), UltraSPARC T1 (2005)
  Dedicated connection of I/O and memory, symmetric connection:
    POWER5 (2005), UltraSPARC IV (2004), UltraSPARC IV+ (2005), Athlon 64 X2 (2005)
Figure 4.2: Connection policy of I/O and memory in dual/multi-core processors

Connecting both I/O and memory via the system bus
  Examples: Smithfield/Presler (2005/2005), Yonah Duo/Core (2006/2006), Montecito (2006), PA-8800 (2004), PA-8900 (2005)
  [Block diagrams not reproduced here; in each, the L2 cache(s) attach through a system bus interface (Syst. bus if.) to the FSB.]
Figure 4.3: Examples for connecting both I/O and memory via the system bus in dual core processors

Dedicated connection of I/O and memory
  Symmetric connection of I/O and memory (connecting both I/O and memory via the internal interconnection network)
  Asymmetric connection of I/O and memory (connecting I/O via the internal interconnection network, and memory via the L2/L3 cache)
  Examples: POWER5 (2005), UltraSPARC T1 (2005), UltraSPARC IV (2004), POWER4 (2001), Athlon 64 X2 (2005), UltraSPARC IV+ (2005)
  [Block diagrams of the example processors are not reproduced here.]
Figure 4.4: Examples for dedicated connection of I/O and memory in dual/multi-core processors

Integration of the memory controller to the processor chip
  Off-chip memory controller: POWER4 (2001), UltraSPARC T1 (2005), PA-8800 (2004), PA-8900 (2005), Smithfield (2005), Presler (2005), Yonah Duo (2006), Core (2006), Montecito (2006?)
  On-chip memory controller: POWER5 (2005), UltraSPARC IV+ (2005), UltraSPARC IV (2004), Athlon 64 X2 (2005)
  The original figure also marks the expected trend.
Figure 4.5: Integration of the memory controller to the processor chip in dual/multi-core processors

[Roadmap chart "The Move to Intel Multi-core": platforms (Itanium® processor, MP Server, DP Server / WS, Desktop Client, Mobile Client) plotted against time (2005, 2006, 2007+). All products and dates are preliminary and subject to change without notice. Refer to ‘fact sheet’ for specific product timings.]
Figure 5.1: The move to Intel multi-core
Source: A. Loktu: Itanium 2 for Enterprise Computing, http://h40132.www4.hp.com/upload/se/sv/Itanium2forenterprisecomputing.pps

Intel® Pentium® D processors
  Processor number | Process technology | L2 cache | L3 cache | Clock speed | Front side bus speed | Package
  950              | 65 nm              | 2x2M     | N/A      | 3.40 GHz    | 800 MHz              | FC-LGA
  940              | 65 nm              | 2x2M     | N/A      | 3.20 GHz    | 800 MHz              | FC-LGA
  930              | 65 nm              | 2x2M     | N/A      | 3 GHz       | 800 MHz              | FC-LGA
  920              | 65 nm              | 2x2M     | N/A      | 2.80 GHz    | 800 MHz              | FC-LGA
  840              | 90 nm              | 2x1M     | N/A      | 3.20 GHz    | 800 MHz              | FC-LGA
  830              | 90 nm              | 2x1M     | N/A      | 3 GHz       | 800 MHz              | FC-LGA
  820              | 90 nm              | 2x1M     | N/A      | 2.80 GHz    | 800 MHz              | FC-LGA
  805              | 90 nm              | 2x1M     | N/A      | 2.66 GHz    | 533 MHz              | FC-LGA
  960              | 65 nm              | 2x2M     | N/A      | 3.60 GHz    | 800 MHz              | LGA 775
  Other Intel technologies
    950, 940, 930, 920: Execute Disable Bit, VT, Intel® EM64T, dual-core
    840, 830: Execute Disable Bit, Intel® EM64T, dual-core, EIST
    820, 805: Execute Disable Bit, Intel® EM64T, dual-core
    960: Execute Disable Bit, VT, Intel® EM64T, dual-core, EIST
  Chipsets
    950, 940, 930, 920: Intel® 945 Express, Intel® 955X Express, Intel® 975X Express
    840, 830, 820: Intel® 955X Express, Intel® 945G Express, Intel® 945P Express, Intel® E7230
    805: Intel® 975X Express, Intel® 945 Express, Intel® E7230, Intel® 955X Express
    960: Intel® 945 Express, Intel® 975X Express
Figure 5.2: Processor specifications of Intel’s Pentium D family
Source: http://www.intel.com/products/processor/index.htm

Pentium® Processor Extreme Edition
  Processor number     | 965              | 955                        | 840
  Process technology   | 65 nm            | 65 nm                      | 90 nm
  L2 cache             | 2x2M             | 2x2M                       | 2x1M
  L3 cache             | N/A              | N/A                        | N/A
  Clock speed          | 3.73 GHz         | 3.46 GHz                   | 3.20 GHz
  Front side bus speed | 1066 MHz         | 1066 MHz                   | 800 MHz
  Package              | FC-LGA           | FC-LGA                     | FC-LGA
  Chipset              | Intel® 975X Express | Intel® 975X Express     | Intel® 955X Express
  Memory type          | NA               | Dual Channel DDR2 400/533  | Dual-Channel DDR2
  Boards               | NA               | Intel® Desktop Board D975XBX | Intel® Desktop Board D955XCS, Intel® Desktop Board D955XBK
  Slot/socket type     | LGA775           | LGA775                     | LGA775
  Min-max voltage      | 1.200 - 1.3375   | 1.25-1.40                  | 1.20-1.40
  Pin count            | 775-land         | 775-land                   | 775-land
  sSpec number         | SL9AN            | SL94N                      | SL8FK
  Other Intel technologies
    965: HT enabled, Intel® EM64T, VT, Execute Disable Bit, dual-core
    955: HT enabled, Execute Disable Bit, VT, Intel® EM64T, dual-core
    840: HT enabled, Execute Disable Bit, Intel® EM64T, dual-core
Figure 5.3: Specifications of Intel’s Pentium Processor Extreme Edition, models 840/955/965
Source: http://www.intel.com/products/processor/index.htm

Intel® Core™ Duo processors
  Processor number     | Clock speed
  T2600                | 2.16 GHz
  T2500                | 2 GHz
  T2400                | 1.83 GHz
  T2300                | 1.66 GHz
  L2400 (Low Voltage)  | 1.66 GHz
  L2300 (Low Voltage)  | 1.50 GHz
  Common to all six models: 65-nm process technology; 2MB L2 cache; 667 MHz front side bus; chipsets Mobile Intel® 945PM and Mobile Intel® 945GM; wireless Intel® PRO/Wireless 3945ABG 802.11a/b/g
Figure 5.4: Processor specifications of Intel’s Yonah Duo (Core Duo) family
Source: http://www.intel.com/products/processor/index.htm

  Category       | Code name               | Cores                  | Cache                | Market
  Desktop        | Kentsfield              | Dual core, multi-die   | 4 MB                 | Mid 2007
  Desktop        | Conroe                  | Dual core, single die  | 4 MB shared          | End 2006
  Desktop        | Allendale               | Dual core, single die  | 2 MB shared          | End 2006
  Desktop        | Cedar Mill (NetBurst/P4) | Single core           | 512 kB, 1 MB, 2 MB   | Early 2006
  Desktop        | Presler (NetBurst/P4)   | Dual core, dual die    | 4 MB                 | Early 2006
  Desktop/Mobile | Millville               | Single core            | 1 MB                 | Early 2007
  Mobile         | Yonah2                  | Dual core, single die  | 2 MB                 | Early 2006
  Mobile         | Yonah1                  | Single core            | 1/2 MB               | Mid 2006
  Mobile         | Stealey                 | Single core            | 512 kB               | Mid 2007
  Mobile         | Merom                   | Dual core, single die  | 2/4 MB shared        | End 2006
  Enterprise     | Sossaman                | Dual core, single die  | 2 MB                 | Early 2006
  Enterprise     | Woodcrest               | Dual core, single die  | 4 MB                 | Mid 2006
  Enterprise     | Clovertown              | Quad core, multi-die   | 4 MB                 | Mid 2007
  Enterprise     | Dempsey (NetBurst/Xeon) | Dual core, dual die    | 4 MB                 | Mid 2006
  Enterprise     | Tulsa                   | Dual core, single die  | 4/8/16 MB            | End 2006
  Enterprise     | Whitefield              | Quad core, single die  | 8 MB, 16 MB shared   | Early 2008
Figure 5.5: Future 65 nm processors (overview)
Source: P. Schmid: Top Secret Intel Processor Plans Uncovered, www.tomshardware.com/2005/12/04/top_secret_intel_processor_plans_uncovered

  Category       | Codename     | Cores                  | Cache              | Market
  Desktop        | Wolfdale     | Dual core, single die  | 3 MB shared        | 2008
  Desktop        | Ridgefield   | Dual core, single die  | 6 MB shared        | 2008
  Desktop        | Yorkfield    | 8 cores, multi-die     | 12 MB shared       | 2008+
  Desktop        | Bloomfield   | Quad core, single die  | 2 MB               | 2008+
  Desktop/Mobile | Perryville   | Single core            |                    | 2008
  Mobile         | Penryn       | Dual core, single die  | 3 MB, 6 MB shared  | 2008
  Mobile         | Silverthorne |                        | -                  | 2008+
  Enterprise     | Harpertown   | 8 cores, multi-die     | 12 MB shared       | 2008
Figure 5.6: Future 45 nm processors (overview)
Source: P. Schmid: Top Secret Intel Processor Plans Uncovered, www.tomshardware.com/2005/12/04/top_secret_intel_processor_plans_uncovered

Figure 5.7: AMD Athlon 64 X2 dual-core processor architecture
Source: AMD Athlon 64 X2 Dual-Core Processor for Desktop – Key Architecture Features, http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_9485_13041.00.html

Figure 5.8: UltraSPARC IV (Jaguar)
Source: C. Boussard: Architecture des processeurs, http://laser.igh.cnrs.fr/IMG/pdf/SUN-CNRS-archi-cpu-3.pdf

Figure 5.9: UltraSPARC IV+ (Panther)
Source: C. Boussard: Architecture des processeurs, http://laser.igh.cnrs.fr/IMG/pdf/SUN-CNRS-archi-cpu-3.pdf

Figure 5.10: POWER4 chip logical view
Source: J.M. Tendler, S. Dodson, S. Fields, H. Le, B. Sinharoy: Power4 System Microarchitecture, IBM Server, Technical White Paper, October 2001, http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power4.pdf

Figure 5.11: POWER4 chip
Source: R. Kalla, B. Sinharoy, J. Tendler: Simultaneous Multi-threading Implementation in Power5 – IBM’s Next Generation POWER Microprocessor, 2003, http://www.hotchips.org/archives/hc15/3_Tue/11.ibm.pdf

Figure 5.12: POWER4 and POWER5 system structures
Source: R. Kalla, B. Sinharoy, J.M. Tendler: IBM Power5 chip: A Dual-core multithreaded Processor, IEEE Micro, Vol. 24, No. 2, March-April 2004, pp. 40-47.