Class Presentation of Custom DSP Implementation Course
ECE Department – University of Tehran
The CELL processor
Prepared and Presented by: S.H.R. Ahmadi
May 2005

This is a class presentation. All data are copyright of their respective authors, as listed in the references, and have been used here for educational purposes only.

Notice:
• Photos and diagrams are proprietary to IBM
• The Cell processor, Power & PowerPC are trademarks of IBM
• PlayStation™ 3 is a trademark of Sony Computer Entertainment Inc. (SCEI)
• FlexIO™ & XDR™ are Rambus Inc. trademarks
• All data are gathered from public sources, which are listed in the “References”

Outline
• Development History
• Specifications & Architecture
• Applications
• Software Aspects
• Marketing & NEWS
• References

Development History
• Developed in complete secrecy
• March 12, 2001
  – “Cell” announced
  – “supercomputer-on-a-chip” from Sony, Toshiba, IBM
  – Capable of teraflops computation speed
  – $400M investment over 5 years
• March 2002
  – Okamoto speech
  – 2005 target date
  – First glimpse of the Cell idea: the 1000x figure
• August 2002
  – Cell design finished, near “tape out”
  – “4-16 general-purpose processor cores per chip”

Development History
• November 2002
  – Rambus licenses “Yellowstone” technology to Toshiba
  – Yellowstone: 3.2 GHz memory
• January 2003
  – Rambus licenses Yellowstone/Redwood technology to Sony
  – Redwood: parallel interface between chips
• January 2003
  – Cell at 4 GHz, 1024-bit bus, 64 MB memory, PowerPC
  – At least 4 patents in 2002 & 2003 on:
    • Hardware & software architecture
    • Processing modules
    • Memory protection
    • Data synchronization

Development History
• 2004
  – Marketing news
  – Some general technical data
• May 2004
  – A CELL-based workstation will be built
    • Application: digital content creation
• February 2005
  – Formal introduction at ISSCC’05
  – Extensive media coverage
• May 2005
  – Sony’s PlayStation 3 formal announcement

Specifications & Architecture
• Broadband Processor Architecture
  – Optimized for broadband media and 3D graphics
• 90-nm PD-SOI process, 8 metal layers (copper)
• 234 million transistors in ~235 mm²
• 4.6 GHz operation at 1.3 V
• 85 °C operating temperature with heat sink
• Thermal protection schemes
• 2965 core connections / ~1300 pins
• 256 GFlops SP-FP, 26 GFlops DP-FP
• Very high communication bandwidth to the outside
• 4 x 128-bit internal bus (ring), 96 bytes/cycle

Specifications & Architecture
BPA (Cell) design features:
• Multi-core architecture
• Based on the Power Architecture
  – Code compatibility
• Coherent and cooperative off-load processing
• Enhanced SIMD architecture
• Improved power efficiency
• “Absolute timers” allow “hard” real-time data processing
  – Good estimation of execution time is possible
• Big-endian memory
  – Matches Apple (PowerPC) byte order, but not Intel
• Isolation mechanism for secure code execution

Specifications & Architecture
BPA (Cell) design justification:
• Multi-core non-homogeneous architecture
  – Better power
• 3-level model of memory
  – Main memory, Local Store, registers
  – Better memory
• Large register file & software-controlled branching
  – Allows deeper pipelines
  – Better frequency

Specifications & Architecture
CPU: (Power Processor Element)
• 64-bit Power Architecture™ with VMX (SIMD)
• In-order, 2-way hardware multi-threading
  – Simple design improvements possible
  – Predictable execution times
• Coherent load/store cache (32 KB L1, 512 KB L2)
• Redesigned for use in the Cell processor
Serves as a:
• Multi-OS GPP (general-purpose processor)
• Control unit for SPEs

Specifications & Architecture
SPE: (Synergistic Processing Element)
• Dual issue, 128-bit 4-way SIMD
  – Vector processing
• 4 integer units + 4 FP units
• 8-, 16-, 32-bit integer + 32-, 64-bit FP
• 128 x 128-bit registers
• 256 KB Local Store (LS) memory
  – Caches are not used
  – Data & instructions in LS (specially designed)

Specifications & Architecture
SPE:
• Coherent & cooperative off-load engines for the CPU
  – Work independently
  – Not directly tied to the CPU as a co-processor
• Dedicated DMA engine
  – Moves data: CPU↔SPE or SPE↔SPE
  – In parallel or in series with other SPEs
• Dynamically configurable to protect resources
• Can perform security algorithms

Specifications & Architecture
• 8 SPE blocks, each with 32 GFlops or 32 Gops
Monstrous processing power
Needs to be fed accordingly
Solution:
  – EIB
  – High-speed memory (Dual XDR™)
  – High-speed I/O (FlexIO™)

Specifications & Architecture
EIB: (Element Interconnect Bus)
• Data ring for internal communication
• Four 16-byte data rings
  – Low latency
• Multiple simultaneous transfers
• 96 B/cycle peak bandwidth (at ½ CPU speed)

Specifications & Architecture
External Memory Bus:
• Licensed from Rambus
• Dual XDR™ interface (25.6 GB/s at 3.2 GHz)
External IO:
• Licensed from Rambus
• FlexIO™ interface (each 2-wire bit lane at 800 MB/s, i.e. 6.4 Gb/s)
• Total 76.8 GB/s (7 Tx bytes + 5 Rx bytes)
• Extensive shielding is necessary
  – Many VDD/GND wires
  – 90% of all pins

Applications
According to IBM:
• The CELL design was based on the analysis of a broad range of workloads in areas such as cryptography, graphics transform and lighting, physics, fast Fourier transforms (FFT), matrix operations, and scientific workloads
• The Cell processor is designed for graphics- and network-intensive jobs ranging from video games to complex imaging for the medical, defense, automotive and aerospace industries

Applications
• Games, 3D graphics, video, audio
  – Image manipulation; video processing, encoding, decoding
• DSP (Digital Signal Processing)
  – FFT (e.g. SETI); distributed DSP
• Digital Rights Management
  – Cryptography; secure data processing
• Scientific calculations
  – Linear system solvers; linear algebra; PDE
• Supercomputing
• Servers (commercial databases)
• Stream processing applications
  – Serial use of SPE blocks (e.g. digital TV)

Software Aspects
According to experts:
• Programming the Cell processor requires new tools & a new programming paradigm
  – Because SPE programs should be self-contained, with data and instruction bundles
• For a game console, programmers will craft custom optimized code. The next challenge for STI (Sony, Toshiba, IBM) is to find a way to make this architecture accessible to programmers beyond game developers
• Cell is “OS neutral” and supports multiple operating systems simultaneously

Software Aspects
• The tool chain for Cell is built on PowerPC Linux
  – Early availability of SIMD-optimized compilers
  – Development of high-performance graphics and media libraries for the Broadband Architecture entirely in C
  – The CELL team developed the first SPU compiler
  – Development of an advanced parallelizing compiler with auto-SIMDization features, based on IBM XL compiler technology

Marketing & NEWS
• Microprocessor Report: “Cell is basically a vector supercomputer on a chip”; the 2004 Microprocessor Report Analysts’ Choice Award for Best Technology was presented to the Cell processor
• IBM is working with companies to integrate the Cell microprocessor into third-party products
• The companies are working with open-source compiler developers to create software development tools for programmers

Marketing & NEWS
• Sony PlayStation™ 3
• Cell processor running at 3.2 GHz
  – 7 special-purpose 3.2 GHz processors
  – 218 gigaflops of performance
• 256 MB XDR main RAM at 3.2 GHz
• 256 MB of GDDR VRAM at 700 MHz
• Support for seven Bluetooth controllers
• Supports the Blu-ray Disc format
• System floating-point performance of 2 teraflops
• Communication: Ethernet, Wi-Fi (IEEE 802.11), Bluetooth
• Output in HDTV resolution up to 1080p as standard

Marketing & NEWS
• Cell Processor Based Workstation (CPBW)
• From Sony Group and IBM
• First prototype “powered on”
• 16 teraflops in a rack (est.)
• Optimized for digital content creation
  – Computer entertainment
  – Movies
  – Real-time rendering
  – Physics simulation
• Affordable for small businesses (and individuals)

Marketing & NEWS
• CELL Industries
• Our objective: distributing Cell power
• Facilitate small-scale supercomputer applications for Cell
• Cell-based systems
  – Affordable for individuals and small to medium-sized businesses
• Our Cell PCI-x plug-in card, xpac-zero
  – Fastest and most economical way for people to get their hands on some real computing power
• Uses Cell as a general-purpose numerical accelerator
  – The xpac-zero card acts much like a video card

References
• IBM, Sony, Toshiba papers in ISSCC’05
  – “A Streaming Processing Unit for a CELL Processor”, B. Flachs et al.
  – “The Design and Implementation of a First-Generation CELL Processor”, D. Pham et al.
• “Microprocessor Report”, Reed Electronics Group, Jan. 31 & Feb. 14, 2005
• “IBM’s Cell Processor: The next generation of computing?”, D.K. Every, Shareware Press, Feb. 2005

References
• “Power Efficient Processor Architecture and The Cell Processor”, H.P. Hofstee, HPCA-11, 2005
• “Power Efficient Processor Design and the Cell Processor”, IBM, 2005
• “Introducing the IBM/Sony/Toshiba Cell Processor”, J. H. Stokes, http://arstechnica.com/
• “Cell Architecture Explained”, N. Blachford, http://www.blachford.info/

Thank you
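
As a backup to the Specifications & Architecture slides, the short Python sketch below checks how the quoted peak figures fit together: 32 GFlops per SPE, 256 GFlops single-precision for 8 SPEs, the 96 B/cycle EIB, and the 76.8 GB/s FlexIO total. Several inputs are assumptions that the slides do not state: a 4 GHz nominal clock, a multiply-add counted as two floating-point operations per SIMD lane, and 6.4 Gb/s per FlexIO differential pair (chosen because it reproduces the quoted total). The function names are hypothetical and exist only for this illustration. The EIB result in GB/s is derived under the 4 GHz assumption and is not a figure from the slides.

```python
# Back-of-envelope check of peak figures quoted in the slides.
# Values marked ASSUMPTION are not stated in the deck.

CLOCK_MHZ = 4000      # ASSUMPTION: 4 GHz nominal clock
SIMD_LANES = 4        # from the slides: 128-bit, 4-way SIMD (32-bit lanes)
FLOPS_PER_LANE = 2    # ASSUMPTION: one multiply-add per cycle counts as 2 flops
NUM_SPES = 8          # from the slides: 8 SPE blocks


def spe_peak_gflops(clock_mhz=CLOCK_MHZ):
    """Single-precision peak of one SPE, in GFlops."""
    return SIMD_LANES * FLOPS_PER_LANE * clock_mhz / 1000


def chip_peak_gflops(num_spes=NUM_SPES, clock_mhz=CLOCK_MHZ):
    """Single-precision peak of all SPEs together, in GFlops."""
    return num_spes * spe_peak_gflops(clock_mhz)


def eib_peak_gb_per_s(bytes_per_cycle=96, clock_mhz=CLOCK_MHZ):
    """EIB peak in GB/s: 96 B per bus cycle, bus at half the CPU clock."""
    bus_mhz = clock_mhz / 2
    return bytes_per_cycle * bus_mhz / 1000


def flexio_total_gb_per_s(tx_bytes=7, rx_bytes=5, pair_mbit_per_s=6400):
    """FlexIO total in GB/s.

    ASSUMPTION: each differential pair carries 6.4 Gb/s (0.8 GB/s).
    """
    pairs = (tx_bytes + rx_bytes) * 8          # one pair per bit lane
    return pairs * pair_mbit_per_s / 8 / 1000  # Mb/s -> MB/s -> GB/s


if __name__ == "__main__":
    print("SPE peak (GFlops):   ", spe_peak_gflops())
    print("8-SPE peak (GFlops): ", chip_peak_gflops())
    print("EIB peak (GB/s):     ", eib_peak_gb_per_s())
    print("FlexIO total (GB/s): ", flexio_total_gb_per_s())
```

Under these assumptions the per-SPE, 8-SPE and FlexIO results match the 32 GFlops, 256 GFlops and 76.8 GB/s quoted in the slides. Passing a different clock_mhz shows how every figure except FlexIO scales with the clock.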