The CELL Processor

Class Presentation of
Custom DSP Implementation Course
ECE Department – University of Tehran
The CELL processor
Prepared and Presented by:
S.H.R. Ahmadi
May 2005
This is a class presentation. All data are copyrighted by their
respective authors, as listed in the references, and are
used here for educational purposes only.
Notice:
• Photos and Diagrams are proprietary to IBM
• The Cell processor, Power & PowerPC are
trademarks of IBM
• PlayStation™ 3 is a trademark of Sony
Computer Entertainment Inc. (SCEI)
• FlexIO™ & XDR™ are Rambus Inc. trademarks
• All data are gathered from public sources which
are listed in the “References”
Outline
• Development History
• Specifications & Architecture
• Applications
• Software Aspects
• Marketing & NEWS
• References
Development History
• Developed in complete secrecy
• March 12, 2001 – “Cell” announced
– “supercomputer-on-a-chip” from Sony, Toshiba, IBM
– Capable of TeraFlops computation speed
– $400M investment over 5 years
• March, 2002 – Okamoto speech
– 2005 target date
– First glimpse of the Cell idea: the “1000x” performance figure
• August, 2002 – Cell design finished
– near “tape out”
– “4-16 general-purpose processor cores per chip”
Development History
• November, 2002 – Rambus licenses
“Yellowstone” technology to Toshiba
– Yellowstone : 3.2 GHz memory
• January, 2003 – Rambus licenses
Yellowstone/Redwood Technology to Sony
– Redwood – parallel interface between chips
• January, 2003
– Cell at 4 GHz, 1024 bit bus, 64 MB memory, PowerPC
– At least 4 patents in 2002 & 2003 on:
• Hardware & software architecture
• Processing modules
• Memory protection
• Data synchronization
Development History
• 2004
– Marketing NEWS
– Some general technical data
• May, 2004
– Plans for a CELL-based workstation announced
• Application : digital content creation
• February, 2005
– Formal introduction at ISSCC’05
– Extensive media coverage
• May, 2005
– Sony’s PlayStation 3 formal announcement
Specifications & Architecture
• Broadband Processor Architecture
– Optimized for broadband media and 3D graphics
• 90-nm PD-SOI process, 8 metal layers (copper)
• 234 million transistors in ~235 mm²
• 4.6 GHz operation at 1.3 V
• 85 °C operating temperature with heat sink
• Thermal protection schemes
• 2965 core connections / ~1300 pins
• 256 GFlops SP-FP, 26 GFlops DP-FP
• Very high off-chip bandwidth (XDR™ memory and FlexIO™ links)
• 4 x 128-bit internal bus (ring), 96 Bytes/cycle
Specifications & Architecture
BPA (Cell) design features:
• Multi-Core Architecture
• Based on the Power Architecture
– Code compatibility
• Coherent and cooperative off-load processing
• Enhanced SIMD architecture
• Improved power efficiency
• “Absolute timers” allow “hard” real-time data processing
– Good estimation of execution time is possible
• Big-endian memory
– Same byte order as PowerPC-based systems such as Apple’s, unlike little-endian Intel x86
• Isolation mechanism for secure code execution
Specifications & Architecture
BPA (Cell) design justification:
• Multi-Core Non-Homogeneous Architecture
– Better Power
• 3-level Model of Memory
– Main Memory, Local Store, Registers
– Better Memory
• Large Register File & SW Controlled Branching
– Allows deeper pipelines
– Better Frequency
Specifications & Architecture
CPU:
(Power Processor Element)
• 64-bit Power Architecture™ with VMX(SIMD)
• In-order, 2-way hardware Multi-threading
– Simple design → improvements possible
– predictable execution times
• Coherent Load/Store Cache (32KB L1 - 512KB L2)
• Redesigned for use in the Cell processor
Serves as a:
• multi-OS GPP
• Control unit for SPEs
Specifications & Architecture
SPE:
(Synergistic Processing Element)
• Dual issue, 128-bit 4-way SIMD
– Vector Processing
• 4 Integer Units + 4 FP Units
• 8-, 16-, 32-bit Integer + 32-, 64-bit FP
• 128x128-bit Registers
• 256KB Local-Store Memory
– Caches are not used
– Data & Instruction in LS
(specially designed)
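As a rough illustration of the register figures on this slide, the Python sketch below works out how many elements of each width fit in one 128-bit register and how large the whole register file is. The function names are hypothetical, and the lane model is a simplification for counting purposes, not SPE code.

```python
REGISTER_BITS = 128   # width of one SPE register (from the slide)
REGISTER_COUNT = 128  # number of registers per SPE (from the slide)


def lanes(element_bits: int) -> int:
    """Number of elements of the given width packed into one register."""
    if element_bits <= 0 or REGISTER_BITS % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return REGISTER_BITS // element_bits


def simd_add(a, b):
    """Model of one 4-way SIMD add: four 32-bit lanes handled in one step."""
    if len(a) != lanes(32) or len(b) != lanes(32):
        raise ValueError("expected 4-element vectors")
    return [x + y for x, y in zip(a, b)]


# Register file capacity in bytes: 128 registers of 16 bytes each.
REGISTER_FILE_BYTES = REGISTER_COUNT * REGISTER_BITS // 8

if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        print(f"{width:2d}-bit elements: {lanes(width)} per register")
    print("register file:", REGISTER_FILE_BYTES, "bytes")
```

The "4-way" figure on the slide corresponds to the 32-bit case; narrower integer types pack more elements into the same register.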
Specifications & Architecture
SPE:
• Coherent & Cooperative off-load engines for CPU
– Works independently
– Not directly tied to CPU as co-processor
• Dedicated DMA engine
– Move data: CPU ↔ SPE or SPE ↔ SPE
– Parallel or Serial with other SPEs
• Dynamically configurable to protect resources
• Can perform security algorithms
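The "parallel or serial" use of SPEs mentioned above can be sketched conceptually in Python. This is only a model under stated assumptions: each function call stands in for work done on one SPE, the hand-off between stages stands in for an SPE-to-SPE DMA transfer, and everything actually runs sequentially. The names `run_serial` and `run_parallel` are hypothetical.

```python
from typing import Callable, List, Sequence

Stage = Callable[[List[int]], List[int]]


def run_serial(stages: Sequence[Stage], block: List[int]) -> List[int]:
    """Serial (streaming) use: each stage models one SPE, and its output
    is handed to the next stage, as an SPE-to-SPE transfer would do."""
    for stage in stages:
        block = stage(block)
    return block


def run_parallel(kernel: Stage, data: List[int], n_spes: int = 8) -> List[int]:
    """Parallel use: the same kernel is applied to one slice per SPE,
    and the partial results are concatenated in order."""
    if n_spes < 1:
        raise ValueError("need at least one SPE")
    chunk = max(1, -(-len(data) // n_spes))  # ceiling division, at least 1
    out: List[int] = []
    for start in range(0, len(data), chunk):
        out.extend(kernel(data[start:start + chunk]))
    return out


def scale(block: List[int]) -> List[int]:
    """Example kernel: double every sample."""
    return [2 * x for x in block]


def offset(block: List[int]) -> List[int]:
    """Example kernel: add one to every sample."""
    return [x + 1 for x in block]


if __name__ == "__main__":
    print(run_serial([scale, offset], [1, 2, 3]))
    print(run_parallel(scale, list(range(10))))
```

The serial form suits stream processing where each block passes through a fixed chain of steps; the parallel form suits workloads that split cleanly into independent slices.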
Specifications & Architecture
• 8 SPE blocks, each with 32 GFlops or 32 Gops
→ Monstrous processing power
→ Need to be fed accordingly
→ Solution:
• EIB
• High-Speed MEM (Dual XDR™)
• High-Speed IO (FlexIO™)
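The per-SPE and whole-chip peak figures can be reproduced with a short calculation. The sketch below assumes a 4 GHz clock and counts one multiply-add per SIMD lane per cycle as two floating-point operations; both are assumptions chosen because they reproduce the 32 GFlops and 256 GFlops figures quoted in this presentation.

```python
SPE_COUNT = 8        # SPE blocks per chip (from the slide)
SIMD_LANES = 4       # 4-way single-precision SIMD (from the SPE slide)
FLOPS_PER_LANE = 2   # assumption: one multiply-add per lane per cycle
CLOCK_GHZ = 4.0      # assumption: 4 GHz clock


def spe_peak_gflops(clock_ghz: float = CLOCK_GHZ) -> float:
    """Peak single-precision GFlops of one SPE."""
    return SIMD_LANES * FLOPS_PER_LANE * clock_ghz


def chip_peak_gflops(clock_ghz: float = CLOCK_GHZ) -> float:
    """Peak single-precision GFlops of all SPEs together."""
    return SPE_COUNT * spe_peak_gflops(clock_ghz)


if __name__ == "__main__":
    print("per SPE :", spe_peak_gflops(), "GFlops")
    print("per chip:", chip_peak_gflops(), "GFlops")
```

These are peak numbers only; sustained throughput depends on keeping the SPEs supplied with data, which is the point of the slide.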
Specifications & Architecture
EIB:
(Element Interconnect Bus)
• Data ring for internal communication
• Four 16 byte data rings – low latency
• Multiple simultaneous transfers
• 96 B/cycle peak bandwidth (at ½ CPU clock speed)
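To put the per-cycle figure in absolute terms, the following sketch converts it to GB/s. The CPU clock is a parameter; the 4 GHz value used in the example is an assumption, not a figure from this slide.

```python
EIB_BYTES_PER_CYCLE = 96  # peak bytes per bus cycle (from the slide)
BUS_CLOCK_RATIO = 0.5     # the EIB runs at half the CPU clock (from the slide)


def eib_peak_gb_per_s(cpu_clock_ghz: float) -> float:
    """Peak EIB bandwidth in GB/s for a given CPU clock in GHz."""
    return EIB_BYTES_PER_CYCLE * cpu_clock_ghz * BUS_CLOCK_RATIO


if __name__ == "__main__":
    # Assumed 4 GHz CPU clock, so the bus runs at 2 GHz.
    print(eib_peak_gb_per_s(4.0), "GB/s")
```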
Specifications & Architecture
External Memory Bus:
• Licensed from Rambus
• Dual XDR™ interface (25.6GB/s @ 3.2GHz)
External IO:
• Licensed from Rambus
• FlexIO™ interface (each bit is a 2-wire differential pair at 6.4 Gb/s, i.e. 800 MB/s, consistent with the total below)
• Total 76.8 GB/s (7 outbound byte lanes + 5 inbound byte lanes)
• Extensive shielding is necessary
– Many VDD/GND wires
– 90% of all pins
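The external bandwidth figures can be cross-checked with simple arithmetic. In the sketch below, the FlexIO split is derived only from the 76.8 GB/s total and the 7 + 5 byte-lane counts on the slide. The XDR part assumes two channels of 32 data bits each at 3.2 Gb/s per pin, which is an assumption that happens to reproduce the quoted 25.6 GB/s.

```python
FLEXIO_TOTAL_GB_S = 76.8  # total FlexIO bandwidth (from the slide)
TX_BYTE_LANES = 7         # outbound byte lanes (from the slide)
RX_BYTE_LANES = 5         # inbound byte lanes (from the slide)


def flexio_split(total_gb_s: float, tx: int, rx: int):
    """Return (GB/s per byte lane, outbound GB/s, inbound GB/s)."""
    per_lane = total_gb_s / (tx + rx)
    return per_lane, per_lane * tx, per_lane * rx


def per_bit_mb_s(per_byte_lane_gb_s: float) -> float:
    """MB/s carried by each of the 8 bit lines in one byte lane."""
    return per_byte_lane_gb_s * 1000 / 8


def xdr_bandwidth_gb_s(channels: int = 2, data_bits: int = 32,
                       gbit_per_pin: float = 3.2) -> float:
    """Memory bandwidth in GB/s. Channel count and width are assumptions."""
    return channels * data_bits * gbit_per_pin / 8


if __name__ == "__main__":
    lane, out_bw, in_bw = flexio_split(
        FLEXIO_TOTAL_GB_S, TX_BYTE_LANES, RX_BYTE_LANES)
    print(f"per byte lane: {lane:.1f} GB/s ({per_bit_mb_s(lane):.0f} MB/s per bit)")
    print(f"outbound: {out_bw:.1f} GB/s, inbound: {in_bw:.1f} GB/s")
    print(f"XDR: {xdr_bandwidth_gb_s():.1f} GB/s")
```

With these inputs each bit line works out to about 800 MB/s, and the link is asymmetric: more bandwidth leaves the chip than enters it.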
Applications
According to IBM:
• CELL design was based on the analysis of a broad
range of workloads in areas such as cryptography,
graphics transform and lighting, physics, fast-Fourier
transforms (FFT), matrix operations, and scientific
workloads
• The Cell processor is designed for graphics- and
network-intensive jobs ranging from video games to
complex imaging for the medical, defense, automotive
and aerospace industries
Applications
• Games, 3D Graphics, Video, Audio
– Image manipulation; Video processing, encoding, decoding
• DSP (Digital Signal Processing)
– FFT (e.g. SETI); Distributed DSP
• Digital Rights Management
– Cryptography; Secure data processing
• Scientific Calculations
– Linear system solvers; Linear algebra; PDE
• Super Computing
• Servers (Commercial databases)
• Stream Processing Applications
– Serial use of SPE blocks (e.g. Digital TV)
Software Aspects
According to Experts:
• Programming the Cell processor requires new tools
& a new programming paradigm
– Because SPE programs should be self-contained, with data
and instruction bundles
• For a game console, programmers will craft custom
optimized code. The next challenge for STI (Sony, Toshiba, IBM) is to
find a way to make this architecture accessible to
programmers beyond game developers
• Cell is "OS neutral" and supports multiple OS
simultaneously
Software Aspects
• Tool chain for Cell is built on PowerPC Linux
– Early availability of SIMD-optimized compilers
– Development of high-performance graphics and
media libraries for the Broadband Architecture entirely
in C
– CELL team developed the first SPU compiler
– Development of an advanced parallelizing compiler
with auto-SIMDization features based on IBM XL
compiler technology
Marketing & NEWS
• Microprocessor Report: “Cell is basically a vector
supercomputer on a chip”; the 2004 Microprocessor
Report Analysts’ Choice Award for Best
Technology went to the Cell Processor
• IBM is working with companies to integrate Cell
microprocessor into third-party products
• The companies are working with open-source
compiler developers to create software
development tools for programmers
Marketing & NEWS
• Sony PlayStation™ 3
• Cell Processor running at 3.2 GHz
– 7 special-purpose 3.2 GHz processors
– 218 gigaflops of performance
• 256 MB XDR main RAM at 3.2 GHz
• 256 MB of GDDR VRAM at 700 MHz
• Support for seven Bluetooth controllers
• Supports Blu-ray Disc format
• System floating-point performance of 2 teraflops
• Communication: Ethernet, Wi-Fi (IEEE 802.11), Bluetooth
• Output in HDTV resolution up to 1080p as standard
Marketing & NEWS
• Cell Processor Based Workstation (CPBW)
• From Sony Group and IBM
• First Prototype “Powered On”
• 16 TeraFlops in a rack (est.)
• Optimized for Digital Content Creation
– Computer entertainment
– Movies
– Real-time rendering
– Physics simulation
• Affordable by Small Businesses (and Individuals)
Marketing & NEWS
• CELL Industries
• Our Objective : Distributing Cell Power
• Facilitate small-scale supercomputer applications for Cell
• Cell-based systems
– affordable for individuals and small to medium-sized businesses
• Our Cell PCI-x plug-in card, xpac-zero
– fastest and most economical way for people to get their hands
on some real computing power
• Uses Cell as a general-purpose numerical accelerator
– The xpac-zero card acts much like a video card
References
• IBM, Sony, Toshiba papers in ISSCC’05
– “A Streaming Processing Unit for a CELL Processor”,
B. Flachs et al.
– “The Design and Implementation of a First-Generation
CELL Processor”, D. Pham et al.
• “Microprocessor Report”,
Reed Electronics Group, 2005, Jan. 31 & Feb. 14
• “IBM’s Cell Processor : The next generation of
computing?”,
D.K. Every, Shareware Press, Feb. 2005
References
• “Power Efficient Processor Architecture and The
Cell Processor”, H.P. Hofstee, HPCA-11 2005
• “Power Efficient Processor Design and the Cell
Processor”, IBM, 2005
• “Introducing the IBM/Sony/Toshiba Cell
Processor”,
J. H. Stokes, http://arstechnica.com/
• “Cell Architecture Explained”,
N. Blachford, http://www.blachford.info/
Thank you