STARFIRE: Extending the SMP Envelope
Alan Charlesworth
Presented by Bob Koutsoyannis

The Nature of Starfire
- A complex snoopy-bus-based uniform-memory-access system.
- 1997 UltraSPARC-II (250 MHz)
- More to come…

Outline of Key Points
- CCSMP Design Choices
- Starfire Design Choices
- Importance of ASICs
- Elaborate Hardware Design
- Starfire's Extra Features
- Dynamic System Domains
- Evaluation/New Benchmarks

The Three Generations of Snoopy-Bus-Based Uniform-Memory-Access Interconnects

Derived Design Choices
- Bus-Driving Logic
- Switching Protocol
- Bus Management
- Cache Size
- Rearranging Cache Protocols left out.

Ultra Port Architecture
- Write-back MOESI coherency on 64-byte-wide cache blocks.
- 18-byte-wide data lines (2 ECC bytes)
- Centralized Coherency Controller
- Small data Crossbar
- Broad Range of expandability
- Lowest possible memory Latency
- 4X Bandwidth, Dynamic Repartition, and more

Starfire Design Choices
- Increased Address and Data Bandwidth.
- 4-way interleaved address buses.
- An 83.3-MHz system clock with snooping every other cycle and a 64-byte cache line width gives a snooping limit of
  4 buses × 0.5 snoops/cycle × 83.3 MHz × 64 bytes ≈ 10,667 MB/s
- 16x16 (18-byte-wide) Data Crossbar to support the snooping limit
- Point-to-point routing with ASICs on a Centerplane
- Add Dynamic System Domains Feature
- Improve Reliability, Availability, and Serviceability
- Designed for external control from a System Service Processor via Ethernet with ASIC data available

Application Specific Integrated Circuits

Data Interconnect

Two Rows of Eight Boards

Closer Board View

Elaborate Board Design
Centerplane
- 27" × 18" × 141 mils
- 34 ASICs
- 28 Layers
- 14,000 nets, nearly 100% density with 95% done by hand.
- 43,000 drill holes
System Boards
- 16" × 20"
- Memory, I/O, 4 Processors, 5 power converters, 18 ASICs
- 24 Layers

The Starfire, Ultra 10000
Processor Cabinet
1. Flat Side Panel
2. Circuit Breaker (×11 on each side)
3. Curved Side Panel ("Styling Panel")
4. Fan Tray
5. Centerplane
6. Left Front/Rear Door
7. Right Front/Rear Door
8. Fan Tray
9. AC/DC Power Shelf

Dynamic System Domains
- Unique feature that allows the server to truly partition into separate domains
- Provide isolated Development/Production/Test environments
- Easy to administer – System Service Processor
- Easy to repair – hot swap components – Attach/Detach
- Rapid reassignment of computing resources
- I/O Flexibility
- Cost/Convenience

Domain Protection
- Centerplane Filtering – SSP has control over the Global Arbiter ASICs
- Board-Level Filtering
- Domain Mask – 16 bit
- Group Memory Mask – 16 bit
- Group Memory Base and Limit Registers

Extra Features
- Fault Tolerance: ASICs Generate and Check ECCs for Address Packets
- Redundant Components
- Crash Recovery

TPC-D 300-Gbyte Results

Cost Breakdown and Record Breaking
The Starfire records:
- Online Transaction Processing (SAP R/3 and BAAN)
- Cluster of 4 sustained over 100 Gflops on Linpack Parallel equation-solving
- 2 Starfires lead the SPECrate_int95 integer-application throughput benchmark.
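
Worked check of the snooping limit

The snooping-limit arithmetic quoted under Starfire Design Choices can be reproduced with a few lines of Python. This is only a sketch of the multiplication on the slide, not a model of the hardware. The function name `snoop_limit_mb_per_s` and its parameter names are made up for illustration, and treating the 83.3-MHz clock as exactly 250/3 MHz (one third of the 250-MHz processor clock) is an assumption made here so that the result rounds to the slide's 10,667 MB/s.

```python
def snoop_limit_mb_per_s(buses, snoops_per_cycle, clock_mhz, line_bytes):
    """Peak snooped bandwidth in MB/s (MHz x bytes = 10^6 bytes per second).

    Hypothetical helper: it simply multiplies the four factors from the slide.
    """
    return buses * snoops_per_cycle * clock_mhz * line_bytes


# Assumption: the "83.3 MHz" system clock is 250/3 MHz.
exact_clock = snoop_limit_mb_per_s(
    buses=4, snoops_per_cycle=0.5, clock_mhz=250 / 3, line_bytes=64
)

# Using the rounded 83.3 MHz figure literally gives a slightly lower number.
rounded_clock = snoop_limit_mb_per_s(
    buses=4, snoops_per_cycle=0.5, clock_mhz=83.3, line_bytes=64
)

print(f"clock = 250/3 MHz: {exact_clock:,.0f} MB/s")
print(f"clock = 83.3 MHz:  {rounded_clock:,.1f} MB/s")
```

With the clock taken as 250/3 MHz the product rounds to the 10,667 MB/s on the slide; with the rounded 83.3 MHz it comes out a few MB/s lower, which is why the slide's figure is best read as approximate. Changing `buses` to 1 gives the single-address-bus limit under the same assumptions.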