DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design
Or, how I learned to stop worrying and love complexity.

Todd Austin
Advanced Computer Architecture Lab
University of Michigan
http://www.eecs.umich.edu/~taustin

Microprocessor Verification

• Task of determining if a design is correct
  – ∀ starting states (state_i, inputs_j), next state (state_{i+1}) is correct
  – Implemented with functional and electrical verification
• Huge burden on design teams
  – Immense test space
  – Done with respect to ill-defined reference, what is correct?
  – Expensive and time-consuming process, typically 1-2 years after tape-out
  – High-risk, only one chance to “get it right” or else…
• New reliability challenges in deep submicron silicon
  – Increased complexity
  – Degraded signal quality
  – Increased exposure to single event radiation (SER) upsets

Motivating Observations

• Speculative execution is fault tolerant
  – Design errors and electrical faults indistinguishable from bad predictions
  – Predictor faults only manifest as performance divots
  – Correct checking mechanism will fix any incorrect speculative operation

[Figure: branch predictor array indexed by PC; a stuck-at fault (X) leaves an entry always predicting not taken]

• What if all computation, communication, and control were speculative?
  – Any fault outside the checking mechanism would be detected and corrected

Dynamic Verification: Seatbelts for Your CPU

[Figure: a traditional core (IF, ID, REN, REG, SCHEDULER, EX/MEM) passes speculative instructions, in order and with their inputs and outputs, to the DIVA checker (CHK, CT)]

• Dynamic implementation verification architecture (DIVA)
  – Instructions verified by checker before retirement
  – Checker detects and corrects any faulty result, restarts core
  – Existing speculation infrastructure protects architected state
• Lifts the burden of correctness from core processor
  – All core computation, communication, control is speculative
  – Tolerates design errors, electrical faults, silicon defects, and failures
  – Core has burden of high accuracy

Architectural-Level Instruction Checking

• Checker enforces serial semantics
  – Correct control
    • PC_i == NPC_{i-1}
  – Correct computation
    • 5 + 7 == 12
  – Correct communication (inputs)
    • ARF[r1] == 5, ARF[r2] == 7
  – Forward progress
    • Use timeout mechanism

[Figure: the instruction r3 ← r1 + r2 arrives at the checker (CHK, CT) as <PC, 12, 5, 7> and is checked against precise state: architected registers (ARF) and architected memory (AMEM)]

DIVA Checker Architecture

[Figure: checker block diagram. In-order speculative instructions <inst, PC, result, src1, src2> arrive from the core with inputs and outputs. Labels: CHKcomm pipeline, WT, RD, CHK, CT, OK?]
[Figure: CHKcomp pipeline with EX’ and CMP stages]

• CHKcomm pipe verifies reg/mem inputs (with in-order accesses)
• CHKcomp pipe verifies results (with simple, robust algorithm)
• Watchdog timer detects deadlocks, livelocks, and lockups
• Availability of correct results ensures forward progress
• Key design issue: checker must be simple, reliable and fast

Verifying the Checker

• Simplicity should ensure high-quality functional verification
  – In-order blocking pipelines (trivial scheduler, no rename/reorder)
  – Design lends itself to formal verification (in-order, precise state)
• Latency insensitive design should ensure robust implementation
  – Deeply pipeline the design for large timing margins, high noise immunity
• Effort to better quantify verification costs is underway
  – Including other costs such as area and power

Minimal Impact on Core Performance

• Checker throughput bounds IPC
  – No control hazards (simply check PC)
  – Computation embarrassingly parallel
  – Few cache stalls (core warms cache)
  – Plus, retirement B/W typically low
• Core performance fairly insensitive to checker design parameters

[Figure: relative CPI, axis range 1.000 to 1.035; surviving bar labels: Port, No Mem, DIVA]

Future Work: Beta-Release Processors

• Traditional verification stalls launch until debug complete
• DIVA processor verification could overlap with launch
  – Beta-release when checker works
  – Launch when performance stable
  – Step for good karma

[Figure: two timelines of Performance and Design Errors over time t, comparing traditional verification (Launch marker) with DIVA processor verification]
[Figure: verification timeline with markers Beta, Launch, Stepping]

Future Work: Scalable SER Protection

[Figure: core pipeline (IF, ID, REN, REG, SCHEDULER, EX/MEM) and precise state (architected registers ARF, architected memory AMEM); the checker stage labels RD, CHK, CT, WT, EX’, CMP appear twice]

• Only need to address SER in checker
  – Sparse strikes manifest as functional errors
  – Rad-hard checker detects and corrects faults
  – Core designed without regard to particle strikes (e.g., no ECC…)
• Rad-hard checker designs
  – Small checker will provide natural resistance to SER (small target!)
  – Or, replicate the checker logic, restart pipes on disagreement

Future Work: Self-Tuned Systems

[Figure: temperature, voltage, and frequency ranges (min to max) showing the worst-case margin between the slow corner and actual operating conditions; block diagram of DIVA core, DIVA checker, and clock/voltage generator with labels insts to verify, temperature, clk, Vdd, clk’, Vdd’]

• Traditional logic implementations way too conservative for DIVA
  – Unnecessary design margins consume power and performance
  – System may not be operating at slow corner
• DIVA checker enables a self-tuned clock/voltage strategy
  – Push clock, drop voltage until desired power-performance characteristics
  – If system fails, reliable checker will correct error, notify control system
  – Reclaims design margins plus any temperature and voltage margins

Conclusions

• Design quality and reliability are uncompromising tasks
  – Verification costs and risks are very high and growing
• Speculative execution can reduce the burden of verification
  – Dynamic verification makes core processor fully speculative
  – Core tolerates design errors, electrical faults,
    silicon defects, and failures
  – Architectural-level checking keeps checker design simple
  – Core processor eliminates hazards that could slow checker pipeline
• Pushing speculation to the limit may yield more benefits
  – Beta-release processors could overlap verification with launch
  – Rad-hard checker provides single event radiation (SER) protection
  – Fault-tolerant core can leverage aggressive circuits (self-tuned systems)
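
Appendix: Checker Sketch

The four architectural-level checks named on the instruction checking slide (control, computation, communication, forward progress) can be illustrated in software. This is a minimal Python sketch, not the DIVA hardware design: the `Inst` record, the `Checker` class, the add-only instruction set, the fixed 4-byte instruction size, and the watchdog limit of 8 cycles are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    """Hypothetical record sent by the core: PC, inputs, and result."""
    pc: int
    dest: str          # destination register name
    srcs: tuple        # source register names, e.g. ("r1", "r2")
    src_vals: tuple    # operand values the core actually used
    result: int        # result the core computed (speculative)


@dataclass
class Checker:
    arf: dict              # architected register file (precise state)
    expected_pc: int       # verified next PC of the previous instruction
    timeout: int = 8       # assumed watchdog limit, in cycles
    idle_cycles: int = 0   # cycles since the last commit

    def check(self, inst: Inst) -> bool:
        """Check one instruction in program order.

        Returns True if the core was right. On False the core would be
        flushed and restarted at self.expected_pc.
        """
        # Correct control: PC_i == NPC_{i-1}
        if inst.pc != self.expected_pc:
            return False  # wrong-path instruction, nothing is committed

        # Correct communication: inputs must match architected state
        good_vals = tuple(self.arf[r] for r in inst.srcs)
        ok_comm = inst.src_vals == good_vals

        # Correct computation: recompute from verified inputs (add only)
        good_result = good_vals[0] + good_vals[1]
        ok_comp = inst.result == good_result

        # Commit the checker's own result, so state stays correct even
        # when the core was wrong, and forward progress is guaranteed.
        self.arf[inst.dest] = good_result
        self.expected_pc = inst.pc + 4   # assumes 4-byte instructions
        self.idle_cycles = 0
        return ok_comm and ok_comp

    def tick(self) -> bool:
        """Advance the watchdog one cycle; True means restart the core."""
        self.idle_cycles += 1
        return self.idle_cycles >= self.timeout


# r3 <- r1 + r2 with ARF[r1] == 5 and ARF[r2] == 7, as on the slide
checker = Checker(arf={"r1": 5, "r2": 7, "r3": 0}, expected_pc=0x100)
inst = Inst(pc=0x100, dest="r3", srcs=("r1", "r2"), src_vals=(5, 7), result=12)
print(checker.check(inst), checker.arf["r3"])  # True 12
```

In this sketch the checker always commits its own recomputed value, so a faulty core result costs only a restart and never reaches architected state. A real checker would cover the full instruction set and memory; the sketch covers only register adds.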