1 VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects
Sarangi et al.
Prateeksha Satyamoorthy
CS 8501

2 Parameter Variation
• Deviation of process, voltage and temperature values from specifications
• Technology scaling beyond 90 nm => higher levels of device parameter variation => the design problem shifts from deterministic to probabilistic
• Key process parameters: Vth and Leff
▫ Determine transistor and gate speeds
• Vth variation impacts:
▫ Frequency, leakage power
• Variation => some sections of the chip are slower than others => the corresponding circuits suffer timing errors
• Lose the benefits of scaling to a technology generation

3 Cross-section of a MOSFET
• The clock cycle of a chip is determined by the delay of its longest path, usually referred to as the critical path

4 Impact of Vth and Leff
Source: www.ocw.mit.edu

5 Impact of process variation on processor frequency

6 Varius
• To study how parameter variation affects timing errors in high-performance processors
▫ A novel model for process variation
  Within-die parameter variation (WID)
▫ A novel model for timing errors

7 Process Variation Model
• Systematic variation
▫ Exhibits spatial correlation
▫ Assumptions: position independence, isotropy
▫ Spherical model
  - initially linear
  - then tapers off to zero at a distance φ [range]
  - no correlation at this distance

8 They finally assume φ = 0.5

9 • Random variation
▫ Level of individual transistors
▫ Assumption: Vth and Leff normally distributed with zero mean, uncorrelated
• Final σ and
▫ Total WID variation is normally distributed, so = ½
Assumptions

10 VATS - Model for variation-induced timing errors in processor pipelines
Pdf - probability density function
All paths that have become longer than 1 (the normalized clock period) generate errors
PE - probability of error
Cdf - cumulative distribution function

11 Timing errors in logic
Dvarlogic distribution
Cdfvarlogic
Error rate

12 Timing errors in SRAM Memory
Distribution
First path fails
More errors as paths fail

13 Validation
180 nm process
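The spherical correlation model (slide 7) and the VATS error probability (slide 10) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. It assumes the standard spherical correlation function from geostatistics, with distance r and range phi measured in the same chip-relative units, and it assumes a normally distributed path delay for the error-probability part. The function names and the example values for the mean and sigma of the path delay are hypothetical.

```python
import math


def spherical_correlation(r, phi=0.5):
    """Correlation of the systematic component between two points.

    r   : distance between the points (same units as phi)
    phi : range; at r >= phi the points are uncorrelated.
          The default 0.5 is the value assumed on slide 8.
    """
    if r < 0:
        raise ValueError("distance must be non-negative")
    if r >= phi:
        return 0.0
    x = r / phi
    # Starts at 1, falls roughly linearly for small r, tapers to 0 at r = phi.
    return 1.0 - 1.5 * x + 0.5 * x ** 3


def error_probability(t_clk, mean_delay, sigma_delay):
    """P_E(t) = 1 - cdf(t): chance that a path is longer than the clock period.

    ASSUMPTION: path delay is normally distributed; mean_delay and
    sigma_delay are hypothetical inputs in normalized time units.
    """
    z = (t_clk - mean_delay) / (sigma_delay * math.sqrt(2.0))
    cdf = 0.5 * (1.0 + math.erf(z))
    return 1.0 - cdf


if __name__ == "__main__":
    for r in (0.0, 0.1, 0.25, 0.5):
        print(f"rho({r}) = {spherical_correlation(r):.4f}")
    # Hypothetical stage: mean delay 0.9, sigma 0.05, clock period 1.0.
    print(f"P_E = {error_probability(1.0, 0.9, 0.05):.4f}")
```

Shortening the clock period (raising the frequency) moves t_clk toward the bulk of the delay distribution, so the error probability rises, which is the error rate vs. frequency behavior the timing-error model is meant to capture.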
14 Validation
1. Generate Vth and Leff variation map
2. Apply timing error model to get error rate vs. frequency for each pipeline stage

15 How Varius is used
• Variation-Aware Dynamic Voltage/Frequency Scaling - Herbert et al.: Vth and Leff values are generated and used to determine the maximum frequency and subthreshold leakage of each core across Vdd and temperature. Variability-aware schemes maintain a significant improvement in power/throughput over variability-unaware ones, up to 9.9%.
• Maestro: Orchestrating Lifetime Reliability in Chip Multiprocessors - Feng et al.: showed that for CMPs without damage profiles, temperature sensors and performance counters are inadequate in environments with significant process variation, so they propose low-level damage sensors.
• EVAL: Utilizing Processors with Variation-Induced Timing Errors - Sarangi et al.: design for closer to nominal values, and provide some transistor budget to tolerate unavoidable variation-induced errors.
• Facelift: Hiding and Slowing Down Aging in Multicores - Tiwari et al.: determine how variation impacts the delay of each gate of each critical path. The slowest of the critical paths in a processor determines the processor frequency.

16 Related Work, Contribution
• Delay of an inverter from Vth and Leff
• Mukhopadhyay et al. proposed models for timing errors in SRAM memory due to random Vth variation. The VATS model is an extension of their model of access time errors by
▫ including systematic variation effects
▫ considering variation in Leff
▫ modeling the maximum access time of a line of SRAM rather than a single cell
▫ using the alpha-power model with an α equal to 1.3
• Memik et al. modeled errors in SRAM memory due to cross-talk noise as they overclock circuits. They use high degrees of overclocking (twice the nominal frequency and more). In the less than 25% overclocking regime that we consider, such cross-talk errors are negligible.
For very small feature-size technologies, however, the situation may change.
• Ernst et al. and Karl et al. measured the error rate of a multiplier and an SRAM circuit, respectively, by reducing the voltage beyond safe limits to save power. They plot curves for error rate versus voltage. In this paper, we outlined a procedure to extract the distribution of path delays from these curves, and validated parts of our model by comparing it against their curves.
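To make the "delay of an inverter from Vth and Leff" item concrete, here is a minimal sketch of the alpha-power law with α = 1.3, in which gate delay scales as Leff · Vdd / (Vdd − Vth)^α. It is a sketch under stated assumptions, not the paper's code: carrier mobility is held constant (fixed temperature) so it cancels in the ratio, and the nominal values vdd = 1.0, vth_nom = 0.15 and leff_nom = 1.0 are hypothetical placeholders. The helper max_relative_frequency reflects the statement that the slowest critical path sets the processor frequency.

```python
def relative_gate_delay(vth, leff, vdd=1.0, alpha=1.3,
                        vth_nom=0.15, leff_nom=1.0):
    """Gate delay relative to a nominal device, using the alpha-power law.

    ASSUMPTIONS: mobility is constant and cancels in the ratio;
    vdd, vth_nom and leff_nom are hypothetical nominal values.
    """
    if vdd <= vth or vdd <= vth_nom:
        raise ValueError("vdd must exceed the threshold voltage")

    def raw(vt, le):
        # Proportional form: delay ~ Leff * Vdd / (Vdd - Vth)^alpha
        return le * vdd / (vdd - vt) ** alpha

    return raw(vth, leff) / raw(vth_nom, leff_nom)


def max_relative_frequency(path_delays):
    """Frequency relative to nominal, set by the slowest critical path."""
    return 1.0 / max(path_delays)


if __name__ == "__main__":
    # A device with higher Vth and longer Leff than nominal is slower.
    slow = relative_gate_delay(vth=0.17, leff=1.05)
    fast = relative_gate_delay(vth=0.13, leff=0.95)
    print(f"slow device delay: {slow:.3f}x nominal")
    print(f"fast device delay: {fast:.3f}x nominal")
    print(f"chip frequency: {max_relative_frequency([slow, fast, 1.0]):.3f}x nominal")
```

The ratio form avoids needing absolute technology constants: only the deviation of Vth and Leff from their nominal values matters, which is what a variation map provides.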