Advanced Digital Design
Metastability
A. Steininger
Vienna University of Technology

Outline
• What is metastability
• Effects and threats
• The unavoidability
• MTBU estimation
• Synchronizers & Countermeasures
• Trends
• Measurement of Model Parameters

Lecture "Advanced Digital Design" © A. Steininger / TU Vienna

Metastability: An Example
[Figure: ball on a hill between a stable left position and a stable right position]
• Ball may remain on top ("metastable") for unbounded time
• A small disturbance causes the ball to fall in either direction

What is Metastability?
• continuous-valued input space (initial position of the ball)
• mapped to binary output space (left or right position)
• mapping may be undecided for unbounded time

Metastability in Logic?
• "In the synchronous digital world we do not have a continuous space" (after all, that's the key benefit!)
• "Inputs and outputs of gates are all digital"
• So why bother about metastability?

The real world
• signal levels representing the digital state are continuous
• pulse lengths are continuous in time
• relative signal arrival times are continuous
• transistors and the circuits built from them operate in continuous time with continuous voltage amplitudes

Specifying Problems Away
• is the input high or low? spec: forbidden range
• is the pulse long enough to be recognized by a gate? spec: min pulse width
• did A occur before or after B? spec: setup/hold time

Limits of the Abstraction
• in a closed world these issues can be "specified away", but
  - what happens at interfaces?
  - what happens with faults?
• The synchronous digital abstraction cannot comprise these issues
• when facing metastability, CMOS circuits are operated out of spec, hence have undefined behavior

Level: Inverter Example
[Figure: inverter characteristics, uout over uin]
• analog transfer characteristics
• "forbidden" input level may lead to "forbidden" output level
• propagation of "forbidden" level

Pulsewidth: RC Example
• short digital input pulse creates analog output in forbidden range
• parasitic RCs are omnipresent in ASICs

A before B: AND Example
[Figure: waveforms a, b and a AND b]
• contradicting digital transitions on inputs
• depending on timing a glitch is produced
• RC will convert it into ambiguous voltage

Setup/Hold Time of Latch
[Figure: latch with inputs D and CLK, output Q and feedback path]
• feedback path must be stable when switching from "transparent" to "hold"
• Otherwise we feed the storage loop with a marginal condition (pulse width, level), thus creating undefined behavior

Metastability in the Latch
[Figure: ball on a hill between a stable left position and a stable right position]
• normal operation: strong momentum will roll ball to other side
• metastability: marginal momentum will roll ball just to the top

Response Time of a FF
• Observation: An input transition during the decision window leads to an (unbounded) increase of clock-to-output delay
[Figure: tclk2out over the data arrival time relative to the clock edge, marking tsetup, 0, thold, the nominal value tclk2out,nom and the off-spec region]

Observation
• combinational elements transform off-spec inputs into off-spec outputs immediately
• sequential (stateful) elements are expected to decide for one state; off-spec inputs will delay this decision
• only they can become metastable

Faces of Metastability
• (properly shaped) late transition
  - may cause timing problems
  - problem specific for synchronous design
• creeping through forbidden voltage range
  - generates long undefined level
• oscillation
  - generates erroneous transitions

Metastability: Creeping
[Figure: transfer characteristics of Inv 1 and Inv 2 plotted against each other (axes ue,1 = ua,2 and ue,2 = ua,1, scale 1 to 5), intersecting at the points stable (HI), metastable and stable (LO)]

Metastability: Oscillation
[Figure: loop of two inverters with delays D1 and D2, pulse width PW < D1 + D2]
• A pulse with length shorter than the roundtrip delay through the inverter loop can circulate
• Thus it appears periodically at the output: "oscillation"

Ways of Triggering MS
[Figure: latch (L) and flip-flop circuits with D, Clk, Q and feedback path FB]
• Time domain
  - glitch in feedback loop
  - S/H violation, or glitch on D
• Value domain
  - Marginal input voltage stored even without S/H violation

Why violate Setup/Hold?
• in a closed synchronous system no violations will occur
• BUT: no system is really closed
  - non-synchronous interfaces
  - clock domain boundaries
  - fault effects (single-event upsets)
  - off-spec operation (temp, VCC, frequency)

Asynchronous Inputs
[Figure: clock period Tclk with decision window T0 (setup/hold) and an asynchronous event]
• probability of setup/hold violation:

$$P_{violate} = \frac{T_0}{T_{clk}}$$

Multiple Clock Domains
[Figure: CLK 1 (Ref) and CLK 2]
• arbitrary "phase" relation
• setup/hold violation inevitable (fundamentally!)
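The violation probability for asynchronous inputs can be made concrete with a few lines of Python. This is only a sketch: it assumes the event time is uniformly distributed over the clock period, and the numbers (T0 = 100 ps, Tclk = 10 ns, 2 MHz event rate) are hypothetical illustration values, not measured data. The result is the rate of setup/hold violations, which is not yet the rate of upsets.

```python
def p_violate(t0_s: float, t_clk_s: float) -> float:
    """Probability that one asynchronous event falls into the decision window.

    Assumes the event is uniformly distributed over the clock period.
    """
    if not 0.0 <= t0_s <= t_clk_s:
        raise ValueError("expected 0 <= T0 <= Tclk")
    return t0_s / t_clk_s


def violation_rate(t0_s: float, t_clk_s: float, lambda_dat_hz: float) -> float:
    """Average number of setup/hold violations per second."""
    return lambda_dat_hz * p_violate(t0_s, t_clk_s)


# Hypothetical example values, chosen only for illustration.
T0 = 100e-12      # decision window: 100 ps
T_CLK = 10e-9     # clock period: 10 ns
LAMBDA_DAT = 2e6  # input events per second

print("P_violate      =", p_violate(T0, T_CLK))
print("violations / s =", violation_rate(T0, T_CLK, LAMBDA_DAT))
```

Most violations resolve quickly; how many of them turn into an upset depends on the available resolution time.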

Metastability: Threats
• propagation
  - undefined logic level/timing at input may produce undefined output
• "Byzantine" interpretation
  - thresholds/timing of different inputs are different (type variations)
  - marginal input level/timing may be interpreted differently

Metastability Propagation
[Figure: inverter characteristics (uout over uin) with the metastable region; FF with data/clk inputs and undefined output X driving a second FF]
• Combinational gates as well as the inverters inside the FF map metastable inputs to metastable outputs

Inconsistent Perception
[Figure: metastable output X feeds FF A (threshold A, reads "0") and FF B (threshold B, reads "1"); CMOS 3 V level scale: 3.3 V, 2.4 V, 2.0 V, 0.8 V, 0.4 V, 0.0 V]
• The metastable state may be regarded as "1" by one FF and as "0" by another

Metastability Proofs
• Formal proofs exist that metastability can in principle not be avoided ("Buridan's Principle")
  - no upper bound on the duration of metastable state can be given
  - but after infinite time the state will be resolved with probability 1
• Fundamental issue
  - Mapping from a continuous space to a discrete space involves a decision that may take unbounded time (namely in borderline cases)

Approaching the Border
• The mapping from continuous to binary space needs a borderline
• In the proximity of the borderline the force pulling towards one of the binary states becomes smaller (compare momentum of the ball)
• In the continuous input space one can go arbitrarily close to the borderline, thus moving this force towards zero
• Often the stable binary states represent energy minima, while the metastable state represents a (local) maximum (Remember: energy must change continuously)

Metastability Avoidance?
• Can't we avoid metastability in practice, if we
  - avoid borderline cases? (only those are problematic!) => synchronous design, noise margins…
  - allow arbitrary time for resolving?
  - change input threshold of successor stage?
  - use a different storage element?

Why use the D-Flipflop?
• Metastability is not restricted to D-FFs, it is encountered with SR-latch, JK-Flipflop, Muller C-Gate,…
• Basically all bistable elements can become metastable:
  - state is always associated with energy
  - state change always involves energy transfer
  - laws of physics dictate continuous transfer
  - but: binary state
[Figure: energy curve with two minima separated by a maximum]

Mitigating Metastability
• Metastability cannot be eliminated in general
  - all circuits proposed to eliminate it have been shown to fail…
• in practice systems still work because
  - metastability is very improbable
  - it can be made more or less probable by design techniques
  - it can be transformed between its different modes: marginal voltage level, late transition, oscillation

Conversions
Low-Pass Discriminator creeping => glitch Schmitt Trigger creeping + noise => oscillation High / Low threshold input oscillation => creeping creeping => late transition Flip-Flop late transition => creeping or oscillation

Masking Metastability
• assume m-of-n voting
[Figure: voter with n inputs, m-1 of them already active]
• If the metastable input just makes the difference, MS can propagate
• in all other cases MS will be masked
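The masking argument for m-of-n voting can be illustrated with a small three-valued model. This is a sketch only: representing a metastable input by the symbol 'X' is an abstraction introduced here, and the function name `vote` is hypothetical. The vote is reported as 'X' exactly when the undecided inputs could tip the result, which is the "just makes the difference" case.

```python
def vote(inputs, m):
    """m-of-n threshold vote over values 0, 1 or 'X' (metastable).

    Returns 1 or 0 when the result does not depend on how the 'X'
    inputs resolve, and 'X' when they decide the outcome.
    """
    ones = sum(1 for v in inputs if v == 1)
    undecided = sum(1 for v in inputs if v == 'X')
    if ones >= m:
        return 1            # threshold reached without the undecided inputs
    if ones + undecided < m:
        return 0            # threshold unreachable even if all resolve to 1
    return 'X'              # undecided inputs make the difference


# 2-of-3 voting (TMR-style majority)
print(vote([1, 1, 'X'], 2))  # masked
print(vote([0, 0, 'X'], 2))  # masked
print(vote([1, 0, 'X'], 2))  # metastable input decides, so it can propagate
```

In the last call, m-1 inputs are already active, so the remaining metastable input alone determines the output.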

Detecting Metastability
• … often possible by comparing Q and $\overline{Q}$
• creeping
  - both Q and $\overline{Q}$ deliver VDD/2; this is often perceived as the "same" logic level
• late transition
  - with proper separation of Schmitt-Trigger / high threshold inverter and output inverter => no visible effect
• oscillation
  - literature reports about "in phase" oscillation of Q and $\overline{Q}$

Quantifying the Risk of MS
• "Upset"
  - metastable output is captured by subsequent FF after tres
• Mean Time Between Upset (MTBU)
  - expected value (statistics!) for interval between two subsequent upsets

$$MTBU = \frac{1}{\lambda_{dat} \cdot f_{clk} \cdot T_0} \cdot \exp\left(\frac{t_{res}}{\tau_C}\right)$$

Resolution Time

$$t_{res} = T_{clk} - t_{clk2out} - t_{comb} - t_{SU}$$

[Figure: asynchronous input (asyn) sampled by a first FF, followed by combinational logic and a second FF (syn), both clocked by clk; timing diagram with tclk2out, tcomb and tSU]
• normal operation: tres > 0
• upset: tres < 0

Parameters
• Resolution time tres
  - interval available for output to settle after active clock edge
• Flip-Flop parameters τC, T0
  - experimentally determined
  - time constant τC dep. on transit frequ.
  - T0 from effective width of decision window
• Clock period of FF
  - Tclk = 1/fclk
• Average rate of change λdat
  - Avg. rate of transitions at FF data input

Modeling Metastability
• How can we derive this equation?
• Which model to apply?

Simple Metastability Model
[Figure: inverter pair with node voltages u1 and u2; inverter characteristics uout over uin, linearized as uout = -A*uin]
• model bistable element by inverter pair
• use linear model for inverter, around midpoint of transfer function ("balance point")
• consider "homogeneous" case, i.e. closed loop w/o inputs

Introducing Dynamics
[Figure: inverter pair, each inverter with gain -A followed by an RC element (RC = τ), node voltages u1 and u2]
• 1st order approximation of dynamic behavior: RC element
• assume symmetry (same A, RC for both inverters) for simplicity
• WLOG assume symmetric supply (+VCC/-VCC) against GND

Differential Equations
• Basics:

$$i_R = \frac{u_R}{R}, \qquad i_C = C \cdot \frac{du_C}{dt}$$

• forward path:

$$u_2 = -A \cdot u_1 - R C \cdot \frac{du_2}{dt}$$

• backward path:

$$u_1 = -A \cdot u_2 - R C \cdot \frac{du_1}{dt}$$

• Laplace:

$$\mathcal{L}\left\{\frac{du(t)}{dt}\right\} = s \cdot U(s) - u^0$$

$$U_2 = -A \cdot U_1 - \tau \left(s \cdot U_2 - u_2^0\right)$$

$$U_1 = -A \cdot U_2 - \tau \left(s \cdot U_1 - u_1^0\right)$$

• time-domain solution:

$$u_2(t) = \frac{u_2^0 - u_1^0}{2} \exp\left(\frac{A-1}{\tau} t\right) + \frac{u_2^0 + u_1^0}{2} \exp\left(-\frac{A+1}{\tau} t\right)$$

The Solution

$$u_2(t) \approx \frac{u_2^0 - u_1^0}{2} \exp\left(\frac{A-1}{\tau} t\right)$$

• $u_2^0 - u_1^0$ … difference of initial voltages (charges on Cs); zero at balance point
• τ … RC constant, A … inverter gain at balance point
• A/τ … gain bandwidth product of inverter; bandwidth = 1/RC
• starting from the initial difference u2 rises exponentially with time towards the positive or negative supply voltage

Plot of u2 over Time
[Figure: u2 over time t (0 to 3) for initial differences from -25 to +25; vertical axis from -500 to +500]
• For a given t we can project "forbidden" input range back to a "forbidden" range of the initial voltage difference

Forbidden Initial Range

$$u_2(t) = \underbrace{\frac{u_2^0 - u_1^0}{2}}_{u_0} \exp\left(\frac{A-1}{\tau} t\right)$$

$$u_{0,border}(t_{res}) = U_{out,border} \cdot \exp\left(-\frac{A-1}{\tau} t_{res}\right)$$

• The forbidden output voltage range relates to a forbidden range of initial difference voltage (i.e. just after sampling).
• This range becomes exponentially smaller for high resolution time tres and high gain-bandwidth product A/τ.

Aperture Window TAW
• How long does it take for the input voltage difference to cross the forbidden range?
[Figure: udiff(t) with slope S crossing the range from -u0,border to +u0,border (width 2·u0,border) within TAW]

$$T_{AW} = \frac{2 u_{0,border}}{S}$$

• Depends on slopes of both, input voltage AND feedback voltage
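The time-domain solution of the inverter-pair model can be checked numerically. The sketch below integrates the two first-order equations (τ·du2/dt = -A·u1 - u2 and τ·du1/dt = -A·u2 - u1) with a plain forward-Euler scheme and compares the result with the closed form. The gain, time constant and initial voltages are arbitrary illustration values, and, like the linear model itself, the simulation ignores saturation at the supply rails.

```python
import math


def simulate(u1_0, u2_0, gain, tau, t_end, steps=100_000):
    """Forward-Euler integration of the linear inverter-pair model."""
    dt = t_end / steps
    u1, u2 = u1_0, u2_0
    for _ in range(steps):
        du1 = (-gain * u2 - u1) / tau   # backward path
        du2 = (-gain * u1 - u2) / tau   # forward path
        u1, u2 = u1 + du1 * dt, u2 + du2 * dt
    return u1, u2


def closed_form_u2(u1_0, u2_0, gain, tau, t):
    """u2(t): growing difference mode plus decaying common mode."""
    grow = 0.5 * (u2_0 - u1_0) * math.exp((gain - 1) / tau * t)
    decay = 0.5 * (u2_0 + u1_0) * math.exp(-(gain + 1) / tau * t)
    return grow + decay


# Arbitrary illustration values (normalized units).
A, TAU = 5.0, 1.0
U1_0, U2_0 = 0.001, 0.002

_, u2_num = simulate(U1_0, U2_0, A, TAU, t_end=1.0)
u2_ref = closed_form_u2(U1_0, U2_0, A, TAU, 1.0)
print("numeric:", u2_num, "closed form:", u2_ref)
```

After a short time the decaying term is negligible, which is why only the growing exponential is kept in the simplified solution.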

Calibrating TAW
• TAW depends on u0,border, which in turn depends on tres

$$T_{AW} = \frac{2 u_{0,border}}{S} = \frac{2 U_{0,border}}{S} \exp\left(-\frac{A-1}{\tau} t_{res}\right)$$

with $U_{0,border} = u_{0,border}(t_{res} = 0) = U_{out,border}$
• for immediate use of the output:

$$T_{AW}(t_{res} = 0) = \frac{2 U_{0,border}}{S} = T_{W0} \quad \text{(technology parameter)}$$

• thus

$$T_{AW} = T_{W0} \cdot \exp\left(-\frac{A-1}{\tau} t_{res}\right)$$

Hitting the Aperture
• with exponentially distributed inter-arrival time of input events (rate λdat) and sampling with period Tclk (i.e. window TAW is repeated) the upset rate can be calculated as

$$\lambda_{upset} = \frac{T_{AW}}{T_{clk}} \cdot \lambda_{dat}$$

• Hence the MTBU becomes

$$MTBU = \frac{1}{\lambda_{upset}} = \frac{1}{\lambda_{dat}} \cdot \frac{T_{clk}}{T_{AW}}$$

Putting it all together

$$MTBU = \frac{1}{\lambda_{upset}} = \frac{1}{\lambda_{dat}} \cdot \frac{T_{clk}}{T_{AW}}, \qquad T_{AW} = T_{W0} \cdot \exp\left(-\frac{A-1}{\tau} t_{res}\right)$$

$$MTBU = \frac{1}{\lambda_{dat}} \cdot \frac{T_{clk}}{T_{W0}} \cdot \exp\left(\frac{A-1}{\tau} t_{res}\right)$$

with $T_{W0} \rightarrow T_0$ and $\frac{A-1}{\tau} \rightarrow \frac{1}{\tau_C}$

The widely used equation

$$MTBU = \frac{1}{\lambda_{dat} \cdot f_{clk} \cdot T_0} \cdot \exp\left(\frac{t_{res}}{\tau_C}\right)$$

• MTBU: expected time between upsets (statistical!)
• tres: available resolution time
• λdat: rate of input events
• fclk: sampling frequency
• T0, τC: technology parameters

Late Transition
• calculate output delay over data to clk distance

$$u_{out}(t) = \frac{u_{diff}}{2} \exp\left(\frac{t}{\tau_C}\right) \quad \Rightarrow \quad t(u_{diff}) = \tau_C \cdot \ln\left(\frac{2 u_{out}}{u_{diff}}\right)$$

• $u_{out} = U_{th}$ (detector threshold), $u_{diff} = S \cdot \Delta T_{in}$ (input slope S)

$$t_{dly}(\Delta T_{in}) = \tau_C \cdot \ln\left(\frac{2 U_{th}}{S \cdot \Delta T_{in}}\right) = \tau_C \cdot \ln\left(\frac{T_{W0}}{\Delta T_{in}}\right)$$

• output delay depends on input phase with ln(1/x)

Graphical View
[Figure: output delay Dly (0 to 25) over ΔTin (-25 to +25)]

Provoking Metastability
• asynchronous inputs
• multiple clock domains
• clock divider (uncontrolled delay)
• low timing margins
• slow technology (gain/BW prod)
• supply drop (excessive delay)
• operation under high temperature
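The widely used MTBU equation and the late-transition delay are easy to evaluate numerically. The following sketch does so. The values for τC (50 ps) and T0 (100 ps) are hypothetical placeholders, not data for any real flip-flop; the 2 MHz data rate and 10 MHz clock are the example rates used in this lecture's plots.

```python
import math


def mtbu(t_res, tau_c, t0, f_clk, lambda_dat):
    """Mean time between upsets in seconds: exp(t_res/tau_c) / (lambda_dat*f_clk*T0)."""
    return math.exp(t_res / tau_c) / (lambda_dat * f_clk * t0)


def t_dly(delta_t_in, tau_c, t_w0):
    """Extra output delay tau_c * ln(T_W0 / |dT_in|) of a late transition.

    delta_t_in is the distance of the data edge from the metastable point;
    the delay grows without bound as it approaches zero.
    """
    if delta_t_in == 0:
        return math.inf
    return tau_c * math.log(t_w0 / abs(delta_t_in))


# Hypothetical technology parameters, for illustration only.
TAU_C = 50e-12
T0 = 100e-12
F_CLK = 10e6
LAMBDA_DAT = 2e6

for t_res_ns in (0.0, 0.5, 1.0, 2.0):
    value = mtbu(t_res_ns * 1e-9, TAU_C, T0, F_CLK, LAMBDA_DAT)
    print(f"t_res = {t_res_ns} ns -> MTBU = {value:.3e} s")

print("t_dly at dT_in = 1 ps:", t_dly(1e-12, TAU_C, T0), "s")
```

Every additional τC of resolution time multiplies the MTBU by e, which is why modest timing margin buys a large reliability gain.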

Determination of T0, τC
• experimental:
  - vary tres
  - observe MTBU
• log graph => straight
  - slope -> τC
  - offset -> T0
[Figure: semilog plot of MTBU over tres (ns) for λdat = 2 MHz, fclk = 10 MHz; slope 1/τC, offset 1/(λdat*fclk*T0); typical values]

Metastability: Trends
• Claim: "Metastability is a non-issue in modern technologies"
[Figure: log MTBU [s] for 1996 (XC4005) and 2002 (XC2VP4)]
• BUT: clock rates have increased by a factor of 16 during that period, and tres and timing margins have shrunk in the same way!

Mitigating Metastability
• avoid/minimize non-synchronous IFs
• leave sufficient timing margins
• use fast technology (gain/BW prod)
• ensure proper operating conditions (stable power supply, cooling,…)
• basic principle of synchronizers: trade performance for increased timing margins (tres)

Synchronizer
• Example: Cascade of n Input-FFs
[Figure: asynchronous input (asyn) passing through a chain of FFs clocked by clk, yielding the synchronized output (syn)]
• MTBU calculation: same equation as before, but now individual resolution times sum up:

$$t_{res} = \sum_i t_{res,i}$$

MTBF of n-Stage Synchronizer
• Recall the projection of allowed output range to an input range considering the exponential increase during the resolution time:

$$\hat{u}_0(t_{res}) = \hat{U}_{out} \cdot \exp\left(-\frac{t_{res}}{\tau_C}\right)$$

• u0 for FFk is provided by the output of a preceding stage FFk-1 => we make the same projection again:

$$\hat{u}_{0,k-1}(t_{res,k-1}) = \hat{u}_{0,k} \cdot \exp\left(-\frac{t_{res,k-1}}{\tau_C}\right) = \hat{U}_{out,k} \cdot \exp\left(-\frac{t_{res,k}}{\tau_C}\right) \cdot \exp\left(-\frac{t_{res,k-1}}{\tau_C}\right)$$
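Because the per-stage exponentials multiply, the resolution times of a synchronizer cascade simply add up in the exponent. The sketch below evaluates this and estimates how many resolution intervals are needed for a target MTBU. It assumes identical τC and T0 for all stages; the function names and all numeric values are hypothetical illustration choices.

```python
import math


def mtbu_cascade(stage_t_res, tau_c, t0, f_clk, lambda_dat):
    """MTBU of a cascade: the individual resolution times sum up."""
    t_res_total = sum(stage_t_res)
    return math.exp(t_res_total / tau_c) / (lambda_dat * f_clk * t0)


def intervals_needed(target_mtbu, t_res_per_stage, tau_c, t0, f_clk, lambda_dat):
    """Smallest number of equal resolution intervals reaching target_mtbu."""
    t_res_required = tau_c * math.log(target_mtbu * lambda_dat * f_clk * t0)
    return max(1, math.ceil(t_res_required / t_res_per_stage))


# Hypothetical parameters, for illustration only.
TAU_C, T0 = 50e-12, 100e-12
F_CLK, LAMBDA_DAT = 10e6, 2e6
T_RES_STAGE = 1e-9            # resolution time contributed per stage

n = intervals_needed(1e9, T_RES_STAGE, TAU_C, T0, F_CLK, LAMBDA_DAT)
print("resolution intervals needed:", n)
print("resulting MTBU [s]:",
      mtbu_cascade([T_RES_STAGE] * n, TAU_C, T0, F_CLK, LAMBDA_DAT))
```

Each added stage costs one clock cycle of latency, which is the performance price paid for the larger exponent.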

Synchronizer-Rules
• never synchronize more than one signal (rail)
  - danger of data inconsistency
  - degradation of MTBU by number of signals
  - for a wider bus, use one signal for handshaking
• never introduce a fork before the end of synchronizer
• estimate the MTBU of your solution
  - too low MTBU leads to failures
  - too many stages introduce unnecessary delay
• there is definitely no magic solution to eliminate the potential for metastability, but it can be made arbitrarily improbable

Synchronizer: Trends
• need for more synchronizers
  - more function units being integrated on a chip
  - more standardized frequencies
  - higher communication demands
• need for more synchronizer stages
  - increasing PVT variations => larger safety margins
  - synchronizer parameters become worse: τC used to scale proportionally to (FO4) propagation delay for decades; below 45 nm technologies the scaling is worse
• synchronizers tend to create a considerable performance loss in the future

Even/Odd Synchronizer
• works for two periodic clocks only
• avoids performance penalty of synchronizers
• largely eliminates potential for metastability
• for details see [Dally & Tell, The Even/Odd Synchronizer, ASYNC 2010]

Mutex
• For deciding the "A before B" problem a special circuit exists, namely the Mutex (mutual exclusion element)
• Unlike the Synchronizer it assumes there is unbounded time to resolve
• It will be treated in a later Section.

Assumptions made so far
• linear inverter slope (1st order model)
• load independent gain
• dominating RC const. (1st order model)
• full symmetry (RCs, inverter properties, rising/falling slopes,…)
• decreasing exp term neglected
• homogeneous case (MUX switching and input signal shape neglected)
• equally distributed voltage levels
• exponentially distributed input events

What about Oscillation?
• Can our model be used for oscillatory behavior?
• How / Why not?

A More General MS Model
• ideal amplifier: gain -A
• pure delay: delay D
• slope limiter: time constant RC, slope S
• GBWP = A/RC determines dynamics (decay of metastable state)
• oscillation for D > RC/A
• creeping otherwise

Characterizing Metastability
• know (=assume) exponential MTBU relation
• measure MTBU over tres
• draw semilog plot => straight line
• find params:
  - slope -> 1/τC
  - offset -> T0
• need very good setup for measurements! (assumptions made…)
[Figure: semilog plot of MTBU over tres (ns) for λdat = 2 MHz, fclk = 10 MHz; slope 1/τC, offset 1/(λdat·fclk·T0)]

Measuring Metastability
[Figure: MS producer driving the DUT flip-flop (D, clk, Q), followed by MS detector and counter; source: Altera]

MS Producer
• Single clock source, controllable relative delay between clock and data path
  - variable delay element, optional: feedback control
  - create as many MS events as possible in short time
  - well-controlled and reproducible phase
  - steer into deep metastability
  - problems: noise, cannot derive MTBU
• Two independent clock sources:
  - uniform distribution of phase relations
  - problems: MS rare, phase distribution truly uniform?

MS Detector
• Aims: detect metastable output of DUT
• Problem: How to define MS?
  - late transition detection
  - intermediate voltage detection
  - output proximity detection
• Implementation options (late trans det):
  - sample DUT output with FF1 after tres
  - compare with reference FF2 having "infinite" tres
  - mismatch indicates metastability
  - many sources of error!

Late Transition Detection
[Figure: osc 1 and osc 2 drive the DUT flip-flop; its output Q is sampled by DET after a variable delay (var Δ) and by REF after a long delay (Δ∞); a comparator (≠) feeds the counter CNT]
• max of var Δ determines maximum detectable tCO
• infinite delay not feasible => false positive for large tCO

Detecting Metastability (1)
Fundamental problems
• MS behavior is highly sensitive esp. to loading
  - cannot measure w/o influencing
  - can only make indirect observation
• What is an "upset" at all?
  - no sharp definition
  - MS interpretation becomes ambiguous
  - often "by chance" (threshold of next stage) or "deliberate" (scope)

Detecting Metastability (2)
Practical problems
• FFs in "relevant" circuits are not accessible
  - cannot propagate subtle effects over pins
  - cannot reliably capture them on-chip either
• detection circuits usually involve forks in DUT and measurement circuit
  - different path delays, different thresholds
  - usually ignored: symmetry assumed
• which manifestation of MS to observe?
  - intermediate voltage, output proximity, late trans.
• how do PVT variations impact the results?
• where to get the reference from? infinite time…

Relating the Results
• We plot log(MTBU) or tCO over tDtoC
• How to determine tDtoC?
  - measure with oscilloscope/counter
  - know from timing control: dly 2 - dly 1
• This relates to the external view (pins)!
• The actual FF cell will perceive a different timing due to non-matching path delays for C/D
• At best this may shift the MS point, but what about variable path delays (VT)?

Time Accuracy
• Clock
  - how accurate/stable is it?
  - where is it used?
• Delay
  - how accurate is it?
  - in which granularity can I vary it?
• Output delay measurement
  - how accurate is my scope?
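The parameter extraction from the semilog plot (measure MTBU over tres, fit a straight line, read τC from the slope and T0 from the offset) can be sketched as an ordinary least-squares fit on ln(MTBU). The data below are synthetic, generated from assumed parameters rather than measured, so the example only demonstrates that the fit recovers what was put in. Real measurements carry the accuracy problems discussed above.

```python
import math


def fit_tau_t0(t_res, mtbu, f_clk, lambda_dat):
    """Fit ln(MTBU) = t_res / tau_c - ln(lambda_dat * f_clk * T0).

    Returns (tau_c, t0) from a least-squares line through the points.
    """
    if len(t_res) != len(mtbu) or len(t_res) < 2:
        raise ValueError("need at least two (t_res, MTBU) pairs")
    n = len(t_res)
    y = [math.log(m) for m in mtbu]
    mean_x = sum(t_res) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in t_res)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(t_res, y))
    slope = sxy / sxx                      # equals 1 / tau_c
    intercept = mean_y - slope * mean_x    # equals -ln(lambda_dat * f_clk * T0)
    tau_c = 1.0 / slope
    t0 = math.exp(-intercept) / (lambda_dat * f_clk)
    return tau_c, t0


# Synthetic data from assumed parameters (not measurements).
TAU_TRUE, T0_TRUE = 50e-12, 100e-12
F_CLK, LAMBDA_DAT = 10e6, 2e6
t_res_points = [0.5e-9, 1.0e-9, 1.5e-9, 2.0e-9]
mtbu_points = [math.exp(t / TAU_TRUE) / (LAMBDA_DAT * F_CLK * T0_TRUE)
               for t in t_res_points]

tau_fit, t0_fit = fit_tau_t0(t_res_points, mtbu_points, F_CLK, LAMBDA_DAT)
print("tau_C =", tau_fit, "s   T0 =", t0_fit, "s")
```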

Uncertainty Characterization
• …is a "must" in many types of measurement.
• Result is given as value ± u%
• For probabilistic results: confidence interval
• These types of characterization allow
  - Estimation of the credibility of value
  - Determination of worst case for value
  - Calculation of compound uncertainty
• Why not care for this in metastability measurement / MTBU prediction?

Why we SHOULD care
• There is no other evidence for the (even approximate) correctness of MTBU prediction: Wait for 1000 years?
• Highly super-linear dependence of predicted MTBU on measured parameters => may amplify errors!
• Given the ample PVT variations: how to translate a specific measurement result into a generally valid prediction?

What about simulation
• simulation can provide access to all nodes of interest in a non-intrusive way
• metastability is, however, a very subtle effect, depending on many details
  - a very detailed model for transistors (parasitics) and circuit (layout!) is needed
  - analog simulation is needed, so the simulation time may become considerable
  - finding the right phase CLK to data is difficult
  - the simulator tends to run into numeric problems
  - noise is not necessarily considered
• so are the results finally representative?

Summary (1)
• Metastability is unavoidable when mapping from a continuous space to a binary one.
• It can result in late transition, creeping or oscillation.
• It can be specified away, but only in a closed system.
• Metastable inputs make gates operate out of spec, hence their behavior is undefined.
• Metastability can propagate, even over masking provisions (TMR, etc.)

Summary (2)
• In practice, the risk of facing a metastable upset can be made arbitrarily small.
• On a statistical basis, the upset probability of a flip-flop can be predicted.
• The corresponding equation can be derived by investigating the homogeneous solution of a dynamic model built from first-order models of the inverters.
• The generally used equation is based on many simplifying assumptions.

Summary (3)
• The required model parameters are often hard to find.
• Their determination by measurements involves a lot of uncertainties.
• Synchronizers trade performance for a reduced probability of a metastable upset.
• Metastability is also an issue for modern technologies.
• It can be best mitigated by conservative design and large timing margins.
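How strongly parameter uncertainty feeds into the prediction can be seen from the exponent alone. The sketch below computes the factor by which the predicted MTBU changes when the true τC deviates from the assumed one by a relative error. The ratio tres/τC = 40 is a hypothetical operating point chosen for illustration, and the function name is likewise an assumption.

```python
import math


def mtbu_error_factor(t_res, tau_c, rel_error):
    """Ratio MTBU(true tau) / MTBU(assumed tau), true tau = tau_c * (1 + rel_error).

    T0, f_clk and lambda_dat cancel out, so only the exponent matters.
    """
    tau_true = tau_c * (1.0 + rel_error)
    return math.exp(t_res / tau_true - t_res / tau_c)


# Hypothetical operating point: t_res = 40 * tau_C.
TAU_C = 50e-12
T_RES = 40 * TAU_C

for err in (-0.10, -0.05, 0.0, 0.05, 0.10):
    factor = mtbu_error_factor(T_RES, TAU_C, err)
    print(f"tau_C error {err:+.0%} -> MTBU changes by factor {factor:.3g}")
```

A τC that is 10 % larger than assumed reduces the MTBU at this operating point by more than an order of magnitude, which illustrates why measurement uncertainty deserves explicit treatment.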