Using Implications for Online Error Detection

Nuno Alves, Jennifer Dworak, R. Iris Bahar
Division of Engineering, Brown University, Providence, RI 02912

Kundan Nepal
Electrical Engineering Dept., Bucknell University, Lewisburg, PA 17837

NATW 2008

Online error detection
• Purpose: detect transient faults that may occur in a circuit during operation
• Increasingly critical as circuits scale to smaller sizes
• "Easy" in memory
• Not so easy in circuit logic

Common online detection techniques
1. Storing pre-computed test vectors in hardware
2. Duplicating the computation of disjoint hardware elements and voting on the result
3. Using check bits

Our approach
• Find invariant relationships in a circuit
• Violations of these expected relationships can identify errors

Error detection implementation

Invariant relationships in circuits
[Figure: example circuit with nets n1 through n8, highlighting the relationship n5=1 → n8=0]
• These relationships are logic implications: in a fault-free circuit, whenever n5=1, n8=0.

Error detection with implications
[Figure: the same circuit with checker logic on n5 and n8 driving an ERROR signal]
• For the implication n5=1 → n8=0, observing n5=1 and n8=1 at the same time makes the checker logic flag an error.

How we find implications
Verilog description → logic simulation → collect logic values at each site → find implications → validate implications

We have implications. Now what?
1. Remove redundant implications
2. Select useful implications
3. Pick the best implications for a given hardware overhead

Why should we remove implications?
• We could generate checker logic for every implication, but that is inefficient:
  ▫ A circuit can contain thousands of implications.
  ▫ Generating separate checker logic for each implication could more than double the circuit size.
• We want to check only the "most important" implications.
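The implication-finding flow and the checker described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' tool: the 4-input netlist, the gate set, and the names `simulate`, `find_implications`, and `checker` are hypothetical, and the "validate implications" step is swapped for exhaustive simulation, which only scales to tiny circuits (the slides do not say how validation is performed). Random-vector simulation proposes candidate implications; exhaustive simulation then confirms them.

```python
import itertools
import random

# Hypothetical 4-input netlist (NOT the circuit from the slides):
# each internal net maps to (gate type, input nets), in topological order.
NETLIST = {
    "n5": ("AND", ("n1", "n2")),
    "n6": ("OR", ("n2", "n3")),
    "n7": ("AND", ("n3", "n4")),
    "n8": ("NOR", ("n1", "n7")),
}
INPUTS = ("n1", "n2", "n3", "n4")

GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NOR": lambda a, b: 1 - (a | b),
}

def simulate(vector):
    """Logic simulation: return the value of every net for one input vector."""
    values = dict(zip(INPUTS, vector))
    for net, (gate, (a, b)) in NETLIST.items():
        values[net] = GATES[gate](values[a], values[b])
    return values

def find_implications(samples):
    """Return candidate implications (a, va, b, vb), read as a=va -> b=vb.

    A candidate must be exercised at least once (a=va was seen) and never
    contradicted (a=va together with b!=vb was never seen).
    """
    nets = list(samples[0])
    found = []
    for a, b in itertools.permutations(nets, 2):
        for va, vb in itertools.product((0, 1), repeat=2):
            exercised = any(s[a] == va for s in samples)
            violated = any(s[a] == va and s[b] != vb for s in samples)
            if exercised and not violated:
                found.append((a, va, b, vb))
    return found

def checker(values, imp):
    """Checker logic for one implication: 1 (ERROR) iff a=va and b!=vb."""
    a, va, b, vb = imp
    return int(values[a] == va and values[b] != vb)

# Candidate generation from a few random vectors.
random.seed(0)
random_vectors = [tuple(random.randint(0, 1) for _ in INPUTS) for _ in range(8)]
candidates = find_implications([simulate(v) for v in random_vectors])

# Validation (swapped in): keep candidates that hold on every input vector.
all_values = [simulate(v) for v in itertools.product((0, 1), repeat=len(INPUTS))]
validated = [imp for imp in candidates
             if not any(checker(s, imp) for s in all_values)]
```

As in the slides' example, the checker for n5=1 → n8=0 reduces to ERROR = n5 AND n8: it fires only when both nets are 1, which cannot happen in the fault-free toy circuit because n5=1 forces n1=1 and therefore n8=0.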
Removing redundant implications
[Figure: example circuit with nets n1 through n13]
i1: n3=0 → n8=0
i2: n4=1 → n12=0
i3: n4=1 → n8=0
i4: n12=0 → n8=0
i5: n4=1 → n13=0
• i3 is redundant: chaining i2 and i4 gives n4=1 → n12=0 → n8=0. Any input that violates i3 (n4=1 and n8=1) also violates i2 (if n12=1) or i4 (if n12=0), so once i2 and i4 are checked, a checker for i3 adds no detection capability.

Removing low coverage implications
• We only want implications that:
  ▫ Detect many faults
  ▫ Identify hard-to-detect faults
  ▫ Cover faults not detected by other implications
• Finding these important implications requires fault analysis to determine the specific fault coverage of each implication.

Reducing the number of implications
[Figure: share of implications classified as redundant, low coverage, or high quality (0 to 100%) for c1355, c499, c432, c1908, misex2, b12, Z9sym, clip, Z5xp1, and rd73]

Covering faults with implications
• For each random input vector and each fault, the implication-based circuit operation falls into one of four cases:
  ▫ Case 1: the error propagates to an output and an implication is violated
  ▫ Case 2: the error does not propagate, but an implication is violated
  ▫ Case 3: the error does not propagate and no implication is violated
  ▫ Case 4: the error propagates to an output, but no implication is violated

Average distribution of the 4 cases
[Figure: average percentage of Cases 1 to 4 (0 to 70%) for misex2, c1908, c432, c499, c1355, b12, rd73, Z5xp1, clip, and Z9sym]

How often do we detect errors?
Detection rate = Case 1 / (Case 1 + Case 4), i.e., the fraction of errors that reach a primary output and are also flagged by a violated implication.

Implications with fixed HW budgets
• Given a fixed HW budget, by how much can we reduce the probability of an undetected error?
[Figure: probability of an undetected error (0 to 20%) at 10%, 30%, and 50% hardware overhead for b12, misex2, rd73, Z5xp1, clip, Z9sym, c499, c432, and c1908]

Conclusions
• Practical online error detection alternative based on implication validation
• No modification of the targeted logic
• Checker logic is added off the critical path and runs in parallel with the rest of the circuit.
• For several circuits, we can detect almost 90% of all errors that propagate to a primary output.
• With only a 10% area overhead, the probability of an error being both observable and undetected is reduced to 11% on average.
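The redundancy example and the four-case bookkeeping above can also be sketched in Python. This sketch assumes, since the slides do not say, that "redundant" means derivable from the remaining implications by chaining them, including their contrapositives (a=va → b=vb is equivalent to b≠vb → a≠va). The names `remove_redundant`, `classify`, and `detection_rate` are hypothetical, and the greedy removal order is an arbitrary choice.

```python
from collections import defaultdict

# The five implications from the slide, as (net, value) -> (net, value).
IMPLICATIONS = {
    "i1": (("n3", 0), ("n8", 0)),
    "i2": (("n4", 1), ("n12", 0)),
    "i3": (("n4", 1), ("n8", 0)),
    "i4": (("n12", 0), ("n8", 0)),
    "i5": (("n4", 1), ("n13", 0)),
}

def negate(lit):
    net, value = lit
    return (net, 1 - value)

def build_graph(imps):
    """Implication graph: each a -> b also adds its contrapositive (not b) -> (not a)."""
    graph = defaultdict(set)
    for a, b in imps.values():
        graph[a].add(b)
        graph[negate(b)].add(negate(a))
    return graph

def reachable(graph, start, goal):
    """Depth-first search: can goal be derived from start by chaining edges?"""
    stack, seen = [start], {start}
    while stack:
        lit = stack.pop()
        if lit == goal:
            return True
        for nxt in graph[lit]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def remove_redundant(imps):
    """Greedily drop each implication that the implications still kept already imply."""
    kept = dict(imps)
    for name in list(imps):
        a, b = kept[name]
        rest = {k: v for k, v in kept.items() if k != name}
        if reachable(build_graph(rest), a, b):
            del kept[name]
    return kept

def classify(propagated, violated):
    """Case number (1 to 4, as on the slides) for one (input vector, fault) pair."""
    if propagated:
        return 1 if violated else 4
    return 2 if violated else 3

def detection_rate(outcomes):
    """Case 1 / (Case 1 + Case 4); None if no error reached an output."""
    cases = [classify(p, v) for p, v in outcomes]
    visible = cases.count(1) + cases.count(4)
    return cases.count(1) / visible if visible else None
```

On the slide's example, only i3 is dropped, leaving i1, i2, i4, and i5. Because removal is greedy and re-checks against the implications still kept, two implications that each imply the other are never both removed, but which one survives depends on the iteration order.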