Detecting Errors Using Multi-Cycle Invariance Information
Nuno Alves, Jennifer Dworak, and R. Iris Bahar
Division of Engineering, Brown University, Providence, RI 02912
Kundan Nepal
Electrical Engineering Dept., Bucknell University, Lewisburg, PA 17837
Design, Automation, and Test in Europe, April 20-24, 2009

Motivation
Errors in ICs are increasing
– Particle strikes, temperature, power, noise, process variations, test escapes, etc.
Previously, we proposed using logic implications for online error detection within a single clock cycle.
What happens if we consider implications across clock cycles?

Outline
Introduction & Background
Logic Implications for Error Detection
Multi-Cycle Implications
Experimental Results
Conclusions

Other Work
– Triple Modular Redundancy
– Logic Duplication
– Re-Execution in Multiple Threads
– Codes (Parity, Berger, Bose-Lin, etc.)
– High-Level Fault Assertions
– Fault Masking
– Checking the Outputs Against a Subset of the Truth Table

Our Approach
Find natural expected relationships and check for their violation.
Water should be blue… not brown…
In circuits, the expected relationships at the gate level are logic implications.

Implications Naturally Occur in Circuits
(Figure: example gate-level circuit with nets n1 through n8; the logic forces n8 = 0 whenever n5 = 1.)
n5 = 1 → n8 = 0

Implication Violations Can Be Used to Detect Errors
Appropriate checker logic can detect multiple errors with a single implication.
(Figure: the same circuit with a checker on n5 and n8 that raises ERROR when n5 = 1 but n8 ≠ 0.)
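A checker for an implication only has to flag the one value combination the implication forbids. As a minimal behavioral sketch (not the authors' checker netlist), the hypothetical function `implication_violated` below models a checker for an implication of the form a = va → b = vb and prints its truth table for the slide's example n5 = 1 → n8 = 0.

```python
def implication_violated(a: int, b: int, va: int, vb: int) -> bool:
    """Return True when the implication (a == va) -> (b == vb) is violated.

    An implication only constrains the consequent when the antecedent holds,
    so the checker fires only for the single combination a == va, b != vb.
    """
    return a == va and b != vb


# Truth table for the slide's implication n5 = 1 -> n8 = 0.
for n5 in (0, 1):
    for n8 in (0, 1):
        flag = "ERROR" if implication_violated(n5, n8, va=1, vb=0) else "ok"
        print(f"n5={n5} n8={n8}: {flag}")
```

For this particular implication the only forbidden combination is n5 = 1, n8 = 1, so in hardware the checker reduces to a single AND of n5 and n8.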
(Figure: the same circuit annotated with several stuck-at-1 (sa1) fault sites, each of which can be detected by the single checker for n5 = 1 → n8 = 0, which raises ERROR.)

Total Number of Implications With Distance 2 or Greater
(Figure: bar chart, log scale from 10 to 100,000, of the number of implications per benchmark circuit, including c1908, c1355, c499, and c432.)

So… what's the problem?
We have too many implications! How do we find them efficiently, and which ones should we use?

Implication Algorithm
Gate-level implications can be found automatically, without functional knowledge of the circuit.
Start
– Identify potential implications with simulation
– Verify implications
– Eliminate subsumed implications
– Determine coverage of remaining implications
– Select the best subset for target error detection and overhead
End
What determines which faults an implication may cover?

Potential Spatial Fault Coverage
Each implication can only cover a limited area of the circuit.
(Figure: example implications between two sites P and Q, such as P=0 → Q=0, Q=0 → P=0, and P=1 → Q=1, in three structural cases.)
– Direct path: faults along the path may be detected
– Reconvergent fanout: faults along the reconverging paths may be detected
– Divergent fanout: faults along the paths to common ancestors may be detected
Implications cannot cover any sites downstream of both implication points!

Limitations of Single-Cycle Implications
Implications may not exist to cover faults far downstream, e.g., faults close to:
– Flip-flops
– Primary outputs
It is possible for no useful implications to exist in a single cycle.
Optimal timing of capture is difficult.
Many of these issues are alleviated if we consider multi-cycle implications.

Multi-Cycle Implications
(Figure: a sequential circuit with inputs A and B, state X and Y, and output F, shown next to its time frame expansion over cycles t1 and t2.)
The sequential circuit contains no non-trivial implications in its combinational logic, but a logic value in the first clock cycle implies a value at a different site in the second clock cycle:
B1 = 0 → F2 = 0

Multi-Cycle Checker Hardware
(Figure: the expanded circuit with a checker that signals a violation when B1 = 0 is followed by F2 ≠ 0.)
The checker hardware requires state to be held between the first and second cycle.

Spatial Coverage of Multi-Cycle Implications
(Figure: implication sites P in cycle t and Q in cycle t + 1.)
Advantages:
– Good spatial coverage can be achieved near flip-flops
– The logical distance between implication sites may increase
– Delays captured at flip-flops in cycle t can be detected without complex timing

Experimental Setup
– ISCAS '89 benchmark circuits
– Zchaff SAT solver to validate implications
– Three sets of implications per circuit:
  – First cycle: both implication sites in cycle 1; obtained with single-cycle analysis and unrestricted inputs
  – Second cycle: both implication sites in cycle 2; obtained with time frame expansion
  – Cross cycle: one site in each cycle; obtained with time frame expansion

So, how many implications exist?
Number of Implications in Each Class
(Figure: bar chart, 0 to 30,000, of first-cycle, cross-cycle, and second-cycle-only implications for s298, s420, s444, s510, s713, s953, s1196, and s1488.)
What is the distance between implication sites?
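The multi-cycle checker differs from a single-cycle one in that the antecedent must be remembered for one clock. The sketch below is a cycle-level behavioral model under that assumption, not the authors' hardware: the class name `CrossCycleChecker` and the (B, F) trace are hypothetical, and the one-bit `armed` attribute stands in for the flip-flop that holds state between the first and second cycle for B1 = 0 → F2 = 0.

```python
class CrossCycleChecker:
    """Behavioral model of a checker for a cross-cycle implication
    (B at cycle t) == vb  ->  (F at cycle t + 1) == vf.

    One bit of state (a flip-flop in hardware) remembers whether the
    antecedent held in the previous cycle; the consequent is checked now.
    """

    def __init__(self, vb: int, vf: int) -> None:
        self.vb = vb
        self.vf = vf
        self.armed = False  # models the flip-flop holding "B == vb last cycle"

    def clock(self, b: int, f: int) -> bool:
        """Advance one cycle with the current values of B and F.

        Returns True if the implication is violated in this cycle.
        """
        violation = self.armed and f != self.vf
        self.armed = b == self.vb  # capture the antecedent for the next cycle
        return violation


# Slide example: B1 = 0 -> F2 = 0, over a hypothetical (B, F) trace.
checker = CrossCycleChecker(vb=0, vf=0)
trace = [(0, 1), (1, 0), (0, 0), (1, 1)]
print([checker.clock(b, f) for b, f in trace])
```

In this trace, F = 1 in the first cycle is not flagged because nothing was captured in an earlier cycle; only the last cycle, where B = 0 was captured one cycle before and F = 1, violates the implication.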
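The rule that a single-cycle implication cannot cover sites downstream of both implication points can be checked mechanically: in acyclic combinational logic, a fault in the transitive fanout of both P and Q cannot change the value at either site, so it can never violate the implication. The sketch below computes that set with a breadth-first search over a fanout graph; the netlist, site names, and helper names are hypothetical, and it computes only the set the slide rules out, not a full coverage analysis.

```python
from collections import deque


def downstream(fanout: dict[str, list[str]], start: str) -> set[str]:
    """All sites reachable from `start` by following fanout edges (excluding start)."""
    seen: set[str] = set()
    queue = deque(fanout.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(fanout.get(node, []))
    return seen


def uncoverable_sites(fanout: dict[str, list[str]], p: str, q: str) -> set[str]:
    """Sites in the transitive fanout of both implication points P and Q."""
    return downstream(fanout, p) & downstream(fanout, q)


# Hypothetical netlist: P and Q reconverge at g3, which feeds g4 and the output.
fanout = {
    "P": ["g1", "g3"],
    "Q": ["g2", "g3"],
    "g1": ["g4"],
    "g2": [],
    "g3": ["g4"],
    "g4": ["out"],
}
print(sorted(uncoverable_sites(fanout, "P", "Q")))
```

Here g3, g4, and out lie downstream of both P and Q, while the single-cycle example cannot help with them; this is the gap that pairing a site in cycle t with a site in cycle t + 1 is meant to close.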
Average Implication Distance for Single-Cycle and Cross-Cycle Implications
(Figure: bar chart, 0 to 14, of the average single-cycle distance and the average cross-cycle distance for s298, s420, s444, s510, s713, s953, s1196, and s1488.)

How do the different implication classes compare for error detection (if we use all possible implications)?
Contribution of Different Implication Classes to Error Detection
(Figure: bar chart of error coverage, 0 to 100, using first-cycle, first- and second-cycle, cross-cycle, and all implications for s298, s420, s444, s510, s713, s953, s1196, and s1488.)

Developing a Compressed Implication Set
Start
– Choose the next fault in the fault list
– Find the implication with the best coverage of this fault
– Add the best implication to the compressed list
– Any more faults? If yes, choose the next fault; if no, return the implication list
End

Number of Compressed Implications
(Figure: bar chart, 0 to 500, of the number of first-cycle, cross-cycle, and second-cycle-only implications in the compressed set for each circuit.)

What if we further trade off error coverage for reduced area overhead?
Average Error Coverage Achieved for Different Area Thresholds
(Figure: bar chart of average error coverage, 0 to 100, at area thresholds of 10%, 20%, 30%, 40%, and 50%, and for the compressed and full ("All") implication sets, for each circuit.)

Percentage of Cross-Cycle Implications Chosen for Different Area Overheads
(Figure: bar chart of the percentage of chosen implications that are cross-cycle, 0 to 100, at area overheads of 10% and 50%, for each circuit.)

Conclusions
Implications can be used to effectively detect many errors at runtime
– Without requiring functional knowledge of the circuit
– Allowing tradeoffs to be made between error coverage and overhead
Cross-cycle implications cover faults that cannot be covered by single-cycle implications
Even though they have larger overhead, cross-cycle implications are often an “optimal” choice
– When optimizing for low area overhead, more than 85% of the implications may be cross-cycle

For Inquiring Minds

Implication Algorithm
Gate-level implications can be found automatically, without functional knowledge of the circuit.
Start
– Identify potential implications with simulation: run a good-circuit simulation with random vectors and monitor the site values.
  (Figure: for each pair of sites, e.g., (A,B), (A,C), (A,D), record which of the value combinations 00, 01, 10, 11 occur; a combination that never occurs, here A = 0 with C = 1, suggests the implication A = 0 → C = 0.)
– Verify implications: use a SAT solver (such as Zchaff).
– Eliminate subsumed implications.
  (Figure: example circuit with nets n1 through n13 and the implications n10 = 0 → n13 = 0 and n4 = 1 → n8 = 0.)
– Determine coverage of remaining implications: of all the patterns that will allow a fault to produce an error at an output, how many will each implication detect?
– Select the best subset for target error detection and overhead.
End
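The flowchart for developing a compressed implication set can be read as a greedy loop over the fault list. The sketch below follows it literally under stated assumptions, and is an illustration rather than the authors' tool: per-fault coverage is supplied as a hypothetical dictionary mapping (implication, fault) to the fraction of that fault's error-producing patterns the implication detects, ties are broken by implication name, and faults that no implication covers are skipped.

```python
def compress_implications(faults, coverage):
    """Greedy sketch of "Developing a Compressed Implication Set".

    faults:   fault identifiers, visited in order.
    coverage: hypothetical input format mapping (implication, fault) to the
              fraction of that fault's error-producing patterns detected.
    Returns the chosen implications in the order they were first added.
    """
    # Candidate implications, sorted so that ties in max() resolve deterministically.
    implications = sorted({imp for imp, _ in coverage})
    chosen = []
    for fault in faults:  # "Choose next fault in fault list"
        # "Find implication with best coverage of this fault"
        best = max(
            implications,
            key=lambda imp: coverage.get((imp, fault), 0.0),
            default=None,
        )
        if best is None or coverage.get((best, fault), 0.0) == 0.0:
            continue  # assumption: skip faults that no implication covers
        # "Add best implication to compressed list" (without duplicates)
        if best not in chosen:
            chosen.append(best)
    return chosen  # "Return implication list"


# Hypothetical coverage data for three implications and four faults.
coverage = {
    ("i1", "f1"): 0.9, ("i2", "f1"): 0.4,
    ("i2", "f2"): 0.7, ("i3", "f2"): 0.5,
    ("i1", "f3"): 0.6,
}
print(compress_implications(["f1", "f2", "f3", "f4"], coverage))
```

With this data, i1 is chosen for f1 and reused for f3, i2 is chosen for f2, and f4 has no covering implication, so the compressed list is i1, i2.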
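The first step of the implication algorithm, identifying potential implications from random good-circuit simulation, amounts to recording which value combinations each pair of sites takes on: if a pair (a, b) never shows the combination (x, y), then a = x → b = NOT y is a candidate. The sketch below assumes a hypothetical `simulate` callback that returns a dict of site values and uses a toy circuit with c = a AND b and d = a OR b. Simulation can only rule implications out, so every candidate still needs formal verification (the slides use a SAT solver such as Zchaff).

```python
import itertools
import random


def candidate_implications(simulate, n_inputs, sites, n_vectors=1000, seed=0):
    """Propose implications (a, x, b, y), meaning a = x -> b = y.

    simulate: hypothetical callback mapping an input tuple to {site: 0 or 1}.
    """
    rng = random.Random(seed)
    # For every pair of sites, the set of (value_a, value_b) combinations seen.
    seen = {pair: set() for pair in itertools.combinations(sites, 2)}
    for _ in range(n_vectors):
        vector = tuple(rng.randint(0, 1) for _ in range(n_inputs))
        values = simulate(vector)
        for a, b in seen:
            seen[(a, b)].add((values[a], values[b]))
    candidates = []
    for (a, b), combos in seen.items():
        for x, y in itertools.product((0, 1), repeat=2):
            if (x, y) not in combos:
                # (a = x, b = y) never observed: propose a = x -> b = 1 - y.
                candidates.append((a, x, b, 1 - y))
    return candidates


def toy_circuit(vector):
    """Hypothetical two-input circuit: c = a AND b, d = a OR b."""
    a, b = vector
    return {"a": a, "b": b, "c": a & b, "d": a | b}


for a, x, b, y in candidate_implications(toy_circuit, 2, ["a", "b", "c", "d"]):
    print(f"{a}={x} -> {b}={y}")
```

With enough vectors to exercise all four input combinations, the sketch proposes a = 0 → c = 0 (the same pattern as the slide's A = 0 → C = 0), a = 1 → d = 1, b = 0 → c = 0, b = 1 → d = 1, and c = 1 → d = 1. Each pair is recorded once, since the contrapositive b = y → a = NOT x is the same implication.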