TORSIM: An Efficient Fault Simulator for Synchronous Sequential Circuits

S. Gai, P.L. Montessoro, M. Sonza Reorda
Politecnico di Torino, Dipartimento di Automatica e Informatica, Torino - Italy

fanout elements, is accomplished by calling an ad hoc generated procedure which is different for each element; all the topological information is coded in the procedure. Sophisticated programming techniques coming from compiler writers' experience can be adopted to choose the most efficient form for the procedure.

TORSIM's efficiency does not come from new methods to reduce the number of events to be processed, but from drastically improving the performance of event processing, i.e., of the evaluation and propagation phases. The experimental results reported in the paper have been gathered through a prototype, which has been run on the standard set of benchmark circuits. TORSIM represents a significant advancement in the area of fault simulation of large numbers of input patterns on synchronous sequential circuits: its speed is about one order of magnitude greater than that of the fastest existing fault simulators, such as PROOFS. A completely new approach is introduced, which extends the event-driven compiled-code technique to fault simulation while still exploiting many of the ideas already proposed.

Section 2 describes TORSIM and Section 3 presents the experimental results proving its effectiveness. Section 4 draws some conclusions and outlines our current work.

Abstract

This paper describes a new approach to the fault simulation of synchronous sequential circuits. Its novelty comes from combining the event-driven compiled-code simulation technique proposed in [] with the single fault propagation, fault-parallel fault simulation technique presented in [10]. Our approach is particularly suited
for those applications requiring the fault simulation of very high numbers of input patterns, like signature computation or fault dictionary construction. A fault simulator named TORSIM has been written to verify the effectiveness of the approach. The results we present show an average speed-up in terms of CPU time of more than one order of magnitude with respect to the ones reported in [10].

1 Introduction

The increased importance of testing during the design and maintenance phases of the circuit life forces the design community to study new techniques to make testing easier and faster. Fault simulators are an essential element of any current CAD system; their use includes computing the fault coverage produced by a given test set, building the fault dictionary for diagnostic purposes, and determining the aliasing probability in BIST architectures. The last two operations are computationally heavier than the first one because they require the fault simulation of large numbers of test patterns. We thus focused our attention on the development of a fast tool for the fault simulation of large numbers of patterns for synchronous sequential circuits described at the gate level; the permanent single stuck-at and the zero-delay models have been assumed.

Our fault simulator, which we called TORSIM (TORino Fault SIMulator), shows high-performance characteristics thanks to the combined use of several different techniques:

2 TORSIM

TORSIM is a fault simulator for synchronous sequential circuits described at the gate level, adopting the zero-delay model and the permanent single stuck-at fault model. It is based on the single fault propagation algorithm, and uses the parallel-fault technique, like PROOFS.
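The parallel-fault technique named above can be pictured with a few lines of C. This is a minimal sketch and not TORSIM's code: it assumes a 32-bit machine word in which bit k carries the value of a signal in the k-th faulty circuit of the current group. The names word_t, replicate, differing and and2_example are hypothetical and introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one word holds the same signal in 32 faulty circuits. */
typedef uint32_t word_t;

/* Replicate a single good-circuit bit (0 or 1) over the whole word. */
static word_t replicate(int bit)
{
    assert(bit == 0 || bit == 1);
    return bit ? ~(word_t)0 : (word_t)0;
}

/* Bits set where a faulty circuit differs from the good circuit. */
static word_t differing(word_t faulty, int good_bit)
{
    return faulty ^ replicate(good_bit);
}

/* A 2-input AND gate with good inputs a = b = 1 and two injected faults. */
word_t and2_example(void)
{
    int good_a = 1, good_b = 1;
    word_t fa = replicate(good_a);
    word_t fb = replicate(good_b);

    fa &= ~((word_t)1 << 0);   /* fault 0: input a stuck-at-0 */
    fb &= ~((word_t)1 << 3);   /* fault 3: input b stuck-at-0 */

    /* A single bitwise AND evaluates the gate in all 32 faulty circuits. */
    word_t fout = fa & fb;

    return differing(fout, good_a & good_b);
}
```

Calling and2_example() returns a word with bits 0 and 3 set: for this input, only the two injected faults are visible at the gate output, and they were found with one gate evaluation instead of two.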
The main novelty of TORSIM is the extension of the event-driven compiled-code technique [8] to the zero-delay fault simulation of synchronous sequential circuits (event-driven compiled-code fault simulation). Ad hoc generated code is executed to perform the evaluation of each gate and the scheduling of the fanout elements.

• the single fault propagation algorithm: this technique was first proposed in [11] for combinational circuits, and then improved by including a deeper analysis of the topological structure of the circuit [1] [6] [9]. It is normally limited to zero-delay combinational or synchronous sequential circuits, but it is more efficient than the classical Concurrent Algorithm [12] in terms of CPU time and memory, as demonstrated by [10];

• the event-driven compiled-code technique [8]. Only the elements which are likely to change their value are evaluated at each time step, as in the classical event-driven approach; however, their evaluation, together with the scheduling of

1066-1409/94 $3.00 © 1994 IEEE

2.1 Event-Driven Compiled-Code Fault Simulation

In order to describe this technique, let us refer to the single fault propagation algorithm outlined in Fig. 1. The procedure described in the pseudo-code is repeated for each input pattern and for each fault i. The queue used to record the gates needing to be evaluated is conceptually a priority queue [5]; in practice it is implemented as a set of stacks, one for each level in the circuit. The insertion of a gate in the queue is a push onto the stack of the corresponding level, while the extraction from the structure is a pop from the lowest-level non-empty stack. In this way the gate evaluations are done in a levelized way [3]. The algorithm can be easily extended to the parallel-fault case, provided that each variable containing a value in the code refers to n faults, n being the parallelism supported by the hardware.

• the parallel-fault approach [11],
that exploits the parallelism of the underlying hardware architecture; this technique has been combined with the previous one in PROOFS [10] and HOPE [7];

    fsim (FAULT i)
    {
      1. restore the good circuit values
      2. reset the queue
      3. insert the fault source in the queue
      4. for each FF
         (a) if its faulty value is different from the good one
               i.  update the value
               ii. insert the fanout elements in the queue

Figure 2: Sample circuit

to record whether the gate has already been inserted in the queue. A third set of variables (proc_ptr_xx) is defined, each containing the pointer to the procedure for the corresponding gate (actually, they are just the names of the C functions in the current implementation, whereas they would be labels in the assembly level version; see below).

When a fault source is extracted from the queue, its value must be forced to the faulty value, and the evaluation phase must not be activated. A faulty procedure is thus generated for each gate; its task is to force the gate to the faulty value instead of executing the evaluation step. If the faulty value is different from the good one, the fanout elements are inserted in the queue, as in the good procedure. In order to avoid the overhead deriving from the test for the fault source, either the pointer to the good procedure or the pointer to the faulty one is written in each variable proc_ptr_xx before each simulation step is started; the latter case occurs when the element corresponds to a fault source simulated in the current step. When a gate is inserted in the propagation structure, the function pointer contained in the corresponding variable proc_ptr_xx is directly inserted in the structure.
In this way the test of step 5 is avoided and the whole step 5 is reduced to a simpler and more efficient form: while the queue is not empty, the procedure whose pointer is extracted from the queue is activated. Fig. 4 reports the faulty procedure for gate g3, assuming a stuck-at-0 is to be simulated on g3 during the current simulation step.

The faulty procedure is slightly more complex when the parallel-fault technique is used, because the faulty value must be forced only on the bit which has been assigned to the fault. At each simulation step n faults are simulated in parallel, n being the hardware parallelism: for each fault source a mask is created, which corresponds either to the stuck-at-0 or to the stuck-at-1 fault. The masks for the n current faults are correctly positioned on the bits assigned to each fault before the simulation step is started. During the simulation step, the faulty procedures force on each fault source a new value obtained by accessing the corresponding variable in the mask set and ANDing or ORing (depending on the type of the fault) the current value of the gate with the mask. The resulting faulty procedure for gate g3 stuck-at-0 is shown in Fig. 5, where and_mask_g3 is assumed to contain a 0 in the bit position assigned to the fault and a 1 in any other bit position.

      5. while there are elements in the queue
         (a) extract an element w
         (b) if w is a fault source
               i.  force w to the faulty value
             else
               i.  evaluate w
               ii. if the output value of w is different from the good one
                     A. record the faulty value of w
                     B. insert the fanout elements of w in the queue
      6. check the POs
      7. record the values of the faulty FFs
    }

Figure 1: The single fault propagation procedure. The procedure is executed for each pattern and for each fault.
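The mechanism described in this section (per-gate procedures, a proc_ptr variable that selects the good or the faulty procedure before the step, one stack per level, and mask-based fault injection) can be put together in a small C sketch. It is an illustration under stated assumptions, not the generated TORSIM code: the two-gate circuit (g3 = AND(g1, g2) at level 1 feeding a hypothetical inverter g4 at level 2, taken as a primary output), the 32-bit word, the fixed stack size, and the function fsim_g3 with its "negative fault_bit means no fault on g3" convention are all invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t word_t;          /* one bit per fault of the current group */
typedef void (*proc_t)(void);     /* a generated per-gate procedure */

enum { LEVELS = 3, MAXQ = 8 };    /* assumed sizes for this toy circuit */

/* The queue: one stack of procedure pointers per circuit level. */
static proc_t stack[LEVELS][MAXQ];
static int top[LEVELS];

static void insert(int level, proc_t p)
{
    assert(top[level] < MAXQ);
    stack[level][top[level]++] = p;
}

/* Pop from the lowest-level non-empty stack; NULL when all are empty. */
static proc_t extract(void)
{
    for (int l = 0; l < LEVELS; l++)
        if (top[l] > 0)
            return stack[l][--top[l]];
    return NULL;
}

static word_t g_val_g3, g_val_g4;                      /* good values */
static word_t f_val_g1, f_val_g2, f_val_g3, f_val_g4;  /* faulty values */
static int is_active_g4;                               /* g4 already queued? */
static word_t and_mask_g3;
static proc_t proc_ptr_g3, proc_ptr_g4;

/* Hypothetical fanout gate: g4 = NOT g3. */
static void proc_g4(void) { f_val_g4 = ~f_val_g3; }

/* Schedule the fanout of g3 only if its value differs from the good one. */
static void propagate_g3(void)
{
    if (f_val_g3 != g_val_g3 && !is_active_g4) {
        is_active_g4 = 1;
        insert(2, proc_ptr_g4);
    }
}

/* Good procedure for g3: evaluation, then propagation. */
static void proc_g3(void)
{
    f_val_g3 = f_val_g1 & f_val_g2;
    propagate_g3();
}

/* Faulty procedure for g3 stuck-at-0 in the parallel-fault case. */
static void f0_proc_g3(void)
{
    f_val_g3 = f_val_g1 & f_val_g2;   /* evaluation */
    f_val_g3 &= and_mask_g3;          /* fault injection on one bit */
    propagate_g3();
}

/* One simulation step; returns the bits where output g4 differs. */
word_t fsim_g3(int a, int b, int fault_bit)
{
    /* Restore the good circuit values and reset the queue. */
    f_val_g1 = a ? ~(word_t)0 : 0;
    f_val_g2 = b ? ~(word_t)0 : 0;
    g_val_g3 = f_val_g3 = f_val_g1 & f_val_g2;
    g_val_g4 = f_val_g4 = ~g_val_g3;
    for (int l = 0; l < LEVELS; l++)
        top[l] = 0;
    is_active_g4 = 0;

    /* Write the good or the faulty pointer before the step starts. */
    proc_ptr_g4 = proc_g4;
    if (fault_bit >= 0) {
        and_mask_g3 = ~((word_t)1 << fault_bit);
        proc_ptr_g3 = f0_proc_g3;
    } else {
        proc_ptr_g3 = proc_g3;
    }

    /* Insert g3, then run step 5 without any "is it a fault source" test. */
    insert(1, proc_ptr_g3);
    proc_t p;
    while ((p = extract()) != NULL)
        p();

    return f_val_g4 ^ g_val_g4;       /* check the primary output */
}
```

With both inputs at 1, fsim_g3(1, 1, 5) returns a word with only bit 5 set, so the fault assigned to bit 5 is detected at g4. With inputs 0 and 1 the stuck-at-0 is not excited, g4 is never scheduled, and the result is 0. The dispatch loop contains no test on the kind of element being processed; that choice was made once, when proc_ptr_g3 was written.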
2.2 Implementation

TORSIM is composed of two modules:

• the generator module: it reads the circuit description, and generates the good and faulty C procedures for all the gates in the circuit;

• the simulator module: it contains the procedures for the user interface and general simulation management (input pattern reading, fault list updating, detection checking, Flip-Flop recording and restoring, good and faulty procedures activation).

The simulator module is linked with the object code of the files produced by the generator module, and an image file is obtained for each circuit to be simulated. The time spent running the generator, compiling the created files, and linking them with the simulator is an overhead (hereinafter called overhead time) which must be added to the real simulation time. The larger the latter is, the more negligible the overhead time becomes: in general, the overhead can be accepted only when a large number of input patterns have to be simulated, as in the case of fault dictionary construction or BIST architecture analysis.

The current version of the generator module produces C code. As a consequence, some of the techniques proposed in [8], such as the use of jump instructions instead of procedure calls, and loop unrolling, cannot be implemented, as the language does not allow indirect goto statements. Further improvements in efficiency could be obtained by writing the code in assembly language, although this would mean giving up portability.

    proc_g3()
    {
      /* evaluation */
      f_val_g3 = AND (f_val_g1, f_val_g2);
      /* propagation */
      if (f_val_g3 != g_val_g3) {
        if (!is_active_g4) {
          is_active_g4 = TRUE;
          insert (proc_ptr_g4);
        }
        if (!is_active_g5) {
          is_active_g5 = TRUE;
          insert (proc_ptr_g5);
        }
      }
    }

Figure 3: The procedure for gate g3

Generating an assembly level version of the good and faulty procedures would allow us to further optimize their code by optimally exploiting the available registers. Moreover, the overhead time could be significantly reduced if the generator module produced assembly language output.

3 Experimental Results

TORSIM has been run under the VMS operating system on a VAX 9000 with 256 Mbyte of main memory, using the standard VMS C compiler, to collect the results reported in the following. A set of experiments has been performed by simulating 50,000 random patterns on the standard set of synchronous sequential circuits [2]. A non-collapsed fault list composed of all the faults on the gate outputs has been considered. The fault dropping feature has been activated. The results, in terms of fault coverage, overhead time and simulation time, are reported in Tab. 1. The ratio between the overhead and the simulation time is also computed, although this figure clearly depends on the number of patterns and decreases in a nearly linear way with it. Note that the efficiency of TORSIM in the simulation phase is so high (as shown below) that the time required by the whole process (code generation, compilation, link, and simulation) is by far lower than the one required by any other existing fault simulator, provided that the number of patterns is as high as some tens of thousands.

Comparisons with the data reported for other tools are difficult, as the experiments are performed using patterns which are not available and using different hardware platforms.

    f0_proc_g3()
    {
      /* fault injection */
      f_val_g3 = ZERO;
      /* propagation */
      if (f_val_g3 != g_val_g3) {
        if (!is_active_g4) {
          is_active_g4 = TRUE;
          insert (proc_ptr_g4);
        }
        if (!is_active_g5) {
          is_active_g5 = TRUE;
          insert (proc_ptr_g5);
        }
      }
    }

Figure 4: The faulty procedure for gate g3 assuming the fault g3/0.

    f0_proc_g3()
    {
      /* evaluation */
      f_val_g3 = AND (f_val_g1, f_val_g2);
      /* fault injection */
      f_val_g3 = AND (and_mask_g3, f_val_g3);
      /* propagation */
      if (f_val_g3 != g_val_g3) {
        if (!is_active_g4) {
          is_active_g4 = TRUE;
          insert (proc_ptr_g4);
        }
        if (!is_active_g5) {
          is_active_g5 = TRUE;
          insert (proc_ptr_g5);
        }
      }
    }

Figure 5: The faulty procedure for gate g3 in the parallel-fault case, assuming the fault g3/0.

In the following, we will compare our results with the ones reported for PROOFS because the algorithm we adopted is similar to that of PROOFS; PARIS [4] and HOPE [7] are declared to be more efficient than PROOFS by a factor of two; the techniques they use could be integrated in TORSIM, thus reducing the number of events to be processed.

To compare the performance of TORSIM with the data reported in [10] we made the assumption that the VAX 9000 we used for our experiments is equivalent to the Sun 4/280 used in [10]: this assumption is based on the fact that the two machines have roughly the same CPU power, while the memory requirements of TORSIM (no more than 32 Mbyte) can be satisfied by both of them. Moreover, we normalized the simulation time to two parameters representing the work done.

The first parameter is the number of gate evaluations. In fact, this parameter should be roughly the same provided that the same patterns and the same faults are simulated by TORSIM and PROOFS, and it is a good indicator of the work done. From Tab. 2, the average number of gate evaluations per second performed by TORSIM can be estimated to be more than 10 times greater than in PROOFS.

The second parameter takes into account the work done in terms of circuit size,
number of patterns, and final fault coverage: in fact, due to fault dropping, the simulation is heavier if the fault coverage is lower, since a greater number of faults must be simulated. We thus defined the following parameter:

    [formula illegible in the source]

Here g is the number of gates, p the number of patterns, and f the final fault coverage. The results obtained by normalizing the simulation time to this parameter are reported in Tab. 3; on average TORSIM is more than 26 times faster than PROOFS.

Fault coverage (FC) column of Table 1:

    Circuit   FC %       Circuit   FC %       Circuit   FC %
    s208      58.19      s526      18.12      s1423     67.69
    s298      95.26      s526n     18.04      s1488     88.17
    s344      97.57      s641      90.09      s1494     88.22
    s349      97.31      s713      88.39      s5378     71.53
    s382      24.59      s820      62.30      s9234     15.83
    s386      92.49      s832      62.54      s13207    37.19
    s400      24.33      s838      38.86      s15850    17.84
    s420      45.69      s953      99.66      s35932    89.92
    s444      21.60      s1196     99.11      s38417    22.15
    s510      99.58      s1238     98.34      s38584    57.125

4 Conclusions

A new approach to the fault simulation of synchronous sequential circuits has been presented. The approach is based on the extension to the zero-delay fault simulation of the event-driven compiled-code technique which has been proposed for logic simulation [9]. This technique drastically reduces the simulation time by avoiding run-time accesses to the data structures containing the circuit topology. For each gate, ad hoc procedures are generated, whose goal is to efficiently execute the evaluation and propagation tasks. Several high efficiency coding methodologies and data structures have been described. A fault simulator named TORSIM has been written, which implements the ideas described in the paper. A prototype has been used to provide experimental data proving the effectiveness of the approach.
TORSIM is particularly suited for special applications like fault coverage computation in BIST architectures or fault dictionary construction for diagnostic purposes; in these cases, very long sequences of input patterns have to be simulated, and the overhead for code generation, compilation and linking can be neglected. The speed-up with respect to the fault simulator presented in [11] has been computed using several parameters, and it is always greater than one order of magnitude.

The results could be improved (in terms of overhead and simulation time) if the evaluation/propagation procedures were written in assembly language. Further improvements could be obtained by identifying in the circuit some macro-gates and generating ad hoc procedures for their evaluation. Dynamic Fault Ordering [3] techniques are also considered to increase the effectiveness of the fault-parallel method.

                               Simulation time [s]
    Circuit   Overhead [s]     Good       Fault       Simul    Overhead/Simul
    s208           12.31       7.28       18.73       26.01         0.47
    s298           16.43       8.28        4.57       12.85         1.28
    s344           20.02      14.30        3.61       17.91         1.12
    s349           20.78      14.40        3.60       18.00         1.15
    s382           21.81       9.70       89.15       98.85         0.22
    s386           20.91      13.32        7.31       20.63         1.01
    s400           22.06      10.07      124.82      134.89         0.16
    s420           24.56      11.35       51.58       62.93         0.39
    s444           24.00      10.17      108.99      119.16         0.20
    s510           27.21      17.60        2.77       20.37         1.34
    s526           27.29      11.32      129.88      141.20         0.19
    s526n          27.78      10.55      126.82      137.37         0.20
    s641           44.51      30.26       15.97       46.23         0.96
    s713           47.63      31.22       19.17       50.39         0.95
    s820           41.95      28.01       55.29       83.30         0.50
    s832           41.81      28.79       56.05       84.84         0.49
    s838           50.90      20.48      129.29      149.77         0.34
    s953           53.39      23.36        3.48       26.84         1.99
    s1196          67.71      50.41        8.59       59.00         1.15
    s1238          65.99      50.91       11.48       62.39         1.06
    s1423          88.04      55.12      198.68      253.80         0.35
    s1488          87.50      40.95       36.23       77.18         1.13
    s1494          86.71      42.17       36.33       78.50         1.10
    s5378         413.23     194.94      624.39      819.33         0.50
    s9234        1018.22      74.34     5553.84     5628.18         0.18
    s13207       2353.77     222.05     7254.23     7476.28         0.31
    s15850       4318.41     381.7     17664.8     18046.5          0.24
    s35932       9637.42    1900.15     1769.55     3669.7          2.63
    s38417      18097.29    1250.33   112521.79   113772.12         0.16
    s38584      18691.34    1898.04    14243.95    16141.99         1.16
    Average                                                         0.77

Table 1: Experimental Data for 50,000 random pattern fault simulation.
The column Simul is the sum of the two previous ones.

References

[1] K.J. Antreich, M.H. Schulz: "Accelerated Fault Simulation and Fault Grading in Combinational Circuits," IEEE Trans. on CAD/ICAS, Nov. 1987, pp. 701-712

[2] F. Brglez, D. Bryan, K. Kozminski: "Combinational profiles of sequential benchmark circuits," ISCAS-89: IEEE Int. Symposium on Circuits And Systems, Portland (OR), USA, May 1989, pp. 1929-1934

[3] G.P. Cabodi, S. Gai, M. Sonza Reorda: "Fast Differential Fault Simulation by Dynamic Fault Ordering," ICCD-91: Int. Conf. on Computer Design, Cambridge (MA), October 1991, pp. 60-63

[5] N. Gouders, R. Kaibel: "PARIS: A Parallel Pattern Fault Simulator for Synchronous Sequential Circuits," ICCAD-91: IEEE Int. Conf. on CAD, Santa Clara (CA), USA, Nov. 1991, pp. 542-545

[6] D. Harel, R. Sheng, J. Udell: "Efficient Single Fault Propagation in Combinational Circuits," ICCAD-87: IEEE Int. Conf. on CAD, Santa Clara (CA), USA, Nov. 1987, pp. 2-5

[7] W. Ke, S.C. Seth, B.B. Bhattacharya: "A Fast Fault Simulation Algorithm for Combinational Circuits," ICCAD-88: IEEE Int. Conf. on CAD, Santa Clara (CA), USA, Nov. 1988, pp. 166-169

[8] H.K. Lee, D.S. Ha: "HOPE: An Efficient Parallel Fault Simulator for Synchronous Sequential Circuits," DAC-92: ACM/IEEE Design Automation Conference, Anaheim (CA), USA, June 1992, pp. 336-340

[9] D.M. Lewis: "A Hierarchical Compiled Code Event-Driven Logic Simulator," IEEE Trans. on Computer-Aided Design, Vol. 10, No. 6, June 1991, pp. 726-737

[10] F. Maamari, J. Rajski: "A Fault Simulation Method based on Stem regions," ICCAD-88: IEEE Int. Conf. on CAD, Santa Clara (CA), USA, Nov. 1988, pp. 170-173

[11] T.M. Niermann, W.T. Cheng, J.H. Patel: "PROOFS: A Fast, Memory Efficient Sequential Circuit Fault Simulator," 27th ACM Design Automation Conference, June 1990, pp.
535-540 and IEEE Trans. on Computer-Aided Design, Vol. 11, No. 2, Feb. 1992, pp. 198-207

[4] S. Gai, F. Somenzi, M. Spalla: "Fast and Coherent Simulation with Zero Delay Elements," IEEE Trans. on CAD/ICAS, Vol. CAD-6, No. 1, Jan. 1987, pp. 85-92

Table 2: Comparison between PROOFS and TORSIM in terms of number of gate evaluations/time. [Columns: Circuit; for TORSIM and for PROOFS: Gate Ev (x 1000), Time, Ratio (x 1000); TORSIM vs. PROOFS. The TORSIM Time column coincides with the Simul column of Table 1; the remaining per-circuit entries are not recoverable from the source.]

[12] J.P. Roth, W.G. Bouricius, P.R. Schneider: "Programmed Algorithms to Compute Tests to Detect and Distinguish between Failures in Logic Circuits," IEEE Trans. on Electronic Computers, Vol. EC-16, No. 5, Oct. 1967, pp. 567-580

[13] E.G. Ulrich, T.G. Baker: "Concurrent Simulation of Nearly Identical Digital Networks," Computer, Vol. 7, No. 4, pp. 39-44,
April 1974

Table 3: Simulation time normalized to w. [Columns: Circuit, TORSIM, PROOFS, ratio, for the circuits s208 to s38584 plus an Average row. The average ratio is 26.17; the per-circuit entries are not recoverable from the source.]