878 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 12, NO. 6, J U N E 1993 Design and Synthesis of Self-Checkmg VLSI Circuits Niraj K. Jha, Member, IEEE, and Sying-Jyan Wang Abstract-Self-checking circuits can detect the presence of both transient and permanent faults. A self-checking circuit consists of a functional circuit, which produces encoded output vectors, and a checker, which checks the output vectors. The checker has the ability to expose its own faults as well. The functional circuit can be either combinational or sequential. A self-checking system consists of an interconnection of selfchecking circuits. The advantage of such a system is that errors can be caught as soon as they occur; thus, data contamination is prevented. Although much effort has been concentrated on the design of self-checking checkers by previous researchers, very few results have been presented for the design of self-checking functional circuits. In this paper, we explore methods for the costeffective design of combinational and sequential self-checking functional circuits and checkers. The area overhead for all proposed design alternatives is studied in detail. I. INTRODUCTION ECHNOLOGICAL advances have increased drastically the complexity of integrated circuits that can be realized on a single chip, and the trend is likely to continue. Although the advantages of such a rapid advancement are obvious, we do have to learn to deal effectively with some of the side effects, such as the increase in the number of transient faults. It is well known that transient faults are the predominant cause of system failures [1][3]. With the possibility of reduced voltage levels for VLSI circuits and the resultant decrease in noise margins, system susceptibility to transient faults is likely to increase [3]. Thus we have to design circuits and systems that can expose such faults. Self-checking circuits and systems can detect the presence of both transient and permanent faults. A self-checking circuit consists of a functional circuit, which produces encoded output vectors, and a checker, which checks the vectors to determine if an error has occurred. The checker has the ability to give an error indication even when a fault occurs in the checker itself. The functional circuit can be either Combinational or sequential. A self-checking system consists of an interconnection of self-checking circuits. The most important concept developed for selfchecking circuits is the totally self-checking (TSC) concept [4], 151. The TSC concept was generalized to the path fault-secure (PFS) and strongly fault-secure (SFS) concepts [6]. These concepts are defined in Section 11. The output vectors of a self-checking functional circuit are often encoded in a code that detects unidirectional errors. The reason is that unidirectional errors are very common in integrated circuits [3], [7]-[lo]. Such an error is said to occur when there are multiple transitions in either the 0 + 1 direction or the 1 0 direction, but not both. For example, in [9] it was shown that a stuck-at fault, cross-point fault, or a short in an MOS programmable logic array (PLA) or read-only memory causes a unidirectional error at its outputs. Even for random logic, it is easy to ensure by simple design techniques that most single faults can cause only a unidirectional error. Many codes have been proposed for detecting such errors [lo][ 161, [22], [26], [27]. A nice mathematical framework for dealing with such errors is given in [lo]. A lot of work has been done in the area of self-checking checker design for different codes. However, not as much attention has been paid to the design of functional circuits. Self-checking combinational functional circuit design methods were presented in [17] and 161. Results in the area of self-checking sequential functional circuit design were presented in [ 181-1201. Applications of the selfchecking concept to microprogram control units and PLA’s were presented in 171, [3], and 191. A theory of TSC system design was presented in [ZI].In most previous works on functional circuits, usually some conditions are given which need to be satisfied to obtain a selfchecking design. However, no attempt is made to evaluate the practicality of the design in terms of area overhead, etc. For example, the area overhead needed for a self-checking sequential functional circuit design derived from some existing methods can be quite substantial. In this paper, we are concerned with deriving methods for cost-effective design of combinational and sequential functional circuits and checkers. In Section 11, we describe briefly some previous results that will be used herein. The basic self-checking circuit and system structures are given in Section 111. Next, in Section IV, we evaluate various synthesis methodologies by applying them to some benchmarks. Concluding remarks are given in Section V . Manuscript received April 30, 1991; revised August 14, 1992. This work was supported by the National Science Foundation under Grant MIP9010433. This paper was recommended by Associate Editor Franc Brglez. The authors are with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544. IEEE Log Number 9206845. 11. PRELIMINARIES In this section, first we will review come unidirectional error-detecting codes that will be used in the following sections. Then we will discuss properties that define a selfchecking circuit. T + 0278-0070/93$03.00 0 1993 IEEE JHA A N D WANG: SELF-CHECKING VLSI CIRCUITS 2.1. Systematic and Separable Codes First, we will make a distinction between systematic and separable codes. Suppose that the information symbol has k bits and the appended check symbol has r bits. Thus, a codeword has n = k r bits. Dejinition 1: A set of binary n-tuples C is called a systematic code if 1) it contains 2k n-tuples, 1 Ik In , and 2) C = {XI. = ZS,CS,, where IZS,l = k and ICS,l = r, and each one of the 2k information symbols appears as part of some codeword). Dejinition 2: A set of binary n-tuples C is called a separable code if 1) it contains s < 2k n-tuples, 1 Ik In , and 2) C = { X ( X = ZS,CS,, where IZS,l = k and ICS,l = r , and each one of the s information symbols appears as part of some codeword). The Berger code [22] is an optimal systematic all-unidirectional error-detecting (AUED) code. For such a code, r = [logz ( k 1)1 . This code is formed as follows: a binary number corresponding to the number of 1’s in the information symbol is obtained, and the binary complement of each bit in this number is taken; the resulting binary number forms the check symbol. Sometimes AUED codes are also called unordered. TSC checkers for the Berger code were presented in [23]-[25] and [40]. In a systematic code in which the number of information bits is k , it is assumed that all 2k possible information symbols are present. However, more often than not, a functional circuit with k outputs does not produce all the 2k output patterns. In such cases, it is possible to use a separable AUED code [26], [16] for encoding purposes. A separable code has redundancy no worse than a systematic code with the same capability; frequently its redundancy is less. + + 2.2. The m-out-of-n Code The rn-out-of-n code [27] is an optimal AUED code. In such a code, all codewords have exactly m 1’s and ( n rn) 0’s. For this reason, an rn-out-of-n code is also called an rn-hot code. Such codes are “nonsystematic” because the information bits cannot be separated from the check bits. A lot of work has already been done in the area of TSC checker design for m-out-of-n codes [5], [28]-[3 11. 2.3. Self-checking Circuits The following definitions can be used to describe a TSC system [4], [5]. F denotes the set of faults, and G is the network in these definitions. Dejinition 3: G is self-testing with respect to F if for every fault in F the circuit produces an output noncodeword for at least one input codeword. Dejinition 4: G is fault-secure with respect to F if for every fault in F the circuit never produces an incorrect output codeword for any input codeword. Dejinition 5: G is TSC if it is both self-testing and fault secure. Dejinition 6: G is code-disjoint if it maps input code- 879 words (noncodewords) to output codewords (noncodewords). Dejinition 7: G is a TSC checker if it is self-testing, fault-secure, and code-disjoint. It has been pointed out in [32] that the fault-secure property is not really necessary for TSC checkers. Conversely, most checkers, which are designed to be only self-testing and code-disjoint, can also be found to be automatically fault-secure. Thus, in many cases we get the fault-secure property in the checker for free. The concept of TSC functional circuits was generalized to SFS functional circuits in [6]. Dejinition 8: G is SFS with respect to F if for every fault in F, either 1) the circuit is self-testing and faultsecure, or 2) the circuit is fault-secure, and if another fault from F occurs in G then either property 1 or 2 is true for the fault sequence. The path fault-secure property also was presented in [6]. This property is defined in terms of network structures and codes. If a circuit is PFS, a modeled fault can result only in the correct codeword or a noncodeword at the outputs, never an incorrect codeword. The following theorems were proved for the PFS property [6]. Theorem I : If G is PFS with respect to F, it is SFS with respect to F. Theorem 2: If G is PFS with respect to F and input code space A , then G is also PFS with respect to F for the input code space A’ C A . Theorem 3: Any inverter-free network with an unordered output code space is PFS with respect to unidirectional stuck-at faults. 111. SELF-CHECKING CIRCUITAND SYSTEM STRUCTURES In this section we use the concepts introduced in the previous section to derive self-checking circuit and system 3.1. Self-checking Combinational Functional Circuits A circuit, which has been designed to be PFS for some input code space, may actually receive a subset of the input code space under normal operation when it is placed in an actual system environment. According to Theorems 1 and 2, the circuit remains PFS and, hence, SFS. However, if a fault is not detectable by the received input code space, then the circuit can be simplified by removing the line or gate corresponding to the fault. For example, if a stuck-at 0 (stuck-at 1) fault at the input of an A N D gate is undetectable by any normally received input codeword then the gate (line) can be removed. However, this process can be time-consuming. Theorem 3 is strictly applicable to totally inverter-free circuits. It is well known that such a circuit can be obtained if AUED encoding is used for both the inputs and outputs [33], [6]. However, in general, such an approach may be overly restrictive and may result in much more area overhead than necessary. We extend this approach as ~ 880 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 12, NO. 6, JUNE 1993 follows. We encode the output functions of the functional circuit in a unidirectional error-detecting code, but the inputs are not encoded. We then use logic optimization tools to obtain a realization of these functions which has inverters only at the primary inputs. In other words, the internal part of the circuit consists of only A N D and OR gates (the circuit can have any number of levels), as shown in Fig. 1. In fact, even if the internal part of the circuit contains inverters, N A N D gates or NOR gates, but the circuit can be reduced to a circuit shown in Fig. 1 only through the use of De Morgan's theorem, then the realization is still acceptable. However, such a circuit can be guaranteed only to the PFS with respect to internal single stuck-at faults, not unidirectional stuck-at faults. Note that, in general, the output functions of a functional circuit need not be binate in each of the input variables. In Fig. 1, the outputs have been assumed to be binate in x i , * , x u , but unate in xu + , * . . x p . Consider any single stuck-at fault in the circuit in Fig. 1 except those at the stems of the primary inputs x , , . . . , x,. Define the inversion parity of a path to be the number of inversions on the path modulo 2. For the above-mentioned faults, every path from the fault site to any circuit output has the same inversion parity. Thus these faults can result only in unidirectional errors at the outputs. Therefore, the output vector would be a noncodeword, which will be detected by the checker monitoring the outputs. This means that the circuit will be PFS and, hence, SFS with respect to the aforementioned faults. When this functional circuit is embedded in a system, it may receive its inputs from another functional circuit. The checker that monitors the outputs of the preceding functional circuit can also be made to detect the stuck-at faults on the stems of primary inputs x,, . . , x,, of the functional circuit under question; this will become clearer in Subsection 3.3. Now suppose that the functional circuit that is to be synthesized is not fed by any other functional circuit even when placed in a system. For such cases, one solution, as mentioned earlier, is to use AUED encoding for both inputs and outputs. Such a circuit is PFS with respect to all single and unidirectional stuck-at faults. Note that there is no need for an input checker. However, the disadvantage of a realization such as this is the need for extra primary inputs and most likely a significant increase in the area required for its implemention. A designer may wish to consider if even for this case the extra overhead over a circuit realization given in Fig. 1 is beneficial, considering that the extra protection is not that much, especially for VLSI functional circuits. , Fig. I . Realization of the functional circuit '3 X" AND/OR circuit 3 3.2. Self-checking Sequential Functional Circuits Previous works in the area of self-checking sequential functional circuits have been presented in [ 18]-{20]. However, these methods can be impractical when applied to VLSI circuits. The primary reasons are unacceptably high area overhead and/or the difficulty of satisfying the conditions required for making the circuits self-checking. Fig. 2 . Mealy sequential machine An alternative approach for Mealy machines is presented here. It can be trivially extended to Moore machines. Consider the Mealy machine shown in Fig. 2. In this machine the next state functions and the output functions are encoded separately. Suppose the number of states in the state table of the sequential machine that has to be synthesized is s. If the next state functions are encoded in the m-out-of-n code, then the number of state variables or flip-flops, n , must satisfy the following condition: (k) L s. Clearly, to reduce the number of flip-flops, we have to choose the smallest n such that the preceding condition is satisfied, and this leads to the choice of m = Ln/2J . However, reducing the number m usually decreases the size of the combinational logic of the sequential circuit, as we will see in Section IV. Therefore, the choice of m is a tradeoff between reducing the number of flip-flops and reducing the size of the combinational logic. The output functions are encoded in an AUBD code. Using arguments from [ 3 3 ] it can be shown that, with the above encoding, the circuit outputs and the next state outputs will be unate in the state variables, as shown in Fig. 2. The inputs need not be encoded if they are obtained from a preceding functional circuit (combinational or sequential). The arguments for doing this are similar to those given in Subsection 3.1. However, if the functional circuit is not fed by any other functional circuit, then the designer may or may not wish to use input encoding, as explained earlier. JHA AND WANG: SELF-CHECKING VLSI CIRCUITS 881 There are two TSC checkers in this circuit to monitor the circuit outputs; one checks the output functions, the other checks the next state functions. The outputs of these two checkers are then fed to a two-rail checker to produce the final checker outputs. Any detectable stuck-at fault, excluding those at the primary input stems of x,, * xu in Fig. 2, results in a unidirectional error at the circuit outputs and/or next state lines, which is caught by at least one of the checkers. The stuck-at faults at the primary input stems are detected by the checker, which checks the preceding functional circuits in the system. If there is no functional circuit preceding it, and AUED encoding is used for the primary inputs, then it can be shown that the output functions and next state functions will be unate in the primary inputs. In such a case, any stuck-at fault can cause only unidirectional errors. The preceding arguments can also be extended to PLA implementations. Thus we see that the functional circuit is PFS and, hence, SFS. This can be reduced to a TSC circuit if all gates and lines corresponding to stuck-at faults not detected by input codewords are removed. However, as mentioned earlier, this can be time-consuming. 1 TRC TRc 2 21 STATISTICS FOR CKT 3.3. Selfchecking Systems The area of design and synthesis of self-checking systems has received scant attention from researchers in the past. Applications of the self-checking property to microprogram control units were discussed in [7] and [3]. A theory of TSC systems was given in [21]. However, the results presented pertain more to analysis rather than synthesis of self-checking systems. Furthermore, it may not always be possible to meet or even check for the conditions outlined in [21] especially for moderately large systems. The design and synthesis methods for self-checking combinational and sequential circuits were presented in the previous two subsections. These are the building blocks of a system. Next we need to design the system from these building blocks. Consider the system in Fig. 3. In this system, functional circuits 1 and 2 are combinational, whereas functional circuit 3 is sequential. We have assumed that systematic or separable codes have been used to encode the output functions of the functional circuits; however, nonsystematic codes could also be used. TRC, refers to a two-variable two-rail checker. The outputs of checkers 1, 2, 3, and 4 are reduced ultimately to only two outputs, zI and z 2 , with the help of TRC,’s. From functional circuits 1 and 2, only the information bits go to functional circuit 3. In general, functional circuit 3 will require the complements of its primary inputs too (inverters are not shown in Fig. 3, but are implicit). Therefore, the branch of information bit lines that goes to checker 1 or 2 should be derived from a point beyond where these lines are fed to the inverters. Thus, we ensure that stuck-at faults on the stem and all branches of these lines can be detected. 22 Fig. 3. Self-checking system bbsse cse dk14 dk15 dk16 ex 1 ex4 ex6 kirkman mark 1 mc opus planet SI sand scf TABLE 1 BFNCHMARK SFQUFNTIAL MACHINES #inp #out I I 7 3 3 2 9 6 5 12 5 3 5 7 8 7 5 5 3 19 9 8 6 16 5 6 11 21 19 6 9 56 sse I I styr 9 10 #sta 16 16 7 4 27 20 14 8 16 15 4 IO 48 20 32 121 16 30 IV. RESULTS In this section, we give results for various synthesis methodologies. The methodologies are applied to several sequential MCNC benchmarks whose statistics are given in Table I. For each machine in this table, the number of inputs (#inp), the number of outputs (#out), and the number of states (#sta) are indicated. The outputs and the next states are encoded with AUED codes, as mentioned earlier, and then synthesized by MIS [34], a multilevel logic optimization program from Berkeley, to get the final realization. 4.1. Internally Inverter-Free Sequential Circuits Encoded with Berger Code The first synthesis scheme follows the procedure given in Subsection 3.2. The output functions are encoded with the Berger code, while the states of the finite state machines are encoded with the m-out-of-n code with all m 882 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF 1NTEGRATED CIRCUITS AND SYSTEMS. VOL. 12, NO. 6, J U N E 1993 (L) and n that satisfy the following conditions: L s and rn I L n / 2 J , where s is the number of states. The functional circuits are optimized by a script that strictly uses algebraic factorization without complement; i.e., the resulting circuits have inverters only at the primary inputs, but not inside them, as shown in Fig. 2. However, when such a script is used to optimize a circuit, the results are usually slightly inferior to a script that includes Boolean optimization. Consider, for example, the state table shown in Fig. 4(a). With the given state assignment, the combinational logic of the sequential circuit without any self-checking properties is as shown. However, when 1-hot encoding for the states and Berger encoding for the outputs are used, then the self-checking circuit is as shown in Fig. 4(b). Let us next consider the FSM benchmarks, which are first converted to KISS2 format [ 3 5 ] . With an rn-out-of-n (or, equivalently, rn-hot) state assignment, the circuit outputs and next state outputs can be made unate in state variables by specifying the 0’s in the present state assignment to be “don’t care.” For instance, if the present state is 11 000, then it should be specified as 11- - -, where the dash denotes a don’t care. If this is not done, MIS does not necessarily find the solution that makes the outputs unate in the present state inputs. The functional circuit outputs are then encoded with the Berger code. The literal-counts of the combinational logic of the self-checking sequential circuits synthesized by our method are given in Table 11. For each benchmark, we tried all possible rn-hot codes, and for each rn-hot code, two circuits were synthesized. In the first circuit, the output functions are encoded with Berger code, whereas in the second one they are not. Both of them are optimized with the algebraic script so that any internal stuck-at fault will cause only unidirectional errors at the outputs. In the last column, the target machine’s states are assigned by MUSTANG [ 3 6 ] ,which gives a state assignment with rlog2s1 bits; the circuit is then fully optimized (i.e., Boolean optimization is performed). There are two options in MUSTANG, one being fan-in-oriented and the other fan-outoriented. Synthesis was done based on both options, and the better of the two results is given in Table 11. In this table the area occupied by the latches was not taken into account because first we want to draw some conclusions from these results about the overheads involved only in the combinational logic. However, when we report the total literal-count of the functional circuit and the checker, we will also take into account the area of the latches. It is interesting to note that, as the state encoding moves from I-hot to 2-hot, etc., the literal-count of the combinational logic of the self-checking sequential circuit increases as well. In many cases, the I-hot code is better than MUSTANG in terms of literal-count. However, the 1 -hot code may not always be practical for the larger machines (with more states) since the number of required latches increases so fast that the gain in the literal-count by the 1-hot code may be more than compensated by the increase in the area due to the extra latches. as we will 0 sto stl ooo 1 sto sto loo 0 stl st2oo1 1 stl sto 101 0 st2 st2 010 1 st2 s13 111 o s 0 stlO11 I st3 sto 01I sto->oo sa -> 11 stl - > O I st3 -> 10 Next State Lines - Inputs &m I \ Present State outputs Lines osto 1 sto OStl 1 stl ost2 1 st2 ost3 1 st3 stloM)11 sto loo 10 s t 2 0 0 1 10 sto 101 01 st2010 10 so 111 oo stlo1101 stool1 01 sto ->ooIO st2 -> Oloo stl -> IoM) st3 ->ooo1 J Inputs & Present States Lines (b) : Fig. 4. (a) Example of a sequential circuit. (b) Self checking circuit for circuit in Fie. 4(a). JHA A N D WANG: SELF-CHECKING VLSI CIRCUITS 883 TABLE I1 LITERAL-COUNTS FOR THE BENCHMARK SEQUENTIAL CIRCUITS I -hot CKT bbsse cse dk14 dk15 dk16 ex 1 ex4 ex6 kirkman mark1 mc opus planet SI sand scf sse styr 2-hot 3-hOt 4-hol Berger Code No Code Berger Code No Code Berger Code No Code 169 266 175 141 238 422 Ill 137 449 I06 44 95 493 267 518 698 169 538 126 208 146 116 20 1 210 76 101 267 92 44 74 356 233 449 556 126 385 217 319 235 169 355 502 160 173 467 152 67 I34 667 363 64 I 979 217 660 186 278 I96 I37 32 I 295 1 I7 124 312 136 57 108 533 326 562 830 186 536 24 1 43 1 234 371 - - - - 457 561 165 428 404 129 - * Berger Code No Code MUSTANG 140 23 1 93 68 283 269 76 99 2 IO Ill 24 80 575 207 553 97 1 I22 489 - 183 348 159 - - - - 754 457 806 1226 24 1 952 619 416 688 1003 234 717 -: There is no need for such a code *: Results not available. see later. Conversely, the number of state variables for the 2- and 3-hot codes are usually reasonably close to rlog,sl , which makes them a good choice. For larger machines, the 2-hot encoding can give better results than MUSTANG, as can be seen for the benchmarks scfand planet. Suppose the machine to be synthesized has k primary outputs and p state bits, then the number of redundant outputs due to Berger encoding is [log, ( k + 1)1 . Since the original circuit has k + p outputs, one may expect the overhead added by the Berger code to be close to [log, (k 1)1 / k p . In Table 111, we compute the overhead for every sequential circuit, which is given by {(l.c. of self-checking circuit) - (1.c. of original circuit)} /(l.c. of original circuit), and normalize it with respect to the preceding ratio (1.c. denotes the literal-count). Here original circuit refers to a circuit whose states are encoded in an m-hot code, but whose outputs are not encoded. In reality, for different circuits this normalized overhead varies widely. However, the average normalized overhead for the 2- and 3-hot state assignments is close to 1, as one may have naively expected. The reason for reporting normalized overhead, as opposed to simply the percent overhead, is that it allows us to see the trend. All the benchmark circuits are reasonably small. Hence, for these circuits the percent overhead can be large. However, if the preceding trend holds for larger sequential circuits, then we can see that the percent overhead for such circuits can go down drastically, since the ratio rlog, ( k + 1)1/ k p decreases as the value of k increases. + + + 4.2. Fully Optimized Sequential Circuits Encoded with Berger Code If a circuit is optimized with algebraic factorization, then any internal stuck-at fault will cause only unidirec- TABLE 111 ESTIMATION OF THE NORMALIZED OVERHEAD CAUSED BY CODE CKT 1-hot 2-hot 3-hot bbsse cse dk14 dk15 dk16 ex 1 ex4 ex6 kirkman mark I mc opus planet 0.78 0.69 0.67 0.70 0.58 2.63 I .38 I .28 2. I5 0.52 0.53 0.88 1.51 0.49 0.63 2.18 0.78 1.10 0.13 0.70 sand scf sse styr 2.62 2.14 0.80 0.65 2.76 7.87 2.65 1.43 5.00 0.91 0.00 0.79 5.16 1.26 1.58 7.53 2.62 3.97 Average 2.76 I .08 0.93 SI THE BERCER 4-hOt - 0.38 I .94 1.05 - * 0.66 - 1.18 0.39 0.69 2.48 0.13 1.39 tional errors at the outputs. However, if there are inverters inside the circuit, then there might be bidirectional errors at the outputs. But, it is still possible that a bidirectional error can be detected by either the Berger-encoded circuit outputs or the rn-out-of-n encoded next state outputs. Therefore, if the fully optimized circuit has a smaller overhead, yet if a high enough percentage of faults are detected by the above codes, then it may be worthwhile using full optimization. In this subsection, we report results on the synthesis of the self-checking circuits obtained by full optimization, and on the probability that a fault will invalidate the above codes (i.e., change one codeword into another). In Table IV we give the literal- 684 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS A N D SYSTEMS, VOL. 12, NO. 6 , JUNE 1993 TABLE IV LITERAL-COUNTS FOR FULLY-OPTIMIZED BENCHMARK SEQUENTIAL CIRCUITS 1-hot CKT Berger Code 2-hot No Code Berger Code 3-hot No Code Berger Code ~~ bbsse cse dk14 dk15 dk16 ex I ex4 ex6 kirkman mark I mc opus planet SI sand scf sse styr 134 230 115 209 126 105 193 91 I08 65 78 102 41 88 95 33 75 I50 275 1 so * * * * * * 150 * * * * 480 * I15 * 225 363 174 150 370 569 136 181 464 148 61 127 743 54s 827 * 225 147 I79 301 I39 123 328 315 102 143 308 132 43 100 533 469 639 848 I79 594 counts for fully optimized circuits in the same way as in Table 11. Comparing Tables I1 and IV, it is interesting to note that fully optimized circuits sometimes have worse literalcount than the circuits optimized by the algebraic script. Actually, it seems that the algebraic script nearly always produces better results for larger circuits. The reason, however, may be that the output and the next state functions are unate in the present state variables, and this property may be better exploited by algebraic operations. Table V is similar to Table 111, and it shows the normalized overhead for the fully optimized circuits. The next thing we want to know is the probability that a fault will be detected if the method described in this section is used. To obtain this probability, we generate a test pattern for each stuck-at fault in the combinational logic of the synthesized circuit, and see if the errors at the output are detectable by the codes. We used the test pattern generation program STALLION [ 371 with some modifications. The result shown in Table VI is the probability that a fault is not detected. The average probability ranges from 4 to 7 % for 2-, 3 - , and 4-hot codes. Depending on the application, such a probability may or may not be considered too high. 4.3. TSC Checkers In this subsection, we describe briefly the checkers for the codes used herein. TSC checkers for the Berger code were presented in [23]-[25] and [40]. We consider the area-efficient method given in [24] next. The checkers presented in [24] consist of half-adders, full-adders, and two-rail checkers. Suppose the number of information bits in a codeword is k , and the number of checkbits is r . Then the number of fulladders is k - Llogz k_J - 1 , the number of half-adders is upper bounded by Llog, kJ , and the number of two- 4-hot No Code Berger Code No Code MUSTANG ~~ ~ 232 395 140 23 1 93 68 283 269 76 99 210 111 24 80 575 207 553 97 I 122 489 202 354 - - 508 624 I37 455 376 120 - - 503 164 31 1 130 - - - - 793 608 904 1234 232 843 637 460 760 1073 202 658 ESTIMATION OF THE TABLE V NORMALIZED OVERHEAD CAUSED CODE BY THE CKT 1 -hot 2-hot 3-hot bbsse cse dk14 dkl5 dk16 ex 1 ex4 ex6 kirkman mark I mc opus planet 2.33 2.42 0.76 0.83 2.88 1.20 0.96 0.84 0.66 0.70 3.02 1.25 0.86 2.19 0.53 1.26 0.99 2.36 0.70 1.32 0.64 * 2.30 1.54 * 0.44 0.73 0.92 * * * * 4-hot 0.50 0.58 3.30 0.53 - 2.47 1.15 - sand scf sse styr 2.33 * 1.20 1.22 1.32 1.29 0.76 1.68 0.64 1.19 Average 1.89 1.25 I .23 SI BERCER * variable two-rail checkers (TRC2’s) required to form an r-variable two-rail checker is rlog, (k + 1)1 - 1. The least literal-counts for a full-adder, a half-adder, and a TRC, checker among various designs we considered are 13, 6, and 8, respectively. Therefore, if 1.c. is the literalcount of a Berger code checker, the bound on it is given by 13(k - Llog, kJ - 1) I1.c. 5 13(k - + 8 ( rlog, (k Ll0g2 k ] - 1) + 1)1 - 1) + 6 ( LlOg, kJ ) There are many TSC rn-out-of-n code checker designs in the literature [ 5 ] , [28]-[31], from which the designer JHA A N D WANG. SELF-CHECKING VLSl CIRCUITS 885 TABLE VI PROBABILITY OF UNDETECTED FAULT Ih F U L L Y - O P T I MCIRCUITS IL~D CKT bbsse cse dk14 dk15 dk16 ex I ex4 ex6 kirkman mark 1 mc opus planet SI sand scf sse styr Average I-hot 5% 6% 10% 10% 48 % * 14% 2% * 4% 11% 1% * * * * 6% * 10.6% 2-hot 3-hot 2% 13% 5% 9% 17% 5% 0.3% 3% 5% 0% 15% - 1% 0% 4% 3% 3% bbsse dk14 dk15 dk16 ex 1 ex4 ex6 kirkman mark I mc 2% 3% 11% - 15% 0% - opub - 2% 5% 4.0% 6.6% * CKT cse - 4% 4% 5% 1% 9% 6% 1% 4-hot TABLE VI1 TOTAL LITERAL-COUNTS FOR THE SEl.F-c"ECI(IhC planet SI sand scf sse styr I -hot 2-hot 3-hot 355 452 289 210 497 780 317 30 I 629 429 I20 239 1121 47 1 888 2370 355 906 344 446 316 230 445 777 310 298 S88 407 135 221 1016 484 818 I876 344 789 B t . N C " M A K K cIKClJl~l S 4-hot Dup. 349 539 - - - - - 3 84 566 260 I96 652 752 272 296 516 398 IO8 256 I768 524 I240 2480 348 I120 55 I 817 305 - - .- - - - 430 - - - 1082 559 976 2034 349 I134 I124 - 2235 - 4.0% can choose the best. Usually these checkers are rather small because of the small values of n , except when 1-hot code is used. In the following discussion, for each rn-hot code, we choose the design with the least literal-count. The final circuit size depends on the size of the combinational logic of the sequential circuit, latches, and checkers. The actual area occupied by the latches may be a decisive factor in choosing among the different rn-hot codes. As suggested in [38], we count a latch as three literals. In Table VII, we list the final literal-counts (for the functional circuit containing the combinational logic and latches, and the two checkers) for all benchmark examples synthesized with algebraic factorization. Also listed in the table are the literal-counts for the duplicateand-compare type of implementation of all the benchmark circuits. This includes duplicated functional circuits (each obtained after using MUSTANG and full optimization by MIS) and two-rail checkers to compare the two sets of circuit and next state outputs of the functional circuits. 4.4. Outputs Encoded with Sepuruble Codes In many cases the output patterns that normally occur consist of only a small part of the whole output space. Under these circumstances, separable codes may frequently need fewer checkbits and potentially lead to simpler functional circuits. Conversely, as seen in the previous section, the size of Berger code checkers grows only linearly with the number of its inputs, whereas the size of a separable code checker may grow more rapidly. For some circuits, the checker accounts for a significant portion of the overhead. If this is the case, one may want to try a different approach which may reduce the checker size. We consider these issues next. We use the following encoding scheme for obtaining the separable code. This scheme is an extension of the one presented in [16]. We first obtain the zero-count (the number of 0's) of all the information symbols that can normally be produced. We put all the information symbols with the same zero-count into one group. Suppose there are G such groups. Obviously, I 5 g 5 k 1. Let r be the number of checkbits. We do not use the check symbol which has r 1's. Therefore, r is chosen to be [log, (G + 1)1. The r-bit check symbols are assigned to the different groups in ascending order (i.e., the group with a larger zero-count is assigned a larger check symbol). For example, let k = 9 and the different groups present have zero-counts 1 , 2, 5 , 6, 7, and 9. Then G = 6 and r = 3. Therefore, one possible assignment would be to use thecheck symbols 110, 101, 100, 011, 010, 001, respectively, for the information symbols with zero-counts 9, 7, 6, 5 , 2, 1. A self-checking circuit based on the preceding encoding technique is shown in Fig. 5 . The functional circuit consists of two subcircuits: the information symbol generator (ISG) and the check symbol generator (CSG). These two subcircuits do not share any logic. The checker consists of a check symbol complement generator (CSCG), which generates the bit-by-bit complement of the check symbol in the fault-free case, and an r-variable two-rail checker (TRC,). The structure of this checker for the separable code is the same as the structure frequently used for systematic codes 1231. The functions of the CSCG are derived as follows. For all the information symbols that the ISG of the functional circuit can produce under normal operation, the CSCG outputs correspond to the bit-by-bit complement of the expected check symbol. For the information symbols that should not normally occur, all the CSCG outputs are assigned 0's. It is easy to specify such an input/output mapping in the ESPRESSO [39] format. Since the information symbols that do not normally occur map to all O's, they do not need to be specified. Since ISG and CSG are synthesized separately, no internal stuck-at fault in the functional circuit can create an error at the outputs of both ISG and CSG. Based on this + 886 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS. VOL TABLE VIII LIT~RAL-COLNTF FOR S t Q C F N l I A L CIKCLIII 5 EN(ODED InLIuts I 1 ; Functional Circuit < -- i Checkbits , - - - - - - - - - - - .- -. i Check Symbol Complement ; Generator 1 -hot 2-hot 3-hot 4-hot Checker ex I mark I planet scf 457 106 533 694 617 747 181 880 1227 - 367 78 463 64 1 155 768 992 TOTALLITERAL-COUNTS FOR ~ \ / r-variable Checker 21 953 1364 TABLE IX T H E SELF-CHtCKING CIRCUITS ENCODED W I T H SFPAR.4BI.E CODES $ ’ Twn-Rail outputs SFPARABLE CKT t A <; Wt TH CODkS Check Symbol Generator Complements of Checkbits 12, NO. 6. JUNE 1993 22 Fig. 5 . Self-checking circuit based on the separable code observation, it is easy to show that the checker is codedisjoint. The fact that sharing of logic is not allowed between ISG and CSG tends to increase the area of the functional circuit. However, since there may be fewer bits in the check symbol and the area of the checker may be less, the overall area for a self-checking circuit based on a separable code may be less than the area of a self-checking circuit based on a systematic code. The preceding approach can obviously be extended easily to sequential circuits, as before. In this extension, the ISG can be combined with the next state logic, but the CSG has to be synthesized separately. The results for some large FSM sequential benchmark machines are given in Table VIII. As before, the states of each machine are encoded with all possible rn-hot codes; the number given in the corresponding column is the literal-count for the combinational logic of the sequential circuit whose outputs are encoded with separable codes. The last column gives the literal-count of the separable code checker, which includes CSCG and TRC,.. Next, in Table IX we give the total literal-count of the self-checking circuit based on separable codes. This includes the literal-counts for the functional circuit (including both the combinational logic and latches), the separable code checker and the m-out-of-n code checker. The literal-counts for the duplicate-and-compare scheme are also listed as a reference. By looking at the results in Table VII, we find that the literal-counts for the self-checking circuits based on separable codes are better for mark1 and scf, and worse for ex1 and planet compared with thc literal-counts for self-checking circuits based on the Berger code, which is a systematic code. V. CONCLUSION In this paper we presented gate-level self-checking circuit, system, and checker synthesis methodologies. The overhead due to the proposed methodologies was evalu- CKT ex 1 mark I planet scf 1 -hot 1002 329 1414 2297 2-hot 3-hot 4-hot DUP 1079 319 1364 I846 1190 335 I460 2027 - 752 398 1768 2480 - 1522 2147 ated by synthesizing self-checking circuits for MCNC benchmarks. From the experimental results we found that, if the states of a sequential circuit are encoded with an rn-hot code, the size of the combinational logic usually increases as m increases. However, when the area of the whole selfchecking circuit is taken into account (including that of the latches and checkers), then the 2-hot code seems to give the best result in most cases. Second, since the normalized overhead of the combinational logic is close to 1 on the average (at least for the 2- and 3-hot codes), it means that one can expect the overhead introduced in the circuit to be close to [log, (k 1)1/k p , where k and p are the number of primary outputs and next state lines, respectively. Therefore, the overhead due to Berger encoding should, in general, decrease as the size of the circuit becomes larger. Self-checking systems are of great value for ensuring reliability of computations. Therefore, we hope that our results will help designers of such systems choose from among the different approaches. + + REFERENCES X . Castillo. S . K. McConnel. and D. P. Siewiorek, “Derivation and calibration of a transient error reliability model,” IEEE Truns. Cornpur., vol. C-31. pp. 658-671, July 1982. Y . Savaria, N . C . Rumin, I . F. Hayes, and V . K. Agarwal. “Softerror filtering: A solution to the reliability problem of future VLSI digital circuits.” IEEE Pro(.., vol. 74, no. 5 . pp. 669-683, May 1986. M. M . Yen, W . K. Fuchs, and I . A. Abraham, “Designing for concurrent error detection in VLSI: Application to a microprogram control unit,’’ IEEE J . Solid-Sruru Circuits, vol. SC-22, pp. 595-605, Aug. 1987. W . C. Carter and P. R. Schneider, “Design of dynamically checked computers.” in Proc. IFIP ’68, v o l . 2, Edinburgh, Scotland, Aug. 1968, pp. 878-883. D. A. Anderson and G . Metze, “Design of totally self-checking circuits for m-out-of-n codes,” IEEE Truns. Compur., vol. C-22, pp. 263-269, Mar. 1973. J . E . Smith and G . Metze, “Strongly fault secure logic networks,” IEEE Trans. G m p u r . , vol. C-27, pp. 491-499, June 1978. R . W . Cook. W . H . Sisson. T . F. Storey, and W . N . Toy, “Design AND WANG: SELF-CHECKING VLSI CIRCUITS 887 of a self-checking microprogram control,” IEEE Trans. Comput., vol. [33] G . Mago. “Monotone functions in sequential circuits,” IEEE Trans. C-22, pp. 255-262, Mar. 1973. Cornput., vol. C-22, pp. 928-933. Oct. 1973. D. K. Pradhan and J. J . Stiffler, “Error correcting codes and self1341 R. K. Brayton, R. Rudell. A. Sangiovanni-Vincentelli. and A. R. checking circuits in fault-tolerant computers,” IEEE Computer. vol. Wang. “MIS: A multiple-level logic optimization program,” IEEE 13, pp. 27-37, Mar. 1980. Trans. Computer-Aided Design, vol. CAD-6, pp. 1062-1081, Nov. G . P . Mak, J. A. Abraham, and E . S . Davidson, “The design of 1987. PLAs with concurrent error detection,” in Proc. Int. Symp. F ~ ~ u l t - 1351 G . De Micheli, R. K. Brayton, and A. Sangiovanni-Vincentelli. Tolerant Cornput., Santa Monica, CA. June 1982, pp. 303-310. ”Optimal state assignment for finite state machines,” IEEE Trans. D. K. Pradhan, “A new class of error correctingidetecting codes for Computer-Aided Design, vol. CAD-4, pp. 269-284, July 1985. fault tolerant applications.” IEEE Trans. Cornput.. vol. C-29, pp. 1361 S . Devadas, H-K. T. Ma, A. R. Newton, and A. Sangiovanni-Vin471-481, June 1980. centelli, “MUSTANG: State assignment of finite state machines tarH . Dong, “Modified Berger codes for detection of unidirectional ergeting multi-level logic implementations. ’’ IEEE Trans. Computerrors,” in Proc. Int. Symp. Fdr-Toleranr Cornput.. Santa Monica, Aided Design, vol. 7 , pp. 1290-1300, Dec. 1988. CA, June 1982, pp. 317-320. 1371 H-K. T . Ma, S . Devadas. A. R. Newton, and A. Sangiovanni-VinB. Bose and D. J. Lin. “Systematic unidirectional error-detecting centelli, “Test generation for sequential circuits,” IEEE Trans. Corncodes,” IEEE Trans. Cornput., vol. C-34, pp. 1026-1032. Nov. purer-Aided Design, vol. 7, pp. 1081-1093, Oct. 1988. 1985. 1381 X . Due, G . Hachtel, B. Lin and A. R. Newton, “MUSE: A multiN. K. Jha and M. B. Vora. “A systematic code for detecting t-unilevel symbolic encoding algorithm for state assignment,” IEEE Trans. directional errors,” in Proc. Int. Symp. Fault-Tolerunt Comput., Computer-Aided Design, vol. 10, pp. 28-38, Jan. 1991. Pittsburgh, PA, June 1987, pp. 96-101. 1391 .~ R . K . Bravton. C . McMullen. G . D . Hachtel. and A. Saneiovanni[I41 B. Bose, “Burst unidirectional error detecting codes,” IEEE Trans. Vincentelli, Logic Minimization Algorithms for VLSI Synthesis. Comput.. vol. C-35, pp. 350-353, Apr. 1986. Kluwer Academic Publishers, 1984. [IS] M. Blaum, “On systematic burst unidirectional error detecting 1401 S . J . Piestrak, “Design of fast self-testing checkers for a class of codes,” IEEE Trans. Cornput., vol. (2-37, pp. 453-4.57, Apr. 1988. Berger codes,” IEEE Trans. Cornnut.. vol. C-36. .DD. . 629-634. Mav N. K. Jha, “Separable codes for detecting unidirectional errors.” 1987. IEEE Trans. Computer-Aided Design, vol. CAD-8, pp. 57 1-574. May 1989. J . E. Smith and G . Metze, “The design of totally self-checking conibinational circuits,” in Proc. I n t . Syrnp. Fault-Tolerant Cornput., Los Angeles, CA, June 1977, pp. 130-134. Niraj K. Jha (S’85-M’86) received the B.Tech. M. Diaz, “Design of totally self-checking and fail safe sequential degree in electronics and electrical communicamachines,” in Proc. Int. Syrnp. Fault-Tolerant Cornput.. Urbana, IL. tion engineering from the Indian Institute of TechJune 1974, pp. 3-9 - 3-24. nology, Kharagpur, India in 1981, the M.S. deF. Ozguner, “Design of totally self-checking asynchronous aequengree in electrical engineering from S.U.N.Y. at tial machines,” in Proc. Int. Symp. Fault-Tolerant Comput., June Stony Brook, NY, in 1982, and the Ph.D. degree 1977,pp. 124-129. in electrical engineering from university of IlliI . Viaud and R. David, “Sequentially self-checking circuits.” in nois, Urbana, IL, in 1985. Proc. Int. Symp. Fault-Tolerant Cornput.. June 1980. pp. 263-268. He is currently an Associate Professor of ElecJ. E. Smith and P. Lam, “A theory of totally self-checking system trical Engineering at Princeton University. From design,” IEEE Trans. Cornput., vol. C-32. pp. 831-844, Sept. 1983. 1985 to 1987 he was an Assistant Professor of J. M. Berger, “A note on error detecting codes for asymmetric chanElectrical Engineering and Computer Science at the University of Michinels,” Information & Control, vol. 4. pp. 68-73. Mar. 1961. gan, Ann Arbor. He has served as the Program Chairman of the 1992 M. Ashjaee and J. M. Reddy, “On totally self-checking checkers for Workshop on Fault-Tolerant Parallel and Distributed Systems. He has also separable codes,” IEEE Trans. Cornput., vol. C-26. pp. 737-744, served on the program committees for the IEEE International Conference Aug. 1977. on Computer Design in 1988. 1989, and 1990, for the IEEE International M. A. Marouf and A. D. Friedman, “Design of self-checking checkSymposium on Fault-Tolerant Computing in 1990 and 1991, and the Iners for Berger codes,” in Proc. Int. S y n p . Fault-Tolerant Cornput.. ternational Conference on VLSI Design in 1993. He has co-authored a book June 1978, pp. 179-184. titled Testing and Reliable Design of CMOS Circuits. He has authored or J.-C. Lo and S . Thanawastien, “The design of fast totally self-checkco-authored over 80 technical papers. His research interests include digital ing Berger code checkers based on Berger code partitioning,” in Proc. system testing. fault-tolerant computing, computer-aided design of inteInt. Symp. Fault-Tolerant Cornput., Tokyo, Japan, June 1988, pp. grated circuits, and parallel processing. 226-23 1. He is the recipient of the AT&T special-purpose grant award and the J. E. Smith, “On separable unordered codes.” IEEE Truns. CotnNEC Preceptorship award. p u t . , vol. C-33, pp. 741-743, Aug. 1984. C. V. Frieman, “Optimal error detecting codes for completely asymmetric binary channels,” Information & Control, vol. 5 , pp. 64-71. Mar. 1962. S. M. Reddy. “A note on self-checking checkers.” IEEE Trans. Sying-Jyan Wang received the B.S. degree in Cornput., vol. C-23, pp. 1100-1102, Oct. 1974. electronics engineering from the National Taiwan M. A. Marouf and A. D. Friedman, “Efficient design of self-checkUniversity, Taiwan, China, in 1984, and the ing checker for m-out-of-n code.” IEEE Trans. Cornput., vol. C-27, Ph.D. degree in electrical engineering from pp. 482-490. June 1978. Princeton University in 1992. N. Gaitanis and C . Halatsis. A new design method for m-out-of-n From 1984 to 1986 he was an R.O.T.C. officer TSC checkers,” IEEE Trans. Cornput.. vol. C-32, pp. 273-283. Mar. in the Chinese Air Force, Taiwan. From 1986 to 1983. 1987 he was a teaching assistant in the DepartT. Nanya and Y. Tohma, “A 3-level realization of totally self-checkment of Electrical Engineering, National Taiwan ing checkers for rn-out-of-n codes,” in Proc. I n t . S w p . Fault-To/University. He is currently an Associate Professor erant Comput.. Milan, Italy, June 1983, pp. 173-176. of Institute of Information Science at National Y. Tamir and C. H. Sequin, “Design and application of self-testing Chung-Hsing University, His research interests include fault-tolerant comcomparators implemented with MOS PLA’s.” IEEE Trans. Cornput., puting, digital system testing, and communication architectures. vol. C-33, pp. 493-506, June 1984. ~~ “