220 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 8, NO. 2, APRIL 2000 high-order bits, where n is large, is much less than later bits. In general, the probability of error in a particular bit will be P = 0n :(Vhigh 0 Vlow ) 2 LSB : exp 0(n + 1):tb On the Design of Fast, Easily Testable ALU’s R. D. (Shawn) Blanton and John P. Hayes : This gives a probability of error in the most significant bit of less than 10047 for the 4-b bounded time asynchronous converter compared with a figure of about 10020 for the synchronous converter. The design of speed-independent control circuits for these alternatives is discussed in [5]. Finally, asynchronous systems usually produce lower noise in power supplies because there is no clock, and the timing signal transitions are uncorrelated, however, this noise may occur at undesired times. VI. CONCLUSIONS We have shown that A–D converters with a fixed conversion time are subject to large errors due to metastability. It is also clear that an asynchronous design which is able to average comparisons over a period can be completely reliable and faster than a synchronous design, but its conversion time may be infinite. Terminating the conversion process at a fixed time and accepting the current value of the output register whether metastability exists or not gives a probability of error less than that of the synchronous converter with a 12.5% improvement in speed in a 4-b converter and more in 8- or 12-b converters. The bounded time asynchronous converter is also much less likely to give large errors in its output data than the conventional synchronous system. REFERENCES [1] G. R. Couranz and D. F. Wann, “The theoretical and experimental behavior of synchronizers operating in the metastable region,” IEEE Trans. Comput., vol. C-24, pp. 604–616, June 1975. [2] C. L. Seitz, “System timing,” in Introduction to VLSI Systems. Reading, MA: Addison-Wesley, 1979, ch. 7, pp. 218–262. [3] H. J. M. Veendrick, “The behavior of flip-flops used as synchronizers and prediction of their failure rate,” IEEE J. Solid-State Circuits, vol. 15, pp. 169–176, April 1980. [4] C. W. Mangelsdorf, “A 400-MHz input flash converter with error correction,” IEEE J. Solid-State Circuits, vol. 25, pp. 184–191, Feb. 1990. [5] D. J. Kinniment, B. Gao, A. V. Yakovlev, and F. Xia, “Toward asynchronous A–D conversion,” in Proc. ASYNC98, San Diego, CA, Mar. 30–Apr. 2 1998, pp. 206–215. Abstract—A design methodology for implementing fast, easily testable arithmetic-logic units (ALU’s) is presented. Here, we describe a set of fast adder designs, which are testable with a test set that has either ( ) comis the input plexity (Lin-testable) or (1) complexity (C-testable), where operand size of the ALU. The various levels of testability are achieved by exploiting some inherent properties of carry-lookahead addition. The Lintestable and C-testable ALU designs require only one extra input, regardless of the size of the ALU. The area overhead for a high-speed 64-bit Lintestable ALU is only 0.5%. Index Terms—ALU design, functional faults, regular circuits, testability. I. INTRODUCTION Prior work has shown that an array structure improves an arithmetic logic unit’s (ALU's) testing properties [1]. However, the array-type ALU has the major disadvantage that its worst case delay is linearly proportional to N . Carry-lookahead circuitry changes the worst case delay from being linear in N to logarithmic in N but increases the hardware cost. Another, less obvious cost is the loss of C-testability [2], that is, the ALU is no longer testable with a constant number of tests independent of it size [3]. Moreover, the adder is found to be untestable, that is, some faults are undetectable, when a more general functional fault model is considered [4], [5]. Here, we present a design methodology for fast, easily testable ALU’s that includes fast carry-lookahead techniques. The testing properties of one- and two-dimensional array circuits has been studied extensively in [1], [2], [6], and [7]–[11]. However, the approach used in prior work is not applicable to the tree circuits found in ALU’s that utilize carry-lookahead adders. By exploiting some inherent don't care conditions of carry-lookahead addition, we show how the requirements of [5] can be satisfied in order to improve the testability of the tree circuits found in carry-lookahead adders. Using this enhanced testability, we derive two different ALU designs that trade off area and testability. The first design requires very little area overhead and utilizes only one extra test input, but the complexity of the resulting test set is directly proportional to the ALU's operand size N . The second ALU design also uses only one extra test input. It has more overhead than the previous design but a test set whose size is fixed and independent of N . The remaining parts of this paper are organized as follows. Section II develops scalable ALU designs that utilize carry-lookahead addition. In Section III, we motivate and define the functional fault model adopted and analyze ALU testability under this model. Section IV presents the two testable ALU designs. In Section V, 8-, 16-, 32-, and 64-bit versions of the two ALU designs are synthesized and fault simulated in order to make testability and area comparisons. Section VI summarizes our results and presents conclusions. Manuscript received September 19, 1997; revised March 20, 1998. This work was supported by the National Science Foundation under Grant MIP-9200526 R. D. Blanton is with the Electrical and Computer Department, CarnegieMellon University, Pittsburgh, PA 15213-3890 USA. J. P. Hayes is with the Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor, MI 48109-2122 USA. Publisher Item Identifier S 1063-8210(00)00753-8. 1063–8210/00$10.00 © 2000 IEEE © 2000 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 8, NO. 2, APRIL 2000 221 II. ALU DESIGN Fig. 1 shows the starting point in our analysis; an 8-bit tree ALU that operates on 8-bit operands A0 : A7 and B0 : B7 , and a carry-in bit C0 . There are six module types within the ALU: gp, GP, C, FA, SM, LM, and Mux. Each LM module executes various logical operations on a pair of operand bits Ai and Bi and is controlled by the values applied to the control input bus Q. Arithmetic operations are performed using the modules gp, GP, C, FA, and SM. The gp module produces a pair of bit generate/propagate signals Gi:i Pi:i from Ai and Bi . The GP module takes a pair of group generate/propagate signals Gi:j Pi:j and Gk:l Pk:l to produce a new pair of group generate/propagate signals Gi:l Pi:l . The C (carry) module uses Gi:l Pi:l and a carry input Ci to produce a carry output signal Cl+1 with which the FA and the simplified sum (SM) modules compute sum values, where i < j < k < l. An ALU operation is executed by applying values both to the data inputs (A0 : and Q). Logic and A7 , B0 : B7 , and C0 ) and the control inputs (A=L arithmetic operations are executed simultaneously, but the final output connected to all Mux modules. is controlled by the select line A=L The execution time of an arithmetic operation in a tree ALU is much reduced when compared to an array ALU because the number of logic levels through which a carry value must propagate has been reduced from linear to logarithmic in N . The circuit complexity of the tree ALU has increased, but the ALU of Fig. 1 is still regular. The 8-bit ALU of Fig. 1 and, in fact, any N -bit ALU constructed in a similar fashion can be viewed as a composite of tree and one-dimensional array circuits: the eight FA modules form four one-dimensional arrays, each of which is a small ripple-carry adder. The LM and gp modules form two separate one-dimensional arrays, each of which has no intermodule signals. Although not immediately apparent, the C modules are overlapping one-dimensional arrays as well. The GP modules of the adder form two convergent tree circuits [12] of one and two levels. The regularity of Fig. 1 allows it to be easily scaled. It should be noted that the testability analysis presented in the next sections applies to other carry-lookahead structures. For example, the use of more carry modules in conjunction with smaller ripple-carry adder arrays does not affect our design methodology. This fact allows traditional speed/area tradeoffs to be made without affecting testability. III. ALU TESTABILITY The widely used single stuck-line (SSL) fault model [13] has been shown to be inadequate for the dominant circuit technology CMOS [14]. We have adopted the cell fault model [15], which allows the function of any single module in the circuit to change to any other function of the same number of inputs. All modules are assumed to be combinational, and the function of a faulty module is assumed to remain combinational. A fault fj can change the response to a module input pattern ipj from oj to o0j ; we denote this change by ipj ! (oj ; oj0 ) and refer to it as an input pattern (IP) fault [4]. The pair of distinct output values (oj ; oj0 ) denoting the good and faulty values, respectively, is the error corresponding to the IP fault. The cell fault model allows a single module to be affected by one or more IP faults. A circuit is testable under the cell fault model if and only if every circuit module can be tested for all of its possible IP faults. An ALU circuit that is testable is linearly testable (Lin-testable) if it has a complete test set whose size is proportional to N . A circuit is constant-testable (C-testable) [2] if it is testable with a test set whose size remains fixed at some constant K regardless of the circuit's size. It can be easily shown that an N -bit array ALU is testable with 4K1 + 8 test patterns [16], where K1 is the number of logic operations implemented by the LM modules. The 4K1 tests completely test Fig. 1. An 8-bit tree ALU utilizing a carry-lookahead adder. the LM modules, while the remaining eight tests cover the FA modules. The combination of the two subsets of tests also completely test the Mux modules. Much of the tree ALU's circuitry is easy to test. Like the array ALU, the LM modules of the tree ALU are C-testable with only 4K1 test patterns. The four ripple-carry adders can be tested with the same eight C-tests used for an array ALU. This is true since the C-tests produce the same carry values independent of the adder's structure. Also, the Mux modules can be easily tested when the LM, SM, and FA modules are tested. The testing difficulty associated with the tree ALU arises from the carry-lookahead circuitry. Although testable for all SSL faults [3], the carry-lookahead circuitry (the gp, C, and GP modules) is not fully testable [5], [16]. For example, the IP fault 10 10 ! (10,11) in GP is untestable. Note that untestable IP faults do not necessarily equate to a reduction in circuitry as is the case for SSL faults [4]. The untestable IP faults identified here do not lead to a reduced implementation. IV. IMPROVED ALU TESTABILITY It turns out that we can easily modify the carry-lookahead circuitry so that it becomes completely Lin-testable for all IP faults. We solve the problems described in [16] and [5] for the gp and C modules by connecting a single control line Z1 to all the gp modules and by slightly altering the function of all the C modules. The modified gp functions are Gi = Z10 1 (Ai 1 Bi ) + Z1 1 Ai and Pi = Z10 1 (Ai 8 Bi ) + Z1 1 Bi , and the modified C function is Cj +1 = Gi:j 1 Pi0:j + Pi:j 1 Ci . With Z1 = 0, the gp and C modules implement their original functions. [This is the case for the C modules since the input (Gi:j Pi:j ) = (11) is never applied when Z1 = 0.] The modified ALU becomes Lin-testable with at most 4K1 + 16 + K2 N tests, where K1 is the number of logic operations of the LM module and K2 < 2 is a second constant. Our C-testable ALU design is based on a GP function described in [5] and [12]. On including the altered C and gp functions described above, this C-testable design is completely testable for all IP faults with 137+4K1 test patterns, independent of its operand size N . 222 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 8, NO. 2, APRIL 2000 TABLE I GATE OVERHEAD, FAULT NUMBERS, AND FAULT COVERAGE OF VARIOUS ALU DESIGNS V. ALU IMPLEMENTATION VI. CONCLUSION To validate our designs, we have generated 8-, 16-, 32-, and 64-bit Verilog structural models of the basic, unmodified ALU, a Lin-testable ALU, and a C-testable ALU. The implementation and test characteristics of the three ALU designs are shown in Table I. Section 1 of Table I gives the statistics of gate-level implementations synthesized from the Verilog models by the commercial synthesis tool Synplify. Here, the number of gates varies for each individual ALU size depending upon the level of testability indicated. The numbers in parentheses give the gate-count overhead incurred for the testable implementations. For example, the 32-bit Lin-testable ALU requires 6.3% more gates than the unmodified ALU. For the 64-bit ALU’s, the gate-level overhead is quite reasonable—less than 16%— for the testable implementations. Sections 2 and 3 of Table I list the total number of IP and SSL faults (after equivalent fault collapsing), respectively. Test patterns targeting the IP faults (we call these IP tests) were manually generated for all 12 ALU designs, and SSL test patterns were automatically generated using the ATPG tool Atalanta. The IP and SSL patterns were simulated for all possible IP and SSL faults using Cadence's SSL fault simulator Verifault and the IP fault analysis technique from [4]. Section 4 of Table I lists the numbers of SSL test patterns generated and their coverage of SSL faults. The entries of less than 100% SSL coverage result from redundancies introduced by Synplify. Section 5 of Table I lists similar statistics for IP faults. The final sections (6 and 7) of the table show how well the SSL tests detect IP faults and the IP tests detect SSL faults, respectively. As expected, the IP tests cover all the detectable SSL faults; the SSL fault coverage figures of Sections 4 and 7 are identical. However, the coverage of IP faults by SSL tests is quite poor, never reaching 90% for any ALU implementation. For the 64-bit implementations, IP coverage slightly exceeds 70% for the C-testable design. In general, the number of IP tests is much higher than the number of SSL tests except for the C-testable design. This is the case because the IP tests for each ALU subcircuit were generated manually without fault dropping. A better approach would have been to generate IP tests only for the IP faults not detected by SSL tests, but this is difficult to do manually. In spite of this, the C-testable designs achieve 100% IP fault coverage using fewer than 200 IP tests. The problem of making a fast ALU based on carry-lookahead easily testable has been investigated. Its testability has been analyzed for a fault model that requires functional verification of all the module types in the ALU. The availability of don't cares in one module type (gp) was exploited to make the ALU easily testable. Methods for designing Lin-testable and C-testable ALU’s were presented. The C-testable design is more “testable” than the Lin-testable design at the cost of more gate overhead, which illustrates an interesting design tradeoff. Both the Lin-testable and C-testable designs require only one extra test input, independent of the size of the ALU. In addition, we have also shown that we can maintain the tree-like structure of the ALU to preserve its desirable layout and scaling properties. It should be noted that the design methodology presented in this work is also applicable to general n-ary carry-lookahead adder trees. For example, a high-speed, 64-bit, carry-lookahead/carry-select adder is approximately twice as fast as the 8-bit design of Fig. 1. The speed increase is due to a couple of factors. First, we use quad-input GP modules to quickly compute the carry signals. Second, we simultaneously compute 64-bit conditional sums using ripple-carry adders composed of 2-bit add cells instead of 1-bit full adders. The final sum is then multiplexed to the output under the control of the generated carry signals. This design is made Lin-testable for all IP faults by altering the gp and C modules exactly as explained in Section IV using a gate overhead of only 0.5%. REFERENCES [1] T. Sridhar and J. P. Hayes, “Design of easily testable bit-sliced systems,” IEEE Trans. Comput., vol. C-30, pp. 842–854, Nov. 1981. [2] A. D. Friedman, “Easily testable iterative systems,” IEEE Trans. Comput., vol. C-22, pp. 1061–1064, Dec. 1973. [3] B. Becker, “Efficient testing of optimal time adders,” IEEE Trans. Comput., vol. 37, pp. 1113–1120, Sept. 1988. [4] R. D. Blanton and J. P. Hayes, “Properties of the input pattern fault model,” in Proc. 1997 Int. Conf. Computer Design, Oct. 1997, pp. 372–380. [5] R. Blanton and J. P. Hayes, “Testability of onvergent tree circuits,” IEEE Trans. Comput., vol. 45, pp. 950–963, Aug. 1996. IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 8, NO. 2, APRIL 2000 [6] W. Cheng and J. H. Patel, “Concurrent error detection in iterative logic arrays,” in Proc. 14th Int. Symp. Fault-Tolerant Computing, 1984, pp. 10–15. [7] F. J. Ferguson and J. P. Shen, “The design of easily testable VLSI array multipliers,” IEEE Trans. Comput., vol. C-33, pp. 554–560, June 1984. [8] H. Elhuni and L. Kinney, “Techniques for testing hex connected systolic arrays,” in Proc. 1986 Int. Test Conf., Sept. 1986, pp. 1024–1033. [9] J. H. Kim, “On the design of easily testable and reconfigurable systolic arrays,” in Proc. Int. Conf. Systolic Arrays, 1988, pp. 1024–1033. [10] C. Wu and P. Cappello, “Easily testable iterative logic arrays,” IEEE Trans. Comput., vol. 39, pp. 640–652, May 1990. [11] M. Schlag and F. J. Ferguson, “Detection of multiple faults in two-dimensional ILAs,” IEEE Trans. Comput., vol. 45, pp. 741–746, June 1996. [12] R. D. Blanton, “Design and testing of regular circuits,” Ph.D. dissertation, Univ. Michigan, 1995. [13] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design. Piscataway, NJ: IEEE Press, 1990. [14] W. Maly, “Realistic fault modeling for VLSI testing,” in Proc. Design Automation Conf., June 1987, pp. 173–180. [15] W. H. Kautz, “Testing for faults in cellular logic arrays,” in Proc. 8th Symp. Switching Automata Theory, 1967, pp. 161–174. [16] R. D. Blanton and J. P. Hayes, “Design of a fast, easily testable ALU,” in Proc. 14th VLSI Test Symp., Apr. 1996, pp. 9–16. Path Delay Fault Simulation of Sequential Circuits Tapan J. Chakraborty, Vishwani D. Agrawal, and Michael L. Bushnell Abstract—A differential algorithm for concurrent simulation of path delay faults in sequential circuits is presented. The simulator analyzes all three conditions, namely, initialization, signal transition propagation through the path, and fault effect observation at a primary output for vector pairs and considers the hazard states occurring between vectors. The main contribution is in methods of propagating signals between time frames. An optimistic method assumes that all nondestination flip-flops are not affected by delays. The pessimistic method converts all nondestination flip-flops with nonsteady values to the unknown state before these values are propagated beyond the time frame in which a path is activated. A 13-valued algebra is shown to improve the efficiency of fault simulation. Index Terms—Delay test, fault models, fault simulation, path delay faults, sequential circuit timing analysis. I. INTRODUCTION The present test methodology that is based on the stuck-at fault model allows no specific consideration of delay testing. It is sometimes believed that an at-speed application of vectors can detect delay faults. However, without a delay fault simulator this conjecture cannot be verified. Most of the literature on delay testing deals with combinational or scan type of circuits. Only recently has the problem of delay testing in sequential circuits received attention. This is evident from publications Manuscript received January 20, 1998; revised November 30, 1998 and April 7, 1999. This paper is based on a presentation at the First IEEE Asian Test Symposium. T. J. Chakraborty is with Bell Labs, Lucent Technologies, Princeton, NJ 08540 USA. V. D. Agrawal is with Bell Labs, Lucent Technologies, Murray Hill, NJ 07974 USA. M. L. Bushnell is with the Department of Electrical and Computer Engineering, Rutgers University, New Brunswick, NJ 08855 USA. Publisher Item Identifier S 1063-8210(00)00762-9. 223 that discuss delay test application methods [1], test generation methods [2], and fault models [3]. We present path delay fault simulation algorithms for two clock application schemes. In the first scheme, a rated clock is assumed for all vectors and at each vector all three conditions, namely, initialization, path activation, and fault effect propagation, are evaluated. This mode of simulation is useful for identifying path delay faults covered by the conventional tests consisting of the design verification and/or stuck-fault coverage vectors. These tests are applied at constant clock rate (rated clock) to check the correct functionality of a design. Separate tests can be generated for the delay faults not covered by the conventional tests. Such delay tests may use variable clock rate [3], and for that we use an alternative fault simulation algorithm. II. BACKGROUND Since the simulation models of this paper were first presented [4], several papers have appeared. Bose et al. [5] have presented a simulation algorithm for sequential circuits using the rated clock. For robust detection, their optimistic update rule allows the flip-flops to carry the correct values across time frames, provided those values are produced by robustly activated paths. Pomeranz and Reddy [6] give similar algorithms. Hsu and Gupta [7] recognized a problem associated with the rated-clock simulation. They observed that determination of a signal value in a circuit with a possible delay fault requires that the signal must remain steady for two or more time frames. They postprocess the simulation result to ensure that condition. Pointing to a discrepancy in the optimistic update rule, Heragu [8] has suggested that flip-flop values be computed two ways: one for fault effect propagation in the next time frame and the other for their states in the subsequent time frames. Recent papers describe two techniques of path delay testing, namely, the variable-clock method [9] and the rated-clock method [10]. The rated-clock method uses the normal operating mode of the circuit where all vectors are applied and flip-flops are clocked at the rated speed. A problem arises with this method when the rated-clock speed exceeds the capability of the test equipment [11]. Besides, the test generation for this mode is highly complex [12]. In the variable-clock mode, vector application and flip-flop clocking are done at a slower speed, with the exception of one path activation vector applied at the rated speed. Test generation is simplified since the slow clock allows the faulty circuit to behave like a fault-free circuit. The disadvantage, however, is that the test application time becomes large. For example, to test for all delay faults activated by a set of 100 vectors, we must apply the vector set 100 times, with the rated clock used for a different single vector in each application. Even though the variable clock is an artificial mode of testing, it detects all delay faults detectable in the rated-clock mode [13]. III. PATH DELAY TESTING AND FAULT MODELS Path delay tests must detect any excessive propagation delays in the combinational paths embedded in a sequential circuit. In general, the test sequence for a path delay fault consists of the following steps: 1) initialization; 2) path activation; and 3) fault effect propagation. The path delay testing method is illustrated using the iterative logic array model in Fig. 1. Each copy of the circuit represents the combinational behavior of the circuit during one time frame. A time frame is one clock period. Thus, Ti is the copy of the combinational logic with signals corresponding to the ith input vector. The top inputs PI and the outputs PO at the bottom represent the primary inputs and outputs at each time frame. The blocks shown as 1, 2, and 3 are the flip-flops that transfer data from one time frame to the next under the control of a clock signal. 1063–8210/00$10.00 © 2000 IEEE