On the Design of Fast, Easily Testable ALU’s

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 8, NO. 2, APRIL 2000
high-order bits, where n is large, is much less than later bits. In general,
the probability of error in a particular bit will be
$$P = \frac{2^{-n}\,(V_{\mathrm{high}} - V_{\mathrm{low}})}{\mathrm{LSB}}\,\exp\!\left(-\frac{(n+1)\,t_b}{\tau}\right).$$
On the Design of Fast, Easily Testable ALU’s
R. D. (Shawn) Blanton and John P. Hayes
This gives a probability of error in the most significant bit of less than
$10^{-47}$ for the 4-b bounded time asynchronous converter compared with
a figure of about $10^{-20}$ for the synchronous converter. The design of
speed-independent control circuits for these alternatives is discussed
in [5]. Finally, asynchronous systems usually produce lower noise in
power supplies because there is no clock and the timing signal transitions are uncorrelated; however, this noise may occur at undesired
times.
VI. CONCLUSIONS
We have shown that A–D converters with a fixed conversion time are
subject to large errors due to metastability. It is also clear that an asynchronous design which is able to average comparisons over a period
can be completely reliable and faster than a synchronous design, but its
conversion time may be infinite. Terminating the conversion process
at a fixed time and accepting the current value of the output register
whether metastability exists or not gives a probability of error less than
that of the synchronous converter with a 12.5% improvement in speed
in a 4-b converter and more in 8- or 12-b converters. The bounded time
asynchronous converter is also much less likely to give large errors in
its output data than the conventional synchronous system.
REFERENCES
[1] G. R. Couranz and D. F. Wann, “The theoretical and experimental behavior of synchronizers operating in the metastable region,” IEEE Trans.
Comput., vol. C-24, pp. 604–616, June 1975.
[2] C. L. Seitz, “System timing,” in Introduction to VLSI Systems. Reading, MA: Addison-Wesley, 1979, ch. 7, pp. 218–262.
[3] H. J. M. Veendrick, “The behavior of flip-flops used as synchronizers
and prediction of their failure rate,” IEEE J. Solid-State Circuits, vol.
15, pp. 169–176, April 1980.
[4] C. W. Mangelsdorf, “A 400-MHz input flash converter with error correction,” IEEE J. Solid-State Circuits, vol. 25, pp. 184–191, Feb. 1990.
[5] D. J. Kinniment, B. Gao, A. V. Yakovlev, and F. Xia, “Toward asynchronous A–D conversion,” in Proc. ASYNC98, San Diego, CA, Mar.
30–Apr. 2 1998, pp. 206–215.
Abstract—A design methodology for implementing fast, easily testable
arithmetic-logic units (ALU’s) is presented. Here, we describe a set of fast
adder designs, which are testable with a test set that has either O(N) complexity
(Lin-testable) or O(1) complexity (C-testable), where N is the input
operand size of the ALU. The various levels of testability are achieved by
exploiting some inherent properties of carry-lookahead addition. The Lin-testable and C-testable ALU designs require only one extra input, regardless of the size of the ALU. The area overhead for a high-speed 64-bit Lin-testable ALU is only 0.5%.
Index Terms—ALU design, functional faults, regular circuits, testability.
I. INTRODUCTION
Prior work has shown that an array structure improves the testing
properties of an arithmetic-logic unit (ALU) [1]. However, the array-type
ALU has the major disadvantage that its worst case delay is linearly
proportional to the operand size N. Carry-lookahead circuitry changes the worst case
delay from being linear in N to logarithmic in N but increases the hardware cost. Another, less obvious cost is the loss of C-testability [2]; that
is, the ALU is no longer testable with a constant number of tests independent of its size [3]. Moreover, the adder is found to be untestable,
that is, some faults are undetectable, when a more general functional
fault model is considered [4], [5]. Here, we present a design methodology for fast, easily testable ALU’s that includes fast carry-lookahead
techniques.
The testing properties of one- and two-dimensional array circuits have
been studied extensively in [1], [2], and [6]–[11]. However, the approach used in prior work is not applicable to the tree circuits found
in ALU’s that utilize carry-lookahead adders. By exploiting some inherent don't care conditions of carry-lookahead addition, we show how
the requirements of [5] can be satisfied in order to improve the testability of the tree circuits found in carry-lookahead adders. Using this
enhanced testability, we derive two different ALU designs that trade
off area and testability. The first design requires very little area overhead and utilizes only one extra test input, but the size of the
resulting test set is directly proportional to the ALU's operand size N.
The second ALU design also uses only one extra test input. It has more
overhead than the first design but a test set whose size is fixed and
independent of N.
The remaining parts of this paper are organized as follows. Section II
develops scalable ALU designs that utilize carry-lookahead addition. In
Section III, we motivate and define the functional fault model adopted
and analyze ALU testability under this model. Section IV presents the
two testable ALU designs. In Section V, 8-, 16-, 32-, and 64-bit versions of the two ALU designs are synthesized and fault simulated in
order to make testability and area comparisons. Section VI summarizes
our results and presents conclusions.
Manuscript received September 19, 1997; revised March 20, 1998. This work
was supported by the National Science Foundation under Grant MIP-9200526.
R. D. Blanton is with the Electrical and Computer Department, Carnegie Mellon University, Pittsburgh, PA 15213-3890 USA.
J. P. Hayes is with the Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor, MI 48109-2122 USA.
Publisher Item Identifier S 1063-8210(00)00753-8.
1063–8210/00$10.00 © 2000 IEEE
© 2000 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media,
including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution
to servers or lists, or reuse of any copyrighted component of this work in other works.
II. ALU DESIGN
Fig. 1 shows the starting point of our analysis: an 8-bit tree ALU that
operates on 8-bit operands A_0:A_7 and B_0:B_7 and a carry-in bit C_0.
There are seven module types within the ALU: gp, GP, C, FA, SM, LM,
and Mux. Each LM module executes various logical operations on a
pair of operand bits A_i and B_i and is controlled by the values applied to
the control input bus Q. Arithmetic operations are performed using the
modules gp, GP, C, FA, and SM. The gp module produces a pair of bit
generate/propagate signals G_{i:i} P_{i:i} from A_i and B_i. The GP module
takes a pair of group generate/propagate signals G_{i:j} P_{i:j} and G_{k:l} P_{k:l}
to produce a new pair of group generate/propagate signals G_{i:l} P_{i:l},
where i < j < k < l. The C (carry) module uses G_{i:l} P_{i:l} and a carry
input C_i to produce a carry output signal C_{l+1}, with which the FA and
the simplified sum (SM) modules compute sum values. An ALU
operation is executed by applying values both to the data inputs (A_0:A_7,
B_0:B_7, and C_0) and the control inputs (A/L and Q). Logic and
arithmetic operations are executed simultaneously, but the final output
is controlled by the select line A/L connected to all Mux modules.
The execution time of an arithmetic operation in a tree ALU is much
reduced when compared to an array ALU because the number of logic
levels through which a carry value must propagate has been reduced
from linear to logarithmic in N. The circuit complexity of the tree ALU
has increased, but the ALU of Fig. 1 is still regular. The 8-bit ALU of
Fig. 1 and, in fact, any N-bit ALU constructed in a similar fashion
can be viewed as a composite of tree and one-dimensional array circuits: the eight FA modules form four one-dimensional arrays, each
of which is a small ripple-carry adder. The LM and gp modules form
two separate one-dimensional arrays, each of which has no intermodule
signals. Although not immediately apparent, the C modules are overlapping one-dimensional arrays as well. The GP modules of the adder
form two convergent tree circuits [12] of one and two levels. The regularity of Fig. 1 allows it to be easily scaled. It should be noted that
the testability analysis presented in the next sections applies to other
carry-lookahead structures. For example, the use of more carry modules in conjunction with smaller ripple-carry adder arrays does not affect our design methodology. This fact allows traditional speed/area
tradeoffs to be made without affecting testability.
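As a rough illustration of the carry-lookahead arithmetic path described in this section, the following Python sketch models the gp, GP, and C modules at the bit level. The Boolean equations are the standard carry-lookahead ones and are assumed here, since the text does not list them. The helper names `group` and `tree_add` are hypothetical, and each sum bit is computed directly rather than through the FA and SM ripple arrays of Fig. 1, so the sketch does not reproduce the exact module arrangement of the figure.

```python
def gp(a, b):
    """Bit-level generate/propagate pair (G, P); standard equations, assumed."""
    return a & b, a ^ b


def GP(low, high):
    """Merge the (G, P) pairs of two adjacent groups; `low` is less significant."""
    g_lo, p_lo = low
    g_hi, p_hi = high
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def C(group_gp, c_in):
    """Carry out of a group, given its (G, P) pair and the carry into it."""
    g, p = group_gp
    return g | (p & c_in)


def group(pairs):
    """Reduce (G, P) pairs (least significant first) with a balanced GP tree."""
    if len(pairs) == 1:
        return pairs[0]
    mid = len(pairs) // 2
    return GP(group(pairs[:mid]), group(pairs[mid:]))


def tree_add(a, b, c0, n=8):
    """Add two n-bit integers and a carry-in using lookahead carries only."""
    pairs = [gp((a >> i) & 1, (b >> i) & 1) for i in range(n)]
    # carries[i] is the carry into bit i, derived from bits 0..i-1 and c0.
    carries = [c0] + [C(group(pairs[:i]), c0) for i in range(1, n + 1)]
    total = carries[n] << n
    for i, (_, p) in enumerate(pairs):
        total |= (p ^ carries[i]) << i  # sum bit = P_i xor carry into bit i
    return total


if __name__ == "__main__":
    print(tree_add(200, 100, 1))  # 301
```

The recursion depth of `group` grows with the logarithm of the number of pairs, which mirrors the logarithmic carry delay attributed to the tree ALU above.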
III. ALU TESTABILITY
The widely used single stuck-line (SSL) fault model [13] has been
shown to be inadequate for the dominant circuit technology, CMOS
[14]. We have adopted the cell fault model [15], which allows the function of any single module in the circuit to change to any other function
of the same number of inputs. All modules are assumed to be combinational, and the function of a faulty module is assumed to remain
combinational. A fault f_j can change the response to a module input
pattern ip_j from o_j to o'_j; we denote this change by ip_j → (o_j, o'_j) and
refer to it as an input pattern (IP) fault [4]. The pair of distinct output
values (o_j, o'_j), denoting the good and faulty values, respectively, is the
error corresponding to the IP fault. The cell fault model allows a single
module to be affected by one or more IP faults.
A circuit is testable under the cell fault model if and only if every
circuit module can be tested for all of its possible IP faults. An ALU circuit that is testable is linearly testable (Lin-testable) if it has a complete
test set whose size is proportional to N. A circuit is constant-testable
(C-testable) [2] if it is testable with a test set whose size remains fixed
at some constant K regardless of the circuit's size.
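To make the IP fault notation concrete, the sketch below enumerates every IP fault of a small combinational module as a triple (input pattern, good output, faulty output). The function name `ip_faults` and the example module are hypothetical and serve only as an illustration. Under this simple enumeration, a module with n inputs and m outputs has 2^n (2^m − 1) IP faults.

```python
from itertools import product


def ip_faults(module, n_inputs, n_outputs):
    """Yield every IP fault (ip, good, faulty) of a combinational module."""
    for ip in product((0, 1), repeat=n_inputs):
        good = tuple(module(*ip))
        for faulty in product((0, 1), repeat=n_outputs):
            if faulty != good:  # an error needs distinct good/faulty outputs
                yield ip, good, faulty


def gp(a, b):
    """Hypothetical two-output example module: (generate, propagate)."""
    return a & b, a ^ b


if __name__ == "__main__":
    faults = list(ip_faults(gp, 2, 2))
    print(len(faults))  # 4 input patterns x 3 wrong outputs = 12
```

A test set is complete under the cell fault model only if, for each module, every such triple can be both excited and observed at the circuit outputs.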
It can be easily shown that an N-bit array ALU is testable with
4K1 + 8 test patterns [16], where K1 is the number of logic operations implemented by the LM modules. The 4K1 tests completely test
the LM modules, while the remaining eight tests cover the FA modules. The combination of the two subsets of tests also completely tests
the Mux modules.
Fig. 1. An 8-bit tree ALU utilizing a carry-lookahead adder.
Much of the tree ALU's circuitry is easy to test. Like the array ALU,
the LM modules of the tree ALU are C-testable with only 4K1 test
patterns. The four ripple-carry adders can be tested with the same eight
C-tests used for an array ALU. This is true since the C-tests produce the
same carry values independent of the adder's structure. Also, the Mux
modules can be easily tested when the LM, SM, and FA modules are
tested. The testing difficulty associated with the tree ALU arises from
the carry-lookahead circuitry. Although testable for all SSL faults [3],
the carry-lookahead circuitry (the gp, C, and GP modules) is not fully
testable [5], [16]. For example, the IP fault 10 10 → (10, 11) in GP is
untestable. Note that untestable IP faults do not necessarily equate to a
reduction in circuitry as is the case for SSL faults [4]. The untestable
IP faults identified here do not lead to a reduced implementation.
IV. IMPROVED ALU TESTABILITY
It turns out that we can easily modify the carry-lookahead circuitry
so that it becomes completely Lin-testable for all IP faults. We solve
the problems described in [16] and [5] for the gp and C modules by
connecting a single control line Z1 to all the gp modules and by slightly
altering the function of all the C modules. The modified gp functions
are $G_i = Z_1' \cdot (A_i \cdot B_i) + Z_1 \cdot A_i$ and $P_i = Z_1' \cdot (A_i \oplus B_i) + Z_1 \cdot B_i$,
and the modified C function is $C_{j+1} = G_{i:j} \cdot P_{i:j}' + P_{i:j} \cdot C_i$. With
$Z_1 = 0$, the gp and C modules implement their original functions.
[This is the case for the C modules since the input $(G_{i:j} P_{i:j}) = (11)$ is
never applied when $Z_1 = 0$.] The modified ALU becomes Lin-testable
with at most 4K1 + 16 + K2 N tests, where K1 is the number of logic
operations of the LM module and K2 < 2 is a second constant.
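The effect of the control line Z1 can be checked with a short Python sketch. It encodes the modified gp and C functions given above and compares them with the conventional functions G = A·B, P = A ⊕ B, and C = G + P·Cin, which are assumed here because the text does not spell them out. All function and variable names are illustrative, and only bit-level (G, P) pairs are examined.

```python
from itertools import product

BITS = (0, 1)


def gp_orig(a, b):
    """Conventional gp module (assumed): G = A.B, P = A xor B."""
    return a & b, a ^ b


def gp_mod(a, b, z1):
    """Modified gp: G = Z1'.(A.B) + Z1.A and P = Z1'.(A xor B) + Z1.B."""
    nz1 = 1 - z1
    return (nz1 & a & b) | (z1 & a), (nz1 & (a ^ b)) | (z1 & b)


def c_orig(g, p, c_in):
    """Conventional carry function (assumed): C = G + P.Cin."""
    return g | (p & c_in)


def c_mod(g, p, c_in):
    """Modified carry function: C = G.P' + P.Cin."""
    return (g & (1 - p)) | (p & c_in)


# With Z1 = 0 the modified gp module behaves like the original one.
same_gp = all(gp_mod(a, b, 0) == gp_orig(a, b) for a, b in product(BITS, repeat=2))

# (G, P) pairs a gp module can emit in normal mode and in test mode.
normal_pairs = {gp_mod(a, b, 0) for a, b in product(BITS, repeat=2)}
test_pairs = {gp_mod(a, b, 1) for a, b in product(BITS, repeat=2)}

# Inputs (G, P, Cin) on which the two carry functions disagree.
c_diff = [v for v in product(BITS, repeat=3) if c_orig(*v) != c_mod(*v)]

if __name__ == "__main__":
    print(same_gp)               # True
    print(sorted(normal_pairs))  # [(0, 0), (0, 1), (1, 0)]
    print(sorted(test_pairs))    # [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(c_diff)                # [(1, 1, 0)]
```

The two carry functions differ only for (G, P) = (11), a pair that a gp module never emits with Z1 = 0 but can emit with Z1 = 1, which is consistent with the bracketed remark above.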
Our C-testable ALU design is based on a GP function described in
[5] and [12]. On including the altered C and gp functions described
above, this C-testable design is completely testable for all IP faults with
137+4K1 test patterns, independent of its operand size N .
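The test-set sizes quoted so far can be compared with a few lines of Python. This is only a sketch: the value chosen for K1 is hypothetical, and because the text states only that K2 < 2, the Lin-testable figure below uses K2 = 2 and is therefore a loose upper bound rather than an exact count.

```python
def array_alu_tests(k1):
    """Array ALU: 4*K1 + 8 test patterns."""
    return 4 * k1 + 8


def lin_testable_bound(k1, n, k2=2):
    """Bound 4*K1 + 16 + K2*N; the text gives K2 < 2, so k2=2 overestimates."""
    return 4 * k1 + 16 + k2 * n


def c_testable_tests(k1):
    """C-testable ALU: 137 + 4*K1 test patterns, independent of N."""
    return 137 + 4 * k1


if __name__ == "__main__":
    K1 = 4  # hypothetical number of logic operations in the LM module
    for n in (8, 16, 32, 64):
        print(n, array_alu_tests(K1), lin_testable_bound(K1, n), c_testable_tests(K1))
```

Only the Lin-testable bound depends on N; the other two counts stay fixed as the operand size grows.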
TABLE I
GATE OVERHEAD, FAULT NUMBERS, AND FAULT COVERAGE OF VARIOUS ALU DESIGNS
V. ALU IMPLEMENTATION
To validate our designs, we have generated 8-, 16-, 32-, and 64-bit
Verilog structural models of the basic, unmodified ALU, a Lin-testable
ALU, and a C-testable ALU. The implementation and test characteristics of the three ALU designs are shown in Table I. Section 1 of Table I
gives the statistics of gate-level implementations synthesized from the
Verilog models by the commercial synthesis tool Synplify. Here, the
number of gates varies for each individual ALU size depending upon
the level of testability indicated. The numbers in parentheses give the
gate-count overhead incurred for the testable implementations. For example, the 32-bit Lin-testable ALU requires 6.3% more gates than the
unmodified ALU. For the 64-bit ALU’s, the gate-level overhead is quite
reasonable (less than 16%) for the testable implementations.
Sections 2 and 3 of Table I list the total number of IP and SSL faults
(after equivalent fault collapsing), respectively. Test patterns targeting
the IP faults (we call these IP tests) were manually generated for all
12 ALU designs, and SSL test patterns were automatically generated
using the ATPG tool Atalanta. The IP and SSL patterns were simulated
for all possible IP and SSL faults using Cadence's SSL fault simulator
Verifault and the IP fault analysis technique from [4]. Section 4 of Table
I lists the numbers of SSL test patterns generated and their coverage of
SSL faults. The entries of less than 100% SSL coverage result from
redundancies introduced by Synplify. Section 5 of Table I lists similar
statistics for IP faults.
The final sections (6 and 7) of the table show how well the SSL
tests detect IP faults and the IP tests detect SSL faults, respectively.
As expected, the IP tests cover all the detectable SSL faults; the SSL
fault coverage figures of Sections 4 and 7 are identical. However, the
coverage of IP faults by SSL tests is quite poor, never reaching 90% for
any ALU implementation. For the 64-bit implementations, IP coverage
slightly exceeds 70% for the C-testable design.
In general, the number of IP tests is much higher than the number
of SSL tests except for the C-testable design. This is the case because
the IP tests for each ALU subcircuit were generated manually without
fault dropping. A better approach would have been to generate IP tests
only for the IP faults not detected by SSL tests, but this is difficult to
do manually. In spite of this, the C-testable designs achieve 100% IP
fault coverage using fewer than 200 IP tests.
VI. CONCLUSION
The problem of making a fast ALU based on carry-lookahead easily
testable has been investigated. Its testability has been analyzed for a
fault model that requires functional verification of all the module types
in the ALU. The availability of don't cares in one module type (gp)
was exploited to make the ALU easily testable. Methods for designing
Lin-testable and C-testable ALU’s were presented. The C-testable design is more “testable” than the Lin-testable design at the cost of more
gate overhead, which illustrates an interesting design tradeoff. Both the
Lin-testable and C-testable designs require only one extra test input, independent of the size of the ALU. In addition, we have shown that
we can maintain the tree-like structure of the ALU to preserve its desirable layout and scaling properties.
It should be noted that the design methodology presented in this
work is also applicable to general n-ary carry-lookahead adder trees.
For example, a high-speed, 64-bit, carry-lookahead/carry-select adder
is approximately twice as fast as the 8-bit design of Fig. 1. The speed
increase is due to a couple of factors. First, we use quad-input GP modules to quickly compute the carry signals. Second, we simultaneously
compute 64-bit conditional sums using ripple-carry adders composed
of 2-bit add cells instead of 1-bit full adders. The final sum is then multiplexed to the output under the control of the generated carry signals.
This design is made Lin-testable for all IP faults by altering the gp and
C modules exactly as explained in Section IV using a gate overhead of
only 0.5%.
REFERENCES
[1] T. Sridhar and J. P. Hayes, “Design of easily testable bit-sliced systems,”
IEEE Trans. Comput., vol. C-30, pp. 842–854, Nov. 1981.
[2] A. D. Friedman, “Easily testable iterative systems,” IEEE Trans.
Comput., vol. C-22, pp. 1061–1064, Dec. 1973.
[3] B. Becker, “Efficient testing of optimal time adders,” IEEE Trans.
Comput., vol. 37, pp. 1113–1120, Sept. 1988.
[4] R. D. Blanton and J. P. Hayes, “Properties of the input pattern fault
model,” in Proc. 1997 Int. Conf. Computer Design, Oct. 1997, pp.
372–380.
[5] R. Blanton and J. P. Hayes, “Testability of convergent tree circuits,” IEEE
Trans. Comput., vol. 45, pp. 950–963, Aug. 1996.
[6] W. Cheng and J. H. Patel, “Concurrent error detection in iterative logic
arrays,” in Proc. 14th Int. Symp. Fault-Tolerant Computing, 1984, pp.
10–15.
[7] F. J. Ferguson and J. P. Shen, “The design of easily testable VLSI array
multipliers,” IEEE Trans. Comput., vol. C-33, pp. 554–560, June 1984.
[8] H. Elhuni and L. Kinney, “Techniques for testing hex connected systolic
arrays,” in Proc. 1986 Int. Test Conf., Sept. 1986, pp. 1024–1033.
[9] J. H. Kim, “On the design of easily testable and reconfigurable systolic
arrays,” in Proc. Int. Conf. Systolic Arrays, 1988, pp. 1024–1033.
[10] C. Wu and P. Cappello, “Easily testable iterative logic arrays,” IEEE
Trans. Comput., vol. 39, pp. 640–652, May 1990.
[11] M. Schlag and F. J. Ferguson, “Detection of multiple faults in two-dimensional ILAs,” IEEE Trans. Comput., vol. 45, pp. 741–746, June
1996.
[12] R. D. Blanton, “Design and testing of regular circuits,” Ph.D. dissertation, Univ. Michigan, 1995.
[13] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems
Testing and Testable Design. Piscataway, NJ: IEEE Press, 1990.
[14] W. Maly, “Realistic fault modeling for VLSI testing,” in Proc. Design
Automation Conf., June 1987, pp. 173–180.
[15] W. H. Kautz, “Testing for faults in cellular logic arrays,” in Proc. 8th
Symp. Switching Automata Theory, 1967, pp. 161–174.
[16] R. D. Blanton and J. P. Hayes, “Design of a fast, easily testable ALU,”
in Proc. 14th VLSI Test Symp., Apr. 1996, pp. 9–16.
Path Delay Fault Simulation of Sequential Circuits
Tapan J. Chakraborty, Vishwani D. Agrawal, and Michael L. Bushnell
Abstract—A differential algorithm for concurrent simulation of path
delay faults in sequential circuits is presented. The simulator analyzes
all three conditions, namely, initialization, signal transition propagation
through the path, and fault effect observation at a primary output for
vector pairs and considers the hazard states occurring between vectors.
The main contribution is in methods of propagating signals between time
frames. An optimistic method assumes that all nondestination flip-flops are
not affected by delays. The pessimistic method converts all nondestination
flip-flops with nonsteady values to the unknown state before these values
are propagated beyond the time frame in which a path is activated. A
13-valued algebra is shown to improve the efficiency of fault simulation.
Index Terms—Delay test, fault models, fault simulation, path delay faults,
sequential circuit timing analysis.
I. INTRODUCTION
The present test methodology that is based on the stuck-at fault
model allows no specific consideration of delay testing. It is sometimes
believed that an at-speed application of vectors can detect delay faults.
However, without a delay fault simulator this conjecture cannot be
verified.
Most of the literature on delay testing deals with combinational or
scan-type circuits. Only recently has the problem of delay testing in
sequential circuits received attention. This is evident from publications
that discuss delay test application methods [1], test generation methods
[2], and fault models [3]. We present path delay fault simulation algorithms for two clock application schemes. In the first scheme, a rated
clock is assumed for all vectors and at each vector all three conditions,
namely, initialization, path activation, and fault effect propagation, are
evaluated. This mode of simulation is useful for identifying path delay
faults covered by the conventional tests consisting of the design verification and/or stuck-fault coverage vectors. These tests are applied at
constant clock rate (rated clock) to check the correct functionality of a
design. Separate tests can be generated for the delay faults not covered
by the conventional tests. Such delay tests may use variable clock rate
[3], and for that we use an alternative fault simulation algorithm.
Manuscript received January 20, 1998; revised November 30, 1998 and April
7, 1999. This paper is based on a presentation at the First IEEE Asian Test Symposium.
T. J. Chakraborty is with Bell Labs, Lucent Technologies, Princeton, NJ
08540 USA.
V. D. Agrawal is with Bell Labs, Lucent Technologies, Murray Hill, NJ 07974
USA.
M. L. Bushnell is with the Department of Electrical and Computer Engineering, Rutgers University, New Brunswick, NJ 08855 USA.
Publisher Item Identifier S 1063-8210(00)00762-9.
II. BACKGROUND
Since the simulation models of this paper were first presented [4],
several papers have appeared. Bose et al. [5] have presented a simulation algorithm for sequential circuits using the rated clock. For robust
detection, their optimistic update rule allows the flip-flops to carry the
correct values across time frames, provided those values are produced
by robustly activated paths. Pomeranz and Reddy [6] give similar algorithms. Hsu and Gupta [7] recognized a problem associated with the
rated-clock simulation. They observed that determination of a signal
value in a circuit with a possible delay fault requires that the signal
must remain steady for two or more time frames. They postprocess the
simulation result to ensure that condition. Pointing to a discrepancy
in the optimistic update rule, Heragu [8] has suggested that flip-flop
values be computed two ways: one for fault effect propagation in the
next time frame and the other for their states in the subsequent time
frames.
Recent papers describe two techniques of path delay testing, namely,
the variable-clock method [9] and the rated-clock method [10]. The
rated-clock method uses the normal operating mode of the circuit where
all vectors are applied and flip-flops are clocked at the rated speed. A
problem arises with this method when the rated-clock speed exceeds
the capability of the test equipment [11]. Besides, the test generation
for this mode is highly complex [12]. In the variable-clock mode, vector
application and flip-flop clocking are done at a slower speed, with the
exception of one path activation vector applied at the rated speed. Test
generation is simplified since the slow clock allows the faulty circuit to
behave like a fault-free circuit. The disadvantage, however, is that the
test application time becomes large. For example, to test for all delay
faults activated by a set of 100 vectors, we must apply the vector set
100 times, with the rated clock used for a different single vector in
each application. Even though the variable clock is an artificial mode
of testing, it detects all delay faults detectable in the rated-clock mode
[13].
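The cost of the variable-clock mode in the 100-vector example above can be made explicit with a small sketch. The function names are hypothetical, and the model assumes, as the example states, that each pass applies the whole vector set with exactly one vector at the rated clock.

```python
def variable_clock_passes(n_vectors):
    """Yield, per pass, the clock used for each vector: one 'rated', rest 'slow'."""
    for fast in range(n_vectors):
        yield ["rated" if i == fast else "slow" for i in range(n_vectors)]


def total_vector_applications(n_vectors):
    """Count vector applications over all passes of the variable-clock scheme."""
    return sum(len(clocks) for clocks in variable_clock_passes(n_vectors))


if __name__ == "__main__":
    print(total_vector_applications(100))  # 100 passes x 100 vectors = 10000
```

Under these assumptions the test application time grows quadratically with the number of vectors, whereas a single rated-clock pass applies each vector once.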
III. PATH DELAY TESTING AND FAULT MODELS
Path delay tests must detect any excessive propagation delays in the
combinational paths embedded in a sequential circuit. In general, the
test sequence for a path delay fault consists of the following steps: 1)
initialization; 2) path activation; and 3) fault effect propagation. The
path delay testing method is illustrated using the iterative logic array
model in Fig. 1. Each copy of the circuit represents the combinational
behavior of the circuit during one time frame. A time frame is one clock
period. Thus, Ti is the copy of the combinational logic with signals
corresponding to the ith input vector. The top inputs PI and the outputs
PO at the bottom represent the primary inputs and outputs at each time
frame. The blocks shown as 1, 2, and 3 are the flip-flops that transfer
data from one time frame to the next under the control of a clock signal.