Circuit Modeling and Fault Injection Approach to Predict SEU Rate and
MTTF in Complex Circuits¹
Fabian Vargas, Alexandre Amory
Catholic University – PUCRS
Electrical Engineering Dept.
Av. Ipiranga, 6681. 90619-900 Porto Alegre, Brazil
vargas@computer.org
¹ This work is partially supported by CNPq and FAPERGS.
Abstract
This work presents a novel approach to predict the SEU rate and the mean time to failure (MTTF) of
complex circuits. When compared to traditional in-flux methods, the approach described herein does not
require laboratory experiments to characterize microelectronic devices for operation in radiation environments.
Therefore, due to its simplicity of execution, the proposed approach is intrinsically low-cost. Also,
because it is a fully analytical approach based on a bundle of computer programs, researchers and development
engineers need only a workstation to compute the failure (SEU) rate and estimate the MTTF. We also present a
computation example to illustrate the proposed approach. This methodology is being automated through the
development of a CAD tool that performs circuit modeling, fault injection and simulation data analysis.
Keywords: Failure Rate; MTTF Estimation; VHDL Language; Transient-Fault Injection; Fault Simulation; SEU; Reliable
Complex Circuits.
1. Preliminary Considerations
By the year 1975, the existence of Single-Event Upsets (SEUs) had just been discovered [1], even
though predictions of such phenomena had been made in 1962 [2]. Since 1975, the investigation of single-particle
phenomena has progressed rapidly [3-5]. Extensive theoretical work has been performed to explain failure
mechanisms, and sophisticated test techniques and procedures have been developed to extrapolate laboratory
failure-rate data to realistic or worst-case radiation environments such as space, nuclear power plants, or
commercial flights operating at high altitudes (33,000 feet) [6-10].
Most single-particle phenomena in electronic devices can be characterized by a critical charge, related
to the circuit design, and a cross section, related to the geometry of the area sensitive to upset, latchup, or
burnout. The critical charge Qc of a memory cell is defined as the largest charge that can be deposited in its
sensitive node area before the cell is corrupted, that is, before its logic state is inverted [8]. The cross section of a
device is simply the total number of errors divided by the incident particle fluence on the circuit. This parameter is given
in cm2 and represents, in other words, the memory cell sensitive node area. The sensitive node area of a memory
element corresponds to the interface between the reverse-biased n+ (resp. p+) drain depletion region and the p-substrate (resp. n-well), for the case of an n-well (resp. p-well) technology. This drain depletion region belongs
to the off-transistors of memory cells [4,9].
The SEU characterization process of microelectronic devices has been directed toward measuring a cross
section versus LET (Linear Energy Transfer) curve by means of laboratory experiments. LET represents the amount of
energy that is transferred to silicon when an incident particle strikes the circuit surface and loses energy
through the ionization of the substrate. Therefore, the threshold or critical LET (LETc) of a circuit represents the
amount of deposited energy equal to the critical charge Qc of a memory cell. In other words, any incident
particle that deposits in a sensitive node an energy higher than the one represented by the LETc provokes an
upset on that node. Fig. 1 illustrates results for memory upsets in typical microcircuits [7,23]. Curve A shows
data for a simple RAM which has only one type of upset mechanism. In this case, there is a single value of
threshold Linear Energy Transfer (LET) and device error cross section. Curve B illustrates the results for a
microprocessor where there are several types of bistable circuits, each with its own threshold LET and sensitive
area. In this case, the data will appear as a staircase, with each step representing the addition of a new failure
mode. Characterization of such a type of device is very complex and requires several sets of tests at many values
of LET. The worst-case parameters for such a device are characterized by a limiting cross section, L, and a
minimum threshold or critical LET (i.e., LETc), as shown in fig. 1. The LETc of a device, given in
"MeV.cm2/mg", determines the portion of a particle's energy spectrum to which the device is sensitive. The
integrated flux value over this range times the cross section area of the circuit’s sensitive volume yields the
expected number of state changes in the circuit, or SEU rate, usually reported in “upset/bit.day”.
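Expressed as a formula (a restatement of the relation just described, not an equation given in the original text; here Φ(L) denotes the differential particle flux as a function of LET and σ the device cross section, written in LaTeX notation):

\mathrm{SEU\ rate} \;\approx\; \sigma \int_{LET_c}^{\infty} \Phi(L)\, dL \qquad [\text{upsets/(bit.day)}]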
Fig. 1. Representative data for device error cross section versus particle LET for two types of microcircuits [7,23].
Laboratory experiments are typically performed using the in-flux test method [23]. The radiation source
is usually a high-energy particle accelerator, such as a cyclotron, which can be operated with a variety of ion
species. In practice, only one ion species is used during the laboratory experiment, due to the high cost of
changing ion sources in the accelerator. The irradiations are performed with the device
under test in an evacuated chamber, with the device package lid removed. The test socket is mounted on a
platform which can be rotated so that the angle of incidence between the ion beam and chip surface can be
changed. The circuit is electrically exercised by a tester connected to the test socket through a set of cables and
special connectors to the vacuum chamber. Although this experimental approach provides very accurate
SEU rate predictions, it presents important drawbacks. The most important of them is its high cost,
since just two or three hours of cyclotron time may cost tens of thousands of dollars. In addition, the use of
this type of equipment requires the development of specific hardware (and software) interfaces, which takes
money and time during the design process. Finally, the parameter "time-to-market" is drastically affected,
because the use of this type of equipment requires the development of rigorous test sets, which
involve long procedures to be validated before the device characterization step itself takes place. For detailed
information about the SEU test equipment and related procedures, the reader should refer to references [9,23].
Fig. 2 compares the proposed approach with the in-flux test method, commonly used so far.
In [11,12] the authors present a tool (FT-PRO) that automatically manipulates a VHDL description by
appending fault-tolerant functions [13-16]. These functions are based on information redundancy by means of
two types of coding techniques: a) parity code (one bit per memory element) and b) Hamming code plus one
parity bit, to perform single error correction/double error detection per memory element [17]. At present, FT-PRO
is being modified to be incorporated into the design flow shown in fig. 2b (more precisely, in the "Circuit
Design" step), where configurable fault-tolerant functions have been appended to the "VHDL Design
Libraries" [11].
Therefore, the determination of the SEU rate for complex circuits, such as microprocessors, is a
very complex, time- and money-consuming step at the end of the design process. In an attempt to minimize this
problem, the present work proposes a novel approach to predict the error (SEU) rate and the mean time to
failure (MTTF) for such circuits. This is an analytical approach. When compared to traditional in-flux
methods, the approach described herein does not require laboratory experiments to characterize microelectronic
devices for operation in radiation environments. Therefore, due to its simplicity of execution, the proposed
approach is intrinsically low-cost. Also, because it is an analytical approach based on a bundle of
computer programs, researchers and development engineers need only a workstation to obtain the failure rate
and estimate the MTTF at their own work site.
It is also important to mention that several significant works, with different degrees of success, have
been proposed in the literature to perform fault modeling and fault injection [18,19], and to automate the fault
simulation process as well [20-22]. The most important difference between the proposed approach and those
found in the literature is that the work proposed herein is the first to present not only fault injection
mechanisms adapted to circuits modeled in the VHDL language, but also a fault modeling
strategy that accurately represents real radiation-induced transient faults (i.e., SEUs) in memory elements of complex
circuits. Additionally, the proposed work is being automated through its coupling with the FT-PRO tool
methodology.
Fig. 2. Comparison between the design flows of devices for operation in radiation environments: (a) the traditional in-flux
method [11,12] and (b) the proposed approach.
2. Circuit Modeling and Fault Injection Approach
It is widely agreed that high-level description languages are extensively used to describe
hardware parts as software programs. Consequently, a transient fault that affects the hardware operation can be
considered as a fault affecting the software execution. In other words, a bit-flip fault affecting the hardware
operation (e.g., an SEU-induced fault in a memory element) can have an equivalent representation at the
software implementation level. In this section, we present the circuit modeling & fault injection techniques we
have developed to produce transient faults in memory elements during VHDL fault simulation. The fault model
assumed is not restricted to single faults; thus, any combination of faults can occur in a memory element or
group of memory elements of the circuit.
The circuit modeling & fault injection strategy consists in preparing the VHDL code to run in a fault
simulation process. As can be seen in fig. 2, the starting point is a synthesizable VHDL description of the circuit
whose reliability with respect to transient faults in memory elements is to be estimated.
As the first step, the circuit modeling & fault injection strategy instantiates an "Error Management Unit - EMU"
inside the architecture of the circuit VHDL main code. The goal of this unit is to control the whole fault
injection process during fault simulation. To do so, this unit (see also the illustrative sketch after the list below):
a) reads data from an external file: randtime.txt (which was generated by the Srand Function, to be
detailed later), in order to obtain the time instants to inject faults in the circuit;
b) reads data from the external file: randtime.txt to get the initial seeds for the LFSR processes that
generate the register address and the bit position where faults will be injected. These LFSR processes are
instantiated as Components inside the architecture of the Error Management Unit, and will be detailed later in this
section. The LFSR processes are completely controlled by the EMU.
c) generates a simulation report file: result.txt, which contains information about the total number of
faults injected, the list of memory elements and bit positions affected by faults, and the number of faults injected
in each one of these elements.
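For illustration purposes only, the file handling performed by items (a) and (c) above can be sketched in VHDL with the standard TEXTIO package roughly as follows. This is a minimal sketch, not the authors' actual EMU code (a commented version of which is available at the URL given in Section 5); the entity, port and variable names are hypothetical, and the LFSR seeding of item (b) is omitted:

library IEEE;
use IEEE.std_logic_1164.all;
use STD.textio.all;

entity EMU_sketch is
  port ( bitflip_en : out std_logic );            -- pulse that triggers the "write upset" operation
end EMU_sketch;

architecture sketch of EMU_sketch is
  file RandomTimeFile : TEXT open READ_MODE  is "randtime.txt";
  file ResultFile     : TEXT open WRITE_MODE is "result.txt";
begin
  inject: process
    variable li, lo : line;
    variable inst   : time;
  begin
    bitflip_en <= '0';
    while not endfile(RandomTimeFile) loop
      readline(RandomTimeFile, li);               -- one injection instant per line, in increasing order
      read(li, inst);
      wait for inst - now;                        -- wait until the injection instant is reached
      bitflip_en <= '1', '0' after 1 ns;          -- short asynchronous injection pulse
      write(lo, string'("fault injected at "));
      write(lo, now);
      writeline(ResultFile, lo);                  -- log the injection into the report file
    end loop;
    wait;                                         -- no more injection instants
  end process;
end sketch;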
The EMU has been implemented as a separate piece of code and then instantiated as a Component
inside the architecture of the circuit VHDL main code [13,14], as can be seen in fig. 3. Of course, it must be
kept in mind that these modifications performed on the circuit VHDL code have the unique purpose of
allowing fault simulation. Thus, before synthesis, they are completely removed from the main VHDL code.
The main structures of the skeleton-based VHDL code can be detailed as follows:
- Lines 4 - 6 and 8 - 23 describe the Entity and the Architecture of the circuit_example, which will be
modeled to run in a fault simulation process.
- Lines 28 - 30 and 32 - 55 describe the Entity and the Architecture of the Error Management Unit, that
will control the fault simulation process and provide the user with a final simulation data report.
- In lines 11 - 13, the Component Error Management Unit is declared and then instantiated inside the
architecture of the circuit_example (lines 19 and 20).
- In lines 35 - 37, the Component LFSR is declared and then, instantiated inside the architecture of the
Error Management Unit as an address selector of the memory element that will be upset (lines 47 and 48). The
same LFSR is instantiated again in lines 51 and 52 as a selector of the bit position inside the memory element
that will be inverted.
- In lines 40 and 41, the files randtime.txt and result.txt are created on the computer hard disk. The
first file is read-only and contains a list of time instants generated by the Srand function at the operating
system level. The EMU uses this list to control the time instants when faults will be injected during the fault
simulation process. During this process, the EMU writes data into the result.txt file. These data concern the
address of all registers and bit positions selected for fault injection during simulation, and the history of the
injected faults, per register address and bit position, that were detected and corrected, only detected, or not
detected. This information will be used later (as described in Section 3) in order to calculate the cross section
and the failure rate for the circuit under validation.
Line  Code Structure
 1    library IEEE;
 2    use IEEE.std_logic_1164.all;
 3
 4    Entity circuit_example is
 5       port ( ... );
 6    End circuit_example;
 7
 8    Architecture arch of circuit_example is
 9       |
10       |
11       Component ErrorManagementUnit
12          port ( ... );
13       End Component;
14       |
15       |
16    Begin
17       |
18       |
19       EMU: ErrorManagementUnit
20          port map ( ... );
21       |
22       |
23    End arch;
24    ---------------------------------------------------------------------------
25    library IEEE;
26    use IEEE.std_logic_1164.all;
27
28    Entity ErrorManagementUnit is
29       port ( ... );
30    End ErrorManagementUnit;
31
32    Architecture arch_EMU of ErrorManagementUnit is
33       |
34       |
35       Component LFSR is
36          port ( ... );
37       End Component;
38       |
39       |
40       File RandomTimeFile : TEXT open READ_MODE is "randtime.txt";
41       File ResultFile : TEXT open WRITE_MODE is "result.txt";
42       |
43       |
44    Begin
45       |
46       |
47       LFSR_Reg_Selector: LFSR
48          port map ( ... );
49       |
50       |
51       LFSR_Bit_Selector: LFSR
52          port map ( ... );
53       |
54       |
55    End arch_EMU;
Fig. 3. Skeleton of the VHDL code generated by the "circuit modeling & fault injection" strategy. This skeleton-based VHDL
code is meant to run in a fault simulation environment. One of the main characteristics of this proposal is the easy automation of the
procedure by which the skeleton can be generated from a synthesizable VHDL circuit description.
The two entities (LFSR_Reg_Selector and LFSR_Bit_Selector) instantiated as Components into the
architecture of the circuit under design (see fig. 3) are Linear Feedback Shift Registers (LFSRs). The functions
of these structures are described as follows:
a) the goal of the LFSR_Reg_Selector is to select the memory element into which the transient fault will be
injected. Note that the LFSR is implemented by a primitive polynomial in order to generate all 2^n - 1 possible
combinations, where n is the degree of the polynomial and is obtained as follows:
n = ⌈log2 R⌉
where R is the number of memory elements susceptible to be upset.
b) the second LFSR, LFSR_Bit_Selector, aims to select the bit position in the memory element that
will be upset. This LFSR is also implemented by a primitive polynomial, but in this case the polynomial degree
is determined by the number of bits that compose the memory element. Numerically, this parameter is defined as:
n' = ⌈log2 R'⌉
where R' is the number of bits in the memory element. Note that for registers with different bit lengths, different
polynomial degrees n' must be selected. For example, two different LFSR entities are required to perform fault
simulation in a circuit that contains a bank of 16 32-bit registers and a FIFO (First-In First-Out) control buffer
that stores 4 8-bit instructions. In this case, the degree of the first polynomial will be 5, while that of the second will be 3.
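As an illustration (a minimal sketch under the assumptions above, not the authors' actual LFSR entity; port names are hypothetical), a degree-5 LFSR of this kind, built from the primitive polynomial x^5 + x^2 + 1, could be written as follows. In the methodology, its output would be sampled by the EMU as a register address (LFSR_Reg_Selector) or bit index (LFSR_Bit_Selector) at each injection instant:

library IEEE;
use IEEE.std_logic_1164.all;

entity LFSR is
  port ( clk   : in  std_logic;
         reset : in  std_logic;
         seed  : in  std_logic_vector(4 downto 0);   -- initial seed provided by the EMU (must be nonzero)
         q     : out std_logic_vector(4 downto 0) ); -- current pseudo-random value
end LFSR;

architecture rtl of LFSR is
  signal reg : std_logic_vector(4 downto 0);
begin
  process (clk, reset)
  begin
    if reset = '1' then
      reg <= seed;                                   -- load the seed read from randtime.txt
    elsif rising_edge(clk) then
      -- feedback taps for the primitive polynomial x^5 + x^2 + 1 (maximal-length sequence of 31 states)
      reg <= reg(3 downto 0) & (reg(4) xor reg(1));
    end if;
  end process;
  q <= reg;
end rtl;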
After including the two LFSR entities above in the architecture of the circuit VHDL main code, we use
the standard Srand function of ANSI C to randomly generate the time instants at which faults will
be injected in the memory elements during the simulation process. The only parameter that must be defined
in advance is the overall fault simulation runtime. This function also provides the initial seeds used by the EMU to
initialize the LFSR_Reg_Selector and LFSR_Bit_Selector components during simulation.
Once the fault injection time instants have all been written by the Srand function into the randtime.txt
file, the simulation can properly start. As these time instants are reached during the simulation procedure,
both LFSR entities (which run in parallel with the application) are requested to provide the register
address and the bit position that they have computed at that moment, so that the fault can be injected.
In addition to the inclusion of the EMU inside the architecture of the circuit VHDL code to be simulated, the
circuit modeling & fault injection strategy also modifies the registers declared in the circuit VHDL
code in order to allow a "write upset" operation into these elements. This special write operation performs, in
other words, the fault injection during circuit simulation. Note that in order to mimic real upsets, this write
operation must be asynchronous with respect to the circuit clock signal. Also, if the memory elements to be
upset have some kind of error detection and/or correction (EDAC) code associated with them, the write operation
must be performed "around" the EDAC control. In other words, the fault injection mechanism must be able to
write an upset into the memory element without triggering the EDAC logic to operate on the memory element
(because every time one writes into these elements, a new parity is generated by the EDAC control logic). With
this goal in mind, the circuit modeling & fault injection strategy modifies the registers declared in the
VHDL code by adding two new input ports to the entity that defines a memory element. These two ports are:
bitflip_en and bit_ind. While the goal of the first input is to act as an enable for the write operation, the
second input indicates the bit position where the write (i.e., the upset) will be performed.
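A minimal sketch of such a modified register is given below, assuming a 16-bit register. Only the two port names taken from the text (bitflip_en and bit_ind) are from the paper; the remaining names are hypothetical, and the code is simulation-only, as explained in the next paragraph:

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity reg16_upset is
  port ( clk        : in  std_logic;
         we         : in  std_logic;                      -- normal (synchronous) write enable
         d          : in  std_logic_vector(15 downto 0);
         bitflip_en : in  std_logic;                      -- enables the "write upset" operation
         bit_ind    : in  std_logic_vector(3 downto 0);   -- bit position to be inverted
         q          : out std_logic_vector(15 downto 0) );
end reg16_upset;

architecture sim of reg16_upset is
  signal r : std_logic_vector(15 downto 0) := (others => '0');
begin
  process (clk, bitflip_en)
  begin
    if bitflip_en = '1' then
      -- asynchronous bit-flip: bypasses the normal write path (and hence any EDAC encoding);
      -- the EMU asserts bitflip_en as a short pulse, so the flip happens once
      r(to_integer(unsigned(bit_ind))) <= not r(to_integer(unsigned(bit_ind)));
    elsif rising_edge(clk) then
      if we = '1' then
        r <= d;                                           -- normal synchronous write
      end if;
    end if;
  end process;
  q <= r;
end sim;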
Note that one must keep in mind that all the above modifications performed on the VHDL code
have the unique purpose of allowing fault simulation. Thus, before synthesis, they are completely removed
from the VHDL code of the simulated circuit.
3. Development of the Statistical Models
In this section, we present the models used to analytically estimate, first, the error (SEU) rate and, then,
the MTTF for the circuit under design. It is important to remember that all these models are directly applied to
the VHDL netlist describing the circuit under design. This means that this approach is completely independent
of the circuit implementation.
3.1. SEU Rate
Having in mind the goal of this section, which is the SEU rate estimation, hereafter we redefine the
cross section parameter according to variables different from those traditionally used and obtained by means of
complex laboratory experiments, as mentioned in Section 1. Therefore, the cross section of a circuit, which has
been defined as the total number of errors divided by the incident particle fluence on the circuit (given in cm2)
and which represents, in other words, the memory cell sensitive node area, is given in this work as follows:

σ = N / (Φ1 · R), given in [(errors.device)/(particles.bit)]

where:
N: number of functional errors [errors]
Φ1: incident particle fluence, i.e., the total number of faults (bit-flips) injected during the fault simulation procedure [faults/device]
R: number of SEU-susceptible memory cells [bits]

After determining the cross section parameter, i.e., the SEU susceptibility of the circuit under design,
we are able to estimate the SEU rate for such a circuit as:

λ = σ · Φ2, given in [errors/(bit.s)]

where:
σ: cross section of the circuit under test [(errors.device)/(faults.bit)]
Φ2: real-environment incident particle flux, i.e., the frequency with which faults (bit-flips) are injected in the circuit in the real environment [faults/(device.second)]
3.2. MTTF Estimation
In the following, we present the reliability model developed to estimate the mean time to failure
(MTTF) of a complex circuit composed of different groups of memory elements. By "group of
memory elements", the reader can understand, for instance, a bank of general-purpose registers, a bank
of specific control-flow registers, or even an embedded cache memory in a circuit like a microprocessor.
Initially, assume that the probability of an upset in a single bit of a memory element, after a time interval Δt, can be
obtained by using the following relation [6,17]:

P1,1 = 1 - e^(-λ·Δt)                                              (I)

(where e^(-λ·Δt) is the survival probability of a single bit over Δt, and λ is the SEU rate obtained in Section 3.1)

Assuming that the probability P1,1 is the same for each bit, and that the occurrence of an error in a bit is
independent of the occurrence of errors in any of the other bits, the probability of r errors in n bits is given by
the binomial distribution:

Pr,n = Cn,r · P1,1^r · (1 - P1,1)^(n-r)                           (II)

(where Cn,r denotes the number of combinations of r errors in n bits)

Initially, the word is error-free. After a time interval Δt, the probability that the word is still correct is:

R1(Δt) = 1 - Pr,n , with r = d + 1                                (III)

(where d is the number of errors which can always be detected)

After N intervals of Δt, the probability that the word is correct is:

R1(N·Δt) = [1 - Pr,n]^N                                           (IV)

Assuming independence of the words, the reliability of a group of memory elements of W words after
a time N·Δt is:

Rw(N·Δt) = [1 - Pr,n]^(N·W)                                       (V)
The expected life of the group of memory elements, which is referred to as the mean time to failure (MTTF),
is defined by the equation [17]:

MTTF = ∫(from 0 to ∞) Rw dt                                       (VI)

(where Rw is the reliability of the group of memory elements)

Noting that N = t/Δt, then:

Rw(N·Δt) = {[1 - Pr,n]^(W/Δt)}^t                                  (VII)

which, when substituted into (VI), yields the solution:

MTTF = -Δt / (W · ln[1 - Pr,n])                                   (VIII)
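For completeness, the step from (VI) and (VII) to (VIII), which is not spelled out in the text, is a direct integration of an exponential (written here in LaTeX notation):

\mathrm{MTTF} = \int_0^{\infty} \left[(1 - P_{r,n})^{W/\Delta t}\right]^{t} dt
             = \int_0^{\infty} e^{\, t \cdot (W/\Delta t)\,\ln(1 - P_{r,n})}\, dt
             = \frac{-\Delta t}{W\,\ln(1 - P_{r,n})}

since ln(1 - Pr,n) < 0, the exponential vanishes at the upper integration limit.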
Note that the probabilistic model shown above for the MTTF calculation considers a single group of W n-bit
memory elements, each of them with at most r errors in an N·Δt period of time. Similarly, for
another group of registers, say W1 memory elements of n1-bit length, the same calculation given in (VIII)
must be performed:

MTTF1 = -Δt / (W1 · ln[1 - Pr,n1])                                (IX)

Therefore, for a given complex circuit containing n such groups of memory elements, the final MTTF
computation (MTTFCC) for the complex circuit is the smallest value among those computed for the
individual groups of registers:

MTTFCC = MIN (MTTF1 , MTTF2 , MTTF3 , ... , MTTFn)                (X)
4. Discussions & Example of Computation
This section presents a computation example that we have developed to illustrate the proposed
methodology. For this purpose, we have used the initial VHDL description of a simple microprocessor, namely
R3 [24] (see table 1 for details), which was modified according to the FT-PRO methodology [11,25]. This
modification yielded 3 different fault-tolerant (FT) implementations of the R3 processor, all of them based on
the use of information redundancy (in the form of Hamming code plus 1 parity bit per memory element) to
protect the memory elements:
a) distributed only inside the datapath (version 1);
b) distributed only inside the control path (version 2);
c) distributed inside both parts of the processor: data and control (version 3).
General Characteristics                  |             | Remarks
Type of architecture                     | Von Neumann | Multiplexed instruction/data bus and memory
Number and type of instructions          | 27          | 5 (Branch), 5 (Logic), 4 (Arithmetic), 5 (Mem. Access), 8 (Others); 16-bit instructions
Data/Instruction cache                   | No          | ---
Number of registers in the datapath      | 15          | 16-bit registers (General Purpose)
Number of registers in the control path  | 2           | 16-bit registers (Program Counter - PC, Instruction Register - IR)
Type of register bank                    | Dual Port   | A Single-Port version was also implemented
Number of flags                          | 4           | Carry-out, Overflow, Negative, Zero
Data format                              | 16 bits     | ---
Table 1. Details of the original R3 microprocessor architecture [24], before the implementation of the 3 fault-tolerant versions.
Note that the reliability (in terms of faults detected and/or corrected) is not only a function of which
memory elements have been protected with EDAC codes, but also a function of the application itself. Since
memory elements are checked only when they are used (i.e., read out) by the application, it may happen that,
after a given period out of use, a memory element becomes corrupted with more errors than can be handled by
the EDAC code associated with it.
Assume that this processor is intended to operate in a commercial aircraft that will fly at 33,000 feet of
altitude for a period of 10 hours. According to [10], at this altitude the high-energy
particles are represented mainly by neutrons, whose energy reaches up to 100 MeV, with a flux of up to 10
particles/(cm2.hour). (Note that if a particle with such an energy strikes the sensitive node of a memory element
designed in one of the present state-of-the-art submicron technologies, it is able to produce an upset at that
node.) Therefore, to obtain the frequency with which faults (bit-flips) are injected in the FT versions of the R3
processor in this real environment, we assume the "worst case" parameter, i.e., 10 particles incident on the
circuit per hour of operation, all of them producing upsets.
We have also defined as the application program for the processor a code whose goal is to sort data stored in
a memory array by inverting their positions according to a given order. This application program was run on the
3 different FT versions of the processor, one at a time.
Also, the program used to generate the random time instants for fault injection (Srand) generated a total
of 239 time instants for a pre-specified fault simulation time of the VHDL code equal to 4 hours. The
complete execution of this VHDL code fault simulation yielded the results summarized in table 2.
Version of the processor                                                | Number of faults detected | Number of faults corrected | Number of faults escaping detection (functional errors)
1 - FT memory elements implemented only in the datapath                | 201 | 189 | 38
2 - FT memory elements implemented only in the control part            | 28  | 25  | 211
3 - FT memory elements implemented in both parts: datapath and control | 229 | 217 | 10
Table 2. Fault simulation summary for the 3 FT versions of the R3 processor described in VHDL [11,25].
Then, from table 2, it is possible to compute the SEU cross section for the processor of this example as
follows (where Φ1 = 239 injected faults and R = 17 registers × 16 bits = 272 bits):

σ1 = N/(Φ1·R) = 38/(239 × 272) = 5.85 × 10^-4 [(errors.device)/(particles.bit)]      (version 1)
σ2 = N/(Φ1·R) = 211/(239 × 272) = 32.46 × 10^-4 [(errors.device)/(particles.bit)]    (version 2)
σ3 = N/(Φ1·R) = 10/(239 × 272) = 1.54 × 10^-4 [(errors.device)/(particles.bit)]      (version 3)

And the error (SEU) rate can be obtained as follows:

λ1 = σ1·Φ2 = 5.85 × 10^-4 × 10 = 5.85 × 10^-3 [errors/(bit.hour)]      (version 1)
λ2 = σ2·Φ2 = 32.46 × 10^-4 × 10 = 32.46 × 10^-3 [errors/(bit.hour)]    (version 2)
λ3 = σ3·Φ2 = 1.54 × 10^-4 × 10 = 1.54 × 10^-3 [errors/(bit.hour)]      (version 3)
Now, considering equations (I) to (X) presented in Section 3.2, we can estimate the MTTF for this circuit
as follows:
Version 1:
P1,1 = 1 - e^(-λ1·Δt) = 0.0568    (with Δt = 10 hours, the flight duration)
Pr,n = Cn,r · P1,1^r · (1 - P1,1)^(n-r) = 0.04798 (for P3,16) ; 0.3780 (for P1,16)

MTTF for the 15 16-bit FT memory elements in the datapath:
MTTF(for P3,16) = -Δt / (W·ln[1 - Pr,n]) = -10 / (15 · ln(1 - 0.04798)) = 13.55 hours

MTTF for the 2 16-bit (non-FT) memory elements in the control part:
MTTF(for P1,16) = -Δt / (W·ln[1 - Pr,n]) = -10 / (2 · ln(1 - 0.3780)) = 10.53 hours

Version 2:
P1,1 = 1 - e^(-λ2·Δt) = 0.277
Pr,n = Cn,r · P1,1^r · (1 - P1,1)^(n-r) = 0.1756 (for P3,16) ; 0.0341 (for P1,16)

MTTF for the 15 16-bit (non-FT) memory elements in the datapath:
MTTF(for P1,16) = -Δt / (W·ln[1 - Pr,n]) = -10 / (15 · ln(1 - 0.0341)) = 19.21 hours

MTTF for the 2 16-bit FT memory elements in the control part:
MTTF(for P3,16) = -Δt / (W·ln[1 - Pr,n]) = -10 / (2 · ln(1 - 0.1756)) = 25.89 hours
Version 3:
P1,1 = 1 - e^(-λ3·Δt) = 0.01528
Pr,n = Cn,r · P1,1^r · (1 - P1,1)^(n-r) = 0.00164 (for P3,16)

MTTF for the 17 16-bit FT memory elements (15 in the datapath plus 2 in the control part):
MTTF(for P3,16) = -Δt / (W·ln[1 - Pr,n]) = -10 / (17 · ln(1 - 0.00164)) = 358.38 hours
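As a cross-check (this intermediate arithmetic is not shown in the text, but follows directly from equation (II)), the value of P3,16 used for version 1 can be reproduced as:

P_{3,16} = \binom{16}{3}\,(0.0568)^3\,(1 - 0.0568)^{13} \approx 560 \times 1.83 \times 10^{-4} \times 0.468 \approx 0.048

which matches the 0.04798 used above.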
To conclude, table 3 summarizes the MTTF computations for the 3 FT versions of the R3 processor.
FT Version for the R3 Processor | MTTF estimated (hours)
1                               | 10.53
2                               | 19.21
3                               | 358.38
Table 3. Summary of the MTTF computations for the 3 FT versions of the R3 processor.
5. Final Considerations
This work presented a novel approach, based on a VHDL description, to predict the SEU rate and the
mean time to failure (MTTF) of complex circuits. When compared to traditional in-flux methods, the approach
described herein does not require laboratory experiments to characterize microelectronic devices for operation in
radiation environments. Therefore, due to its simplicity of execution, the proposed approach is intrinsically
low-cost. Also, because it is a fully analytical approach based on a bundle of computer programs,
researchers and development engineers need only a workstation to compute the failure (SEU) rate and estimate
the MTTF.
The core of the proposed methodology is the Error Management Unit - EMU, described in VHDL as an
entity that can be automatically parameterized by the user. This code is then instantiated as a Component inside
the architecture of the circuit whose reliability is to be estimated. We also presented a computation example to
illustrate the proposed approach. This methodology is being automated through the development of a CAD tool
that performs circuit modeling, fault injection and simulation data analysis.
For those interested in more details about the EMU VHDL code, a commented version of this program
(emu.vhd) can be retrieved from the following URL: www.ee.pucrs.br/~vargas/Programs .
References
[1] Binder, D.; Smith, C.; Holman, A. B. Satellite Anomalies from Galactic Cosmic Rays. IEEE Transactions on Nuclear Science, Vol. NS-22, No. 6, Dec. 1975, pp. 2675-2680.
[2] Wallmark, J. T.; Marcus, S. M. Minimum Size and Maximum Packing Density of Nonredundant Semiconductor Devices. Proceedings of the IRE, Mar. 1962, p. 286.
[3] Velazco, R.; Cheynet, P.; Muller, J-D.; Ecoffet, R. Artificial Neural Network Robustness for On-Board Satellite Image Processing: Results of SEU Simulations and Ground Tests. IEEE Transactions on Nuclear Science, Part I, Vol. 44, 1997, pp. 2337-2344.
[4] Srour, J. R.; McGarrity, J. M. Radiation Effects on Microelectronics in Space. Proceedings of the IEEE, Vol. 76, No. 11, Nov. 1988.
[5] Calin, T.; Vargas, F.; Nicolaidis, M.; Velazco, R. A Low-Cost, Highly Reliable SEU-Tolerant SRAM: Prototype and Test Results. IEEE Transactions on Nuclear Science, Dec. 1995.
[6] Browning, J. S.; Koga, R.; Kolasinski, W. A. Single Event Upset Rate Estimates for a 16-K CMOS SRAM. IEEE Transactions on Nuclear Science, Vol. NS-32, No. 6, Dec. 1985.
[7] Turflinger, T. L.; Davey, M. V. Understanding Single Event Phenomena in Complex Analog and Digital Integrated Circuits. IEEE Transactions on Nuclear Science, Vol. NS-37, No. 6, Dec. 1990.
[8] Vargas, F. L.; Nicolaidis, M. SEU-Tolerant SRAM Design Based on Current Monitoring. 24th International Symposium on Fault-Tolerant Computing - FTCS-24, Austin, Texas, USA, Jun. 1994, pp. 106-115.
[9] Kerns, E. S.; Shafer, B. D. (eds.). The Design of Radiation-Hardened ICs for Space: A Compendium of Approaches. Proceedings of the IEEE, Vol. 76, No. 11, Nov. 1988.
[10] Normand, E.; Baker, T. J. Altitude and Latitude Variations in Avionics SEU and Atmospheric Neutron Flux. IEEE Transactions on Nuclear Science, Vol. 40, No. 6, Dec. 1993, pp. 1484-1490.
[11] Vargas, F.; Amory, A. Transient-Fault Tolerant VHDL Descriptions: A Case-Study for Area Overhead Analysis. 9th IEEE Asian Test Symposium - ATS'00, Taipei, Taiwan, Dec. 04-06, 2000.
[12] Vargas, F.; Bezerra, E.; Terroso, A. Testability Verification of Embedded Systems Based on Weak Mutation Analysis. 3rd IEEE International Workshop on Testing Embedded Core-Based System-Chips - TECS'99, Dana Point, CA, USA, Apr. 28-29, 1999, pp. 31-37.
[13] Lipsett, R.; Schaefer, C.; Ussery, C. VHDL: Hardware Description and Design. Kluwer Academic Publishers, 1992. 299 p.
[14] IEEE Standard VHDL Language Reference Manual; IEEE Std 1076-1987. 1988.
[15] Aho, A. V.; Sethi, R.; Ullman, J. D. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1988. 796 p.
[16] Levine, J. R.; Mason, T.; Brown, D. Lex & Yacc. Cambridge: O'Reilly, 1998. 364 p.
[17] Pradhan, D. K. Fault-Tolerant Computer System Design. Prentice-Hall, 1996. 544 p.
[18] Arlat, J.; Aguera, M.; Amat, L.; Crouzet, Y.; Fabre, J.-C.; Laprie, J.-C.; Martins, E.; Powell, D. Fault Injection for Dependability Validation: A Methodology and Some Applications. IEEE Transactions on Software Engineering, Vol. 16, No. 2, Feb. 1990, pp. 166-182.
[19] Karlsson, J.; Lidén, P.; Dahlgren, P.; Johansson, R.; Gunneflo, U. Using Heavy-Ion Radiation to Validate Fault-Handling Mechanisms. IEEE Micro, Vol. 14, No. 1, 1994, pp. 8-23.
[20] Kanawati, G. A.; Kanawati, N. A.; Abraham, J. A. FERRARI: A Flexible Software-Based Fault and Error Injection System. IEEE Transactions on Computers, Vol. 44, No. 2, Feb. 1995, pp. 148-160.
[21] Sieh, V.; Tschäche, O.; Balbach, B. VERIFY: Evaluation of Reliability Using VHDL-Models with Embedded Fault Descriptions. 27th International Symposium on Fault-Tolerant Computing - FTCS'97.
[22] Goswami, K. K.; Iyer, R. K.; Young, L. T. DEPEND: A Simulation-Based Environment for System Level Dependability Analysis. IEEE Transactions on Computers, Vol. 46, No. 1, Jan. 1997, pp. 60-74.
[23] Pease, R. L.; Johnston, A. H.; Azarewicz, J. L. Radiation Testing of Semiconductor Devices for Space Electronics. Proceedings of the IEEE, Vol. 76, No. 11, Nov. 1988.
[24] www.inf.pucrs.br/~gaph
[25] www.ee.pucrs.br/~sisc