Software Faults and Software Reliability

Software Faults and Fault Injection Models
Submitted to Dr. Box
As a part of coursework for CSC532
Advanced Software Engineering
--Raviteja Varanasi
Abstract:
Software faults can be introduced at any time, in any phase of software development.
This paper explores software faults from the perspective of software reliability.
Complex software faults occurring in various systems are studied and classified
based on their behavior. Various software fault injection and detection models are
studied, and their behavior is summarized. Some methods for avoiding and detecting
software faults are summarized, and methods for mitigating software faults that cannot
be avoided are discussed.
Introduction:
While rapid advances in computing hardware have led to powerful, multi-gigahertz
processors, advances in software reliability have not kept pace with this progress.[1]
Software bugs remain frequent, despite increasing requirements that software be
reliable. Nonstop systems have stringent uptime requirements: they must keep running
even in the face of hardware or software errors, and they may need to be monitored,
debugged and patched on the fly.[2] While software crashes are problematic enough,
undetected errors that silently compromise the results of a computation are perhaps
more dangerous.
A failure in a computer-based system that controls critical applications may lead to
significant economic losses or even the loss of human lives. The causes of failures in
computer-based systems are manifold: physical faults, maintenance errors, design and
implementation mistakes resulting in hardware or software defects, and user or operator
mistakes. [3] These kinds of faults are all undesired circumstances that hinder the system
from delivering the expected service. There are two complementary ways to ensure that a
system delivers the expected service: fault prevention, i.e. avoid the introduction of
faults; and fault tolerance, i.e. ensure that the system delivers its service despite the
presence of faults. [4] A fault-tolerant system should tolerate both hardware and software
faults, as both categories can have a great impact on it. Furthermore, before a
fault-tolerant system is deployed in critical applications, confidence in its ability to
tolerate faults must be established.
One attractive approach to gaining confidence in a fault-tolerant system’s capability is fault
injection.[5] Fault injection can be used to study the effects of both hardware and software
faults. However, in both the academic community and industry, most fault injection
studies have targeted the effects of physical hardware faults. Only a few studies have
addressed software faults, because knowledge of the software faults experienced by
systems in the field is limited. As a result, it is difficult to define realistic sets of
faults to inject, which is crucial if a fault injection experiment is intended to quantify a
system’s fault tolerance. Consequently, more research is needed in the fault injection area,
especially studies targeting software faults and the errors they induce.
This term paper contributes towards fulfilling this need by investigating models of
software faults and models of errors induced by software faults. Models and techniques
for emulating representative software faults are also studied and analyzed.
Taxonomy of Software Faults:
Faults that affect software execution include hardware faults that lead to software errors
(hardware-induced software errors) and software faults (software design/implementation
faults) [1]. Faults can be classified into physical faults, design faults and interaction
faults.
Faults
• Physical faults (hardware faults)
    • Memory faults
    • CPU faults
    • Bus faults
    • I/O faults
• Design faults (software faults)
    • Initialization faults
    • Assignment faults
    • Condition check faults
    • Function faults
    • Documentation faults
• Interaction faults
    • Faults induced by the user
Physical faults are hardware faults, which may occur in any part of a computer
system. Some hardware faults affect program execution directly and thereby affect the
software.[6] The errors they cause are called hardware-induced software errors. Hardware
faults are classified into memory, CPU, bus and I/O faults.
Memory faults are those that corrupt the contents of a particular memory location. They
can occur in text segments and data segments.
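In software-implemented fault injection, a memory fault of this kind is commonly emulated by flipping a single bit at a chosen address. The sketch below assumes a toy bytearray standing in for a data segment; the helper names flip_bit and inject_random_memory_fault are hypothetical, and a real injector would target the memory of a running process rather than a Python buffer.

```python
import random
from typing import Tuple


def flip_bit(memory: bytearray, address: int, bit: int) -> None:
    """Emulate a single-bit memory fault by inverting one bit of one byte."""
    if not 0 <= bit < 8:
        raise ValueError("bit must be in the range 0..7")
    memory[address] ^= 1 << bit


def inject_random_memory_fault(memory: bytearray, rng: random.Random) -> Tuple[int, int]:
    """Flip one randomly chosen bit and return (address, bit) for logging."""
    address = rng.randrange(len(memory))
    bit = rng.randrange(8)
    flip_bit(memory, address, bit)
    return address, bit


# Hypothetical data segment: four little-endian 16-bit counters, each equal to 1.
data_segment = bytearray([1, 0] * 4)
flip_bit(data_segment, 2, 3)  # corrupt the low byte of the second counter
counters = [int.from_bytes(data_segment[i:i + 2], "little") for i in range(0, 8, 2)]
print(counters)  # [1, 9, 1, 1]
```

Whether such a flipped bit ever matters depends on whether the corrupted location is read before it is overwritten, which is one reason memory faults can remain latent for a long time.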
CPU faults include computation, control flow, and register faults. From the software
viewpoint, all these faults result in the corruption of registers. The corrupted registers
can be general registers or special registers, such as the program counter (PC), the next
program counter (nPC), the processor state register (PSR), or the stack pointer (SP).
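Since CPU faults ultimately appear as corrupted registers, their effect can be illustrated with a toy register machine whose program counter and general registers can be bit-flipped mid-run. Everything here (the two-instruction set, the register names pc, r0, r1 and r2, and the run function) is a hypothetical illustration, not a model of any real processor; it only shows that the same kind of fault can crash the program, silently corrupt its result, or be masked.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[str, ...]


def run(program: List[Instr], registers: Dict[str, int],
        fault_step: Optional[int] = None, fault_reg: str = "pc",
        fault_bit: int = 0, max_steps: int = 100) -> Tuple[str, Dict[str, int]]:
    """Execute a toy register machine. If fault_step is given, flip bit
    fault_bit of register fault_reg just before that step executes."""
    regs = dict(registers)
    regs.setdefault("pc", 0)
    for step in range(max_steps):
        if step == fault_step:
            regs[fault_reg] ^= 1 << fault_bit  # injected register corruption
        pc = regs["pc"]
        if not 0 <= pc < len(program):
            return "crash", regs  # control flow left the program text
        op, *args = program[pc]
        regs["pc"] = pc + 1
        if op == "add":  # add dst, a, b  means  dst = a + b
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "halt":
            return "ok", regs
        else:
            return "crash", regs  # illegal instruction
    return "hang", regs


# r0 = r1 + r2; r0 = r0 + r2; halt  (fault-free result: r0 == 5)
program = [("add", "r0", "r1", "r2"), ("add", "r0", "r0", "r2"), ("halt",)]
start = {"r1": 1, "r2": 2}
golden = run(program, start)
print(golden[0], golden[1]["r0"])                                          # ok 5
print(run(program, start, fault_step=1, fault_reg="pc", fault_bit=2)[0])   # crash
print(run(program, start, fault_step=1, fault_reg="r2", fault_bit=0)[1]["r0"])  # 6
print(run(program, start, fault_step=1, fault_reg="pc", fault_bit=0)[1]["r0"])  # 5
```

In the last run the corrupted PC jumps back to the first instruction, which happens to recompute the same value, so the fault is masked.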
Bus faults can occur on address lines or data lines. They may affect bits in the
instructions or data transmitted through the bus.
I/O faults originate in peripheral devices. Device drivers are designed to handle
these exceptional situations.
Design faults are software faults, which can be classified according to their causes or
symptoms.[7] The automatic error logs available in several operating systems usually give
information about error symptoms, while analyses of human-collected error reports,
especially those from manufacturers, can usually provide insight into the causes. Software
faults can be classified into initialization, assignment, condition check, function and
documentation faults.
Initialization faults include uninitialized variables and wrongly initialized variables or
parameters.[8] The value of an uninitialized variable depends on the language and the
compiler: in C, for example, global (static-storage) variables are initialized to zero,
whereas the value of an uninitialized local variable is indeterminate. Many uses of
uninitialized variables can be detected by a compiler with good static checks. Wrongly
initialized parameters are similar to misassigned variables: they are initialized to
incorrect values, for example when a parameter such as MAXAREASIZE is set to a value that
is too small. Incorrect arguments in function calls are also initialization faults, because
the callee's parameters are initialized with wrong values.
Assignment faults can be missing assignments or incorrect assignments. An incorrect
assignment may be wrong on the right-hand side, corrupting one data value (for example,
using x=y+w instead of x=y+z corrupts x), or on the left-hand side, corrupting two data
values (for example, using a=b+c instead of d=b+c corrupts both a and d).
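The two kinds of incorrect assignment can be contrasted directly. In this hypothetical function, the fault argument selects which fault, if any, is injected; the variable names follow the examples in the text, and the starting values of a and d stand in for whatever those variables held before.

```python
from typing import Optional, Tuple


def compute(y: int, z: int, w: int, b: int, c: int,
            fault: Optional[str] = None) -> Tuple[int, int, int]:
    """Return (x, a, d) after the intended assignments x = y + z and d = b + c.
    fault="rhs" or fault="lhs" injects one of the assignment faults from the text."""
    a, d = 0, 0  # values held before the assignments below

    if fault == "rhs":
        x = y + w  # wrong right-hand side: only x is corrupted
    else:
        x = y + z

    if fault == "lhs":
        a = b + c  # wrong left-hand side: a is overwritten and d is never updated
    else:
        d = b + c

    return x, a, d


print(compute(1, 2, 10, 3, 4))               # (3, 0, 7)   fault-free
print(compute(1, 2, 10, 3, 4, fault="rhs"))  # (11, 0, 7)  one value wrong
print(compute(1, 2, 10, 3, 4, fault="lhs"))  # (3, 7, 0)   two values wrong
```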
Condition check faults include missing condition checks (for example, failing to check
return values) and incorrect condition checks.
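Missing and incorrect checks of a return value are easy to show with Python's str.find, which returns -1 when the substring is absent. The function names below are hypothetical; note that the faulty versions do not crash, they silently return a wrong result.

```python
from typing import Optional


def value_after_key_missing_check(line: str, key: str) -> str:
    # Missing condition check: find() returns -1 when key is absent,
    # so the slice silently starts at the wrong position.
    start = line.find(key)
    return line[start + len(key):]


def value_after_key_wrong_check(line: str, key: str) -> Optional[str]:
    start = line.find(key)
    if start > 0:  # incorrect check: rejects a key found at position 0
        return line[start + len(key):]
    return None


def value_after_key(line: str, key: str) -> Optional[str]:
    start = line.find(key)
    if start == -1:  # correct check of the return value
        return None
    return line[start + len(key):]


print(value_after_key_missing_check("name=bob", "user="))  # =bob  (garbage, no error)
print(value_after_key_wrong_check("user=alice", "user="))  # None  (valid key rejected)
print(value_after_key("user=alice", "user="))              # alice
print(value_after_key("name=bob", "user="))                # None
```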
Function faults are faults that cannot be confined to a single statement: they are more
complicated, and correcting them involves modifying multiple statements or rewriting a
function.
Documentation faults are incorrect system messages or documents. These faults do not
affect program execution.
Interaction faults are faults induced by the user. For example, in database systems,
a mistake made by the database administrator can cause severe damage to the system or
even the loss of vital data.
Software Fault Propagation Models
Fault propagation models are built for both hardware and software faults. Fault injection
has been used to evaluate the dependability of computer systems, but most fault-injection
studies concentrate on the final impacts of faults on the system with an emphasis on fault
latency and coverage issues.[9] There has not been much research on what happens after
a fault is injected and how a fault propagates in a software system.
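One common way to observe what happens after a fault is injected is to compare a fault-free (golden) run with a faulty run step by step, recording when each tracked variable first diverges. The sketch below assumes a hypothetical three-variable step function and an injection hook that flips one bit before a chosen step; it illustrates how latency and propagation can be measured, not the internals of any particular tool.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]


def trace(step_fn: Callable[[State], State], initial: State, steps: int,
          fault: Optional[Tuple[int, str, int]] = None) -> List[State]:
    """Run step_fn repeatedly, snapshotting the tracked variables after each step.
    fault=(step, var, bit) flips one bit of var just before that step."""
    state = dict(initial)
    history: List[State] = []
    for i in range(steps):
        if fault is not None and fault[0] == i:
            _, var, bit = fault
            state[var] ^= 1 << bit
        state = step_fn(state)
        history.append(dict(state))
    return history


def first_divergence(golden: List[State], faulty: List[State]) -> Dict[str, int]:
    """Map each corrupted variable to the first step at which it differs."""
    corrupted: Dict[str, int] = {}
    for i, (g, f) in enumerate(zip(golden, faulty)):
        for var in g:
            if g[var] != f[var] and var not in corrupted:
                corrupted[var] = i
    return corrupted


def step(s: State) -> State:
    # Hypothetical computation: i counts, total accumulates i, scaled copies total.
    return {"i": s["i"] + 1, "total": s["total"] + s["i"], "scaled": s["total"] * 2}


init = {"i": 0, "total": 0, "scaled": 0}
golden = trace(step, init, 5)
print(first_divergence(golden, trace(step, init, 5, fault=(2, "i", 3))))
# {'i': 2, 'total': 2, 'scaled': 3}: the error reaches scaled one step later
print(first_divergence(golden, trace(step, init, 5, fault=(2, "scaled", 0))))
# {}: scaled is overwritten before it is read, so the fault is masked
```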
DIDUCE (Dynamic Invariant Detection ∪ Checking Engine)
DIDUCE is an automatic bug detection tool that dynamically checks invariants in Java
applications. DIDUCE instruments Java bytecode to perform dynamic and automatic
invariant detection and checking.
A program invariant is a property that is true at a particular program point. Invariants
explicate data structures and algorithms and are helpful for programming tasks from
design to maintenance. Invariants can be dynamically detected from program traces that
capture variable values at program points of interest.
DIDUCE helps in debugging programs that fail on some inputs. It is common for a
program that works correctly on many inputs to fail on others. DIDUCE can be used to
quickly pinpoint differences in behavior between the successful and the failing runs.
DIDUCE helps in debugging failures in long-running programs by flagging anomalies
prior to the failure. Some of the hardest bugs to track down are those that occur only
after a program has executed for a long time. Because DIDUCE continually monitors the
program's variables, it is well suited to locating such errors.
DIDUCE helps in debugging component-based software in which a component works in
some systems but not in others. DIDUCE can first be trained on other code that uses the
same components correctly, and then applied to check the behavior of the component in
the context of the new software.
DIDUCE helps in testing programs for which the correct output of some inputs is unknown:
it is trained on known input/output pairs and then checks the runs on the unknown inputs.
It also aids program evolution by testing whether program modifications affect other
portions of the code.
DIDUCE associates invariants with static program points (specific locations in a program’s
code). These points are i) program points that read from or write to objects, ii) program
points that read from or write to static variables, and iii) procedure call sites. Stack
accesses are ignored, both because of the overhead and because all Java objects are on the
heap. Automatically tracked expressions include i) the value being read or written, ii) the
difference between the old and new values after a write, and iii) the parent object. Users
can extend the basic DIDUCE classes to customize their invariant tracking.
Invariants are assigned a confidence level that is a function of the number of successful
evaluations; invariants that have held true for a long time are assigned a high confidence.
Violations of high-confidence invariants often indicate a bug. DIDUCE was implemented for
Java programs using the Byte Code Engineering Library. It was used to test four
applications: i) MAJC, a CPU architecture developed at Sun with support for on-chip
multiprocessing, ii) Mail Manage, an open-source email management utility, iii) the Java
Secure Socket Extension library, and iv) JOEQ, a Java virtual machine system with a
just-in-time compiler.[12]
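The confidence mechanism can be sketched with a value/mask invariant in the spirit of DIDUCE: for one program point, keep the first observed value and a mask of the bits that have never changed, and relax the mask when a bit changes. This is a simplified illustration, not DIDUCE's implementation; in particular, the class name BitmaskInvariant and the confidence rule (successful checks since the last relaxation, compared against a fixed threshold) are assumptions made here.

```python
from typing import Optional


class BitmaskInvariant:
    """Invariant for one program point: remembers the first observed value and a
    mask of the bits that have stayed constant across all observations so far."""

    def __init__(self, threshold: int = 3, width: int = 32) -> None:
        self.value: Optional[int] = None  # first value observed
        self.mask = 0                     # 1-bits: bits that have never changed
        self.confidence = 0               # checks passed since the last relaxation
        self.threshold = threshold        # hypothetical reporting threshold
        self.width = width

    def observe(self, x: int) -> bool:
        """Check x; return True if it violates a high-confidence invariant."""
        x &= (1 << self.width) - 1        # treat values as width-bit words
        if self.value is None:
            self.value, self.mask = x, (1 << self.width) - 1
            return False
        changed = (x ^ self.value) & self.mask
        if changed == 0:
            self.confidence += 1
            return False
        anomalous = self.confidence >= self.threshold
        self.mask &= ~changed             # relax: stop tracking the bits that changed
        self.confidence = 0
        return anomalous


inv = BitmaskInvariant(threshold=3)
print([inv.observe(v) for v in [4, 4, 4, 4, 4, 12, 12, 5]])
# [False, False, False, False, False, True, False, False]
print(bin(inv.mask & 0xF))  # 0b110: bits 0 and 3 are no longer tracked
```

Only the first change after a long stable run is reported; later changes to low-confidence invariants are absorbed silently, which keeps the number of reported anomalies small.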
FINE (A Fault Injection and Monitoring Environment for Tracing the UNIX System
Behavior Under Faults)
The fault injection and monitoring environment (FINE) is a tool to study fault
propagation in the UNIX kernel. FINE injects hardware-induced software errors and
software faults into the UNIX kernel and traces the execution flow and key variables of
the kernel. FINE consists of a fault injector, a software monitor, a workload generator, a
controller, and several analysis utilities. Experiments on SunOS 4.1.2 were conducted by
applying FINE to investigate fault propagation and to evaluate the impact of various
types of faults. Fault propagation models were built for both hardware and software faults,
and transient Markov reward analysis was performed to evaluate the loss of performance due
to an injected fault.[6] Experimental results show that memory and software faults
usually have a very long latency, while bus and CPU faults tend to crash the system
immediately. About half of the detected errors are data faults, which are detected when
the system tries to access an unauthorized memory location. Only about 8% of faults
propagate to other UNIX subsystems. Markov reward analysis shows that the
performance loss incurred by bus faults and CPU faults is much higher than that incurred
by software and memory faults. Among software faults, the impact of pointer faults is
higher than that of non-pointer faults.
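Transient Markov reward analysis assigns a reward rate (here, a relative performance level) to each state of a Markov model and computes the expected reward accumulated over a finite time window; the shortfall against a fault-free run measures the performance loss caused by the fault. The structure and parameters of the models built in [6] are not given here, so the sketch below uses a hypothetical three-state discrete-time chain with made-up probabilities and rewards purely to show the computation.

```python
from typing import List, Sequence


def expected_cumulative_reward(P: Sequence[Sequence[float]], rewards: Sequence[float],
                               initial: Sequence[float], steps: int) -> float:
    """Expected reward accumulated over `steps` steps of a discrete-time Markov
    chain with transition matrix P, per-state reward rates and initial distribution."""
    n = len(initial)
    dist: List[float] = list(initial)
    total = 0.0
    for _ in range(steps):
        total += sum(p * r for p, r in zip(dist, rewards))
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return total


# Hypothetical model of the system right after a fault is activated.
# States: 0 = normal (reward 1.0), 1 = degraded by an error (0.5), 2 = recovering (0.0).
P = [[1.0, 0.0, 0.0],   # normal stays normal
     [0.0, 0.5, 0.5],   # degraded: stays degraded or starts recovery
     [1.0, 0.0, 0.0]]   # recovery returns the system to normal
rewards = [1.0, 0.5, 0.0]
steps = 3
achieved = expected_cumulative_reward(P, rewards, [0.0, 1.0, 0.0], steps)
loss = steps * 1.0 - achieved   # relative to a fault-free run at full reward
print(achieved, loss)  # 1.375 1.625
```

A fault type that drives the chain into low-reward states more often, or keeps it there longer, yields a larger loss over the same window.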
CONCLUSION
Software fault propagation is an immature area of research. As more and more complex
systems are designed and built, especially safety-critical systems, software fault tolerance
and the next generation of hardware fault tolerance will need to evolve to solve the
design fault problem.[10] The need to design tolerance of design faults and unexpected
circumstances into systems has never been greater, and the current generation of software
fault tolerance methods cannot adequately compensate for these faults. Part of the next
generation of software fault tolerance methods will have to include an in-depth look at
how to combat the increasing cost of building correct software.[11] The next generation
of fault tolerance methods will also need to be cost-effective enough to be applied to the
safety-critical systems that require them.
The view that software must have bugs will have to be overcome. If software cannot be
made (at least relatively) bug-free, then the next generation of safety-critical systems will
be seriously flawed. Reliable computing systems, often used as transaction servers, made by
companies such as Tandem, Stratus, and IBM, have shown that reliable computers can
currently be built; however, they have also demonstrated that the cost is significant.
In this term paper, I have introduced basic fault propagation concepts, together with
techniques and tools for achieving fault tolerance, and described the types of faults,
their manifestation and their behavior. In general, fault tolerance can be regarded as the
study of faults and failures: understanding how faults and failures behave is the natural
starting point for stopping their effects as system defects, and the techniques and tools
discussed here are developed to probe this behavior and, further, to stop fault propagation.
Because most of these techniques and tools were originally developed to cope with hardware
defects, or are more effective when applied to hardware, software fault tolerance is still
less mature than hardware fault tolerance. Software fault tolerance research is nonetheless
drawing more and more attention, as the majority of system defects have been shown to be
software defects.
References:
[1] J. Durães and H. Madeira, “Characterization of Operating Systems Behavior in the
Presence of Faulty Drivers Through Software Fault Emulation,” PRDC 2002 Pacific Rim
International Symposium on Dependable Computing, pp. 16–18, December 2002.
[2] H. Madeira, M. Vieira, and D. Costa, “On the Emulation of Software Faults by
Software Fault Injection,” IEEE International Conference on Dependable Systems and
Networks, pp. 25–28, June 2000.
[3] J. Arlat, Y. Crouzet, and J. Karlsson, “Comparison of Physical and Software-Implemented
Fault Injection Techniques,” IEEE Transactions on Computers, pp. 1115–1133,
September 2003.
[4] M. Hsueh, T. Tsai, and R. K. Iyer, “Fault Injection Techniques and Tools,” IEEE
Transactions on Computers, pp. 75–82, April 1997.
[5] R. Chillarege, I. S. Bhandari, J. K. Chaar, M. J. Halliday, D. Moebus, B. Ray, and
M. Wong, “Orthogonal Defect Classification - A Concept for In-Process Measurement,”
IEEE Transactions on Software Engineering, pp. 943–956, November 1992.
[6] W. Kao, R. K. Iyer, and D. Tang, “FINE: A Fault Injection and Monitoring
Environment for Tracing the UNIX System Behavior Under Faults,” IEEE Transactions
on Software Engineering, pp. 1105–1118, November 1993.
[7] J. Carreira, H. Madeira, and J. G. Silva, “Xception: A Technique for the Experimental
Evaluation of Dependability in Modern Computers,” IEEE Transactions on Software
Engineering, pp. 125–136, February 1998.
[8] D. Costa, M. V. Tiago Rilho, and H. Madeira, “ESFFI - A Novel Technique for the
Emulation of Software Faults in COTS Components,” Eighth Annual IEEE
International Conference and Workshop on the Engineering of Computer Based Systems
(ECBS ’01), pp. 197–204, April 2001.
[9] T. Jarboui, J. Arlat, Y. Crouzet, and K. Kanoun, “Experimental Analysis of the
Errors Induced into Linux by Three Fault Injection Techniques,” DSN’02 International
Conference on Dependable Systems and Networks, pp. 23–26, June 2002.
[10] N. S. Bowen and D. K. Pradhan, “The Effect of Program Behaviour on Fault
Observability,” IEEE Transactions on Computers, pp. 868–880, August 1996.
[11] A. Johansson, “Software Implemented Fault Injection Used for Software Evaluation,”
Predicting System Trustworthiness for Software Component Trustworthiness, pp. 38–43,
July 2002.
[12] S. Hangal and M. S. Lam, “Tracking Down Software Bugs Using Automatic Anomaly
Detection,” International Conference on Software Engineering, pp. 291–301, May 2002.