Testing the Reliability of Component-Based Safety Critical Software
J. H. R. May, Ph.D.; Safety Systems Research Centre; University of Bristol, UK
Keywords: Software verification, architectural software reliability models, software component re-use,
software testability and complexity.
Abstract
Testing remains a fundamentally important way to check that a software program behaves as required, but a
weakness of testing is that successful testing only leads to informal quality statements. Even where
quantitative methods are employed, it is not clear how the objective statements (e.g. 100% code coverage
has been achieved) relate to the statements that are really useful such as “the software is correct,” or “the
software is reliable.” This inconclusive nature of testing is at the heart of Dijkstra’s famous comment
“Program testing can be used to show the presence of bugs, but never to show their absence!” (ref. 1)
This paper argues that Dijkstra’s comment is not as important as it might seem, and that software reliability
estimates produced by new component-based statistical software testing (CBSST) models provide a testing
framework for software quality that is thoroughly formal, but in a different sense to that envisaged by
Dijkstra. A significant benefit of these models is that they offer a new verification method for software built
with component re-use, based on “proven-in-use” components.
Introduction
Dijkstra’s famous comment (see Abstract) is not in itself a criticism of testing. However, it becomes a
severe criticism of testing under the supplementary assumption that absence of any errors must be
established before software may be used; that is, software must be proved to be perfect. This assumption
disqualifies virtually all software in use today, but it also ignores a crucial theoretical possibility.
Specifically, it disqualifies imperfect software that can be shown to be fit to some degree for its purpose. If
it were possible to extract a failure probability estimate from the results of software testing, the drawbacks
described by Dijkstra’s comment and the supplementary assumption could be side-stepped. In this case,
testing would provide a new framework for software quality that is thoroughly formal, but in a different
sense to that envisaged by Dijkstra.
Statistical Software Testing (SST) is a branch of testing research into methods of obtaining the failure
probability estimate warranted by test results. It is sometimes called Statistical System Testing since it can
be applied to design verification of hardware or software systems, and to systems consisting of both. SST
models estimate operational reliability for programs whose reliability is stable (ref. 2). In its simplest form,
Black Box SST (BBSST) is well developed and has been successfully applied to complex commercial
systems (ref. 3). BBSST is so called because, given N valid BBSST test results, the code (its length,
complexity, or any other aspect of its structure) has no effect on the reliability estimate produced by the BBSST
model.
However, there are drawbacks with BBSST. It applies to a monolithic system, and it is after-the-fact: after
a software system has been built, it is too late to discover that it is unreliable. Modern software engineering
techniques are moving towards component-based development, with component re-use, and this offers the
opportunity for a new improved approach to software quality. Ideally, re-use of high reliability components
would produce high reliability software faster and more efficiently. There is a need for SST models
specifically aimed at supporting component re-use and exploring these possibilities. BBSST cannot do this,
because to support component re-use, SST models are needed that:
1. Explain how the reliability of software depends on its components;
2. Are able to exploit re-used software components of known reliability in estimating an overall software
system reliability;
3. Provide a new definition of software complexity, based on testability, which can be used to discover how
to design component-based software that is inherently testable, i.e. demonstrably reliable.
In point 2, re-use means more than the transplantation of code from an old system into a new one. It also
entails the ability to use a component’s history of successful previous testing or usage in the old system as
evidence of its reliability in the new system, and thereby to contribute to the reliability estimate for that new
system. This is the idea of “proven-in-use” components. One objective of CBSST research is to make it
possible to build sets of re-usable software components, constantly aggregating evidence of their reliability
as they are tested and used in different applications, and then use those components to build demonstrably
reliable software. In 3 above, statistical testability is the property of software that determines the number of
tests required to demonstrate that it achieves a given level of reliability, according to the models in this
paper. It can be seen as a new, as yet unidentified, software complexity measure, and its discovery is a second
objective of CBSST research.
BBSST background: All SST models verify software design using empirical evidence from testing or, if the
software has been observed appropriately, from operational use. In BBSST, when a software system
executes correctly on N valid BBSST tests, it is possible to make an estimate of the system probability of
failure on test (pft) (ref. 4). This statistical test experiment requires that test selection is performed by
sampling from a global set of possible tests to produce the ‘statistical test set’ that is presented to the
software. The frequency distribution of the different kinds of tests in the statistical test set must be
consistent with the operational profile faced by the software in practice (refs. 4,5). Generation of the correct
test frequency distribution clearly requires knowledge of the inputs the system will see over a representative
life or mission time. However, with this knowledge, methods for the synthesis of the distribution are
available. The essential techniques have been developed and demonstrated for real, complex system
applications (refs. 3,5).
Previous work on reliability models that use code structure: Much of the work on measurement of test
effectiveness, and software quality assessment based on testing, has emphasized the relevance of code
structure (ref. 6). However, it has proved difficult to link code structure and statistical testing models.
Architectural Software Reliability (ASR) models seek to evaluate the influence of code structure on
reliability estimates. ASR models reference ‘parts’ of the software, but the parts are not necessarily
components in the usual sense – they might be execution paths for instance. A review of different types of
ASR models is available (ref. 7).
The models proposed in this paper are a type of ASR model in which software is represented as a set of
interacting components. They are the result of a particular line of research to extend the BBSST models of
Miller et al (ref. 4) to be component-based (refs. 8,9,10,11,12). Early examples of this type of model can be
found in (refs. 13,8). The new models in this paper are called CBSST models. In CBSST models, a
component is defined in a traditional way as a physical ‘parcel’ of code that, without internal modification,
can be placed in a software system to provide some fixed functionality. There are two main research
problems associated with CBSST:
1. Modeling the change in a component's reliability when it is transferred between environments;
2. Modeling dependent component failure.
The first problem occurs because the excitation rates of bugs are different in different component
environments (applications). Solutions to this problem have been proposed (refs. 4,14). This paper does not
contribute to this area, but instead concentrates on the dependence problem. The particular problem has
been identified as follows: "Without exception the existing models assume that the failure processes
associated across different components are mutually independent." (ref. 7) This dependence is sometimes
called ‘[inter-] component dependence’ (ref. 15) to distinguish it from dependence between tests.
There are two possible approaches to component dependence. The first approach is to accept its existence
and attempt to model it, and this is the approach taken in this paper. The second approach is to build
software in ways that avoid causing component dependence (ref. 16). A difficulty with this approach is that
component dependence can exist without any communication between components, so that manipulating
code cannot rule out dependence. Two components executing in series can fail dependently without
communicating in any way. It may just occur by chance that the bugs they contain, for example, never cause
failure on the same test. Although independence of component failures may not be achieved, the approach
of Woit and Mason (ref. 16) remains promising because the methods may result in software with low
dependence between components. If CBSST, or other ASR models can distinguish between levels of
component dependence, there will be a benefit in achieving it: low dependence is associated with higher
statistical testability (it allows higher reliability estimates for a given number of tests).
CBSST Modeling
In essence, CBSST applies BBSST to a program’s components, and then combines the results to obtain a
program reliability estimate. The BBSST model is also called the Single Urn Model (SUM), and the
random pattern of failures it describes will be referred to as a SUM process. The SUM is given by the
formula in (eq. 1), in which f1 is a probability density function describing the location of the component pft,
θ, conditional on seeing 0 failures in a sequence of N tests.

f_1(\theta \mid 0, N) = (N + 1)(1 - \theta)^N    (1)
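For illustration, integrating (eq. 1) gives the tail probability P(θ > λ | 0, N) = (1 - λ)^(N+1), so an upper confidence bound on the component pft follows directly from N fail-free tests. A minimal Python sketch (the function name and example values are ours, for illustration only):

def sum_upper_bound(n_tests, delta):
    # Solve (1 - lam)**(n_tests + 1) = delta for lam, the bound with
    # P(theta > lam | 0 failures in n_tests tests) = delta
    return 1.0 - delta ** (1.0 / (n_tests + 1))

print(sum_upper_bound(4600, 0.01))   # roughly 1e-3; cf. section "Statistical Testability"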
Components failing according to (eq. 1) will be called SUM process components, or ‘SUM components.’
Whether or not a component failure pattern is a SUM process depends on a triple (C, S, T): respectively the
component code, its specification, and the statistical tests it receives.
In this paper, a statistical test set T for a component (or program) C with specification S, means a sequence
of tests generated by picking tests sequentially with replacement, using a random choice mechanism from a
set ψ that covers all valid executions of C based on S, and according to a probability distribution (the
operational profile) over ψ. A statistical test set does not necessarily imply a SUM process. That is, relative
to a valid statistical test set and a component specification, not all components are SUM components. The
reason is that the determinism of the software computation can prevent the random test choice mechanism
from causing random failures. Section “Series execution of components with non-SUM interaction”
provides a concrete example of how this can occur. This is important because, although it may be possible
in principle to achieve a SUM process for any program by carefully defining T, this is not useful where the
aim is re-use of component test results. To re-use component test results W, it is necessary to ensure that the
component is tested in the new program with tests defined in the same manner as in W. This ensures the
relevance of the previous tests to the new environment but it also means that program testing does not
always cause a SUM process.
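As an illustrative sketch of this test-selection procedure (the test universe ψ and the profile below are hypothetical placeholders, not drawn from any real system):

import random

# Hypothetical test universe psi and operational profile (weights sum to 1)
psi = ["startup", "normal_demand", "high_demand", "shutdown"]
profile = [0.05, 0.80, 0.10, 0.05]

def statistical_test_set(n_tests):
    # Sequential selection with replacement, weighted by the operational profile
    return random.choices(psi, weights=profile, k=n_tests)

tests = statistical_test_set(1000)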
Simply testing all of a system’s components individually (unit testing) is not a full test of a system. The
interactions between components also need to be tested (integration testing). Interactions between
components are caused by the connectivity between components: by which we mean the code causing data
communication and flow of control between components. The types of connectivity studied in this paper are
simple. For example, conditional execution is not studied; we consider the case where all components
execute on each test. In addition, data communication mechanisms are restricted to parameter passing and
use of global variables and it is assumed that failure of any component or connectivity fails the software.
We define a SUM interaction to be any interaction between two SUM components that causes the two
components considered as one entity to fail as a SUM process. Any other form of interaction is termed non-SUM. A CBSST model for a more complex component architecture must be derived to fit that architecture,
based on these two types of interaction. This is similar in concept to construction of, for example, models of
AC electrical circuits, and future work will concentrate on convenient methods of building CBSST models
in this way. Models for the two fundamental types of interaction are described in the following two sections.
Series execution of components with SUM interaction: Consider two SUM components A and B executing
in series, interacting by passing data between components as shown by the arrows in Figure 1. In this
notation, a box represents a SUM component and an unbroken arrow represents data communication that
occurs within the duration of a single test, and that communicates data computed using only program input
data received on that test.
Figure 1 - A two-component system [boxes A and B in series, connected by unbroken arrows]
We will initially assume that the connectivity and resulting data communication is correct. In this case, it
can be argued that the system in Figure 1 behaves as a SUM process i.e. the communications cause a SUM
interaction and the overall system behaves as a single SUM process. The reason is that this configuration
ensures an unchanging size of failure domain in the system test space over time (i.e. for every test in any test
sequence). The test selection mechanism is the only influence on whether a given test succeeds or fails, and
that mechanism is random.
In practice, the connectivity cannot be assumed correct in the sense that it implements the system
requirements. It must be treated as a third ‘element’ which itself can fail. If we assume that the code
implements the connectivity as shown in the diagram, connectivity failures (system failures not due to either
component contradicting its specification) in Figure 1 also have a static domain in the test space. The
connectivity involves only simple data delivery, and so its failure domain can not be dependent on previous
test history. Hence the whole system, including connectivity, is SUM. The assumption is that the
connectivity is simple in nature. However, we may not wish to rely on such an assumption – the intended
connectivity might be simple, whilst the reality could be complex due to programming mistakes. In this case
it would be possible to use a limited form of formal proof to show the connectivity is correct given the
components are correct. This is discussed further in section “Analysis of connectivity using formal
methods” and later in the section on commercial off-the-shelf (COTS) components.
Series execution of components with non-SUM interaction: In Figure 2, a program S is arbitrarily shown
with one data input and two data outputs. The broken arrow indicates transfer of data across tests. That is,
system S maintains some state from one test to the next. Such a configuration will generally, although not
necessarily, result in a non-SUM process. This section looks at an example that produces this system
behavior with two SUM components as in Figure 3.
Figure 2 - A system S transferring state across tests

Figure 3 - Non-SUM connectivity [components A and B, input t, data items x and m, functions f2(x,m), f3(x), f4(x); the broken arrow carries data between tests]
The broken arrow in Figure 3 shows the same data communication as the broken arrow in Figure 2 and
indicates that on a single test, A performs a computation that is dependent on B’s computation in previous
tests. The computation of A is therefore ‘conditioned’ by previous tests. A simple example of code that is
described by Figs 2 and 3 is given below. For example, t could be an integer, x a global variable, m a local
variable, and the fi simple arithmetic functions.
{component A:}
input(t)                      {read this test's input}
m = read_memory(addr)         {m was written by B during the previous test}
x = f1(t)
output(f2(x, m))              {A's output depends on the previous test via m}
{component B:}
write_memory(addr, f3(x))     {state escapes the current test}
output(f4(x))
The persistence of state between tests in Figure 3 means the resulting system cannot be assumed to behave
as a SUM process despite both A and B behaving individually as SUM components. An example can be
constructed to illustrate this. We will assume that the connectivity is correct. Suppose that when B fails on
any test in its failure subset in the test space, it correctly generates specific data d (generated by f3(x)) to
send to A (which A receives in the following test), and that when B succeeds the data it sends to A is never
d. Further assume that d is an acceptable input for A according to A’s specification. Now suppose that A
fails on receiving d from B, irrespective of the value of t. Under these circumstances the system fails in a
particular pattern. If the results of system tests were spread across the page from left to right with time, and
a dot represented a test failure whilst a space represented a test success (this will be called a fail line), the
pattern would have the following form.
..       ..         ..     ..        ..      ..   etc.
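This failure pattern is straightforward to reproduce in simulation. The Python sketch below (with an arbitrary, purely illustrative pft for B) generates a fail line of exactly this form, with B failing as a SUM process and A failing on the test following any failure of B:

import random

random.seed(1)
THETA_B = 0.05    # illustrative pft for component B
N_TESTS = 70

fail_line = []
b_failed_previously = False
for _ in range(N_TESTS):
    a_fails = b_failed_previously            # A fails on receiving d from B's previous failure
    b_fails = random.random() < THETA_B      # B behaves as a SUM component
    fail_line.append("." if (a_fails or b_fails) else " ")
    b_failed_previously = b_fails

print("".join(fail_line))    # system failures cluster as adjacent dots, as in the fail line above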
This failure pattern is not a SUM process: a failure of B is always followed immediately by failure of A on
the subsequent test. In contrast to the system of Figure 1, we cannot employ the SUM to analyze system
reliability directly. The example shows how easy it is to create a non-SUM process from interacting SUM
components. It exhibits dependence of system failure among successive tests, a phenomenon that has been
studied before (refs. 17,18). Unlike these existing approaches, our CBSST models do not postulate
parameters to characterize the level of dependence because of the difficulty of evaluating these parameters
in practice for a particular program. Our approach explains test dependence in terms of inter-component
dependence, obtaining bounds on the reliability estimates by analyzing extreme component dependence
conditions.
The example given above shows only one possible pattern of failure behavior. In general, the computation
can produce an endless variety of failure patterns. Analysis of specific patterns is not possible since it relies
on knowledge of the actual failure behavior, which is not known. This is why we seek a bounding analysis.
Our key assertion is that the example interaction described above for Figure 3 is special - it provides a
bounding case. If we estimate the system pft assuming this form of interaction, then if the real interaction is
anything different, the estimate will be conservative i.e. an overestimate. The argument for conservatism is
developed below.
The example of adjacent test failures, described above for Figure 3, shows a particular extreme form of
dependence between failure of A and B. It is extreme in the following senses: the dependence on the same
test is total negative dependence (exclusivity), whilst the dependence between different tests shows the
closest clustering possible. This can be pictured using separate fail lines for A and B.
 .        .       .           .       .     etc. (A)
.        .       .           .       .      etc. (B)
The two fail lines below show two SUM processes behaving independently: in this case coincident failure
and clustering both occur randomly.
.    .      ..    .     .    .   .        (A)
  .     .   .       . .    .        .     (B)
Notice that if the independence failure pattern is assumed for the purposes of modeling, given fixed
component failure probabilities, the probability of observing a long fail-free sequence of tests is lower than
in the case of adjacent component failures. Therefore, given an observation of a fail-free sequence, the
assumption of the independence failure pattern leads, statistically, to lower estimates for the component
pfts.
Now consider the SUM interaction of Figure 1. The first kind of dependence is possible – it is quite
possible that more (or fewer) same-test failures of A and B occur than would occur if the two SUM
processes were independent. However, the second type of dependence, clustering on different tests, is
prevented by the randomness of the demand selection procedure.
Given SUM processes with on-demand fail probabilities θA and θB, over the range of all possible same-test
dependencies, exclusivity produces the largest value of θS, the system probability of failure on demand.
Similarly given θA and θB, and given any state of same-test dependence, extreme clustering (adjacent
failures of A and B) produces the largest probability of zero system fails in N tests. In the usual statistical
fashion, if we turn this statement around, extreme clustering is the assumption that produces the estimators
that most favor large θA and θB when we observe no failures. Therefore if we observe no failures during
testing and use both of these assumptions simultaneously, we effectively estimate the largest θA and θB
possible, and then sum them to obtain the system fail probability. That is, from the point of view of system
fail probability, we are making the worst case assumptions for the two types of dependence, both in
isolation and in conjunction. Given the observed evidence, the system fail probability estimator that most
favors large fail probabilities will be obtained.
Following this approach, the estimator for θS of the system in Figure 3 is given in (eq. 2). A proof is given
in Appendix A. Λ is not to be thought of as the system on-demand fail probability θS, but simply as a sum of
two fail probabilities, θA + θB, which is a quantity bigger than θS. Λ ranges between 0 and 2. We can use
(eq. 2) to specify arbitrarily small λ and δ such that P(Λ > λ) = δ, in which case P(θS > λ) < δ, giving us
the upper bound λ on fail probability (with confidence 1 - δ) that is the important value in decision making,
e.g. to achieve certification of a safety-critical system.
f_2(\Lambda \mid 0, N) =
\begin{cases}
(N + 2)\left[\left(1 - \frac{\Lambda}{2}\right)^{N+1} - (1 - \Lambda)^{N+1}\right], & \Lambda \in [0, 1] \\
(N + 2)\left(1 - \frac{\Lambda}{2}\right)^{N+1}, & \Lambda \in (1, 2]
\end{cases}    (2)
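Integrating (eq. 2) from λ to 2 gives the tail probability in closed form, P(Λ > λ | 0, N) = 2(1 - λ/2)^(N+2) - (1 - λ)^(N+2) for λ ∈ [0,1], which converts a number of fail-free system tests into a confidence statement. A minimal Python sketch (the example values are illustrative only):

def f2_tail(lam, n_tests):
    # P(Lambda > lam | 0 failures in n_tests), by direct integration of eq. (2); lam in [0, 1]
    return 2.0 * (1.0 - lam / 2.0) ** (n_tests + 2) - (1.0 - lam) ** (n_tests + 2)

# e.g. the confidence level delta associated with lam = 1e-3 after 10000 fail-free tests
delta = f2_tail(1e-3, 10000)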
The formula in (eq. 2) has the form shown in Figure 4. The pointedness and left-skew of the distribution in
Figure 4 increases with the number of tests N. The basic idea here is that if we test two SUM components in
their role within the system, we will increase our confidence in the system. The type of interaction between
the components influences the rate at which our confidence increases: given the same number of tests, (eq.
2) produces larger estimates of system fail probabilities (see section “Statistical Testability”) than would be
obtained if BBSST applied.
Figure 4 - The form of f2(Λ|0,N)
Analysis of connectivity using formal methods: To remove the need for any assumptions about the
connectivity and its behavior, it would be possible to use a limited form of formal proof to refine the
program specification to the level of the components. That is, to prove that the connectivity is correct
assuming the components meet their specifications. Given such a proof, (eq. 2) remains the model for
estimating the failure probability in Figure 3 since components A and B can still fail dependently. This is an
attractive approach, as discussed later in the section on COTS components.
A 3-component system: A similar analysis for 3 SUM components in series, failing dependently in this way
(exclusive, adjacent test failures) produces the formula in (eq. 3). Like f2, it shows a linear combination of
‘harmonics’ of f1. Since bounding the fail probability with values above 1 is never going to be of interest, it
is unnecessary to calculate the formula over the range [1,3]. Over the range [0,3], f3 is a probability density
function (pdf), but the formula in (eq. 3) is not a pdf, due to its restricted range.
f_3(\Lambda \mid 0, N) = \frac{N + 3}{2}\left[3\left(1 - \frac{\Lambda}{3}\right)^{N+2} - 4\left(1 - \frac{\Lambda}{2}\right)^{N+2} + (1 - \Lambda)^{N+2}\right], \quad \Lambda \in [0, 1]    (3)
As N increases, (eq. 3) tends to \frac{3(N + 3)}{2}\left(1 - \frac{\Lambda}{3}\right)^{N+2}. However, for large but practically achievable
levels of N, the probability of interest \int_0^{\lambda} f_3(\nu)\, d\nu depends on the other two terms when λ and δ are small.
Systems Analysis
CBSST analysis of non-SUM system (program) behaviour is performed by considering interactions between
SUM-components. That is, whilst the system as a whole is not behaving as a SUM process during testing,
its components at some level of decomposition are behaving as SUM-processes on the same tests. The cases
studied in this paper are very simple, but the general idea is to identify a system decomposition that
distinguishes the SUM and non-SUM interactions between the SUM components and apply the analysis
accordingly.
What systems/components are SUM?: Systems or components that avoid memory at some level of
abstraction, such as combinational electronic circuits, naturally produce SUM processes. A component
containing explicit memory can also be SUM if the memory effects can be encapsulated within tests by
choosing tests appropriately i.e. if tests can be found such that state does not ‘escape’ from one test to the
next. Such a component may have extremely complex memory usage and state evolution within tests.
For example, two components need not be simple code blocks executing and communicating in sequence as
in Figure 1. A and B could be objects, with multiple bi-directional data communications occurring in a
single test due to method calling, as in Figure 5.

Figure 5 - Multiple data communications

The composition result of section "Series execution of components with SUM interaction" still applies.
Arbitrarily complex software components containing memory, simple conditional branching, iteration,
recursion etc. can be connected to build a SUM system if, under some system testing: 1. they are
individually SUM, and 2. they are connected together using connectivity that only causes SUM
interactions. Identifying SUM interactions requires an analysis of the
code - if the connectivity between components delivers data that is derived wholly from data input in the
current test, the interaction is SUM. In the case where data is delivered that is not derived solely from the
current test, the interaction is SUM if that data is the same on every test, otherwise the interaction must be
assumed to be non-SUM.
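This decision rule can be stated compactly. The sketch below is a direct transcription of the rule (the code analysis needed to establish the two input flags is assumed to have been done separately):

def interaction_type(derived_only_from_current_test, same_data_on_every_test):
    # SUM if the delivered data derives wholly from the current test's input, or if any
    # data from outside the current test is identical on every test
    if derived_only_from_current_test or same_data_on_every_test:
        return "SUM"
    return "non-SUM (assumed)"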
Analysis of systems built by connecting together Commercial Off-The-Shelf (COTS) components:
Case 1 - System testing of SUM interaction: Suppose two COTS components are both SUM under previous
testing of the component functions used in a new application. Suppose in the new application they are
connected together using connectivity that causes SUM interactions. Then the system will be SUM, and can
be analyzed using BBSST.
Case 2 - System testing of non-SUM interaction: In section "Series execution of components with non-SUM
interaction," A and B could be COTS components, but despite the simplicity of the system,
BBSST cannot be used to estimate its on-demand fail probability. However, it is possible to estimate the
system fail probabilities using the CBSST model (eq. 2) to analyze system test results.
Case 3 - Re-use of component reliabilities using CBSST and formal methods: In principle, formal proof
methods would, on their own, provide a sound solution to the problem of verification of software built with
re-used components. Given the components have been formally proved correct by their developers, program
verification would require an ‘integration proof’ to verify that the components, as connected together by the
‘glue’ code, satisfy a formal system specification. The informal term ‘integration proof’ is used here
mean a traditional refinement proof that is cut-short at the level of code components rather than continuing
down to source code statements. This form of partial proof is a suitable verification method for a company
integrating off-the-shelf components into the specific application they require. The proof effort is reduced
(it is not a full proof of code because proof of components is not required), it only requires access to
component specifications rather than code, and it is performed by the party closer to the risks. Where the
software is safety-critical it is appropriate that the company has expertise in proof methods. Unfortunately,
they will seldom be able to employ that expertise to prove the components themselves, because the
developers often protect their intellectual property by not releasing source code. Experience shows that
developers of generic re-usable components are also unlikely to prove those components correct. One
possible reason for this is commercial pressure: a component developer using formal proof methods may
lose out to competitors that produce components more cheaply. Safety-critical software is only a small,
niche sector, and the large majority of the market is not concerned with buying formally developed
components. The true reasons that formal methods are resisted are hard to judge, but the effects are already
felt in practice. There is little formally proved software on the market. This suggests it is important to find
an alternative verification method for re-used components that is easier to apply and/or that can be applied
by the component users rather than the component developers.
A possible way forward is a collaboration of formal proof to verify integration of components, and CBSST
models to verify the components themselves. Consider the case where there is previous individual execution
history for each of two COTS components A and B, and an integration proof has been performed. If N of
A’s previous tests are consistent with A’s operational profile in the new system (using the techniques in ref.
18) and if similarly, M of B’s previous tests are likewise consistent with B’s operational profile in the new
system, and N<M, then the system pft can be estimated conservatively using model f2 with N tests.
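A sketch of this rule in Python (the test counts are hypothetical, and f2_tail is the closed-form tail of eq. 2 derived earlier):

def f2_tail(lam, n_tests):
    # P(Lambda > lam | 0 failures in n_tests), by integration of eq. (2); lam in [0, 1]
    return 2.0 * (1.0 - lam / 2.0) ** (n_tests + 2) - (1.0 - lam) ** (n_tests + 2)

n_A = 20000   # A's previous tests consistent with its operational profile in the new system
n_B = 35000   # B's previous tests consistent with its operational profile in the new system
n_effective = min(n_A, n_B)          # N < M: the smaller count gives the conservative estimate
delta = f2_tail(1e-3, n_effective)   # confidence statement for the new system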
The result is ‘reuse of component reliabilities,’ and generalises to any number of components. Furthermore,
it applies to SUM and non-SUM interactions i.e. any data connectivity can be used to connect the
components together. Therefore, systems developed in this way from pre-built components that have been
previously statistically tested, can have a reliability associated with them based on the previous execution
history of that system’s components.
In fact, no system would be deployed without system testing, and this may be the only practical way to
obtain the components’ new operational profiles (through pure observation, using code instrumentation –
although for an alternative point of view, see ref. 5). Such system testing allows direct use of f2 as a means
of verification, ignoring component testing, but used on its own this is not an attractive approach because it
does not take advantage of re-use; previous testing of components cannot be used as evidence of system
reliability. Re-use offers the possibility of extremely large numbers of component tests, collected over a
long period of time in different applications. Thus the component based approach has the potential to
produce much higher reliability estimates than those warranted by the system testing needed to observe
operational profiles. These extra system tests could be used to increase the numbers of available component
tests, although given well re-used components the effect on reliability estimates is likely to be minimal.
Statistical Testability
An important result in the field of statistical testing is the number of fail-free tests required to demonstrate a
system failure probability of less than 10^-3 at confidence 99%. The number is well known for BBSST:
• A system behaving as a SUM process requires approximately 4600 tests (using model f1).
This can be compared against the two following new results:
• A system built from two SUM components with non-SUM interaction requires approximately 9200
tests (this follows from model f2). A consequence of this is that, when using separate component test
results to justify the reliability of a new system, each component would need to have received previous
testing/usage equivalent to 9200 tests in its new operational profile.
• A system built from three SUM components with non-SUM interaction requires approximately 18300
tests (model f3).
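The first of these figures can be checked directly from the tail of f1, (1 - λ)^(N+1); a short Python calculation:

import math

lam, delta = 1e-3, 0.01
# smallest N with (1 - lam)**(N + 1) <= delta
n = math.ceil(math.log(delta) / math.log(1.0 - lam)) - 1
print(n)   # 4602, i.e. approximately 4600 tests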
Conclusions
The study of behavior in complex man-made systems is an unusual application of statistics. The
deterministic nature of the software interferes with the requirement of the statistics to study random
phenomena. Nevertheless, CBSST has attractive advantages for software quality analysis. The re-use of test
results is perhaps the most commercially appealing. New conclusions from this paper are listed below.
• New failure probability estimation models have been derived. These are capable of modeling dependent
component failure, a major obstacle for component-based software reliability models.
• A promising approach to software V&V combining formal methods and statistical testing has been
suggested. It exploits the strengths of both techniques. This test/proof combination approach is consistent
with the current trend to use formal methods for selective proof objectives, rather than full proof of code;
thus long proofs at the detail of low-level code are avoided. The approach is particularly suitable for
software constructed from commercial off-the-shelf (COTS) software components, avoiding the
difficulties associated with a full proof-of-code approach.
• The models allow re-use of component test results. A simple example has been given to show how this
re-use can provide failure probability estimation for software systems built from COTS components.
• The new models suggest statistical testability as a new system complexity metric. Low complexity
systems are defined as those that require fewer tests to justify a given level of reliability. Software that is
highly testable in this sense, using tests of a reasonable length, is therefore desirable. This may have
important implications for future designs of safety-critical software. Just as some software design
solutions have desirable low computational complexity compared to others implementing the same
requirements, so some designs will have higher testability.
• Research in CBSST is at a very early stage of development, and its practicality is currently speculative.
This paper has attempted to identify open research questions that will prove important for the successful
future application of the technique.
Acknowledgments
The work presented in this paper comprises aspects of studies (the DDT project) performed as part of the UK
Nuclear Safety Research programme, funded and controlled by the Industry Management Committee.
Biography
J. H. R. May, Ph.D., Lecturer, Safety Systems Research Centre (SSRC), Dept. Computer Science,
University of Bristol, UK, telephone - +44 (0)117 954-5141, facsimile - +44 (0)117 954-5208, e-mail –
j.may@bristol.ac.uk.
John May has researched software reliability for over 10 years, in industrial collaborative projects. He is
currently working on new software reliability models for software built with re-used components.
References
1. Dijkstra E. "Notes on Structured Programming," in Structured Programming, Dahl O, Dijkstra E, Hoare
C (Eds.), Academic Press, 1972.
2. Kanoun K, Laprie JC, Thevenod-Fosse P. "Software reliability: state-of-the-art and perspectives," LAAS
Report 95205, May 1995. http://www.laas.fr/laasve/
3. Hughes G, May JHR, Lunn AD. "Reliability estimation from appropriate testing of plant protection
software," IEE Software Engineering Journal, v10 n6, November 1995.
4. Miller WM, Morell LJ, Noonan RE, Park SK, Nicol DM, Murrill BW, Voas JM. "Estimating the
probability of failure when testing reveals no failures," IEEE Trans. on Software Engineering, v18 n1, 1992.
5. Musa JD. "Operational profiles in software reliability engineering," IEEE Software, 10(2), 1993.
6. Zhu H, Hall PAV, May JHR. "Software Unit Test Coverage and Adequacy," ACM Computing Surveys, 1997.
7. Goseva-Popstojanova K, Trivedi KS. "Architecture-based approach to reliability assessment of software
systems," Performance Evaluation, v45 n2-3, July 2001.
8. May JHR, Lunn AD. "New Statistics for Demand-Based Software Testing," Information Processing
Letters, 53, 1995.
9. May JHR, Kuball S, Hughes G. "Test Statistics for System Design Failure," International Journal of
Reliability, Quality and Safety Engineering, v6 n3, 1999, pp. 249-264.
10. Kuball S, May JHR, Hughes G. "Structural software reliability estimation," Lecture Notes in Computer
Science v1698, 'Computer Safety, Reliability and Security', Felici, Kanoun, Pasquini (Eds.), pp. 336-349,
Springer, 1999. ISBN 3540664882.
11. Kuball S, May JHR, Hughes G. "Building a System Failure Rate Estimator by Identifying Component
Failure Rates," Procs. of the 10th Int. Symposium on Software Reliability Engineering (ISSRE'99), Boca
Raton, Florida, Nov 1-4, 1999, pp. 32-41, IEEE Computer Society, 1999.
12. Kuball S, May JHR, Hughes G. "Software Reliability Assessment for Branching Structures: A
Hierarchical Approach," 2nd Int. Conference on Mathematical Methods in Reliability, vol 2, Bordeaux,
July 4-7, 2000.
13. May JHR, Lunn AD. "A Model of Code Sharing for Estimating Software Failure on Demand
Probabilities," IEEE Trans. on Software Engineering, SE-21(9), 1995.
14. Hamlet D, Mason D, Woit D. "Theory of software reliability based on components," Procs. of the 23rd
International Conference on Software Engineering (ICSE 2001), Toronto, Ontario, Canada, May 2001,
pp. 12-19, IEEE Computer Society, 2001. ISBN 0-7695-1050-7.
15. Krishnamurthy S, Mathur P. "On the estimation of reliability of a software system using reliabilities of
its components," Procs. of the 8th Intl. Symposium on Software Reliability Engineering (ISSRE'97),
Albuquerque, New Mexico, November 1997.
16. Woit DM, Mason DV. "Software component independence," Procs. 3rd IEEE High-Assurance Systems
Engineering Symposium (HASE'98), Washington DC, Nov 1998.
17. Gokhale SS, Trivedi KS. "Dependency Characterization in path-based approaches to architecture-based
software reliability prediction," Procs. Symposium on Application-Specific Systems and Software
Technology (ASSET'98), Richardson, TX, March 1998, pp. 86-89.
18. Goseva-Popstojanova K, Trivedi KS. "Failure correlation in software reliability models," IEEE Trans.
on Reliability, 49(1), pp. 37-48, 2000.
Appendix A
The proof below is for 2 components. The proof of the result for 3 components follows the same procedure.
The manipulation of the limits of integration is more involved in the 3-component case. The N-component
problem remains open.
For a given θ1 and θ2, in the case of exclusive failures showing extreme clustering, (eq. 4) describes the
probability of a failure-free sequence of N tests when N is very large and dwarfs the number of components
(which in this case is 2). This can be seen from inspection of the fail line in section "Series execution of
components with non-SUM interaction," and noting that, for a single SUM component, P(0 \mid \theta_i, N) = (1 - \theta_i)^N, where 0 denotes zero failures.
P(0 \mid \theta_1, \theta_2, N) = \min\{(1 - \theta_1)^N, (1 - \theta_2)^N\}    (4)

An expression for f_2(\Lambda \mid 0, N) is shown in (eq. 5).

f_2(\Lambda \mid 0, N) = \int_{\Lambda = \theta_1 + \theta_2} h(\theta_1, \theta_2 \mid 0, N)\, d\theta_1\, d\theta_2    (5)
which can be evaluated using (eq. 6).

h(\theta_1, \theta_2 \mid 0, N) = \frac{P(0 \mid \theta_1, \theta_2, N)\, g(\theta_1, \theta_2 \mid N)}{P(0 \mid N)}    (6)

where g is a prior for (θ1,θ2) and P(0 \mid N) = \int_0^1 \int_0^1 P(0 \mid \theta_1, \theta_2, N)\, g(\theta_1, \theta_2 \mid N)\, d\theta_1\, d\theta_2.
Using g(\theta_1, \theta_2 \mid N) = 1 to express no prior preference on the location of (θ1,θ2) within [0,1]^2, it follows that
h(\theta_1, \theta_2 \mid 0, N) = k \cdot \min\{(1 - \theta_1)^N, (1 - \theta_2)^N\}, where

\frac{1}{k} = \int_0^1 \int_0^1 \min\{(1 - \theta_1)^N, (1 - \theta_2)^N\}\, d\theta_1\, d\theta_2.

This gives (eq. 7).

h(\theta_1, \theta_2 \mid 0, N) = \frac{(N + 1)(N + 2)}{2} \min\{(1 - \theta_1)^N, (1 - \theta_2)^N\}    (7)

Substituting \theta_2 = \Lambda - \theta_1, (eq. 5) becomes (eq. 8),
f_2(\Lambda \mid 0, N) =
\begin{cases}
\int_0^{\Lambda} h(\theta_1, \Lambda - \theta_1 \mid 0, N)\, d\theta_1, & \Lambda \in [0, 1] \\
\int_{\Lambda - 1}^{1} h(\theta_1, \Lambda - \theta_1 \mid 0, N)\, d\theta_1, & \Lambda \in (1, 2]
\end{cases}    (8)
which can be rewritten as (eq. 9),

f_2(\Lambda \mid 0, N) =
\begin{cases}
\frac{(N + 1)(N + 2)}{2}\left[\int_0^{\Lambda/2} (1 - \Lambda + \theta_1)^N\, d\theta_1 + \int_{\Lambda/2}^{\Lambda} (1 - \theta_1)^N\, d\theta_1\right], & \Lambda \in [0, 1] \\
\frac{(N + 1)(N + 2)}{2}\left[\int_{\Lambda - 1}^{\Lambda/2} (1 - \Lambda + \theta_1)^N\, d\theta_1 + \int_{\Lambda/2}^{1} (1 - \theta_1)^N\, d\theta_1\right], & \Lambda \in (1, 2]
\end{cases}    (9)
which evaluates to (eq. 10).

f_2(\Lambda \mid 0, N) =
\begin{cases}
(N + 2)\left[\left(1 - \frac{\Lambda}{2}\right)^{N+1} - (1 - \Lambda)^{N+1}\right], & \Lambda \in [0, 1] \\
(N + 2)\left(1 - \frac{\Lambda}{2}\right)^{N+1}, & \Lambda \in (1, 2]
\end{cases}    (10)
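As a numerical spot-check of this result (not part of the original derivation), eq. (8) can be integrated by simple quadrature and compared with the closed form of eq. (10); the values of N and Λ below are arbitrary:

N = 50

def h(t1, t2):
    # eq. (7): posterior density over (theta_1, theta_2) given 0 failures in N tests
    return (N + 1) * (N + 2) / 2.0 * min((1.0 - t1) ** N, (1.0 - t2) ** N)

def f2_quadrature(lam, steps=20000):
    # eq. (8): integrate h along the line theta_2 = lam - theta_1 (midpoint rule)
    lo, hi = (0.0, lam) if lam <= 1.0 else (lam - 1.0, 1.0)
    dt = (hi - lo) / steps
    return sum(h(lo + (i + 0.5) * dt, lam - lo - (i + 0.5) * dt) for i in range(steps)) * dt

def f2_closed(lam):
    # eq. (10)
    if lam <= 1.0:
        return (N + 2) * ((1.0 - lam / 2.0) ** (N + 1) - (1.0 - lam) ** (N + 1))
    return (N + 2) * (1.0 - lam / 2.0) ** (N + 1)

print(f2_quadrature(0.02), f2_closed(0.02))   # the two values should agree closely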