Testing the Reliability of Component-Based Safety Critical Software

J. H. R. May, Ph.D.; Safety Systems Research Centre; University of Bristol, UK

Keywords: Software verification, architectural software reliability models, software component re-use, software testability and complexity.

Abstract

Testing remains a fundamentally important way to check that a software program behaves as required, but a weakness of testing is that successful testing leads only to informal quality statements. Even where quantitative methods are employed, it is not clear how the objective statements (e.g. 100% code coverage has been achieved) relate to the statements that are really useful, such as "the software is correct" or "the software is reliable." This inconclusive nature of testing is at the heart of Dijkstra's famous comment "Program testing can be used to show the presence of bugs, but never to show their absence!" (ref. 1) This paper argues that Dijkstra's comment is not as important as it might seem, and that software reliability estimates produced by new component-based statistical software testing (CBSST) models provide a testing framework for software quality that is thoroughly formal, but in a different sense to that envisaged by Dijkstra. A significant benefit of these models is that they offer a new verification method for software built with component re-use, based on "proven-in-use" components.

Introduction

Dijkstra's famous comment (see Abstract) is not in itself a criticism of testing. However, it becomes a severe criticism of testing under the supplementary assumption that the absence of any errors must be established before software may be used, i.e. that software must be proved to be perfect. This assumption disqualifies virtually all software in use today, but it also ignores a crucial theoretical possibility. Specifically, it disqualifies imperfect software that can be shown to be fit to some degree for its purpose. If it were possible to extract a failure probability estimate from the results of software testing, the drawbacks described by Dijkstra's comment and the supplementary assumption could be side-stepped. In this case, testing would provide a new framework for software quality that is thoroughly formal, but in a different sense to that envisaged by Dijkstra.

Statistical Software Testing (SST) is a branch of testing research into methods of obtaining the failure probability estimate warranted by test results. It is sometimes called Statistical System Testing since it can be applied to design verification of hardware or software systems, and to systems consisting of both. SST models estimate operational reliability for programs whose reliability is stable (ref. 2). In its simplest form, Black Box SST (BBSST) is well developed and has been successfully applied to complex commercial systems (ref. 3). BBSST is so called because, given N valid BBSST test results, the code (its length, complexity, or any aspect of its structure) has no effect on the reliability estimate produced by the BBSST model. However, there are drawbacks with BBSST. It applies to a monolithic system, and it is after-the-fact: after a software system has been built it is too late to find that it is unreliable. Modern software engineering techniques are moving towards component-based development, with component re-use, and this offers the opportunity for a new, improved approach to software quality. Ideally, re-use of high reliability components would produce high reliability software faster and more efficiently.
There is a need for SST models specifically aimed at supporting component re-use and exploring these possibilities. BBSST cannot do this. To support component re-use, SST models are needed that:

1. Explain how the reliability of software depends on its components;
2. Are able to exploit re-used software components of known reliability in estimating an overall software system reliability;
3. Provide a new definition of software complexity, based on testability, which can be used to discover how to design component-based software that is inherently testable, i.e. demonstrably reliable.

In point 2, re-use means more than the transplantation of code from an old system into a new one. It also entails the ability to use a component's history of successful previous testing or usage in the old system as evidence of its reliability in the new system, and thereby to contribute to the reliability estimate for that new system. This is the idea of "proven-in-use" components. One objective of CBSST research is to make it possible to build sets of re-usable software components, constantly aggregating evidence of their reliability as they are tested and used in different applications, and then to use those components to build demonstrably reliable software.

In point 3, statistical testability is the property of software that determines the number of tests required to demonstrate that it achieves a given level of reliability, according to the models in this paper. It can be seen as a new, as yet undiscovered, software complexity measure, and its discovery is a second objective of CBSST research.

BBSST background: All SST models verify software design using empirical evidence from testing or, if the software has been observed appropriately, from operational use. In BBSST, when a software system executes correctly on N valid BBSST tests, it is possible to make an estimate of the system probability of failure on test (pft) (ref. 4). This statistical test experiment requires that test selection is performed by sampling from a global set of possible tests to produce the 'statistical test set' that is presented to the software. The frequency distribution of the different kinds of tests in the statistical test set must be consistent with the operational profile faced by the software in practice (refs. 4,5). Generation of the correct test frequency distribution clearly requires knowledge of the inputs the system will see over a representative life or mission time. However, with this knowledge, methods for the synthesis of the distribution are available. The essential techniques have been developed and demonstrated for real, complex system applications (refs. 3,5).

Previous work on reliability models that use code structure: Much of the work on measurement of test effectiveness, and on software quality assessment based on testing, has emphasized the relevance of code structure (ref. 6). However, it has proved difficult to link code structure and statistical testing models. Architectural Software Reliability (ASR) models seek to evaluate the influence of code structure on reliability estimates. ASR models reference 'parts' of the software, but the parts are not necessarily components in the usual sense; they might be execution paths, for instance. A review of different types of ASR models is available (ref. 7). The models proposed in this paper are a type of ASR model in which software is represented as a set of interacting components.
They are the result of a particular line of research to extend the BBSST models of Miller et al (ref. 4) to be component-based (refs. 8,9,10,11,12). Early examples of this type of model can be found in (refs. 8,13). The new models in this paper are called CBSST models. In CBSST models, a component is defined in a traditional way as a physical 'parcel' of code that, without internal modification, can be placed in a software system to provide some fixed functionality. There are two main research problems associated with CBSST:

1. Modeling the change in a component's reliability when it is transferred between environments;
2. Modeling dependent component failure.

The first problem occurs because the excitation rates of bugs differ between component environments (applications). Solutions to this problem have been proposed (refs. 4,14). This paper does not contribute to this area, but instead concentrates on the dependence problem. The particular problem has been identified as follows: "Without exception the existing models assume that the failure processes associated across different components are mutually independent." (ref. 7) This dependence is sometimes called '[inter-]component dependence' (ref. 15) to distinguish it from dependence between tests.

There are two possible approaches to component dependence. The first approach is to accept its existence and attempt to model it, and this is the approach taken in this paper. The second approach is to build software in ways that avoid causing component dependence (ref. 16). A difficulty with this approach is that component dependence can exist without any communication between components, so manipulating code cannot rule out dependence. Two components executing in series can fail dependently without communicating in any way: it may just occur by chance that the bugs they contain, for example, never cause failure on the same test. Although independence of component failures may not be achieved, the approach of Woit and Mason (ref. 16) remains promising because the methods may result in software with low dependence between components. If CBSST or other ASR models can distinguish between levels of component dependence, there will be a benefit in achieving it: low dependence is associated with higher statistical testability (it allows higher reliability estimates for a given number of tests).

CBSST Modeling

In essence, CBSST applies BBSST to a program's components, and then combines the results to obtain a program reliability estimate. The BBSST model is also called the Single Urn Model (SUM), and the random pattern of failures it describes will be referred to as a SUM process. The SUM is given by the formula in (eq. 1), in which f1 is a probability density function describing the location of the component pft, θ, conditional on seeing 0 failures in a sequence of N tests.

f1(θ | 0, N) = (N + 1)(1 − θ)^N    (eq. 1)

Components failing according to (eq. 1) will be called SUM process components, or 'SUM components.' Whether or not a component failure pattern is a SUM process depends on a triple (C, S, T): respectively the component code, its specification, and the statistical tests it receives. In this paper, a statistical test set T for a component (or program) C with specification S means a sequence of tests generated by picking tests sequentially with replacement, using a random choice mechanism, from a set ψ that covers all valid executions of C based on S, according to a probability distribution (the operational profile) over ψ.
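To make the experiment concrete, the following minimal sketch (Python; the profile classes, test space, and oracle are hypothetical placeholders, not from the paper) draws a statistical test set from an operational profile and, when no failures are observed, converts the result into an upper confidence bound on the pft. From (eq. 1), P(θ > λ | 0, N) = (1 − λ)^(N+1), so the bound at confidence 1 − δ is λ = 1 − δ^(1/(N+1)).

import random

random.seed(0)

# Hypothetical operational profile: test classes and their occurrence rates.
PROFILE = {"normal_demand": 0.90, "boundary_demand": 0.07, "fault_demand": 0.03}

def draw_statistical_test_set(n_tests: int) -> list[str]:
    """Sample tests with replacement, with frequencies matching the
    operational profile (the core BBSST requirement)."""
    classes = list(PROFILE)
    weights = [PROFILE[c] for c in classes]
    return random.choices(classes, weights=weights, k=n_tests)

def software_passes(test: str) -> bool:
    """Placeholder oracle: run the software on the test and check the result.
    Always passes here, standing in for an observed fail-free sequence."""
    return True

def sum_upper_bound(n_tests: int, delta: float) -> float:
    """lambda such that P(theta > lambda | 0 failures, N tests) = delta,
    from the tail of (eq. 1): (1 - lambda)**(N + 1) = delta."""
    return 1.0 - delta ** (1.0 / (n_tests + 1))

N = 4600
if all(software_passes(t) for t in draw_statistical_test_set(N)):
    print(sum_upper_bound(N, delta=0.01))  # ~1e-3 at 99% confidence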
A statistical test set does not necessarily imply a SUM process. That is, relative to a valid statistical test set and a component specification, not all components are SUM components. The reason is that the determinism of the software computation can prevent the random test choice mechanism from causing random failures. Section "Series execution of components with non-SUM interaction" provides a concrete example of how this can occur. This is important because, although it may be possible in principle to achieve a SUM process for any program by carefully defining T, this is not useful where the aim is re-use of component test results. To re-use a history W of component test results, it is necessary to ensure that the component is tested in the new program with tests defined in the same manner as in W. This ensures the relevance of the previous tests to the new environment, but it also means that program testing does not always cause a SUM process.

Simply testing all of a system's components individually (unit testing) is not a full test of a system. The interactions between components also need to be tested (integration testing). Interactions between components are caused by the connectivity between components, by which we mean the code causing data communication and flow of control between components. The types of connectivity studied in this paper are simple. For example, conditional execution is not studied; we consider the case where all components execute on each test. In addition, data communication mechanisms are restricted to parameter passing and use of global variables, and it is assumed that failure of any component or connectivity fails the software. We define a SUM interaction to be any interaction between two SUM components that causes the two components, considered as one entity, to fail as a SUM process. Any other form of interaction is termed non-SUM. A CBSST model for a more complex component architecture must be derived to fit that architecture, based on these two types of interaction. This is similar in concept to the construction of, for example, models of AC electrical circuits, and future work will concentrate on convenient methods of building CBSST models in this way. Models for the two fundamental types of interaction are described in the following two sections.

Series execution of components with SUM interaction: Consider two SUM components A and B executing in series, interacting by passing data between components as shown by the arrows in Figure 1. In this notation, a box represents a SUM component and an unbroken arrow represents data communication that occurs within the duration of a single test, and that communicates data computed using only program input data received on that test.

Figure 1 - A two-component system

We will initially assume that the connectivity and resulting data communication is correct. In this case, it can be argued that the system in Figure 1 behaves as a SUM process, i.e. the communications cause a SUM interaction and the overall system behaves as a single SUM process. The reason is that this configuration ensures an unchanging size of failure domain in the system test space over time (i.e. for every test in any test sequence). The test selection mechanism is the only influence on whether a given test succeeds or fails, and that mechanism is random; a simulation illustrating this is sketched below. In practice, the connectivity cannot be assumed correct in the sense that it implements the system requirements. It must be treated as a third 'element' which itself can fail.
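The composition claim for correct connectivity can be illustrated with a small Monte Carlo sketch (Python; the test space, failure domains, and rates are hypothetical). With i.i.d. test selection and no state carried between tests, the system's failure probability is the same on every test, regardless of history:

import random

random.seed(1)

# Hypothetical fixed failure domains over a test space {0, ..., 9999}:
FAIL_A = set(range(0, 50))         # A fails on these tests: theta_A = 0.005
FAIL_B = set(range(9950, 10000))   # B fails on these tests: theta_B = 0.005

def run(n_tests: int) -> list[bool]:
    """A statistical test sequence: tests drawn i.i.d. from a (here uniform)
    operational profile; the series system fails iff A or B fails."""
    return [t in FAIL_A or t in FAIL_B
            for t in (random.randrange(10000) for _ in range(n_tests))]

fails = run(2_000_000)
overall = sum(fails) / len(fails)
after_fail = [fails[i + 1] for i in range(len(fails) - 1) if fails[i]]
print(overall, sum(after_fail) / len(after_fail))
# Both rates are ~0.01: the chance of failure is the same whether or not the
# previous test failed, i.e. the composed system is a single SUM process.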
If we assume that the code implements the connectivity as shown in Figure 1, connectivity failures (system failures not due to either component contradicting its specification) also have a static domain in the test space. The connectivity involves only simple data delivery, and so its failure domain cannot depend on previous test history. Hence the whole system, including connectivity, is SUM. The assumption is that the connectivity is simple in nature. However, we may not wish to rely on such an assumption: the intended connectivity might be simple, whilst the reality could be complex due to programming mistakes. In this case it would be possible to use a limited form of formal proof to show the connectivity is correct given that the components are correct. This is discussed further in section "Analysis of connectivity using formal methods" and later in the section on commercial off-the-shelf (COTS) components.

Series execution of components with non-SUM interaction: In Figure 2, a program S is arbitrarily shown with one data input and two data outputs. The broken arrow indicates transfer of data across tests. That is, system S maintains some state from one test to the next. Such a configuration will generally, although not necessarily, result in a non-SUM process. This section looks at an example that produces this system behavior with two SUM components, as in Figure 3.

Figure 2 - A system transferring state across tests

Figure 3 - Non-SUM connectivity

The broken arrow in Figure 3 shows the same data communication as the broken arrow in Figure 2 and indicates that, on a single test, A performs a computation that is dependent on B's computation in previous tests. The computation of A is therefore 'conditioned' by previous tests. A simple example of code that is described by Figures 2 and 3 is given below, where t is an integer input, x a global variable, m a local variable, and the fi simple arithmetic functions.

{component A:}
input(t)
m = read_memory(addr)
x = f1(t)
output(f2(x, m))

{component B:}
write_memory(addr, f3(x))
output(f4(x))

The persistence of state between tests in Figure 3 means the resulting system cannot be assumed to behave as a SUM process, despite both A and B behaving individually as SUM components. An example can be constructed to illustrate this. We will assume that the connectivity is correct. Suppose that when B fails on any test in its failure subset in the test space, the data it generates (via f3(x)) and sends to A is a specific value d, which A receives on the following test, and that when B succeeds the data it sends to A is never d. Further assume that d is an acceptable input for A according to A's specification. Now suppose that A fails on receiving d from B, irrespective of the value of t. Under these circumstances the system fails in a particular pattern. If the results of system tests were spread across the page from left to right with time, and a dot represented a test failure whilst a space represented a test success (this will be called a fail line), the pattern would have the following form:

..    ..       ..  ..     ..   etc.

This failure pattern is not a SUM process: a failure of B is always followed immediately by a failure of A on the subsequent test. In contrast to the system of Figure 1, we cannot employ the SUM to analyze system reliability directly. The example shows how easy it is to create a non-SUM process from interacting SUM components; a simulation of it is sketched below.
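A minimal executable rendering of this example (Python; the failure rate, the datum d, and the memory representation are hypothetical stand-ins for f1...f4, addr, and the global variable):

import random

random.seed(2)

D = -1        # the specific datum d on which A always fails (hypothetical value)
memory = 0    # global state: written by B, read by A on the following test

def test_once() -> bool:
    """One system test: A runs first, reading the state B left behind on the
    previous test; B then runs and, exactly when it fails, writes d."""
    global memory
    a_fails = (memory == D)             # A fails iff it receives d from B
    b_fails = (random.random() < 0.02)  # B's own failures form a SUM process
    memory = D if b_fails else 0
    return a_fails or b_fails

fail_line = "".join("." if test_once() else " " for _ in range(300))
print(fail_line)
# Failures appear in adjacent runs (typically pairs, '..'), never as isolated
# dots: each failure of B forces a failure of A on the very next test.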
The example exhibits dependence of system failure among successive tests, a phenomenon that has been studied before (refs. 17,18). Unlike these existing approaches, our CBSST models do not postulate parameters to characterize the level of dependence, because of the difficulty of evaluating such parameters in practice for a particular program. Our approach explains test dependence in terms of inter-component dependence, obtaining bounds on the reliability estimates by analyzing extreme component dependence conditions.

The example given above shows only one possible pattern of failure behavior. In general, the computation can produce an endless variety of failure patterns. Analysis of specific patterns is not possible, since it relies on knowledge of the actual failure behavior, which is not known. This is why we seek a bounding analysis. Our key assertion is that the example interaction described above for Figure 3 is special: it provides a bounding case. If we estimate the system pft assuming this form of interaction, then if the real interaction is anything different, the estimate will be conservative, i.e. an overestimate. The argument for conservatism is developed below.

The example of adjacent test failures, described above for Figure 3, shows a particular extreme form of dependence between failure of A and B. It is extreme in the following senses: the dependence on the same test is total negative dependence (exclusivity), whilst the dependence between different tests shows the closest clustering possible. This can be pictured using separate fail lines for A and B, in which each failure of B is immediately followed by a failure of A, and A and B never fail on the same test:

(A)   .     .       .    .   etc.
(B)  .     .       .    .    etc.

The two fail lines below show two SUM processes behaving independently: in this case coincident failure and clustering both occur randomly:

(A)  .    ..      .     .    etc.
(B)    .      . .     .      etc.

Notice that if the independence failure pattern is assumed for the purposes of modeling, given fixed component failure probabilities, the probability of observing a long fail-free sequence of tests is lower than in the case of adjacent component failures. Therefore, given an observation of a fail-free sequence, the assumption of the independence failure pattern leads, statistically, to lower estimates for the component pfts.

Now consider the SUM interaction of Figure 1. The first kind of dependence is possible: it is quite possible that more (or fewer) same-test failures of A and B occur than would occur if the two SUM processes were independent. However, the second type of dependence, clustering on different tests, is prevented by the randomness of the demand selection procedure.

Given SUM processes with on-demand fail probabilities θA and θB, over the range of all possible same-test dependencies, exclusivity produces the largest value of θS, the system probability of failure on demand. Similarly, given θA and θB, and given any state of same-test dependence, extreme clustering (adjacent failures of A and B) produces the largest probability of zero system fails in N tests. In the usual statistical fashion, if we turn this statement around, extreme clustering is the assumption that produces the estimators that most favor large θA and θB when we observe no failures. Therefore, if we observe no failures during testing and use both of these assumptions simultaneously, we effectively estimate the largest θA and θB possible, and then sum them to obtain the system fail probability.
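The effect of clustering on the evidence can be checked numerically. Assuming, for illustration only, equal component pfts θ, the probability of a fail-free sequence of N tests is ((1 − θ)^2)^N under independence but approaches (1 − θ)^N under extreme clustering (this is (eq. 4) of the Appendix with θ1 = θ2):

theta, N = 0.001, 2000             # illustrative component pft and test count

p_indep = ((1 - theta) ** 2) ** N  # independent component failures
p_cluster = (1 - theta) ** N       # exclusive, adjacent (clustered) failures
print(p_indep, p_cluster)          # ~0.018 vs ~0.135

Because a long fail-free run is much more probable under clustering, observing one is weaker evidence against large pft values; this is why the clustering assumption yields the conservative (larger) estimates.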
That is, from the point of view of system fail probability, we are making the worst case assumptions for the two types of dependence, both in isolation and in conjunction. Given the observed evidence, the system fail probability estimator that most favors large fail probabilities will be obtained. Following this approach, the estimator for θS of the system in Figure 3 is given in (eq. 2); a proof is given in the Appendix. Λ is not to be thought of as the system on-demand fail probability θS, but simply as a sum of two fail probabilities, θA + θB, which is a quantity at least as large as θS. Λ ranges between 0 and 2. We can use (eq. 2) to specify arbitrarily small λ and δ such that P(Λ > λ) = δ, in which case P(θS > λ) ≤ δ, giving us the upper bound λ on fail probability (with confidence 1 − δ) that is the important value in decision making, e.g. to achieve certification of a safety-critical system.

f2(Λ | 0, N) = (N + 2)[(1 − Λ/2)^(N+1) − (1 − Λ)^(N+1)] ; Λ ∈ [0, 1]
f2(Λ | 0, N) = (N + 2)(1 − Λ/2)^(N+1) ; Λ ∈ (1, 2]    (eq. 2)

The formula in (eq. 2) has the form shown in Figure 4. The pointedness and left-skew of the distribution in Figure 4 increase with the number of tests N. The basic idea here is that if we test two SUM components in their role within the system, we will increase our confidence in the system. The type of interaction between the components influences the rate at which our confidence increases: given the same number of tests, (eq. 2) produces larger estimates of system fail probabilities (see section "Statistical Testability") than would be obtained if BBSST applied.

Figure 4 - The form of f2(Λ | 0, N)

Analysis of connectivity using formal methods: To remove the need for any assumptions about the connectivity and its behavior, it would be possible to use a limited form of formal proof to refine the program specification to the level of the components, that is, to prove that the connectivity is correct assuming the components meet their specifications. Given such a proof, (eq. 2) remains the model for estimating the failure probability in Figure 3, since components A and B can still fail dependently. This is an attractive approach, as discussed later in the section on COTS components.

A 3-component system: A similar analysis for 3 SUM components in series, failing dependently in this way (exclusive, adjacent test failures), produces the formula in (eq. 3). Like f2, it shows a linear combination of 'harmonics' of f1. Since bounding the fail probability with values above 1 is never going to be of interest, it is unnecessary to calculate the formula over the range (1, 3]. Over the range [0, 3], f3 is a probability density function (pdf), but the formula in (eq. 3) is not a pdf, due to its restricted range.

f3(Λ | 0, N) = ((N + 3)/2)[3(1 − Λ/3)^(N+2) − 4(1 − Λ/2)^(N+2) + (1 − Λ)^(N+2)] ; Λ ∈ [0, 1]    (eq. 3)

As N increases, (eq. 3) tends to (3(N + 3)/2)(1 − Λ/3)^(N+2). However, for large but practically achievable levels of N, the probability of interest, ∫0^λ f3(ν) dν, depends on the other two terms when λ and δ are small.

Systems Analysis

CBSST analysis of non-SUM system (program) behaviour is performed by considering interactions between SUM components. That is, whilst the system as a whole is not behaving as a SUM process during testing, its components at some level of decomposition are behaving as SUM processes on the same tests.
The cases studied in this paper are very simple, but the general idea is to identify a system decomposition that distinguishes the SUM and non-SUM interactions between the SUM components, and to apply the analysis accordingly.

What systems/components are SUM?: Systems or components that avoid memory at some level of abstraction, such as combinational electronic circuits, naturally produce SUM processes. A component containing explicit memory can also be SUM if the memory effects can be encapsulated within tests by choosing tests appropriately, i.e. if tests can be found such that state does not 'escape' from one test to the next. Such a component may have extremely complex memory usage and state evolution within tests. For example, two components need not be simple code blocks executing and communicating in sequence as in Figure 1. A and B could be objects, with multiple bi-directional data communications occurring in a single test due to method calling, as in Figure 5. The composition result of section "Series execution of components with SUM interaction" still applies.

Figure 5 - Multiple data communications

Arbitrarily complex software components containing memory, simple conditional branching, iteration, recursion etc. can be connected to build a SUM system, under some system testing, if:

1. they are individually SUM, and
2. they are connected together using connectivity that only causes SUM interactions.

Identifying SUM interactions requires an analysis of the code: if the connectivity between components delivers data that is derived wholly from data input in the current test, the interaction is SUM. In the case where data is delivered that is not derived solely from the current test, the interaction is SUM if that data is the same on every test; otherwise the interaction must be assumed to be non-SUM.

Analysis of systems built by connecting together Commercial Off-The-Shelf (COTS) components:

Case 1 - System testing of SUM interaction: Suppose two COTS components are both SUM under previous testing of the component functions used in a new application. Suppose in the new application they are connected together using connectivity that causes SUM interactions. Then the system will be SUM, and can be analyzed using BBSST.

Case 2 - System testing of non-SUM interaction: In section "Series execution of components with non-SUM interaction," A and B could be COTS components, but despite the simplicity of the system, BBSST cannot be used to estimate its on-demand fail probability. However, it is possible to estimate the system fail probability using the CBSST model (eq. 2) to analyze system test results.

Case 3 - Re-use of component reliabilities using CBSST and formal methods: In principle, formal proof methods would, on their own, provide a sound solution to the problem of verification of software built with re-used components. Given that the components have been formally proved correct by their developers, program verification would require an 'integration proof' to verify that the components, as connected together by the 'glue' code, satisfy a formal system specification. The informal term 'integration proof' is used here to mean a traditional refinement proof that is cut short at the level of code components, rather than continuing down to source code statements. This form of partial proof is a suitable verification method for a company integrating off-the-shelf components into the specific application they require.
The proof effort is reduced (it is not a full proof of code, because proof of the components is not required), it only requires access to component specifications rather than code, and it is performed by the party closer to the risks. Where the software is safety-critical, it is appropriate that the company has expertise in proof methods. Unfortunately, it will seldom be able to employ that expertise to prove the components themselves, because developers often protect their intellectual property by not releasing source code. Experience shows that developers of generic re-usable components are also unlikely to prove those components correct. One possible reason for this is commercial pressure: a component developer using formal proof methods may lose out to competitors that produce components more cheaply. Safety-critical software is a small niche sector, and the large remaining majority of the market is not concerned with buying formally developed components. The true reasons that formal methods are resisted are hard to judge, but the effects are already felt in practice: there is little formally proved software on the market. This suggests it is important to find an alternative verification method for re-used components that is easier to apply, and/or that can be applied by the component users rather than the component developers.

A possible way forward is a combination of formal proof, to verify the integration of components, and CBSST models, to verify the components themselves. Consider the case where there is previous individual execution history for each of two COTS components A and B, and an integration proof has been performed. If N of A's previous tests are consistent with A's operational profile in the new system (using the techniques in ref. 18) and, similarly, M of B's previous tests are consistent with B's operational profile in the new system, with N < M, then the system pft can be estimated conservatively using model f2 with N tests. The result is 're-use of component reliabilities,' and it generalises to any number of components. Furthermore, it applies to SUM and non-SUM interactions, i.e. any data connectivity can be used to connect the components together. Therefore, systems developed in this way from pre-built components that have been previously statistically tested can have a reliability associated with them based on the previous execution history of that system's components.

In fact, no system would be deployed without system testing, and this may be the only practical way to obtain the components' new operational profiles (through pure observation, using code instrumentation; although for an alternative point of view, see ref. 5). Such system testing allows direct use of f2 as a means of verification, ignoring component testing, but used on its own this is not an attractive approach because it does not take advantage of re-use; previous testing of components cannot be used as evidence of system reliability. Re-use offers the possibility of extremely large numbers of component tests, collected over a long period of time in different applications. Thus the component-based approach has the potential to produce much higher reliability estimates than those warranted by the system testing needed to observe operational profiles. These extra system tests could be used to increase the numbers of available component tests, although given well re-used components the effect on reliability estimates is likely to be minimal.
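As a sketch of Case 3 (Python; the function names and component test counts are illustrative, not from the paper), note that integrating (eq. 2) gives the tail P(Λ > λ) = 2(1 − λ/2)^(N+2) − (1 − λ)^(N+2) for λ ∈ [0, 1]. The conservative system bound from re-used component histories can then be found by bisection, using the smaller of the two consistent test counts:

def f2_tail(lmbda: float, n: int) -> float:
    """P(Lambda > lambda), from integrating (eq. 2); valid for lambda in [0, 1]."""
    return 2 * (1 - lmbda / 2) ** (n + 2) - (1 - lmbda) ** (n + 2)

def system_pft_bound(n_a: int, n_b: int, delta: float) -> float:
    """Conservative upper bound on the system pft at confidence 1 - delta,
    re-using component test histories: model f2 with N = min(n_a, n_b)
    relevant fail-free tests. Found by bisection on the decreasing tail."""
    n = min(n_a, n_b)
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f2_tail(mid, n) > delta:
            lo = mid   # tail still above delta: the bound lies to the right
        else:
            hi = mid
    return hi

# Illustrative: A has 20000 consistent fail-free tests, B has 35000.
print(system_pft_bound(20000, 35000, delta=0.01))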
Statistical Testability

An important result in the field of statistical testing is the number of fail-free tests required to demonstrate a system failure probability of less than 10^-3 at confidence 99%. The number is well known for BBSST:

• A system behaving as a SUM process requires approximately 4600 tests (using model f1).

This can be compared against the two following new results:

• A system built from two SUM components with non-SUM interaction requires approximately 9200 tests (this follows from model f2). A consequence of this is that, when using separate component test results to justify the reliability of a new system, each component would need to have received previous testing/usage equivalent to 9200 tests in its new operational profile.
• A system built from three SUM components with non-SUM interaction requires approximately 18300 tests (model f3).

Conclusions

The study of behavior in complex man-made systems is an unusual application of statistics. The deterministic nature of the software interferes with the requirement of the statistics to study random phenomena. Nevertheless, CBSST has attractive advantages for software quality analysis. The re-use of test results is perhaps the most commercially appealing. New conclusions from this paper are listed below.

• New failure probability estimation models have been derived. These are capable of modeling dependent component failure, a major obstacle for component-based software reliability models.
• A promising approach to software V&V combining formal methods and statistical testing has been suggested. It exploits the strengths of both techniques. This test/proof combination approach is consistent with the current trend to use formal methods for selective proof objectives, rather than full proof of code; thus long proofs at the detail of low level code are avoided.
• The approach is particularly suitable for software constructed from commercial-off-the-shelf (COTS) software components, avoiding the difficulties associated with a full proof-of-code approach.
• The models allow re-use of component test results. A simple example has been given to show how this re-use can provide failure probability estimation for software systems built from COTS components.
• The new models suggest statistical testability as a new system complexity metric. Low complexity systems are defined as those that require fewer tests to justify a given level of reliability. Software that is highly testable in this sense, using tests of a reasonable length, is therefore desirable. This may have important implications for future designs of safety-critical software. Just as some software design solutions have desirable low computational complexity compared to others implementing the same requirements, so some designs will have higher testability.

Research in CBSST is at a very early stage of development, and its practicality is currently speculative. This paper has attempted to identify open research questions that will prove important for the successful future application of the technique.

Acknowledgments

The work presented in this paper comprises aspects of studies (DDT project) performed as part of the UK Nuclear Safety Research programme, funded and controlled by the Industry Management Committee.

Biography

J. H. R. May, Ph.D., Lecturer, Safety Systems Research Centre (SSRC), Dept. Computer Science, University of Bristol, UK; telephone: +44 (0)117 954-5141; facsimile: +44 (0)117 954-5208; e-mail: j.may@bristol.ac.uk.
John May has researched software reliability for over 10 years, in industrial collaborative projects. He is currently working on new software reliability models for software built with re-used components.

References

1. Dijkstra E. "Notes on Structured Programming" in Structured Programming, Dahl O, Dijkstra E, Hoare C (Eds.), Academic Press, 1972
2. Kanoun K, Laprie JC, Thevenod-Fosse P "Software reliability: state-of-the-art and perspectives" LAAS Report 95205, May 1995. http://www.laas.fr/laasve/
3. Hughes G, May JHR, Lunn AD "Reliability estimation from appropriate testing of plant protection software" IEE Software Engineering Journal, v10 n6, November 1995
4. Miller WM, Morell LJ, Noonan RE, Park SK, Nicol DM, Murrill BW, Voas JM "Estimating the probability of failure when testing reveals no failures" IEEE Trans. on Software Engineering, v18 n1, 1992
5. Musa JD "Operational profiles in software reliability engineering" IEEE Software, 10(2), 1993
6. Zhu H, Hall PAV, May JHR "Software Unit Test Coverage and Adequacy" ACM Computing Surveys, 1997
7. Goseva-Popstojanova K, Trivedi KS "Architecture-based approach to reliability assessment of software systems" Performance Evaluation, v45 n2-3, July 2001
8. May JHR, Lunn AD "New Statistics for Demand-Based Software Testing" Information Processing Letters, 53, 1995
9. May JHR, Kuball S, Hughes G "Test Statistics for System Design Failure" International Journal of Reliability, Quality and Safety Engineering, v6 n3, 1999, pp. 249-264
10. Kuball S, May JHR, Hughes G "Structural software reliability estimation" Lecture Notes in Computer Science v1698, 'Computer Safety, Reliability and Security', Felici, Kanoun, Pasquini (Eds.), pp. 336-349, ISBN 3540664882, Springer, 1999
11. Kuball S, May JHR, Hughes G "Building a System Failure Rate Estimator by Identifying Component Failure Rates" Procs. of the 10th Int. Symposium on Software Reliability Engineering (ISSRE'99), Boca Raton, Florida, Nov 1-4, 1999, pp. 32-41, IEEE Computer Society, 1999
12. Kuball S, May JHR, Hughes G "Software Reliability Assessment for Branching Structures: A Hierarchical Approach" 2nd Int. Conference on Mathematical Methods in Reliability, vol 2, Bordeaux, July 4-7, 2000
13. May JHR, Lunn AD "A Model of Code Sharing for Estimating Software Failure on Demand Probabilities" IEEE Trans. on Software Engineering, SE-21(9), 1995
14. Hamlet D, Mason D, Woit D "Theory of software reliability based on components" Procs. of the 23rd International Conference on Software Engineering (ICSE 2001), pp. 12-19, May 2001, Toronto, Ontario, Canada, IEEE Computer Society, 2001, ISBN 0-7695-1050-7
15. Krishnamurthy S, Mathur P "On the estimation of reliability of a software system using reliabilities of its components" Procs. of the 8th Int. Symposium on Software Reliability Engineering (ISSRE'97), Albuquerque, New Mexico, November 1997
16. Woit DM, Mason DV "Software component independence" Procs. 3rd IEEE High-Assurance Systems Engineering Symposium (HASE'98), Washington DC, Nov 1998
17. Gokhale SS, Trivedi KS "Dependency Characterization in path-based approaches to architecture-based software reliability prediction" Procs. Symposium on Application-Specific Systems and Software Technology (ASSET'98), pp. 86-89, Richardson, TX, March 1998
18. Goseva-Popstojanova K, Trivedi KS "Failure correlation in software reliability models" IEEE Trans. on Reliability, 49(1), pp. 37-48, 2000

Appendix

The proof below is for 2 components.
The proof of the result for 3 components follows the same procedure; the manipulation of the limits of integration is more involved in the 3-component case. The N-component problem remains open.

For given θ1 and θ2, in the case of exclusive failures showing extreme clustering, (eq. 4) describes the probability of a failure-free sequence of N tests when N is very large and dwarfs the number of components (which in this case is 2). This can be seen from inspection of the fail line in section "Series execution of components with non-SUM interaction," and by noting that f1(0 | θi, N) = (1 − θi)^N, where 0 denotes zero failures.

P(0 | θ1, θ2, N) = MIN{(1 − θ1)^N, (1 − θ2)^N}    (eq. 4)

An expression for f2(Λ | 0, N) is shown in (eq. 5):

f2(Λ | 0, N) = ∫_{Λ = θ1 + θ2} h(θ1, θ2 | 0, N) dθ1 dθ2    (eq. 5)

which can be evaluated using (eq. 6):

h(θ1, θ2 | 0, N) = P(0 | θ1, θ2, N) g(θ1, θ2 | N) / P(0 | N)    (eq. 6)

where g is a prior for (θ1, θ2) and P(0 | N) = ∫0^1 ∫0^1 P(0 | θ1, θ2, N) g(θ1, θ2 | N) dθ1 dθ2.

Using g(θ1, θ2 | N) = 1 to express no prior preference on the location of (θ1, θ2) within [0, 1]^2, it follows that h(θ1, θ2 | 0, N) = k · MIN{(1 − θ1)^N, (1 − θ2)^N}, where 1/k = ∫0^1 ∫0^1 MIN{(1 − θ1)^N, (1 − θ2)^N} dθ1 dθ2. This gives (eq. 7):

h(θ1, θ2 | 0, N) = ((N + 1)(N + 2)/2) MIN{(1 − θ1)^N, (1 − θ2)^N}    (eq. 7)

Substituting θ2 = Λ − θ1, (eq. 5) becomes (eq. 8):

f2(Λ | 0, N) = ∫0^Λ h(θ1, Λ − θ1 | 0, N) dθ1 ; Λ ∈ [0, 1]
f2(Λ | 0, N) = ∫_{Λ−1}^1 h(θ1, Λ − θ1 | 0, N) dθ1 ; Λ ∈ (1, 2]    (eq. 8)

which, splitting each integral at θ1 = Λ/2 (where the MIN changes branch), can be rewritten as (eq. 9):

f2(Λ | 0, N) = ((N + 1)(N + 2)/2)[∫0^{Λ/2} (1 − Λ + θ1)^N dθ1 + ∫_{Λ/2}^Λ (1 − θ1)^N dθ1] ; Λ ∈ [0, 1]
f2(Λ | 0, N) = ((N + 1)(N + 2)/2)[∫_{Λ−1}^{Λ/2} (1 − Λ + θ1)^N dθ1 + ∫_{Λ/2}^1 (1 − θ1)^N dθ1] ; Λ ∈ (1, 2]    (eq. 9)

which evaluates to (eq. 10):

f2(Λ | 0, N) = (N + 2)[(1 − Λ/2)^(N+1) − (1 − Λ)^(N+1)] ; Λ ∈ [0, 1]
f2(Λ | 0, N) = (N + 2)(1 − Λ/2)^(N+1) ; Λ ∈ (1, 2]    (eq. 10)
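As a quick numerical cross-check of this derivation (a sketch using Python and scipy; not part of the original proof), the closed form (eq. 10) can be compared against direct numerical integration of (eq. 9), and its total mass over [0, 2] confirmed to be 1:

from scipy.integrate import quad

N = 50  # any modest test count will do for the check

def f2_closed(lam: float) -> float:
    """Closed form (eq. 10), both branches."""
    if lam <= 1.0:
        return (N + 2) * ((1 - lam / 2) ** (N + 1) - (1 - lam) ** (N + 1))
    return (N + 2) * (1 - lam / 2) ** (N + 1)

def f2_integral(lam: float) -> float:
    """Direct numerical evaluation of (eq. 9)."""
    c = (N + 1) * (N + 2) / 2
    lo = 0.0 if lam <= 1.0 else lam - 1.0
    hi = lam if lam <= 1.0 else 1.0
    part1, _ = quad(lambda t: (1 - lam + t) ** N, lo, lam / 2)
    part2, _ = quad(lambda t: (1 - t) ** N, lam / 2, hi)
    return c * (part1 + part2)

total_mass, _ = quad(f2_closed, 0, 2, points=[1.0])
print(total_mass)                          # ~1.0: (eq. 10) is a pdf over [0, 2]
print(f2_closed(0.3), f2_integral(0.3))    # the two forms agree on [0, 1]
print(f2_closed(1.4), f2_integral(1.4))    # and on (1, 2]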