Engineering Judgement
Martyn Thomas
Visiting Professor of Software Engineering, Oxford University Computing Laboratory
martyn@thomas-associates.co.uk
ASCSC 2004 Brisbane Workshop

Engineering Judgement
“When I hear the words ‘engineering judgement’ I know they are just going to make up numbers.”
Richard Feynman, 1988

The argument in brief
• Almost all safety-related systems have target probabilities of dangerous failure per hour (pfh) below 10^-5.
• Assuring such a pfh would require evidence that is rarely available at the time of certification.
• Assessors therefore rely on their engineering judgement. In effect, they make up numbers.
• Accepting that this is inevitable, we need to make radical changes in the way we develop, maintain and certify systems.

Safety Integrity Levels
IEC 61508, high demand or continuous mode of operation:

Safety integrity level   Probability of a dangerous failure per hour
4                        10^-9 to 10^-8
3                        10^-8 to 10^-7
2                        10^-7 to 10^-6
1                        10^-6 to 10^-5

Even SIL 1 is beyond reasonable assurance by testing: it would take more than 10 years under operational conditions, with no failures and no modifications.
What sense does it make to attempt to distinguish single factors of 10 in this way? Do we really know so much about the effect of different development methods on product failure rates? Of course not!

What would provide adequate evidence for 10^-5 pfh?
• Sufficient operational measurements
• Proof of correct implementation of a correct specification
What do we actually use?
• Testing
• Process-based evidence
• Compliance with standards

Sufficient Operational Measurements
• For 10^-n pfh, at least 10^n hours without unsafe failure or modification.
• Such criteria are used for ETOPS certification of aircraft engines.
• Such an approach is impractical for most safety-related transport systems.

Proof of Correctness
• Proof is an important form of verification. It can show that a system meets its specification, but provides no absolute information about the probability of unsafe failure.
• It is very difficult to prove that all possible unsafe system states have been considered.
• Full formal proof is very expensive.

What do we actually do?
• Testing
• Process-based evidence
• Compliance with standards

Testing
• What can testing tell us?
  – If the tests were statistically representative of operational use, then sufficiently many tests would demonstrate the pfh.
  – If a mathematical analysis had established equivalence classes, then testing one member of each class would allow an inductive proof that there could be no failures.
  – How the system behaves on the tests.
  – … nothing else.

Process-based evidence
• Good processes do not guarantee safe products, but poor processes almost guarantee unsafe ones.
• Good processes are essential if you need to trust their output (e.g. version control).
• The output from a good process may provide useful evidence.
  – For example, if you can trust a proof process, the proof may tell you something about the system’s properties.

Compliance with standards
“The nice thing about standards is that there are so many to choose from.” Andrew Tanenbaum
• Standards result from negotiation in committee, often with strong vested interests from industry.
  – It would be surprising if they represented best practice…
  – … and astonishing if they led to radical improvements.
• Much effort goes into meeting standards that would be better spent on improving safety.
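The operational-measurement criterion ("for 10^-n pfh, at least 10^n hours without unsafe failure or modification") and the SIL bands can be checked with a little arithmetic. The following is a minimal Python sketch, assuming a constant failure rate (so unsafe failures follow a Poisson process) and zero observed failures: to claim pfh ≤ λ at confidence C, the failure-free time t must satisfy e^(-λt) ≤ 1 - C, i.e. t ≥ -ln(1 - C)/λ. The names sil_for_pfh, failure_free_hours and HOURS_PER_YEAR are hypothetical, introduced only for illustration.

```python
import math

HOURS_PER_YEAR = 8760  # assumption: 365-day year of continuous operation

# IEC 61508 high-demand/continuous-mode bands (lower bound inclusive,
# upper bound exclusive), as listed on the Safety Integrity Levels slide.
SIL_BANDS = {
    4: (1e-9, 1e-8),
    3: (1e-8, 1e-7),
    2: (1e-7, 1e-6),
    1: (1e-6, 1e-5),
}


def sil_for_pfh(pfh):
    """Return the SIL whose band contains pfh, or None if outside SIL 1-4."""
    for sil, (lower, upper) in SIL_BANDS.items():
        if lower <= pfh < upper:
            return sil
    return None


def failure_free_hours(pfh, confidence):
    """Failure-free operating hours needed to claim a failure rate <= pfh.

    Assumes a constant failure rate (Poisson model), zero observed failures
    and no modification during the whole observation period.
    """
    return -math.log(1.0 - confidence) / pfh


if __name__ == "__main__":
    rule_of_thumb = 1.0 - math.exp(-1.0)  # about 63%: gives exactly 1/pfh hours
    for conf in (rule_of_thumb, 0.99):
        hours = failure_free_hours(1e-5, conf)
        print(f"pfh 1e-5 at {conf:.0%} confidence: {hours:,.0f} h "
              f"(about {hours / HOURS_PER_YEAR:.1f} years)")
    # An MTBF of one year corresponds to a pfh of about 1.1e-4.
    print(sil_for_pfh(1.0 / HOURS_PER_YEAR))
```

Under this model the 10^n-hour rule of thumb corresponds to only about 63% confidence (roughly 11.4 years at 10^-5 per hour); 99% confidence needs about 4.6 times as long, roughly 52.6 years of failure-free, unmodified operation. The last line shows that an MTBF of one year falls outside every SIL band.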
An aside on SIL 0
• If your safety argument allows the use of components with pfh > 10^-5, then IEC 61508 assumes that normal industrial software will be good enough. That is absurd.
  – Little industrial/commercial software has an MTBF approaching one year…
  – … nor does it come with a safety analysis or failure history.
• I believe that all safety-related software should be developed to higher standards than almost all industrial software has been to date.

An aside about maintenance
• In principle, any system change invalidates all the operational history of that system, unless you can prove that the change has a restricted impact (which, typically, you cannot).
• So should all the original assurance activities be repeated?
  – Obviously yes, although some of the outputs may be reusable.
• Does this happen? Not in my experience.
• It seems likely that we shall see an increasing number of incidents caused by defects introduced during maintenance.

Safety Assurance: the state of practice
• There is insufficient empirical evidence to justify even the pfh associated with SIL 1, to 99% confidence.
• Development methods and tools in common use are too informal to support reasoning about correctness.
• So most attention is given to process issues and conformance with standards, despite their very weak causal link with safety.
• We usually get away with it because people are very careful and try very hard (and very expensively).
• It seems unlikely that this approach will scale up.

We are like the barber-surgeons of earlier ages, who prided themselves on the sharpness of their knives and the speed with which they dispatched their duty, either shaving a beard or amputating a limb.
Imagine the dismay with which they greeted some ivory-towered academic who told them that the practice of surgery should be based on a long and detailed study of human anatomy, on familiarity with surgical procedures pioneered by great doctors of the past, and that it should be carried out only in a strictly controlled bug-free environment, far removed from the hair and dust of the normal barber’s shop.
(Professor Sir Tony Hoare, 1984)

A possible future
• Greater rigour with minimal innovation
• Minimal defect construction
• Maintenance as the central activity
• Licensing of independent safety assessors
• New-generation Safe COTS components
• Regulation to drive radical change

Greater rigour with minimal innovation
• Our systems are among the most complex ever attempted. We must adopt the power of mathematics to master that complexity.
• “A good scientist is a person with original ideas. A good engineer is a person who makes a design that works with as few original ideas as possible. There are no prima donnas in engineering.” Freeman Dyson, 2001

Minimal defect construction
• Dijkstra observed in 1972 that most of the cost of developing software came from the effort required to remove the defects.
• Praxis’ Correct by Construction (CbC) methods are delivering fewer than 0.04 defects/KLoC with a productivity of more than 25 LoC per person-day.
• That should become the benchmark for professional work in safety-related systems. If your methods do not deliver such high quality at such low cost, change to CbC.

Maintenance as the central activity
• A successful system will spend far more time being used and maintained than being developed.
• Our development methods and tools, and our assessment and certification protocols, should focus on safe and cost-effective maintenance.
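To make the Correct by Construction benchmark on the Minimal defect construction slide concrete, the sketch below applies its two figures (fewer than 0.04 defects per KLoC, more than 25 LoC per person-day) to a system of a given size. The function names cbc_benchmark_bounds and meets_cbc_benchmark, and the 100 KLoC example size, are hypothetical. Because the slide quotes the figures as bounds, the results are upper limits on residual defects and on effort, not predictions.

```python
DEFECTS_PER_KLOC = 0.04   # slide figure: fewer than this many defects per KLoC
LOC_PER_PERSON_DAY = 25   # slide figure: more than this many LoC per person-day


def cbc_benchmark_bounds(size_loc):
    """Upper bounds on residual defects and on effort (person-days)
    for a system of size_loc lines that meets the CbC benchmark."""
    max_defects = size_loc / 1000 * DEFECTS_PER_KLOC
    max_effort_days = size_loc / LOC_PER_PERSON_DAY
    return max_defects, max_effort_days


def meets_cbc_benchmark(size_loc, defects_found, effort_days):
    """Hypothetical check: does a project's record beat both benchmark figures?"""
    max_defects, max_effort_days = cbc_benchmark_bounds(size_loc)
    return defects_found < max_defects and effort_days < max_effort_days


if __name__ == "__main__":
    defects, days = cbc_benchmark_bounds(100_000)  # hypothetical 100 KLoC system
    print(f"fewer than {defects:.0f} residual defects, "
          f"fewer than {days:,.0f} person-days")
```

For a hypothetical 100 KLoC system, the benchmark means fewer than 4 residual defects and fewer than 4,000 person-days of effort. Keeping the two figures as named constants makes it easy to substitute a project's own measurements.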
Licensing of independent safety assessors
• Even with far better methods and tools, safety assessment and certification will continue to depend on judgement.
• We need to enforce standards of competence (education, training and experience) for the people whom society trusts to take such decisions.

New-generation Safe COTS components
• Most COTS components have not been developed to be highly dependable, and they do not come with the evidence needed to allow adequate safety assessment.
• We could redevelop the entire suite of core COTS components for a few billion dollars.
• This would be a worthwhile focus for international engineering collaboration.

Conclusion
• Current practices cannot be justified: they are unsafe and/or too expensive (either way, not ALARP).
• Radical change must be created: progress is too slow.
• Software engineers need competence in mathematics (discrete and continuous) and statistics, as part of the core curriculum.
• All safety-related systems should be formally specified and developed using fully defined languages supported by powerful static analysis tools: not C or C++.
• Safety assessment should be based on the best practicable evidence, evaluated by a licensed assessor.
• Core COTS components must be re-implemented properly, or avoided.