TOWARD A THEORY OF TEST DATA SELECTION

John B. Goodenough
Susan L. Gerhart†
SofTech, Inc., Waltham, Mass.

* Supported in part under Contract DAAA25-74-C0469.
† Now at Duke University, Durham, N.C.

Keywords and Phrases: testing, proofs of correctness

Abstract

This paper examines the theoretical and practical role of testing in software development. We prove a fundamental theorem showing that properly structured tests are capable of demonstrating the absence of errors in a program. The theorem's proof hinges on our definition of test reliability and validity, but its practical utility hinges on being able to show when a test is actually reliable. We explain what makes tests unreliable (for example, we show by example why testing all program statements, predicates, or paths is not usually sufficient to insure test reliability), and we outline a possible approach to developing reliable tests. We also show how the analysis required to define reliable tests can help in checking a program's design and specifications as well as in preventing and detecting implementation errors.

1. Introduction

The purpose of this paper is:

• to survey the purpose and limitation of testing as presently conceived,
• to demonstrate a systematic procedure for developing valid and reliable test data,
• to establish a view of the purposes and limitations of testing that will permit testing methods to be systematically improved.

The fundamental questions examined throughout this paper are:

• What are the possible sources of failure in a program?
• What test data should be selected to demonstrate that failures do not arise from these sources?

In the remainder of this Introduction, we define some basic concepts relevant to software testing and lay the basis for a theory of test data selection. In Section 2, we discuss two examples of programs published in the literature which contain errors. Our purpose there is to illustrate the problems a theory of testing must deal with. In Section 3, we examine what others have said about the goals, methods, and difficulties of program testing before turning attention to our proposed method, which we present in Section 4. In Section 5, we apply our theoretical concepts to the results of Section 4.

1.1 Fundamental Testing Concepts

The purpose of testing is to determine whether a program contains any errors. An ideal test, therefore, succeeds only when a program contains no errors. In this paper, one of our goals is to define the characteristics of an ideal test in a way that gives insight into problems of testing. We begin with some basic definitions.

Consider a program F whose input domain is the set of data D. F(d) denotes the result of executing F with input d ∈ D. OUT(d, F(d)) specifies the output requirement for F, i.e., OUT(d, F(d)) is true if and only if F(d) is an acceptable result. We will write OK(d) as an abbreviation for OUT(d, F(d)). Let T be a subset of D. T constitutes an ideal test if OK(t) for all t ∈ T implies OK(d) for all d ∈ D, i.e., if from successful execution of a sample of the input domain we can conclude the program contains no errors, then the sample constitutes an ideal test. Clearly, the validity of this conclusion depends on how "thoroughly" T exercises F. Most papers on testing [e.g., Poole (1973), Stucki (1974)] equate a thorough test with one that is exhaustive, i.e., one for which T = D. But such a definition gives no insight into problems of test data selection. Instead, we define a "thorough" test, T, to be one satisfying COMPLETE(T, C), where COMPLETE is a predicate that defines, in effect, how some data selection criterion, C, is used in selecting a particular set of test data T. C defines what properties of a program must be exercised to constitute a "thorough" test, i.e., one whose successful execution implies no errors in a tested program. COMPLETE insures that T satisfies all these properties. (We will discuss later, in Section 5, the pragmatic reasons for using an incomplete test.)

The data selection criterion C must be defined so tests satisfying COMPLETE(T, C) produce consistent and meaningful results. We express this requirement by saying C must be reliable and valid. In general, reliability refers to the consistency with which results are produced, regardless of whether the results are meaningful. In particular, a data selection
criterion is reliable if and only if every T satisfying COMPLETE(T, C) is processed successfully by F, or if every such T is unsuccessfully processed (see Figure 1). In short, to be reliable, C must insure selection of tests that are consistent in their ability to reveal errors, as opposed to necessarily being able to detect all errors. Note that if C is reliable, it is only necessary to test one complete set of test data; no further information will be derived from testing other complete sets of test data.

    Given a program F, with domain D, output requirement
    OK(d) = OUT(d, F(d)) and test data selection criterion C:

    (1) SUCCESSFUL(T) = (∀t ∈ T) OK(t)

    (2) RELIABLE(C) = (∀T1, T2 ⊆ D)(COMPLETE(T1, C) ∧ COMPLETE(T2, C)
                          ⇒ (SUCCESSFUL(T1) ≡ SUCCESSFUL(T2)))

    (3) VALID(C) = (∀d ∈ D)(¬OK(d)
                          ⇒ (∃T ⊆ D)(COMPLETE(T, C) ∧ ¬SUCCESSFUL(T)))

    Fundamental Theorem

    (∃T ⊆ D)(∃C)(COMPLETE(T, C) ∧ RELIABLE(C) ∧ VALID(C) ∧ SUCCESSFUL(T))
        ⇒ (∀d ∈ D) OK(d)

    Figure 1. Formal Definitions and the Fundamental Theorem of Testing

Validity, in contrast to reliability, customarily refers to the ability to produce meaningful results, regardless of how consistently such results are produced. Hence a test data selection criterion is valid to the extent it does not forbid selection of test data capable of revealing some error. C is valid if and only if for every error in a program, there exists a complete set of test data capable of revealing the error. (See Figure 1 for a formal definition.) But note: validity does not imply that every complete test is necessarily capable of detecting every error in a program. This is the case only if the data selection criterion is reliable as well as valid. Validity only implies it is possible to select data that will reveal an error; it does not guarantee that such data will be selected.

The formal definitions of RELIABLE and VALID given in Figure 1 merely state precisely what we already have said informally about RELIABLE(C) and VALID(C). To show the utility of the definitions, we use them in stating and proving the Fundamental Theorem on which all testing is based (see Figure 1). Its proof is simple: Assume there exists some d ∈ D for which F fails (i.e., ¬OK(d)). Then VALID(C) implies there exists a complete set of test data, T, that is not successful. RELIABLE(C) implies that if one complete test fails, all fail. But this contradicts the theorem's premise, i.e., that there exists a complete test that is successfully executed. Q.E.D.

The theorem is called "basic" not because it is deep (obviously we have constructed our definitions of RELIABLE, VALID and COMPLETE to permit the theorem to be proved). Instead the theorem is basic because of the insight it gives into the concepts of test reliability and validity, and these concepts are basic to the construction of "thorough" tests. The theorem merely demonstrates that tests satisfying COMPLETE(T, C) where C satisfies RELIABLE and VALID are "thorough" in the appropriate sense. Note that proving a data selection criterion to be reliable and valid, and then finding and successfully executing a complete test satisfying this criterion is just a way of proving the correctness of the program. In effect, the theorem states that in some cases, a test is a proof of correctness.

The proof of a selection criterion's validity and reliability is easy in some cases. For example, if C is defined so the only complete test is an exhaustive one, i.e., if COMPLETE(T, C) ⇒ (T = D), then C obviously satisfies RELIABLE and VALID. Another interesting example is when C is unsatisfiable by any d ∈ D. Then T will be empty, i.e., no testing will be done. In this case, such a C clearly satisfies RELIABLE(C). Proof of C's validity, however, is more difficult. In fact, such a proof exists if and only if the program contains no errors. Hence in this case, proof of C's validity is equivalent to a direct proof of the program's correctness.

A proof of validity is trivial, however, if C does not exclude any member of D from some set of test data, i.e., if it can be shown that for all d ∈ D, there exists a T containing d and satisfying COMPLETE(T, C). In this case, some testing must be performed, and in general, the proof of C's reliability is not trivial. The remainder of the paper will concentrate on what must be known about programs to insure reliability in this case, and in Section 5 we will give some guidelines for finding non-trivial reliable test data selection criteria. Also in Section 5, we will give a specific example of a type of data selection criterion and its corresponding definitions of COMPLETE, RELIABLE, and VALID. But to motivate this example, we first need to look at sources of errors in programs, both in general (Section 1.2) and with reference to example programs (Section 2).
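When D is small enough to enumerate, the definitions of SUCCESSFUL, RELIABLE and VALID can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the domain D, the faulty program F, the predicate ok, and the three criteria are hypothetical examples of ours, and a criterion C is represented simply by its COMPLETE predicate on subsets of D.

```python
from itertools import combinations

def subsets(domain):
    """Yield every subset T of a finite domain D."""
    items = sorted(domain)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def successful(T, ok):
    """SUCCESSFUL(T): OK(t) holds for every t in T."""
    return all(ok(t) for t in T)

def reliable(domain, complete, ok):
    """RELIABLE(C): all complete tests succeed, or all of them fail."""
    outcomes = {successful(T, ok) for T in subsets(domain) if complete(T)}
    return len(outcomes) <= 1

def valid(domain, complete, ok):
    """VALID(C): if some d fails, some complete test is unsuccessful."""
    failing_test_exists = any(
        complete(T) and not successful(T, ok) for T in subsets(domain)
    )
    return all(ok(d) or failing_test_exists for d in domain)

# Hypothetical example: F should return |d| but returns d unchanged.
D = range(-2, 3)

def F(d):
    return d

def ok(d):
    """OK(d) = OUT(d, F(d)): the result must equal the absolute value."""
    return F(d) == abs(d)

# Each criterion C is given by its COMPLETE(T, C) predicate (assumed forms).
criteria = {
    "any single datum": lambda T: len(T) == 1,
    "positive data only": lambda T: len(T) > 0 and all(t > 0 for t in T),
    "one negative and one positive": lambda T: (
        any(t < 0 for t in T) and any(t > 0 for t in T)
    ),
}

for name, complete in criteria.items():
    print(name, reliable(D, complete, ok), valid(D, complete, ok))
```

For this faulty F, the first criterion is valid but not reliable, the second is reliable but not valid, and the third is both, so no complete test for the third criterion can succeed, as the theorem requires.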
1.2 Types of Program Errors

In general, software errors can be classed as performance errors (failure to produce results within specified or desired time and space limitations) and logic errors (production of incorrect results independent of the time and space required). It is useful to distinguish various types of logic errors:

• construction errors (failure to satisfy a specification through error in an implementation);
• specification errors (failure to write a specification that correctly represents a design);
• design errors (failure to satisfy an understood requirement);
• requirements errors (failure to satisfy the real requirement).

Thorough tests must be able to detect errors arising for any of these reasons, and this means that test data selection criteria must reflect information derived from each stage of software development. Ultimately, however, each type of logic error is manifested as an improper effect produced by an implementation, and for this reason, it is useful to categorize sources of errors in implementation terms:

• Missing Control Flow Paths - This type of error arises from failure to test a particular condition, and hence results in the execution (or non-execution) of inappropriate actions. For example, failure to test for a zero divisor before executing a division may be a missing path error. Other examples will be given later. This type of error results from failing to see that some condition or combination of conditions requires a unique sequence of actions to be handled properly. When a program contains this type of error, it may be possible to execute all control flow paths through the program without detecting the error. This is why exercising all program paths does not constitute a reliable test.

• Inappropriate Path Selection - This type of error occurs when a condition is expressed incorrectly, and therefore, an action is sometimes performed (or omitted) under inappropriate conditions. For example, writing IF A instead of IF A AND B means that when A is true and B false, an inappropriate action will be taken or omitted. When a program contains this type of error, it is quite possible to exercise all statements and all branch conditions without detecting the error. This error can also occur not merely through failure to test the right combination of conditions, but through failure to see that the method of test is not adequate, e.g., testing for the equality of three numbers by writing (X + Y + Z)/3 = X.

• Inappropriate or Missing Action - Examples are calculating a value using a method that does not necessarily give the correct result (e.g., W*W instead of W+W), or failing to assign a value to a variable, or calling a function or procedure with the wrong argument list. Some of these errors are revealed when the action is executed under any circumstances. Requiring all statements in a program to be executed will catch such errors. But sometimes, the action is incorrect only under certain combinations of conditions; in this case, merely exercising the action (or the part of the program where a missing action should appear) will not necessarily reveal the error. For example, this is the case if W*W is written instead of W+W.

This classification of errors is useful because our goal is to detect errors by constructing appropriate tests. Other classifications are useful for other purposes, e.g., to understand the effect of a programming language on software reliability [Rubey (1968)], or to understand why errors occur [Boehm (1975)]. But insight into test reliability is given by the proposed classification. For example, consider the test data selection criterion, "Choose data to exercise all statements and branch conditions in an implementation." In evaluating the reliability of this criterion, we would ask, "Will all construction, specification, design, and requirements errors always be detected by exercising programs with data satisfying this criterion?" Clearly, if a design error, for example, is manifested as a missing path in an implementation, then this criterion for test data selection will not be reliable. So our typology of implementation errors is useful to help understand factors impairing test reliability.

2. Examples of Program Errors

In this section we will discuss errors in two programs that have appeared in the published literature to illustrate the kinds of problems testing must deal with and to lay the groundwork for our proposed testing methodology.

2.1 Example 1: A Simple Text Reformatter

In the article "Programming by Action Clusters," P. Naur (1969) presents the following problem:

"Given a text consisting of words separated by BLANKS or by NL (new line) characters, convert it to a line-by-line form in accordance with the following rules:

(1) line breaks must be made only where the given text has BLANK or NL;
(2) each line is filled as far as possible as long as
(3) no line will contain more than MAXPOS characters."

Naur's purpose is, in part, to justify an approach to algorithm construction "on the basis of General Snapshots needed to prove the algorithm." He does not intend to present a formal proof, but he does present "prescriptions" (assertions) and uses them to guide and justify the construction of action clusters in a top-down manner. So here we have a published article which attempts to use the top-down design method and the guidance of program proving, but the program (see Figure 2) turns out to have at least seven problems.

N1: The program does not terminate when the end of the given text is reached, although Naur provides for termination (through the undefined language construct "Alarm") if a word containing more than MAXPOS characters (an oversize word) is seen. (Note that in this case the specification cannot be satisfied.) Non-termination on end of text will, of course, be discovered with any test data not containing an oversize word. The effect of processing an oversize word, however, would only be seen if test data contained such a word. A reliable test methodology will insure that oversize words are presented to the program (even though they are not mentioned in the specification), since such data are not excluded from the program's input domain and the program will not necessarily process oversize words correctly if it processes shorter words correctly.
        bufpos := 0;
        outcharacter(LF);
        fill := 0;
    next character:
        incharacter(CW);
        if CW = BLANK ∨ CW = LF
        then begin
            if fill + 1 + bufpos ≤ MAXPOS
            then begin
                outcharacter(BLANK);
                fill := fill + 1 end
            else begin
                outcharacter(LF);
                fill := 0 end;
            for k := 1 step 1 until bufpos do
                outcharacter(buffer[k]);
            fill := fill + bufpos;
            bufpos := 0 end
        else
            if bufpos = MAXPOS
            then Alarm
            else begin
                bufpos := bufpos + 1;
                buffer[bufpos] := CW end;
        goto next character;

    Figure 2. Naur's original program

N2: The last word in the text will not be output unless it is followed by a BLANK or NL.

N3: A blank will appear before the first word on the first line except when the first word is exactly MAXPOS characters long. This can cause a violation of constraint (2) for the first line. The reason for this error is clear. After creating action clusters for a word buffer, Naur makes the following assertion: "The input character preceding the one held in buffer[1] was a BLANK or NL. This has not been output" (p. 252). This assertion is false for the first word and is never disproved. Thus a boundary type of case (first character, first word) causes a proof error. Of course this error will be found for any test data whose first word is less than MAXPOS characters long.

N4: When the first word is MAXPOS characters long, it will be preceded by two line breaks in the output, even though an empty line violates constraint (2).

N5: No provision is made for processing successive adjacent breaks (e.g., two blanks). This error arises because a word is defined as the characters (other than NL or BLANK) appearing between successive NL or BLANK characters, and words of zero length are simply not considered in any discussion of the program's assertions. Specification (2) requires as many words as possible on a line, and this specification makes no sense if zero-length words are permitted. So an important case has not been considered in either the program, the input description, or the program's informal proof. Naur's suggested test data for this program appear to maintain this implicit assumption about the form of the input (i.e., no consecutive BLANKS or NL characters) and so do not reveal the error. How can this sort of error be discovered through a systematic approach to testing?

N6: If the first word of the input text is preceded by a BL or NL, the output will contain either two blanks preceding the first word or a line containing just two blanks, violating constraint (2).

N7: The specifications use NL as the newline character but the program uses LF. This error probably arises from failure to proofread, but could also be traced to a failure to specify the character set of the problem. If LF and NL are distinct characters, then any input text containing a NL or LF character will reveal the problem.

What can we conclude from this example?

1. Top-down construction and very informal proofs do not always prevent program errors. Nor does the reading of the description of the program guarantee that errors will be caught. Presumably this paper went through some review process and was proofread, yet the errors persisted. The reviewer in Computing Reviews (Review 19,420) mentioned only the gratuitous blank preceding the first word (error N3). London (1971) corrected errors N1, N3, N4, and N7 but not errors N2, N5, and N6, even though he "proves" his version of the program correct.

2. If this program had been coded and run on some test data, such as that used to illustrate the program output (p. 251), errors N1, N2, N3, and N7 would have been detected. Errors N4, N5, and N6 would not have been revealed.

Since we wish to use this example to illustrate other types of possible errors and then again in Section 4 to illustrate a general method for selecting test data, we will clean up the specifications and program:

Given an input text having the following properties:

I1: It is a stream of characters, where the characters are classified as break and non-break characters. A break character is a BL (blank), NL (new line indicator), or ET (end-of-text indicator).
I2: The final character in the text is ET.
I3: A word is a nonempty sequence of non-break characters.
I4: A break is a sequence of one or more break characters.

(Thus the input can be viewed as a sequence of words separated by breaks with possibly leading and trailing breaks, and ending with ET.)

The program's output should be the same sequence of words as in the input with the following properties:

O1: A new line should start only between words and at the beginning of the output text, if any;
O2: A break in the input is reduced to a single break character in the output;
O3: As many words as possible should be placed on each line (i.e., between successive NL characters);
O4: No line may contain more than MAXPOS characters (words and BLs);
O5: An oversize word (i.e., a word containing more than MAXPOS characters) should cause an error exit from the program (i.e., a variable Alarm should have the value TRUE).

The corrected version of Naur's program appears in Figure 3. First let's look at the corrections for errors N1 through N7.

N1, N2: The endless loop constructed with a goto in Naur's program has been replaced with a repeat-until having Alarm as a Boolean variable and ET as an end-of-text indicator.

N3: The condition fill ≠ 0 has been conjoined with the condition fill + bufpos < MAXPOS to prevent the output of a BL before the first word and to produce a NL instead. The condition fill = 0 holds only for the first line, since every other line will contain at least one word. Equally well, we could have initialized fill to the value MAXPOS, insuring that fill + bufpos < MAXPOS would be false the first time.

N4: Line 2 of Naur's program, outcharacter(NL), is removed so that only one NL character will precede the first word of the output text.

N5, N6: An extra predicate, bufpos ≠ 0, prevents output when two consecutive breaks occur; the first break forces the word to be output and bufpos to be reset to zero. Note that breaks preceding the first word of the text will also be ignored.

N7: LF was systematically changed to NL throughout the program.

     1  Alarm := false;
     2  bufpos := 0;
     3  fill := 0;
     4  repeat
     5      incharacter(CW);
     6      if CW = BL ∨ CW = NL ∨ CW = ET
     7      then
     8          if bufpos ≠ 0
     9          then begin
    10              if fill + bufpos < MAXPOS ∧ fill ≠ 0
    11              then begin
    12                  outcharacter(BL);
    13                  fill := fill + 1 end
    14              else begin
    15                  outcharacter(NL);
    16                  fill := 0 end;
    17              for k := 1 step 1 until bufpos do
    18                  outcharacter(buffer[k]);
    19              fill := fill + bufpos;
    20              bufpos := 0 end
    21      else
    22          if bufpos = MAXPOS
    23          then Alarm := true
    24          else begin
    25              bufpos := bufpos + 1;
    26              buffer[bufpos] := CW end
    27  until Alarm ∨ CW = ET

    Figure 3. Corrected version of Naur's program

Note how sometimes several error symptoms are corrected with a single program modification; it's often not clear whether the number of errors in a program should be measured by the number of corrections needed to produce a correct program, or by the number of symptoms discovered.

We will now use this program to show how testing based solely on knowledge of a program's internal structure cannot lead to reliable tests. To make our discussion more readily understandable and to lay the groundwork for further discussion in Section 4, we first cast the corrected program into the form of a limited entry decision table [King (1969), Pooch (1974)] (see Table 1). All predicates of the program are written in the condition stub and all assignment and input/output statements in the action stub. The relevant outcomes of the predicates are represented as Y, N, (Y), (N), or -, where (Y) means the outcome is necessarily true, (N) means the outcome is necessarily false, and - means the outcome can be either true or false. A rule is a single column in the table and specifies a sequence of actions to be performed for a particular condition combination, i.e., a particular set of outcomes of the predicates. For some rules, the outcomes of certain predicates imply that the outcomes of other predicates are either determined (and so of no interest) or are irrelevant. For example, when C4 is true, C6 is necessarily false, and so in rule 1, C6 need not be tested. Predicates whose outcomes are represented as (Y), (N), or - are not actually evaluated in the program represented by the decision table. For example, the table shows that when C1 is Y, conditions C2 and C6 are not evaluated (see King (1969)).

The fact that the conditions specified in the condition stub are not all independent is represented by the condition constraints given at the right side of the table; it is there, for example, that the relation between the truth of C4 and the falsity of C6 is asserted. The predicates associated with the action stubs (e.g., (C1 ∨ C2) ∧ C3) indicate under what conditions the associated action is to be performed. Specifying these conditions is not necessary, but provides useful redundancy in checking the correctness of a table.

The decision table format makes it easier to see how various sets of test data satisfy various test data selection criteria. Below the program are written four sets of test data. Each set assumes MAXPOS = 3 and meets some test data selection criterion based on the program's structure. D1 exercises all statements (by exercising rules 3, 5, 9 and 10), D2 all statements and composite predicates, and D3 all statements and individual predicates.*

* An individual predicate is considered to be exercised when it is necessarily evaluated for some data and it takes on both true and false values. For example, data satisfying rules 1-4 are not considered to exercise C2 because C2 need not be evaluated. C2 is exercised by data satisfying one of rules 5-8 and either rule 9 or rule 10.
Table 1. DECISION TABLE REPRESENTATION OF PROGRAM AND TEST DATA

Initial conditions: ¬C3 ∧ ¬C5 ∧ CW = incharacter(CW) ∧ Alarm = FALSE

                                                    Rule
                                     1    2    3    4    5    6    7    8    9    10
    C1: CW = BL ∨ CW = NL            Y    Y    Y    Y    N    N    N    N    N    N
    C2: CW = ET                     (N)  (N)  (N)  (N)   Y    Y    Y    Y    N    N
    C3: bufpos ≠ 0                   Y    Y    Y    N    Y    Y    Y    N   (Y)   -
    C4: fill + bufpos < MAXPOS       Y    Y    N    -    Y    Y    N    -   (N)   -
    C5: fill ≠ 0                     Y    N    -    -    Y    N    -    -    -    -
    C6: bufpos = MAXPOS             (N)  (N)   -   (N)  (N)  (N)   -   (N)   Y    N

    A1 a: outcharacter(BL)           X                   X
       b: fill := fill + 1           X                   X
    A2 a: outcharacter(NL)                X    X              X    X
       b: fill := 0                       X    X              X    X
    A3 a: for k := 1 until bufpos    X    X    X         X    X    X
       b: fill := fill + bufpos      X    X    X         X    X    X
       c: bufpos := 0                X    X    X         X    X    X
    A4:   Alarm := TRUE                                                      X
    A5 a: bufpos := bufpos + 1                                                    X
       b: buffer[bufpos] := CW                                                    X
    A6 a: incharacter(CW)            X    X    X    X                             X
       b: Repeat table               X    X    X    X                             X
    A7:   Exit table                                     X    X    X    X    X

    Predicates associated with the actions:
    A1: (C1 ∨ C2) ∧ C3 ∧ C4 ∧ C5 ∧ ¬C6
    A2: (C1 ∨ C2) ∧ C3 ∧ (¬C4 ∨ ¬C5)
    A3: (C1 ∨ C2) ∧ C3
    A4: ¬C1 ∧ ¬C2 ∧ C6
    A5: ¬C1 ∧ ¬C2 ∧ ¬C6
    A6: C1 ∨ (¬C1 ∧ ¬C2 ∧ ¬C6)
    A7: C2 ∨ (¬C1 ∧ C6)

    Condition constraints:
    C1 ⇒ ¬C2;  C2 ⇒ ¬C1;  ¬C3 ⇒ ¬C6;  C4 ⇒ ¬C6;  ¬C4 ⇒ (C3 ∨ C5);
    C3 ∧ ¬C5 ⇒ C4;  ¬C5 ∧ C6 ⇒ ¬C4;  C6 ⇒ C3;  C6 ⇒ ¬C4

          Test Data                              Rule Sequence Exercised
    D1.1  A, A, A, A, ET                         10, 10, 10, 9
    D1.2  A, A, A, BL, B, BL, C, ET              10, 10, 10, 3, 10, 3, 10, 5

    D2.1  A, A, A, A, ET                         10, 10, 10, 9
    D2.2  A, A, A, BL, B, BL, C, NL, ET          10, 10, 10, 3, 10, 3, 10, 1, 8

    D3.1  A, A, A, A, ET                         10, 10, 10, 9
    D3.2  A, BL, B, NL, C, NL, ET                10, 2, 10, 1, 10, 3, 8

    D4.1  A, A, A, A, ET                         10, 10, 10, 9
    D4.2  A, BL, BL, B, BL, C, NL, ET            10, 2, 4, 10, 1, 10, 3, 8
    D4.3  A, ET                                  10, 6
    D4.4  A, BL, B, ET                           10, 2, 10, 5
    D4.5  A, BL, B, NL, C, BL, D, D, D, ET       10, 2, 10, 1, 10, 3, 10, 10, 10, 7

D3 also exercises all loop iteration paths, i.e., all paths through a loop that are possible on some iteration of the loop; by exercising all individual predicates as well, D3 ensures all loop exiting conditions are also tested. D4 exercises all rules of the table. Note that exercising all rules is equivalent to exercising each statement under all condition combinations considered relevant by the writer of the program.
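The rule sequences listed with the test data can be reproduced by running the corrected program. The following Python transliteration of Figure 3 is a sketch under stated assumptions: the characters chosen for BL, NL and ET, the function name reformat, the returned rule trace, and the strict flag are ours and appear in neither Naur's article nor Figure 3. Setting strict=False substitutes ≤ for < in the line-10 test, a hypothetical construction error used here only to illustrate that data exercising every rule need not expose a faulty predicate.

```python
BL, NL, ET = " ", "\n", "$"   # assumed encodings of the three break characters

def reformat(text, maxpos, strict=True):
    """Return (output, alarm, rules) for the program of Figure 3.

    rules is the sequence of decision-table rules exercised.
    strict=False is a hypothetical faulty variant that tests
    fill + bufpos <= MAXPOS instead of fill + bufpos < MAXPOS.
    """
    out, buffer, rules = [], [], []
    fill, alarm = 0, False
    for cw in text:                           # incharacter(CW)
        if cw in (BL, NL, ET):                # C1 or C2
            base = 4 if cw == ET else 0       # rules 5-8 apply when CW = ET
            if buffer:                        # C3: bufpos != 0
                used = fill + len(buffer)
                c4 = used < maxpos if strict else used <= maxpos
                if c4 and fill != 0:          # C4 and C5
                    rules.append(base + 1)
                    out.append(BL)
                    fill += 1
                else:
                    rules.append(base + (2 if c4 else 3))
                    out.append(NL)
                    fill = 0
                out.extend(buffer)            # output the buffered word
                fill += len(buffer)
                buffer = []
            else:
                rules.append(base + 4)        # consecutive or leading break
        elif len(buffer) == maxpos:           # C6: oversize word
            rules.append(9)
            alarm = True
        else:
            rules.append(10)
            buffer.append(cw)
        if alarm or cw == ET:                 # until Alarm or CW = ET
            break
    return "".join(out), alarm, rules

# Test data set D4 (MAXPOS = 3), written with the assumed encodings.
D4 = ["AAAA$", "A  B C\n$", "A$", "A B$", "A B\nC DDD$"]

for data in D4:
    print(repr(data), reformat(data, 3))
```

With MAXPOS = 3, every D4 input gives identical results under both settings of strict, whereas the input AB C$ separates them: the faulty variant emits the four-character line "AB C", violating O4.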
Table 2 shows more clearly what combinations of rules need to be exercised to satisfy the various test data selection criteria, as well as how the data actually satisfy the criteria. For example, this table shows that the composite predicate C4 ∧ C5 (i.e., fill + bufpos < MAXPOS ∧ fill ≠ 0) is true for any data exercising rules 1 and 5 and false for any data exercising rules 2, 3, 6 or 7. For data exercising any other rule, the value of this composite predicate is irrelevant. The actual test data exercise rules 1 and 3.

Table 1 (continued): rule sequence exercised by each test
  D1.1  10, 10, 10, 9
  D1.2  10, 10, 10, 3, 10, 3, 10, 5
  D2.1  10, 10, 10, 9
  D2.2  10, 10, 10, 3, 10, 3, 10, 1, 8
  D3.1  10, 10, 10, 9
  D3.2  10, 2, 10, 1, 10, 3, 8
  D4.1  10, 10, 10, 9
  D4.2  10, 2, 4, 10, 1, 10, 3, 8
  D4.3  10, 6
  D4.4  10, 2, 10, 5
  D4.5  10, 2, 10, 1, 10, 3, 10, 10, 10, 7

Table 2
RULES EXERCISED TO SATISFY VARIOUS COMPLETENESS CRITERIA

1. Composite predicates (rules to be exercised)
  Predicate      TRUE           FALSE
  C1 ∨ C2        (1-8)          (9, 10)
  Alarm ∨ C2     (5-9)          (1-4, 10)
  C3             (1-3, 5-7)     (4, 8)
  C4 ∧ C5        (1, 5)         (2, 3, 6, 7)
  C6             (9)            (10)

2. Individual predicates (rules to be exercised)
  Predicate      TRUE           FALSE
  C1BL           (1-4)          (1-10)
  C1NL           (1-4)          (5-10)
  C2             (5-8)          (9, 10)
  C3             (1-3, 5-7)     (4, 8)
  C4             (1, 2, 5, 6)   (3, 7)
  C5             (1, 5)         (2, 6)
  C6             (9)            (10)
  Alarm          (9)            (10)

3. Loop iteration paths (distinct paths)
  (1, 5); (2, 3, 6, 7); (4, 8); (9); (10)

The table's "rules exercised" columns follow from the rule sequences above: D1 exercises rules 3, 5, 9 and 10; D2 exercises rules 1, 3, 8, 9 and 10; D3 exercises rules 1, 2, 3, 8, 9 and 10.

We wish to show that none of these exercising criteria are completely reliable, i.e., it is possible to exercise a program containing an error using any of these criteria without necessarily discovering the error. This shows that tests based solely on a program's internal structure are unreliable; their success is poor evidence that a program contains no errors.

Five examples of errors are summarized in Table 3. The errors can be characterized as follows:

1) an incorrect predicate (fill + bufpos ≤ MAXPOS instead of fill + bufpos < MAXPOS) causing rules 1 or 5 to be selected under circumstances where rules 3 or 7 should have been selected. This is an inappropriate path selection type of construction error. It will not be detected by any of the four test data sets. To show this error, the loop must be executed with fill + bufpos = MAXPOS and fill ≠ 0. This requires at least MAXPOS + 2 iterations of the loop and the word lengths must be just right. Although this is an "off-by-one" type of error, not just any data such that fill + bufpos = MAXPOS will reveal it, e.g., D1.2 and D2.2 do not do so.

2) a missing action (fill := fill + 1) yielding essentially the same effect as error 1 when there are only two words on a line. This error also will not be detected by any of the four test data sets.

3) a missing predicate (bufpos ≠ 0) yielding a missing path type of error that would not be detected with the D1 data set. (This is one of Naur's errors.) Although the other data sets would detect this error, different data sets can be constructed that will not detect the error and yet will satisfy the various exercising criteria for the erroneous program. For example, D1 as it stands would exercise all statements and composite predicates in the modified program (see Table 2 with C3 eliminated). Also, test D4.2 could then be eliminated and the other D4 data would then be sufficient to exercise all rules, interior loop paths, and individual predicates without revealing the error.

4) a missing predicate (fill ≠ 0) causing an inappropriate path selection. (This is also one of Naur's errors.) Alternatively, this error could be characterized as an inappropriate action (namely, fill := 0 in line 3 should be replaced with fill := MAXPOS). This error is easily detected by any data whose first word is less than MAXPOS characters long. Note that D1 and D2 do not detect this error precisely because the first word is exactly MAXPOS characters long. In fact, data set D2 exercises all individual predicates, interior loop paths, and statements in the erroneous program without showing this error. It is also readily possible to devise data to exercise all rules in a decision table representation of the erroneous program without revealing the error.

5) a missing predicate (CW = ET in line 6) causing inappropriate path selection. This is essentially error N2. This error will not be found by D2 or D3. It also is not necessarily found by data exercising all rules in the decision table representing the altered program (see, for example, Table 4; data D4.1 and D4.2 exercise all the rules in this table without revealing an error). This error shows that even when all conditions relevant to the correct operation of a program appear explicitly in the program, test data selection criteria based solely on a program's internal structure will not necessarily guarantee tests that will reveal the error.

Table 3
EFFECT OF ERRORS

  1. Error in program: change fill + bufpos < MAXPOS to fill + bufpos ≤ MAXPOS.
     Error type: inappropriate path selection.
     Effect on table: Condition C4 changed similarly.
  2. Error in program: omit line 13, fill := fill + 1.
     Error type: missing action.
     Effect on table: No mark on Action A1b.
  3. Error in program: omit bufpos ≠ 0 test (line 8). (This is Naur's error N5, 6.)
     Error type: missing path.
     Effect on table: Condition C3 line will be removed and therefore rules 4 and 8 can be dropped.
  4. Error in program: omit fill ≠ 0 from line 10. (This error is Naur's N3.)
     Error type: inappropriate path selection.
     Effect on table: Condition C5 removed from table. Rules 2 and 6 are eliminated, being subsumed under rules 1 and 5.
  5. Error in program: omit CW = ET from line 6.
     Error type: inappropriate path selection.
     Effect on table: Rules 5-8 eliminated and rule 9 is modified. See Table 4.

Table 4
DECISION TABLE REPRESENTATION OF PROGRAM CONTAINING ERROR 5

Conditions C1-C6 and actions A1-A7 are those of Table 1. The conditions under which the actions are performed become:
  A1: C1 ∧ C3 ∧ C4 ∧ C5 ∧ ¬C6
  A2: C1 ∧ C3 ∧ (¬C4 ∨ ¬C5)
  A3: C1 ∧ C3
  A4: ¬C1 ∧ C6
  A5: ¬C1 ∧ ¬C6
  A6: C1 ∨ (¬C6 ∧ ¬C2)
  A7: C2 ∨ (¬C1 ∧ C6)
(The rule columns of this table, including rules 8' and 9', are not legible in this copy.)

This analysis of five possible errors shows the unreliability of test data selected merely to exercise all statements, composite predicates, individual predicates, loop iteration paths, or rules in a program's decision table representation. These examples of errors show that:

1) To detect errors reliably, it is in general necessary to execute a statement under more than one combination of conditions to verify that its effect is appropriate under all circumstances. Exercising any statement just once is usually inadequate.

2) Equally well, the same path through a loop will usually have to be exercised more than once before the right combination of conditions is found to reveal a missing path or inappropriate path selection error. For example, error 5 (see Table 3) requires exercising rule 8' of Table 4 with bufpos ≠ 0 to show the error. This path should also be exercised with bufpos = 0 to guard against some other error, e.g., error 3, even though bufpos's value is not even tested when executing rule 8'.

In short, a reliable test is designed not so much to exercise program paths as to exercise paths under circumstances such that an error is detectable if one exists. Tests based solely on the internal structure of a program are likely to be unreliable.

To find data that reliably reveal errors like those illustrated here, all the conditions relevant to the correct operation of a program must be known and test data must be devised to exercise all possible combinations of these conditions, whether or not the program's internal structure indicates certain condition combinations cause different sequences of actions to be performed. For example, Tables 1 and 4 actually represent 27 distinct rules if all "don't care" conditions are written out explicitly as different rules (see Table 5). Test data required to exercise all 27 rules will detect error 5 because this error in effect combines rules 12, 13, 21, and 22 to form rule 9' and rules 10, 11, 14-18 to form rule 8'. If, for errors 3 and 4, conditions C3 and C5 are retained in Table 5 even though they are not tested in the program, test data to detect these errors will necessarily be generated if all 27 condition combinations are exercised. Errors 1 and 2 will be found if the condition fill + bufpos = MAXPOS is added to the table. Data to exercise this condition in combination with all the others will necessarily reveal errors 1 and 2. (This condition adds only six rules to Table 5 because it is not independent of the other conditions.)

Table 5
ALL FEASIBLE RULES IMPLIED BY TABLE 1

Conditions, constraints, and actions are those of Table 1, written out as 27 rules with no "don't care" entries; each rule is keyed to the Table 1 rule number it refines, and the test data are listed with the rule sequences they exercise. (The rule columns and rule sequences are not legible in this copy.)

In short, the secret of reliable testing is to find all conditions relevant to a program's correct operation and to exercise all possible combinations of these conditions. In Section 4, we will illustrate how relevant conditions can be discovered.

2.3 Example 2: An Exam Scheduler

This program is presented by Hoare in Dahl (1972) as an example of the use of various abstract data structures and an informal proof of correctness.

The problem is to construct a timetable for university examinations such that
1) The number of sessions is kept small, though no absolute minimum is stipulated.
2) Each exam is scheduled for one session.
3) No exam is scheduled for more than one session.
4) No session involves more than K exams.
5) No session involves more than h students.
6) No student takes more than one exam in a session.
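Requirements (2) through (6) are precise enough to be checked mechanically against a proposed timetable, whereas requirement (1) is not. The following Python sketch illustrates such a check. The data representation (a timetable as a list of sessions, an enrolment map from exam to students) and the names violations, max_exams and max_students are assumptions made for illustration; they are not taken from Hoare's program.

```python
from collections import Counter


def violations(timetable, enrolment, max_exams, max_students):
    """List the violations of requirements (2)-(6) in a proposed timetable.

    timetable:    list of sessions, each a list of exam names (assumed layout)
    enrolment:    dict mapping exam name -> set of student names (assumed layout)
    max_exams:    the bound K of requirement (4)
    max_students: the bound h of requirement (5)
    """
    problems = []

    # (2) and (3): every exam appears in exactly one session.
    placed = Counter(exam for session in timetable for exam in session)
    for exam in enrolment:
        if placed[exam] == 0:
            problems.append(f"(2) {exam} is not scheduled")
        elif placed[exam] > 1:
            problems.append(f"(3) {exam} is scheduled {placed[exam]} times")

    for number, session in enumerate(timetable, start=1):
        # (4): bound on the number of exams in a session.
        if len(session) > max_exams:
            problems.append(f"(4) session {number} has {len(session)} exams")

        # Count how many exams each student sits in this session.
        sittings = Counter(
            student for exam in session for student in enrolment.get(exam, ())
        )
        # (5): bound on the number of students in a session.
        if sum(sittings.values()) > max_students:
            problems.append(f"(5) session {number} has too many students")
        # (6): no student sits two exams in one session.
        clashes = sorted(s for s, n in sittings.items() if n > 1)
        if clashes:
            problems.append(f"(6) session {number} clashes for {clashes}")

    return problems


if __name__ == "__main__":
    enrolment = {"math": {"ann", "bob"}, "physics": {"ann"}, "art": {"cy"}}
    print(violations([["math", "art"], ["physics"]], enrolment, 2, 3))  # []
```

Nothing in this check says whether the number of sessions is "small", which is why requirement (1) has to be examined by other means.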
The solution proposed by Hoare tries all combinations of unscheduled exams that satisfy (2) through (6) and selects the combination that maximizes h.

The specifications and assertions, except (1), are precisely and formally stated such that a formal proof could be performed. However, an error creeps in related to (1). It is impossible to state (1) precisely since there is no way to predetermine the number of sessions, so it is assumed that the solution method will do the best it can. The error arises from a variable "remaining" which is to contain all those exams which do not appear in the table so far, i.e., unscheduled exams. The error is manifested in two ways:

1) Performance. Exactly 500 exams are always processed even if there are fewer than 500 exams to be scheduled. Since the program is recursive and searches all combinations of unscheduled exams, the performance will be degraded by processing "empty exams" (exams for which no student is registered).

2) Violation of constraints. Empty exams still have to be scheduled in some session, so it is possible that the limitation implicit in (1) could be violated unnecessarily. That is, the program as it stands may be unable to produce a solution where it should if empty exams were not processed.

This error is easy to correct by initializing "remaining" to include only those exams for which at least one student has registered.

This error is very different from those of Example 1 and is important for the following reasons:

1) This program would be very hard to understand and to test thoroughly without some sort of an informal proof to guarantee that all combinations are properly handled. But the proof only considers assertions (2) - (6) and ignores (1) because it can't be formalized. The error then arises because (1) is not checked out informally even if it can't be formally stated and proved. Even formally specifiable necessary conditions for (1) to be satisfied are not proved, e.g., that no session schedules an empty exam. This error shows the danger of concentrating on the difficult parts of a program and the formalized specifications to the exclusion of the easier parts.

2) This program is similar to many other programs where the correct answer can't be formally characterized and where it would be difficult to even determine whether the produced timetable has the fewest number of sessions.

Assuming no proof existed or as confirmation of the proof, what type of test data should be selected? What criteria for test data can hope to yield a reliable test of this program? Our proposed approach is not yet sufficiently powerful to answer these questions. We cite this example to show the difficulties that must be resolved by a complete theory of test data selection as well as to show that these difficulties challenge even a proof of correctness approach to software reliability.

3. Views on Testing

Having discussed some fundamental definitions in the testing area and examined examples of the problems to be solved by an approach to testing, we will now look at what others have said about the goals, methods, and difficulties of program testing before turning our attention to our proposed method.

3.1 "Exhaustive" testing

"Software certification... ideally... means checking all possible logical paths through a program." [Boehm (1973)]

"Exhaustive testing, i.e., testing all possible (input)..." [Poole (1973), p. 310]

Exhaustive testing, defined either in terms of program paths or a program's input domain (although these definitions are not equivalent) [5], is usually cited as the impractical means of achieving a reliable and valid test. The importance of the cited statements is in the questions which arise from them:

• Is there a way to test a program which is equivalent to exhaustive testing (in the sense of being reliable and valid)?
• Is there a practical approximation to exhaustive testing?
• If so, how good is the approximation?

It should be noted that exhaustive testing of a program's input domain is a process not necessarily guaranteed to terminate. There are some programs whose behavior (e.g., whether they stop) is impossible to verify by testing or any other means and some programs have infinite input domains, so exhaustive testing can never be completed.

Although some programs cannot be exhaustively tested, a basic hypothesis for the reliability and validity of testing is that the input domain of a program can be partitioned into a finite number of equivalence classes such that a test of a representative of each class will, by induction, test the entire class, and hence, the equivalent of exhaustive testing of the input domain can be performed. If such tests all terminate and if the partitioning is appropriate, a completely reliable test of the program will have been performed. This is not, of course, a novel idea. Hoare [in Buxton (1970), p. 21] has pointed out that the essence of testing is to establish the base proposition of an inductive proof (we pursue this idea further in Section 5). This pinpoints the fundamental problem of testing: the inference from the success of one set of test data that others will also succeed, and that the success of one test data set is equivalent to successfully completing an exhaustive test of a program's input domain.

[5] Exhaustive testing of paths will not necessarily find all errors. For example, both paths in the following program can be tested without necessarily revealing the error:

    IF (X + Y + Z)/3 = X
      THEN PRINT ("X, Y, and Z are equal in value");
      ELSE PRINT ("X, Y, and Z are not equal in value");
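The footnote's point can be checked directly. The Python sketch below transcribes the faulty two-path program and compares it with a reference that states the intended requirement; the function names and the particular test values are illustrative choices, not part of the original example.

```python
def faulty_report(x, y, z):
    """Transcription of the footnote's program: compares the mean with x."""
    if (x + y + z) / 3 == x:
        return "X, Y, and Z are equal in value"
    return "X, Y, and Z are not equal in value"


def intended_report(x, y, z):
    """Reference for the intended behavior: all three values are equal."""
    if x == y == z:
        return "X, Y, and Z are equal in value"
    return "X, Y, and Z are not equal in value"


# Two tests that exercise both paths and still pass.
path_tests = [(1, 1, 1), (1, 2, 3)]

# One test that exercises a path already covered but reveals the error:
# the mean of 2, 1 and 3 is 2, which equals X although the values differ.
revealing_test = (2, 1, 3)

if __name__ == "__main__":
    for data in path_tests:
        print(data, faulty_report(*data) == intended_report(*data))   # True, True
    print(revealing_test,
          faulty_report(*revealing_test) == intended_report(*revealing_test))  # False
```

Both members of path_tests take different branches, so every path has been executed, yet only data whose mean happens to equal X while the values differ exposes the fault.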
What testing lacks (and what this paper is attempting to provide a start towards) is a theoretically sound (but practical) definition of what constitutes an adequate test. When our understanding of what makes an adequate test is as good as our understanding of what makes an adequate proof, testing can assume its proper role in the theory and practice of software development.

3.2 Is Proving Really Better Than Testing?

"Program testing can be used to show the presence of bugs, but never to show their absence!" [in Dahl (1972), p. 6]

"[When] you have given the proof of [a program's] correctness, ... [you] can dispense with testing altogether." [in Naur (1969b), p. 51]

Just because testing is not a completely reliable means of demonstrating program correctness does not mean it's sensible to rely solely on proofs. Proofs aren't completely reliable either. Proofs can only provide assurance of correctness if all the following are true:

a. There is a complete axiomatization of the entire running environment of the program: all language, operating system, and hardware processors.
b. The processors are proved consistent with the axiomatization.
c. The program is completely and formally implemented in such a way that a proof can be performed or checked mechanically.
d. The specifications are correct in that if every program in the system is correct with respect to its specifications, then the entire system performs as desired.

These requirements are far beyond the state of the art of program specification and mechanical theorem proving, and we must be satisfied in practice with informal specifications, axiomatizations, and proofs. Then problems arise when proofs have errors, specifications are incomplete, ambiguous, or unformalizable (as in Example 2), and systems are not axiomatizable. The two examples already discussed have clearly shown that an incomplete attempt at a program proof does not assure a program will not fail. These examples are very realistic in terms of the state of the art of program proving for real programs.

Despite the practical fallibility of proving, attempts at proof are nonetheless valuable. They can reveal errors and assist in their prevention. Furthermore, programs cannot be proved unless they can be understood. Facilitating provability sets a worthy standard for good languages, program structure, specifications, and documentation, and thereby assists in preventing errors.

In short, neither proofs nor tests can, in practice, provide complete assurance that programs will not fail. Even so, attempts to prove programs, like attempts to test them completely, are essential to the development of usable programs. Tests have the advantage of providing accurate information about a program's actual behavior in its actual environment; a proof is limited to conclusions about behavior in a postulated environment. Both have an essential role in program development, though each is a fallible technique. Testing and proving are complementary methods for decreasing the likelihood of program failure.

3.3 The Effect of Testing Considerations on Program Development

"Testing must be planned for throughout the entire design and coding process" [Hetzel (1973), p. 18]

Testing must not only be planned for, it is in fact performed in one way or another at every stage of the programming process, not just when a program has been completed. Specifications must be tested against examples to see if they are complete, consistent, and unambiguous; programs are desk checked by informally considering what will happen when certain kinds of data are processed; a proof must partition data into cases that affect the validity of assertions differently. Testing, in the sense of identifying distinguishable cases and evaluating their effect on the programming requirement, specifications, code, or a proof, occurs in every phase of programming. Moreover, the need to test affects the process of program construction in various ways:

a) Testing being inevitable, it is good practice to identify testing needs, e.g., weak or critical links in a system, early in program design (e.g., see Hansen (1973)).
b) Programs should be structured so logical testing of various abstractions of the program can reduce actual testing of the final program.
c) Specifications must be precise enough to be testable.
d) The need to get special information just to verify the successful execution of a test run affects program design (e.g., testing an operating system scheduler requires access to information not ordinarily available).

Improved understanding of the nature of testing will help in developing and performing tests at all stages in the program development process.

3.4 Conclusions

There is a general concept which unifies all the activities involved in demonstrating that a program does not fail. That concept is case analysis. Case analysis occurs in

a) Design and specification, where it is necessary to exclude cases of data where the program is not expected to operate or to identify cases in the input or output which require special treatment.
In Example 1, these would include successive break characters and termination characters, and in Example 2 the empty exam.

b) Program construction, where the solution to the problem dictates that certain cases require special treatment and other cases can be lumped together. These are illustrated in Example 1 where the predicates bufpos ≠ 0 and fill ≠ 0 form paths which treat successive break characters and the first word on the first line, respectively, and in Example 2 where the empty exam must be filtered out of the exams to be processed.

c) Program proving, where assertions must be made about special cases or generalized to cover several cases and where proofs break into cases. Example 1 shows an assertion which must have a case which covers the first word as well as the case that covers all nonfirst words, and Example 2 shows that the assertion which is not formalized must be checked out in a different manner from the more formalized assertions.

d) Testing, where it must be demonstrated that the total of all cases involved in program specification, design, construction, and proof have been correctly and completely handled. Ideally, the cases created during design specification, construction, and proof should be an adequate basis for test data selection, although the cases may need further analysis and combination in order to obtain a reasonably sized and still effective set of test data.

There is a great need for more examples to make vague testing concepts (e.g., "boundary" condition) at least more intuitively clear and for theories of test data selection which try to analyze the facts about testing in relation to each other for the purpose of guiding the selection of test data toward greater effectiveness.
Section 4 will illustrate a technique for selecting test data and Section 5 will analyze this technique in terms of the theory sketched in Section 1.

4. Developing Effective Program Tests

Our goal is to define how to select test data for a particular program such that if the program processes all test data correctly, we can be reasonably confident the program will process all data correctly. For this to be possible, the test data must cover all errors in the program being tested, i.e., test data selection criteria must ensure that data revealing any errors will be selected. Our approach to testing (and indeed, any approach) is based on assumptions about how program errors occur. Our approach assumes errors are primarily due to 1) inadequate understanding of all conditions a program must deal with, and 2) failure to realize that certain combinations of conditions require special treatment.

We distinguish:

• Test data, the actual values from a program's input domain that collectively satisfy some test data selection criterion;
• Test predicates, a description of conditions and combinations of conditions relevant to the program's correct operation.

In short, test predicates describe what aspects of a program are to be tested; test data cause these aspects to be tested.

Our purpose in this part of the paper is 1) to illustrate the problems and issues faced in developing reliable and valid tests for a particular program so that later, in Section 5, when we discuss formal criteria for reliability and validity, examples illustrating the formal criteria will be at hand; and 2) to show informally what may, with further research, become a practical approach to defining reliable tests.

4.1 An Overview of the Method

There are several sources of information about the conditions and combinations of conditions relevant to a program's operation:

• the general requirement a program is to satisfy;
• the program's specification;
• general characteristics of the implementation method used, including special conditions relevant to data structures and how data is represented; and
• the specific internal structure of an actual implementation.

None of these sources of information is by itself sufficient for generating a reliable test. Information from all these sources must be used to insure all combinations of conditions are identified.

Our technique for developing and describing test predicates is derived from decision table techniques, and will be called the condition table method. A condition table looks like the top half of a decision table. Each column in a condition table is a logically possible combination of conditions that can occur when executing the program being described. Each column is our representation of a test predicate. The basic idea is to identify conditions describing some aspect of the problem or program to be tested, and then build condition tables for groups of conditions. Sometimes separate condition tables are combined into a single table; when to combine tables and when to leave them separate is an important issue needing further study.

4.2 Deriving Test Cases from Specifications

A program's specifications are an important source of test predicates because data satisfying such predicates are able to detect missing path and inappropriate path selection construction errors. Such data can also detect design and specification errors, as we shall see. We will use Naur's specifications for Example 1 to illustrate test case development from specifications.
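Because each column of a condition table is a logically possible combination of condition values, the columns can be generated by enumerating every combination and discarding those that violate the constraints between conditions. The Python sketch below does this for the conditions of Example 1. The helper name condition_table, the merging of C1 and C2 into a single three-valued "char" condition, and the constraint set (C6 ⇒ C3, C4 ⇒ ¬C6, ¬C4 ⇒ (C3 ∨ C5), as read from Table 1) are assumptions of this sketch rather than definitions from the text.

```python
from itertools import product


def condition_table(conditions, constraints=()):
    """Return the columns of a condition table.

    conditions:  dict mapping a condition name to its possible values
    constraints: predicates over a column (a dict name -> value); a column
                 is kept only if every constraint holds for it
    """
    names = list(conditions)
    columns = []
    for values in product(*(conditions[name] for name in names)):
        column = dict(zip(names, values))
        if all(holds(column) for holds in constraints):
            columns.append(column)
    return columns


# Conditions of Example 1. C1 (CW is BL or NL) and C2 (CW is ET) exclude
# each other, so they are merged here into one three-valued condition.
example1_conditions = {
    "char": ["break", "ET", "other"],
    "C3": [True, False],   # bufpos != 0
    "C4": [True, False],   # fill + bufpos < MAXPOS
    "C5": [True, False],   # fill != 0
    "C6": [True, False],   # bufpos == MAXPOS
}

# Constraints between conditions, as reconstructed from Table 1.
example1_constraints = [
    lambda c: c["C3"] or not c["C6"],          # C6 => C3
    lambda c: not (c["C4"] and c["C6"]),       # C4 => not C6
    lambda c: c["C4"] or c["C3"] or c["C5"],   # not C4 => (C3 or C5)
]

columns = condition_table(example1_conditions, example1_constraints)

if __name__ == "__main__":
    print(len(columns))   # 27
```

Under these assumptions the enumeration yields 27 columns, the same number of distinct rules that Section 2 reports for Tables 1 and 4 once all "don't care" entries are written out.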
Then, using knowledge of an actual implementation (either that shown in Figure 2 or Figure 3), we will show how to eliminate some test predicates without impairing test reliability. (It may also happen that some test predicates will have to be added when information about an actual implementation becomes available, but this does not occur here.)

The first part of Naur's specification states:

(1) "Given a text, consisting of words separated by BLANKs or NL (new line) characters ...."

This clause attempts to describe the program's input domain.

Table 6. First condition table for input domain

Predicate                  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
C1: PrevChar [BL, NL, Ot]  BL  BL  BL  BL  NL  NL  NL  NL  Ot  Ot  Ot  Ot  ?   ?   ?   ?
C2: CurChar [BL, NL, Ot]   BL  NL  Ot  ?   BL  NL  Ot  ?   BL  NL  Ot  ?   BL  NL  Ot  ?

Table 7. Revised condition table for input domain
[32 predicate columns over the conditions C1: PrevChar [BL, NL, Ot]; C2: CurChar [BL, NL, Ot, ET]; C3: First Word? [Y, N]; C4: Break length [1, >1]. The column entries are not recoverable from this copy.]
Constraints between conditions:
C2(BL) ∨ C2(NL) ⊃ C4(?)
C1(Ot) ⊃ C3
C1(?) ⊃ ¬C3

We begin to develop a condition table by looking for conditions relevant to the input domain that are also relevant to the processing required by the specification, i.e.
, we try to extract conditions from the specification that we, as programmers, would consider relevant to deciding when it is appropriate to perform certain actions. For example, if we think in terms of scanning the input text character by character, then it's possible and reasonable to describe the input in terms of the character currently being scanned and the one immediately preceding it. This approach gives rise to the condition table shown in Table 6. The notation there shows that the values of PrevChar (condition C1) and CurChar (condition C2) constitute conditions relevant to the program's correct operation. In particular, the possible values for C1 and C2 are partitioned into three subsets, the value BL (for BLANK), the value NL, and all other character values, Ot. Any condition may, of course, be undefined (represented in the table by a question mark), and so there are four possibilities to be considered for each condition, yielding 16 predicates in all.

for dealing with various types of input media (e.g., cards or paper tape) and this interpretation subsumes the more restricted case of single break characters, so we adopt it. Of course, the ambiguity should be checked with the specification writer, since our reasoning for permitting data satisfying these predicates may not be valid in the context of the actual use of the program.

It can be seen here how preparation of a condition table is a natural way to check a specification for completeness and lack of ambiguity, as well as for identifying characteristics of test data that should be presented to the program during the test phase.
Next we need to check the test predicates for reliability, i.e., if we select data to cover each predicate, will we be forced to select data capable of revealing all errors in an implementation? To answer this question, we must decide whether any conditions relevant to the correct operation of the program are missing from the table and whether value sets have been partitioned appropriately. We must also decide whether the test predicates are independent, i.e., whether the sequence in which test predicates are exercised when processing test data is potentially significant to the correctness of the program. For example, we can see that data chosen to exercise all the predicates in Table 6 will have to include data where words are separated by at least two break characters, but all predicates can be exercised without necessarily having exactly one break character between any words.* Is the occurrence of a break of length one significant to the correct operation of a program? If a program correctly processes text containing breaks of length two or greater, will it necessarily correctly process text containing breaks of length one? What about breaks of length one and two preceding the first word in

The next step is to check if all these predicates can be satisfied. Clearly the current or previous character may have an undefined value. C2 is undefined when the end of the text is found. C1 is undefined when the current character is the first character in the text. Predicate 16 therefore is satisfied by an empty text.
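The sixteen predicates of Table 6, and the predicates a given text exercises, can be enumerated mechanically. The following Python sketch is illustrative only and is not taken from the paper: the names `TABLE6`, `char_class`, and `predicate_numbers` are hypothetical, a blank and a newline stand for BL and NL, and the columns are assumed to be numbered with PrevChar varying slowest, which is consistent with predicate 16 being the case where both characters are undefined.

```python
from itertools import product

# Value classes for PrevChar (C1) and CurChar (C2); "?" means undefined.
VALUES = ("BL", "NL", "Ot", "?")

# Assumed numbering: predicate k is the k-th (PrevChar, CurChar) pair,
# with PrevChar varying slowest.
TABLE6 = {k: pair for k, pair in enumerate(product(VALUES, VALUES), start=1)}
NUMBER = {pair: k for k, pair in TABLE6.items()}


def char_class(ch):
    """Map a character to its value class (blank -> BL, newline -> NL)."""
    if ch == " ":
        return "BL"
    if ch == "\n":
        return "NL"
    return "Ot"


def predicate_numbers(text):
    """Predicates exercised, in order, when scanning text character by character.

    PrevChar is undefined at the first character; CurChar is undefined
    once the end of the text is reached.
    """
    classes = [char_class(ch) for ch in text] + ["?"]
    prev = "?"
    numbers = []
    for cur in classes:
        numbers.append(NUMBER[(prev, cur)])
        prev = cur
    return numbers


print(len(TABLE6))                # 16
print(predicate_numbers(""))      # [16]
print(predicate_numbers("a b"))   # [15, 9, 3, 12]
print(predicate_numbers("a  b"))  # [15, 9, 1, 3, 12]
```

Under these assumptions, the empty text exercises only predicate 16, and the length of a break shows up only in the sequence of predicates exercised (9 then 3 for one blank, 9, 1, 3 for two), not in any single predicate.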
To make this interpretation of undefined values clearer, in later condition tables we will redefine the value set for CurChar to include the pseudo-character ET, designating end-of-text, and not permit C2 to be undefined.

Looking further, at predicates 1 and 2, we immediately see an ambiguity in the specification. It's not clear whether words are separated by a single BLANK or NL character, or whether several such break characters are permitted. If only one is permitted, then predicates 1, 2, 5, and 6 are not satisfiable. Also, if breaks are not permitted after the last word in the text or before the first word, then predicates 4, 8, 13, and 14 can be eliminated as unsatisfiable. Permitting all these predicates to be satisfied seems the most reasonable approach since it provides more flexibility

*For example, if predicate 9 is exercised followed by predicate 3, a break of length one has been processed, but if the sequence is 9, 1, 3, then the break is two characters long. So length of breaks is represented as a particular sequence of predicates being executed.

the text or following the last word? From experience, we know that errors involving all these sorts of conditions are quite possible, so we will add conditions to force selection of data that check these error possibilities.

suggested. Examining the clause with respect to Table 7, however, shows that a line break can never be made before the first word in the text unless the first word is preceded by one or more break characters. Naur's program itself violates this clause because it always puts out a line break at the beginning of the output.
Undoubtedly, the intent of the clause was not to forbid an initializing line break in the output, but rather to say that a word in the input text must be contained completely on a single line in the output; here is another specification error (failure to correctly express the intent of the designer) readily detected by case analysis.

To distinguish breaks before the first word, we add a condition "First Word?", which is false until one Ot character is seen and then true. To ensure test data contain breaks of length 1, the condition "Break length" is added; its values refer to the length of a break preceding ET or Ot. Table 7 results; note the conditions are not all independent. When C1 is undefined, C3 is false, etc. Constraints between conditions are indicated below the condition table.

The remainder of the specification states that "each line is filled as far as possible as long as no line will contain more than MAXPOS characters." We will not construct a condition table for this part of the specification. Such a table would contain conditions describing the length of words, current and future lengths of lines, and possibly, the number of words on a line. The table would show how to exercise a program for all relevant combinations of boundary conditions. Arguments discussing what constitutes a proper set of test predicates for this part of the specification are complex and almost require a paper in themselves. We only intend to sketch an approach to test predicate construction here.

Are the test predicates sufficiently independent now? Do some sequences of test predicate execution correspond to special predicates that should be included?
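The columns of a condition table such as Table 7 can be generated from its conditions and the constraints between them. The Python sketch below is illustrative only. The function name `table7_columns`, the use of "?" for an undefined value, and the rule that break length is defined exactly when PrevChar is a break character and CurChar is Ot or ET are assumptions made here; the last is read from the statement that break length refers to a break preceding ET or Ot. Under these assumptions the enumeration gives 32 columns, ten of them with CurChar = ET, and treating BL and NL as a single break class gives 16 columns, six of them with ET, which agrees with the counts the text quotes for Table 7.

```python
from itertools import product


def table7_columns(breaks=("BL", "NL")):
    """Enumerate (C1, C2, C3, C4) columns allowed by the assumed constraints.

    C1: PrevChar, C2: CurChar, C3: First Word? (Y/N), C4: Break length.
    """
    prev_values = list(breaks) + ["Ot", "?"]
    cur_values = list(breaks) + ["Ot", "ET"]
    columns = []
    for c1, c2, c3, c4 in product(prev_values, cur_values, "YN", ("1", ">1", "?")):
        if c1 == "Ot" and c3 != "Y":   # C1(Ot) implies C3
            continue
        if c1 == "?" and c3 != "N":    # C1(?) implies not C3
            continue
        # Assumption: break length is defined exactly when a break
        # immediately precedes Ot or ET.
        length_defined = c1 in breaks and c2 in ("Ot", "ET")
        if length_defined != (c4 != "?"):
            continue
        columns.append((c1, c2, c3, c4))
    return columns


full = table7_columns()
merged = table7_columns(breaks=("BRK",))  # BL and NL treated alike

print(len(full), sum(c[1] == "ET" for c in full))      # 32 10
print(len(merged), sum(c[1] == "ET" for c in merged))  # 16 6
```

Generating the columns this way makes an excluded combination traceable to a specific constraint, which matters because every combination excluded as impossible must actually be impossible.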
It can be seen that data consisting of at most one word would suffice to exercise all the test predicates. Equally well, all test data could consist of texts containing more than one word. It's highly unlikely that a program will process words correctly for texts several words long but not for texts one word long, or vice versa, although such a program could of course be specially written. We will be content, for now, with noting that the number of words in an input text might be a significant condition for some programs; later when an implementation is available we can decide whether the number of words in the text is a condition for which an error may exist; if so, the condition table can be modified to ensure this aspect of the program is thoroughly tested.

4.3

Test predicates are the motivating force for data selection. A test predicate is found by identifying conditions and combinations of conditions relevant to the correct operation of a program. Conditions arise first and primarily from the specifications for a program. As implementations are considered, further conditions and predicates may be added. Conditions can arise from many sources, and as they come to mind, they are written down in the condition stub of a table. The columns of the table are then generated and checked. The checking process may suggest further conditions to be added. If the number of predicates becomes excessive, it is better to wait until information from an actual implementation is available to reduce the number of predicates so the impact of the reduction on test reliability can be explicitly assessed.

Test predicates can be eliminated (i.e.
, their exercising made unnecessary) if it can be shown that data satisfying the predicates are treated the same by the actual program and from the viewpoint of the specifications. For example, distinguishing between BL and NL in Table 7 is unnecessary. Cursory examination of the programs in either Figure 1 or Figure 2 shows that every time CW is tested for equality with BL it is also tested for equality with NL; the specifications, moreover, do not require different effects depending on whether a BL or NL is the value of a break character. So we can safely partition the value set for CurChar into BL or NL, ET, and Ot and the value set for PrevChar into BL or NL, and Ot without reducing the reliability of this set of test predicates for these particular programs. This will reduce the number of test predicates in Table 7 from 32 to 16 and the number of test runs required from ten (one for each ET condition) to six. This illustrates how the amount of testing can be safely reduced after a set of test predicates has been developed independently of program structure. Proofs of simple program properties can reduce the amount of testing required.

The major problem in our approach as illustrated so far is that conditions are considered as they spring to mind. This may mean wasting effort on conditions unlikely to be connected with errors in the actual implementation to be tested. Nonetheless, such conditions do test something about a program. The quality of the test predicates cannot be decided at the exact moment of their conception. Test predicate analysis must be carried out in its entirety and then checked for overall reliability before deleting individual test predicates.

The next part of Naur's specification states: "...
line breaks (in the output) must be made only where the (input) text has BLANK or NL..." This clause states a constraint on when the action of outputting a NL character is valid. Since we have already distinguished BLANKs and NL characters in Table 7, no new test conditions are

Summary

There are two main features of the method of test data selection presented here: the use of test predicates and the use of a condition table.

The emphasis is on a systematic approach, and the condition table fulfills this purpose by recording conditions and their combinations in an orderly and mechanically checkable fashion. It forces attention toward specifications and what the program should do rather than toward actual program structure and what a program seems to do. It avoids the flaws of testing methods that focus solely on the internal structure of a program, but is not necessarily divorced from a program's internal structure since all program predicates must ultimately be represented in the condition table if the table is to define a reliable set of test predicates.

The type of reasoning used in selecting test predicates is much like that used in creating assertions, and hence the approach focuses attention on the abstract properties of the program and its specifications. A test predicate analysis of a program may be a practical first step toward program proving with the advantage that both testing and proving could be performed sequentially or in parallel.

5. The Theory of Testing

The methodology illustrated in the preceding section constitutes an informal application of the theoretical concepts defined in Section 1, i.e., Section 4 demonstrates the use of COMPLETE(T, C), RELIABLE(C), and VALID(C), as a means of devising thorough tests when the test data selection criterion, C, consists of a set of test predicates. In this Section, we will point out how our use of this sort of test data selection criterion satisfies these previously defined theoretical concepts.

Since a test predicate is a condition combination in a condition table, selecting data to satisfy a test predicate means selecting data to exercise the condition combination in the course of executing F. Notationally, if a datum $d \in D$ satisfies test predicate $c \in C$ in this sense, then $c(d)$ is said to be true. With this definition in mind, COMPLETE(T, C) is defined as follows: if $T \subseteq D$,

$$\mathrm{COMPLETE}(T, C) = (\forall c \in C)(\exists t \in T)\, c(t) \;\wedge\; (\forall t \in T)(\exists c \in C)\, c(t)$$

This definition states that every test predicate belonging to C must be satisfied by at least one $t \in T$, and every t must satisfy at least one test predicate. Both requirements are necessary, since proof of C's validity will require proving the correctness of a program for data that satisfy no test predicate, and hence, are included in no set of test data T.

The examples in Section 2 showed that it is readily possible to choose data that exercise test predicates in an overlapping manner (e.g., see Table 5). This suggests a natural partitioning of the input domain into related equivalence classes, $E(C')$, defined as follows: Let $C' \subseteq C$.

$$E(C') = \{\, d \in D \mid (\forall c \in C')\, c(d) \;\wedge\; (\forall c \in C - C')\, \neg c(d) \,\}$$

i.e., $E(C')$ contains all $d \in D$ that satisfy all and only those test predicates belonging to $C'$. Note that data belonging to the equivalence class $E(\emptyset)$ satisfy no test predicate. The relation between these equivalence classes for a C containing three test predicates is shown schematically in Figure 4. Note that data belonging to, for example, $E(C_{12})$ satisfy just test predicates $c_1$ and $c_2$, and that if test T contains data belonging to $E(C_{12})$, T need not also contain data belonging to $E(C_1)$ and $E(C_2)$ to satisfy COMPLETE. Note that a complete test, therefore, is equivalent to selecting one datum from a subset of the equivalence classes without selecting any data from $E(\emptyset)$.

[Figure 4: lattice of the equivalence classes induced by $C = \{c_1, c_2, c_3\}$, with $C_{123} = C$, $C_{23} = \{c_2, c_3\}$, $C_1 = \{c_1\}$, and so on, running from $E(C_{123})$ at the top to $E(\emptyset)$ at the bottom.]

Figure 4. Structure showing relationships between equivalence classes induced by a set of test predicates, C.
If C is reliable, the correct processing of data drawn from one equivalence class (e.g., $E(C_{12})$) proves the correctness of the program for data drawn from certain other classes (e.g., $E(C_1)$ and $E(C_2)$), i.e., success propagates downwards in the lattice of equivalence classes. Similarly, failure propagates upward. For example, the incorrect processing of data drawn from $E(C_3)$ implies data drawn from $E(C_{23})$, $E(C_{13})$, and $E(C_{123})$ will also be processed incorrectly.

Of course, not just any set of test predicates will constitute a reliable and valid test data selection criterion. The following relationship can be used, however, to show that C will satisfy the general definition of VALID given in Figure 1.

$$(\forall d \in E(\emptyset))\; \mathrm{OK}(d) \supset \mathrm{VALID}(C)$$

This means C is valid if a program is correct for all data satisfying no test predicate. With respect to the condition table technique, this validity criterion requires that constraints among conditions be properly described; every condition combination excluded as impossible must actually be impossible. Moreover, the domain of values associated with some variable must be correctly described, e.g., CW in Section 4 must at most take on the values BL, NL, ET, and Other.
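For a small finite domain, the equivalence classes E(C'), the COMPLETE condition, and the sufficient condition for validity can all be computed directly. The Python sketch below is illustrative only: the function names, the toy domain `D`, and the three predicates in `C` are hypothetical and are not taken from the paper.

```python
def satisfied(d, C):
    """Names of the test predicates in C that datum d satisfies."""
    return frozenset(name for name, c in C.items() if c(d))


def equivalence_classes(D, C):
    """Map each subset C' of predicate names to E(C'); empty classes are omitted."""
    classes = {}
    for d in D:
        classes.setdefault(satisfied(d, C), []).append(d)
    return classes


def complete(T, C):
    """COMPLETE(T, C): every predicate is satisfied by some t in T,
    and every t in T satisfies some predicate."""
    every_predicate_hit = all(any(c(t) for t in T) for c in C.values())
    every_datum_counts = all(any(c(t) for c in C.values()) for t in T)
    return every_predicate_hit and every_datum_counts


def uncovered_data_ok(D, C, ok):
    """Sufficient condition for validity: OK(d) for every d in E(empty set)."""
    return all(ok(d) for d in equivalence_classes(D, C).get(frozenset(), []))


# Toy domain and predicates (hypothetical, for illustration only).
D = range(12)
C = {
    "c1": lambda d: d % 2 == 0,
    "c2": lambda d: d % 3 == 0,
    "c3": lambda d: d >= 8,
}
classes = equivalence_classes(D, C)

print(classes[frozenset({"c1", "c2"})])  # [0, 6]
print(classes[frozenset()])              # [1, 5, 7]
print(complete([6, 8], C))               # True
print(complete([6, 8, 1], C))            # False: 1 satisfies no predicate
```

In this toy case the test {6, 8} is complete without drawing any datum from the classes that satisfy only c1 or only c2, and the data 1, 5, and 7 fall outside every complete test, so their correctness would have to be established by other means.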
Although a non-empty $E(\emptyset)$ can be viewed as warning that a program can execute a reliable test successfully and still contain errors because the test is invalid, it can also be viewed merely as indicating that the correctness of the program for such data is to be assessed by means other than the tests implied by C. Separate verification, for example, by proof, may be easier than verification by testing, or it may be that the data in $E(\emptyset)$ have already been verified by prior tests. It may be that a conscious decision has been made not to exercise certain test predicates because it is "obvious" the program will correctly process data satisfying them. This can reduce the total testing effort. In any event, the program's correctness for data belonging to $E(\emptyset)$ is not determined by the tests defined by C.

It may sometimes be impractical to conduct complete tests. In this case, if C is known to be valid because $E(\emptyset)$ is empty, then test data should be chosen just from equivalence classes that satisfy test predicates likely to be encountered in practice.
While such a test will not necessarily be valid, it will still be reliable if C is reliable, and estimates of overall program reliability can be devised by estimating the frequency with which data satisfying unexercised test predicates will be encountered in actual use of the program. This sort of analysis can set a lower bound on the estimated frequency of failure in the program's actual operation, even when a program is tested only incompletely.

The formal definition of RELIABLE(C) implied by the use of test predicates is rather complex, but the essential idea is to define test predicates so success or failure when executing a program F with a particular datum d depends only on what individual test predicates d satisfies, not the particular combination satisfied. This is the sense in which test predicates must be independent. For example, using the definition of C given in Figure 4, if $d_{12} \in E(C_{12})$ and $\mathrm{OK}(d_{12})$, then the definition of COMPLETE says that there is no need to test F with data belonging to $E(C_1)$ or $E(C_2)$. Hence $\mathrm{OK}(d_{12})$ implies $\mathrm{OK}(d_1)$ and $\mathrm{OK}(d_2)$, where $d_1 \in E(C_1)$ and $d_2 \in E(C_2)$, if C is reliable.
Similarly, $\mathrm{OK}(d_1)$ and $\mathrm{OK}(d_2)$ implies $\mathrm{OK}(d_{12})$, since one complete test could include $d_{12}$ and another, both $d_1$ and $d_2$; regardless of which complete test is actually performed, the success of either complete test must imply the success of all others if C is reliable.

Note also that the definition of COMPLETE implies that to be reliable, neither the sequence of test predicate execution nor the frequency of execution can be relevant to the successful processing of test data because data are said to satisfy a test predicate whether the data cause the predicate to be exercised one time or many times, and no matter what sequences of predicates are exercised before or after a particular one. If frequency or sequence of test predicate execution is potentially relevant to an error's existence, separate test predicates must be defined to cover all relevant sequencing or frequency possibilities. (This was the reason we added "break length" and "first word?" to the condition table created in Section 4.
)

The formal definition of RELIABLE(C), when C is a set of test predicates, therefore is:

$$\mathrm{RELIABLE}(C) = (\forall C_1 \subseteq C)(\forall C_2 \subseteq C)\Big( (\exists t_1 \in E(C_1))(\exists t_2 \in E(C_2))\big(\mathrm{OK}(t_1) \wedge \mathrm{OK}(t_2)\big) \supset (\forall C' \subseteq C_1 \cup C_2)\big(C' \neq \emptyset \supset (\forall d' \in E(C'))\, \mathrm{OK}(d')\big) \Big)$$

This means that if a program is correct for some datum, d, in equivalence classes other than $E(\emptyset)$, then it is correct for any data satisfying a (non-empty) subset of the test predicates exercised by d.

Note that tests containing only data satisfying a large number of test predicates are more powerful in the sense that data selected from only a few equivalence classes suffice to demonstrate program correctness. Such tests are useful when the probability of finding an error is deemed low, as in regression testing [Elmendorf (1969)]. Tests using data satisfying only a few test predicates can fail for fewer reasons and so are useful in either localizing a failure or in attempting to identify as many different errors as possible in a set of test runs. Minimizing the cost of testing requires judgement in deciding which equivalence classes to choose from in satisfying COMPLETE(T, C).

We discussed earlier why proving validity need not be difficult. Proving reliability is harder.
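On a small finite domain where OK(d) can be evaluated for every datum, the definition of RELIABLE(C) can be checked by brute force. The Python sketch below is illustrative only: the function names, the toy domain `D`, the predicates in `C`, and the sample OK functions are hypothetical. Only non-empty equivalence classes are examined, since an empty class can neither supply a passing datum nor contain a failing one.

```python
def equivalence_classes(D, C):
    """Map each subset C' of predicate names to E(C'); empty classes are omitted."""
    classes = {}
    for d in D:
        key = frozenset(name for name, c in C.items() if c(d))
        classes.setdefault(key, []).append(d)
    return classes


def reliable(D, C, ok):
    """Brute-force check of RELIABLE(C) over a finite domain D."""
    classes = equivalence_classes(D, C)
    # Classes containing at least one correctly processed datum.
    passing = [k for k, members in classes.items() if any(ok(d) for d in members)]
    for c1 in passing:
        for c2 in passing:
            union = c1 | c2
            for subset, members in classes.items():
                # Every non-empty C' within C1 union C2 must be all-OK.
                if subset and subset <= union and not all(ok(d) for d in members):
                    return False
    return True


# Toy domain and predicates (hypothetical, for illustration only).
D = range(12)
C = {
    "c1": lambda d: d % 2 == 0,
    "c2": lambda d: d % 3 == 0,
    "c3": lambda d: d >= 8,
}

print(reliable(D, C, lambda d: True))     # True
print(reliable(D, C, lambda d: d != 2))   # False: 4 passes but 2 fails in E({c1})
print(reliable(D, C, lambda d: d != 11))  # False: 8 passes in E({c1, c3}), 11 fails in E({c3})
print(reliable(D, C, lambda d: d != 1))   # True: the only failure lies in E(empty set)
```

Such exhaustive checking is feasible only for toy domains, because it requires evaluating OK(d) for every datum; for a real program, reliability has to be argued by a proof.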
Such a proof suffers from the same problems as direct proofs of program correctness, insofar as ensuring the correctness of the proof and axiomatization of the run-time environment is concerned. A proof of reliability may be somewhat easier than a direct proof of program correctness, however, since the proof can assume the program works correctly for some data. The principal problem in the proof is to show that data satisfying each test predicate are treated in sufficiently the same way that successfully exercising each test predicate once is sufficient to imply successful execution for all data satisfying the predicate. At a minimum, this requires understanding all conditions relevant to the successful processing of a test datum, but it does not require proving that the test datum will be successfully executed; that is proved by the success of a test run. This shows that proving test reliability is the induction step in an inductive proof of program correctness using the fundamental theorem; the test itself is, of course, the basis step.

Defects in a proof of reliability, aside from simple logical errors, are most likely to stem from failure to find all conditions relevant to the successful processing of data.
For example, the set of test predicates described by a condition table will fail to be reliable if the domain of values associated with some variable is not correctly partitioned into relevant classes of values (e.g., the test predicates developed in Section 4 would be unreliable if the only relevant values of CW were not BL, NL, ET, and Other, i.e., if these were not the only values relevant when certain actions are to be performed) or if some variable or condition were completely missing from the table (e.g., the predicate fill+bufpos = MAXPOS). Note the difference here between a reliability error and a validity error. Reliability requires partitioning the value set for CW correctly into sets of values "treated the same" by the program. Validity requires the postulated value set for CW include all possible values for CW.

5) the test predicates must be independent, e.g., all data satisfying a particular test predicate must exercise the same path in the program and test the same branch predicates.

Note that constraints 1, 2, and 3 are satisfied only with knowledge of the details of a program's implementation. Satisfying constraint 5 requires knowing something of the program's internal structure.
Verifying constraint 4 may require both internal and external knowledge of a program. Undoubtedly, more constraints than these five need to be satisfied to ensure reliability, but this is a subject for future work.

Clearly, satisfying all these constraints can lead to lots of test predicates requiring a large set of test data for a complete test. If an unreasonable amount of test data is required, a judicious combination of direct program proving and empirical judgement can reduce the size of a complete test. As long as test predicates are eliminated only after all conditions potentially relevant to the correct operation of the program have been identified, any reductions in test validity are at least being made consciously rather than unwittingly.

It's impossible to formulate general necessary conditions for reliability since reliability means that all errors in a program will be detected by a complete test. If a program in fact has no errors, any set of test predicates is reliable. For example, exercising all statements is not a necessary condition for reliability as long as unexercised statements are never responsible for an error.
But s u p p o s e t h a t t e s t p r e d i c a t e s a r e d e f i n e d so data b e l o n g i n g to a p a r t i c u l a r e q u i v a l e n c e c l a s s s o m e t i m e s e x e r c i s e a p a r t i c u l a r s t a t e m e n t and s o m e t i m e s do not. T h e n p r o v i n g the r e l i a b i l i t y of t h e s e t e s t p r e d i c a t e s w i l l be h a r d e r b e c a u s e it m u s t be shown that the s t a t e m e n t s not e x e r c i s e d by s o m e data in t h e e q u i v a l e n c e c l a s s n e v e r c a u s e an e r r o r . O t h e r w i s e , i f an e r r o r i s a s s o c i a t e d with the o c c a s i o n a l l y e x e c u t e d s t a t e m e n t , s o m e data in the c l a s s w i l l e x e c u t e s u c c e s s f u l l y and s o m e w i l l not, and this c o n t r a d i c t s R E L I A BLE(C)o So, in g e n e r a l , we c a n only d e s c r i b e c e r t a i n c h a r a c t e r i s t i c s t h a t if not s a t i s f i e d e i t h e r m a k e a p r o o f of r e l i a b i l i t y h a r d e r o r m a k e i t m o r e l i k e l y the t e s t p r e d i c a t e s a r e not r e l i a b l e f o r t h e p r o g r a m to be t e s t e d . F o r e x a m p l e , it a p p e a r s t h a t a s e t of t e s t p r e d i c a t e s m u s t at l e a s t s a t i s f y t h e f o l l o w i n g c o n d i t i o n s to h a v e a r e a s o n a b l e c h a n c e of b e i n g r e l i a b l e : 6. Summary In t h i s p a p e r , we h a v e exa~_ined s o m e f u n d a m e n t a l q u e s t i o n s and p r o b l e m s c o n c e r n i n g p r o gram testing: 1. What is t h e p r o p e r r o l e of t e s t i n g in p r o g r a m d e v e l o p m e n t ? How d o e s t e s t i n g c o m p a r e to p r o g r a m p r o v i n g as a m e a n s of i m p r o v i n g software reliability? 2. What a r e t h e w e a k n e s s e s of t e s t i n g and to w h a t e x t e n t can t h e y be o v e r c o m e ? 3. 
What i s a s y s t e m a t i c m e t h o d f o r s e l e c t i n g t e s t data and h o w can the m e t h o d be evaluated a s to i t s r e l i a b i l i t y and v a l i d i t y ? 4. I s t h e r e a t h e o r y of t e s t i n g that can p o i n t the w a y to i m p r o v e d m e t h o d s and p r i n c i p l e s of testing? In a n s w e r i n g t h e s e q u e s t i o n s we h a v e shown the following. 1. Th e R o l e of T e s t i n g - T e s t i n g h a s an e s s e n t i a l r o l e in p r o g r a m d e v e l o p m e n t if f o r no o t h e r r e a s o n than the p e r v a s i v e n e s s of c a s e a n a l y s i s in a n a l y z i n g r e q u i r e m e n t s , d e f i n i n g s p e c i f i c a t i o n s , i m p l e m e n t i n g p r o g r a m s , and p r o v i n g p r o g r a m c o r r e c t n e s s . We h a v e d i s c u s s e d h o w the p r a c t i c £ of a t t e m p t i n g f o r m a l o r i n f o r m a l p r o o f s of p r o g r a m c o r r e c t n e s s is useful for improving r e l i a b i l i t y , but s u f f e r s f r o m the s a m e t y p e s of e r r o r s as p r o g r a m m i n g and t e s t i n g , n a m e l y , f a i l u r e to find and v a l i d a t e a l l s p e c i a l c a s e s (i. e . , c o m b i n a t i o n s of c o n d i t i o n s ) r e l e v a n t to a d e s i g n , i t s s p e c i f i c a t i o n s , t h e p r o g r a m , and i t s p r o o f . N e i t h e r testing: n o r p r o g r a m p r o v i n g can in p r a c t i c e p r o v i d e c o m p l e t e a s s u r a n c e of p r o g r a m c o r r e c t n e s s , but the t e c h n i q u e s c o m p l e m e n t e a c h o t h e r and t o g e t h e r p r o v i d e g r e a t e r a s s u r a n c e of c o r r e c t n e s s than e i t h e r alone. 
1 ) e v e r y i n d i v i d u a l b r a n c h i n g c o n d i t i o n in the p r o g r a m m u s t be" r e p r e s e n t e d by an e q u i v a l e n t "~ c o n d i t i o n in the t a b l e ; 2) e v e r y p o t e n t i a l t e r m i n a t i o n c o n d i t i o n in th e p r o g r a m [Sites(1974)] e . g . , o v e r f l o w , m u s t be r e p r e s e n t e d by a c o n d i t i o n in the t a b l e ; 3) e v e r y v a r i a b l e m e n t i o n e d in a c o n d i t i o n m u s t h a v e b e e n p a r t i t i o n e d c o r r e c t l y into c l a s s e s t h a t a r e " t r e a t e d the s a m e " by t he program; 4 ) e v e r y c o n d i t i o n r e l e v a n t to the c o r r e c t o p e r a t i o n of the p r o g r a m t h a t i s i m p l i e d by the s p e c i f i c a t i o n , k n o w l e d g e of th e p r o g r a m ' s data s t r u c t u r e s , o r k n o w l e d g e of t h e g e n e r a l m e t h o d b e i n g i m p l e m e n t e d by t h e p r o g r a m m u s t be r e p r e s e n t e d as a c o n d i t i o n in the table; 2. T e s t W e a k n e s s e s - Th e w e a k n e s s of t e s t i n g l i e s in c o n c l u d i n g that f r o m t h e s u c c e s s f u l e x e c u t i o n of s e l e c t e d t e s t d at a, a p r o g r a m is c o r r e c t f o r a11 data. C r i t e r i a f o r t e s t data s e l e c t i o n b a s e d solely, on i n t e r n a l p r o g r a m s t r u c t u r e a r e c l e a r l y too w e a k to i n s u r e c o n f i d e n c e in the r e s u l t s of a successful test. since such testing methods a r e ~T h i s m e a n s v e r i f y i n g that the c o n d i t i o n s r e p r e s e n t e d in the c o n d i t i o n t a b l e a r e b e i n g a c c u r a t e l y t e s t e d i n s i d e the p r o g r a m ; t h i s c h e c k i s n e e d e d to i n s u r e that a t a b l e p r e d i c a t e , e . g . , "X, Y, and Z a r e aU e q u a l " i s not a c t u a l l y e v a l u a t e d in the p r o g r a m as ( X + Y + Z ) / 3 = X . 509 Hetzel, W.C. 
P r o g r a m T e s t Methods. l~rentice Hall, E n g l e w o o d Cliffs, N . J . , 1973. Hansen," P~ B. Testing m u l t i p r o g r a r n m i n g s y s t e m s . Software P r a c t i c e a n d Experience 3 , 2 ( A p r i l - J u n e too weak to n e c e s s a r i l y r e v e a l a l l d e s i g n e r r o r s o r m a n y t y p e s of c o n s t r u c t i o n e r r o r s . 11eliabilitty i s the k e y to m e a n i n g f u l t e s t i n g and l a c k of r e l i a b i l i t y is the r e a s o n for the c u r r e n t w e a k n e s s of t e s t i n g as a m e t h o d for i n s u r i n g s o f t w a r e c o r r e c t n e s s . F u r t h e r w o r k is n e e d e d to show when a t e s t is reliable. f973), 145-150. K i n g , 1n, J . H . The i n t e r p r e t a t i o n of l i m i t e d e n t r y decision t a b l e f o r m a t and r e l a t i o n s h i p s a m o n g c o n d i t i o n s . C o m i m t e r J o u r n a l 12, 4 (November 3. T e s t Methodology - We b e l i e v e m o s t s o f t w a r e e r r o r s r e s u l t f r o m f a i l i n g to see or deal c o r r e c t ly with a l l c o n d i t i o n s and c o m b i n a t i o n s of c o n d i t i o n s r e l e v a n t to the d e s i r e d o p e r a t i o n of a p r o gram. An effective methodology for reliable p r o g r a m d e v e l o p m e n t m u s t focus on u n c o v e r i n g t h e s e c o n d i t i o n s and t h e i r c o m b i n a t i o n s . This i s the m o t i v a t i o n for o u r u s e of c o n d i t i o n t a b l e s - it m a k e s it e a s i e r to find a n d a n a l y z e c o n d i t i o n c o m b i n a t i o n s . 
E x p e r i m e n t a l u s e of c o n d i t i o n t a b l e s and f u r t h e r study of the m e t h o d is n e e d e d , howeverp b e f o r e i t s full b e n e f i t s c a n be r e a l i z e d i n p r a c t i c e , In a n y event, a s y s t e m a t i c a p p r o a c h to t e s t data s e l e c t i o n i s c l e a r l y n e c e s s a r y to o b t a i n effective t e s t data with r e a s o n a b l e effort. M o s t s y s t e m a t i c m e t h o d s p r o p o s e d to date have b e e n b a s e d s o l e l y on knowledge of a p r o g r a m t s i n t e r n a l s t r u c t u r e , and y e t such k n o w l e d g e i s i n s u f f i c i e n t by i t s e l f to y i e l d a d e q u a t e l y r e l i a b l e tests. 1969), 3Z0-326. London• R . L . Software r e l i a b i l i t y t h r o u g h p r o v i n g p r o g r a m s c o r r e c t . I E E E Int. Syrnp. on F a u l t To!er:a n t Computing• h4arch 1971. N a u r , P. P r o g r a m m i n g by a c t i o n c l u s t e r s . BIT 9~ 3 (1969a), 250,258. N a u r , P . , and 11ande11, B. ( e d s . ) Software E n g i n e e r r O S c i e n t i f i c A f f a i r s D i v i s i o n • N~ATO• B r u s s e l s 39, gJ.u.m, J a n u a r y 1969b. P o o c h , U. W, T r a n s l a t i o n of d e c i s i o n t a b l e s . Com......: p u t i n g S u r v e y s 6, 2 (June 1974), 125°151. P o o l e , I:~. C. Delmgging and t e s t i n g ° I n B a u e r , F . L. A d v a n c e d C o u r s e on Software E n g i n e e r i n g • S p r i n g e r V e r l a g , N e w York~ 1973, Z78-318. 11ubey, 11. J. et. al. C o m p a r a t i v e E v a l u a t i o n of PL/I___~• AD 6 6 9 0 ~ , A p r i l 1968. 4. T h e o r y of T e s t i n g - We have m a d e a s t a r t t o w a r d a t h e o r y of t e s t i n g by d e f i n i n g s o m e b a s i c p r o p e r t i e s of t h o r o u g h t e s t s - - r e l i a b i l i t y , v a l i d i t y j and c o m p l e t e n e s s . 
We have a l s o d e v e l o p e d a s p e c i f i c v e r s i o n of the t h e o r y i n which the t e s t data s e l e c t i o n c r i t e r i o n c o n s i s t s of s e t s of t e s t p r e d i c a t e s o r g a n i z e d i n a c o n d i t i o n t a b l e . O t h e r t h e o r e t i c a l d e v e l o p m e n t s could p r o c e e d b y d e f i n i n g o t h e r types of t e s t data s e l e c t i o n c r i t e r i a . Sites, 11. L. P r o v i n g that c o m p u t e r p r o g r a m s t e r m i n a t e c l e a n l y . S t a n f o r d Univ. • C o m p u t e r Sci. Dept, • S T A N - C S - 7 4 - 4 1 8 , May 1974. Stucki, L . G . and Svegel, N . P . Software A u t o m a t e d V e r i f i c a t i o n S y s t e m Study, M c D o n n e l l D o u g l a s A s t r o n a u t i c s C o . , AD-7~4086, J a n u a r y 1974. The c o n c e p t of p r o v i n g t e s t r e l i a b i l i t y , v a l i d ity, and c o m p l e t e n e s s and t h e n t e s t i n g a p r o g r a m ( r a t h e r t h a n a t t e m p t i n g to p r o v e the p r o g r a m c o r r e c t d i r e c t l y o r u s i n g a n o n - r i g o r o u s a p p r o a c h to t e s t i n g ) i s i m p o r t a n t for s o f t w a r e r e l i a b i l i t y . P r e s u m a b l y s e v e r y p i e c e of s o f t w a r e u n d e r g o e s s o m e type of t e s t i n g , y e t the f u n d a m e n t a l t e c h n i q u e s of t e s t data s e l e c t i o n have not p r e v i o u s l y b e e n as c a r e f u l l y and c r i t i c a l l y a n a l y z e d as have s o m e l e s s w i d e l y u s e d t e c h n i q u e s such as p r o o f of c o r r e c t n e s s . We know l e s s about the t h e o r y of t e s t i n g , which we do oftenp t h a n about the t h e o r y of p r o g r a m p r o v i n g , which we do s e l d o m . This p a p e r is a step t o w a r d r e d r e s s i n g this i m b a l a n c e . P~efer e n c e s Boehm, B. Software and its i m p a c t : a q u a n t i t a t i v e a s s e s s m e n t . 
D a t a m a t i o n 19(May 1973), 48-59. Boehm, B . W . etoal. Some e x p e r i e n c e with a u t o m a t e d aids to the d e s i g n of l a r g e - s c a l e r e l i a b l e s o f t w a r e . In this p r o c e e d i n g s , A p r i l 1975, Buxton, J . N . and R a n d e l l , B. Software E n g i n e e r ing TechniqLL~s, S c i e n t i f i c A f f a i r s D i v i s i o n , NATO, B r u s s e l s 39, [~elgium, A p r i l 1970. Dzthl, O. -J. , D i j k s t r a , E. W . , and H o a r e , C. A. R. S t r u c t u r e d P r o g r a n l m i n g . A c a d e m i c P r e s s , New York, 1972. Elmendr~rf0 ~V. R. C o n t r o l l i n g the f u n c t i o n a l t e s t ing of an o p e r a t i n g s y s t e m . I E E E T r a n s . Sys. Sci. and C y b . ~ S S C - 5 , 4 (October 1969); 284-289. 510