Use of Orthogonal Defect Classification and Acceptance Sampling Models in the Design of Experiments

Abstract: There are several techniques for testing application software for Quality Assurance. The most reliable technique is to perform complete exhaustive testing of all the features of a particular piece of software. However, this is usually very expensive, both in terms of time and money. In other instances, the testing is destructive, so performing 100% testing would be counterproductive. Full exhaustive testing is usually mandatory for the initial release of any software; for subsequent releases, however, regression testing is the way to go. Under regression testing, the complete suite of test cases need not be executed; only a subset of the test cases is required. The technique for selecting such a subset is the focus of this paper. One possible approach is to identify the most critical features of the software and place more emphasis on those, while putting less emphasis on the not-so-critical features. Another approach is to develop a statistical procedure for identifying which test cases to perform, without necessarily identifying which ones are critical and which are not. Such random selection leads to a more representative sample of the overall features of the software, and the proportion of defects identified is presumed to approximate the proportion that would have been identified by complete exhaustive testing. A third approach is to use the technique of Orthogonal Defect Classification (ODC). This is a mathematical technique that breaks down all the features of the Application Under Test (AUT) into categories and then, with the aid of standard orthogonal arrays, selects which test cases to run. This paper focuses on the use of some of these statistical techniques in the design and execution of test plans.

1 Introduction

Traditional Quality Assurance techniques involve the use of Acceptance Sampling. Acceptance Sampling is a technique whereby a sample is randomly selected from a population of test cases; the results of testing the sample are then used to draw inferences about the quality of the Application Under Test. The analyst could adopt either the Single Sampling technique or the Double Sampling technique. The philosophy of the two approaches is essentially the same; the difference is that in the case of double sampling, the testing is performed twice.

Whenever new software is developed, the proper approach is to execute all possible test scenarios, because there is no baseline to compare the results with; complete exhaustive testing is therefore expected. In subsequent releases of the application, however, it is no longer necessary to perform complete exhaustive testing. In this case, Regression Testing can be done using one or the other of the available techniques.

Literature Survey

Construction of orthogonal arrays dates back to the early 1960s. Addelman (1962) was one of the earliest proponents of orthogonal arrays; he developed a procedure for determining the main effects of a factorial experiment. Box et al. (1978) published their work on the design of experiments, data analysis, and model building. Kempthorne (1979) published his book on the design and analysis of experiments. Other researchers in this field include John (1971) and Phadke (1982, 1983, 1986, 1987). This article is based mostly on the work of Phadke (1989), in the book Quality Engineering Using Robust Design.
Taguchi (1987) invented the Linear Graph technique for estimating interactions in an orthogonal array experiment. Acceptance Sampling, on the other hand, has been used extensively in the field of Operations Management. Bakir (2005) applied it to the appraisal of work performance using control charts. Lin and Sheen (2003) applied it to the development of a capability index based on quality control charts. Matthes et al. (2007) applied the technique to statistical process control for hospitals. This paper is an attempt to integrate the concepts of Acceptance Sampling into the development and execution of test plans in an Interactive Voice Response environment.

Model I: Design of the Experiment using Orthogonal Defect Classification

The process of designing an orthogonal array experiment is based mostly on the theory of linear graphs developed by Taguchi. Before constructing an orthogonal array, the following requirements must be established:
1. The number of factors to be studied
2. The number of levels of each factor
3. An estimate of any possible interactions
4. The technical limitations of the study

Consider the Application Under Test (AUT) to have a call flow that can be broken down into five basic categories (factors):
1. Finding a Store
2. Get Zip Code
3. Obtain Information
4. Request Promotional Code
5. Request Order Status

In order to simplify the selection of an appropriate orthogonal array matrix, it is desirable that the factors have uniform numbers of levels. This is accomplished by segmenting the factors and adding dummy levels where necessary. The result is the following:
1. Finding a Store
   a) Play Store Information
   b) Find More Stores
   c) Give me an agent
2. Get Zip Code
   a) Obtain Zip Code
   b) Obtain Telephone Number of Main Store
   c) Give me an agent
3. Obtain Information
   a) Get Name of Supervisor
   b) Speak to a Live Agent
4. Request Promotional Code
   a) Request Promotional Code
   b) Retrieve Promotional Code
   c) Give me an agent
5. Request Order Status
   a) Request Status of an Order
   b) Request Account Number
   c) Give me an agent
6. Request a Live Agent
   a) I want an agent
   b) I don't want to speak to a machine

Here we have six factors: four with 3 levels and two with 2 levels. The sketch following this list enumerates the full factorial over this structure.
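To make the cost of 100% testing concrete, the following minimal sketch (in Python; the factor and level names are illustrative placeholders, not identifiers from the actual test plan) enumerates every combination of levels:

```python
from itertools import product

# Factor -> levels, following the segmented call-flow breakdown above.
factors = {
    "Finding a Store":          ["Play Store Information", "Find More Stores", "Give me an agent"],
    "Get Zip Code":             ["Obtain Zip Code", "Obtain Telephone Number", "Give me an agent"],
    "Obtain Information":       ["Get Name of Supervisor", "Speak to a Live Agent"],
    "Request Promotional Code": ["Request Promotional Code", "Retrieve Promotional Code", "Give me an agent"],
    "Request Order Status":     ["Request Status of an Order", "Request Account Number", "Give me an agent"],
    "Request a Live Agent":     ["I want an agent", "I don't want to speak to a machine"],
}

# Exhaustive (100%) testing executes one test case per combination of levels.
all_cases = list(product(*factors.values()))
print(len(all_cases))  # 3 * 3 * 2 * 3 * 3 * 2 = 324
```

The orthogonal array technique described next reduces these 324 combinations to a small, balanced subset.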
Orthogonal Array Selection

It should be recalled that the Orthogonal Array technique is only suitable for Regression Testing. It does not necessarily cover every leg of the Call Flow; it does, however, cover all the functionalities of the Call Flow. In order to select the appropriate orthogonal array, the following steps are required:
a) Determine the degrees of freedom associated with the experiment.
b) Select an appropriate orthogonal array for the number of levels of each factor/category.
c) Determine the existence of possible interactions between factors.
d) Make contingency plans for any deviations from ideal conditions.

a) Determination of the Degrees of Freedom

Consider a situation with six factors: A, B, C, D, E, and F. Assume factor A has 2 levels, and factors B, C, D, E, and F have 3 levels each. For any orthogonal array design, the overall mean is always assigned one degree of freedom. The number of degrees of freedom associated with a factor is one less than the number of levels of the factor. The degrees of freedom associated with the interaction between any two factors are given by the product of the degrees of freedom of the two factors: if Factor A has (n_A - 1) degrees of freedom and Factor B has (n_B - 1) degrees of freedom, then the interaction A x B has (n_A - 1) x (n_B - 1) degrees of freedom. The degrees of freedom for the experiment are shown in the table below:

Factor/Interaction | Degrees of Freedom
Overall mean       | 1
A                  | 2 - 1 = 1
B, C, D, E, and F  | 5 x (3 - 1) = 10
A x B              | (2 - 1) x (3 - 1) = 2
Total              | 14

This means that at least 14 experiments need to be performed to cover the effects of all six factors and their interaction. A minimal sketch of this bookkeeping appears after the Result below.

b) Selecting a Standard Orthogonal Array

There are several array-naming conventions. One convention is derived from the number of rows in the array and the number and levels of its columns. Thus the array L18(2^1 3^7) has 18 rows, one 2-level column, and seven 3-level columns; the total number of columns in the array is 8, and the total number of test cases executed is 18. It should be noted that the number of test cases must be at least equal to the degrees of freedom required for the experiment. The number of columns, on the other hand, indicates the maximum number of factors that can be tested using that array. In addition, the number of levels of the factors must match the numbers of levels of the columns in the array. The idea is to utilize the smallest array that satisfies all the requirements of the experiment. If, for instance, it is required to select an appropriate orthogonal array for an experiment that has one 2-level factor and six 3-level factors, then the most appropriate array to select would be L18. L16 would not be appropriate because it does not have any 3-level columns. In situations where the orthogonal array has more columns than the number of factors, the extra columns can be filled by creating an extra (dummy) level, and the matrix will still be proportionally balanced.

[Table of Standard Orthogonal Arrays]

c) Determination of Possible Interactions

In most situations, the effect of interactions is minuscule. For this experiment, there were no perceived interactions. The existence of interactions depends on the nature of the factors, and can be designed for using the linear graph techniques invented by Taguchi (1987).

Result

The Application Under Test (AUT) is a software application used for handling customer requests in an Interactive Voice Response (IVR) environment. Each of the categories is considered a factor of the experiment. There are four 3-level factors and two 2-level factors. If the testing were done using 100% testing, we would end up with 3^4 x 2^2 = 324 test cases. We therefore need an orthogonal array with at least four 3-level columns and at least two 2-level columns. From the table of Standard Orthogonal Arrays, the most appropriate orthogonal array to select is L36(2^11 3^12); it is the smallest orthogonal array that satisfies the column requirements (6 factors) of this experiment. The conclusion here is that we have reduced the size of our test plan from 324 to 36 test cases without losing the integrity of the test plan. The percent defective detected under the two methods was the same, 0.025, which shows that there is no loss of rigor in adopting the orthogonal array approach. The advantage, therefore, is that we are able to arrive at the same conclusion with far fewer resources.
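As a minimal sketch of the degrees-of-freedom bookkeeping and the array-size check (an assumed helper written for this illustration, not code from the study):

```python
def degrees_of_freedom(levels, interactions=()):
    """Required runs: overall mean + main effects + two-factor interactions."""
    dof = 1                                         # overall mean
    dof += sum(n - 1 for n in levels.values())      # main effects
    dof += sum((levels[a] - 1) * (levels[b] - 1)    # two-factor interactions
               for a, b in interactions)
    return dof

# Worked example from the text: one 2-level factor, five 3-level factors, A x B.
example = {"A": 2, "B": 3, "C": 3, "D": 3, "E": 3, "F": 3}
print(degrees_of_freedom(example, interactions=[("A", "B")]))  # 14

# This experiment: four 3-level and two 2-level factors, no interactions.
actual = {"F1": 3, "F2": 3, "F3": 3, "F4": 3, "F5": 2, "F6": 2}
print(degrees_of_freedom(actual))  # 11 <= 36 rows, so L36(2^11 3^12) suffices
```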
L36(2^11 3^12) Orthogonal Array

[Table: the standard L36(2^11 3^12) orthogonal array of 36 experiments (rows) by 23 columns, consisting of eleven 2-level columns followed by twelve 3-level columns.]

A sample of the result of replacing the code number for each level with the actual caller utterance, for a subset of the 23 columns, is shown in the matrix below.

Subset of L36(2^11 3^12) Orthogonal Array after substitution
Expt. # | Column 1         | Column 2     | Column 12
1       | Play Store Info  | Get Zip Code | Play Store Info
2       | Play Store Info  | Get Zip Code | Find More Stores
3       | Play Store Info  | Get Zip Code | Get an Agent
4       | Play Store Info  | Get Zip Code | Play Store Info
5       | Play Store Info  | Get Zip Code | Find More Stores
6       | Play Store Info  | Get Zip Code | Get an Agent
7       | Play Store Info  | Get Zip Code | Play Store Info
8       | Play Store Info  | Get Zip Code | Find More Stores
9       | Play Store Info  | Get Zip Code | Get an Agent
10      | Play Store Info  | Get Tel #    | Play Store Info
11      | Play Store Info  | Get Tel #    | Find More Stores
12      | Play Store Info  | Get Tel #    | Get an Agent
13      | Play Store Info  | Get Tel #    | Play Store Info
14      | Play Store Info  | Get Tel #    | Find More Stores
15      | Play Store Info  | Get Tel #    | Get an Agent
16      | Play Store Info  | Get Tel #    | Play Store Info
17      | Play Store Info  | Get Tel #    | Find More Stores
18      | Play Store Info  | Get Tel #    | Get an Agent
19      | Find More Stores | Get Zip Code | Play Store Info
20      | Find More Stores | Get Zip Code | Find More Stores
21      | Find More Stores | Get Zip Code | Get an Agent
22      | Find More Stores | Get Zip Code | Play Store Info
23      | Find More Stores | Get Zip Code | Find More Stores
24      | Find More Stores | Get Zip Code | Get an Agent
25      | Find More Stores | Get Zip Code | Play Store Info
26      | Find More Stores | Get Zip Code | Find More Stores
27      | Find More Stores | Get Zip Code | Get an Agent
28      | Find More Stores | Get Tel #    | Play Store Info
29      | Find More Stores | Get Tel #    | Find More Stores
30      | Find More Stores | Get Tel #    | Get an Agent
31      | Find More Stores | Get Tel #    | Play Store Info
32      | Find More Stores | Get Tel #    | Find More Stores
33      | Find More Stores | Get Tel #    | Get an Agent
34      | Find More Stores | Get Tel #    | Play Store Info
35      | Find More Stores | Get Tel #    | Find More Stores
36      | Find More Stores | Get Tel #    | Get an Agent

Model II: Design of Experiment using Acceptance Sampling

The purpose of acceptance sampling is to specify criteria for accepting or rejecting a lot based on certain operating characteristics. In a Quality Assurance environment, the result of any test is dichotomous: the test case either passes or fails. These are the two attributes to be considered when testing software. Attribute inspection is used when measurements take on only two values, e.g. {defective, non-defective}, {on, off}, or {pass, fail}. Therefore, in developing a sampling plan, we only need to specify two things:
i. the sample size, n
ii. the acceptance number, c

For a given n and c: if the number of defectives x is at most c, accept the lot; if x exceeds c, reject the lot.

If, on the other hand, the analyst is interested in ensuring higher quality, a Double Sampling Plan can be designed. In that case, the considerations are as follows. First take a sample of size n1. If the number of defectives x1 is at most c1, accept the lot; if x1 is greater than c2, reject the lot. However, if c1 < x1 <= c2, take a second sample of size n2. If (x1 + x2) > c2, reject the lot; otherwise, if (x1 + x2) <= c2, accept the lot. Both decision rules are sketched below.
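A minimal sketch of the two decision rules (hypothetical helper functions written for this illustration):

```python
def single_sampling(x: int, c: int) -> str:
    """Single plan: accept when the number of defectives x is at most c."""
    return "accept" if x <= c else "reject"

def double_sampling(x1: int, x2: int, c1: int, c2: int) -> str:
    """Double plan: x1 defectives in the first sample; the second sample
    (yielding x2) is drawn only in the borderline region c1 < x1 <= c2."""
    if x1 <= c1:
        return "accept"
    if x1 > c2:
        return "reject"
    return "accept" if x1 + x2 <= c2 else "reject"

print(single_sampling(x=1, c=1))                # accept
print(double_sampling(x1=2, x2=1, c1=1, c2=3))  # accept (2 + 1 <= 3)
```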
The Single Acceptance Sampling Model

We will restrict our testing to Single Acceptance Sampling, since Double Sampling is basically a replication of Single Sampling. Let N be the total number of test cases in the test plan of a version of software that is already in the field. Let n be the number of test cases to be selected for testing. Let p be the proportion of test cases that result in a defect being identified (this is presently an unknown quantity). Let p̂ be the proportion of defective test cases identified from testing a sample of size n.

The idea is to obtain p̂ as an estimate of the proportion of the overall test cases that would result in identifying a defect in the application. In order to implement this test plan, two parameters need to be established: the maximum number of defects that the client can live with, the Acceptance Number (c); and the maximum risk that the client is willing to take (β). The fairest way to accomplish this is for both the software developer and the client to generate an Operating Characteristic Curve, and use the result as a basis for determining the most equitable values of customer acceptance and developer risk.

Development of the Operating Characteristic (OC) Curves

OC curves are generated for the purpose of reducing the amount of computation that goes into determining whether or not to accept a particular sample. An OC curve is obtained by calculating the probability of acceptance for various values of percent defective. Mathematically, we have:

n = sample size
p = percent defective
q = (1 - p), percent non-defective
k = number of defective items
p(k) = probability of k defectives
c = acceptance number (the limit on the number of defective items considered acceptable by the client)

Suppose we have a lot of size N and take a sample of size n. What is the probability of accepting the lot, given that the proportion of defective items in the lot is p and the acceptance number is c? This scenario follows a Binomial Distribution, whose probability function is given by:

P(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x} = \binom{n}{x} p^x (1-p)^{n-x}

The probability of acceptance, P(A), is then given by:

P(A) = \sum_{x=0}^{c} P(x) = \sum_{x=0}^{c} \binom{n}{x} p^x (1-p)^{n-x}

These probabilities can be obtained from the Binomial Tables. A sample result of the calculations is shown below.

Construction of the Operating Characteristic (OC) Curve

In order to construct the OC curve, four parameters have to be established:
1. AQL (Acceptable Quality Level): the level of quality assignable to a lot that is considered GOOD.
2. LTPD (Lot Tolerance Percent Defective): the level of quality assignable to a lot that is considered BAD.
3. Producer's Risk (α): the probability that a producer's GOOD lot is rejected; in other words, the probability of committing a Type I error.
4. Consumer's Risk (β): the probability that a BAD lot is accepted by the consumer; in other words, the probability of committing a Type II error.

[Figure 1: OC curves for attributes (sample size n = 10), plotting the probability of acceptance P(A) against the percent defective p for acceptance numbers c = 0 through c = 5.]

[Figure 2: The Larson Nomogram for OC Curves]

Producer's and Consumer's Risk Values

Plan          | Producer's risk | Consumer's risk
n = 10, c = 0 | .18             | .10
n = 10, c = 1 | .04             | .37
n = 10, c = 2 | .01             | .67

These risks can be computed directly from the binomial formula, as in the sketch below.
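The following sketch evaluates P(A) from the binomial formula above. The AQL and LTPD values are assumptions chosen for illustration (the text does not state the values behind the risk table), so the computed risks only approximate the tabulated ones:

```python
from math import comb

def prob_accept(n: int, c: int, p: float) -> float:
    """P(A) = sum over x = 0..c of C(n, x) * p^x * (1 - p)^(n - x)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c + 1))

AQL, LTPD = 0.02, 0.20  # assumed quality levels, for illustration only
for c in (0, 1, 2):
    alpha = 1 - prob_accept(10, c, AQL)   # producer's risk: GOOD lot rejected
    beta = prob_accept(10, c, LTPD)       # consumer's risk: BAD lot accepted
    print(f"n = 10, c = {c}: alpha = {alpha:.2f}, beta = {beta:.2f}")
```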
Result

We now apply the Acceptance Sampling technique to the same Application Under Test (AUT) in the Interactive Voice Response (IVR) environment. We proceed by identifying the six basic categories to be tested:
1. Finding a Store
2. Get Zip Code
3. Obtain Payment Information
4. Ask about Promotions
5. Confirming an Order
6. Asking for the Status of an Order

A caller's utterance may or may not be correctly categorized; if it is not, the application will request that the caller repeat the utterance. If, on the other hand, the caller's intent is properly identified, the application will proceed to provide the required information. Thus each of these utterances can be broken down into three categories:
1. The IVR correctly recognizes the caller's intent and proceeds to provide the requested information.
2. The IVR barely recognizes the caller's intent and requests the caller to repeat it.
3. The IVR does not recognize the caller's intent and transfers the caller to a live agent.

Using the Call Flow diagram, a Test Plan was developed to handle all possible combinations of caller intent. The total number of test cases was 324. Assuming that the client would like to have 99% reliability, the customer is not ready to assume any risk greater than 0.01; in other words, the consumer's risk β = 0.01. The producer (software developer), on the other hand, should be able to assume a greater risk than the consumer, so the producer decides to assume a risk of α = 0.05. Locating these two values on an Operating Characteristic Curve, we come up with a sampling plan of n = 50 and c = 1.

[Figure 3: OC curve for the sampling plan, marking the producer's risk (α = 0.05, so P(A) = 0.95) at the AQL and the consumer's risk at the LTPD (P(A) = 0.10), with the Good Lot, Indifference Zone, and Bad Lot regions indicated.]

Using a random number generator, a random sample of 50 test cases was extracted. Out of the 50 test cases, two failed; one failure was due to network issues, so that test case was discarded. Essentially, 1 test case out of 49 failed. This means that the proportion defective based on a sample size of 49 is 1/49 ≈ 0.02, which corresponds very closely with the result previously obtained using Orthogonal Defect Classification. A sketch of this sampling step follows.
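A minimal sketch of the sampling step (the seed, the test-case IDs, and the failure counts are illustrative stand-ins, not the study's actual data):

```python
import random

# Draw 50 of the 324 test cases at random; IDs 1..324 are placeholders.
random.seed(42)  # seeded only so the illustration is reproducible
sample = random.sample(range(1, 325), k=50)
print(sample[:5])  # the first few selected test-case IDs

# Suppose execution yields two failures, one discarded as a network issue
# (as in the text), leaving 1 defect among 49 valid cases.
valid_cases, defects = 49, 1
print(f"estimated proportion defective: {defects / valid_cases:.3f}")  # 0.020

# Apply the n = 50, c = 1 decision rule:
print("accept" if defects <= 1 else "reject")  # accept
```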
Conclusions and Recommendation

Orthogonal Defect Classification belongs in the field of Robust Design, developed in the 1950s and early 1960s by Taguchi. Robust Design was developed to improve productivity during research and development, so that high-quality products can be developed at the lowest cost. It applies ideas from statistical experimental design to reduce the variation in a product's functionality, and it ensures that the conclusions arrived at in the laboratory are optimal. The three main cost categories to consider in the design of an orthogonal array experiment are operating cost, manufacturing cost, and the cost of research and development; orthogonal array experiments are utilized mainly to bring the cost of research and development to a minimum. The two major tools of Robust Design are the signal-to-noise ratio and orthogonality. The same principle underlies Acceptance Sampling: using sampling statistics to make inferences about population parameters. Because of the ease of setting up a Test Plan based on Acceptance Sampling techniques, it is recommended that Acceptance Sampling be used in preference to the Orthogonal Defect Classification technique.

References

Addelman, S. "Orthogonal Main-Effect Plans for Asymmetrical Factorial Experiments." Technometrics (1962), vol. 3: pp. 21-46.

Bakir, S. T. "A Quality Control Chart for Work Performance Appraisal." Quality Engineering 17, no. 3 (2005): 429.

Besterfield, Dale H. Quality Control, 8th ed. Upper Saddle River, NJ: Prentice Hall, 2009.

Box, G. E. P., Hunter, W. G., and Hunter, J. S. Statistics for Experimenters: An Introduction to Design, Data Analysis and Model Building. New York: John Wiley and Sons, 1978.

Cochran, W. G., and Cox, G. M. Experimental Designs. New York: John Wiley and Sons, 1957.

Elg, M., J. Olsson, and J. J. Dahlgaard. "Implementing Statistical Process Control." The International Journal of Quality and Reliability Management 25, no. 6 (2008): 545.

Goetsch, David L., and Stanley B. Davis. Quality Management, 5th ed. Upper Saddle River, NJ: Prentice Hall, 2006.

Lin, H., and G. Sheen. "Practical Implementation of the Capability Index Cpk Based on Control Chart Data." Quality Engineering 17, no. 3 (2003): 371.

Matthes, N., et al. "Statistical Process Control for Hospitals." Quality Management in Health Care 16, no. 3 (July-September 2007): 205.

Montgomery, D. C. Introduction to Statistical Quality Control, 6th ed. New York: Wiley, 2008.

Summers, Donna. Quality Management, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 2009.

Taguchi, G. Orthogonal Arrays and Linear Graphs. Dearborn, MI: ASI Press, 1987.

Taguchi, G., and Phadke, M. S. "Quality Engineering through Design Optimization." Conference Record, GLOBECOM '84 Meeting, IEEE Communications Society, Atlanta, GA (November 1984): pp. 1106-1113.