Use of Orthogonal Defect Classification and Acceptance Sampling Models in the Design of Experiments
Abstract:
There are several techniques for testing Application Software for Quality Assurance. The most
reliable technique is to perform complete exhaustive testing of all the features of the software.
However, this is usually very expensive, both in terms of time and money. In other
instances, the testing is destructive, which makes 100% testing counterproductive.
Full exhaustive testing is usually mandatory for the initial release of any software. However, for
subsequent releases, regression testing is the way to go. Under regression testing, the complete
suite of test cases need not be executed. Only a subset of the test cases is required. The
technique of selecting such a subset is the focus of this paper. One possible approach is to
identify the most critical features of software, and place more emphasis on those, while putting
less emphasis on the not-so-critical features. Another approach is to develop a statistical
procedure for identifying which test cases to perform, without necessarily identifying which
ones are critical and which are not. Such random selection leads to a more representative
sample of the overall features of the software, and the proportion of defects identified is
presumed to approximate the proportion that would have been identified by complete
exhaustive testing. A third approach is to use the technique of Orthogonal Defect Classification
(ODC). This is a mathematical technique that is used to develop a model that breaks down all
the features of the Application Under Test (AUT) into categories and then, with the aid of
standard orthogonal arrays, selects which test cases to run. This paper focuses on the use of
some of these statistical techniques in the design and execution of Test Plans.
Introduction
Traditional Quality Assurance techniques involve the use of Acceptance Sampling. Acceptance
Sampling is a technique whereby a sample is randomly selected from a population of Test
Cases. The results of testing the sample can then be used to infer the quality of the
Application under Test. The analyst could adopt either the Single Sampling technique or the
Double Sampling technique. The philosophy of the two approaches is essentially the same; the
difference is that in the case of double sampling, the testing is performed twice. Whenever new
software is developed, the proper approach is to execute all possible test scenarios. This is
because there is no baseline to compare the results with; therefore complete exhaustive testing is
to be expected. However, in subsequent releases of the Application, it is no longer necessary to
perform complete exhaustive testing. In this case, Regression Testing can be done using one or
the other of the available techniques.
Literature Survey
Construction of orthogonal arrays dates back to the early 1960s. Addelman (1962) was one of
the earliest proponents of orthogonal arrays; he developed the procedure for the determination
of the Main Effects for a Factorial Experiment. Box et al. (1978) published their work on the
design and development of mathematical models for experiments. Kempthorne (1979)
published his book on the Design and Analysis of Experiments. Other researchers in this field
include John, P. W. M. (1971) and Phadke, M. S. (1982, 1983, 1986, and 1987). This article is
based mostly on the work of Phadke (1989), in the book Quality Engineering Using Robust
Design. Taguchi (1987) invented the Linear Graph technique for estimating interactions in an
orthogonal array experiment.
Acceptance Sampling on the other hand, has been used extensively in the field of Operations
Management. Bakir (2005) applied it to an appraisal of work performance using Control Charts.
Lin and Sheen (2005) applied it to the development of a Capability Index based on Quality
Control charts. Matthes et al. (2007) applied the technique to Statistical Process Control for hospitals.
This paper is an attempt to integrate the inherent concepts of Acceptance Sampling into the
development and execution of test plans in an Interactive Voice Response environment.
Model I: Design of the Experiment using Orthogonal Defect Classification
The process of designing an orthogonal array experiment is based mostly on the theory of linear
graphs developed by Taguchi. Before constructing an orthogonal array, the following
requirements must be met:
1. The number of factors to be studied
2. The number of levels of each factor
3. An estimate of any possible interactions
4. Technical limitations of the study.
Consider the Application-Under-Test (AUT) to have a call flow that can be broken down
into five basic categories (factors):
1. Finding a Store
2. Get Zip Code
3. Obtain information
4. Request Promotional Code
5. Request Order Status
In order to simplify the selection of an appropriate orthogonal array matrix, it is necessary that
all the factors have the same number of levels. This will be accomplished by segmenting the
factors and adding dummy levels, where necessary. The result is the following:
1. Finding a Store
a) Play Store Information
b) Find More Stores
c) Give me an agent
2. Get Zip Code
a) Obtain Zip Code
b) Obtain Telephone number of Main Store
c) Give me an agent
3. Obtain information
a) Get name of Supervisor
b) Speak to a live Agent
4. Request Promotional Code
a) Request Promotional Code
b) Retrieve Promotional Code
c) Give me an agent
5. Request Order Status
a) Request status of an order
b) Request account number
c) Give me an agent
6. Request a live Agent
a) I want an agent
b) I don’t want to speak to a machine
Here we have six factors: four with 3 levels, and two with 2 levels.
Orthogonal Array Selection
It should be recalled that the Orthogonal Array Technique is only suitable for Regression
Testing. It therefore does not necessarily cover every leg of the Call Flow; however, it covers
all the functionalities of the Call Flow. In order to select the appropriate orthogonal array, the
following steps are required:
a) Determination of the degrees of freedom associated with the experiment.
b) Selection of an appropriate orthogonal array for the number of levels of each
factor/category.
c) Determination of the existence of possible interactions between factors
d) Making contingency plans for any deviations from ideal conditions.
a) Determination of the Degrees of Freedom
Consider a situation with six factors: A, B, C, D, E, and F. Assume factor A has 2 levels, and
factors B, C, D, E, and F have 3 levels each. For any Orthogonal Array design, the overall
mean is always assigned one degree of freedom. The number of degrees of freedom associated
with any factor is equal to one less than the number of levels of that factor. The degrees of
freedom associated with the interaction between any two factors are given by the product of
the degrees of freedom of the two factors: if Factor A has (nA − 1) degrees of freedom and
Factor B has (nB − 1) degrees of freedom, then the degrees of freedom for the interaction
A × B = (nA − 1) × (nB − 1). The degrees of freedom for the experiment would be as shown in
the table below:
Factor/Interaction     Degrees of Freedom
Overall mean           1
A                      2 − 1 = 1
B, C, D, E, and F      5 × (3 − 1) = 10
A × B                  (2 − 1) × (3 − 1) = 2
Total                  14
This means that at least 14 experiments need to be performed to cover the effects of all six
factors and their interactions.
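As a check on this arithmetic, the short sketch below tallies the degrees of freedom. The helper function and its names are an assumed illustration, not code from the paper.

def degrees_of_freedom(factor_levels, interactions=()):
    """factor_levels: mapping of factor name -> number of levels.
    interactions: (factor, factor) pairs whose effect is to be estimated."""
    dof = 1  # one degree of freedom for the overall mean
    dof += sum(levels - 1 for levels in factor_levels.values())
    dof += sum((factor_levels[a] - 1) * (factor_levels[b] - 1)
               for a, b in interactions)
    return dof

levels = {"A": 2, "B": 3, "C": 3, "D": 3, "E": 3, "F": 3}
print(degrees_of_freedom(levels, interactions=[("A", "B")]))  # 14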
b) Selecting a Standard Orthogonal Array
There are several array-naming conventions. One convention is derived from the number of
rows and columns in the array, as well as the number of levels of each column of the array.
Thus the array L18(2^1 3^7) indicates that the array has 18 rows, one 2-level column, and seven
3-level columns. The total number of columns in the array is therefore 8, and the total number
of test cases executed is 18. It should be noted that the number of test cases must be at least equal to
the degrees of freedom required for the experiment. The number of columns on the other hand,
indicates the maximum number of factors that can be tested using that array. In addition, the
number of levels of the factors must match the numbers of levels of the columns in the array.
The idea is to utilize the minimum array that satisfies all the requirements of the experiment. If,
for instance, it is required to select an appropriate orthogonal array for an experiment that has
one 2-level factor and six 3-level factors, then the most appropriate array to select would be L18.
L16 would not have been appropriate because it does not have any 3-level columns. In situations
where the orthogonal array matrix has more columns than the number of factors, the extra
columns can be filled by creating an extra (dummy) level, and the matrix will still be
proportionally balanced. This selection rule is sketched in code after the table below.
Table of Standard Orthogonal Arrays
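As a sketch of the selection rule, the catalogue below is a small, assumed subset of the standard table (the array names and column counts are standard, but the full table is larger), and smallest_array is an illustrative helper, not code from the paper.

from collections import Counter

STANDARD_ARRAYS = {
    "L4":  {"runs": 4,  "columns": {2: 3}},
    "L8":  {"runs": 8,  "columns": {2: 7}},
    "L9":  {"runs": 9,  "columns": {3: 4}},
    "L16": {"runs": 16, "columns": {2: 15}},
    "L18": {"runs": 18, "columns": {2: 1, 3: 7}},
    "L36": {"runs": 36, "columns": {2: 11, 3: 12}},
}

def smallest_array(required_dof, factor_levels):
    """Return the smallest array with at least as many runs as the required
    degrees of freedom and enough columns of each level for every factor."""
    need = Counter(factor_levels)  # e.g. {3: 6, 2: 1}
    for name, spec in sorted(STANDARD_ARRAYS.items(),
                             key=lambda kv: kv[1]["runs"]):
        enough_cols = all(spec["columns"].get(level, 0) >= count
                          for level, count in need.items())
        if spec["runs"] >= required_dof and enough_cols:
            return name
    return None

# One 2-level factor and six 3-level factors (14 degrees of freedom) -> L18,
# as in the text; L16 fails because it has no 3-level columns.
print(smallest_array(14, [2, 3, 3, 3, 3, 3, 3]))  # L18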
c) Determination of possible interactions.
In most situations, the effect of interactions is minuscule, and for this experiment there were no
perceived interactions. The existence of interactions depends on the nature of the factors, and
interactions can be accommodated in the design using the linear graph techniques invented by
Taguchi (1987).
Result
This Application Under Test (AUT) is a software application used for handling customer
requests in an Interactive Voice Response (IVR) environment. Each of the categories is
considered a factor of the experiment. There are four 3-level factors and two 2-level factors. If
complete (100%) testing were performed, we would end up with 3^4 × 2^2 = 324 test cases.
We therefore need an orthogonal array with at least four 3-level columns and at least two
2-level columns. From the table of Standard Orthogonal Arrays, the most appropriate
orthogonal array to select is L36(2^11 3^12); it is the smallest orthogonal array that satisfies the
requirement of the number of columns (6) for this experiment. The conclusion here is that we
have reduced the size of our test plan from 324 to 36 test cases without losing the integrity of
the test plan. Testing under the two methods yielded the same percent defective, 0.025, which
shows that there is no loss of rigor in adopting the orthogonal array approach. The advantage,
therefore, is that we are able to arrive at the same conclusion with far fewer resources.
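As a quick check of the counts quoted above, a short sketch (assumed arithmetic, not code from the paper):

from math import prod

full_factorial = prod([3, 3, 3, 3, 2, 2])  # 3**4 * 2**2
print(full_factorial)                      # 324
print(36 / full_factorial)                 # ~0.11 -- roughly a 9x reduction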
L36(2^11 3^12) Orthogonal Array

[Table: the standard L36(2^11 3^12) orthogonal array — 36 experiments (rows) by 23 columns, eleven 2-level columns followed by twelve 3-level columns.]
A sample of the result of replacing the code number for each level of three of the 23 columns
with the actual caller utterance is shown in the matrix below.
Subset of the L36(2^11 3^12) Orthogonal Array after substitution

Expt. #   Column 1           Column 2       Column 12
1         Play Store Info    Get Zip Code   Play Store Info
2         Play Store Info    Get Zip Code   Find More Stores
3         Play Store Info    Get Zip Code   Get an Agent
4         Play Store Info    Get Zip Code   Play Store Info
5         Play Store Info    Get Zip Code   Find More Stores
6         Play Store Info    Get Zip Code   Get an Agent
7         Play Store Info    Get Zip Code   Play Store Info
8         Play Store Info    Get Zip Code   Find More Stores
9         Play Store Info    Get Zip Code   Get an Agent
10        Play Store Info    Get Tel #      Play Store Info
11        Play Store Info    Get Tel #      Find More Stores
12        Play Store Info    Get Tel #      Get an Agent
13        Play Store Info    Get Tel #      Play Store Info
14        Play Store Info    Get Tel #      Find More Stores
15        Play Store Info    Get Tel #      Get an Agent
16        Play Store Info    Get Tel #      Play Store Info
17        Play Store Info    Get Tel #      Find More Stores
18        Play Store Info    Get Tel #      Get an Agent
19        Find More Stores   Get Zip Code   Play Store Info
20        Find More Stores   Get Zip Code   Find More Stores
21        Find More Stores   Get Zip Code   Get an Agent
22        Find More Stores   Get Zip Code   Play Store Info
23        Find More Stores   Get Zip Code   Find More Stores
24        Find More Stores   Get Zip Code   Get an Agent
25        Find More Stores   Get Zip Code   Play Store Info
26        Find More Stores   Get Zip Code   Find More Stores
27        Find More Stores   Get Zip Code   Get an Agent
28        Find More Stores   Get Tel #      Play Store Info
29        Find More Stores   Get Tel #      Find More Stores
30        Find More Stores   Get Tel #      Get an Agent
31        Find More Stores   Get Tel #      Play Store Info
32        Find More Stores   Get Tel #      Find More Stores
33        Find More Stores   Get Tel #      Get an Agent
34        Find More Stores   Get Tel #      Play Store Info
35        Find More Stores   Get Tel #      Find More Stores
36        Find More Stores   Get Tel #      Get an Agent
Model II: Design of Experiment using Acceptance Sampling
The purpose of acceptance sampling is to specify criteria for accepting or rejecting a lot based
on certain operational characteristics. In a Quality Assurance environment, the results of any
testing can be described as dichotomous: the Test Case either passes or fails. These are the two
attributes to be considered when testing software. Attribute inspection is used when
measurements take on only two values, e.g., {defective, non-defective}, {on, off}, or {pass, fail}.
Therefore, in developing a sampling plan, we only need to specify two things:
i. the sample size, n
ii. the acceptance number, c
For a given n and c: if the number of defectives, x, is at most c, accept the lot; if x exceeds c,
reject the lot. If, on the other hand, the analyst is interested in ensuring higher quality, they can
design a Double Sampling Plan. In that case, the considerations will be:
First take a sample of size n1. If the number of defectives, x1, is at most c1, accept the lot; if x1
is greater than c2, reject the lot. However, if c1 < x1 ≤ c2, then take a second sample of size n2.
If now (x1 + x2) > c2, reject the lot; otherwise, if (x1 + x2) ≤ c2, accept the lot.
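A minimal sketch of the two decision rules just described; the function and parameter names are illustrative, not from the paper.

def single_sampling(x, c):
    """Accept the lot if the defect count x is at most the acceptance
    number c; otherwise reject."""
    return "accept" if x <= c else "reject"

def double_sampling(x1, c1, c2, x2=None):
    """The first sample decides outright when x1 <= c1 (accept) or
    x1 > c2 (reject); otherwise a second sample is required and the
    combined defect count is judged against c2."""
    if x1 <= c1:
        return "accept"
    if x1 > c2:
        return "reject"
    # c1 < x1 <= c2: the second sample has been inspected, yielding x2.
    return "accept" if x1 + x2 <= c2 else "reject"

print(single_sampling(x=1, c=1))                # accept
print(double_sampling(x1=2, c1=1, c2=3, x2=1))  # accept (2 + 1 <= 3)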
The Single Acceptance Sampling Model:
We will restrict our testing to Single Acceptance Sampling, since Double Sampling is basically a
replication of Single Sampling.
Let N be the total number of Test Cases in the Test Plan of a version of software that is already
in the field.
Let n be the total number of Test Cases to be selected for testing.
Let p be the proportion of Test Cases that result in a defect being identified. (This is presently
an unknown quantity.)
Let p̂ be the proportion of defective Test Cases identified from testing a sample of size n.
The idea is to obtain p̂ as an estimate of the proportion of the overall test cases that will result
in identifying a defect in the Application. In order to implement this test plan, two parameters
need to be established: the maximum number of defects that the client can live with – the
Acceptance Number (c) – and the maximum risk that the client is willing to take – the
consumer’s risk (β).
The fairest way to accomplish this is for both the software developer and the client to generate
an Operating Characteristic Curve, and to use the result as a basis for determining the most
equitable values of customer acceptance and developer risk.
Development of the Operating Characteristic (O-C) curves
O-C curves are generated for the purpose of reducing the amount of computation that goes into
determining whether or not to accept a particular sample. They are obtained by calculating the
probability of acceptance for various values of percent defective.
Mathematically, we have:

n    = sample size
p    = percent defective
q    = (1 − p), percent non-defective
k    = number of defective items
p(k) = probability of k defectives
c    = acceptance number (the limit on the number of defective items considered acceptable by the client)

Then the probability of acceptance is

P(A) = \sum_{x=0}^{c} P(x), where P(x) = \binom{n}{x} p^x (1-p)^{n-x}
Suppose we have a lot of size N and take a sample of size n. What is the probability of
accepting the lot, given that the proportion of defective items in the lot is p and the acceptance
number is c? This scenario follows a Binomial Distribution, whose probability function is
given by:

P(x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!} p^x (1-p)^{n-x}

The probability of acceptance, P(A), is then:

P(A) = \sum_{x=0}^{c} \binom{n}{x} p^x (1-p)^{n-x}
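A short sketch that evaluates P(A) directly with the standard library instead of binomial tables (math.comb requires Python 3.8+); the function name is illustrative.

from math import comb

def prob_acceptance(n, c, p):
    """P(A) = sum over x = 0..c of C(n, x) * p**x * (1 - p)**(n - x)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c + 1))

# Example: n = 10, c = 1, 20 percent defective.
print(round(prob_acceptance(10, 1, 0.20), 4))  # 0.3758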
These probabilities can be obtained from the Binomial Tables. A sample result of the
calculations is shown below.
Construction of the Operating Characteristic (OC) Curve
In order to construct the OC curve, four parameters have to be established:
1. AQL (Acceptable Quality Level): the minimum level of quality that is assignable to a lot
that is considered GOOD.
2. LTPD (Lot Tolerance Percent Defective): the minimum level of quality that is assignable
to a lot that is considered BAD.
3. Producer’s Risk (α): the probability that a producer’s GOOD lot is rejected; in other
words, the probability of committing a Type I error.
4. Consumer’s Risk (β): the probability that a BAD lot is accepted by the consumer; in
other words, the probability of committing a Type II error. Both risks are computed in
the sketch below.
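A sketch of the two risks just defined, reusing the acceptance-probability function from the previous snippet. The AQL and LTPD values here are assumptions chosen for illustration, not figures from the paper; with LTPD = 0.20, the computed consumer's risks come out close to the table below (≈ .11, .38, .68).

from math import comb

def prob_acceptance(n, c, p):
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c + 1))

def producer_risk(n, c, aql):
    """Alpha: probability that a lot at the AQL is rejected (Type I error)."""
    return 1 - prob_acceptance(n, c, aql)

def consumer_risk(n, c, ltpd):
    """Beta: probability that a lot at the LTPD is accepted (Type II error)."""
    return prob_acceptance(n, c, ltpd)

for c in (0, 1, 2):
    # Assumed operating points: AQL = 0.02, LTPD = 0.20.
    print(f"n=10, c={c}: alpha={producer_risk(10, c, 0.02):.3f}, "
          f"beta={consumer_risk(10, c, 0.20):.3f}")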
[Figure 1: O-C Curves for Attributes (sample size, n = 10) — probability of acceptance, P(A), versus percent defective, p, with one curve for each acceptance number c = 0 through c = 5.]
Figure 2: The Larson Nomogram for OC Curves
Producer's and Consumer's Risk Values

Sampling plan    Producer's risk (α)    Consumer's risk (β)
n = 10, c = 0    .18                    .10
n = 10, c = 1    .04                    .37
n = 10, c = 2    .01                    .67
Result.
We now apply the Acceptance Sampling technique to the same Application Under Test (AUT)
in the Interactive Voice Response (IVR) environment, proceeding by identifying the six basic
categories to be tested:
 Finding a store
 Get Zip Code
 Obtain Payment Information
 Ask about Promotions
 Confirming an order
 Asking for the status of an order
A caller’s utterance may or may not be correctly categorized. If it is not, the application will
request that the caller repeat the utterance; if, on the other hand, the caller intent is properly
identified, the application will proceed to provide the required information. Thus each of these
utterances can be broken down into 3 categories:
 IVR correctly recognizes the caller intent and proceeds to provide the requested
information
 IVR barely recognizes caller intent and requests the caller to repeat their intent
 IVR does not recognize caller intent and transfers caller to a live agent.
Using the Call Flow diagram, a Test Plan is developed to handle all possible combinations of
the caller intent. The total number of test cases was 324. Assuming that the client would like to
have 99% reliability, the customer is not ready to assume any risk greater than 0.01; in other
words, the consumer’s risk, β = 0.01. The producer (the software developer), on the other hand,
should be able to assume a greater risk than the consumer, so the producer decides to assume a
risk of α = 0.05. Locating these two values on an Operating Characteristic Curve, we come up
with a sampling plan of n = 50 and c = 1 (a code sketch of this plan search follows the figure
below).
[Figure 3: An OC curve marking the producer's risk (α = 0.05) at the AQL, the consumer's risk (β = .10) at the LTPD, and the Good Lot, Indifference Zone, and Bad Lot regions.]
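The following is a sketch of the plan search that the nomogram performs graphically: find the smallest n (and a c) whose OC curve stays within the producer's risk at the AQL and the consumer's risk at the LTPD. The AQL and LTPD values are assumed operating points, chosen here so that the search lands on the paper's plan; the paper itself reads n = 50 and c = 1 off the chart.

from math import comb

def prob_acceptance(n, c, p):
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c + 1))

def find_plan(aql, ltpd, alpha, beta, max_n=500):
    """Smallest (n, c) meeting both the producer's and consumer's risk."""
    for n in range(1, max_n + 1):
        for c in range(n + 1):
            ok_producer = 1 - prob_acceptance(n, c, aql) <= alpha
            ok_consumer = prob_acceptance(n, c, ltpd) <= beta
            if ok_producer and ok_consumer:
                return n, c
    return None

# Assumed operating points (AQL = 0.005, LTPD = 0.127) with alpha = 0.05
# and beta = 0.01, as in the text.
print(find_plan(aql=0.005, ltpd=0.127, alpha=0.05, beta=0.01))  # (50, 1)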
Using a random number generator, a random sample of 50 test cases was extracted. Out of the
50 test cases, two test cases failed; but one was due to network issues; so that test case was
discarded. Essentially, 1 test case failed out of 49. This means that the proportion defective
based on a sample size of 49 is 1/49 ≈ 0.02. This agrees closely with the result previously
obtained using Orthogonal Defect Classification.
Conclusions and Recommendation.
Orthogonal Defect Classification belongs to the field of Robust Design, developed in the 1950s
and early 1960s by Taguchi. It was developed to improve productivity during research and
development, so that high-quality products can be developed at the lowest cost. It applies ideas
from statistical experimental design to reduce the variation of a product’s functionality, and it
helps ensure that the conclusions arrived at in the laboratory are optimal. The three main cost
categories to consider in the design of an orthogonal array are operating cost, manufacturing
cost, and the cost of Research and Development; Orthogonal Array experiments are utilized
mainly in bringing the cost of Research and Development to a minimum. The two major tools
of Robust Design are the Signal-to-Noise Ratio and orthogonality. The same principle underlies
Acceptance Sampling – using sampling statistics to make inferences about population
parameters. Because of the ease of setting up a Test Plan based on Acceptance Sampling
techniques, it is recommended that Acceptance Sampling be used in preference to the
Orthogonal Defect Classification technique.
References
Addelman, S. “Orthogonal Main Effect Plans for Asymmetrical Factorial Experiments”
Technometrics (1962) vol. 3: pp. 21- 46.
Bakir, S. T. “A Quality Control Chart for Work Performance Appraisal.” Quality Engineering
17, no. 3 (2005): 429.
Besterfield, Dale H. Quality Control, 8th ed. Upper Saddle River, NJ: Prentice Hall, 2009.
Box, G. E. P., Hunter, W. G., and Hunter, J. S. Statistics for Experimenters – An Introduction to
Design, Data Analysis and Model Building. New York: John Wiley and Sons, 1978.
Cochran, W. G. and Cox, G. M. Experimental Design. New York: John Wiley and Sons, 1957
Elg, M., J. Olsson, and J. J. Dahlgaard. “Implementing Statistical Process Control.” The
International Journal of Quality and Reliability Management 25, no. 6 (2008): 545.
Goetsch, David L., and Stanley B. Davis. Quality Management, 5th ed. Upper Saddle River, NJ:
Prentice Hall, 2006.
Lin, H., and G. Sheen. “Practical Implementation of the Capability Index Cpk Based on Control
Chart Data.” Quality Engineering 17, no. 3 (2003): 371.
Matthes, N., et al. “Statistical Process Control for Hospitals.” Quality Management in Health
Care 16, no. 3 (July – September 2007): 205.
Montgomery, D. C. Introduction to Statistical Quality Control, 6th ed. New York: Wiley, 2008
Summers, Donna. Quality Management, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 2009
Taguchi, G. Orthogonal Arrays and Linear Graphs. Dearborn, MI: ASI Press, 1987
Taguchi, G. and Phadke, M. S. “Quality Engineering through Design Optimization.”
Conference Record, GLOBECOM 84 Meeting, IEEE Communications Society. Atlanta, GA
(November 1984): pp. 1106–1113.