A minimal adaptive sampling plan for finite lot inspection

Wei Chen, Computer Science, UNO
Andrew W. Swift, Mathematics, UNO

Abstract

Quality control is an important part of any manufacturing process. Ideally, to guarantee that outgoing quality meets a certain standard, every item should be inspected. However, in many instances this is either not practical or not possible, so sampling must be used. If the population size is large, statistical theory provides convenient methods for estimating quality. For small population sizes, however, these methods are not valid. We introduce an adaptive sampling plan based on exact confidence intervals for the hypergeometric distribution. A key property of our sampling plan is that the sample size is always the smallest size required to show that the outgoing quality meets a specified target with a given confidence.

Key words: sampling plan, confidence interval of hypergeometric, quality control

1. Introduction

People usually cannot directly observe every item in the population produced. Instead, they collect data from a subset of items, a sample, and use those observations to make inferences about the entire population, or lot. Ideally, the sample matches the larger population in the defective rate of interest; in that case, conclusions drawn from the sample are likely to apply to the entire population. This correspondence between the sample and the larger population matters most when customers or manufacturers want to know what proportion of the items is defective or nonfunctional. Boxes of sealed food, for example, require a sample to show how likely it is that the food in a box is edible. If we could open every box for inspection, we could guarantee that the food is good enough to eat; but doing so adds to the cost of producing the food, and the inspected food can no longer be shipped.
So in many instances it is either not practical or not possible to guarantee quality by inspecting items one by one. People want a process that reduces the cost of inspection and yet estimates the defective rate of the items accurately, and that may even improve the quality of the products at a given level of assurance. The inspection and testing of produced items is one of the earliest research areas in quality control and improvement, predating the introduction of statistical methodology. Inspection or sampling can occur at many points in a process. Acceptance sampling is defined as the inspection or classification of a sample of items selected at random from a larger batch or lot, followed by a decision to accept or reject the whole lot. Inspections are distinguished by the point at which they are carried out. Outgoing inspection is performed immediately after production and before the product is shipped. Incoming inspection is performed before customers accept lots or batches of product from the manufacturer.

A lot acceptance sampling plan is an inspection scheme together with a set of rules for making decisions. The decision, based on counting the number of defectives in a sample, can be to accept the lot, to reject the lot, or, for multiple or sequential inspection processes, to take another sample and then repeat the decision process. The most commonly used sampling plans are single sampling plans, double sampling plans, multiple sampling plans and sequential sampling plans.

In single sampling plans, one sample of items is selected at random from a lot and the disposition of the lot is determined from the resulting information. These plans are usually denoted as (n, c) plans for a sample size n, where the lot is rejected if there are more than c defectives. These are the most common (and easiest) plans to use, although not the most efficient in terms of the average number of samples needed.
In double sampling plans, after the first sample is tested, there are three choices: accept the lot, reject the lot, or take a second sample. If a second sample is taken, the results of both samples are combined and a final decision is made on that information. Multiple sampling plans are an extension of double sampling plans in which more than two samples may be needed to reach a decision. The advantage of multiple sampling is smaller sample sizes. Sequential sampling plans are an extension of multiple sampling in which items are selected from a lot one at a time, and after the inspection of each item a decision is made to accept the lot, reject the lot, or select another unit. [2]

In this report, we introduce a new sequential sampling plan for outgoing inspection, which uses a small sample size to meet a specified target at a given confidence level.

2. Exact confidence interval of hypergeometric distribution

In statistical methodology, estimating the distribution of defective products is very important. Traditionally, the items are assumed to follow a binomial or similar distribution. When the population is large, the resulting estimate is acceptable. However, when the size of the shipped lot goes down, the binomial distribution has difficulty capturing the real behaviour of the defective rate in a finite lot, and the hypergeometric distribution gives a better estimate. The following section focuses on how to obtain an exact confidence interval under the hypergeometric distribution.

2.1 Probabilities of "tail" intervals

Consider $X(d, n, N)$, a hypergeometrically distributed random variable with positive integer parameters $d$, $n$ and $N$. For integer $k \in [0, n]$,

$$P(X(d,n,N) = k) = \frac{\binom{d}{k}\binom{N-d}{n-k}}{\binom{N}{n}},$$

and this probability is 0 when $d < k \le n$.

For $c \in [0, n]$, let the lower hypergeometric tail be the function

$$h(c,d) = P(X(d,n,N) \le c) = \sum_{i=0}^{c} \frac{\binom{d}{i}\binom{N-d}{n-i}}{\binom{N}{n}}.$$

Note that, for $c \ge \min(d, n)$ and all $N \ge \max(d, n)$, $h(c, d) = 1$.

Proposition 1.
Let $c \in [0, n)$. Then $h(c, 0) = 1$ and $h(c, N) = 0$. For $0 \le d \le N$, the value of $h(c, d)$ is non-increasing as $d$ increases with $c$ fixed; that is, $h(c, d)$ is decreasing for $d \in [0, N]$.

Proof: Write

$$p(d) = P(X(d,n,N) \le c) = \sum_{i=0}^{c} \frac{\binom{d}{i}\binom{N-d}{n-i}}{\binom{N}{n}}.$$

We want to show that $p(d+1) - p(d) \le 0$, where $d$ is a non-negative integer with $d + 1 \le N$. Let

$$f(d) = \sum_{i=0}^{c} \binom{d}{i}\binom{N-d}{n-i}.$$

Applying Pascal's rule to $\binom{d+1}{i}$ and to $\binom{N-d}{n-i}$, with the convention $\binom{d}{-1} = 0$, we get

$$
\begin{aligned}
f(d+1) - f(d) &= \sum_{i=0}^{c} \binom{d+1}{i}\binom{N-d-1}{n-i} - \sum_{i=0}^{c} \binom{d}{i}\binom{N-d}{n-i} \\
&= \sum_{i=0}^{c} \left[\binom{d}{i} + \binom{d}{i-1}\right]\binom{N-d-1}{n-i} - \sum_{i=0}^{c} \binom{d}{i}\left[\binom{N-d-1}{n-i} + \binom{N-d-1}{n-i-1}\right] \\
&= \sum_{i=0}^{c} \binom{d}{i-1}\binom{N-d-1}{n-i} - \sum_{i=0}^{c} \binom{d}{i}\binom{N-d-1}{n-i-1} \\
&= \sum_{i=0}^{c-1} \binom{d}{i}\binom{N-d-1}{n-i-1} - \sum_{i=0}^{c} \binom{d}{i}\binom{N-d-1}{n-i-1} \\
&= -\binom{d}{c}\binom{N-d-1}{n-c-1} \le 0 .
\end{aligned}
$$

So $f(d+1) - f(d) \le 0$, and therefore

$$h(c, d+1) - h(c, d) = \frac{f(d+1) - f(d)}{\binom{N}{n}} \le 0$$

whenever $d + 1 \in [1, N]$.

Similarly, the upper hypergeometric tail is defined to be the function

$$g(c,d) = P(X(d,n,N) \ge c) = 1 - \sum_{i=0}^{c-1} \frac{\binom{d}{i}\binom{N-d}{n-i}}{\binom{N}{n}}.$$

Proposition 2. Let $c \in [1, n]$. Then $g(c, 0) = 0$ and $g(c, N) = 1$. For $0 \le d \le N$, the value of $g(c, d)$ is non-decreasing as $d$ increases with $c$ fixed; that is, $g(c, d)$ is increasing for $d \in [0, N]$.

Proof: With $c > 0$, we have $g(c, d) = 1 - h(c-1, d)$ with $c - 1 \in [0, n-1]$. Applying the previous proposition to $h(c-1, d)$ gives the conclusion stated here.

2.2 Traditional exact confidence intervals

Let $d \in [0, N]$ and let $n$ be a positive integer. Given a number $\alpha \in (0, 1)$ and an observation $c \in [0, n]$ from a hypergeometric random variable $X(d, n, N)$, we would like to describe a $(\alpha \cdot 100)\%$ confidence interval $I(n, c) \subseteq [0, N]$.

The basic idea behind traditional exact confidence intervals is this: an observation like $c$ should be an unusual observation for any $d$ outside of $I(n, c)$. Traditionally, "an observation like $c$" translates into an observation in $[0, c]$ or $[c, n]$. "Unusual" translates into

$$P(X(d,n,N) \le c) \le \frac{1-\alpha}{2} \quad \text{or} \quad P(X(d,n,N) \ge c) \le \frac{1-\alpha}{2}.$$

Since $\alpha \in (0, 1)$, both cannot be true, and for a one-sided statement one might logically replace $\frac{1-\alpha}{2}$ with $1 - \alpha$; this is our case.
Here is a formal statement of the traditional translation of this idea, where $\alpha \in (0, 1)$ is the confidence level:

$$d \notin I(n,c) \iff P(X(d,n,N) \le c) \le \frac{1-\alpha}{2} \ \text{ or } \ P(X(d,n,N) \ge c) \le \frac{1-\alpha}{2}.$$

Another way to state the idea is that, having observed $c$, any $d \notin I(n, c)$ is rejected with a risk: for any specific rejected $d'$ that is in fact correct, the probability of obtaining something like $c$ from $X(d', n, N)$ is at most $1 - \alpha$.

Let us assume that $c \in [0, n)$. As a function of $d$, the lower tail $h(c, d) = P(X(d,n,N) \le c)$ decreases from 1 at $d = 0$ to 0 at $d = N$. Because $d$ takes integer values and the function is non-increasing, there is a unique smallest $d_2^* \in [0, N]$ such that

$$P(X(d_2^*,n,N) \le c) \le \frac{1-\alpha}{2}, \quad \text{or equivalently} \quad d \in [d_2^*, N] \Rightarrow P(X(d,n,N) \le c) \le \frac{1-\alpha}{2}.$$

Thus we reject any $d \in [d_2^*, N]$.

Similarly, with $c \in [1, n]$, the upper tail $g(c, d) = P(X(d,n,N) \ge c)$ increases from 0 at $d = 0$ to 1 at $d = N$. By the same argument, there is a unique largest $d_1^* \in [0, N]$ such that

$$P(X(d_1^*,n,N) \ge c) \le \frac{1-\alpha}{2}, \quad \text{or equivalently} \quad d \in [0, d_1^*] \Rightarrow P(X(d,n,N) \ge c) \le \frac{1-\alpha}{2}.$$

Therefore, we reject any $d \in [0, d_1^*]$. Thus we may select $I(n, c) = [d_1^*, d_2^*]$ as a $(\alpha \cdot 100)\%$ confidence interval. In most cases, $d_1^*$ and $d_2^*$ are found by numerical search from the equations

$$P(X(d_1^*,n,N) \ge c) = \frac{1-\alpha}{2} \quad \text{and} \quad P(X(d_2^*,n,N) \le c) = \frac{1-\alpha}{2}.$$

Proposition 3. For $c \in [0, n]$, we have $d_1^* \le d_2^*$.

Proof: We prove this by contradiction. Suppose $d_1^* > d_2^*$, and choose some $d$ such that $d_2^* \le d \le d_1^*$. Since $d \le d_1^*$,

$$P(X(d,n,N) \ge c) \le P(X(d_1^*,n,N) \ge c) \le \frac{1-\alpha}{2}.$$

Since $d_2^* \le d$,

$$P(X(d,n,N) \le c) \le P(X(d_2^*,n,N) \le c) \le \frac{1-\alpha}{2}.$$

Therefore

$$1 = P(0 \le X(d,n,N) \le n) \le P(X(d,n,N) \le c) + P(X(d,n,N) \ge c) \le 1 - \alpha < 1$$

when $\alpha \in (0, 1]$. The contradiction implies that Proposition 3 is true.

2.3 One side "tail"

In our case, we need only one "tail": a rejection region of the form $[d^*, N]$ such that

$$d \in [d^*, N] \Rightarrow P(X(d,n,N) \le c) \le 1 - \alpha,$$

so that the one-sided confidence interval consists of the values of $d$ below $d^*$. The goal is to find the smallest $d^*$ satisfying $P(X(d^*,n,N) \le c) \le 1 - \alpha$, that is, $h(c, d^*) \le 1 - \alpha$. A search algorithm can be used to obtain the solution.

3.
The design of sampling plan

3.1 Mapping the hypergeometric interval to inspection

Imagine the following scenario: before a lot of $N$ products is shipped out, we need the defective rate to be below $p$ at the $(\alpha \cdot 100)\%$ confidence level. We have no further knowledge about the product. How can we decide the inspection size $n$ needed to achieve the requirement?

As mentioned above, the situation fits the hypergeometric distribution $P(X(d,n,N) \le c)$: $N$ is the size of the lot under sampling, $n$ is the size of the inspection, $d$ is the (unknown) number of defective items contained in the lot, and $c$ is the number of defective items observed. With $N$, $n$ and $c$ fixed, the random variable $X$ depends on the value of $d$. If the estimate $\hat{d}$ of $d$ lies in $[0, d^*)$, then with at least $(\alpha \cdot 100)\%$ confidence the true $d$ is less than $d^*$.

Coming back to our problem, we can obtain $d^*$ from the required defective rate and the lot size $N$: it is $p \cdot N$. From this information, we want the smallest $n$ such that, when $d \in [d^*, N]$, $P(X(d,n,N) \le c) \le 1 - \alpha$. In other words, when the lot of $N$ items contains $d$ defective items, observing $c$ or fewer defectives in a sample of size $n$ is as rare as $(1 - \alpha) \cdot 100\%$. The way to find $n$ is to search $n$ in $[1, N]$. We develop a quick algorithm to obtain the smallest $n$, for fixed $N$, $c$ and $d^*$, that satisfies $P(X(d^*,n,N) \le c) \le 1 - \alpha$.

Why is the resulting $n$ the smallest required sample size? Suppose $n$ is a solution of $P(X(d^*,n,N) \le c) \le 1 - \alpha$. This means that if we inspect $n$ items selected at random from the lot and observe at most $c$ defective items, we can conclude at confidence level $\alpha$ that the total number of defective items is less than $d^*$. If we then pick $n + 1$ items and the number of observed defectives stays at $c$,

$$P(X(d^*,n+1,N) \le c) \le P(X(d^*,n,N) \le c) \le 1 - \alpha,$$

which leads to the conclusion that $d$ is even less likely to exceed $d^*$ under sample size $n + 1$ than under sample size $n$; in other words, there is more assurance that $d$ is below the required value. Any sample size beyond the first $n$ that satisfies the inequality is therefore unnecessary, and we can say that this $n$ is the smallest sample size required.
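As a minimal sketch of the search over n in [1, N] described above, the following Python code evaluates the lower tail P(X(d*, n, N) ≤ c) directly from its definition and scans n upward until the tail drops to 1 − α. The function names `lower_tail` and `smallest_sample_size` and the parameter name `alpha` (the confidence level) are illustrative assumptions, not taken from the authors' implementation, and the linear scan is only the simplest possible search, not the quick algorithm referred to in the text.

```python
from math import comb


def lower_tail(c: int, d: int, n: int, N: int) -> float:
    """Lower hypergeometric tail h(c, d) = P(X(d, n, N) <= c).

    d: defectives in the lot, n: sample size, N: lot size.
    """
    # math.comb(a, b) is 0 when b > a, so impossible outcomes contribute 0.
    top = min(c, d, n)
    favourable = sum(comb(d, i) * comb(N - d, n - i) for i in range(top + 1))
    return favourable / comb(N, n)


def smallest_sample_size(c: int, d_star: int, N: int, alpha: float) -> int:
    """Smallest n in [1, N] with P(X(d_star, n, N) <= c) <= 1 - alpha.

    Linear scan; the first n that satisfies the inequality is the smallest.
    """
    for n in range(1, N + 1):
        if lower_tail(c, d_star, n, N) <= 1 - alpha:
            return n
    # No sample short of the whole lot is enough (e.g. when c >= d_star).
    return N


if __name__ == "__main__":
    # Lot of 1000 items, required rate 0.1 (d* = 100), confidence 0.9, c = 0.
    print(smallest_sample_size(c=0, d_star=100, N=1000, alpha=0.9))
```

With N = 1000, d* = 100, c = 0 and confidence 0.9, the scan stops at n = 22, which agrees with the first row of the sampling table quoted in Section 3.3.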
3.2 Finding the smallest n

From the analysis above, we know that $n$ ranges from 1 to $N$. With $d^*$, $c$ and $N$ fixed, the value of $P(X(d^*,n,N) \le c)$ is non-increasing as $n$ increases. The reason is intuitive: with a larger sample we are more likely to find a larger number of defective items, so the probability of observing at most $c$ defectives becomes smaller. Because of this monotonicity, we can always find a smallest $n$ that satisfies the condition $P(X(d^*,n,N) \le c) \le 1 - \alpha$. Any local search can be applied to this problem. In our case, a pseudo-gradient search algorithm is used. The search steps are as follows:

1. Initialize n = 1 and the search step as 1; N, d*, c and the confidence level α are specified.
2. Check the current value of P(X(d*, n, N) ≤ c). If it is bigger than 1 − α, then set n = n + step and double the step; otherwise set n = n − step and step = step/2 (integer division, so that the step eventually reaches 0).
3. If the step is not equal to 0, repeat step 2; otherwise leave the loop.
4. If, for the current n, P(X(d*, n, N) ≤ c) > 1 − α, set n = n + 1.

3.3 The scheme of the proposed sequential sampling plan

Through the algorithm, we can obtain the sample size once $N$, $d^*$, $c$ and $\alpha$ are specified. We therefore build an inspection table in which $c$ is incremented by 1 from row to row and the corresponding sample size $n$ is listed beside it, as in Figure 1.

Figure 1. Sampling table.

Items are taken randomly from the lot of population size $N$. As each item is selected and inspected, a decision is made whether to
1) accept the lot, if a large enough number of nondefective items has already been encountered;
2) continue to select some items for further inspection;
3) inspect all the remaining items.

For example, suppose we want to check a box of 1000 bulbs; the required defective rate is 0.1 with confidence level 0.9, as described in Figure 1. According to the first line of the table, we need to take 22 items from the lot. If all these bulbs work, then we accept the whole lot. If 2 bulbs do not work, we need to take another 51 − 22 = 29 bulbs for further inspection and remove the 2 defective bulbs.
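The table lookup and the accept-or-continue decisions used in this example can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names `smallest_n`, `sampling_table` and `inspect_lot` are hypothetical, a lot is represented as a list of booleans (True meaning defective), and a plain bisection replaces the pseudo-gradient search listed above, which is valid only because the lower tail is monotone in n.

```python
import random
from math import comb


def lower_tail(c, d, n, N):
    """P(X <= c) for hypergeometric X: d defectives, sample n, lot size N."""
    terms = (comb(d, i) * comb(N - d, n - i) for i in range(min(c, d, n) + 1))
    return sum(terms) / comb(N, n)


def smallest_n(c, d_star, N, alpha):
    """Bisection for the smallest n with lower_tail(c, d_star, n, N) <= 1 - alpha."""
    if lower_tail(c, d_star, N, N) > 1 - alpha:
        return N  # even a full inspection cannot rule out d_star
    lo, hi = 1, N
    while lo < hi:
        mid = (lo + hi) // 2
        if lower_tail(c, d_star, mid, N) <= 1 - alpha:
            hi = mid          # mid is large enough, try smaller samples
        else:
            lo = mid + 1      # mid is too small
    return lo


def sampling_table(d_star, N, alpha, max_c):
    """Rows (c, n): sample size n required when c defectives are observed."""
    return [(c, smallest_n(c, d_star, N, alpha)) for c in range(max_c + 1)]


def inspect_lot(lot, d_star, alpha, rng):
    """Run the sequential plan; returns (items inspected, defectives found)."""
    N = len(lot)
    order = rng.sample(range(N), N)      # random inspection order
    inspected = found = 0
    while True:
        target = smallest_n(found, d_star, N, alpha)
        if target <= inspected:
            return inspected, found      # sample is large enough: stop
        for idx in order[inspected:target]:
            found += lot[idx]            # True counts as one defective
        inspected = target


if __name__ == "__main__":
    print(sampling_table(d_star=100, N=1000, alpha=0.9, max_c=2))
    lot = [True] * 52 + [False] * 948    # hypothetical lot with 52 defectives
    print(inspect_lot(lot, d_star=100, alpha=0.9, rng=random.Random(1)))
```

The loop in `inspect_lot` stops either because no new defective item raised the required sample size, or because the required sample size has reached the whole lot.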
This process is repeated until either, for some c, no further defective items are found in the sample, or the whole population has been checked.

3.4 Analysis of sampling plan

After this inspection process (the outgoing inspection), the shipped items have at most the required defective rate $p$ at the given confidence level $\alpha$.

The risk of rejecting an acceptable lot is 0, because this inspection needs no information about the assumed defective rate of the production line. In other words, whatever the defective rate of the production line is, the required $p$ can always be met by sequential sampling. If the defective rate of the production line, or the number of defective items in the lot, is high, the sample size frequently reaches the population size, which means the items are checked one by one. If the real defective rate is low, or the lot contains few bad items, the sample size stays small as well. So when an acceptable lot is under inspection, in the worst case it is confirmed by checking the items one by one; we do not reject the lot.

The risk of accepting an unacceptable lot is at most $1 - \alpha$. As shown in the previous section, when we observe $c$ defective items, the chance that the real number of defective items contained in the lot of $N$ exceeds the required one is at most $1 - \alpha$.

4. Simulations and conclusions

4.1 Simulations

We use a computer to simulate the inspection scenario. We virtually put every 1000 items into a box containing randomly produced defective items. The required defective rate of shipped product is 0.1, or 100 defective items per 1000 items, and the confidence level is 90%. We then use the sequential sampling plan to inspect the boxes. The results of the simulations are listed in Table 1.

Defective items per 1000    Final sample size    Defective items found    Defective rate shipped
(before inspection)                              (removed)
52                          37                   1                        5.11%
80                          101                  6                        7.6%
100                         476                  41                       4.28%
105                         999                  105                      0.0%

Table 1.
The simulation of products before and after the sequential sampling. Assumptions: every box contains 1000 items; the required defective rate is 10%; the confidence level is 90%; the real numbers of defective items in the boxes are 52, 80, 100 and 105.

The sampling plan accepts lots with a lower defective rate using a smaller sample size. When the lot contains more defective items than required, the sample size becomes bigger, up to the whole population size. The defective rate of the shipped product is below the required one.

4.2 Conclusions

Our sequential sampling plan is proposed to solve the problem of inspecting lots with a small population size. Traditional sampling plans prefer a big population size, since the binomial distribution and the normal distribution work better for big populations. The hypergeometric distribution models the situation of a small population size better.

The fundamental problem is the estimation of the defective rate $p$ of the production line. If the population size is bigger, the estimate of $p$ is more accurate. Our sampling plan is, in a sense, distribution-free with respect to the defective rate of the production line. It avoids the difficulty of obtaining knowledge of that defective rate, and we can still meet the requirement. We can set the exact confidence interval for the number of defective items in advance according to the requirement $p$ and $N$, and then find the smallest sample size $n$ that satisfies the confidence level.

Another merit of the sampling plan is that the process can improve the quality of the items. If the defective rate of the production line is higher than the required one, inspection removes defective items, so the lot that finally passes meets the requirement. In this situation, the lot has a low chance of passing with a small sample size; it may need a big sample size, or even item-by-item inspection. Once the lot passes the sampling, it displays the required character at the given confidence.
The way to reduce the cost of inspection is to improve product quality in production; otherwise quality has to be improved through stricter or more extensive inspection.

References

1. Ross, S. M., Introduction to Probability Models, eighth edition, Academic Press, 2003.
2. Montgomery, D. C., Introduction to Statistical Quality Control, 5th edition, John Wiley & Sons, 2005.
3. Ramsey, L. T., (Traditional) exact confidence intervals for the binomial distribution, www.math.hawaii.edu/~ramsey/TraditionalBinomialCI.pdf