Graduate Lectures and Problems in Quality Control and Engineering Statistics: Theory and Methods

To Accompany Statistical Quality Assurance Methods for Engineers by Vardeman and Jobe

Stephen B. Vardeman

V2.0: January 2001

© Stephen Vardeman 2001. Permission to copy for educational purposes granted by the author, subject to the requirement that this title page be affixed to each copy (full or partial) produced.

Contents

1 Measurement and Statistics
  1.1 Theory for Range-Based Estimation of Variances
  1.2 Theory for Sample-Variance-Based Estimation of Variances
  1.3 Sample Variances and Gage R&R
  1.4 ANOVA and Gage R&R
  1.5 Confidence Intervals for Gage R&R Studies
  1.6 Calibration and Regression Analysis
  1.7 Crude Gaging and Statistics
  1.7.1 Distributions of Sample Means and Ranges from Integer Observations
  1.7.2 Estimation Based on Integer-Rounded Normal Data
2 Process Monitoring
  2.1 Some Theory for Stationary Discrete Time Finite State Markov Chains With a Single Absorbing State
  2.2 Some Applications of Markov Chains to the Analysis of Process Monitoring Schemes
  2.3 Integral Equations and Run Length Properties of Process Monitoring Schemes
3 An Introduction to Discrete Stochastic Control Theory/Minimum Variance Control
  3.1 General Exposition
  3.2 An Example
4 Process Characterization and Capability Analysis
  4.1 General Comments on Assessing and Dissecting “Overall Variation”
  4.2 More on Analysis Under the Hierarchical Random Effects Model
  4.3 Finite Population Sampling and Balanced Hierarchical Structures
5 Sampling Inspection
  5.1 More on Fraction Nonconforming Acceptance Sampling
  5.2 Imperfect Inspection and Acceptance Sampling
  5.3 Some Details Concerning the Economic Analysis of Sampling Inspection
6 Problems
  1 Measurement and Statistics
  2 Process Monitoring
  3 Engineering Control and Stochastic Control Theory
  4 Process Characterization
  5 Sampling Inspection
A Useful Probabilistic Approximation

Chapter 1 Measurement and Statistics

V&J §2.2 presents an introduction to the topic of measurement and the relevance of the subject of statistics to the measurement enterprise. This chapter expands somewhat on the topics presented in V&J and raises some additional issues.

Note that V&J equation (2.1) and the discussion on page 19 of V&J are central to the role of statistics in describing measurements in engineering and quality assurance. Much of Stat 531 concerns “process variation.” The discussion on and around page 19 points out that variation in measurements from a process will include both components of “real” process variation and measurement variation.

1.1 Theory for Range-Based Estimation of Variances

Suppose that $X_1, X_2, \ldots, X_n$ are iid Normal$(\mu, \sigma^2)$ random variables and let
$$R = \max X_i - \min X_i = \max(X_i - \mu) - \min(X_i - \mu) = \sigma\left(\max\left(\frac{X_i - \mu}{\sigma}\right) - \min\left(\frac{X_i - \mu}{\sigma}\right)\right) = \sigma\left(\max Z_i - \min Z_i\right)$$
where $Z_i = (X_i - \mu)/\sigma$.
Then $Z_1, Z_2, \ldots, Z_n$ are iid standard normal random variables. So for purposes of studying the distribution of the range of iid normal variables, it suffices to study the standard normal case. (One can derive “general σ” facts from the “σ = 1” facts by multiplying by σ.)

Consider first the matter of finding the mean of the range of n iid standard normal variables, $Z_1, \ldots, Z_n$. Let
$$U = \min Z_i, \quad V = \max Z_i \quad \text{and} \quad W = V - U.$$
Then $EW = EV - EU$ and
$$-EU = -E\min Z_i = E(-\min Z_i) = E\max(-Z_i),$$
where the n variables $-Z_1, -Z_2, \ldots, -Z_n$ are iid standard normal. Thus
$$EW = EV - EU = 2EV.$$
Then (as is standard in the theory of order statistics) note that
$$V \le t \iff \text{all } n \text{ values } Z_i \text{ are } \le t.$$
So with Φ and φ the standard normal cdf and pdf, $P[V \le t] = \Phi^n(t)$ and thus a pdf for V is
$$f(v) = n\phi(v)\Phi^{n-1}(v).$$
So
$$EV = \int_{-\infty}^{\infty} v\left(n\phi(v)\Phi^{n-1}(v)\right)dv,$$
and the evaluation of this integral becomes a (very small) problem in numerical analysis. The value of this integral clearly depends upon n. It is standard to invent a constant (whose dependence upon n we will display explicitly)
$$d_2(n) \doteq EW = 2EV$$
that is tabled in Table A.1 of V&J. With this notation, clearly
$$ER = \sigma d_2(n)$$
(and the range-based formulas in Section 2.2 of V&J are based on this simple fact).

To find more properties of W (and hence R) requires appeal to a well-known order statistics result giving the joint density of two order statistics. The joint density of U and V is
$$f(u,v) = \begin{cases} n(n-1)\phi(u)\phi(v)\left(\Phi(v) - \Phi(u)\right)^{n-2} & \text{for } v > u\\ 0 & \text{otherwise.}\end{cases}$$
A transformation then easily shows that the joint density of U and $W = V - U$ is
$$g(u,w) = \begin{cases} n(n-1)\phi(u)\phi(u+w)\left(\Phi(u+w) - \Phi(u)\right)^{n-2} & \text{for } w > 0\\ 0 & \text{otherwise.}\end{cases}$$
Then, for example, the cdf of W is
$$P[W \le t] = \int_0^t \int_{-\infty}^{\infty} g(u,w)\,du\,dw,$$
and the mean of $W^2$ is
$$EW^2 = \int_0^\infty \int_{-\infty}^{\infty} w^2 g(u,w)\,du\,dw.$$
Note that upon computing $EW$ and $EW^2$, one can compute both the variance of W
$$\operatorname{Var} W = EW^2 - (EW)^2$$
and the standard deviation of W, $\sqrt{\operatorname{Var} W}$. It is common to give this standard deviation the name $d_3(n)$ (where we continue to make the dependence on n explicit, and again this constant is tabled in Table A.1 of V&J). Clearly, having computed $d_3(n) \doteq \sqrt{\operatorname{Var} W}$, one then has
$$\sqrt{\operatorname{Var} R} = \sigma d_3(n).$$

1.2 Theory for Sample-Variance-Based Estimation of Variances

Continue to suppose that $X_1, X_2, \ldots, X_n$ are iid Normal$(\mu, \sigma^2)$ random variables and take
$$s^2 \doteq \frac{1}{n-1}\sum_{i=1}^n \left(X_i - \bar X\right)^2.$$
Standard probability theory says that
$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}.$$
Now if $U \sim \chi^2_\nu$ it is the case that $EU = \nu$ and $\operatorname{Var} U = 2\nu$. It is thus immediate that
$$Es^2 = E\left(\left(\frac{\sigma^2}{n-1}\right)\left(\frac{(n-1)s^2}{\sigma^2}\right)\right) = \left(\frac{\sigma^2}{n-1}\right)E\left(\frac{(n-1)s^2}{\sigma^2}\right) = \sigma^2$$
and
$$\operatorname{Var} s^2 = \operatorname{Var}\left(\left(\frac{\sigma^2}{n-1}\right)\left(\frac{(n-1)s^2}{\sigma^2}\right)\right) = \left(\frac{\sigma^2}{n-1}\right)^2 \operatorname{Var}\left(\frac{(n-1)s^2}{\sigma^2}\right) = \frac{2\sigma^4}{n-1}$$
so that
$$\sqrt{\operatorname{Var} s^2} = \sigma^2\sqrt{\frac{2}{n-1}}.$$
Knowing that $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$ also makes it easy enough to develop properties of $s = \sqrt{s^2}$. For example, if
$$f(x) = \begin{cases} \dfrac{1}{2^{(n-1)/2}\,\Gamma\left(\frac{n-1}{2}\right)}\, x^{\left(\frac{n-1}{2}\right) - 1} \exp\left(-\dfrac{x}{2}\right) & \text{for } x > 0\\ 0 & \text{otherwise}\end{cases}$$
is the $\chi^2_{n-1}$ probability density, then
$$Es = E\sqrt{\frac{\sigma^2}{n-1}\cdot\frac{(n-1)s^2}{\sigma^2}} = \frac{\sigma}{\sqrt{n-1}}\int_0^\infty \sqrt{x}\, f(x)\,dx = \sigma c_4(n),$$
for
$$c_4(n) \doteq \frac{\int_0^\infty \sqrt{x}\, f(x)\,dx}{\sqrt{n-1}},$$
another constant (depending upon n) tabled in Table A.1 of V&J. Further, the standard deviation of s is
$$\sqrt{\operatorname{Var} s} = \sqrt{Es^2 - (Es)^2} = \sqrt{\sigma^2 - (\sigma c_4(n))^2} = \sigma\sqrt{1 - c_4^2(n)} = \sigma c_5(n)$$
for
$$c_5(n) \doteq \sqrt{1 - c_4^2(n)},$$
yet another constant tabled in Table A.1.
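The constants $d_2(n)$, $d_3(n)$, $c_4(n)$ and $c_5(n)$ of §1.1 and §1.2 can be reproduced numerically. The Python sketch below is only an illustration (the function names are ours, not V&J's or any library's). It evaluates $EV$ and the double integrals for $EW$ and $EW^2$ with composite Simpson's rule on truncated ranges (an assumption that is harmless for moderate n), and it computes $c_4(n)$ from the standard closed form $\int_0^\infty \sqrt{x}\, f(x)\,dx = \sqrt{2}\,\Gamma(n/2)/\Gamma((n-1)/2)$ rather than by quadrature.

```python
import math

SQRT2 = math.sqrt(2.0)

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf, via the complementary error function."""
    return 0.5 * math.erfc(-z / SQRT2)

def simpson_rule(a, b, steps):
    """Nodes and weights of composite Simpson's rule on [a, b] (steps must be even)."""
    h = (b - a) / steps
    nodes = [a + i * h for i in range(steps + 1)]
    weights = [h / 3.0 * (1 if i in (0, steps) else (4 if i % 2 else 2))
               for i in range(steps + 1)]
    return nodes, weights

def d2(n, steps=2000):
    """d2(n) = EW = 2 EV, with EV the integral of v * n phi(v) Phi(v)^(n-1) over [-10, 10]."""
    vs, wts = simpson_rule(-10.0, 10.0, steps)
    ev = sum(wt * v * n * phi(v) * Phi(v) ** (n - 1) for v, wt in zip(vs, wts))
    return 2.0 * ev

def d3(n, steps=400):
    """d3(n) = sqrt(EW^2 - (EW)^2), integrating w and w^2 against the joint density g(u, w)."""
    us, uw = simpson_rule(-8.0, 8.0, steps)
    ws, ww = simpson_rule(0.0, 16.0, steps)
    m1 = m2 = 0.0
    for u, a in zip(us, uw):
        pu, cu = phi(u), Phi(u)
        for w, b in zip(ws, ww):
            g = n * (n - 1) * pu * phi(u + w) * (Phi(u + w) - cu) ** (n - 2)
            m1 += a * b * w * g
            m2 += a * b * w * w * g
    return math.sqrt(m2 - m1 * m1)

def c4(n):
    """c4(n), using E sqrt(X) = sqrt(2) Gamma(n/2) / Gamma((n-1)/2) for X ~ chi-square(n-1)."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0))

def c5(n):
    """c5(n) = sqrt(1 - c4(n)^2)."""
    return math.sqrt(1.0 - c4(n) ** 2)

for n in (2, 5):
    print(n, round(d2(n), 3), round(d3(n), 3), round(c4(n), 4), round(c5(n), 4))
```

For n = 2 the results can be checked by hand: then $W = |Z_1 - Z_2|$, so $d_2(2) = 2/\sqrt{\pi}$ and $d_3(2) = \sqrt{2 - 4/\pi}$. Other values should agree with Table A.1 of V&J to the printed precision.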
The fact that sums of independent $\chi^2$ random variables are again $\chi^2$ (with degrees of freedom equal to the sum of the component degrees of freedom) and the kinds of relationships in this section provide means of combining various kinds of sample variances to get “pooled” estimators of variances (and variance components) and of finding the means and variances of these estimators. For example, if one pools in the usual way the sample variances from r normal samples of size m to get a single pooled sample variance, $s^2_{\text{pooled}}$, then $r(m-1)s^2_{\text{pooled}}/\sigma^2$ is $\chi^2$ with degrees of freedom $\nu = r(m-1)$. That is, all of the above can be applied by thinking of $s^2_{\text{pooled}}$ as a sample variance based on a sample of size “n” $= r(m-1) + 1$.

1.3 Sample Variances and Gage R&R

The methods of gage R&R analysis presented in V&J §2.2.2 are based on ranges (and the facts in §1.1 above). They are presented in V&J not because of their efficiency, but because of their computational simplicity. Better (and analogous) methods can be based on the facts in §1.2 above. For example, under the two-way random effects model (2.4) of V&J, if one pools the $I \times J$ “cell” sample variances $s^2_{ij}$ to get $s^2_{\text{pooled}}$, all of the previous paragraph applies and gives methods of estimating the repeatability variance component $\sigma^2$ (or the repeatability standard deviation σ) and calculating means and variances of estimators based on $s^2_{\text{pooled}}$.

Or, consider the problem of estimating $\sigma_{\text{reproducibility}}$ defined in display (2.5) of V&J. With $\bar y_{ij}$ as defined on page 24 of V&J, note that for fixed i, the J random variables $\bar y_{ij} - \alpha_i$ have the same sample variance as the J random variables $\bar y_{ij}$, namely
$$s_i^2 \doteq \frac{1}{J-1}\sum_j \left(\bar y_{ij} - \bar y_{i\cdot}\right)^2.$$
But for fixed i the J random variables $\bar y_{ij} - \alpha_i$ are iid normal with mean μ and variance $\sigma^2_\beta + \sigma^2_{\alpha\beta} + \sigma^2/m$, so that
$$Es_i^2 = \sigma^2_\beta + \sigma^2_{\alpha\beta} + \sigma^2/m.$$
So
$$\frac{1}{I}\sum_i s_i^2$$
is a plausible estimator of $\sigma^2_\beta + \sigma^2_{\alpha\beta} + \sigma^2/m$.
Hence
$$\frac{1}{I}\sum_i s_i^2 - \frac{s^2_{\text{pooled}}}{m},$$
or better yet
$$\max\left(0,\ \frac{1}{I}\sum_i s_i^2 - \frac{s^2_{\text{pooled}}}{m}\right) \qquad (1.1)$$
is a plausible estimator of $\sigma^2_{\text{reproducibility}}$.

1.4 ANOVA and Gage R&R

Under the two-way random effects model (2.4) of V&J, with balanced data, it is well known that the ANOVA mean squares
$$MSE = \frac{1}{IJ(m-1)}\sum_{i,j,k}\left(y_{ijk} - \bar y_{ij}\right)^2,$$
$$MSAB = \frac{m}{(I-1)(J-1)}\sum_{i,j}\left(\bar y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y_{\cdot\cdot}\right)^2,$$
$$MSA = \frac{mJ}{I-1}\sum_i\left(\bar y_{i\cdot} - \bar y_{\cdot\cdot}\right)^2, \quad\text{and}$$
$$MSB = \frac{mI}{J-1}\sum_j\left(\bar y_{\cdot j} - \bar y_{\cdot\cdot}\right)^2$$
are independent random variables, that
$$EMSE = \sigma^2,\quad EMSAB = \sigma^2 + m\sigma^2_{\alpha\beta},\quad EMSA = \sigma^2 + m\sigma^2_{\alpha\beta} + mJ\sigma^2_\alpha,\quad EMSB = \sigma^2 + m\sigma^2_{\alpha\beta} + mI\sigma^2_\beta,$$
and that the quantities
$$\frac{(m-1)IJ\,MSE}{EMSE},\quad \frac{(I-1)(J-1)MSAB}{EMSAB},\quad \frac{(I-1)MSA}{EMSA}\quad\text{and}\quad \frac{(J-1)MSB}{EMSB}$$
are $\chi^2$ random variables with respective degrees of freedom
$$(m-1)IJ,\quad (I-1)(J-1),\quad (I-1)\quad\text{and}\quad (J-1).$$
These facts about sums of squares and mean squares for the two-way random effects model are often summarized in the usual (two-way random effects model) ANOVA table, Table 1.1. (The sums of squares are simply the mean squares multiplied by the degrees of freedom. More on the interpretation of such tables can be found in places like §8-4 of V.)

Table 1.1: Two-way Balanced Data Random Effects Analysis ANOVA Table

$$\begin{array}{lllll}
\text{Source} & SS & df & MS & EMS\\ \hline
\text{Parts} & SSA & I-1 & MSA & \sigma^2 + m\sigma^2_{\alpha\beta} + mJ\sigma^2_\alpha\\
\text{Operators} & SSB & J-1 & MSB & \sigma^2 + m\sigma^2_{\alpha\beta} + mI\sigma^2_\beta\\
\text{Parts}\times\text{Operators} & SSAB & (I-1)(J-1) & MSAB & \sigma^2 + m\sigma^2_{\alpha\beta}\\
\text{Error} & SSE & (m-1)IJ & MSE & \sigma^2\\ \hline
\text{Total} & SSTot & mIJ-1 & &
\end{array}$$

As a matter of fact, the ANOVA error mean square is exactly $s^2_{\text{pooled}}$ from §1.3 above. Further, the expected mean squares suggest ways of producing sensible estimators of other parametric functions of interest in gage R&R contexts (see V&J page 27 in this regard).
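For balanced data the mean squares of Table 1.1 and the estimator (1.1) are easy to compute directly. The Python sketch below is illustrative only: the nested-list layout y[i][j] (the m repeat measurements of part i by operator j), the function names and the data are all hypothetical. It also checks two claims made in these notes: that MSE equals $s^2_{\text{pooled}}$, and that (1.1) coincides with the combination of mean squares that appears as display (1.2) in the next paragraph.

```python
import statistics

def cell_means(y):
    """y[i][j] holds the m repeat measurements of part i by operator j (balanced data)."""
    return [[statistics.fmean(cell) for cell in row] for row in y]

def pooled_cell_variance(y):
    """s^2_pooled: with equal cell sizes this is the average of the I*J cell sample variances."""
    return statistics.fmean(statistics.variance(cell) for row in y for cell in row)

def reproducibility_from_cells(y):
    """Estimator (1.1): max(0, (1/I) sum_i s_i^2 - s^2_pooled / m)."""
    m = len(y[0][0])
    # s_i^2 is the sample variance of the J cell means for part i
    s2_i = [statistics.variance(row) for row in cell_means(y)]
    return max(0.0, statistics.fmean(s2_i) - pooled_cell_variance(y) / m)

def mean_squares(y):
    """MSA, MSB, MSAB and MSE of Table 1.1 for balanced two-way data."""
    I, J, m = len(y), len(y[0]), len(y[0][0])
    yb = cell_means(y)
    yi = [statistics.fmean(row) for row in yb]                             # part means
    yj = [statistics.fmean(yb[i][j] for i in range(I)) for j in range(J)]  # operator means
    yg = statistics.fmean(yi)                                              # grand mean
    sse = sum((x - yb[i][j]) ** 2 for i in range(I) for j in range(J) for x in y[i][j])
    ssab = m * sum((yb[i][j] - yi[i] - yj[j] + yg) ** 2 for i in range(I) for j in range(J))
    ssa = m * J * sum((v - yg) ** 2 for v in yi)
    ssb = m * I * sum((v - yg) ** 2 for v in yj)
    return {"MSA": ssa / (I - 1), "MSB": ssb / (J - 1),
            "MSAB": ssab / ((I - 1) * (J - 1)), "MSE": sse / (I * J * (m - 1))}

def reproducibility_anova(ms, I, m):
    """The mean-square combination of display (1.2), truncated at 0."""
    return max(0.0, ms["MSB"] / (m * I) + (1 - 1 / I) * ms["MSAB"] / m - ms["MSE"] / m)

# Hypothetical data: I = 3 parts, J = 2 operators, m = 2 repeats per cell.
y = [[[10.1, 10.3], [10.6, 10.4]],
     [[12.0, 11.8], [12.5, 12.7]],
     [[9.2, 9.4], [9.9, 9.7]]]
ms = mean_squares(y)
print(ms, pooled_cell_variance(y))
print(reproducibility_from_cells(y), reproducibility_anova(ms, I=3, m=2))
```

With these made-up numbers both routes give approximately 0.128, as the algebra says they must.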
For example, note that
$$\sigma^2_{\text{reproducibility}} = \frac{1}{mI}EMSB + \frac{1}{m}\left(1 - \frac{1}{I}\right)EMSAB - \frac{1}{m}EMSE,$$
which suggests the ANOVA-based estimator
$$\hat\sigma^2_{\text{reproducibility}} = \max\left(0,\ \frac{1}{mI}MSB + \frac{1}{m}\left(1 - \frac{1}{I}\right)MSAB - \frac{1}{m}MSE\right). \qquad (1.2)$$
What may or may not be well known is that this estimator (1.2) is exactly the estimator of $\sigma^2_{\text{reproducibility}}$ in display (1.1).

Since many common estimators of quantities of interest in gage R&R studies are functions of mean squares, it is useful to have at least some crude standard errors for them. These can be derived from the “delta method”/“propagation of error”/Taylor series argument provided in the appendix to these notes. For example, if $MS_i$, $i = 1, \ldots, k$, are independent random variables with $\nu_i MS_i/EMS_i$ having a $\chi^2_{\nu_i}$ distribution, consider a function of k real variables $f(x_1, \ldots, x_k)$ and the random variable
$$U = f(MS_1, MS_2, \ldots, MS_k).$$
Propagation of error arguments produce the approximation
$$\operatorname{Var} U \approx \sum_{i=1}^k\left(\left.\frac{\partial f}{\partial x_i}\right|_{EMS_1, EMS_2, \ldots, EMS_k}\right)^2 \operatorname{Var} MS_i = \sum_{i=1}^k\left(\left.\frac{\partial f}{\partial x_i}\right|_{EMS_1, EMS_2, \ldots, EMS_k}\right)^2 \frac{2(EMS_i)^2}{\nu_i},$$
and upon substituting mean squares for their expected values, one has a standard error for U, namely
$$\sqrt{\widehat{\operatorname{Var}}\,U} = \sqrt{2\sum_{i=1}^k\left(\left.\frac{\partial f}{\partial x_i}\right|_{MS_1, MS_2, \ldots, MS_k}\right)^2 \frac{(MS_i)^2}{\nu_i}}. \qquad (1.3)$$
In the special case where the function of the mean squares of interest is linear in them, say
$$U = \sum_{i=1}^k c_i MS_i,$$
the standard error specializes to
$$\sqrt{\widehat{\operatorname{Var}}\,U} = \sqrt{2\sum_{i=1}^k \frac{c_i^2 (MS_i)^2}{\nu_i}},$$
which provides at least a crude method of producing standard errors for $\hat\sigma^2_{\text{reproducibility}}$ and $\hat\sigma^2_{\text{overall}}$. Such standard errors are useful in giving some indication of the precision with which the quantities of interest in a gage R&R study have been estimated.

1.5 Confidence Intervals for Gage R&R Studies

The parametric functions of interest in gage R&R studies (indeed in all random effects analyses) are functions of variance components, or equivalently, functions of expected mean squares.
It is thus possible to apply theory for estimating such quantities to the problem of assessing precision of estimation in a gage study. As a first (and very crude) example of this, note that taking the point of view of §1.4 above, where $U = f(MS_1, MS_2, \ldots, MS_k)$ is a sensible point estimator of an interesting function of the variance components and $\sqrt{\widehat{\operatorname{Var}}\,U}$ is the standard error (1.3), simple approximate two-sided 95% confidence limits can be made as
$$U \pm 1.96\sqrt{\widehat{\operatorname{Var}}\,U}.$$
These limits have the virtue of being amenable to “hand” calculation from the ANOVA sums of squares, but they are not likely to be reliable (in terms of holding their nominal/asymptotic coverage probability) for I, J or m small.

Linear models experts have done substantial research aimed at finding reliable confidence interval formulas for important functions of expected mean squares. For example, the book Confidence Intervals on Variance Components by Burdick and Graybill gives results (on the so-called “modified large sample method”) that can be used to make confidence intervals on various important functions of variance components. The following is some material taken from Sections 3.2 and 3.3 of the Burdick and Graybill book.

Suppose that $MS_1, MS_2, \ldots, MS_k$ are k independent mean squares. (The $MS_i$ are of the form $SS_i/\nu_i$, where $SS_i/EMS_i = \nu_i MS_i/EMS_i$ has a $\chi^2_{\nu_i}$ distribution.) For $1 \le p < k$ and positive constants $c_1, c_2, \ldots, c_k$ suppose that the quantity
$$\theta = c_1 EMS_1 + \cdots + c_p EMS_p - c_{p+1}EMS_{p+1} - \cdots - c_k EMS_k \qquad (1.4)$$
is of interest. Let
$$\hat\theta = c_1 MS_1 + \cdots + c_p MS_p - c_{p+1}MS_{p+1} - \cdots - c_k MS_k.$$
Approximate confidence limits on θ in display (1.4) are of the form
$$L = \hat\theta - \sqrt{V_L} \quad\text{and/or}\quad U = \hat\theta + \sqrt{V_U},$$
for $V_L$ and $V_U$ defined below.

Let $F_{\alpha:df_1,df_2}$ be the upper α point of the F distribution with $df_1$ and $df_2$ degrees of freedom. (It is then the case that $F_{\alpha:df_1,df_2} = (F_{1-\alpha:df_2,df_1})^{-1}$.)
Also, let $\chi^2_{\alpha:df}$ be the upper α point of the $\chi^2_{df}$ distribution. With this notation
$$V_L = \sum_{i=1}^p c_i^2 MS_i^2 G_i^2 + \sum_{i=p+1}^k c_i^2 MS_i^2 H_i^2 + \sum_{i=1}^p\sum_{j=p+1}^k c_i c_j MS_i MS_j G_{ij} + \sum_{i=1}^{p-1}\sum_{j=i+1}^{p} c_i c_j MS_i MS_j G^*_{ij},$$
for
$$G_i = 1 - \frac{\nu_i}{\chi^2_{\alpha:\nu_i}}, \qquad H_i = \frac{\nu_i}{\chi^2_{1-\alpha:\nu_i}} - 1, \qquad G_{ij} = \frac{\left(F_{\alpha:\nu_i,\nu_j} - 1\right)^2 - G_i^2 F^2_{\alpha:\nu_i,\nu_j} - H_j^2}{F_{\alpha:\nu_i,\nu_j}},$$
and
$$G^*_{ij} = \begin{cases} 0 & \text{if } p = 1\\[4pt] \dfrac{1}{p-1}\left(\left(1 - \dfrac{\nu_i + \nu_j}{\chi^2_{\alpha:\nu_i+\nu_j}}\right)^2 \dfrac{(\nu_i+\nu_j)^2}{\nu_i\nu_j} - \dfrac{G_i^2\nu_i}{\nu_j} - \dfrac{G_j^2\nu_j}{\nu_i}\right) & \text{otherwise.}\end{cases}$$
On the other hand,
$$V_U = \sum_{i=1}^p c_i^2 MS_i^2 H_i^2 + \sum_{i=p+1}^k c_i^2 MS_i^2 G_i^2 + \sum_{i=1}^p\sum_{j=p+1}^k c_i c_j MS_i MS_j H_{ij} + \sum_{i=p+1}^{k-1}\sum_{j=i+1}^{k} c_i c_j MS_i MS_j H^*_{ij},$$
for $G_i$ and $H_i$ as defined above, and
$$H_{ij} = \frac{\left(1 - F_{1-\alpha:\nu_i,\nu_j}\right)^2 - H_i^2 F^2_{1-\alpha:\nu_i,\nu_j} - G_j^2}{F_{1-\alpha:\nu_i,\nu_j}},$$
and
$$H^*_{ij} = \begin{cases} 0 & \text{if } k = p+1\\[4pt] \dfrac{1}{k-p-1}\left(\left(1 - \dfrac{\nu_i + \nu_j}{\chi^2_{\alpha:\nu_i+\nu_j}}\right)^2 \dfrac{(\nu_i+\nu_j)^2}{\nu_i\nu_j} - \dfrac{G_i^2\nu_i}{\nu_j} - \dfrac{G_j^2\nu_j}{\nu_i}\right) & \text{otherwise.}\end{cases}$$
One uses $(L, \infty)$ or $(-\infty, U)$ for confidence level $(1-\alpha)$ and the interval $(L, U)$ for confidence level $(1-2\alpha)$. (Using these formulas for “hand” calculation is (obviously) no picnic. The C program written by Brandon Paris (available off the Stat 531 Web page) makes these calculations painless.)

A problem similar to the estimation of quantity (1.4) is that of estimating
$$\theta = c_1 EMS_1 + \cdots + c_p EMS_p \qquad (1.5)$$
for $p \ge 1$ and positive constants $c_1, c_2, \ldots, c_p$. In this case let
$$\hat\theta = c_1 MS_1 + \cdots + c_p MS_p,$$
and continue the $G_i$ and $H_i$ notation from above. Then approximate confidence limits on θ given in display (1.5) are of the form
$$L = \hat\theta - \sqrt{\sum_{i=1}^p c_i^2 MS_i^2 G_i^2} \quad\text{and/or}\quad U = \hat\theta + \sqrt{\sum_{i=1}^p c_i^2 MS_i^2 H_i^2}.$$
One uses $(L, \infty)$ or $(-\infty, U)$ for confidence level $(1-\alpha)$ and the interval $(L, U)$ for confidence level $(1-2\alpha)$.

The Fortran program written by Andy Chiang (available off the Stat 531 Web page) applies Burdick and Graybill-like material and the standard errors (1.3) to the estimation of many parametric functions of relevance in gage R&R studies.

Chiang’s 2000 Ph.D.
dissertation work (to appear in Technometrics in August 2001) has provided an entirely different method of interval estimation of functions of variance components that is a uniform improvement over the “modified large sample” methods presented by Burdick and Graybill. His approach is related to “improper Bayes” methods with so-called “Jeffreys priors.” Andy has provided software for implementing his methods that, as time permits, will be posted on the Stat 531 Web page. He can be contacted (for preprints of his work) at stackl@nus.edu.sg at the National University of Singapore.

1.6 Calibration and Regression Analysis

The estimation of standard deviations and variance components is a contribution of the subject of statistics to the quantification of measurement system precision. The subject also has contributions to make in the matter of improving measurement accuracy. Calibration is the business of bringing a local measurement system in line with a standard measurement system. One takes measurements y with a gage or system of interest on test items with “known” values x (available because they were previously measured using a “gold standard” measurement device). The data collected are then used to create a conversion scheme for translating local measurements to approximate gold standard measurements, thereby hopefully improving local accuracy. In this short section we note that usual regression methodology has implications in this kind of enterprise.

The usual polynomial regression model says that n observed random values $y_i$ are related to fixed values $x_i$ via
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_k x_i^k + \epsilon_i \qquad (1.6)$$
for iid Normal$(0, \sigma^2)$ random variables $\epsilon_i$. The parameters β and σ are the usual objects of inference in this model. In the calibration context with x a gold standard value, σ quantifies precision for the local measurement system.
Often (at least over a limited range of x) 1) a low order polynomial does a good job of describing the observed x-y relationship between local and gold standard measurements and 2) the usual (least squares) fitted relationship
$$\hat y = g(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_k x^k$$
has an inverse $g^{-1}(y)$. When such is the case, given a measurement $y_{n+1}$ from the local measurement system, it is plausible to estimate that a corresponding measurement from the gold standard system would be $\hat x_{n+1} = g^{-1}(y_{n+1})$. A reasonable question is then “How good is this estimate?” That is, the matter of confidence interval estimation of $x_{n+1}$ is important.

One general method for producing such confidence sets for $x_{n+1}$ is based on the usual “prediction interval” methodology associated with the model (1.6). That is, for a given x, it is standard (see, e.g., §9-2 of V or §9.2.4 of V&J#2) to produce a prediction interval of the form
$$\hat y \pm t\sqrt{s^2 + \left(\text{std error}(\hat y)\right)^2}$$
for an additional corresponding y. And those intervals have the property that for all choices of $x, \sigma, \beta_0, \beta_1, \beta_2, \ldots, \beta_k$
$$P_{x,\sigma,\beta_0,\beta_1,\beta_2,\ldots,\beta_k}[y \text{ is in the prediction interval at } x] = \text{desired confidence level} = 1 - P[|T| > t],$$
where T is a $t_{n-k-1}$ random variable. But rewording only slightly, the event “y is in the prediction interval at x” is the same as the event “x produces a prediction interval including y.” So a confidence set for $x_{n+1}$ based on the observed value $y_{n+1}$ is
$$\{x \mid \text{the prediction interval corresponding to } x \text{ includes } y_{n+1}\}. \qquad (1.7)$$
Conceptually, one simply makes prediction limits around the fitted relationship $\hat y = g(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_k x^k$ and then upon observing a new y sees what x’s are consistent with that observation. This produces a confidence set with the desired confidence level.
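For the straight-line case (k = 1) the confidence set (1.7) can be found by brute force: scan a grid of candidate x values and keep those whose prediction interval covers $y_{n+1}$. The Python sketch below does exactly that. The function names, grid limits and calibration data are hypothetical, and the t value must be supplied by the user (here the upper 2.5% point of t with n − 2 = 4 degrees of freedom). For comparison it also evaluates the approximate Taylor-series limits for $x_{n+1}$ that are recorded a little later in this section.

```python
import math

def fit_line(x, y):
    """Least squares fit of y = b0 + b1 x; returns b0, b1, MSE, xbar, Sxx, n."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return b0, b1, mse, xbar, sxx, n

def inverse_prediction_set(x, y, y_new, t, lo, hi, steps=20001):
    """Grid version of set (1.7) for k = 1: keep each x0 whose prediction interval covers y_new.
    Returns the smallest and largest kept grid points (an interval when the slope is well determined)."""
    b0, b1, mse, xbar, sxx, n = fit_line(x, y)
    kept = []
    for i in range(steps):
        x0 = lo + (hi - lo) * i / (steps - 1)
        yhat = b0 + b1 * x0
        se2 = mse * (1.0 / n + (x0 - xbar) ** 2 / sxx)   # (std error of yhat)^2
        if abs(y_new - yhat) <= t * math.sqrt(mse + se2):
            kept.append(x0)
    return (min(kept), max(kept)) if kept else None

def taylor_limits(x, y, y_new, t):
    """The approximate k = 1 limits recorded later in this section, for comparison."""
    b0, b1, mse, xbar, sxx, n = fit_line(x, y)
    xhat = (y_new - b0) / b1
    half = t * math.sqrt(mse) / abs(b1) * math.sqrt(1 + 1 / n + (xhat - xbar) ** 2 / sxx)
    return xhat - half, xhat + half

# Hypothetical calibration data (gold standard x, local gage y) and a new local reading.
x = [0, 1, 2, 3, 4, 5]
y = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]
t = 2.776   # upper 2.5% point of t with n - 2 = 4 df (95% two-sided)
print(inverse_prediction_set(x, y, 5.0, t, lo=-5.0, hi=10.0))
print(taylor_limits(x, y, 5.0, t))
```

For these data the two sets nearly coincide (the grid set is marginally wider), as one would expect when the fitted slope is large relative to its standard error. With a poorly determined slope the set (1.7) can be unbounded, which is the kind of “unpleasant-looking” behavior mentioned below.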
The only real difficulties with the above general prescription are 1) the lack of simple explicit formulas and 2) the fact that when σ is large (so that the regression MSE tends to be large) or the fitted relationship is very nonlinear, the method can produce (completely rational but) unpleasant-looking confidence sets. The first “problem” is really of limited consequence in a time when standard statistical software will automatically produce plots of prediction limits associated with low order regressions. And the second matter is really inherent in the problem.

For the (simplest) linear version of this “inverse prediction” problem, there is an approximate confidence method in common use that doesn’t have the deficiencies of the method (1.7). It is derived from a Taylor series argument and has its own problems, but is nevertheless worth recording here for completeness’ sake. That is, under the k = 1 version of the model (1.6), commonly used approximate confidence limits for $x_{n+1}$ are (for $\hat x_{n+1} = (y_{n+1} - b_0)/b_1$ and $\bar x$ the sample mean of the gold standard measurements from the calibration experiment)
$$\hat x_{n+1} \pm t\,\frac{\sqrt{MSE}}{|b_1|}\sqrt{1 + \frac{1}{n} + \frac{(\hat x_{n+1} - \bar x)^2}{\sum_{i=1}^n (x_i - \bar x)^2}}.$$

1.7 Crude Gaging and Statistics

All real-world measurement is “to the nearest something.” Often one may ignore this fact, treat measured values as if they were “exact” and experience no real difficulty when using standard statistical methods (that are really based on an assumption that data are exact). However, sometimes in industrial applications gaging is “crude” enough that standard (e.g., “normal theory”) formulas give nonsensical results. This section briefly considers what can be done to appropriately model and draw inferences from crudely gaged data. The assumption throughout is that what are available are integer data, obtained by coding raw observations via
$$\text{integer observation} = \frac{\text{raw observation} - \text{some reference value}}{\text{smallest unit of measurement}}$$
(the “smallest unit of measurement” is “the nearest something” above).

1.7.1 Distributions of Sample Means and Ranges from Integer Observations

To begin with something simple, note first that in situations where only a few different coded values are ever observed, rather than trying to model observations with some continuous distribution (like a normal one) it may well make sense to simply employ a discrete pmf, say f, to describe any single measurement. In fact, suppose that a single (crudely gaged) observation Y has a pmf f(y) such that
$$f(y) = 0 \text{ unless } y = 1, 2, \ldots, M.$$
Then if $Y_1, Y_2, \ldots, Y_n$ are iid with this marginal discrete distribution, one can easily approximate the distribution of a function of these variables via simulation (using common statistical packages). And for two of the most common statistics used in QC settings (the sample mean and range) one can even work out exact probability distributions using computationally feasible and very elementary methods.

To find the probability distribution of $\bar Y$ in this context, one can build up the probability distributions of sums of iid $Y_i$’s recursively by “adding probabilities on diagonals in two-way joint probability tables.” For example, the n = 2 distribution of $\bar Y$ can be obtained by making out a two-way table of joint probabilities for $Y_1$ and $Y_2$ and adding on diagonals to get probabilities for $Y_1 + Y_2$. Then making a two-way table of joint probabilities for $(Y_1 + Y_2)$ and $Y_3$ one can add on diagonals and find the distribution of $Y_1 + Y_2 + Y_3$. Or noting that the distribution of $Y_3 + Y_4$ is the same as that for $Y_1 + Y_2$, it is possible to make a two-way table of joint probabilities for $(Y_1 + Y_2)$ and $(Y_3 + Y_4)$, add on diagonals and find the distribution of $Y_1 + Y_2 + Y_3 + Y_4$. And so on. (Clearly, after finding the distribution for a sum, one simply divides possible values by n to get the corresponding distribution of $\bar Y$.)
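The diagonal-summing recursion for $\bar Y$ amounts to repeated discrete convolution. In this Python sketch (the function names and the pmf are hypothetical), pmfs are dictionaries from integer values to probabilities, and the sum of n draws is built by doubling, in the spirit of combining $(Y_1 + Y_2)$ with $(Y_3 + Y_4)$.

```python
def convolve(p, q):
    """pmf of the sum of two independent integer variables with pmfs p and q (dicts value -> prob).
    Each output value a + b collects the products lying on one diagonal of the two-way joint table."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out

def sample_mean_distribution(f, n):
    """pmf of Ybar for n iid draws from f, built by repeated doubling of the block being added."""
    total, block, k = {0: 1.0}, dict(f), n
    while k:
        if k & 1:
            total = convolve(total, block)
        k >>= 1
        if k:
            block = convolve(block, block)   # pmf of a sum of twice as many Y's
    return {s / n: prob for s, prob in sorted(total.items())}

f = {1: 0.2, 2: 0.5, 3: 0.3}   # a hypothetical pmf on 1, ..., M with M = 3
dist = sample_mean_distribution(f, 4)
print(dist)
```

The doubling step is just a bookkeeping choice; adding one $Y_i$ at a time, as in the first description above, gives the same pmf.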
To find the probability distribution of $R = \max Y_i - \min Y_i$ (for $Y_i$’s as above) a feasible computational scheme is as follows. Let
$$S_{kj} = \begin{cases} \sum_{y=k}^{j} f(y) = P[k \le Y \le j] & \text{if } k \le j\\ 0 & \text{otherwise}\end{cases}$$
and compute and store these for $1 \le k, j \le M$. Then define
$$M_{kj} = P[\min Y_i = k \text{ and } \max Y_i = j].$$
Now the event $\{\min Y_i = k \text{ and } \max Y_i = j\}$ is the event {all observations are between k and j inclusive} less the event {the minimum is greater than k or the maximum is less than j}. Thus, it is straightforward to see that
$$M_{kj} = (S_{kj})^n - (S_{k+1,j})^n - (S_{k,j-1})^n + (S_{k+1,j-1})^n$$
and one may compute and store these values. Finally, note that
$$P[R = r] = \sum_{k=1}^{M-r} M_{k,k+r}.$$
These “algorithms” are good for any distribution f on the integers $1, 2, \ldots, M$. Karen (Jensen) Hulting’s “DIST” program (available off the Stat 531 Web page) automates the calculations of the distributions of $\bar Y$ and R for certain f’s related to “integer rounding of normal observations.” (More on this rounding idea directly.)

1.7.2 Estimation Based on Integer-Rounded Normal Data

The problem of drawing inferences from crudely gaged data is one that has a history of at least 100 years (if one takes a view that crude gaging essentially “rounds” “exact” values). Sheppard in the late 1800s noted that if one rounds a continuous variable to integers, the variability in the distribution is typically increased. He thus suggested not using the sample standard deviation (s) of rounded values but instead employing what is known as Sheppard’s correction to arrive at
$$\sqrt{\frac{(n-1)s^2}{n} - \frac{1}{12}} \qquad (1.8)$$
as a suitable estimate of “standard deviation” for integer-rounded data.

The notion of “interval-censoring” of fundamentally continuous observations provides a natural framework for the application of modern statistical theory to the analysis of crudely gaged data.
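The range computation of §1.7.1 above is short enough to sketch directly. In this Python version (the names and the example pmf are hypothetical), f[y-1] holds P[Y = y] for y = 1, ..., M, the table of $S_{kj}$ values is computed and stored once, and $P[R = r]$ is accumulated from the $M_{k,k+r}$ exactly as in the displays above.

```python
def range_distribution(f, n):
    """P[R = r], r = 0, ..., M-1, for the range of n iid draws from f on 1..M (f[y-1] = P[Y = y])."""
    M = len(f)
    # S[k][j] = P[k <= Y <= j] for k <= j and 0 otherwise; the padding row and column stay 0.
    S = [[0.0] * (M + 2) for _ in range(M + 2)]
    for k in range(1, M + 1):
        for j in range(k, M + 1):
            S[k][j] = S[k][j - 1] + f[j - 1]

    def M_kj(k, j):
        # P[min = k and max = j], by inclusion-exclusion
        return S[k][j] ** n - S[k + 1][j] ** n - S[k][j - 1] ** n + S[k + 1][j - 1] ** n

    return [sum(M_kj(k, k + r) for k in range(1, M - r + 1)) for r in range(M)]

probs = range_distribution([0.2, 0.5, 0.3], 3)   # hypothetical pmf on 1, 2, 3 and n = 3
print(probs)
```

For this pmf and n = 3 one can check by hand that $P[R = 0] = .2^3 + .5^3 + .3^3 = .16$ and $P[R = 2] = 1 - .8^3 - .7^3 + .5^3 = .27$.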
For univariate X with continuous cdf $F(x|\theta)$ depending upon some (possibly vector) parameter θ, consider $X^*$ derived from X by rounding to the nearest integer. Then the pmf of $X^*$ is, say,
$$g(x^*|\theta) = \begin{cases} F(x^* + .5|\theta) - F(x^* - .5|\theta) & \text{for } x^* \text{ an integer}\\ 0 & \text{otherwise.}\end{cases}$$
Rather than doing inference based on the unobservable variables $X_1, X_2, \ldots, X_n$ that are iid $F(x|\theta)$, one might consider inference based on $X_1^*, X_2^*, \ldots, X_n^*$ that are iid with pmf $g(x^*|\theta)$.

The normal version of this scenario (the integer-rounded normal data model) makes use of
$$g(x^*|\mu,\sigma) = \begin{cases} \Phi\left(\dfrac{x^* + .5 - \mu}{\sigma}\right) - \Phi\left(\dfrac{x^* - .5 - \mu}{\sigma}\right) & \text{for } x^* \text{ an integer}\\ 0 & \text{otherwise,}\end{cases}$$
and the balance of this section will consider the use of this specific important model. So suppose that $X_1^*, X_2^*, \ldots, X_n^*$ are iid integer-valued random observations (generated from underlying normal observations by rounding). For an observed vector of integers $(x_1^*, x_2^*, \ldots, x_n^*)$ it is useful to consider the so-called “likelihood function” that treats the (joint) probability assigned to the vector $(x_1^*, x_2^*, \ldots, x_n^*)$ as a function of the parameters,
$$L(\mu,\sigma) \doteq \prod_i g(x_i^*|\mu,\sigma) = \prod_i\left(\Phi\left(\frac{x_i^* + .5 - \mu}{\sigma}\right) - \Phi\left(\frac{x_i^* - .5 - \mu}{\sigma}\right)\right).$$
The log of this function of μ and σ is (naturally enough) called the loglikelihood and will be denoted as
$$\mathcal{L}(\mu,\sigma) \doteq \ln L(\mu,\sigma).$$
A sensible estimator of the parameter vector (μ, σ) is “the point $(\hat\mu, \hat\sigma)$ maximizing the loglikelihood.” This prescription for estimation is only partially complete, depending upon the nature of the sample $x_1^*, x_2^*, \ldots, x_n^*$. There are three cases to consider, namely:

1. When the sample range of $x_1^*, x_2^*, \ldots, x_n^*$ is at least 2, $\mathcal{L}(\mu,\sigma)$ is well-behaved (nice and “mound-shaped”) and numerical maximization or just looking at contour plots will quickly allow one to maximize the loglikelihood.
   (It is worth noting that in this circumstance, usually $\hat\sigma$ is close to the “Sheppard corrected” value in display (1.8).)

2. When the sample range of $x_1^*, x_2^*, \ldots, x_n^*$ is 1, strictly speaking $\mathcal{L}(\mu,\sigma)$ fails to achieve a maximum. However, with
$$m \doteq \#[x_i^* = \min x_i^*],$$
(μ, σ) pairs with σ small and
$$\mu \approx \min x_i^* + .5 - \sigma\Phi^{-1}\left(\frac{m}{n}\right)$$
will have
$$\mathcal{L}(\mu,\sigma) \approx \sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma) = m\ln m + (n-m)\ln(n-m) - n\ln n.$$
That is, in this case one ought to “estimate” that σ is small and the relationship between μ and σ is such that a fraction m/n of the underlying normal distribution is to the left of $\min x_i^* + .5$, while a fraction $1 - m/n$ is to the right.

3. When the sample range of $x_1^*, x_2^*, \ldots, x_n^*$ is 0, strictly speaking $\mathcal{L}(\mu,\sigma)$ fails to achieve a maximum. However,
$$\sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma) = 0$$
and for any $\mu \in (x_1^* - .5, x_1^* + .5)$, $\mathcal{L}(\mu,\sigma) \to 0$ as $\sigma \to 0$. That is, in this case one ought to “estimate” that σ is small and $\mu \in (x_1^* - .5, x_1^* + .5)$.

Beyond the making of point estimates, the loglikelihood function can provide approximate confidence sets for the parameters μ and/or σ. Standard “large sample” statistical theory says that (for large n and $\chi^2_{\alpha:\nu}$ the upper α point of the $\chi^2_\nu$ distribution):

1. An approximate $(1-\alpha)$ level confidence set for the parameter vector (μ, σ) is
$$\left\{(\mu,\sigma) \,\middle|\, \mathcal{L}(\mu,\sigma) > \sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma) - \tfrac{1}{2}\chi^2_{\alpha:2}\right\}. \qquad (1.9)$$
2. An approximate $(1-\alpha)$ level confidence set for the parameter μ is
$$\left\{\mu \,\middle|\, \sup_\sigma\mathcal{L}(\mu,\sigma) > \sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma) - \tfrac{1}{2}\chi^2_{\alpha:1}\right\}. \qquad (1.10)$$
3. An approximate $(1-\alpha)$ level confidence set for the parameter σ is
$$\left\{\sigma \,\middle|\, \sup_\mu\mathcal{L}(\mu,\sigma) > \sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma) - \tfrac{1}{2}\chi^2_{\alpha:1}\right\}. \qquad (1.11)$$

Several comments and a fuller discussion are in order regarding these confidence sets. In the first place, Karen (Jensen) Hulting’s CONEST program (available off the Stat 531 Web page) is useful in finding $\sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma)$ and producing rough contour plots of the (joint) sets for (μ, σ) in display (1.9).
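In case 1 (sample range at least 2) the maximization can be done with any general-purpose optimizer. The Python sketch below is one minimal way to do it without external libraries, under stated assumptions: it maximizes $\sup_\sigma \mathcal{L}(\mu,\sigma)$ over μ by nested golden-section searches (relying on the “mound-shaped” behavior described in case 1), searches σ only over $[10^{-3}, 10^3]$ on a log scale, and guards against $\ln 0$ far from the data. The names and data are hypothetical and this is not the CONEST program. It also computes the Sheppard-corrected value (1.8) for comparison.

```python
import math
import statistics

SQRT2 = math.sqrt(2.0)

def cell_prob(a, b):
    """Phi(a) - Phi(b) for a > b, evaluated in the tail that avoids cancellation."""
    if b > 0:
        return 0.5 * (math.erfc(b / SQRT2) - math.erfc(a / SQRT2))
    return 0.5 * (math.erfc(-a / SQRT2) - math.erfc(-b / SQRT2))

def loglik(mu, sigma, xs):
    """Loglikelihood of the integer-rounded normal model at (mu, sigma)."""
    total = 0.0
    for x in xs:
        p = cell_prob((x + 0.5 - mu) / sigma, (x - 0.5 - mu) / sigma)
        total += math.log(max(p, 1e-300))   # guard against log(0) far from the data
    return total

def golden_max(f, lo, hi, tol=1e-7):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]; returns (x, f(x))."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    x = 0.5 * (a + b)
    return x, f(x)

def rounded_normal_mle(xs):
    """(mu_hat, sigma_hat, max loglikelihood), assuming sample range >= 2 so a maximum exists."""
    def profile(mu):
        # maximize over log(sigma) with mu held fixed
        return golden_max(lambda ls: loglik(mu, math.exp(ls), xs), math.log(1e-3), math.log(1e3))
    mu_hat, _ = golden_max(lambda mu: profile(mu)[1], min(xs), max(xs))
    log_sigma, best = profile(mu_hat)
    return mu_hat, math.exp(log_sigma), best

def sheppard(xs):
    """Sheppard-corrected standard deviation, display (1.8)."""
    n = len(xs)
    return math.sqrt((n - 1) * statistics.variance(xs) / n - 1.0 / 12.0)

xs = [3, 4, 4, 5, 5, 5, 6, 6, 7]   # hypothetical integer-rounded sample (range 4)
mu_hat, sigma_hat, best = rounded_normal_mle(xs)
print(mu_hat, sigma_hat, sheppard(xs))
```

For this symmetric sample $\hat\mu = 5$, and $\hat\sigma$ should land close to the Sheppard value $\sqrt{8(1.5)/9 - 1/12} \approx 1.118$, in line with the remark under case 1.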
Second, it is common to call the function of μ defined by
$$\mathcal{L}^*(\mu) = \sup_\sigma \mathcal{L}(\mu,\sigma)$$
the “profile loglikelihood” function for μ and the function of σ
$$\mathcal{L}^{**}(\sigma) = \sup_\mu \mathcal{L}(\mu,\sigma)$$
the “profile loglikelihood” function for σ. Note that display (1.10) then says that the confidence set should consist of those μ’s for which the profile loglikelihood is not too much smaller than the maximum achievable. And something entirely analogous holds for the sets in (1.11).

Johnson Lee (in 2001 Ph.D. dissertation work) has carefully studied these confidence interval estimation problems and determined that some modification of methods (1.10) and (1.11) is necessary in order to provide guaranteed coverage probabilities for small sample sizes. (It is also very important to realize that, contrary to naive expectations, not even a large sample size will make the usual t-intervals for μ and $\chi^2$-intervals for σ hold their nominal confidence levels in the event that σ is small, i.e., that the rounding or crudeness of the gaging is important. Ignoring the rounding when it is important can produce actual confidence levels near 0 for methods with large nominal confidence levels.)

Table 1.2: Δ for 0-Range Samples Based on Very Small n

n    α = .05   α = .10   α = .20
2    3.084     1.547     .785
3    .776      .562
4    .517

Intervals for a Normal Mean Based on Integer-Rounded Data

Specifically regarding the sets for μ in display (1.10), Lee (in work to appear in the Journal of Quality Technology) has shown that one must replace the value $\chi^2_{\alpha:1}$ with something larger in order to get small n actual confidence levels not too far from nominal for “most” (μ, σ). In fact, the choice
$$c(n,\alpha) = n\ln\left(\frac{t^2_{\frac{\alpha}{2}:(n-1)}}{n-1} + 1\right)$$
(for $t_{\frac{\alpha}{2}:(n-1)}$ the upper $\frac{\alpha}{2}$ point of the t distribution with $\nu = n - 1$ degrees of freedom) is appropriate.

After replacing $\chi^2_{\alpha:1}$ with $c(n,\alpha)$ in display (1.10) there remains the numerical analysis problem of actually finding the interval prescribed by the display.
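For a sample with range at least 2, the modified set (1.10) can be located by computing the profile loglikelihood $\mathcal{L}^*(\mu)$ and finding where it crosses $\sup\mathcal{L} - \frac{1}{2}c(n,\alpha)$ on either side of $\hat\mu$. The Python sketch below writes out its helper functions in full so that it stands alone; the user supplies $t_{\frac{\alpha}{2}:(n-1)}$ (here 2.306 for n = 9 and α = .05), and the names and data are hypothetical. It assumes $\mathcal{L}^*$ is mound-shaped and does not handle the range 0 and range 1 cases, for which Lee’s tabled results below apply.

```python
import math

SQRT2 = math.sqrt(2.0)

def cell_prob(a, b):
    """Phi(a) - Phi(b) for a > b, evaluated in the tail that avoids cancellation."""
    if b > 0:
        return 0.5 * (math.erfc(b / SQRT2) - math.erfc(a / SQRT2))
    return 0.5 * (math.erfc(-a / SQRT2) - math.erfc(-b / SQRT2))

def loglik(mu, sigma, xs):
    """Loglikelihood of the integer-rounded normal model at (mu, sigma)."""
    return sum(math.log(max(cell_prob((x + 0.5 - mu) / sigma, (x - 0.5 - mu) / sigma), 1e-300))
               for x in xs)

def golden_max(f, lo, hi, tol=1e-7):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]; returns (x, f(x))."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    x = 0.5 * (a + b)
    return x, f(x)

def c_value(n, t):
    """c(n, alpha) = n ln(t^2/(n-1) + 1), with t the upper alpha/2 point of t on n-1 df."""
    return n * math.log(t * t / (n - 1) + 1.0)

def profile_mu(mu, xs):
    """Profile loglikelihood L*(mu): sup over sigma, searched on log(sigma) in [log 1e-3, log 1e3]."""
    return golden_max(lambda ls: loglik(mu, math.exp(ls), xs), math.log(1e-3), math.log(1e3))[1]

def mu_interval(xs, t):
    """Set (1.10) with chi^2_{alpha:1} replaced by c(n, alpha); assumes sample range >= 2."""
    mu_hat, top = golden_max(lambda mu: profile_mu(mu, xs), min(xs), max(xs))
    cut = top - 0.5 * c_value(len(xs), t)

    def edge(direction):
        step = max(xs) - min(xs) + 1.0
        while profile_mu(mu_hat + direction * step, xs) > cut:   # bracket the crossing
            step *= 2.0
        inside, outside = mu_hat, mu_hat + direction * step
        for _ in range(50):                                      # bisect the crossing
            mid = 0.5 * (inside + outside)
            if profile_mu(mid, xs) > cut:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return edge(-1.0), edge(1.0)

xs = [3, 4, 4, 5, 5, 5, 6, 6, 7]    # hypothetical sample, n = 9, range 4
print(mu_interval(xs, t=2.306))     # t: upper 2.5% point of t with 8 df, so alpha = .05
```

One way to see where this c(n, α) comes from: for exact (unrounded) normal data the profile loglikelihood for μ is $-\frac{n}{2}\ln\left(\hat\sigma^2 + (\bar x - \mu)^2\right)$ plus a constant, and with c(n, α) as the cutoff the set (1.10) reduces exactly to the usual t interval $\bar x \pm t\,s/\sqrt{n}$.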
The nature of the numerical analysis required depends upon the sample range encountered in the crudely gaged data. Provided the range is at least 2, $L^*(\mu)$ is well-behaved (continuous and "mound-shaped"). In that case even simple trial and error with Karen (Jensen) Hulting's CONEST program will quickly produce the necessary interval. When the range is 0 or 1, $L^*(\mu)$ has respectively 2 or 1 discontinuities and the numerical analysis is a bit trickier. Lee has recorded the results of the numerical analysis for small sample sizes and $\alpha = .05, .10$ and $.20$ (confidence levels respectively 95%, 90% and 80%).

Suppose a sample of size $n$ produces range 0 with, say, all observations equal to $x^*$. The intuition that one ought to estimate $\mu \in (x^* - .5, x^* + .5)$ is then sound unless $n$ is very small. If $n$ and $\alpha$ are as recorded in Table 1.2, then display (1.10) (modified by the use of $c(n,\alpha)$ in place of $\chi^2_{\alpha:1}$) leads to the interval $(x^* - \Delta, x^* + \Delta)$. (Otherwise it leads to $(x^* - .5, x^* + .5)$ for these $\alpha$.)

Now suppose a sample of size $n$ produces range 1 with, say, all observations $x^*$ or $x^* + 1$. The interval prescribed by display (1.10) (with $c(n,\alpha)$ used in place of $\chi^2_{\alpha:1}$) can then be thought of as having the form $(x^* + .5 - \Delta_L, x^* + .5 + \Delta_U)$. Here $\Delta_L$ and $\Delta_U$ depend upon
$$n_{x^*} = \#[\text{observations } x^*] \quad\text{and}\quad n_{x^*+1} = \#[\text{observations } x^* + 1] . \qquad (1.12)$$
When $n_{x^*} \geq n_{x^*+1}$, it is the case that $\Delta_L \geq \Delta_U$. And when $n_{x^*} \leq n_{x^*+1}$, correspondingly $\Delta_L \leq \Delta_U$. Let
$$m = \max\{n_{x^*}, n_{x^*+1}\} \qquad (1.13)$$
and correspondingly take
$$\Delta_1 = \max\{\Delta_L, \Delta_U\} \quad\text{and}\quad \Delta_2 = \min\{\Delta_L, \Delta_U\} .$$
Table 1.3 then gives values for $\Delta_1$ and $\Delta_2$ for $n \leq 10$ and $\alpha = .05, .10$ and $.20$.

Table 1.3: $(\Delta_1, \Delta_2)$ for Range 1 Samples Based on Small $n$

   n   m   α = .05          α = .10          α = .20
   2   1   (6.147, 6.147)   (3.053, 3.053)   (1.485, 1.485)
   3   2   (1.552, 1.219)   (1.104, 0.771)   (0.765, 0.433)
   4   3   (1.025, 0.526)   (0.082, 0.323)   (0.639, 0.149)
       2   (0.880, 0.880)   (0.646, 0.646)   (0.441, 0.441)
   5   4   (0.853, 0.257)   (0.721, 0.132)   (0.592, 0.024)
       3   (0.748, 0.548)   (0.592, 0.339)   (0.443, 0.248)
   6   5   (0.772, 0.116)   (0.673, 0.032)   (0.569, 0.000)
       4   (0.680, 0.349)   (0.562, 0.235)   (0.444, 0.126)
       3   (0.543, 0.543)   (0.420, 0.420)   (0.299, 0.299)
   7   6   (0.726, 0.035)   (0.645, 0.000)   (0.556, 0.000)
       5   (0.640, 0.218)   (0.545, 0.130)   (0.446, 0.046)
       4   (0.534, 0.393)   (0.432, 0.293)   (0.329, 0.193)
   8   7   (0.698, 0.000)   (0.626, 0.000)   (0.547, 0.000)
       6   (0.616, 0.129)   (0.534, 0.058)   (0.446, 0.000)
       5   (0.527, 0.281)   (0.439, 0.197)   (0.347, 0.113)
       4   (0.416, 0.416)   (0.327, 0.327)   (0.236, 0.236)
   9   8   (0.677, 0.000)   (0.613, 0.000)   (0.541, 0.000)
       7   (0.599, 0.065)   (0.526, 0.010)   (0.448, 0.000)
       6   (0.521, 0.196)   (0.443, 0.124)   (0.361, 0.054)
       5   (0.429, 0.321)   (0.350, 0.242)   (0.267, 0.163)
  10   9   (0.662, 0.000)   (0.604, 0.000)   (0.537, 0.000)
       8   (0.587, 0.020)   (0.521, 0.000)   (0.450, 0.000)
       7   (0.515, 0.129)   (0.446, 0.069)   (0.371, 0.012)
       6   (0.437, 0.242)   (0.365, 0.174)   (0.289, 0.105)
       5   (0.346, 0.346)   (0.275, 0.275)   (0.200, 0.200)

Intervals for a Normal Standard Deviation Based on Integer-Rounded Data

Specifically regarding the sets for $\sigma$ in display (1.11), Lee found that replacing $\chi^2_{\alpha:1}$ with something larger is not enough to get small-$n$ actual confidence levels not too far from nominal. One must also make an additional adjustment for samples with ranges 0 and 1. Consider first replacing $\chi^2_{\alpha:1}$ in display (1.11) with a (larger) value $d(n,\alpha)$ given in Table 1.4. Lee found that for those $(\mu,\sigma)$ with moderate to large $\sigma$,
making this $d(n,\alpha)$-for-$\chi^2_{\alpha:1}$ substitution is enough to produce an actual confidence level approximating the nominal one. However, even this modification is not adequate to produce an acceptable coverage probability for $(\mu,\sigma)$ with small $\sigma$. For samples with range 0 or 1, formula (1.11) prescribes intervals of the form $(0, U)$. When $\sigma$ is small, samples will typically have range 0 or 1. Reasoning this way, Lee was able to find (larger) replacements for the limit $U$ prescribed by (1.11). The resulting estimation method has actual confidence level not much below the nominal level for any $(\mu,\sigma)$, with $\sigma$ large or small.

That is, if a 0-range sample is observed, estimate $\sigma$ by $(0, \Lambda_0)$, where $\Lambda_0$ is taken from Table 1.5. Suppose instead that a range 1 sample is observed consisting, say, of values $x^*$ and $x^* + 1$, with $n_{x^*}$, $n_{x^*+1}$ and $m$ as in displays (1.12) and (1.13). Then estimate $\sigma$ using $(0, \Lambda_{1,m})$, where $\Lambda_{1,m}$ is taken from Table 1.6. The method thus uses three ingredients: the values $\Lambda_0$ for range 0 samples, the values $\Lambda_{1,m}$ for range 1 samples, and the values $d(n,\alpha)$ in place of $\chi^2_{\alpha:1}$ in display (1.11). Together they finally produce a reliable method of confidence interval estimation for $\sigma$ when normal data are integer-rounded.

Table 1.4: $d(n,\alpha)$ for Use in Estimating $\sigma$

   n    α = .05   α = .10
   2    10.47      7.71
   3     7.26      5.23
   4     6.15      4.39
   5     5.58      3.97
   6     5.24      3.71
   7     5.01      3.54
   8     4.84      3.42
   9     4.72      3.33
  10     4.62      3.26
  15     4.34      3.06
  20     4.21      2.97
  30     4.08      2.88
   ∞     3.84      2.71

Table 1.5: $\Lambda_0$ for Use in Estimating $\sigma$

   n    α = .05   α = .10
   2    5.635     2.807
   3    1.325     0.916
   4    0.822     0.653
   5    0.666     0.558
   6    0.586     0.502
   7    0.533     0.464
   8    0.495     0.435
   9    0.466     0.413
  10    0.443     0.396
  11    0.425     0.381
  12    0.409     0.369
  13    0.396     0.358
  14    0.384     0.349
  15    0.374     0.341
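The recipe for $\sigma$ can be organized as a small decision rule. The Python sketch below is a hypothetical helper, not Lee's code, and it hard-codes only the $\alpha = .05$ columns of Tables 1.4 and 1.5. It returns $(0, \Lambda_0)$ for a 0-range sample. It refuses range 1 samples, for which $\Lambda_{1,m}$ from Table 1.6 applies. For ranges of 2 or more, it approximates the set (1.11), with $d(n,\alpha)$ in place of $\chi^2_{\alpha:1}$, by a grid search over the profile $L^{**}(\sigma)$. The grids are assumptions.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf

# alpha = .05 columns of Table 1.4 (d(n, alpha)) and Table 1.5 (Lambda_0).
D_05 = {2: 10.47, 3: 7.26, 4: 6.15, 5: 5.58, 6: 5.24, 7: 5.01, 8: 4.84,
        9: 4.72, 10: 4.62, 15: 4.34, 20: 4.21, 30: 4.08}
LAMBDA0_05 = {2: 5.635, 3: 1.325, 4: 0.822, 5: 0.666, 6: 0.586, 7: 0.533,
              8: 0.495, 9: 0.466, 10: 0.443, 11: 0.425, 12: 0.409,
              13: 0.396, 14: 0.384, 15: 0.374}


def loglik(mu, sigma, xs):
    """Integer-rounded normal loglikelihood L(mu, sigma)."""
    total = 0.0
    for x in xs:
        p = PHI((x + 0.5 - mu) / sigma) - PHI((x - 0.5 - mu) / sigma)
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
    return total


def sigma_interval_95(xs):
    """Approximate 95% interval for sigma from integer-rounded normal data."""
    n, spread = len(xs), max(xs) - min(xs)
    if spread == 0:
        return 0.0, LAMBDA0_05[n]                  # (0, Lambda_0) from Table 1.5
    if spread == 1:
        raise ValueError("range 1 sample: use (0, Lambda_{1,m}) from Table 1.6")
    cutoff = D_05[n]                               # KeyError if n is not tabled
    mus = [min(xs) + spread * i / 50 for i in range(51)]
    sigmas = [0.02 * k for k in range(1, 251)]     # assumed grid 0.02 .. 5.0
    profile = [max(loglik(mu, s, xs) for mu in mus) for s in sigmas]  # L**(sigma)
    best = max(profile)
    kept = [s for s, v in zip(sigmas, profile) if v > best - cutoff / 2]
    return min(kept), max(kept)


print(sigma_interval_95([7, 7, 7]))                # range 0: (0.0, 1.325)
print(sigma_interval_95([2, 3, 3, 4, 4, 4, 5, 5, 6, 4]))
```

For the range 2 or more case, the sketch accepts only the sample sizes tabled in Table 1.4. Other $n$ raise a `KeyError`.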
Table 1.6: $\Lambda_{1,m}$ for Use in Estimating $\sigma$ ($m$ in Parentheses)

  α = .05
   n    Λ_{1,m} (m)
   2    16.914(1)
   3     3.535(2)
   4     1.699(3)  2.034(2)
   5     1.143(4)  1.516(3)
   6     0.897(5)  1.153(4)  1.285(3)
   7     0.768(6)  0.944(5)  1.106(4)
   8     0.687(7)  0.819(6)  0.952(5)  1.009(4)
   9     0.629(8)  0.736(7)  0.837(6)  0.941(5)
  10     0.585(9)  0.677(8)  0.747(7)  0.851(6)  0.890(5)
  11     0.550(10) 0.630(9)  0.690(8)  0.775(7)  0.851(6)
  12     0.522(11) 0.593(10) 0.646(9)  0.708(8)  0.789(7)  0.818(6)
  13     0.499(12) 0.563(11) 0.610(10) 0.658(9)  0.733(8)  0.791(7)
  14     0.479(13) 0.537(12) 0.580(11) 0.622(10) 0.681(9)  0.745(8)  0.768(7)
  15     0.463(14) 0.515(13) 0.555(12) 0.593(11) 0.639(10) 0.701(9)  0.748(8)

  α = .10
   n    Λ_{1,m} (m)
   2     8.439(1)
   3     2.462(2)
   4     1.303(3)  1.571(2)
   5     0.921(4)  1.231(3)
   6     0.752(5)  0.960(4)  1.054(3)
   7     0.660(6)  0.800(5)  0.949(4)
   8     0.599(7)  0.707(6)  0.825(5)  0.880(4)
   9     0.555(8)  0.644(7)  0.726(6)  0.831(5)
  10     0.520(9)  0.597(8)  0.654(7)  0.753(6)  0.793(5)
  11     0.493(10) 0.560(9)  0.609(8)  0.685(7)  0.763(6)
  12     0.470(11) 0.531(10) 0.573(9)  0.626(8)  0.707(7)  0.738(6)
  13     0.452(12) 0.506(11) 0.544(10) 0.587(9)  0.655(8)  0.716(7)
  14     0.436(13) 0.485(12) 0.520(11) 0.558(10) 0.607(9)  0.674(8)  0.698(7)
  15     0.422(14) 0.468(13) 0.499(12) 0.534(11) 0.574(10) 0.632(9)  0.682(8)

Chapter 2 Process Monitoring

Chapters 3 and 4 of V&J discuss methods for process monitoring. The key concept there regarding the probabilistic description of monitoring schemes is the run length idea introduced on page 91 and specifically in display (3.44). V&J give theory for describing run lengths only for the very simplest case of geometrically distributed $T$. This chapter presents some more general tools for the analysis and comparison of run length distributions of monitoring schemes. These are discrete time finite state Markov chains, and recursions expressed in terms of integral (and difference) equations.
2.1 Some Theory for Stationary Discrete Time Finite State Markov Chains With a Single Absorbing State

These are probability models for random systems that at times $t = 1, 2, 3, \ldots$ can be in one of a finite number of states $S_1, S_2, \ldots, S_m, S_{m+1}$. The "Markov" assumption concerns the conditional distribution of where the system is at time $t+1$, given the entire history of where it has been up through time $t$. The assumption is that this distribution depends only upon where the system is at time $t$. (In colloquial terms: the conditional distribution of where I'll be tomorrow given where I am and how I got here depends only on where I am, not on how I got here.) So-called "stationary" Markov Chain (MC) models assume that movement between states from any time $t$ to time $t+1$ is governed by a single matrix of one-step "transition probabilities" that does not depend on $t$,
$$\underset{(m+1)\times(m+1)}{P} = (p_{ij}) ,$$
where
$$p_{ij} = P[\text{system is in } S_j \text{ at time } t+1 \mid \text{system is in } S_i \text{ at time } t] .$$

As a simple example of this, consider the transition matrix
$$\underset{3\times 3}{P} = \begin{pmatrix} .8 & .1 & .1 \\ .9 & .05 & .05 \\ 0 & 0 & 1 \end{pmatrix} . \qquad (2.1)$$
Figure 2.1 is a useful schematic representation of this model.

[Figure 2.1: Schematic for a MC with Transition Matrix (2.1)]

The Markov Chain represented by Figure 2.1 has an interesting property. While it is possible to move back and forth between states 1 and 2, once the system enters state 3, it is "stuck" there. The standard jargon for this property is to say that $S_3$ is an absorbing state. (In general, if $p_{ii} = 1$, $S_i$ is called an absorbing state.) Of particular interest in applications of MCs to the description of process monitoring schemes are chains with a single absorbing state, say $S_{m+1}$, where it is possible to move (at least eventually) from any other state to the absorbing state.
One thing that makes these chains so useful is that it is very easy to write down a matrix formula for a vector giving the mean number of transitions required to reach $S_{m+1}$ from any of the other states. That is, let $L_i$ be the mean number of transitions required to move from $S_i$ to $S_{m+1}$, and set
$$\underset{m\times 1}{L} = \begin{pmatrix} L_1 \\ L_2 \\ \vdots \\ L_m \end{pmatrix}, \quad \underset{(m+1)\times(m+1)}{P} = \begin{pmatrix} \underset{m\times m}{R} & \underset{m\times 1}{r} \\ \underset{1\times m}{0} & \underset{1\times 1}{1} \end{pmatrix}, \quad\text{and}\quad \underset{m\times 1}{\mathbf{1}} = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} .$$
Then it is the case that
$$L = (I - R)^{-1}\mathbf{1} . \qquad (2.2)$$

To argue that display (2.2) is correct, note that the following system of $m$ equations "clearly" holds:
$$\begin{aligned}
L_1 &= (1 + L_1)p_{11} + (1 + L_2)p_{12} + \cdots + (1 + L_m)p_{1m} + 1\cdot p_{1,m+1} \\
L_2 &= (1 + L_1)p_{21} + (1 + L_2)p_{22} + \cdots + (1 + L_m)p_{2m} + 1\cdot p_{2,m+1} \\
&\;\;\vdots \\
L_m &= (1 + L_1)p_{m1} + (1 + L_2)p_{m2} + \cdots + (1 + L_m)p_{mm} + 1\cdot p_{m,m+1} .
\end{aligned}$$
Because each row of $P$ sums to 1, this set is equivalent to the set
$$\begin{aligned}
L_1 &= 1 + p_{11}L_1 + p_{12}L_2 + \cdots + p_{1m}L_m \\
L_2 &= 1 + p_{21}L_1 + p_{22}L_2 + \cdots + p_{2m}L_m \\
&\;\;\vdots \\
L_m &= 1 + p_{m1}L_1 + p_{m2}L_2 + \cdots + p_{mm}L_m ,
\end{aligned}$$
and in matrix notation, this second set of equations is
$$L = \mathbf{1} + RL . \qquad (2.3)$$
So
$$L - RL = \mathbf{1} , \quad\text{i.e.}\quad (I - R)L = \mathbf{1} .$$
Under the conditions of the present discussion, $(I - R)$ is guaranteed to be nonsingular. Multiplying both sides of this matrix equation by the inverse of $(I - R)$ then gives equation (2.2).

For the simple 3-state example with transition matrix (2.1), it is easy enough to verify that with
$$R = \begin{pmatrix} .8 & .1 \\ .9 & .05 \end{pmatrix}$$
one has
$$(I - R)^{-1}\mathbf{1} = \begin{pmatrix} 10.5 \\ 11 \end{pmatrix} .$$
That is, the mean number of transitions required for absorption (into $S_3$) from $S_1$ is 10.5, while the mean number required from $S_2$ is 11.0. When one is working with numerical values in $P$ and thus wants numerical values in $L$, the matrix formula (2.2) is most convenient for use with numerical analysis software.
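Formula (2.2) is a one-line computation once a linear solver is available. The Python sketch below is pure Python, so it includes a small Gauss-Jordan solver; with NumPy one would normally call `numpy.linalg.solve`. It solves $(I - R)L = \mathbf{1}$ rather than forming the inverse, and it reproduces the values 10.5 and 11 for the example (2.1). The helper names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mean_absorption_times(R):
    """L = (I - R)^{-1} 1 for the non-absorbing block R of a transition matrix."""
    m = len(R)
    i_minus_r = [[(1.0 if i == j else 0.0) - R[i][j] for j in range(m)]
                 for i in range(m)]
    return solve(i_minus_r, [1.0] * m)   # solve rather than form the inverse


R = [[0.8, 0.1],
     [0.9, 0.05]]
print(mean_absorption_times(R))   # approximately [10.5, 11.0]
```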
When, on the other hand, one has algebraic expressions for the $p_{ij}$ and wants algebraic expressions for the $L_i$, a different approach usually works better. One writes out the system of equations represented by display (2.3) and tries to see some slick way of solving for an $L_i$ of interest.

The discussion in this section has centered on the computation of mean times to absorption. Other properties of "time to absorption" variables can also be derived and expressed in matrix notation. For example, Problem 2.22 shows that it is fairly easy to find the variance (or standard deviation) of time to absorption variables.

2.2 Some Applications of Markov Chains to the Analysis of Process Monitoring Schemes

Sometimes the "current condition" of a process monitoring scheme can be thought of as a discrete random variable with a finite number of possible values. This can happen because

1. the variables $Q_1, Q_2, \ldots$ fed into it are intrinsically discrete (for example, representing counts) and are therefore naturally modeled using a discrete probability distribution (and the calculations prescribed by the scheme produce only a fixed number of possible outcomes),
2. "discretization" of the $Q$'s has taken place as a part of the development of the monitoring scheme (as, for example, in the "zone test" schemes outlined in Tables 3.5 through 3.7 of V&J), or
3. one approximates continuous distributions for $Q$'s and/or states of the scheme with a "finely-discretized" version in order to approximate exact (continuous) run length properties.

In such cases one can often apply the material of the previous section to the prediction of scheme behavior. This is possible when the evolution of the monitoring scheme can be thought of in terms of movement between "states". The conditional distribution of the next "state" must depend only on a distribution for the next $Q$, which itself depends only on the current "state" of the scheme.
This section contains four examples of what can be done in this direction.

As an initial simple example, consider the monitoring scheme suggested in the book Sampling Inspection and Quality Control by Wetherill. It signals an alarm the first time

1. a single point $Q$ plots "outside 3 sigma limits," or
2. two consecutive $Q$'s plot "between 2 and 3 sigma limits."

(This is a simple competitor to the sets of alarm rules specified in Tables 3.5 through 3.7 of V&J.) Suppose that one assumes that $Q_1, Q_2, \ldots$ are iid, and let
$$q_1 = P[Q_1 \text{ plots outside 3 sigma limits}] \quad\text{and}\quad q_2 = P[Q_1 \text{ plots between 2 and 3 sigma limits}] .$$
Then one might describe the evolution of the monitoring scheme with a 3-state MC with states:

- $S_1$ = "all is OK,"
- $S_2$ = "no alarm yet and the current $Q$ is between 2 and 3 sigma limits," and
- $S_3$ = "alarm."

For this representation, an appropriate transition matrix is
$$P = \begin{pmatrix} 1 - q_1 - q_2 & q_2 & q_1 \\ 1 - q_1 - q_2 & 0 & q_1 + q_2 \\ 0 & 0 & 1 \end{pmatrix} . \qquad (2.4)$$
Under the iid model for the $Q$ sequence, the ARL of the scheme is $L_1$, the mean time to absorption into the alarm state from the "all-OK" state. Figure 2.2 is a schematic representation of this scenario.

[Figure 2.2: Schematic for a MC with Transition Matrix (2.4)]

It is worth noting that a system of equations for $L_1$ and $L_2$ is
$$\begin{aligned}
L_1 &= 1\cdot q_1 + (1 + L_2)q_2 + (1 + L_1)(1 - q_1 - q_2) \\
L_2 &= 1\cdot(q_1 + q_2) + (1 + L_1)(1 - q_1 - q_2) ,
\end{aligned}$$
which is equivalent to
$$\begin{aligned}
L_1 &= 1 + L_1(1 - q_1 - q_2) + L_2 q_2 \\
L_2 &= 1 + L_1(1 - q_1 - q_2) .
\end{aligned}$$
This is the "non-matrix version" of the system (2.3) for this example.
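For a concrete case, assume (for illustration) that the standardized $Q$'s are Normal$(\delta, 1)$, so that $q_1$ and $q_2$ are normal tail and zone probabilities. The Python sketch below uses hypothetical names. It obtains $(L_1, L_2)$ by inverting the $2\times 2$ matrix $I - R$ from (2.4), then compares $L_1$ with the closed-form solution given next in the text.

```python
from statistics import NormalDist


def zone_probs(delta=0.0):
    """q1 = P[outside 3 sigma limits], q2 = P[between 2 and 3 sigma limits]
    for a standardized Q ~ Normal(delta, 1)."""
    cdf = NormalDist(delta, 1.0).cdf
    q1 = cdf(-3.0) + (1.0 - cdf(3.0))
    q2 = (cdf(-2.0) - cdf(-3.0)) + (cdf(3.0) - cdf(2.0))
    return q1, q2


def arls_from_matrix(q1, q2):
    """(L1, L2) = (I - R)^{-1} 1, with R the upper-left 2 x 2 block of (2.4)."""
    a = 1.0 - q1 - q2                  # P[Q plots inside the 2 sigma limits]
    det = (1.0 - a) - q2 * a           # determinant of I - R = [[1-a, -q2], [-a, 1]]
    inverse = [[1.0 / det, q2 / det],
               [a / det, (1.0 - a) / det]]
    return [row[0] + row[1] for row in inverse]


def arl_closed_form(q1, q2):
    a = 1.0 - q1 - q2
    return (1.0 + q2) / (1.0 - a - q2 * a)


q1, q2 = zone_probs()
print(arls_from_matrix(q1, q2)[0], arl_closed_form(q1, q2))   # both about 224.4
```

With $\delta = 0$, both computations give an ARL of about 224.4. The second entry returned satisfies $L_2 = 1 + (1 - q_1 - q_2)L_1$, as the second equation above requires.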
It is easy enough to verify that this system of two linear equations in the unknowns $L_1$ and $L_2$ has a (simultaneous) solution with
$$L_1 = \frac{1 + q_2}{1 - (1 - q_1 - q_2) - q_2(1 - q_1 - q_2)} .$$

As a second application of MC technology to the analysis of a process monitoring scheme, we will consider a so-called "Run-Sum" scheme.

To define such a scheme, one begins with "zones" for the variable $Q$ as indicated in Figure 3.9 of V&J. Then "scores" are defined for various possible values of $Q$. For $j = 0, 1, 2$, a score of $+j$ is assigned to the eventuality that $Q$ is in the "positive $j$-sigma to $(j+1)$-sigma zone." A score of $-j$ is assigned to the eventuality that $Q$ is in the "negative $j$-sigma to $(j+1)$-sigma zone." A score of $+3$ is assigned to any $Q$ above the "upper 3-sigma limit," while a score of $-3$ is assigned to any $Q$ below the "lower 3-sigma limit."

Then, for the variables $Q_1, Q_2, \ldots$, one defines corresponding scores $Q^*_1, Q^*_2, \ldots$ and "run sums" $R_1, R_2, \ldots$, where

$R_i$ = "the 'sum' of scores $Q^*$ through time $i$, under the provision that a new sum is begun whenever a score is observed with a sign different from the existing Run-Sum."

(Note, for example, that a new score of $Q^* = +0$ will reset a current Run-Sum of $R = -2$ to $+0$.) The Run-Sum scheme then signals at the first $i$ for which $|Q^*_i| = 3$ or $|R_i| \geq 4$.
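The scoring and reset rules just stated are mechanical enough to generate the Markov chain automatically. The Python sketch below assumes (for illustration) a standardized $Q \sim$ Normal$(\delta, 1)$. It encodes each non-alarm run sum as a (sign, size) pair, so that $-0$ and $+0$ stay distinct. It applies the Run-Sum update to every possible score and solves (2.2) for the mean times to alarm. All names are hypothetical. The eight non-alarm states are ordered as $S_1, \ldots, S_8$ in the state list that follows.

```python
from statistics import NormalDist


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Non-alarm states S1..S8 as (sign, |R|): -0, -1, -2, -3, +0, +1, +2, +3.
STATES = [(-1, 0), (-1, 1), (-1, 2), (-1, 3), (1, 0), (1, 1), (1, 2), (1, 3)]
EDGES = [0.0, 1.0, 2.0, 3.0, float("inf")]   # zone boundaries in sigma units


def score_probs(delta=0.0):
    """q_j = P[Q* = (sign, k)] for a standardized Q ~ Normal(delta, 1)."""
    cdf = NormalDist(delta, 1.0).cdf
    q = {}
    for k in range(4):
        lo, hi = EDGES[k], EDGES[k + 1]
        q[(1, k)] = cdf(hi) - cdf(lo)        # zone k above the center line
        q[(-1, k)] = cdf(-lo) - cdf(-hi)     # mirror-image zone below it
    return q


def next_state(state, score):
    """Run-Sum update; returns None when the scheme signals."""
    sign, size = state
    s_sign, s_size = score
    if s_size == 3:
        return None                                         # |Q*| = 3
    new_size = size + s_size if s_sign == sign else s_size  # restart on sign change
    return None if new_size >= 4 else (s_sign, new_size)    # |R| >= 4


def run_sum_arls(delta=0.0):
    """Mean times to alarm from S1..S8 via L = (I - R)^{-1} 1."""
    q = score_probs(delta)
    index = {s: i for i, s in enumerate(STATES)}
    R = [[0.0] * 8 for _ in range(8)]
    for s in STATES:
        for score, p in q.items():
            nxt = next_state(s, score)
            if nxt is not None:
                R[index[s]][index[nxt]] += p
    A = [[(1.0 if i == j else 0.0) - R[i][j] for j in range(8)] for i in range(8)]
    return solve(A, [1.0] * 8)


L = run_sum_arls()
print(L[0], L[4])      # equal, since rows 1 and 5 of P coincide
```

Building the matrix from the update rule, rather than typing it in, is a cheap guard against transcription errors in a $9\times 9$ transition matrix.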
Then define states for a Run-Sum process monitoring scheme:

$S_1$ = "no alarm yet and $R = -0$,"
$S_2$ = "no alarm yet and $R = -1$,"
$S_3$ = "no alarm yet and $R = -2$,"
$S_4$ = "no alarm yet and $R = -3$,"
$S_5$ = "no alarm yet and $R = +0$,"
$S_6$ = "no alarm yet and $R = +1$,"
$S_7$ = "no alarm yet and $R = +2$,"
$S_8$ = "no alarm yet and $R = +3$," and
$S_9$ = "alarm."

Assume that the observations $Q_1, Q_2, \ldots$ are iid, and for $j = -3, -2, -1, -0, +0, +1, +2, +3$ let $q_j = P[Q^*_1 = j]$. An appropriate transition matrix for describing the evolution of the scheme is then
$$P = \begin{pmatrix}
q_{-0} & q_{-1} & q_{-2} & 0 & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3} + q_{+3} \\
0 & q_{-0} & q_{-1} & q_{-2} & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3} + q_{+3} \\
0 & 0 & q_{-0} & q_{-1} & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3} + q_{-2} + q_{+3} \\
0 & 0 & 0 & q_{-0} & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3} + q_{-2} + q_{-1} + q_{+3} \\
q_{-0} & q_{-1} & q_{-2} & 0 & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3} + q_{+3} \\
q_{-0} & q_{-1} & q_{-2} & 0 & 0 & q_{+0} & q_{+1} & q_{+2} & q_{-3} + q_{+3} \\
q_{-0} & q_{-1} & q_{-2} & 0 & 0 & 0 & q_{+0} & q_{+1} & q_{-3} + q_{+2} + q_{+3} \\
q_{-0} & q_{-1} & q_{-2} & 0 & 0 & 0 & 0 & q_{+0} & q_{-3} + q_{+1} + q_{+2} + q_{+3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$
and the ARL for the scheme is $L_1 = L_5$. (The 1st and 5th rows of $P$ are identical, which makes it clear that the mean times to absorption from $S_1$ and $S_5$ must be the same.) It turns out that clever manipulation with the "non-matrix" version of display (2.3) in this example even produces a fairly simple expression for the scheme's ARL. (In this final regard, see Problem 2.24, Reynolds (1971 JQT), and the references therein.)

To turn to a different type of application of the MC technology, consider the analysis of a high side decision interval CUSUM scheme as described in §4.2 of V&J. Suppose that the variables $Q_1, Q_2, \ldots$ are iid with a continuous distribution specified by the probability density $f(y)$.

[Figure 2.3: Notational Conventions for Probabilities from Rounding $Q - k_1$ Values. The figure shows $q_{-m}, \ldots, q_{-1}, q_0, q_1, \ldots, q_m$ as areas under $f^*(y)$ around the points $-h, \ldots, -h/m, 0, h/m, 2h/m, \ldots, h$.]
Then the variables $Q_1 - k_1, Q_2 - k_1, Q_3 - k_1, \ldots$ are iid with probability density $f^*(y) = f(y + k_1)$. For a positive integer $m$, we will think of replacing the variables $Q_i - k_1$ with versions of them rounded to the nearest multiple of $h/m$ before CUSUMing. The CUSUM scheme can then be thought of in terms of a MC with states
$$S_i = \text{"no alarm yet and the current CUSUM is } (i - 1)\left(\tfrac{h}{m}\right)\text{"}$$
for $i = 1, 2, \ldots, m$, and $S_{m+1}$ = "alarm." Then let
$$q_{-m} = \int_{-\infty}^{-h + \frac12\left(\frac hm\right)} f^*(y)\,dy = P\left[Q_1 - k_1 \leq -h + \tfrac12\left(\tfrac hm\right)\right] ,$$
$$q_m = \int_{h - \frac12\left(\frac hm\right)}^{\infty} f^*(y)\,dy = P\left[h - \tfrac12\left(\tfrac hm\right) < Q_1 - k_1\right] ,$$
and for $-m < j < m$ take
$$q_j = \int_{j\left(\frac hm\right) - \frac12\left(\frac hm\right)}^{j\left(\frac hm\right) + \frac12\left(\frac hm\right)} f^*(y)\,dy . \qquad (2.5)$$
These notational conventions for probabilities $q_{-m}, \ldots, q_m$ are illustrated in Figure 2.3.

In this notation, the evolution of the high side decision interval CUSUM scheme can be described in approximate terms by a MC with transition matrix
$$\underset{(m+1)\times(m+1)}{P} = \begin{pmatrix}
\sum_{j=-m}^{0} q_j & q_1 & q_2 & \cdots & q_{m-1} & q_m \\
\sum_{j=-m}^{-1} q_j & q_0 & q_1 & \cdots & q_{m-2} & q_{m-1} + q_m \\
\sum_{j=-m}^{-2} q_j & q_{-1} & q_0 & \cdots & q_{m-3} & q_{m-2} + q_{m-1} + q_m \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
q_{-m} + q_{-m+1} & q_{-m+2} & q_{-m+3} & \cdots & q_0 & \sum_{j=1}^{m} q_j \\
0 & 0 & 0 & \cdots & 0 & 1
\end{pmatrix} .$$
For $i = 1, \ldots, m$, the mean time to absorption from state $S_i$ ($L_i$) is approximately the ARL of the scheme with head start $(i - 1)\left(\frac hm\right)$. (That is, the entries of the vector $L$ specified in display (2.2) are approximate ARL values for the CUSUM scheme using various possible head starts.) In practice, one wants ARLs for the original scheme with non-rounded iid observations $Q$. To get them, one would find approximate ARL values for an increasing sequence of $m$'s until those appear to converge for the head start of interest.
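The construction above translates directly into code. The Python sketch below assumes, for illustration only, that $Q \sim$ Normal$(\mu, \sigma^2)$. It computes the $q_j$ of (2.5) and fills in $I - R$ row by row: a CUSUM that would go negative is floored at 0, and one that would reach $h$ is absorbed. It returns the approximate head start ARLs $L_1, \ldots, L_m$. The names, and the choice $k_1 = .5$, $h = 4$ in the usage lines, are hypothetical.

```python
from statistics import NormalDist


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def cusum_mc_arls(k1, h, m, mu=0.0, sigma=1.0):
    """Approximate ARLs for head starts 0, h/m, ..., (m-1)h/m from the MC."""
    F = NormalDist(mu - k1, sigma).cdf        # cdf of Q - k1
    step = h / m

    def q(j):                                  # probabilities (2.5)
        if j == -m:
            return F(-h + step / 2)
        if j == m:
            return 1.0 - F(h - step / 2)
        return F(j * step + step / 2) - F(j * step - step / 2)

    qs = [(j, q(j)) for j in range(-m, m + 1)]
    i_minus_r = [[1.0 if r == c else 0.0 for c in range(m)] for r in range(m)]
    for i in range(m):                         # state S_{i+1}: CUSUM = i * step
        for j, p in qs:
            new = i + j
            if new < m:                        # otherwise absorbed (alarm)
                i_minus_r[i][max(new, 0)] -= p # CUSUM is floored at zero
    return solve(i_minus_r, [1.0] * m)


arls = cusum_mc_arls(0.5, 4.0, 100)
shifted = cusum_mc_arls(0.5, 4.0, 100, mu=1.0)
print(arls[0], arls[50], shifted[0])   # head starts 0 and 2.0; then 0 after a shift
```

In practice one would repeat the call with larger $m$ and stop once the head start ARL of interest settles down, as described above.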
As a final example of the use of MC techniques in the probability modeling of process monitoring scheme behavior, consider discrete approximation of the EWMA schemes of §4.1 of V&J. Here the variables $Q_1, Q_2, \ldots$ are again iid with continuous distribution specified by a pdf $f(y)$. In this case, it will not typically suffice to simply discretize the variables $Q$ in order to provide a tractable discrete approximation. (The EWMA calculations would then typically produce a number of possible exact EWMA values that grows as time goes on.) Instead, it is necessary to think directly in terms of rounded/discretized EWMAs. So for an odd positive integer $m$, let $\Delta = (UCL_{EWMA} - LCL_{EWMA})/m$. Think of replacing an (exact) EWMA sequence with a rounded EWMA sequence taking on values $a_i$ defined by
$$a_i = LCL_{EWMA} + \frac{\Delta}{2} + (i - 1)\Delta$$
for $i = 1, 2, \ldots, m$. For $i = 1, 2, \ldots, m$ let $S_i$ = "no alarm yet and the rounded EWMA is $a_i$," and let $S_{m+1}$ = "alarm." And for $1 \leq i, j \leq m$, let
$$\begin{aligned}
q_{ij} &= P[\text{moving from } S_i \text{ to } S_j] \\
&= P\left[a_j - \frac{\Delta}{2} \leq (1 - \lambda)a_i + \lambda Q \leq a_j + \frac{\Delta}{2}\right] \\
&= P\left[\frac{a_j - (1 - \lambda)a_i}{\lambda} - \frac{\Delta}{2\lambda} \leq Q \leq \frac{a_j - (1 - \lambda)a_i}{\lambda} + \frac{\Delta}{2\lambda}\right] \\
&= P\left[a_i + \frac{(j - i)\Delta}{\lambda} - \frac{\Delta}{2\lambda} \leq Q \leq a_i + \frac{(j - i)\Delta}{\lambda} + \frac{\Delta}{2\lambda}\right] \\
&= \int_{a_i + \frac{(j - i)\Delta}{\lambda} - \frac{\Delta}{2\lambda}}^{a_i + \frac{(j - i)\Delta}{\lambda} + \frac{\Delta}{2\lambda}} f(y)\,dy .
\end{aligned} \qquad (2.6)$$
Then with
$$P = \begin{pmatrix}
q_{11} & q_{12} & \cdots & q_{1m} & 1 - \sum_{j=1}^m q_{1j} \\
q_{21} & q_{22} & \cdots & q_{2m} & 1 - \sum_{j=1}^m q_{2j} \\
\vdots & \vdots & & \vdots & \vdots \\
q_{m1} & q_{m2} & \cdots & q_{mm} & 1 - \sum_{j=1}^m q_{mj} \\
0 & 0 & \cdots & 0 & 1
\end{pmatrix} ,$$
consider the mean time to absorption from the state $S_{(m+1)/2}$ (the value $L_{(m+1)/2}$) of a MC with this transition matrix. It is an approximation for the EWMA scheme ARL with $EWMA_0 = (UCL_{EWMA} + LCL_{EWMA})/2$. In practice, in order to find the ARL for the original scheme, one would find approximate ARL values for an increasing sequence of $m$'s until those appear to converge.
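A corresponding Python sketch for the EWMA chain follows. It assumes $Q \sim$ Normal$(\mu, \sigma^2)$, evaluates $q_{ij}$ from display (2.6), and returns $L_{(m+1)/2}$. The usage lines assume $\lambda = .2$ and limits at $\pm 3\sqrt{\lambda/(2 - \lambda)}$ for standardized data. These choices and the function names are illustrative and are not taken from V&J.

```python
from statistics import NormalDist


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ewma_mc_arl(lam, lcl, ucl, m, mu=0.0, sigma=1.0):
    """Approximate ARL with EWMA_0 = (LCL + UCL)/2 from the rounded-EWMA chain."""
    if m % 2 == 0:
        raise ValueError("m must be odd so that a middle state exists")
    cdf = NormalDist(mu, sigma).cdf
    delta = (ucl - lcl) / m
    a = [lcl + delta / 2 + i * delta for i in range(m)]      # a_1, ..., a_m
    half = delta / (2 * lam)
    i_minus_r = []
    for i in range(m):
        row = []
        for j in range(m):
            centre = a[i] + (j - i) * delta / lam
            q_ij = cdf(centre + half) - cdf(centre - half)    # display (2.6)
            row.append((1.0 if i == j else 0.0) - q_ij)
        i_minus_r.append(row)
    L = solve(i_minus_r, [1.0] * m)
    return L[(m - 1) // 2]                                    # state S_{(m+1)/2}


lam = 0.2
width = 3.0 * (lam / (2.0 - lam)) ** 0.5     # assumed 3-sigma asymptotic limits
arl0 = ewma_mc_arl(lam, -width, width, 101)
arl1 = ewma_mc_arl(lam, -width, width, 101, mu=1.0)
print(arl0, arl1)
```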
The four examples in this section have illustrated the use of MC calculations in the second and third of the three circumstances listed at the beginning of this section. The first circumstance is conceptually the simplest of the three, and is illustrated, for example, by Problems 2.25, 2.28 and 2.37. The examples have also all dealt with iid models for the $Q_1, Q_2, \ldots$ sequence. Problem 2.26 shows that the methodology can also easily accommodate some kinds of dependencies in the $Q$ sequence. The discrete model in Problem 2.26 is itself perhaps less than completely appealing. Still, before dismissing the basic concept illustrated in Problem 2.26 as useless, the reader should consider discrete approximation of the kind of dependency structure employed in Problem 2.27.

2.3 Integral Equations and Run Length Properties of Process Monitoring Schemes

There is a second (and at first appearance quite different) standard method of approaching the analysis of the run length behavior of some process monitoring schemes where continuous variables $Q$ are involved. That method uses integral equations, and this section introduces them. (As it turns out, by the time one is forced to find numerical solutions of the integral equations, there is not a whole lot of difference between the methods of this section and those of the previous one. But it is important to introduce this second point of view and note the correspondence between approaches.)

Before going to the details of specific schemes and integral equations, a small piece of calculus/numerical analysis needs to be reviewed and notation set for use in these notes. That concerns the approximation of definite integrals on the interval $[a, a + h]$.
Specification of a set of points
$$a \leq a_1 \leq a_2 \leq \cdots \leq a_m \leq a + h$$
and weights
$$w_i \geq 0 \quad\text{with}\quad \sum_{i=1}^m w_i = h ,$$
such that
$$\int_a^{a+h} f(y)\,dy \quad\text{may be approximated as}\quad \sum_{i=1}^m w_i f(a_i)$$
for "reasonable" functions $f(y)$, is the specification of a so-called "quadrature rule" for approximating integrals on the interval $[a, a + h]$. The simplest such rule is probably the choice
$$a_i = a + \left(\frac{i - \frac12}{m}\right)h \quad\text{with}\quad w_i = \frac{h}{m} . \qquad (2.7)$$
(This choice amounts to approximating an integral of $f$ by a sum of signed areas of rectangles. The rectangles have bases $h/m$, and their (signed) heights are the values of $f$ at the midpoints of intervals of length $h/m$ beginning at $a$.)

Now consider a high side CUSUM scheme as in §4.2 of V&J, where $Q_1, Q_2, \ldots$ are iid with continuous marginal distribution specified by the probability density $f(y)$. Define the function
$$L_1(u) = \text{the ARL of the high side CUSUM scheme using a head start of } u .$$
If one begins CUSUMing at $u$, there are three possibilities of where he/she will be after a single observation, $Q_1$.

- If $Q_1$ is large ($Q_1 - k_1 \geq h - u$), then there will be an immediate signal and the run length will be 1.
- If $Q_1$ is small ($Q_1 - k_1 \leq -u$), the CUSUM will "zero out" and one observation will have been "spent." On average, $L_1(0)$ more observations are then to be faced in order to produce a signal.
- Finally, if $Q_1$ is moderate ($-u < Q_1 - k_1 < h - u$), then one observation will have been spent and the CUSUM will continue from $u + (Q_1 - k_1)$. On average, an additional $L_1(u + (Q_1 - k_1))$ observations are then required to produce a signal.

This reasoning leads to the equation for $L_1$,
$$L_1(u) = 1\cdot P[Q_1 - k_1 \geq h - u] + (1 + L_1(0))P[Q_1 - k_1 \leq -u] + \int_{k_1 - u}^{k_1 + h - u} (1 + L_1(u + y - k_1))f(y)\,dy .$$
Writing $F(y)$ for the cdf of $Q_1$ and simplifying slightly, this is
$$L_1(u) = 1 + L_1(0)F(k_1 - u) + \int_0^h L_1(y)f(y + k_1 - u)\,dy . \qquad (2.8)$$
The argument leading to equation (2.8) has a twin that produces an integral equation for
$$L_2(v) = \text{the ARL of a low side CUSUM scheme using a head start of } v .$$
That equation is
$$L_2(v) = 1 + L_2(0)\left(1 - F(k_2 - v)\right) + \int_{-h}^0 L_2(y)f(y + k_2 - v)\,dy . \qquad (2.9)$$
Display (4.20) of V&J indicates why solving these equations matters. If one could solve equations (2.8) and (2.9), and thus obtain $L_1(0)$ and $L_2(0)$, one would have not only separate high and low side CUSUM ARLs, but ARLs for some combined schemes as well.

(Actually, more than what is stated in V&J can be proved. Yashchin, in a Journal of Applied Probability paper in about 1985, considered iid $Q$'s, high side decision interval $h_1$, and low side decision interval $-h_2$ for nonnegative $h_2$. He showed that if $k_1 \geq k_2$ and
$$(k_1 - k_2) - |h_1 - h_2| \geq \max\left(0, u - v - \max(h_1, h_2)\right) ,$$
then for the simultaneous use of high and low side schemes
$$ARL_{\text{combined}} = \frac{L_1(0)L_2(v) + L_1(u)L_2(0) - L_1(0)L_2(0)}{L_1(0) + L_2(0)} .$$
It is easily verified that what is stated on page 151 of V&J is a special case of this result.)

So in theory, to find ARLs for CUSUM schemes one need "only" solve the integral equations (2.8) and (2.9). This is easier said than done. The one case where fairly explicit solutions are known is that where observations are exponentially distributed (see Problem 2.30). In other cases one must resort to numerical solution of the integral equations.

So consider the problem of approximate solution of equation (2.8). Fix a particular quadrature rule for integrals on $[0, h]$. For each $a_i$, equation (2.8) then gives the approximation
$$L_1(a_i) \approx 1 + L_1(a_1)F(k_1 - a_i) + \sum_{j=1}^m w_j L_1(a_j)f(a_j + k_1 - a_i) .$$
(Here the unknown $L_1(0)$ has been replaced by $L_1(a_1)$, the value at the quadrature point nearest 0.)
That is, at least approximately, one has the system of $m$ linear equations
$$\begin{aligned}
L_1(a_1) &= 1 + L_1(a_1)[F(k_1 - a_1) + w_1 f(k_1)] + \sum_{j=2}^m L_1(a_j)w_j f(a_j + k_1 - a_1) , \\
L_1(a_2) &= 1 + L_1(a_1)[F(k_1 - a_2) + w_1 f(a_1 + k_1 - a_2)] + \sum_{j=2}^m L_1(a_j)w_j f(a_j + k_1 - a_2) , \\
&\;\;\vdots \\
L_1(a_m) &= 1 + L_1(a_1)[F(k_1 - a_m) + w_1 f(a_1 + k_1 - a_m)] + \sum_{j=2}^m L_1(a_j)w_j f(a_j + k_1 - a_m)
\end{aligned}$$
in the $m$ unknowns $L_1(a_1), \ldots, L_1(a_m)$. Solving this set of equations gives approximate values of $L_1(a_1), \ldots, L_1(a_m)$. Then, again in light of equation (2.8) and the notion of numerical approximation of definite integrals, one may approximate the function $L_1(u)$ as
$$L_1(u) \approx 1 + L_1(a_1)F(k_1 - u) + \sum_{j=1}^m w_j L_1(a_j)f(a_j + k_1 - u) .$$

It is a revealing point that the system of equations above is of the form (2.3) that was so useful in the MC approach to the determination of ARLs. That is, let
$$L = \begin{pmatrix} L_1(a_1) \\ L_1(a_2) \\ \vdots \\ L_1(a_m) \end{pmatrix}$$
and
$$R = \begin{pmatrix}
F(k_1 - a_1) + w_1 f(k_1) & w_2 f(a_2 + k_1 - a_1) & \cdots & w_m f(a_m + k_1 - a_1) \\
F(k_1 - a_2) + w_1 f(a_1 + k_1 - a_2) & w_2 f(k_1) & \cdots & w_m f(a_m + k_1 - a_2) \\
\vdots & \vdots & & \vdots \\
F(k_1 - a_m) + w_1 f(a_1 + k_1 - a_m) & w_2 f(a_2 + k_1 - a_m) & \cdots & w_m f(k_1)
\end{pmatrix} .$$
The set of equations for the "$a_i$ head start approximate ARLs" is then exactly of the form (2.3).

With the simple quadrature rule in display (2.7), a generic entry $r_{ij}$ of $R$, for $j \geq 2$, is
$$r_{ij} = w_j f(a_j + k_1 - a_i) = \left(\frac{h}{m}\right) f\left((j - i)\left(\frac{h}{m}\right) + k_1\right) .$$
Now use again the notation $f^*(y) = f(y + k_1)$ employed in the CUSUM example of §2.2. This gives
$$r_{ij} = \left(\frac{h}{m}\right) f^*\left((j - i)\left(\frac{h}{m}\right)\right) \approx \int_{(j-i)\left(\frac hm\right) - \frac12\left(\frac hm\right)}^{(j-i)\left(\frac hm\right) + \frac12\left(\frac hm\right)} f^*(y)\,dy = q_{j-i}$$
(in terms of the notation (2.5) from the CUSUM example).
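The correspondence can be checked numerically. The Python sketch below uses hypothetical names and assumes $Q \sim$ Normal$(\mu, \sigma^2)$. It uses the midpoint rule (2.7) on $[0, h]$ and builds $I - R$ with $R$ as displayed above, so $L_1(0)$ is replaced by $L_1(a_1)$ in the first column. It solves for $L_1(a_1), \ldots, L_1(a_m)$ and returns the interpolation formula for $L_1(u)$. With the same $k_1$, $h$ and $m$, one would expect its results to be close to those of the Markov chain construction of §2.2.

```python
from statistics import NormalDist


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def cusum_ie_arl(k1, h, m, mu=0.0, sigma=1.0):
    """Return an approximation to L1(u) from (2.8), midpoint rule (2.7) on [0, h]."""
    dist = NormalDist(mu, sigma)
    F, f = dist.cdf, dist.pdf                  # cdf and pdf of Q itself
    w = h / m
    a = [(i + 0.5) * w for i in range(m)]      # quadrature nodes a_1, ..., a_m
    i_minus_r = []
    for i in range(m):
        row = []
        for j in range(m):
            r_ij = w * f(a[j] + k1 - a[i])
            if j == 0:
                r_ij += F(k1 - a[i])           # L1(0) is replaced by L1(a_1)
            row.append((1.0 if i == j else 0.0) - r_ij)
        i_minus_r.append(row)
    La = solve(i_minus_r, [1.0] * m)

    def L1(u):
        return 1.0 + La[0] * F(k1 - u) + sum(w * La[j] * f(a[j] + k1 - u)
                                             for j in range(m))
    return L1


L1_in = cusum_ie_arl(0.5, 4.0, 100)
L1_out = cusum_ie_arl(0.5, 4.0, 100, mu=1.0)
print(L1_in(0.0), L1_in(2.0), L1_out(0.0))
```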
The point is that it is largely immaterial which point of view one begins from. One can "discretize the $Q - k_1$ distribution and employ the MC material," or one can "do numerical solution of an integral equation." Either way, very similar large systems of linear equations must be solved in order to find approximate ARLs.

As a second application of integral equation ideas to the analysis of process monitoring schemes, consider the EWMA schemes of §4.1 of V&J. Here $Q_1, Q_2, \ldots$ are iid with a continuous distribution specified by the probability density $f(y)$. Let
$$L(u) = \text{the ARL of an EWMA scheme with } EWMA_0 = u .$$
When one begins an EWMA sequence at $u$, there are 2 possibilities of where he/she will be after a single observation, $Q_1$.

- If $Q_1$ is extreme ($\lambda Q_1 + (1 - \lambda)u > UCL_{EWMA}$ or $\lambda Q_1 + (1 - \lambda)u < LCL_{EWMA}$), then there will be an immediate signal and the run length will be 1.
- If $Q_1$ is moderate ($LCL_{EWMA} \leq \lambda Q_1 + (1 - \lambda)u \leq UCL_{EWMA}$), one observation will have been "spent." On average, $L(\lambda Q_1 + (1 - \lambda)u)$ more observations are then to be faced in order to produce a signal.

Now the event
$$LCL_{EWMA} \leq \lambda Q_1 + (1 - \lambda)u \leq UCL_{EWMA}$$
is the event
$$\frac{LCL_{EWMA} - (1 - \lambda)u}{\lambda} \leq Q_1 \leq \frac{UCL_{EWMA} - (1 - \lambda)u}{\lambda} ,$$
so this reasoning produces the equation
$$L(u) = 1\cdot\left(1 - P\left[\frac{LCL_{EWMA} - (1 - \lambda)u}{\lambda} \leq Q_1 \leq \frac{UCL_{EWMA} - (1 - \lambda)u}{\lambda}\right]\right) + \int_{\frac{LCL_{EWMA} - (1 - \lambda)u}{\lambda}}^{\frac{UCL_{EWMA} - (1 - \lambda)u}{\lambda}} \left(1 + L(\lambda y + (1 - \lambda)u)\right)f(y)\,dy ,$$
or
$$L(u) = 1 + \int_{\frac{LCL_{EWMA} - (1 - \lambda)u}{\lambda}}^{\frac{UCL_{EWMA} - (1 - \lambda)u}{\lambda}} L(\lambda y + (1 - \lambda)u)f(y)\,dy ,$$
or finally
$$L(u) = 1 + \frac{1}{\lambda}\int_{LCL_{EWMA}}^{UCL_{EWMA}} L(y)f\left(\frac{y - (1 - \lambda)u}{\lambda}\right)dy . \qquad (2.10)$$
As in the previous (CUSUM) case, one must usually resort to numerical methods in order to approximate the solution to equation (2.10). Fix a particular quadrature rule for integrals on $[LCL_{EWMA}, UCL_{EWMA}]$. For each $a_i$, equation (2.10) then gives the approximation
$$L(a_i) \approx 1 + \frac{1}{\lambda}\sum_{j=1}^m w_j L(a_j)f\left(\frac{a_j - (1 - \lambda)a_i}{\lambda}\right) . \qquad (2.11)$$
Expression (2.11) stands for a set of $m$ equations in the $m$ unknowns $L(a_1), \ldots, L(a_m)$. As in the CUSUM case, these can be thought of in terms of the matrix expression (2.3) if one takes
$$L = \begin{pmatrix} L(a_1) \\ \vdots \\ L(a_m) \end{pmatrix} \quad\text{and}\quad \underset{m\times m}{R} = \left(\frac{w_j f\left(\frac{a_j - (1 - \lambda)a_i}{\lambda}\right)}{\lambda}\right) . \qquad (2.12)$$
Solve the system represented by equation (2.11), or equivalently the matrix expression (2.3) with definitions (2.12). This produces approximate values for $L(a_1), \ldots, L(a_m)$, and therefore an approximation for the function $L(u)$ as
$$L(u) \approx 1 + \frac{1}{\lambda}\sum_{j=1}^m w_j L(a_j)f\left(\frac{a_j - (1 - \lambda)u}{\lambda}\right) .$$
Again as in the CUSUM case, it is worth noting how similar the two sets of equations are: the one used to find "MC" ARL approximations and the one used to find "integral equation" ARL approximations. Use the quadrature rule (2.7) with an odd integer $m$, and the notation $\Delta = (UCL_{EWMA} - LCL_{EWMA})/m$ employed in the EWMA example of §2.2. A generic entry of $R$ defined in (2.12) is then
$$r_{ij} = \frac{w_j f\left(\frac{a_j - (1 - \lambda)a_i}{\lambda}\right)}{\lambda} = \frac{\Delta}{\lambda} f\left(a_i + \frac{(j - i)\Delta}{\lambda}\right) \approx \int_{a_i + \frac{(j - i)\Delta}{\lambda} - \frac{\Delta}{2\lambda}}^{a_i + \frac{(j - i)\Delta}{\lambda} + \frac{\Delta}{2\lambda}} f(y)\,dy = q_{ij}$$
(in terms of the notation (2.6) from the EWMA example of §2.2). That is, as in the CUSUM case, the "MC" and "integral equation" approximations for the "$EWMA_0 = a_i$ ARLs" of the scheme use very similar sets of equations.

As a final example of the use of integral equations in the analysis of process monitoring schemes, consider the $X/MR$ schemes of §4.4 of V&J. Suppose that observations $x_1, x_2, \ldots$ are iid with continuous marginal distribution specified by the probability density $f(y)$. Define the function

$L(y)$ = "the mean number of additional observations to alarm, given that there has been no alarm to date and the current observation is $y$."

Then note that as one begins $X/MR$ monitoring, there are two possibilities of where he/she will be after observing the first individual, $x_1$.
If $x_1$ is extreme ($x_1 < LCL_x$ or $x_1 > UCL_x$) there will be an immediate signal and the run length will be 1. If $x_1$ is not extreme ($LCL_x \le x_1 \le UCL_x$) one observation will have been spent and on average another $L(x_1)$ observations will be required in order to produce a signal. So it is reasonable that the ARL for the $X/MR$ scheme is

$$ARL = 1 \cdot \bigl(1 - P[LCL_x \le x_1 \le UCL_x]\bigr) + \int_{LCL_x}^{UCL_x} \bigl(1 + L(y)\bigr) f(y)\,dy ,$$

that is

$$ARL = 1 + \int_{LCL_x}^{UCL_x} L(y) f(y)\,dy , \tag{2.13}$$

where it remains to find a way of computing the function $L(y)$ in order to feed it into expression (2.13).

In order to derive an integral equation for $L(y)$ consider the situation if there has been no alarm and the current individual observation is $y$. There are two possibilities of where one will be after observing one more individual, $x$. If $x$ is extreme or too far from $y$ ($x < LCL_x$ or $x > UCL_x$ or $|x - y| > UCL_R$) only one additional observation is required to produce a signal. On the other hand, if $x$ is not extreme and not too far from $y$ ($LCL_x \le x \le UCL_x$ and $|x - y| \le UCL_R$) one more observation will have been spent and on average another $L(x)$ will be required to produce a signal. That is,

$$L(y) = 1 \cdot P[x < LCL_x \text{ or } x > UCL_x \text{ or } |x - y| > UCL_R] + \int_{\max(LCL_x,\, y - UCL_R)}^{\min(UCL_x,\, y + UCL_R)} \bigl(1 + L(x)\bigr) f(x)\,dx ,$$

that is,

$$L(y) = 1 + \int_{\max(LCL_x,\, y - UCL_R)}^{\min(UCL_x,\, y + UCL_R)} L(x) f(x)\,dx = 1 + \int_{LCL_x}^{UCL_x} I[|x - y| \le UCL_R]\, L(x) f(x)\,dx . \tag{2.14}$$

(The notation $I[A]$ is “indicator function” notation, meaning that when $A$ holds $I[A] = 1$, and otherwise $I[A] = 0$.) As in the earlier CUSUM and EWMA examples, once one specifies a quadrature rule for definite integrals on the interval $[LCL_x, UCL_x]$, expression (2.14) provides a set of $m$ linear equations for approximate values of the $L(a_i)$'s. When this system is solved, the resulting values can be fed into a discretized version of equation (2.13) and an approximate ARL produced.
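Equations (2.13) and (2.14) can be discretized the same way. The sketch below again assumes the midpoint rule on $[LCL_x, UCL_x]$ and a standard normal $f$; `xmr_arl` and its helpers are hypothetical names. Because the indicator in (2.14) makes the integrand discontinuous, results from this simple rule are worth checking for stability as $m$ is increased.

```python
import math


def normal_pdf(y):
    """Standard normal density, an assumed choice of the marginal f."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def xmr_arl(lcl_x, ucl_x, ucl_r, f=normal_pdf, m=101):
    """Approximate the X/MR ARL from (2.13) and (2.14) with the midpoint rule."""
    delta = (ucl_x - lcl_x) / m
    nodes = [lcl_x + (j + 0.5) * delta for j in range(m)]
    weights = [delta * f(x) for x in nodes]  # w_j f(a_j)
    # (2.14) at the nodes:
    # L(a_i) = 1 + sum_j I[|a_j - a_i| <= UCL_R] w_j f(a_j) L(a_j)
    system = [[(1.0 if i == j else 0.0)
               - (weights[j] if abs(nodes[j] - nodes[i]) <= ucl_r else 0.0)
               for j in range(m)] for i in range(m)]
    values = solve_linear(system, [1.0] * m)
    # Discretized (2.13): ARL = 1 + sum_j w_j f(a_j) L(a_j)
    return 1.0 + sum(w * v for w, v in zip(weights, values))
```

Setting `ucl_r` very large switches the MR part of the scheme off, and the result should then agree with the individuals-chart value of roughly 370 for limits at $\pm 3$; any finite `ucl_r` can only shorten the ARL.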
It is worth noting that the potential discontinuities of the integrand in equation (2.14) (produced by the indicator function) have the effect of making numerical solutions of this equation much less well-behaved than those for the other integral equations developed in this section.

The examples of this section have dealt only with ARLs for schemes based on (continuous) iid observations. It therefore should be said that:

1. The iid assumption can in some cases be relaxed to give tractable integral equations for situations where correlated sequences $Q_1, Q_2, \ldots$ are involved (see for example Problem 2.27),
2. Other descriptors of the run length distribution (beyond the ARL) can often be shown to solve simple integral equations (see for example the integral equations for CUSUM run length second moment and run length probability function in Problem 2.31), and
3. In some cases, with discrete variables $Q$ there are difference equation analogues of the integral equations presented here (that ultimately correspond to the kind of MC calculations illustrated in the previous section).

Chapter 3 An Introduction to Discrete Stochastic Control Theory/Minimum Variance Control

Section 3.6 of V&J provides an elementary introduction to the topic of Engineering Control and contrasts this adjustment methodology with (the process monitoring methodology of) control charting. The last item under the Engineering Control heading of Table 3.10 of V&J makes reference to “optimal stochastic control” theory. The object of this theory is to model system behavior using probability tools and let the consequences of the model assumptions help guide one in the choice of effective control/adjustment algorithms. This chapter provides a very brief introduction to this theory.

3.1 General Exposition

Let $\{\ldots, Z(-1), Z(0), Z(1), Z(2), \ldots\}$ stand for observations on a process assuming that no control actions are taken.
One first needs a stochastic/probabilistic model for the sequence $\{Z(t)\}$, and we will let $F$ stand for such a model. $F$ is a joint distribution for the $Z$'s and might, for example, be:

1. a simple random walk model specified by the equation $Z(t) = Z(t-1) + \epsilon(t)$, where the $\epsilon$'s are iid normal $(0, \sigma^2)$ random variables,
2. a random walk model with drift specified by the equation $Z(t) = Z(t-1) + d + \epsilon(t)$, where $d$ is a constant and the $\epsilon$'s are iid normal $(0, \sigma^2)$ random variables, or
3. some Box-Jenkins ARIMA model for the $\{Z(t)\}$ sequence.

Then let $a(t)$ stand for a control action taken at time $t$, after observing the process. One needs notation for the current impact of control actions taken in past periods, so we will further let $A(a, s)$ stand for the current impact on the process of a control action $a$ taken $s$ periods ago. In many systems, the control actions, $a$, are numerical, and $A(a, s) = a\,h(s)$ where $h(s)$ is the so-called “impulse response function” giving the impact of a unit control action taken $s$ periods previous. $A(a, s)$ might, for example, be:

1. given by $A(a, s) = a$ for $s \ge 1$ in a machine tool control problem where “$a$” means “move the cutting tool out $a$ units” (and the controlled variable is a measured dimension of a work piece),
2. given by $A(a, s) = 0$ for $s \le u$ and by $A(a, s) = a$ for $s > u$ in a machine tool control problem where “$a$” means “move the cutting tool out $a$ units” and there are $u$ periods of dead time, or
3. given by $A(a, s) = \left(1 - \exp\left(\frac{-sh}{\tau}\right)\right) a$ for $s \ge 1$ in a chemical process control problem with time constant $\tau$ and control period $h$ seconds.

We will then assume that what one actually observes for (controlled) process behavior at time $t \ge 1$ is

$$Y(t) = Z(t) + \sum_{s=0}^{t-1} A(a(s), t - s) ,$$

which is the sum of what would have been observed with no control and all of the current effects of previous control actions.
For $t \ge 0$, $a(t)$ will be chosen based on

$$\{\ldots, Z(-1), Z(0), Y(1), Y(2), \ldots, Y(t)\} .$$

A common objective in this context is to choose the actions so as to minimize

$$E_F\,(Y(t) - T(t))^2 \qquad\text{or}\qquad \sum_{s=1}^{t} E_F\,(Y(s) - T(s))^2$$

for some (possibly time-dependent) target value $T(s)$. The problem of choosing control actions to accomplish this goal is called the “minimum variance” (MV) control problem, and it has a solution that can be described in fairly (deceptively, perhaps) simple terms.

Note first that given $\{\ldots, Z(-1), Z(0), Y(1), Y(2), \ldots, Y(t)\}$ one can recover $\{\ldots, Z(-1), Z(0), Z(1), Z(2), \ldots, Z(t)\}$. This is because

$$Z(s) = Y(s) - \sum_{r=0}^{s-1} A(a(r), s - r) ,$$

i.e., to get $Z(s)$, one simply subtracts the (known) effects of previous control actions from $Y(s)$.

Then the model $F$ (at least in theory) provides one a conditional distribution for $Z(t+1), Z(t+2), Z(t+3), \ldots$ given the observed $Z$'s through time $t$. The conditional distribution for $Z(t+1), Z(t+2), Z(t+3), \ldots$ given what one can observe through time $t$, namely $\{\ldots, Z(-1), Z(0), Y(1), Y(2), \ldots, Y(t)\}$, is then the conditional distribution one gets for $Z(t+1), Z(t+2), Z(t+3), \ldots$ from the model $F$ after recovering $Z(1), Z(2), \ldots, Z(t)$ from the corresponding $Y$'s. Then for $s \ge t + 1$, let

$$E_F[Z(s) \mid \ldots, Z(-1), Z(0), Z(1), Z(2), \ldots, Z(t)] \qquad\text{or just}\qquad E_F[Z(s) \mid Z^t]$$

stand for the mean of this conditional distribution of $Z(s)$ available at time $t$.

Suppose that there are $u \ge 0$ periods of dead time ($u$ could be 0). Then the earliest $Y$ that one can hope to influence by choice of $a(t)$ is $Y(t + u + 1)$.
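The bookkeeping relating $Y$ and $Z$ is easy to mechanize. The Python sketch below uses hypothetical helper names: it encodes the dead-time and first-order examples of $A(a, s)$ listed earlier, builds $Y(t) = Z(t) + \sum_{s=0}^{t-1} A(a(s), t-s)$, and then recovers each $Z(s)$ from the $Y$'s and the known actions exactly as in the recovery display above.

```python
import math


def dead_time_response(u):
    """A(a, s) = 0 for s <= u and a for s > u (u = 0 gives A(a, s) = a for s >= 1)."""
    def response(a, s):
        return a if s > u else 0.0
    return response


def first_order_response(h, tau):
    """A(a, s) = (1 - exp(-s h / tau)) a for s >= 1 (time constant tau, period h)."""
    def response(a, s):
        return (1.0 - math.exp(-s * h / tau)) * a if s >= 1 else 0.0
    return response


def controlled_y(z, actions, response):
    """Y(t) = Z(t) + sum_{s=0}^{t-1} A(a(s), t - s) for t = 1, ..., len(z) - 1.

    z[t] holds Z(t) (index 0 is Z(0)); actions[s] holds a(s).
    """
    return {t: z[t] + sum(response(actions[s], t - s) for s in range(t))
            for t in range(1, len(z))}


def recover_z(y, actions, response):
    """Z(s) = Y(s) - sum_{r=0}^{s-1} A(a(r), s - r): undo the known control effects."""
    return {s: y_s - sum(response(actions[r], s - r) for r in range(s))
            for s, y_s in y.items()}
```

For instance, with one period of dead time an action $a(1)$ has no effect on $Y(2)$, so `controlled_y` with `dead_time_response(1)` adds only $A(a(0), 2) = a(0)$ to $Z(2)$.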
Notice then that if one takes action $a(t)$ at time $t$, one's most natural projection of $Y(t + u + 1)$ at time $t$ is

$$\hat{Y}(t + u + 1 \mid t) := E_F[Z(t + u + 1) \mid Z^t] + \sum_{s=0}^{t-1} A(a(s), t + u + 1 - s) + A(a(t), u + 1) .$$

It is then natural (and in fact turns out to give the MV control strategy) to try to choose $a(t)$ so that

$$\hat{Y}(t + u + 1 \mid t) = T(t + u + 1) .$$

That is, the MV strategy is to try to choose $a(t)$ so that

$$A(a(t), u + 1) = T(t + u + 1) - \left\{ E_F[Z(t + u + 1) \mid Z^t] + \sum_{s=0}^{t-1} A(a(s), t + u + 1 - s) \right\} .$$

A caveat here is that in practice MV control tends to be “ragged.” That is, in order to exactly optimize the mean squared error, constant tweaking (often with fairly large adjustments) is required. By changing one's control objective somewhat it is possible to produce “smoother” optimal control policies that are nearly as effective as MV algorithms in terms of keeping a process on target. That is, instead of trying to optimize

$$E_F \sum_{s=1}^{t} (Y(s) - T(s))^2 ,$$

in a situation where the $a$'s are numerical ($a = 0$ indicating “no adjustment” and the “size” of adjustments increasing with $|a|$) one might for a constant $\lambda > 0$ set out to minimize the alternative criterion

$$E_F\left( \sum_{s=1}^{t} (Y(s) - T(s))^2 + \lambda \sum_{s=0}^{t-1} (a(s))^2 \right) .$$

Doing so will “smooth” the MV algorithm.

3.2 An Example

To illustrate the meaning of the preceding formalism, consider the model ($F$) specified by

$$Z(t) = W(t) + \epsilon(t) \ \text{ for } t \ge 0 \qquad\text{and}\qquad W(t) = W(t-1) + d + \nu(t) \ \text{ for } t \ge 1 \tag{3.1}$$

for $d$ a (known) constant, the $\epsilon$'s normal $(0, \sigma_\epsilon^2)$, the $\nu$'s normal $(0, \sigma_\nu^2)$ and all the $\epsilon$'s and $\nu$'s independent. ($Z(t)$ is a random walk with drift observed with error.)
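To see the MV idea in action on model (3.1), here is a minimal simulation sketch with no dead time and $A(a, s) = a$ for $s \ge 1$. It assumes that the one-step conditional mean is updated by the recursion $\hat{Z}(t+1 \mid t) = \alpha Z(t) + (1-\alpha)\hat{Z}(t \mid t-1) + d$ (the form this model's predictor is stated to take in what follows), with $\alpha$ supplied by the user rather than computed from $\sigma_\epsilon^2$ and $\sigma_\nu^2$. The function name and its parameters are illustrative only.

```python
import random


def simulate_mv(n_steps, d, sigma_eps, sigma_nu, alpha, target,
                w0=0.0, prior_mean=0.0, control=True, seed=None):
    """Simulate model (3.1) with A(a, s) = a for s >= 1 and (optionally) MV control.

    alpha is a user-supplied smoothing constant for the one-step predictor
    Zhat(t+1|t) = alpha Z(t) + (1 - alpha) Zhat(t|t-1) + d.
    Returns the controlled observations Y(1), ..., Y(n_steps).
    """
    rng = random.Random(seed)
    w = w0
    z = w + rng.gauss(0.0, sigma_eps)      # Z(0), observed before any action
    z_hat = prior_mean                      # Zhat(0 | -1): prior mean of W(0)
    applied = 0.0                           # running sum a(0) + ... + a(t)
    ys = []
    for _ in range(n_steps):
        z_hat = alpha * z + (1.0 - alpha) * z_hat + d   # Zhat(t+1 | t)
        # MV action: choose a(t) so that Zhat(t+1|t) + sum_{s<=t} a(s) = target
        a_t = target - (z_hat + applied) if control else 0.0
        applied += a_t
        w = w + d + rng.gauss(0.0, sigma_nu)            # W(t+1)
        y = w + rng.gauss(0.0, sigma_eps) + applied     # Y(t+1) = Z(t+1) + sum a(s)
        ys.append(y)
        z = y - applied                                 # recover Z(t+1) from Y(t+1)
    return ys
```

With both noise standard deviations set to 0 and the prior mean equal to $W(0)$, every controlled observation lands on target, which is a convenient check of the bookkeeping; with noise, the controlled deviations stay bounded while the uncontrolled series drifts away at rate $d$.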
Under this model and an appropriate 0 mean normal initializing distribution for $W(0)$, it is the case that each

$$\hat{Z}(t + 1 \mid t) := E_F[Z(t + 1) \mid Z(0), \ldots, Z(t)]$$

may be computed recursively as

$$\hat{Z}(t + 1 \mid t) = \alpha Z(t) + (1 - \alpha)\hat{Z}(t \mid t - 1) + d$$

for some constant $\alpha$ (that depends upon the known variances $\sigma_\epsilon^2$ and $\sigma_\nu^2$).

We will find MV control policies under model (3.1) with two different functions $A(a, s)$. Consider first the possibility

$$A(a, s) = a \quad \forall\, s \ge 1 \tag{3.2}$$

(an adjustment “$a$” at a given time period takes its full and permanent effect at the next time period).

Consider the situation at time $t = 0$. Available are $Z(0)$ and $\hat{Z}(0 \mid -1)$ (the prior mean of $W(0)$) and from these one may compute the prediction

$$\hat{Z}(1 \mid 0) := \alpha Z(0) + (1 - \alpha)\hat{Z}(0 \mid -1) + d .$$

That means that taking control action $a(0)$, one should predict a value of

$$\hat{Y}(1 \mid 0) := \hat{Z}(1 \mid 0) + a(0)$$

for the controlled process at time $t = 1$, and upon setting this equal to the target $T(1)$ and solving for $a(0)$ one should thus choose

$$a(0) = T(1) - \hat{Z}(1 \mid 0) .$$

At time $t = 1$ one has observed $Y(1)$ and may recover $Z(1)$ by noting that

$$Y(1) = Z(1) + A(a(0), 1) = Z(1) + a(0) ,$$

so that

$$Z(1) = Y(1) - a(0) .$$

Then a prediction (of the uncontrolled process) one step ahead is

$$\hat{Z}(2 \mid 1) := \alpha Z(1) + (1 - \alpha)\hat{Z}(1 \mid 0) + d .$$

That means that with a target of $T(2)$ one should predict a value of the controlled process at time $t = 2$ of

$$\hat{Y}(2 \mid 1) := \hat{Z}(2 \mid 1) + a(0) + a(1) .$$

Upon setting this value equal to $T(2)$ and solving it is clear that one should choose

$$a(1) = T(2) - \left(\hat{Z}(2 \mid 1) + a(0)\right) .$$

So in general under (3.2), at time $t$ one may note that

$$Z(t) = Y(t) - \sum_{s=0}^{t-1} a(s)$$

and (recursively) compute

$$\hat{Z}(t + 1 \mid t) := \alpha Z(t) + (1 - \alpha)\hat{Z}(t \mid t - 1) + d .$$

Then setting the predicted value of the controlled process equal to $T(t + 1)$ and solving for $a(t)$, find the MV control action

$$a(t) = T(t + 1) - \left( \hat{Z}(t + 1 \mid t) + \sum_{s=0}^{t-1} a(s) \right) .$$

Finally, consider the problem of MV control under the same model (3.1), but now using

$$A(a, s) = \begin{cases} 0 & \text{if } s = 1 \\ a & \text{for } s = 2, 3, \ldots \end{cases} \tag{3.3}$$

(a description of response to process adjustment involving one period of delay, after which the full effect of an adjustment is immediately and permanently felt).

Consider the situation at time $t = 0$. In hand are $Z(0)$ and the prior mean of $W(0)$, $\hat{Z}(0 \mid -1)$, and the first $Y$ that one can affect by choice of $a(0)$ is $Y(2)$. Now

$$\begin{aligned} Z(2) &= W(2) + \epsilon(2) \\ &= W(1) + d + \nu(2) + \epsilon(2) \\ &= Z(1) - \epsilon(1) + d + \nu(2) + \epsilon(2) \end{aligned}$$

so that

$$\begin{aligned} \hat{Z}(2 \mid 0) &:= E_F[Z(2) \mid Z(0)] \\ &= E_F[Z(1) - \epsilon(1) + d + \nu(2) + \epsilon(2) \mid Z(0)] \\ &= \hat{Z}(1 \mid 0) + d \\ &= \alpha Z(0) + (1 - \alpha)\hat{Z}(0 \mid -1) + 2d \end{aligned}$$

is a prediction of where the uncontrolled process will be at time $t = 2$. Then a prediction for the controlled process at time $t = 2$ is

$$\hat{Y}(2 \mid 0) := \hat{Z}(2 \mid 0) + A(a(0), 2) = \hat{Z}(2 \mid 0) + a(0)$$

and upon setting this equal to the time $t = 2$ target, $T(2)$, and solving, one has the MV control action

$$a(0) = T(2) - \hat{Z}(2 \mid 0) .$$

At time $t = 1$ one has in hand $Y(1) = Z(1)$ and $\hat{Z}(1 \mid 0)$ and the first $Y$ that can be affected by the choice of $a(1)$ is $Y(3)$. Now

$$\begin{aligned} Z(3) &= W(3) + \epsilon(3) \\ &= W(2) + d + \nu(3) + \epsilon(3) \\ &= Z(2) - \epsilon(2) + d + \nu(3) + \epsilon(3) \end{aligned}$$

so that

$$\begin{aligned} \hat{Z}(3 \mid 1) &:= E_F[Z(3) \mid Z(0), Z(1)] \\ &= E_F[Z(2) - \epsilon(2) + d + \nu(3) + \epsilon(3) \mid Z(0), Z(1)] \\ &= \hat{Z}(2 \mid 1) + d \\ &= \alpha Z(1) + (1 - \alpha)\hat{Z}(1 \mid 0) + 2d \end{aligned}$$

is a prediction of where the uncontrolled process will be at time $t = 3$. Then a prediction for the controlled process at time $t = 3$ is

$$\hat{Y}(3 \mid 1) := \hat{Z}(3 \mid 1) + A(a(0), 3) + A(a(1), 2) = \hat{Z}(3 \mid 1) + a(0) + a(1)$$

and upon setting this equal to the time $t = 3$ target, $T(3)$, and solving, one has the MV control action

$$a(1) = T(3) - \left(\hat{Z}(3 \mid 1) + a(0)\right) .$$

Finally, in general under (3.3), one may at time $t$ note that

$$Z(t) = Y(t) - \sum_{s=0}^{t-2} a(s)$$

and (recursively) compute

$$\hat{Z}(t + 2 \mid t) := \alpha Z(t) + (1 - \alpha)\hat{Z}(t \mid t - 1) + 2d .$$

Then setting the time $t + 2$ predicted value of the controlled process equal to $T(t + 2)$ and solving for $a(t)$, we find the MV control action

$$a(t) = T(t + 2) - \left( \hat{Z}(t + 2 \mid t) + \sum_{s=0}^{t-1} a(s) \right) .$$

Chapter 4 Process Characterization and Capability Analysis

Sections 5.1 through 5.3 of V&J discuss the problem of summarizing the behavior of a stable process. The “bottom line” of that discussion is that one-sample statistical methods can be used in a straightforward manner to characterize a process/population/universe standing behind data collected under stable process conditions. Section 5.5 of V&J opens a discussion of summarizing process behavior when it is not sensible to model all data in hand as random draws from a single/fixed universe. The notes in this chapter carry the theme of §5.5 of V&J slightly further and add some theoretical detail missing in the book.

4.1 General Comments on Assessing and Dissecting “Overall Variation”

The questions “How much variation is there overall?” and “Where is the variation coming from?” are fundamental to process characterization/understanding and the guidance of improvement efforts. To provide a framework for discussion here, suppose that in hand one has $r$ samples of data, sample $i$ of size $n_i$ ($i = 1, \ldots, r$). Depending upon the specific application, these $r$ samples can have many different logical structures. For example, §5.5 of V&J considers the case where the $n_i$ are all the same and the $r$ samples are naturally thought of as having a balanced hierarchical/tree structure.
But many others (both “regular” and completely “irregular”) are possible. For example Figure 4.1 is a schematic parallel to Figure 5.16 of V&J for a “staggered nested data structure.”

[Figure 4.1: Schematic of a Staggered Nested Data Set, with levels of A, B(A), C(B(A)) and D(C(B(A)))]

When data in hand represent the entire universe of interest, methods of probability and statistical inference have no relevance to the basic questions “How much variation is there overall?” and “Where is the variation coming from?” The problem is one of descriptive statistics only, and various creative combinations of methods of statistical graphics and basic numerical measures (like sample variances and ranges) can be assembled to address these issues. And most simply, a “grand sample variance” is one sensible characterization of “overall variation.”

The tools of probability and statistical inference only become relevant when one sees data in hand as representing something more than themselves. And there are basically two standard routes to take in this enterprise. The first posits some statistical model for the process standing behind the data (like the hierarchical random effects model (5.28) of V&J). One may then use the data in hand in the estimation of parameters (and functions of parameters) of that model in order to characterize process behavior, assess overall variability and dissect that variation into interpretable pieces.

The second standard way in which probabilistic and statistical methods become relevant (to the problems of assessing overall variation and analysis of its components) is through the adoption of a “finite population sampling” perspective.
That is, there are times when there is conceptually some (possibly highly structured) concrete data set of interest and the data in hand arise through the application (possibly in various complicated ways) of random selection of some of the elements of that data set. (As one possible example, think of a warehouse that contains 100 crates, each of which contains 4 trays, each of which in turn holds 50 individual machine parts. The 20,000 parts in the warehouse could constitute a concrete population of interest. If one were to sample 3 crates at random, select at random 2 trays from each and then select 5 parts from each tray at random, one has a classical finite population sampling problem. Probability/randomness has entered through the sampling that is necessitated because one is unwilling to collect data on all 20,000 parts.)

Section 5.5 of V&J introduces the first of these two approaches to assessing and dissecting overall variation for balanced hierarchical data. But it does not treat the finite population sampling ideas at all. The present chapter of these notes thus extends slightly the random effects analysis ideas discussed in §5.5 and then presents some simple material from the theory of finite population sampling.

4.2 More on Analysis Under the Hierarchical Random Effects Model

Consider the hierarchical random effects model with 2 levels of nesting discussed in §5.5.2 of V&J. We will continue the notations $y_{ijk}$, $\bar{y}_{ij}$, $\bar{y}_{i.}$ and $\bar{y}_{..}$ used in that section and also adopt some additional notation. For one thing, it will be useful to define some ranges.
Let

$$R_{ij} = \max_k y_{ijk} - \min_k y_{ijk} = \text{the range of the } j\text{th sample within the } i\text{th level of A} ,$$

$$\Delta_i = \max_j \bar{y}_{ij} - \min_j \bar{y}_{ij} = \text{the range of the } J \text{ sample means within the } i\text{th level of A} ,$$

and

$$\Gamma = \max_i \bar{y}_{i.} - \min_i \bar{y}_{i.} = \text{the range of the means for the } I \text{ levels of A} .$$

It will also be useful to consider the ANOVA sums of squares and mean squares alluded to briefly in §5.5.3. So let

$$SSTot = \sum_{i,j,k} (y_{ijk} - \bar{y}_{..})^2 = (IJK - 1) \times \text{the grand sample variance of all } IJK \text{ observations} ,$$

$$SSC(B(A)) = \sum_{i,j,k} (y_{ijk} - \bar{y}_{ij})^2 = (K - 1) \times \text{the sum of all } IJ \text{ “level C” sample variances} ,$$

$$SSB(A) = K \sum_{i,j} (\bar{y}_{ij} - \bar{y}_{i.})^2 = K(J - 1) \times \text{the sum of all } I \text{ sample variances of } J \text{ means } \bar{y}_{ij}$$

and

$$SSA = KJ \sum_{i} (\bar{y}_{i.} - \bar{y}_{..})^2 = KJ(I - 1) \times \text{the sample variance of the } I \text{ means } \bar{y}_{i.} .$$

Note that in the notation of §5.5.2, $SSA = KJ(I-1)s_A^2$, $SSB(A) = K(J-1)\sum_{i=1}^{I} s_{Bi}^2$ and $SSC(B(A)) = (K-1)\sum_{i,j} s_{ij}^2 = IJ(K-1)\hat{\sigma}^2$. And it is an algebraic fact that $SSTot = SSA + SSB(A) + SSC(B(A))$.

Mean squares are derived from these sums of squares by dividing by appropriate degrees of freedom. That is, define

$$MSA := \frac{SSA}{I - 1} , \qquad MSB(A) := \frac{SSB(A)}{I(J - 1)} \qquad\text{and}\qquad MSC(B(A)) := \frac{SSC(B(A))}{IJ(K - 1)} .$$

Now these ranges, sums of squares and mean squares are interesting measures of variation in their own right, but are especially helpful when used to produce estimates of variance components and functions of variance components. For example, it is straightforward to verify that under the hierarchical random effects model (5.28) of V&J

$$E R_{ij} = d_2(K)\,\sigma , \qquad E\Delta_i = d_2(J)\sqrt{\sigma_\beta^2 + \sigma^2/K} \qquad\text{and}\qquad E\Gamma = d_2(I)\sqrt{\sigma_\alpha^2 + \sigma_\beta^2/J + \sigma^2/JK} .$$

So, reasoning as in §2.2.2 of V&J (there in the context of two-way random effects models and gage R&R), reasonable range-based point estimates of the variance components are

$$\hat{\sigma}^2 = \left(\frac{\bar{R}}{d_2(K)}\right)^2 , \qquad \hat{\sigma}_\beta^2 = \max\left(0, \left(\frac{\bar{\Delta}}{d_2(J)}\right)^2 - \frac{\hat{\sigma}^2}{K}\right)$$

and

$$\hat{\sigma}_\alpha^2 = \max\left(0, \left(\frac{\Gamma}{d_2(I)}\right)^2 - \frac{1}{J}\left(\frac{\bar{\Delta}}{d_2(J)}\right)^2\right) .$$

Now by applying linear model theory or reasoning from V&J displays (5.30) and (5.32) and the fact that $E s_{ij}^2 = \sigma^2$, one can find expected values for the mean squares above. These are

$$E\,MSA = KJ\sigma_\alpha^2 + K\sigma_\beta^2 + \sigma^2 , \qquad E\,MSB(A) = K\sigma_\beta^2 + \sigma^2 \qquad\text{and}\qquad E\,MSC(B(A)) = \sigma^2 .$$

And in a fashion completely parallel to the exposition in §1.4 of these notes, standard linear model theory implies that the quantities

$$\frac{IJ(K-1)\,MSC(B(A))}{E\,MSC(B(A))} , \qquad \frac{I(J-1)\,MSB(A)}{E\,MSB(A)} \qquad\text{and}\qquad \frac{(I-1)\,MSA}{E\,MSA}$$

are independent $\chi^2$ random variables with respective degrees of freedom $IJ(K-1)$, $I(J-1)$ and $(I-1)$.

Table 4.1: Balanced Data Hierarchical Random Effects Analysis ANOVA Table (2 Levels of Nesting)

$$\begin{array}{lllll}
\text{Source} & SS & df & MS & EMS \\ \hline
\text{A} & SSA & I-1 & MSA & KJ\sigma_\alpha^2 + K\sigma_\beta^2 + \sigma^2 \\
\text{B(A)} & SSB(A) & I(J-1) & MSB(A) & K\sigma_\beta^2 + \sigma^2 \\
\text{C(B(A))} & SSC(B(A)) & IJ(K-1) & MSC(B(A)) & \sigma^2 \\
\text{Total} & SSTot & IJK-1 & &
\end{array}$$

These facts about sums of squares and mean squares for the hierarchical random effects model are conveniently summarized in the usual (hierarchical random effects model) ANOVA table (for two levels of nesting), Table 4.1. Further, the fact that the expected mean squares are simple linear combinations of the variance components $\sigma_\alpha^2$, $\sigma_\beta^2$ and $\sigma^2$ motivates the use of linear combinations of mean squares in the estimation of the variance components (as in §5.5.3 of V&J). In fact (as indicated in §5.5.3 of V&J) the standard ANOVA-based estimators

$$\hat{\sigma}^2 = \frac{SSC(B(A))}{IJ(K-1)} , \qquad \hat{\sigma}_\beta^2 = \frac{1}{K}\max\left(0, \frac{SSB(A)}{I(J-1)} - \hat{\sigma}^2\right)$$

and

$$\hat{\sigma}_\alpha^2 = \frac{1}{JK}\max\left(0, \frac{SSA}{I-1} - \frac{SSB(A)}{I(J-1)}\right)$$

are exactly the estimators (described without using ANOVA notation) in displays (5.29), (5.31) and (5.33) of V&J.
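Both sets of estimators above can be computed directly from a balanced $I \times J \times K$ array. This Python sketch is illustrative: the function name is hypothetical, and the $d_2$ values are the usual control chart constants for sample sizes 2 through 5 only, so the range-based part assumes $I$, $J$ and $K$ each lie in that range.

```python
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}   # usual d2(n) constants, n = 2..5


def nested_variance_components(y):
    """Balanced two-level nested data y[i][j][k] (I levels of A, J of B(A), K per cell).

    Returns the sums of squares (SSA, SSB(A), SSC(B(A)), SSTot) and the ANOVA-based
    and range-based estimates of sigma^2, sigma_beta^2 and sigma_alpha^2.
    """
    I, J, K = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / K for j in range(J)] for i in range(I)]   # ybar_ij
    a_mean = [sum(cell[i]) / J for i in range(I)]                      # ybar_i.
    grand = sum(a_mean) / I                                            # ybar_..
    obs = [(i, j, v) for i in range(I) for j in range(J) for v in y[i][j]]
    ss_tot = sum((v - grand) ** 2 for _, _, v in obs)
    ss_c = sum((v - cell[i][j]) ** 2 for i, j, v in obs)
    ss_b = K * sum((cell[i][j] - a_mean[i]) ** 2 for i in range(I) for j in range(J))
    ss_a = K * J * sum((mean_i - grand) ** 2 for mean_i in a_mean)
    ms_a, ms_b = ss_a / (I - 1), ss_b / (I * (J - 1))
    ms_c = ss_c / (I * J * (K - 1))
    anova = {"sigma2": ms_c,
             "sigma2_beta": max(0.0, ms_b - ms_c) / K,
             "sigma2_alpha": max(0.0, ms_a - ms_b) / (J * K)}
    # Range-based versions: R-bar, Delta-bar and Gamma
    r_bar = sum(max(y[i][j]) - min(y[i][j]) for i in range(I) for j in range(J)) / (I * J)
    delta_bar = sum(max(cell[i]) - min(cell[i]) for i in range(I)) / I
    gamma = max(a_mean) - min(a_mean)
    s2 = (r_bar / D2[K]) ** 2
    ranges = {"sigma2": s2,
              "sigma2_beta": max(0.0, (delta_bar / D2[J]) ** 2 - s2 / K),
              "sigma2_alpha": max(0.0, (gamma / D2[I]) ** 2
                                  - (delta_bar / D2[J]) ** 2 / J)}
    return {"ss": (ss_a, ss_b, ss_c, ss_tot), "anova": anova, "ranges": ranges}
```

Checking that `ss_a + ss_b + ss_c` reproduces `ss_tot` is a cheap way to confirm that the means were computed at the right levels of the hierarchy.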
The virtue of describing them in the present terms is to suggest/emphasize that all that was said in §1.4 and §1.5 (in the gage R&R context) about making standard errors for functions of mean squares and ANOVA-based confidence intervals for functions of variance components is equally true in the present context.

For example, the formula (1.3) of these notes can be applied to derive standard errors for $\hat{\sigma}_\beta^2$ and $\hat{\sigma}_\alpha^2$ immediately above. Or since

$$\sigma_\beta^2 = \frac{1}{K}E\,MSB(A) - \frac{1}{K}E\,MSC(B(A)) \qquad\text{and}\qquad \sigma_\alpha^2 = \frac{1}{JK}E\,MSA - \frac{1}{JK}E\,MSB(A)$$

are both of form (1.4), the material of §1.5 can be used to set confidence limits for these quantities.

As a final note in this discussion of what is possible under the hierarchical random effects model, observe that while the present discussion has been confined to a “balanced data” framework, Problem 4.8 shows that at least point estimation of variance components can be done in a fairly elementary fashion even in unbalanced data contexts.

4.3 Finite Population Sampling and Balanced Hierarchical Structures

This brief subsection is meant to illustrate the kinds of things that can be done with finite population sampling theory in terms of estimating overall variability in a (balanced) hierarchical concrete population of items and dissecting that variability.

Consider first a finite population consisting of $NM$ items arranged into $N$ levels of A, with $M$ levels of B within each level of A. (For example, there might be $N$ boxes, each containing $M$ widgets. Or there might be $N$ days, on each of which $M$ items are manufactured.) Let

$$y_{ij} = \text{a measurement on the item at level } i \text{ of A and level } j \text{ of B within the } i\text{th level of A}$$

(e.g. the diameter of the $j$th widget in the $i$th box). Suppose that the quantity of interest is the (grand) variance of all $NM$ measurements,

$$S^2 = \frac{1}{NM - 1}\sum_{i=1}^{N}\sum_{j=1}^{M} (y_{ij} - \bar{y}_{.})^2 .$$

(This is clearly one quantification of overall variation.)
The usual one-way ANOVA identity applied to the $NM$ numbers making up the population of interest shows that the population variance can be expressed as

$$S^2 = \frac{1}{NM - 1}\left( M(N - 1)S_A^2 + N(M - 1)S_B^2 \right)$$

where

$$S_A^2 = \frac{1}{N - 1}\sum_{i=1}^{N} (\bar{y}_i - \bar{y}_.)^2 = \text{the variance of the } N \text{ “A level means”}$$

and

$$S_B^2 = \frac{1}{N}\sum_{i=1}^{N}\left( \frac{1}{M - 1}\sum_{j=1}^{M} (y_{ij} - \bar{y}_i)^2 \right) = \text{the average of the } N \text{ “within A level variances.”}$$

Suppose that one selects a simple random sample of $n$ levels of A, and for each level of A a simple random sample of $m$ levels of B within A. (For example, one might sample $n$ boxes and $m$ widgets from each box.) A naive way to estimate $S^2$ is to simply use the sample variance

$$s^2 = \frac{1}{nm - 1}\sum (y_{ij} - \bar{y}_.^*)^2$$

where the sum is over the $nm$ items selected and $\bar{y}_.^*$ is the mean of those measurements. Unfortunately, this is not such a good estimator. Material from Chapter 10 of Cochran's Sampling Techniques can be used to show that

$$E s^2 = \frac{m(n-1)}{nm-1} S_A^2 + \left( \frac{n(m-1)}{nm-1} + \frac{m(n-1)}{nm-1}\left(\frac{1}{m} - \frac{1}{M}\right) \right) S_B^2 ,$$

which is not in general equal to $S^2$.

However, it is possible to find a linear combination of the sample versions of $S_A^2$ and $S_B^2$ that has expected value equal to the population variance. That is, let

$$s_A^2 = \frac{1}{n - 1}\sum (\bar{y}_i^* - \bar{y}_.^*)^2 = \text{the sample variance of the } n \text{ sample means (from the sampled levels of A)}$$

and

$$s_B^2 = \frac{1}{n}\sum \left( \frac{1}{m - 1}\sum (y_{ij} - \bar{y}_i^*)^2 \right) = \text{the average of the } n \text{ sample variances (from the sampled levels of A)} .$$

Then, it turns out that

$$E s_A^2 = S_A^2 + \left(\frac{1}{m} - \frac{1}{M}\right) S_B^2 \qquad\text{and}\qquad E s_B^2 = S_B^2 .$$

From this it follows that an unbiased estimator of $S^2$ is the quantity

$$\frac{M(N-1)}{NM-1}\, s_A^2 + \left( \frac{N(M-1)}{NM-1} - \frac{M(N-1)}{NM-1}\left(\frac{1}{m} - \frac{1}{M}\right) \right) s_B^2 .$$

This kind of analysis can, of course, be carried beyond the case of a single level of nesting. For example, consider the situation with two levels of nesting (where both the finite population and the observed values have balanced hierarchical structure).
Then in the ANOVA notation of §4.2 above, take

$$s_A^2 = \frac{SSA}{(I - 1)JK} , \qquad s_B^2 = \frac{SSB(A)}{I(J - 1)K} \qquad\text{and}\qquad s_C^2 = \frac{SSC(B(A))}{IJ(K - 1)} .$$

Let $S_A^2$, $S_B^2$ and $S_C^2$ be the population analogs of $s_A^2$, $s_B^2$ and $s_C^2$, and $f_B$ and $f_C$ be the sampling fractions at the second and third stages of item selection. Then it turns out that

$$E s_A^2 = S_A^2 + \frac{(1 - f_B)}{J} S_B^2 + \frac{(1 - f_C)}{JK} S_C^2 , \qquad E s_B^2 = S_B^2 + \frac{(1 - f_C)}{K} S_C^2 \qquad\text{and}\qquad E s_C^2 = S_C^2 .$$

So (since the grand population variance, $S^2$, is expressible as a linear combination of $S_A^2$, $S_B^2$ and $S_C^2$, each of which can be estimated by a linear combination of $s_A^2$, $s_B^2$ and $s_C^2$) an unbiased estimator of the population variance can be built as an appropriate linear combination of $s_A^2$, $s_B^2$ and $s_C^2$.

Chapter 5 Sampling Inspection

Chapter 8 of V&J treats the subject of sampling inspection, introducing the basic methods of acceptance sampling and continuous inspection. This chapter extends that discussion somewhat. We consider how (in the fraction nonconforming context) one can move from single sampling plans to quite general acceptance sampling plans, we provide a brief discussion of the effects of inspection/measurement error on the real (as opposed to nominal) statistical properties of acceptance sampling plans, and then the chapter closes with an elaboration of §8.5 of V&J, providing some more details on the matter of economic arguments in the choice of sampling inspection schemes.

5.1 More on Fraction Nonconforming Acceptance Sampling

Section 8.1 of V&J (and for that matter §8.2 as well) confines itself to the discussion of single sampling plans. For those plans, a sample size is fixed in advance at some value $n$, and lot disposal is decided on the basis of inspection of exactly $n$ items. There are, however, often good reasons to consider acceptance sampling plans whose ultimate sample size depends upon “how the inspected items look” as they are examined.
(One might, for example, want to consider a “double sampling” plan that inspects an initial small sample, terminating sampling if items look especially good or especially bad so that appropriate lot disposal seems clear, but takes an additional larger sample if the initial one looks “inconclusive” regarding the likely quality of the lot.) This section considers fraction nonconforming acceptance sampling from the most general perspective possible and develops the OC, ASN, AOQ and ATI for a general fraction nonconforming plan.

Consider the possibility of inspecting one item at a time from a lot of $N$, and after inspecting each successive item deciding to 1) stop sampling and accept the lot, 2) stop sampling and reject the lot or 3) inspect another item. With

$$X_n = \text{the number of nonconforming items found among the first } n \text{ inspected}$$

a helpful way of thinking about various different plans in this context is in terms of possible paths through a grid of ordered pairs of integers $(n, X_n)$ with $0 \le X_n \le n$. Different acceptance sampling plans then amount to different choices of “Accept Boundary” and “Reject Boundary.” Figure 5.1 is a diagram representing a single sampling plan with $n = 6$ and $c = 2$, Figure 5.2 is a diagram representing a “doubly curtailed” version of this plan (one that recognizes that there is no need to continue inspection after lot disposal has been determined) and Figure 5.3 illustrates a double sampling plan in these terms.

[Figure 5.1: Diagram for the n = 6, c = 2 Single Sampling Plan (grid of points (n, X_n), with Accept and Reject boundaries marked)]
Now on a diagram like those in the figures, one may very quickly count the number of permissible paths from $(0, 0)$ to a point in the grid by (working left to right) marking each point $(n, X_n)$ in the grid (that it is possible to reach) with the sum of the path counts of those of its two possible predecessors, $(n - 1, X_n - 1)$ and $(n - 1, X_n)$, that are not “stop-sampling points.” (No feasible paths leave a stop-sampling point. So path counts to them do not contribute to path counts for any points to their right.) Figure 5.4 is a version of Figure 5.2 with permissible movements through the $(n, X_n)$ grid marked by arrows, and path counts indicated.

[Figure 5.2: Diagram for Doubly Curtailed n = 6, c = 2 Single Sampling Plan]

[Figure 5.3: Diagram for a Small Double Sampling Plan]

[Figure 5.4: Diagram for the Doubly Curtailed Single Sampling Plan with Path Counts Indicated]

The reason that one cares about the path counts is that for any stop-sampling point $(n, X_n)$, from perspective A

$$P[\text{reaching } (n, X_n)] = (\text{path count from (0,0) to } (n, X_n)) \cdot \frac{\binom{N - n}{Np - X_n}}{\binom{N}{Np}}$$

while from perspective B

$$P[\text{reaching } (n, X_n)] = (\text{path count from (0,0) to } (n, X_n))\; p^{X_n}(1 - p)^{n - X_n} .$$

And these probabilities of reaching the various stop-sampling points are the fundamental building blocks of the standard statistical characterizations of an acceptance sampling plan.
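The path-counting rule and the two reach-probability formulas translate directly into code. The sketch below (hypothetical names) counts paths for the doubly curtailed $n = 6$, $c = 2$ plan, where sampling stops with acceptance once 4 conforming items have been found (with at most 2 nonconforming) and with rejection once the third nonconforming item is found, and then evaluates the reach probabilities under perspectives A and B, the probability of stopping on the accept boundary, and the mean number of items inspected.

```python
from math import comb


def path_counts(max_n, is_stop, start=(0, 0)):
    """Count permissible paths from `start` to every reachable point (n, x).

    Working left to right, (n, x) receives the counts of (n-1, x-1) and
    (n-1, x), but only from predecessors that are not stop-sampling points.
    """
    counts = {start: 1}
    for n in range(start[0] + 1, max_n + 1):
        for x in range(n + 1):
            total = sum(counts.get(prev, 0) for prev in ((n - 1, x - 1), (n - 1, x))
                        if not is_stop(prev))
            if total:
                counts[(n, x)] = total
    return counts


def accept(pt):
    n, x = pt
    return x <= 2 and n - x == 4      # 4th conforming item found


def reject(pt):
    return pt[1] == 3                 # 3rd nonconforming item found


def is_stop(pt):
    return accept(pt) or reject(pt)


def reach_probabilities(counts, stop, p=None, lot_size=None, defectives=None):
    """P[reaching (n, x)] for every stop-sampling point in `counts`.

    Perspective B (process fraction p) if p is given; otherwise perspective A
    (a lot of lot_size items containing `defectives` = Np nonconforming ones).
    """
    probs = {}
    for (n, x), c in counts.items():
        if not stop((n, x)):
            continue
        if p is not None:
            probs[(n, x)] = c * p ** x * (1 - p) ** (n - x)
        elif defectives - x >= 0:
            probs[(n, x)] = c * comb(lot_size - n, defectives - x) / comb(lot_size, defectives)
        else:
            probs[(n, x)] = 0.0
    return probs


def accept_prob_and_asn(probs, accept_fn):
    """Probability of ending on the accept boundary, and mean number inspected."""
    pa = sum(v for pt, v in probs.items() if accept_fn(pt))
    asn = sum(pt[0] * v for pt, v in probs.items())
    return pa, asn


counts = path_counts(6, is_stop)
```

Because curtailment never changes the final lot disposition, the probability of stopping on the accept boundary should match the single sampling value $P[X_6 \le 2]$ under either perspective, while the mean number inspected drops below 6.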
For example, with $\mathcal{A}$ and $\mathcal{R}$ respectively the acceptance and rejection boundaries, the OC for an arbitrary fraction nonconforming plan is

$$Pa = \sum_{(n, X_n) \in \mathcal{A}} P[\text{reaching } (n, X_n)] . \tag{5.1}$$

And the mean number of items sampled (the Average Sample Number) is

$$ASN = \sum_{(n, X_n) \in \mathcal{A} \cup \mathcal{R}} n\, P[\text{reaching } (n, X_n)] . \tag{5.2}$$

Further, under the rectifying inspection scenario, from perspective B

$$AOQ = \sum_{(n, X_n) \in \mathcal{A}} \left(1 - \frac{n}{N}\right) p\, P[\text{reaching } (n, X_n)] , \tag{5.3}$$

from perspective A

$$AOQ = \sum_{(n, X_n) \in \mathcal{A}} \left(p - \frac{X_n}{N}\right) P[\text{reaching } (n, X_n)] \tag{5.4}$$

and

$$ATI = N(1 - Pa) + \sum_{(n, X_n) \in \mathcal{A}} n\, P[\text{reaching } (n, X_n)] . \tag{5.5}$$

These formulas are conceptually very simple and quite universal. The fact that specializing them to any particular choice of acceptance boundary and rejection boundary might have been unpleasant when computations had to be done “by hand” is largely irrelevant in today's world of plentiful fast and cheap computing. These simple formulas and a personal computer make completely obsolete the many, many pages of specialized formulas that at one time filled books on acceptance sampling.

Two other matters of interest remain to be raised regarding this general approach to fraction nonconforming acceptance sampling. The first concerns the difficult mathematical question “What are good shapes for the accept and reject boundaries?” We will talk a bit in the final section of this chapter about criteria upon which various plans might be compared and allude to how one might try to find a “best” plan (“best” shapes for the acceptance and rejection boundaries) according to such criteria. But at this point, we wish only to note that Abraham Wald, working in the 1940s on the problem of sequential testing, developed some approximate theory that suggests that parallel straight line boundaries (the acceptance boundary below the rejection boundary) have some attractive properties. He was even able to provide some approximate two-point design criteria.
That is, in order to produce a plan whose OC curve runs approximately through the points $(p_1, Pa_1)$ and $(p_2, Pa_2)$ (for $p_1 < p_2$ and $Pa_1 > Pa_2$), Wald suggested linear stop-sampling boundaries with

$$\text{slope} = \frac{\ln\left(\frac{1-p_1}{1-p_2}\right)}{\ln\left(\frac{p_2(1-p_1)}{p_1(1-p_2)}\right)}. \qquad (5.6)$$

An appropriate $X_n$-intercept for the acceptance boundary is approximately

$$h_A = -\frac{\ln\left(\frac{Pa_1}{Pa_2}\right)}{\ln\left(\frac{p_2(1-p_1)}{p_1(1-p_2)}\right)}, \qquad (5.7)$$

while an appropriate $X_n$-intercept for the rejection boundary is approximately

$$h_R = \frac{\ln\left(\frac{1-Pa_2}{1-Pa_1}\right)}{\ln\left(\frac{p_2(1-p_1)}{p_1(1-p_2)}\right)}. \qquad (5.8)$$

(That is, the two boundaries are the parallel lines $X_n = h_A + \text{slope}\cdot n$ and $X_n = h_R + \text{slope}\cdot n$.) Wald actually derived formulas (5.6) through (5.8) under “infinite lot size” assumptions (that also allowed him to produce some approximations for both the OC and ASN of his plans). Where one is thinking of applying Wald’s boundaries in acceptance sampling of a real (finite $N$) lot, the question of exactly how to truncate the sampling (close in the right side of the “continue sampling region”) must be answered in some sensible fashion. And once that is done, the basic formulas (5.1) through (5.5) are of course relevant to describing the resulting plan. (See Problem 5.4 for an example of this kind of logic in action.)

[Figure 5.5: Path Counts from (1,1) to Stop Sampling Points for the Plan of Figure 5.4]
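Formulas (5.6) through (5.8) are simple to evaluate. The Python sketch below is a hypothetical illustration (the function names and the design points Pa = .95 at p = .01 and Pa = .10 at p = .05 are made up for the example). Following Wald’s sequential probability ratio test, the acceptance intercept is computed as $-\ln(Pa_1/Pa_2)$ divided by the common denominator, so it is negative and the acceptance line lies below the rejection line. The sketch ignores truncation, which, as noted above, still has to be settled for a finite lot.

```python
from math import log


def wald_boundaries(p1, pa1, p2, pa2):
    """Slope and X_n-intercepts of Wald's parallel stop-sampling lines.

    The plan is meant to have an OC passing near (p1, pa1) and (p2, pa2),
    with p1 < p2 and pa1 > pa2. Sampling continues while
    h_a + slope * n < X_n < h_r + slope * n.
    """
    if not (0 < p1 < p2 < 1 and 0 < pa2 < pa1 < 1):
        raise ValueError("need 0 < p1 < p2 < 1 and 0 < Pa2 < Pa1 < 1")
    g = log(p2 * (1 - p1) / (p1 * (1 - p2)))  # common denominator
    slope = log((1 - p1) / (1 - p2)) / g       # (5.6)
    h_a = -log(pa1 / pa2) / g                  # (5.7), acceptance line
    h_r = log((1 - pa2) / (1 - pa1)) / g       # (5.8), rejection line
    return slope, h_a, h_r


def decision(n, x_n, slope, h_a, h_r):
    """Untruncated sequential decision after n items with x_n nonconforming."""
    if x_n <= h_a + slope * n:
        return "accept"
    if x_n >= h_r + slope * n:
        return "reject"
    return "continue"


# Hypothetical design points: Pa = .95 at p = .01 and Pa = .10 at p = .05.
slope, h_a, h_r = wald_boundaries(0.01, 0.95, 0.05, 0.10)
print(f"slope = {slope:.4f}, h_A = {h_a:.3f}, h_R = {h_r:.3f}")
print(decision(60, 0, slope, h_a, h_r), decision(10, 3, slope, h_a, h_r))
```

With these inputs the slope falls between p1 and p2, a run of 60 conforming items reaches the acceptance line, and 3 nonconforming items among the first 10 reach the rejection line.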
Finally, it is an interesting side-light here (that can come into play if one wishes to estimate $p$ based on data from something other than a single sampling plan) that, provided the stop-sampling boundary has exactly one more point in it than the largest possible value of $n$, the uniformly minimum variance unbiased estimator of $p$ for both type A and type B contexts is (for $(n, X_n)$ a stop-sampling point)

$$\hat{p}((n,X_n)) = \frac{\text{path count from }(1,1)\text{ to }(n,X_n)}{\text{path count from }(0,0)\text{ to }(n,X_n)}.$$

For example, Figure 5.5 shows the path counts from (1,1) needed (in conjunction with the path counts indicated in Figure 5.4) to find the uniformly minimum variance unbiased estimator of $p$ when the doubly curtailed single sampling plan of Figure 5.4 is used. Table 5.1 lists the values of $\hat{p}$ for the 7 points in the stop-sampling boundary for the doubly curtailed single sampling plan with $n = 6$ and $c = 2$, along with the corresponding values of $X_n/n$ (the maximum likelihood estimator of $p$).

Table 5.1: The UMVUE and MLE of p for the Doubly Curtailed Single Sampling Plan

Stop-sampling point (n, X_n)    UMVUE, p̂    MLE, X_n/n
(3, 3)                           1/1          3/3
(4, 0)                           0/1          0/4
(4, 3)                           2/3          3/4
(5, 1)                           1/4          1/5
(5, 3)                           3/6          3/5
(6, 2)                           4/10         2/6
(6, 3)                           4/10         3/6

5.2 Imperfect Inspection and Acceptance Sampling

The nominal statistical properties of sampling inspection procedures are “perfect inspection” properties. The OC formulas for the attributes plans in §8.1 and §8.4 of V&J and §5.1 above are really premised on the ability to tell with certainty whether an inspected item is conforming or nonconforming. And the OC formulas for the variables plans in §8.2 of V&J are premised on an assumption that the measurement $x$ that determines whether an item is conforming or nonconforming can be obtained for a given item completely without measurement error. But the truth is that real-world inspection is not perfect, and the nominal statistical properties of these methods at best approximate their actual properties. The purpose of this section is to investigate (first in the attributes context and then in the variables context) just how far actual OC values for common acceptance sampling plans can be from nominal ones.

Consider first the percent defective context and suppose that when a conforming (good) item is inspected, there is a probability $w_G$ of misclassifying it as nonconforming. Similarly, suppose that when a nonconforming (defective) item is inspected, there is a probability $w_D$ of misclassifying it as conforming. Then from perspective B, a probabilistic description of any single inspected item is given in Table 5.2, where in that table we are using the abbreviation

$$p^* = w_G(1-p) + p(1-w_D)$$

for the probability that an item (of unspecified actual condition) is classified as nonconforming by the inspection process.

Table 5.2: Perspective B Description of a Single Inspection Allowing for Inspection Error

                            Inspection Result
                            G                    D
Actual       G        (1 - w_G)(1 - p)     w_G(1 - p)        1 - p
Condition    D        p w_D                p(1 - w_D)        p
                      1 - p*               p*

It should thus be obvious that from perspective B in the fraction nonconforming context, an attributes single sampling plan with sample size $n$ and acceptance number $c$ has an actual acceptance probability that depends not only on $p$ but on $w_G$ and $w_D$ as well, through the formula

$$Pa(p, w_G, w_D) = \sum_{x=0}^{c}\binom{n}{x}(p^*)^x(1-p^*)^{n-x}. \qquad (5.9)$$

On the other hand, the perspective A version of the fraction nonconforming scenario yields the following.
For an integer $x$ from 0 to $n$, let $U_x$ and $V_x$ be independent random variables,

$$U_x \sim \text{Binomial}(x, 1-w_D) \quad\text{and}\quad V_x \sim \text{Binomial}(n-x, w_G).$$

And let

$$r_x = P[U_x + V_x \le c]$$

be the probability that a sample containing $x$ nonconforming items actually passes the lot acceptance criterion. (Note that the nonstandard distribution of $U_x + V_x$ can be generated using the same “adding on diagonals of a table of joint probabilities” idea used in §1.7.1 to generate the distribution of $x$.) Then it is evident that from perspective A an attributes single sampling plan with sample size $n$ and acceptance number $c$ has an actual acceptance probability

$$Pa(p, w_G, w_D) = \sum_{x=0}^{n}\frac{\binom{Np}{x}\binom{N(1-p)}{n-x}}{\binom{N}{n}}\,r_x. \qquad (5.10)$$

It is clear that nonzero $w_G$ or $w_D$ change the nominal OCs given in displays (8.6) and (8.5) of V&J into the possibly more realistic versions given respectively by equations (5.9) and (5.10) here. In some cases, it may be possible to determine $w_G$ and $w_D$ experimentally and therefore derive both nominal and “real” OC curves for a fraction nonconforming single sampling plan. Or, if one were a priori willing to guarantee that $0 \le w_G \le a$ and that $0 \le w_D \le b$, it is pretty clear that from perspective B one might then at least guarantee that

$$Pa(p, a, 0) \le Pa(p, w_G, w_D) \le Pa(p, 0, b) \qquad (5.11)$$

and have an “OC band” in which the real OC (that depends upon the unknown inspection efficacy) is guaranteed to lie.

Similar analyses can be done for nonconformities per unit contexts as follows. Suppose that during inspection of product, real nonconformities are missed with probability $m$ and that (independent of the occurrence and inspection of real nonconformities) “phantom” nonconformities are “observed” according to a Poisson process with rate $\lambda_P$ per unit inspected.
Then from perspective B in a nonconformities per unit context, the number of nonconformities observed on $k$ units is Poisson with mean

$$k(\lambda(1-m) + \lambda_P),$$

so that an actual acceptance probability corresponding to the nominal one given in display (8.8) of V&J is

$$Pa(\lambda, \lambda_P, m) = \sum_{x=0}^{c}\frac{\exp\left(-k(\lambda(1-m)+\lambda_P)\right)\left(k(\lambda(1-m)+\lambda_P)\right)^x}{x!}. \qquad (5.12)$$

And from perspective A, with a realized per unit defect rate $\lambda$ on $N$ units, let $U_{\lambda,m} \sim \text{Binomial}\left(N\lambda, \frac{k}{N}(1-m)\right)$ be independent of $V_{\lambda_P} \sim \text{Poisson}(k\lambda_P)$. Then an actual acceptance probability corresponding to the nominal one given in display (8.7) of V&J is

$$Pa(\lambda, \lambda_P, m) = P[U_{\lambda,m} + V_{\lambda_P} \le c]. \qquad (5.13)$$

And the same kinds of bounding ideas used above for the fraction nonconforming context might be used with the OC (5.12) in the mean nonconformities per unit context. Pretty clearly, if one could guarantee that $\lambda_P \le a$ and that $m \le b$, one would have (from display (5.12))

$$Pa(\lambda, a, 0) \le Pa(\lambda, \lambda_P, m) \le Pa(\lambda, 0, b) \qquad (5.14)$$

in the perspective B situation.

The violence done to the OC notion by the possibility of imperfect inspection in an attributes sampling context is serious, but not completely unmanageable. That is, where one can determine the likelihood of inspection errors experimentally, expressions (5.9), (5.10), (5.12) and (5.13) are simple enough characterizations of real OCs. And where $w_G$ and $w_D$ (or $\lambda_P$ and $m$) are small, bounds like (5.11) (or (5.14)) show that both the nominal OC (the $w_G = 0$ and $w_D = 0$, or $\lambda_P = 0$ and $m = 0$, case) and the real OC are trapped in a fairly narrow band and cannot be too different.

Unfortunately, the situation is far less happy in the variables sampling context. The origin of the difficulty with admitting there is measurement error when it comes to variables acceptance sampling is the fundamental fact that standard variables plans attempt to treat all $(\mu, \sigma)$ pairs with the same value of $p$ equally.
And in short, once one admits to the possibility of measurement error clouding the evaluation of the quantity $x$ that must say whether a given item is conforming or nonconforming, that goal is unattainable. For any level of measurement error, there are $(\mu, \sigma)$ pairs (with very small $\sigma$) for which product variation can, so to speak, “hide in the measurement noise.” So some fairly bizarre real OC properties result for standard plans.

To illustrate, consider the case of “unknown $\sigma$” variables acceptance sampling with a lower specification $L$, and adopt the basic measurement model (2.1) of V&J for what is actually observed when an item with characteristic $x$ is measured. Now the development in §8.2 of V&J deals with a normal $(\mu, \sigma)$ distribution for observations. An important issue is “What observations?” Is it the $x$’s or the $y$’s of the model (2.1)? It must be the $x$’s, for the simple reason that $p$ is defined in terms of $\mu$ and $\sigma$. These parameters describe what the lot is really like, NOT what it looks like when measured with error. That is, the $\sigma$ of §8.2 of V&J must be the $\sigma_x$ of page 19 of V&J. But then the analysis of §8.2 is done essentially supposing that one has at his or her disposal $\bar{x}$ and $s_x$ to use for decision making purposes, while all that is really available are $\bar{y}$ and $s_y$!!! And that turns out to make a huge difference in the real OC properties of the standard method put forth in §8.2. That is, applying criterion (8.35) of V&J to what can really be observed (namely the noise-corrupted $y$’s), one accepts a lot if and only if

$$\bar{y} - L \ge k s_y. \qquad (5.15)$$

And under model (2.1) of V&J, a given set of parameters $(\mu_x, \sigma_x)$ for the $x$ distribution has corresponding fraction nonconforming

$$p(\mu_x, \sigma_x) = \Phi\left(\frac{L - \mu_x}{\sigma_x}\right)$$

and acceptance probability

$$Pa(\mu_x, \sigma_x, \beta, \sigma_{\text{measurement}}) = P\left[\frac{\bar{y} - L}{s_y} \ge k\right] = P\left[\frac{\dfrac{\bar{y}-\mu_y}{\sigma_y/\sqrt{n}} - \dfrac{L-\mu_y}{\sigma_y/\sqrt{n}}}{s_y/\sigma_y} \ge k\sqrt{n}\right],$$

where $\sigma_y$ is given in display (2.3) of V&J.
But then let

$$\Delta = -\frac{L - \mu_y}{\sigma_y/\sqrt{n}} = -\frac{(L - \mu_x)/\sigma_x - \beta/\sigma_x}{\sqrt{1 + \sigma^2_{\text{measurement}}/\sigma_x^2}\,\big/\sqrt{n}} \qquad (5.16)$$

and note that

$$\frac{\bar{y} - \mu_y}{\sigma_y/\sqrt{n}} \sim \text{Normal}(0,1)$$

independent of $s_y/\sigma_y$, which has the distribution of $\sqrt{U/(n-1)}$ for $U$ a $\chi^2_{n-1}$ random variable. That is, with $W$ a noncentral $t$ random variable with $n-1$ degrees of freedom and noncentrality parameter $\Delta$ given in display (5.16), we have

$$Pa(\mu_x, \sigma_x, \beta, \sigma_{\text{measurement}}) = P[W \ge k\sqrt{n}].$$

And the crux of the matter is that (even if measurement bias, $\beta$, is 0) $\Delta$ in display (5.16) is not a function of $(L - \mu_x)/\sigma_x$ alone unless one assumes that $\sigma_{\text{measurement}}$ is EXACTLY 0. Even with no measurement bias, if $\sigma_{\text{measurement}} \ne 0$ there are $(\mu_x, \sigma_x)$ pairs with

$$\frac{L - \mu_x}{\sigma_x} = z$$

(and therefore $p = \Phi(z)$) and $\Delta$ ranging all the way from $-z\sqrt{n}$ to 0. Thus considering $z \le 0$ and $p \le .5$ there are corresponding $Pa$’s ranging from

$$P[\text{a } t_{n-1} \text{ random variable} \ge k\sqrt{n}]$$

to

$$P[\text{a noncentral } t_{n-1}(-z\sqrt{n}) \text{ random variable} \ge k\sqrt{n}]$$

(the nominal OC), while considering $z \ge 0$ and $p \ge .5$ there are corresponding $Pa$’s ranging from (the nominal OC)

$$P[\text{a noncentral } t_{n-1}(-z\sqrt{n}) \text{ random variable} \ge k\sqrt{n}]$$

to

$$P[\text{a } t_{n-1} \text{ random variable} \ge k\sqrt{n}].$$

That is, one is confronted with the extremely unpleasant (and initially counterintuitive) picture of real OC indicated in Figure 5.6.

[Figure 5.6: Typical Real OC for a One-Sided Variables Acceptance Sampling Plan in the Presence of Nonzero Measurement Error (Pa(p) plotted against p)]

It is important to understand the picture painted in Figure 5.6. The situation is worse than in the attributes data case. There, if one knows the efficacy of the inspection methodology, it is at least possible to pick a single appropriate OC curve. (The OC “bands” indicated by displays (5.11) and (5.14) are created only by ignorance of inspection efficacy.)
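The contrast can be checked numerically. The Python sketch below is a minimal illustration under stated assumptions, not code from V&J: `pa_attributes` evaluates the perspective B formula (5.9), while `pa_variables` evaluates $P[W \ge k\sqrt{n}]$ with $W$ noncentral $t$ on $n-1$ degrees of freedom and noncentrality (5.16). Rather than rely on a statistics library, the noncentral $t$ tail is obtained by conditioning on the $\chi^2_{n-1}$ variable and integrating with Simpson’s rule (a swapped-in numerical method that assumes $n \ge 3$). The plan constants n = 5, k = 1.5, L = 0 and σ_measurement = 1 are hypothetical.

```python
from math import comb, erfc, exp, lgamma, log, sqrt


def pa_attributes(p, n, c, w_g, w_d):
    """Perspective B acceptance probability (5.9) with inspection errors."""
    p_star = w_g * (1 - p) + p * (1 - w_d)  # P[item is classified nonconforming]
    return sum(comb(n, x) * p_star**x * (1 - p_star) ** (n - x) for x in range(c + 1))


def chi2_density(u, nu):
    """Chi-square density with nu >= 2 degrees of freedom."""
    if u <= 0:
        return 0.5 if nu == 2 else 0.0
    return exp((nu / 2 - 1) * log(u) - u / 2 - (nu / 2) * log(2) - lgamma(nu / 2))


def noncentral_t_sf(t, nu, delta, steps=4000):
    """P[(Z + delta) / sqrt(U / nu) >= t], Z ~ N(0, 1) independent of U ~ chi^2_nu.

    Conditioning on U = u gives P[Z >= t * sqrt(u / nu) - delta]; that normal
    tail is integrated against the chi^2_nu density with Simpson's rule.
    """
    upper = nu + 40 * sqrt(2 * nu) + 10  # chi^2 mass beyond this is negligible
    h = upper / steps                    # steps must be even for Simpson's rule

    def integrand(u):
        tail = 0.5 * erfc((t * sqrt(u / nu) - delta) / sqrt(2))
        return tail * chi2_density(u, nu)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def pa_variables(mu_x, sigma_x, L, n, k, sigma_meas, beta=0.0):
    """Real acceptance probability of criterion (5.15) under model (2.1)."""
    ratio = sqrt(1 + (sigma_meas / sigma_x) ** 2)
    delta = -((L - mu_x) / sigma_x - beta / sigma_x) * sqrt(n) / ratio  # (5.16)
    return noncentral_t_sf(k * sqrt(n), n - 1, delta)


# Attributes: with known w_G and w_D the real OC is still one number per p.
print(pa_attributes(0.05, 50, 2, 0.0, 0.0), pa_attributes(0.05, 50, 2, 0.01, 0.1))

# Variables: hypothetical plan n = 5, k = 1.5, lower spec L = 0, sigma_meas = 1.
# Every lot below has z = (L - mu_x) / sigma_x = -2, hence the same p = Phi(-2).
n, k, L, z, sigma_meas = 5, 1.5, 0.0, -2.0, 1.0
for sigma_x in (100.0, 1.0, 0.1):
    mu_x = L - z * sigma_x
    print(sigma_x, round(pa_variables(mu_x, sigma_x, L, n, k, sigma_meas), 4))
```

All three lots in the variables loop share the same p = Φ(−2), yet the printed acceptance probabilities fall steadily as σ_x shrinks toward the measurement noise, which is the vertical spread sketched in Figure 5.6.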
The bizarre “OC bands” created in the variables context (and sketched in Figure 5.6) do not reduce to curves if one knows the inspection bias and precision, but rather are intrinsic to the fact that unless $\sigma_{\text{measurement}}$ is exactly 0, different $(\mu, \sigma)$ pairs with the same $p$ must have different $Pa$’s under acceptance criterion (5.15). And the only way that one can replace the situation pictured in Figure 5.6 with one having a thinner and more palatable OC band (something approximating a “curve”) is by guaranteeing that

$$\frac{\sigma_x^2}{\sigma^2_{\text{measurement}}}$$

is of some appreciable size. That is, given a particular measurement precision, one must agree to concern oneself only with cases where product variation cannot hide in measurement noise. Such is the only way that one can even come close to the variables sampling goal of treating $(\mu, \sigma)$ pairs with the same $p$ equally.

5.3 Some Details Concerning the Economic Analysis of Sampling Inspection

Section 8.5 of V&J alludes briefly to the possibility of using economic/decision-theoretic arguments in the choice of sampling inspection schemes and cites the 1994 Technometrics paper of Vander Wiel and Vardeman. Our first objective in this section is to provide some additional details of the Vander Wiel and Vardeman analysis. To that end, consider a stable process fraction nonconforming situation and continue the $w_G$ and $w_D$ notation used above (and also introduced on page 493 of V&J). Note that Table 5.2 remains an appropriate description of the results of a single inspection. We will suppose that inspection costs are accrued on a per item basis and adopt the notation of Table 8.16 of V&J for the costs.

As a vehicle to a very quick demonstration of the famous “all or none” principle, consider facing $N$ potential inspections and employing a “random inspection policy” that inspects each item independently with probability $\pi$. Then the mean cost suffered over $N$ items is simply $N$ times that suffered for 1 item.
And this is

$$E\,\text{Cost} = \pi\left(k_I + (1-p)w_G k_{GF} + p(1-w_D)k_{DF} + p w_D k_{DP}\right) + (1-\pi)p k_{DU} = \pi(k_I + w_G k_{GF} - pK) + p k_{DU} \qquad (5.17)$$

for

$$K = (1-w_D)(k_{DU} - k_{DF}) + w_D(k_{DU} - k_{DP}) + w_G k_{GF}$$

(as in display (8.50) of V&J). Now it is clear from display (5.17) that if $K < 0$, $E\,\text{Cost}$ is minimized over choices of $\pi$ by the choice $\pi = 0$. On the other hand, if $K > 0$, $E\,\text{Cost}$ is minimized over choices of $\pi$ by the choice $\pi = 0$ if

$$p \le \frac{k_I + w_G k_{GF}}{K}$$

and by the choice $\pi = 1$ if

$$p \ge \frac{k_I + w_G k_{GF}}{K}.$$

That is, if one defines

$$p_c = \begin{cases} 1 & \text{if } K \le 0 \\ \dfrac{k_I + w_G k_{GF}}{K} & \text{if } K > 0 \end{cases}$$

then an optimal random inspection policy is clearly $\pi = 0$ (do no inspection) if $p < p_c$ and $\pi = 1$ (inspect everything) if $p > p_c$.

This development is simple and completely typical of what one gets from economic analyses of stable process (perspective B) inspection scenarios. Where quality is poor, all items should be inspected, and where it is good none should be inspected. Vander Wiel and Vardeman argue that the specific criterion developed here (and phrased in terms of $p_c$) holds not only as one looks for an optimal random inspection policy, but completely generally as one looks among all possible inspection policies for one that minimizes expected total cost. But it is essential to remember that the context is a stable process/perspective B context, where costs are accrued on a per item basis, and in order to implement the optimal policy one must know $p$! In other contexts, the best (minimum expected cost) implementable/realizable policy will often turn out not to be of the “all or none” variety. The remainder of this section will elaborate on this assertion.

For the balance of the section we will consider (Barlow’s formulation of) what we’ll call the “Deming Inspection Problem” (as Deming’s consideration of this problem rekindled interest in these matters and engendered considerable controversy and confusion in the 1980s and early 1990s).
That is, we’ll consider a lot of $N$ items, assume a cost structure where

$k_1$ = the cost of inspecting one item (at the proposed inspection site)

and

$k_2$ = the cost of later grief caused by a defective item that is not detected,

and suppose that inspection is without error. (This is the Vander Wiel and Vardeman cost structure with $k_I = k_1$, $k_{DF} = 0$ and $k_{DU} = k_2$, where both $w_G$ and $w_D$ are assumed to be 0.) The objective will be optimal (minimum expected cost) choice of a “fixed $n$ inspection plan” (in the language of §8.1 of V&J, a single sampling with rectification plan). That is, we’ll consider the optimal choice of $n$ and $c$ supposing that with

$X$ = the number nonconforming in a sample of $n$,

if $X \le c$ the lot will be “accepted” (all nonconforming items in the sample will be replaced with good ones and no more inspection will be done), while if $X > c$ the lot will be “rejected” (all items in the lot will be inspected and all nonconforming items replaced with good ones). (The implicit assumption here is that replacements for nonconforming items are somehow known to be conforming and are produced “for free.”) And we will continue use of the stable process or perspective B model for the generation of the items in the lot.

In this problem, the expected total cost associated with the lot is a function of $n$, $c$ and $p$,

$$ETC(n,c,p) = k_1 n + (1 - Pa(n,c,p))k_1(N-n) + p\,Pa(n,c,p)k_2(N-n) = k_1 N\left(1 + Pa(n,c,p)\left(1 - \frac{n}{N}\right)\left(p\frac{k_2}{k_1} - 1\right)\right). \qquad (5.18)$$

Optimal choice of $n$ and $c$ requires that one be in the business of comparing the functions of $p$ defined in display (5.18). How one approaches that comparison depends upon what one is willing to input into the decision process in terms of information about $p$.

First, if $p$ is fixed/known and available for use in choosing $n$ and $c$, the optimization of criterion (5.18) is completely straightforward. It amounts only to the comparison of numbers (one for each $(n,c)$ pair), not functions. And the solution is quite simple. In the case that $p > k_1/k_2$, $\left(p\frac{k_2}{k_1} - 1\right) > 0$ and from examination of display (5.18), minimum expected total cost will be achieved if $Pa(n,c,p) = 0$ or if $\left(1 - \frac{n}{N}\right) = 0$. That is, “all” is optimal. In the case that $p < k_1/k_2$, $\left(p\frac{k_2}{k_1} - 1\right) < 0$ and from examination of formula (5.18), minimum expected total cost will be achieved if $Pa(n,c,p) = 1$ and $\left(1 - \frac{n}{N}\right) = 1$. That is, “none” is optimal.

This is a manifestation of the general Vander Wiel and Vardeman result. For known $p$ in this kind of problem, sampling/partial inspection makes no sense. One is not going to learn anything about $p$ from the sampling. Simple economics (comparison of $p$ to the critical cost ratio $k_1/k_2$) determines whether it is best to inspect and rectify, or to “take one’s lumps” in later costs.

When one may not assume that $p$ is fixed/known (and it is thus unavailable for use in choosing an optimal $(n,c)$ pair), some other approach has to be taken. One possibility is to describe $p$ with a probability distribution $G$, average $ETC(n,c,p)$ over $p$ according to that distribution to get $E_G ETC(n,c)$, and then to compare numbers (one for each $(n,c)$ pair) to identify an optimal inspection plan. This makes sense

1. from a Bayesian point of view, where the distribution $G$ reflects one’s “prior beliefs” about $p$, or
2. from a non-Bayesian point of view, where the distribution $G$ is a “process distribution” describing how $p$ is thought to vary lot to lot.

The program SAMPLE (written by Tom Lorenzen and modified slightly by Steve Crowder) available off the Stat 531 Web page will do this averaging and optimization for the case where $G$ is a Beta distribution.

Consider what insights into this “average out according to $G$” idea can be written down in more or less explicit form. In particular, consider first the problem of choosing a best $c$ for a particular $n$, say $c^{opt}_G(n)$.
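For known $p$ the all-or-none conclusion can be confirmed by brute force over all $(n, c)$ pairs, and the same cost function can be averaged over a distribution $G$ as just described. The Python sketch below is a hypothetical illustration, not Lorenzen’s SAMPLE program (which, as noted later, uses a different approach); the lot size N = 50, the costs k1 = 1 and k2 = 10, and the Beta(2, 38) choice of G are made up. The average over G uses a simple midpoint rule, assuming α, β ≥ 1 so that the Beta density is bounded.

```python
from math import comb, exp, lgamma, log


def pa(n, c, p):
    """P[X <= c] for X ~ Binomial(n, p); equals 1 when n = 0."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(c + 1))


def etc(n, c, p, N, k1, k2):
    """Expected total cost (5.18) of the fixed-n plan (n, c) with rectification."""
    a = pa(n, c, p)
    return k1 * n + (1 - a) * k1 * (N - n) + p * a * k2 * (N - n)


def best_plan(cost, N):
    """Minimize cost(n, c) over 0 <= c <= n <= N (ties: first pair found)."""
    pairs = ((n, c) for n in range(N + 1) for c in range(n + 1))
    return min(pairs, key=lambda nc: cost(*nc))


def beta_average(f, alpha, beta, grid=4000):
    """Midpoint-rule approximation of E_G f(p) for G = Beta(alpha, beta)."""
    log_b = lgamma(alpha) + lgamma(beta) - lgamma(alpha + beta)
    total = 0.0
    for i in range(grid):
        p = (i + 0.5) / grid
        dens = exp((alpha - 1) * log(p) + (beta - 1) * log(1 - p) - log_b)
        total += f(p) * dens
    return total / grid


N, k1, k2 = 50, 1.0, 10.0  # hypothetical lot size and unit costs; k1/k2 = 0.1

# Known p: one value on each side of the critical cost ratio k1/k2.
for p in (0.05, 0.2):
    n_opt, c_opt = best_plan(lambda n, c: etc(n, c, p, N, k1, k2), N)
    print(f"p = {p}: n = {n_opt}, c = {c_opt}, ETC = {etc(n_opt, c_opt, p, N, k1, k2):.3f}")

# Unknown p described by a hypothetical G = Beta(2, 38), whose mean is 0.05.
for n, c in ((0, 0), (10, 0), (N, 0)):
    avg = beta_average(lambda p: etc(n, c, p, N, k1, k2), 2, 38)
    print(f"E_G ETC({n}, {c}) = {avg:.3f}")
```

With p = .05 < k1/k2 the search returns n = 0 (“none”, expected cost 25), and with p = .2 > k1/k2 it returns n = N (“all”, expected cost k1 N = 50), in line with the analysis above. Under G the “none” and “all” plans cost, up to quadrature error, 25 and 50; whether a sampling plan beats both depends on how G straddles k1/k2.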
Note that if a sample of $n$ results in $x$ nonconforming items, the (conditional) expected cost incurred is

$$nk_1 + (N-n)k_2 E_G[p \mid X = x] \quad \text{with no more inspection}$$

and

$$Nk_1 \quad \text{if the remainder of the lot is inspected}.$$

(Note that the form of the conditional mean of $p$ given $X = x$ depends upon the distribution $G$.) So, one should do no more inspection if

$$nk_1 + (N-n)k_2 E_G[p \mid X = x] < Nk_1,$$

i.e. if

$$E_G[p \mid X = x] < \frac{k_1}{k_2},$$

and the remaining items should be inspected if

$$E_G[p \mid X = x] > \frac{k_1}{k_2}.$$

So, an optimal choice of $c$ is

$$c^{opt}_G(n) = \max\left\{x \;\middle|\; E_G[p \mid X = x] \le \frac{k_1}{k_2}\right\}. \qquad (5.19)$$

(And it is perhaps comforting to know that the monotone likelihood ratio property of the binomial distribution guarantees that $E_G[p \mid X = x]$ is monotone in $x$.)

What is this saying? The assumptions 1) that $p \sim G$ and 2) that conditional on $p$ the variable $X \sim \text{Binomial}(n,p)$ together give a joint distribution for $p$ and $X$. This in turn can be used to produce for each $x$ a conditional distribution of $p \mid X = x$ and therefore a conditional mean value of $p$ given that $X = x$. The prescription (5.19) says that one should find the largest $x$ for which that conditional mean value of $p$ is still less than the critical cost ratio and use that value for $c^{opt}_G(n)$. To complete the optimization of $E_G ETC(n,c,p)$, one would then need to compute and compare (for various $n$) the quantities

$$E_G ETC(n, c^{opt}_G(n), p). \qquad (5.20)$$

The fact is that depending upon the nature of $G$, the minimizer of quantity (5.20) can turn out to be anything from 0 to $N$. For example, if $G$ puts all its probability on one side or the other of $k_1/k_2$, then the conditional distributions of $p$ given $X = x$ must concentrate all their probability (and therefore have their means) on that same side of the critical cost ratio.
So it follows that if $G$ puts all its probability to the left of $k_1/k_2$, “none” is optimal (even though one doesn’t know $p$ exactly), while if $G$ puts all its probability to the right of $k_1/k_2$, “all” is optimal in terms of optimizing $E_G ETC(n,c,p)$.

On the other hand, consider an unrealistic but instructive situation where $k_1 = 1$, $k_2 = 1000$ and $G$ places probability $\frac{1}{2}$ on the possibility that $p = 0$ and probability $\frac{1}{2}$ on the possibility that $p = 1$. Under this model the lot is either perfectly good or perfectly bad, and a priori one thinks these possibilities are equally likely. Here the distribution $G$ places probability on both sides of the breakeven quantity $k_1/k_2 = .001$. Even without actually carrying through the whole mathematical analysis, it should be clear that in this scenario the optimal $n$ is 1! Once one has inspected a single item, he or she knows for sure whether $p$ is 0 or is 1 (and the lot can be rectified in the latter case).

The most common mathematically nontrivial version of this whole analysis of the Deming Inspection Problem is the case where $G$ is a Beta distribution. If $G$ is the Beta$(\alpha, \beta)$ distribution,

$$E_G[p \mid X = x] = \frac{\alpha + x}{\alpha + \beta + n}$$

so that $c^{opt}_G(n)$ is the largest value of $x$ such that

$$\frac{\alpha + x}{\alpha + \beta + n} \le \frac{k_1}{k_2}.$$

That is, in this situation, for $\lfloor y \rfloor$ the greatest integer in $y$,

$$c^{opt}_G(n) = \left\lfloor \frac{k_1}{k_2}(\alpha + \beta + n) - \alpha \right\rfloor = \left\lfloor \frac{k_1}{k_2}n - \alpha + \frac{k_1}{k_2}(\alpha + \beta) \right\rfloor,$$

which for large $n$ is essentially $\frac{k_1}{k_2}n$. The optimal value of $n$ can then be found by optimizing (over choice of $n$) the quantity

$$E_G ETC(n, c^{opt}_G(n), p) = \int_0^1 ETC(n, c^{opt}_G(n), p)\,\frac{1}{B(\alpha, \beta)}p^{\alpha-1}(1-p)^{\beta-1}\,dp.$$

The reader can check that this exercise boils down to the minimization over $n$ of

$$\left(1 - \frac{n}{N}\right)\int_0^1\left(\sum_{x=0}^{c^{opt}_G(n)}\binom{n}{x}p^x(1-p)^{n-x}\right)\left(p\frac{k_2}{k_1} - 1\right)p^{\alpha-1}(1-p)^{\beta-1}\,dp.$$

(The SAMPLE program of Lorenzen alluded to earlier actually uses a different approach than the one discussed here to find optimal plans.
That approach is computationally more efficient, but not as illuminating in terms of laying bare the basic structure of the problem as the route taken in this exposition.)

As two final pieces of perspective on this topic of economic analysis of sampling inspection we offer the following. In the first place, while the Deming Inspection Problem is not a terribly general formulation of the topic, the results here are typical of how things turn out. Second, it needs to be remembered that what has been described here is the finding of a cost-optimal fixed $n$ inspection plan. The problem of finding a plan optimal among all possible plans (of the type discussed in §5.1) is a more challenging one. For $G$ placing probability on both sides of the critical cost ratio, not only need it not be the case that “all” or “none” is optimal, but in general an optimal plan need not be of the fixed $n$ variety. While in principle the methodology for finding an overall best inspection plan is well established (involving as it does so-called “dynamic programming” or “backwards induction”), the details are unpleasant enough that it will not make sense to pursue this matter further.

Chapter 6 Problems

1 Measurement and Statistics

1.1. Suppose that a sample variance $s^2$ is based on a sample of size $n$ from a normal distribution. One might consider estimating $\sigma$ using $s$ or $s/c_4(n)$, or even some other multiple of $s$.

(a) Since $c_4(n) < 1$, the second of these estimators has a larger variance than the first. But the second is unbiased (has expected value $\sigma$) while the first is not. Which has the smaller mean squared error, $E(\hat{\sigma} - \sigma)^2$? Note that (as is standard in statistical theory), $E(\hat{\sigma} - \sigma)^2 = \text{Var}\,\hat{\sigma} + (E\hat{\sigma} - \sigma)^2$. (Mean squared error is variance plus squared bias.)

(b) What is an optimal (in terms of minimum mean squared error) multiple of $s$ to use in estimating $\sigma$?

1.2. How do $R/d_2(n)$ and $s/c_4(n)$ compare (in terms of mean squared error) as estimators of $\sigma$?
(The assumption here is that they are both based on a sample from a normal distribution. See Problem 1.1 for a definition of mean squared error.)

1.3. Suppose that sample variances $s_i^2$, $i = 1, 2, \ldots, r$ are based on independent samples of size $m$ from normal distributions with a common standard deviation, $\sigma$. A common SQC-inspired estimator of $\sigma$ is $\bar{s}/c_4(m)$. Another possibility is

$$s_{\text{pooled}} = \sqrt{\frac{s_1^2 + \cdots + s_r^2}{r}}$$

or

$$\hat{\sigma} = s_{\text{pooled}}/c_4((m-1)r + 1).$$

Standard distribution theory says that $r(m-1)s^2_{\text{pooled}}/\sigma^2$ has a $\chi^2$ distribution with $r(m-1)$ degrees of freedom.

(a) Compare $\bar{s}/c_4(m)$, $s_{\text{pooled}}$ and $\hat{\sigma}$ in terms of mean squared error.

(b) What is an optimal multiple of $s_{\text{pooled}}$ (in terms of mean squared error) to use in estimating $\sigma$?

(Note: See Vardeman (1999 IIE Transactions) for a complete treatment of the issues raised in Problems 1.1 through 1.3.)

1.4. Set up a double integral that gives the probability that the sample range of $n$ standard normal random variables is between .5 and 2.0. How is this probability related to the probability that the sample range of $n$ iid normal $(\mu, \sigma^2)$ random variables is between $.5\sigma$ and $2.0\sigma$?

1.5. It is often helpful to state “standard errors” (estimated standard deviations) corresponding to point estimates of quantities of interest. In a context where a standard deviation, $\sigma$, is to be estimated by $\bar{R}/d_2(n)$ based on $r$ samples of size $n$, what is a reasonable standard error to announce? (Be sure that your answer is computable from sample data, i.e. doesn’t involve any unknown process parameters.)

1.6. Consider the paper weight data in Problem (2.12) of V&J. Assume that the 2-way random effects model is appropriate and do the following.

(a) Compute the $\bar{y}_{ij}$, $s_{ij}$ and $R_{ij}$ for all $I \times J = 2 \times 5 = 10$ Piece $\times$ Operator combinations. Then compute both row ranges of means $\Delta_i$ and row sample variances of means $s_i^2$.
(b) Find both range-based and sample variance-based point estimates of the repeatability standard deviation, $\sigma$.

(c) Find both range-based and sample variance-based point estimates of the reproducibility standard deviation $\sigma_{\text{reproducibility}} = \sqrt{\sigma_\beta^2 + \sigma_{\alpha\beta}^2}$.

(d) Get a statistical package to give you the 2-way ANOVA table for these data. Verify that $s^2_{\text{pooled}} = MSE$ and that your sample variance-based estimate of $\sigma_{\text{reproducibility}}$ from part (c) is

$$\sqrt{\max\left(0, \frac{1}{mI}MSB + \frac{I-1}{mI}MSAB - \frac{1}{m}MSE\right)}.$$

(e) Find a 90% two-sided confidence interval for the parameter $\sigma$.

(f) Use the material in §1.5 and give an approximate 90% two-sided confidence interval for $\sigma_{\text{reproducibility}}$.

(g) Find a linear combination of the mean squares from (d) whose expected value is $\sigma^2_{\text{overall}} = \sigma^2_{\text{reproducibility}} + \sigma^2$. All the coefficients in your linear combination will be positive. In this case, you may use the next to last paragraph of §1.5 to come up with an approximate 90% two-sided confidence interval for $\sigma_{\text{overall}}$. Do so.

(h) The problem from which the paper weight data are drawn indicates that specifications of approximately $\pm 4$ g/m² are common for paper of the type used in this gage study. These translate to specifications of about $\pm .16$ g for pieces of paper of the size used here. Use these specifications and your answer to part (g) to make an approximate 90% confidence interval for the gage capability ratio

$$GCR = \frac{6\sigma_{\text{overall}}}{U - L}.$$

Used in the way it was in this study, does the scale seem adequate to check conformance to such specifications?

(i) Give (any sensible) point estimates of the fractions of the overall measurement variance attributable to repeatability and to reproducibility.

1.7. In a particular (real) thorium detection problem, measurement variation for a particular (spectral absorption) instrument was thought to be about $\sigma_{\text{measurement}} = .002$ instrument units. (Division of a measurement expressed in instrument units by 58.2 gave values in g/l.)
Suppose that in an environmental study, a field sample is to be measured once (producing $y_{\text{new}}$) on this instrument and the result is to be compared to a (contemporaneous) measurement of a lab “blank” (producing $y_{\text{old}}$). If the field reading exceeds the blank reading by too much, there will be a declaration that there is a detectable excess amount of thorium present.

(a) Assuming that measurements are normal, find a critical value $L_c$ so that the lab will run no more than a 5% chance of a “false positive” result.

(b) Based on your answer to (a), what is a “lower limit of detection,” $L_d$, for a 90% probability ($\gamma$) of correctly detecting excess thorium? What, by the way, is this limit in terms of g/l?

1.8. Below are 4 hypothetical samples of size $n = 3$. A little calculation shows that ignoring the fact that there are 4 samples and simply computing “$s$” based on 12 observations will produce a “standard deviation” much larger than $s_{\text{pooled}}$. Why is this?

3, 6, 5
4, 3, 1
8, 9, 6
2, 1, 4

1.9. In applying ANOVA methods to gage R&R studies, one often uses linear combinations of independent mean squares as estimators of their expected values. Section 1.5 of these notes shows it is possible to also produce standard errors (estimated standard deviations) for these linear combinations. Suppose that $MS_1, MS_2, \ldots, MS_k$ are independent random variables, $\frac{\nu_i MS_i}{E MS_i} \sim \chi^2_{\nu_i}$. Consider the random variable

$$U = c_1 MS_1 + c_2 MS_2 + \cdots + c_k MS_k.$$

(a) Find the standard deviation of $U$.

(b) Your expression from (a) should involve the means $E MS_i$, which in applications will be unknown. Propose a sensible (data-based) estimator of the standard deviation of $U$ that does not involve these quantities.

(c) Apply your result from (b) to give a sensible standard error for the ANOVA-based estimators of $\sigma^2$, $\sigma^2_{\text{reproducibility}}$ and $\sigma^2_{\text{overall}}$.

1.10. Section 1.7 of the notes presents “rounded data” likelihood methods for normal data with the 2 parameters $\mu$ and $\sigma$.
The same kind of thing can be done for other families of distributions (which can have other numbers of parameters). For example, the exponential distributions with means $\theta^{-1}$ can be used. (Here there is the single parameter $\theta$.) These exponential distributions have cdfs

$$F_\theta(x) = \begin{cases} 1 - \exp(-\theta x) & \text{for } x \ge 0 \\ 0 & \text{for } x < 0. \end{cases}$$

Below is a frequency table for twenty exponential observations that have been rounded to the nearest integer.

rounded value   0   1   2   3   4
frequency       7   8   2   2   1

(a) Write out an expression for the appropriate “rounded data log likelihood function” for this problem,

$$\mathcal{L}(\theta) = \ln L(\text{data} \mid \theta).$$

(You should be slightly careful here. Exponential random variables only take values in the interval $(0, \infty)$.)

(b) Make a plot of $\mathcal{L}(\theta)$. Use it and identify the maximum likelihood estimate of $\theta$ based on the rounded data.

(c) Use the plot from (b) and make an approximate 90% confidence interval for $\theta$. (The appropriate $\chi^2$ value has 1 associated degree of freedom.)

1.11. Below are values of a critical dimension (in .0001 inch above nominal) measured on hourly samples of size $n = 5$ precision metal parts taken from the output of a CNC (computer numerically controlled) lathe.

sample   measurements
1        4, 3, 3, 2, 3
2        2, 2, 3, 3, 2
3        4, 1, 0, -1, 0
4        2, 0, 2, 1, 4
5        2, 2, 1, 3, 4
6        2, -2, 2, 1, 2
7        0, 0, 0, 2, 0
8        1, -1, 2, 0, 2

(a) Compute for each of these samples the “raw” sample standard deviation (ignoring rounding) and the “Sheppard’s correction” standard deviation that is appropriate for integer rounded data. How do these compare for the eight samples above?

(b) For each of the samples that have a range of at least 2, use the CONEST program to find “rounded normal data” maximum likelihood estimates of the normal parameters $\mu$ and $\sigma$. The program as written accepts observations $\ge 1$, so you will need to add an integer to each element of some of the samples above before doing calculation with the program. (I don’t remember, but you may not be able to input a standard deviation of exactly 0 either.) How do the maximum likelihood estimates of $\mu$ compare to $\bar{x}$ values? How do the maximum likelihood estimates of $\sigma$ compare to both the raw standard deviations and to the results of applying “Sheppard’s correction”?

(c) Consider sample #2. Make 95% and 90% confidence intervals for both $\mu$ and $\sigma$ using the work of Johnson Lee.

(d) Consider sample #1. Use the CONEST program to get a few approximate values for $\mathcal{L}^*(\mu)$ and some approximate values for $\mathcal{L}^{**}(\sigma)$. (For example, look at a contour plot of $\mathcal{L}$ over a narrow range of means near $\mu$ to get an approximate value for $\mathcal{L}^*(\mu)$.) Sketch $\mathcal{L}^*(\mu)$ and $\mathcal{L}^{**}(\sigma)$ and use your sketches and Lee’s tables to produce 95% confidence intervals for $\mu$ and $\sigma$.

(e) What 95% confidence intervals for $\mu$ and $\sigma$ would result from a 9th sample, $\{2, 2, 2, 2, 2\}$?

1.12. A single operator measures a single widget diameter 15 times and obtains a range of $R = 3 \times 10^{-4}$ inches. Then this person measures the diameters of 12 different widgets once each and obtains a range of $R = 8 \times 10^{-4}$ inches. Give an estimated standard deviation of widget diameters (not including measurement error).

1.13. Cylinders of (outside) diameter $O$ must fit in ring bearings of (inside) diameter $I$, producing clearance $C = I - O$. We would like to have some idea of the variability in actual clearances that will be obtained by “random assembly” of cylinders produced on one production line with ring bearings produced on another. The gages used to measure $I$ and $O$ are (naturally enough) different. In a study using a single gage to measure outside diameters of cylinders, $n_O = 10$ different cylinders were measured once each, producing a sample standard deviation $s_O = .001$ inch. In a subsequent study, this same gage was used to measure the outside diameter of an additional cylinder $m_O = 5$ times, producing a sample standard deviation $s_{O\,\text{gage}} = .0005$ inch.
In a study using a single gage to measure inside diameters of ring bearings, $n_I = 20$ different inside diameters were measured once each, producing a sample standard deviation $s_I = .003$ inch. In a subsequent study, this same gage was used to measure the inside diameter of another ring bearing $m_I = 10$ times, producing a sample standard deviation $s_{I\text{gage}} = .001$ inch.
(a) Give a sensible (point) estimate of the standard deviation of $C$ produced under random assembly.
(b) Find a sensible standard error for your estimate in (a).
2 Process Monitoring
Methods
2.1. Consider the following hypothetical situation. A "variables" process monitoring scheme is to be set up for a production line, and two different measuring devices are available for data gathering purposes. Device A produces precise and expensive measurements and device B produces less precise and less expensive measurements. Let $\sigma_{\text{measurement}}$ for the two devices be respectively $\sigma_A$ and $\sigma_B$, and suppose that the target for a particular critical diameter for widgets produced on the line is 200.0.
(a) A single widget produced on the line is measured $n = 10$ times with each device, giving $R_A = 2.0$ and $R_B = 5.0$. Give estimates of $\sigma_A$ and $\sigma_B$.
(b) Explain why it would not be appropriate to use one of your estimates from (a) as a "$\sigma$" for setting up an $\bar{x}$ and $R$ chart pair for monitoring the process based on measurements from one of the devices.
Using device A, 10 consecutive widgets produced on the line (under presumably stable conditions) have (single) measurements with $R = 8.0$.
(c) Set up reasonable control limits for both $\bar{x}$ and $R$ for the future monitoring of the process based on samples of size $n = 10$ and measurements from device A.
(d) Combining the information above about the A measurements on 10 consecutive widgets with your answer to (a), under a model that says
observed diameter = real diameter + measurement error
where "real diameter" and "measurement error" are independent, give an estimate of the standard deviation of the real diameters. (See the discussion around page 19 of V&J.)
(e) Based on your answers to parts (a) and (d), set up reasonable control limits for both $\bar{x}$ and $R$ for the future monitoring of the process based on samples of size $n = 5$ and measurements from the cheaper device, device B.
2.2. The following are some data taken from a larger set in Statistical Quality Control by Grant and Leavenworth, giving the drained weights (in ounces) of contents of size No. 2½ cans of standard grade tomatoes in puree. Twenty samples of three cans taken from a canning process at regular intervals are represented.
Sample   $x_1$   $x_2$   $x_3$
1        22.0    22.5    22.5
2        20.5    22.5    22.5
3        20.0    20.5    23.0
4        21.0    22.0    22.0
5        22.5    19.5    22.5
6        23.0    23.5    21.0
7        19.0    20.0    22.0
8        21.5    20.5    19.0
9        21.0    22.5    20.0
10       21.5    23.0    22.0
11       20.0    19.5    21.0
12       19.0    21.0    21.0
13       19.5    20.5    21.0
14       20.0    21.5    24.0
15       22.5    19.5    21.0
16       21.5    20.5    22.0
17       19.0    21.5    23.0
18       21.0    20.5    19.5
19       20.0    23.5    24.0
20       22.0    20.5    21.0
(a) Suppose that standard values for the process mean and standard deviation of drained weights ($\mu$ and $\sigma$) in this canning plant are 21.0 oz and 1.0 oz respectively. Make and interpret standards given $\bar{x}$ and $R$ charts based on these samples. What do these charts indicate about the behavior of the filling process over the time period represented by these data?
(b) As an alternative to the standards given range chart made in part (a), make a standards given $s$ chart based on the 20 samples. How does its appearance compare to that of the $R$ chart?
Now suppose that no standard values for $\mu$ and $\sigma$ have been provided.
(c) Find one estimate of $\sigma$ for the filling process based on the average of the 20 sample ranges, $\bar{R}$, and another based on the average of the 20 sample standard deviations, $\bar{s}$.
(d) Use $\bar{\bar{x}}$ and your estimate of $\sigma$ based on $\bar{R}$ and make retrospective control charts for $\bar{x}$ and $R$. What do these indicate about the stability of the filling process over the time period represented by these data?
(e) Use $\bar{\bar{x}}$ and your estimate of $\sigma$ based on $\bar{s}$ and make retrospective control charts for $\bar{x}$ and $s$. How do these compare in appearance to the retrospective charts for process mean and variability made in part (d)?
2.3. The accompanying data are taken from Statistical Quality Control Methods by I.W. Burr, giving the numbers of beverage cans found to be defective in periodic samples of 312 cans at a bottling facility.
Sample       1   2   3   4   5   6   7   8   9   10
Defectives   6   7   5   7   5   5   4   5   12  6
Sample       11  12  13  14  15  16  17  18  19  20
Defectives   7   7   6   6   6   6   23  10  8   5
(a) Suppose that company standards are that on average $p = .02$ of the cans are defective. Use this value and make a standards given $p$ chart based on the data above. Does it appear that the process fraction defective was stable at the $p = .02$ value over the period represented by these data?
(b) Make a retrospective $p$ chart for these data. What is indicated by this chart about the stability of the canning process?
2.4. Modern business pressures are making standards for fractions nonconforming in the range of $10^{-4}$ to $10^{-6}$ not uncommon.
(a) What are standards given $3\sigma$ control limits for a $p$ chart with standard fraction nonconforming $10^{-4}$ and sample size 100? What is the all-OK ARL for this scheme?
(b) If $p$ becomes twice the standard value (of $10^{-4}$), what is the ARL for the scheme from (a)? (Use your answer to (a) and the binomial distribution for $n = 100$ and $p = 2 \times 10^{-4}$.)
(c) What do (a) and (b) suggest about the feasibility of doing process monitoring for very small fractions defective based on attributes data?
2.5.
Suppose that a dimension of parts produced on a certain machine over a short period can be thought of as normally distributed with some mean $\mu$ and standard deviation $\sigma = .005$ inch. Suppose further that values of this dimension more than .0098 inch from the 1.000 inch nominal value are considered nonconforming. Finally, suppose that hourly samples of 10 of these parts are to be taken.
(a) If $\mu$ is exactly on target (i.e. $\mu = 1.000$ inch), about what fraction of parts will be nonconforming? Is it possible for the fraction nonconforming to ever be any less than this figure?
(b) One could use a $p$ chart based on $n = 10$ to monitor process performance in this situation. What would be standards given 3 sigma control limits for the $p$ chart, using your answer from part (a) as the standard value of $p$?
(c) What is the probability that a particular sample of $n = 10$ parts will produce an out-of-control signal on the chart from (b) if $\mu$ remains at its standard value of $\mu = 1.000$ inch? How does this compare to the same probability for a 3 sigma $\bar{x}$ chart for an $n = 10$ setup with a center line at 1.000? (For the $p$ chart, use a binomial probability calculation. For the $\bar{x}$ chart, use the facts that $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.) What are the ARLs of the monitoring schemes under these conditions?
(d) Compare the probability that a particular sample of $n = 10$ parts will produce an out-of-control signal on the $p$ chart from (b) to the probability that the sample will produce an out-of-control signal on the ($n = 10$) 3 sigma $\bar{x}$ chart first mentioned in (c), supposing that in fact $\mu = 1.005$ inch. What are the ARLs of the monitoring schemes under these conditions? What moral is told by your calculations here and in part (c)?
2.6. The article "High Tech, High Touch," by J. Ryan, that appeared in Quality Progress in 1987 discusses the quality enhancement processes used by Martin Marietta in the production of the space shuttle external (liquid oxygen) fuel tanks.
It includes a graph giving counts of major hardware nonconformities for each of 41 tanks produced. The accompanying data are approximate counts read from that graph for the last 35 tanks. (The first six tanks were of a different design than the others and are thus not included here.)
Nonconformities, in tank order:
Tanks 1-9:    537, 463, 417, 370, 333, 241, 194, 185, 204
Tanks 10-18:  185, 167, 157, 139, 130, 130, 267, 102, 130
Tanks 19-27:  157, 120, 148, 65, 130, 111, 65, 74, 65
Tanks 28-35:  148, 74, 65, 139, 213, 222, 93, 194
(a) Make a retrospective $c$ chart for these data. Is there evidence of real quality improvement in this series of counts of nonconformities? Explain.
(b) Consider only the last 17 tanks represented above. Does it appear that quality was stable over the production period represented by these tanks? (Make another retrospective $c$ chart.)
(c) It is possible that some of the figures read from the graph in the original article may differ from the real figures by as much as, say, 15 nonconformities. Would this measurement error account for the apparent lack of stability you found in (a) or (b) above? Explain.
2.7. Boulaevskaia, Fair and Seniva did a study of "defect detection rates" for the visual inspection of some glass vials. Vials known to be visually identifiable as defective were marked with invisible ink, placed among other vials, and run through a visual inspection process at 10 different time periods. The numbers of marked defective vials that were detected/captured, the numbers placed into the inspection process, and the corresponding ratios for the 10 periods are below ($X$ = number detected/captured, $n$ = number placed).
Period   X    n    X/n
1        6    30   .2
2        10   30   .33
3        15   30   .5
4        18   30   .6
5        17   30   .57
6        2    15   .13
7        7    15   .47
8        5    15   .33
9        6    15   .4
10       5    15   .33
(Overall, 91 of the 225 marked vials placed into the inspection process were detected/captured.)
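One common starting point for part (a) below is a retrospective $p$ chart whose limits change with the number placed, $\bar{p} \pm 3\sqrt{\bar{p}(1-\bar{p})/n_i}$, with $\bar{p} = 91/225$ the pooled detection rate. The Python sketch below only illustrates that calculation under those assumptions (standard $3\sigma$ limits, clipped to $[0, 1]$); the function name `retrospective_p_chart` is hypothetical and not from the notes, and a careful answer to (a) may well call for more than this one chart.

```python
import math

# Problem 2.7 data, in period order: X = number detected/captured, n = number placed.
X = [6, 10, 15, 18, 17, 2, 7, 5, 6, 5]
N = [30, 30, 30, 30, 30, 15, 15, 15, 15, 15]


def retrospective_p_chart(x, n):
    """Return p_bar and, for each period, a tuple (p_hat, LCL, UCL, signal).

    p_bar pools all periods: sum(x) / sum(n). Each period gets its own
    3-sigma limits p_bar +/- 3*sqrt(p_bar*(1 - p_bar)/n_i), clipped to [0, 1].
    """
    p_bar = sum(x) / sum(n)
    rows = []
    for xi, ni in zip(x, n):
        half_width = 3.0 * math.sqrt(p_bar * (1.0 - p_bar) / ni)
        lcl = max(0.0, p_bar - half_width)
        ucl = min(1.0, p_bar + half_width)
        p_hat = xi / ni
        signal = p_hat < lcl or p_hat > ucl  # point plots outside its limits
        rows.append((p_hat, lcl, ucl, signal))
    return p_bar, rows


p_bar, rows = retrospective_p_chart(X, N)
print(f"p_bar = {p_bar:.4f}")  # p_bar = 0.4044
for period, (p_hat, lcl, ucl, signal) in enumerate(rows, start=1):
    print(f"{period:2d}  p_hat={p_hat:.3f}  LCL={lcl:.3f}  UCL={ucl:.3f}  signal={signal}")
```

With these numbers the $n = 30$ limits come out near (.136, .673) and the $n = 15$ limits near (.024, .785), and no single period plots outside its own limits. That is one piece of the investigation (a) asks for, not all of it.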
(a) Carefully investigate (and say clearly) whether there is evidence in these data of instability in the defect detection rate.
(b) $91/225 = .404$. Do you think that the company these students worked with was likely satisfied with the 40.4% detection rate? What, if anything, does your answer here have to do with the analysis in (a)?
2.8. (Narrow Limit Gaging) Parametric probability model assumptions can sometimes be used to advantage even where one is ultimately going to generate and use attributes data. Consider a situation where process standards are that widget diameters are to be normally distributed with mean $\mu = 5$ and standard deviation $\sigma = 1$. Engineering specifications on these diameters are $5 \pm 3$. As a process monitoring device, samples of $n = 100$ of these widgets are going to be checked with a go/no-go gage, and $X$ = the number of diameters in a sample failing to pass the gaging test will be counted and plotted on an $np$ chart. The design of the go/no-go gage is up to you to choose. You may design it to pass parts with diameters in any interval $(a, b)$ of your choosing.
(a) One natural choice of $(a, b)$ is according to the engineering specifications, i.e. as (2, 8). With this choice of go/no-go gage, a $3\sigma$ control chart for $X$ signals if $X \ge 2$. Find the all-OK ARL for this scheme with this gage.
(b) One might, however, choose $(a, b)$ in other ways besides according to the engineering specifications, e.g. as $(5 - \delta, 5 + \delta)$ for some $\delta$ other than 3. Show that the choice of $\delta = 2.71$ and a control chart that signals if $X \ge 3$ will have about the same all-OK ARL as the scheme from (a).
(c) Compare the schemes from (a) and (b) supposing that diameters are in fact normally distributed with mean $\mu = 6$ and standard deviation $\sigma = 1$.
2.9. A one-sided upper CUSUM scheme is used to monitor
$Q$ = the number of defectives in samples of size $n = 400$.
Suppose that one uses $k_1 = 8$ and $h_1 = 10$.
Use the normal approximation to the binomial distribution to obtain an approximate ARL for this scheme if $p = .025$.
2.10. Consider the monitoring of a process that we will assume produces normally distributed observations $X$ with standard deviation $\sigma = .04$.
(a) Set up both a two-sided CUSUM scheme and an EWMA scheme for monitoring the process ($Q = X$), using a target value of .13 and a desired all-OK ARL of roughly 370, if quickest possible detection of a change in mean of size $\Delta = .02$ is desired.
(b) Plot on the same set of axes the logarithms of the ARLs for your charts from (a) as functions of $\mu$, the real mean of observations being CUSUMed or EWMAed. Also plot on this same set of axes the logarithms of ARLs for a standard $3\sigma$ Shewhart chart for individuals. Comment upon how the 3 ARL curves compare.
2.11. Shear strengths of spot welds made by a certain robot are approximately normal with a short term variability described by $\sigma = 60$ lbs. The strengths in samples of $n$ of these welds are going to be obtained and $\bar{x}$ values CUSUMed.
(a) Give a reference value $k_2$, sample size $n$ and a decision interval $h_2$ so that a one-sided (lower) CUSUM scheme for the $\bar{x}$'s will have an ARL of about 370 if $\mu = 800$ lbs and an ARL of about 5 if $\mu = 750$ lbs.
(b) Find a sample size and a lower Shewhart control limit for $\bar{x}$, say $\#$, so that if $\mu = 800$ lbs there will be on average about 370 samples taken before an $\bar{x}$ will plot below $\#$, and if $\mu = 750$ there will be on average about 5 samples taken before an $\bar{x}$ will plot below $\#$.
2.12. You have data on the efficiency of a continuous chemical production process. The efficiency is supposed to be about 45%, and you will use a CUSUM scheme to monitor the efficiency. Efficiency is computed once per shift, but from much past data, you know that $\sigma \approx .7\%$.
(a) If you wish quickest possible detection of a shift of .7% (one standard deviation) in mean efficiency, design a two-sided CUSUM scheme for this situation with an all-OK ARL of about 500.
(b) Apply your procedure from (a) to the data below. Are any alarms signaled?
Shift   Efficiency      Shift   Efficiency
1       45.7            11      45.8
2       44.6            12      45.4
3       45.0            13      46.8
4       44.4            14      45.5
5       44.4            15      45.8
6       44.2            16      46.4
7       46.1            17      46.0
8       44.6            18      46.3
9       45.7            19      45.6
10      44.4
(c) Make a plot of "raw" CUSUMs using a reference value of 45%. From your plot, when do you think that the mean efficiency shifted away from 45%?
(d) What are the all-OK and "$\mu = 45.7\%$" ARLs if one employs your procedure from (a) modified by giving both the high and low side charts "head starts" of $u = v = h_1/2 = h_2/2$?
(e) Repeat part (a) using an EWMA scheme rather than a CUSUM scheme.
(f) Apply your procedure from (e) to the data. Are any alarms signaled? Plot your EWMA values. Based on this plot, when do you think that the mean efficiency shifted away from 45%?
2.13. Consider the problem of designing an EWMA control chart for $\bar{x}$'s, where in addition to choosing chart parameters one gets to choose the sample size, $n$. In such a case, one can choose monitoring parameters to produce both a desired (large) on-target ARL and a desired (small) off-target ARL $\delta$ units away from the target.
Suppose, for example, that a process standard deviation is $\sigma = 1$ and one wishes to design for an ARL of 370 if the process mean, $\mu$, is on target, and an ARL of no more than 5.0 if $\mu$ is off target by as much as $\delta = 1.0$. Using $\sigma_Q = \sigma/\sqrt{n}$ and shift $= \delta/\sigma_Q$ and reading from one of the graphs in Crowder's 1989 JQT paper, values of $\lambda_{\text{opt}}$ for detecting a change in process mean of this size using EWMAs of $\bar{x}$'s are approximately as below:
$n$                        1     2     3     4     5     6     7     8     9
$\lambda_{\text{opt}}$     .14   .08   .06   .05   .05   .04   .04   .04   .03
Use Crowder's EWMA ARL program (and some trial and error) to find values of $K$ that when used with the $\lambda$'s above will produce an on-target ARL of 370. Then determine how large $n$ must be in order to meet the 370 and 5.0 ARL requirements.
How does this compare to what Table 4.8 says is needed for a two-sided CUSUM to meet the same criteria?
2.14. Consider a combination of high and low side decision interval CUSUM schemes with $h_1 = h_2 = 2.5$, $u = 1$, $v = -1$, $k_1 = .5$ and $k_2 = -.5$. Suppose that $Q$'s are iid normal variables with $\sigma_Q = 1.0$. Find the ARLs for the combined scheme if $\mu_Q = 0$ and then if $\mu_Q = 1.0$. (You will need to use Gan's CUSUM ARL program and Yashchin's expression for combining high and low side ARLs.)
2.15. Set up two different X/MR monitoring chart pairs for normal variables $Q$, in the case where the standards are $\mu_Q = 5$ and $\sigma_Q = 1.715$ and the all-OK ARL desired is 250. For these combinations, what ARLs are relevant if in fact $\mu_Q = 5.5$ and $\sigma_Q = 2.00$? (Run Crowder's X/MR ARL program to get these with minimum interpolation.)
2.16. If one has discrete or rounded data and insists on using $\bar{x}$ and/or $R$ charts, §1.7.1 shows how these may be based on the exact all-OK distributions of $\bar{x}$ and/or $R$ (and not on normal theory control limits). Suppose that measurements arise from integer rounding of normal random variables with $\mu = 2.25$ and $\sigma = .5$ (so that essentially only values 1, 2, 3 and 4 are ever seen). Compute the four probabilities corresponding to these rounded values (and "fudge" them slightly so that they total to 1.00). Then, for $n = 4$, compute the probability distributions of $\bar{x}$ and $R$ based on iid observations from this distribution. Then run Karen (Jensen) Hulting's DIST program and compare your answers to what her program produces.
2.17. Suppose that standard values of process parameters are $\mu = 17$ and $\sigma = 2.4$.
(a) Using sample means $\bar{x}$ based on samples of size $n = 4$, design both a combined high and low side CUSUM scheme (with 0 head starts) and an EWMA scheme to have an all-OK ARL of 370 and quickest possible detection of a shift in process mean of size .6.
(b) If, in fact, the process mean is $\mu = 17.5$ and the process standard deviation is $\sigma = 3.0$, show how you would find the ARL associated with your schemes from (a). (You don't need to actually interpolate in the tables, but do compute the values you would need in order to enter the tables, and say which tables you must employ.)
2.18. A discrete variable $X$ can take only values 1, 2, 3, 4 and 5. Nevertheless, managers decide to "monitor process spread" using the ranges of samples of size $n = 2$. Suppose, for sake of argument, that under standard plant conditions observations are iid and uniform on the values 1 through 5 (i.e. $P[X = 1] = P[X = 2] = P[X = 3] = P[X = 4] = P[X = 5] = .2$).
(a) Find the distribution of $R$ for this situation. (Note that $R$ has possible values 0, 1, 2, 3 and 4. You need to reason out the corresponding probabilities.)
(b) The correct answer to part (a) has $ER = 1.6$. This implies that if many samples of size $n = 2$ are taken and $\bar{R}$ computed, one can expect a mean range near 1.6. Find and criticize corresponding normal theory control limits for $R$.
(c) Suppose that instead of using a normal-based Shewhart chart for $R$, one decides to use a high side Shewhart-CUSUM scheme (for ranges) with reference value $k_1 = 2$ and starting value 0, that signals the first time any range is 4 or the CUSUM is 3 or more. Use your answer for (a) and show how to find the ARL for this scheme. (You need not actually carry through the calculations, but show explicitly how to set things up.)
2.19. SQC novices faced with the task of analyzing a sequence of (say) $m$ individual observations collected over time often do the following: Compute "$\bar{x}$" and "$s$" from the $m$ data values and apply "control limits" $\bar{x} \pm 3s$ to the $m$ individuals. Say why this method of operation is essentially useless. (Compare Problem 1.8.)
2.20.
Consider an $\bar{x}$ chart based on standards $\mu_0$ and $\sigma_0$ and samples of size $n$, where only the "one point outside $3\sigma$ limits" alarm rule is in use.
(a) Find ARLs if in fact $\sigma = \sigma_0$, but $\sqrt{n}\,|\mu - \mu_0|/\sigma$ is respectively 0, 1, 2, and 3.
(b) Find ARLs if in fact $\mu = \mu_0$, but $\sigma/\sigma_0$ is respectively .5, .8, 1, 1.5 and 2.0.
Theory
2.21. Consider the problem of samples of size $n = 1$ in variables control charting contexts, and the notion of there using moving ranges for various purposes. This problem considers a little theory that may help illustrate the implications of using an average moving range, $\overline{MR}$, in the estimation of $\sigma$ in such circumstances.
Suppose that $X_1$ and $X_2$ are independent normal random variables with a common variance $\sigma^2$, but possibly different means $\mu_1$ and $\mu_2$. (You may, if you wish, think of these as widget diameters made at times 1 and 2, where the process mean has potentially shifted between the sampling periods.)
(a) What is the distribution of $X_1 - X_2$? The distribution of $(X_1 - X_2)/\sigma$?
(b) For $t > 0$, write out in terms of values of $\Phi$ the probability
$$P\left[\,\left|(X_1 - X_2)/\sigma\right| \le t\,\right] .$$
In doing this, abbreviate $(\mu_1 - \mu_2)/\sigma$ as $\delta$.
(c) Notice that in part (b), you have found the cumulative distribution function for the random variable $MR/\sigma$. Differentiate your answer to (b) to find the probability density for $MR/\sigma$ and then use this probability density to write down an integral that gives the mean of the random variable $MR/\sigma$, $E(MR/\sigma)$. (You may abbreviate the standard normal pdf as $\phi$, rather than writing everything out.)
Vardeman used his trusty HP 15C (and its definite integral routine) and evaluated the integral in (c) for various values of $\delta$. Some values that he obtained are below.
$\delta$            $E(MR/\sigma)$
0                   1.1284
±.1                 1.1312
±.2                 1.1396
±.3                 1.1537
±.4                 1.1732
±.5                 1.198
±1.0                1.399
±1.5                1.710
±2.0                2.101
±2.5                2.544
±3.0                3.017
±3.5                3.506
±4.0                4.002
large $|\delta|$    $|\delta|$
(Notice that, as expected, the $\delta = 0$ value is $d_2$ for a sample of size $n = 2$.)
(d) Based on the information above, argue that for $n$ independent normal random variables $X_1, X_2, \ldots, X_n$ with common standard deviation $\sigma$, if $\mu_1 = \mu_2 = \cdots = \mu_n$ then the sample average moving range, $\overline{MR}$, when divided by 1.1284 has expected value $\sigma$.
(e) Now suppose that instead of being constant, the successive means $\mu_1, \mu_2, \ldots, \mu_n$ in fact exhibit a reasonably strong linear trend. That is, suppose that $\mu_t = \mu_{t-1} + \sigma$. What is the expected value of $\overline{MR}/1.1284$ in this situation? Does $\overline{MR}/1.1284$ seem like a sensible estimate of $\sigma$ here?
(f) In a scenario where the means could potentially "bounce around" according to $\mu_t = \mu_{t-1} \pm k\sigma$, how large might $k$ be without destroying the usefulness of $\overline{MR}/1.1284$ as an estimate of $\sigma$? Defend your opinion on the basis of the information contained in the table above.
2.22. Consider the kind of discrete time Markov Chain with a single absorbing state used in §2.1 to study the run length properties of process monitoring schemes. Suppose that one wants to know not the mean times to absorption from the nonabsorbing states, but the variances of those times. Since for a generic random variable $X$, $\operatorname{Var} X = EX^2 - (EX)^2$, once one has mean times to absorption (belonging to the vector $L = (I - R)^{-1}\mathbf{1}$) it suffices to compute the expected squares of times to absorption. Let $M$ be an $m \times 1$ vector containing expected squares of times to absorption (from states $S_1$ through $S_m$). Set up a system of $m$ equations for the elements of $M$ in terms of the elements of $R$, $L$ and $M$. Then show that in matrix notation
$$M = (I - R)^{-1}\left(I + 2R(I - R)^{-1}\right)\mathbf{1} .$$
2.23. So-called "Stop-light Control" or "Target Area Control" of a measured characteristic $X$ proceeds as follows. One first defines "Green" (OK), "Yellow" (Marginal) and "Red" (Unacceptable) regions of possible values of $X$. One then periodically samples a process according to the following rules.
At a given sampling period, a single item is measured and if it produces a Green $X$, no further action is necessary at the time period in question. If it produces a Red $X$, lack of control is declared. If it produces a Yellow $X$, a second item is immediately sampled and measured. If this second item produces a Green $X$, no further action is taken at the period in question, but otherwise lack of control is declared.
Suppose that in fact a process under stop-light monitoring is stable and
$$p_G = P[X \text{ is Green}], \quad p_Y = P[X \text{ is Yellow}] \quad \text{and} \quad p_R = 1 - p_G - p_Y = P[X \text{ is Red}] .$$
(a) Find the mean number of sampling periods from the beginning of monitoring through the first out-of-control signal, in terms of the $p$'s.
(b) Find the mean total number of items measured from the beginning of monitoring through the first out-of-control signal, in terms of the $p$'s.
2.24. Consider the Run-Sum control chart scheme discussed in §2.2. In the notes Vardeman wrote out a transition matrix for a Markov Chain analysis of the behavior of this scheme.
(a) Write out the corresponding system of 8 linear equations in 8 mean times to absorption for the scheme. Note that the mean times till signal from "$T = -0$" and "$T = +0$" states are the same linear combinations of the 8 mean times and must thus be equal.
(b) Find a formula for the ARL of this scheme. This can be done as follows. Use the equations for the mean times to absorption from states "$T = +3$" and "$T = +2$" to find a constant $\kappa_{+2,+3}$ such that $L_{+3} = \kappa_{+2,+3} L_{+2}$. Find similar constants $\kappa_{+1,+2}$, $\kappa_{+0,+1}$, $\kappa_{-2,-3}$, $\kappa_{-1,-2}$ and $\kappa_{-0,-1}$. Then use these constants to write a single linear equation for $L_{+0} = L_{-0}$ that you can solve for $L_{+0} = L_{-0}$.
2.25. Consider the problem of monitoring
$X$ = the number of nonconformities on a widget.
Suppose the standard for $\lambda$ is so small that a usual $3\sigma$ Shewhart control chart will signal any time $X_t > 0$. On intuitive grounds the engineers involved find such a state of affairs unacceptable.
The replacement for the standard Shewhart scheme that is then being contemplated is one that signals at time $t$ if
i) $X_t \ge 2$, or
ii) $X_t = 1$ and any of $X_{t-1}$, $X_{t-2}$, $X_{t-3}$ or $X_{t-4}$ is also equal to 1.
Show how you could find an ARL for this scheme. (Give either a matrix equation or system of linear equations one would need to solve. State clearly which of the quantities in your set-up is the desired ARL.)
2.26. Consider a discrete distribution on the (positive and negative) integers specified by the probability function $p(\cdot)$. This distribution will be used below to help predict the performance of a Shewhart type monitoring scheme that will sound an alarm the first time that an individual observation $X_t$ is 3 or more in absolute value (that is, the alarm bell rings the first time that $|X_t| \ge 3$).
(a) Give an expression for the ARL of the scheme in terms of values of $p(\cdot)$, if observations $X_1, X_2, X_3, \ldots$ are iid with probability function $p(\cdot)$.
(b) Carefully set up and show how you would use a transition matrix for an appropriate Markov Chain in order to find the ARL of the scheme under a model for the observations $X_1, X_2, X_3, \ldots$ specified as follows: $X_1$ has probability function $p(\cdot)$, and given $X_1, X_2, \ldots, X_{t-1}$, the variable $X_t$ has probability function $p(\cdot - X_{t-1})$. You need not carry out any matrix manipulations, but be sure to fully explain how you would use the matrix you set up.
2.27. Consider the problem of finding ARLs for a Shewhart individuals chart supposing that observations $X_1, X_2, X_3, \ldots$ are not iid, but rather realizations from a so-called AR(1) model. That is, suppose that in fact for some $\rho$ with $|\rho| < 1$
$$X_t = \rho X_{t-1} + \epsilon_t$$
for a sequence of iid normal random variables $\epsilon_1, \epsilon_2, \ldots$ each with mean 0 and variance $\sigma^2$. Notice that under this model the conditional distribution of $X_{t+1}$ given all previous observations is normal with mean $\rho X_t$ and variance $\sigma^2$.
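To make the AR(1) model concrete, here is a small Python sketch (an illustration, not part of the original problem) that simulates the model with $X_1$ drawn from the stationary normal distribution with variance $\sigma^2/(1-\rho^2)$, and estimates the ARL of an individuals chart by brute-force Monte Carlo. Simulation is a swapped-in technique here: it is only a rough numerical check on whatever the integral-equation approach of parts (a) and (b) produces, not that approach itself. All function names are hypothetical. The simulated run length counts $X_1$ as the first plotted point, so its mean corresponds to $1 + E\,L(X_1)$ in the notation of (a).

```python
import random
import statistics


def stationary_sd(rho, sigma):
    """Marginal sd of a stationary AR(1) process: sigma / sqrt(1 - rho^2)."""
    if not abs(rho) < 1:
        raise ValueError("need |rho| < 1 for a stationary AR(1) model")
    return sigma / (1.0 - rho * rho) ** 0.5


def simulate_ar1(rho, sigma, n, rng):
    """Return [X_1, ..., X_n] with X_t = rho * X_{t-1} + eps_t, eps_t iid N(0, sigma^2).

    X_1 is drawn from the stationary N(0, sigma^2 / (1 - rho^2)) distribution.
    """
    x = rng.gauss(0.0, stationary_sd(rho, sigma))
    path = [x]
    for _ in range(n - 1):
        # Given the past, the next X is normal with mean rho * x and sd sigma.
        x = rho * x + rng.gauss(0.0, sigma)
        path.append(x)
    return path


def run_length(rho, sigma, lcl, ucl, rng):
    """Count plotted X's through the first one falling outside (lcl, ucl)."""
    x = rng.gauss(0.0, stationary_sd(rho, sigma))  # X_1, stationary start
    t = 1
    while lcl < x < ucl:
        x = rho * x + rng.gauss(0.0, sigma)
        t += 1
    return t


def mc_arl(rho, sigma, lcl, ucl, reps=20_000, seed=0):
    """Monte Carlo ARL estimate: the average of `reps` simulated run lengths."""
    rng = random.Random(seed)
    return statistics.fmean(run_length(rho, sigma, lcl, ucl, rng) for _ in range(reps))


rng = random.Random(1)
path = simulate_ar1(0.5, 1.0, 200_000, rng)
print(statistics.variance(path))    # should be near 1 / (1 - 0.25) = 1.333...
print(mc_arl(0.0, 1.0, -2.0, 2.0))  # rho = 0 is iid N(0,1): ARL = 1/P(|Z| >= 2), about 22
print(mc_arl(0.5, 1.0, -2.0, 2.0))  # same limits under positive autocorrelation
```

The ±2 limits are arbitrary, chosen only so the simulation runs quickly; ordinary 3-sigma limits work the same way but take longer to simulate.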
Consider plotting values $X_t$ on a Shewhart chart with control limits UCL and LCL.
(a) For LCL $< u <$ UCL, let $L(u)$ stand for the mean number of additional observations (beyond $X_1$) that will be required to produce an out-of-control signal on the chart, given that $X_1 = u$. Carefully derive an integral equation for $L(u)$.
(b) Suppose that you can solve your equation from (a) for the function $L(u)$ and that it is sensible to assume that $X_1$ is normal with mean 0 and variance $\sigma^2/(1 - \rho^2)$. Show how you would compute the ARL for the Shewhart individuals chart under this model for the $X$ sequence.
2.28. A one-sided upper CUSUM scheme with reference value $k_1 = .5$ and decision interval $h_1 = 4$ is to be used to monitor Poisson ($\lambda$) observations. (CUSUM $\ge 4$ causes a signal.)
(a) Set up, but don't try to manipulate, a Markov Chain transition matrix that you could use to find (exact) ARLs for this scheme.
(b) Set up, but don't try to manipulate, a Markov Chain transition matrix that you could use to obtain (exact) ARLs if the CUSUM scheme is combined with a Shewhart-type scheme that signals any time an observation 3 or larger is obtained.
2.29. In §2.3, Vardeman argued that if $Q_1, Q_2, \ldots$ are iid continuous random variables with probability density $f$ and cdf $F$, a one-sided (high side) CUSUM scheme with reference value $k_1$ and decision interval $h_1$ has ARL function $L(u)$ satisfying the integral equation
$$L(u) = 1 + L(0)F(k_1 - u) + \int_0^{h_1} L(y)\, f(y + k_1 - u)\, dy .$$
Suppose that a (one-sided) Shewhart type criterion is added to the CUSUM alarm criterion. That is, consider a monitoring system that signals the first time the high side CUSUM exceeds $h_1$ or $Q_t > M$, for a constant $M > k_1$. Carefully derive an integral equation similar to the one above that must be satisfied by the ARL function of the combined Shewhart-CUSUM scheme.
2.30. Consider the problem of finding ARLs for CUSUM schemes where $Q_1, Q_2, \ldots$ are iid exponential with mean 1.
That is, suppose that one is CUSUMing iid random variables with common probability density
$$f(x) = \begin{cases} e^{-x} & \text{for } x > 0 \\ 0 & \text{otherwise} . \end{cases}$$
(a) Argue that the ARL function of a high side CUSUM scheme for this situation satisfies the differential equation
$$L'(u) = \begin{cases} L(u) - L(0) - 1 & \text{for } 0 \le u \le k_1 \\ L(u) - L(u - k_1) - 1 & \text{for } k_1 \le u . \end{cases}$$
(Vardeman and Ray (Technometrics, 1985) solve this differential equation and a similar one for low side CUSUMs to obtain ARLs for exponential $Q$.)
(b) Suppose that one decides to approximate high side exponential CUSUM ARLs by using simple numerical methods to solve (approximately) the integral equation discussed in class. For the case of $k_1 = 1.5$ and $h_1 = 4.0$, write out the $R$ matrix (in the equation $L = \mathbf{1} + RL$) one has using the quadrature rule defined by $m = 8$, $a_i = (2i - 1)h_1/(2m)$ and each $w_i = h_1/m$.
(c) Consider making a Markov Chain approximation to the ARL referred to in part (b). For $m = 8$ and the discretization discussed in class, write out the $R$ matrix that would be used in this case. How does this matrix compare to the one in part (b)?
2.31. Consider the problem of determining the run length properties of a high side CUSUM scheme with head start $u$, reference value $k$ and decision interval $h$ if iid continuous observations $Q_1, Q_2, \ldots$ with common probability density $f$ and cdf $F$ are involved. Let $T$ be the run length variable. In class, Vardeman concentrated on $L(u) = ET$, the ARL of the scheme. But other features of the run length distribution might well be of interest in some applications.
(a) The variance of $T$, $\operatorname{Var} T = ET^2 - L^2(u)$, might also be of importance in some instances. Let $M(u) = ET^2$ and argue very carefully that $M(u)$ must satisfy the integral equation
$$M(u) = 1 + \bigl(M(0) + 2L(0)\bigr)F(k - u) + \int_0^{h} \bigl(M(s) + 2L(s)\bigr) f(s + k - u)\, ds .$$
(Once one has found $L(u)$, this gives an integral equation that can be solved for $M(u)$, leading to values for $\operatorname{Var} T$, since then $\operatorname{Var} T = M(u) - L^2(u)$.)
(b) The probability function of $T$, $P(t, u) = \Pr[T = t]$, might also be of importance in some instances. Express $P(1, u)$ in terms of $F$. Then argue very carefully that for $t > 1$, $P(t, u)$ must satisfy the recursion
$$P(t, u) = P(t - 1, 0)F(k - u) + \int_0^h P(t - 1, s)\, f(s + k - u)\, ds .$$
(There is thus the possibility of determining successively the function $P(1, u)$, then the function $P(2, u)$, then the function $P(3, u)$, etc.)

2.32. In §2.2, Vardeman considered a “two alarm rule monitoring scheme” due to Wetherill and showed how to find the ARL for that scheme by solving two linear equations for quantities $L_1$ and $L_2$. It is possible to extend the arguments presented there and find the variance of the run length.
(a) For a generic random variable $X$, express both $\mathrm{Var}\,X$ and $E(X + 1)^2$ in terms of $EX$ and $EX^2$.
(b) Let $M_1$ be the expected square of the run length for the Wetherill scheme and let $M_2$ be the expected square of the number of additional plotted points required to produce an out-of-control signal if there has been no signal to date and the current plotted point is between 2- and 3-sigma limits. Set up two equations for $M_1$ and $M_2$ that are linear in $M_1$, $M_2$, $L_1$ and $L_2$.
(c) The equations from (b) can be solved simultaneously for $M_1$ and $M_2$. Express the variance of the run length for the Wetherill scheme in terms of $M_1$, $M_2$, $L_1$ and $L_2$.

2.33. Consider a Shewhart control chart with the single extra alarm rule “signal if 2 out of any 3 consecutive points fall between $2\sigma$ and $3\sigma$ limits on one side of the center line.” Suppose that points $Q_1, Q_2, Q_3, \ldots$ are to be plotted on this chart and that the $Q$s are iid.
Use the notation
$p_A$ = the probability $Q_1$ falls outside $3\sigma$ limits,
$p_B$ = the probability $Q_1$ falls between $2\sigma$ and $3\sigma$ limits above the center line,
$p_C$ = the probability $Q_1$ falls between $2\sigma$ and $3\sigma$ limits below the center line,
$p_D$ = the probability $Q_1$ falls inside $2\sigma$ limits,
and set up a Markov chain that you can use to find the ARL of this scheme under the iid model for the $Q$s. (Be sure to carefully and completely define your state space, write out the proper transition matrix and indicate which entry of $(I - R)^{-1}\mathbf{1}$ gives the desired ARL.)

2.34. A process has a “good” state and a “bad” state. Suppose that when in the good state, the probability that an observation on the process plots outside of control limits is $g$, while the corresponding probability for the bad state is $b$. Assume further that if the process is in the good state at time $t - 1$, there is a probability $d$ of degradation to the bad state before an observation at time $t$ is made. (Once the process moves into the bad state it stays there until that condition is detected via process monitoring and corrected.) Find the “ARL”/mean time of alarm, if the process is in the good state at time $t = 0$ and observation starts at time $t = 1$.

2.35. Consider the following (nonstandard) process monitoring scheme for a variable $X$ that has ideal value 0. Suppose $h(x) > 0$ is a function with $h(x) = h(-x)$ that is decreasing in $|x|$. ($h$ has its maximum at 0 and decreases symmetrically as one moves away from 0.) Then suppose that i) control limits for $X_1$ are $\pm h(0)$, and ii) for $t > 1$ control limits for $X_t$ are $\pm h(X_{t-1})$. (Control limits vary. The larger that $|X_{t-1}|$ is, the tighter are the limits on $X_t$.) Discuss how you would find an ARL for this scheme for iid $X$ with marginal probability density $f$. (Write down an appropriate integral equation, briefly discuss how you would go about solving it and what you would do with the solution in order to find the desired ARL.)

2.36. Consider the problem of monitoring integer-valued variables $Q_1, Q_2, Q_3, \ldots$ (we'll suppose that $Q$ can take any integer value, positive or negative). Define $h(x) = 4 - |x|$ and consider the following definition of an alarm scheme: 1) alarm at time $i = 1$ if $|Q_1| \geq 4$, and 2) for $i \geq 2$ alarm at time $i$ if $|Q_i| \geq h(Q_{i-1})$. For integer $j$, let $q_j = P[Q_1 = j]$ and suppose the $Q_i$ are iid. Carefully describe how to find the ARL for this situation. (You don't need to produce a formula, but you do need to set up an appropriate MC and tell me exactly/completely what to do with it in order to get the ARL.)

2.37. Consider the problem of monitoring integer-valued variables $Q_t$ (we'll suppose that $Q$ can take any integer value, positive or negative). A combination of individuals and moving range charts will be used according to the scheme that at time 1, $Q_1$ alone will be plotted, while at time $t > 1$ both $Q_t$ and $MR_t = |Q_t - Q_{t-1}|$ will be plotted. The alarm will ring at the first period where $|Q_t| > 3$ or $MR_t > 4$. Suppose that the variables $Q_1, Q_2, \ldots$ are iid and $p_i = P[Q_1 = i]$. Consider the problem of finding an average run length in this scenario.
(a) Set up the transition matrix for an 8 state Markov chain describing the evolution of this charting method from $t = 2$ onward, assuming that the alarm doesn't ring at $t = 1$. (State $S_i$ for $i = -3, -2, -1, 0, 1, 2, 3$ will represent the situation “no alarm yet and the most recent observation is $i$” and there will be an alarm state.)
(b) Given values for the $p_i$, one could use the transition matrix from part (a) and solve for mean times to alarm from the states $S_i$. Call these $L_{-3}, L_{-2}, L_{-1}, L_0, L_1, L_2$, and $L_3$. Express the average run length of the whole scheme (including the plotting at time $t = 1$ when only $Q_1$ is plotted) in terms of the $L_i$ and $p_i$ values.

3 Engineering Control and Stochastic Control Theory

3.1. Consider the use of the PI(D) controller $\Delta X(t) = .5E(t) + .25\Delta E(t)$ in a situation where the control gain, $G$, is 1 and the target for the controlled variable is $T(t) = 0$. Suppose that no control actions are applied before the time $t = 0$, but that for $t \geq 0$, $E(t)$ and $\Delta E(t)$ are used to make changes in the manipulated variable, $\Delta X(t)$, according to the above equation. Suppose further that the value of the controlled variable, $Y(t)$, is the sum of what the process would do with no control, say $Z(t)$, and the sum of effects at time $t$ of all changes in the manipulated variable made in previous periods based on $E(0), \Delta E(0), E(1), \Delta E(1), E(2), \Delta E(2), \ldots, E(t-1), \Delta E(t-1)$.

Consider 3 possible patterns of impact at time $s$ of a change in the manipulated variable made at time $t$, $\Delta X(t)$:
Pattern 1: The effect on $Y(s)$ is $1 \times \Delta X(t)$ for all $s \geq t + 1$ (a control action takes its full effect immediately).
Pattern 2: The effect on $Y(t + 1)$ is 0, but the effect on $Y(s)$ is $1 \times \Delta X(t)$ for all $s \geq t + 2$ (there is one period of dead time, after which a control action immediately takes its full effect).
Pattern 3: The effect on $Y(s)$ is $1 \times (1 - 2^{t-s})\Delta X(t)$ for all $s \geq t + 1$ (there is an exponential/geometric pattern in the way the impact of $\Delta X(t)$ is felt, the full effect only being seen for large $s$).

Consider also 3 possible deterministic patterns of uncontrolled process behavior, $Z(t)$:
Pattern A: $Z(t) = -3$ for all $t \geq -1$ (the uncontrolled process would remain constant, but off target).
Pattern B: $Z(t) = -3$ for all $-1 \leq t \leq 5$, while $Z(t) = 3$ for all $6 \leq t$ (there is a step change in where the uncontrolled process would be).
Pattern C: $Z(t) = -3 + t$ for all $t \geq -1$ (there is a linear trend in where the uncontrolled process would be).

For each of the $3 \times 3 = 9$ combinations of patterns in the impact of changes in the manipulated variable and behavior of the uncontrolled process, make up a table giving at times $t = -1, 0, 1, 2, \ldots, 10$ the values of $Z(t)$, $E(t)$, $\Delta E(t)$, $\Delta X(t)$ and $Y(t)$.
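The bookkeeping in Problem 3.1 is mechanical enough to script. The Python sketch below generates the nine tables. It assumes the error convention $E(t) = T(t) - Y(t)$ (so that a positive control gain moves $Y$ toward target) and codes each impact pattern as the fraction of $\Delta X(t)$ felt $s - t$ periods later. The names `pid_table`, `PATTERNS`, `Z_PATTERNS` and `show` are hypothetical.

```python
def pid_table(Z, T, impact, t_end=10, k_E=0.5, k_dE=0.25):
    """Rows of the Problem 3.1 table for t = -1, 0, ..., t_end.

    Z, T   : functions of t giving the uncontrolled process and the target
    impact : impact(lag) = fraction of Delta X(t) felt in Y(t + lag), lag >= 1
    Control law (assumed convention E = T - Y):
        Delta X(t) = k_E * E(t) + k_dE * Delta E(t), for t >= 0 only.
    """
    dX = {}                      # control actions taken so far, keyed by time
    rows, E_prev = [], None
    for t in range(-1, t_end + 1):
        # Y(t) = Z(t) + effects of all actions taken at times u < t
        Y = Z(t) + sum(impact(t - u) * x for u, x in dX.items())
        E = T(t) - Y
        dE = None if E_prev is None else E - E_prev
        if t >= 0:               # no control action before t = 0
            dX[t] = k_E * E + k_dE * dE
        rows.append({"t": t, "Z": Z(t), "T": T(t), "E": E, "dE": dE,
                     "dX": dX.get(t), "Y": Y})
        E_prev = E
    return rows


# Impact patterns 1-3: fraction of Delta X(t) felt at time s = t + lag
PATTERNS = {
    1: lambda lag: 1.0,                       # full effect immediately
    2: lambda lag: 0.0 if lag == 1 else 1.0,  # one period of dead time
    3: lambda lag: 1.0 - 2.0 ** (-lag),       # geometric approach to full effect
}
# Uncontrolled process patterns A-C
Z_PATTERNS = {
    "A": lambda t: -3.0,
    "B": lambda t: -3.0 if t <= 5 else 3.0,
    "C": lambda t: -3.0 + t,
}


def show(rows):
    """Print one table; blank cells where Delta E or Delta X is undefined."""
    cols = ("Z", "T", "E", "dE", "dX", "Y")
    print("   t" + "".join(f"{c:>10}" for c in cols))
    for r in rows:
        cells = ("" if r[c] is None else f"{r[c]:.4f}" for c in cols)
        print(f"{r['t']:4d}" + "".join(f"{s:>10}" for s in cells))


if __name__ == "__main__":
    target = lambda t: 0.0
    for p, impact in PATTERNS.items():
        for z, Z in Z_PATTERNS.items():
            print(f"\nImpact pattern {p}, uncontrolled pattern {z}")
            show(pid_table(Z, target, impact))
```

Passing a different target function (for example, 0 for $t \leq 5$ and 3 afterward) or the oscillating pattern $Z(t) = (-1)^t$ adapts the same sketch to Problems 3.2 and 3.3.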
3.2. Consider again the PI(D) controller of Problem 3.1. Suppose that the target is $T(t)$, where $T(t) = 0$ for $t \leq 5$ and $T(t) = 3$ for $t > 5$. For Pattern 1 of impact of control actions and Patterns A, B and C for $Z(t)$, make up tables giving at times $t = -1, 0, 1, 2, \ldots, 10$ the values of $Z(t)$, $T(t)$, $E(t)$, $\Delta E(t)$, $\Delta X(t)$ and $Y(t)$.

3.3. Consider again the PI(D) controller of Problem 3.1 and Pattern D: $Z(t) = (-1)^t$ (the uncontrolled process would oscillate around the target). For Patterns 1 and 2 of impact of control actions, make up tables giving at times $t = -1, 0, 1, 2, \ldots, 10$ the values of $Z(t)$, $T(t)$, $E(t)$, $\Delta E(t)$, $\Delta X(t)$ and $Y(t)$.

3.4. Tables 6.1 and 6.2 give some values of an uncontrolled process $Z(t)$ that has target $T(t) = 0$. Suppose that a manipulated variable $X$ is available and that the simple (integral only) control algorithm $\Delta X(t) = E(t)$ will be employed, based on an observed process $Y(t)$ that is the sum of $Z(t)$ and the effects of all relevant changes in $X$. Consider two different scenarios: (a) a change of $\Delta X$ in the manipulated variable impacts all subsequent values of $Y(t)$ by the addition of an amount $\Delta X$, and (b) there is one period of dead time, after which a change of $\Delta X$ in the manipulated variable impacts all subsequent values of $Y(t)$ by the addition of an amount $\Delta X$. Fill in the two tables according to these two scenarios and then comment on the lesson they seem to suggest about the impact of dead time on the effectiveness of PID control.

Table 6.1: Table for Problem 3.4(a), No Dead Time

t    Z(t)   T(t)   Y(t)   E(t) = ΔX(t)
0    -1     0      -1
1    -1     0
2    -1     0
3    -1     0
4    -1     0
5    -1     0
6    -1     0
7    -1     0
8    -1     0
9    -1     0

Table 6.2: Table for Problem 3.4(b), One Period of Dead Time

t    Z(t)   T(t)   Y(t)   E(t) = ΔX(t)
0    -1     0      -1
1    -1     0
2    -1     0
3    -1     0
4    -1     0
5    -1     0
6    -1     0
7    -1     0
8    -1     0
9    -1     0

3.5. On pages 87 and 88 V&J suggest that over-adjustment of a process will increase rather than decrease variation. In this problem we will investigate this notion mathematically. Imagine periodically sampling a widget produced by a machine and making a measurement $y_i$. Conceptualize the situation as $y_i = \mu_i + \epsilon_i$, where
$\mu_i$ = the true machine setting (or widget diameter) at time $i$, and
$\epsilon_i$ = “random” variability at time $i$ affecting only measurement $i$.
Further, suppose that the (coded) ideal diameter is 0 and $\mu_i$ is the sum of natural machine drift and adjustments applied by an operator up through time $i$. That is, with
$\gamma_i$ = the machine drift between time $i - 1$ and time $i$, and
$\delta_i$ = the operator (or automatic controller's) adjustment applied between time $i - 1$ and time $i$,
suppose that $\mu_0 = 0$ and for $j \geq 1$ we have
$$\mu_j = \sum_{i=1}^{j} \gamma_i + \sum_{i=1}^{j} \delta_i .$$
We will here consider the (integral-only) adjustment policies for the machine
$$\delta_i = -\alpha y_{i-1} \quad \text{for an } \alpha \in [0, 1] .$$
It is possible to verify that for $j \geq 1$:
if $\alpha = 0$: $y_j = \sum_{i=1}^{j} \gamma_i + \epsilon_j$,
if $\alpha = 1$: $y_j = \gamma_j - \epsilon_{j-1} + \epsilon_j$,
if $\alpha \in (0, 1)$: $y_j = \sum_{i=1}^{j} \gamma_i (1 - \alpha)^{j-i} - \alpha \sum_{i=1}^{j} \epsilon_{i-1} (1 - \alpha)^{j-i} + \epsilon_j$.
Model $\epsilon_0, \epsilon_1, \epsilon_2, \ldots$ as independent random variables with mean 0 and variance $\sigma^2$ and consider predicting the likely effectiveness of the adjustment policies by finding $\lim_{j \to \infty} E\mu_j^2$. ($E\mu_j^2$ is a measure of how close to proper adjustment the machine can be expected to be at time $j$.)
(a) Compare choices of $\alpha$ supposing that $\gamma_i = 0$ for all $i$. (Here the process is stable.)
(b) Compare choices of $\alpha$ supposing that $\gamma_i = d$ for all $i$, for some constant $d$. (This is a case of deterministic linear machine drift, and might for example be used to model tool wear over reasonably short periods.)
(c) Compare choices of $\alpha$ supposing $\gamma_1, \gamma_2, \ldots$ is a sequence of independent random variables with mean 0 and variance $\eta^2$ that is independent of the $\epsilon$ sequence.
What $\alpha$ would you recommend using if this (random walk) model seems appropriate and $\eta$ is thought to be about one half of $\sigma$?

3.6. Suppose that $\ldots, \epsilon(-1), \epsilon(0), \epsilon(1), \epsilon(2), \ldots$ are iid normal random variables with mean 0 and variance $\sigma^2$ and that
$$Z(t) = \epsilon(t - 1) + \epsilon(t) .$$
(Note that under this model consecutive $Z$'s are correlated, but those separated in time by at least 2 periods are independent.) As it turns out, under this model
$$E_F[Z(t + 1) \mid Z^t] = \frac{1}{t + 2} \sum_{j=0}^{t} (-1)^j (t + 1 - j) Z(t - j)$$
while $E_F[Z(s) \mid Z^t] = 0$ for $s \geq t + 2$. If $T(t) = 0$, find optimal (MV) control strategies for two different situations involving numerical process adjustments $a$.
(a) First suppose that $A(a, s) = a$ for all $s \geq 1$. (Note that in the limit as $t \to \infty$, the MV controller is a “proportional-only” controller.)
(b) Then suppose the impact of a control action is similar to that in (a), except there is one period of delay, i.e.
$$A(a, s) = \begin{cases} a & \text{for } s \geq 2 \\ 0 & \text{for } s = 1 . \end{cases}$$
(You should decide that $a(t) = 0$ is optimal.)
(c) For the situation without dead time in part (a), write out $Y(t)$ in terms of $\epsilon$'s. What are the mean and variance of $Y(t)$? How do these compare to the mean and variance of $Z(t)$? Would you say from this comparison that the control algorithm is effective in directing the process to the target $T(t) = 0$?
(d) Again for the situation of part (a), consider the matter of process monitoring for a change from the model of this problem (that ought to be greeted by a revision of the control algorithm or some other appropriate intervention). Argue that after some start-up period it makes sense to Shewhart chart the $Y(t)$'s, treating them as essentially iid Normal$(0, \sigma^2)$ if “all is OK.” (What is the correlation between $Y(t)$ and $Y(t - 1)$?)

3.7. Consider the optimal stochastic control problem as described in §3.1 with $Z(t)$ an iid normal$(0, 1)$ sequence of random variables, control actions $a \in (-\infty, \infty)$, $A(a, s) = a$ for all $s \geq 1$ and $T(s) = 0$ for all $s$. What do you expect the optimal (minimum variance) control strategy to turn out to be? Why?

3.8. (Vander Wiel) Consider a stochastic control problem with the following elements. The (stochastic) model, $F$, for the uncontrolled process, $Z(t)$, will be
$$Z(t) = \phi Z(t - 1) + \epsilon(t)$$
where the $\epsilon(t)$ are iid normal$(0, \sigma^2)$ random variables and $\phi$ is a (known) constant with absolute value less than 1. ($Z(t)$ is a first order autoregressive process.) For this model,
$$E_F[Z(t + 1) \mid \ldots, Z(-1), Z(0), Z(1), \ldots, Z(t)] = \phi Z(t) .$$
For the function $A(a, s)$ describing the effect of a control action $a$ taken $s$ periods previous, we will use $A(a, s) = a\rho^{s-1}$ for another known constant $0 < \rho < 1$ (the effect of an adjustment made at a given period dies out geometrically). Carefully find $a(0)$, $a(1)$, and $a(2)$ in terms of a constant target value $T$ and $Z(0)$, $Y(1)$ and $Y(2)$. Then argue that in general
$$a(t) = T\left(1 + (\phi - \rho) \sum_{s=0}^{t-1} \phi^s\right) - \phi Y(t) - (\phi - \rho) \sum_{s=1}^{t} \phi^s Y(t - s) .$$
For large $t$, this prescription reduces to approximately what?

3.9. Consider the following stochastic control problem. The stochastic model, $F$, for the uncontrolled process $Z(t)$, will be
$$Z(t) = ct + \epsilon(t)$$
where $c$ is a known constant and the $\epsilon(t)$'s are iid normal$(0, \sigma^2)$ random variables. (The $Z(t)$ process is a deterministic linear trend seen through iid/white noise.) For the function $A(a, s)$ describing the effect of a control action $a$ taken $s$ periods previous, we will use $A(a, s) = (1 - 2^{-s})a$ for all $s \geq 1$. Suppose further that the target value for the controlled process is $T = 0$ and that control begins at time 0 (after observing $Z(0)$).
(a) Argue carefully that $\hat{Z}(t) = E_F[Z(t + 1) \mid \ldots, Z(-1), Z(0), Z(1), \ldots, Z(t)] = c(t + 1)$.
(b) Find the minimum variance control algorithm and justify your answer. Does there seem to be a limiting form for $a(t)$?
(c) According to the model here, the controlled process $Y(t)$ should have what kind of behavior? (How would you describe the joint distribution of the variables $Y(1), Y(2), \ldots, Y(t)$?) Suppose that you decide to set up Shewhart type “control limits” to use in monitoring the $Y(t)$ sequence. What values do you recommend for LCL and UCL in this situation? (These could be used as an on-line check on the continuing validity of the assumptions that we have made here about $F$ and $A(a, s)$.)

3.10. Consider the following optimal stochastic control problem. Suppose that for some (known) appropriate constants $\alpha$ and $\beta$, the uncontrolled process $Z(t)$ has the form
$$Z(t) = \alpha Z(t - 1) + \beta Z(t - 2) + \epsilon(t)$$
for the $\epsilon$'s iid with mean 0 and variance $\sigma^2$. (The $\epsilon$'s are independent of all previous $Z$'s.) Suppose further that for control actions $a \in (-\infty, \infty)$, $A(a, 1) = 0$ and $A(a, s) = a$ for all $s \geq 2$. (There is a one period delay, following which the full effect of a control action is immediately felt.) For $s \geq 1$, let $T(s)$ be an arbitrary sequence of target values for the process.
(a) Argue that
$$E_F[Z(t + 1) \mid \ldots, Z(t - 2), Z(t - 1), Z(t)] = \alpha Z(t) + \beta Z(t - 1)$$
and that
$$E_F[Z(t + 2) \mid \ldots, Z(t - 2), Z(t - 1), Z(t)] = (\alpha^2 + \beta) Z(t) + \alpha\beta Z(t - 1) .$$
(b) Carefully find $a(0)$, $a(1)$ and $a(2)$ in terms of $Z(-1)$, $Z(0)$, $Y(1)$, $Y(2)$ and the $T(s)$ sequence.
(c) Finally, give a general form for the optimal control action to be taken at time $t \geq 3$ in terms of $\ldots, Z(-1), Z(0), Y(1), Y(2), \ldots, Y(t)$ and $a(0), a(1), \ldots, a(t - 1)$.

3.11. Use the first order autoregressive model of Problem 3.8 and consider the two functions $A(a, s)$ from Problem 3.6. Find the MV optimal control policies (in terms of the $Y$'s) for the $T = 0$ situation. Is either of these a PID control algorithm?

3.12. A process has a Good state and a Bad state. Every morning a gremlin tosses a coin with $P[\text{Heads}] = u > .5$ that governs how states evolve day to day. Let
$$C_i = P[\text{change state on day } i \text{ from that on day } i - 1] .$$
Each $C_i$ is either $u$ or $1 - u$.
(a) Before the gremlin tosses the coin on day $i$, you get to choose whether $C_i = u$ (so that Heads $\Rightarrow$ change) or $C_i = 1 - u$ (so that Heads $\Rightarrow$ no change). (You either apply some counter-measures or let the process evolve naturally.) Your object is to see that the process is in the Good state as often as possible. What is your optimal strategy? (What should you do on any morning $i$? This needs to depend upon the state of the process from day $i - 1$.)
(b) If all is as described here, the evolution of the states under your optimal strategy from (a) is easily described in probabilistic terms. Do so. Then describe in rough/qualitative terms how you might monitor the sequence of states to detect the possibility that the gremlin has somehow changed the rules of process evolution on you.
(c) Now suppose that there is a one-day time delay in your countermeasures. Before the gremlin tosses his coin on day $i$ you get to choose only whether $C_{i+1} = u$ or $C_{i+1} = 1 - u$. (You do not get to choose $C_i$ on the morning of day $i$.) Now what is your optimal strategy? (What you should choose on the morning of day $i$ depends upon what you already chose on the morning of day $(i - 1)$ and whether the process was in the Good state or in the Bad state on day $(i - 1)$.) Show appropriate calculations to support your answer.

4 Process Characterization

4.1. The following are depth measurements taken on $n = 8$ pump end caps. The units are inches.
4.9991, 4.9990, 4.9994, 4.9989, 4.9986, 4.9991, 4.9993, 4.9990
The specifications for this depth measurement were $4.999 \pm .001$ inches.
(a) As a means of checking whether a normal distribution assumption is plausible for these depth measurements, make a normal plot of these data.
(Use regular graph paper and the method of Section 5.1.) Read an estimate of $\sigma$ from this plot.
Regardless of the appearance of your plot from (a), henceforth suppose that one is willing to say that the process producing these depths is stable and that a normal distribution of depths is plausible.
(b) Give a point estimate and a 90% two-sided confidence interval for the “process capability,” $6\sigma$.
(c) Give a point estimate and a 90% two-sided confidence interval for the process capability ratio $C_p$.
(d) Give a point estimate and a 95% lower confidence bound for the process capability ratio $C_{pk}$.
(e) Give a 95% two-sided prediction interval for the next depth measurement on a cap produced by this process.
(f) Give a 99% two-sided tolerance interval for 95% of all depth measurements of end caps produced by this process.

4.2. Below are the logarithms of the amounts (in ppm by weight) of aluminum found in 26 bihourly samples of recovered PET plastic at a Rutgers University recycling plant, taken from a JQT paper by Susan Albin. (In this context, aluminum is an impurity.)
5.67, 5.40, 4.83, 4.37, 4.98, 4.78, 5.50, 4.77, 5.20, 4.14, 3.40, 4.94, 4.62,
4.62, 4.47, 5.21, 4.09, 5.25, 4.78, 6.24, 4.79, 5.15, 4.25, 3.40, 4.50, 4.74
(a) Set up and plot charts for a sensible monitoring scheme for these values. (They are in time order if one reads left to right, top to bottom.) Caution: Simply computing a mean and sample standard deviation for these values and using “limits” for individuals of the form $\bar{x} \pm 3s$ does not produce a sensible scheme! Say clearly what you are doing and why.
(b) Suppose that (on the basis of an analysis of the type in (a) or otherwise) it is plausible to treat the 26 values above as a sample of size $n = 26$ from some physically stable normally distributed process. (Note $\bar{x} \approx 4.773$ and $s \approx .632$.)
i. Give a two-sided interval that you are “90% sure” will contain the next log aluminum content of a sample taken at this plant.
Transform this to an interval for the next raw aluminum content.
ii. Give a two-sided interval that you are “95% sure” will contain 90% of all log aluminum contents. Transform this interval to one for raw aluminum contents.
(c) Rather than adopting the “stable process” model alluded to in part (b), suppose that it is only plausible to assume that the log purity process is stable for periods of about 10 hours, but that mean purities can change (randomly) at roughly ten hour intervals. Note that if one considers the first 25 values above to be 5 samples of size 5, some summary statistics are then given below:

period   1       2       3       4       5
x̄        5.050   4.878   4.410   5.114   4.418
s        .506    .514    .590    .784    .661
R        1.30    1.36    1.54    2.15    1.75

Based on the usual random effects model for this two-level “nested/hierarchical” situation, give reasonable point estimates of the within-period standard deviation and the standard deviation governing period to period changes in process mean.

4.3. A standard (in engineering statistics) approximation due to Wallis (used on page 468 of V&J) says that often it is adequate to treat the variable $\bar{x} \pm ks$ as if it were normal with mean $\mu \pm k\sigma$ and variance
$$\sigma^2 \left(\frac{1}{n} + \frac{k^2}{2n}\right) .$$
Use the Wallis approximation to the distribution of $\bar{x} + ks$ and find $k$ such that for $x_1, x_2, \ldots, x_{26}$ iid normal random variables, $\bar{x} + ks$ is a 99% upper statistical tolerance bound for 95% of the population. (That is, your job is to choose $k$ so that $P\left[\Phi\left(\frac{\bar{x} + ks - \mu}{\sigma}\right) \geq .95\right] \approx .99$.) How does your approximate value compare to the exact one given in Table A.9b?

4.4. Consider the problem of pooling together samples of size $n$ from, say, five different days to make inferences about all widgets produced during that period. In particular, consider the problem of estimating the fraction of widgets with diameters that are outside of engineering specifications.
Suppose that
$N_i$ = the number of widgets produced on day $i$,
$p_i$ = the fraction of widgets produced on day $i$ that have diameters that are outside engineering specifications, and
$\hat{p}_i$ = the fraction of the $i$th sample that have out-of-spec. diameters.
If the samples are simple random samples of the respective daily productions, standard finite population sampling theory says that
$$E\hat{p}_i = p_i \quad \text{and} \quad \mathrm{Var}\,\hat{p}_i = \left(\frac{N_i - n}{N_i - 1}\right) \frac{p_i (1 - p_i)}{n} .$$
Two possibly different estimators of the population fraction of diameters out of engineering specifications,
$$p = \frac{\sum_{i=1}^{5} N_i p_i}{\sum_{i=1}^{5} N_i} ,$$
are
$$\hat{p} = \frac{\sum_{i=1}^{5} N_i \hat{p}_i}{\sum_{i=1}^{5} N_i} \quad \text{and} \quad \bar{\hat{p}} = \frac{1}{5} \sum_{i=1}^{5} \hat{p}_i .$$
Show that $E\hat{p} = p$, but that $E\bar{\hat{p}}$ need not be $p$ unless all $N_i$ are the same. Assuming the independence of the $\hat{p}_i$, what are the variances of $\hat{p}$ and $\bar{\hat{p}}$? Note that neither of these needs to equal
$$\left(\frac{N - 5n}{N - 1}\right) \frac{p(1 - p)}{5n} ,$$
where $N = \sum_{i=1}^{5} N_i$.

4.5. Suppose that the hierarchical random effects model used in Section 5.5 of V&J is a good description of how 500 widget diameters arise on each of 5 days in each of 10 weeks. (That is, suppose that the model is applicable with $I = 10$, $J = 5$ and $K = 500$.) Suppose further that of interest is the grand (sample) variance of all $10 \times 5 \times 500$ widget diameters. Use the expected mean squares and write out an expression for the expected value of this variance in terms of $\sigma_\alpha^2$, $\sigma_\beta^2$ and $\sigma^2$.
Now suppose that one only observes 2 widget diameters each day for 5 weeks and in fact obtains the “data” in Table 6.3. From these data obtain point estimates of the variance components $\sigma_\alpha^2$, $\sigma_\beta^2$ and $\sigma^2$. Use these and your formula from above to predict the variance of all $10 \times 5 \times 500$ widget diameters. Then make a similar prediction for the variance of the diameters from the next 10 weeks, supposing that the $\sigma_\alpha^2$ variance component could be eliminated.

Table 6.3: Data for Problem 4.5

Week  Day  k=1    k=2    ȳ_ij   s²_ij   ȳ_i.   s²_Bi
1     M    15.5   14.9   15.2   .18     15.0   .605
      T    15.2   15.2   15.2   0
      W    14.2   14.2   14.2   0
      R    14.3   14.3   14.3   0
      F    15.8   16.4   16.1   .18
2     M    6.2    7.0    6.6    .32     7.0    .275
      T    7.2    8.4    7.8    .72
      W    6.6    7.8    7.2    .72
      R    6.2    7.6    6.9    .98
      F    5.6    7.4    6.5    1.62
3     M    15.4   14.4   14.9   .50     14.0   .370
      T    13.9   13.3   13.6   .18
      W    13.4   14.8   14.1   .98
      R    12.5   14.1   13.3   1.28
      F    13.2   15.0   14.1   1.62
4     M    10.9   11.3   11.1   .08     12.0   .515
      T    12.5   12.7   12.6   .02
      W    12.3   11.7   12.0   .18
      R    11.0   12.0   11.5   .50
      F    12.3   13.3   12.8   .50
5     M    7.5    6.7    7.1    .32     7.0    .155
      T    6.7    7.3    7.0    .18
      W    7.2    6.0    6.6    .72
      R    7.6    7.6    7.6    0
      F    6.3    7.1    6.7    .32

4.6. Consider a situation in which a lot of 50,000 widgets has been packed into 100 crates, each of which contains 500 widgets. Suppose that, unbeknownst to us, the lot consists of 25,000 widgets with diameter 5 and 25,000 widgets with diameter 7. We wish to estimate the variance of the widget diameters in the lot (which is 50,000/49,999). To do so, we decide to select 4 crates at random, and from each of those, select 5 widgets to measure.
(a) One (not so smart) way to try to estimate the population variance is to simply compute the sample variance of the 20 widget diameters we end up with. Find the expected value of this estimator under two different scenarios: 1st where each of the 100 crates contains 250 widgets of diameter 5 and 250 widgets with diameter 7, and then 2nd where each crate contains widgets of only one diameter. What, in general terms, does this suggest about when the naive sample variance will produce decent estimates of the population variance?
(b) Give the formula for an estimator of the population variance that is unbiased (i.e. has expected value equal to the population variance).

4.7. Consider the data of Table 5.8 in V&J and the use of the hierarchical normal random effects model to describe their generation.
(a) Find point estimates of the parameters $\sigma_\alpha^2$ and $\sigma^2$ based first on ranges and then on ANOVA mean squares.
(b) Find a standard error for your ANOVA-based estimator of $\sigma_\alpha^2$ from (a).
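For balanced data such as those in Table 6.3, ANOVA-based point estimates of the variance components follow from the mean squares and the standard expected mean squares under the hierarchical random effects model: $E\,MSE = \sigma^2$, $E\,MSB(A) = \sigma^2 + K\sigma_\beta^2$ and $E\,MSA = \sigma^2 + K\sigma_\beta^2 + JK\sigma_\alpha^2$. Below is a minimal Python sketch, assuming the data are stored as `y[i][j][k]` (level $i$ of A, level $j$ of B within A, observation $k$). The helper name `nested_anova` is hypothetical.

```python
def nested_anova(y):
    """Balanced two-level nested ANOVA for y[i][j][k] under the model
    y_ijk = mu + alpha_i + beta_ij + eps_ijk. Returns sums of squares,
    mean squares and ANOVA (method-of-moments) variance component estimates."""
    I, J, K = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(obs) / K for obs in level] for level in y]     # ybar_ij
    a_mean = [sum(row) / J for row in cell]                      # ybar_i.
    grand = sum(a_mean) / I                                      # ybar_..
    sse = sum((v - cell[i][j]) ** 2
              for i in range(I) for j in range(J) for v in y[i][j])
    ssb = K * sum((cell[i][j] - a_mean[i]) ** 2
                  for i in range(I) for j in range(J))
    ssa = J * K * sum((m - grand) ** 2 for m in a_mean)
    mse = sse / (I * J * (K - 1))
    msb = ssb / (I * (J - 1))
    msa = ssa / (I - 1)
    return {
        "SSA": ssa, "SSB(A)": ssb, "SSE": sse,
        "MSA": msa, "MSB(A)": msb, "MSE": mse,
        "sigma2": mse,                               # from E MSE
        "sigma2_beta": (msb - mse) / K,              # from E MSB(A) - E MSE
        "sigma2_alpha": (msa - msb) / (J * K),       # from E MSA - E MSB(A)
    }


# Table 6.3 (Problem 4.5): 5 weeks x 5 days x 2 diameters
TABLE_6_3 = [
    [[15.5, 14.9], [15.2, 15.2], [14.2, 14.2], [14.3, 14.3], [15.8, 16.4]],
    [[6.2, 7.0], [7.2, 8.4], [6.6, 7.8], [6.2, 7.6], [5.6, 7.4]],
    [[15.4, 14.4], [13.9, 13.3], [13.4, 14.8], [12.5, 14.1], [13.2, 15.0]],
    [[10.9, 11.3], [12.5, 12.7], [12.3, 11.7], [11.0, 12.0], [12.3, 13.3]],
    [[7.5, 6.7], [6.7, 7.3], [7.2, 6.0], [7.6, 7.6], [6.3, 7.1]],
]

if __name__ == "__main__":
    for name, value in nested_anova(TABLE_6_3).items():
        print(f"{name:>13}: {value:.4f}")
```

The raw estimates of $\sigma_\beta^2$ or $\sigma_\alpha^2$ can come out negative; it is common to report 0 in that case.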
(c) Use the material in §1.5 and make a 90% two-sided confidence interval for $\sigma_\alpha^2$.

4.8. All of the variance component estimation material presented in the text is based on balanced data assumptions. As it turns out, it is quite possible to do point estimation (based on sample variances) from even unbalanced data. A basic fact that enables this is the following: If $X_1, X_2, \ldots, X_n$ are uncorrelated random variables, each with the same mean, then
$$Es^2 = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Var}\,X_i .$$
(Note that the usual fact that for iid $X_i$, $Es^2 = \sigma^2$, is a special case of this basic fact.)
Consider the (hierarchical) random effects model used in Section 5.5 of the text. In notation similar to that in Section 5.5 (but not assuming that data are balanced), let
$\bar{y}^*_{ij}$ = the sample mean of data values at level $i$ of A and level $j$ of B within A,
$s^{*2}_{ij}$ = the sample variance of the data values at level $i$ of A and level $j$ of B within A,
$\bar{y}^*_{i\cdot}$ = the sample mean of the values $\bar{y}^*_{ij}$ at level $i$ of A,
$s^{*2}_{Bi}$ = the sample variance of the values $\bar{y}^*_{ij}$ at level $i$ of A, and
$s^{*2}_{A}$ = the sample variance of the values $\bar{y}^*_{i\cdot}$.
Suppose that instead of being furnished with balanced data, one has a data set where 1) there are $I = 2$ levels of A, 2) level 1 of A has $J_1 = 2$ levels of B while level 2 of A has $J_2 = 3$ levels of B, and 3) level 1 of B within level 1 of A has $n_{11} = 2$ levels of C, level 2 of B within level 1 of A has $n_{12} = 4$ levels of C, levels 1 and 2 of B within level 2 of A have $n_{21} = n_{22} = 2$ levels of C and level 3 of B within level 2 of A has $n_{23} = 3$ levels of C.
Evaluate the following: $Es^2_{\text{pooled}}$, $E\left(\frac{1}{5} \sum_{i,j} s^{*2}_{ij}\right)$, $Es^{*2}_{B1}$, $Es^{*2}_{B2}$, $E\left(\frac{1}{2}\left(s^{*2}_{B1} + s^{*2}_{B2}\right)\right)$, $Es^{*2}_{A}$. Then find linear combinations of $s^2_{\text{pooled}}$, $\frac{1}{2}\left(s^{*2}_{B1} + s^{*2}_{B2}\right)$ and $s^{*2}_{A}$ that could sensibly be used to estimate $\sigma_\beta^2$ and $\sigma_\alpha^2$.

4.9. Suppose that on $I = 2$ different days (A), $J = 4$ different heats (B) of cast iron are studied, with $K = 3$ tests (C) being made on each.
Suppose further that the resulting percent carbon measurements produce SSA = :0355, SSB(A) = :0081 and SSC(B(A)) = SSE = :4088. (a) If one completely ignores the hierarchical structure of the data set, what “sample variance” is produced? Does this quantity estimate the variance that would be produced if on many di¤erent days a single heat was selected and a single test made? Explain carefully! (Find the expected value of the grand sample variance under the hierarchical random e¤ects model and compare it to this variance of single measurements made on a single day.) (b) Give point estimates of the variance components ¾®2 ; ¾¯2 and ¾ 2 . (c) Your estimate of ¾®2 should involve a linear combination of mean squares. Give the variance of that linear combination in terms of the model parameters and I; J and K. Use that expression and propose a sensible estimated standard deviation (a standard error) for this linear combination. (See §1.4 and Problem 1.9.) 4.10. Consider the “one variable/second order” version of the “propagation of error” ideas discussed in Section 5.4 of the text. That is, for a random variable X with mean ¹ and standard deviation ¾2 and “nice” function g, let Y = g(X) and consider approximating EY and Var Y . A second order approximation of g made at the point x = ¹ is 1 g(x) ¼ g(¹) + g 0 (¹)(x ¡ ¹) + g 00 (¹)(x ¡ ¹)2 : 2 (Note that the approximating quadratic function has the same value, derivative and second derivative as g for the value x = ¹.) Let ·3 =E(X ¡ ¹)3 and ·4 =E(X ¡ ¹)4 . Based on the above preamble, carefully argue for the appropriateness of the following approximations: 1 EY ¼ g(¹) + g00 (¹)¾ 2 2 and 1 Var Y ¼ (g0 (¹))2 ¾2 + g 0 (¹)g 00 (¹)·3 + (g 00 (¹))2 (·4 ¡ ¾ 4 ) : 4 4.11. (Vander Wiel) A certain RCL network involving 2 resistors, 2 capacitors and a single inductor has a dynamic response characterized by the 108 CHAPTER 6. 
PROBLEMS “transfer function” Vout s2 + ³1 !1 s + !12 (s) = 2 ; Vin s + ³2 !2 s + !22 where !1 = (C2 L)¡1/2 ; µ ¶1/2 C1 + C2 !2 = ; LC1 C2 R2 ³1 = ; 2L!1 and ³2 = R1 + R2 : 2L!2 R1 and R2 are the resistances involved in ohms, C1 and C2 are the capacitances in Farads, and L is the value of the inductance in Henries. Standard circuit theory says that !1 and !2 are the “natural frequencies” of this network, !12 =!22 = C1 =(C1 + C2 ) is the “DC gain,” and ³1 and ³2 determine whether the zeros and poles are real or complex. Suppose that the circuit in question is to be massed produced using components with the following characteristics: EC1 = 1 399 F Var C1 = ¡ ¢2 1 3990 2 ER2 = 2- Var R1 = (3:8) ¡ 1 ¢2 Var C2 = 20 EL = 1H Var L = (:1)2 ER1 = 38EC2 = 12 F Var R2 = (:2)2 Treat C1 , R2 , C2 , R2 and L2 as independent random variables and use the propagation of error approximations to do the following: (a) Approximate the mean and standard deviation of the DC gains of the manufactured circuits. (b) Approximate the mean and standard deviation of the natural frequency !2 . 4. PROCESS CHARACTERIZATION 109 Now suppose that you are designing such an RCL circuit. To simplify things, use the capacitors and the inductor described above. You may choose the resistors, but their quality will be such that Var R1 = (ER1 =10)2 and Var R2 = (ER2 =10)2 : Your design goals are that ³2 should be (approximately) .5, and subject to this constraint, Var ³2 be minimum. (c) What values of ER1 and ER2 satisfy (approximately) the design goals, and what is the resulting (approximate) standard deviation of ³2 ? (Hint for part (c): The …rst design goal allows one to write ER2 as a function of ER1 . To satisfy the second design goal, use the propagation of error idea to write the (approximate) variance of ³2 as a function of ER1 only. By the way, the …rst design goal allows you to conclude that none of the partial derivatives needed in the propagation of error work depend on your choice of ER1 .) 4.12. 
Manufacturers wish to produce autos with attractive “fit and finish,” part of which consists of uniform (and small) gaps between adjacent pieces of sheet metal (e.g., doors and their corresponding frames). The accompanying figure is an idealized schematic of a situation of this kind, where we (at least temporarily) assume that the edges of both a door and its frame are linear. (The coordinate system on this diagram is pictured as if its axes are “vertical” and “horizontal.” But the line on the body need not be an exactly “vertical” line, and whatever this line's intended orientation relative to the ground, it is used to establish the coordinate system as indicated on the diagram.) On the figure, we are concerned with gaps $g_1$ and $g_2$. The first is at the level of the top hinge of the door and the second is $d$ units “below” that level in the body coordinate system ($d$ units “down” the door frame line from the initial measurement). People manufacturing the car body are responsible for the dimension $w$. People stamping the doors are responsible for the angles $\theta_1$ and $\theta_2$ and the dimension $y$. People welding the top door hinge to the door are responsible for the dimension $x$. And people hanging the door on the car are responsible for the angle $\phi$. The quantities $x, y, w, \phi, \theta_1$ and $\theta_2$ are measurable and can be used in manufacturing to verify that the various folks are “doing their jobs.” A door design engineer has to set nominal values for, and produce tolerances for variation in, these quantities. This problem is concerned with how the propagation of error method might help in this tolerancing enterprise, through an analysis of how variation in $x, y, w, \phi, \theta_1$ and $\theta_2$ propagates to $g_1$, $g_2$ and $g_1 - g_2$.
Figure 6.1: Figure for Problem 4.12. (Schematic showing labeled points $p, q, r, s, t$ and $u$; dimensions $x$, $y$, $w$ and $d$; angles $\theta_1$, $\theta_2$ and $\phi$; gaps $g_1$ and $g_2$; the top hinge, which is the origin of the body coordinate system; the door; and the line of the body door frame.)
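The first-order propagation of error calculation that this problem and the preceding RCL circuit problem (4.11) rely on can be sketched numerically. The Python below is a minimal sketch, assuming independent inputs: it approximates each partial derivative by a central difference at the input means and combines them as $\sigma_g \approx \sqrt{\sum_j (\partial g/\partial x_j)^2 \sigma_j^2}$. The function name `propagate` and the relative step size are illustrative choices, not anything from the notes; the usage applies it to the DC gain and $\omega_2$ of Problem 4.11 with the component means and standard deviations listed there.

```python
import math

def propagate(f, means, sds, rel_step=1e-6):
    """First-order propagation of error for independent inputs.

    Returns (f evaluated at the means, approximate sd of f, list of partials),
    with each partial derivative estimated by a central difference at the means.
    """
    grads = []
    for j, m in enumerate(means):
        h = rel_step * abs(m) if m != 0 else rel_step  # step scaled to the input size
        up, down = list(means), list(means)
        up[j], down[j] = m + h, m - h
        grads.append((f(*up) - f(*down)) / (2 * h))
    var = sum((g * s) ** 2 for g, s in zip(grads, sds))
    return f(*means), math.sqrt(var), grads


# Problem 4.11 inputs (C1, C2 in farads, L in henries); R1 and R2 do not enter these two outputs.
means = (1 / 399, 1 / 2, 1.0)
sds = (1 / 3990, 1 / 20, 0.1)

def dc_gain(c1, c2, l):
    return c1 / (c1 + c2)

def omega2(c1, c2, l):
    return math.sqrt((c1 + c2) / (l * c1 * c2))

gain_mean, gain_sd, _ = propagate(dc_gain, means, sds)
w2_mean, w2_sd, _ = propagate(omega2, means, sds)
print(f"DC gain: mean ~ {gain_mean:.6f}, sd ~ {gain_sd:.6f}")
print(f"omega_2: mean ~ {w2_mean:.4f}, sd ~ {w2_sd:.4f}")
```

The same function can be pointed at $\zeta_2$ (adding $R_1$ and $R_2$ as inputs) or at the gap functions of this problem once they are written as Python functions of $x, y, w, \phi, \theta_1, \theta_2$.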
If I have correctly done my geometry/trigonometry, the following relationships hold for labeled points on the diagram:
$$p = (-x\sin\phi,\; x\cos\phi),$$
$$q = p + \left(y\cos\left(\phi + \theta_1 - \tfrac{\pi}{2}\right),\; y\sin\left(\phi + \theta_1 - \tfrac{\pi}{2}\right)\right),$$
$$s = \left(q_1 + q_2\tan(\phi + \theta_1 + \theta_2 - \pi),\; 0\right),$$
and
$$u = \left(q_1 + (q_2 + d)\tan(\phi + \theta_1 + \theta_2 - \pi),\; -d\right).$$
Then for the idealized problem here (with perfectly linear edges) we have
$$g_1 = w - s_1 \quad \text{and} \quad g_2 = w - u_1.$$
Actually, in an attempt to allow for the notion of “form error” in the ideally linear edges, one might propose that at a given distance “below” the origin of the body coordinate system, the realized edge of a real geometry is its nominal position plus a “form error.” Then instead of dealing with $g_1$ and $g_2$, one might consider the gaps
$$g_1^* = g_1 + \epsilon_1 - \epsilon_2 \quad \text{and} \quad g_2^* = g_2 + \epsilon_3 - \epsilon_4,$$
for body form errors $\epsilon_1$ and $\epsilon_3$ and door form errors $\epsilon_2$ and $\epsilon_4$. (The interpretation of additive “form errors” around the line of the body door frame is perhaps fairly clear, since “the error” at a given level is measured perpendicular to the “body line” and is thus well-defined for a given realized body geometry. The interpretation of an additive error on the right-side “door line” is not so clear, since in general one will not be measuring perpendicular to the line of the door, or even at any consistent angle with it. So for a realized geometry, what “form error” to associate with a given point on the ideal line, or exactly how to model it, is not completely clear. We'll ignore this logical problem and proceed using the models above.)
We'll use $d = 40$ cm; below are two possible sets of nominal values for the parameters of the door assembly:
Design A: $x = 20$ cm, $y = 90$ cm, $w = 90.4$ cm, $\phi = 0$, $\theta_1 = \frac{\pi}{2}$, $\theta_2 = \frac{\pi}{2}$
Design B: $x = 20$ cm, $y = 90$ cm, $w = \left(90\cos\frac{\pi}{10} + .4\right)$ cm, $\phi = \frac{\pi}{10}$, $\theta_1 = \frac{\pi}{2}$, $\theta_2 = \frac{4\pi}{10}$
Partial derivatives of $g_1$ and $g_2$ (evaluated at the design nominal values of $x, y, w, \phi, \theta_1$ and $\theta_2$) are:
$$\begin{array}{c|cccccc}
 & \partial/\partial x & \partial/\partial y & \partial/\partial w & \partial/\partial \phi & \partial/\partial \theta_1 & \partial/\partial \theta_2 \\ \hline
\text{Design A: } g_1 & 0 & -1 & 1 & 0 & -20 & -20 \\
\text{Design A: } g_2 & 0 & -1 & 1 & -40 & -60 & -60 \\ \hline
\text{Design B: } g_1 & .309 & -.951 & 1 & 0 & -19.021 & -46.833 \\
\text{Design B: } g_2 & .309 & -.951 & 1 & -40 & -59.02 & -86.833
\end{array}$$
(a) Suppose that a door engineer must eventually produce tolerances for $x, y, w, \phi, \theta_1$ and $\theta_2$ that are consistent with “$\pm .1$ cm” tolerances on $g_1$ and $g_2$. If we interpret “$\pm .1$ cm” tolerances to mean that $\sigma_{g_1}$ and $\sigma_{g_2}$ are no more than .033 cm, consider the set of “sigmas”
$$\sigma_x = .01\text{ cm}, \quad \sigma_y = .01\text{ cm}, \quad \sigma_w = .01\text{ cm}, \quad \sigma_\phi = .001\text{ rad}, \quad \sigma_{\theta_1} = .001\text{ rad}, \quad \sigma_{\theta_2} = .001\text{ rad}.$$
First for Design A and then for Design B, investigate in two different ways whether this set of “sigmas” is consistent with the necessary final tolerances on $g_1$ and $g_2$. Make propagation of error approximations to $\sigma_{g_1}$ and $\sigma_{g_2}$. Then simulate 100 values of both $g_1$ and $g_2$ using independent normal random variables $x, y, w, \phi, \theta_1$ and $\theta_2$ with means equal to the design nominals and these standard deviations. (Compute the sample standard deviations of the simulated values and compare them to the .033 cm target.)
(b) One of the assumptions behind the propagation of error approximations is the independence of the input random variables. Briefly discuss why independence of the variables $\theta_1$ and $\theta_2$ may not be such a great model assumption in this problem.
(c) Notice that for Design A the propagation of error formula predicts that variation in the dimension $x$ will not much affect the gaps presently of interest, $g_1$ and $g_2$, while the situation is different for Design B. Argue, based on the nominal geometries, that this makes perfectly good sense. For Design A, one might say that the gaps $g_1$ and $g_2$ are “robust” to variation in $x$. For this design, do you think that the entire “fit” of the door to the body of the car is going to be “robust to variation in $x$”? Explain.
(Note, by the way, that the fact that $\partial g_1/\partial\phi = 0$ for Design A also makes this design look completely “robust to variation in $\phi$” in terms of the gap $g_1$, at least by the standards of the propagation of error formula. But the situation for this variable is somewhat different than for $x$. This partial derivative is equal to 0 because, for $x, y, w, \theta_1$ and $\theta_2$ at their nominal values, $g_1$ considered as a function of $\phi$ alone has a local extremum at $\phi = 0$ (in fact a local maximum, since with $\theta_1 = \theta_2 = \pi/2$ the relationships above reduce to $g_1 = w - y\sec\phi$). This is different from $g_1$ being constant in $\phi$. A more refined “second order” propagation of error analysis of this problem, one that essentially begins from a quadratic approximation to $g_1$ instead of a linear one, would distinguish between these two possibilities. But the “first order” analysis done on the basis of formula (5.27) of the text is often helpful and adequate for practical purposes.)
(d) What does the propagation of error formula predict for variation in the difference $g_1 - g_2$, first for Design A and then for Design B?
(e) Suppose that one desires to take into account the possibility of “form errors” affecting the gaps, and thus considers analysis of $g_1^*$ and $g_2^*$ instead of $g_1$ and $g_2$. If standard deviations for the variables $\epsilon$ are all .001 cm, what does the propagation of error analysis predict for variability in $g_1^*$ and $g_2^*$ for Design A?
4.13. The electrical resistivity, $\rho$, of a wire is a property of the material involved and the temperature at which it is measured. At a given temperature, if a cylindrical piece of wire of length $L$ and (constant) cross-sectional area $A$ has resistance $R$, then the material's resistivity is calculated as
$$\rho = \frac{RA}{L}.$$
In a lab exercise intended to determine the resistivity of copper at 20°C, students measure the length, diameter and resistance of a wire assumed to have circular cross-sections. Suppose the length is approximately 1 meter, the diameter is approximately $2.0 \times 10^{-3}$ meters and the resistance is approximately $.54 \times 10^{-2}$ Ω. Suppose further that the precisions of the
measuring equipment used in the lab are such that standard deviations $\sigma_L = 10^{-3}$ meter, $\sigma_D = 10^{-4}$ meter and $\sigma_R = 10^{-4}$ Ω are appropriate.
(a) Find an approximate standard deviation that might be used to describe the precision associated with an experimentally derived value of $\rho$.
(b) Imprecision in which of the measurements appears to be the biggest contributor to imprecision in experimentally determined values of $\rho$? (Explain.)
(c) One should probably expect the approximate standard deviation derived here to under-predict the kind of variation that would actually be observed in such lab exercises over a period of years. Explain why this is so.
4.14. A bullet is fired horizontally into a block (of much larger mass) suspended by a long cord, and the impact causes the block and embedded bullet to swing upward a distance $d$ measured vertically from the block's lowest position. The laws of mechanics can be invoked to argue that if $d$ is measured in feet, and before testing the block weighs $w_1$, while the block and embedded bullet together weigh $w_2$ (in the same units), then the velocity (in fps) of the bullet just before impact with the block is approximately
$$v = \left(\frac{w_2}{w_2 - w_1}\right)\sqrt{64.4 \cdot d}.$$
Suppose that the bullet involved weighs about .05 lb, the block involved weighs about 10.00 lb and that both $w_1$ and $w_2$ can be determined with a standard deviation of about .005 lb. Suppose further that the distance $d$ is about .50 ft, and can be determined with a standard deviation of .03 ft.
(a) Compute an approximate standard deviation describing the uncertainty in an experimentally derived value of $v$.
(b) Would you say that the uncertainties in the weights contribute more to the uncertainty in $v$ than the uncertainty in the distance? Explain.
(c) Say why one should probably think of calculations like those in part (a) as providing only some kind of approximate lower bound on the uncertainty that should be associated with the bullet's velocity.
4.15.
On page 243 of V&J there is an ANOVA table for a balanced hierarchical data set. Use it in what follows.
(a) Find standard errors for the usual ANOVA estimates of $\sigma_\alpha^2$ and $\sigma^2$ (the “casting” and “analysis” variance components).
(b) If you were later to make 100 castings, cut 4 specimens from each of these and make a single lab analysis on each specimen, give a (numerical) prediction of the overall sample variance of these future 400 measurements (based on the hierarchical random effects model and the ANOVA estimates of $\sigma_\alpha^2$, $\sigma_\beta^2$ and $\sigma^2$).
5 Sampling Inspection
Methods
5.1. Consider attributes single sampling.
(a) Make type A OC curves for $N = 20$, $n = 5$ and $c = 0$ and 1, for both percent defective and mean defects per unit situations.
(b) Make type B OC curves for $n = 5$, $c = 0$, 1 and 2 for both percent defective and mean defects per unit situations.
(c) Use the imperfect inspection analysis presented in §5.2 and find OC bands for the percent defective cases above with $c = 1$ under the assumption that $w_D \le .1$ and $w_G \le .1$.
5.2. Consider single sampling for percent defective.
(a) Make approximate OC curves for $n = 100$, $c = 1$; $n = 200$, $c = 2$; and $n = 300$, $c = 3$.
(b) Make AOQ and ATI curves for a rectifying inspection scheme using a plan with $n = 200$ and $c = 2$ for lots of size $N = 10{,}000$. What is the AOQL?
5.3. Find attributes single sampling plans (i.e., find $n$ and $c$) having approximately
(a) $P_a = .95$ if $p = .01$ and $P_a = .10$ if $p = .03$.
(b) $P_a = .95$ if $p = 10^{-6}$ and $P_a = .10$ if $p = 3 \times 10^{-6}$.
5.4. Consider a (truncated sequential) attributes acceptance sampling plan that, for
$$X_n = \text{the number of defective items found through the } n\text{th item inspected},$$
rejects the lot if it ever happens that $X_n \ge 1.5 + .5n$, accepts the lot if it ever happens that $X_n \le -1.5 + .5n$, and further never samples more than 11 items.
We will suppose that if sampling were extended to $n = 11$, we would accept for $X_{11} = 4$ or 5 and reject for $X_{11} = 6$ or 7, and thus note that sampling can be curtailed at $n = 10$ if $X_{10} = 4$ or 6.
(a) Find expressions for the OC and ASN for this plan.
(b) Find formulas for the AOQ and ATI of this plan, if it is used in a rectifying inspection scheme for lots of size $N = 100$.
5.5. Consider single sampling based on a normally distributed variable.
(a) Find a single limit variables sampling plan with $L = 1.000$, $\sigma = .015$, $p_1 = .03$, $P_{a1} = .95$, $p_2 = .10$ and $P_{a2} = .10$. Sketch the OC curve of this plan. How does $n$ compare with what would be required for an attributes sampling plan with a comparable OC curve?
(b) Find a double limits variables sampling plan with $L = .49$, $U = .51$, $\sigma = .004$, $p_1 = .03$, $P_{a1} = .95$, $p_2 = .10$ and $P_{a2} = .10$. Sketch the OC curve of this plan. How does $n$ compare with what would be required for an attributes sampling plan with a comparable OC curve?
(c) Use the Wallis approximation and find a single limit variables sampling plan for $L = 1.000$, $p_1 = .03$, $P_{a1} = .95$, $p_2 = .10$ and $P_{a2} = .10$. Sketch an approximate OC curve for this plan.
5.6. In contrast to what you found in Problem 5.3(b), make use of the fact that the upper $10^{-6}$ point of the standard normal distribution is about 4.753, while the upper $3 \times 10^{-6}$ point is about 4.526, and find the $n$ required for a known-$\sigma$ single limit variables acceptance sampling plan to have $P_a = .95$ if $p = 10^{-6}$ and $P_a = .10$ if $p = 3 \times 10^{-6}$. What is the Achilles heel (fatal weakness) of these calculations?
5.7. Consider the CSP-1 plan with $i = 100$ and $f = .02$. Make AFI and AOQ plots for this plan and find the AOQL both for the case where defectives are rectified and for the case where they are culled.
5.8. Consider the classical problem of acceptance sampling plan design. Suppose that one wants plans whose OC “drops” near $p = .03$ (wants $P_a \approx .5$ for $p = .03$) and also wants $p = .04$ to have $P_a \approx .05$.
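The type A, type B and Poisson OC calculations called for in Problems 5.1 through 5.3 and 5.8(a) can be sketched with exact hypergeometric, binomial and Poisson sums. The Python below is a minimal sketch, assuming the usual single sampling rule “accept if the number of defectives (or defects) in the sample is at most $c$.” The function names are hypothetical, and `find_plan` is a simple brute-force search (not a method taken from the notes) that reads “approximately $P_a = .95$ at $p_1$ and $P_a = .10$ at $p_2$” as the inequalities $OC(p_1) \ge .95$ and $OC(p_2) \le .10$.

```python
from math import comb, exp, factorial

def oc_type_b(p, n, c):
    """Type B OC: P(accept) = P(Binomial(n, p) <= c)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(min(c, n) + 1))

def oc_type_a(defectives, lot_size, n, c):
    """Type A OC: P(accept) for a lot of `lot_size` items holding exactly
    `defectives` defectives, sampled without replacement (hypergeometric)."""
    good = lot_size - defectives
    favorable = sum(comb(defectives, k) * comb(good, n - k) for k in range(min(c, n) + 1))
    return favorable / comb(lot_size, n)

def oc_poisson(defects_per_unit, n, c):
    """OC for mean defects per unit: P(Poisson(n * lambda) <= c)."""
    m = n * defects_per_unit
    return sum(exp(-m) * m**k / factorial(k) for k in range(c + 1))

def find_plan(p1, pa1, p2, pa2, c_max=50, n_max=20_000):
    """Search c = 0, 1, 2, ... for a plan (n, c) with OC(p1) >= pa1 and
    OC(p2) <= pa2, using type B (binomial) OCs; returns None if none is found."""
    for c in range(c_max + 1):
        n = c + 1
        # OC(p2) falls as n grows, so take the smallest n meeting the p2 condition;
        # that n also gives the largest possible OC(p1) for this c.
        while n <= n_max and oc_type_b(p2, n, c) > pa2:
            n += 1
        if n <= n_max and oc_type_b(p1, n, c) >= pa1:
            return n, c
    return None

# Problem 5.1(a)-style point: a lot of N = 20 holding 2 defectives, n = 5, c = 0
print(oc_type_a(2, 20, 5, 0))
# Problem 5.3(a) criteria
print(find_plan(0.01, 0.95, 0.03, 0.10))
```

Evaluating `oc_type_a` over $D = 0, 1, \ldots, N$ (with $p = D/N$) traces a type A OC curve, while evaluating `oc_type_b` or `oc_poisson` over a grid of $p$ or $\lambda$ values traces the corresponding type B curve.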
(a) Design an attributes single sampling plan approximately meeting the above criteria.
Suppose that in fact “nonconforming” is defined in terms of a measured variable, $X$, being less than a lower specification $L = 13$, and that it is sensible to use a normal model for $X$.
(b) Design a “known $\sigma$” variables plan for the above criteria if $\sigma = 1$.
(c) Design an “unknown $\sigma$” variables plan for the above criteria.
Theory
5.9. Consider variables acceptance sampling based on exponentially distributed observations, supposing that there is a single lower limit $L = .2107$.
(a) Find means corresponding to fractions defective $p = .10$ and $p = .19$.
(b) Use the Central Limit Theorem to find a number $k$ and sample size $n$ so that an acceptance sampling plan that rejects a lot if $\bar{x} < k$ has $P_a = .95$ for $p = .10$ and $P_a = .10$ for $p = .19$.
(c) Sketch an OC curve for your plan from (b).
5.10. Consider the situation of a consumer who will repeatedly receive lots of 1500 assemblies. These assemblies may be tested at a cost of \$24 apiece or simply be put directly into a production stream with a later extra manufacturing cost of \$780 occurring for each defective that is undetected because it was not tested. We'll assume that the supplier replaces any assembly found to be defective (either at the testing stage or later when the extra \$780 cost occurs) with a guaranteed good assembly at no additional cost to the consumer. Suppose further that the producer of the assemblies has agreed to establish statistical control with $p = .02$.
(a) Adopt perspective B with $p$ known to be .02 and compare the mean per-lot costs of the following 3 policies:
i. test the whole lot,
ii. test none of the lot, and
iii. go to Mil. Std. 105D with AQL $= .025$ and adopt an inspection level II, normal inspection single sampling plan (i.e., $n = 125$ and $c = 7$), doing 100% inspection of rejected lots. (This, by the way, is not a recommended “use” of the standard.
It is designed to “guarantee” a consumer the desired AQL only when all the switching rules are employed. I'm abusing the standard.)
(b) Adopt the point of view that in the short term, perspective B may be appropriate, but that over the long term the supplier's $p$ vacillates between .02 and .04. In fact, suppose that for successive lots the $p_i$ = perspective B $p$ at the time lot $i$ is produced are independent random variables, with $P[p_i = .02] = P[p_i = .04] = .5$. Now compare the mean costs of policies i), ii) and iii) from (a) used repeatedly.
(c) Suppose that the scenario in (b) is modified by the fact that the consumer gets control charts from the supplier in time to determine whether, for a given lot, perspective B with $p = .02$ or $p = .04$ is appropriate. What should the consumer's inspection policy be, and what is its mean cost of application?
5.11. Suppose that the fractions defective in successive large lots of fixed size $N$ can be modeled as iid Beta$(\alpha, \beta)$ random variables with $\alpha = 1$ and $\beta = 9$. Suppose that these lots are subjected to attributes acceptance sampling, using $n = 100$ and $c = 1$. Find the conditional distribution of $p$ given that the lot is accepted. Sketch probability densities for both the original Beta distribution and this conditional distribution of $p$ given lot acceptance.
5.12. Consider the following variation on the “Deming Inspection Problem” discussed in §5.3. Each item in an incoming lot of size $N$ will be Good (G), Marginal (M) or Defective (D). Some form of (single) sampling inspection is contemplated based on counts of G's, D's and M's. There will be a per-item inspection cost of $k_1$ for any item inspected, while any M's going uninspected will eventually produce a cost of $k_2$, and any D's going uninspected will produce a cost of $k_3 > k_2$. Adopt perspective B, i.e.
that any given incoming lot was produced under some set of stable conditions, characterized here by probabilities $p_G$, $p_M$ and $p_D$ that any given item in that lot is respectively G, M or D.
(a) Argue carefully that the “All or None” criterion is in force here, and identify the condition on the $p$'s under which “All” is optimal and the condition under which “None” is optimal.
(b) If $p_G$, $p_M$ and $p_D$ are not known, but rather are described by a joint probability distribution, $n$ other than $N$ or 0 can turn out to be optimal. A particularly convenient distribution to use in describing the $p$'s is the Dirichlet distribution (the multivariate generalization of the Beta distribution for variables that must add up to 1). For a Dirichlet distribution with parameters $\alpha_G > 0$, $\alpha_M > 0$ and $\alpha_D > 0$, it turns out that if $X_G$, $X_M$ and $X_D$ are the counts of G's, M's and D's in a sample of $n$ items, then
$$E[p_G \mid X_G, X_M, X_D] = \frac{\alpha_G + X_G}{\alpha_G + \alpha_M + \alpha_D + n},$$
$$E[p_M \mid X_G, X_M, X_D] = \frac{\alpha_M + X_M}{\alpha_G + \alpha_M + \alpha_D + n},$$
and
$$E[p_D \mid X_G, X_M, X_D] = \frac{\alpha_D + X_D}{\alpha_G + \alpha_M + \alpha_D + n}.$$
Use these expressions and describe what an optimal lot disposal (acceptance or rejection) is, if a Dirichlet distribution is used to describe the $p$'s and a sample of $n$ items yields counts $X_G$, $X_M$ and $X_D$.
5.13. Consider the Deming Inspection Problem exactly as discussed in §5.3. Suppose that $k_1 = \$50$, $k_2 = \$500$, $N = 200$ and one's a priori beliefs are such that one would describe $p$ with a (Beta) distribution with mean .1 and standard deviation .090453. For what values of $n$ are respectively $c = 0$, 1 and 2 optimal? If you are brave (and either have a pretty good calculator or are fairly quick with computing), compute the expected total costs associated with these values of $n$ (obtained using the corresponding $c_{opt}(n)$). From these calculations, what $(n, c)$ pair appears to be optimal?
5.14.
Consider the problem of estimating the process fraction defective based on the results of an “inverse sampling plan” that samples until 2 defective items have been found. Find the UMVUE of $p$ in terms of the random variable $n$ = the number of items required to find the second defective. Show directly that this estimator of $p$ is unbiased (i.e., has expected value equal to $p$). Write out a series giving the variance of this estimator.
5.15. The paper “The Economics of Sampling Inspection” by Bernard Smith (which appeared in Industrial Quality Control in 1965 and is based on earlier theoretical work of Guthrie and Johns) gives a closed form expression for an approximately optimal $n$ in the Deming inspection problem for cases where $p$ has a Beta$(\alpha, \beta)$ prior distribution and both $\alpha$ and $\beta$ are integers. Smith says
$$n_{opt} \approx \sqrt{\frac{N \cdot B(\alpha, \beta)\, p_0^{\alpha}(1 - p_0)^{\beta}}{2\left(p_0\, Bi(\alpha \mid \alpha + \beta - 1, p_0) - \frac{\alpha}{\alpha + \beta}\, Bi(\alpha + 1 \mid \alpha + \beta, p_0)\right)}}$$
for $p_0 \equiv k_1/k_2$ the break-even quantity, $B(\cdot, \cdot)$ the usual beta function and $Bi(x \mid n, p)$ the probability that a binomial $(n, p)$ random variable takes a value of $x$ or more. Suppose that $k_1 = \$50$, $k_2 = \$500$, $N = 200$ and our a priori beliefs about $p$ (or the “process curve”) are such that it is sensible to describe $p$ as having mean .1 and standard deviation .090453. What fixed $n$ inspection plan follows from the Smith formula?
5.16. Consider the Deming inspection scenario as discussed in §5.3. Suppose that $N = 3$, $k_1 = 1.5$, $k_2 = 10$ and a prior distribution $G$ assigns $P[p = .1] = .5$ and $P[p = .2] = .5$. Find the optimal fixed $n$ inspection plan by doing the following.
(a) For sample sizes $n = 1$ and $n = 2$, determine the corresponding optimal acceptance numbers, $c^{opt}_G(n)$.
(b) For sample sizes $n = 0$, 1, 2 and 3, find the expected total costs associated with those sample sizes if the corresponding best acceptance numbers are used.
5.17. Consider the Deming inspection scenario once again.
With $N = 100$, $k_1 = 1$ and $k_2 = 10$, write out the fixed $p$ expected total cost associated with a particular choice of $n$ and $c$. Note that “None” is optimal for $p < .1$ and “All” is optimal for $p > .1$. So, in some sense, what is exactly optimal is highly discontinuous in $p$. On the other hand, if $p$ is “near” .1, it doesn't matter much what inspection plan one adopts, “All,” “None” or anything else for that matter. To see this, write out as a function of $p$
$$\frac{\text{worst possible expected total cost}(p) - \text{best possible expected total cost}(p)}{\text{best possible expected total cost}(p)}.$$
How big can this quantity get, e.g., on the interval $[.09, .11]$?
5.18. Consider the following percent defective acceptance sampling scheme. One will sample items one at a time up to a maximum of 8 items. If at any point in the sampling, half or more of the items inspected are defective, sampling will cease and the lot will be rejected. If the maximum 8 items are inspected without rejecting the lot, the lot will be accepted.
(a) Find expressions for the type B Operating Characteristic and the ASN of this plan.
(b) Find an expression for the type A Operating Characteristic of this plan if lots of $N = 50$ items are involved.
(c) Find expressions for the type B AOQ and ATI of this plan for lots of size $N = 50$.
(d) What is the (uniformly) minimum variance unbiased estimator of $p$ for this plan? (Say what value one should estimate for every possible stop-sampling point.)
5.19. Vardeman argued in §5.3 that if one adopts perspective B with known $p$ and costs are assessed as the sum of identically calculated costs associated with individual items, either “All” or “None” inspection plans will be optimal. Consider the following two scenarios (each of which lacks one of these assumptions) and show that in each the “All or None” paradigm fails to hold.
(a) Consider the Deming inspection scenario discussed in §5.3, with $k_1 = \$1$ and $k_2 = \$100$, and suppose lots of $N = 5$ are involved.
Suppose that one adopts not perspective B, but instead perspective A, and that $p$ is known to be .2 (a lot contains exactly 1 defective). Find the expected total costs associated with “All” and then with “None” inspection. Then suggest a sequential inspection plan that has smaller expected total cost than either “All” or “None.” (Find the expected total cost of your suggested plan and verify that it is smaller than that for both the “All” and “None” inspection plans.)
(b) Consider perspective B with $p$ known to be .4. Suppose lots of size $N = 5$ are involved and costs are assessed as follows. Each inspection costs \$1 and defective items are replaced with good items at no charge. If the lot fails to contain at least one good item (and this goes undetected) a penalty of \$1000 will be incurred, but otherwise the only costs charged are for inspection. Find the expected total costs associated with “All” and then with “None” inspection. Then argue convincingly that there is a better “fixed $n$” plan. (Say clearly what plan is superior and show that its expected total cost is less than that of both “All” and “None” inspection.)
5.20. Consider the following nonstandard “variables” acceptance sampling situation. A supplier has both a high quality/low variance production line (#1) and a low quality/high variance production line (#2) used to manufacture widgets ordered by Company V. Coded values of a critical dimension of these widgets produced on the high quality line are normally distributed with $\mu_1 = 0$ and $\sigma_1 = 1$, while coded values of this dimension produced on the low quality line are normally distributed with $\mu_2 = 0$ and $\sigma_2 = 2$. Coded specifications for this dimension are $L = -3$ and $U = 3$. The supplier is known to mix output from the two lines in lots sent to Company V. As a cost saving measure, this is acceptable to Company V, provided the fraction of “out-of-spec.” widgets does not become too large.
Company V expects $\pi$ = the proportion of items in a lot coming from the high variance line (#2) to vary lot to lot and decides to institute a kind of incoming variables acceptance sampling scheme. What will be done is the following. The critical dimension, $X$, will be measured on each of $n$ items sampled from a lot. For each measurement $X$, the value $Y = X^2$ will be calculated. Then, for a properly chosen constant $k$, the lot will be accepted if $\bar{Y} \le k$ and rejected if $\bar{Y} > k$. The purpose of this problem is to identify suitable $n$ and $k$, if $P_a \approx .95$ is desired for lots with $p = .01$ and $P_a \approx .05$ is desired for lots with $p = .03$.
(a) Find an expression for $p$ (the long run fraction defective) as a function of $\pi$. What values of $\pi$ correspond to $p = .01$ and $p = .03$ respectively?
(b) It is possible to show (you need not do so here) that $EY = 3\pi + 1$ and $\operatorname{Var} Y = -9\pi^2 + 39\pi + 2$. Use these facts, your answer to (a) and the Central Limit Theorem to help you identify suitable values of $n$ and $k$ to use at Company V.
5.21. On what basis is it sensible to criticize the relevance of the calculations usually employed to characterize the performance of continuous sampling plans?
5.22. Individual items produced on a manufacturer's line may be graded as “Good” (G), “Marginal” (M) or “Defective” (D). Under stable process conditions, each successive item is (independently) G with probability $p_G$, M with probability $p_M$ and D with probability $p_D$, where $p_G + p_M + p_D = 1$. Suppose that ultimately, defective items cause three times as much extra expense as marginal ones.
Based on the kind of cost information alluded to above, one might give each inspected item a “score” $s$ according to
$$s = \begin{cases} 3 & \text{if the item is D} \\ 1 & \text{if the item is M} \\ 0 & \text{if the item is G.} \end{cases}$$
It is possible to argue (don't bother to do so here) that $Es = 3p_D + p_M$ and $\operatorname{Var} s = 9p_D(1 - p_D) + p_M(1 - p_M) - 6p_D p_M$.
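The score moments and the normal-approximation chart calculations that parts (a) and (b) of Problem 5.22 turn on can be sketched directly. The Python below is a minimal sketch under stated assumptions: it computes $Es$ and $\operatorname{Var} s$ by enumerating the three-point distribution of $s$ (which gives $\operatorname{Var} s = 9p_D(1-p_D) + p_M(1-p_M) - 6p_Dp_M$, the cross term being $2 \cdot 3 \cdot 1 \cdot \operatorname{Cov} = -6p_Dp_M$), assumes 3-sigma standards-given limits $Es \pm 3\sqrt{\operatorname{Var} s/n}$ for $\bar{s}$, and treats $\bar{s}$ as exactly normal when computing an ARL. The function names are hypothetical.

```python
import math

def score_moments(p_g, p_m, p_d):
    """Mean and variance of s (3 for D, 1 for M, 0 for G) by direct enumeration."""
    dist = {3: p_d, 1: p_m, 0: p_g}
    mean = sum(s * prob for s, prob in dist.items())
    var = sum((s - mean) ** 2 * prob for s, prob in dist.items())
    return mean, var

def shewhart_limits(p_g, p_m, p_d, n):
    """Standards-given 3-sigma limits (LCL, center line, UCL) for the average score of n items."""
    mean, var = score_moments(p_g, p_m, p_d)
    half_width = 3 * math.sqrt(var / n)
    return mean - half_width, mean, mean + half_width

def normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2))

def approx_arl(limits, p_g, p_m, p_d, n):
    """ARL of the s-bar chart when the true mix is (p_g, p_m, p_d), treating s-bar as normal."""
    lcl, _, ucl = limits
    mean, var = score_moments(p_g, p_m, p_d)
    sd = math.sqrt(var / n)
    # A negative LCL acts as "no lower limit"; its normal tail is then tiny.
    p_signal = normal_cdf((lcl - mean) / sd) + (1 - normal_cdf((ucl - mean) / sd))
    return 1 / p_signal

limits = shewhart_limits(0.90, 0.07, 0.03, n=100)
arl = approx_arl(limits, 0.85, 0.10, 0.05, n=100)
print(limits, arl)
```

Because $\bar{s}$ is an average of discrete scores, the normal approximation is only a rough guide; the exact distribution of the score total could be computed by convolution if more accuracy were wanted.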
(a) Give formulas for standards-given Shewhart control limits for average scores $\bar{s}$ based on samples of size $n$. Describe how you would obtain the information necessary to calculate limits for future control of $\bar{s}$.
(b) Ultimately, suppose that “standard” values are set at $p_G = .90$, $p_M = .07$ and $p_D = .03$ and $n = 100$ is used for samples of a high volume product. Use a normal approximation to the distribution of $\bar{s}$ and find an approximate ARL for your scheme from part (a) if in fact the mix of items shifts to where $p_G = .85$, $p_M = .10$ and $p_D = .05$.
(c) Suppose that one decides to use a high side CUSUM scheme to monitor individual scores as they come in one at a time. Consider a scheme with $k_1 = 1$ and no head-start that signals the first time that a CUSUM of scores of at least $h_1 = 6$ is reached. Set up an appropriate transition matrix and say how you would use that matrix to find an ARL for this scheme for an arbitrary set of probabilities $(p_G, p_M, p_D)$.
(d) Suppose that inspecting an item costs 1/5th of the extra expense caused by an undetected marginal item. A plausible (single sampling) acceptance sampling plan for lots of $N = 10{,}000$ of these items then accepts the lot if
$$\bar{s} \le .20.$$
If rejection of the lot will result in 100% inspection of the remainder, consider the (“perspective B”) economic choice of sample size for plans of this form, in particular the comparison of $n = 100$ and $n = 400$ plans. The following table gives some approximate acceptance probabilities for these plans under two sets of probabilities $p = (p_G, p_M, p_D)$.
$$\begin{array}{l|cc}
 & n = 100 & n = 400 \\ \hline
p = (.9, .07, .03) & P_a \approx .76 & P_a \approx .92 \\
p = (.85, .10, .05) & P_a \approx .24 & P_a \approx .08
\end{array}$$
Find expected costs for these two plans ($n = 100$ and $n = 400$) if costs are accrued on a per-item and per-inspection basis and “prior” probabilities of these two sets of process conditions are respectively .8 for $p = (.9, .07, .03)$ and .2 for $p = (.85, .10, .05)$.
5.23.
Consider variables acceptance sampling for a quantity $X$ that has engineering specifications $L = 3$ and $U = 5$. We will further suppose that $X$ has standard deviation $\sigma = .2$.
(a) Suppose that $X$ is uniformly distributed with mean $\mu$. That is, suppose that $X$ has probability density
$$f(x) = \begin{cases} 1.4434 & \text{if } \mu - .3464 < x < \mu + .3464 \\ 0 & \text{otherwise.} \end{cases}$$
What means $\mu_1$ and $\mu_2$ correspond to fractions defective $p_1 = .01$ and $p_2 = .03$?
(b) Find a sample size $n$ and number $k$ such that a variables acceptance sampling plan that accepts a lot when $4 - k < \bar{x} < 4 + k$ and rejects it otherwise has $P_{a1} \approx .95$ for $p_1 = .01$ and $P_{a2} \approx .10$ for $p_2 = .03$ when, as in part (a), observations are uniformly distributed with mean $\mu$ and standard deviation $\sigma = .2$.
(c) Suppose that one applies your plan from (b), but instead of being uniformly distributed with mean $\mu$ and standard deviation $\sigma = .2$, observations are normal with that mean and standard deviation. What acceptance probability then accompanies a fraction defective $p_1 = .01$?
5.24. Each container in a large lot is full of a solution of several gases. Suppose that in a given container the fraction of the solution that is gas A can be described with the probability density
$$f(x) = \begin{cases} (\theta + 1)x^{\theta} & x \in (0, 1) \\ 0 & \text{otherwise.} \end{cases}$$
For this density, it is possible to show that $EX = (\theta + 1)/(\theta + 2)$ and $\operatorname{Var} X = (\theta + 1)/\left((\theta + 2)^2(\theta + 3)\right)$. Containers with $X < .1$ are considered defective and we wish to do acceptance sampling to hopefully screen lots with large $p$.
(a) Find the values of $\theta$ corresponding to fractions defective $p_1 = .01$ and $p_2 = .03$.
(b) Use the Central Limit Theorem and find a number $k$ and a sample size $n$ so that an acceptance sampling plan that rejects if $\bar{x} < k$ has $P_{a1} = .95$ and $P_{a2} = .10$.
5.25. A measurement has an upper specification $U = 5.0$.
Making a normal distribution assumption with $\sigma = .015$ and desiring $P_{a1} = .95$ for $p_1 = .03$ and $P_{a2} = .10$ for $p_2 = .10$, a statistician sets up a variables acceptance sampling plan for a sample of size $n = 23$ that rejects a lot if $\bar{x} > 4.97685$. In fact, a Weibull distribution with shape parameter $\beta = 400$ and scale parameter $\alpha$ is a better description of this characteristic than the normal distribution the statistician used. This alternative distribution has cdf
$$F(x \mid \alpha) = \begin{cases} 0 & \text{if } x < 0 \\ 1 - \exp\left(-\left(\frac{x}{\alpha}\right)^{400}\right) & \text{if } x > 0, \end{cases}$$
and mean $\mu \approx .9986\alpha$ and standard deviation $\sigma = .0032\alpha$. Show how to obtain an approximate OC curve for the statistician's acceptance sampling plan under this Weibull model. (Use the Central Limit Theorem.) Use your method to find the real acceptance probability if $p = .03$.
5.26. Here's a prescription for a possible fraction nonconforming attributes acceptance sampling plan:
stop and reject the lot the first time that $X_n \ge 2 + \frac{n}{2}$;
stop and accept the lot the first time that $n - X_n \ge 2 + \frac{n}{2}$.
(a) Find a formula for the OC for this “symmetric wedge-shaped plan.” (One never samples more than 7 items and there are exactly 8 stop sampling points prescribed by the rules above.)
(b) Consider the use of this plan where lots of size $N = 100$ are subjected to rectifying inspection and inspection error is possible. (Assume that any item inspected and classified as defective is replaced with one drawn from a population that is in fact a fraction $p$ defective and has been inspected and classified as good.) Use the parameters $w_G$ and $w_D$ defined in §5.2 of the notes and give a formula for the real AOQ of this plan as a function of $p$, $w_G$ and $w_D$.
5.27. Consider a “perspective A” economic analysis of some fraction defective “fixed $n$ inspection plans.” (Don't simply try to use the type B calculations made in class. They aren't relevant. Work this out from first principles.)
Suppose that $N = 10$, $k_1 = 1$ and $k_2 = 10$ in a “Deming Inspection Problem” cost structure.
Suppose further that a “prior” distribution for $p$ (the actual lot fraction defective) places equal probabilities on $p = 0$, $.1$ and $.2$. Here we will consider only plans with $n = 0$, $1$ or $2$. Let $X$ = the number of defectives in a simple random sample from the lot.

(a) For $n = 1$, find the conditional distributions of $p$ given $X = x$. For $n = 2$, it turns out that the joint distribution of $X$ and $p$ is:

$$\begin{array}{c|ccc|c}
 & x = 0 & x = 1 & x = 2 & \\ \hline
p = 0 & .333 & 0 & 0 & .333 \\
p = .1 & .267 & .067 & 0 & .333 \\
p = .2 & .207 & .119 & .007 & .333 \\ \hline
 & .807 & .185 & .007 &
\end{array}$$

and the conditionals of $p$ given $X = x$ are:

$$\begin{array}{c|ccc}
 & x = 0 & x = 1 & x = 2 \\ \hline
p = 0 & .413 & 0 & 0 \\
p = .1 & .330 & .360 & 0 \\
p = .2 & .257 & .640 & 1.00
\end{array}$$

(b) Use your answer to (a) and show that the best $n = 1$ plan REJECTS if $X = 0$ and ACCEPTS if $X = 1$. (Yes, this is correct!) Then use the conditionals above for $n = 2$ and show that the best $n = 2$ plan REJECTS if $X = 0$ and ACCEPTS if $X = 1$ or $2$.

(c) Standard acceptance sampling plans REJECT FOR LARGE $X$. Explain in qualitative terms why the best plans from (b) are not of this form.

(d) Which sample size ($n = 0$, $1$ or $2$) is best here? (Show calculations to support your answer.)

A Useful Probabilistic Approximation

Here we present the general “delta method” or “propagation of error” approximation that stands behind several variance approximations in these notes as well as much of §5.4 of V&J. Suppose that a $p \times 1$ random vector

$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}$$

has a mean vector

$$\mu = \begin{pmatrix} EX_1 \\ EX_2 \\ \vdots \\ EX_p \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix}$$

and $p \times p$ variance-covariance matrix

$$\begin{aligned}
\Sigma &= \begin{pmatrix}
\mathrm{Var}X_1 & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_{p-1}) & \mathrm{Cov}(X_1, X_p) \\
\mathrm{Cov}(X_1, X_2) & \mathrm{Var}X_2 & \cdots & \mathrm{Cov}(X_2, X_{p-1}) & \mathrm{Cov}(X_2, X_p) \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\mathrm{Cov}(X_1, X_{p-1}) & \mathrm{Cov}(X_2, X_{p-1}) & \cdots & \mathrm{Var}X_{p-1} & \mathrm{Cov}(X_{p-1}, X_p) \\
\mathrm{Cov}(X_1, X_p) & \mathrm{Cov}(X_2, X_p) & \cdots & \mathrm{Cov}(X_{p-1}, X_p) & \mathrm{Var}X_p
\end{pmatrix} \\
&= \begin{pmatrix}
\sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \cdots & \rho_{1,p-1}\sigma_1\sigma_{p-1} & \rho_{1p}\sigma_1\sigma_p \\
\rho_{12}\sigma_1\sigma_2 & \sigma_2^2 & \cdots & \rho_{2,p-1}\sigma_2\sigma_{p-1} & \rho_{2p}\sigma_2\sigma_p \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\rho_{1,p-1}\sigma_1\sigma_{p-1} & \rho_{2,p-1}\sigma_2\sigma_{p-1} & \cdots & \sigma_{p-1}^2 & \rho_{p-1,p}\sigma_{p-1}\sigma_p \\
\rho_{1p}\sigma_1\sigma_p & \rho_{2p}\sigma_2\sigma_p & \cdots & \rho_{p-1,p}\sigma_{p-1}\sigma_p & \sigma_p^2
\end{pmatrix} = (\rho_{ij}\sigma_i\sigma_j)
\end{aligned}$$

(Recall that if $X_i$ and $X_j$ are independent, $\rho_{ij} = 0$.)

Then for a $k \times p$ matrix of constants $A = (a_{ij})$ consider the random vector

$$\underset{k \times 1}{Y} = \underset{k \times p}{A}\;\underset{p \times 1}{X}$$

It is a standard piece of probability that $Y$ has mean vector

$$\begin{pmatrix} EY_1 \\ EY_2 \\ \vdots \\ EY_k \end{pmatrix} = A\mu$$

and variance-covariance matrix

$$\mathrm{Cov}\,Y = A \Sigma A'$$

(The $k = 1$ version of this for uncorrelated $X_i$ is essentially quoted in (5.23) and (5.24) of V&J.)

The propagation of error method says that if, instead of the relationship $Y = AX$, I concern myself with $k$ functions $g_1, g_2, \ldots, g_k$ (each mapping $\mathbb{R}^p$ to $\mathbb{R}$) and define

$$Y = \begin{pmatrix} g_1(X) \\ g_2(X) \\ \vdots \\ g_k(X) \end{pmatrix}$$

a multivariate Taylor’s Theorem argument and the facts above provide an approximate mean vector and an approximate covariance matrix for $Y$. That is, if the functions $g_i$ are differentiable, let

$$\underset{k \times p}{D} = \left( \left. \frac{\partial g_i}{\partial x_j} \right|_{\mu_1, \mu_2, \ldots, \mu_p} \right)$$

A multivariate Taylor approximation says that for $x$ near $\mu$ (each $x_i$ near $\mu_i$)

$$y = \begin{pmatrix} g_1(x) \\ g_2(x) \\ \vdots \\ g_k(x) \end{pmatrix} \approx \begin{pmatrix} g_1(\mu) \\ g_2(\mu) \\ \vdots \\ g_k(\mu) \end{pmatrix} + D(x - \mu)$$

So if the variances of the $X_i$ are small (so that with high probability $X$ is near $\mu$, that is, so that the linear approximation above is usually valid), applying the linear-case facts with $A = D$ to the right-hand side (and using $E(X - \mu) = 0$) makes it plausible that $Y$ has mean vector

$$\begin{pmatrix} EY_1 \\ EY_2 \\ \vdots \\ EY_k \end{pmatrix} \approx \begin{pmatrix} g_1(\mu) \\ g_2(\mu) \\ \vdots \\ g_k(\mu) \end{pmatrix}$$

and variance-covariance matrix

$$\mathrm{Cov}\,Y \approx D \Sigma D'$$
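To make the propagation of error approximation concrete, here is a minimal numerical sketch using only Python's standard library. It is an illustration under stated assumptions, not a method from these notes. The helper names (`jacobian`, `delta_method`, `sample_cov`) and the example means, standard deviations and correlation are hypothetical. The partial derivatives in $D$ are approximated by central differences rather than computed analytically. The sketch evaluates $g(\mu)$ and $D \Sigma D'$ for $g_1(x) = x_1 x_2$ and $g_2(x) = x_1 / x_2$, then checks the variances against a simulation that assumes $(X_1, X_2)$ is bivariate normal.

```python
import math
import random


def jacobian(funcs, mu, h=1e-6):
    """Central-difference approximation to D = (dg_i/dx_j) evaluated at mu."""
    D = []
    for g in funcs:
        row = []
        for j in range(len(mu)):
            step = h * max(1.0, abs(mu[j]))
            up, down = list(mu), list(mu)
            up[j] += step
            down[j] -= step
            row.append((g(up) - g(down)) / (2 * step))
        D.append(row)
    return D


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def delta_method(funcs, mu, sigma):
    """Approximate mean vector g(mu) and covariance matrix D Sigma D' of Y = g(X)."""
    D = jacobian(funcs, mu)
    mean = [g(mu) for g in funcs]
    cov = matmul(matmul(D, sigma), transpose(D))
    return mean, cov


def sample_cov(ys):
    """Sample variance-covariance matrix of a list of k-vectors."""
    n, k = len(ys), len(ys[0])
    m = [sum(y[i] for y in ys) / n for i in range(k)]
    return [[sum((y[i] - m[i]) * (y[j] - m[j]) for y in ys) / (n - 1)
             for j in range(k)] for i in range(k)]


# Hypothetical example: means 10 and 5, standard deviations .1 and .05, correlation .3
mu = [10.0, 5.0]
s1, s2, rho = 0.1, 0.05, 0.3
sigma = [[s1 ** 2, rho * s1 * s2], [rho * s1 * s2, s2 ** 2]]
funcs = [lambda x: x[0] * x[1], lambda x: x[0] / x[1]]

mean, cov = delta_method(funcs, mu, sigma)
# Hand calculation: Var(X1 X2) is about .65 and Var(X1 / X2) is about .00056

# Monte Carlo check, assuming (X1, X2) is bivariate normal
rng = random.Random(2001)
ys = []
for _ in range(100_000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = [mu[0] + s1 * z1, mu[1] + s2 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)]
    ys.append([g(x) for g in funcs])
mc = sample_cov(ys)
print("delta method:", cov[0][0], cov[1][1])
print("simulation:  ", mc[0][0], mc[1][1])
```

Because the coefficients of variation here are tiny, the linearization is very accurate and the simulated variances should agree closely with $D \Sigma D'$. With larger $\sigma_i$ relative to $\mu_i$, the gap between the two grows, which is the practical content of the “variances of the $X_i$ are small” proviso above.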