Statistics and Measurement (Understanding and Quantifying Measurement Uncertainty)

The general question to be addressed here is "How do statistical methods inform measurement/metrology?" Some answers will be phrased in terms of methods for separating measurement variation from process variation (including appropriate confidence intervals), methods for "Gauge R&R" (again including appropriate confidence intervals), and the use of simple linear regression prediction limits to assess measurement uncertainty on the basis of a linear calibration study.

Basic Issues in Metrology

• Validity (am I really tracking what I want to track?)
• Precision (consistency of measurement)
• Accuracy (getting the "right" answer on average)

Figure 1: Measurement/Target Shooting Analogy (four panels: not accurate/not precise, accurate/not precise, not accurate/precise, accurate/precise)

A Simple Measurement Model and Basic Statistical Methods

A basic statistical/probabilistic model for measurement: what is measured, y, is the measurand, x, plus a normal random measurement error, ε, with mean β and standard deviation σ_measurement:

y = x + ε

Pictorially:

Figure 2: A Basic Statistical Measurement Model

Notice that under this model, based on m repeat measurements of a single measurand, y1, y2, . . . , ym, with sample mean ȳ and sample standard deviation s,

• if I apply the t confidence interval for a mean, I get an inference for x + β = measurand plus bias; that is,
  — in the event that the measurement device is known to be well-calibrated (one is sure that β = 0, there is no systematic error), the limits ȳ ± t s/√m based on ν = m − 1 df are limits for x
  — in the event that what is being measured is a standard for which x is known, one may use the limits

    (ȳ − x) ± t s/√m

  to estimate the device bias, β
• if I apply the χ² confidence interval for a standard deviation, I get an inference for the size of the measurement "noise," σ_measurement

For measurements on multiple measurands (e.g.
on different batches or parts produced by a production process), we extend the basic measurement model by assuming that x varies/is random. (Variation in x is "real" process variation.) In fact, if we assume that the measurand is itself normal with mean μ_x and standard deviation σ_x and independent of the measurement error, we then have y = x + ε with mean

μ_y = μ_x + β

and standard deviation

σ_y = √(σ_x² + σ²_measurement) > σ_x

(so observed variation in y is larger than the actual process variation because of measurement noise).

Under this model, for single measurements made on n different measurands y1, y2, . . . , yn with sample mean ȳ and sample standard deviation s_y, the limits ȳ ± t s_y/√n (for t based on n − 1 degrees of freedom) are limits for μ_x + β, the mean of the distribution of true values plus bias. Note also that the quantity s_y estimates σ_y, which really isn't of fundamental interest. But since

σ_x = √((σ_x² + σ²_measurement) − σ²_measurement)

an estimate of specimen-to-specimen variation (free of measurement noise) based on a sample of m observations on a single unit and a sample of n observations each on different units is (see display (2.3), page 20 of SQAME)

σ̂_x = √(max(0, s_y² − s²))

Example Below are m = 5 measurements made by a single analyst on a single sample of material. (You may think of these as measured concentrations of some constituent.)

1.0025, .9820, 1.0105, 1.0110, .9960

These have mean ȳ = 1.0004 and s = .0120. Consulting a χ² table using ν = 5 − 1 = 4 df, we can find a 95% confidence interval for σ_measurement:

(.0120 √(4/11.143), .0120 √(4/.484)) i.e. (.0072, .0345)

(One moral here is that ordinary small sample sizes give very wide confidence limits for a standard deviation.) Consulting a t table also using 4 df, we can find 95% confidence limits for the true value for the specimen plus instrument bias (x + β):

1.0004 ± 2.776 (.0120/√5) i.e. 1.0004 ± .0149

Suppose that subsequently, samples from n = 20 different batches are analyzed, giving ȳ = .9954 and s_y = .0300. The t confidence interval

.9954 ± 2.093 (.0300/√20) i.e. .9954 ± .0140

is for μ_x + β, the process mean plus any measurement instrument bias/systematic error. An estimate of the real process standard deviation is

σ̂_x = √(max(0, s_y² − s²)) = √(max(0, (.0300)² − (.0120)²)) = .0275

and this value can be used to make confidence limits. To do so, we need a Satterthwaite "approximate degrees of freedom"

ν̂ = σ̂_x⁴ / (s_y⁴/(n − 1) + s⁴/(m − 1)) = (.0275)⁴ / ((.0300)⁴/19 + (.0120)⁴/4) = 11.96

Rounding down to ν̂ = 11, an approximate 95% confidence interval for the real process standard deviation, σ_x, is

(.0275 √(11/21.920), .0275 √(11/3.816)) i.e. (.0195, .0467)

Gauge R&R Studies and Partitioning Measurement Variation Where Multiple Analysts Make Measurements

There can be "operator/analyst variability" that should be considered part of measurement imprecision.

• "Repeatability" variation is variation characteristic of one operator/analyst remeasuring one specimen
• "Reproducibility" variation is variation characteristic of many operators measuring a single specimen once each (exclusive of repeatability variation)

In a typical (balanced data) Gauge R&R study, each of I items is measured m times by each of J operators. For example, a typical data layout for I = 2 parts, J = 3 operators and m = 2 repeats per "cell" might be represented as

         Operator 1    Operator 2    Operator 3
Part 1   y111, y112    y121, y122    y131, y132
Part 2   y211, y212    y221, y222    y231, y232

Figure 3: A Gauge R&R Layout for I = 2 Parts, J = 3 Operators and m = 2 Repeats per "Cell"

Typical analyses of Gauge R&R studies are based on the so-called "two-way random effects" model.
With y_ijk = the kth measurement made by operator j on specimen i, the model is that

y_ijk = μ + α_i + β_j + αβ_ij + ε_ijk

where

• μ is an (unknown) constant, an average (over all possible operators and all possible parts/specimens) measurement
• the α_i are normal with mean 0 and variance σ_α², (random) effects of different parts/specimens
• the β_j are normal with mean 0 and variance σ_β², (random) effects of different operators
• the αβ_ij are normal with mean 0 and variance σ_αβ², (random) joint effects peculiar to particular part/operator combinations
• the ε_ijk are normal with mean 0 and variance σ², (random) measurement errors

σ_α², σ_β², σ_αβ², and σ² are called "variance components" and their sizes govern how much variability is seen in the measurements y_ijk.

In this (two-way random effects) model

• σ measures within-cell/repeatability variation
• σ_reproducibility = √(σ_β² + σ_αβ²) is the standard deviation that would be experienced by many operators measuring the same specimen once each, in the absence of repeatability variation
• σ_R&R = √(σ²_reproducibility + σ²) = √(σ_β² + σ_αβ² + σ²) is the standard deviation that would be experienced by many operators measuring the same specimen once each (this is called σ_overall in SQAME)

The most common analyses (both those based on ranges and those based on ANOVA, e.g. following the AIAG manual and most company forms) are wrong, in that they purport to produce estimates of σ_reproducibility and σ_R&R but fail to do so. SQAME presents correct range-based and ANOVA-based methods. Here we consider primarily the generally more effective ANOVA-based estimates and confidence intervals that can be based on them (these limits are not found in SQAME). But for introduction's sake, first briefly consider range-based estimates.
• σ̂ = R̄/d₂(m), for R̄ the average within-cell range and d₂(m) a "control chart constant" based on "sample size" m

• σ̂_reproducibility = √(max(0, (Δ̄/d₂(J))² − σ̂²/m)), for Δ̄ the average of part ranges of cell means and d₂(J) a "control chart constant" based on "sample size" J

(The second of these is NOT the AIAG estimate of the reproducibility standard deviation.)

Example A geometric dimension of a machined part. I = 3, J = 3, m = 2.

         Operator 1        Operator 2        Operator 3
Part 1   ȳ11 = .34730      ȳ12 = .34660      ȳ13 = .34715      Δ1 = .00070
         R11 = 0           R12 = .0002       R13 = .0001
Part 2   ȳ21 = .34710      ȳ22 = .34645      ȳ23 = .34710      Δ2 = .00065
         R21 = 0           R22 = .0001       R23 = 0
Part 3   ȳ31 = .34720      ȳ32 = .34655      ȳ33 = .34710      Δ3 = .00065
         R31 = 0           R32 = .0003       R33 = 0

So R̄ = .0007/9 = .000078 and Δ̄ = .00067, and

σ̂ = R̄/d₂(m) = .000078/1.128 = .000069 in

and

σ̂_reproducibility = √(max(0, (Δ̄/d₂(J))² − σ̂²/m)) = √((.00067/1.693)² − (.000069)²/2) = .000391 in

A natural way to estimate σ_R&R is as

σ̂_R&R = √((.000069)² + (.000391)²) = .000397 in

and the calculations here suggest that the bulk of measurement imprecision is traceable to differences between operators.

The range-based Gauge R&R estimates of SQAME are fairly simple and serve the purpose of helping make the analysis goals easy to understand. But we have no good handle on how reliable these estimates are. In order to 1) produce Gauge R&R estimates that are typically better than range-based ones, and 2) produce confidence limits, we must instead use "ANOVA-based" estimates. A careful treatment of ANOVA would require its own course. We'll simply make use of its main "output" and direct the interested student to books on engineering statistics (like Vardeman's Statistics for Engineering Problem Solving) for more details. The fact is that an I × J × m data set of y_ijk's like that produced in a typical Gauge R&R study is often summarized in a so-called ANOVA table.
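Before moving to the ANOVA table, the range-based calculations in the machined-part example above can be checked with a short script. This is a sketch, not a general routine: the d₂ constants for "sample sizes" 2 and 3 are keyed in from a standard control-chart table, and the ranges are entered by hand.

```python
# Range-based Gauge R&R estimates for the machined-part example
# (I = 3 parts, J = 3 operators, m = 2 repeats per cell).
d2_m = 1.128  # d2(2): control chart constant for "sample size" m = 2
d2_J = 1.693  # d2(3): control chart constant for "sample size" J = 3
m = 2

# within-cell ranges R_ij and per-part ranges of the cell means (Delta_i)
cell_ranges = [0, .0002, .0001, 0, .0001, 0, 0, .0003, 0]
part_deltas = [.00070, .00065, .00065]

R_bar = sum(cell_ranges) / len(cell_ranges)      # average within-cell range
Delta_bar = sum(part_deltas) / len(part_deltas)  # average part range of cell means

sigma_hat = R_bar / d2_m                                             # repeatability
sigma_repro = max(0.0, (Delta_bar / d2_J) ** 2 - sigma_hat ** 2 / m) ** 0.5
sigma_RR = (sigma_hat ** 2 + sigma_repro ** 2) ** 0.5                # overall R&R
print(sigma_hat, sigma_repro, sigma_RR)  # roughly .000069, .000391, .000397
```

Carrying full precision rather than hand-rounded intermediate values can shift the last digit slightly relative to slide-style arithmetic.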
A generic version of such a table is

Source          SS      df               MS
Part            SSA     I − 1            MSA = SSA/(I − 1)
Operator        SSB     J − 1            MSB = SSB/(J − 1)
Part×Operator   SSAB    (I − 1)(J − 1)   MSAB = SSAB/((I − 1)(J − 1))
Error           SSE     IJ(m − 1)        MSE = SSE/(IJ(m − 1))
Total           SSTot   IJm − 1

Any decent statistical package (and even EXCEL) will process a Gauge R&R data set and produce such a summary table. In this table the "mean squares" are essentially sample variances (squares of sample standard deviations). (MSA is essentially a sample variance of part averages, MSB is essentially a sample variance of operator averages, MSE is an average of within-cell sample variances, "MSTot" isn't typically calculated, but is a grand sample variance of all observations, ...) The mean squares indicate how much of the overall variability is accounted for by the various sources.

For our present purposes, we will take mean squares and degrees of freedom out of such an ANOVA table and make Gauge R&R estimates based on them. Point estimators for the quantities of most interest in a Gauge R&R study are partially summarized on the bottom of page 27 in SQAME. These are

σ̂_repeatability = σ̂ = √MSE

and

σ̂_reproducibility = √(max(0, MSB/(mI) + ((I − 1)/(mI)) MSAB − (1/m) MSE))

Although it is not presented in SQAME, an appropriate estimator for σ²_R&R = σ_β² + σ_αβ² + σ² (that is called σ²_overall in SQAME) is

σ̂²_R&R = (1/(mI)) MSB + ((I − 1)/(mI)) MSAB + ((m − 1)/m) MSE

It is further possible to use these estimates to make an exact confidence interval for σ_repeatability = σ and to use the Satterthwaite approximation to make approximate confidence limits for σ_reproducibility and σ_R&R.
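The point estimators above translate directly into code. The sketch below is a minimal version; the mean squares fed to it are taken from the packing-peanut example worked later in these notes (I = 4, J = 3, m = 2).

```python
# ANOVA-based Gauge R&R point estimates built from the mean squares
# of a two-way random effects ANOVA table.

def rr_estimates(MSB, MSAB, MSE, I, m):
    """Return (sigma_repeatability, sigma_reproducibility, sigma_R&R) estimates."""
    sigma_rep = MSE ** 0.5
    repro_sq = max(0.0, MSB / (m * I) + (I - 1) * MSAB / (m * I) - MSE / m)
    rr_sq = MSB / (m * I) + (I - 1) * MSAB / (m * I) + (m - 1) * MSE / m
    return sigma_rep, repro_sq ** 0.5, rr_sq ** 0.5

# mean squares from the packing-peanut example later in these notes
s_rep, s_repro, s_rr = rr_estimates(.001888, .000399, .000071, I=4, m=2)
print(s_rep, s_repro, s_rr)  # roughly .0084, .019, .021
```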
Let ν_repeatability = IJ(m − 1). Then confidence limits for σ_repeatability are

(σ̂_repeatability √(ν_repeatability/χ²_{ν_repeatability, upper}), σ̂_repeatability √(ν_repeatability/χ²_{ν_repeatability, lower}))

For estimating σ_reproducibility, let

ν̂_reproducibility = σ̂⁴_reproducibility / [ (MSB/(mI))²/(J − 1) + ((I − 1)MSAB/(mI))²/((I − 1)(J − 1)) + (MSE/m)²/(IJ(m − 1)) ]
                   = m² σ̂⁴_reproducibility / [ MSB²/(I²(J − 1)) + (I − 1)MSAB²/(I²(J − 1)) + MSE²/(IJ(m − 1)) ]

Then approximate confidence limits for σ_reproducibility are

(σ̂_reproducibility √(ν̂_reproducibility/χ²_{ν̂_reproducibility, upper}), σ̂_reproducibility √(ν̂_reproducibility/χ²_{ν̂_reproducibility, lower}))

For estimating σ_R&R, let

ν̂_R&R = σ̂⁴_R&R / [ (MSB/(mI))²/(J − 1) + ((I − 1)MSAB/(mI))²/((I − 1)(J − 1)) + ((m − 1)MSE/m)²/(IJ(m − 1)) ]
       = m² σ̂⁴_R&R / [ MSB²/(I²(J − 1)) + (I − 1)MSAB²/(I²(J − 1)) + (m − 1)MSE²/(IJ) ]

Then approximate confidence limits for σ_R&R are

(σ̂_R&R √(ν̂_R&R/χ²_{ν̂_R&R, upper}), σ̂_R&R √(ν̂_R&R/χ²_{ν̂_R&R, lower}))

Example An in-class R&R data set from ISU IE 361 with I = 4, J = 3, m = 2. (Students were measuring plastic packaging "peanuts" to the nearest .01 in.)

Figure 4: Results from Measuring Packing Peanuts With a Crude Caliper

This data set produces the JMP summary

Figure 5: JMP Report for Gauge R&R Study

What is essential here are the values

SSA = .01244583 and so MSA = SSA/3 = .0041486
SSB = .0037750 and so MSB = SSB/2 = .001888
SSAB = .00239167 and so MSAB = SSAB/6 = .000399
SSE = .00085000 and so MSE = SSE/12 = .000071

From these, first

σ̂_repeatability = σ̂ = √MSE = √.000071 = .0084 in

and with ν_repeatability = IJ(m − 1) = 12, 95% confidence limits for σ_repeatability are

.0084 √(12/23.337) and .0084 √(12/4.404)

that is, a 95% confidence interval is (.006, .014).

"By hand" computations for σ̂_reproducibility, ν̂_reproducibility, σ̂_R&R, and ν̂_R&R are tedious and prone to error.
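As a check on such "by hand" work, here is a sketch of the Satterthwaite computation for σ_reproducibility in the peanut example. The 3-df χ² quantiles 9.348 and .216 are keyed in from a table, as in these notes; small differences from the rounded slide values come from carrying full precision.

```python
# Satterthwaite approximate df and approximate 95% limits for
# sigma_reproducibility, using the peanut-example mean squares.
import math

I, J, m = 4, 3, 2
MSB, MSAB, MSE = .001888, .000399, .000071

repro_sq = MSB / (m * I) + (I - 1) * MSAB / (m * I) - MSE / m
sigma_repro = math.sqrt(repro_sq)

nu_hat = repro_sq ** 2 / (
    (MSB / (m * I)) ** 2 / (J - 1)
    + ((I - 1) * MSAB / (m * I)) ** 2 / ((I - 1) * (J - 1))
    + (MSE / m) ** 2 / (I * J * (m - 1))
)  # roughly 3.9; rounded down to 3 below

chi2_upper, chi2_lower = 9.348, 0.216  # .975 and .025 quantiles, 3 df
lower = sigma_repro * math.sqrt(3 / chi2_upper)
upper = sigma_repro * math.sqrt(3 / chi2_lower)
print(nu_hat, lower, upper)  # roughly 3.9, .011, .070
```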
They could be automated with an appropriate EXCEL spreadsheet. Vardeman uses a MathCAD worksheet to do them. Here is a picture for this example:

Figure 6: MathCAD Worksheet for R&R Example

From this we can read

σ̂_reproducibility = .019 and ν̂_reproducibility = 3.9
σ̂_R&R = .021 and ν̂_R&R = 5.6

Then, rounding degrees of freedom down, an approximate 95% confidence interval for σ_reproducibility is

(.019 √(3/9.348), .019 √(3/.216)) i.e. (.011, .071)

and an approximate 95% confidence interval for σ_R&R is

(.021 √(5/12.833), .021 √(5/.831)) i.e. (.013, .052)

Since σ_R&R is the standard deviation that would be experienced by many operators measuring the same specimen once each, it is often taken as a measure of overall measurement imprecision where multiple operators will use a measurement device. Often, some multiple of it (sometimes 6σ_R&R and sometimes 5.15σ_R&R) is used as a "measurement uncertainty." Where the device is used to check conformance to some performance requirements, say L and U, its adequacy to do so is then often summarized in a Gauge Capability Ratio (or sometimes, Precision to Tolerance Ratio)

GCR = 6σ_R&R/(U − L)

This can, of course, be estimated by

ĜCR = 6σ̂_R&R/(U − L)

and, in fact, the confidence limits for σ_R&R can be substituted in order to make confidence limits for GCR.

Simple Linear Regression Analysis, Calibration, and Assessing Uncertainty

Calibration experiments produce "true"/gold-standard-measurement values x and "local" measurements y and seek a "conversion" method from y to x. (y need not even be in the same units as x.) The relevant statistical methodology is curve-fitting/regression analysis as treated in any good engineering statistics text. Regression analysis can provide both "point conversions" and measures of uncertainty (the latter through inversion of "prediction limits"). The simplest version of this methodology is the case where

y ≈ β₀ + β₁x

This is linear calibration.
The standard statistical model for such a circumstance is

y = β₀ + β₁x + ε

for ε a normal error with mean 0 and standard deviation σ. (σ describes how much y's vary for a fixed x.) This model can be pictured as

Figure 7: Simple Linear Regression Model

For n data pairs (x_i, y_i), simple linear regression methodology allows one to make confidence intervals and tests associated with the model and, what is more important for our present purposes, prediction limits for a new y associated with a new x. These are of the form

(b₀ + b₁x) ± t s_LF √(1 + 1/n + (x − x̄)²/Σ(x_i − x̄)²)

where the least squares line is ŷ = b₀ + b₁x and s_LF is an estimate of σ derived from the fit of the line to the data. These days, any good statistical package will compute and plot these limits along with a least squares line through the data set.

Example (Mandel NBS/NIST) "Gold-standard" and "local" measurements on n = 14 specimens (units not given). The data are

Figure 8: Mandel's Linear Calibration Data

A JMP report for simple linear regression, including plotted prediction limits for an additional value of y (that, of course, change with x), is

Figure 9: A JMP Report for Mandel's Linear Calibration Data

What is especially useful about statistical simple linear regression technology for our purposes is what it indicates about measurement.

• From a simple linear regression output, s_LF = √MSE = "root mean square error" is a kind of estimated repeatability standard deviation.
• The least squares equation ŷ = b₀ + b₁x can be solved for x, giving

  x̂ = (y − b₀)/b₁

  as a way of estimating a "gold-standard" value x from a measured local value y.
• It turns out that one can take the prediction limits for y and "turn them around" to get confidence limits for the x corresponding to a measured local y. This provides a defensible way to set "error bounds" on what y indicates about x.
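The prediction-limit formula above is easy to evaluate directly. The sketch below fits a least squares line to a small calibration data set (the x's and y's here are invented for illustration, not Mandel's data) and computes 95% prediction limits at one new x, with the 4-df t quantile keyed in from a table.

```python
# Least squares fit and 95% prediction limits at a new x, for a small
# hypothetical calibration data set.
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]    # "gold-standard" values (hypothetical)
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]  # "local" measurements (hypothetical)
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar

# s_LF: estimate of sigma from the fit (n - 2 degrees of freedom)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_LF = math.sqrt(sse / (n - 2))

t = 2.776  # .975 quantile of the t distribution with n - 2 = 4 df
x_new = 3.5
half_width = t * s_LF * math.sqrt(1 + 1 / n + (x_new - xbar) ** 2 / Sxx)
pred_lo = (b0 + b1 * x_new) - half_width
pred_hi = (b0 + b1 * x_new) + half_width
```

Sweeping x_new over a grid and recording (pred_lo, pred_hi) reproduces the curved prediction bands that a package like JMP plots around the fitted line.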
Example (Mandel NBS/NIST) Since from the JMP report

ŷ = 42.216299 + 0.881819x

with s_LF = 25.32578, we might expect a local (y) repeatability standard deviation of around 25 (in the y units). A "conversion formula" for going from y to x is

x̂ = (y − 42.216299)/0.881819

The following shows how one can set 95% confidence limits on x if y = 1500 is observed, using the plot of 95% prediction limits for y given x.

Figure 10: 95% Confidence Limits for x if y = 1500 (from 95% Prediction Limits for y Given x)
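The point conversion in this example takes only a couple of lines; the confidence limits themselves come from inverting the plotted prediction limits, as in Figure 10, so only the point estimate is computed in this sketch.

```python
# Point conversion from a local measurement y to an estimated
# gold-standard x, using the slope and intercept from the JMP report.
b0, b1 = 42.216299, 0.881819

def x_hat(y):
    """Solve y = b0 + b1 * x for x."""
    return (y - b0) / b1

x_est = x_hat(1500)  # the y = 1500 case pictured in Figure 10
print(x_est)  # roughly 1653.2
```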