Statistics and Measurement (Understanding and Quantifying Measurement Uncertainty)

The general question to be addressed here is "How do statistical methods
inform measurement/metrology?" Some answers will be phrased in terms of
methods for separating measurement variation from process variation (including
appropriate confidence intervals), methods for "Gauge R&R" (again including
appropriate confidence intervals), and the use of simple linear regression prediction limits to assess measurement uncertainty on the basis of a linear calibration
study.
Basic Issues in Metrology
• Validity (am I really tracking what I want to track?)
• Precision (consistency of measurement)
• Accuracy (getting the "right" answer on average)
Figure 1: Measurement/Target Shooting Analogy (four panels: Not Accurate/Not Precise, Accurate/Not Precise, Not Accurate/Precise, Accurate/Precise)
A Simple Measurement Model and Basic Statistical Methods
A Basic Statistical/Probabilistic Model for Measurement: What is measured,
y, is the measurand, x, plus a normal random measurement error, ε, with mean
β and standard deviation σ_measurement:

y = x + ε
Pictorially:
Figure 2: A Basic Statistical Measurement Model
Notice that under this model, based on m repeat measurements of a single measurand, y1, y2, . . . , ym, with sample mean ȳ and sample standard deviation s

• if I apply the t confidence interval for a mean, I get an inference for

x + β = measurand plus bias

that is,

— in the event that the measurement device is known to be well-calibrated (one is sure that β = 0, there is no systematic error), the limits

ȳ ± t s/√m

based on ν = m − 1 df are limits for x

— in the event that what is being measured is a standard for which x is known, one may use the limits

(ȳ − x) ± t s/√m

to estimate the device bias, β
• if I apply the χ² confidence interval for a standard deviation, I get an inference for the size of the measurement "noise," σ_measurement
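Both intervals are easy to compute directly. The following is an illustrative Python sketch (not part of the original notes); the 95% t and χ² quantiles for ν = 4 df are hard-coded from standard tables, and the five measurements anticipate the worked example that appears below.

```python
from math import sqrt

# m = 5 repeat measurements of a single measurand (the example data below)
y = [1.0025, 0.9820, 1.0105, 1.0110, 0.9960]
m = len(y)

ybar = sum(y) / m
s = sqrt(sum((yi - ybar) ** 2 for yi in y) / (m - 1))

# 95% quantiles for nu = m - 1 = 4 df, hard-coded from standard tables
t = 2.776
chi2_upper, chi2_lower = 11.143, 0.484

# t interval: limits for x + beta (measurand plus bias)
mean_limits = (ybar - t * s / sqrt(m), ybar + t * s / sqrt(m))

# chi-square interval: limits for sigma_measurement
sigma_limits = (s * sqrt((m - 1) / chi2_upper), s * sqrt((m - 1) / chi2_lower))
```

With a statistics package available, the hard-coded quantiles would instead come from t and χ² quantile functions.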
For measurements on multiple measurands (e.g. on different batches or parts
produced by a production process), we extend the basic measurement model by
assuming that x varies/is random. (Variation in x is "real" process variation.)
In fact, if we assume that the measurand is itself normal with mean μx and
standard deviation σx and independent of the measurement error, we then have

y = x + ε

with mean

μy = μx + β

and standard deviation

σy = √(σx² + σ_measurement²) > σx

(so observed variation in y is larger than the actual process variation because
of measurement noise).
Under this model for single measurements made on n different measurands
y1, y2, . . . , yn with sample mean ȳ and sample standard deviation sy, the limits

ȳ ± t sy/√n

(for t based on n − 1 degrees of freedom) are limits for μx + β, the mean of the distribution of true values plus bias. Note also that the quantity sy estimates σy, which really isn't of fundamental interest. But since
σx = √((σx² + σ_measurement²) − σ_measurement²)

an estimate of specimen-to-specimen variation (free of measurement noise)
based on a sample of m observations on a single unit and a sample of n
observations each on different units is (see display (2.3), page 20 of SQAME )

σ̂x = √(max(0, sy² − s²))
Example Below are m = 5 measurements made by a single analyst on a single
sample of material. (You may think of these as measured concentrations of
some constituent.)
1.0025, .9820, 1.0105, 1.0110, .9960
These have mean ȳ = 1.0004 and s = .0120. Consulting a χ² table using
ν = 5 − 1 = 4 df, we can find a 95% confidence interval for σ_measurement

( .0120 √(4/11.143), .0120 √(4/.484) )   i.e.  (.0072, .0345)
(One moral here is that ordinary small sample sizes give very wide confidence
limits for a standard deviation.) Consulting a t table also using 4 df, we can
find 95% confidence limits for the true value for the specimen plus instrument
bias (x + β)

1.0004 ± 2.776 (.0120/√5)   i.e.  1.0004 ± .0149
Suppose that subsequently, samples from n = 20 different batches are analyzed. The t confidence interval

.9954 ± 2.093 (.0300/√20)   i.e.  .9954 ± .0140
is for μx+β, the process mean plus any measurement instrument bias/systematic
error. An estimate of the real process standard deviation is

σ̂x = √(max(0, sy² − s²)) = √(max(0, (.0300)² − (.0120)²)) = .0275
and this value can be used to make confidence limits. To do so, we need a
Satterthwaite "approximate degrees of freedom"

ν̂ = σ̂x⁴ / ( sy⁴/(n − 1) + s⁴/(m − 1) ) = (.0275)⁴ / ( (.0300)⁴/19 + (.0120)⁴/4 ) = 11.96
Rounding down to ν̂ = 11, an approximate 95% confidence interval for the real
process standard deviation, σx, is

( .0275 √(11/21.920), .0275 √(11/3.816) )   i.e.  (.0195, .0467)
Gauge R&R Studies and Partitioning Measurement Variation Where Multiple Analysts Make Measurements
There can be "operator/analyst variability" that should be considered part of
measurement imprecision.
• "Repeatability" variation is variation characteristic of one operator/analyst
remeasuring one specimen
• "Reproducibility" variation is variation characteristic of many operators
measuring a single specimen once each (exclusive of repeatability variation)
In a typical (balanced data) Gauge R&R study, each of I items is measured m
times by each of J operators. For example, a typical data layout for I = 2
parts, J = 3 operators and m = 2 repeats per "cell" might be represented as

          Operator 1    Operator 2    Operator 3
Part 1    y111, y112    y121, y122    y131, y132
Part 2    y211, y212    y221, y222    y231, y232

Figure 3: A Gauge R&R Layout for I = 2 Parts, J = 3 Operators and m = 2
Repeats per "Cell"
Typical analyses of Gauge R&R studies are based on the so-called "two-way
random effects" model. With
yijk = the kth measurement made by operator j on specimen i
the model is that
yijk = μ + αi + βj + αβij + εijk
where
• μ is an (unknown) constant, an average (over all possible operators and
all possible parts/specimens) measurement
• the αi are normal with mean 0 and variance σα², (random) effects of
different parts/specimens

• the βj are normal with mean 0 and variance σβ², (random) effects of
different operators

• the αβij are normal with mean 0 and variance σαβ², (random) joint effects
peculiar to particular part/operator combinations

• the εijk are normal with mean 0 and variance σ², (random) measurement
errors

σα², σβ², σαβ², and σ² are called "variance components" and their sizes govern
how much variability is seen in the measurements yijk
Example "Thought Experiment" generating a Gauge R&R data set

          Operator 1    Operator 2    Operator 3
Part 1    y111 = ___    y121 = ___    y131 = ___
          y112 = ___    y122 = ___    y132 = ___
Part 2    y211 = ___    y221 = ___    y231 = ___
          y212 = ___    y222 = ___    y232 = ___
In this (two-way random effects) model
• σ measures within-cell/repeatability variation
• σ_reproducibility = √(σβ² + σαβ²) is the standard deviation that would be experienced by many operators measuring the same specimen once each, in the absence of repeatability variation

• σ_R&R = √(σ_reproducibility² + σ²) = √(σβ² + σαβ² + σ²) is the standard deviation that would be experienced by many operators measuring the same specimen once each (this is called σ_overall in SQAME )
The most common analyses (both those based on ranges and those based on
ANOVA) (e.g. following the AIAG manual and most company forms) are wrong,
in that they purport to produce estimates of σ reproducibility and σ R&R but fail
to do so. SQAME presents correct range-based and ANOVA-based methods.
Here we consider primarily the generally more effective ANOVA-based estimates
and confidence intervals that can be based on them (these limits are not found
in SQAME ).
But for introduction's sake, first briefly consider range-based estimates.

• σ̂ = R̄/d₂(m) for R̄ the average within-cell range and d₂(m) a "control chart constant" based on "sample size" m

• σ̂_reproducibility = √( max( 0, (Δ̄/d₂(J))² − (1/m) σ̂² ) ) for Δ̄ the average of part ranges of cell means and d₂(J) a "control chart constant" based on "sample size" J

(The second of these is NOT the AIAG estimate of reproducibility standard
deviation.)
Example A geometric dimension of a machined part. I = 3, J = 3, m = 2

          Operator 1       Operator 2       Operator 3
Part 1    ȳ11 = .34730     ȳ12 = .34660     ȳ13 = .34715    Δ1 = .00070
          R11 = 0          R12 = .0002      R13 = .0001
Part 2    ȳ21 = .34710     ȳ22 = .34645     ȳ23 = .34710    Δ2 = .00065
          R21 = 0          R22 = .0001      R23 = 0
Part 3    ȳ31 = .34720     ȳ32 = .34655     ȳ33 = .34710    Δ3 = .00065
          R31 = 0          R32 = .0003      R33 = 0

So R̄ = .0007/9 = .000078 and Δ̄ = .00067 and

σ̂ = R̄/d₂(m) = .000078/1.128 = .000069 in
and

σ̂_reproducibility = √( max( 0, (Δ̄/d₂(J))² − (1/m) σ̂² ) ) = √( (.00067/1.693)² − (1/2)(.000069)² ) = .000391 in

A natural way to estimate σ_R&R is as

σ̂_R&R = √( (.000069)² + (.000391)² ) = .000396 in
and the calculations here suggest that the bulk of measurement imprecision is
traceable to differences between operators.
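The range-based arithmetic is easy to check in a few lines of Python. This is an illustrative sketch (not from SQAME) using the cell ranges and Δ values from the table above, with the d₂ constants hard-coded (d₂(2) = 1.128, d₂(3) = 1.693); carrying full precision gives σ̂_R&R ≈ .000397, a hair above the rounded .000396.

```python
from math import sqrt

# cell ranges R_ij (9 cells) and per-part ranges of cell means from the table
cell_ranges = [0, 0.0002, 0.0001, 0, 0.0001, 0, 0, 0.0003, 0]
part_deltas = [0.00070, 0.00065, 0.00065]
m = 2                      # repeats per cell
d2_m, d2_J = 1.128, 1.693  # control chart constants d2(2) and d2(3)

R_bar = sum(cell_ranges) / len(cell_ranges)
Delta_bar = sum(part_deltas) / len(part_deltas)

sigma_hat = R_bar / d2_m   # repeatability estimate
sigma_repro = sqrt(max(0.0, (Delta_bar / d2_J) ** 2 - sigma_hat ** 2 / m))
sigma_rr = sqrt(sigma_hat ** 2 + sigma_repro ** 2)
```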
The range-based Gauge R&R estimates of SQAME are fairly simple and serve
the purpose of helping make the analysis goals easy to understand. But we
have no good handle on how reliable these estimates are. In order to 1) produce
Gauge R&R estimates that are typically better than range-based ones, and 2)
produce confidence limits, we must instead use "ANOVA-based" estimates.
A careful treatment of ANOVA would require its own course. We’ll simply
make use of its main "output" and direct the interested student to books
on engineering statistics (like Vardeman’s Statistics for Engineering Problem
Solving ) for more details. The fact is that an I × J × m data set of yijk ’s like
that produced in a typical Gauge R&R study is often summarized in a so-called
ANOVA table. A generic version of such a table is
Source           SS       df                MS
Part             SSA      I − 1             MSA = SSA/(I − 1)
Operator         SSB      J − 1             MSB = SSB/(J − 1)
Part×Operator    SSAB     (I − 1)(J − 1)    MSAB = SSAB/((I − 1)(J − 1))
Error            SSE      IJ(m − 1)         MSE = SSE/(IJ(m − 1))
Total            SSTot    IJm − 1
Any decent statistical package (and even EXCEL) will process a Gauge R&R
data set and produce such a summary table. In this table the "mean squares"
are essentially sample variances (squares of sample standard deviations). (MSA
is essentially a sample variance of part averages, MSB is essentially a sample variance of operator averages, MSE is an average of within-cell sample
variances, "MSTot" isn't typically calculated, but is a grand sample variance
of all observations, ...) The mean squares indicate how much of the overall
variability is accounted for by the various sources.
For our present purposes, we will take mean squares and degrees of freedom
out of such an ANOVA table and make Gauge R&R estimates based on them.
Point estimators for the quantities of most interest in a Gauge R&R study are
partially summarized on the bottom of page 27 in SQAME. These are
σ̂_repeatability = σ̂ = √MSE

and

σ̂_reproducibility = √( max( 0, (1/(mI)) MSB + ((I − 1)/(mI)) MSAB − (1/m) MSE ) )

Although it is not presented in SQAME, an appropriate estimator for σ_R&R² =
σβ² + σαβ² + σ² (that is called σ_overall² in SQAME ) is

σ̂_R&R = √( (1/(mI)) MSB + ((I − 1)/(mI)) MSAB + ((m − 1)/m) MSE )
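These ANOVA-based point estimators translate directly into code. A minimal Python sketch follows (illustrative; the function name is mine, not SQAME's).

```python
from math import sqrt

def gauge_rr_estimates(MSB, MSAB, MSE, I, m):
    """ANOVA-based point estimates of the repeatability, reproducibility,
    and overall R&R standard deviations (two-way random effects model)."""
    repeatability = sqrt(MSE)
    reproducibility = sqrt(max(0.0, MSB / (m * I)
                               + (I - 1) * MSAB / (m * I)
                               - MSE / m))
    rr = sqrt(MSB / (m * I) + (I - 1) * MSAB / (m * I) + (m - 1) * MSE / m)
    return repeatability, reproducibility, rr
```

For the packing peanut example that follows (MSB = .001888, MSAB = .000399, MSE = .000071, I = 4, m = 2) this returns roughly .0084, .019, and .021.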
It is further possible to use these estimates to make an exact confidence interval
for σ repeatability = σ and to use the Satterthwaite approximation to make
approximate confidence limits for σ reproducibility and σ R&R.
Let
ν repeatability = IJ (m − 1)
Then, confidence limits for σ repeatability are
( σ̂_repeatability √( ν_repeatability / χ²_{ν_repeatability, upper} ), σ̂_repeatability √( ν_repeatability / χ²_{ν_repeatability, lower} ) )
For estimating σ_reproducibility, let

ν̂_reproducibility = σ̂_reproducibility⁴ / ( (MSB/(mI))²/(J − 1) + ((I − 1)MSAB/(mI))²/((I − 1)(J − 1)) + (MSE/m)²/(IJ(m − 1)) )

                  = σ̂_reproducibility⁴ / ( (1/m²) ( MSB²/(I²(J − 1)) + (I − 1)MSAB²/(I²(J − 1)) + MSE²/(IJ(m − 1)) ) )

Then approximate confidence limits for σ_reproducibility are

( σ̂_reproducibility √( ν̂_reproducibility / χ²_{ν̂_reproducibility, upper} ), σ̂_reproducibility √( ν̂_reproducibility / χ²_{ν̂_reproducibility, lower} ) )
For estimating σ_R&R, let

ν̂_R&R = σ̂_R&R⁴ / ( (MSB/(mI))²/(J − 1) + ((I − 1)MSAB/(mI))²/((I − 1)(J − 1)) + ((m − 1)MSE/m)²/(IJ(m − 1)) )

      = σ̂_R&R⁴ / ( (1/m²) ( MSB²/(I²(J − 1)) + (I − 1)MSAB²/(I²(J − 1)) + (m − 1)MSE²/(IJ) ) )

Then approximate confidence limits for σ_R&R are

( σ̂_R&R √( ν̂_R&R / χ²_{ν̂_R&R, upper} ), σ̂_R&R √( ν̂_R&R / χ²_{ν̂_R&R, lower} ) )
Example An in-class R&R data set from ISU IE 361 with I = 4, J = 3, m =
2. (Students were measuring plastic packaging "peanuts" to the nearest .01
in.)
Figure 4: Results from Measuring Packing Peanuts With a Crude Caliper
This data set produces the JMP summary
Figure 5: JMP Report for Gauge R&R Study
What is essential here are the values

SSA = .01244583 and so MSA = SSA/3 = .004149
SSB = .0037750 and so MSB = SSB/2 = .001888
SSAB = .00239167 and so MSAB = SSAB/6 = .000399
SSE = .00085000 and so MSE = SSE/12 = .000071
From these, first

σ̂_repeatability = σ̂ = √MSE = √.000071 = .0084 in

and with ν_repeatability = IJ(m − 1) = 12, 95% confidence limits for
σ_repeatability are

.0084 √(12/23.337)  and  .0084 √(12/4.404)
that is, a 95% confidence interval is
(.006, .014)
"By hand" computations for σ̂ reproducibility, ν̂ reproducibility, σ̂ R&R, and ν̂ R&R are
tedious and prone to error. They could be automated with an appropriate
EXCEL spreadsheet. Vardeman uses a MathCAD worksheet to do them. Here
is a picture for this example:
Figure 6: MathCAD Worksheet for R&R Example
From this we can read

σ̂_reproducibility = .019 and ν̂_reproducibility = 3.9
σ̂_R&R = .021 and ν̂_R&R = 5.6

Then, rounding degrees of freedom down, an approximate 95% confidence interval for σ_reproducibility is

( .019 √(3/9.348), .019 √(3/.216) )   i.e.  (.011, .071)

and an approximate 95% confidence interval for σ_R&R is

( .021 √(5/12.833), .021 √(5/.831) )   i.e.  (.013, .052)
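In place of a MathCAD worksheet, the Satterthwaite degrees of freedom and approximate intervals can be verified with a short script. This Python sketch (illustrative) uses the mean squares implied by the JMP sums of squares; the χ² quantiles for 3 and 5 df are hard-coded table values, so the limits agree with the rounded values above up to rounding.

```python
from math import sqrt, floor

I, J, m = 4, 3, 2
# mean squares from the JMP sums of squares (SSB/2, SSAB/6, SSE/12)
MSB, MSAB, MSE = 0.0018875, 0.000398611, 0.0000708333

repro2 = MSB / (m * I) + (I - 1) * MSAB / (m * I) - MSE / m  # sigma_repro^2
rr2 = repro2 + MSE                                           # sigma_R&R^2

# Satterthwaite denominator pieces shared by both approximations
tB = (MSB / (m * I)) ** 2 / (J - 1)
tAB = ((I - 1) * MSAB / (m * I)) ** 2 / ((I - 1) * (J - 1))

nu_repro = repro2 ** 2 / (tB + tAB + (MSE / m) ** 2 / (I * J * (m - 1)))
nu_rr = rr2 ** 2 / (tB + tAB + ((m - 1) * MSE / m) ** 2 / (I * J * (m - 1)))

# approximate 95% limits; chi-square quantiles for 3 and 5 df from tables
lims_repro = (sqrt(repro2) * sqrt(floor(nu_repro) / 9.348),
              sqrt(repro2) * sqrt(floor(nu_repro) / 0.216))
lims_rr = (sqrt(rr2) * sqrt(floor(nu_rr) / 12.833),
           sqrt(rr2) * sqrt(floor(nu_rr) / 0.831))
```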
Since σ_R&R is the standard deviation that would be experienced by many operators measuring the same specimen once each, it is often taken as a measure of overall measurement imprecision where multiple operators will use a measurement device. Often, some multiple of it (sometimes 6σ_R&R and sometimes 5.15σ_R&R) is used as a "measurement uncertainty." Where the device is used to check conformance to some performance requirements, say L and U, its adequacy to do so is often summarized in a Gauge Capability Ratio (or sometimes, Precision to Tolerance Ratio)

GCR = 6 σ_R&R / (U − L)

This can, of course, be estimated by

GĈR = 6 σ̂_R&R / (U − L)

and, in fact, the confidence limits for σ_R&R can be substituted in order to make confidence limits for GCR.
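As a small numerical illustration (the specification limits L = 1.150 and U = 1.450 below are hypothetical, chosen only for this sketch), monotonicity of GCR in σ_R&R means the σ_R&R confidence limits carry over directly:

```python
# Hypothetical specification limits, for illustration only
L_spec, U_spec = 1.150, 1.450

sigma_rr_hat = 0.021          # point estimate from the example above
rr_limits = (0.013, 0.052)    # approximate 95% limits from the example above

gcr_hat = 6 * sigma_rr_hat / (U_spec - L_spec)
gcr_limits = tuple(6 * sd / (U_spec - L_spec) for sd in rr_limits)
```

With these (hypothetical) specifications the estimated GCR is about .42, with approximate 95% limits of about .26 and 1.04.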
Simple Linear Regression Analysis, Calibration, and
Assessing Uncertainty
Calibration experiments produce "true"/gold-standard-measurement values x
and "local" measurements y and seek a "conversion" method from y to x. (y
need not even be in the same units as x.) The relevant statistical methodology
is curve-fitting/regression analysis as treated in any good engineering statistics
text. Regression analysis can provide both "point conversions" and measures
of uncertainty (the latter through inversion of "prediction limits").
The simplest version of this methodology is the case where

y ≈ β0 + β1x

This is linear calibration. The standard statistical model for such a circumstance is

y = β0 + β1x + ε

for ε a normal error with mean 0 and standard deviation σ. (σ describes how
much y's vary for a fixed x.) This model can be pictured as
Figure 7: Simple Linear Regression Model
For n data pairs (xi, yi), simple linear regression methodology allows one to
make confidence intervals and tests associated with the model, and what is more
important for our present purposes, prediction limits for a new y associated with
a new x. These are of the form
(b0 + b1x) ± t sLF √( 1 + 1/n + (x − x̄)² / Σ(xi − x̄)² )
where the least squares line is ŷ = b0 + b1x and sLF is an estimate of σ derived
from the fit of the line to the data. These days, any good statistical package
will compute and plot these limits along with a least squares line through the
data set.
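For completeness, here is a self-contained Python sketch of the least squares fit and prediction limits (illustrative, not from the source; the caller supplies the appropriate two-sided t quantile for n − 2 df, since the standard library has no t distribution):

```python
from math import sqrt

def slr_prediction_limits(xs, ys, x_new, t):
    """Least squares fit and prediction limits for one new y at x_new.
    t is the two-sided t quantile for n - 2 df, supplied by the caller."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_lf = sqrt(sse / (n - 2))  # line-fitting estimate of sigma
    half = t * s_lf * sqrt(1 + 1 / n + (x_new - xbar) ** 2 / sxx)
    yhat = b0 + b1 * x_new
    return b0, b1, s_lf, (yhat - half, yhat + half)
```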
Example (Mandel NBS/NIST) "Gold-standard" and "local" measurements
on n = 14 specimens (units not given). The data are
Figure 8: Mandel’s Linear Calibration Data
A JMP report for simple linear regression including prediction limits for an
additional value of y (that, of course, change with x) plotted is
Figure 9: A JMP Report for Mandel’s Linear Calibration Data
What is especially useful about statistical simple linear regression technology
for our purposes is what it indicates about measurement.
• from a simple linear regression output,

sLF = √MSE = "root mean square error"

is a kind of estimated repeatability standard deviation.

• the least squares equation ŷ = b0 + b1x can be solved for x, giving

x̂ = (y − b0)/b1

as a way of estimating a "gold-standard" value x from a measured local
value y

• it turns out that one can take the prediction limits for y and "turn them
around" to get confidence limits for the x corresponding to a measured
local y. This provides a defensible way to set "error bounds" on what y
indicates about x.
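The "turning around" can be done numerically: find the x at which the upper prediction band crosses the observed y (that is the lower confidence limit for x), and likewise for the lower band. An illustrative Python sketch (all names are mine; it assumes b1 > 0 and bands that increase in x over the search interval, with the crossing inside it):

```python
from math import sqrt

def invert_prediction_limits(b0, b1, s_lf, n, xbar, sxx, t, y_obs, lo, hi):
    """Approximate confidence limits for x given an observed y, found by
    bisection where each prediction band crosses y_obs.  Assumes b1 > 0 and
    that both bands are increasing in x on [lo, hi]."""
    def band(x, sign):
        half = t * s_lf * sqrt(1 + 1 / n + (x - xbar) ** 2 / sxx)
        return b0 + b1 * x + sign * half

    def solve(sign):
        a, b = lo, hi
        for _ in range(200):
            mid = (a + b) / 2
            if band(mid, sign) < y_obs:
                a = mid
            else:
                b = mid
        return (a + b) / 2

    # upper band crossing gives the lower x limit, and vice versa
    return solve(+1), solve(-1)
```

With the Mandel fit one would pass b0 = 42.216299, b1 = 0.881819, s_LF = 25.32578 together with n = 14, x̄, and Σ(xi − x̄)² from the data (not reproduced here) to bound x when y = 1500 is observed.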
Example (Mandel NBS/NIST) Since from the JMP report

ŷ = 42.216299 + 0.881819x with sLF = 25.32578

we might expect a local (y) repeatability standard deviation of around 25 (in
the y units). A "conversion formula" for going from y to x is

x̂ = (y − 42.216299)/0.881819

The following shows how one can set 95% confidence limits on x if y = 1500
is observed, using the plot of 95% prediction limits for y given x.
Figure 10: 95% Confidence Limits for x if y = 1500 (from 95% Prediction
Limits for y Given x)