Gaussian Spatial Processes: Design

In designing experiments for spatial models, the general goal is finding a set of points u from the design space 𝒰 that leads to the greatest amount of information, or the least amount of uncertainty – in some sense – about the function z(u). When modeling is based on a Gaussian stochastic process model, a key quantity for these purposes is the “prediction variance” at u:

  δ² − Cov[z(u), y(U)] Var⁻¹[y(U)] Cov[y(U), z(u)]

the (frequentist) conditional or (Bayes) posterior variance of z(u) given y(U). In practice, µ, δ², σ², and θ (the vector of correlation parameters) must also be estimated, but here we’ll only consider design for which at least σ²/δ² and θ are treated as known.
G-Optimality
Recall from our study of parametric models that the G-optimality criterion focuses directly on expected response estimation, rather than model parameter estimation. That logic is perhaps even more appealing here since the only quantities being treated as parameters
are part of the predictive stochastic process, rather than a physically meaningful model.
Hence, for the GaSP models we are studying, define a G-optimal design to be U for a fixed
size N which minimizes the largest conditional/posterior variance of z(u):
  max_{u∈𝒰} Var[z(u) | y(U)]
Note that for any δ², this is equivalent to finding the design that maximizes the slightly simpler expression

  φ = min_{u∈𝒰} Cov[z(u), y(U)] Var⁻¹[y(U)] Cov[y(U), z(u)]

and so we take this as the criterion function, and for convenience, write it (apart from a factor of δ²) as

  φ = min_{u∈𝒰} r′((σ²/δ²) I + C)⁻¹ r = min_{u∈𝒰} Q(u, U)

where r is the N-element vector of correlations with ith element R(u − u_i), and C is the N-by-N matrix of correlations with (i, j) element R(u_i − u_j).
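For concreteness, here is a minimal sketch of evaluating Q(u, U) directly; the Gaussian product correlation with a single θ, the ratio σ²/δ², and all names are illustrative assumptions rather than a prescribed implementation:

    import numpy as np

    def R(a, b, theta=10.0):
        # assumed Gaussian correlation: R(a - b) = exp(-theta * sum_i (a_i - b_i)^2)
        return np.exp(-theta * np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2))

    def Q(u, U, theta=10.0, ratio=0.1):
        # Q(u, U) = r' ((sigma^2/delta^2) I + C)^{-1} r, with ratio standing in for sigma^2/delta^2
        U = np.asarray(U, dtype=float)
        r = np.array([R(u, ui, theta) for ui in U])                    # N-vector of correlations
        C = np.array([[R(ui, uj, theta) for uj in U] for ui in U])     # N x N correlation matrix
        return float(r @ np.linalg.solve(ratio * np.eye(len(U)) + C, r))

    # Example: Q at u = (0.5, 0.5) for a hypothetical 3-point design
    print(Q([0.5, 0.5], [[0.1, 0.1], [0.5, 0.9], [0.9, 0.2]]))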
Note, for purposes of finding an optimal design, that for a fully defined R, C is a function only of the experimental design, but r is a function both of the design and of u, the argument of the minimization. Hence an idealized sketch of an algorithm for finding an optimal design would look like:

search over possible designs
    compute ((σ²/δ²) I + C)⁻¹
    search over points in 𝒰
        compute r′((σ²/δ²) I + C)⁻¹ r
    ...
...
The “outer loop” generally needs a compromise, since trying all possible designs is usually too computationally intensive. A point-exchange approach, similar to what we talked about for parametric models, can be taken in which a randomly chosen initial design is improved through iterations that involve adding and then deleting a point:
• add a point u+ that results in the largest min_{u∈𝒰} Q(u, U + u+)
• delete a point u− that results in the largest min_{u∈𝒰} Q(u, U − u−)
Even here, if N is not small we would like to avoid computing and inverting N × N and (N + 1) × (N + 1) matrices any more than necessary. With this in mind, let M = (σ²/δ²) I + C, and use the “update” formula:

• let M₊ = [ M   v ]
           [ v′  s ],   M symmetric and p.s.d.

• then

  M₊⁻¹ = [ M⁻¹ + (1/(s − v′M⁻¹v))(M⁻¹v)(M⁻¹v)′     −(1/(s − v′M⁻¹v))(M⁻¹v) ]
         [ −(1/(s − v′M⁻¹v))(M⁻¹v)′                 1/(s − v′M⁻¹v)          ]
The formula can be used directly when adding a point, to get M₊⁻¹ from M⁻¹. When deleting a point, write

  M₊⁻¹ = [ A   b ]
         [ b′  c ]

and use M⁻¹ = A − (1/c) b b′.
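In code, the add and delete updates can be written as follows; this is only a sketch (numpy assumed, and the function names are mine):

    import numpy as np

    def add_point_inverse(M_inv, v, s):
        # inverse of the bordered matrix [[M, v], [v', s]], computed from M^{-1}, v, and s
        Mv = M_inv @ v
        denom = s - v @ Mv                          # s - v' M^{-1} v
        top_left = M_inv + np.outer(Mv, Mv) / denom
        top_right = -Mv[:, None] / denom
        return np.block([[top_left, top_right],
                         [top_right.T, np.array([[1.0 / denom]])]])

    def delete_last_point_inverse(M_plus_inv):
        # if M_plus_inv = [[A, b], [b', c]], then M^{-1} = A - (1/c) b b'
        A = M_plus_inv[:-1, :-1]
        b = M_plus_inv[:-1, -1]
        c = M_plus_inv[-1, -1]
        return A - np.outer(b, b) / c

Deleting a point other than the last one can be handled by first symmetrically permuting that point’s row and column to the border.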
With that, here’s a somewhat more detailed sketch of how an algorithm could be constructed (a rough code sketch follows the outline):
• set N
• pick a random (or other) starting design, U
• compute M⁻¹
• to add a point:
  – loop over all u+ ∈ 𝒰
    ∗ U+ ← U + u+
    ∗ M₊⁻¹ ← update
    ∗ find min_u Q(u, U+), Qmin, umin
    ∗ keep the u+ for which Qmin is largest
  – U ← U + u+
  – N ← N + 1
  – M⁻¹ ← update
• to delete a point:
  – loop over all u− ∈ U
    ∗ U− ← U − u−
    ∗ M₋⁻¹ ← update
    ∗ find min_u Q(u, U−), Qmin, umin
    ∗ keep the u− for which Qmin is largest
  – U ← U − u−
  – N ← N − 1
  – M⁻¹ ← update
• alternately add and delete until no further change results
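As a rough illustration of the outline above, the following Python sketch carries out the add-delete search; for brevity it re-inverts the N × N matrix directly rather than using the update formulas, assumes a Gaussian correlation with a single θ, and uses the candidate grid itself as 𝒰 (all names and defaults are illustrative):

    import itertools
    import numpy as np

    def corr_matrix(A, B, theta=10.0):
        # Gaussian correlations exp(-theta * ||a - b||^2) between rows of A and rows of B
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-theta * d2)

    def phi(U, cand, theta=10.0, ratio=0.1):
        # phi = min over candidate sites u of Q(u, U) = r' ((sigma^2/delta^2) I + C)^{-1} r
        M_inv = np.linalg.inv(ratio * np.eye(len(U)) + corr_matrix(U, U, theta))
        R = corr_matrix(cand, U, theta)                  # row i is r(u_i)' for candidate u_i
        return np.min(np.einsum("ij,jk,ik->i", R, M_inv, R))

    def g_optimal(N, cand, theta=10.0, ratio=0.1, max_cycles=20, seed=0):
        rng = np.random.default_rng(seed)
        U = cand[rng.choice(len(cand), N, replace=False)]
        for _ in range(max_cycles):
            old = {tuple(p) for p in U}
            # add the u+ giving the largest phi(U + u+)
            add_scores = [phi(np.vstack([U, u]), cand, theta, ratio) for u in cand]
            U = np.vstack([U, cand[int(np.argmax(add_scores))]])
            # delete the u- giving the largest phi(U - u-)
            del_scores = [phi(np.delete(U, i, axis=0), cand, theta, ratio)
                          for i in range(len(U))]
            U = np.delete(U, int(np.argmax(del_scores)), axis=0)
            if {tuple(p) for p in U} == old:             # add-delete cycle changed nothing
                break
        return U

    # Example: a 10-point design on the 21 x 21 grid used in the examples below.
    grid = np.array(list(itertools.product(np.linspace(0, 1, 21), repeat=2)))
    design = g_optimal(10, grid, theta=10.0, ratio=0.1)

In a serious implementation the repeated inversions would be replaced by the add and delete updates above.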
Examples
The figures below display 3 approximately G-optimal N = 10 point designs generated using an algorithm as outlined above. For these calculations, 𝒰 = {0, 0.05, 0.10, 0.15, ..., 1}², and the design displayed in each case is the best of 10 tries beginning with a random design and iterating until a sequential add-delete cycle does not change the design. Each design was constructed for a stationary process with Gaussian correlation function with θ = 10 for both u1 and u2. From left to right, the panels display the resulting designs for σ²/δ² = 0.1, 0.2, and 0.5.

[Figure: three panels plotting the 10-point designs over (u1, u2) ∈ [0, 1]², one panel for each value of σ²/δ².]
A-Optimality
As we discussed with parametric models, G-optimality is difficult computationally since it is essentially a nested optimization problem – minimizing, through choice of design, the maximum predictive variance over 𝒰. An alternative but related approach is an analogue to what we called A-optimality earlier, where we focus on integrated (or average) predictive variance over 𝒰, i.e.

  ∫_{u∈𝒰} Var(z(u) | y(U)) du
For any δ², this is equivalent to finding the design that maximizes:

  φ = ∫_{u∈𝒰} r′((σ²/δ²) I + C)⁻¹ r du = ∫_{u∈𝒰} trace(((σ²/δ²) I + C)⁻¹ r r′) du
    = trace( ((σ²/δ²) I + C)⁻¹ ∫_{u∈𝒰} r r′ du )

the last expression being true because only r is a function of the prediction “site” u. This means that for a given design, evaluation consists of computing the matrix A = ∫_{u∈𝒰} r r′ du, followed by the trace of ((σ²/δ²) I + C)⁻¹ A. This can still require substantial computational effort, but is generally less intensive than the complete search over 𝒰 required by G-optimality. In cases where a product correlation form is used for R, and 𝒰 is a hyper-rectangle in r-dimensional space, substantial simplification can be realized in the computing of A by expressing each element of the matrix as a product of r one-dimensional factors; if the one-dimensional correlations are of convenient functional form, the integrals can sometimes be performed analytically (and therefore quickly).
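A short sketch of evaluating this criterion, with the integral over 𝒰 approximated by an average over a finite grid rather than computed analytically (the Gaussian correlation and all names are again illustrative assumptions):

    import itertools
    import numpy as np

    def corr_matrix(A, B, theta=10.0):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-theta * d2)

    def a_criterion(U, grid, theta=10.0, ratio=0.1):
        # phi = trace( ((sigma^2/delta^2) I + C)^{-1} A ),  A ~ average of r r' over the grid
        C = corr_matrix(U, U, theta)
        R = corr_matrix(grid, U, theta)               # row i is r(u_i)' at grid point u_i
        A = (R.T @ R) / len(grid)                     # grid approximation to the integral
        return float(np.trace(np.linalg.solve(ratio * np.eye(len(U)) + C, A)))

    # Example: evaluate (to be maximized over designs) for a random 10-point design.
    grid = np.array(list(itertools.product(np.linspace(0, 1, 21), repeat=2)))
    U = grid[np.random.default_rng(0).choice(len(grid), 10, replace=False)]
    print(a_criterion(U, grid))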
Other reasonable forms of design optimality can be formulated for this model, but each
generally presents a difficult computational problem (even with the design assumption of
known GaSP parameter values). For example, one might want to explore an analogue of
D-optimality from parametric modeling, where here the emphasis would be on minimizing
the determinant of the variance matrix of predicted z(u) at all sites in U simultaneously.
An immediate practical problem, of course, is that this would be an enormous matrix. One
general difficulty here is the absence of an “equivalence strategy” as provided by Fréchet
derivatives in the parametric case. Instead of pursuing the general design problem for this
model, we turn to an interesting special case that yields a bit more structure for design.
D-Optimality for No Noise
Suppose now that σ² = 0, i.e. z(u) = y(u). This is the appropriate model for applications in which a deterministic computer model (with u the input vector and z an output of interest) generates “data”, and the aim is to construct an approximation to the computer model that can be quickly evaluated at any u. It can also be regarded as a reasonable approximate model for design purposes when δ² ≫ σ². But the change from “small” σ² to σ² = 0 is a fundamental structural change in the model, and provides substantial simplicity in the way optimality, especially D-optimality, can be formulated.
First, note that with this modification comes simplicity in functional expressions for the
predictive mean and standard deviation of z at any u:
  ẑ(u) = µ + r′C⁻¹(z(U) − µ1)
  Var(z(u) | z(U)) = δ²(1 − r′C⁻¹r)
where U is the experimental design of N sites, and z(U ) is the collection of responses observed
there. In fact, in this case, predictions made at points in the design exactly replicate the data
values observed there, and conditional variances associated with these predictions are zero
(since there is no uncertainty about z(u) if it has been observed). To demonstrate this, the
following graph displays ẑ(u) with plus-and-minus 2 conditional standard deviation bounds
generated using the 3-point example data set and the nonnegative linear correlation function
as described in the previous chapter, but here with σ² = 0 rather than σ² = 0.1 as before:
[Figure: ẑ(u) with ±2 conditional standard deviation bounds and the three observed data points, plotted against u; the bands collapse to zero width at the design points.]
Focus now on a large-but-finite grid V and partition it into U (the design) and Ū (everything else in V). A version of “D-optimality” would be to pick U so as to minimize the
determinant of the conditional covariance matrix of z(Ū ):
  Var[z(Ū) | z(U)] = δ² [Corr[z(Ū), z(Ū)] − Corr[z(Ū), z(U)] Corr[z(U), z(U)]⁻¹ Corr[z(U), z(Ū)]]
On the face of it, this would lead to a very difficult calculation if V, and therefore Ū is very
large, since this is the dimension of these matrices. However, consider the implications of
the following fact:
  |Corr[z(V)]| = |Corr[z(U), z(U)]| × |Corr[z(Ū), z(Ū)] − Corr[z(Ū), z(U)] Corr[z(U), z(U)]⁻¹ Corr[z(U), z(Ū)]|
For fixed GaSP parameters and a fixed grid V, the expression on the left is fixed. The second
factor on the right is the determinant of the (very large) conditional variance matrix for z
at all sites other than those in the design that we would want to minimize for D-optimality.
The first factor on the right is the determinant of the unconditional variance matrix for z at
the design sites (a much smaller matrix). Taken together, this says that we can minimize
the determinant of the conditional variance matrix for z(Ū) – the criterion function for D-optimality – by maximizing the determinant of the unconditional variance matrix for z(U)
with respect to selection of U . This latter matrix is typically much, much smaller (order N ,
the size of the design), and so computationally much more feasible.
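The identity is easy to check numerically; the following sketch compares the two sides for an arbitrary small set of sites under an assumed Gaussian correlation:

    import numpy as np

    rng = np.random.default_rng(1)
    V = rng.random((8, 2))                      # a small "grid" of 8 sites in [0,1]^2
    theta = 10.0
    d2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    corr_V = np.exp(-theta * d2)                # Corr[z(V)]

    idx_U = [0, 1, 2]                           # design rows
    idx_Ub = [3, 4, 5, 6, 7]                    # everything else
    C_UU = corr_V[np.ix_(idx_U, idx_U)]
    C_BB = corr_V[np.ix_(idx_Ub, idx_Ub)]
    C_BU = corr_V[np.ix_(idx_Ub, idx_U)]
    cond = C_BB - C_BU @ np.linalg.solve(C_UU, C_BU.T)

    # |Corr[z(V)]| should equal |Corr[z(U)]| times |conditional covariance of z(Ubar) given z(U)|
    print(np.linalg.det(corr_V), np.linalg.det(C_UU) * np.linalg.det(cond))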
The value of this fact actually extends beyond the context of a single selected finite grid V. Suppose we maximize |Corr(z(U))| with respect to selection of design points from a possibly continuous/infinite design space 𝒰. This, in fact, guarantees that |Var[z(Ū)|z(U)]| is minimized for any grid V you might have selected from 𝒰 that includes U.
Whether the search is confined to points on a grid or not, this means that a D-optimal design (or near-D-optimal design) can be constructed using calculations involving N × N matrices. An add-delete algorithm that takes advantage of this might proceed along the following outline (a rough code sketch follows it):
• set N
• pick a random (or other) starting design, U
• compute C(U) (the N × N correlation matrix)
• to add a point:
  – loop over all u+ ∈ 𝒰
    ∗ U+ ← U + u+
    ∗ compute |C(U+)| as |C(U)| × (1 − r′C(U)⁻¹r) = q (where r is the N-element vector of correlations between u+ and the points of U)
    ∗ keep the u+ for which q is largest
  – U+ ← U + u+
  – N ← N + 1
• to delete a point:
  – loop over all u− ∈ U+
    ∗ U− ← U+ − u−
    ∗ compute |C(U−)| as |C(U+)| / (1 − r′C(U−)⁻¹r) = q
    ∗ keep the u− for which q is largest
  – U ← U+ − u−
  – N ← N − 1
• alternately add and delete until no further change results
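Here is a rough code sketch of this outline; the add step uses the determinant update given above, while the delete step simply recomputes log-determinants directly for brevity (Gaussian correlation, a gridded 𝒰, and all names are illustrative assumptions):

    import itertools
    import numpy as np

    def corr_matrix(A, B, theta=10.0):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-theta * d2)

    def d_optimal(N, cand, theta=10.0, max_cycles=20, seed=0):
        rng = np.random.default_rng(seed)
        U = cand[rng.choice(len(cand), N, replace=False)]
        for _ in range(max_cycles):
            old = {tuple(p) for p in U}
            # add step: |C(U + u+)| = |C(U)| (1 - r' C(U)^{-1} r), so maximize 1 - r' C(U)^{-1} r
            C_inv = np.linalg.inv(corr_matrix(U, U, theta))
            R = corr_matrix(cand, U, theta)
            q_add = 1.0 - np.einsum("ij,jk,ik->i", R, C_inv, R)
            U = np.vstack([U, cand[int(np.argmax(q_add))]])
            # delete step: recompute log|C(U - u-)| directly (the q update could be used instead)
            q_del = [np.linalg.slogdet(corr_matrix(np.delete(U, i, axis=0),
                                                   np.delete(U, i, axis=0), theta))[1]
                     for i in range(len(U))]
            U = np.delete(U, int(np.argmax(q_del)), axis=0)
            if {tuple(p) for p in U} == old:
                break
        return U

    # Example: a 10-point "no noise" D-optimal search on a 21 x 21 grid.
    grid = np.array(list(itertools.product(np.linspace(0, 1, 21), repeat=2)))
    design = d_optimal(10, grid, theta=10.0)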
Asymptotic Optimality for No Noise
All discussion of optimal design construction to this point has been predicated on stating
a value of θ, even for the case of σ² = 0. Johnson, Moore, and Ylvisaker (1990) developed arguments that characterize designs for GaSP models (with σ² = 0) for the limiting case of
“weak local correlation”, i.e. of θ → ∞ in the parameterization we’ve adopted here.
Suppose the correlation function R(∆) can be written in terms of what we will call
a “distance function” d(∆) that is non-negative, and zero only for ∆ = 0. Write the
relationship between R and d as R(∆) = r(d(∆)), where we require that r be a decreasing
function of d (so that correlation is relatively weak for distances that are relatively large). For
example, the product-form Ornstein-Uhlenbeck correlation is R_θ(∆) = exp(−Σ_{i=1}^r θ_i |∆_i|), so d = Σ_{i=1}^r θ_i |∆_i|, i.e. “distance” is the sum of distances in each dimension, each scaled by a corresponding parameter (sometimes called “rectangular distance”). Similarly, the product-form Gaussian correlation function corresponds to d = Σ_{i=1}^r θ_i ∆_i², or squared Euclidean distance where, again, each dimension is scaled by a corresponding parameter.
For any design space 𝒰, define the following:
• Call an N-point design U a minimax distance design (with respect to d) if the largest distance from a point in 𝒰 to the nearest point in U is less than or equal to that of any other N-point design. That is, a minimax distance design is any solution to:

  argmin_U max_{u∈𝒰} min_{v∈U} d(u − v)

For a minimax distance design, let this largest distance between a point in 𝒰 and the nearest point in U be d_mM. Call the points in 𝒰 that are distance d_mM from the nearest point in U remote points. Let the smallest number of points in U that are distance d_mM from some remote point be I_mM. Then the design is a minimax distance design of maximum index if there is no minimax distance design with a larger value of I_mM.
• Call an N-point design U a maximin distance design (with respect to d) if the smallest distance between two points in U is greater than or equal to that of any other N-point design. That is, a maximin distance design is any solution to:

  argmax_U min_{u≠v∈U} d(u − v)

For a maximin distance design, let this smallest distance between two points in U be d_Mm, and let the number of pairs of points in the design separated by this distance be I_Mm. Then the design is a maximin distance design of minimum index if there is no maximin distance design with a smaller value of I_Mm.
Now consider a progression of GaSP models indexed by positive k for which the correlation function is R_k(∆). The two central results of Johnson, Moore, and Ylvisaker are the following:
• Minimax distance designs of maximum index are asymptotically G-optimal as k → ∞.
• Maximin distance designs of minimum index are asymptotically D-optimal as k → ∞.
(Proof for the second result, for example, comes from the fact that in the limit, d_Mm determines the largest-order term in log|C(U)|, and I_Mm determines the coefficient of that term.)
The two figures below, taken from the JMY paper, display a minimax distance design
of maximum index (left) and a maximin distance design of minimum index (right), each in
N = 7 points for 𝒰 = [0, 1]², for squared Euclidean distance. The circles on the graphs
emphasize reliance on distance from design points to “most distant” points not in the design
(minimax) and the minimum distance between design points (maximin).
Computing to find a minimax distance design is relatively difficult because for any design, the distance to all other points in 𝒰 must, in principle, be considered. Evaluation of a design by the maximin criterion is much simpler since only the distances between the N(N − 1)/2 pairs of points in the design need to be evaluated.
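For a finite candidate grid standing in for 𝒰, both criteria are straightforward to evaluate; here is a sketch (squared Euclidean distance with all θ_i = 1 is assumed, and the full index bookkeeping is omitted):

    import numpy as np
    from itertools import combinations

    def maximin_summary(U):
        # smallest interpoint distance d_Mm and its multiplicity I_Mm
        d = [np.sum((a - b) ** 2) for a, b in combinations(np.asarray(U, float), 2)]
        d_Mm = min(d)
        I_Mm = int(sum(np.isclose(x, d_Mm) for x in d))
        return d_Mm, I_Mm          # prefer larger d_Mm, then smaller I_Mm

    def minimax_distance(U, grid):
        # largest distance from a point of the (gridded) design space to its nearest design point
        d2 = ((np.asarray(grid, float)[:, None, :] -
               np.asarray(U, float)[None, :, :]) ** 2).sum(axis=2)
        return float(d2.min(axis=1).max())   # prefer smaller values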
A Cute Mm Distance Example (of limited practical value)
Consider the squared-Euclidean distance function d(s, t) = Σ_{l=1}^r (s_l − t_l)², and the hypercubic design region 𝒰 = [−1, +1]^r. (I’ve omitted a θ parameter here for simplicity; the argument to be presented also holds for scaled distance so long as the scaling is the same in each dimension.) For convenience, think about a conventional design matrix X_{N×r} with (i, j) element u_{i,j}. For this problem, denote the distance between the ith and jth design points as d(i, j) = Σ_{l=1}^r (u_{i,l} − u_{j,l})². For any N-point design, there are a total of N(N − 1)/2 interpoint distances, whose sum is

  Σ_{i<j} Σ_l (u_{i,l} − u_{j,l})² = Σ_l Σ_{i<j} (u_{i,l} − u_{j,l})² = const × Σ_l S²(l)

where S²(l) is the sample variance of the elements in the lth column of X. We know that these sample variances are all as large as possible for any design that specifies N/2 +1’s and N/2 −1’s for each independent variable – the so-called “balanced designs”, including orthogonal arrays.
Now, narrow consideration to N = r + 1 ≡ 0 (mod 4), i.e. the conditions needed for a “full-width” Plackett-Burman design. Let X∗ be the model matrix for a first-order linear model, including a column for the intercept:

  X∗′X∗ = N I = X∗X∗′

where the last equation is correct because X∗ is square. The elements of this last matrix are:
• diagonal: 1 + Σ_l u_{i,l}², each of which equals N (so Σ_l u_{i,l}² = N − 1)
• off-diagonal: 1 + Σ_l u_{i,l} u_{j,l}, each of which equals 0 (so Σ_l u_{i,l} u_{j,l} = −1)
So, combining these,

  Σ_l u_{i,l}² − 2 Σ_l u_{i,l} u_{j,l} + Σ_l u_{j,l}² = (N − 1) − 2(−1) + (N − 1) = 2N,

which implies that Σ_l (u_{i,l} − u_{j,l})² = 2N for all (i, j).
That is, the distance between every pair of points is 2N. Putting this together, for Plackett-Burman designs,
• Σ_{i,j} d(i, j) is maximized
• d(i, j) is the same for every (i, j)
Hence, Plackett-Burman designs are maximin distance designs for the stated problem. (Further, this argument also works for rectangular distance.) A small numerical check appears below.
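For the smallest case, N = 4 and r = 3, the claim can be checked directly; the matrix below is the standard 2^{3−1} half fraction written in −1/+1 coding (my choice for illustration):

    import numpy as np
    from itertools import combinations

    X = np.array([[ 1,  1,  1],
                  [ 1, -1, -1],
                  [-1,  1, -1],
                  [-1, -1,  1]])
    dists = [int(np.sum((a - b) ** 2)) for a, b in combinations(X, 2)]
    print(dists)   # every interpoint squared distance equals 2N = 8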
Latin Hypercube Designs
There are also other forms of spatial designs that are not directly motivated by the
precision of prediction that can be expected for GaSP models, but have been shown to be
effective experimental plans for this purpose. Probably the most widely used of these is
based on the Latin Hypercube Sample, LHS, (not “design” yet, but we’ll get to that soon),
introduced by McKay, Beckman, and Conover (1979) in the context of computer experiments
in which inputs are chosen randomly from some specified distribution, and analysis focuses on
estimating properties, such as the mean or specified quantiles, of the resulting distribution of
the outputs. In this kind of experiment, the values of inputs actually selected are generally
not used in the estimation exercise, that is, N input vectors are randomly selected, the
computer model is executed for each of them, and the analysis is based only on the resulting
random sample of output values. McKay, Beckman, and Conover (1979) focused in particular on averages of functions of the output:

  T = (1/N) Σ_{i=1}^N g(z_i),        (1)

where z_i, i = 1, 2, 3, ..., N is the value of the output of interest resulting from execution of the model with the ith selected set of inputs (u_i). In this setting g is an arbitrary function that accommodates a useful variety of output statistics. For example, g(z) = z leads to the sample mean, g(z) = z^m for positive integer m yields the mth noncentral sample moment, and g(z) = 1 for z < z* and 0 otherwise results in the empirical distribution function evaluated at z*.
Latin Hypercube sampling is based on the idea that a joint probability distribution has been specified for the input vector, F(u), and that the elements of u are independent so that the joint distribution can be written as the product of the marginals, F(u) = Π_{i=1}^r F_i(u_i). Values of the inputs are selected individually. For the ith input, the range of u_i is partitioned into N non-overlapping intervals, each of probability 1/N under F_i, and one value of u_i is drawn conditionally from each of these intervals. After N values have been thus
chosen for each input, they are combined randomly (with equal probability for each possible
arrangement) to form the N input vectors each of order r. When N is large, the conditional
sampling from equal-probability intervals is often ignored, and values are simply taken from
a regular grid. The following figure displays (in the left panel) how 5 sample values of one
input are selected conditionally from equal-probability “slices” of a given univariate distribution, and (in the right panel) how values chosen in this way for 2 inputs can be randomly
matched to construct a Latin Hypercube sample.
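A minimal sketch of generating such a sample for uniform marginals on [0, 1] (for other marginals, the stratified uniforms would be pushed through the inverse CDFs F_i⁻¹; names are illustrative):

    import numpy as np

    def latin_hypercube(N, r, seed=0):
        rng = np.random.default_rng(seed)
        sample = np.empty((N, r))
        for i in range(r):
            # one uniform draw from each of N equal-probability intervals ...
            strata = (np.arange(N) + rng.random(N)) / N
            # ... then randomly matched across inputs
            sample[:, i] = rng.permutation(strata)
        return sample

    U = latin_hypercube(5, 2)    # a 5-run, 2-input Latin Hypercube sample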
The basic result presented by McKay, Beckman, and Conover (1979) compares the efficiency of estimation for LHS to that for simple random sampling (SRS), and can be easily
stated. For a fixed sample size N , let TSRS be the quantity of interest calculated from
outputs resulting from a simple random sample of inputs, and let T_LHS be the same quantity resulting from a Latin Hypercube sample. If the computer model is such that z is a monotonic function of each of the inputs, and g is a monotonic function of z, then Var(T_LHS) ≤ Var(T_SRS). Stein (1987) showed that so long as E(g(z)²) is finite, the asymptotic (large N) variance of T_LHS is no larger than that of T_SRS even without the monotonicity requirements, and that the asymptotic efficiency of T_LHS relative to T_SRS is governed by how well the computer model can be approximated by a linear function in u.
As described above, the original justification for the Latin Hypercube was as a stratified
sampling plan, requiring a specified probability distribution for u. However, the structure
of the LHS is frequently used in GaSP modeling applications where u is not regarded as
random, resulting in the Latin Hypercube design. Intuitive appeal for this approach to
design for meta-modeling includes the following:
1. One-dimensional stratification: In a Latin Hypercube sample (or design), each input
takes on N distinct values spread across its experimental range. This is not particularly appealing in physical experimentation since it implies that the design cannot have
factorial structure or include replication. However, in experiments with deterministic
models, there is no uncontrolled “noise” as such; replication is not needed to estimate
uncontrolled variation, and the benefits of factorial structure that come from maximizing “signal to noise” in the analysis are not relevant. The N -value one-dimensional
projections of a Latin Hypercube provide (at least in some cases) the information
needed to map out more complex functional z-to-u behavior than can be supported
with designs that rely on a small number of unique values for each input.
2. Space-filling potential: The modeling techniques that are most appropriate in this
context are data interpolators rather than data smoothers; they generally perform best
when the N points of an experimental design “fill the space”, as opposed to being
arranged so that there are relatively large subregions of 𝒰 containing no points (as is the case, for example, with factorial designs with relatively few levels attached to each input). While Latin Hypercube designs do not necessarily have good space-filling properties, they can be made to fill 𝒰 effectively through judicious (non-random) arrangement of the combinations of input values used. As one example, Morris and Mitchell (1995) constructed maximin distance designs within the class of equally-spaced Latin Hypercube designs for use in computer experiments; a crude search along these lines is sketched below.
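The sketch: search over random Latin Hypercube arrangements and keep the one with the largest minimum interpoint distance (simple random restarts here stand in for a more careful combinatorial search; all names are illustrative):

    import numpy as np
    from itertools import combinations

    def maximin_lhd(N, r, tries=200, seed=0):
        rng = np.random.default_rng(seed)
        levels = np.linspace(0, 1, N)                       # equally spaced levels
        best, best_d = None, -np.inf
        for _ in range(tries):
            X = np.column_stack([rng.permutation(levels) for _ in range(r)])
            d = min(np.sum((a - b) ** 2) for a, b in combinations(X, 2))
            if d > best_d:                                  # keep the arrangement with largest min distance
                best, best_d = X, d
        return best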
Sequential Experiments: Expected Improvement
As noted earlier, the GaSP model with σ² = 0 has become a popular statistical framework for modeling the behavior of deterministic computer models. The motivation for this is often
that complex computer models take a substantial amount of computer time for each run,
but many applications require a large number of such evaluations. An important example of
this kind of computer “experimentation” is the problem of numerical function optimization.
Suppose our interest in z(u) is in finding the value or values of u for which z is maximized.
Numerical optimization has traditionally not been treated as a statistical problem. However,
suppose that z takes substantial computer time for each function evaluation; many or most
traditional numerical optimization approaches may be infeasible due to the number of function evaluations they require, particularly when the problem dimension (r) is high. Recent
statistical research has been focused on how a “meta-model” such as a GaSP might be used
to make more complete use of the data, and lead to effective function optimization through
fewer evaluations of z.
A very simple way to do this is to use the conditional/posterior predictor ẑ(u) as what
is sometimes called an “oracle”, to predict the values of u that are likely to lead to larger values of z. A simple algorithm (a code sketch follows the outline):
• Begin with a “standard” but small design U of points taken from U.
• Fit a GaSP model, and find the value or values of u ∈ U that lead to the largest value
of ẑ. (This can be done relatively quickly, since ẑ is typically much easier to calculate
than z.)
• Evaluate z at the value of u identified in the last step, add this value of z to the
dataset, and update the GaSP predictor of z.
• Iterate the second and third steps until no appreciable improvement is attained.
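A sketch of this loop, assuming a no-noise GaSP predictor with Gaussian correlation, fixed hyperparameters, a gridded candidate set, and µ crudely set to the sample mean of the data (all illustrative assumptions):

    import numpy as np

    def corr_matrix(A, B, theta=10.0):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-theta * d2)

    def zhat(cand, U, zU, theta=10.0):
        # zhat(u) = mu + r' C^{-1} (z(U) - mu 1), with mu crudely set to the data mean
        mu = zU.mean()
        C = corr_matrix(U, U, theta) + 1e-10 * np.eye(len(U))   # tiny jitter for numerical stability
        R = corr_matrix(cand, U, theta)
        return mu + R @ np.linalg.solve(C, zU - mu)

    def oracle_maximize(z, U0, cand, n_evals=20, theta=10.0):
        U = np.asarray(U0, float)
        zU = np.array([z(u) for u in U])
        for _ in range(n_evals):
            u_next = cand[int(np.argmax(zhat(cand, U, zU, theta)))]   # site with largest zhat
            U = np.vstack([U, u_next])                                # evaluate z there and refit
            zU = np.append(zU, z(u_next))
        return U[int(np.argmax(zU))], float(zU.max())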
The above approach is simple and heuristic, and more important, it often is actually quite
effective. However, it does not take uncertainty of the predictions ẑ into account, and so
can take ẑ values “too seriously”, especially in early iterations, leading to slow convergence
or failure in some cases. An alternative approach that takes advantage of more of the
information in the stochastic process is based on expected improvement. For our function
maximization example, suppose that we have evaluated z at each of N values of u (the
“current design”) and wish to select an N + 1st value of u for the next evaluation. Because
our goal is function maximization, the next iteration yields “improvement” for our purposes only if the resulting value of z is greater than any of the N values already computed. Toward this end, let z_{N,max} denote the largest z associated with the N function evaluations to date, and define the improvement that would be realized from a new evaluation at u to be

  I(u) = { z(u) − z_{N,max},   if z(u) − z_{N,max} > 0
         { 0,                  if z(u) − z_{N,max} ≤ 0
Since z(u) is unknown at any u other than those in the N -point design, I(u) cannot be
evaluated. However, under the GaSP model, z(u) is a random variable with a conditional
(on the first N data values) distribution, and so I(u) is also a random variable for which
the conditional distribution can be characterized. A sequential design approach based on
expected improvement selects as the next u the vector that maximizes the expectation of
I(u), conditional on all information collected through the first N evaluations.
I(u)|z(U) has a distribution that is a mixture of a truncated normal distribution and a point mass distribution (at I = 0). For notational simplicity, represent the conditional standard deviation of z(u) by S(u); S(u) = √(Var(z(u)|z(U))). Then the conditional expectation of I(u) can be derived as:

  E(I(u)|z(U)) = S(u) { ((ẑ(u) − z_{N,max})/S(u)) Φ((ẑ(u) − z_{N,max})/S(u)) + φ((ẑ(u) − z_{N,max})/S(u)) }
where Φ and φ denote the cumulative distribution function and the density function, respectively, of the standard normal random variable. The algorithm described above can be
altered by substituting E(I(u)|z(U )) for ẑ(u) in the second step. The result is that, rather
than selecting the site for which ẑ(u) is maximized, sites for which ẑ(u) is somewhat smaller,
but for which S(u) is large (indicating the strong possibility of a larger value of z) are sometimes selected. Overall, this is a compromise between identifying the points that appear to
maximize z based on the best scalar-valued predictions, and points at which uncertainty is
large enough that improved precision of these scalar-valued predictions is needed.
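In code, the expected-improvement formula is short once ẑ(u) and S(u) are available from the GaSP fit; a sketch (scipy assumed for Φ and φ, names illustrative):

    import numpy as np
    from scipy.stats import norm

    def expected_improvement(zhat, S, z_max):
        # E(I(u)|z(U)) = S(u) [ t Phi(t) + phi(t) ],  t = (zhat(u) - z_max) / S(u);  EI = 0 where S = 0
        zhat = np.asarray(zhat, float)
        S = np.asarray(S, float)
        safe_S = np.where(S > 0, S, 1.0)                  # avoid division by zero at design points
        t = (zhat - z_max) / safe_S
        return np.where(S > 0, S * (t * norm.cdf(t) + norm.pdf(t)), 0.0)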
References
Johnson, M.E., L.M. Moore, and D. Ylvisaker (1990). “Minimax and Maximin Distance
Designs,” Journal of Statistical Planning and Inference 26, 131-148.
McKay, M.D., R.J. Beckman, and W.J. Conover (1979). “A Comparison of Three Methods
for Selecting Values of Input Variables in the Analysis of Output from a Computer
Code,” Technometrics 21, no.2, 239-245.
Morris, M.D., and T.J. Mitchell (1995). “Exploratory Designs for Computer Experiments,” Journal of Statistical Planning and Inference 43, 381-402.
Stein, M. (1987). “Large Sample Properties of Simulations Using Latin Hypercube Sampling,” Technometrics 29, no. 2, 143-151.