Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall
CHAPTER 17
OPTIMAL DESIGN FOR EXPERIMENTAL INPUTS
• Organization of chapter in ISSO*
– Background
• Motivation
• Finite sample and asymptotic (continuous) designs
• Precision matrix and D-optimality
– Linear models
• Connections to D-optimality
• Key equivalence theorem
– Response surface methods
– Nonlinear models
* Note: Appendix to these slides is brief discussion of factorial design (not in ISSO)
Optimal Design in Simulation
• Two roles for experimental design in simulation
– Building approximation to existing large-scale simulation via “metamodel”
– Building simulation model itself
• Metamodels are “curve fits” that approximate simulation input/output
– Usual form is low-order polynomial in the inputs; linear in parameters
– Linear design theory useful
• Building simulation model
– Typically need nonlinear design theory
• Some terminology distinctions:
– “Factors” (statistics term) ↔ “Inputs” (modeling and simulation term)
– “Levels” ↔ “Values”
– “Treatments” ↔ “Runs”
Unique Advantages of Design in Simulation
• Simulation experiments may be considered special case of general experiments
• Some unique benefits occur due to simulation structure
• Can control factors not generally controllable (e.g., arrival rates into network)
• Direct repeatability due to deterministic nature of random number generators
– Variance reduction (CRNs, etc.) may be helpful
• Not necessary to randomize runs to avoid systematic variation due to inherent conditions
– E.g., randomization in run order and input levels in biological experiment to reduce effects of change in ambient humidity in laboratory
– In simulation, systematic effects can be eliminated since analyst controls nature
Design of Computer Experiments in Statistics
• There exists significant activity among statisticians for experimental design based on computer experiments
– T. J. Santner et al. (2003), The Design and Analysis of Computer Experiments, Springer-Verlag
– J. Sacks et al. (1989), “Design and Analysis of Computer Experiments (with discussion),” Statistical Science, 409–435
– Etc.
• Above statistical work differs from experimental design with Monte Carlo simulations
– Above work assumes deterministic function evaluations via computer (e.g., solution to complicated ODE)
• One implication of deterministic function evaluations: no need to replicate experiments for given set of inputs
• Contrasts with Monte Carlo, where replication provides variance reduction
General Optimal Design Formulation (Simulation or Non-Simulation)
• Assume model $z = h(\theta, x) + v$, where $x$ is an input we are trying to pick optimally
• Experimental design $\xi$ consists of $N$ specific input values $x = \chi_i$ and proportions (weights) to these input values $w_i$:
$$\xi = \begin{Bmatrix} \chi_1 & \chi_2 & \cdots & \chi_N \\ w_1 & w_2 & \cdots & w_N \end{Bmatrix}$$
• Finite-sample design allocates $n \ge N$ available measurements exactly; asymptotic (continuous) design allocates based on $n \to \infty$
D-Optimal Criterion
• Picking optimal design requires criterion for optimization
• Most popular criterion is D-optimal measure
• Let $M(\theta, \xi)$ denote the “precision matrix” for an estimate of $\theta$ based on a design $\xi$
– $M(\theta, \xi)$ is inverse of covariance matrix for estimate and/or
– $M(\theta, \xi)$ is Fisher information matrix for estimate
• D-optimal solution is
$$\xi^* = \arg\max_{\xi} \det M(\theta, \xi)$$
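As a minimal sketch of the D-optimal criterion for a linear model, the code below builds the precision matrix as the weighted sum of outer products of the regression vectors at the design's input values and compares the determinant for two candidate designs. The straight-line model, the input range [-1, 1], the unit noise variance, and the function names are illustrative assumptions, not taken from ISSO.

```python
import numpy as np

def precision_matrix(points, weights, h):
    """Per-measurement precision matrix: sum_i w_i h(x_i) h(x_i)^T (unit noise variance assumed)."""
    H = np.array([h(x) for x in points], dtype=float)   # N x p matrix of regression vectors
    w = np.asarray(weights, dtype=float)
    return H.T @ (w[:, None] * H)

def d_criterion(points, weights, h):
    """Determinant of the precision matrix; larger is better under D-optimality."""
    return float(np.linalg.det(precision_matrix(points, weights, h)))

# Hypothetical model: z = theta_0 + theta_1 * x + noise, with x restricted to [-1, 1]
h_line = lambda x: np.array([1.0, x])

det_endpoints = d_criterion([-1.0, 1.0], [0.5, 0.5], h_line)
det_three_pt = d_criterion([-1.0, 0.0, 1.0], [1 / 3, 1 / 3, 1 / 3], h_line)
print(det_endpoints, det_three_pt)
```

In this example the design that splits the weight between the two endpoints gives the larger determinant, so it is preferred over the equally weighted three-point design under the D-optimal measure.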
Equivalence Theorem
• Consider linear model $z_k = h_k^T \theta + v_k$, $k = 1, 2, \ldots, n$
• Prediction based on parameter estimate $\hat{\theta}_n$ and “future” measurement vector $h^T$ is $\hat{z} = h^T \hat{\theta}_n$
• Kiefer-Wolfowitz equivalence theorem states: D-optimal solution for determining $\xi$ to be used in forming $\hat{\theta}_n$ is the same $\xi$ that minimizes the maximum variance of predictor $\hat{z}$
• Useful in practical determination of optimal $\xi$
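The equivalence can be checked numerically in a small case. The sketch below evaluates the normalized prediction variance d(x) = h(x)^T M^{-1} h(x) over a grid of inputs for two designs; in the standard form of the theorem, the maximum of d(x) for a D-optimal design equals the number of parameters p. The straight-line model on [-1, 1] and the function names are assumptions made for illustration.

```python
import numpy as np

def variance_function(x_grid, points, weights, h):
    """Normalized prediction variance d(x) = h(x)^T M^{-1} h(x) at each x in x_grid."""
    H = np.array([h(x) for x in points], dtype=float)
    w = np.asarray(weights, dtype=float)
    M_inv = np.linalg.inv(H.T @ (w[:, None] * H))
    G = np.array([h(x) for x in x_grid], dtype=float)
    # Row-wise quadratic form g_i^T M_inv g_i
    return np.einsum("ij,jk,ik->i", G, M_inv, G)

h_line = lambda x: np.array([1.0, x])      # hypothetical straight-line model, p = 2
grid = np.linspace(-1.0, 1.0, 201)

max_var_endpoints = variance_function(grid, [-1.0, 1.0], [0.5, 0.5], h_line).max()
max_var_three_pt = variance_function(grid, [-1.0, 0.0, 1.0], [1 / 3] * 3, h_line).max()
print(max_var_endpoints, max_var_three_pt)
```

The endpoint design attains a maximum variance of p = 2, while the three-point design has a larger maximum, which matches the minimax interpretation above.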
Variance Function as it Depends on Input: Optimal Asymptotic Design for Example 17.6 in ISSO
Orthogonal Designs
• With linear models, usually more than one solution is D-optimal
• Orthogonality is means of reducing number of solutions
• Orthogonality also introduces desirable secondary properties
– Separates effects of input factors (avoids “aliasing”)
– Makes estimates for elements of $\theta$ uncorrelated
• Orthogonal designs are not generally D-optimal; D-optimal designs are not generally orthogonal
– However, some designs are both
• Classical factorial (“cubic”) designs are orthogonal (and often D-optimal)
Example Orthogonal Designs, r = 2 Factors
[Figure: two panels with axes $x_{k1}$ and $x_{k2}$: cube ($2^r$ design) and star ($2r$ design)]
Example Orthogonal Designs, r = 3 Factors
[Figure: two panels with axes $x_{k1}$, $x_{k2}$, and $x_{k3}$: cube ($2^r$ design) and star ($2r$ design)]
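The cube and star designs can be generated and checked for orthogonality with a few lines of code. This is a sketch under the assumption that inputs are coded to the levels -1 and +1 (and 0 for the inactive inputs of the star design); the function names are illustrative.

```python
import itertools
import numpy as np

def cube_design(r):
    """All 2^r corner points, each input at coded level -1 or +1."""
    return np.array(list(itertools.product([-1.0, 1.0], repeat=r)))

def star_design(r):
    """The 2r axial points: one input at -1 or +1, all other inputs at 0."""
    return np.vstack([np.eye(r), -np.eye(r)])

def is_orthogonal(X):
    """True if the columns of [1, X] (intercept plus inputs) are mutually orthogonal."""
    A = np.column_stack([np.ones(len(X)), X])
    gram = A.T @ A
    return bool(np.allclose(gram, np.diag(np.diag(gram))))

cube3, star3 = cube_design(3), star_design(3)
print(cube3.shape, star3.shape, is_orthogonal(cube3), is_orthogonal(star3))
```

A diagonal Gram matrix means the least-squares estimates of the intercept and main-effect coefficients are uncorrelated when the noise terms are uncorrelated with equal variance.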
Response Surface Methodology (RSM)
• Suppose we want to determine inputs x that minimize the mean response E(z) of some process
– There are also other (nonoptimization) uses for RSM
• RSM can be used to build local models with the aim of finding the optimal x
– Based on building a sequence of local models as one moves through factor ( x ) space
• Each response surface is typically a simple regression polynomial
• Experimental design can be used to determine input values for building response surfaces
Steps of RSM for Optimizing x
Step 0 (Initialization) Initial guess at optimal value of x .
Step 1 (Collect data) Collect responses z from several x values in neighborhood of current estimate of best x value (can use experimental design).
Step 2 (Fit model) From the x , z pairs in step 1, fit regression model in region around current best estimate of optimal x .
Step 3 (Identify steepest descent path) Based on response surface in step 2, estimate path of steepest descent in factor space.
Step 4 (Follow steepest descent path) Perform series of experiments at x values along path of steepest descent until no additional improvement in z response is obtained. This x value represents new estimate of best vector of factor levels.
Step 5 (Stop or return) Go to step 1 and repeat process until final best factor level is obtained.
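The steps above can be sketched in code as follows. This is an illustrative implementation under several assumptions that are not part of the slides: a first-order (linear) local model, a $2^r$ cube design for step 1, a fixed step length along the descent path, and a halving of the design region and step length whenever a cycle produces no improvement. The function name `rsm_minimize` and the test response are hypothetical.

```python
import itertools
import numpy as np

def rsm_minimize(response, x0, half_width=0.5, step=0.5, n_cycles=20, max_moves=50):
    """Sketch of RSM: local factorial data, first-order fit, steepest-descent moves."""
    x = np.asarray(x0, dtype=float)                      # Step 0: initial guess
    corners = np.array(list(itertools.product([-1.0, 1.0], repeat=x.size)))
    for _ in range(n_cycles):
        X = x + half_width * corners                     # Step 1: inputs near current x
        z = np.array([response(xi) for xi in X])
        A = np.column_stack([np.ones(len(X)), X - x])    # Step 2: fit z ~ b0 + b^T (x - x_cur)
        coef, *_ = np.linalg.lstsq(A, z, rcond=None)
        slope = coef[1:]
        if np.linalg.norm(slope) < 1e-10:
            break
        direction = -slope / np.linalg.norm(slope)       # Step 3: steepest-descent direction
        best_z, moved = response(x), False
        for _ in range(max_moves):                       # Step 4: move until no improvement
            trial = x + step * direction
            trial_z = response(trial)
            if trial_z >= best_z:
                break
            x, best_z, moved = trial, trial_z, True
        if not moved:                                    # assumed refinement rule near solution
            half_width, step = half_width / 2, step / 2
    return x                                             # Step 5: stop after n_cycles

# Hypothetical noisy process with mean response minimized at (2, -1)
rng = np.random.default_rng(0)
target = np.array([2.0, -1.0])
noisy = lambda x: float(np.sum((x - target) ** 2) + 0.01 * rng.standard_normal())
print(rsm_minimize(noisy, x0=[0.0, 0.0]))
```

With noisy responses a single comparison in step 4 can be misleading; averaging replicated runs at each trial point is one possible remedy, at the cost of more experiments.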
Conceptual Illustration of RSM for Two Variables in x; Shows More Refined Experimental Design Near Solution
Adapted from: Montgomery (2005), Design and Analysis of Experiments, Fig. 11-3
Nonlinear Design
• Assume model $z = h(\theta, x) + v$, where $\theta$ enters nonlinearly and $x$ is $r$-dimensional input vector
• D-optimality remains dominant measure
– Maximization of determinant of Fisher information matrix (from Chapter 13 of ISSO: $F_n(\theta, X)$ is Fisher information matrix based on $n$ inputs in $n \times r$ matrix $X$)
• Fundamental distinction from linear case is that D-optimal criterion depends on $\theta$
• Leads to conundrum: Choosing $X$ to best estimate $\theta$, yet need to know $\theta$ to determine $X$
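A small numerical example may make the dependence on the unknown parameter concrete. The sketch below assumes a hypothetical scalar model z = exp(-theta * x) + v with independent Gaussian noise of known variance, for which the Fisher information is the sum of squared sensitivities divided by the noise variance. The model, grid, and function names are illustrative and not from ISSO.

```python
import numpy as np

def fisher_info(theta, x, sigma=1.0):
    """Scalar Fisher information for z = exp(-theta * x) + v, v ~ N(0, sigma^2)."""
    dh_dtheta = -x * np.exp(-theta * x)          # sensitivity of h to theta at each input
    return np.sum(dh_dtheta ** 2) / sigma ** 2

x_grid = np.linspace(0.01, 5.0, 500)             # candidate input values

def best_single_input(theta):
    """Input on the grid that maximizes the information for an assumed theta."""
    info = np.array([fisher_info(theta, np.array([x])) for x in x_grid])
    return x_grid[np.argmax(info)]

for theta in (0.5, 1.0, 2.0):
    print(theta, best_single_input(theta))
```

For this model the most informative input is x = 1/theta, so the best design cannot be written down without some knowledge of the parameter being estimated.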
Strategies for Coping with Dependence on $\theta$
• Assume nominal value of $\theta$ and develop an optimal design based on this fixed value
• Sequential design strategy based on an iterated design and model fitting process
• Bayesian strategy where a prior distribution is assigned to $\theta$, reflecting uncertainty in the knowledge of the true value of $\theta$
Sequential Approach for Parameter Estimation and Optimal Design
Step 0 (Initialization) Make initial guess at $\theta$, $\hat{\theta}_0$. Allocate $n_0$ measurements to initial design. Set $k = 0$ and $n = 0$.
Step 1 (D-optimal maximization) Given $X_n$, choose the $n_k$ inputs in $X = X_{n_k}$ to maximize
$$\det\left[ F_n(\hat{\theta}_n, X_n) + F_{n_k}(\hat{\theta}_n, X) \right].$$
Step 2 (Update $\theta$ estimate) Collect $n_k$ measurements based on inputs from step 1. Use measurements to update from $\hat{\theta}_n$ to $\hat{\theta}_{n+n_k}$.
Step 3 (Stop or return) Stop if the value of $\theta$ in step 2 is satisfactory. Else return to step 1 with the new $k$ set to the former $k + 1$ and the new $n$ set to the former $n + n_k$ (updated $X_n$ now includes inputs from step 1).
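The loop in steps 0 to 3 can be sketched for a simple case. The code below assumes a hypothetical scalar model z = exp(-theta * x) + v with Gaussian noise, adds one input per iteration (n_k = 1), picks each input from a grid of candidates, and updates the estimate by least squares over a grid of candidate parameter values. All names, grids, and settings are illustrative assumptions.

```python
import numpy as np

def h(theta, x):
    """Hypothetical nonlinear model output."""
    return np.exp(-theta * x)

def info(theta, x, sigma):
    """Scalar Fisher information contributed by an input x under Gaussian noise."""
    return (x * np.exp(-theta * x)) ** 2 / sigma ** 2

def sequential_design(theta_true, theta0, n_steps=40, sigma=0.02, seed=0):
    rng = np.random.default_rng(seed)
    x_cand = np.linspace(0.05, 5.0, 100)          # candidate inputs
    theta_cand = np.linspace(0.1, 5.0, 491)       # candidate parameter values
    theta_hat, xs, zs = theta0, [], []            # Step 0: initial guess
    for _ in range(n_steps):
        # Step 1: maximize information of past inputs plus the new input,
        # both evaluated at the current estimate (a determinant in the vector case)
        F_n = sum(info(theta_hat, x, sigma) for x in xs)
        x_new = x_cand[np.argmax(F_n + info(theta_hat, x_cand, sigma))]
        # Step 2: collect the measurement and refit by grid-search least squares
        xs.append(x_new)
        zs.append(h(theta_true, x_new) + sigma * rng.standard_normal())
        X, Z = np.array(xs), np.array(zs)
        sse = ((Z[None, :] - h(theta_cand[:, None], X[None, :])) ** 2).sum(axis=1)
        theta_hat = theta_cand[np.argmin(sse)]
    return theta_hat, np.array(xs)                # Step 3: stop after n_steps

theta_hat, xs = sequential_design(theta_true=1.5, theta0=0.5)
print(theta_hat, xs[:5])
```

In the scalar case the accumulated information from earlier inputs does not change which new input is best; it is kept in the code so that the structure mirrors the determinant criterion of step 1.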
Comments on Sequential Design
• Note two optimization problems being solved: one for $\theta$, one for $\xi$
• Determine next $n_k$ input values (step 1) conditioned on current value of $\theta$
– Each step analogous to nonlinear design with fixed (nominal) value of $\theta$
• “Full sequential” mode ($n_k = 1$) updates $\theta$ based on each new input-output pair $(x_k, z_k)$
• Can use stochastic approximation to update $\theta$:
$$\hat{\theta}_{n+1} = \hat{\theta}_n - a_n Y_n(\hat{\theta}_n \mid z_{n+1}, x_{n+1})$$
where
$$Y_n(\theta \mid z_{n+1}, x_{n+1}) = \frac{\partial}{\partial \theta} \left[ \frac{1}{2} \left( z_{n+1} - h(\theta, x_{n+1}) \right)^2 \right]$$
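A single stochastic approximation update can be written directly from the squared-error gradient: differentiating half the squared residual with respect to the parameter gives minus the residual times the model sensitivity. The sketch below assumes the hypothetical model h(theta, x) = exp(-theta * x); the function names and the gain value are illustrative choices.

```python
import numpy as np

def sa_step(theta_hat, x_new, z_new, gain, h, dh_dtheta):
    """One stochastic approximation update from a new input-output pair."""
    residual = z_new - h(theta_hat, x_new)
    Y = -residual * dh_dtheta(theta_hat, x_new)    # gradient of 0.5 * residual**2 w.r.t. theta
    return theta_hat - gain * Y

# Hypothetical model and its sensitivity to theta
h_exp = lambda theta, x: np.exp(-theta * x)
dh_exp = lambda theta, x: -x * np.exp(-theta * x)

# Noise-free measurement generated with theta = 1.5, current estimate 1.0
z_obs = h_exp(1.5, 1.0)
theta_next = sa_step(1.0, x_new=1.0, z_new=z_obs, gain=1.0, h=h_exp, dh_dtheta=dh_exp)
print(theta_next)
```

In repeated use the gain would normally decay with the iteration count (for example proportional to 1/(n + 1)), which is a common choice rather than something prescribed here.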
Bayesian Design Strategy
• Assume prior distribution (density) for $\theta$, $p(\theta)$, reflecting uncertainty in the knowledge of the true value of $\theta$
• There exist multiple versions of D-optimal criterion
• One possible D-optimal criterion:
$$E_\theta\left[ \log\det F_n(\theta, X) \right] = \int \log\det F_n(\theta, X)\, p(\theta)\, d\theta$$
• Above criterion related to Shannon information
• While log transform makes no difference with fixed $\theta$, it does affect integral-based solution
• To simplify integral, may be useful to choose discrete prior $p(\theta)$
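With a discrete prior the integral reduces to a weighted sum, which is easy to evaluate on a grid. The sketch below assumes the hypothetical scalar model z = exp(-theta * x) + v with unit noise variance and a two-point prior, and compares the input chosen by the expected log criterion with the input chosen when the log is omitted. The prior values and names are illustrative.

```python
import numpy as np

def log_info(theta, x):
    """log of the scalar Fisher information x^2 exp(-2 theta x) for one input x."""
    return 2.0 * np.log(x) - 2.0 * theta * x

theta_support = np.array([0.5, 2.0])     # assumed discrete prior support
prior = np.array([0.5, 0.5])             # assumed prior probabilities
x_grid = np.linspace(0.01, 5.0, 500)     # candidate inputs

# Prior-weighted criteria, with and without the log transform
expected_log = sum(p * log_info(t, x_grid) for p, t in zip(prior, theta_support))
expected_raw = sum(p * np.exp(log_info(t, x_grid)) for p, t in zip(prior, theta_support))

x_bayes_log = x_grid[np.argmax(expected_log)]
x_bayes_raw = x_grid[np.argmax(expected_raw)]
print(x_bayes_log, x_bayes_raw)
```

For this model the expected log criterion is maximized at the reciprocal of the prior mean of theta, while the criterion without the log selects a noticeably different input, illustrating that the transform matters once the criterion is averaged over a prior.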
Appendix to Slides for Chapter 17: Factorial Design (not in ISSO; see ref. [1] below)
• Classical experimental design deals with linear models
• Factorial design is most popular classical method
– All r inputs (“factors”) changed at one time (note: ref. [1] uses notation m instead of r)
• Factorial design provides two key advantages over one-at-a-time changes:
1. Greater efficiency in extracting information from given number of experiments
2. Ability to determine if there are interaction effects
• Standard method is $2^r$ factorial; “2” comes about by looking at each input at two levels: low (−) and high (+)
– E.g., if r = 3, then have $2^3 = 8$ input combinations: (− − −), (+ − −), (− + −), (− − +), (+ + −), (+ − +), (− + +), (+ + +)
[1] Spall, J. C. (2010), “Factorial Design for Choosing Input Values in Experimentation: Generating Informative Data for System Identification,” IEEE Control Systems Magazine, vol. 30(5), pp. 38–53.
Appendix to Slides (cont’d):
Factorial Design with 3 Inputs
• Consider r = 3 linear model
$$z_k = \beta_0 + \beta_1 x_{k1} + \beta_2 x_{k2} + \beta_3 x_{k3} + \beta_4 x_{k1} x_{k2} + \beta_5 x_{k1} x_{k3} + \beta_6 x_{k2} x_{k3} + \beta_7 x_{k1} x_{k2} x_{k3} + \text{noise},$$
where $\beta = [\beta_0, \beta_1, \ldots, \beta_7]^T$ represents vector of (unknown) parameters and $x_{ki}$ represents $i$th term in input vector $x_k$
• $2^3$ factorial design allows for efficient estimation of all parameters in $\beta$
• In contrast, one-at-a-time provides no information for estimating $\beta_4$ to $\beta_7$
• However, $2^3$ factorial design must be augmented in some way if wish to add quadratic (e.g., $x_{k1}^2$) or other higher-order polynomial terms to model
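The contrast between the two designs can be checked by building the regression matrix of the eight-term model (intercept, three main effects, three two-way interactions, one three-way interaction) for each design. The sketch assumes coded levels -1 and +1 and a one-at-a-time design consisting of a baseline run plus one run per changed input; the function and variable names are illustrative.

```python
import itertools
import numpy as np

def model_matrix(X):
    """Regressors: 1, x1, x2, x3, x1*x2, x1*x3, x2*x3, x1*x2*x3 for each run (row) of X."""
    x1, x2, x3 = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([np.ones(len(X)), x1, x2, x3,
                            x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])

# 2^3 factorial: all combinations of low (-1) and high (+1)
factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
# One-at-a-time: baseline (all low), then each input raised alone
one_at_a_time = np.array([[-1, -1, -1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float)

H_fact = model_matrix(factorial)
H_oat = model_matrix(one_at_a_time)
rank_fact = np.linalg.matrix_rank(H_fact)
rank_oat = np.linalg.matrix_rank(H_oat)
print(rank_fact, rank_oat)
```

The factorial regression matrix has full rank with mutually orthogonal columns, so all eight parameters are estimable. The one-at-a-time matrix has rank four, so the interaction terms cannot be separated from the intercept and main effects.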
Appendix to Slides (cont’d):
Illustration of Interaction with 2 Inputs
• Example responses for r = 2: no interaction and interaction between input variables
• Left plot (no interaction) shows that change in $z_k$ with change in $x_{k2}$ does not depend on $x_{k1}$; right plot (interaction) shows change in $z_k$ does depend on $x_{k1}$
[Figure: two panels, “No interaction” and “Interaction”, each plotting $z_k$ against $x_{k2}$ with one line for $x_{k1}$ = low and one line for $x_{k1}$ = high; line endpoints are labeled with the level combinations (− −), (− +), (+ −), (+ +)]
Appendix to Slides (cont’d):
Efficiency of Factorial Design for Main Effects
• Factorial design estimates “main effects” (non-interaction) with greater efficiency than one-at-a-time changes
• Plot below based on same accuracy in estimation for the two methods
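[Figure: plot comparing one-at-a-time changes with factorial design as a function of input dimension r (horizontal axis, r = 2 to 16; vertical axis ticks from 1 to 8)]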
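One standard way to quantify the efficiency gain is to count the runs each method needs so that every main effect is estimated as the difference of two averages of the same size. The sketch below uses that textbook accounting, which gives a ratio of (r + 1)/2; this is an assumption about the comparison and may not be the exact quantity plotted on the slide. The function names are illustrative.

```python
def runs_needed(r, n_per_level):
    """Runs required so each main effect compares two averages of n_per_level responses."""
    factorial_runs = 2 * n_per_level              # every run is reused for every factor
    one_at_a_time_runs = (r + 1) * n_per_level    # baseline setting plus one changed setting per factor
    return factorial_runs, one_at_a_time_runs

def relative_efficiency(r):
    """Ratio of one-at-a-time runs to factorial runs for equal main-effect accuracy."""
    # An unreplicated 2^r factorial has 2^(r-1) runs at each level of every factor
    factorial_runs, one_at_a_time_runs = runs_needed(r, n_per_level=2 ** (r - 1))
    return one_at_a_time_runs / factorial_runs

for r in (2, 4, 8, 16):
    print(r, relative_efficiency(r))
```

Under this accounting the advantage of the factorial design grows linearly with the input dimension.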