Ordinal Optimization and Simulation
Yu-Chi Ho
Harvard University, Cambridge, MA 02138
Christos G. Cassandras
Boston University, Boston, MA 02215
Chun-Hung Chen
University of Pennsylvania, Philadelphia, PA 19104-6315
Liyi Dai
Washington University, St. Louis, MO 63130
Abstract
Simulation plays a vital role in analyzing stochastic systems, particularly in comparing
alternative system designs with a view to optimizing system performance. Using simulation to
analyze complex systems, however, can be both prohibitively expensive and time consuming.
Efficiency is therefore a key concern when applying simulation to optimization problems. Ordinal
optimization has emerged as an effective approach to significantly improve the efficiency of
simulation and optimization: one can achieve an exponential convergence rate when using
ordinal optimization for simulation problems, and there are already several success stories of
ordinal optimization. In this paper we introduce the idea of ordinal optimization and report some
recent advances in this research. Details of an extension of ordinal optimization to a class of
resource allocation problems are given as well.
***
Keywords: Discrete-event simulation, ordinal optimization, discrete-event dynamic
systems, resource allocation problem.
***
Corresponding author: Professor Chun-Hung Chen, Dept. of Systems Engineering,
University of Pennsylvania, Philadelphia, PA 19104-6315, USA. Fax: 215-573-2065,
Tel: 215-898-3967, Email: chchen@seas.upenn.edu.
1. Introduction
It can be argued that OPTIMIZATION in the general sense of making things BETTER is the
principal driving force behind any prescriptive scientific and engineering endeavor, be it
operations research, control theory, or engineering design. It is also true that the real world is full
of complex decision and optimization problems that we cannot solve. While the literature on
optimization and decision making is huge, many of the concrete analytical results and success
stories are associated with what may be called Real-Variable-Based methods. The idea of
successive approximation to an optimum (say, minimum) by sequential improvements based on
local information is often captured by the metaphor "skiing downhill in a fog". The concepts of
gradient (slope), curvature (valley), and trajectories of steepest descent (fall line) all require the
notion of derivatives/gradients and are based on the existence of a more or less smooth response
surface. There exist various first and second order algorithms of feasible directions for the
iterative determination of the optimum (minimum) of an arbitrary multi-dimensional response or
performance surface. A considerable number of major success stories exist in this genre
including the Nobel prize winning work on linear programming. It is not necessary to repeat or
even reference these here.
On the other hand, we submit that the reason many real-world optimization problems
remain unsolved is partly due to the changing nature of the problem domain in recent years
which makes calculus- or real-variable-based methods less applicable. For example, a large
number of human-made system problems, such as manufacturing automation, communication
networks, computer performances, and/or general resource allocation problems, involve
combinatorics rather than real analysis, symbols rather than variables, discrete instead of
continuous choices, and synthesizing a configuration rather than proportioning the design
parameters. Such problems are often characterized by the lack of structure, presence of large
uncertainties, and enormously large search spaces. Performance evaluation, optimization, and
validation of such system problems are daunting, if not impossible, tasks. Most analytical tools
such as queueing network theory or Markov Chains fall short due to modeling or computational
limitations. Simulation becomes the general tool of choice or last resort. Furthermore,
optimization for such problems seems to call for a general search of the performance terrain or
response surface as opposed to the “skiing downhill in a fog” metaphor of real variable based
performance optimization1. Arguments for change can also be made on the technological front.
Sequential algorithms were often dictated as a result of the limited memory and centralized
control of earlier generations of computers. With the advent of modern general purpose parallel
computing and the essentially unlimited size of virtual memory, distributed and parallel
procedures on a network of machines can work hand-in-glove with the Search Based method of
performance evaluation. It is one of the theses of this paper to argue for such a complementary
approach to optimization via simulation models.
1 We hasten to add that we fully realize the distinction we make here is not absolutely black and white. A
continuum of problem types exists. Similarly, there is a spectrum of the nature of optimization variables or search
spaces ranging from continuous to integer to discrete to combinatorial to symbolic.
If we accept the need for a search-based method as a complement to analytical
techniques, then we can argue that to quickly narrow the search for optimum performance to a
“good enough” subset in the design universe is more important than to estimate accurately the
values of the system performance during the initial stages of the process of optimization. We
should compare order first and estimate value second, i.e., ordinal optimization comes before
cardinal optimization. Colloquial expressions, such as “ballpark estimate”, “80/20 solution”
and “forest vs. trees”, state the same sentiment. Furthermore, we shall argue that our
preoccupation with the “best” may be only an ideal that will always be unattainable or not cost
effective. Real world solutions to real world problems will involve compromise for the “good
enough”2 . The purpose of this paper is to establish the distinct advantages of the softer approach
of ordinal optimization for the search-based type of problems, to analyze some of its general
properties, and to show the many orders of magnitude improvement in computational efficiency
that are possible under this mindset; finally to also advocate the use of this mindset as an
additional tool in the arsenal of soft computing and a complement to the more traditional cardinal
approach to optimization and engineering design.
In the next section, we formulate the general simulation and optimization problem and
point out the major limitation of traditional simulation approaches. Ordinal optimization is
introduced in Section 3. In Section 4, we present a theoretical analysis showing that ordinal
optimization can achieve an exponential convergence rate. Section 5 gives a budget allocation
technique which can further improve the efficiency of ordinal optimization. In Section 6, we
present several success stories of the application of ordinal optimization. In particular, details of
an extension of ordinal optimization to a class of resource allocation problems and its
application to a network control problem are given. Section 7 concludes this paper.
2. Problem Statement
Suppose we have a complex stochastic system. The performance of this system is conceptually
given by J(θ), where J is the expected performance measure and θ denotes the system design
parameters, which may be continuous, discrete, combinatorial, or even symbolic. A general
problem of stochastic optimization can be defined as

min_{θ∈Θ} J(θ) ≡ E[L(θ, ξ)]     (2.1)

where the search space Θ is an arbitrary, huge, structureless but finite set; θ is the system
design parameter vector; J is the performance criterion, which is the expectation of the sample
performance L, a functional of θ and ξ, a random vector that represents uncertain factors in the
system.
However, there are some basic limitations related to search-based methods which require
the use of simulation to estimate the performance J(θ) in Eq. (2.1). There are two aspects to this
problem: the "stochastic" part and the "optimization" part.
2 Of course, here one is mindful of the business dictum, used by Mrs. Fields for her enormously successful cookie
shops, which says "Good enough never is". However, this can be reconciled via the frame of reference of the
optimization problem.
The "stochastic" aspect has to do with the problem of performing numerical expectation,
since the functional L(θ, ξ) is available only in the form of a complex calculation via simulation.
The standard approach is to estimate E[L(θ, ξ)] by sampling, i.e.,

E[L(θ, ξ)] ≈ Ĵ(θ) = (1/t) Σ_{i=1}^{t} L(θ, ξ_i)     (2.2)
where i represents the i-th sample of . The ultimate accuracy (confidence interval) of this
estimate cannot be improved upon faster than 1/t1/2, the result of averaging i.i.d. noise. This rate
can be quite adequate or even very good for many problems but not good enough for the class of
complex simulation-based performance evaluation problems mentioned above. Here each sample
of L(i) requires one simulation run which for the type of complex stochastic systems we are
interested in here can take seconds or minutes even on the fastest computer not to mention the
setup cost of programming the simulation. Millions or billions of required samples of L(i) for
different values of  may become an infeasible burden. Yet there is no getting away from this
fundamental limit of 1/t1/2 , i.e. one order of magnitude increase in accuracy requires two orders
of magnitude increase in sampling cost.
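The 1/t^{1/2} limit is easy to see numerically. The following minimal sketch (the setup is our own: a true performance of 1.0 observed through unit-variance Gaussian noise) measures the spread of the sample-mean estimator at two sample sizes and shows that quadrupling the samples only halves the error.

```python
import random

def mean_estimate(t, rng):
    # average t i.i.d. noisy samples of a performance whose true value is 1.0
    return sum(1.0 + rng.gauss(0.0, 1.0) for _ in range(t)) / t

def empirical_se(t, reps, rng):
    # spread of the estimator across independent replications
    means = [mean_estimate(t, rng) for _ in range(reps)]
    m = sum(means) / reps
    return (sum((x - m) ** 2 for x in means) / reps) ** 0.5

rng = random.Random(1)
se_100 = empirical_se(100, 2000, rng)
se_400 = empirical_se(400, 2000, rng)
print(se_100 / se_400)  # close to 2: four times the samples buys twice the accuracy
```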
The second limitation has to do with the "optimization" part. Traditionally, analysis again
plays an important part in numerical optimization whether it is convergence, gradient
computation, or general hill climbing. Again these tools, such as infinitesimal perturbation
analysis (IPA), take advantage of the structure and real-variable nature of the problem to devise
effective algorithms for optimization. However, when we exhaust such advantages and the
remaining problem becomes structureless and Θ is totally arbitrary as above, we encounter the
limitation introduced by combinatorial explosion, the so-called NP-hard limitation3. In such
cases, a more or less blind search is the only alternative. While it is often easy to prove
probabilistic convergence of random search, it is very hard to improve the search efficiency. An
example will illustrate this:
Suppose  is of size 1010,which is small by combinatorial standards. We are allowed to
take 10,000 samples of J() uniformly in A natural question is what the probability that at
least one of these samples will belong in the top 50, top 500, or top 5000 of the designs of  is,
i.e.,
P{at least one of the 10,000 sample is in the top-g} = 1 - (1-
g 10,000
)
1010
(2.3)
which turns out to be 0.00004999, 0.0004998, and 0.004988 respectively. This means that for a
simulation based optimization problem of any complexity, a simple random search is not an
effective approach — too much computation for too little return4.
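The numbers quoted after Eq. (2.3) can be checked directly; this is a minimal sketch using the values from the example (|Θ| = 10^10 and 10,000 uniform samples):

```python
def p_top_g(g, n_designs=10**10, samples=10_000):
    # probability that at least one uniform sample lands in the top-g set (Eq. 2.3)
    return 1.0 - (1.0 - g / n_designs) ** samples

for g in (50, 500, 5000):
    print(g, p_top_g(g))
```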
3 For example, in the case of a single-input single-output static decision rule, the number of possibilities is the size of
the output possibilities raised to the power of the size of the input possibilities. This is an extremely large number.
4 To make matters worse, the No-Free-Lunch Theorem of Wolpert and Macready (IEEE Transactions on Evolutionary
Computation, Vol. 1, 1997) says that without structural information no algorithm can do better on the average than
blind search.
These two difficulties, the 1/t^{1/2} limit and the NP-hard limitation, are basic for general
stochastic optimization problems of the type we are interested in. Each one can induce an
insurmountable computational burden. To make matters worse, the effect of these limitations is
multiplicative rather than additive. No amount of theoretical analysis can help once we are faced
with these limitations. This is where ordinal optimization comes in.
3. Ordinal Optimization
The idea of OO is to effect a strategic change of goals5. It is based on two tenets:
(a) "Order" converges exponentially fast while "value" converges at rate 1/t^{1/2}. In other
words, it is much easier to determine whether or not alternative A is better than B than to
determine "A − B = ?". This is intuitively reasonable if we think of the problem of holding
identical-looking packages in each hand and trying to determine which package weighs more
vs. estimating the difference in weight between the two packages. More quantitatively, suppose
we observe A and B through i.i.d. zero-mean additive normal noises. The chance of mis-ordering
A and B is roughly proportional to the area under the overlapping tails of the two density
functions centered at A and B, as illustrated below.
[Figure: two overlapping noise density functions, one centered at A ("A plus perturbing noise") and one centered at B ("B plus perturbing noise").]
This notion can be made more precise. If A and B plus noise are the performance value estimates
of two simulation experiments, then under the best conditions the estimates cannot converge to
the true values faster than 1/t^{1/2}, where t is the length or number of replications of the
simulation experiments. But the relative order of A vs. B converges exponentially fast (Dai 1996,
Xie 1998).
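For the zero-mean additive normal noise case just described, the misordering probability has a closed form: the difference of two t-sample means with true gap δ = B − A and per-sample variance σ² is normal with mean −δ and variance 2σ²/t. The small sketch below (our own notation; δ = 0.5 and σ = 1 are illustrative choices) contrasts its Gaussian-tail decay, which is exponential in t, with the 1/t^{1/2} decay of the value error:

```python
import math

def phi(x):
    # standard normal CDF
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def misorder_prob(delta, t, sigma=1.0):
    # P{ t-sample mean of A exceeds that of B } when A is better by delta;
    # the difference of the two sample means is N(-delta, 2 * sigma**2 / t)
    return phi(-delta * math.sqrt(t) / (sigma * math.sqrt(2.0)))

for t in (1, 25, 100):
    # misordering probability collapses far faster than the 1/sqrt(t) value error
    print(t, misorder_prob(0.5, t), 1.0 / math.sqrt(t))
```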
Furthermore, the value or cardinal estimation may not always be necessary. In many
cases, we simply try to determine which is the better alternative without necessarily knowing the
detailed consequence of what each alternative entails, i.e., we emphasize the choice (order)
rather than estimating the utility (value) of the choices.
(b) Goal softening eases the computational burden of finding the optimum. Traditional
optimization determines a single alternative in  and hopes that it coincides with the true
optimum. The metaphor of hitting one speeding bullet with another comes to mind. Instead if we
5 This is a euphemism for retreat.
are willing to settle for the set of "good enough" alternatives with high probabilities, e.g., any of
the top-n% of the choices for 95% of the time, then this probability improves dramatically with
the size of these sets (i.e., n). It is like trying to hit a truck with a shotgun.
Let us formalize this notion a bit. Suppose we define a subset of the design space Θ as the
"good enough" subset G, which could be the top-g designs or the top-n% (n ≡ 100g/N) of the
sampled set of N designs. We denote the size of the subset G as |G| ≡ g. We now pick out
(blindly or by some rule) another subset, called the selected subset S, with the same cardinality,
i.e., |S| = |G| = g6. We now ask the question: what is the probability that the set S contains at
least k (≤ g) of the members of G, i.e., P{|G∩S| ≥ k}?
This question is relevant since our rule for picking the set S clearly affects how many
good designs we get in S. In the case of blind picking (i.e., a purely random selection of S), we
expect the overlap to be smaller than the result of selecting based on some estimated order of the
alternatives. We define P{|G∩S| ≥ k} as the alignment probability, which is a measure of the
goodness of our selection rules. Fig. 3.1 below illustrates the concept. For perspective, we have
also indicated the traditional optimization view, where the subsets G and S are both singletons.
Figure 3.1. Generalized concept for Optimization.
For the case of blind picking, which can be considered as observing the relative orders
through noise having infinite variance (i.e., any design is as likely as any other to be picked as
good enough), this alignment probability is given by Ho et al. (1992):

P{|G∩S| ≥ k} = Σ_{i=k}^{g} [ C(g, i) C(N−g, g−i) ] / C(N, g)     (3.1)

where C(m, j) denotes the binomial coefficient "m choose j".
For example, if N = 200, g = 12, and k = 1, then P ≡ P{|G∩S| ≥ k} > 0.5. When N = 200, g = 32,
and k = 1, P is almost 1. We can easily calculate P for other values of N, g, and k; one example,
for N = 1000, is illustrated in Figure 3.2 below.
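Eq. (3.1) is a hypergeometric tail and is easy to evaluate exactly; the following sketch (our own code, using Python's math.comb) reproduces the N = 200 numbers quoted above:

```python
from math import comb

def align_prob(N, g, k):
    # P{|G intersect S| >= k} for a blindly picked S with |S| = |G| = g  (Eq. 3.1)
    return sum(comb(g, i) * comb(N - g, g - i) for i in range(k, g + 1)) / comb(N, g)

print(align_prob(200, 12, 1))  # a bit above 0.5
print(align_prob(200, 32, 1))  # nearly 1
```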
6 Note that |S| and |G| need not be equal, in which case a slightly more general form of Eq. (3.1) below results.
[Figure: curves of the alignment probability P (vertical axis, 0 to 1.0) vs. g (horizontal axis, 10 to 200), one curve for each k = 1, 2, . . ., 10.]
Figure 3.2. P vs. g parameterized by k=1,2, . . ., 10 for N=1000
Once again this shows that by blindly picking some 50-130 choices out of 1000 we can
get quite high values for P. Remember these curves are for blind picks. On the other hand, if we
don’t pick blindly, i.e., the variance of the estimation error is not infinite and estimated good
designs have a better chance at being actual good designs, then we should definitely do better.
Suppose now we no longer pick designs blindly, but pick according to their estimated
performance, however approximate it may be; e.g., the selected subset S can be the estimated
top-g or top-n% of the designs7. It turns out we can no longer determine P in closed form.
But P can be easily calculated experimentally via a simple Monte Carlo experiment as a
function of the noise variance σ², N, g, and k, i.e., the response surface P{|G∩S| ≥ k; σ², N}. For
example, we can assume a particular form for J(θ), e.g., J(θ_i) = i, in which case the best design is
θ_1 (we are minimizing), the worst design is θ_N, and the ordered performances are linearly
increasing. For any finite i.i.d. estimation noise w, we can implement Ĵ(θ) = J(θ) + w and
directly check the alignment between the observed and true order.
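The Monte Carlo experiment just described can be sketched as follows (an illustrative implementation, not the authors' code; designs are indexed from 0): take J(θ_i) = i, add i.i.d. N(0, σ²) noise, select the estimated top-g as S, and count how often |G∩S| ≥ k:

```python
import random

def estimated_alignment(N, g, k, sigma, reps=2000, seed=0):
    # Estimate P{|G intersect S| >= k} when S is the estimated top-g
    # under additive N(0, sigma**2) noise and J(theta_i) = i.
    rng = random.Random(seed)
    true_good = set(range(g))          # designs 0 .. g-1 are the true top-g
    hits = 0
    for _ in range(reps):
        noisy = sorted(range(N), key=lambda i: i + rng.gauss(0.0, sigma))
        if len(true_good.intersection(noisy[:g])) >= k:
            hits += 1
    return hits / reps

print(estimated_alignment(200, 12, 1, sigma=5.0))    # small noise: near 1
print(estimated_alignment(200, 12, 1, sigma=200.0))  # heavy noise: still above blind picking
```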
More generally, this alignment probability can be made precise and quantitative (Lau and
Ho 1997, Lee et al. 1998) and shown to be exponentially increasing with respect to the size of G
and S. Consequently, by combining the two basic ideas (a) and (b) of OO in this section, several
orders of magnitude of easing of the computational burden can be achieved.
In fact, some special cases of the alignment probability are also known as the probability of
correct selection, or P{CS}, in the parlance of the simulation literature. There exists a large
literature on assessing P{CS} based on classical statistical models. Goldsman and Nelson (1994)
provide an excellent survey on available approaches to ranking, selection, and multiple
7 This can be characterized as the Horse Race selection rule, where we estimate the performance of ALL designs and
pick the estimated top-g designs.
comparisons (e.g., Goldsman, Nelson, and Schmeiser 1991, Gupta and Panchapakesan 1979). In
addition, Bechhofer et al. (1995) provide a systematic and more detailed discussion on this issue.
However, most of these approaches are only suitable for problems with a small number of
designs (e.g., Goldsman and Nelson (1994) suggest 2 to 20 designs). The question "Isn't OO just
reinventing the wheel of ranking and selection literature in statistics?" may occur to the reader at
this point. First, there is a scale difference, since in OO we are dealing with design space of size
billions times billions while R&S typically deal with tens of choices. Secondly, OO does not
address questions such as the distance between the best and the rest, a cardinal notion; or
whether or not the observed order is the maximum likelihood estimate of the actual order, an
elegant concept in Ranking and Selection literature but not very useful in the OO context. Also,
the notion of softening which is a main tenet for OO is not present in R&S literature. Lastly, OO
is complementary to other tools of computational intelligence such as fuzzy logic and genetic
algorithms. In genetic algorithms, one must evaluate the fitness of a population of candidate
designs. However, it is implicitly assumed in almost all GA literature that this evaluation is
simple to carry out. For the complex simulation problems we have in mind here, this is not the
case. Thus, taking advantage of the ability of OO to quickly separate the good from the bad, one
can extend the applicability of GA to problems heretofore not computationally feasible. Similarly,
in OO we have defined concepts such as "good enough" or high alignment probability as crisp
notions. It is straightforward to fuzzify these definitions and apply OO to fuzzy systems
problems. In other words, "softening" and "fuzzification" are distinct and complementary
concepts. Much remains to be done.
In the next section, we will show the fast convergence of ordinal optimization
theoretically.
4. Convergence of Ordinal Comparison
In this section, we provide a theoretical investigation of the efficiency of ordinal comparison.
Without loss of generality, we say that design θ_i is better than θ_j if J(θ_i) < J(θ_j). For
notational simplicity, in this section we assume that the N designs are indexed in such a way that
−∞ < J(θ_1) ≤ J(θ_2) ≤ . . . ≤ J(θ_N) < ∞. We are interested in finding a design in G, the set of
the M, 1 ≤ M ≤ N, true best designs in Θ. Let H = Θ − G. We use Ĵ(θ, t) to denote an estimate of
J(θ) based on a simulation of length t. In order to characterize the rate of convergence of ordinal
comparison, we define the following indicator process:

I(t) = 1 if min_{θ∈G} Ĵ(θ, t) < min_{θ∈H} Ĵ(θ, t), and I(t) = 0 otherwise.

Then I(t) is equal to 1 if the observed best design is a good design in G and equal to 0 otherwise;
as a function of t, it indicates when the observed best design is one of the good designs. We
next examine the convergence of P{I(t) = 1} and P{I(t) = 0} as a function of the simulation
length t. The convergence rate of I(t) will be taken as that of P{I(t) = 1} or P{I(t) = 0}.
A lower bound
In many cases, such as finite-state Markov chains and averaging i.i.d. random variables, the
variance of an estimate Ĵ(θ, t) decays to zero at rate O(1/t). Such a decay rate of the variance
can be used to derive a lower bound on the rate of convergence of the indicator process, which
is stated precisely in the following theorem.
Theorem 4.1 (Dai 1996). Assume that
(A4.1) The stochastic system under simulation is ergodic in the sense that

lim_{t→∞} L(θ, ξ, t) = J(θ), a.s.

for any θ ∈ Θ, and that the variance of the estimate Ĵ(θ, t) decays to zero at rate O(1/t) for all θ
∈ Θ, i.e.,

Var(Ĵ(θ, t)) = O(1/t).

Then

P{I(t) = 1} = 1 − O(1/t),  P{I(t) = 0} = O(1/t);

in other words, the indicator process I(t) converges to 1 at a rate of at least O(1/t).
#
Assumption (A4.1) is very mild and is satisfied by most stochastic systems. This
assumption guarantees that J(θ) can be estimated with arbitrary accuracy by increasing the
simulation duration of any one sample trajectory.
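The indicator process is easy to explore empirically. The sketch below uses our own toy setup (20 designs with linearly spaced means and i.i.d. unit-variance noise, so the t-sample mean of design i is distributed N(0.1·i, 1/t)) to estimate P{I(t) = 1} when G consists of the g = 2 true best designs:

```python
import random

def p_indicator(true_means, g, t, reps=1000, seed=7):
    # Estimate P{I(t) = 1}: the smallest sample mean belongs to one of the
    # g truly best designs; the t-sample mean of a design with true mean m
    # under unit-variance noise is drawn directly as N(m, 1/t).
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        est = [m + rng.gauss(0.0, 1.0) / t ** 0.5 for m in true_means]
        if min(range(len(est)), key=est.__getitem__) < g:
            hits += 1
    return hits / reps

means = [0.1 * i for i in range(20)]   # design 0 is best; G = {0, 1}, g = 2
for t in (1, 10, 100):
    print(t, p_indicator(means, g=2, t=t))
```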
Exponential convergence
Theorem 4.1 gives a very conservative bound for general cases. In many situations, we know
much more about the structure and distribution of X(θ, t). Quite often, the rate of convergence
can be proven to be faster than that in Theorem 4.1. In particular, we consider the case of
simulating general regenerative systems.
Definition 4.2 (Kalashnikov 1994, p. 19). A process {X(θ, t)} is said to be regenerative (in the
classic sense) if there is an increasing sequence of nonnegative finite random times {τ_i(θ), i ≥ 0}
such that for each i:
(i) {X(θ, τ_i(θ)+t), τ_k(θ) − τ_i(θ), t ≥ 0, k ≥ i} are identically distributed;
(ii) {X(θ, τ_i(θ)+t), τ_k(θ) − τ_i(θ), t ≥ 0, k ≥ i} does not depend on {X(θ, t), τ_j(θ), t ≤ τ_i(θ),
0 ≤ j ≤ i}.
#
The sequence {τ_i(θ), i ≥ 1} is a sequence of regeneration points. We consider two types
of performance measures:

Ĵ_1(θ, t) = (1/t) ∫_0^t φ(X(θ, ξ, u)) du  and
Ĵ_2(θ, t) = [ Σ_{i=1}^{K(t,θ)} ∫_{τ_{i−1}(θ)}^{τ_i(θ)} φ(X(θ, ξ, u)) du ] / [ Σ_{i=1}^{K(t,θ)} T_i(θ) ]     (4.1)

where φ(x) ∈ R, T_i(θ) = τ_i(θ) − τ_{i−1}(θ), i ≥ 1, is the length of the i-th regeneration cycle, and

K(t, θ) = max{i : τ_i(θ) ≤ t, i ≥ 0}

is the number of regeneration cycles in [0, t]. K̄(t, θ) = K(t, θ) + 1 is the first passage time
process. For convenience, we define T_0(θ) = τ_0(θ). Performance measures of the form (4.1)
frequently appear in simulation. The two are essentially identical except for the tail terms.
However, these tail terms typically converge at rate O(1/t), slower than the rate of ordinal
comparison, as we shall show. Therefore, such tail terms have to be taken into consideration.
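As an illustration of the ratio form in (4.1), consider a toy on-off regenerative process (our own example, not from the paper): exponential on-periods (mean 1) alternate with exponential off-periods (mean 2), φ is the indicator of being on, and one on-off pair is a regeneration cycle, so the long-run time average is E[on]/(E[on] + E[off]) = 1/3. The sketch forms the ratio estimator over the K(t) complete cycles in [0, t], discarding the tail term:

```python
import random

def regen_ratio_estimate(horizon, rng):
    # Ratio estimator of Eq. (4.1): per-cycle reward (time spent "on")
    # summed over complete cycles, divided by the summed cycle lengths.
    t = reward = time_in_cycles = 0.0
    while True:
        on = rng.expovariate(1.0)   # "on" period, mean 1.0; reward accrues here
        off = rng.expovariate(0.5)  # "off" period, mean 2.0
        if t + on + off > horizon:
            break                   # discard the incomplete tail cycle
        t += on + off
        reward += on
        time_in_cycles += on + off
    return reward / time_in_cycles

rng = random.Random(3)
print(regen_ratio_estimate(10_000.0, rng))  # near the true value 1/3
```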
For every θ ∈ Θ, we assume that (A4.1) holds and:
(A4.2) The cycle time T_1(θ) has a finite, continuous moment generating function E[e^{s T_1(θ)}]
in a neighborhood of s = 0;
(A4.3) The cycle time T_1(θ) is not degenerate, in the sense that lim_{t→0} P{T_1(θ) ≤ t} < 1;
(A4.4) The function φ(x) ∈ R is bounded: |φ(x)| ≤ B, 0 < B < ∞.
Assumptions (A4.2) and (A4.4) are technical and are satisfied in many cases (Dai 1996).
Assumption (A4.3) is very mild. Note that (A4.2) and (A4.3) imply 0 < E[T_1(θ)] < ∞.
Regenerative systems constitute a large class of systems. For regenerative systems, the rate of
convergence of ordinal comparison is exponential, as stated in the following theorem.
Theorem 4.3 (Dai 1996). Suppose (A4.1)-(A4.4) hold. Then there exists a positive number α > 0
such that

P{I(t) = 1} = 1 − O(e^{−αt}),  P{I(t) = 0} = O(e^{−αt}),

for both of the performance measures Ĵ_1(θ, t) and Ĵ_2(θ, t); in other words, the indicator process
converges at an exponential rate.
#
Relaxation of assumptions (A4.1)-(A4.4) in terms of large deviation principles is
provided in Dai and Chen (1997). The constant α in the rate of convergence is given by the
following expression:

α = − inf_{s∈[0,∞)} lim_{t→∞} (1/t) log E[ e^{st (min_{θ∈G} Ĵ(θ, t) − min_{θ∈H} Ĵ(θ, t))} ].
5. Computing Budget Allocation for Further Efficiency Improvement
While ordinal optimization could significantly reduce the computational cost for design
selection, there is potential to further improve its performance by intelligently determining the
number of simulation samples or replications among different designs. Intuitively, to ensure a
high alignment probability, a larger portion of the computing budget should be allocated to those
designs which are potentially good designs. In other words, a larger number of simulations must
be conducted with those critical designs in order to reduce estimator variance. On the other hand,
limited computational effort should be expended on non-critical designs that have little effect on
identifying the good designs, even if they have large variances. In doing so, less computational
effort is spent on simulating non-critical designs and more computational effort is spent on
simulating critical designs; hence, the overall simulation efficiency is improved. Ideally, we want
to optimally choose the number of simulation samples for all designs to maximize simulation
efficiency with a given computing budget. In this section, we present a technique called optimal
computing budget allocation (OCBA) which makes use of this idea.
Let t be the number of simulation samples of design . If simulation is performed on a
sequential computer and the difference of computation costs of simulating different designs is
negligible, the total computation cost can be approximated by   t . The goal is to choose t
for all  such that the total computation cost is minimized, subject to the restriction that the
confidence level defined by alignment probability is greater than some satisfactory level.
min   t
t

s.t. alignment probability  P*.
(5.1.a)
where P* is a user-defined confidence level requirement. Or one may intend to choose t such
that the alignment probability is maximized, subject to a limited computing budget T.
max [alignment probability]
t
s.t.   t = T.
(5.1.b)
Three major issues in solving (5.1) are the following:
- There is no closed-form expression for the alignment probability.
- The alignment probability at t_θ, θ ∈ Θ, can be computed only after all t_θ simulation
samples for all designs are performed.
- The t_θ, θ ∈ Θ, are integers, and the number of combinations of different t_θ is large even
for a moderate size of Θ.
To solve the budget allocation problem in (5.1), the alignment probability must be easy to
estimate. An extra Monte Carlo simulation for estimating the alignment probability would be too
expensive in this setting. On the other hand, most existing ranking and selection approaches are
only suitable for problems with a small number of designs (e.g., Goldsman and Nelson (1994)
suggest 2 to 20 designs). Using a Bayesian model, Chen (1996) gives an estimation technique to
quantify the confidence level for ordinal comparison when the number of designs is large. In
addition to the confidence probability, this approach also provides sensitivity information for
each design, which is useful when solving the allocation problem in (5.1). It is
convenient to let (1), (2), …, (N) denote the indices (order statistics) such that

(1/t_(1)) Σ_{i=1}^{t_(1)} L(θ_(1), ξ_i) ≤ (1/t_(2)) Σ_{i=1}^{t_(2)} L(θ_(2), ξ_i) ≤ … ≤ (1/t_(N)) Σ_{i=1}^{t_(N)} L(θ_(N), ξ_i).     (5.2)
Under this notation, design (1) has the smallest sample mean and is the observed best design. We
have the following estimator of P{CS} from Chen (1996).
Theorem 5.1. Let J̃_θ denote the random variable whose probability distribution is the posterior
distribution of the expected performance of design θ under a Bayesian model. Assume J̃_i and
J̃_j, i ≠ j, are independent (i.e., the simulations for different designs are independent). For a
minimization problem,

P{|G∩S| ≥ k} ≥ ∏_{j=|G|+|S|−k+1}^{N} P{J̃_(1) < J̃_(j)} ≡ Approximate Probability of Correct Selection (APCS).
#
Under the Bayesian model, the posterior distribution p(J̃_θ) combines information from both the
prior distribution and the simulation results {L(θ, ξ_i), i = 1, 2, ..., t_θ}. Furthermore, under a
mild normality assumption, if the variance σ_θ² is known (Bernardo and Smith 1994),

J̃_θ ~ N( (1/t_θ) Σ_{i=1}^{t_θ} L(θ, ξ_i), σ_θ²/t_θ ).

If the variance is unknown, σ_θ² can be replaced by the sample variance, and J̃_θ becomes
t-distributed (Chick 1997). For ease of explanation, we will concentrate on the case |S| = |G|
= k = 1 in the rest of this section. In this case,

APCS = ∏_{j=2}^{N} P{J̃_(1) < J̃_(j)}
     = ∏_{j=2}^{N} Φ( [ (1/t_(j)) Σ_{i=1}^{t_(j)} L(θ_(j), ξ_i) − (1/t_(1)) Σ_{i=1}^{t_(1)} L(θ_(1), ξ_i) ] / [ σ_(j)²/t_(j) + σ_(1)²/t_(1) ]^{1/2} )     (5.3)
where  is the standard normal cumulative distribution. The APCS in (5.3) gives an estimate of
the probability that the selected observed best design is indeed the true best design. Note that the
computation of APCS is simply a product of pairwise comparison probabilities, which is easy to
compute.
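A sketch of evaluating (5.3), assuming known variances; the sample means, variances, and run counts below are illustrative inputs of our own:

```python
import math

def phi(x):
    # standard normal CDF
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def apcs(sample_means, sample_vars, t):
    # Eq. (5.3): product of pairwise probabilities that the observed best
    # design (smallest sample mean) beats each of the other designs.
    order = sorted(range(len(sample_means)), key=sample_means.__getitem__)
    b = order[0]                       # observed best, design (1)
    p = 1.0
    for j in order[1:]:
        gap = sample_means[j] - sample_means[b]
        sd = math.sqrt(sample_vars[j] / t[j] + sample_vars[b] / t[b])
        p *= phi(gap / sd)
    return p

means = [1.0, 1.5, 2.0, 3.0]
variances = [4.0, 4.0, 4.0, 4.0]
runs = [30, 30, 30, 30]
print(apcs(means, variances, runs))
```

More simulation runs shrink every pairwise standard deviation, so APCS increases with the t values.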
Another critical issue in solving (5.1) is a cost-effective method to estimate the sensitivity
information, or more specifically, an estimate of the new P{CS} if Δ additional simulation
samples are performed on design θ.
Definition 5.2. Define EPCS(t_1, t_2, .., t_{θ−1}, t_θ+Δ, t_{θ+1}, .., t_N) as the estimated P{CS}
if Δ additional simulation samples are performed on design θ. EPCS is computed using the
statistical information after t_1, t_2, .., t_N simulations are completed for designs 1, .., N,
respectively.
Let J̃_θ ~ N( (1/t_θ) Σ_{i=1}^{t_θ} L(θ, ξ_i), σ_θ²/(t_θ+Δ) ), i.e., the posterior with the variance
reduced as if the Δ additional samples had been taken. If the observed sample means are ranked
as in (5.2), Chen et al. (1996) give a simple and effective way to estimate the EPCS:

If θ ≠ θ_(1), EPCS(t_1, t_2, .., t_{θ−1}, t_θ+Δ, t_{θ+1}, .., t_N) ≈ P{J̃_(1) < J̃_θ} ∏_{j=2, (j)≠θ}^{N} P{J̃_(1) < J̃_(j)}, and

If θ = θ_(1), EPCS(t_1, t_2, .., t_{θ−1}, t_θ+Δ, t_{θ+1}, .., t_N) ≈ ∏_{j=2}^{N} P{J̃_(1) < J̃_(j)}.

Design (1) is the observed best design, which has the smallest sample mean. Note that the
expression for EPCS is similar to that of APCS in Eq. (5.3) and is easy to compute. EPCS
provides sensitivity information about how APCS will change if Δ additional simulation samples
are performed on design θ.
Since the estimation of the alignment probability and its sensitivity information is
feasible, we are able to deal with the budget allocation problem in (5.1). Our approach builds on
the idea that the computing budget should be allocated in a way that best improves the overall
confidence level. This approach for the problem in (5.1.a) is outlined as follows:
Step 0. Perform simulations to obtain t_0 samples for each design; l \leftarrow 0; t_1^l = t_2^l = \cdots = t_N^l = t_0.
Step 1. If APCS(t_1^l, t_2^l, \ldots, t_N^l) \geq P^*, stop.
Step 2. Increase the computing budget (i.e., the number of additional simulations) by \Delta and
allocate this budget across the candidate designs by solving the subproblem:
\max_{\delta_1^l, \ldots, \delta_N^l} EPCS(t_1^l + \delta_1^l, t_2^l + \delta_2^l, \ldots, t_N^l + \delta_N^l), s.t. \sum_{i=1}^{N} \delta_i^l = \Delta and \delta_i^l \geq 0 for all i.
Step 3. Perform simulations to obtain \delta_i^l additional samples for design i, i = 1, \ldots, N;
t_i^{l+1} = t_i^l + \delta_i^l for i = 1, \ldots, N; l \leftarrow l + 1. Go to Step 1.
For the maximization problem in (5.1.b), the above algorithm still applies except that the
stopping criterion in Step 1 must be replaced by "If \sum_{i=1}^{N} t_i \geq T, stop." A final issue in
solving the budget allocation problem in (5.1) is the development of an efficient and effective
procedure for solving the subproblem in Step 2. We have explored two major approaches: a)
solving the nonlinear program directly, and b) replacing the objective function with an
approximation and then solving it analytically. The two approaches are briefly described as
follows:
Nonlinear Programming Approach
Here we assume the variables are continuous and apply nonlinear programming techniques
(Luenberger 1984, Nash and Sofer 1996) to obtain an approximate solution to the subproblem in
Step 2. The gradient of APCS with respect to t_\ell can be estimated by the forward difference
\frac{\partial APCS}{\partial t_\ell} \approx \frac{EPCS(t_1, \ldots, t_{\ell-1}, t_\ell + \delta, t_{\ell+1}, \ldots, t_N) - APCS(t_1, \ldots, t_N)}{\delta}.
Chen et al. (1997) develop an effective OCBA technique by integrating a simple steepest-descent
method. Numerical tests show that OCBA can significantly reduce simulation time as
compared with the well-known computing budget allocation techniques such as the two-stage
procedures given in Rinott (1978) and Dudewicz and Dalal (1975).
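The finite-difference gradient above is straightforward to code. In this sketch (our own illustration) EPCS is evaluated by enlarging t_\ell while holding the sample mean and variance fixed, as in the definition of EPCS.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def apcs(means, variances, t):
    # product of pairwise win probabilities of the observed best design
    b = min(range(len(means)), key=lambda i: means[i])
    p = 1.0
    for j in range(len(means)):
        if j != b:
            scale = math.sqrt(variances[b] / t[b] + variances[j] / t[j])
            p *= norm_cdf((means[j] - means[b]) / scale)
    return p

def apcs_gradient(means, variances, t, delta=1):
    """Forward-difference gradient of APCS in each t_ell:
    (EPCS with t_ell + delta  -  APCS) / delta."""
    base = apcs(means, variances, t)
    grad = []
    for ell in range(len(t)):
        t_plus = list(t)
        t_plus[ell] += delta   # EPCS: same means/variances, larger sample count
        grad.append((apcs(means, variances, t_plus) - base) / delta)
    return grad
```

Since extra samples can only tighten the pairwise comparisons in this model, every component of the gradient is non-negative; a steepest-descent-style step then directs budget toward the largest components.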
Objective Function Approximation Approach
The nonlinear programming approach needs an extra computation cost to iteratively search the
solution to the subproblem in Step 2. Another approach which we have exploited is to replace the
objective function with an approximation and then find an analytical solution to the
approximation problem. The advantage is that there is no need to iteratively apply a nonlinear
programming technique in Step 2 to find an optimal solution. Chen et al. (1998c) approximate EPCS
using Chernoff bounds. An asymptotic solution can be found and is summarized in the
following theorem.
Theorem 5.3. Given a total simulation budget T to be allocated to N competing
designs, the Approximate Probability of Correct Selection (APCS) can be asymptotically
maximized when
(a) \frac{t_{(1)}}{t_{(2)}} = \frac{\sigma_{(1)}}{\sigma_{(2)}} \left[ \sum_{j=1, j \neq (1)}^{N} \frac{\delta_{(1),(2)}^2}{\delta_{(1),(j)}^2} \right]^{1/2},
(b) \frac{t_\ell}{t_{(2)}} = \left( \frac{\sigma_\ell \, \delta_{(1),(2)}}{\sigma_{(2)} \, \delta_{(1),\ell}} \right)^2 for \ell = 1, \ldots, N and \ell \neq (1), (2),
where \delta_{i,j} = \frac{1}{t_i} \sum_{s=1}^{t_i} L(\theta_i, s) - \frac{1}{t_j} \sum_{s=1}^{t_j} L(\theta_j, s) is the difference between the sample means of designs i and j.
#
The budget allocation problems for some special cases have already been studied by
other researchers. It is worthwhile to compare our results in Theorem 5.3 with some well-known
solutions. Remarks 5.4 and 5.5 show that our approach achieves the same results as those
earlier approaches under the required special conditions. Note that our approach is also
applicable to much more general problems.
Remark 5.4. In the case of N = 2, Theorem 5.3 yields
\frac{t_1}{t_2} = \frac{\sigma_1}{\sigma_2} \left[ \frac{\delta_{1,2}^2}{\delta_{1,2}^2} \right]^{1/2} = \frac{\sigma_1}{\sigma_2}.
#
This result is identical to the well-known optimal allocation solution for N = 2.
Remark 5.5. Assume that (a) \delta_{(1),\ell} = \delta for \ell = 1, 2, \ldots, N and \ell \neq (1), and
(b) \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_N^2.
In this special case, Theorem 5.3 obtains t_{(1)} = t_\ell (N-1)^{1/2} for \ell = 1, 2, \ldots, N and \ell \neq (1), which is
the same as the results in Bechhofer (1969) and Dunnett (1955).
#
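The allocation of Theorem 5.3 is easy to evaluate numerically. The sketch below is our own reading of parts (a) and (b), converted into normalized budget proportions; the usage checks mirror Remarks 5.4 and 5.5.

```python
import math

def ocba_proportions(means, stds):
    """Budget proportions per Theorem 5.3: part (b) fixes t_ell / t_(2) for
    ell != (1),(2); part (a) fixes t_(1) / t_(2); normalize to sum to 1."""
    N = len(means)
    order = sorted(range(N), key=lambda i: means[i])
    b, s = order[0], order[1]                      # observed best, second best
    d = [means[i] - means[b] for i in range(N)]    # delta_{(1),i}
    r = [0.0] * N
    r[s] = 1.0
    for i in range(N):
        if i not in (b, s):
            r[i] = (stds[i] * d[s] / (stds[s] * d[i])) ** 2        # part (b)
    r[b] = (stds[b] / stds[s]) * math.sqrt(
        sum((d[s] / d[j]) ** 2 for j in range(N) if j != b))       # part (a)
    total = sum(r)
    return [x / total for x in r]
```

For N = 2 this reduces to t_1/t_2 = sigma_1/sigma_2 (Remark 5.4); with equal variances and equal gaps, the observed best receives (N-1)^{1/2} times the budget of any other design (Remark 5.5).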
6. Applications and Extensions of Ordinal Optimization
As shown in Section 4, the convergence rate for ordinal comparison can be exponential
for many stochastic problems. We can have good estimates of the ordinal comparison while the
value estimates are still very poor. If our goal is to find good designs rather than an
accurate estimate of the best performance value, which is true in many practical situations, it is
advantageous to use ordinal comparison for selecting good designs. This idea has been
applied to a 10-node network (Patsis et al. 1997), where it is shown that a good design can be
identified with high probability using relatively short simulations. From a computational
perspective, they demonstrate many orders of magnitude of speedup. The idea has been
successfully applied to several other problems. For example, a heuristic algorithm selection
problem is formulated as a stochastic optimization problem in Chen et al. (1998b), and only a
much smaller amount of simulation time is needed to identify a good heuristic algorithm. Chen et
al. (1998a) show that ordinal optimization can be applied to the robot motion planning problem
to identify a good enough motion plan within a very short computation time. Ho and Larson
(1995) showed how to apply OO to rare event simulations. In Deng and Ho (1999), ordinal
optimization was used to search for the global optimum in a well-known open non-convex
optimal control problem of 30 years' standing. In addition to the direct use of ordinal optimization,
the ordinal comparison idea can also be integrated into other procedures to achieve a much
faster convergence rate. For example, the Stochastic Comparison Algorithm integrates ordinal
optimization into a recursive selection procedure and obtains a fast convergence rate (Gong et al.
1999). Ordinal optimization can also be combined with massively parallel computers to speed up
the search for an optimum (Ho 1994). Additional applications and an interactive online
demo can be found at the website hrl.harvard.edu/~ho/DEDS. In this paper, details of an
extension of ordinal optimization to a class of resource allocation problems and its application to
a network control problem are given below.
In the resource allocation problems studied in Cassandras et al. (1998), there are r
identical resources to be allocated over N user classes so as to optimize some system
performance measure (objective function). Let s = [n1,…,nN] denote an allocation and S =
\{[n_1, \ldots, n_N] : \sum_{i=1}^{N} n_i = r, \; n_i \in \{1, \ldots, r\}\} be the finite set of feasible allocations, where n_i is the
number of resources allocated to user class i. Let Ji(s) be the class i cost associated with the
allocation vector s. Assuming that Ji(s) depends only on the number of resources assigned to
class i, we can write Ji(s) = Ji(ni). Then, the basic resource allocation problem is
(RA)   \min_{s \in S} \sum_{i=1}^{N} \beta_i J_i(n_i)
where, without loss of generality for the purpose of our analysis, we may set \beta_i = 1 for all i.
In a stochastic setting, the cost function J_i(s) is usually an expectation whose exact value
is generally difficult to obtain, so one resorts to estimates of J_i(s), which may be obtained
through simulation or through direct on-line observation of a system. We denote by \tilde{L}_i^t(s) an
estimate of J_i(s) based on observing a sample path for a time period of length t. Such stochastic
discrete resource allocation problems are hard due to the combinatorial explosion of the feasible
allocation search space, as well as the absence of closed-form expressions for the cost functions of
interest.
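The combinatorial explosion is easy to quantify: with every user receiving at least one resource, the number of feasible allocations is the stars-and-bars count C(r-1, N-1). A one-line illustration (our own):

```python
import math

def num_allocations(r, N):
    # allocations [n_1,...,n_N] with n_i >= 1 and sum(n_i) = r
    return math.comb(r - 1, N - 1)
```

Already for r = 24 resources and N = 6 users, as in the buffer allocation example later in this section, there are C(23, 5) = 33,649 candidate allocations, so simulating each one accurately is impractical.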
The stochastic resource allocation algorithm considered here is represented as a Markov
process \{(\tilde{s}_k, \tilde{C}_k)\}, where the vector \tilde{s}_k = [\tilde{n}_{1,k}, \tilde{n}_{2,k}, \ldots, \tilde{n}_{N,k}] is the allocation after the kth step,
\tilde{C}_k is a subset of user indices after the kth step, updated as shown below, and the "~" notation is
used to indicate that all quantities involved are based on estimates \tilde{L}_i^t(s) of J_i(s), obtained from a
sample path of length t; in particular, the length of the sample path at the kth iteration of the
process is denoted by f(k) and the corresponding cost estimates by \tilde{L}_i^{f(k)}(\cdot). With \tilde{C}_0 = \{1, \ldots, N\},
after proper initialization, we have:
\tilde{n}_{i,k+1} =
  \tilde{n}_{i,k} - 1   if i = \tilde{i}_k^* and \tilde{\Delta}_k(\tilde{i}_k^*, \tilde{j}_k^*) > 0,
  \tilde{n}_{i,k} + 1   if i = \tilde{j}_k^* and \tilde{\Delta}_k(\tilde{i}_k^*, \tilde{j}_k^*) > 0,
  \tilde{n}_{i,k}       otherwise,   (6.1)
where
  \tilde{i}_k^* = \arg\max_{i \in \tilde{C}_k} \Delta\tilde{L}_i^{f(k)}(\tilde{n}_{i,k}),   (6.2)
  \tilde{j}_k^* = \arg\min_{i \in \tilde{C}_k} \Delta\tilde{L}_i^{f(k)}(\tilde{n}_{i,k}),   (6.3)
  \tilde{\Delta}_k(\tilde{i}_k^*, \tilde{j}_k^*) = \Delta\tilde{L}_{\tilde{i}_k^*}^{f(k)}(\tilde{n}_{\tilde{i}_k^*,k}) - \Delta\tilde{L}_{\tilde{j}_k^*}^{f(k)}(\tilde{n}_{\tilde{j}_k^*,k} + 1),   (6.4)
\tilde{C}_{k+1} =
  \tilde{C}_k - \{\tilde{j}_k^*\}   if \tilde{\Delta}_k(\tilde{i}_k^*, \tilde{j}_k^*) \leq 0,
  \tilde{C}_0                       if |\tilde{C}_k| = 1,
  \tilde{C}_k                       otherwise,   (6.5)
where \Delta\tilde{L}_i^{f(k)}(n_i) \equiv \tilde{L}_i^{f(k)}(n_i) - \tilde{L}_i^{f(k)}(n_i - 1). It is clear that equations (6.1) - (6.5) define a Markov
process \{(\tilde{s}_k, \tilde{C}_k)\}, whose state transition probability matrix is determined by \tilde{i}_k^*, \tilde{j}_k^*, and
\tilde{\Delta}_k(\tilde{i}_k^*, \tilde{j}_k^*). Briefly, \tilde{i}_k^* and \tilde{j}_k^* estimate the users "most sensitive" and "least sensitive" to the
removal of a resource among those users in the set \tilde{C}_k. Then, (6.1) forces the exchange of a
resource from the most to the least sensitive user at the kth step of this process, provided the
quantity \tilde{\Delta}_k(\tilde{i}_k^*, \tilde{j}_k^*) is strictly positive; otherwise, the allocation is unaffected, but the user with
index \tilde{j}_k^* is removed from the set \tilde{C}_k through (6.5). When this set is reduced to a single user, it
is reset to \tilde{C}_0 and the procedure repeats. The quantity \tilde{\Delta}_k(\tilde{i}_k^*, \tilde{j}_k^*) is the estimated "potential cost
reduction" when an actual change in allocation takes place.
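To show the mechanics of (6.1)-(6.5) without a simulator, here is a deterministic skeleton of ours in which exact marginal costs \Delta J_i stand in for the estimates \Delta\tilde{L}_i, and the candidate-set bookkeeping of (6.5) is dropped in favor of scanning all ordered pairs for the largest positive potential cost reduction. The M/M/1/n loss probability used for J_i is a standard closed form, and the parameters mirror the numerical example below.

```python
def loss_prob(rho, n):
    # blocking probability of an M/M/1/n queue with offered load rho != 1
    return (1.0 - rho) * rho ** n / (1.0 - rho ** (n + 1))

def exchange_allocate(J, N, r, start):
    """Deterministic analogue of process (6.1): repeatedly move one resource
    from the user with the largest marginal cost Delta J_i(n_i) to the one
    with the smallest Delta J_j(n_j + 1), as long as some exchange strictly
    reduces the total cost."""
    assert sum(start) == r
    n = list(start)
    dJ = lambda i, m: J[i](m) - J[i](m - 1)
    while True:
        best, move = 0.0, None
        for i in range(N):
            if n[i] <= 1:
                continue                 # keep n_i >= 1 (feasible set S)
            for j in range(N):
                if j == i:
                    continue
                gain = dJ(i, n[i]) - dJ(j, n[j] + 1)   # Delta_k of (6.4)
                if gain > best:          # positive => the exchange lowers cost
                    best, move = gain, (i, j)
        if move is None:
            return n                     # no improving exchange remains
        i, j = move
        n[i] -= 1
        n[j] += 1
```

Starting from the near-worst allocation [19, 1, 1, 1, 1, 1] of Figure 6.2 with rho = (0.5/6)/1.0 per server, the skeleton walks to the symmetric optimum [4, 4, 4, 4, 4, 4]; convexity of the marginal costs (A6.1) guarantees that no other allocation admits zero improving exchanges.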
We will make the following three assumptions:
(A6.1) For all i = 1, \ldots, N, J_i(n_i) is such that \Delta J_i(n_i + 1) > \Delta J_i(n_i), where
\Delta J_i(n_i) = J_i(n_i) - J_i(n_i - 1), n_i = 1, .., r, with \Delta J_i(0) \equiv -\infty and \Delta J_i(r + 1) \equiv \infty.
(A6.2) For every i and every n_i, the estimate \tilde{L}_i^t(n_i) is ergodic as the sample path length
increases, in the sense that \lim_{t \to \infty} \tilde{L}_i^t(n_i) = J_i(n_i), a.s.
(A6.3) Let \Delta_k(i, j) = \Delta J_i(\tilde{n}_{i,k}) - \Delta J_j(\tilde{n}_{j,k} + 1). Whenever \Delta_k(\tilde{i}_k^*, \tilde{j}_k^*) \neq 0, there is a constant p_0
such that P\{\tilde{\Delta}_k(\tilde{i}_k^*, \tilde{j}_k^*) > 0 \mid \Delta_k(\tilde{i}_k^*, \tilde{j}_k^*) > 0, (\tilde{s}_k, \tilde{C}_k) = (s, C)\} \geq p_0 > 0, for any k and any pair
(s, C).
Note that (A6.3) is a technical condition guaranteeing that an estimate does not always
give one-sided, biased, incorrect information. Also note that the results below do not require the
technical condition that the optimal allocation be unique. If several allocations exhibit optimal
performance, the proposed scheme will converge to a set of allocations and will oscillate among
the members of the set. Under (A6.1)-(A6.3), it was proven that \tilde{s}_k converges in probability to
the optimal solution s* as long as f(k) \to \infty as k \to \infty. Cassandras et al. (1998) give a
detailed discussion and interpretation of the above process, and the proofs of the following two
theorems.
Theorem 6.1. Suppose that Assumptions (A6.1)-(A6.3) hold and that the optimum s* is unique.
Then the process described by equations (6.1) - (6.5) converges in probability to the optimal
allocation s*.
#
Theorem 6.2. Suppose (A6.1)-(A6.3) and the assumptions underlying the lower bound on the
convergence rate of ordinal comparison hold. If f(k) \geq k^{1+c} for some constant c > 0, then the process
described by equations (6.1) - (6.5) converges almost surely to the global optimal allocation. #
Theorems 6.1 and 6.2 establish the convergence of the algorithm. For regenerative
systems, stronger results regarding the rate of convergence have been obtained as stated in the
following theorem.
Theorem 6.3 (Dai et al. 1998). Suppose that Assumptions (A6.1)-(A6.3) hold and that for every pair
i, j, i \neq j, \tilde{L}_i^t(n_i) - \tilde{L}_j^t(n_j) is a regenerative process with regenerative cycle times satisfying
(A4.1)-(A4.4). Then, there exists a c > 0 large enough such that if f(k) \geq c \ln(k) for sufficiently
large k, the process described by equations (6.1) - (6.5) converges almost surely to the global
optimal allocation. Furthermore,
P\{\tilde{s}_k = s^*\} = 1 - O(\ln(k) \, e^{-\beta f(k)})
for some \beta > 0.
#
Note that ordinal comparison is used in (6.1). Since ordinal comparison allows one to
make a correct decision based on short simulations, the algorithm can quickly reach the
neighborhood of the optimal solution. The fast convergence has been confirmed experimentally,
as shown in Cassandras et al. (1998). Compared with conventional stochastic approximation
methods, the algorithm also eliminates the difficult task of selecting an appropriate "step size".
A numerical example
Figure 6.1. Queueing system with N parallel servers.
As an application, we consider a buffer allocation problem for a queueing system as
shown in Figure 6.1, where each server represents a user and each buffer slot represents a
resource that is to be allocated to a user. Jobs arrive at this system at a rate λ and are routed to
one of the N users with some probability pi, i = 1, …, N. Each user is processing jobs at a rate μi,
i = 1, …, N, and if a job is routed to a user with a full queue, the job is lost. Let Ji(ni) be the
individual job loss probability of the ith server (ni is the number of buffers allocated to the ith
server). Our goal is to allocate all K available buffer slots to the users in order to minimize the
objective function
\min \sum_{i=1}^{N} J_i(n_i) \qquad s.t. \qquad \sum_{i=1}^{N} n_i = K.
It can be easily verified that assumptions (A6.1)-(A6.3) are satisfied for this example
(Cassandras et al. 1998). For this problem, one can directly apply the algorithm corresponding to
the process (6.1) - (6.5). Standard concurrent estimation methods (e.g., see Cassandras and
Panayiotou 1996) may be used to obtain all \tilde{L}_i^{f(k)}(\tilde{n}_{i,k}) required by the algorithm from
observation of a sample path under a given allocation [\tilde{n}_{1,k}, \tilde{n}_{2,k}, \ldots, \tilde{n}_{N,k}]. For simplicity, in our
simulation of this system we have assumed that the arrival process is Poisson with rate \lambda = 0.5,
and all service times are exponential with rates \mu_i = 1.0, i = 1, \ldots, 6. Furthermore, all routing
probabilities were chosen to be equal, i.e., pi = 1/6, i = 1, …, 6. Figure 6.2 shows typical traces of
the evolution of the algorithm for this system when N = 6 users and K = 24 buffer slots
(resources). In this problem, it is obvious that the optimal allocation is [4,4,4,4,4,4] due to
symmetry. This was chosen only to facilitate the determination of the optimal allocation; the
speed of convergence, however, is not dependent on the assumptions made in this example.
The two traces of Figure 6.2 correspond to two different initial allocations.
In this case, the simulation length is increased linearly in steps of 3,000 events per iteration, and
one can easily see that even if we start at one of the worst possible allocations (e.g., [19, 1, 1, 1, 1,
1]), the cost quickly converges to the neighborhood of the optimal point.
7. Conclusions
The development of OO is still in its infancy. By redefining what is desired in a simulation (e.g.,
order rather than value estimates, good enough rather than optimal choices) we present a general
method for easing the computational burden associated with complex stochastic optimization
problems via simulation. The methodology is easy to understand and as a rule does not require
any additional computation for its use. Similar to the role of order statistics in sampling, it is
totally complementary to any other tools one may be using.
References
1. Bechhofer, R. E. "Optimal Allocation of Observations When Comparing Several Treatments
With a Control," Multivariate Analysis II, P. R. Krishnaiah, ed., Academic Press, Inc., 463-473, 1969.
2. Bechhofer, R. E., T. J. Santner, and D. M. Goldsman. Design and Analysis of Experiments
for Statistical Selection, Screening, and Multiple Comparisons, John Wiley & Sons, Inc.,
1995.
3. Bernardo, J. M., and A. F. M. Smith. Bayesian Theory. Wiley, 1994.
4. Cassandras, C.G., L. Dai, and C. G. Panayiotou, "Ordinal Optimization for Deterministic and
Stochastic Discrete Resource Allocation," IEEE Trans. on Automatic Control, Vol. 43, No. 7,
881-900, 1998.
5. Cassandras, C. G., and C. G. Panayiotou, "Concurrent Sample Path Analysis of Discrete
Event Systems", Proc. of 35th IEEE Conf. on Decision and Control, 3332-3337, 1996.
6. Chen, C. H. "A Lower Bound for the Correct Subset-Selection Probability and Its
Application to Discrete Event System Simulations," IEEE Transactions on Automatic
Control, Vol. 41, No. 8, 1227-1231, August, 1996.
7. Chen, H. C., C. H. Chen, L. Dai, and E. Yucesan, "New Development of Optimal Computing
Budget Allocation For Discrete Event Simulation," Proceedings of the 1997 Winter
Simulation Conference, 334-341, December, 1997.
8. Chen, C. H., H. C. Chen, and L. Dai, "A Gradient Approach of Smartly Allocating
Computing Budget for Discrete Event Simulation," Proceedings of the 1996 Winter
Simulation Conference, 398-405, December, 1996.
9. Chen, C. H., V. Kumar, and Y. C. Luo, "Motion Planning of Walking Robots Using Ordinal
Optimization," IEEE Robotics and Automation Magazine, 22-32, June, 1998a.
10. Chen, C. H., S. D. Wu, and L. Dai, "Ordinal Comparison of Heuristic Algorithms Using
Stochastic Optimization," to appear in IEEE Trans. on Robotics and Automation, 1998b.
11. Chen, H. C., C. H. Chen, and E. Yucesan, "Optimal Computing Budget Allocation in
Selecting The Best System Design via Discrete Event Simulation," submitted to IEEE
Transactions on Control Systems Technology, 1998c.
12. Chick, S. E. "Bayesian Analysis for Simulation Input and Output," Proceedings of the 1997
Winter Simulation Conference, 253-260, 1997.
13. Dai, L., "Convergence Properties of Ordinal Comparison in the Simulation of Discrete Event
Dynamic Systems," J. of Opt. Theory and Applications, Vol. 91, No.2, 363-388, 1996.
14. Dai, L., C. G. Cassandras, and C. Panayiotou, "On The Convergence Rate of Ordinal
Optimization for A Class of Stochastic Discrete Resource Allocation Problems,'' to appear in
IEEE Transactions on Automatic Control, 1998.
15. Dai, L. and C. H. Chen, "Rate of Convergence for Ordinal Comparison of Dependent
Simulations in Discrete Event Dynamic Systems," Journal of Optimization Theory and
Applications, Vol. 94, No. 1, 29-54, July, 1997.
16. Dudewicz, E. J. and S. R. Dalal, Allocation of Observations in Ranking and Selection with
Unequal Variances. Sankhya, B37:28-78, 1975.
17. Dunnett, C. W. "A Multiple Comparison Procedure For Comparing Several Treatments With
A Control," Journal of American Statistics Association. 50, 1096-1121, 1955.
18. Goldsman, G., and B. L. Nelson. "Ranking, Selection, and Multiple Comparison in Computer
Simulation," Proceedings of the 1994 Winter Simulation Conference, 192-199, 1994.
19. Goldsman, G., B. L. Nelson, and B. Schmeiser. "Methods for Selecting the Best System,"
Proceedings of the 1991 Winter Simulation Conference, 177-186, 1991.
20. Gong, W. B., Y. C. Ho and W. Zhai, "Stochastic Comparison Algorithm for Discrete
Optimization with Estimation," to appear in SIAM Journal on Optimization, April 1999.
21. Gupta, S. S. and S. Panchapakesan, Multiple Decision Procedures: Theory and Methodology
of Selecting and Ranking Populations, John Wiley, New York, 1979.
22. Ho, Y. C., R. S. Sreenivas, and P. Vakili, "Ordinal Optimization of DEDS," Journal of
Discrete Event Dynamic Systems, Vol. 2, No. 2, 61-88, 1992.
23. Kalashnikov, V. V. Topics on Regenerative Processes, CRC Press, 1994.
24. Lau, T.W.E. and Y. C. Ho, "Universal Alignment Probabilities and Subset Selection for
Ordinal Optimization," JOTA, Vol. 39, No. 3, 455-490, June 1997.
25. Lee, L. H., T. W. E. Lau, and Y. C. Ho, "Explanation of Goal Softening in Ordinal
Optimization," IEEE Transactions on Automatic Control, July 1998.
26. Luenberger, D. G., Linear and Nonlinear Programming. Addison-Wesley, 1984.
27. Nash, S. G., and A. Sofer, Linear and Nonlinear Programming, McGraw-Hill, Inc., 1996.
28. Patsis, N. T., C. H. Chen, and M. E. Larson, "SIMD Parallel Discrete Event Dynamic System
Simulation," IEEE Trans. on Control Systems Technology, Vol. 5, No. 3, 30-41, Jan. 1997.
29. Rinott, Y., On Two-stage Selection Procedures and Related Probability Inequalities.
Communications in Statistics A7, 799-811, 1978.
30. Xie, X. L., "Dynamics and Convergence Rate of Ordinal Comparison of Stochastic Discrete
Event Systems," IEEE Transactions on Automatic Control, 1998.
Figure 6.2. Typical trace for different initial allocations.