On Adaptive Multicut Aggregation for Two-Stage Stochastic Linear Programs with Recourse

Svyatoslav Trukhanov and Lewis Ntaimo (corresponding author)
Department of Industrial and Systems Engineering, Texas A&M University, 3131 TAMU, College
Station, TX 77843, USA, slavik@tamu.edu and ntaimo@tamu.edu
Andrew Schaefer
Department of Industrial Engineering, 1048 Benedum Hall, University of Pittsburgh, Pittsburgh, PA
15261, schaefer@ie.pitt.edu
Abstract
Outer linearization methods for two-stage stochastic linear programs with recourse, such as the L-shaped algorithm, generally apply a single optimality cut
on the nonlinear objective at each major iteration, while the multicut version of
the algorithm allows several cuts to be placed at once. In general, the L-shaped
algorithm tends to have more major iterations than the multicut algorithm.
However, the tradeoffs in terms of computational time are problem dependent. This
paper investigates the computational trade-offs of adjusting the level of optimality
cut aggregation from single cut to pure multicut. Specifically, an adaptive multicut
algorithm that dynamically adjusts the aggregation level of the optimality cuts in
the master program is presented and tested on standard large scale instances from
the literature. The computational results reveal that a cut aggregation level between
the single cut and the pure multicut can reduce computational time by more than
50% compared to the single cut method.
Keywords: stochastic programming, multicut aggregation, adaptive cuts
1 Introduction
Outer linearization methods such as the L-shaped algorithm (Van Slyke and Wets,
1969) for stochastic linear programming (SLP) problems with recourse generally apply
a single cut on the nonlinear objective at each major iteration, while the multicut
version of the algorithm (Birge, 1988) allows for cuts up to the number of outcomes
(scenarios) to be placed at once. In general, the L-shaped algorithm tends to have
more major iterations than the multicut algorithm. However, the trade-offs in terms
of computational time may be problem dependent. This paper generalizes these two
approaches through an adaptive multicut algorithm that dynamically adjusts the level
of aggregation of the optimality cuts in the master program during the course of the
algorithm. Results of our computational investigation suggest that for certain classes of
problems the adaptive multicut provides better performance than the traditional single
cut and multicut methods.
The relative advantages of the single cut and multicut methods have already been
explored (Birge, 1988, Gassmann, 1990). Citing earlier work (Birge, 1988, Gassmann,
1990), Birge and Louveaux (1997) gave the rule of thumb that the multicut method is
preferable when the number of scenarios is not much larger than the dimension of the
first stage decision space. In earlier work, Birge (1985) used variable aggregation
to reduce the problem size and to find upper and lower bounds on the optimal solution.
Ruszczynski (1988) considered adding a regularizing term to the objective function to
stabilize the master problem and reported computational results showing that the method
runs efficiently despite solving a quadratic problem instead of a linear one; the number
of iterations requiring extensive computations roughly halved for the instances
considered.
More recent work on modified versions of the L-shaped method examined serial
and asynchronous versions of the L-shaped method and a trust-region method (Linderoth
and Wright, 2003). The algorithms were implemented on a grid computing platform that
allowed the authors to solve large scale problem instances from the literature, including
SSN (Sen et al., 1994) with 10^5 scenarios and STORM (Mulvey and Ruszczynski,
1992) with 10^7 scenarios. The authors reported CPU time improvements of up to 83.5%
for instances of SSN with 10^4 scenarios.
The main contributions of this paper are twofold: 1) an adaptive optimality multicut
algorithm for SLP, and 2) computational investigation with standard instances from the
literature demonstrating that changing the level of optimality cut aggregation in the
master problem during the course of the algorithm can lead to significant improvement
in solution time. The results of this work also have potential to influence the design and
implementation of future algorithms, especially those based on Benders decomposition
(Benders, 1962). For example, decomposition methods based on the sample average
approximation method (Shapiro, 2000), the trust-region method (Linderoth and Wright,
2003), and the stochastic decomposition method (Higle and Sen, 1991) may benefit from
the adaptive multicut approach.
The rest of this paper is organized as follows. The next section gives the problem
statement and a motivation for the adaptive multicut approach. A formal description of
the method is given in Section 3. Computational results are reported in Section 4. The
paper ends with some concluding remarks in Section 5.
2 An Adaptive Multicut Approach
A two-stage stochastic linear program with fixed recourse in the extensive form can be
given as follows:
    Min  c^T x + Σ_{s∈S} p_s q_s^T y_s                  (1a)
    s.t. Ax = b                                         (1b)
         T_s x + W y_s ≥ r_s,  ∀s ∈ S                   (1c)
         x ≥ 0, y_s ≥ 0,  ∀s ∈ S                        (1d)
where x ∈ R^{n_1} is the vector of first stage decision variables with the corresponding cost
vector c ∈ R^{n_1}; A ∈ R^{m_1×n_1} is the first stage constraint matrix; and b ∈ R^{m_1} is the
first stage righthand-side vector. Additionally, S is the set of scenarios (outcomes), and
for each scenario s ∈ S, y_s ∈ R^{n_2} is the vector of second stage decision variables; p_s is
the probability of occurrence for scenario s; q_s ∈ R^{n_2} is the second stage cost vector;
r_s ∈ R^{m_2} is the second stage righthand-side vector; T_s ∈ R^{m_2×n_1} is the technology matrix;
and W ∈ R^{m_2×n_2} is the recourse matrix.
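For a concrete picture of problem (1), the extensive form can be assembled and solved directly for a tiny instance. The sketch below uses scipy.optimize.linprog; all data values are made up for illustration and are not from the paper.

```python
import numpy as np
from scipy.optimize import linprog

# Tiny illustrative data (not from the paper): n1 = 2 first stage variables,
# one first stage equality constraint, |S| = 2 scenarios with n2 = m2 = 1.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]]); b = np.array([4.0])
scenarios = [  # (probability p_s, cost q_s, technology T_s, righthand side r_s)
    (0.5, np.array([3.0]), np.array([[1.0, 0.0]]), np.array([2.0])),
    (0.5, np.array([1.0]), np.array([[0.0, 1.0]]), np.array([3.0])),
]
W = np.array([[1.0]])  # fixed recourse matrix

n1, n2 = len(c), W.shape[1]
S = len(scenarios)
# Objective over the stacked variable vector [x, y_1, ..., y_S].
obj = np.concatenate([c] + [p * q for p, q, _, _ in scenarios])
# Equality block: Ax = b (the y variables get zero columns).
A_eq = np.hstack([A, np.zeros((A.shape[0], S * n2))])
# Inequality blocks: -(T_s x + W y_s) <= -r_s  (linprog uses <= form).
rows, rhs = [], []
for i, (_, _, T, r) in enumerate(scenarios):
    row = np.zeros((W.shape[0], n1 + S * n2))
    row[:, :n1] = -T
    row[:, n1 + i * n2 : n1 + (i + 1) * n2] = -W
    rows.append(row); rhs.append(-r)
res = linprog(obj, A_ub=np.vstack(rows), b_ub=np.concatenate(rhs),
              A_eq=A_eq, b_eq=b, method="highs")
```

For instances with many scenarios this stacked LP is exactly what becomes impractical, which is what motivates the decomposition methods discussed next.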
To set the ground for the adaptive multicut method, we first make some observations
from a comparison of the single cut L-shaped method and the multicut method in
the context of problem (1). In general, the single cut L-shaped method tends to suffer
'information loss' due to the aggregation of all the scenario dual information into one
optimality cut in the master problem at each major iteration of the method. On the other
hand, the multicut method uses all scenario dual information and places an optimality
cut for each scenario in the master problem. Thus the L-shaped method generally
requires more major iterations than the multicut method. However, at iteration index
k the master problem in the multicut method grows much faster and is
(m_1 + k|S|) × (n_1 + |S|) in size in the worst case, compared to (m_1 + k) × (n_1 + 1) in
the L-shaped method. Consequently, the multicut method is expected to have increased
solution time when |S| is very large.
Our goal is to devise an algorithm that performs better than the single cut L-shaped
and the multicut methods. The insights behind our approach are to 1) use more information
from the subproblems, which means adding more than one cut at each major iteration;
and 2) keep the size of the master problem relatively small, which requires limiting the
number of cuts added to the master problem. These insights lead us to consider a multicut
method with 'partial' cut aggregation. This means that such an algorithm requires
partitioning the sample space S into D aggregates {S_d}_{d=1}^{D} such that S = S_1 ∪ S_2 ∪ ··· ∪ S_D
and S_i ∩ S_j = ∅ ∀i ≠ j. Then at the initialization step of the algorithm, optimality
cut variables θ_d are introduced, one for each aggregate. The optimality cuts are also
generated one per aggregate in the form (β_d^k)^T x + θ_d ≥ α_d^k, where

    (β_d^k)^T = Σ_{s∈S_d} p_s (π_s^k)^T T_s   and   α_d^k = Σ_{s∈S_d} p_s (π_s^k)^T r_s.
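The aggregate cut coefficients above are plain probability-weighted sums of the scenario dual information. A minimal sketch, with illustrative data and a hypothetical helper `aggregate_cut` (not the paper's implementation):

```python
import numpy as np

def aggregate_cut(aggregate, p, pi, T, r):
    """Compute (beta_d, alpha_d) for one aggregate S_d as probability-weighted
    sums of scenario dual information, for the cut beta_d^T x + theta_d >= alpha_d."""
    beta = sum(p[s] * (pi[s] @ T[s]) for s in aggregate)
    alpha = sum(p[s] * (pi[s] @ r[s]) for s in aggregate)
    return beta, alpha

# Illustrative data: 3 scenarios, n1 = 2 first stage variables, m2 = 1.
p = {0: 0.2, 1: 0.3, 2: 0.5}                      # scenario probabilities
pi = {s: np.array([1.0 + s]) for s in range(3)}   # dual multipliers pi_s^k
T = {s: np.array([[1.0, 2.0]]) for s in range(3)} # technology matrices
r = {s: np.array([float(s)]) for s in range(3)}   # righthand sides
beta, alpha = aggregate_cut({0, 1, 2}, p, pi, T, r)
```

With D = |S| each aggregate is a singleton and this reduces to the pure multicut coefficients; with D = 1 it reduces to the single L-shaped cut.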
The multicut and the single cut L-shaped methods are the two extremes of optimality cut
aggregation. The multicut L-shaped method has no cut aggregation, which corresponds
to the case when D = |S| and S_d = {s_d}, d = 1, ..., |S|. The expected recourse function
value is approximated by having an optimality decision variable for each scenario in the
master program. At the other extreme, the single cut L-shaped method has the highest level
of cut aggregation, which means D = 1 and S_1 = S. The expected recourse function
value is approximated by only one optimality decision variable in the master program.
According to our numerical results, the best runtime is achieved at some intermediate level
of aggregation 1 < D < |S|, but this level is not known a priori.
Our approach allows for the optimality cut aggregation to be done dynamically during
the course of the algorithm. Thus the master program will have ‘adaptive’ optimality
variables, that is, the number of optimality variables will change during the course
of the algorithm. The goal is to let the algorithm learn more information about the
expected recourse function and then settle for a level of aggregation that leads to faster
convergence to the optimal solution. Therefore, we propose initializing the algorithm
with a ‘low’ level of cut aggregation and increasing it, if necessary, as more information
about the expected recourse function is learned during the course of the algorithm.
As the level of cut aggregation increases, the number of optimality variables in the
master problem decreases. If |S| is not ‘very large’ one could initialize the algorithm
as pure multicut, where an optimality variable θs is created for each s ∈ S in the master
program. Otherwise, an initial level of cut aggregation between 1 and |S| should be
chosen. Computer speed and memory will certainly influence the level of cut aggregation
since a low level of cut aggregation would generally require more computer memory.
3 The Basic Adaptive Multicut Algorithm
To formalize our approach, at the k-th iteration let the scenario set partition be S(k) =
{S_1, S_2, ..., S_{l_k}}, where the sets S_i, i = 1, ..., l_k, are disjoint, that is, S_i ∩ S_j = ∅ ∀i ≠ j,
and ∪_{i=1}^{l_k} S_i = S. Therefore, each scenario s ∈ S belongs to only one S_i ∈ S(k), and the
aggregate probability of S_i is defined as p_{S_i} = Σ_{s∈S_i} p_s, with Σ_{i=1}^{l_k} p_{S_i} = Σ_{s∈S} p_s = 1. Let
F(k) denote the set of iteration numbers up to k where all subproblems are feasible and
optimality cuts are generated. Then the master program at major iteration k takes the
form:

    Min  c^T x + Σ_{d∈S(k)} θ_d                            (2a)
    s.t. Ax = b                                            (2b)
         (β^t)^T x ≥ α^t,   t ∈ {1, ..., k} \ F(k)         (2c)
         (β_d^t)^T x + θ_d ≥ α_d^t,   t ∈ F(k), d ∈ S(k)   (2d)
         x ≥ 0                                             (2e)
where constraints (2c) and (2d) are the feasibility and optimality cuts, respectively. We
are now in a position to formally state the adaptive multicut algorithm.
Basic Adaptive Multicut Algorithm
Step 0: Initialization.
Set k ← 0, F(0) ← ∅, and initialize S(0).
Step 1: Solve the Master Problem.
Set k ← k + 1 and solve problem (2). Let (x^k, {θ_d^k}_{d∈S(k)}) be an optimal solution to
problem (2). If no constraint (2d) is present for some d ∈ S(k), θ_d^k is set equal to
−∞ and is ignored in the computation.
Step 2: Update Cut Aggregation Level.
Step 2a Cut Aggregation.
Generate the aggregation S(k) from S(k − 1) based on some aggregation rules. Each
element of S(k) is a union of some elements from S(k − 1). If according to the aggregation
rules d_1, d_2, ..., d_j ∈ S(k − 1) are aggregated into d ∈ S(k), then

    d = ∪_{i=1}^{j} d_i   and   p_d = Σ_{i=1}^{j} p_{d_i}.

Master problem (2) is modified by removing the variables θ_{d_1}, ..., θ_{d_j}
and introducing the new variable θ_d.
Step 2b Update Optimality Cuts.
Update the optimality cut coefficients in the master program (2) as follows. For
each iteration at which optimality cuts were added, define

    α_d^t = p_d Σ_{l=1}^{j} (1/p_{d_l}) α_{d_l}^t   ∀d ∈ S(k), t ∈ F(k)    (3)

and

    β_d^t = p_d Σ_{l=1}^{j} (1/p_{d_l}) β_{d_l}^t   ∀d ∈ S(k), t ∈ F(k).   (4)

For all iterations t ∈ F(k), replace the cuts corresponding to d_1, ..., d_j with one new
cut (β_d^t)^T x + θ_d ≥ α_d^t corresponding to d.
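Step 2b amounts to replacing, at each past iteration t, the cuts of the merged aggregates with one combined cut. A small sketch of formulas (3)-(4) with hypothetical data; `merge_cuts` is an illustrative helper, not the paper's code:

```python
import numpy as np

def merge_cuts(p_children, alphas, betas):
    """Combine the cuts of aggregates d_1..d_j into one cut for the merged
    aggregate d, following (3)-(4): a p_d-weighted sum of the child cut
    coefficients, each normalized by its own aggregate probability p_{d_l}."""
    p_d = sum(p_children)
    alpha_d = p_d * sum(a / pl for pl, a in zip(p_children, alphas))
    beta_d = p_d * sum(b / pl for pl, b in zip(p_children, betas))
    return p_d, alpha_d, beta_d

# Hypothetical child cuts for one past iteration t (n1 = 2).
p_children = [0.25, 0.75]
alphas = [1.0, 3.0]
betas = [np.array([1.0, 0.0]), np.array([0.0, 3.0])]
p_d, alpha_d, beta_d = merge_cuts(p_children, alphas, betas)
```

Note the update needs only the stored coefficients of the child cuts, so no scenario subproblems are re-solved during aggregation.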
Step 3: Solve Scenario Subproblems.
For all s ∈ S solve:
    Min  q_s^T y                                  (5a)
    s.t. W y ≥ r_s − T_s x^k,                     (5b)
         y ≥ 0.                                   (5c)

Let π_s^k be the dual multipliers associated with an optimal solution of problem (5).
For each d ∈ S(k), if

    θ_d^k < Σ_{s∈S_d} p_s (π_s^k)^T (r_s − T_s x^k)            (6)

define

    α_d^k = Σ_{s∈S_d} p_s (π_s^k)^T r_s                        (7)

and

    (β_d^k)^T = Σ_{s∈S_d} p_s (π_s^k)^T T_s.                   (8)

If condition (6) does not hold for any d ∈ S(k), stop: x^k is optimal. Otherwise set
F(k) = F(k − 1) ∪ {k} and go to Step 1.
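The check in condition (6) compares each θ_d^k with the probability-weighted recourse value implied by the fresh duals. A sketch under that reading, with illustrative values (the helper name and data are not from the paper):

```python
import numpy as np

def violated_aggregates(theta, p, pi, r, T, x_k, aggregates):
    """Return the aggregates d whose optimality variable underestimates the
    probability-weighted dual recourse value, i.e. condition (6) holds."""
    out = []
    for d, members in aggregates.items():
        rhs = sum(p[s] * (pi[s] @ (r[s] - T[s] @ x_k)) for s in members)
        if theta[d] < rhs - 1e-9:   # small tolerance for LP round-off
            out.append(d)
    return out

# Illustrative data: two aggregates over three scenarios, n1 = 2.
x_k = np.array([1.0, 0.0])
p = {0: 0.4, 1: 0.4, 2: 0.2}
pi = {s: np.array([1.0]) for s in range(3)}
r = {s: np.array([2.0]) for s in range(3)}
T = {s: np.array([[1.0, 1.0]]) for s in range(3)}
theta = {"d1": 0.0, "d2": 5.0}
viol = violated_aggregates(theta, p, pi, r, T, x_k,
                           {"d1": [0, 1], "d2": [2]})
```

Only the aggregates returned by this check receive a new optimality cut at the current iteration; if the list is empty the algorithm terminates with x^k optimal.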
If problem (5) is infeasible for some s ∈ S, let μ_k be the associated dual extreme
ray and define

    α^k = μ_k^T r_s,                              (9)

and

    (β^k)^T = μ_k^T T_s.                          (10)

Go to Step 1.
Note that in the last step of the algorithm, if some scenario subproblem is infeasible,
an alternative implementation would be to find all infeasible subproblems and generate a
feasibility cut from each one to add to the master program. Since convergence of the
above algorithm follows directly from that of the multicut method, a formal proof of
convergence is omitted. An issue that remains to be addressed is the aggregation rule
used to obtain S(k) from S(k − 1). Here we simply provide a basic aggregation
rule that we implemented in our computational study.
Aggregation Rule:
• Redundancy Threshold δ. This rule is based on the observation that inactive
(redundant) optimality cuts in the master program contain ‘little’ information
about the optimal solution and can therefore be aggregated without information
loss. Consider some iteration k, after solving the master problem, and some aggregate
d ∈ S(k). Let f_d be the number of iterations in which the optimality cuts corresponding
to d are redundant, that is, the corresponding slack variables are nonzero. Also let
0 < δ < 1 be a given threshold. Then all d such that f_d/|F(k)| > δ are combined to
form one aggregate. This means that the optimality cuts obtained from aggregate
d are redundant at least a fraction δ of the time. This criterion works well if the number
of iterations is large. Otherwise, it is necessary to impose a 'warm up' period during
which no aggregation is made, so as to let the algorithm 'learn' information about
the expected recourse function.
• Bound on the Number of Aggregates. As a supplement to the redundancy threshold,
a bound on the minimum number of aggregates |S(k)| can be imposed. This
prevents all scenarios from being aggregated prematurely, which would reduce the
algorithm to the single cut L-shaped method. Similarly, a bound on the maximum
number of aggregates can also be imposed to curtail the proliferation of huge
numbers of cuts in the master program.
These bounds have to be set a priori and kept constant throughout the algorithm
run.
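The redundancy rule can be implemented by tracking, per aggregate, how often its cuts have been slack. The sketch below follows the stated rule (threshold δ, warm-up period, and a floor on the number of surviving aggregates); names and data are illustrative, not from the paper's implementation:

```python
def aggregates_to_merge(slack_counts, n_cut_iters, delta, min_aggs, warmup):
    """Return the aggregates whose cuts were redundant in more than a
    fraction delta of the optimality-cut iterations, respecting a warm-up
    period and a minimum number of aggregates after merging."""
    if n_cut_iters < warmup:
        return []          # still in the warm-up period: no aggregation yet
    candidates = [d for d, f in slack_counts.items()
                  if f / n_cut_iters > delta]
    n_aggs = len(slack_counts)
    # Merging the candidates into one aggregate must leave at least min_aggs.
    if n_aggs - len(candidates) + 1 < min_aggs:
        return []
    return candidates

# Illustrative bookkeeping: f_d = iterations in which d's cuts were slack,
# out of n_cut_iters = |F(k)| = 10 optimality-cut iterations so far.
counts = {"d1": 9, "d2": 2, "d3": 10, "d4": 1}
merged = aggregates_to_merge(counts, n_cut_iters=10, delta=0.8,
                             min_aggs=2, warmup=5)
```

Here "d1" and "d3" exceed the threshold and would be combined into a single aggregate, after which their past cuts are merged as in Step 2b.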
Another viable aggregation rule is to aggregate optimality cuts with 'similar' gradients
or orientation. We implemented this rule but did not observe significant gains in
computation time, due to the additional computational burden of checking the
cut gradients. We also considered deleting 'old' cuts from the master program, that
is, cuts that were added many iterations earlier and had remained redundant for some time.
This seemed to reduce memory requirements but did not speed up the master program
solves. Another consideration is partitioning an aggregate into two or more aggregates,
or cut 'disaggregation'. The major drawback is that all cuts would have to be kept
in memory (or at least written to a file) in order to partition an aggregate. Thus this
may be a memory intensive activity requiring careful book-keeping of old cuts
for each scenario subproblem.
4 Computational Results
We implemented the adaptive multicut algorithm and conducted several computational
experiments to gain insight into its performance. We tested the algorithm on three
well-known large scale SLP instances with fixed recourse from the literature: 20TERM,
SSN, and a truncated version of STORM. 20TERM models a motor freight carrier's
operations (Mak et al., 1999), while SSN is a telephone switching network expansion
planning problem (Sen et al., 1994). STORM is a two period freight scheduling problem
(Mulvey and Ruszczynski, 1992). The problem characteristics of these instances are
given in Table 1.
Table 1: Problem Characteristics.

           First Stage      Second Stage     No. of Random    No. of
Problem    Rows    Cols     Rows    Cols     Variables        Scenarios
20TERM       3      63       124     764          40          ≈ 1.09 × 10^12
SSN          1      89       175     706          86          ≈ 1.02 × 10^70
STORM       59     128       526    1259           8          ≈ 3.91 × 10^5
Due to the extremely large number of scenarios, solving these instances with all the
scenarios is not practical, so a sampling approach was used. Only random subsets of
scenarios of sizes 1000, 2000, and 3000 were considered. In addition, sample sizes of 5000
and 10000 were used for STORM since it had the lowest improvement in computation
time. Five replications were made for each sample size, with the samples independently
drawn. Both the static and dynamic cut aggregation methods were applied to the
instances, and the minimum, maximum and average CPU time and number of major
iterations for different levels of cut aggregation were recorded. All the experiments were
conducted on an Intel Pentium 4 3.2 GHz processor with 512MB of memory running
the ILOG CPLEX 9.0 LP solver (ILOG, 2003). The algorithms were implemented in C++
using Microsoft Visual Studio .NET.
We designed three computational experiments to investigate 1) adaptive cut aggregation
with a limit on the number of aggregates, 2) adaptive cut aggregation based on the
number of redundant optimality cuts in the master program, and 3) static aggregation
where the level of cut aggregation is given. In the static approach, the cut aggregation
level was specified a priori and kept constant throughout the algorithm. This experiment
was conducted for different cut aggregation levels to study the effect on computational
time and to provide a comparison for the adaptive multicut method.
For each experiment we report the computational results as plots and collect all the
numerical results in tables in the Appendix. In the tables, the column 'Sample Size'
indicates the total number of scenarios; 'Num Aggs' gives the number of aggregates; 'CPU
Time' is the computation time in seconds; 'Min Iters', 'Aver Iters' and 'Max Iters' are the
minimum, average and maximum numbers of major iterations (i.e., the number of times
the master problem is solved); '% CPU Reduction' is the percentage reduction in CPU
time over the single cut L-shaped method.
4.1 Experiment 1: Adaptive Multicut Aggregation
This experiment aimed at studying the performance of the adaptive multicut approach
where the level of aggregation is dynamically varied during computation. As pointed out
earlier, with adaptive aggregation one has to set a limit on the size of the aggregate to
avoid all cuts being aggregated into one aggregate prematurely. In our implementation
we used a parameter agg-max, which specified the maximum number of atomic cuts
that could be aggregated into one to prevent premature total aggregation (single cut
L-shaped). We initiated the aggregation level at 1000 aggregates (this is pure multicut
for instances with sample size 1000). We arbitrarily varied agg-max from 10 to 100 in
increments of 10, and up to 1000 in increments of 100. For instances with 3000 scenarios,
values of agg-max greater than 300 resulted in insufficient computer memory and are thus
not shown in the plots. We arbitrarily fixed the redundancy threshold at δ = 0.9 with a
“warm up” period of 10 iterations for all the instances.
The computational results showing how CPU time depends on aggregate size are
summarized in the plots in Figures 1, 2, and 3. The results for the 20TERM and SSN
instances show that the number of major iterations increases with agg-max. However,
the best CPU time is achieved at some level of agg-max between 10 and 100, and it is
better than that of both the single cut and the pure multicut methods. In this case
there is over a 60% reduction in CPU time over the single cut L-shaped method for
20TERM and over 90% for SSN. The results for STORM, however, show no improvement
in computation time over the single cut method. This can be attributed to the 'flat'
expected recourse function of STORM, implying that no significant information is learned
by using a multicut approach. In this case the single cut L-shaped method works better
than both the adaptive multicut and the pure multicut methods.
4.2 Experiment 2: Adaptive Multicut Aggregation with Varying Threshold δ
In this experiment we varied the redundancy threshold δ in addition to varying
agg-max. The parameter δ allows for aggregating cuts in the master program whose ratio of
the number of iterations when the cut is not tight (redundant) to the total number of
iterations since the cut was generated exceeds δ. The aim of this experiment was to study
the performance of the method relative to the redundancy threshold δ. We arbitrarily
chose values of δ between 0.5 and 1.0 with a fixed 'warm up' period of 10 iterations.
[Plot: % reduction in CPU time vs. agg-max for sample sizes 1000, 2000, and 3000.]

Figure 1: Adaptive multicut aggregation % CPU improvement over regular L-shaped
method for 20TERM
[Plot: % reduction in CPU time vs. agg-max for sample sizes 1000, 2000, and 3000.]

Figure 2: Adaptive multicut aggregation % CPU improvement over regular L-shaped
method for SSN
[Plot: % reduction in CPU time vs. agg-max for sample sizes 1000, 2000, and 3000.]

Figure 3: Adaptive multicut aggregation % CPU improvement over regular L-shaped
method for STORM
Figures 4, 5 and 6 show results for δ = 0.5, 0.6 and 0.7. The results seem to
indicate that the redundancy threshold is not an important factor in determining the
CPU time for the adaptive multicut method.
4.3 Experiment 3: Static Multicut Aggregation
The aim of the third experiment was to study the performance of static partial cut
aggregation relative to the single cut and the pure multicut methods. We arbitrarily set
the level of aggregation a priori and fixed it throughout the algorithm. Thus the sample
space was partitioned during the initialization step of the algorithm, and the number
of cuts added to the master problem at each iteration remained fixed. We varied the
aggregation level from 'no aggregation' to 'total aggregation'. Under 'no aggregation'
the number of aggregates is equal to the sample size, which is the pure multicut method.
We performed pure multicut only for sample size 1000, due to the large amount of
memory required for the larger samples. 'Total aggregation' corresponds to the single
cut L-shaped method. All static cut aggregations were arbitrarily made in a round-robin
manner, that is, for a fixed number of aggregates m, cuts 1, m + 1, 2m + 1, ... were
assigned to aggregate 1, cuts 2, m + 2, 2m + 2, ... to aggregate 2, and
so on.
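The round-robin partition described above is a one-liner; a minimal sketch (0-based indices, names illustrative):

```python
def round_robin_partition(n_scenarios, n_aggregates):
    """Assign scenario i to aggregate i mod m, so scenarios 0, m, 2m, ...
    land in aggregate 0, scenarios 1, m+1, 2m+1, ... in aggregate 1, etc."""
    return [list(range(a, n_scenarios, n_aggregates))
            for a in range(n_aggregates)]

parts = round_robin_partition(10, 3)
```

Any partition of equal quality would do here; round-robin simply spreads the sampled scenarios evenly across aggregates without inspecting the data.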
The computational results are summarized in the plots in Figures 7, 8, and 9. In the
plots, '% reduction in CPU time' is relative to the single cut L-shaped method, which acted
as the benchmark. The results in Figure 7 show that a level of aggregation between single
[Plot: % reduction in CPU time vs. agg-max for δ = 0.5, 0.6, and 0.7.]

Figure 4: Adaptive multicut aggregation with δ % CPU improvement over regular L-shaped
method for 20TERM
[Plot: % reduction in CPU time vs. agg-max for δ = 0.5, 0.6, and 0.7.]

Figure 5: Adaptive multicut aggregation with δ % CPU improvement over regular L-shaped
method for SSN
[Plot: % reduction in CPU time vs. agg-max for δ = 0.5, 0.6, and 0.7.]

Figure 6: Adaptive multicut aggregation with δ % CPU improvement over regular L-shaped
method for STORM
cut and pure multicut for 20TERM results in about a 60% reduction in CPU time over
the L-shaped method for all the instances. For SSN (Figure 8) improvements of over 90%
are achieved, while for STORM (Figure 9) no significant improvement (just over 10%) is
observed. The results also show that increasing the number of aggregates decreases the
number of iterations, but increases the size of the master problem and the time required
to solve it. These results amply demonstrate that using an aggregation level somewhere
between the single cut and the pure multicut L-shaped methods results in better CPU
time. From a practical point of view, however, a good level of cut aggregation is not
known a priori. Therefore, the adaptive multicut method provides a viable approach.
[Plot: % reduction in CPU time vs. number of aggregates for sample sizes 1000, 2000, and 3000.]

Figure 7: Static multicut aggregation % CPU improvement over regular L-shaped method
for 20TERM
[Plot: % reduction in CPU time vs. number of aggregates for sample sizes 1000, 2000, and 3000.]

Figure 8: Static multicut aggregation % CPU improvement over regular L-shaped method
for SSN
[Plot: % reduction in CPU time vs. number of aggregates for sample sizes 3000, 5000, and 10000.]

Figure 9: Static multicut aggregation % CPU improvement over regular L-shaped method
for STORM
5 Conclusion
Traditionally, the single cut and multicut L-shaped methods are the algorithms of choice
for stochastic linear programs. In this paper, we introduced an adaptive multicut method
that generalizes the single cut and multicut methods. The method dynamically adjusts
the aggregation level of the optimality cuts in the master program. A generic framework
allowing different implementations with different aggregation rules was developed
and implemented. The computational results based on large scale instances from the
literature reveal that the proposed method can perform significantly faster than the
standard techniques. The adaptive multicut method also has potential to be applied to
decomposition-based stochastic linear programming algorithms. These include, for example,
scalable sampling methods such as the sample average approximation method (Shapiro, 2000),
the trust-region method (Linderoth and Wright, 2003), and the stochastic decomposition
method (Higle and Sen, 1991, 1996, Higle and Zhao, 2007).
Acknowledgments. This research was supported by grants CNS-0540000, CMII-0355433
and CMII-0620780 from the National Science Foundation.
Appendix: Numerical Results
Table 2: Adaptive multicut aggregation for 20TERM

Sample  agg-      CPU     Min     Aver     Max    % CPU
Size    max       sec     Iters   Iters    Iters  Reduction
1000      10    403.02     281    301.8     316     67.00
1000      20    395.43     311    359.8     396     67.62
1000      30    523.26     397    460.8     491     57.15
1000      40    569.79     499    521.2     533     53.34
1000      50    586.45     499    545.4     581     51.98
1000      60    567.58     482    551.4     603     53.52
1000      70    651.05     567    616.0     710     46.69
1000      80    662.77     585    631.2     664     45.73
1000      90    694.13     632    658.2     693     43.16
1000     100    669.08     528    658.4     757     45.21
1000     200    840.73     775    852.4     914     31.15
1000     300    921.68     890    914.2     954     24.53
1000     500   1111.70    1071   1139.6    1267      8.96
1000    1000   1272.22     983   1282.6    1491     -4.18
2000      10    803.99     262    285.6     300     62.93
2000      20    854.15     307    365.8     393     60.62
2000      30    955.18     409    429.8     461     55.96
2000      40    947.60     407    449.0     483     56.31
2000      50   1143.19     512    520.0     535     47.29
2000      60   1156.10     525    544.6     560     46.70
2000      70   1247.86     546    583.4     652     42.47
2000      80   1279.83     501    604.4     669     40.99
2000      90   1360.74     589    638.4     668     37.26
2000     100   1352.84     615    639.8     674     37.63
2000     200   1551.35     671    773.0     888     28.47
2000     300   1734.43     761    852.8     947     20.03
2000     500   2073.32     949   1014.2    1067      4.41
2000    1000   2240.14    1159   1224.6    1298     -3.28
3000      10   1350.50     244    259.8     281     66.39
3000      20   1211.21     305    327.2     362     69.86
3000      30   1415.92     391    404.8     423     64.76
3000      40   1469.87     400    443.2     472     63.42
3000      50   1643.93     446    476.8     504     59.09
3000      60   1682.43     448    496.0     523     58.13
3000      70   1869.92     533    556.4     574     53.47
3000      80   1590.16     480    513.8     561     60.43
3000      90   1768.80     511    556.4     591     55.98
3000     100   1881.62     530    592.4     626     53.17
3000     200   1852.93     600    632.6     667     53.89
3000     300   2418.09     836    845.4     856     39.82
Table 3: Adaptive multicut aggregation for SSN

Sample  agg-      CPU     Min     Aver     Max    % CPU
Size    max       sec     Iters   Iters    Iters  Reduction
1000      10     81.23      60     61.8      64     95.88
1000      20    104.93      93     96.4     104     94.67
1000      30    135.92     131    135.8     139     93.10
1000      40    169.87     169    177.6     189     91.37
1000      50    202.05     212    218.6     229     89.74
1000      60    238.27     257    260.6     267     87.90
1000      70    258.44     276    287.8     303     86.88
1000      80    290.35     313    328.6     347     85.26
1000      90    318.80     341    362.6     376     83.81
1000     100    351.89     375    408.8     447     82.13
1000     200    662.72     751    789.2     851     66.35
1000     300    800.38     892    965.0    1021     59.36
1000     500   1295.09    1554   1646.4    1749     34.24
1000    1000   2006.02    2611   2806.8    3062     -1.85
2000      10    150.05      54     55.4      57     96.25
2000      20    180.33      80     81.6      83     95.50
2000      30    216.47     101    106.2     113     94.59
2000      40    254.05     129    131.8     133     93.65
2000      50    292.41     150    156.4     160     92.70
2000      60    332.53     173    181.6     193     91.70
2000      70    376.76     199    209.4     221     90.59
2000      80    422.19     232    236.8     241     89.46
2000      90    451.73     250    257.0     267     88.72
2000     100    501.13     276    288.6     314     87.48
2000     200    903.76     514    533.0     548     77.43
2000     300   1140.26     685    714.8     736     71.52
2000     500   1653.71    1064   1111.2    1176     58.70
2000    1000   2551.07    1714   1772.6    1854     36.29
3000      10    214.87      48     49.6      51     96.48
3000      20    249.74      70     72.0      73     95.91
3000      30    294.30      93     95.8      98     95.18
3000      40    339.46     112    116.0     119     94.44
3000      50    377.86     129    132.8     142     93.81
3000      60    449.29     157    162.6     170     92.65
3000      70    479.50     170    176.6     182     92.15
3000      80    507.89     187    189.8     195     91.69
3000      90    561.22     207    213.0     217     90.81
3000     100    621.87     223    236.4     250     89.82
3000     200    998.69     383    398.0     424     83.65
3000     300   1412.60     558    613.4     650     76.88
Table 4: Adaptive multicut aggregation for STORM

Sample  agg-      CPU     Min     Aver     Max    % CPU
Size    max       sec     Iters   Iters    Iters  Reduction
1000      10     21.62      18     18.4      19   -230.28
1000      20     21.07      18     18.4      19   -221.82
1000      30     21.27      18     18.4      19   -224.88
1000      40     21.20      18     18.4      19   -223.85
1000      50     21.28      18     18.4      19   -225.09
1000      60     20.82      18     18.4      19   -217.95
1000      70     20.87      18     18.4      19   -218.72
1000      80     21.08      18     18.4      19   -221.97
1000      90     20.65      18     18.4      19   -215.40
1000     100     20.87      18     18.4      19   -218.75
1000     200     20.79      18     18.4      19   -217.62
1000     300     20.61      18     18.4      19   -214.81
1000     500     20.65      18     18.4      19   -215.39
1000    1000     20.61      18     18.4      19   -214.75
2000      10     21.56      17     17.0      17    -62.53
2000      20     21.47      17     17.0      17    -61.85
2000      30     21.36      17     17.0      17    -61.08
2000      40     21.39      17     17.0      17    -61.26
2000      50     21.43      17     17.0      17    -61.57
2000      60     21.61      17     17.0      17    -62.91
2000      70     21.43      17     17.0      17    -61.57
2000      80     21.39      17     17.0      17    -61.29
2000      90     21.42      17     17.0      17    -61.50
2000     100     21.41      17     17.0      17    -61.45
2000     200     21.41      17     17.0      17    -61.40
2000     300     21.37      17     17.0      17    -61.14
2000     500     21.39      17     17.0      17    -61.31
2000    1000     21.41      17     17.0      17    -61.45
3000      10     48.23      19     19.8      20   -138.26
3000      20     48.57      19     19.8      20   -139.98
3000      30     48.86      19     19.8      20   -141.37
3000      40     48.22      19     19.8      20   -138.25
3000      50     47.46      19     19.8      20   -134.47
3000      60     48.24      19     19.8      20   -138.31
3000      70     48.12      19     19.8      20   -137.72
3000      80     47.62      19     19.8      20   -135.28
3000      90     48.40      20     20.0      20   -139.11
3000     100     46.97      19     19.8      20   -132.06
3000     200     46.90      20     20.0      20   -131.69
3000     300     46.09      20     20.0      20   -127.71
3000     500     46.06      20     20.0      20   -127.57
3000    1000     46.11      20     20.0      20   -127.82
Table 5: Adaptive multicut aggregation with redundancy threshold δ

                     CPU Time (secs.)            Aver Iters            % CPU Reduction
Problem  agg-max   δ=0.5   δ=0.6   δ=0.7     δ=0.5  δ=0.6  δ=0.7    δ=0.5   δ=0.6   δ=0.7
20term      10    407.01  402.15  384.90     308.4  304.6  297.4    68.11   68.49   69.84
20term      20    450.09  434.86  470.50     386.6  380.2  398.8    64.74   65.93   63.14
20term      30    522.31  543.72  490.48     461.8  469.2  452.0    59.08   57.40   61.57
20term      40    593.87  596.54  564.78     538.8  539.8  515.8    53.47   53.26   55.75
20term      50    512.02  518.50  612.98     511.8  514.4  565.8    59.88   59.38   51.97
20term      60    652.42  629.03  620.88     606.6  591.0  584.2    48.88   50.72   51.36
20term      70    630.61  634.00  603.24     602.6  603.2  590.6    50.59   50.33   52.74
20term      80    618.35  630.86  619.25     614.8  622.2  608.2    51.55   50.57   51.48
20term      90    551.51  588.42  666.14     582.0  602.2  650.2    56.79   53.90   47.81
20term     100    723.23  708.96  738.51     706.8  694.4  702.6    43.34   44.45   42.14
ssn         10     78.09   78.63   78.75      59.8   60.4  134.4    95.83   95.80   95.79
ssn         20    106.49  107.55  107.27      98.6   99.4  128.0    94.31   94.25   94.27
ssn         30    132.61  134.48  135.52     132.4  134.4  189.8    92.91   92.81   92.76
ssn         40    171.78  171.77  172.50     180.2  180.4  197.0    90.82   90.82   90.78
ssn         50    202.26  202.95  202.36     218.6  218.8  206.8    89.19   89.16   89.19
ssn         60    237.44  235.90  235.76     262.2  259.4  223.4    87.31   87.39   87.40
ssn         70    267.75  264.22  262.49     296.4  293.8  225.0    85.69   85.88   85.97
ssn         80    301.38  298.50  298.74     339.0  336.2  236.2    83.90   84.05   84.04
ssn         90    315.24  323.30  318.39     360.2  366.8  266.4    83.16   82.72   82.99
ssn        100    354.43  361.52  358.93     412.4  420.2  402.8    81.06   80.68   80.82
storm       10     11.76   10.87   11.15      21.4   18.4   18.4   -92.16  -77.61  -82.19
storm       20     10.90   10.54   10.67      19.2   18.4   18.4   -78.10  -72.22  -74.35
storm       30     10.81   10.34   10.57      19.0   18.4   18.4   -76.63  -68.95  -72.71
storm       40     11.25   10.32   10.47      21.0   18.4   18.4   -83.82  -68.63  -71.08
storm       50     11.20   10.30   10.46      20.8   18.4   18.4   -83.01  -68.30  -70.92
storm       60     11.72   10.32   10.48      22.8   18.4   18.4   -91.50  -68.63  -71.24
storm       70     11.81   10.29   10.41      23.0   18.4   18.4   -92.97  -68.14  -70.10
storm       80     11.76   10.30   10.47      22.8   18.4   18.4   -92.16  -68.30  -71.08
storm       90     11.64   10.20   10.49      22.2   18.0   18.4   -90.20  -66.67  -71.41
storm      100     11.94   10.29   10.42      23.2   18.4   18.4   -95.10  -68.14  -70.26
Table 6: Static multicut aggregation for 20TERM
Sample  Num   CPU       Min    Aver    Max    % CPU
Size    aggs  (sec)     Iters  Iters   Iters  Reduction
1000    1     1221.17   1161   1304.2  1400   0.00
1000    5     955.19    988    1003.0  1023   21.78
1000    10    724.08    631    734.4   789    40.71
1000    25    605.16    515    569.0   608    50.44
1000    50    465.67    385    411.4   431    61.87
1000    100   423.27    311    320.2   327    65.34
1000    200   345.65    191    221.4   244    71.69
1000    500   639.72    149    160.2   167    47.61
1000    1000  1681.18   127    138.8   146    -37.67
2000    1     2168.89   1082   1232.6  1480   0.00
2000    5     1804.45   830    976.2   1054   16.80
2000    10    1687.90   666    853.6   919    22.18
2000    25    1294.14   611    643.0   684    40.33
2000    50    1018.93   448    483.8   523    53.02
2000    100   844.92    350    372.8   388    61.04
2000    200   781.94    241    283.6   302    63.95
2000    500   1037.75   182    194.4   201    52.15
2000    1000  2351.56   139    154.0   166    -8.42
2000    2000  12065.45  114    126.8   136    -456.30
3000    1     4018.32   1184   1426.2  1672   0.00
3000    5     2765.13   896    1007.0  1176   31.19
3000    10    2456.81   781    878.6   950    38.86
3000    25    2004.51   615    682.8   764    50.12
3000    50    1702.02   516    547.2   574    57.64
3000    100   1370.96   332    419.8   474    65.88
3000    200   1277.18   261    327.0   360    68.22
3000    500   1525.36   206    228.0   248    62.04
3000    1000  3728.50   172    180.5   195    7.21
3000    2000  14920.00  141    147.8   150    -271.30
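The "% CPU Reduction" columns measure the time saved relative to the single cut run (Num aggs = 1) at the same sample size; negative values mean the aggregated run was slower than the baseline. A minimal Python sketch of that calculation (the helper name is ours, not from the paper), checked against the 20TERM rows for sample size 1000:

```python
def cpu_reduction(single_cut_time: float, aggregated_time: float) -> float:
    """Percent CPU time saved relative to the single-cut baseline run."""
    return 100.0 * (single_cut_time - aggregated_time) / single_cut_time

# 20TERM, sample size 1000: baseline 1221.17 sec with a single cut.
print(round(cpu_reduction(1221.17, 955.19), 2))   # 5 aggregates    -> 21.78
print(round(cpu_reduction(1221.17, 1681.18), 2))  # pure multicut   -> -37.67
```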
Table 7: Static multicut aggregation for SSN
Sample  Num   CPU      Min    Aver    Max    % CPU
Size    aggs  (sec)    Iters  Iters   Iters  Reduction
1000    1     1969.52  2569   2767.0  2961   0.00
1000    5     634.58   785    811.4   850    67.78
1000    10    346.65   403    429.0   456    82.40
1000    25    167.07   180    189.8   197    91.52
1000    50    96.41    99     102.6   109    95.10
1000    100   68.21    62     63.6    65     96.54
1000    200   56.99    43     43.8    45     97.11
1000    500   58.43    29     30.2    31     97.03
1000    1000  76.77    24     25.2    27     96.10
2000    1     4004.00  2690   2856.0  2980   0.00
2000    5     1414.22  912    939.0   968    64.68
2000    10    840.35   504    536.6   551    79.01
2000    25    415.28   240    247.0   258    89.63
2000    50    246.87   135    137.0   139    93.83
2000    100   172.25   84     85.0    87     95.70
2000    200   139.63   57     58.0    59     96.51
2000    500   132.74   37     38.0    40     96.68
2000    1000  149.73   29     29.0    29     96.26
2000    2000  229.89   24     24.6    26     94.26
3000    1     6109.21  2693   2935.0  3236   0.00
3000    5     2349.93  983    1059.6  1118   61.53
3000    10    1418.19  572    616.2   648    76.79
3000    25    719.81   280    292.6   302    88.22
3000    50    425.63   158    162.2   167    93.03
3000    100   289.04   98     100.0   103    95.27
3000    200   229.88   67     67.2    68     96.24
3000    500   206.18   41     42.4    44     96.62
3000    1000  239.42   31     32.8    34     96.08
3000    2000  361.58   26     26.8    27     94.08
3000    3000  542.95   24     24.4    25     91.11
Table 8: Static multicut aggregation for STORM
Sample  Num   CPU     Min    Aver   Max    % CPU
Size    aggs  (sec)   Iters  Iters  Iters  Reduction
1000    1     6.54    20     21.8   24     0.00
1000    5     6.50    20     22.0   24     0.61
1000    10    5.84    18     20.2   22     10.71
1000    25    5.72    17     19.4   21     12.50
1000    50    6.52    20     21.4   23     0.28
1000    100   6.09    19     19.0   19     6.85
1000    200   6.69    18     19.2   22     -2.33
1000    500   8.63    17     17.0   17     -31.87
1000    1000  16.43   18     18.4   19     -151.03
2000    1     13.26   20     22.2   24     0.00
2000    5     13.73   22     23.0   24     -3.54
2000    10    12.87   18     22.4   24     2.92
2000    25    12.31   18     21.2   23     7.16
2000    50    12.91   18     21.0   23     2.61
2000    100   12.16   19     20.0   22     8.25
2000    200   12.42   19     19.0   19     6.31
2000    500   18.04   19     20.6   21     -36.05
2000    1000  21.11   17     17.0   17     -59.24
3000    1     20.24   20     22.0   24     0.00
3000    5     18.28   18     21.8   24     9.68
3000    10    18.69   19     21.8   24     7.62
3000    25    17.97   18     20.8   24     11.18
3000    50    19.76   18     22.0   24     2.38
3000    100   20.75   20     22.2   24     -2.52
3000    200   18.27   19     19.4   21     9.70
3000    500   21.68   17     18.6   19     -7.13
3000    1000  32.45   19     19.8   20     -60.32
3000    2000  56.82   18     18.8   19     -180.74
3000    3000  91.86   18     18.8   19     -353.83
5000    1     35.35   23     23.2   24     0.00
5000    5     33.13   18     22.6   24     6.28
5000    10    33.68   20     23.0   24     4.74
5000    25    32.20   19     22.0   24     8.92
5000    50    29.78   19     20.8   23     15.76
5000    100   32.49   19     21.0   23     8.09
5000    200   33.90   19     21.8   23     4.08
5000    500   33.80   19     19.0   19     4.41
5000    1000  47.47   19     20.6   22     -34.27
5000    2000  78.36   18     19.6   21     -121.63
5000    5000  229.00  18     18.6   19     -547.67
10000   1     70.55   23     23.2   24     0.00
10000   5     70.35   23     23.2   24     0.28
10000   10    70.67   22     23.4   24     -0.17
10000   25    71.60   23     23.8   24     -1.48
10000   50    69.74   23     23.0   23     1.15
10000   100   51.11   18     18.8   19     27.55
10000   200   64.06   19     20.6   22     9.20
10000   500   68.81   19     20.2   22     2.47
10000   1000  74.46   19     19.0   19     -5.55
10000   2000  114.06  20     20.8   21     -61.67
References
Benders, J. F.: 1962, Partitioning procedures for solving mixed-variables programming
problems, Numerische Mathematik 4, 238–252.
Birge, J.: 1985, Aggregation bounds in stochastic linear programming, Mathematical
Programming 31, 25–41.
Birge, J.: 1988, A multicut algorithm for two-stage stochastic linear programs, European
Journal of Operational Research 34, 384–392.
Birge, J. R. and Louveaux, F. V.: 1997, Introduction to Stochastic Programming,
Springer, New York.
Gassmann, H. I.: 1990, MSLiP: a computer code for the multistage stochastic linear
programming problem, Mathematical Programming 47, 407–423.
Higle, J. and Sen, S.: 1991, Stochastic decomposition: An algorithm for two-stage
stochastic linear programs with recourse, Mathematics of Operations Research
16, 650–669.
Higle, J. and Sen, S.: 1996, Stochastic Decomposition: A Statistical Method for Large
Scale Stochastic Linear Programming, Kluwer Academic Publishers, Dordrecht.
Higle, J. and Zhao, L.: 2007, Adaptive and nonadaptive samples in solving stochastic
linear programs: A computational investigation, Submitted to Computational
Optimization and Applications.
ILOG: 2003, CPLEX 9.0 Reference Manual, ILOG CPLEX Division.
Linderoth, J. T. and Wright, S. J.: 2003, Decomposition algorithms for stochastic
programming on a computational grid, Computational Optimization and Applications
24, 207–250.
Mak, W. K., Morton, D. and Wood, R.: 1999, Monte Carlo bounding techniques
for determining solution quality in stochastic programs, Operations Research Letters
24, 47–56.
Mulvey, J. M. and Ruszczynski, A.: 1992, A new scenario decomposition method for large
scale stochastic optimization, Technical Report SOR-91-19, Department of Civil
Engineering and Operations Research, Princeton University, Princeton, NJ 08544.
Ruszczynski, A.: 1988, A regularized decomposition method for minimizing a sum of
polyhedral functions, Mathematical Programming 35, 309–333.
Sen, S., Doverspike, R. and Cosares, S.: 1994, Network planning with random demand,
Telecommunication Systems 3, 11–30.
Shapiro, A.: 2000, Stochastic programming by Monte Carlo simulation methods,
Available at http://www2.isye.gatech.edu/∼ashapiro/publications.html.
Van Slyke, R. and Wets, R. J.-B.: 1969, L-shaped linear programs with applications to
optimal control and stochastic programming, SIAM Journal on Applied Mathematics
17, 638–663.