European Journal of Operational Research 206 (2010) 395–406
doi:10.1016/j.ejor.2010.02.025
Adaptive multicut aggregation for two-stage stochastic linear programs
with recourse
Svyatoslav Trukhanov a, Lewis Ntaimo a,*, Andrew Schaefer b
a Department of Industrial and Systems Engineering, Texas A&M University, 3131 TAMU, College Station, TX 77843, USA
b Department of Industrial Engineering, 1048 Benedum Hall, University of Pittsburgh, Pittsburgh, PA 15261, USA
* Corresponding author. E-mail addresses: slavik@tamu.edu (S. Trukhanov), ntaimo@tamu.edu (L. Ntaimo), schaefer@ie.pitt.edu (A. Schaefer).
Article info
Article history:
Received 21 January 2009
Accepted 18 February 2010
Available online 21 February 2010
Keywords:
Stochastic programming
Multicut aggregation
Adaptive cuts
Abstract
Outer linearization methods for two-stage stochastic linear programs with recourse, such as the L-shaped algorithm, generally apply a single optimality cut on the nonlinear objective at each major iteration, while the multicut version of the algorithm allows for several cuts to be placed at once. In general, the L-shaped algorithm tends to have more major iterations than the multicut algorithm. However, the trade-offs in terms of computational time are problem dependent. This paper investigates the computational trade-offs of adjusting the level of optimality cut aggregation from single cut to pure multicut. Specifically, an adaptive multicut algorithm that dynamically adjusts the aggregation level of the optimality cuts in the master program is presented and tested on standard large-scale instances from the literature. Computational results reveal that a cut aggregation level between the single cut and the multicut can result in substantial computational savings over the single cut method.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction
Outer linearization methods such as the L-shaped algorithm
(Van Slyke and Wets, 1969) for stochastic linear programming
(SLP) problems with recourse generally apply a single cut on the
nonlinear objective at each major iteration, while the multicut version of the algorithm (Birge and Louveaux, 1988) allows for cuts up
to the number of outcomes (scenarios) to be placed at once. In general, the L-shaped algorithm tends to have more major iterations
than the multicut algorithm. However, the trade-offs in terms of
computational time may be problem dependent. This paper generalizes these two approaches through an adaptive multicut algorithm that dynamically adjusts the level of aggregation of the
optimality cuts in the master program during the course of the algorithm. Results of our computational investigation suggest that for
certain classes of problems the adaptive multicut provides better
performance than the traditional single cut and multicut methods.
The relative advantages of the single cut and multicut methods have already been explored (Birge and Louveaux, 1988, 1990). Citing earlier work (Birge and Louveaux, 1988, 1990), Birge and Louveaux (1997) gave the rule of thumb that the multicut method is preferable when the number of scenarios is not much larger than the dimension of the first stage decision space. Birge (1985) used variable aggregation to reduce the problem size and to find upper and lower bounds on the optimal solution. Ruszczyński
(1988) considered adding a regularized term to the objective function to stabilize the master problem and reported computational
results showing that the method runs efficiently despite solving
a quadratic problem instead of a linear one. The number of iterations that required extensive computations decreased for the instances considered.
More recent work related to modified versions of the L-shaped method examined serial and asynchronous versions of the L-shaped method and a trust-region method (Linderoth and Wright, 2003). The algorithms were implemented on a grid computing platform that allowed the authors to solve large-scale problem instances from the literature. These include SSN (Sen et al., 1994) with 10^5 scenarios and STORM (Mulvey and Ruszczyński, 1995) with 10^7 scenarios. The authors reported CPU time improvements of up to 83.5% for instances of SSN with 10^4 scenarios.
The contributions of this paper include an adaptive optimality
multicut method for SLP, an implementation of the algorithm in
C++ using the CPLEX Callable Library (ILOG, 2003) and a computational investigation with instances from the literature demonstrating that changing the level of optimality cut aggregation in the
master problem during the course of the algorithm can lead to significant improvement in solution time. The results of this work also
have potential to influence the design and implementation of future algorithms, especially those based on Benders decomposition
(Benders, 1962). For example, decomposition methods based on the sample average approximation method (Shapiro, 2003), the trust-region method (Linderoth and Wright, 2003) and the stochastic decomposition method (Higle and Sen, 1991) may benefit from the adaptive multicut approach.
Table 1
Problem characteristics.

Problem   First stage rows   First stage cols   Second stage rows   Second stage cols   No. of random variables   No. of scenarios
20TERM    3                  63                 124                 764                 40                        1.09 × 10^12
SSN       1                  89                 175                 706                 86                        1.02 × 10^70
STORM     59                 128                526                 1259                8                         3.91 × 10^5
[Fig. 1. Adaptive multicut aggregation % CPU improvement over regular L-shaped method for 20TERM; x-axis: agg-max (10 to 1000), y-axis: % reduction in CPU time, for sample sizes 1000, 2000 and 3000.]
[Fig. 2. Adaptive multicut aggregation % CPU improvement over regular L-shaped method for SSN; x-axis: agg-max (10 to 1000), y-axis: % reduction in CPU time, for sample sizes 1000, 2000 and 3000.]
[Fig. 3. Adaptive multicut aggregation with δ: % CPU improvement over regular L-shaped method for 20TERM; x-axis: agg-max (10 to 100), y-axis: % reduction in CPU time, for δ = 0.5, 0.6, 0.7.]
[Fig. 4. Adaptive multicut aggregation with δ: % CPU improvement over regular L-shaped method for SSN; x-axis: agg-max (10 to 100), y-axis: % reduction in CPU time, for δ = 0.5, 0.6, 0.7.]
Furthermore, the adaptive optimality multicut method we propose is easier to implement than the trust-region method and provides significant improvement in computational time for the instance classes we considered.
The rest of this paper is organized as follows. The next section
gives the problem statement and a motivation for the adaptive
multicut approach. A formal description of the method is given
in Section 3. Computational results are reported in Section 4. The
paper ends with some concluding remarks in Section 5.
2. An adaptive multicut approach
A two-stage stochastic linear program with fixed recourse in the extensive form can be given as follows:

$$\text{Min } c^\top x + \sum_{s \in S} p_s q_s^\top y_s \tag{1a}$$
$$\text{s.t. } Ax = b, \tag{1b}$$
$$T_s x + W y_s \ge r_s, \quad \forall s \in S, \tag{1c}$$
$$x \ge 0, \; y_s \ge 0, \quad \forall s \in S, \tag{1d}$$

where $x \in \mathbb{R}^{n_1}$ is the vector of first stage decision variables with corresponding cost vector $c \in \mathbb{R}^{n_1}$; $A \in \mathbb{R}^{m_1 \times n_1}$ is the first stage constraint matrix; and $b \in \mathbb{R}^{m_1}$ is the first stage right-hand side vector. Additionally, $S$ is the set of scenarios (outcomes) and, for each scenario $s \in S$, $y_s \in \mathbb{R}^{n_2}$ is the vector of second stage decision variables; $p_s$ is the probability of occurrence of scenario $s$; $q_s \in \mathbb{R}^{n_2}$ is the second stage cost vector; $r_s \in \mathbb{R}^{m_2}$ is the second stage right-hand side vector; $T_s \in \mathbb{R}^{m_2 \times n_1}$ is the technology matrix; and $W \in \mathbb{R}^{m_2 \times n_2}$ is the recourse matrix.

[Fig. 5. Static multicut aggregation % CPU improvement over regular L-shaped method for 20TERM; x-axis: number of aggregates (1 to 1000), y-axis: % reduction in CPU time, for sample sizes 1000, 2000 and 3000.]

[Fig. 6. Static multicut aggregation % CPU improvement over regular L-shaped method for SSN; x-axis: number of aggregates (1 to 1000), y-axis: % reduction in CPU time, for sample sizes 1000, 2000 and 3000.]
To set the ground for the adaptive multicut method, we first make some observations from a comparison of the single cut L-shaped method and the multicut method in the context of problem (1). In general, the single cut L-shaped method tends to suffer 'information loss' due to the aggregation of all the scenario dual information into one optimality cut in the master problem at each major iteration of the method. On the other hand, the multicut method uses all scenario dual information and places an optimality cut for each scenario in the master problem. Thus the L-shaped method generally requires more major iterations than the multicut method. However, at iteration index $k$ the master problem in the multicut method grows much faster and is $(m_1 + k|S|) \times (n_1 + |S|)$ in size in the worst case, as compared to $(m_1 + k) \times (n_1 + 1)$ in the L-shaped method. Consequently, the multicut method is expected to have increased solution time when $|S|$ is very large.
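For a concrete sense of this gap, consider a worked comparison (our illustration) using the 20TERM dimensions from Table 1 ($m_1 = 3$, $n_1 = 63$), a sample of $|S| = 1000$ scenarios, and $k = 100$ major iterations:

$$(m_1 + k|S|) \times (n_1 + |S|) = (3 + 100{,}000) \times 1063 \approx 1.1 \times 10^8$$

constraint-matrix entries in the worst case for the multicut master, versus

$$(m_1 + k) \times (n_1 + 1) = 103 \times 64 = 6592$$

for the single cut master.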
[Fig. 7. Static multicut aggregation % CPU improvement over regular L-shaped method for STORM; x-axis: number of aggregates (1 to 3000/5000), y-axis: % reduction in CPU time (down to about −600%), for sample sizes 3000, 5000 and 10,000.]
Table 2
Adaptive multicut aggregation for 20TERM.

Sample size   Agg-max   CPU seconds   Min iters   Aver iters   Max iters   % CPU reduction
1000          10        403.02        281         301.8        316         67.00
1000          20        395.43        311         359.8        396         67.62
1000          30        523.26        397         460.8        491         57.15
1000          40        569.79        499         521.2        533         53.34
1000          50        586.45        499         545.4        581         51.98
1000          60        567.58        482         551.4        603         53.52
1000          70        651.05        567         616.0        710         46.69
1000          80        662.77        585         631.2        664         45.73
1000          90        694.13        632         658.2        693         43.16
1000          100       669.08        528         658.4        757         45.21
1000          200       840.73        775         852.4        914         31.15
1000          300       921.68        890         914.2        954         24.53
1000          500       1111.70       1071        1139.6       1267        8.96
1000          1000      1272.22       983         1282.6       1491        4.18
2000          10        803.99        262         285.6        300         62.93
2000          20        854.15        307         365.8        393         60.62
2000          30        955.18        409         429.8        461         55.96
2000          40        947.60        407         449.0        483         56.31
2000          50        1143.19       512         520.0        535         47.29
2000          60        1156.10       525         544.6        560         46.70
2000          70        1247.86       546         583.4        652         42.47
2000          80        1279.83       501         604.4        669         40.99
2000          90        1360.74       589         638.4        668         37.26
2000          100       1352.84       615         639.8        674         37.63
2000          200       1551.35       671         773.0        888         28.47
2000          300       1734.43       761         852.8        947         20.03
2000          500       2073.32       949         1014.2       1067        4.41
2000          1000      2240.14       1159        1224.6       1298        3.28
3000          10        1350.50       244         259.8        281         66.39
3000          20        1211.21       305         327.2        362         69.86
3000          30        1415.92       391         404.8        423         64.76
3000          40        1469.87       400         443.2        472         63.42
3000          50        1643.93       446         476.8        504         59.09
3000          60        1682.43       448         496.0        523         58.13
3000          70        1869.92       533         556.4        574         53.47
3000          80        1590.16       480         513.8        561         60.43
3000          90        1768.80       511         556.4        591         55.98
3000          100       1881.62       530         592.4        626         53.17
3000          200       1852.93       600         632.6        667         53.89
3000          300       2418.09       836         845.4        856         39.82
Our goal is to devise an algorithm that performs better than the single cut L-shaped and the multicut methods. The insights behind our approach are to (1) use more information from the subproblems, which means adding more than one cut at each major iteration; and (2) keep the size of the master problem relatively small, which requires limiting the number of cuts added to the master problem. These insights lead us to consider a multicut method with 'partial' cut aggregation. Such an algorithm requires partitioning the sample space $S$ into $D$ aggregates $\{S_d\}_{d=1}^{D}$ such that $S = S_1 \cup S_2 \cup \cdots \cup S_D$ and $S_i \cap S_j = \emptyset$ for all $i \neq j$. Then at the initialization step of the algorithm, optimality cut variables $\theta_d$ are introduced, one for each aggregate. The optimality cuts are also generated one per aggregate, in the form $(\beta_d^k)^\top x + \theta_d \ge \alpha_d^k$, where $(\beta_d^k)^\top = \sum_{s \in S_d} p_s (\pi_s^k)^\top T_s$ and $\alpha_d^k = \sum_{s \in S_d} p_s (\pi_s^k)^\top r_s$ (with $\pi_s^k$ the optimal dual multipliers of the scenario $s$ subproblem at iteration $k$, as defined in Section 3).
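As a concrete illustration of this cut construction, the following C++ sketch (ours, not the authors' implementation; all type and function names are illustrative) forms the aggregated cut coefficients $(\beta_d, \alpha_d)$ for one aggregate $S_d$ from the scenario dual multipliers:

#include <cstddef>
#include <vector>

// Per-scenario data and optimal subproblem duals at the current iterate.
struct ScenarioDual {
    double p;                            // scenario probability p_s
    std::vector<double> pi;              // dual multipliers pi_s (length m2)
    std::vector<std::vector<double>> T;  // technology matrix T_s (m2 x n1)
    std::vector<double> r;               // right-hand side r_s (length m2)
};

// One optimality cut: beta^T x + theta_d >= alpha.
struct Cut {
    std::vector<double> beta;
    double alpha = 0.0;
};

// Accumulate beta_d^T = sum_{s in S_d} p_s pi_s^T T_s and
// alpha_d = sum_{s in S_d} p_s pi_s^T r_s over the scenarios of aggregate S_d.
Cut aggregateCut(const std::vector<ScenarioDual>& Sd, std::size_t n1) {
    Cut cut;
    cut.beta.assign(n1, 0.0);
    for (const ScenarioDual& s : Sd) {
        for (std::size_t i = 0; i < s.pi.size(); ++i) {
            cut.alpha += s.p * s.pi[i] * s.r[i];
            for (std::size_t j = 0; j < n1; ++j)
                cut.beta[j] += s.p * s.pi[i] * s.T[i][j];
        }
    }
    return cut;
}

Setting $D = |S|$ (one scenario per aggregate) recovers the pure multicut cuts, while $D = 1$ recovers the single aggregated cut of the L-shaped method.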
The multicut and the single cut L-shaped methods are the two extremes of optimality cut aggregation. The multicut L-shaped method has no cut aggregation, which corresponds to the case $D = |S|$ and $S_d = \{s_d\}$, $d = 1, \ldots, |S|$: the expected recourse function value is approximated by having an optimality decision variable for each scenario in the master program. At the other extreme, the single cut L-shaped method has the highest level of cut aggregation, that is, $D = 1$ and $S_1 = S$: the expected recourse function value is approximated by only one optimality decision variable in the master program. According to our numerical results, the optimal runtime is achieved at some intermediate level of aggregation $1 < D < |S|$, but this level is not known a priori. For two out of the three classes of instances we tested, the proposed adaptive multicut method outperformed both the pure single cut and the multicut method.
Our approach allows the optimality cut aggregation to be adjusted dynamically during the course of the algorithm. Thus the master program will have 'adaptive' optimality cut variables, that is, the number of optimality cut variables will change during the course of the algorithm. The goal is to let the algorithm learn more information about the expected recourse function and then settle on a level of aggregation that leads to faster convergence to the optimal solution. Therefore, we propose initializing the algorithm with a 'low' level of cut aggregation and increasing it, if necessary, as more information about the expected recourse function is learned during the course of the algorithm. As the level of cut aggregation increases, the number of optimality cut variables in the master problem decreases. If $|S|$ is not 'very large' one could initialize the algorithm as pure multicut, where an optimality cut variable $\theta_s$ is created for each $s \in S$ in the master program. Otherwise, an initial level of cut aggregation between $1$ and $|S|$ should be chosen. Computer speed and memory will certainly influence the level of cut aggregation, since a low level of cut aggregation generally requires more computer memory.
3. The basic adaptive multicut algorithm

To formalize our approach, at the $k$th iteration let the scenario set partition be $\mathcal{S}(k) = \{S_1, S_2, \ldots, S_{l_k}\}$, where the sets $S_i$, $i = 1, \ldots, l_k$, are disjoint, that is, $S_i \cap S_j = \emptyset$ for all $i \neq j$, and $\cup_{i=1}^{l_k} S_i = S$. Therefore, each scenario $s \in S$ belongs to only one $S_i \in \mathcal{S}(k)$, and the aggregate probability of $S_i$ is defined as $p_{S_i} = \sum_{s \in S_i} p_s$, with $\sum_{i=1}^{l_k} p_{S_i} = \sum_{s \in S} p_s = 1$. Let $F(k)$ denote the set of iteration numbers up to $k$ at which all subproblems are feasible and optimality cuts are generated. Then the master program at major iteration $k$ takes the form:

$$\text{Min } c^\top x + \sum_{d \in \mathcal{S}(k)} \theta_d \tag{2a}$$
$$\text{s.t. } Ax = b, \tag{2b}$$
$$(\beta^t)^\top x \ge \alpha^t, \quad t \in \{1, \ldots, k\} \setminus F(k), \tag{2c}$$
$$(\beta_d^t)^\top x + \theta_d \ge \alpha_d^t, \quad t \in F(k), \; d \in \mathcal{S}(k), \tag{2d}$$
$$x \ge 0, \tag{2e}$$

where constraints (2c) and (2d) are the feasibility and optimality cuts, respectively. We are now in a position to formally state the adaptive multicut algorithm.

Basic Adaptive Multicut Algorithm

Step 0: Initialization.
Set $k \leftarrow 0$, $F(0) \leftarrow \emptyset$, and initialize $\mathcal{S}(0)$.

Step 1: Solve the Master Problem.
Set $k \leftarrow k + 1$ and solve problem (2). Let $(x^k, \{\theta_d^k\}_{d \in \mathcal{S}(k)})$ be an optimal solution to problem (2). If no constraint (2d) is present for some $d \in \mathcal{S}(k)$, $\theta_d^k$ is set equal to $-\infty$ and is ignored in the computation.

Step 2: Update Cut Aggregation Level.
Step 2a: Cut Aggregation.
Generate the partition $\mathcal{S}(k)$ from $\mathcal{S}(k-1)$ based on some aggregation scheme. Each element of $\mathcal{S}(k)$ is a union of some elements of $\mathcal{S}(k-1)$. If, according to the aggregation scheme, $d_1, d_2, \ldots, d_j \in \mathcal{S}(k-1)$ are aggregated into $d \in \mathcal{S}(k)$, then $d = \cup_{i=1}^{j} d_i$ and $p_d = \sum_{i=1}^{j} p_{d_i}$. Master problem (2) is modified by removing the variables $\theta_{d_1}, \ldots, \theta_{d_j}$ and introducing a new variable $\theta_d$.

Step 2b: Update Optimality Cuts.
Update the optimality cut coefficients in master program (2) as follows. For each iteration at which optimality cuts were added, define
$$\alpha_d^t = p_d \sum_{l=1}^{j} (1/p_{d_l}) \, \alpha_{d_l}^t \quad \forall d \in \mathcal{S}(k), \; t \in F(k), \tag{3}$$
and
$$\beta_d^t = p_d \sum_{l=1}^{j} (1/p_{d_l}) \, \beta_{d_l}^t \quad \forall d \in \mathcal{S}(k), \; t \in F(k). \tag{4}$$
For all iterations $t \in F(k)$, replace the cuts corresponding to $d_1, \ldots, d_j$ with the one new cut $(\beta_d^t)^\top x + \theta_d \ge \alpha_d^t$ corresponding to $d$.

Step 3: Solve Scenario Subproblems.
For all $s \in S$ solve:
$$\text{Min } q_s^\top y \tag{5a}$$
$$\text{s.t. } Wy \ge r_s - T_s x^k, \tag{5b}$$
$$y \ge 0. \tag{5c}$$
Let $\pi_s^k$ be the dual multipliers associated with an optimal solution of problem (5). For each $d \in \mathcal{S}(k)$, if
$$\theta_d^k < \sum_{s \in d} p_s (\pi_s^k)^\top (r_s - T_s x^k), \tag{6}$$
define
$$\alpha_d^k = \sum_{s \in d} p_s (\pi_s^k)^\top r_s \tag{7}$$
and
$$(\beta_d^k)^\top = \sum_{s \in d} p_s (\pi_s^k)^\top T_s, \tag{8}$$
and add the corresponding optimality cut $(\beta_d^k)^\top x + \theta_d \ge \alpha_d^k$ to the master program. If condition (6) does not hold for any $d \in \mathcal{S}(k)$, stop: $x^k$ is optimal. Otherwise, if for all $s \in S$ subproblem (5) is optimal, set $F(k) = F(k-1) \cup \{k\}$ and go to Step 1. If problem (5) is infeasible for some $s \in S$, let $\mu_k$ be the associated dual extreme ray and define
$$\alpha^k = \mu_k^\top r_s \tag{9}$$
and
$$(\beta^k)^\top = \mu_k^\top T_s. \tag{10}$$
Go to Step 1.
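To illustrate Step 2b, here is a minimal C++ sketch (ours; names and data layout are illustrative, not the authors' code) of merging the iteration-$t$ cuts of aggregates $d_1, \ldots, d_j$ into the single cut of the new aggregate $d$ according to (3) and (4):

#include <cstddef>
#include <vector>

// One stored optimality cut: beta^T x + theta >= alpha.
struct Cut {
    std::vector<double> beta;
    double alpha = 0.0;
};

// Merge the iteration-t cuts of aggregates d_1,...,d_j, with probabilities
// p[l] = p_{d_l}, into the cut of aggregate d with probability pd, per
// (3)-(4): alpha_d^t = pd * sum_l (1/p[l]) alpha_{d_l}^t, likewise for beta.
Cut mergeCuts(const std::vector<Cut>& cuts, const std::vector<double>& p, double pd) {
    Cut merged;
    merged.beta.assign(cuts.front().beta.size(), 0.0);
    for (std::size_t l = 0; l < cuts.size(); ++l) {
        const double w = pd / p[l];   // weight p_d / p_{d_l}
        merged.alpha += w * cuts[l].alpha;
        for (std::size_t j = 0; j < merged.beta.size(); ++j)
            merged.beta[j] += w * cuts[l].beta[j];
    }
    return merged;
}

In an actual implementation this merge would be applied once per iteration $t \in F(k)$, after which the old cut rows and the variables $\theta_{d_1}, \ldots, \theta_{d_j}$ are removed from the master program.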
Notice that in the Basic Adaptive Multicut algorithm optimality cuts are only generated and added to the master program when all scenario subproblems (5) are feasible. Otherwise, when an infeasible subproblem is encountered, a feasibility cut based on that subproblem is generated and added to the master program. An alternative implementation is to find all infeasible subproblems and generate a feasibility cut from each one to add to the master program. Also, optimality cuts can be generated for all $d \in \mathcal{S}(k)$ such that all subproblems $s \in d$ are feasible. Since convergence of the Basic Adaptive Multicut algorithm follows directly from that of the multicut method, a formal proof of convergence is omitted. An issue that remains to be addressed is the aggregation scheme that one can use to obtain $\mathcal{S}(k)$ from $\mathcal{S}(k-1)$.

There are several possibilities one can think of when it comes to aggregation schemes, but our experience showed that most do not perform well. We tried several schemes and through computational experimentation devised a scheme that worked best. This aggregation scheme involves two basic rules, a redundancy threshold and a bound on the number of aggregates, and can be described as follows:
Aggregation Scheme:

Redundancy Threshold (δ). This rule is based on the observation that inactive (redundant) optimality cuts in the master program contain 'little' information about the optimal solution and can therefore be aggregated without information loss. Consider some iteration $k$ after solving the master problem for some aggregate $d \in \mathcal{S}(k)$. Let $f_d$ be the number of iterations in which all optimality cuts corresponding to $d$ are redundant, that is, the corresponding dual multipliers are zero. Also let $0 < \delta < 1$ be a given threshold. Then all $d$ such that $f_d/|F(k)| > \delta$ are combined to form one aggregate. This means that the optimality cuts obtained from aggregate $d$ are redundant at least a fraction $\delta$ of the time. This criterion works well if the number of iterations is large. Otherwise, it is necessary to impose a 'warm up' period during which no aggregation is made, so as to let the algorithm 'learn' information about the expected recourse function.

Bound on the Number of Aggregates. As a supplement to the redundancy threshold, a bound on the minimum number of aggregates $|\mathcal{S}(k)|$ can be imposed. This is necessary to prevent aggregating all scenarios prematurely, leading to the single cut method. Similarly, a bound on the maximum number of aggregates can also be imposed to curtail the proliferation of optimality cuts in the master program. These bounds have to be set a priori and kept constant throughout the algorithm run.
We should point out that even though the Redundancy Threshold rule requires all the optimality cuts of each $d \in \mathcal{S}(k)$ to be redundant, one can consider an implementation where only a fraction of the optimality cuts is required to be redundant.
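The bookkeeping behind this rule is simple; the following C++ sketch (ours, with an illustrative data layout) updates the redundancy counters $f_d$ after a master solve and collects the aggregates to be combined:

#include <cstddef>
#include <vector>

// duals[d][c]: dual multiplier of optimality cut c of aggregate d in the
// last master solve. redundant[d] counts the iterations in which all cuts
// of aggregate d were redundant (all dual multipliers zero), i.e. f_d.
std::vector<std::size_t> aggregatesToMerge(
        const std::vector<std::vector<double>>& duals,
        std::vector<std::size_t>& redundant,
        std::size_t numOptIters,   // |F(k)|
        double delta,              // redundancy threshold, 0 < delta < 1
        std::size_t warmup) {      // no aggregation during the warm-up period
    std::vector<std::size_t> toMerge;
    for (std::size_t d = 0; d < duals.size(); ++d) {
        bool allZero = true;
        for (double u : duals[d])
            if (u != 0.0) { allZero = false; break; }
        if (allZero) ++redundant[d];
        if (numOptIters > warmup &&
            static_cast<double>(redundant[d]) / numOptIters > delta)
            toMerge.push_back(d);  // all returned aggregates form one new aggregate
    }
    return toMerge;
}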
Another aggregation scheme we tried is to aggregate scenarios that yield 'similar' optimality cuts, that is, cuts with similar gradients (orientation). We implemented this scheme but did not observe significant gains in computation time, due to the additional computational burden required for determining which optimality cuts are similar. This task involves comparing thousands of cut coefficients for the thousands of cuts generated in the course of the algorithm run. We also considered deletion of 'old' cuts from the master program, that is, cuts that were added many iterations earlier and remained redundant for some time. This seemed to reduce memory requirements but did not speed up the master program solves. Note that deletion of hundreds of cuts at a time from the master program can also be a time consuming process.
Table 3
Adaptive multicut aggregation for SSN.

Sample size   Agg-max   CPU seconds   Min iters   Aver iters   Max iters   % CPU reduction
1000          10        81.23         60          61.8         64          95.88
1000          20        104.93        93          96.4         104         94.67
1000          30        135.92        131         135.8        139         93.10
1000          40        169.87        169         177.6        189         91.37
1000          50        202.05        212         218.6        229         89.74
1000          60        238.27        257         260.6        267         87.90
1000          70        258.44        276         287.8        303         86.88
1000          80        290.35        313         328.6        347         85.26
1000          90        318.80        341         362.6        376         83.81
1000          100       351.89        375         408.8        447         82.13
1000          200       662.72        751         789.2        851         66.35
1000          300       800.38        892         965.0        1021        59.36
1000          500       1295.09       1554        1646.4       1749        34.24
1000          1000      2006.02       2611        2806.8       3062        1.85
2000          10        150.05        54          55.4         57          96.25
2000          20        180.33        80          81.6         83          95.50
2000          30        216.47        101         106.2        113         94.59
2000          40        254.05        129         131.8        133         93.65
2000          50        292.41        150         156.4        160         92.70
2000          60        332.53        173         181.6        193         91.70
2000          70        376.76        199         209.4        221         90.59
2000          80        422.19        232         236.8        241         89.46
2000          90        451.73        250         257.0        267         88.72
2000          100       501.13        276         288.6        314         87.48
2000          200       903.76        514         533.0        548         77.43
2000          300       1140.26       685         714.8        736         71.52
2000          500       1653.71       1064        1111.2       1176        58.70
2000          1000      2551.07       1714        1772.6       1854        36.29
3000          10        214.87        48          49.6         51          96.48
3000          20        249.74        70          72.0         73          95.91
3000          30        294.30        93          95.8         98          95.18
3000          40        339.46        112         116.0        119         94.44
3000          50        377.86        129         132.8        142         93.81
3000          60        449.29        157         162.6        170         92.65
3000          70        479.50        170         176.6        182         92.15
3000          80        507.89        187         189.8        195         91.69
3000          90        561.22        207         213.0        217         90.81
3000          100       621.87        223         236.4        250         89.82
3000          200       998.69        383         398.0        424         83.65
3000          300       1412.60       558         613.4        650         76.88
An opposite approach to cut aggregation is cut disaggregation, that is, partitioning an aggregate into two or more aggregates. This allows more cut information about the recourse function to be available in the master program. However, the major drawback of this scheme is that all cut information has to be kept in memory (or at least written to file) in order to partition an aggregate. We tried this scheme and did not obtain any promising results on the instances we used. The scheme requires careful book-keeping of all the generated cuts, and consequently our implementation was computer memory intensive. We encountered several memory and algorithm speed-up issues related to (1) the storage of cut information (e.g. cut coefficients, right-hand sides, iteration index, and scenario index); (2) retrieval and aggregation of cut information; (3) deletion of cut information from the master program; and (4) the increasing master program size. These issues resulted in a significantly slower algorithm, defeating the benefits of the adaptive approach. Nevertheless, we believe that cut disaggregation deserves further research, especially an advanced implementation of the disaggregation scheme that alleviates the above-listed computational issues.
Finally, we should point out that even though we did not experiment with sampling-based aggregation approaches, one can still incorporate such approaches into the proposed framework.
Table 4
Adaptive multicut aggregation for STORM. (Negative % CPU reduction values indicate an increase in CPU time over the single cut method.)

Sample size   Agg-max   CPU seconds   Min iters   Aver iters   Max iters   % CPU reduction
1000          10        21.62         18          18.4         19          -230.28
1000          20        21.07         18          18.4         19          -221.82
1000          30        21.27         18          18.4         19          -224.88
1000          40        21.20         18          18.4         19          -223.85
1000          50        21.28         18          18.4         19          -225.09
1000          60        20.82         18          18.4         19          -217.95
1000          70        20.87         18          18.4         19          -218.72
1000          80        21.08         18          18.4         19          -221.97
1000          90        20.65         18          18.4         19          -215.40
1000          100       20.87         18          18.4         19          -218.75
1000          200       20.79         18          18.4         19          -217.62
1000          300       20.61         18          18.4         19          -214.81
1000          500       20.65         18          18.4         19          -215.39
1000          1000      20.61         18          18.4         19          -214.75
2000          10        21.56         17          17.0         17          -62.53
2000          20        21.47         17          17.0         17          -61.85
2000          30        21.36         17          17.0         17          -61.08
2000          40        21.39         17          17.0         17          -61.26
2000          50        21.43         17          17.0         17          -61.57
2000          60        21.61         17          17.0         17          -62.91
2000          70        21.43         17          17.0         17          -61.57
2000          80        21.39         17          17.0         17          -61.29
2000          90        21.42         17          17.0         17          -61.50
2000          100       21.41         17          17.0         17          -61.45
2000          200       21.41         17          17.0         17          -61.40
2000          300       21.37         17          17.0         17          -61.14
2000          500       21.39         17          17.0         17          -61.31
2000          1000      21.41         17          17.0         17          -61.45
3000          10        48.23         19          19.8         20          -138.26
3000          20        48.57         19          19.8         20          -139.98
3000          30        48.86         19          19.8         20          -141.37
3000          40        48.22         19          19.8         20          -138.25
3000          50        47.46         19          19.8         20          -134.47
3000          60        48.24         19          19.8         20          -138.31
3000          70        48.12         19          19.8         20          -137.72
3000          80        47.62         19          19.8         20          -135.28
3000          90        48.40         20          20.0         20          -139.11
3000          100       46.97         19          19.8         20          -132.06
3000          200       46.90         20          20.0         20          -131.69
3000          300       46.09         20          20.0         20          -127.71
3000          500       46.06         20          20.0         20          -127.57
3000          1000      46.11         20          20.0         20          -127.82
For example, one can aggregate scenarios based on importance sampling (Infanger, 1992). Another approach would be to use an accelerated cutting plane method such as regularized decomposition (Ruszczyński, 1988) in conjunction with the proposed cut aggregation scheme.
4. Computational results
We implemented the adaptive multicut algorithm in C++ using the CPLEX Callable Library (ILOG, 2003) and conducted several computational experiments to gain insight into the performance of the algorithm. We tested the algorithm on several large-scale instances from the literature: the three well-known SLP instances with fixed recourse 20TERM, SSN, and a truncated version of STORM. 20TERM models a motor freight carrier's operations (Mak et al., 1999), while SSN is a telephone switching network expansion planning problem (Sen et al., 1994). STORM is a two-period freight scheduling problem (Mulvey and Ruszczyński, 1995). The problem characteristics of these instances are given in Table 1.
Due to the extremely large number of scenarios, solving these instances with all the scenarios is not practical, and a sampling approach was used: only random subsets of scenarios of sizes 1000, 2000 and 3000 were considered. In addition, sample sizes of 5000 and 10,000 were used for STORM since it had the lowest improvement in computation time. Five replications were made for each sample size, with the samples independently drawn. Both the static and dynamic cut aggregation methods were applied to the instances, and the minimum, maximum and average CPU time and number of major iterations for different levels of cut aggregation were recorded. All the experiments were conducted on an Intel Pentium 4 3.2 GHz processor with 512 MB of memory running the ILOG CPLEX 9.0 LP solver (ILOG, 2003). The algorithms were implemented in C++ using Microsoft Visual Studio .NET.
We designed three computational experiments to investigate
(1) adaptive cut aggregation with a limit on the number of aggregates, (2) adaptive cut aggregation based on the number of redundant optimality cuts in the master program, and (3) static
aggregation where the level of cut aggregation is given. In the static approach, the cut aggregation level was specified a priori and
kept constant throughout the algorithm. This experiment was conducted for different cut aggregation levels to study the effect on
computational time and provide a comparison for the adaptive
multicut method.
For each experiment we report the computational results as
plots and put all the numerical results in tables in the Appendix.
In the tables the column ‘Sample Size’ indicates the total number
of scenarios; ‘Num Aggs’ gives the number of aggregates; ‘CPU
Time’ is the computation time in seconds; ‘Min Iters’, ‘Aver Iters’
and ‘Max Iters’ are the minimum, average and maximum major
iterations (i.e. the number of times the master problem is solved);
‘% CPU Reduction’ is the percentage reduction in CPU time over the
single cut L-shaped method.
4.1. Experiment 1: Adaptive multicut aggregation
This experiment aimed at studying the performance of the
adaptive multicut approach where the level of aggregation is
dynamically varied during computation. As pointed out earlier,
with adaptive aggregation one has to set a limit on the size of
the aggregate to avoid all cuts being aggregated into one aggregate
prematurely. In our implementation we used a parameter agg-max,
which specified the maximum number of atomic cuts that could be
aggregated into one to prevent premature total aggregation (single
cut L-shaped). We initiated the aggregation level at 1000 aggregates (this is pure multicut for instances with sample size 1000).
We arbitrarily varied agg-max from 10 to 100 in increments of
10, and up to 1000 in increments of 100. For instances with 3000
scenarios, values of agg-max greater than 300 resulted in insufficient computer memory and are thus not shown in the plots. We
arbitrarily fixed the redundancy threshold at δ = 0.9 with a 'warm up' period of 10 iterations for all the instances.
The computational results showing how CPU time depends on
aggregate size for 20TERM and SSN are summarized in the plots
in Figs. 1 and 2. The results show that in general the number of major iterations increases with agg-max. However, the best CPU time
is achieved at some level of agg-max between 10 and 100. The best
CPU time is better than that of the single cut and the pure multicut
methods. In this case there is over 60% reduction in CPU time over
the single cut L-shaped method for 20TERM and over 90% for SSN.
The results for STORM showed no significant improvement in computation time over the single cut method; in this case the best aggregation was no aggregation.
4.2. Experiment 2: Adaptive multicut aggregation with varying threshold δ

In this experiment we varied the redundancy threshold δ in addition to varying agg-max.
Table 5
Adaptive multicut aggregation with redundancy threshold δ. (For STORM, negative % CPU reduction values indicate an increase in CPU time over the single cut method.)

Problem   Agg-max   CPU time (seconds)          Aver iters                  % CPU reduction
                    δ=0.5    δ=0.6    δ=0.7     δ=0.5    δ=0.6    δ=0.7     δ=0.5    δ=0.6    δ=0.7
20term    10        407.01   402.15   384.90    308.4    304.6    297.4     68.11    68.49    69.84
20term    20        450.09   434.86   470.50    386.6    380.2    398.8     64.74    65.93    63.14
20term    30        522.31   543.72   490.48    461.8    469.2    452.0     59.08    57.40    61.57
20term    40        593.87   596.54   564.78    538.8    539.8    515.8     53.47    53.26    55.75
20term    50        512.02   518.50   612.98    511.8    514.4    565.8     59.88    59.38    51.97
20term    60        652.42   629.03   620.88    606.6    591.0    584.2     48.88    50.72    51.36
20term    70        630.61   634.00   603.24    602.6    603.2    590.6     50.59    50.33    52.74
20term    80        618.35   630.86   619.25    614.8    622.2    608.2     51.55    50.57    51.48
20term    90        551.51   588.42   666.14    582.0    602.2    650.2     56.79    53.90    47.81
20term    100       723.23   708.96   738.51    706.8    694.4    702.6     43.34    44.45    42.14
ssn       10        78.09    78.63    78.75     59.8     60.4     134.4     95.83    95.80    95.79
ssn       20        106.49   107.55   107.27    98.6     99.4     128.0     94.31    94.25    94.27
ssn       30        132.61   134.48   135.52    132.4    134.4    189.8     92.91    92.81    92.76
ssn       40        171.78   171.77   172.50    180.2    180.4    197.0     90.82    90.82    90.78
ssn       50        202.26   202.95   202.36    218.6    218.8    206.8     89.19    89.16    89.19
ssn       60        237.44   235.90   235.76    262.2    259.4    223.4     87.31    87.39    87.40
ssn       70        267.75   264.22   262.49    296.4    293.8    225.0     85.69    85.88    85.97
ssn       80        301.38   298.50   298.74    339.0    336.2    236.2     83.90    84.05    84.04
ssn       90        315.24   323.30   318.39    360.2    366.8    266.4     83.16    82.72    82.99
ssn       100       354.43   361.52   358.93    412.4    420.2    402.8     81.06    80.68    80.82
storm     10        11.76    10.87    11.15     21.4     18.4     18.4      -92.16   -77.61   -82.19
storm     20        10.90    10.54    10.67     19.2     18.4     18.4      -78.10   -72.22   -74.35
storm     30        10.81    10.34    10.57     19.0     18.4     18.4      -76.63   -68.95   -72.71
storm     40        11.25    10.32    10.47     21.0     18.4     18.4      -83.82   -68.63   -71.08
storm     50        11.20    10.30    10.46     20.8     18.4     18.4      -83.01   -68.30   -70.92
storm     60        11.72    10.32    10.48     22.8     18.4     18.4      -91.50   -68.63   -71.24
storm     70        11.81    10.29    10.41     23.0     18.4     18.4      -92.97   -68.14   -70.10
storm     80        11.76    10.30    10.47     22.8     18.4     18.4      -92.16   -68.30   -71.08
storm     90        11.64    10.20    10.49     22.2     18.0     18.4      -90.20   -66.67   -71.41
storm     100       11.94    10.29    10.42     23.2     18.4     18.4      -95.10   -68.14   -70.26
Table 6
Static multicut aggregation for 20TERM. (Negative % CPU reduction values indicate an increase in CPU time over the single cut method.)

Sample size   Num aggs   CPU seconds   Min iters   Aver iters   Max iters   % CPU reduction
1000          1          1221.17       1161        1304.2       1400        0.00
1000          5          955.19        988         1003.0       1023        21.78
1000          10         724.08        631         734.4        789         40.71
1000          25         605.16        515         569.0        608         50.44
1000          50         465.67        385         411.4        431         61.87
1000          100        423.27        311         320.2        327         65.34
1000          200        345.65        191         221.4        244         71.69
1000          500        639.72        149         160.2        167         47.61
1000          1000       1681.18       127         138.8        146         -37.67
2000          1          2168.89       1082        1232.6       1480        0.00
2000          5          1804.45       830         976.2        1054        16.80
2000          10         1687.90       666         853.6        919         22.18
2000          25         1294.14       611         643.0        684         40.33
2000          50         1018.93       448         483.8        523         53.02
2000          100        844.92        350         372.8        388         61.04
2000          200        781.94        241         283.6        302         63.95
2000          500        1037.75       182         194.4        201         52.15
2000          1000       2351.56       139         154.0        166         -8.42
2000          2000       12065.45      114         126.8        136         -456.30
3000          1          4018.32       1184        1426.2       1672        0.00
3000          5          2765.13       896         1007.0       1176        31.19
3000          10         2456.81       781         878.6        950         38.86
3000          25         2004.51       615         682.8        764         50.12
3000          50         1702.02       516         547.2        574         57.64
3000          100        1370.96       332         419.8        474         65.88
3000          200        1277.18       261         327.0        360         68.22
3000          500        1525.36       206         228.0        248         62.04
3000          1000       3728.50       172         180.5        195         7.21
3000          2000       14920.00      141         147.8        150         -271.30
Table 7
Static multicut aggregation for SSN.

Sample size   Num aggs   CPU seconds   Min iters   Aver iters   Max iters   % CPU reduction
1000          1          1969.52       2569        2767.0       2961        0.00
1000          5          634.58        785         811.4        850         67.78
1000          10         346.65        403         429.0        456         82.40
1000          25         167.07        180         189.8        197         91.52
1000          50         96.41         99          102.6        109         95.10
1000          100        68.21         62          63.6         65          96.54
1000          200        56.99         43          43.8         45          97.11
1000          500        58.43         29          30.2         31          97.03
1000          1000       76.77         24          25.2         27          96.10
2000          1          4004.00       2690        2856.0       2980        0.00
2000          5          1414.22       912         939.0        968         64.68
2000          10         840.35        504         536.6        551         79.01
2000          25         415.28        240         247.0        258         89.63
2000          50         246.87        135         137.0        139         93.83
2000          100        172.25        84          85.0         87          95.70
2000          200        139.63        57          58.0         59          96.51
2000          500        132.74        37          38.0         40          96.68
2000          1000       149.73        29          29.0         29          96.26
2000          2000       229.89        24          24.6         26          94.26
3000          1          6109.21       2693        2935.0       3236        0.00
3000          5          2349.93       983         1059.6       1118        61.53
3000          10         1418.19       572         616.2        648         76.79
3000          25         719.81        280         292.6        302         88.22
3000          50         425.63        158         162.2        167         93.03
3000          100        289.04        98          100.0        103         95.27
3000          200        229.88        67          67.2         68          96.24
3000          500        206.18        41          42.4         44          96.62
3000          1000       239.42        31          32.8         34          96.08
3000          2000       361.58        26          26.8         27          94.08
3000          3000       542.95        24          24.4         25          91.11
The parameter δ allows for aggregating those cuts in the master program whose ratio of the number of iterations in which the cut is not tight (redundant) to the total number of iterations since the cut was generated exceeds δ. The aim of this experiment was to study the performance of the method relative to the redundancy threshold δ. We arbitrarily chose values of δ between 0.5 and 1.0 with a fixed 'warm up' period of 10 iterations. Figs. 3 and 4 show results for δ values of 0.5, 0.6 and 0.7. The results seem to indicate that the redundancy threshold is not an important factor in determining the CPU time for the adaptive multicut method.
4.3. Experiment 3: Static multicut aggregation
The aim of the third experiment was to study the performance of static partial cut aggregation relative to the single cut and the pure multicut methods. We arbitrarily set the level of aggregation a priori and fixed it throughout the algorithm. Thus the sample space was partitioned during the initialization step of the algorithm, and the number of cuts added to the master problem at each iteration remained fixed. We varied the aggregation level from 'no aggregation' to 'total aggregation'. Under 'no aggregation' the number of aggregates is equal to the sample size, which is the pure multicut method. We only performed pure multicut for sample size 1000, due to the large amount of memory required for the larger samples. 'Total aggregation' corresponds to the single cut L-shaped method. All static cut aggregations were arbitrarily made in a round-robin manner, that is, for a fixed number of aggregates m, cuts 1, m+1, 2m+1, ... were aggregated into aggregate 1, cuts 2, m+2, 2m+2, ... into aggregate 2, and so on.
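For concreteness, a one-function C++ sketch (ours) of this round-robin partition, with 0-based scenario indices:

#include <cstddef>
#include <vector>

// Assign scenario s to aggregate s % m, so that scenarios 0, m, 2m, ...
// form aggregate 0, scenarios 1, m+1, 2m+1, ... form aggregate 1, etc.
std::vector<std::vector<std::size_t>> roundRobinPartition(std::size_t numScenarios,
                                                          std::size_t m) {
    std::vector<std::vector<std::size_t>> aggregates(m);
    for (std::size_t s = 0; s < numScenarios; ++s)
        aggregates[s % m].push_back(s);
    return aggregates;
}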
The computational results are summarized in the plots in Figs. 5 and 6. In the plots, '% reduction in CPU time' is relative to the single cut L-shaped method, which acted as the benchmark. The results in Fig. 5 show that for 20TERM a level of aggregation between single cut and pure multicut results in about 60% reduction in CPU time over the L-shaped method for all the instances. For SSN (Fig. 6) percent improvements of over 90% are achieved. The results for STORM (Fig. 7) showed improvements of just over 10%, which bounds the performance of the adaptive multicut method; recall that for this instance no aggregation was the best level of aggregation. The results also show that increasing the number of aggregates decreases the number of iterations, but increases the size of the master problem and the time required to solve it.
In Fig. 6 (SSN) the percentage reduction in CPU time increases with the number of aggregates and then tapers off after 200 aggregates. In contrast, in Fig. 7 (STORM) the percentage reduction in CPU time first remains steady up to about 200 aggregates, and then decreases steadily. This contrasting behavior can be attributed to the different shapes of the expected recourse function in each instance. For example, based on our computational experimentation, STORM seems to have a fairly 'flat' expected recourse function around the optimal solution and thus does not benefit much from a large number of aggregates. The expected recourse function in SSN, on the other hand, is not as 'flat' around the optimal solution, and SSN thus benefits from a large number of aggregates (more information).
The computational results with static multicut aggregation amply demonstrate that using an aggregation level somewhere between the single cut and the pure multicut L-shaped methods results in better CPU time. From a practical point of view, however, a good level of cut aggregation is not known a priori. Therefore, the adaptive multicut method provides a viable approach. Furthermore, this method is easier to implement than the trust-region method of Linderoth and Wright (2003) and provides similar performance.
Table 8
Static multicut aggregation for STORM. (Negative % CPU reduction values indicate an increase in CPU time over the single cut method.)

Sample size   Num aggs   CPU seconds   Min iters   Aver iters   Max iters   % CPU reduction
1000          1          6.54          20          21.8         24          0.00
1000          5          6.50          20          22.0         24          0.61
1000          10         5.84          18          20.2         22          10.71
1000          25         5.72          17          19.4         21          12.50
1000          50         6.52          20          21.4         23          0.28
1000          100        6.09          19          19.0         19          6.85
1000          200        6.69          18          19.2         22          -2.33
1000          500        8.63          17          17.0         17          -31.87
1000          1000       16.43         18          18.4         19          -151.03
2000          1          13.26         20          22.2         24          0.00
2000          5          13.73         22          23.0         24          -3.54
2000          10         12.87         18          22.4         24          2.92
2000          25         12.31         18          21.2         23          7.16
2000          50         12.91         18          21.0         23          2.61
2000          100        12.16         19          20.0         22          8.25
2000          200        12.42         19          19.0         19          6.31
2000          500        18.04         19          20.6         21          -36.05
2000          1000       21.11         17          17.0         17          -59.24
3000          1          20.24         20          22.0         24          0.00
3000          5          18.28         18          21.8         24          9.68
3000          10         18.69         19          21.8         24          7.62
3000          25         17.97         18          20.8         24          11.18
3000          50         19.76         18          22.0         24          2.38
3000          100        20.75         20          22.2         24          -2.52
3000          200        18.27         19          19.4         21          9.70
3000          500        21.68         17          18.6         19          -7.13
3000          1000       32.45         19          19.8         20          -60.32
3000          2000       56.82         18          18.8         19          -180.74
3000          3000       91.86         18          18.8         19          -353.83
5000          1          35.35         23          23.2         24          0.00
5000          5          33.13         18          22.6         24          6.28
5000          10         33.68         20          23.0         24          4.74
5000          25         32.20         19          22.0         24          8.92
5000          50         29.78         19          20.8         23          15.76
5000          100        32.49         19          21.0         23          8.09
5000          200        33.90         19          21.8         23          4.08
5000          500        33.80         19          19.0         19          4.41
5000          1000       47.47         19          20.6         22          -34.27
5000          2000       78.36         18          19.6         21          -121.63
5000          5000       229.00        18          18.6         19          -547.67
10,000        1          70.55         23          23.2         24          0.00
10,000        5          70.35         23          23.2         24          0.28
10,000        10         70.67         22          23.4         24          -0.17
10,000        25         71.60         23          23.8         24          -1.48
10,000        50         69.74         23          23.0         23          1.15
10,000        100        51.11         18          18.8         19          27.55
10,000        200        64.06         19          20.6         22          9.20
10,000        500        68.81         19          20.2         22          2.47
10,000        1000       74.46         19          19.0         19          -5.55
10,000        2000       114.06        20          20.8         21          -61.67
5. Conclusion
Traditionally, the single cut and multicut L-shaped methods are
the algorithms of choice for stochastic linear programs. In this paper, we introduce an adaptive multicut method that generalizes
the single cut and multicut methods. The method dynamically adjusts the aggregation level of the optimality cuts in the master program. A generic framework for building different implementations
with different aggregation schemes was developed and implemented. The computational results based on large-scale instances
from the literature reveal that the proposed method performs significantly faster than standard techniques. The adaptive multicut
method has potential to be applied to decomposition-based stochastic linear programming algorithms. These include, for example, scalable sampling methods such as the sample average
approximation method (Shapiro, 2003), the trust-region method
(Linderoth and Wright, 2003), and the stochastic decomposition
method (Higle and Sen, 1991, 1996; Higle and Zhao, 2009). Future work along this line includes incorporating both cut aggregation and disaggregation in the adaptive multicut approach, and extending it to problems with integer first stage variables and to multistage stochastic linear programs.
Acknowledgments
The authors are grateful to the anonymous referees for their
useful comments concerning a previous version of the paper. This
research was supported in part by grants CNS-0540000, CMII-0355433 and CMII-0620780 from the National Science Foundation.
Appendix A. Numerical results
See Tables 2–8.
References
Benders, J.F., 1962. Partitioning procedures for solving mixed-variables programming problems. Numerische Mathematik 4, 238–252.
Birge, J.R., 1985. Aggregation bounds in stochastic linear programming. Mathematical Programming 31, 25–41.
Birge, J.R., Louveaux, F.V., 1988. A multicut algorithm for two-stage stochastic linear programs. European Journal of Operational Research 34, 384–392.
Birge, J.R., Louveaux, F.V., 1997. Introduction to Stochastic Programming. Springer, New York.
Gassmann, H.I., 1990. MSLiP: A computer code for the multistage stochastic linear programming problem. Mathematical Programming 47, 407–423.
Higle, J.L., Sen, S., 1991. Stochastic decomposition: An algorithm for two-stage stochastic linear programs with recourse. Mathematics of Operations Research 16, 650–669.
Higle, J.L., Sen, S., 1996. Stochastic Decomposition: A Statistical Method for Large Scale Stochastic Linear Programming. Kluwer Academic Publishers, Dordrecht.
Higle, J.L., Zhao, L., 2009. Adaptive and nonadaptive samples in solving stochastic linear programs: A computational investigation. Under second review. <http://edoc.hu-berlin.de/series/speps/2005-12/PDF/12.pdf>.
ILOG, 2003. CPLEX 9.0 Reference Manual. ILOG CPLEX Division.
Infanger, G., 1992. Monte Carlo (importance) sampling within a Benders decomposition algorithm for stochastic linear programs. Annals of Operations Research 39 (1), 69–95.
Linderoth, J.T., Wright, S.J., 2003. Decomposition algorithms for stochastic programming on a computational grid. Computational Optimization and Applications 24, 207–250.
Mak, W.K., Morton, D.P., Wood, R.K., 1999. Monte Carlo bounding techniques for determining solution quality in stochastic programs. Operations Research Letters 24, 47–56.
Mulvey, J.M., Ruszczyński, A., 1995. A new scenario decomposition method for large scale stochastic optimization. Operations Research 43, 477–490.
Ruszczyński, A., 1988. A regularized decomposition for minimizing a sum of polyhedral functions. Mathematical Programming 35, 309–333.
Sen, S., Doverspike, R.D., Cosares, S., 1994. Network planning with random demand. Telecommunication Systems 3, 11–30.
Shapiro, A., 2003. Monte Carlo sampling methods. In: Ruszczyński, A., Shapiro, A. (Eds.), Stochastic Programming, Handbooks in Operations Research and Management Science, vol. 10. North-Holland, Amsterdam, pp. 353–425.
Van Slyke, R., Wets, R.J.-B., 1969. L-shaped linear programs with application to optimal control and stochastic programming. SIAM Journal on Applied Mathematics 17, 638–663.