European Journal of Operational Research 206 (2010) 395–406

Stochastics and Statistics

Adaptive multicut aggregation for two-stage stochastic linear programs with recourse

Svyatoslav Trukhanov (a), Lewis Ntaimo (a,*), Andrew Schaefer (b)

(a) Department of Industrial and Systems Engineering, Texas A&M University, 3131 TAMU, College Station, TX 77843, USA
(b) Department of Industrial Engineering, 1048 Benedum Hall, University of Pittsburgh, Pittsburgh, PA 15261, USA
(*) Corresponding author. E-mail addresses: slavik@tamu.edu (S. Trukhanov), ntaimo@tamu.edu (L. Ntaimo), schaefer@ie.pitt.edu (A. Schaefer)

Article history: Received 21 January 2009; Accepted 18 February 2010; Available online 21 February 2010

Keywords: Stochastic programming; Multicut aggregation; Adaptive cuts

Abstract

Outer linearization methods for two-stage stochastic linear programs with recourse, such as the L-shaped algorithm, generally apply a single optimality cut on the nonlinear objective at each major iteration, while the multicut version of the algorithm allows for several cuts to be placed at once. In general, the L-shaped algorithm tends to have more major iterations than the multicut algorithm. However, the trade-offs in terms of computational time are problem dependent. This paper investigates the computational trade-offs of adjusting the level of optimality cut aggregation from single cut to pure multicut. Specifically, an adaptive multicut algorithm that dynamically adjusts the aggregation level of the optimality cuts in the master program is presented and tested on standard large-scale instances from the literature. Computational results reveal that a cut aggregation level between the single cut and the pure multicut can result in substantial computational savings over the single cut method.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Outer linearization methods such as the L-shaped algorithm (Van Slyke and Wets, 1969) for stochastic linear programming (SLP) problems with recourse generally apply a single cut on the nonlinear objective at each major iteration, while the multicut version of the algorithm (Birge and Louveaux, 1988) allows up to as many cuts as there are outcomes (scenarios) to be placed at once. In general, the L-shaped algorithm tends to have more major iterations than the multicut algorithm, but the trade-offs in terms of computational time may be problem dependent. This paper generalizes these two approaches through an adaptive multicut algorithm that dynamically adjusts the level of aggregation of the optimality cuts in the master program during the course of the algorithm. The results of our computational investigation suggest that for certain classes of problems the adaptive multicut method provides better performance than the traditional single cut and multicut methods.

The relative advantages of the single cut and multicut methods have already been explored (Birge and Louveaux, 1988, 1990). Citing this earlier work, Birge and Louveaux (1997) gave the rule of thumb that the multicut method is preferable when the number of scenarios is not much larger than the dimension of the first stage decision space. Birge (1985) used variable aggregation to reduce the problem size and to find upper and lower bounds on the optimal solution.
Ruszczyński (1988) considered adding a regularization term to the objective function to stabilize the master problem and reported computational results showing that the method runs efficiently despite solving a quadratic problem instead of a linear one; the number of iterations that required extensive computations decreased for the instances considered. More recent work on modified versions of the L-shaped method examined serial and asynchronous versions of the L-shaped method and a trust-region method (Linderoth and Wright, 2003). The algorithms were implemented on a grid computing platform that allowed the authors to solve large-scale problem instances from the literature, including SSN (Sen et al., 1994) with 10^5 scenarios and STORM (Mulvey and Ruszczyński, 1995) with 10^7 scenarios. The authors reported CPU time improvements of up to 83.5% for instances of SSN with 10^4 scenarios.

The contributions of this paper include an adaptive optimality multicut method for SLP, an implementation of the algorithm in C++ using the CPLEX Callable Library (ILOG, 2003), and a computational investigation with instances from the literature demonstrating that changing the level of optimality cut aggregation in the master problem during the course of the algorithm can lead to significant improvement in solution time. The results of this work also have the potential to influence the design and implementation of future algorithms, especially those based on Benders decomposition (Benders, 1962). For example, decomposition methods based on the sample average approximation method (Shapiro, 2003), the trust-region method (Linderoth and Wright, 2003), and the stochastic decomposition method (Higle and Sen, 1991) may benefit from the adaptive multicut approach. Furthermore, the adaptive optimality multicut method we propose is relatively easier to implement than the trust-region method and provides significant improvement in computational time for the instance classes we considered.

The rest of this paper is organized as follows. The next section gives the problem statement and a motivation for the adaptive multicut approach. A formal description of the method is given in Section 3. Computational results are reported in Section 4. The paper ends with some concluding remarks in Section 5.
2. An adaptive multicut approach

A two-stage stochastic linear program with fixed recourse in extensive form can be given as follows:

  Min  c^T x + Σ_{s∈S} p_s q_s^T y_s                  (1a)
  s.t. Ax = b,                                        (1b)
       T_s x + W y_s ≥ r_s,   ∀ s ∈ S,                (1c)
       x ≥ 0,  y_s ≥ 0,       ∀ s ∈ S,                (1d)

where x ∈ R^{n_1} is the vector of first stage decision variables with corresponding cost vector c ∈ R^{n_1}; A ∈ R^{m_1×n_1} is the first stage constraint matrix; and b ∈ R^{m_1} is the first stage right-hand side vector. Additionally, S is the set of scenarios (outcomes) and, for each scenario s ∈ S, y_s ∈ R^{n_2} is the vector of second stage decision variables, p_s is the probability of occurrence of scenario s, q_s ∈ R^{n_2} is the second stage cost vector, r_s ∈ R^{m_2} is the second stage right-hand side vector, T_s ∈ R^{m_2×n_1} is the technology matrix, and W ∈ R^{m_2×n_2} is the recourse matrix.

To set the stage for the adaptive multicut method, we first make some observations from a comparison of the single cut L-shaped method and the multicut method in the context of problem (1). In general, the single cut L-shaped method tends to suffer 'information loss' due to the aggregation of all the scenario dual information into one optimality cut in the master problem at each major iteration. The multicut method, on the other hand, uses all the scenario dual information and places an optimality cut for each scenario in the master problem. Thus the L-shaped method generally requires more major iterations than the multicut method. However, at iteration index k the master problem in the multicut method grows much faster: it is (m_1 + k|S|) × (n_1 + |S|) in size in the worst case, compared with (m_1 + k) × (n_1 + 1) in the L-shaped method. Consequently, the multicut method is expected to have increased solution time when |S| is very large.
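To make these growth rates concrete, here is an illustrative calculation of ours (not from the paper) using the SSN dimensions from Table 1, m_1 = 1 and n_1 = 89, with a sample of |S| = 1000 scenarios. After k = 50 major iterations the multicut master problem may reach, in the worst case,

  (m_1 + k|S|) × (n_1 + |S|) = (1 + 50·1000) × (89 + 1000) = 50001 × 1089,

whereas the single cut master problem is only

  (m_1 + k) × (n_1 + 1) = (1 + 50) × (89 + 1) = 51 × 90,

a difference of roughly three orders of magnitude in the number of rows.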
Our goal is to devise an algorithm that performs better than both the single cut L-shaped and the multicut methods. The insights behind our approach are (1) to use more information from the subproblems, which means adding more than one cut at each major iteration, and (2) to keep the size of the master problem relatively small, which requires limiting the number of cuts added to the master problem. These insights lead us to consider a multicut method with 'partial' cut aggregation. Such an algorithm requires partitioning the sample space S into D aggregates {S_d}_{d=1}^{D} such that S = S_1 ∪ S_2 ∪ ... ∪ S_D and S_i ∩ S_j = ∅ for all i ≠ j. At the initialization step of the algorithm, optimality cut variables θ_d are introduced, one for each aggregate. The optimality cuts are also generated one per aggregate, in the form

  (β_d^k)^T x + θ_d ≥ α_d^k,  where  (β_d^k)^T = Σ_{s∈S_d} p_s (π_s^k)^T T_s  and  α_d^k = Σ_{s∈S_d} p_s (π_s^k)^T r_s.
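As an illustration of how one aggregated cut is assembled, the following C++ sketch (C++ being the paper's implementation language) implements the two sums above. It is a minimal sketch under our own assumptions: the data structures and names are illustrative and are not taken from the authors' code.

    #include <cstddef>
    #include <vector>

    // One scenario's data and its dual solution at the current iteration.
    struct Scenario {
        double p;                              // probability p_s
        std::vector<double> r;                 // right-hand side r_s (length m2)
        std::vector<std::vector<double>> T;    // technology matrix T_s (m2 x n1)
        std::vector<double> pi;                // optimal duals pi_s^k (length m2)
    };

    struct Cut {
        std::vector<double> beta;              // cut gradient beta_d^k (length n1)
        double alpha = 0.0;                    // cut right-hand side alpha_d^k
    };

    // Build the aggregated optimality cut (beta_d^k)^T x + theta_d >= alpha_d^k
    // for one aggregate S_d:
    //   beta_d^k  = sum_{s in S_d} p_s (pi_s^k)^T T_s,
    //   alpha_d^k = sum_{s in S_d} p_s (pi_s^k)^T r_s.
    Cut aggregateCut(const std::vector<Scenario>& aggregate, std::size_t n1) {
        Cut cut;
        cut.beta.assign(n1, 0.0);
        for (const Scenario& s : aggregate) {
            for (std::size_t i = 0; i < s.r.size(); ++i) {   // second stage rows
                cut.alpha += s.p * s.pi[i] * s.r[i];
                for (std::size_t j = 0; j < n1; ++j)
                    cut.beta[j] += s.p * s.pi[i] * s.T[i][j];
            }
        }
        return cut;
    }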
The multicut and the single cut L-shaped methods are the two extremes of optimality cut aggregation. The multicut L-shaped method has no cut aggregation, which corresponds to the case D = |S| and S_d = {s_d}, d = 1, ..., |S|; the expected recourse function value is approximated by an optimality decision variable for each scenario in the master program. On the contrary, the single cut L-shaped method has the highest level of cut aggregation, that is, D = 1 and S_1 = S; the expected recourse function value is approximated by a single optimality decision variable in the master program. According to our numerical results, the best runtime is achieved at some intermediate level of aggregation 1 < D < |S|, but this level is not known a priori. For two of the three classes of instances we tested, the proposed adaptive multicut method outperformed both the pure single cut and the multicut methods.

Our approach allows the optimality cut aggregation to be adjusted dynamically during the course of the algorithm. Thus the master program has 'adaptive' optimality cut variables, that is, the number of optimality cut variables changes during the course of the algorithm. The goal is to let the algorithm learn more information about the expected recourse function and then settle at a level of aggregation that leads to faster convergence to the optimal solution. Therefore, we propose initializing the algorithm with a 'low' level of cut aggregation and increasing it, if necessary, as more information about the expected recourse function is learned during the course of the algorithm. As the level of cut aggregation increases, the number of optimality cut variables in the master problem decreases. If |S| is not 'very large', one could initialize the algorithm as pure multicut, where an optimality cut variable θ_s is created for each s ∈ S in the master program. Otherwise, an initial level of cut aggregation between 1 and |S| should be chosen. Computer speed and memory will certainly influence the level of cut aggregation, since a low level of cut aggregation generally requires more computer memory.

3. The basic adaptive multicut algorithm

To formalize our approach, at the kth iteration let the scenario set partition be S(k) = {S_1, S_2, ..., S_{l_k}}, where the sets S_i, i = 1, ..., l_k, are disjoint, that is, S_i ∩ S_j = ∅ for all i ≠ j, and ∪_{i=1}^{l_k} S_i = S. Therefore, each scenario s ∈ S belongs to exactly one S_i ∈ S(k), and the aggregate probability of S_i is defined as p_{S_i} = Σ_{s∈S_i} p_s, with Σ_{i=1}^{l_k} p_{S_i} = Σ_{s∈S} p_s = 1. Let F(k) denote the set of iteration numbers up to k at which all subproblems are feasible and optimality cuts are generated. Then the master program at major iteration k takes the form:

  Min  c^T x + Σ_{d∈S(k)} θ_d                            (2a)
  s.t. Ax = b,                                           (2b)
       (β^t)^T x ≥ α^t,            t ∈ {1, ..., k} \ F(k), (2c)
       (β_d^t)^T x + θ_d ≥ α_d^t,  t ∈ F(k), d ∈ S(k),     (2d)
       x ≥ 0,                                            (2e)

where constraints (2c) and (2d) are the feasibility and optimality cuts, respectively. We are now in a position to formally state the adaptive multicut algorithm.

Basic Adaptive Multicut Algorithm

Step 0: Initialization. Set k ← 0, F(0) ← ∅, and initialize S(0).

Step 1: Solve the Master Problem. Set k ← k + 1 and solve problem (2). Let (x^k, {θ_d^k}_{d∈S(k)}) be an optimal solution to problem (2). If no constraint (2d) is present for some d ∈ S(k), θ_d^k is set equal to −∞ and is ignored in the computation.

Step 2: Update Cut Aggregation Level.

Step 2a: Cut Aggregation. Generate the partition S(k) from S(k−1) based on some aggregation scheme. Each element of S(k) is a union of elements of S(k−1). If, according to the aggregation scheme, d_1, d_2, ..., d_j ∈ S(k−1) are aggregated into d ∈ S(k), then d = ∪_{i=1}^{j} d_i and p_d = Σ_{i=1}^{j} p_{d_i}. Master problem (2) is modified by removing the variables θ_{d_1}, ..., θ_{d_j} and introducing a new variable θ_d.

Step 2b: Update Optimality Cuts. Update the optimality cut coefficients in the master program (2) as follows. For each iteration at which optimality cuts were added, define

  α_d^t = p_d Σ_{l=1}^{j} (1/p_{d_l}) α_{d_l}^t,  ∀ d ∈ S(k), t ∈ F(k),   (3)

and

  β_d^t = p_d Σ_{l=1}^{j} (1/p_{d_l}) β_{d_l}^t,  ∀ d ∈ S(k), t ∈ F(k).   (4)

For all iterations t ∈ F(k), replace the cuts corresponding to d_1, ..., d_j with the single new cut (β_d^t)^T x + θ_d ≥ α_d^t corresponding to d.

Step 3: Solve Scenario Subproblems. For all s ∈ S solve:

  Min  q_s^T y                                           (5a)
  s.t. W y ≥ r_s − T_s x^k,                              (5b)
       y ≥ 0.                                            (5c)

Let π_s^k be the dual multipliers associated with an optimal solution of problem (5). For each d ∈ S(k), if

  θ_d^k < Σ_{s∈d} p_s (π_s^k)^T (r_s − T_s x^k),          (6)

define

  α_d^k = Σ_{s∈d} p_s (π_s^k)^T r_s                       (7)

and

  (β_d^k)^T = Σ_{s∈d} p_s (π_s^k)^T T_s,                  (8)

and add the optimality cut (β_d^k)^T x + θ_d ≥ α_d^k to the master program. If condition (6) does not hold for any d ∈ S(k), stop: x^k is optimal. Otherwise, if subproblem (5) is optimal for all s ∈ S, set F(k) = F(k−1) ∪ {k} and go to Step 1. If problem (5) is infeasible for some s ∈ S, let μ_k be the associated dual extreme ray and define

  α^k = μ_k^T r_s                                         (9)

and

  (β^k)^T = μ_k^T T_s,                                    (10)

which yield the feasibility cut (β^k)^T x ≥ α^k. Go to Step 1.
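Step 2b amounts to a weighted sum of stored cut coefficients, one weight per merged aggregate. The fragment below is a minimal C++ sketch of (3) and (4) that reuses the illustrative Cut struct from the previous listing; again, the names are ours, not the authors'.

    // Merge the iteration-t cuts of aggregates d_1,...,d_j into the single cut
    // of the new aggregate d, following (3)-(4):
    //   alpha_d^t = p_d * sum_l (1/p_{d_l}) * alpha_{d_l}^t,
    //   beta_d^t  = p_d * sum_l (1/p_{d_l}) * beta_{d_l}^t.
    Cut mergeCuts(const std::vector<Cut>& cuts,      // cuts of d_1,...,d_j at iteration t
                  const std::vector<double>& prob,   // probabilities p_{d_1},...,p_{d_j}
                  double pd)                         // p_d = sum of the p_{d_l}
    {
        Cut merged;
        merged.beta.assign(cuts.front().beta.size(), 0.0);
        for (std::size_t l = 0; l < cuts.size(); ++l) {
            const double w = pd / prob[l];           // p_d * (1/p_{d_l})
            merged.alpha += w * cuts[l].alpha;
            for (std::size_t j = 0; j < merged.beta.size(); ++j)
                merged.beta[j] += w * cuts[l].beta[j];
        }
        return merged;
    }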
Notice that in the Basic Adaptive Multicut algorithm, optimality cuts are generated and added to the master program only when all scenario subproblems (5) are feasible. Otherwise, when an infeasible subproblem is encountered, a feasibility cut based on that subproblem is generated and added to the master program. An alternative implementation is to find all infeasible subproblems and generate a feasibility cut from each one to add to the master program. Also, optimality cuts can be generated for all d ∈ S(k) such that all subproblems s ∈ d are feasible. Since convergence of the Basic Adaptive Multicut algorithm follows directly from that of the multicut method, a formal proof of convergence is omitted.

An issue that remains to be addressed is the aggregation scheme one can use to obtain S(k) from S(k−1). There are several conceivable aggregation schemes, but our experience showed that most do not perform well. We tried several schemes and, through computational experimentation, devised the scheme that worked best. This scheme involves two basic rules, a redundancy threshold and a bound on the number of aggregates, and can be described as follows (a small sketch of the required bookkeeping follows the description):

Aggregation Scheme:

Redundancy Threshold (δ). This rule is based on the observation that inactive (redundant) optimality cuts in the master program contain 'little' information about the optimal solution and can therefore be aggregated without information loss. Consider some iteration k after solving the master problem, and some aggregate d ∈ S(k). Let f_d be the number of iterations at which all optimality cuts corresponding to d are redundant, that is, the corresponding dual multipliers are zero. Also let 0 < δ < 1 be a given threshold. Then all d such that f_d/|F(k)| > δ are combined to form one aggregate. This means that the optimality cuts obtained from aggregate d are redundant at least a fraction δ of the time. This criterion works well if the number of iterations is large. Otherwise, it is necessary to impose a 'warm up' period during which no aggregation is made, so as to let the algorithm 'learn' information about the expected recourse function.

Bound on the Number of Aggregates. As a supplement to the redundancy threshold, a bound on the minimum number of aggregates |S(k)| can be imposed. This is necessary to prevent aggregating all scenarios prematurely, which would lead to the single cut method. Similarly, a bound on the maximum number of aggregates can be imposed to curtail the proliferation of optimality cuts in the master program. These bounds have to be set a priori and kept constant throughout the algorithm run.

We should point out that even though the Redundancy Threshold rule requires all the optimality cuts of each d ∈ S(k) to be redundant, one can consider an implementation where only a fraction of the optimality cuts are required to be redundant.
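To make the rule concrete, the following C++ fragment sketches the bookkeeping one might keep; the counters and names are our own illustration, since the paper does not publish its code.

    #include <vector>

    // Bookkeeping sketch for the redundancy-threshold rule: f[d] counts the
    // iterations at which every optimality cut of aggregate d was redundant
    // (zero dual multipliers). Aggregates with f[d]/|F(k)| > delta become
    // candidates for merging; no merging happens during the warm-up period.
    std::vector<int> selectForMerging(const std::vector<int>& f,  // counts f_d
                                      int numCutIterations,       // |F(k)|
                                      double delta,               // 0 < delta < 1
                                      int warmup)                 // warm-up length
    {
        std::vector<int> candidates;
        if (numCutIterations < warmup) return candidates;  // let the algorithm 'learn'
        for (int d = 0; d < static_cast<int>(f.size()); ++d)
            if (static_cast<double>(f[d]) / numCutIterations > delta)
                candidates.push_back(d);
        return candidates;
    }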
Another aggregation scheme we tried is to aggregate scenarios that yield 'similar' optimality cuts, that is, cuts with similar gradients (orientation). We implemented this scheme but did not observe significant gains in computation time, due to the additional computational burden of determining which optimality cuts are similar. This task involves comparing thousands of cut coefficients for the thousands of cuts generated over the course of the algorithm run.
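One plausible way to quantify 'similar gradients' is the cosine of the angle between two cut gradients; the paper does not specify the measure it used, so the following C++ fragment is only one possible choice for illustration.

    #include <cmath>
    #include <vector>

    // Cosine similarity of two cut gradients: a value close to 1 means nearly
    // parallel cuts, which are natural candidates for aggregation.
    double cosineSimilarity(const std::vector<double>& b1,
                            const std::vector<double>& b2) {
        double dot = 0.0, norm1 = 0.0, norm2 = 0.0;
        for (std::size_t j = 0; j < b1.size(); ++j) {
            dot   += b1[j] * b2[j];
            norm1 += b1[j] * b1[j];
            norm2 += b2[j] * b2[j];
        }
        return dot / (std::sqrt(norm1) * std::sqrt(norm2) + 1e-12);  // guard zeros
    }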
We also considered deleting 'old' cuts from the master program, that is, cuts that were added many iterations earlier and remained redundant for some time. This seemed to reduce memory requirements but did not speed up the master program solves. Note that deleting hundreds of cuts at a time from the master program can itself be a time-consuming process.

An opposite approach to cut aggregation is cut disaggregation, that is, partitioning an aggregate into two or more aggregates. This allows more cut information about the recourse function to be available in the master program. However, the major drawback of this scheme is that all cut information has to be kept in memory (or at least written to file) in order to partition any aggregate. We tried this scheme and did not obtain promising results on the instances we used. The scheme requires careful bookkeeping of all the generated cuts, and in our implementation this resulted in a memory-intensive scheme. We encountered several memory and speed issues related to (1) the storage of cut information (e.g. cut coefficients, right-hand sides, iteration index, and scenario index); (2) the retrieval and aggregation of cut information; (3) the deletion of cut information from the master program; and (4) the growth of the master program size. These issues resulted in a significantly slower algorithm, defeating the benefits of the adaptive approach. Nevertheless, we believe that cut disaggregation needs further research, especially an advanced implementation of the disaggregation scheme that alleviates the above-listed computational issues.

Finally, we should point out that even though we did not experiment with sampling-based aggregation approaches, one can still incorporate such approaches into the proposed framework. For example, one can aggregate scenarios based on importance sampling (Infanger, 1992). Another approach would be to use an accelerated cutting plane method such as regularized decomposition (Ruszczyński, 1988) in conjunction with the proposed cut aggregation scheme.

4. Computational results

We implemented the adaptive multicut algorithm in C++ using the CPLEX Callable Library (ILOG, 2003) and conducted several computational experiments to gain insight into the performance of the algorithm. We tested the algorithm on several large-scale instances from the literature: the three well-known SLP instances with fixed recourse 20TERM, SSN, and a truncated version of STORM. 20TERM models a motor freight carrier's operations (Mak et al., 1999), SSN is a telephone switching network expansion planning problem (Sen et al., 1994), and STORM is a two-period freight scheduling problem (Mulvey and Ruszczyński, 1995). The characteristics of these instances are given in Table 1.

Table 1
Problem characteristics.

Problem  First stage rows  First stage cols  Second stage rows  Second stage cols  No. of random variables  No. of scenarios
20TERM   3                 63                124                764                40                       1.09 × 10^12
SSN      1                 89                175                706                86                       1.02 × 10^70
STORM    59                128               526                1259               8                        3.91 × 10^5

Due to the extremely large number of scenarios, solving these instances with all the scenarios is not practical, and a sampling approach was used. Only random subsets of scenarios of sizes 1000, 2000, and 3000 were considered. In addition, sample sizes of 5000 and 10,000 were used for STORM, since it had the lowest improvement in computation time. Five replications were made for each sample size, with the samples independently drawn.
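The sampling step can be sketched as follows in C++; the paper does not state whether sampling was done with or without replacement, so drawing uniformly with replacement here is our assumption.

    #include <random>
    #include <vector>

    // Draw one replication: sampleSize scenario indices picked uniformly at
    // random from the full scenario set {0, ..., numScenarios - 1}.
    std::vector<long long> sampleScenarios(long long numScenarios,
                                           int sampleSize, unsigned seed) {
        std::mt19937_64 gen(seed);
        std::uniform_int_distribution<long long> pick(0, numScenarios - 1);
        std::vector<long long> sample(sampleSize);
        for (auto& s : sample) s = pick(gen);
        return sample;
    }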
Both the static and dynamic cut aggregation methods were applied to the instances, and the minimum, maximum, and average CPU times and numbers of major iterations for different levels of cut aggregation were recorded. All the experiments were conducted on an Intel Pentium 4 3.2 GHz processor with 512 MB of memory running the ILOG CPLEX 9.0 LP solver (ILOG, 2003). The algorithms were implemented in C++ using Microsoft Visual Studio .NET.

We designed three computational experiments to investigate (1) adaptive cut aggregation with a limit on the number of aggregates, (2) adaptive cut aggregation based on the number of redundant optimality cuts in the master program, and (3) static aggregation, where the level of cut aggregation is given. In the static approach, the cut aggregation level was specified a priori and kept constant throughout the algorithm. This experiment was conducted for different cut aggregation levels to study the effect on computational time and to provide a comparison for the adaptive multicut method. For each experiment we report the computational results as plots and give all the numerical results in tables in the Appendix. In the tables, the column 'Sample size' indicates the total number of scenarios; 'Agg-max' (adaptive runs) gives the maximum aggregate size and 'Num aggs' (static runs) gives the number of aggregates; 'CPU seconds' is the computation time in seconds; 'Min iters', 'Aver iters', and 'Max iters' are the minimum, average, and maximum numbers of major iterations (i.e., the number of times the master problem is solved); and '% CPU reduction' is the percentage reduction in CPU time over the single cut L-shaped method.

4.1. Experiment 1: Adaptive multicut aggregation

This experiment aimed at studying the performance of the adaptive multicut approach, where the level of aggregation is varied dynamically during the computation. As pointed out earlier, with adaptive aggregation one has to set a limit on the size of an aggregate to avoid all cuts being prematurely aggregated into one. In our implementation we used a parameter agg-max, which specified the maximum number of atomic cuts that could be aggregated into one, to prevent premature total aggregation (the single cut L-shaped method). We initiated the aggregation level at 1000 aggregates (this is pure multicut for instances with sample size 1000). We arbitrarily varied agg-max from 10 to 100 in increments of 10, and up to 1000 in increments of 100. For instances with 3000 scenarios, values of agg-max greater than 300 resulted in insufficient computer memory and are thus not shown in the plots. We arbitrarily fixed the redundancy threshold at δ = 0.9 with a 'warm up' period of 10 iterations for all the instances.

The computational results showing how CPU time depends on aggregate size for 20TERM and SSN are summarized in the plots in Figs. 1 and 2 (numerical results in Tables 2 and 3 in the Appendix).

[Fig. 1. Adaptive multicut aggregation % CPU improvement over the regular L-shaped method for 20TERM: % reduction in CPU time vs. agg-max, for sample sizes 1000, 2000, and 3000.]

[Fig. 2. Adaptive multicut aggregation % CPU improvement over the regular L-shaped method for SSN: % reduction in CPU time vs. agg-max, for sample sizes 1000, 2000, and 3000.]

The results show that, in general, the number of major iterations increases with agg-max. However, the best CPU time is achieved at some level of agg-max between 10 and 100, and this best CPU time is better than that of both the single cut and the pure multicut methods. In this case there is over a 60% reduction in CPU time over the single cut L-shaped method for 20TERM and over 90% for SSN. The results for STORM (Table 4 in the Appendix) showed no significant improvement in computation time over the single cut method; in this case the best aggregation was no aggregation.

4.2. Experiment 2: Adaptive multicut aggregation with varying threshold δ

In this experiment we varied the redundancy threshold δ in addition to varying agg-max. The parameter δ allows for aggregating cuts in the master program whose ratio of the number of iterations at which the cut is not tight (redundant) to the total number of iterations since the cut was generated exceeds δ. The aim of this experiment was to study the performance of the method relative to the redundancy threshold δ. We arbitrarily chose values of δ between 0.5 and 1.0, with a fixed 'warm up' period of 10 iterations. Figs. 3 and 4 show the results for δ values of 0.5, 0.6, and 0.7 (numerical results in Table 5 in the Appendix). The results seem to indicate that the redundancy threshold is not an important factor in determining the CPU time of the adaptive multicut method.

[Fig. 3. Adaptive multicut aggregation with δ: % CPU improvement over the regular L-shaped method for 20TERM, for δ = 0.5, 0.6, 0.7.]

[Fig. 4. Adaptive multicut aggregation with δ: % CPU improvement over the regular L-shaped method for SSN, for δ = 0.5, 0.6, 0.7.]
4.3. Experiment 3: Static multicut aggregation

The aim of the third experiment was to study the performance of static partial cut aggregation relative to the single cut and the pure multicut methods. We arbitrarily set the level of aggregation a priori and fixed it throughout the algorithm. Thus the sample space was partitioned during the initialization step of the algorithm, and the number of cuts added to the master problem at each iteration remained fixed. We varied the aggregation level from 'no aggregation' to 'total aggregation'. Under 'no aggregation' the number of aggregates equals the sample size, which is the pure multicut method; we performed pure multicut only for sample size 1000, due to the large amount of memory required for the larger samples. 'Total aggregation' corresponds to the single cut L-shaped method. All static cut aggregations were arbitrarily made in a round-robin manner, that is, for a fixed number of aggregates m, cuts 1, m+1, 2m+1, ... were assigned to aggregate 1, cuts 2, m+2, 2m+2, ... to aggregate 2, and so on.
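The round-robin rule reduces to a single modulus operation per scenario. A minimal C++ sketch (with zero-based indices, unlike the one-based description above; the function name is ours):

    #include <vector>

    // Assign scenario s to aggregate s mod m, so that cuts 0, m, 2m, ... share
    // aggregate 0, cuts 1, m+1, 2m+1, ... share aggregate 1, and so on.
    std::vector<std::vector<int>> roundRobinPartition(int numScenarios, int m) {
        std::vector<std::vector<int>> aggregates(m);
        for (int s = 0; s < numScenarios; ++s)
            aggregates[s % m].push_back(s);
        return aggregates;
    }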
The computational results are summarized in the plots in Figs. 5 and 6 (numerical results in Tables 6–8 in the Appendix). In the plots, '% reduction in CPU time' is relative to the single cut L-shaped method, which acted as the benchmark.

[Fig. 5. Static multicut aggregation % CPU improvement over the regular L-shaped method for 20TERM: % reduction in CPU time vs. number of aggregates, for sample sizes 1000, 2000, and 3000.]

[Fig. 6. Static multicut aggregation % CPU improvement over the regular L-shaped method for SSN: % reduction in CPU time vs. number of aggregates, for sample sizes 1000, 2000, and 3000.]

[Fig. 7. Static multicut aggregation % CPU improvement over the regular L-shaped method for STORM: % reduction in CPU time vs. number of aggregates, for sample sizes 3000, 5000, and 10,000.]

The results in Fig. 5 show that for 20TERM a level of aggregation between single cut and pure multicut results in about a 60% reduction in CPU time over the L-shaped method for all the instances. For SSN (Fig. 6), improvements of over 90% are achieved. The results for STORM (Fig. 7) showed improvements of just over 10%, which bounds the performance achievable by the adaptive multicut method; for this instance, no aggregation was the best level of aggregation.

The results also show that increasing the number of aggregates decreases the number of iterations but increases the size of, and the time required to solve, the master problem. In Fig. 6 (SSN) the percentage reduction in CPU time increases with the number of aggregates and then tapers off after 200 aggregates. On the contrary, in Fig. 7 (STORM) the percentage reduction in CPU time first remains steady up to about 200 aggregates and then decreases steadily. This contrasting behavior can be attributed to the different shapes of the expected recourse function in each instance. For example, based on our computational experimentation, STORM seems to have a fairly 'flat' expected recourse function around the optimal solution and thus does not benefit much from a large number of aggregates. On the contrary, the expected recourse function of SSN is not as 'flat' around the optimal solution, and SSN thus benefits from a large number of aggregates (more information).

The computational results with static multicut aggregation amply demonstrate that using an aggregation level somewhere between the single cut and the pure multicut L-shaped methods results in better CPU time. From a practical point of view, however, a good level of cut aggregation is not known a priori. Therefore, the adaptive multicut method provides a viable approach. Furthermore, this method is relatively easier to implement than, and provides performance similar to, the trust-region method of Linderoth and Wright (2003).

5. Conclusion

Traditionally, the single cut and multicut L-shaped methods are the algorithms of choice for stochastic linear programs. In this paper, we introduce an adaptive multicut method that generalizes the single cut and multicut methods.
The method dynamically adjusts the aggregation level of the optimality cuts in the master program. A generic framework for building different implementations with different aggregation schemes was developed and implemented. The computational results based on large-scale instances from the literature reveal that the proposed method performs significantly faster than standard techniques. The adaptive multicut method has the potential to be applied within decomposition-based stochastic linear programming algorithms, including, for example, scalable sampling methods such as the sample average approximation method (Shapiro, 2003), the trust-region method (Linderoth and Wright, 2003), and the stochastic decomposition method (Higle and Sen, 1991, 1996; Higle and Zhao, 2009). Future work along this line includes incorporating both cut aggregation and disaggregation in the adaptive multicut approach, and extending the approach to problems with integer first stage and to multistage stochastic linear programs.

Acknowledgments

The authors are grateful to the anonymous referees for their useful comments concerning a previous version of the paper. This research was supported in part by grants CNS-0540000, CMII-0355433, and CMII-0620780 from the National Science Foundation.

Appendix A. Numerical results

See Tables 2–8.

Table 2
Adaptive multicut aggregation for 20TERM.

Sample size  Agg-max  CPU seconds  Min iters  Aver iters  Max iters  % CPU reduction
1000  10    403.02   281   301.8   316   67.00
1000  20    395.43   311   359.8   396   67.62
1000  30    523.26   397   460.8   491   57.15
1000  40    569.79   499   521.2   533   53.34
1000  50    586.45   499   545.4   581   51.98
1000  60    567.58   482   551.4   603   53.52
1000  70    651.05   567   616.0   710   46.69
1000  80    662.77   585   631.2   664   45.73
1000  90    694.13   632   658.2   693   43.16
1000  100   669.08   528   658.4   757   45.21
1000  200   840.73   775   852.4   914   31.15
1000  300   921.68   890   914.2   954   24.53
1000  500   1111.70  1071  1139.6  1267  8.96
1000  1000  1272.22  983   1282.6  1491  4.18
2000  10    803.99   262   285.6   300   62.93
2000  20    854.15   307   365.8   393   60.62
2000  30    955.18   409   429.8   461   55.96
2000  40    947.60   407   449.0   483   56.31
2000  50    1143.19  512   520.0   535   47.29
2000  60    1156.10  525   544.6   560   46.70
2000  70    1247.86  546   583.4   652   42.47
2000  80    1279.83  501   604.4   669   40.99
2000  90    1360.74  589   638.4   668   37.26
2000  100   1352.84  615   639.8   674   37.63
2000  200   1551.35  671   773.0   888   28.47
2000  300   1734.43  761   852.8   947   20.03
2000  500   2073.32  949   1014.2  1067  4.41
2000  1000  2240.14  1159  1224.6  1298  3.28
3000  10    1350.50  244   259.8   281   66.39
3000  20    1211.21  305   327.2   362   69.86
3000  30    1415.92  391   404.8   423   64.76
3000  40    1469.87  400   443.2   472   63.42
3000  50    1643.93  446   476.8   504   59.09
3000  60    1682.43  448   496.0   523   58.13
3000  70    1869.92  533   556.4   574   53.47
3000  80    1590.16  480   513.8   561   60.43
3000  90    1768.80  511   556.4   591   55.98
3000  100   1881.62  530   592.4   626   53.17
3000  200   1852.93  600   632.6   667   53.89
3000  300   2418.09  836   845.4   856   39.82

Table 3
Adaptive multicut aggregation for SSN.

Sample size  Agg-max  CPU seconds  Min iters  Aver iters  Max iters  % CPU reduction
1000  10    81.23    60    61.8    64    95.88
1000  20    104.93   93    96.4    104   94.67
1000  30    135.92   131   135.8   139   93.10
1000  40    169.87   169   177.6   189   91.37
1000  50    202.05   212   218.6   229   89.74
1000  60    238.27   257   260.6   267   87.90
1000  70    258.44   276   287.8   303   86.88
1000  80    290.35   313   328.6   347   85.26
1000  90    318.80   341   362.6   376   83.81
1000  100   351.89   375   408.8   447   82.13
1000  200   662.72   751   789.2   851   66.35
1000  300   800.38   892   965.0   1021  59.36
1000  500   1295.09  1554  1646.4  1749  34.24
1000  1000  2006.02  2611  2806.8  3062  1.85
2000  10    150.05   54    55.4    57    96.25
2000  20    180.33   80    81.6    83    95.50
2000  30    216.47   101   106.2   113   94.59
2000  40    254.05   129   131.8   133   93.65
2000  50    292.41   150   156.4   160   92.70
2000  60    332.53   173   181.6   193   91.70
2000  70    376.76   199   209.4   221   90.59
2000  80    422.19   232   236.8   241   89.46
2000  90    451.73   250   257.0   267   88.72
2000  100   501.13   276   288.6   314   87.48
2000  200   903.76   514   533.0   548   77.43
2000  300   1140.26  685   714.8   736   71.52
2000  500   1653.71  1064  1111.2  1176  58.70
2000  1000  2551.07  1714  1772.6  1854  36.29
3000  10    214.87   48    49.6    51    96.48
3000  20    249.74   70    72.0    73    95.91
3000  30    294.30   93    95.8    98    95.18
3000  40    339.46   112   116.0   119   94.44
3000  50    377.86   129   132.8   142   93.81
3000  60    449.29   157   162.6   170   92.65
3000  70    479.50   170   176.6   182   92.15
3000  80    507.89   187   189.8   195   91.69
3000  90    561.22   207   213.0   217   90.81
3000  100   621.87   223   236.4   250   89.82
3000  200   998.69   383   398.0   424   83.65
3000  300   1412.60  558   613.4   650   76.88

Table 4
Adaptive multicut aggregation for STORM.

Sample size  Agg-max  CPU seconds  Min iters  Aver iters  Max iters  % CPU reduction
1000  10    21.62  18  18.4  19  −230.28
1000  20    21.07  18  18.4  19  −221.82
1000  30    21.27  18  18.4  19  −224.88
1000  40    21.20  18  18.4  19  −223.85
1000  50    21.28  18  18.4  19  −225.09
1000  60    20.82  18  18.4  19  −217.95
1000  70    20.87  18  18.4  19  −218.72
1000  80    21.08  18  18.4  19  −221.97
1000  90    20.65  18  18.4  19  −215.40
1000  100   20.87  18  18.4  19  −218.75
1000  200   20.79  18  18.4  19  −217.62
1000  300   20.61  18  18.4  19  −214.81
1000  500   20.65  18  18.4  19  −215.39
1000  1000  20.61  18  18.4  19  −214.75
2000  10    21.56  17  17.0  17  −62.53
2000  20    21.47  17  17.0  17  −61.85
2000  30    21.36  17  17.0  17  −61.08
2000  40    21.39  17  17.0  17  −61.26
2000  50    21.43  17  17.0  17  −61.57
2000  60    21.61  17  17.0  17  −62.91
2000  70    21.43  17  17.0  17  −61.57
2000  80    21.39  17  17.0  17  −61.29
2000  90    21.42  17  17.0  17  −61.50
2000  100   21.41  17  17.0  17  −61.45
2000  200   21.41  17  17.0  17  −61.40
2000  300   21.37  17  17.0  17  −61.14
2000  500   21.39  17  17.0  17  −61.31
2000  1000  21.41  17  17.0  17  −61.45
3000  10    48.23  19  19.8  20  −138.26
3000  20    48.57  19  19.8  20  −139.98
3000  30    48.86  19  19.8  20  −141.37
3000  40    48.22  19  19.8  20  −138.25
3000  50    47.46  19  19.8  20  −134.47
3000  60    48.24  19  19.8  20  −138.31
3000  70    48.12  19  19.8  20  −137.72
3000  80    47.62  19  19.8  20  −135.28
3000  90    48.40  20  20.0  20  −139.11
3000  100   46.97  19  19.8  20  −132.06
3000  200   46.90  20  20.0  20  −131.69
3000  300   46.09  20  20.0  20  −127.71
3000  500   46.06  20  20.0  20  −127.57
3000  1000  46.11  20  20.0  20  −127.82

Table 5
Adaptive multicut aggregation with redundancy threshold δ.

Problem  Agg-max  CPU seconds (δ=0.5 / 0.6 / 0.7)  Aver iters (δ=0.5 / 0.6 / 0.7)  % CPU reduction (δ=0.5 / 0.6 / 0.7)
20term  10   407.01 / 402.15 / 384.90  308.4 / 304.6 / 297.4  68.11 / 68.49 / 69.84
20term  20   450.09 / 434.86 / 470.50  386.6 / 380.2 / 398.8  64.74 / 65.93 / 63.14
20term  30   522.31 / 543.72 / 490.48  461.8 / 469.2 / 452.0  59.08 / 57.40 / 61.57
20term  40   593.87 / 596.54 / 564.78  538.8 / 539.8 / 515.8  53.47 / 53.26 / 55.75
20term  50   512.02 / 518.50 / 612.98  511.8 / 514.4 / 565.8  59.88 / 59.38 / 51.97
20term  60   652.42 / 629.03 / 620.88  606.6 / 591.0 / 584.2  48.88 / 50.72 / 51.36
20term  70   630.61 / 634.00 / 603.24  602.6 / 603.2 / 590.6  50.59 / 50.33 / 52.74
20term  80   618.35 / 630.86 / 619.25  614.8 / 622.2 / 608.2  51.55 / 50.57 / 51.48
20term  90   551.51 / 588.42 / 666.14  582.0 / 602.2 / 650.2  56.79 / 53.90 / 47.81
20term  100  723.23 / 708.96 / 738.51  706.8 / 694.4 / 702.6  43.34 / 44.45 / 42.14
ssn     10   78.09 / 78.63 / 78.75     59.8 / 60.4 / 134.4    95.83 / 95.80 / 95.79
ssn     20   106.49 / 107.55 / 107.27  98.6 / 99.4 / 128.0    94.31 / 94.25 / 94.27
ssn     30   132.61 / 134.48 / 135.52  132.4 / 134.4 / 189.8  92.91 / 92.81 / 92.76
ssn     40   171.78 / 171.77 / 172.50  180.2 / 180.4 / 197.0  90.82 / 90.82 / 90.78
ssn     50   202.26 / 202.95 / 202.36  218.6 / 218.8 / 206.8  89.19 / 89.16 / 89.19
ssn     60   237.44 / 235.90 / 235.76  262.2 / 259.4 / 223.4  87.31 / 87.39 / 87.40
ssn     70   267.75 / 264.22 / 262.49  296.4 / 293.8 / 225.0  85.69 / 85.88 / 85.97
ssn     80   301.38 / 298.50 / 298.74  339.0 / 336.2 / 236.2  83.90 / 84.05 / 84.04
ssn     90   315.24 / 323.30 / 318.39  360.2 / 366.8 / 266.4  83.16 / 82.72 / 82.99
ssn     100  354.43 / 361.52 / 358.93  412.4 / 420.2 / 402.8  81.06 / 80.68 / 80.82
storm   10   11.76 / 10.87 / 11.15     21.4 / 18.4 / 18.4     −92.16 / −77.61 / −82.19
storm   20   10.90 / 10.54 / 10.67     19.2 / 18.4 / 18.4     −78.10 / −72.22 / −74.35
storm   30   10.81 / 10.34 / 10.57     19.0 / 18.4 / 18.4     −76.63 / −68.95 / −72.71
storm   40   11.25 / 10.32 / 10.47     21.0 / 18.4 / 18.4     −83.82 / −68.63 / −71.08
storm   50   11.20 / 10.30 / 10.46     20.8 / 18.4 / 18.4     −83.01 / −68.30 / −70.92
storm   60   11.72 / 10.32 / 10.48     22.8 / 18.4 / 18.4     −91.50 / −68.63 / −71.24
storm   70   11.81 / 10.29 / 10.41     23.0 / 18.4 / 18.4     −92.97 / −68.14 / −70.10
storm   80   11.76 / 10.30 / 10.47     22.8 / 18.4 / 18.4     −92.16 / −68.30 / −71.08
storm   90   11.64 / 10.20 / 10.49     22.2 / 18.0 / 18.4     −90.20 / −66.67 / −71.41
storm   100  11.94 / 10.29 / 10.42     23.2 / 18.4 / 18.4     −95.10 / −68.14 / −70.26

Table 6
Static multicut aggregation for 20TERM.

Sample size  Num aggs  CPU seconds  Min iters  Aver iters  Max iters  % CPU reduction
1000  1     1221.17   1161  1304.2  1400  0.00
1000  5     955.19    988   1003.0  1023  21.78
1000  10    724.08    631   734.4   789   40.71
1000  25    605.16    515   569.0   608   50.44
1000  50    465.67    385   411.4   431   61.87
1000  100   423.27    311   320.2   327   65.34
1000  200   345.65    191   221.4   244   71.69
1000  500   639.72    149   160.2   167   47.61
1000  1000  1681.18   127   138.8   146   −37.67
2000  1     2168.89   1082  1232.6  1480  0.00
2000  5     1804.45   830   976.2   1054  16.80
2000  10    1687.90   666   853.6   919   22.18
2000  25    1294.14   611   643.0   684   40.33
2000  50    1018.93   448   483.8   523   53.02
2000  100   844.92    350   372.8   388   61.04
2000  200   781.94    241   283.6   302   63.95
2000  500   1037.75   182   194.4   201   52.15
2000  1000  2351.56   139   154.0   166   −8.42
2000  2000  12065.45  114   126.8   136   −456.30
3000  1     4018.32   1184  1426.2  1672  0.00
3000  5     2765.13   896   1007.0  1176  31.19
3000  10    2456.81   781   878.6   950   38.86
3000  25    2004.51   615   682.8   764   50.12
3000  50    1702.02   516   547.2   574   57.64
3000  100   1370.96   332   419.8   474   65.88
3000  200   1277.18   261   327.0   360   68.22
3000  500   1525.36   206   228.0   248   62.04
3000  1000  3728.50   172   180.5   195   7.21
3000  2000  14920.00  141   147.8   150   −271.30

Table 7
Static multicut aggregation for SSN.

Sample size  Num aggs  CPU seconds  Min iters  Aver iters  Max iters  % CPU reduction
1000  1     1969.52  2569  2767.0  2961  0.00
1000  5     634.58   785   811.4   850   67.78
1000  10    346.65   403   429.0   456   82.40
1000  25    167.07   180   189.8   197   91.52
1000  50    96.41    99    102.6   109   95.10
1000  100   68.21    62    63.6    65    96.54
1000  200   56.99    43    43.8    45    97.11
1000  500   58.43    29    30.2    31    97.03
1000  1000  76.77    24    25.2    27    96.10
2000  1     4004.00  2690  2856.0  2980  0.00
2000  5     1414.22  912   939.0   968   64.68
2000  10    840.35   504   536.6   551   79.01
2000  25    415.28   240   247.0   258   89.63
2000  50    246.87   135   137.0   139   93.83
2000  100   172.25   84    85.0    87    95.70
2000  200   139.63   57    58.0    59    96.51
2000  500   132.74   37    38.0    40    96.68
2000  1000  149.73   29    29.0    29    96.26
2000  2000  229.89   24    24.6    26    94.26
3000  1     6109.21  2693  2935.0  3236  0.00
3000  5     2349.93  983   1059.6  1118  61.53
3000  10    1418.19  572   616.2   648   76.79
3000  25    719.81   280   292.6   302   88.22
3000  50    425.63   158   162.2   167   93.03
3000  100   289.04   98    100.0   103   95.27
3000  200   229.88   67    67.2    68    96.24
3000  500   206.18   41    42.4    44    96.62
3000  1000  239.42   31    32.8    34    96.08
3000  2000  361.58   26    26.8    27    94.08
3000  3000  542.95   24    24.4    25    91.11

Table 8
Static multicut aggregation for STORM.

Sample size  Num aggs  CPU seconds  Min iters  Aver iters  Max iters  % CPU reduction
1000    1     6.54    20  21.8  24  0.00
1000    5     6.50    20  22.0  24  0.61
1000    10    5.84    18  20.2  22  10.71
1000    25    5.72    17  19.4  21  12.50
1000    50    6.52    20  21.4  23  0.28
1000    100   6.09    19  19.0  19  6.85
1000    200   6.69    18  19.2  22  −2.33
1000    500   8.63    17  17.0  17  −31.87
1000    1000  16.43   18  18.4  19  −151.03
2000    1     13.26   20  22.2  24  0.00
2000    5     13.73   22  23.0  24  −3.54
2000    10    12.87   18  22.4  24  2.92
2000    25    12.31   18  21.2  23  7.16
2000    50    12.91   18  21.0  23  2.61
2000    100   12.16   19  20.0  22  8.25
2000    200   12.42   19  19.0  19  6.31
2000    500   18.04   19  20.6  21  −36.05
2000    1000  21.11   17  17.0  17  −59.24
3000    1     20.24   20  22.0  24  0.00
3000    5     18.28   18  21.8  24  9.68
3000    10    18.69   19  21.8  24  7.62
3000    25    17.97   18  20.8  24  11.18
3000    50    19.76   18  22.0  24  2.38
3000    100   20.75   20  22.2  24  −2.52
3000    200   18.27   19  19.4  21  9.70
3000    500   21.68   17  18.6  19  −7.13
3000    1000  32.45   19  19.8  20  −60.32
3000    2000  56.82   18  18.8  19  −180.74
3000    3000  91.86   18  18.8  19  −353.83
5000    1     35.35   23  23.2  24  0.00
5000    5     33.13   18  22.6  24  6.28
5000    10    33.68   20  23.0  24  4.74
5000    25    32.20   19  22.0  24  8.92
5000    50    29.78   19  20.8  23  15.76
5000    100   32.49   19  21.0  23  8.09
5000    200   33.90   19  21.8  23  4.08
5000    500   33.80   19  19.0  19  4.41
5000    1000  47.47   19  20.6  22  −34.27
5000    2000  78.36   18  19.6  21  −121.63
5000    5000  229.00  18  18.6  19  −547.67
10,000  1     70.55   23  23.2  24  0.00
10,000  5     70.35   23  23.2  24  0.28
10,000  10    70.67   22  23.4  24  −0.17
10,000  25    71.60   23  23.8  24  −1.48
10,000  50    69.74   23  23.0  23  1.15
10,000  100   51.11   18  18.8  19  27.55
10,000  200   64.06   19  20.6  22  9.20
10,000  500   68.81   19  20.2  22  2.47
10,000  1000  74.46   19  19.0  19  −5.55
10,000  2000  114.06  20  20.8  21  −61.67

References

Benders, J.F., 1962. Partitioning procedures for solving mixed-variables programming problems. Numerische Mathematik 4, 238–252.
Birge, J.R., 1985. Aggregation bounds in stochastic linear programming. Mathematical Programming 31, 25–41.
Birge, J.R., Louveaux, F.V., 1988. A multicut algorithm for two-stage stochastic linear programs. European Journal of Operational Research 34, 384–392.
Birge, J.R., Louveaux, F.V., 1997. Introduction to Stochastic Programming. Springer, New York.
Gassmann, H.I., 1990. MSLiP: A computer code for the multistage stochastic linear programming problem. Mathematical Programming 47, 407–423.
Higle, J.L., Sen, S., 1991. Stochastic decomposition: An algorithm for two-stage stochastic linear programs with recourse. Mathematics of Operations Research 16, 650–669.
Higle, J.L., Sen, S., 1996. Stochastic Decomposition: A Statistical Method for Large Scale Stochastic Linear Programming. Kluwer Academic Publishers, Dordrecht.
Higle, J.L., Zhao, L., 2009. Adaptive and nonadaptive samples in solving stochastic linear programs: A computational investigation. Under second review. <http://edoc.hu-berlin.de/series/speps/2005-12/PDF/12.pdf>.
ILOG, 2003. CPLEX 9.0 Reference Manual. ILOG CPLEX Division.
Infanger, G., 1992. Monte Carlo (importance) sampling within a Benders decomposition algorithm for stochastic linear programs. Annals of Operations Research 39 (1), 69–95.
Linderoth, J.T., Wright, S.J., 2003. Decomposition algorithms for stochastic programming on a computational grid. Computational Optimization and Applications 24, 207–250.
Mak, W.K., Morton, D.P., Wood, R.K., 1999. Monte Carlo bounding techniques for determining solution quality in stochastic programs. Operations Research Letters 24, 47–56.
Mulvey, J.M., Ruszczyński, A., 1995. A new scenario decomposition method for large scale stochastic optimization. Operations Research 43, 477–490.
Ruszczyński, A., 1988. A regularized decomposition method for minimizing a sum of polyhedral functions. Mathematical Programming 35, 309–333.
Sen, S., Doverspike, R.D., Cosares, S., 1994. Network planning with random demand. Telecommunications Systems 3, 11–30.
Shapiro, A., 2003. Monte Carlo sampling methods. In: Ruszczyński, A., Shapiro, A. (Eds.), Stochastic Programming, Handbooks in Operations Research and Management Science, vol. 10. North-Holland, Amsterdam, pp. 353–425.
Van Slyke, R., Wets, R.J.-B., 1969. L-shaped linear programs with application to optimal control and stochastic programming. SIAM Journal on Applied Mathematics 17, 638–663.