On Adaptive Multicut Aggregation for Two-Stage Stochastic Linear Programs with Recourse

Svyatoslav Trukhanov and Lewis Ntaimo (corresponding author), Department of Industrial and Systems Engineering, Texas A&M University, 3131 TAMU, College Station, TX 77843, USA, slavik@tamu.edu and ntaimo@tamu.edu

Andrew Schaefer, Department of Industrial Engineering, 1048 Benedum Hall, University of Pittsburgh, Pittsburgh, PA 15261, schaefer@ie.pitt.edu

Abstract

Outer linearization methods for two-stage stochastic linear programs with recourse, such as the L-shaped algorithm, generally apply a single optimality cut on the nonlinear objective at each major iteration, while the multicut version of the algorithm allows several cuts to be placed at once. In general, the L-shaped algorithm tends to have more major iterations than the multicut algorithm, but the trade-offs in terms of computational time are problem dependent. This paper investigates the computational trade-offs of adjusting the level of optimality cut aggregation from single cut to pure multicut. Specifically, an adaptive multicut algorithm that dynamically adjusts the aggregation level of the optimality cuts in the master program is presented and tested on standard large-scale instances from the literature. The computational results reveal that a cut aggregation level between the single cut and the pure multicut can reduce computational time by more than 50% compared to the single cut method.

Keywords: stochastic programming, multicut aggregation, adaptive cuts

1 Introduction

Outer linearization methods such as the L-shaped algorithm (Van Slyke and Wets, 1969) for stochastic linear programming (SLP) problems with recourse generally apply a single cut on the nonlinear objective at each major iteration, while the multicut version of the algorithm (Birge, 1988) allows cuts up to the number of outcomes (scenarios) to be placed at once.
In general, the L-shaped algorithm tends to have more major iterations than the multicut algorithm. However, the trade-offs in terms of computational time may be problem dependent. This paper generalizes these two approaches through an adaptive multicut algorithm that dynamically adjusts the level of aggregation of the optimality cuts in the master program during the course of the algorithm. Results of our computational investigation suggest that for certain classes of problems the adaptive multicut provides better performance than the traditional single cut and multicut methods.

The relative advantages of the single cut and multicut methods have already been explored (Birge, 1988, Gassmann, 1990). Citing this earlier work, Birge and Louveaux (1997) gave the rule of thumb that the multicut method is preferable when the number of scenarios is not much larger than the dimension of the first stage decision space. In earlier work, Birge (1985) used variable aggregation to reduce the problem size and to find upper and lower bounds on the optimal solution. Ruszczynski (1988) considered adding a regularizing term to the objective function to stabilize the master problem and reported computational results showing that the method runs efficiently despite solving a quadratic problem instead of a linear one; the number of iterations requiring extensive computations decreased by roughly a factor of two for the instances considered. More recent work on modified versions of the L-shaped method examined serial and asynchronous versions of the L-shaped method and a trust-region method (Linderoth and Wright, 2003). The algorithms were implemented on a grid computing platform that allowed the authors to solve large-scale problem instances from the literature, including SSN (Sen et al., 1994) with 10^5 scenarios and STORM (Mulvey and Ruszczynski, 1992) with 10^7 scenarios.
The authors reported CPU time improvements of up to 83.5% for instances of SSN with 10^4 scenarios.

The main contributions of this paper are twofold: 1) an adaptive optimality multicut algorithm for SLP, and 2) a computational investigation with standard instances from the literature demonstrating that changing the level of optimality cut aggregation in the master problem during the course of the algorithm can lead to significant improvement in solution time. The results of this work also have the potential to influence the design and implementation of future algorithms, especially those based on Benders decomposition (Benders, 1962). For example, decomposition methods based on the sample average approximation method (Shapiro, 2000), the trust-region method (Linderoth and Wright, 2003) and the stochastic decomposition method (Higle and Sen, 1991) may benefit from the adaptive multicut approach.

The rest of this paper is organized as follows. The next section gives the problem statement and a motivation for the adaptive multicut approach. A formal description of the method is given in Section 3. Computational results are reported in Section 4. The paper ends with some concluding remarks in Section 5.

2 An Adaptive Multicut Approach

A two-stage stochastic linear program with fixed recourse in the extensive form can be given as follows:

    Min  c^T x + Σ_{s∈S} p_s q_s^T y_s            (1a)
    s.t. Ax = b                                   (1b)
         T_s x + W y_s ≥ r_s,  ∀s ∈ S             (1c)
         x ≥ 0, y_s ≥ 0,  ∀s ∈ S                  (1d)

where x ∈ R^{n_1} is the vector of first stage decision variables with corresponding cost vector c ∈ R^{n_1}; A ∈ R^{m_1×n_1} is the first stage constraint matrix; and b ∈ R^{m_1} is the first stage right-hand-side vector.
Additionally, S is the set of scenarios (outcomes), and for each scenario s ∈ S, y_s ∈ R^{n_2} is the vector of second stage decision variables; p_s is the probability of occurrence of scenario s; q_s ∈ R^{n_2} is the second stage cost vector; r_s ∈ R^{m_2} is the second stage right-hand-side vector; T_s ∈ R^{m_2×n_1} is the technology matrix; and W ∈ R^{m_2×n_2} is the recourse matrix.

To set the ground for the adaptive multicut method, we first make some observations from a comparison of the single cut L-shaped method and the multicut method in the context of problem (1). In general, the single cut L-shaped method tends to suffer 'information loss' due to the aggregation of all the scenario dual information into one optimality cut in the master problem at each major iteration. On the other hand, the multicut method uses all scenario dual information and places an optimality cut for each scenario in the master problem. Thus the L-shaped method generally requires more major iterations than the multicut method. However, at iteration k the master problem in the multicut method grows much faster: it has worst-case size (m_1 + k|S|) × (n_1 + |S|), compared with (m_1 + k) × (n_1 + 1) in the L-shaped method. Consequently, the multicut method is expected to have increased solution time when |S| is very large.

Our goal is to devise an algorithm that performs better than both the single cut L-shaped and the multicut methods. The insights behind our approach are to 1) use more information from the subproblems, which means adding more than one cut at each major iteration; and 2) keep the size of the master problem relatively small, which requires limiting the number of cuts added to the master problem. These insights lead us to consider a multicut method with 'partial' cut aggregation. Such an algorithm requires partitioning the sample space S into D aggregates {S_d}_{d=1}^{D} such that S = S_1 ∪ S_2 ∪ · · · ∪ S_D and S_i ∩ S_j = ∅ for all i ≠ j.
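The partition of S into disjoint aggregates can be sketched in a few lines. The following is an illustrative sketch only (the function names and the round-robin assignment are our own choices, not code from the paper); it builds D disjoint aggregates covering S and computes their aggregate probabilities.

```python
# Sketch (illustrative, not the authors' code): partition scenario indices
# 0..|S|-1 into D disjoint aggregates S_1, ..., S_D covering S.

def partition_scenarios(num_scenarios, num_aggregates):
    """Round-robin split of scenario indices into disjoint aggregates."""
    aggregates = [[] for _ in range(num_aggregates)]
    for s in range(num_scenarios):
        aggregates[s % num_aggregates].append(s)  # scenario s -> aggregate s mod D
    return aggregates

def aggregate_probabilities(aggregates, probs):
    """p_{S_d} = sum of p_s over s in S_d; the p_{S_d} sum to 1."""
    return [sum(probs[s] for s in Sd) for Sd in aggregates]

# Example: |S| = 6 equiprobable scenarios, D = 3 aggregates.
aggs = partition_scenarios(6, 3)        # [[0, 3], [1, 4], [2, 5]]
p = [1.0 / 6] * 6
p_agg = aggregate_probabilities(aggs, p)
```

Any other disjoint covering of S (e.g. contiguous blocks) would serve equally well; only disjointness and full coverage matter for the method.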
Then at the initialization step of the algorithm, optimality cut variables θ_d are introduced, one for each aggregate. The optimality cuts are also generated one per aggregate, in the form (β_d^k)^T x + θ_d ≥ α_d^k, where

    (β_d^k)^T = Σ_{s∈S_d} p_s (π_s^k)^T T_s   and   α_d^k = Σ_{s∈S_d} p_s (π_s^k)^T r_s.

The multicut and the single cut L-shaped methods are the two extremes of optimality cut aggregation. The multicut L-shaped method has no cut aggregation, which corresponds to the case D = |S| and S_d = {s_d}, d = 1, · · · , |S|. The expected recourse function value is approximated by having an optimality decision variable for each scenario in the master program. On the contrary, the single cut L-shaped method has the highest level of cut aggregation, that is, D = 1 and S_1 = S. The expected recourse function value is then approximated by only one optimality decision variable in the master program. According to our numerical results, the optimal runtime is achieved at some intermediate level of aggregation 1 < D < |S|, but this level is not known a priori.

Our approach allows the optimality cut aggregation to be adjusted dynamically during the course of the algorithm. Thus the master program has 'adaptive' optimality variables, that is, the number of optimality variables changes during the course of the algorithm. The goal is to let the algorithm learn more information about the expected recourse function and then settle on a level of aggregation that leads to faster convergence to the optimal solution. Therefore, we propose initializing the algorithm with a 'low' level of cut aggregation and increasing it, if necessary, as more information about the expected recourse function is learned during the course of the algorithm. As the level of cut aggregation increases, the number of optimality variables in the master problem decreases.
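The aggregate cut coefficients above are straightforward to compute from the subproblem duals. A minimal numpy sketch follows; the function name and the toy data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def aggregate_cut(Sd, p, pi, T, r):
    """Optimality cut for aggregate S_d:  (beta_d)^T x + theta_d >= alpha_d,
    with  beta_d^T = sum_{s in S_d} p_s pi_s^T T_s  and
          alpha_d  = sum_{s in S_d} p_s pi_s^T r_s."""
    beta = sum(p[s] * (T[s].T @ pi[s]) for s in Sd)   # vector in R^{n1}
    alpha = sum(p[s] * (pi[s] @ r[s]) for s in Sd)    # scalar
    return beta, alpha

# Toy data (hypothetical): 2 scenarios, n1 = 2 first-stage variables, m2 = 2 rows.
T = [np.eye(2), 2 * np.eye(2)]
r = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
pi = [np.array([0.5, 0.5]), np.array([1.0, 0.0])]   # subproblem duals
p = [0.4, 0.6]
beta, alpha = aggregate_cut([0, 1], p, pi, T, r)     # beta = [1.4, 0.2], alpha = 0.2
```

With S_d a singleton this reproduces the pure multicut cut, and with S_d = S the single L-shaped cut, so the same routine covers the whole aggregation spectrum.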
If |S| is not 'very large', one could initialize the algorithm as pure multicut, where an optimality variable θ_s is created for each s ∈ S in the master program. Otherwise, an initial level of cut aggregation between 1 and |S| should be chosen. Computer speed and memory will certainly influence the level of cut aggregation, since a low level of cut aggregation generally requires more computer memory.

3 The Basic Adaptive Multicut Algorithm

To formalize our approach, at the k-th iteration let the scenario set partition be S(k) = {S_1, S_2, · · · , S_{l_k}}, where the sets S_i, i = 1, · · · , l_k, are disjoint, that is, S_i ∩ S_j = ∅ for all i ≠ j, and ∪_{i=1}^{l_k} S_i = S. Therefore, each scenario s ∈ S belongs to exactly one S_i ∈ S(k), and the aggregate probability of S_i is defined as p_{S_i} = Σ_{s∈S_i} p_s, with Σ_{i=1}^{l_k} p_{S_i} = Σ_{s∈S} p_s = 1. Let F(k) denote the set of iteration numbers up to k at which all subproblems are feasible and optimality cuts are generated. Then the master program at major iteration k takes the form:

    Min  c^T x + Σ_{d∈S(k)} θ_d                                (2a)
    s.t. Ax = b                                                (2b)
         (β^t)^T x ≥ α^t,            t ∈ {1, · · · , k} \ F(k)   (2c)
         (β_d^t)^T x + θ_d ≥ α_d^t,  t ∈ F(k), d ∈ S(k)         (2d)
         x ≥ 0                                                 (2e)

where constraints (2c) and (2d) are the feasibility and optimality cuts, respectively. We are now in a position to formally state the adaptive multicut algorithm.

Basic Adaptive Multicut Algorithm

Step 0: Initialization. Set k ← 0, F(0) ← ∅, and initialize S(0).

Step 1: Solve the Master Problem. Set k ← k + 1 and solve problem (2). Let (x^k, {θ_d^k}_{d∈S(k)}) be an optimal solution to problem (2). If no constraint (2d) is present for some d ∈ S(k), θ_d^k is set equal to −∞ and is ignored in the computation.

Step 2: Update Cut Aggregation Level.

Step 2a: Cut Aggregation. Generate the partition S(k) from S(k − 1) based on some aggregation rules. Each element of S(k) is a union of some elements of S(k − 1).
If according to the aggregation rules d_1, d_2, · · · , d_j ∈ S(k − 1) are aggregated into d ∈ S(k), then d = ∪_{i=1}^{j} d_i and p_d = Σ_{i=1}^{j} p_{d_i}. Master problem (2) is modified by removing the variables θ_{d_1}, · · · , θ_{d_j} and introducing the new variable θ_d.

Step 2b: Update Optimality Cuts. Update the optimality cut coefficients in the master program (2) as follows. For each iteration at which optimality cuts were added, define

    α_d^t = p_d Σ_{l=1}^{j} (1/p_{d_l}) α_{d_l}^t,   ∀d ∈ S(k), t ∈ F(k)    (3)

and

    β_d^t = p_d Σ_{l=1}^{j} (1/p_{d_l}) β_{d_l}^t,   ∀d ∈ S(k), t ∈ F(k).   (4)

For all iterations t ∈ F(k), replace the cuts corresponding to d_1, · · · , d_j with the one new cut (β_d^t)^T x + θ_d ≥ α_d^t corresponding to d.

Step 3: Solve Scenario Subproblems. For all s ∈ S solve:

    Min  q_s^T y                    (5a)
    s.t. W y ≥ r_s − T_s x^k        (5b)
         y ≥ 0.                     (5c)

Let π_s^k be the dual multipliers associated with an optimal solution of problem (5). For each d ∈ S(k), if

    θ_d^k < Σ_{s∈S_d} p_s (π_s^k)^T (r_s − T_s x^k)    (6)

define

    α_d^k = Σ_{s∈S_d} p_s (π_s^k)^T r_s                (7)

and

    (β_d^k)^T = Σ_{s∈S_d} p_s (π_s^k)^T T_s            (8)

If condition (6) does not hold for any d ∈ S(k), stop: x^k is optimal. Otherwise set F(k) = F(k − 1) ∪ {k} and go to Step 1.

If problem (5) is infeasible for some s ∈ S, let μ_k be the associated dual extreme ray and define

    α^k = μ_k^T r_s                                    (9)

and

    (β^k)^T = μ_k^T T_s.                               (10)

Go to Step 1.

Note that in the last step of the algorithm, if some scenario subproblem is infeasible, another implementation would be to find all infeasible subproblems and generate a feasibility cut from each one to add to the master program. Since convergence of the above algorithm follows directly from that of the multicut method, a formal proof of convergence is omitted.

An issue that remains to be addressed is the aggregation rule that one can use to obtain S(k) from S(k − 1). Here we simply provide the basic aggregation rule that we implemented in our computational study.

Aggregation Rule:

• Redundancy Threshold δ.
This rule is based on the observation that inactive (redundant) optimality cuts in the master program contain 'little' information about the optimal solution and can therefore be aggregated without much information loss. Consider some iteration k after solving the master problem, and some aggregate d ∈ S(k). Let f_d be the number of iterations at which the optimality cuts corresponding to d are redundant, that is, the corresponding slack variables are nonzero. Also let 0 < δ < 1 be a given threshold. Then all d such that f_d/|F(k)| > δ are combined to form one aggregate. This means that the optimality cuts obtained from such aggregates are redundant at least a fraction δ of the time. This criterion works well if the number of iterations is large. Otherwise, it is necessary to impose a 'warm-up' period during which no aggregation is made, so as to let the algorithm 'learn' information about the expected recourse function.

• Bound on the Number of Aggregates. As a supplement to the redundancy threshold, a bound on the minimum number of aggregates |S(k)| can be imposed. This prevents prematurely aggregating all scenarios into one, which would lead to the single cut L-shaped method. Similarly, a bound on the maximum number of aggregates can also be imposed to curtail the proliferation of huge numbers of cuts in the master program. These bounds have to be set a priori and kept constant throughout the algorithm run.

Another viable aggregation rule is to aggregate optimality cuts with 'similar' gradients or orientation. We implemented this rule but did not observe significant gains in computation time, due to the additional computational burden of checking the cut gradients. We also considered deletion of 'old' cuts from the master program, that is, cuts that were added many iterations earlier and remained redundant for some time. This seemed to reduce memory requirements but did not speed up the master program solves.
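The redundancy-threshold rule can be sketched as two small steps: select the aggregates whose cuts are redundant more than a fraction δ of the time (after a warm-up period), then merge them if the merged aggregate would not exceed the agg-max bound. This is an illustrative sketch under those assumptions, not the authors' implementation; the counters and names are hypothetical.

```python
# Sketch (illustrative) of the redundancy-threshold aggregation rule with a
# warm-up period and a cap agg_max on atomic scenarios per aggregate.

def select_for_merge(redundant_count, total_iters, delta, warmup):
    """Indices d with f_d / total_iters > delta; no merging during warm-up."""
    if total_iters < warmup:
        return []
    return [d for d, f in enumerate(redundant_count)
            if f / total_iters > delta]

def merge_aggregates(aggregates, merge_idx, agg_max):
    """Combine the selected aggregates into one, unless fewer than two were
    selected or the merged aggregate would exceed agg_max scenarios."""
    merged = [s for d in merge_idx for s in aggregates[d]]
    if len(merge_idx) < 2 or len(merged) > agg_max:
        return aggregates                       # no change
    kept = [Sd for d, Sd in enumerate(aggregates) if d not in set(merge_idx)]
    return kept + [sorted(merged)]

# Example: aggregates 0 and 1 were redundant 9 and 8 times out of 10 (> 0.7).
aggs = [[0], [1], [2], [3]]
idx = select_for_merge([9, 8, 1, 0], 10, delta=0.7, warmup=5)   # -> [0, 1]
aggs = merge_aggregates(aggs, idx, agg_max=100)                 # -> [[2], [3], [0, 1]]
```

In a full implementation, the stored cut coefficients of the merged aggregates would also be combined per equations (3)-(4), and a minimum-aggregate-count bound could veto the merge.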
Another consideration is partitioning an aggregate into two or more aggregates, or cut 'disaggregation'. The major drawback to doing this is that all cuts would have to be kept in memory (or at least written to a file) in order to partition any aggregate. Thus this may be a memory-intensive activity requiring careful book-keeping of old cuts for each scenario subproblem.

4 Computational Results

We implemented the adaptive multicut algorithm and conducted several computational experiments to gain insight into the performance of the algorithm. We tested the algorithm on several large-scale instances from the literature. Three well-known SLP instances with fixed recourse, 20TERM, SSN, and a truncated version of STORM, were used for the study. 20TERM models a motor freight carrier's operations (Mak et al., 1999), SSN is a telephone switching network expansion planning problem (Sen et al., 1994), and STORM is a two-period freight scheduling problem (Mulvey and Ruszczynski, 1992). The problem characteristics of these instances are given in Table 1.

Table 1: Problem Characteristics.

            First Stage     Second Stage    No. of Random   No. of
Problem     Rows   Cols     Rows   Cols     Variables       Scenarios
20TERM      3      63       124    764      40              ≈ 1.09 × 10^12
SSN         1      89       175    706      86              ≈ 1.02 × 10^70
STORM       59     128      526    1259     8               ≈ 3.91 × 10^5

Due to the extremely large number of scenarios, solving these instances over all scenarios is not practical, and a sampling approach was used. Only random subsets of scenarios of sizes 1000, 2000 and 3000 were considered. In addition, sample sizes of 5000 and 10000 were used for STORM, since it had the lowest improvement in computation time. Five replications were made for each sample size, with the samples independently drawn. Both the static and dynamic cut aggregation methods were applied to the instances, and the minimum, maximum and average CPU times and the number of major iterations for different levels of cut aggregation were recorded.
All the experiments were conducted on an Intel Pentium 4 3.2 GHz processor with 512MB of memory, using the ILOG CPLEX 9.0 LP solver (ILOG, 2003). The algorithms were implemented in C++ using Microsoft Visual Studio .NET.

We designed three computational experiments to investigate 1) adaptive cut aggregation with a limit on the number of aggregates, 2) adaptive cut aggregation based on the number of redundant optimality cuts in the master program, and 3) static aggregation, where the level of cut aggregation is given. In the static approach, the cut aggregation level was specified a priori and kept constant throughout the algorithm. This experiment was conducted for different cut aggregation levels to study the effect on computational time and to provide a comparison for the adaptive multicut method.

For each experiment we report the computational results as plots and put all the numerical results in tables in the Appendix. In the tables, the column 'Sample Size' indicates the total number of scenarios; 'Num Aggs' gives the number of aggregates; 'CPU Time' is the computation time in seconds; 'Min Iters', 'Aver Iters' and 'Max Iters' are the minimum, average and maximum numbers of major iterations (i.e., the number of times the master problem is solved); and '% CPU Reduction' is the percentage reduction in CPU time relative to the single cut L-shaped method.

4.1 Experiment 1: Adaptive Multicut Aggregation

This experiment aimed at studying the performance of the adaptive multicut approach, where the level of aggregation is dynamically varied during computation. As pointed out earlier, with adaptive aggregation one has to set a limit on the size of an aggregate to avoid all cuts being aggregated into one aggregate prematurely. In our implementation we used a parameter agg-max, which specifies the maximum number of atomic cuts that can be aggregated into one, to prevent premature total aggregation (single cut L-shaped).
We initiated the aggregation level at 1000 aggregates (this is pure multicut for instances with sample size 1000). We arbitrarily varied agg-max from 10 to 100 in increments of 10, and up to 1000 in increments of 100. For instances with 3000 scenarios, values of agg-max greater than 300 resulted in insufficient computer memory and are thus not shown in the plots. We arbitrarily fixed the redundancy threshold at δ = 0.9 with a 'warm-up' period of 10 iterations for all the instances. The computational results showing how CPU time depends on aggregate size are summarized in the plots in Figures 1, 2, and 3.

The results for the 20TERM and STORM instances show that the number of major iterations increases with agg-max. However, the best CPU time is achieved at some level of agg-max between 10 and 100, and it is better than that of both the single cut and the pure multicut methods. In this case there is over 60% reduction in CPU time over the single cut L-shaped method for 20TERM and over 90% for SSN. The results for STORM, however, show no improvement in computation time over the single cut method. This can be attributed to the 'flat' expected recourse function of STORM, implying that no significant information is gained by using a multicut approach. In this case the single cut L-shaped method works better than both the adaptive multicut and the pure multicut methods.

4.2 Experiment 2: Adaptive Multicut Aggregation with Varying Threshold δ

In this experiment we varied the redundancy threshold δ in addition to varying agg-max. The parameter δ allows for aggregating those cuts in the master program for which the ratio of the number of iterations at which the cut is not tight (redundant) to the total number of iterations since the cut was generated exceeds δ. The aim of this experiment was to study the performance of the method relative to the redundancy threshold δ. We arbitrarily chose values of δ between 0.5 and 1.0 with a fixed 'warm-up' period of 10 iterations.
[Figure 1: Adaptive multicut aggregation, % CPU improvement over the regular L-shaped method for 20TERM (sample sizes 1000, 2000, 3000; agg-max from 10 to 1000).]

[Figure 2: Adaptive multicut aggregation, % CPU improvement over the regular L-shaped method for SSN (sample sizes 1000, 2000, 3000; agg-max from 10 to 1000).]

[Figure 3: Adaptive multicut aggregation, % CPU improvement over the regular L-shaped method for STORM (sample sizes 1000, 2000, 3000; agg-max from 10 to 1000).]

Figures 4, 5 and 6 show results for δ values of 0.5, 0.6 and 0.7. The results seem to indicate that the redundancy threshold is not an important factor in determining the CPU time for the adaptive multicut method.

4.3 Experiment 3: Static Multicut Aggregation

The aim of the third experiment was to study the performance of static partial cut aggregation relative to the single cut and the pure multicut methods. We arbitrarily set the level of aggregation a priori and fixed it throughout the algorithm. Thus the sample space was partitioned during the initialization step of the algorithm, and the number of cuts added to the master problem at each iteration remained fixed. We varied the aggregation level from 'no aggregation' to 'total aggregation'. Under 'no aggregation' the number of aggregates is equal to the sample size, which is the pure multicut method. We performed pure multicut only for sample size 1000, due to the large amount of memory required for the larger samples. 'Total aggregation' corresponds to the single cut L-shaped method.
All static cut aggregations were arbitrarily made in a round-robin manner, that is, for a fixed number of aggregates m, cuts 1, m + 1, 2m + 1, · · · were assigned to aggregate 1, cuts 2, m + 2, 2m + 2, · · · to aggregate 2, and so on. The computational results are summarized in the plots in Figures 7, 8, and 9. In the plots, '% reduction in CPU time' is relative to the single cut L-shaped method, which acted as the benchmark.

[Figure 4: Adaptive multicut aggregation with δ, % CPU improvement over the regular L-shaped method for 20TERM (δ = 0.5, 0.6, 0.7; agg-max from 10 to 100).]

[Figure 5: Adaptive multicut aggregation with δ, % CPU improvement over the regular L-shaped method for SSN (δ = 0.5, 0.6, 0.7; agg-max from 10 to 100).]

[Figure 6: Adaptive multicut aggregation with δ, % CPU improvement over the regular L-shaped method for STORM (δ = 0.5, 0.6, 0.7; agg-max from 10 to 100).]

The results in Figure 7 show that for 20TERM a level of aggregation between single cut and pure multicut results in about 60% reduction in CPU time over the L-shaped method for all the instances. For SSN (Figure 8) improvements of over 90% are achieved, while for STORM (Figure 9) no significant improvement (just over 10%) is observed. The results also show that increasing the number of aggregates decreases the number of iterations but increases the size of the master problem and the time required to solve it. These results amply demonstrate that using an aggregation level somewhere between the single cut and the pure multicut L-shaped methods yields better CPU time. From a practical point of view, however, a good level of cut aggregation is not known a priori. Therefore, the adaptive multicut method provides a viable approach.
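The round-robin assignment used in the static experiments maps cut index s to aggregate ((s − 1) mod m) + 1 in the paper's 1-based numbering; a one-line sketch (illustrative naming):

```python
def round_robin_aggregate(s, m):
    """Round-robin static aggregation: cuts 1, m+1, 2m+1, ... go to aggregate 1;
    cuts 2, m+2, ... to aggregate 2; and so on (1-based indices)."""
    return (s - 1) % m + 1

# With m = 3 aggregates, scenarios 1..7 map to aggregates 1,2,3,1,2,3,1.
assignments = [round_robin_aggregate(s, 3) for s in range(1, 8)]
```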
[Figure 7: Static multicut aggregation, % CPU improvement over the regular L-shaped method for 20TERM (sample sizes 1000, 2000, 3000; number of aggregates from 1 to 1000).]

[Figure 8: Static multicut aggregation, % CPU improvement over the regular L-shaped method for SSN (sample sizes 1000, 2000, 3000; number of aggregates from 1 to 1000).]

[Figure 9: Static multicut aggregation, % CPU improvement over the regular L-shaped method for STORM (sample sizes 3000, 5000, 10000; number of aggregates from 1 to 3000/5000).]

5 Conclusion

Traditionally, the single cut and multicut L-shaped methods are the algorithms of choice for stochastic linear programs. In this paper, we introduce an adaptive multicut method that generalizes the single cut and multicut methods. The method dynamically adjusts the aggregation level of the optimality cuts in the master program. A generic framework for building different implementations with different aggregation rules was developed and implemented. The computational results based on large-scale instances from the literature reveal that the proposed method can perform significantly faster than the standard techniques. The adaptive multicut method has potential to be applied to decomposition-based stochastic linear programming algorithms, including, for example, scalable sampling methods such as the sample average approximation method (Shapiro, 2000), the trust-region method (Linderoth and Wright, 2003), and the stochastic decomposition method (Higle and Sen, 1991, 1996, Higle and Zhao, 2007).

Acknowledgments. This research was supported by grants CNS-0540000, CMII-0355433 and CMII-0620780 from the National Science Foundation.
Appendix: Numerical Results

Table 2: Adaptive multicut aggregation for 20TERM

Sample Size  agg-max  CPU sec   Min Iters  Aver Iters  Max Iters  % CPU Reduction
1000         10        403.02    281        301.8       316        67.00
1000         20        395.43    311        359.8       396        67.62
1000         30        523.26    397        460.8       491        57.15
1000         40        569.79    499        521.2       533        53.34
1000         50        586.45    499        545.4       581        51.98
1000         60        567.58    482        551.4       603        53.52
1000         70        651.05    567        616.0       710        46.69
1000         80        662.77    585        631.2       664        45.73
1000         90        694.13    632        658.2       693        43.16
1000         100       669.08    528        658.4       757        45.21
1000         200       840.73    775        852.4       914        31.15
1000         300       921.68    890        914.2       954        24.53
1000         500       1111.70   1071       1139.6      1267       8.96
1000         1000      1272.22   983        1282.6      1491       -4.18
2000         10        803.99    262        285.6       300        62.93
2000         20        854.15    307        365.8       393        60.62
2000         30        955.18    409        429.8       461        55.96
2000         40        947.60    407        449.0       483        56.31
2000         50        1143.19   512        520.0       535        47.29
2000         60        1156.10   525        544.6       560        46.70
2000         70        1247.86   546        583.4       652        42.47
2000         80        1279.83   501        604.4       669        40.99
2000         90        1360.74   589        638.4       668        37.26
2000         100       1352.84   615        639.8       674        37.63
2000         200       1551.35   671        773.0       888        28.47
2000         300       1734.43   761        852.8       947        20.03
2000         500       2073.32   949        1014.2      1067       4.41
2000         1000      2240.14   1159       1224.6      1298       -3.28
3000         10        1350.50   244        259.8       281        66.39
3000         20        1211.21   305        327.2       362        69.86
3000         30        1415.92   391        404.8       423        64.76
3000         40        1469.87   400        443.2       472        63.42
3000         50        1643.93   446        476.8       504        59.09
3000         60        1682.43   448        496.0       523        58.13
3000         70        1869.92   533        556.4       574        53.47
3000         80        1590.16   480        513.8       561        60.43
3000         90        1768.80   511        556.4       591        55.98
3000         100       1881.62   530        592.4       626        53.17
3000         200       1852.93   600        632.6       667        53.89
3000         300       2418.09   836        845.4       856        39.82

Table 3: Adaptive multicut aggregation for SSN

Sample Size  agg-max  CPU sec   Min Iters  Aver Iters  Max Iters  % CPU Reduction
1000         10        81.23     60         61.8        64         95.88
1000         20        104.93    93         96.4        104        94.67
1000         30        135.92    131        135.8       139        93.10
1000         40        169.87    169        177.6       189        91.37
1000         50        202.05    212        218.6       229        89.74
1000         60        238.27    257        260.6       267        87.90
1000         70        258.44    276        287.8       303        86.88
1000         80        290.35    313        328.6       347        85.26
1000         90        318.80    341        362.6       376        83.81
1000         100       351.89    375        408.8       447        82.13
1000         200       662.72    751        789.2       851        66.35
1000         300       800.38    892        965.0       1021       59.36
1000         500       1295.09   1554       1646.4      1749       34.24
1000         1000      2006.02   2611       2806.8      3062       -1.85
2000         10        150.05    54         55.4        57         96.25
2000         20        180.33    80         81.6        83         95.50
2000         30        216.47    101        106.2       113        94.59
2000         40        254.05    129        131.8       133        93.65
2000         50        292.41    150        156.4       160        92.70
2000         60        332.53    173        181.6       193        91.70
2000         70        376.76    199        209.4       221        90.59
2000         80        422.19    232        236.8       241        89.46
2000         90        451.73    250        257.0       267        88.72
2000         100       501.13    276        288.6       314        87.48
2000         200       903.76    514        533.0       548        77.43
2000         300       1140.26   685        714.8       736        71.52
2000         500       1653.71   1064       1111.2      1176       58.70
2000         1000      2551.07   1714       1772.6      1854       36.29
3000         10        214.87    48         49.6        51         96.48
3000         20        249.74    70         72.0        73         95.91
3000         30        294.30    93         95.8        98         95.18
3000         40        339.46    112        116.0       119        94.44
3000         50        377.86    129        132.8       142        93.81
3000         60        449.29    157        162.6       170        92.65
3000         70        479.50    170        176.6       182        92.15
3000         80        507.89    187        189.8       195        91.69
3000         90        561.22    207        213.0       217        90.81
3000         100       621.87    223        236.4       250        89.82
3000         200       998.69    383        398.0       424        83.65
3000         300       1412.60   558        613.4       650        76.88

Table 4: Adaptive multicut aggregation for STORM

Sample Size  agg-max  CPU sec  Min Iters  Aver Iters  Max Iters  % CPU Reduction
1000         10        21.62    18         18.4        19         -230.28
1000         20        21.07    18         18.4        19         -221.82
1000         30        21.27    18         18.4        19         -224.88
1000         40        21.20    18         18.4        19         -223.85
1000         50        21.28    18         18.4        19         -225.09
1000         60        20.82    18         18.4        19         -217.95
1000         70        20.87    18         18.4        19         -218.72
1000         80        21.08    18         18.4        19         -221.97
1000         90        20.65    18         18.4        19         -215.40
1000         100       20.87    18         18.4        19         -218.75
1000         200       20.79    18         18.4        19         -217.62
1000         300       20.61    18         18.4        19         -214.81
1000         500       20.65    18         18.4        19         -215.39
1000         1000      20.61    18         18.4        19         -214.75
2000         10        21.56    17         17.0        17         -62.53
2000         20        21.47    17         17.0        17         -61.85
2000         30        21.36    17         17.0        17         -61.08
2000         40        21.39    17         17.0        17         -61.26
2000         50        21.43    17         17.0        17         -61.57
2000         60        21.61    17         17.0        17         -62.91
2000         70        21.43    17         17.0        17         -61.57
2000         80        21.39    17         17.0        17         -61.29
2000         90        21.42    17         17.0        17         -61.50
2000         100       21.41    17         17.0        17         -61.45
2000         200       21.41    17         17.0        17         -61.40
2000         300       21.37    17         17.0        17         -61.14
2000         500       21.39    17         17.0        17         -61.31
2000         1000      21.41    17         17.0        17         -61.45
3000         10        48.23    19         19.8        20         -138.26
3000         20        48.57    19         19.8        20         -139.98
3000         30        48.86    19         19.8        20         -141.37
3000         40        48.22    19         19.8        20         -138.25
3000         50        47.46    19         19.8        20         -134.47
3000         60        48.24    19         19.8        20         -138.31
3000         70        48.12    19         19.8        20         -137.72
3000         80        47.62    19         19.8        20         -135.28
3000         90        48.40    20         20.0        20         -139.11
3000         100       46.97    19         19.8        20         -132.06
3000         200       46.90    20         20.0        20         -131.69
3000         300       46.09    20         20.0        20         -127.71
3000         500       46.06    20         20.0        20         -127.57
3000         1000      46.11    20         20.0        20         -127.82

Table 5: Adaptive multicut aggregation with redundancy threshold δ

                      CPU Time (secs.)          Aver Iters              % CPU Reduction
Problem  agg-max  δ=0.5   δ=0.6   δ=0.7     δ=0.5  δ=0.6  δ=0.7     δ=0.5   δ=0.6   δ=0.7
20term   10       407.01  402.15  384.90    308.4  304.6  297.4     68.11   68.49   69.84
20term   20       450.09  434.86  470.50    386.6  380.2  398.8     64.74   65.93   63.14
20term   30       522.31  543.72  490.48    461.8  469.2  452.0     59.08   57.40   61.57
20term   40       593.87  596.54  564.78    538.8  539.8  515.8     53.47   53.26   55.75
20term   50       512.02  518.50  612.98    511.8  514.4  565.8     59.88   59.38   51.97
20term   60       652.42  629.03  620.88    606.6  591.0  584.2     48.88   50.72   51.36
20term   70       630.61  634.00  603.24    602.6  603.2  590.6     50.59   50.33   52.74
20term   80       618.35  630.86  619.25    614.8  622.2  608.2     51.55   50.57   51.48
20term   90       551.51  588.42  666.14    582.0  602.2  650.2     56.79   53.90   47.81
20term   100      723.23  708.96  738.51    706.8  694.4  702.6     43.34   44.45   42.14
ssn      10       78.09   78.63   78.75     59.8   60.4   134.4     95.83   95.80   95.79
ssn      20       106.49  107.55  107.27    98.6   99.4   128.0     94.31   94.25   94.27
ssn      30       132.61  134.48  135.52    132.4  134.4  189.8     92.91   92.81   92.76
ssn      40       171.78  171.77  172.50    180.2  180.4  197.0     90.82   90.82   90.78
ssn      50       202.26  202.95  202.36    218.6  218.8  206.8     89.19   89.16   89.19
ssn      60       237.44  235.90  235.76    262.2  259.4  223.4     87.31   87.39   87.40
ssn      70       267.75  264.22  262.49    296.4  293.8  225.0     85.69   85.88   85.97
ssn      80       301.38  298.50  298.74    339.0  336.2  236.2     83.90   84.05   84.04
ssn      90       315.24  323.30  318.39    360.2  366.8  266.4     83.16   82.72   82.99
ssn      100      354.43  361.52  358.93    412.4  420.2  402.8     81.06   80.68   80.82
storm    10       11.76   10.87   11.15     21.4   18.4   18.4      -92.16  -77.61  -82.19
storm    20       10.90   10.54   10.67     19.2   18.4   18.4      -78.10  -72.22  -74.35
storm    30       10.81   10.34   10.57     19.0   18.4   18.4      -76.63  -68.95  -72.71
storm    40       11.25   10.32   10.47     21.0   18.4   18.4      -83.82  -68.63  -71.08
storm    50       11.20   10.30   10.46     20.8   18.4   18.4      -83.01  -68.30  -70.92
storm    60       11.72   10.32   10.48     22.8   18.4   18.4      -91.50  -68.63  -71.24
storm    70       11.81   10.29   10.41     23.0   18.4   18.4      -92.97  -68.14  -70.10
storm    80       11.76   10.30   10.47     22.8   18.4   18.4      -92.16  -68.30  -71.08
storm    90       11.64   10.20   10.49     22.2   18.0   18.4      -90.20  -66.67  -71.41
storm    100      11.94   10.29   10.42     23.2   18.4   18.4      -95.10  -68.14  -70.26

Table 6: Static multicut aggregation for 20TERM

Sample Size  Num Aggs  CPU sec    Min Iters  Aver Iters  Max Iters  % CPU Reduction
1000         1         1221.17    1161       1304.2      1400       0.00
1000         5         955.19     988        1003.0      1023       21.78
1000         10        724.08     631        734.4       789        40.71
1000         25        605.16     515        569.0       608        50.44
1000         50        465.67     385        411.4       431        61.87
1000         100       423.27     311        320.2       327        65.34
1000         200       345.65     191        221.4       244        71.69
1000         500       639.72     149        160.2       167        47.61
1000         1000      1681.18    127        138.8       146        -37.67
2000         1         2168.89    1082       1232.6      1480       0.00
2000         5         1804.45    830        976.2       1054       16.80
2000         10        1687.90    666        853.6       919        22.18
2000         25        1294.14    611        643.0       684        40.33
2000         50        1018.93    448        483.8       523        53.02
2000         100       844.92     350        372.8       388        61.04
2000         200       781.94     241        283.6       302        63.95
2000         500       1037.75    182        194.4       201        52.15
2000         1000      2351.56    139        154.0       166        -8.42
2000         2000      12065.45   114        126.8       136        -456.30
3000         1         4018.32    1184       1426.2      1672       0.00
3000         5         2765.13    896        1007.0      1176       31.19
3000         10        2456.81    781        878.6       950        38.86
3000         25        2004.51    615        682.8       764        50.12
3000         50        1702.02    516        547.2       574        57.64
3000         100       1370.96    332        419.8       474        65.88
3000         200       1277.18    261        327.0       360        68.22
3000         500       1525.36    206        228.0       248        62.04
3000         1000      3728.50    172        180.5       195        7.21
3000         2000      14920.00   141        147.8       150        -271.30

Table 7: Static multicut aggregation for SSN

Sample Size  Num Aggs  CPU sec   Min Iters  Aver Iters  Max Iters  % CPU Reduction
1000         1         1969.52   2569       2767.0      2961       0.00
1000         5         634.58    785        811.4       850        67.78
1000         10        346.65    403        429.0       456        82.40
1000         25        167.07    180        189.8       197        91.52
1000         50        96.41     99         102.6       109        95.10
1000         100       68.21     62         63.6        65         96.54
1000         200       56.99     43         43.8        45         97.11
1000         500       58.43     29         30.2        31         97.03
1000         1000      76.77     24         25.2        27         96.10
2000         1         4004.00   2690       2856.0      2980       0.00
2000         5         1414.22   912        939.0       968        64.68
2000         10        840.35    504        536.6       551        79.01
2000         25        415.28    240        247.0       258        89.63
2000         50        246.87    135        137.0       139        93.83
2000         100       172.25    84         85.0        87         95.70
2000         200       139.63    57         58.0        59         96.51
2000         500       132.74    37         38.0        40         96.68
2000         1000      149.73    29         29.0        29         96.26
2000         2000      229.89    24         24.6        26         94.26
3000         1         6109.21   2693       2935.0      3236       0.00
3000         5         2349.93   983        1059.6      1118       61.53
3000         10        1418.19   572        616.2       648        76.79
3000         25        719.81    280        292.6       302        88.22
3000         50        425.63    158        162.2       167        93.03
3000         100       289.04    98         100.0       103        95.27
3000         200       229.88    67         67.2        68         96.24
3000         500       206.18    41         42.4        44         96.62
3000         1000      239.42    31         32.8        34         96.08
3000         2000      361.58    26         26.8        27         94.08
3000         3000      542.95    24         24.4        25         91.11

Table 8: Static multicut aggregation for STORM

Sample Size  Num Aggs  CPU sec  Min Iters  Aver Iters  Max Iters  % CPU Reduction
1000         1         6.54     20         21.8        24         0.00
1000         5         6.50     20         22.0        24         0.61
1000         10        5.84     18         20.2        22         10.71
1000         25        5.72     17         19.4        21         12.50
1000         50        6.52     20         21.4        23         0.28
1000         100       6.09     19         19.0        19         6.85
1000         200       6.69     18         19.2        22         -2.33
1000         500       8.63     17         17.0        17         -31.87
1000         1000      16.43    18         18.4        19         -151.03
2000         1         13.26    20         22.2        24         0.00
2000         5         13.73    22         23.0        24         -3.54
2000         10        12.87    18         22.4        24         2.92
2000         25        12.31    18         21.2        23         7.16
2000         50        12.91    18         21.0        23         2.61
2000         100       12.16    19         20.0        22         8.25
2000         200       12.42    19         19.0        19         6.31
2000         500       18.04    19         20.6        21         -36.05
2000         1000      21.11    17         17.0        17         -59.24
3000         1         20.24    20         22.0        24         0.00
3000         5         18.28    18         21.8        24         9.68
3000         10        18.69    19         21.8        24         7.62
3000         25        17.97    18         20.8        24         11.18
3000         50        19.76    18         22.0        24         2.38
3000         100       20.75    20         22.2        24         -2.52
3000         200       18.27    19         19.4        21         9.70
3000         500       21.68    17         18.6        19         -7.13
3000         1000      32.45    19         19.8        20         -60.32
3000         2000      56.82    18         18.8        19         -180.74
3000         3000      91.86    18         18.8        19         -353.83
5000         1         35.35    23         23.2        24         0.00
5000         5         33.13    18         22.6        24         6.28
5000         10        33.68    20         23.0        24         4.74
5000         25        32.20    19         22.0        24         8.92
5000         50        29.78    19         20.8        23         15.76
5000         100       32.49    19         21.0        23         8.09
5000         200       33.90    19         21.8        23         4.08
5000         500       33.80    19         19.0        19         4.41
5000         1000      47.47    19         20.6        22         -34.27
5000         2000      78.36    18         19.6        21         -121.63
5000         5000      229.00   18         18.6        19         -547.67
10000        1         70.55    23         23.2        24         0.00
10000        5         70.35    23         23.2        24         0.28
10000        10        70.67    22         23.4        24         -0.17
10000        25        71.60    23         23.8        24         -1.48
10000        50        69.74    23         23.0        23         1.15
10000        100       51.11    18         18.8        19         27.55
10000        200       64.06    19         20.6        22         9.20
10000        500       68.81    19         20.2        22         2.47
10000        1000      74.46    19         19.0        19         -5.55
10000        2000      114.06   20         20.8        21         -61.67

References

Benders, J. F.: 1962, Partitioning procedures for solving mixed-variables programming problems, Numerische Mathematik 4, 238-252.
Birge, J.: 1985, Aggregation bounds in stochastic linear programming, Mathematical Programming 31, 25–41.

Birge, J.: 1988, A multicut algorithm for two-stage stochastic linear programs, European Journal of Operational Research 34, 384–392.

Birge, J. R. and Louveaux, F. V.: 1997, Introduction to Stochastic Programming, Springer, New York.

Gassmann, H. I.: 1990, MSLiP: a computer code for the multistage stochastic linear programming problem, Mathematical Programming 47, 407–423.

Higle, J. and Sen, S.: 1991, Stochastic decomposition: An algorithm for two-stage stochastic linear programs with recourse, Mathematics of Operations Research 16, 650–669.

Higle, J. and Sen, S.: 1996, Stochastic Decomposition: A Statistical Method for Large Scale Stochastic Linear Programming, Kluwer Academic Publishers, Dordrecht.

Higle, J. and Zhao, L.: 2007, Adaptive and nonadaptive samples in solving stochastic linear programs: A computational investigation, Submitted to Computational Optimization and Applications.

ILOG: 2003, CPLEX 9.0 Reference Manual, ILOG CPLEX Division.

Linderoth, J. T. and Wright, S. J.: 2003, Decomposition algorithms for stochastic programming on a computational grid, Computational Optimization and Applications 24, 207–250.

Mak, W. K., Morton, D. and Wood, R.: 1999, Monte Carlo bounding techniques for determining solution quality in stochastic programs, Operations Research Letters 24, 47–56.

Mulvey, J. M. and Ruszczynski, A.: 1992, A new scenario decomposition method for large scale stochastic optimization, Technical Report SOR-91-19, Department of Civil Engineering and Operations Research, Princeton University, Princeton, NJ 08544.

Ruszczynski, A.: 1988, A regularized decomposition method for minimizing a sum of polyhedral functions, Mathematical Programming 35, 309–333.

Sen, S., Doverspike, R. and Cosares, S.: 1994, Network planning with random demand, Telecommunication Systems 3, 11–30.
Shapiro, A.: 2000, Stochastic programming by Monte Carlo simulation methods, available at http://www2.isye.gatech.edu/~ashapiro/publications.html.

Van Slyke, R. and Wets, R. J.-B.: 1969, L-shaped linear programs with applications to optimal control and stochastic programming, SIAM Journal on Applied Mathematics 17, 638–663.