Hyper-sparsity in the revised simplex method and how to exploit it

J. A. J. Hall        K. I. M. McKinnon

8th October 2002

Abstract

The revised simplex method is often the method of choice when solving large scale sparse linear programming problems, particularly when a family of closely-related problems is to be solved. Each iteration of the revised simplex method requires the solution of two linear systems and a matrix vector product. For a significant number of practical problems the result of one or more of these operations is usually sparse, a property we call hyper-sparsity. Analysis of the commonly-used techniques for implementing each step of the revised simplex method shows them to be inefficient when hyper-sparsity is present. Techniques to exploit hyper-sparsity are developed and their performance is compared with the standard techniques. For the subset of our test problems that exhibit hyper-sparsity, the average speedup in solution time is 5.2 when these techniques are used. For this problem set our implementation of the revised simplex method which exploits hyper-sparsity is shown to be competitive with the leading commercial solver and significantly faster than the leading public-domain solver.

1 Introduction

Linear programming (LP) is a widely applicable technique both in its own right and as a sub-problem in the solution of other optimization problems. The revised simplex method and the barrier method are the two efficient methods for solving general large sparse LP problems. In a context where families of related LP problems have to be solved, such as in integer programming and decomposition methods, the revised simplex method is usually the more efficient method.

The constraint matrices for most practical LP problems are sparse and, for an implementation of the revised simplex method to be efficient, it is crucial that only non-zeros in the coefficient matrix are stored and operated on. Each iteration of the revised simplex method requires the solution of two linear systems, the computation of which are commonly referred to as FTRAN and BTRAN, and a matrix vector product computed as the result of an operation referred to as PRICE. The matrices involved in these operations are submatrices of the coefficient matrix so are normally sparse. For many problems these three operations yield dense vectors, in which case there is limited scope for improving the performance by fully exploiting any zeros in the vectors. However, there is a significant number of practical problems where the results of one or more of these operations are usually sparse, a property referred to in this paper as hyper-sparsity. This phenomenon has been reported independently by Bixby [3] and Bixby et al. [4].

A number of classes of problems are known to exhibit hyper-sparsity, in particular those with a significant network structure such as multicommodity flow problems. Specialist variants of the revised simplex method which exploit this structure explicitly have been developed, in particular by McBride and Mamer [16, 17]. This paper identifies hyper-sparsity in a wider range of more general LP problems. Techniques are developed which can be applied within a general revised simplex solver when hyper-sparsity is identified during the solution process and, for problems exhibiting hyper-sparsity, significant performance improvements are demonstrated.

The computational components of the revised simplex method and standard techniques for FTRAN and BTRAN are introduced in Section 2 of this paper.
Section 3 describes hyper-sparsity and gives statistics on its occurrence in a test set of LPs drawn from the standard Netlib set [10], the Kennington test set [5] and the authors' personal collection [13]. Analysis given in Section 4 shows the commonly-used techniques for each of the computational components of the revised simplex method to be inefficient when hyper-sparsity is present, and techniques to exploit hyper-sparsity are described. Section 5 presents a computational comparison of the authors' revised simplex solver, EMSOL, with and without the techniques for exploiting hyper-sparsity. For those problems which exhibit hyper-sparsity, a comparison is also made between EMSOL, SOPLEX 1.2 [21] and the CPLEX 6.5 [15] primal simplex solver. Conclusions are offered in Section 6. Although it contains no new ideas, this paper represents a significant reworking of [12]. The presentation has been greatly improved and the comparisons between EMSOL, SOPLEX and CPLEX have been added.

2 The revised simplex method

The revised simplex method and its computational requirements are most conveniently discussed in the context of LP problems in standard form

    minimize  c^T x    subject to  Ax = b,  x ≥ 0,

where x ∈ R^n and b ∈ R^m. In the simplex method, the variables are partitioned into index sets B of m basic variables and N of n − m nonbasic variables such that the basis matrix B formed from the columns of A corresponding to the basic variables is nonsingular. The set B itself is conventionally referred to as the basis. The columns of A corresponding to the nonbasic variables form the matrix N and the components of c corresponding to the basic and nonbasic variables are referred to as, respectively, the basic costs c_B and non-basic costs c_N. When the nonbasic variables are set to zero the values b̂ = B^{-1}b of the basic variables, if nonnegative, correspond to a vertex of the feasible region. If any basic variables are infeasible they are given costs of −1, with other variables being given costs of zero. Phase I iterations are then performed in order to force all variables to be non-negative, if possible. An account of the details of the revised simplex method is given by Chvátal in [6] and the computational components of each iteration are summarised in Figure 1.

    CHUZC:  Scan ĉ_N for a good candidate q to enter the basis.
    FTRAN:  Form the pivotal column â_q = B^{-1} a_q, where a_q is column q of A.
    CHUZR:  Scan the ratios b̂_i/â_iq for the row p of a good candidate to leave
            the basis. Let α = b̂_p/â_pq. Update b̂ := b̂ − α â_q.
    BTRAN:  Form π_p^T = e_p^T B^{-1}.
    PRICE:  Form the pivotal row â_p^T = π_p^T N.
            Update reduced costs ĉ_N^T := ĉ_N^T − ĉ_q â_p^T.
    If (growth in factors) then
        INVERT:  Form a factored representation of B^{-1}.
    else
        UPDATE:  Update the factored representation of B^{-1} corresponding to
                 the basis change.
    end if

    Figure 1: Operations in an iteration of the revised simplex method

Note that although the reduced costs may be computed directly using the following BTRAN and PRICE operations

    π_B^T = c_B^T B^{-1},    ĉ_N^T = c_N^T − π_B^T N,

it is more efficient computationally to update them by calculating the pivotal row â_p^T = e_p^T B^{-1} N, where e_p is column p of the identity matrix, using the BTRAN and PRICE operations defined in Figure 1. The only significant computational requirement which is not indicated in Figure 1 occurs when, in a phase I iteration, one or more variables which remain basic become feasible, so their cost coefficients increase by one to zero. In order to update the reduced costs, it is necessary to compute the corresponding linear combination of tableau rows. This composite row is formed by the following BTRAN and PRICE operations

    π_δ^T = δ^T B^{-1},    â_δ^T = π_δ^T N,

where the nonzeros in δ are equal to one for each variable which remains basic and becomes feasible.

In this paper, techniques are discussed which are applicable to all BTRAN and PRICE operations. This is done with reference to an unsubscripted vector π corresponding to a generic right-hand-side vector r for the system B^T π = r solved by BTRAN. When techniques are only applicable to a particular class of BTRAN or PRICE operations, the specific notation for the right-hand-side and solution is used.
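As a concrete illustration of these definitions, the following minimal Python/numpy sketch computes b̂, π_B and the reduced costs directly for a small standard-form problem. The 2×4 data are invented for illustration and are not taken from the paper.

    import numpy as np

    # Illustrative 2x4 standard-form data (made up for this sketch).
    A = np.array([[1.0, 2.0, 1.0, 0.0],
                  [3.0, 1.0, 0.0, 1.0]])
    b = np.array([4.0, 6.0])
    c = np.array([-1.0, -2.0, 0.0, 0.0])

    basic, nonbasic = [2, 3], [0, 1]      # index sets B and N
    B, N = A[:, basic], A[:, nonbasic]    # basis matrix and its complement
    cB, cN = c[basic], c[nonbasic]

    bhat = np.linalg.solve(B, b)          # values of the basic variables
    piB = np.linalg.solve(B.T, cB)        # BTRAN: solve B^T pi_B = c_B
    chatN = cN - piB @ N                  # PRICE: reduced costs of nonbasic vars
    print(bhat, chatN)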
Efficient implementations of the revised simplex method generally weight each reduced cost in CHUZC by some measure of the magnitude of the corresponding column of the standard simplex tableau, commonly referred to as an edge weight. Many solvers, including the authors' solver EMSOL, use the Harris Devex strategy [14]. This requires only the pivotal row to update the edge weights so carries no significant computational overhead beyond that outlined in Figure 1.

2.1 The representation of B^{-1}

In each iteration of the simplex method it is necessary to solve two systems, one involving the current basis matrix B and the other its transpose. This is achieved by passing respectively forwards and backwards through the data structure corresponding to a factored representation of B^{-1}. There are a number of procedures for updating the factored representation of B^{-1}, the original and simplest being the product form update of Dantzig and Orchard-Hays [7]. This approach is used by EMSOL and the techniques in this paper are developed with the product form update in mind.

If B_0^{-1} is used to denote the factored representation obtained by INVERT, and E_U represents the transformation of B_0 corresponding to subsequent basis changes such that B = B_0 E_U, it follows that B^{-1} may be expressed as B^{-1} = E_U^{-1} B_0^{-1}. In a typical implementation of the revised simplex method for large sparse LP problems, B_0^{-1} is represented as a product of K_I elimination operations derived directly from the nontrivial columns of the matrices L and U which form an LU decomposition of (a row and column permutation of) B_0. This invertible representation allows B_0^{-1} to be expressed algebraically as B_0^{-1} = ∏_{k=K_I}^{1} E_k^{-1}, where

    E_k^{-1} = [ 1              −η_1^k                   ]
               [    .              .                     ]
               [      .            .                     ]
               [        1     −η_{p_k−1}^k               ]
               [                  µ_k                    ]     (1)
               [              −η_{p_k+1}^k   1           ]
               [                   .            .        ]
               [                   .              .      ]
               [               −η_m^k                1   ]

that is, E_k^{-1} is the identity matrix with column p_k replaced by the vector (−η_1^k, …, −η_{p_k−1}^k, µ_k, −η_{p_k+1}^k, …, −η_m^k)^T. Within an implementation, the nonzeros in the 'eta' vector

    η^k = [ η_1^k  …  η_{p_k−1}^k  0  η_{p_k+1}^k  …  η_m^k ]^T

are stored as value-index pairs and the data structure {p_k, µ_k, η^k}_{k=1}^{K_I} is referred to as an eta file. Eta vectors associated with the matrices L and U coming from Gaussian elimination are referred to as L-etas and U-etas respectively. Note that the number of nonzeros in the eta file is no more than the number of nonzeros in the LU decomposition. The operations with p_k, µ_k and η^k required when forming E_k^{-1} r during the standard algorithm for FTRAN and E_k^{-T} r during BTRAN are illustrated in Figure 2.

    if (r_{p_k} ≠ 0) then                  |
        r_{p_k} := µ_k r_{p_k}             |    r_{p_k} := µ_k (r_{p_k} − r^T η^k)
        r := r − r_{p_k} η^k               |
    end if                                 |
                                           |
            (a) FTRAN                      |            (b) BTRAN

    Figure 2: Standard operations in FTRAN and BTRAN

The product form update leaves the factored form of B_0^{-1} unaltered and represents E_U^{-1} as a product of pairs of elementary matrices of the form (1). The representation of each UPDATE operation is obtained directly from the pivotal column and is given by p_k = p, µ_k = 1/â_pq and η^k = â_q − â_pq e_p.
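To make the eta-file operations concrete, the following minimal Python sketch (an illustrative layout, not EMSOL's actual data structures) implements the standard FTRAN and BTRAN steps of Figure 2 for a single eta stored as packed value-index pairs, together with the construction of a product-form UPDATE eta.

    from dataclasses import dataclass

    @dataclass
    class Eta:
        p: int              # pivot row p_k
        mu: float           # multiplier mu_k (the reciprocal of the pivot)
        idx: list           # row indices of the nonzeros in eta^k (pivot excluded)
        val: list           # corresponding values

    def ftran_apply(eta, r):
        """Standard FTRAN step: r := E_k^{-1} r (Figure 2(a))."""
        if r[eta.p] != 0.0:
            r[eta.p] *= eta.mu
            rp = r[eta.p]
            for i, v in zip(eta.idx, eta.val):
                r[i] -= rp * v

    def btran_apply(eta, r):
        """Standard BTRAN step: r := E_k^{-T} r (Figure 2(b))."""
        dot = sum(r[i] * v for i, v in zip(eta.idx, eta.val))
        r[eta.p] = eta.mu * (r[eta.p] - dot)

    def update_eta(p, abar_q):
        """Product-form UPDATE eta from the pivotal column (Section 2.1)."""
        return Eta(p, 1.0 / abar_q[p],
                   [i for i, v in enumerate(abar_q) if i != p and v != 0.0],
                   [v for i, v in enumerate(abar_q) if i != p and v != 0.0])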
In solvers based on the product form, the representation of the UPDATE operations can be appended to the eta file following INVERT, resulting in a single homogeneous data structure. However, in this paper and in EMSOL, the particular properties of the INVERT and UPDATE etas and the nature of the operations with them are exploited, so FTRAN is performed as the pair of operations

    ã_q = B_0^{-1} a_q          (I-FTRAN)

followed by

    â_q = E_U^{-1} ã_q.         (U-FTRAN)

Conversely, BTRAN is performed as

    π̃^T = r^T E_U^{-1}          (U-BTRAN)

followed by

    π^T = π̃^T B_0^{-1}.         (I-BTRAN)

Note that the term RHS is used to refer, not just to the vector on the right hand side of the system to be solved, but to the vector at any stage in the process of transforming it into the solution of the system.

3 What is hyper-sparsity?

Each iteration of the revised simplex method performs FTRAN to obtain the pivotal column â_q = B^{-1} a_q as the solution of one linear system, BTRAN to obtain π_p^T = e_p^T B^{-1} as the solution of a second linear system, and PRICE to form the pivotal row â_p^T = π_p^T N as the result of a matrix-vector product. For many LP problems, even those for which the constraint matrix is sparse, there are few zero values in the results of these operations. However, as this paper demonstrates, there is a significant set of practical problems for which the proportion of zero values in the results of these operations is such that exploiting this yields very large computational savings.

For the sake of classifying test problems, in this paper the result of an FTRAN, BTRAN or PRICE operation is considered to be sparse if no more than 10% of its entries are nonzero, and if at least 60% of the results of that particular operation are sparse then this is taken to be a clear majority. An LP problem is considered to exhibit hyper-sparsity if a clear majority of the results of one or more of these operations is sparse. These thresholds are used for the presentation of results only and are not parameters in the implementation of the methods which exploit hyper-sparsity.

The extent to which hyper-sparsity exists in LP problems was investigated for a subset of the standard Netlib test set [10], the Kennington test set [5] and the authors' personal collection [13]. Those problems from the Netlib set whose solution requires less than ten seconds of CPU time were excluded, as were FIT2D and the OSA problems from the Kennington set. For the latter problems, the number of columns is particularly large relative to the number of rows so the solution techniques developed in this paper are inappropriate. Note that a simple standard scaling algorithm is applied to each of the problems for reasons of numerical stability and each problem is solved from the initial basis obtained using the crash procedure in CPLEX [15]. For each problem the density of each pivotal column â_q following FTRAN, of π_p and π_δ following BTRAN and of â_p^T and â_δ^T following PRICE was determined, and the total number of each which was found to be sparse was counted. The problems for which there is a clear majority of sparse results for at least one of these three operations are listed in Table 1 and referred to as test set H. The remaining problems, those which exhibit no hyper-sparsity in any of FTRAN, BTRAN or PRICE, are listed in Table 2 and referred to as test set H0.
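For illustration only, the classification rule can be stated in a few lines of Python. The 10% and 60% thresholds are those defined above; the function names are invented.

    def result_is_sparse(nonzeros, dimension):
        """A result is 'sparse' if no more than 10% of its entries are nonzero."""
        return nonzeros <= 0.10 * dimension

    def op_is_hyper_sparse(nonzero_counts, dimension):
        """A clear majority: at least 60% of an operation's results are sparse."""
        n_sparse = sum(result_is_sparse(nz, dimension) for nz in nonzero_counts)
        return n_sparse >= 0.60 * len(nonzero_counts)

    # A problem belongs to test set H if any of the three operations qualifies;
    # note the result dimension differs by operation (m for FTRAN and BTRAN,
    # n - m for PRICE).
    def problem_in_H(ftran_counts, btran_counts, price_counts, m, n):
        return (op_is_hyper_sparse(ftran_counts, m)
                or op_is_hyper_sparse(btran_counts, m)
                or op_is_hyper_sparse(price_counts, n - m))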
For each of the three operations, Tables 1 and 2 give the percentage of the results which are sparse. Subsequent columns give, for each of the three operations, the average density of those results which are sparse and the average density of those results which are not. The final column of Table 1 summarises the extent to which the problem exhibits hyper-sparsity by giving the initial letter of the operation(s) for which a clear majority of the results is sparse.

The first thing to note from the results in Table 1 is that all but two of the problems in H exhibit hyper-sparsity for BTRAN and most do so for all three operations. It is interesting to consider why this is the case and why there are exceptions. Recall that the result of FTRAN is the pivotal column â_q and the result of (most) PRICE operations is the pivotal row. Since these are a column and row from the same matrix (the standard simplex tableau B^{-1}N) it might be expected that a problem would exhibit hyper-sparsity in both FTRAN and PRICE or in neither. Also, since π_p is just a single row of B^{-1}, and the pivotal column is usually a linear combination of several columns of B^{-1}, it might be expected that π_p would be less dense than â_q. These arguments would lead us to expect all problems in H to have property B, and that problems in H would be either FBP or B. The reasons for the exceptions are now explained.

There are two problems, DCP1 and DCP2, of type F, i.e. the pivotal columns are typically sparse but the results of BTRAN and PRICE are not. These problems are decentralised planning problems for which a typical standard simplex tableau is very sparse with a few dense rows. Thus the pivotal columns are usually sparse. However, the pivot is usually chosen from one of the dense rows. Conversely there are five problems of the opposite type, BP, i.e. the pivotal rows are typically sparse but the pivotal columns are not. The most remarkable of these is FIT2P: 81% of pivotal columns are essentially full and all but one of the pivotal rows are sparse. For this problem, most columns of the constraint matrix have only one nonzero entry, with the remaining columns being very dense. Thus B^{-1} is largely diagonal with a small number of essentially full columns. For this model it turns out that most variables chosen to enter the basis have a single nonzero entry such that the pivotal column is (a multiple of) one of these dense columns of B^{-1}.
Each π_p is a row of B^{-1} and its resulting sparsity is inherited by the pivotal row since most columns of N in the PRICE operation have only one nonzero entry.

                      Dimensions               Sparse results (%)    Mean density (%)      Mean density (%)
                                                                     of sparse results     of non-sparse results  Hyper-
    Problem       Rows  Columns  Nonzeros    FTRAN BTRAN PRICE    FTRAN BTRAN PRICE    FTRAN BTRAN PRICE    sparse
    80BAU3B       2262     9799     21002       97    72    70     2.40  0.61  0.66    10.84 42.41 50.85    FBP
    FIT2P         3000    13525     50284       19   100   100     0.17  0.65  0.91    91.10 16.93 19.46    BP
    GREENBEA      2392     5405     30877       15    60    60     4.62  0.49  0.81    24.73 75.54 80.75    BP
    GREENBEB      2392     5405     30877       16    61    60     4.55  0.48  0.77    27.81 77.75 83.78    BP
    STOCFOR3     16675    15695     64875       29   100    75     2.09  2.26  2.86    62.18    —  14.97    BP
    WOODW         1098     8405     37474       32    73    71     3.90  1.01  1.14    22.60 57.55 59.91    BP
    DCP1          4950     3007     93853       98    56    47     4.52  1.33  1.26    11.00 13.90 82.60    F
    DCP2         32388    21087    559390      100    59    53     1.25  0.68  0.45       —  15.08 82.34    F
    CRE-A         3516     4067     14987      100    82    80     1.91  0.39  0.61       —  28.17 56.06    FBP
    CRE-C         3068     3678     13244      100    83    80     2.86  0.45  0.59       —  33.71 58.43    FBP
    KEN-11       14694    21349     49058      100    98    97     0.10  0.27  0.28       —  11.88 13.79    FBP
    KEN-13       28632    42659     97246      100    92    92     0.12  0.16  0.17       —  35.85 42.07    FBP
    KEN-18      105127   154699    358171      100    93    93     0.05  0.21  0.20       —  39.58 41.94    FBP
    PDS-06        9881    28655     62524      100    97    97     0.94  0.34  0.46       —  18.95 17.71    FBP
    PDS-10       16558    48763    106436      100    96    96     1.22  0.24  0.31       —  23.60 25.19    FBP
    PDS-20       33874   105728    230200      100    94    94     2.55  0.21  0.29       —  41.87 41.11    FBP

    Table 1: Problems exhibiting hyper-sparsity (set H): dimensions, percentage of the results of FTRAN, BTRAN and PRICE which are sparse, average density of those results which are sparse, average density of those results which are not, and summary of those operations for which more than 60% of the results are sparse.

                      Dimensions               Sparse results (%)    Mean density (%)      Mean density (%)
                                                                     of sparse results     of non-sparse results
    Problem       Rows  Columns  Nonzeros    FTRAN BTRAN PRICE    FTRAN BTRAN PRICE    FTRAN BTRAN PRICE
    BNL2          2324     3489     13999       29    36    35     2.28  0.61  0.79    28.56 40.03 70.22
    D2Q06C        2171     5167     32417       12    25    25     1.84  0.48  0.86    49.32 66.32 87.24
    D6CUBE         415     6184     37704        2     9     9     5.10  0.64  1.11    65.98 94.43 97.91
    DEGEN3        1503     1818     24646       15    43    42     4.42  1.22  1.47    25.55 62.05 84.97
    DFL001        6071    12230     35632        2    37    37     1.57  0.22  0.37    51.57 83.45 92.51
    MAROS-R7      3136     9408    144848       24     1     1     0.54  0.59  1.43    81.25 30.78 62.30
    PILOT         1441     3652     43167        9    27    24     3.89  1.59  2.06    60.86 76.31 88.51
    PILOT87       2030     4883     73152        8    19    15     1.33  0.99  0.89    74.93 72.50 91.08
    QAP8           912     1632      7296        0    11    11     1.46  0.55  1.92    83.92 75.51 98.45
    TRUSS         1000     8806     27836        8    39    38     4.03  0.81  1.28    47.71 85.56 80.62
    CRE-B         9648    72447    256095       58    55    55     4.15  0.19  0.37    14.01 55.81 89.16
    CRE-D         8926    69980    242646       57    56    55     3.61  0.22  0.39    13.82 54.43 89.20

    Table 2: Problems not exhibiting hyper-sparsity (set H0): dimensions, percentage of the results of FTRAN, BTRAN and PRICE which are sparse, average density of those results which are sparse and average density of those results which are not.

Several observations can be made from the columns in Tables 1 and 2. For those problems in H0 the density of the majority of results is such that the techniques developed for hyper-sparsity are seen to be of limited value. For the problems in H, those results which are not sparse may have a very high proportion of nonzero entries. It is important therefore that techniques for exploiting hyper-sparsity are 'switched off' when such cases are identified during a particular FTRAN, BTRAN or PRICE operation. If this is not done then any computational savings obtained when exploiting hyper-sparsity may be compromised.
4 Exploiting hyper-sparsity

Each computational component of the revised simplex method either forms, or operates with, the result of FTRAN, BTRAN or PRICE. Each of these components is considered below and it is shown that typical computational techniques are inefficient in the presence of hyper-sparsity. In each case, equivalent computational techniques are developed which exploit hyper-sparsity.

4.1 Relative cost of computational components

Table 3 gives, for test set H, the percentage of solution time which can be attributed to each of the major computational components in the revised simplex method. This allows the value of exploiting hyper-sparsity in a particular component to be assessed. Although PRICE and CHUZC together are dominant for most problems, each of the other computational components, with the exception of U-FTRAN and U-BTRAN, constitutes at least 10% of the solution time for some of the problems. Thus, to obtain a general improvement in computational performance requires hyper-sparsity to be exploited in all computational components of the revised simplex method. For the problems in test set H0, only U-BTRAN, PRICE and INVERT benefit noticeably from the techniques developed below. The percentages of the solution time for these computational components are given as part of Table 6.

                Solution    Percentage of solution time
    Problem     CPU (s)     CHUZC I-FTRAN U-FTRAN CHUZR I-BTRAN U-BTRAN  PRICE INVERT
    80BAU3B       31.51     20.26    5.52    0.65  3.35    6.46    1.09  60.02   1.58
    FIT2P        117.77      9.00    4.01    1.65 13.11   15.57    0.76  44.90   8.32
    GREENBEA      66.71      4.74    9.34    3.59  5.92   16.87    5.17  42.00  11.12
    GREENBEB      47.78      4.68    9.57    3.79  6.15   17.02    5.31  40.41  11.83
    STOCFOR3     372.43      4.29    7.84    7.95 13.91   17.13    4.20  23.07  18.34
    WOODW         10.52      9.94    4.48    0.68  2.77    7.86    1.70  68.60   2.99
    DCP1          47.51      3.86    4.71    1.94  3.33   17.79    2.33  59.74   5.64
    DCP2        2572.45      4.58    3.83    2.27  2.39   17.11    2.46  60.38   6.87
    CRE-A         13.71     12.13    9.08    0.98  6.38   16.81    2.46  47.52   3.00
    CRE-C         10.12     12.58    8.35    1.10  7.15   15.22    2.56  47.71   3.53
    KEN-11       209.86      9.00   10.58    0.20  3.27   21.84    0.96  53.26   0.59
    KEN-13      1221.09      8.60    9.96    0.15  2.80   20.08    0.66  57.03   0.59
    KEN-18     21711.40      9.10   10.57    0.08  2.79   20.72    0.36  55.93   0.42
    PDS-06       143.33     12.76    7.93    0.22  3.11   14.41    1.28  58.48   1.47
    PDS-10       868.28     12.45    7.01    0.18  2.68   13.51    1.24  61.05   1.70
    PDS-20     10967.10     11.88    5.58    0.27  2.66   11.01    2.02  63.80   2.65
    Mean                     9.37    7.40    1.61  5.11   15.59    2.16  52.74   5.04

    Table 3: Total solution time and percentage of solution time for computational components when not exploiting hyper-sparsity for test set H.

4.2 Hyper-sparse FTRAN

For most problems which exhibit hyper-sparsity, Table 3 shows that the dominant computational cost of FTRAN is I-FTRAN, particularly so for the larger problems. When the pivotal column â_q computed by FTRAN is sparse, only a very small proportion of the INVERT (and UPDATE) eta vectors needs to be applied (unless there is an improbable amount of cancellation). Indeed the number of floating point operations required to perform these few operations can be expected to be of the same order as the number of nonzeros in â_q. Gilbert and Peierls [11] identified this property in the context of Gaussian elimination when the pivotal column is formed as required and is expected to be sparse. It follows that the cost of I-FTRAN (and hence FTRAN) will be dominated by the test for zero if the INVERT etas are applied using the standard operation illustrated in Figure 2(a).
The aim of this subsection is to develop a computational technique which identifies the INVERT etas which have to be applied without passing through the whole INVERT eta file and testing each value of r_{p_k} for zero. The Gilbert and Peierls approach finds this minimal set of etas at a cost proportional to their total number of nonzeros. A limitation of the Gilbert and Peierls approach is that the entire minimal set of etas has to be found before any of it can be used. If ã_q is not sparse then the cost of this approach could be significantly greater than the standard I-FTRAN. This drawback is avoided by the hyper-sparse I-FTRAN algorithm given as pseudo-code in Figure 3, and explained in the following paragraph.

    K = {k : r_{p_k} ≠ 0}
    while K ≠ ∅
        k_0 = min{k ∈ K}
        r_{p_{k_0}} := µ_{k_0} r_{p_{k_0}}
        for i ∈ E_{k_0} do
            if (r_i ≠ 0) then
                r_i := r_i − r_{p_{k_0}} [η^{k_0}]_i
            else
                r_i := −r_{p_{k_0}} [η^{k_0}]_i
                if (P_i^(1) > k_0) K := K ∪ {P_i^(1)}
                if (P_i^(2) > k_0) K := K ∪ {P_i^(2)}
                R := R ∪ {i}
            end if
        end do
        K := K \ {k_0}
    end while

    Figure 3: Hyper-sparse I-FTRAN algorithm

For a given RHS vector r and set of indices of nonzeros R = {i : r_i ≠ 0} (which is always available without a search for the nonzeros), the hyper-sparse I-FTRAN algorithm is initialised by forming a set K of indices k of etas for which r_{p_k} is nonzero. This is done by passing through the indices in R and using the arrays P^(1) and P^(2). These record, for each row, the index of the first and second eta which have a pivot in the particular row. Note that, because the etas used in I-FTRAN correspond to the LU factors, there can be at most two etas with a pivot in any particular row. Unless cancellation occurs, all the etas with indices in K will have to be applied. If K is empty then no more etas need to be applied so I-FTRAN is complete. Otherwise, the least index k_0 ∈ K identifies the next eta which needs to be applied. In applying eta k_0 the algorithm steps through the nonzeros in η^{k_0} (whose indices are in E_{k_0}). For each fill-in row the algorithm checks if there are etas k with k > k_0 whose pivot is in this row, and any such are added to K. (The check only requires two lookups using the P^(1) and P^(2) arrays.) Finally the k_0 entry is removed.

The set K must be searched to determine the next eta to be applied, and there is some scope for variation in the way that this is achieved. In EMSOL, K is maintained as an unordered list and, if the number of entries in K becomes large, there comes a point at which the cost of the search exceeds the cost of the tests for zero which it seeks to avoid. To prevent this happening, the average skip through the eta file which has been achieved during the current FTRAN is compared with a multiple of |K| to determine the point at which it is preferable to complete I-FTRAN using the standard algorithm.
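A direct Python transcription of the Figure 3 algorithm might look as follows. This is a sketch: the eta layout matches the earlier sketches, and P1/P2 use −1 rather than zero to mean "no eta pivots on this row".

    def hyper_sparse_iftran(etas, P1, P2, r, R):
        """Hyper-sparse I-FTRAN (Figure 3): apply only the etas that touch
        nonzeros of r. etas[k] has fields p, mu, idx, val; P1[i]/P2[i] are
        the indices in etas of the first/second eta with pivot in row i, or
        -1 if there is none. R is the set of indices of nonzeros in r and is
        extended to cover fill-in."""
        K = set()
        for i in R:                      # initialise K from the nonzeros of r
            for P in (P1, P2):
                if P[i] >= 0:
                    K.add(P[i])
        while K:
            k0 = min(K)                  # next eta to apply, in factor order
            eta = etas[k0]
            if r[eta.p] != 0.0:          # may have cancelled since k0 entered K
                r[eta.p] *= eta.mu
                rp = r[eta.p]
                for i, v in zip(eta.idx, eta.val):
                    if r[i] != 0.0:
                        r[i] -= rp * v
                    else:                # fill-in: record it, and record any
                        r[i] = -rp * v   # later etas pivoting on this row
                        if P1[i] > k0: K.add(P1[i])
                        if P2[i] > k0: K.add(P2[i])
                        R.add(i)
            K.discard(k0)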
Although the set of possible nonzeros in the RHS is not required by the algorithm in Figure 3, it is maintained as the set R. There is no real scope for exploiting hyper-sparsity in U-FTRAN which, as a consequence, is performed using the standard algorithm with the modification that R is maintained so long as the RHS is sparse. The availability of set R which, on completion of FTRAN, gives the possible positions of nonzeros in the pivotal column, allows hyper-sparsity to be exploited in CHUZR, as indicated below.

4.3 Hyper-sparse CHUZR

CHUZR performs a small number of floating-point operations for each of the nonzero entries in the pivotal column. If the indices of the nonzeros are not known a priori then all entries in the pivotal column have to be tested for zero and, for problems where this vector is typically sparse, the cost of CHUZR will be dominated by the cost of performing the tests for zero. If a list of indices of entries in the pivotal column which are (or may be) nonzero is known, then this overhead is eliminated.

The nonzero entries in the pivotal column are also required both to update the values of the basic variables following CHUZR and, as described in Section 4.8, to update the product form UPDATE eta file. If the nonzero entries in the workspace vector used to compute the pivotal column are zeroed after being packed onto the end of the UPDATE eta file, this yields a contiguous list of real values to update the values of the basic variables and makes the UPDATE operation near-trivial. A further consequence is that, so long as pivotal columns remain sparse, the only complete pass through the workspace vector used to compute the pivotal column is that required to zero it before the first simplex iteration.

4.4 Hyper-sparse BTRAN

When performing BTRAN using the standard operation illustrated in Figure 2(b), most of the work comes from the evaluation of the inner product r^T η^k. However, when the RHS is sparse, it will usually be the case that there is no intersection of the nonzeros in r^T and η^k, so that the result is structurally zero. Unfortunately, checking directly whether there is a non-empty intersection is slower than evaluating the inner product directly. Better techniques are discussed in the remainder of this subsection.

4.4.1 Avoiding structurally zero inner products and operations with zero

When using the product form update, it is valuable to consider U-BTRAN separately from I-BTRAN. When forming π_p^T = e_p^T B^{-1}, which constitutes the overwhelming majority of BTRAN operations, it is possible in U-BTRAN to eliminate all the structurally zero inner products and significantly reduce the number of operations with zero. For all BTRAN operations it is possible to eliminate a significant number of the structurally zero inner products in I-BTRAN.

Hyper-sparse U-BTRAN when forming π_p

Let K_U denote the number of UPDATE operations which have been performed since INVERT and let P denote the set of indices of those rows which have been pivotal. Note that the inequality |P| ≤ K_U is strict if a particular row has been pivotal more than once. Since the RHS of B^T π_p = e_p has only one nonzero, in the row which has just been pivotal, and fill-in during U-BTRAN can only occur in components of the RHS corresponding to pivots, it follows that the nonzeros in π̃_p^T = e_p^T E_U^{-1} are restricted to the components with indices in P. Thus, when applying the k-th UPDATE eta, only the nonzeros with indices in P contribute to r^T η^k. Since |P| is very much smaller than the dimension of B, it follows that unless this observation is exploited, most of the floating point operations when applying the UPDATE etas involve zero.

A significant improvement in efficiency is achieved by maintaining a rectangular array E_P of dimension |P| × K_U which holds the values of the entries corresponding to P in the UPDATE etas, allowing π̃_p to be formed as a sequence of K_U inner products. These are computed by indirection into E_P using a list of indices of nonzeros in the RHS which is simple and cheap to maintain.
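The following sketch illustrates the idea in Python (an illustrative layout, not EMSOL's). It assumes the UPDATE etas are applied in reverse order during BTRAN, and that each eta's own pivot entry is excluded from E_P, as in Figure 2.

    def hyper_sparse_ubtran(p, EP, mus, prow_in_P, P_index):
        """U-BTRAN for pi_p (a sketch). The RHS e_p and all fill-in live on
        the pivotal rows P, so r is held only over P. EP[j][k] is entry j of
        UPDATE eta k restricted to P (the eta's own pivot entry stored as
        zero); mus[k] is the multiplier of eta k and prow_in_P[k] the index
        in P of its pivot row. P_index maps row numbers to indices in P."""
        KU = len(mus)
        r = [0.0] * len(P_index)
        nz = [P_index[p]]                 # indices in P of nonzeros in the RHS
        r[nz[0]] = 1.0                    # RHS of B^T pi_p = e_p
        for k in reversed(range(KU)):     # BTRAN applies the UPDATE etas in reverse
            dot = sum(r[j] * EP[j][k] for j in nz)   # indirection: nonzeros only
            jp = prow_in_P[k]
            if dot != 0.0 or r[jp] != 0.0:
                if jp not in nz:
                    nz.append(jp)         # fill-in, only ever on a pivotal row
                r[jp] = mus[k] * (r[jp] - dot)
        return r                          # pi_p tilde, nonzero only within P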
When using this technique, if the UPDATE etas are sparse then E_P will be largely zero. As a consequence, most of the inner products r^T η^k will be structurally zero, and most of the values of r_{p_k} will be zero. The first (next) eta for which this is not the case, and so must be applied, is identified by searching for the first (next) nonzero in the rows of E_P for which the corresponding component of r is nonzero. The extra work of performing this search is usually much less than the computation which is avoided. Indeed, the initial search frequently identifies that none of the UPDATE etas needs to be applied.

If K_U is sufficiently large then a prohibitively large amount of storage is required to store E_P in a rectangular array. However E_P may be represented by ensuring that in the UPDATE eta file the indices and values of nonzeros in η^k for rows in P are stored before any remaining indices and values. It is still possible to perform row-wise searches of E_P by using an additional, more compact, data structure from which the eta index of nonzeros in each row of E_P may be deduced. The overhead of maintaining this data structure and searching it is usually much less than the cost of the floating-point operations with zero that would otherwise be performed.

Note that since π̃_p is sparse for any LP problem, the technique for exploiting this is of benefit whether or not the UPDATE etas are sparse. However it is seen in Table 6 that, for problems in test set H0, the saving is a small proportion of the overall cost of performing a simplex iteration.

Hyper-sparse I-BTRAN using a column-wise INVERT eta file

As in the U-BTRAN case above, we wish to find a way of applying only those INVERT eta vectors which require at least one nonzero operation. Assume we have available a list, Q^(1), of the indices of the last INVERT eta with a nonzero in each row, with an index of zero used to indicate that there is no such eta. The greatest index in Q^(1) corresponding to the nonzeros in π̃ then indicates the first INVERT eta which must be applied. As with the UPDATE etas, the technique may indicate that a significant number of the INVERT etas need not be applied and, if the index is zero, it follows immediately that π = π̃. More generally, if the list Q^(l) of the index of the l-th last INVERT eta with a nonzero in each row is recorded, for l from 1 to some small limit, then several significant steps backwards through the INVERT eta file may be made. However, to implement this technique requires an integer array of dimension equal to that of B_0 for each list, and the backward steps are likely to become smaller for larger l, so in EMSOL only Q^(1) and Q^(2) are recorded.

4.4.2 Hyper-sparse I-BTRAN using a row-wise INVERT eta file

The limitations and/or high storage requirements associated with exploiting hyper-sparsity during I-BTRAN with the conventional column-wise (INVERT) eta file motivate the formation of an equivalent representation stored row-wise. This may be formed after the normal column-wise INVERT by passing twice through the complete column-wise INVERT eta file. This row-wise eta file permits I-BTRAN to be performed using the algorithm given in Figure 3 for I-FTRAN. For problems in which π is typically sparse, the computational overhead of forming the row-wise eta file is far outweighed by the savings achieved when applying it, even when compared to the hyper-sparse I-BTRAN using a column-wise INVERT eta file.
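The two passes are the familiar count-then-scatter pattern used to transpose a sparse matrix; a sketch under the same illustrative eta layout as before:

    def build_rowwise_eta_file(etas, m):
        """Form a row-wise copy of the column-wise INVERT eta file in two
        passes (a sketch). Returns a CSR-style structure: for each row i,
        the (eta index, value) pairs lie in positions start[i]:start[i+1]."""
        counts = [0] * m
        for eta in etas:                  # pass 1: count nonzeros per row
            for i in eta.idx:
                counts[i] += 1
        start = [0] * (m + 1)             # prefix sums give each row's segment
        for i in range(m):
            start[i + 1] = start[i] + counts[i]
        ks = [0] * start[m]               # pass 2: scatter entries into place
        vs = [0.0] * start[m]
        fill = start[:m]                  # next free slot in each row segment
        for k, eta in enumerate(etas):
            for i, v in zip(eta.idx, eta.val):
                ks[fill[i]], vs[fill[i]] = k, v
                fill[i] += 1
        return start, ks, vs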
4.4.3 Maintaining a list of the indices of nonzeros in the RHS

During BTRAN, when using the standard operation illustrated in Figure 2(b), it is simple and cheap to maintain a list of the indices of the possible nonzeros in the RHS: if r_{p_k} is zero and r^T η^k is nonzero then the index p_k is added to the end of the list. When performing I-BTRAN by applying the algorithm given in Figure 3 with a row-wise INVERT eta file, the list of the indices of the possible nonzeros in the RHS is maintained as the set R. For problems where π is frequently sparse, knowing the indices of those elements which are (or may be) nonzero allows a valuable saving to be made in PRICE.

4.5 Row-wise (hyper-sparse) PRICE

For the problems in test set H, it is clear from Table 3 that PRICE accounts for about half of the CPU time required for most problems and significantly more than that for others. The matrix-vector product π^T N is commonly formed as a sequence of inner products between π and (the packed form of) the appropriate columns of the constraint matrix. In the case when π is full, there will be no floating-point operations with zero, so this simple technique is optimal. However this is far from being true if π is sparse, in which case, by forming π^T N as a linear combination of those rows of N which correspond to nonzero entries in π, all floating point operations with zero are avoided. Although the cost of maintaining the row-wise representation of N is non-trivial, this is far outweighed by the efficiency with which π^T N may then be formed. Within reason, even for problems when π is not sparse, performing PRICE with a row-wise representation of N is advantageous. For example, even if π is on average half full, the overhead of maintaining the row-wise representation of N is small compared to the half of the work of column-wise PRICE which is saved.

For problems when π is particularly sparse, the time to test all entries for zero dominates the time to do the small number of floating point operations which involve the nonzeros in π. The cost of testing can be avoided if a list of the indices of the nonzeros in π is known and, as identified in Section 4.4.3, such a list can be constructed at negligible cost during BTRAN. After each nonzero entry in π is used it is zeroed, so that by the end of PRICE the entire π workspace is zeroed in preparation for the next BTRAN. Thus, so long as π vectors remain sparse, the only complete pass through this workspace array when solving an LP problem is that required to zero it before the first simplex iteration.

Provided π^T N remains sparse, it is advantageous to maintain a list of the indices of its nonzeros. This list can be used to avoid having to search for the nonzeros in π^T N, which would otherwise be the dominant cost when using this vector. Once again the list of indices of the nonzeros can be used to zero the workspace so, provided the π^T N vectors remain sparse, the only full pass through this workspace that is needed is the one to zero it before the first simplex iteration.
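A sketch of row-wise PRICE driven by the list of nonzeros in π (illustrative Python; N_rows is an assumed row-wise copy of N):

    def rowwise_price(pi_nz, N_rows, n_cols):
        """Row-wise PRICE (a sketch): form pi^T N as a linear combination of
        the rows of N indexed by the nonzeros of pi. pi_nz is a list of
        (row, value) pairs from BTRAN; N_rows[i] is row i of N as
        (column, value) pairs. Returns the dense result together with the
        list of its possible nonzero positions."""
        result = [0.0] * n_cols
        touched = set()
        result_nz = []
        for i, piv in pi_nz:              # only the nonzero entries of pi
            for j, a in N_rows[i]:        # scatter a multiple of row i of N
                if j not in touched:
                    touched.add(j)        # first touch: a possible nonzero
                    result_nz.append(j)
                result[j] += piv * a
        return result, result_nz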
4.6 Hyper-sparse CHUZC

Before discussing methods for CHUZC which exploit hyper-sparsity, it should be observed that, since the vector c_B of basic costs may be full, the vector of reduced costs given by

    ĉ_N^T = c_N^T − c_B^T B^{-1} N

may also be full. Further, for most of the solution time, a significant proportion of the reduced costs are negative. Thus, even for LP problems exhibiting hyper-sparsity, the attractive nonbasic variables do not form a small set whose size could then be exploited. However, if the pivotal row, e_p^T B^{-1} N, is sparse, the number of reduced costs which change each iteration is small and this can be exploited to improve the efficiency of CHUZC.

The aim of the hyper-sparse CHUZC algorithm is to maintain a set, C, consisting of a small number, s, of variables which is guaranteed to include the variable with the most negative reduced cost. At the start of an LP iteration let g be the most negative reduced cost of any variable in C and let h be the lowest reduced cost of those nonbasic variables not in C. The steps of CHUZC for one LP iteration are as follows (a simplified sketch in code is given at the end of this subsection):

• If h < g then reinitialise C by performing a full CHUZC: pass through all the nonbasic variables and find the s + 1 most attractive (i.e. those with the lowest reduced costs). Store those with the lowest s reduced costs in C, set g to the lowest reduced cost and h to the reduced cost of the remaining variable.

• Find and remove the variable with the best reduced cost from C. This provides the candidate to enter the basis.

• Whilst updating the reduced costs:

  – form a set D of the variables (not in C) with the lowest s reduced costs which have changed;

  – update h each time a variable is not included in D and has a lower reduced cost than the current value of h. Note that hyper-sparse PRICE generates a list of the nonzeros in the pivotal row, which is used to drive this step and simultaneously zero the workspace for the pivotal row.

• Update C to correspond to the lowest s reduced costs in C ∪ D, set g to the lowest reduced cost in the resultant C and update h if a variable which is not included in C has a lower reduced cost than the current value of h.

Note that the set operations with C and D can be performed in a time proportional to their length. Also, observe that the technique for hyper-sparse CHUZC described above extends naturally, and at no additional cost, when the column selection strategy incorporates edge weights. Since the edge weight for a column changes only if the corresponding entry in the pivotal row is nonzero, the list D still contains the most attractive candidates not in C whose reduced cost and weight have changed.
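The following simplified sketch shows the bookkeeping (Python; optimality tests, Devex weights and phase I details are omitted, and all names are illustrative):

    import heapq

    def full_chuzc(cost, s):
        """Full CHUZC: C holds the s lowest reduced costs; h is the (s+1)-th,
        the cutoff below which a variable outside C could beat those inside."""
        best = heapq.nsmallest(s + 1, ((c, j) for j, c in enumerate(cost)))
        return best[:s], (best[s][0] if len(best) > s else float("inf"))

    def select_candidate(C, h, cost, s):
        """Rebuild C only if the cutoff h shows it may no longer contain the
        most negative reduced cost; then remove and return the best candidate."""
        if not C or h < min(C)[0]:
            C, h = full_chuzc(cost, s)
        C.sort()
        cost_q, q = C.pop(0)
        return q, C, h

    def update_candidates(C, h, cost, changed, s):
        """Merge the few columns hit by the sparse pivotal row into C. D holds
        the s best changed columns outside C; anything kept in neither set can
        only influence future iterations through the cutoff h."""
        in_C = {j for _, j in C}
        merged = [(cost[j], j) for _, j in C]          # refresh stale costs
        merged += heapq.nsmallest(
            s, ((cost[j], j) for j in changed if j not in in_C))   # the set D
        C = heapq.nsmallest(s, merged)
        kept = {j for _, j in C}
        for c, j in merged:                # displaced candidates feed h
            if j not in kept:
                h = min(h, c)
        for j in changed:                  # as do changed columns kept nowhere
            if j not in kept:
                h = min(h, cost[j])
        return C, h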
4.7 Hyper-sparse Tomlin INVERT

The default INVERT used in EMSOL is based on the procedure described by Tomlin [19]. This procedure identifies, and uses as pivots for as long as possible, rows and columns in the active submatrix which have only a single nonzero. Following this triangularisation phase, any residual submatrix is then factorised using Gaussian elimination, with the order of the columns determined prior to the numerical calculation by merit counts based on fill-in estimates. Since the pivot in each stage of Gaussian elimination is selected from a predetermined pivotal column, only this column of the active submatrix is required. Thus, rather than apply elimination operations to maintain the up-to-date active submatrix, the up-to-date pivotal column is formed each iteration. This procedure requires much simpler data structure management and pivot search strategy compared to a Markowitz-based procedure, which maintains and selects the pivot from the whole up-to-date active submatrix.

The pivotal column in a given stage of the Tomlin INVERT is formed by passing forwards through the L-etas for the residual block which have been computed up to that stage. Even for problems which do not exhibit hyper-sparsity, the pivotal column of the active submatrix during Gaussian elimination is very likely to be sparse. This is the situation where what is referred to in this paper as hyper-sparsity was identified by Gilbert and Peierls [11]. This partial FTRAN operation is particularly amenable to the exploitation of hyper-sparsity using the algorithm illustrated in Figure 3. Note that the data structures required to exploit hyper-sparsity during this stage in INVERT, as well as during I-FTRAN itself, are generated at almost no cost during INVERT.

For the problems in test set H, the Tomlin INVERT yields factors which, in the worst case, have an average of 4% more entries than those generated by a Markowitz-based INVERT. The average fill-in with the Tomlin INVERT is no more than 10%. For problems with such low fill-in, the Tomlin INVERT (when exploiting hyper-sparsity) is at least as fast as a Markowitz-based INVERT. For many problems in H the dimension of the residual block in the Tomlin INVERT is very small (no more than a few percent) relative to the dimension of B_0, so the scope for exploiting hyper-sparsity is negligible. However, for others, the average dimension of the residual block is between 10 and 25 percent of the dimension of B_0. Since there is little fill-in, the scope for exploiting hyper-sparsity during INVERT for these problems is significant. Indeed, if hyper-sparsity is not exploited, the Tomlin INVERT may be significantly slower than a Markowitz-based INVERT. For some of the problems in test set H0, using the Tomlin INVERT results in significantly more fill-in than would occur with a Markowitz-based INVERT, in which case it would generally be preferable to use a Markowitz-based INVERT. For the remaining problems, the Tomlin INVERT is at least competitive when exploiting hyper-sparsity.

4.8 Hyper-sparse (product-form) UPDATE

The product-form UPDATE requires the nonzeros in the pivotal column to be stored in packed form, with the pivot stored as its reciprocal (so that the divisions in FTRAN and BTRAN are effected by multiplication). As explained in Section 4.3, this packed form is produced at negligible cost during the course of CHUZR.

4.9 Hyper-sparsity for other update procedures

The product form update is commonly criticised for its lack of numerical stability and inefficiency with regard to sparsity. For this reason, some implementations of the revised simplex method are based on the Forrest-Tomlin [9] or Bartels-Golub [1] update procedures, which modify the U (but not the L) etas in the representation of B_0^{-1} in order to gain numerical stability and efficiency with regard to sparsity. If such a procedure were used, the data structure which enables hyper-sparsity to be exploited when applying U-etas during BTRAN and FTRAN would have to be modified after each UPDATE. The overhead of doing this is likely to limit severely the value of exploiting hyper-sparsity. Also, the advantage of the Forrest-Tomlin and Bartels-Golub update procedures with respect to sparsity is small for problems which exhibit hyper-sparsity in the product form UPDATE etas. If greater numerical stability is required than is offered by the product form update, the Schur complement update [2] may be used. Like the product form update, the representation of B_0^{-1} is unaltered, so the data structures for exploiting hyper-sparsity when applying the INVERT etas remain static. Techniques analogous to those described above for the product form update may be used to exploit hyper-sparsity during U-BTRAN when using a Schur complement update.
4.10 Controlling the use of hyper-sparsity techniques

All the hyper-sparse techniques described above are less efficient than the standard versions in the absence of hyper-sparsity and so should only be applied when hyper-sparsity is present. For problems which do not exhibit hyper-sparsity at all, or for problems where a particular computational component does not exhibit hyper-sparsity, this is easily recognised by monitoring a running average of the density of the result over a number of iterations. The technique is then switched off for all subsequent iterations if hyper-sparsity is seen to be absent. For a computational component which typically exhibits hyper-sparsity, it is important to identify the situation where the result for a particular iteration is not going to be sparse, and switch to the standard algorithm, which will then be more efficient. This can be achieved by monitoring the density of the result during the operation and switching when it exceeds some tolerance. Practical experience has shown that performance is very insensitive to changes in these tolerances around the optimal value.
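A sketch of this control logic in Python follows; the tolerance and smoothing factor are invented for illustration, since the paper does not publish its values.

    class HyperSparsityControl:
        """Sketch of the Section 4.10 control logic (thresholds illustrative).
        A running average of result density decides whether a component keeps
        using its hyper-sparse algorithm; within one operation, the density of
        the partially formed result can trigger a switch to the standard code."""

        def __init__(self, density_tol=0.1, decay=0.95):
            self.density_tol = density_tol    # assumed cut-off density
            self.decay = decay                # weight of history in the average
            self.avg_density = 0.0
            self.enabled = True

        def record_result(self, nonzeros, dimension):
            d = nonzeros / dimension
            self.avg_density = self.decay * self.avg_density + (1 - self.decay) * d
            if self.avg_density > self.density_tol:
                self.enabled = False          # hyper-sparsity absent: switch off

        def abandon_mid_operation(self, nonzeros_so_far, dimension):
            # Result already too dense: finish with the standard algorithm.
            return nonzeros_so_far / dimension > self.density_tol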
5 Results

Computational results in this section demonstrate the value of the techniques for exploiting hyper-sparsity described in the previous section. A detailed analysis is given of the speedup of the authors' revised simplex solver, EMSOL, as a result of exploiting hyper-sparsity. In addition, for problems which exhibit hyper-sparsity, EMSOL is compared with SOPLEX 1.2 [21] and the CPLEX 6.5 [15] primal simplex solver. All results were obtained on a SunBlade 100 with 512Mb of memory.

5.1 Speedup of EMSOL when exploiting hyper-sparsity

The efficiency of the techniques for exploiting hyper-sparsity is demonstrated by the results in Table 4 for the problems in test set H and Table 6 for the problems in test set H0. These tables give the speedup of both overall solution time and the time attributed to each computational component where significant hyper-sparsity may be exploited. In this paper, mean values of speedup are geometric means, since this avoids bias when favourable and unfavourable speedups are being combined.

5.1.1 Problems which exhibit hyper-sparsity

For test set H, Table 4 shows clearly the value of exploiting hyper-sparsity. The solution time of all problems improves, by more than an order of magnitude in the case of the larger problems, and all computational components show a significant mean speedup. Note that, particularly for the larger problems, the speedup in PRICE and CHUZC underestimates the efficiency of these operations when the pivotal row is sparse: although only a small percentage of the pivotal rows are not sparse, they dominate the time required for PRICE and, in addition, if the pivotal row is not sparse, the set C in the hyper-sparse CHUZC must be reinitialised, requiring a full CHUZC.

                Speedup in total solution time and computational components
    Problem     Solution  CHUZC I-FTRAN  CHUZR I-BTRAN U-BTRAN  PRICE INVERT
    80BAU3B         3.34   3.05    5.13   1.93    3.51    6.72   6.06   1.34
    FIT2P           1.75  18.91    1.30   0.93   12.22    3.59  13.47   0.87
    GREENBEA        2.71   1.33    1.30   1.13    3.60   19.87   3.45   2.83
    GREENBEB        2.44   1.39    1.35   1.21    3.69   21.88   3.44   2.78
    STOCFOR3        1.85   4.47    1.14   0.96    7.26   56.99   7.61   3.16
    WOODW           3.40   1.82    1.70   1.30    4.17   11.72   5.14   1.53
    DCP1            3.25   1.58    2.36   1.19    3.87    6.25   6.71   1.70
    DCP2            5.32   1.60    8.24   2.51    6.21   13.99   6.20   8.63
    CRE-A           3.05   2.54    4.00   1.72    3.64    6.50   4.48   1.14
    CRE-C           2.89   2.88    4.67   1.97    3.58    6.53   4.97   1.08
    KEN-11         22.84  19.36   98.04  13.93   27.22    9.90  66.36   1.02
    KEN-13         12.12   6.31  104.09   7.31   12.87    9.19  17.60   0.94
    KEN-18         15.27   6.63  263.94  15.27   13.91   13.07  19.92   1.01
    PDS-06         17.48  15.25   24.07   3.57   21.58   35.76  28.18   1.02
    PDS-10         10.36  10.67   11.24   1.85   16.60   49.99  17.55   0.96
    PDS-20         10.35   8.58    5.96   1.68   14.33  189.19  15.40   1.44
    Mean            5.21   4.38    7.03   2.28    7.64   15.44   9.71   1.55

    Table 4: Speedup for test set H when exploiting hyper-sparsity.

Although U-BTRAN exhibits the greatest mean speedup, it is seen in Table 3 that, when not exploiting hyper-sparsity, this operation makes a relatively small contribution to overall solution time. It is the speedups in I-FTRAN, I-BTRAN, PRICE and CHUZC which are of greatest value. However, the speedup in all operations is of some value.

We now consider whether there is scope for further improvement. Table 5 gives the percentage of solution time attributable to the major computational components when exploiting hyper-sparsity. The column headed 'Hyper-sparsity' is the percentage of the solution time which is attributable to creating and maintaining the data structures required to exploit hyper-sparsity in the computational components. PRICE and CHUZC are still the major cost for many problems and some form of partial/multiple pricing might reduce the time per iteration attributable to PRICE and CHUZC. However, the saving may be more than offset by an increase in the number of iterations required to solve these problems. The one operation where no techniques for exploiting hyper-sparsity have been developed is U-FTRAN. The contribution of this operation to overall solution time has increased from an average of 1.61% when not exploiting hyper-sparsity in other components (see Table 3) to a far from dominant 6.11%.

5.1.2 Problems which do not exhibit hyper-sparsity

As identified in Section 4, for problems in test set H0 the only significant scope for exploiting hyper-sparsity is in U-BTRAN, PRICE and INVERT. Table 6 gives the percentage of solution time attributable to these three operations when not exploiting hyper-sparsity. Although the overhead of INVERT is higher than for problems in set H, the dominant operation is, again, PRICE. With the exception of PILOT, all other problems show some speedup in solution time when exploiting hyper-sparsity, with a modest but not insignificant mean speedup of 1.45. Despite the significant speedup in U-BTRAN, much of the overall performance gain can be put down to halving the time attributable to PRICE. Although the mean speed of INVERT is doubled, it should be borne in mind that a Markowitz-based INVERT procedure may well be preferable for these problems. For the other computational components, there is a mean speedup of between 1.02 and 1.10, indicating that the hyper-sparse techniques are not significant relative to the rest of the calculation. However, the overhead associated with creating and maintaining the data structures required to exploit hyper-sparsity is not significant and, for the only problems where it accounts for more than 1% of the solution time, it yields a significant overall speedup.

5.2 Comparison with representative simplex solvers

In this section the performance of EMSOL is compared with other simplex solvers for the problems in test set H. CPLEX [15] is commonly held to be the leading commercial simplex solver, a view supported by benchmark tests performed by Mittelmann [18].
SOPLEX [21] is a public-domain simplex solver developed by Wunderling [20] which outperforms other such solvers in tests performed by Mittelmann [18]. In the results presented below, EMSOL is compared with the most recent version of CPLEX available to the authors (version 6.5) and SOPLEX version 1.2. Although SOPLEX version 1.2.1 is available, it shows no noticeable performance improvement over version 1.2.

It is important when comparing the performance of computational components of different solvers that all solvers start from the same (or similar) advanced basis and any algorithmic differences which affect the computational requirements are reduced to a minimum. In the comparisons below, EMSOL is started from the basis obtained using the CPLEX crash procedure. SOPLEX cannot be started from a given advanced basis. However, since the SOPLEX crash procedure described by Wunderling [20] appears to be closely related to that of CPLEX, the SOPLEX initial basis may be expected to be comparable to that of CPLEX, and hence EMSOL.

                Percentage of solution time
    Problem     CHUZC I-FTRAN U-FTRAN  CHUZR I-BTRAN U-BTRAN  PRICE INVERT  Hyper-sparsity
    80BAU3B     26.24    4.23    4.00   6.89    7.34    1.13  39.22   2.81    4.83
    FIT2P        1.21    7.78    4.66  35.34    3.21    0.51   8.39  18.56   13.01
    GREENBEA     8.24   16.51    7.94  12.03   10.82    0.79  28.10   7.48    4.98
    GREENBEB     7.97   16.87    7.98  12.14   10.95    0.90  27.89   7.37    4.90
    STOCFOR3     1.97   14.18   13.91  29.68    4.85    0.21   6.23  13.55    9.07
    WOODW       17.79    8.45    3.51   7.02    6.17    1.17  43.90   3.18    4.70
    DCP1         8.16    6.69    8.92   9.42   15.39    1.80  29.85   8.73    7.93
    DCP2        13.54    2.20    5.93   4.51   13.05    0.67  46.16   6.26    6.99
    CRE-A       14.03    6.58    5.87  10.92   13.43    1.65  31.08   4.47    8.32
    CRE-C       13.60    5.60    5.69  11.41   13.32    2.02  30.31   4.21    9.50
    KEN-11      12.02    2.74    5.22   6.08   20.64    4.61  20.72   8.51   15.14
    KEN-13      17.06    1.20    3.85   4.81   19.53    1.81  40.58   3.86    5.85
    KEN-18      20.75    0.60    2.59   2.76   22.51    1.01  42.42   2.62    4.32
    PDS-06      12.29    4.80    5.90  12.77    9.75    1.50  30.38   6.03   12.86
    PDS-10      11.89    6.35    5.99  14.74    8.29    0.77  35.43   5.18    9.27
    PDS-20      12.55    8.49    5.75  14.35    6.96    0.30  37.54   5.48    7.16
    Mean        12.46    7.08    6.11  12.18   11.64    1.30  31.14   6.77    8.05

    Table 5: Percentage of solution time for computational components and overhead attributable to exploiting hyper-sparsity for test set H.

                Not exploiting              Speedup when exploiting           Hyper-sparsity
                hyper-sparsity (%)          hyper-sparsity                    overhead
    Problem     U-BTRAN  PRICE INVERT    Solution U-BTRAN  PRICE INVERT      CPU (%)
    BNL2           6.29  32.61  14.74      1.87    14.61   2.43   2.80        4.79
    D2Q06C         2.69  36.65  14.57      1.33     8.34   1.84   2.20        1.12
    D6CUBE         0.45  72.35   3.83      1.65     1.37   2.04   1.37        0.23
    DEGEN3         6.51  27.12  14.14      2.32    11.58   3.15   2.11        6.65
    DFL001         2.90  24.22  22.52      1.69    10.93   1.97   5.21        0.59
    MAROS-R7       1.79  18.50  26.22      1.05     2.14   2.06   0.98        0.51
    PILOT          1.40  22.15  13.03      0.96     1.28   1.43   0.92        0.26
    PILOT87        1.19  19.51  17.39      1.00     1.39   1.50   0.85        0.26
    QAP8           1.25  15.56  16.68      1.05     1.29   1.24   1.48        0.40
    TRUSS          2.62  54.50  10.26      1.80     6.47   2.49   4.40        0.88
    CRE-B          1.31  74.61   2.97      1.82    50.74   2.33   2.93        0.60
    CRE-D          1.25  74.95   2.82      1.52    41.97   2.16   3.14        0.65
    Mean           2.47  39.39  13.26      1.45     5.80   1.99   2.01        1.41

    Table 6: Percentage of solution time when not exploiting hyper-sparsity, speedup when exploiting hyper-sparsity and percentage of solution time attributable to exploiting hyper-sparsity for test set H0.

When CPLEX is run, the default approach to pricing is to start with an inexpensive strategy and switch to Devex. For the test problems in this paper, this approach leads to a speedup of 1.22 (1.14 for the problems in H) over the performance when using only Devex pricing.
However, since EMSOL uses only Devex pricing, it is compared with CPLEX using only Devex pricing.

By default, SOPLEX uses the dual simplex method, although it often switches to the primal, particularly when close to optimality and occasionally significantly before. Although it is suggested that SOPLEX can run as a primal simplex solver, in practice it soon switches to the dual simplex method for the test problems in this paper. Thus, for the comparisons below, SOPLEX is run using its default choice of method. The default pricing strategy used by SOPLEX is steepest edge, which is described by Forrest and Goldfarb in [8]. Even for the dual simplex method, steepest edge carries a higher computational overhead than Devex since it requires an additional BTRAN operation in the case of the primal (FTRAN in the dual) as well as some pricing in the primal. SOPLEX can be forced to use Devex pricing, which has the same computational requirements in the dual as in the primal. Thus, in the results below, SOPLEX is forced to use only Devex pricing. Although the use of the primal or dual simplex method may significantly affect the number of iterations required to solve a problem, comparing the average time per iteration of SOPLEX and EMSOL gives a fair measure of the efficiency of the computational components of the respective solvers.

SOPLEX has a presolve which is used by default. Since the presolve improves the solution time by only about 10% and incorporates scaling, in the results for SOPLEX given below it is run with the presolve. As a result, the relative iteration speed of SOPLEX may give a slight overestimate of the efficiency of the techniques used in its computational components.

The results of the comparisons of EMSOL with CPLEX and SOPLEX, when run in the modes discussed above, are given in Table 7. Values of speedup which are greater than unity indicate that EMSOL is faster.

5.2.1 Comparison with CPLEX

Table 7 shows that for the H problems EMSOL is faster on seven and CPLEX is faster on nine. On average, CPLEX is slightly faster: the mean speedup for EMSOL is 0.84. There is no consistent reason for the difference, with neither code dominating the other in time per iteration or number of iterations. Because the different solvers do not follow the same paths to the solution, it is difficult to make precise comparisons. Also we have noticed that making an arbitrary perturbation to the pivot choice can double or halve the solution time (though the results presented in this paper are all for the EMSOL default settings). This difference is due both to the variation in the number of iterations taken and to variation in the average amount of hyper-sparsity encountered on the solution path.

Brief comments by Bixby [3] and by Bixby et al. [4] indicate that methods to exploit hyper-sparsity in FTRAN, BTRAN and PRICE have been developed independently for use in CPLEX [15]. These techniques, for which no details have been published, were introduced to CPLEX between versions 6 and 6.5. There may be other unpublished features of CPLEX unconnected with hyper-sparsity which contribute to the differences in performance when compared with EMSOL.

5.2.2 Comparison with SOPLEX

In the comparison of EMSOL and SOPLEX, two values are given in Table 7 for each model: the relative speedup in total solution time when EMSOL is used and the relative speedup in the time required by EMSOL to perform a simplex iteration.
In terms of solution speed, EMSOL is clearly far superior, dramatically so for FIT2P and DCP1, for which SOPLEX takes an excessive number of iterations. However, even after excluding these two problems, the mean speedup when using EMSOL is 2.72, which is considerable. Note that, for the problems in test set H, SOPLEX with Devex pricing is 1.14 times faster than with its default steepest-edge pricing. For the reasons set out above, the values for the speedup in iteration time provide the most reliable comparison of the techniques used within the computational components of EMSOL and SOPLEX. With the exception of STOCFOR3, the iteration speed of EMSOL exceeds that of SOPLEX, significantly so for the larger problems.

                EMSOL       CPLEX (Devex)          SOPLEX (Devex)
            Solution    Solution     EMSOL     Solution     EMSOL      EMSOL
                time        time  solution         time  solution  iteration
Problem      CPU (s)     CPU (s)   speedup      CPU (s)   speedup    speedup
80BAU3B         9.43       14.39      1.53        20.50      2.17       1.78
FIT2P          67.34       42.08      0.62      1058.79     15.72       1.30
GREENBEA       24.63       21.18      0.86       150.53      6.11       1.38
GREENBEB       19.55       18.31      0.94        76.36      3.91       1.56
STOCFOR3      201.21       41.15      0.20       154.11      0.77       0.64
WOODW           3.09        3.99      1.29        21.41      6.93       2.58
DCP1           14.64       16.30      1.11       330.35     22.56       1.69
DCP2          483.64     1412.89      2.92      1507.40      3.12       1.24
CRE-A           4.49        6.29      1.40         5.07      1.13       1.09
CRE-C           3.50        5.91      1.69         3.39      0.97       1.06
KEN-11          9.19       13.57      1.48        46.99      5.11       2.20
KEN-13        100.77       91.41      0.91       417.26      4.14       2.05
KEN-18       1421.50      786.08      0.55      5167.60      3.64       1.91
PDS-06          8.20        5.49      0.67        44.22      5.39       3.37
PDS-10         83.83       26.92      0.32       158.79      1.89       2.37
PDS-20       1059.19      238.48      0.23      1754.17      1.66       2.73
Mean                                   0.84                  3.47       1.67

Table 7: Performance of EMSOL relative to CPLEX 6.5 and SOPLEX 1.2.

6 Conclusions and extensions

This paper has identified hyper-sparsity as a property of a significant number of LP problems when solved by the revised simplex method. Techniques for exploiting this property in each of the computational components of the revised simplex method have been described. Although this has been done in the context of the primal simplex method, the techniques developed in this paper can be applied immediately to the dual simplex method since it has comparable computational components.

For the subset of our test problems that do not exhibit hyper-sparsity (H̄), the geometric mean speedup is 1.45; for those problems which do exhibit hyper-sparsity (H), the speedup is substantial, increases with problem size and has a mean value of 5.38. For this latter subset of problems, our implementation of the revised simplex method which exploits hyper-sparsity has been shown to be comparable to the leading commercial simplex solver and several times faster than the leading public-domain solver. Although this performance gain is substantial for only a subset of LP problems, the amenable problems in this paper are of genuine practical value and the techniques yield some performance improvement in all but one of the test problems.

The authors would like to thank John Reid, who brought the Gilbert-Peierls algorithm to their attention and made valuable comments on an earlier version of this paper.

References

[1] R. H. Bartels. A stabilization of the simplex method. Numer. Math., 16:414–434, 1971.

[2] J. Bisschop and A. J. Meeraus. Matrix augmentation and partitioning in the updating of the basis inverse. Mathematical Programming, 13:241–254, 1977.

[3] R. E. Bixby. Solving real-world linear programs: A decade and more of progress. Operations Research, 50(1):3–15, 2002.
[4] R. E. Bixby, M. Fenelon, Z. Gu, E. Rothberg, and R. Wunderling. MIP: Theory and practice – closing the gap. In M. J. D. Powell and S. Scholtes, editors, System Modelling and Optimization: Methods, Theory and Applications, pages 19–49. Kluwer, The Netherlands, 2000.

[5] W. J. Carolan, J. E. Hill, J. L. Kennington, S. Niemi, and S. J. Wichmann. An empirical evaluation of the KORBX algorithms for military airlift applications. Operations Research, 38(2):240–248, 1990.

[6] V. Chvátal. Linear Programming. Freeman, 1983.

[7] G. B. Dantzig and W. Orchard-Hays. The product form for the inverse in the simplex method. Math. Comp., 8:64–67, 1954.

[8] J. J. Forrest and D. Goldfarb. Steepest-edge simplex algorithms for linear programming. Mathematical Programming, 57:341–374, 1992.

[9] J. J. H. Forrest and J. A. Tomlin. Updated triangular factors of the basis to maintain sparsity in the product form simplex method. Mathematical Programming, 2:263–278, 1972.

[10] D. M. Gay. Electronic mail distribution of linear programming test problems. Mathematical Programming Society COAL Newsletter, 13:10–12, 1985.

[11] J. R. Gilbert and T. Peierls. Sparse partial pivoting in time proportional to arithmetic operations. SIAM J. Sci. Stat. Comput., 9(5):862–874, 1988.

[12] J. A. J. Hall and K. I. M. McKinnon. Hyper-sparsity in the revised simplex method and how to exploit it. Technical Report MS00-015, Department of Mathematics and Statistics, University of Edinburgh, 2000. Submitted to SIAM Journal on Optimization.

[13] J. A. J. Hall and K. I. M. McKinnon. LP test problems. http://www.maths.ed.ac.uk/hall/PublicLP/, 2002.

[14] P. M. J. Harris. Pivot selection methods of the Devex LP code. Mathematical Programming, 5:1–28, 1973.

[15] ILOG. CPLEX 6.5 Reference Manual, 1999.

[16] R. D. McBride and J. W. Mamer. Solving multicommodity flow problems with a primal embedded network simplex algorithm. INFORMS Journal on Computing, 9(2):154–163, Spring 1997.

[17] R. D. McBride and J. W. Mamer. A decomposition-based pricing procedure for large-scale linear programs: an application to the linear multicommodity flow problem. Management Science, 46(5):693–709, May 2000.

[18] H. D. Mittelmann. Benchmarks for optimization software. http://www.plato.la.asu.edu/bench.html, April 2002.

[19] J. A. Tomlin. Pivoting for size and sparsity in linear programming inversion routines. J. Inst. Maths. Applics, 10:289–295, 1972.

[20] R. Wunderling. Paralleler und objektorientierter Simplex. Technical Report TR-96-09, Konrad-Zuse-Zentrum für Informationstechnik Berlin, 1996.

[21] R. Wunderling, A. Bley, T. Pfender, and T. Koch. SOPLEX 1.2.0, 2002.