Notes on Linear Programming

1 Introduction and Elementary Examples

Linear programming and its variants are certainly the most widely used optimization algorithms in applications. The main algorithm used for actual computation, the Simplex Algorithm, is based on techniques from linear algebra. Computer codes are commonly available in most libraries of numerical routines and are used for diverse problems, ranging from the efficient use of limited resources to the design of routings for computer cables.

The basic numerical algorithm, the Simplex Algorithm, is designed to solve the linear optimization problem whose standard form can be written as

    Minimize  Σ_{j=1}^n c_j x_j

under the side conditions

    x_j ≥ 0, j = 1, …, n,   and   Σ_{j=1}^n a_{ij} x_j = b_i, i = 1, …, m.      (1.1)

This problem is a linear, finite-dimensional problem, since the cost functional, as well as the side conditions, are written in terms of finitely many linear functionals.

Remark: Note that this problem is written as a minimization problem. It is common in practice that one is confronted by a problem in maximization. However, each such problem may be rewritten as one of minimization simply by replacing the cost functional Σ_{j=1}^n c_j x_j by (−1) Σ_{j=1}^n c_j x_j. We will, for the theoretical development, always speak in terms of minimization problems, although some of the examples are ones of maximization of an appropriate cost.

1.1 Some Examples

To make the general formulation above more concrete, we look at a pair of examples.

Example 1.1 Suppose a manufacturing concern produces four different products, which we will call A, B, C, and D. Suppose further that there are fixed amounts of three different resources, such as labor, raw materials, equipment, etc., which can be combined in certain fixed and known ways to produce the four products. In particular, it is known how much of each resource must be used to produce one unit of each product, as well as the variable cost of each resource. Moreover, we will assume that the profit made for each unit of product sold is known. To be specific, let us take the data as listed in the following table:

    Resource      A   B   C   D   Limitation on Resource
    1             5   2   3   1   300
    2             1   2   1   2   200
    3             1   0   1   0   100
    Unit Price    6   4   2   1

From this data, we wish to decide how much of each product to make in order to maximize profit. We want to write this problem mathematically as a linear programming problem. To accomplish this, we let x_j, j = 1, 2, 3, 4, represent the amounts of the products to be manufactured. These are the numbers that we wish to determine. From the last line of the table, we can write the objective functional as

    z = 6 x_1 + 4 x_2 + 2 x_3 + x_4.

Since the resources are limited, we have certain constraints, which are determined by the limitations on the resources listed in the upper lines of the table. The three constraints lead to the set of inequalities

    5 x_1 + 2 x_2 + 3 x_3 + x_4 ≤ 300
    x_1 + 2 x_2 + x_3 + 2 x_4 ≤ 200
    x_1 + x_3 ≤ 100

Likewise, we have the usual non-negativity constraints, x_j ≥ 0, j = 1, 2, 3, 4. This linear program is summarized by:

    Maximize  6 x_1 + 4 x_2 + 2 x_3 + x_4

under the side conditions x ∈ R^4 and x_i ≥ 0, i = 1, 2, 3, 4, together with

    5 x_1 + 2 x_2 + 3 x_3 + x_4 ≤ 300
    x_1 + 2 x_2 + x_3 + 2 x_4 ≤ 200
    x_1 + x_3 ≤ 100                                                             (1.2)
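In practice a problem of this size is handed to a numerical routine rather than solved by hand. As a minimal sketch of how that looks (assuming SciPy is available; its routine linprog minimizes, so we negate the profit vector of Example 1.1, and the non-negativity bounds shown are in fact linprog's default):

```python
import numpy as np
from scipy.optimize import linprog

# Data of Example 1.1: maximize 6x1 + 4x2 + 2x3 + x4 subject to the
# three resource constraints and x >= 0.
c = np.array([6.0, 4.0, 2.0, 1.0])
A_ub = np.array([[5.0, 2.0, 3.0, 1.0],
                 [1.0, 2.0, 1.0, 2.0],
                 [1.0, 0.0, 1.0, 0.0]])
b_ub = np.array([300.0, 200.0, 100.0])

res = linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 4)
print(res.x, -res.fun)   # optimal production plan and the maximal profit
```

The vector res.x is the production plan, and −res.fun the corresponding maximal profit.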
This last example is one with four unknowns. If there are only two, it is possible to use graphical methods to see the optimal solution. By doing so, we gain insight into the general case.

Example 1.2 A bank has $100 million dedicated to investment. A portion will be invested in loans ($L million) and a portion in securities ($S million). Loans yield high returns (for example 10%) but have the drawback that they are long-term investments. On the other hand, securities yield, for example, 5% earnings. On a yearly basis, the bank receives 0.1 L + 0.05 S in returns on these investments. This amount is to be maximized subject to the obvious constraints.

1. Implicit side conditions: L ≥ 0, S ≥ 0.

2. Limited money supply: L + S ≤ 100.

3. Liquidity constraints: it is desired (or required by law) that the bank hold at least 25% of the available investment monies in liquid form, i.e., S ≥ 0.25 (L + S), or L − 3 S ≤ 0.

4. Minimal investments: certain companies expect sizable loans, e.g., L ≥ 30.

A pair (L, S) which satisfies the constraints (1)-(4) is called an admissible investment. The managers of the bank wish to find, among all admissible investments, the one which maximizes the income from investments, 0.1 L + 0.05 S. In other words, we have the linear programming problem:

    Maximize  0.1 L + 0.05 S

under the side conditions L ≥ 0, S ≥ 0, together with

    L + S ≤ 100
    L − 3 S ≤ 0
    L ≥ 30                                                                      (1.3)

We can solve this problem graphically as follows:

[Figure: the (L, S)-plane showing the lines L + S = 100 and L − 3 S = 0; the admissible investments form the shaded triangle, and the point P lies at the intersection of the two lines.]

The shaded region of the triangle represents the admissible investments. The question is: which of the points in the triangle are optimal? The dotted lines in the figure represent the lines (1/10) L + (1/20) S = constant, i.e., the lines 2 L + S = c for different constants c > 0. The line with slope −2 is apparently optimal if it passes through P. Then P = (L*, S*) is optimal! The point P is characterized by the system of equations

    L* + S* = 100,   L* − 3 S* = 0,

and therefore L* = 75, S* = 25, with value $8.75 million.

Remark: It is important to notice that the optimal choice occurs at a corner of the polygonal domain of admissible investment pairs. Indeed, this is true in general for such problems. It may well be that there are other optimal solutions, as we will see in the next example.

Example 1.3 Let us consider the linear programming problem:

    Maximize  2 x_1 + 0.5 x_2

under the side conditions x_1 ≥ 0, x_2 ≥ 0, together with

    4 x_1 + 5 x_2 ≤ 30
    4 x_1 + x_2 ≤ 12                                                            (1.4)

In the diagram below, the feasible region lies in the bounded polygonal region, and the dotted lines are level lines of the cost functional.

[Figure: the feasible region bounded by the lines 4 x_1 + x_2 = 12 and 4 x_1 + 5 x_2 = 30, with level lines z = 2, 4, 6, 8 of the cost functional; the optimal points P and (3, 0) lie on the line 4 x_1 + x_2 = 12.]

Here we see that the point P as well as the point (3, 0) are optimal points, as are all points on the line segment joining these points. In fact, all these points represent optimal solutions for the problem. An optimal solution (in fact two of them) lies at a corner point of the feasible region. Notice that there are no interior points which are optimal.

The interesting thing is that we can, from these examples, see what the general picture is, regardless of the number of unknowns or the number of constraints that are involved. Explicitly:

1. The intersection of the quadrant boundaries and the constraint boundaries generates a convex polygon. This is also true in the n-dimensional case, where the constraints are geometrically represented as hyperplanes and the domain is an n-dimensional polygonal body.

2. A solution to the linear programming problem is at a "corner" or extreme point of the polygon and not in the interior of the polygon of feasible solutions. We say that the common boundaries which meet at the corner are the set of "active" constraints. This is also true in the n-dimensional case with a polyhedron, as the sketch following this list illustrates for Example 1.3.
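For a problem as small as Example 1.3 the geometric picture can be reproduced directly in code: intersect each pair of constraint boundaries, keep the feasible intersection points, and compare the cost there. A rough sketch (assuming NumPy; the tolerances are arbitrary choices for this illustration):

```python
import itertools
import numpy as np

# Constraints of Example 1.3 written uniformly as a.x <= b
# (the sign conditions become -x1 <= 0 and -x2 <= 0).
A = np.array([[4.0, 5.0], [4.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([30.0, 12.0, 0.0, 0.0])
c = np.array([2.0, 0.5])   # the functional to maximize

for i, j in itertools.combinations(range(len(A)), 2):
    M = A[[i, j]]
    if abs(np.linalg.det(M)) < 1e-12:
        continue                         # parallel boundaries: no corner here
    x = np.linalg.solve(M, b[[i, j]])
    if np.all(A @ x <= b + 1e-9):        # keep only the feasible corners
        print(x, c @ x)
# Both (1.875, 4.5) and (3, 0) attain the maximal value 6.
```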
Indeed, to find an optimal solution, one need only search through the extreme points of the set of feasible solutions. Since, in higher dimensions, it is not possible to use geometric methods to arrive at a solution, we will need an algebraic formulation of an algorithm that searches these points. That algorithm is, of course, the Simplex Algorithm.

1.2 Formulation in Terms of Matrices

We start with some basic notation that we will use throughout these notes. We will always write vectors a, b, c, x, y, z ∈ R^n as column vectors. An (m × n)-matrix will be denoted by a capital roman letter, as, for example, A = (a_{ij})_{i=1,…,m; j=1,…,n} ∈ R^{m×n}. The transposed matrix A^⊤ is defined by A^⊤ = (a_{ji})_{j=1,…,n; i=1,…,m} ∈ R^{n×m}. We will also interpret x ∈ R^n as an (n × 1)-matrix, in which case x^⊤ is exactly the row vector in R^{1×n}.

We recall that there are different possibilities for multiplying vectors, which are special cases of the rule for matrix multiplication: AB ∈ R^{m×p} for A ∈ R^{m×n} and B ∈ R^{n×p}. Specifically, the scalar product or dot product is x^⊤y = Σ_{j=1}^n x_j y_j for x = (x_j), y = (y_j) ∈ R^n.

For x, y ∈ R^n we introduce the partial order: x ≤ y ⇐⇒ x_j ≤ y_j for all j = 1, …, n.

With these conventions, we can write the problem (1.1) as:

    Minimize  c^⊤x

under the side conditions

    x ≥ 0  and  Ax = b,                                                         (1.5)

where c ∈ R^n, A ∈ R^{m×n}, b ∈ R^m are given and x ∈ R^n is to be found.

Example 1.4 Let us look at an example which is not in the standard form and see how we might introduce auxiliary variables to derive an equivalent problem in standard form.

    Maximize  x_1 + 2 x_2 + 3 x_3

under the side conditions x_1 ≥ 0, x_3 ≤ 0, together with

    x_1 − 2 x_2 + x_3 ≤ 4
    x_1 + 3 x_2 ≥ 5
    x_1 + x_3 = 10                                                              (1.6)

This problem is first replaced with a minimization problem whose inequality constraints all point in the standard direction:

    Minimize  −x_1 − 2 x_2 − 3 x_3

under the side conditions x_1 ≥ 0, −x_3 ≥ 0, together with

    x_1 − 2 x_2 + x_3 ≤ 4
    −x_1 − 3 x_2 ≤ −5
    x_1 + x_3 = 10                                                              (1.7)

We then introduce the slack variables x_4 ≥ 0 and x_5 ≥ 0 and rewrite the inequality constraints as equality constraints:

    Minimize  −x_1 − 2 x_2 − 3 x_3

under the side conditions x_1 ≥ 0, −x_3 ≥ 0, x_4 ≥ 0, x_5 ≥ 0, together with

    x_1 − 2 x_2 + x_3 + x_4 = 4
    −x_1 − 3 x_2 + x_5 = −5
    x_1 + x_3 = 10                                                              (1.8)

Now, since x_2 is not subject to inequality constraints¹, we introduce auxiliary variables u ≥ 0 and v ≥ 0 and set x_2 = u − v, as well as x̂_3 = −x_3. Substitution of these forms of x_2 and x_3 leads to the new system

    Minimize  −x_1 − 2 u + 2 v + 3 x̂_3

under the side conditions x_1 ≥ 0, u ≥ 0, v ≥ 0, x̂_3 ≥ 0, x_4 ≥ 0, x_5 ≥ 0, together with

    x_1 − 2 u + 2 v − x̂_3 + x_4 = 4
    −x_1 − 3 u + 3 v + x_5 = −5
    x_1 − x̂_3 = 10                                                             (1.9)

¹ Such variables are often called free variables, which should not be confused with the use of the term free variable in Gaussian elimination.

Finally, we can rename all the variables. Whichever way we do that, to recover the solution of the original problem, we need to keep track of the renaming. For the purposes of illustration, we make the following assignments:

    x_1 → x_1,   u → x_2,   v → x_3,                                            (1.10)
    x̂_3 → x_4,  x_4 → x_5,  x_5 → x_6                                          (1.11)

so that the problem can be written as

    Minimize  −x_1 − 2 x_2 + 2 x_3 + 3 x_4

under the side conditions x_1 ≥ 0, x_2 ≥ 0, x_3 ≥ 0, x_4 ≥ 0, x_5 ≥ 0, x_6 ≥ 0, together with

    x_1 − 2 x_2 + 2 x_3 − x_4 + x_5 = 4
    −x_1 − 3 x_2 + 3 x_3 + x_6 = −5
    x_1 − x_4 = 10                                                              (1.12)

The matrix form is then: minimize c^⊤x under the side conditions x ≥ 0, together with Ax = b, where

    c = (−1, −2, 2, 3, 0, 0)^⊤,   b := (4, −5, 10)^⊤,   x := (x_1, …, x_6)^⊤,

    A = (  1  −2  2  −1  1  0 )
        ( −1  −3  3   0  0  1 )                                                 (1.13)
        (  1   0  0  −1  0  0 )
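As a quick sanity check on the conversion, one can verify numerically that a point feasible for (1.6) maps to a feasible point of the standard form (1.13), with the negated cost. A small sketch (assuming NumPy; the test point (10, 3, 0) is our own choice, not taken from the notes):

```python
import numpy as np

# Matrix data of (1.13), as derived in Example 1.4.
c = np.array([-1., -2., 2., 3., 0., 0.])
A = np.array([[ 1., -2., 2., -1., 1., 0.],
              [-1., -3., 3.,  0., 0., 1.],
              [ 1.,  0., 0., -1., 0., 0.]])
b = np.array([4., -5., 10.])

# A point feasible for the original problem: (x1, x2, x3) = (10, 3, 0).
# Under x2 = u - v and x_hat3 = -x3, together with the two slacks,
# the renamed standard-form vector becomes:
x = np.array([10., 3., 0., 0., 0., 14.])

print(np.allclose(A @ x, b), np.all(x >= 0))   # True True: x is feasible
print(-(c @ x))   # 16 = x1 + 2 x2 + 3 x3, the original objective value
```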
Notation: In the following we will always take M to be the set of feasible points, i.e.,

    M = {x ∈ R^n : x ≥ 0, Ax = b}.

It is important to note that the set M ⊂ R^n is a convex set, a fact which follows from the linearity of matrix multiplication. We can then introduce the following definition:

Definition 1.5 Let (P) stand for the linear programming problem in standard form. The vector x* ∈ R^n is called a solution of the problem (P) provided

(i) x* is feasible, i.e., x* ∈ M, and

(ii) x* is optimal, i.e., c^⊤x* ≤ c^⊤x for all x ∈ M.

In addition, it is useful to introduce two further terms:

Definition 1.6 A basic feasible solution of the linear programming problem is a feasible solution with no more than m positive components x_j. Note that the positive components of x correspond to linearly independent columns of the matrix A.

Definition 1.7 A non-degenerate basic feasible solution is a basic feasible solution with exactly m positive components x_j. A degenerate basic feasible solution has fewer than m positive components.

2 The Simplex Method for Linear Programming

The Simplex Method is designed for problems written in the standard form, i.e.,

    (P)  Minimize f(x) := c^⊤x on M = {x ∈ R^n : Ax = b, x ≥ 0}.

Here c ∈ R^n, A ∈ R^{m×n} and b ∈ R^m with m ≤ n are given. If M is bounded then, being also closed, M is compact, and there exists a solution of the optimization problem. Weaker assumptions that guarantee the existence of an optimal solution will be made later. In any case, no solution can exist if c^⊤x is unbounded below on M, i.e., if inf_{x∈M} c^⊤x = −∞.

Remarks:

(i) Every linear optimization problem can be rewritten in the form (P) so that the new problem is equivalent to the old. (To do so, one introduces so-called "slack variables" and substitutes u_j − v_j, with u_j ≥ 0, v_j ≥ 0, for the unconstrained variables x_j.)

(ii) If rank A < m, then some of the equations are redundant, and by dropping those which are, the matrix can be replaced by one of full rank. From a theoretical viewpoint, then, the requirement that rank A = m is no real restriction.

2.1 The Gauss-Jordan Method

The Gauss-Jordan method is used for the solution of the equation Ax = b where, again, A ∈ R^{m×n} and b ∈ R^m are given and we are to find x ∈ R^n. It may well be that m ≤ n. By no means do we need to require that rank A = m, and we also ignore the requirement that x ≥ 0. For Ax = b we introduce a shorthand form, which we will call the Gauss-Jordan tableau:

    a_{11}  a_{12}  ···  a_{1n} | b_1
      ⋮       ⋮             ⋮   |  ⋮
    a_{m1}  a_{m2}  ···  a_{mn} | b_m

The Gauss-Jordan method consists of:

(i) Choice of pivot: one can take any non-zero element lying to the left of the b-column.

(ii) "Empty" the pivot column (both above and below the pivot element) and normalize the pivot element to 1.

(iii) Choose a new pivot element from the rows below all the rows that already contain a pivot element. If all rows either already contain a pivot element or consist only of zeros, then STOP.

(iv) Return to step (ii).
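These four steps translate almost verbatim into code. A minimal sketch (assuming NumPy and floating-point arithmetic; the pivot chosen in each row is simply its first non-zero entry, which is one admissible choice under rule (i)). Example 2.1, treated by hand next, serves as the test case:

```python
import numpy as np

def gauss_jordan(A, b):
    """Reduce the tableau [A | b] following steps (i)-(iv) above."""
    T = np.hstack([A.astype(float), b.astype(float).reshape(-1, 1)])
    m, n = A.shape
    for row in range(m):                    # (iii): work down the rows
        nonzero = np.flatnonzero(np.abs(T[row, :n]) > 1e-12)
        if nonzero.size == 0:
            continue                        # row of zeros: nothing to pivot on
        col = nonzero[0]                    # (i): a non-zero entry left of b
        T[row] /= T[row, col]               # (ii): normalize the pivot to 1 ...
        for r in range(m):                  # ... and empty the pivot column
            if r != row:
                T[r] -= T[r, col] * T[row]
    return T

# Example 2.1:
A = np.array([[1., 1., 1., -1.],
              [4., 5., 5., 2.],
              [1., -2., 0., -5.]])
b = np.array([1., 0., 5.])
print(gauss_jordan(A, b))
# Final tableau: rows (1,0,0,-7|5), (0,1,0,-1|0), (0,0,1,7|-4).
```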
Example 2.1 Find the general solution of the linear system

    x_1 + x_2 + x_3 − x_4 = 1
    4 x_1 + 5 x_2 + 5 x_3 + 2 x_4 = 0
    x_1 − 2 x_2 − 5 x_4 = 5

Using the steps listed above, we obtain the following sequence of tableaus. Starting from

    1   1   1  −1 | 1
    4   5   5   2 | 0
    1  −2   0  −5 | 5

we pivot on the entry 1 in the first row and first column, emptying the first column (subtract 4 times the first row from the second, and the first row from the third):

    1   1   1  −1 | 1
    0   1   1   6 | −4
    0  −3  −1  −4 | 4

Next we pivot on the entry 1 in the second row and second column (subtract the second row from the first, and add 3 times the second row to the third):

    1   0   0  −7 | 5
    0   1   1   6 | −4
    0   0   2  14 | −8

Finally we pivot on the entry 2 in the third row and third column (divide the third row by 2 and subtract the result from the second):

    1   0   0  −7 | 5
    0   1   0  −1 | 0
    0   0   1   7 | −4

We are now finished. Written out completely, the system of linear equations is:

    x_1 − 7 x_4 = 5
    x_2 − x_4 = 0
    x_3 + 7 x_4 = −4

From this system, we can write down the general solution of the system. We take the variable x_4 as the parameter. (We have three equations in four unknowns, so we expect that there will be one degree of freedom in describing the general solution!) We refer to x_4 as the "free variable," in contrast to the dependent variables x_1, x_2 and x_3. With this choice,

    x = (5 + 7t, t, −4 − 7t, t)^⊤ ∈ R^4,  t ∈ R,

is the general solution, and dim ker A = 1 and rank A = 3.² Generally, r = rank A is the number of pivot elements and n − r is the number of free variables.

² Recall that ker A := {x ∈ R^n | Ax = 0}.

Of some interest is the special solution that one obtains when all the free variables are set equal to 0. In our example, this special solution is x = (5, 0, −4, 0)^⊤. How do we obtain this solution "mechanically"? We simply set the variables x_j which belong to the columns that do not contain a pivot element equal to 0. The other variables then appear in the right-hand column b, suitably permuted.

Although we are now finished with the linear system, we can nevertheless continue to look for further "special" solutions. For example, pivoting on the entry 7 in the x_4-column gives

    1   0   1    0 | 1
    0   1   1/7  0 | −4/7
    0   0   1/7  1 | −4/7

So we obtain as a special solution x_3 = 0 and x_1 = 1, x_2 = −4/7, x_4 = −4/7. The Simplex Algorithm (in Phase II) searches among these so-called basic solutions.
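These basic solutions can also be enumerated by brute force: pick a set of columns of A forming a non-singular square matrix, solve for the corresponding variables, and set the rest to zero. A sketch for Example 2.1 (assuming NumPy; the number of column choices grows combinatorially, so this is only sensible for tiny systems):

```python
import itertools
import numpy as np

A = np.array([[1., 1., 1., -1.],
              [4., 5., 5., 2.],
              [1., -2., 0., -5.]])
b = np.array([1., 0., 5.])
m, n = A.shape   # here rank A = m = 3

# A basic solution: choose m independent columns of A, solve for those
# variables, and set the remaining (free) variables to zero.
for cols in itertools.combinations(range(n), m):
    B = A[:, cols]
    if abs(np.linalg.det(B)) < 1e-12:
        continue                      # dependent columns: no basic solution
    x = np.zeros(n)
    x[list(cols)] = np.linalg.solve(B, b)
    print(cols, x)
# Among the output: (5, 0, -4, 0) and (1, -4/7, 0, -4/7), the two special
# solutions found above.
```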
2.2 The Idea of the Simplex Method through a Special Case

Before discussing the general method, we want to take a special case and work through the details. We chose the following particular example.

Example 2.2

    Maximize  20 x_1 + 60 x_2

under the side conditions

    x_1 + x_2 ≤ 120
    5 x_1 + 10 x_2 ≤ 700
    2 x_1 + 10 x_2 ≤ 520
    x_1 ≥ 0,  x_2 ≥ 0.

The next figure represents the feasible region.

[Figure 1: The feasible region, a polygon with vertices (0, 0), (120, 0), (100, 20), (60, 40) and (0, 52).]

We first transform this problem into the normal form through the introduction of slack variables x_3, x_4, x_5. Moreover, we rewrite the maximization problem as a minimization problem. Hence, we wish to

    Minimize  −20 x_1 − 60 x_2,   x ∈ R^5,  x ≥ 0,

under the side conditions

    ( 1   1  1  0  0 )       ( 120 )
    ( 5  10  0  1  0 ) x  =  ( 700 )
    ( 2  10  0  0  1 )       ( 520 )

We see now that the system Ax = b is already in the form that we would obtain by application of the Gauss-Jordan method. Indeed, rank A = 3. The variables x_1 and x_2 are the free variables, while x_3, x_4, x_5 are dependent. The corresponding basic solution is ẑ_1 = (0, 0, 120, 700, 520)^⊤ ∈ R^5. We append the equation c^⊤x = γ to the Gauss-Jordan tableau (γ is a parameter) in the following way:

    −20  −60  0  0  0 | γ
      1    1  1  0  0 | 120
      5   10  0  1  0 | 700
      2   10  0  0  1 | 520                                                     (2.14)

This new tableau is called the Simplex Tableau. The basic solution ẑ_1 = (0, 0, 120, 700, 520)^⊤ is a solution of the top equation corresponding to the choice γ = 0 (since c^⊤ẑ_1 = 0 in this case). Moreover, this basic solution ẑ_1 is admissible, since b ≥ 0! We now introduce a different tableau in which we choose the entry 10 in the last row of (2.14) as pivot element:

    −8    0  0  0   6    | γ + 3120
    4/5   0  1  0  −1/10 | 68
     3    0  0  1  −1    | 180
    1/5   1  0  0   1/10 | 52                                                   (2.15)

Now x_1, x_5 are the free variables and x_2, x_3, x_4 the dependent ones. The corresponding basic solution is ẑ_2 = (0, 52, 68, 180, 0)^⊤. We have had some good luck here, since we also see that ẑ_2 ≥ 0. Thus ẑ_2 solves the complete system (2.15) with γ + 3120 = ĉ^⊤ẑ_2 = 0, and therefore γ = −3120. Having made nothing more than equivalent transformations, ẑ_2 is also a solution of (2.14) with γ = −3120. And for this value, the cost functional takes a smaller value at ẑ_2 than at ẑ_1, and hence ẑ_2 is "better"!

Remarks:

(A) What happens, in the last example, when one takes, say, a_{12} as pivot element rather than a_{32}? Then we obtain

    40  0   60  0  0 | γ + 60 · 120
     1  1    1  0  0 | 120
    −5  0  −10  1  0 | −500
    −8  0  −10  0  1 | −680

with the basic solution (0, 120, 0, −500, −680)^⊤. This solution is not admissible, for some of the components are negative. The pivot step in the simplex algorithm apparently is valid only in the case that the pivot element a_{rs} is positive and b_r/a_{rs} is minimal.

(B) When do we get something > γ in the upper right entry of the tableau? Obviously, this occurs exactly when c_s < 0 and b_r/a_{rs} > 0. This tells us that if all the entries of the first row to the left of the last entry are non-negative, then we cannot obtain a better solution by performing further steps of the Simplex Algorithm.

We continue the example above and consider (2.15): we must look for the pivot element in the first column, since the first row of the tableau contains a negative element only in that column. The minimal ratio occurs in the row with the entry 3 (180/3 = 60), and pivoting there gives

    0  0  0   8/3   10/3 | γ + 3600
    0  0  1  −4/15   1/6 | 20
    1  0  0   1/3   −1/3 | 60
    0  1  0  −1/15   1/6 | 40

The corresponding basic solution is x* = (60, 40, 20, 0, 0)^⊤ ∈ M. This vector also satisfies ĉ^⊤x* = 0, where ĉ is the "current cost vector," i.e., the first row in the current tableau; hence γ + 3600 = 0, i.e., γ = −3600. Hence x* also solves the initial system (2.14) for γ = −3600. Now we are finished, since the current cost vector is non-negative. Each further simplex step would only increase the value of γ. We can now ignore the slack variables x_3, x_4, x_5, and we have found an optimal vector (60, 40)^⊤ ∈ R^2 as the solution of the initial problem.

Remark: It is important to note that the basic solutions produced in the various steps, namely ẑ_1, ẑ_2 and x*, have first and second components which correspond to (actually neighboring) vertices of the feasible region!

We should now have some insight into the importance of the simplex tableau as a means of organizing the computations necessary to solve the linear programming problem. Indeed, its importance lies in the fact that it collects, in a particularly efficient manner, all the information necessary for carrying out the algorithm itself. In particular:

(a) the basic solution can be read off directly: the basic variables correspond to columns which, taken together and properly permuted, form an identity matrix. Since the remaining variables are set equal to zero, we have x_{j_1} = b_{j_1}, x_{j_2} = b_{j_2}, …, x_{j_m} = b_{j_m}, and we have b_{j_1} ≥ 0, …, b_{j_m} ≥ 0 if the basis is feasible;

(b) the value of the objective function is obtained by solving the equation γ + K = 0 from the upper right-hand corner of the tableau. Indeed, the first row reads, in effect, ĉ^⊤z = 0 = γ + K;

(c) the reduced costs of the non-basic variables are obtained by reading directly the first row of the simplex tableau. They allow us, in particular, to see at once whether the current basis is optimal. This is the case when all entries of the first row corresponding to the non-basic variables are non-negative.
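All of the bookkeeping in a single simplex step is one elementary row operation on the tableau. A minimal sketch of such a pivot step (assuming NumPy; the function pivot is our own illustration, not yet the full algorithm):

```python
import numpy as np

def pivot(T, r, s):
    """One simplex pivot on tableau T (row 0 = cost row) at entry T[r, s]."""
    T = T.astype(float).copy()
    T[r] /= T[r, s]                  # normalize the pivot element to 1
    for k in range(T.shape[0]):      # empty the pivot column everywhere else,
        if k != r:                   # including the cost row
            T[k] -= T[k, s] * T[r]
    return T

# Tableau (2.14); the last column holds (gamma-term | b), with gamma = 0.
T = np.array([[-20., -60., 0., 0., 0.,   0.],
              [  1.,   1., 1., 0., 0., 120.],
              [  5.,  10., 0., 1., 0., 700.],
              [  2.,  10., 0., 0., 1., 520.]])
print(pivot(T, 3, 1))
# Reproduces (2.15): cost row (-8, 0, 0, 0, 6 | 3120), i.e. gamma + 3120.
```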
We now wish to try to understand the observation made above that the solutions produced by the various steps correspond to the vertices of the feasible region. Let us begin with the following theorem.

Theorem 2.3 Consider the linear program: min c^⊤x subject to Ax = b and x ≥ 0. Then:

1. If there is any feasible solution, then there is a basic feasible solution.

2. If there is any optimal solution, then there is a basic optimal solution.

Proof: Suppose that a feasible solution exists. Choose any feasible solution x among those with the fewest non-zero components. If there are no non-zero components, then x = 0 and x is a basic solution by definition. Otherwise, take the index set J := {j_1, j_2, …, j_r} with elements corresponding to those x_{j_i} > 0. Then the matrix A_J := col(a^{(j_i)}) must be non-singular. Indeed, were A_J singular, then its columns {a^{(j_1)}, a^{(j_2)}, …, a^{(j_r)}} would be a linearly dependent set of vectors, and hence, for some choice of scalars α_i, not all zero,

    α_1 a^{(j_1)} + α_2 a^{(j_2)} + … + α_r a^{(j_r)} = 0.                      (2.16)

Without loss of generality, we may take α_1 ≠ 0 and, indeed, α_1 > 0 (otherwise multiply (2.16) by −1).

Now, since x_{j_i} > 0, the vector b is just a linear combination of the columns a^{(j_i)}: indeed,

    Ax = Σ_{i=1}^r x_{j_i} a^{(j_i)} = b.

Multiplying the dependence relation (2.16) by a real number λ and subtracting, we have

    Σ_{i=1}^r (x_{j_i} − λ α_i) a^{(j_i)} = b.

Now, if λ is taken to be sufficiently small, we still have a feasible solution, with components (x_{j_i} − λ α_i) ≥ 0. Indeed, we can ensure this inequality holds by taking λ < x_{j_i}/α_i for those i ∈ {1, 2, …, r} for which α_i > 0. If we take λ sufficiently large, however, we can arrange that the component (x_{j_1} − λ α_1) < 0, and this will be the case if and only if λ > x_{j_1}/α_1, since α_1 > 0. On the other hand, if α_i ≤ 0, then (x_{j_i} − λ α_i) ≥ 0 for all λ ≥ 0. We see now that we may choose a number λ̃ by

    λ̃ := min { x_{j_i}/α_i : x_{j_i} > 0, α_i > 0 }.                            (2.17)

If the minimal quotient occurs for i = k, then x_{j_k} − λ̃ α_k = 0 (why?), and so we would have found a new feasible solution with fewer non-zero components than x, contradicting the choice of x. Hence the a^{(j_i)} must be linearly independent. It follows that the matrix A_J is non-singular, and so x is feasible and basic. Hence, if we have a feasible solution, then there exists a basic feasible solution.

Now assume that x* is an optimal solution. There is no guarantee that this optimal solution is unique; in fact, in many cases there is no uniqueness. Again, some of these solutions may have more positive components than others. Without loss of generality, we assume that x* has a minimal number of positive components. If x* = 0, then x* is basic and the cost is zero. If x* ≠ 0 and J is the corresponding index set, then we wish to show that the matrix A_J is non-singular or, equivalently, that the columns of A_J are linearly independent. If the {a^{(j_i)}} are, on the contrary, linearly dependent, then there exist coefficients α_i, not all zero, such that Σ_{i=1}^r α_i a^{(j_i)} = 0. As before, we may assume that α_1 > 0 (otherwise multiply the dependence relation by −1). We claim that

    Σ_{i=1}^r α_i c_{j_i} = 0.                                                  (2.18)

Since x* is feasible, we have Ax* = b, so that Σ_{i=1}^r x*_{j_i} a^{(j_i)} = b. Now look at the equation Σ_{i=1}^r (x*_{j_i} − λ α_i) a^{(j_i)} = b. The condition x*_{j_i} > 0 implies that x*_{j_i} − λ α_i ≥ 0 for sufficiently small |λ|. Hence

    Σ_{i=1}^r c_{j_i}(x*_{j_i} − λ α_i) = Σ_{i=1}^r c_{j_i} x*_{j_i} − λ Σ_{i=1}^r c_{j_i} α_i = c^⊤x* − λ c^⊤α,

so that if (2.18) were false, then we could decrease the cost by letting λ be some small positive or negative number, and hence x* would not be optimal.

Now, set γ_λ := Σ_{i=1}^r c_{j_i}(x*_{j_i} − λ α_i). Then γ_0 = c^⊤x*. If we let λ increase from zero, then, as long as the components x*_{j_i} − λ α_i remain non-negative for all i = 1, …, r, the vector x_λ with these components remains feasible and is optimal, with the same cost as x*, in light of the relation (2.18). Since α_1 > 0, we see that for some λ at least one component will be negative; in particular, this happens for λ > x*_{j_1}/α_1. Again, set λ̃ as in (2.17). If the minimum occurs for i = k, then x*_{j_k} − λ̃ α_k = 0, and so we have produced a new optimal feasible solution with fewer positive components than x*, which contradicts the choice of the original optimal solution. We conclude that the matrix A_J must be non-singular, and so the optimal solution is basic. □
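The first half of the proof is constructive, and it is instructive to watch it run: while the columns supporting a feasible point are dependent, step along a dependence relation (2.16) with the step size λ̃ of (2.17) until a component hits zero. A rough sketch (assuming NumPy; the 2 × 4 system and the starting point are our own toy data, and no care is taken with degenerate edge cases):

```python
import numpy as np

def reduce_to_basic(A, b, x, tol=1e-10):
    """Follow the proof of Theorem 2.3: repeatedly zero out a component of the
    feasible point x until its support columns are linearly independent."""
    x = x.astype(float).copy()
    while True:
        J = np.flatnonzero(x > tol)
        if J.size == 0 or np.linalg.matrix_rank(A[:, J]) == J.size:
            return x                          # support columns independent
        _, _, Vt = np.linalg.svd(A[:, J])
        alpha = Vt[-1]                        # a dependence among the columns
        if alpha[np.argmax(np.abs(alpha))] < 0:
            alpha = -alpha                    # ensure some alpha_i > 0
        pos = alpha > tol
        lam = np.min(x[J][pos] / alpha[pos])  # lambda-tilde of (2.17)
        x[J] -= lam * alpha                   # still feasible, one more zero

# Toy data: a feasible x with four positive components, while m = 2.
A = np.array([[1., 1., 1., 0.],
              [0., 1., 0., 1.]])
b = np.array([2., 1.])
x = np.array([0.5, 0.5, 1.0, 0.5])
xb = reduce_to_basic(A, b, x)
print(xb, np.allclose(A @ xb, b))   # a basic feasible solution of A x = b
```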
3 The Simplex Algorithm: Description

The Simplex Method or Simplex Algorithm can be described in terms of two phases, namely:

Phase I: We construct an admissible basic solution (ẑ, Ĵ) of Ax = b, a vector ĉ ∈ R^n and a representation M = {x ∈ R^n : Âx = b̂, x ≥ 0}, so that M has the following properties:

(a) if Ĵ = {j_1, …, j_m}, then â^{(j_k)} = e_k for k = 1, …, m, where e_k is the k-th unit vector in R^m;

(b) ĉ_j = 0 for all j ∈ Ĵ (and hence ĉ^⊤ẑ = 0), and b̂ ≥ 0;

(c) ĉ^⊤x + f(ẑ) = c^⊤x for all x with Ax = b.

Phase II: We start with the assumption that we have found a basic solution (ẑ, Ĵ) of Ax = b, a vector ĉ ∈ R^n and a representation of M, so that (a), (b), and (c) are satisfied. We then seek another admissible basic solution (z̃, J̃) of Ax = b, a vector c̃ and Ã, b̃ with the properties (a), (b), (c), for which f(z̃) < f(ẑ). Once a new basic solution is found, we then replace ẑ with z̃ and the other quantities in a similar manner. We continue this process to construct a sequence of feasible basic solutions {ẑ^k} and hope that it converges to an optimal solution of the problem, or that the algorithm stops at an optimal solution, or that it gives us the information that inf(P) = −∞.

We illustrate this process with an example.

Example 3.1 Consider the problem:

    Maximize  3 x_1 + x_2 + 3 x_3

subject to

    2 x_1 + x_2 + x_3 ≤ 2
    x_1 + 2 x_2 + 3 x_3 ≤ 5
    2 x_1 + 2 x_2 + x_3 ≤ 6
    x_1 ≥ 0,  x_2 ≥ 0,  x_3 ≥ 0.

In order to put this problem into a standard form so that the simplex procedure can be applied, we change the maximization problem to minimization by multiplying the objective function by −1, and we introduce three non-negative slack variables x_4, x_5, x_6. We then have the initial tableau (in this example the cost row is written as the last row of the tableau):

    INITIAL TABLEAU
     x_1  x_2  x_3  x_4  x_5  x_6 |  b
      2    1    1    1    0    0  |  2    ← eq'n 1
      1    2    3    0    1    0  |  5    ← eq'n 2
      2    2    1    0    0    1  |  6    ← eq'n 3
     −3   −1   −3    0    0    0  |  0    ← (−cost fct'n)

(The cost-row entries in columns x_1, x_2, x_3 are all negative.)

Note: this problem is in canonical form, with the three slack variables as basic variables.

Simplex Method, Phase II. The steps of Phase II may be summarized as follows; a code sketch follows the list. Given (P̂) with an admissible basic solution (ẑ, Ĵ) satisfying the conditions (a), (b), (c):

1. Set γ̂ := f(ẑ) = c^⊤ẑ.

2. If ĉ_j ≥ 0 for all j ∉ Ĵ, then ẑ is a solution of (P) and γ̂ = inf(P); STOP.

3. Otherwise, choose s ∈ {1, …, n} \ Ĵ with ĉ_s < 0, e.g. ĉ_s := min_{j∉Ĵ} ĉ_j.

4. If â_{*s} ≤ 0 (the entire s-th column is non-positive), then (P) has no solution, inf(P) = −∞; STOP.

5. Otherwise, determine r ∈ {1, …, m} with b̂_r/â_{rs} = min{ b̂_i/â_{is} : i ∈ {1, …, m}, â_{is} > 0 }.

6. New basis indices: j̃_k := j_k for k ≠ r and j̃_r := s, i.e. J̃ = {j_1, …, j_{r−1}, s, j_{r+1}, …, j_m}.

7. New Ã, b̃, c̃ and γ̃:

    (ã_{r1}, …, ã_{rn} | b̃_r) = (1/â_{rs}) (â_{r1}, …, â_{rn} | b̂_r),
    (ã_{k1}, …, ã_{kn} | b̃_k) = (â_{k1}, …, â_{kn} | b̂_k) − â_{ks} (ã_{r1}, …, ã_{rn} | b̃_r),  k ≠ r,
    (c̃_1, …, c̃_n | −γ̃) = (ĉ_1, …, ĉ_n | −γ̂) − ĉ_s (ã_{r1}, …, ã_{rn} | b̃_r).

8. New admissible basic solution: z̃_{j̃_k} := b̃_k for k = 1, …, m, and z̃_j := 0 for j ∉ J̃.

9. Replace the hatted quantities by the new ones and return to step 2.
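The steps above translate into a compact tableau iteration. A sketch of Phase II (assuming NumPy; the cost row is kept as the last row, matching the tableaus of Example 3.1, and no anti-cycling safeguard is included, so this is an illustration rather than a robust implementation):

```python
import numpy as np

def simplex_phase2(T, basis):
    """Phase II on a tableau whose last row is (reduced costs | -cost);
    `basis` lists the current basic columns."""
    m = T.shape[0] - 1                       # number of constraint rows
    while True:
        cost = T[-1, :-1]
        if np.all(cost >= -1e-9):            # step 2: no negative reduced cost
            x = np.zeros(T.shape[1] - 1)
            x[basis] = T[:m, -1]
            return x, T[-1, -1]              # optimal point and corner entry
        s = int(np.argmin(cost))             # step 3: most negative c_s
        if np.all(T[:m, s] <= 1e-9):         # step 4: inf(P) = -infinity
            raise ValueError("problem is unbounded")
        ratios = [T[i, -1] / T[i, s] if T[i, s] > 1e-9 else np.inf
                  for i in range(m)]
        r = int(np.argmin(ratios))           # step 5: minimal-ratio row
        T[r] /= T[r, s]                      # steps 6-7: pivot
        for k in range(m + 1):
            if k != r:
                T[k] -= T[k, s] * T[r]
        basis[r] = s

# Initial tableau of Example 3.1, slack basis {x4, x5, x6}:
T = np.array([[ 2., 1., 1., 1., 0., 0., 2.],
              [ 1., 2., 3., 0., 1., 0., 5.],
              [ 2., 2., 1., 0., 0., 1., 6.],
              [-3., -1., -3., 0., 0., 0., 0.]])
print(simplex_phase2(T, [3, 4, 5]))
# -> x = (0.2, 0, 1.6, 0, 0, 4) with corner entry 27/5, matching the
#    hand computation that follows (via a different pivot order).
```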
Rule for Pivots: Compute the ratios b_i/x_{i,j} for all positive elements x_{i,j} of the candidate pivoting columns, and find the element in each such column that corresponds to the minimal ratio. Pivoting on that element maintains feasibility and decreases cost. (Remember the non-degeneracy assumption!) Look again at the given array; the entries singled out below correspond to the possible pivot elements chosen by that rule.

    FIRST TABLEAU
     x_1  x_2  x_3  x_4  x_5  x_6 |  b
      2    1    1    1    0    0  |  2    ← eq'n 1
      1    2    3    0    1    0  |  5    ← eq'n 2
      2    2    1    0    0    1  |  6    ← eq'n 3
     −3   −1   −3    0    0    0  |  0    ← (−cost fct'n)

If we actually carry out the divisions, the ratios for the three candidate columns become

              x_1   x_2   x_3
    eq'n 1:    1     2     2
    eq'n 2:    5    5/2   5/3
    eq'n 3:    3     3     6

so the possible pivot elements are the 2 in column x_1 (row 1), the 1 in column x_2 (row 1), and the 3 in column x_3 (row 2). We chose to pivot on the 1 in the second column (because of ease of hand computation) and obtain the following tableau. Note that the last row is computed as r_4 + r_1 (clearing the entry −1 of the pivot column in the cost row), which means that its entries are

    (−3 + 2), (−1 + 1), (−3 + 1), (0 + 1), (0 + 0), (0 + 0), (0 + 2),

so that the second tableau is

    SECOND TABLEAU
      2    1    1    1    0    0  |  2
     −3    0    1   −2    1    0  |  1
     −2    0   −1   −2    0    1  |  2
     −1    0   −2    1    0    0  |  2    ← (−cost fct'n, decreased to −2)

Notice that the first and third columns contain negative entries for the coefficients of the cost function, and so represent appropriate columns for pivoting. Using the 1 in the third column and second row as pivot element, we arrive at the third tableau:

    THIRD TABLEAU
      5    1    0    3   −1    0  |  1
     −3    0    1   −2    1    0  |  1
     −5    0    0   −4    1    1  |  3
     −7    0    0   −3    2    0  |  4    ← (−cost fct'n, decreased to −4)

Since there are still negative elements in the last row, we pivot again, this time choosing the (1,1) entry as pivot element:

    FINAL TABLEAU
      1   1/5   0   3/5  −1/5   0  |  1/5
      0   3/5   1  −1/5   2/5   0  |  8/5
      0    1    0   −1     0    1  |  4
      0   7/5   0   6/5   3/5   0  |  27/5  ← (all entries non-negative; −cost decreased to −27/5)

Looking at the last tableau, we can rewrite the linear system as

    x_1 + (1/5) x_2 + (3/5) x_4 − (1/5) x_5 = 1/5
    (3/5) x_2 + x_3 − (1/5) x_4 + (2/5) x_5 = 8/5
    x_2 − x_4 + x_6 = 4

which yields the basic solution (setting the variables x_2 = x_4 = x_5 = 0)

    x_B = (1/5, 0, 8/5, 0, 0, 4)^⊤,

which is the optimal solution.
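It is worth confirming the hand computation against the original data of Example 3.1: the (x_1, x_2, x_3) part of x_B must satisfy the three inequalities, with slacks equal to the (x_4, x_5, x_6) part. A quick check (assuming NumPy):

```python
import numpy as np

# Original data of Example 3.1 (before slacks were introduced):
A_ub = np.array([[2., 1., 1.],
                 [1., 2., 3.],
                 [2., 2., 1.]])
b_ub = np.array([2., 5., 6.])
c = np.array([3., 1., 3.])

x = np.array([1/5, 0., 8/5])      # the (x1, x2, x3) part of x_B
print(b_ub - A_ub @ x)            # slacks (0, 0, 4) = the (x4, x5, x6) part
print(c @ x)                      # 27/5, the corner entry of the final tableau
```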
4 Extreme Points and Basic Solutions

In linear programming, the feasible region in R^n is defined by

    P := {x ∈ R^n | Ax = b, x ≥ 0}.

The set P, as we have seen, is a convex subset of R^n. It is called a convex polytope. The term convex polyhedron refers to a convex polytope which is bounded. Polytopes in two dimensions are often called polygons.

Recall that the vertices of a convex polytope are defined as the extreme points of that set. Extreme points of a convex set are those which cannot be represented as a proper convex combination of two other (distinct) points of the convex set. It may, or may not, be the case that a convex set has any extreme points, as shown by the example in R^2 of the strip

    S := {(x, y) ∈ R^2 | 0 ≤ x ≤ 1, y ∈ R}.

On the other hand, the square defined by the inequalities |x| ≤ 1, |y| ≤ 1 has exactly four extreme points, while the unit disk described by the inequality x^2 + y^2 ≤ 1 has infinitely many. These examples raise the question of finding conditions under which a convex set has extreme points. For general vector spaces the question is answered by one of the "big theorems," the Krein-Milman Theorem. However, as we will see presently, our study of the linear programming problem actually answers this question for convex polytopes without needing to call on that major result.

The algebraic characterization of the vertices of the feasible polytope confirms the observation that we made by following the steps of the Simplex Algorithm in our introductory example. Some of the techniques used in proving the preceding theorem come into play in making this characterization, as we will now discover.

Theorem 4.1 The set of extreme points, E, of the feasible region P is exactly the set, B, of all basic feasible solutions of the linear programming problem.

Proof: We wish to show that E = B, so, as usual, we break the proof into two parts.

Part (a): B ⊂ E. Suppose that x^{(b)} ∈ B, and let the index set J(x^{(b)}) ⊂ {1, 2, …, n} be defined by: j ∈ J(x^{(b)}) if and only if x^{(b)}_j > 0. Now suppose that x^{(b)} ∉ E. Then there exist two distinct feasible points y, z ∈ P and a λ ∈ (0, 1) for which x^{(b)} = (1 − λ) y + λ z. Observe that for any index k ∉ J(x^{(b)}), it must be true that (1 − λ) y_k + λ z_k = 0. Since 0 < λ < 1 and y, z ≥ 0, this implies that y_k = z_k = 0 for all such indices k. Now x^{(b)} is basic, so the columns of A corresponding to the non-zero components form a linearly independent set in R^m, and the only non-zero components of y and z have the same indices as the non-zero components of x^{(b)}. Moreover, since y and z are feasible, we have Ay = b and Az = b, so that b = (1 − λ) b + λ b = (1 − λ) Ay + λ Az. Now the system Ax = b is uniquely solvable on the set {x ∈ R^n | x_i = 0, i ∉ J(x^{(b)})}, so we must have y = z = x^{(b)}, a contradiction. Hence x^{(b)} cannot be written as a proper convex combination of two other distinct points of P, which means that x^{(b)} is an extreme point of P.

Part (b): E ⊂ B. If x^{(e)} is an extreme point of P, then it has a minimal number of non-zero components among feasible solutions. Indeed, if the number of non-zero components were not minimal, then there would be a feasible solution y, i.e., y ≥ 0, Ay = b, with fewer non-zero components, and we may, without loss of generality, assume that y has a minimal number of such components. Let J(y) be the index set for y, and let k ∈ J(y) be an index for which

    x^{(e)}_k / y_k = min { x^{(e)}_i / y_i : i ∈ J(y) } =: λ̃ > 0.

Then x^{(e)} − λ̃ y ≥ 0, and

    A(x^{(e)} − λ̃ y) = (1 − λ̃) b.                                              (4.19)

We now consider two cases:

1. If λ̃ ≥ 1, write λ̃ = 1 + δ with δ ≥ 0. Then the equation (4.19) can be rewritten

    (1 − λ̃) b = −δ b = A(x^{(e)} − y − δ y) = A(x^{(e)} − y) − δ Ay = A(x^{(e)} − y) − δ b,

so that A(x^{(e)} − y) = 0, and hence Ay = b. Since x^{(e)} ≠ y (y has fewer non-zero components than x^{(e)}), the vector z := x^{(e)} − y is non-zero, and z ≥ 0 because x^{(e)} ≥ λ̃ y ≥ y. Hence we can write x^{(e)} = y + z as

    x^{(e)} = (1/2)(y + (2/3) z) + (1/2)(y + (4/3) z) =: (1/2) y_1 + (1/2) y_2,

with y_1 ≠ y_2 and y_1, y_2 ≠ x^{(e)}. This being the case, we have A y_1 = A(y + (2/3) z) = Ay = b, and likewise A y_2 = Ay = b, with y_1, y_2 ≥ 0. So x^{(e)} is a proper convex combination of two distinct points of P and is therefore not an extreme point of P, a contradiction.

2. If λ̃ < 1, then the vector z := (x^{(e)} − λ̃ y)/(1 − λ̃) satisfies Az = b and z ≥ 0, i.e., z ∈ P. Furthermore, x^{(e)} ≠ z (again since y has fewer non-zero components than x^{(e)}), so that x^{(e)} = (1 − λ̃) z + λ̃ y exhibits x^{(e)} as a proper convex combination of distinct points of P; therefore x^{(e)} cannot be an extreme point of P, again a contradiction.

From these two cases, we see that x^{(e)} must have a minimal number of non-zero components.

It remains to show that the number of non-zero components of x^{(e)} is at most m. Indeed, if there were more than m such components, then the corresponding columns of A would be linearly dependent. Then there would exist a vector y ≠ 0 with J(y) ⊂ J(x^{(e)}) and Ay = 0, and consequently also y ≠ x^{(e)}. By choosing

    λ := min_{i : y_i < 0} ( −x^{(e)}_i / y_i ),   or   λ := −min_{i : y_i > 0} ( x^{(e)}_i / y_i )  if y ≥ 0,

the vector x^{(e)} + λ y would be a feasible solution, i.e., x^{(e)} + λ y ≥ 0 with A(x^{(e)} + λ y) = b, and with J(x^{(e)} + λ y) ⊊ J(x^{(e)}). This contradicts the minimality just established. Hence J(x^{(e)}) can contain at most m indices, and x^{(e)} is a basic feasible solution. □
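Theorem 4.1 can be watched in action on Example 1.3: put that problem into standard form with slack variables, enumerate the basic feasible solutions, and observe that their (x_1, x_2) parts are precisely the corners found graphically. A sketch (assuming NumPy):

```python
import itertools
import numpy as np

# Example 1.3 in standard form: A x = b, x >= 0, with slacks x3, x4.
A = np.array([[4., 5., 1., 0.],
              [4., 1., 0., 1.]])
b = np.array([30., 12.])
m, n = A.shape

for cols in itertools.combinations(range(n), m):
    B = A[:, cols]
    if abs(np.linalg.det(B)) < 1e-12:
        continue
    xB = np.linalg.solve(B, b)
    if np.all(xB >= -1e-9):                 # feasible basic solutions only
        x = np.zeros(n)
        x[list(cols)] = xB
        print(x[:2])
# Prints (1.875, 4.5), (3, 0), (0, 6), (0, 0): exactly the vertices of the
# polygon of Example 1.3, as Theorem 4.1 predicts.
```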
As a simple corollary of this last theorem, we can see that there are only finitely many vertices that must be checked by the Simplex Algorithm.

Corollary 4.2 The number of vertices of P is at most

    C(n, m) = n! / (m! (n − m)!).

Proof: C(n, m) is the number of choices of m columns out of n, so the largest possible number of bases is C(n, m), not all of which may be feasible. □

The final piece of the puzzle is stated in the final result of this subsection, which shows that if there is an optimal solution, it occurs at an extreme point of P.

Corollary 4.3 The optimum of the linear form c^⊤x on a convex polyhedron P is attained at at least one vertex of P. If it is attained at more than one vertex, then it is attained at every point which is a convex combination of those vertices.

Proof: Let x^{(e_i)}, i = 1, 2, …, k, be the extreme points of P. Set v* = min_{i=1,…,k} c^⊤x^{(e_i)}. We show that v* is the minimum value of the cost on P. Since P is a convex polyhedron (bounded), every x ∈ P can be written as a convex combination of the x^{(e_i)}, so

    x = Σ_{i=1}^k λ_i x^{(e_i)},  with λ_i ≥ 0,  Σ_{i=1}^k λ_i = 1,

and then c^⊤x = Σ_{i=1}^k λ_i c^⊤x^{(e_i)}, by the linearity of the dot product in R^n. Hence

    c^⊤x ≥ v* Σ_{i=1}^k λ_i = v*.

Therefore v* is the minimum of c^⊤x on P, and it is attained at at least one vertex. The second part of the corollary follows directly from the linearity of the cost function. □
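Both corollaries are easy to confirm numerically on Example 2.2: of the C(5, 3) = 10 candidate bases, only some yield feasible basic solutions, one for each vertex of Figure 1, and the optimum was attained at one of those vertices. A short counting sketch (assuming NumPy):

```python
import itertools
import numpy as np

# Example 2.2 in standard form: m = 3, n = 5, so at most C(5,3) = 10 bases.
A = np.array([[1.,  1., 1., 0., 0.],
              [5., 10., 0., 1., 0.],
              [2., 10., 0., 0., 1.]])
b = np.array([120., 700., 520.])

feasible = 0
for cols in itertools.combinations(range(5), 3):
    B = A[:, cols]
    if abs(np.linalg.det(B)) > 1e-12 and np.all(np.linalg.solve(B, b) >= 0):
        feasible += 1
print(feasible)   # 5: one feasible basis per vertex of Figure 1, out of 10
```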