Last revised 10:03 p.m. March 3, 2016

Polytopes and the simplex method

Bill Casselman
University of British Columbia
cass@math.ubc.ca

A region in Euclidean space is called convex if the line segment between any two points in the region is also in the region. For example, disks, cubes, and the empty set are convex, but circles, doughnuts, and most coffee cups are not. As I'll recall later in detail, any convex region is the intersection of all affine half-spaces f ≥ 0 containing it. This note is mostly concerned with convex polytopes, which are the intersections of a finite number of such half-spaces. Thus cubes are polytopes but spheres are not. The simplest example, and the model in some sense for all others, is the coordinate octant x_i ≥ 0. Another simple example is the simplex x_i ≥ 0, x_1 + ··· + x_n ≤ 1.

If Σ is any set of points in R^n, its convex hull is the set of all points which can be represented as a finite sum c_1P_1 + ··· + c_kP_k with all c_i ≥ 0, c_1 + ··· + c_k = 1, and all P_i in Σ. For example, the line segment between two points P and Q is the convex hull of those two points, and the convex hull of a circle is a disk. A convex hull is clearly convex.

An extremal point of a convex region is a point that does not lie in the interior of a line segment in the region. In some sense dual to the result that any convex region is the intersection of all half-spaces containing it, any convex region is the convex hull of its extremal points. As was proved in [Weyl:1935], the convex hull of any finite set of points and rays is a polytope—i.e. it can also be specified in terms of a finite set of affine inequalities. I'll recall Weyl's proof of this in the first part. Conversely, any convex polytope is the convex hull of a finite set of points and rays. Defining a convex polytope by affine inequalities and defining it in terms of a convex hull are dual modes of specification.
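Since this note works throughout with exact rational arithmetic, the definition of a convex combination just given can be sketched concretely. This is a minimal illustration in Python (the function name is my own), forming c_1P_1 + ··· + c_kP_k after checking the conditions on the coefficients:

```python
from fractions import Fraction as F

def convex_combination(points, coeffs):
    """Return the sum of c_i * P_i, checking c_i >= 0 and sum(c_i) == 1."""
    assert all(c >= 0 for c in coeffs) and sum(coeffs) == 1
    dim = len(points[0])
    return tuple(sum(c * p[i] for c, p in zip(coeffs, points)) for i in range(dim))

# the midpoint of the segment between P = (0, 0) and Q = (2, 4)
midpoint = convex_combination([(0, 0), (2, 4)], [F(1, 2), F(1, 2)])
```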
Thus we arrive immediately at the most basic computational problem concerning convex polytopes: given a polytope specified in one of these ways, to find a specification in the other. But there are several other things one might ask. A face of a polytope is its intersection with a supporting hyperplane. A wall (also often called a facet) is a proper face of maximal dimension. Here are some natural questions about a polytope that one might want to answer:

• Is the region bounded?
• If it is bounded:
  ◦ What are its extremal points (i.e. its vertices)?
  ◦ What is its volume?
  ◦ How many lattice points does it contain?
• What is its dimension (taken to be −1 if the region is empty)?
• What is a minimal set of affine inequalities defining it?
• What is the maximum value of a linear function on the polytope (which might be infinite)?
• How to describe the face where a linear function takes its maximum value?
• What is its facial structure?
• What is the dual cone of linear functions bounded on the polytope?
• How to find the intersection of two convex sets?
• How to find the union of two convex sets?
• Given two convex sets, is one contained in the other?
• Given a point exterior to the region, what is the point of the region nearest to it?

Answering many of these questions reduces to answering a few basic ones. For example, to find out whether one region is contained in the other, it suffices to know the characterization of one in terms of affine inequalities and of the other in terms of extremal points. Finding the union of two convex regions is straightforward if the regions are given as convex hulls, while finding their intersection is easy if they are specified by affine inequalities. These problems are well known to arise in applications of mathematics.
But in fact they also arise in trying to understand the geometry of configurations in high dimensions, which comes up surprisingly often even in the most theoretical mathematical investigations. For one thing, acquiring a picture of some kind of objects in high dimensions, or navigating regions of high-dimensional space, often amounts to understanding the structure of convex regions. Answering these questions can be daunting, since the complexity of the problems increases drastically as the dimension increases. There are now many algorithms known to answer such questions, but for most practical purposes the one of choice is still the simplex method. This is known to require time exponential in the number of inequalities for certain somewhat artificial problems. But on average, as is shown in a series of papers beginning perhaps with [Smale:1983] and including most spectacularly [Spielman-Teng:2004], it works well. It is also more flexible than its competitors.

Contents

I. Geometry
1. Convex bodies
2. Proof of the basic theorem

II. Basic procedures
3. Introduction
4. Affine coordinate transformations
5. Gauss elimination
6. An example
7. Vertex frames
8. Pivoting
9. Tracking coordinate changes
10. Finding a vertex basis to start

III. Applications
11. Finding maximum values
12. Finding all vertices of a polygon
13. Finding the dimension of a convex region
14. Implementation

IV. References

My main reference has been [Chvátal:1983], although my notation is more geometric than his. Because I am almost exclusively interested in theoretical questions, I implement my routines with exact rational arithmetic, which eliminates a lot of technical difficulties in dealing with floating point numbers. The audience for expositions of linear programming is not, by and large, mathematicians, and the standard notation in linear programming is, for most mathematicians, rather bizarre. I follow more geometric conventions.
As far as I know, convex bodies were first systematically examined by Minkowski, who used them in spectacular ways in his geometry of numbers. [Weyl:1927] is a pleasant account of the basic facts, and I follow his exposition in the first part.

Part I. Geometry

1. Convex bodies

The following is relatively straightforward:

1.1. Proposition. Any closed convex set in R^n is the intersection of all half-spaces containing it.

For example, the unit disk in the plane is the intersection of all half-planes x cos θ + y sin θ ≤ 1.

Proof. Let Ω be a closed convex set, P an arbitrary point not in Ω. It is to be proved that there exists a hyperplane f(x) = 0 such that f(P) < 0, f(x) ≥ 0 for all x in Ω. Since Ω is closed, there exists a point Q in Ω at minimal distance from P. I may as well take Q to be the origin of the coordinate system. I claim that the hyperplane passing through Q and perpendicular to the line through P and Q will do. The equation of the corresponding half-space that contains P is f(x) = x • P ≥ 0. It is to be shown that if R lies in Ω then R • P ≤ 0. If R lies in Ω then so does tR for all small t ≥ 0, since Ω is convex and contains Q = 0. Suppose R • P > 0. We have

  ‖P − tR‖^2 = ‖P‖^2 − 2t (P • R) + t^2 (R • R).

This implies that tR is closer to P than Q for small t > 0, a contradiction.

Proposition 1.1 can also be interpreted in terms of duality. There are three slightly different notions of duality that are useful in this business. A cone in R^n is a non-empty region that is invariant under multiplication by non-negative scalars. If Ω is a closed convex subset of R^n, its dual Ω∨ is the set of all affine functions f such that f(Ω) ≥ 0. It is a closed cone in the space of affine functions, which has dimension n + 1. It is convex, since f((1 − t)x + ty) = (1 − t)f(x) + tf(y) for any affine function f. I'll call Ω∨ simply the dual of Ω. The space of affine functions is not just a vector space, but one with a particular coordinate, taking f to its constant term.
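The key step of the proof, finding the nearest point Q and the separating functional, is easy to make concrete. The following sketch in Python with exact rationals (names are mine, and the sign convention is flipped from the proof: the functional is positive at P and ≤ 0 on Ω) projects P onto a segment, the simplest closed convex set:

```python
from fractions import Fraction as F

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def nearest_on_segment(A, B, P):
    """Nearest point to P on the segment from A to B, exact rational arithmetic."""
    AB = tuple(b - a for a, b in zip(A, B))
    AP = tuple(p - a for a, p in zip(A, P))
    t = F(dot(AP, AB)) / dot(AB, AB)       # foot of the perpendicular from P
    t = max(F(0), min(F(1), t))            # clamp to the segment
    return tuple(a + t * ab for a, ab in zip(A, AB))

# separating functional f(x) = (x - Q).(P - Q): positive at P, <= 0 on the segment
A, B, P = (F(0), F(2)), (F(2), F(0)), (F(0), F(0))
Q = nearest_on_segment(A, B, P)
f = lambda x: dot([xi - qi for xi, qi in zip(x, Q)],
                  [pi - qi for pi, qi in zip(P, Q)])
assert f(P) > 0 and f(A) <= 0 and f(B) <= 0
```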
If A is a closed cone in the vector space of affine functions, I'll define its affine dual A∧ to be the subset of all x in the original vector space such that f(x) ≥ 0 for all f in A. It is also convex and closed. Any x in Ω is of course a point in the affine dual of Ω∨. The following is just a rewording of Proposition 1.1.

1.2. Corollary. If Ω is a closed convex subset of R^n, then the embedding of Ω into the affine dual of Ω∨ is a bijection.

1.3. Corollary. If X and Y are two closed convex subsets of R^n then X = Y if and only if X∨ = Y∨.

Finally, if Ω is a closed convex cone in R^n, its dual cone Ω̂ is the set of all linear functions f such that f(Ω) ≥ 0. It is again a closed convex cone. The cone Ω embeds canonically into the linear dual of Ω̂. Another corollary of the Theorem:

1.4. Corollary. If Ω is a closed convex cone in R^n, then the embedding of Ω into the linear dual of Ω̂ is a bijection.

1.5. Corollary. If X and Y are two closed convex cones in R^n then X = Y if and only if X̂ = Ŷ.

If Ω is any subset of R^n, its affine support is the smallest affine subspace containing it. I'll call the dimension of that subspace the affine dimension of Ω. If the affine dimension of a region is n, then its affine support is all of R^n and it must contain a simplex with n + 1 vertices. This happens if and only if its interior is non-empty.

It may happen that a convex region Ω is invariant under translation by vectors in some linear subspace. Define Inv(Ω) to be the linear space of all such vectors. If Ω is the non-negative x-axis in R^2, its dual Ω̂ contains the line of all scalar multiples of y. This illustrates a general phenomenon:

1.6. Proposition. The affine dimension of Ω is m ≤ n if and only if Inv(Ω∨) has dimension n − m.

In the rest of this note I'll be concerned with a limited class of convex sets, those which are the intersections of a finite number of half-spaces.
I shall call these convex polytopes. What I'll be mainly interested in are algorithms to deal with convex polytopes, but I'll begin with a few basic theorems.

In the theory of convex polytopes there is one really basic theorem, from which many others follow. Suppose X to be a finite set containing n linearly independent points. A linear half-space f ≥ 0 will be called extremal (for X) if its boundary contains n − 1 linearly independent points of X. If X is finite, there can be only a finite number of extremal half-spaces.

The homogeneous convex hull of X ⊆ R^n is the set of all finite sums Σ c_x x with x in X and all c_x ≥ 0, the convex homogeneous cone spanned by X. For example, if X is a collection of n linearly independent points its homogeneous convex hull is a simplicial cone, and every subset of n − 1 points determines an extremal half-space. For homogeneous convex polytopes, we have the following refinement of Proposition 1.1:

1.7. Theorem. If X is a finite set of points in R^n containing n linearly independent points, then its homogeneous convex hull is the intersection of all linear half-spaces extremal for X. In particular, it is the intersection of a finite number of linear half-spaces.

I'll postpone the proof to the next section, but I'll exhibit here a number of consequences.

1.8. Corollary. Suppose that the finite set X in R^n contains n linearly independent points. Let F = {f_i} be the set of extremal linear functions for X. If f is a linear function, then f ≥ 0 on X if and only if f = Σ c_i f_i with c_i ≥ 0.

Proof. This follows from Corollary 1.5, since if Ω is the homogeneous convex hull of X and C is the homogeneous convex hull of F, then according to the Theorem C and Ω̂ both have dual cones equal to Ω.

1.9. Corollary. Any homogeneous convex polytope Ω in R^n is the convex hull of a finite number of rays.

Proof.
By Corollary 1.8, this is true if Ω is the dual of a homogeneous cone whose interior is not empty, which is the case if and only if Inv(Ω) = {0}. But it is straightforward to reduce the general case to this one.

1.10. Corollary. Any closed convex polytope in R^n is the convex hull of a finite number of points and rays.

Proof. I begin with a construction that will be useful elsewhere as well. The space R^n may be embedded into R^(n+1) as an affine subspace: x ↦ (x, 1). Rays in R^(n+1) that intersect x_(n+1) = 1 correspond to points in R^n, while rays in R^(n+1) lying in the hyperplane x_(n+1) = 0 correspond to rays in R^n. If X is a closed convex subset of R^n, let X̄ be the homogeneous cone generated by the image of X. It is the homogenization of X. The Corollary now follows from Corollary 1.9 applied to X̄.

2. Proof of the basic theorem

It may very well happen that the homogeneous convex hull of a set is all of R^n, in which case the only linear function f with f(X) ≥ 0 is the constant 0. In this case the set of extremal half-spaces is empty. In general, the only problem is to show that there are enough extremal half-spaces to determine the hull.

Proof. Let Ω be the homogeneous convex hull of the set X. We are assuming it to contain an open subset of R^n. One half of the Theorem is trivial, since Ω is certainly contained in the intersection of all extremal half-spaces. It remains to be shown that if a point lies in the intersection of the extremal half-spaces then it lies in Ω.

The proof proceeds by induction on n. If n = 0 there is nothing to be proven. If n = 1 then the points of X span a non-trivial segment. If this segment contains 0 in its interior, then the homogeneous convex hull is all of R, and there is no linear function cx taking non-negative values on all of X except c = 0 itself. Otherwise, the homogeneous convex hull is a half-line with 0 on its boundary, and again the result is straightforward.
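The homogenization construction is easy to make concrete. Here is a hedged sketch in Python with exact rationals (function names are mine): a point of R^n maps to the ray through (x, 1), and a ray in R^(n+1) is read back as a point or a direction according to its last coordinate.

```python
from fractions import Fraction as F

def homogenize(point):
    """Embed x in R^n as a representative of the ray through (x, 1) in R^(n+1)."""
    return tuple(point) + (F(1),)

def dehomogenize(v):
    """A ray meeting x_{n+1} = 1 gives back a point of R^n;
    one lying in x_{n+1} = 0 gives a ray (direction) in R^n."""
    *x, last = v
    if last == 0:
        return ('ray', tuple(x))
    return ('point', tuple(xi / last for xi in x))

# the ray through (3, 6, 3) meets x_3 = 1 at the point (1, 2)
kind, value = dehomogenize((F(3), F(6), F(3)))
```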
Assume the Theorem to be true in dimension up to n − 1, and suppose X to have affine dimension n. There are two parts to the argument, which is constructive. The first either produces an extremal half-space or demonstrates that the homogeneous convex hull of X is all of R^n. The second proves the Theorem when it is known that extremal half-spaces exist.

Part 1. Suppose α = 0 to be any hyperplane containing n − 1 linearly independent points of X. Either all points of X lie on one side of this hyperplane or they do not. If they do, then this hyperplane is extremal, and we can pass on to Part 2.

Suppose they do not. Let e be a point of X with ⟨α, e⟩ < 0. We may as well assume α to be the coordinate function x_n, and e = (0, 0, ..., −1). Let Y be the projection onto the plane x_n = 0 of the set X+ of points of X with x_n ≥ 0. The new system Y is non-degenerate of dimension n − 1, by the assumption on α. Either it possesses an extremal half-space ⟨β, x⟩ ≥ 0 with ⟨β, x⟩ = b_1x_1 + ··· + b_(n−1)x_(n−1) or it does not. If it does, the half-space ⟨β, x⟩ + x_n ≥ 0 is extremal and contains, in addition to the points in X+, the point e.

In the second case, when Y does not possess an extremal half-space, we can apply the induction assumption. Every point in the hyperplane ⟨α, x⟩ = 0 must be in the convex hull of Y. Adding positive multiples of e, we see that every point where x_n ≤ 0 is in the convex hull. Since we have eliminated the case where all points of X lie on one side of α = 0, we can find a point of X with x_n > 0; a similar argument implies that every point where x_n ≥ 0 is also there.

Part 2. At this point we know there exist extremal half-spaces containing Ω. Let Δ ≠ ∅ be the set of linear functions α defining the extremal half-spaces α ≥ 0. Choose a point e in the interior of the homogeneous convex hull of X = {x_i}. Thus ⟨α, e⟩ > 0 for every α in Δ. Let p be an arbitrary point in the intersection of the extremal half-spaces, so that also ⟨α, p⟩ ≥ 0 for all α in Δ.
We want to show that p is in Ω, i.e. that p = Σ c_i x_i with all c_i ≥ 0. We may assume that p ≠ e.

The set of all λe + μp with λ, μ ≥ 0 is a two-dimensional wedge. Any point of the same form with λ < 0, μ ≥ 0 will be in the plane spanned by this wedge, outside this wedge on the far side of the line through p. Since ⟨α, p − λe⟩ = ⟨α, p⟩ − λ⟨α, e⟩ and ⟨α, e⟩ > 0, the ray of all q = p − λe with λ > 0 will find itself eventually in the region where α < 0. It will cross the hyperplane α = 0 when

  λ = ⟨α, p⟩ / ⟨α, e⟩.

If λ is chosen to be the least value of ⟨α, p⟩/⟨α, e⟩ as α ranges over Δ, then (1) p = q + λe, where λ ≥ 0 and e is in the homogeneous convex hull of X; (2) ⟨α, q⟩ ≥ 0 for all α in Δ; (3) ⟨α, q⟩ = 0 for some α in Δ. In order to prove that p is in the homogeneous convex hull of X, it suffices to show that q is in it.

We want to apply the induction hypothesis to q. Let Y be the set of all x in X on the hyperplane H_α where α = 0, which contains q. Because α is an extremal support, there exist at least n − 1 linearly independent points in Y, so that it is non-degenerate in n − 1 dimensions. Choose coordinates so that α = x_n. By the induction assumption, the convex hull of Y is the intersection of all its extremal half-spaces in H_α. I claim:

2.1. Lemma. Any extremal hyperplane for Y = X ∩ H_α in H_α can be extended to an extremal hyperplane for X in R^n.

Proof of the Lemma. Suppose ⟨β, x⟩ = b_1x_1 + ··· + b_(n−1)x_(n−1) ≥ 0 to be one of the extremal half-spaces for Y in H_α. The function β will vanish on a subset Υ ⊆ Y of essential dimension n − 2. Any extension of β to X will be of the form ⟨B, x⟩ = b_1x_1 + ··· + b_(n−1)x_(n−1) + cx_n for some constant c. I claim that c can be chosen so that B ≥ 0 on X and ⟨B, x⟩ = 0 for at least one point x in X − Y. This means no more and no less than that it defines an extremal half-space for X.
The conditions on B are that ⟨B, x⟩ = b_1x_1 + ··· + b_(n−1)x_(n−1) + cx_n ≥ 0 for all x in X − Y, and that ⟨B, x⟩ = 0 for at least one of these x. These require that

  −c ≤ (b_1x_1 + ··· + b_(n−1)x_(n−1)) / x_n

for all x in X − Y, with equality in at least one case. But X is finite and x_n > 0 on X − Y, so the right-hand side achieves a minimum as x varies over X − Y; taking −c to be that minimum gives equality in at least one case.

Part II. Basic procedures

3. Introduction

In the remainder of this note I'll explain how to implement the basic theorems about convex polytopes in practical terms. There are basically three phases to dealing with a polytope defined initially by inequalities. (1) We are given the matrix of inequalities defined by affine functions, and we find a coordinate system adapted to the inequalities in the sense that (a) some of the coordinates lie among the affine functions expressing these inequalities and (b) the other inequalities may be expressed in terms of those. (2) From this, we find a coordinate system which satisfies in addition the condition that its origin is a vertex of the polytope. This should be thought of as giving us a handle (almost literally) on the polytope. (3) We use this coordinate system to explore the polytope, in any one of a dozen ways. We could start traversing a subset of vertices, for example, basically running along its edges. Which vertices we examine depends on what our purpose is. If we want to find the entire facial structure, we must traverse the entire set. If we want to find the maximum of a linear function, just a restricted subset. If we just want to know the dimension of the polytope, we can poke around in the tangent cone at one vertex.

If we are given a finite set of inequalities such as

  x ≥ 1
  y + 2 ≥ 0
  y ≤ x − 1,

we can rearrange them so as to be in a particular format:

  x − 1 ≥ 0
  y + 2 ≥ 0
  x − y − 1 ≥ 0.

I.e.
from now on, I shall assume that we are given a finite set of affine functions f_i of the form

  f(x) = a_0x_0 + ··· + a_(n−1)x_(n−1) + a_n

defining the region Ω in which all f_i ≥ 0. We can even take an equation f(x) = 0 into account, replacing it by the inequalities f(x) ≥ 0 and −f(x) ≥ 0. I should say immediately that I shall work only with functions whose coefficients are rational. Suppose that we start with m functions of n variables in this system, expressing the inequalities.

The first step in almost all computations with these data is to find a coordinate system in which the coordinate functions are among the f_i. Therefore we begin by looking for a minimal subset of the f_i in terms of which all the others may be affinely expressed. For example, in the system above the first two form an affine basis, since x − y − 1 = (x − 1) − (y + 2) + 2. However, this might not be possible, because the f_i might not be a large enough set. For example, you might have just a single inequality f ≥ c in three dimensions, with f linear. This phenomenon is easy to deal with—we shall find a maximal subset of the f_i whose linear components are linearly independent. This problem is related to another phenomenon—the region we are examining might be invariant with respect to a large space of translations. For example, the region f(x) ≥ c is invariant under translation by vectors in the plane f = 0.

A coordinate system in which (1) some of the f_i are among the coordinates and (2) all the others may be expressed in terms of those I shall call one adapted to the set of inequalities. It constitutes an affine frame of the system. Given one, we replace the given set of inequalities by those in which all are expressed in terms of the given basis. Setting X = x − 1, Y = y + 2, the system above thus becomes the simpler

  X ≥ 0
  Y ≥ 0
  X − Y + 2 ≥ 0.

The choice of the first two functions as basis is not canonical—any two of the three given will also serve.
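In this representation each f_i is just a row of n + 1 rational coefficients, with the constant term last. A minimal sketch of evaluating such rows and testing membership in Ω, in Python with exact rationals (function names mine), using the small system above:

```python
from fractions import Fraction as F

def evaluate(row, x):
    """Evaluate f(x) = a_0 x_0 + ... + a_{n-1} x_{n-1} + a_n, row = [a_0, ..., a_n]."""
    *a, const = row
    return sum(ai * xi for ai, xi in zip(a, x)) + const

def in_region(rows, x):
    """x lies in the region Omega exactly when every f_i(x) >= 0."""
    return all(evaluate(r, x) >= 0 for r in rows)

# the system x - 1 >= 0, y + 2 >= 0, x - y - 1 >= 0
rows = [[F(1), F(0), F(-1)],
        [F(0), F(1), F(2)],
        [F(1), F(-1), F(-1)]]
assert in_region(rows, (F(3), F(1)))
assert not in_region(rows, (F(0), F(0)))
```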
As in this example, in general there might be several choices for such a coordinate change. In any case, finding an adapted coordinate system amounts to finding an affine transformation that makes the original system somewhat simpler. In the next section I examine the effect of a simple affine coordinate transformation on a system of inequalities.

4. Affine coordinate transformations

An affine coordinate change from (x_i) to (y_i) is one of the form

  [ x ]   [ A  a ] [ y ]
  [ 1 ] = [ 0  1 ] [ 1 ].

Write affine functions as row vectors, so

  f(x) = [ ∇f  c ] [ x ]
                   [ 1 ],

in which ∇f is the linear part of f, and the vector representing it is made up of the coordinates of f in this coordinate system. Then

  f(x) = [ ∇f  c ] [ A  a ] [ y ]
                   [ 0  1 ] [ 1 ],

so that its expression in y coordinates is

  [ ∇f  c ] [ A  a ]
            [ 0  1 ].

In our case, we are going to be making very special coordinate changes by a process known as pivoting. We are given a coordinate system (x_i) and a set of affine functions f_j, and we are going to swap some f_ℓ for a variable x_k. At the same time, we want to express all of the f_j, which are given to us in a matrix, in terms of the new variables. This reduces to the following problem: we are given

  f_ℓ = a_(0,ℓ)x_0 + ··· + a_(n−1,ℓ)x_(n−1) + a_(n,ℓ),

in which a_(k,ℓ) ≠ 0 (so that the coordinate transformation is legitimate). We are also given the

  f_i = a_(0,i)x_0 + ··· + a_(n−1,i)x_(n−1) + a_(n,i)   (i ≠ ℓ),

each of which we wish to convert to an expression with f_ℓ replacing x_k. We get

  x_k = (1/a_(k,ℓ)) ( f_ℓ − Σ_(j≠k) a_(j,ℓ)x_j − a_(n,ℓ) )

  f_i = Σ_(j≠k) a_(j,i)x_j + (a_(k,i)/a_(k,ℓ)) ( f_ℓ − Σ_(j≠k) a_(j,ℓ)x_j − a_(n,ℓ) ) + a_(n,i)
      = Σ_(j≠k) ( a_(j,i) − (a_(j,ℓ)/a_(k,ℓ)) a_(k,i) ) x_j + (a_(k,i)/a_(k,ℓ)) f_ℓ + a_(n,i) − (a_(n,ℓ)/a_(k,ℓ)) a_(k,i).

In effect, we are

  (a) dividing the k-th column of F by a_(k,ℓ);
  (b) for each j ≠ k (including j = n), replacing the j-th column c_j of F by c_j − p c_k, with p = a_(j,ℓ)/a_(k,ℓ), where c_k is the original k-th column.

If i = ℓ, we get a single 1 in row ℓ, and in fact the column operations are designed to have exactly that effect.
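The two column operations just described can be sketched directly. This is my own compact version in Python with exact rationals (it combines steps (a) and (b) row by row, which gives the same result): pivoting a small 2 × 3 matrix on (k, ℓ) = (0, 0) swaps the first variable for the first function.

```python
from fractions import Fraction as F

def pivot(M, l, k):
    """Swap variable x_k for function f_l.  Rows of M are affine functions
    [a_0, ..., a_{n-1}, constant]; the entry M[l][k] must be nonzero."""
    p = M[l][k]
    assert p != 0
    out = []
    for row in M:
        r = row[k] / p                                        # step (a), this row
        new = [row[j] - M[l][j] * r for j in range(len(row))] # step (b), this row
        new[k] = r
        out.append(new)
    return out

M = [[F(2), F(1), F(-1)],       # 2x + y - 1
     [F(1), F(2), F(-1)]]       # x + 2y - 1
P = pivot(M, 0, 0)              # swap x for f_0 = 2x + y - 1
```

Row ℓ of the result is the unit vector with a single 1 in column k, as predicted.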
Since column operations are equivalent to multiplication by certain (n + 1) × (n + 1) matrices on the right, this is consistent with the earlier remarks on coordinate change. Explicitly, the matrix by which we finally multiply is the one obtained by performing the same column operations on the identity matrix I_(n+1). These effects accumulate. This operation is called pivoting on the pair (k, ℓ).

5. Gauss elimination

The first step in the process is finding an adapted coordinate system. To begin, let F be the m × (n + 1) matrix of the affine functions at hand (with constants in the last column). Let ∇F be the matrix whose rows are the linear components ∇f_i of the f_i (i.e. all but the last column of F), and let φ be the last column. Thus F = [ ∇F  φ ].

The next step is, essentially, to reduce ∇F to Gauss-Jordan form by applying elementary column operations, or equivalently multiplying F on the right by certain matrices effecting those operations. As we have seen in the previous section, this amounts to a linear change of coordinates. Doing this, we are replacing the original coordinates by new ones, and at the same time we are keeping track of this transformation. At the end we'll have a matrix E with ∇E in column echelon form. We shall have found a subset of rows of E with the properties: (a) their linear parts are linearly independent; (b) they are maximal satisfying (a); (c) the i-th row among them is the highest row whose linear part is independent of those of the previous i − 1.

If c is the (column) rank of the matrix, then for 1 ≤ j ≤ c we are given a row r_j corresponding to the j-th basis function f_(r_j). In row r_j there is a 1 in column j and 0 in all other columns but possibly the last. All entries e_(i,j) with i < r_j also vanish. Finally, we can arrange by column operations that the final entries in the rows r_j also become 0.

At the same time, we keep track of the operations performed by applying the same operations to an (n + 1) × (n + 1) identity matrix, which at each step will be of the form

  R = [ A  ρ ]
      [ 0  1 ].
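The whole reduction can be sketched in a few lines of Python with exact rationals. This is my own minimal version, not the author's code: it column-reduces the linear part of F, applies every column operation to an identity matrix to accumulate R, and reports which rows were chosen as the basis, so that E = F · R exactly.

```python
from fractions import Fraction as F

def column_reduce(M):
    """Column-reduce the linear parts of the affine rows of M (constant last),
    applying the same column operations to an identity matrix R.
    Returns (E, R, basis_rows) with E = M . R exactly."""
    m, w = len(M), len(M[0])                 # w = n + 1
    E = [[F(v) for v in row] for row in M]
    R = [[F(1) if i == j else F(0) for j in range(w)] for i in range(w)]
    basis, c = [], 0
    for i in range(m):
        # look for a nonzero linear entry of row i among the unused columns
        k = next((j for j in range(c, w - 1) if E[i][j] != 0), None)
        if k is None:
            continue
        for A in (E, R):                     # swap columns c and k
            for row in A:
                row[c], row[k] = row[k], row[c]
        p = E[i][c]
        for A in (E, R):                     # scale column c so E[i][c] == 1
            for row in A:
                row[c] /= p
        piv = E[i][:]                        # frozen copy of the pivot row
        for A in (E, R):                     # clear the rest of row i
            for row in A:
                for j in range(w):
                    if j != c:
                        row[j] -= piv[j] * row[c]
        basis.append(i)
        c += 1
    return E, R, basis

# five affine functions of two variables; rank 2, basis rows 0 and 1
M = [[2, 1, -1], [1, 2, -1], [1, 1, -1], [1, -1, 1], [-3, -2, 6]]
E, R, basis = column_reduce(M)
```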
Here is a final sample pattern, with r_1 = 1, r_2 = 2, r_3 = 4:

  [ 1  0  0  0  0 ]
  [ 0  1  0  0  0 ]
  [ ∗  ∗  0  0  ∗ ]
  [ 0  0  1  0  0 ]
  [ ∗  ∗  ∗  0  ∗ ].

In other words, embedded in ∇E is a copy of the c × c identity matrix I_c. Here c is what I call the nominal dimension of the system. The rows of this matrix are the rows corresponding to the subset of the f_i we have found to form part of an affine coordinate system. The coordinates of the others with respect to this basis are laid out in the other rows.

The affine transformation matrix R records the accumulated coordinate changes. Thus F_y = F_x · R, if x and y are the original and new coordinates. Columns c + 1 through n − 1 of R record a basis of vectors (in the original coordinate system) whose translations leave the inequalities invariant. The region defined by the original inequalities is the product of this space with a region in R^c defined by the matrix we get by eliminating these columns, which we do. We may assume from now on that c = n, or equivalently that the region does not contain a line.

In the end we shall be locating certain points specified in the new coordinate system. But we shall also need to know their coordinates in the original coordinates. We have

  [ x_old ]   [ A  ρ ] [ x_new ]
  [   1   ] = [ 0  1 ] [   1   ].

From now on, we'll assume that we have the system expressed in terms of an adapted affine coordinate system. This means that the region is described by coordinate inequalities y_i ≥ 0 together with more inequalities determined by affine functions expressed in terms of the y_i.

6. An example

Suppose given the system of inequalities

  2x + y ≥ 1
  x + 2y ≥ 1
  x + y ≥ 1
  x − y ≥ −1
  3x + 2y ≤ 6.

We have the following picture, where the shading along a line f = c indicates the side where f − c ≥ 0. The convex region defined by the inequalities is outlined in heavy lines.

[Figure: the five lines 2x + y = 1, x + 2y = 1, x + y = 1, x − y = −1, and 3x + 2y = 6, with labeled points (0, 1), (1, 0), (4/5, 9/5), and (5/2, −3/4).]

We start with

  F_x = [  2   1  −1 ]
        [  1   2  −1 ]
        [  1   1  −1 ]
        [  1  −1   1 ]
        [ −3  −2   6 ].

It reduces in steps. After the first:

  F = [   1     0     0  ]        R = [ 1/2  −1/2  1/2 ]
      [  1/2   3/2  −1/2 ]            [  0     1    0  ]
      [  1/2   1/2  −1/2 ]            [  0     0    1  ],
      [  1/2  −3/2   3/2 ]
      [ −3/2  −1/2   9/2 ]

and at the end:

  F = [   1     0     0  ]        R = [  2/3  −1/3  1/3 ]
      [   0     1     0  ]            [ −1/3   2/3  1/3 ]
      [  1/3   1/3  −1/3 ]            [   0     0    1  ].
      [   1    −1     1  ]
      [ −4/3  −1/3  13/3 ]

The affine basis chosen is thus the pair of functions

  2x + y − 1
  x + 2y − 1,

and expressed in this basis the system of inequalities is that corresponding to the matrix

  (6.1)  F_y = F_x · R = [   1     0     0  ]
                         [   0     1     0  ]
                         [  1/3   1/3  −1/3 ]
                         [   1    −1     1  ]
                         [ −4/3  −1/3  13/3 ].

The origin of the coordinate system with this basis is the point (1/3, 1/3) in the original coordinates, the intersection of the two lines

  x + 2y − 1 = 0
  2x + y − 1 = 0.

It is not a vertex of the region defined by the inequalities. This agrees with the sign of the constant in the third row above:

  [ 1/3  1/3  −1/3 ]

because this row means that the affine function originally x + y − 1 is now (y_1 + y_2 − 1)/3 in the new coordinates y_1 = 2x + y − 1, y_2 = x + 2y − 1, and is negative when evaluated at the new origin.

7. Vertex frames

The next step in the simplex method is to find a vertex of the region as well as a coordinate system adapted to the inequalities whose origin is this vertex. I'll call this a vertex-adapted coordinate system. Such a coordinate system is equivalent to a frame of the affine space based at the given vertex satisfying the condition that any n − 1 of the vectors in that frame span one of the spaces f_i = 0. I'll call one of these a vertex frame. This is the basic structure in the simplex method, and finding one is usually the first step in any exploration of a convex polytope. The point is that any vertex of the polytope may be reached by progressing through a succession of vertex frames, following edges of the frames one encounters. One does this by pivoting, in the sense defined above.
In my programs, a vertex frame amounts to (a) an m × (n + 1) matrix F, each row of which is an affine function defining the region involved, in the current coordinate system; (b) a list of the rows of this matrix that represent the current coordinates; (c) for each of these, an index specifying which column in F corresponds to it; (d) an (n + 1) × (n + 1) matrix R specifying the affine transformation from current to original coordinates. There is also, for technical reasons, (e) a list of the rows that are not the coordinates.

The simplest vertices are those at which the region is locally simplicial—given locally at the origin exactly by the inequalities f ≥ 0 for f in the basis—but this does not always occur. This simple situation is the generic case, in the sense that perturbing a given system almost always turns any given polytope into one all of whose vertices are simplicial. A vertex not of this form is called singular. Keep in mind that it is the system that is singular, not necessarily the polytope itself. In two dimensions, for example, all vertices are geometrically simplicial, but systems of inequalities may nonetheless be singular, as the example above is at the vertex (0, 1). In that example, the system is singular at (0, 1) but not at the other three vertices of the region. The singularity at (0, 1) arises because the three bounding lines

  2x + y − 1 = 0
  x + y − 1 = 0
  x − y + 1 = 0

all pass through that point. Any two of the corresponding affine functions will form a vertex basis. Only the last two lines actually bound the region, but any two of these three might be the local coordinate system of a vertex frame.

Being a vertex basis is a local property—it depends purely on the geometry of the region f_i ≥ 0 in a neighbourhood of the origin. We'll see later how to find a vertex-adapted coordinate system, given an adapted system whose origin might not be a vertex.

8.
Pivoting

The basic step in using the simplex method is moving from one vertex basis to another—and, hopefully, from one vertex to another—along a single edge of the polytope. (Or rather, to be precise, along what you might call a virtual edge.) Geometrically we are moving from one vertex to a neighbouring one, except that at a singular vertex, where more than the generic number of faces meet, the move may be a move along an edge of length 0. In all cases, this basic step is called a pivot. It is executed by pivoting the matrix F.

The explicit problem encountered in pivoting is this: we are given a vertex basis and a variable x_c in the basis. The x_c-axis represents an edge (or possibly a virtual edge, as I'll explain in a moment) of the polytope. We want to locate the vertex of the polytope next along that edge, and replace the old vertex frame by one adapted to that vertex. The variable x_c is part of the original basis. The new basis will be obtained by replacing that variable by one of the affine functions f_r in the system of inequalities defining the polytope. In applications, the exiting variable x_c is chosen according to various criteria which depend on the goal in mind, as we shall see in the next section. Sometimes there are several choices for f_r as well.

But there are some conditions on the possible f_r. It must be that the new vertex is also actually a vertex of the region Ω concerned. This puts a bound on how far out along the x_c-axis we can go. Suppose

  f(x) = Σ_i a_i x_i + a_n.

We know already that a_n ≥ 0, because our vertex does lie in Ω. On the x_c-axis, f(x) = a_c x_c + a_n. If a_c ≥ 0, there is no limit on x_c. But if a_c < 0, we must have a_c x_c + a_n ≥ 0, i.e.

  x_c ≤ a_n / (−a_c).

This has to hold for all the functions f_r, so the new value of x_c is the minimum value of a_(n,r)/(−a_(c,r)) over those r for which a_(c,r) < 0.
Given this choice of variable xc, and without further conditions imposed, the possible choices for the new basis variable will be any of the fr producing this minimum value of xc. In the example we have already looked at, we could start with coordinates x + 2y − 1, x + y − 1 at the vertex (1, 0) and move along the edge x + y = 1, with xc = x + 2y − 1. We can only go as far as the vertex (0, 1), because beyond there both of the affine functions x − y + 1 and 2x + y − 1 become negative. We can choose either one of them to replace x + 2y − 1 in the vertex coordinates.

When a pivot is made, the current coordinate frame is the given vertex frame, and in the course of the pivot it is changed to the new vertex frame. This involves expressing all the fi in terms of the new basis.

How does it work in general? Given a vertex frame, any of the fi may be expressed as

fi = ai + Σj ai,j xj

with ai ≥ 0, since the origin is assumed, by definition of a vertex frame, to lie in the region fi ≥ 0. We propose to move out along the directed edge where all basic variables but one of them, say xc, are equal to 0, and in the direction where xc takes non-negative values. This amounts to moving along an edge of the polytope. We shall move along this edge until we come to the vertex at its other end, if there is one, or take into account the fact that this edge extends to infinity. So we must find the maximum value that xc can take along the line it is moving on. Along this edge the other xj vanish, and we have

fi = ai + ai,c xc .

There are two cases to deal with: (a) if ai,c ≥ 0 there is no restriction on how large xc can be; (b) if ai,c < 0 then we can only go as far as xc = −ai/ai,c. Therefore if all the ai,c are non-negative then the edge goes off to infinity, but otherwise the value of xc is bounded by all of the −ar/ar,c.
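The bound just derived is the classical ratio test, and it is mechanical to compute. Here is a sketch in Python, assuming rows stored as [ai,0, ..., ai,n−1, ai] with the constant term last and exact rational entries; the function name is an assumption for illustration.

```python
from fractions import Fraction

def ratio_test(F, c):
    """How far can x_c increase before some f_i = a_i + a_{i,c} x_c goes
    negative?  Returns (t, rows): t is the bound (None if the edge is
    unbounded) and rows lists every i attaining it."""
    best, rows = None, []
    for i, row in enumerate(F):
        a_ic, a_i = row[c], row[-1]
        if a_ic < 0:                     # case (b): this row limits x_c
            t = Fraction(a_i, -a_ic)     # f_i vanishes at x_c = -a_i / a_{i,c}
            if best is None or t < best:
                best, rows = t, [i]
            elif t == best:
                rows.append(i)
    return best, rows
```

Returning the whole list of rows attaining the bound matters, since as noted above there may be several candidates fr when the target vertex is singular.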
In that case, we choose r such that ar,c < 0 and −ar/ar,c takes the minimum value among those i with ai,c < 0, and set xc equal to it. We also change bases—replacing xc by fr. The minimum value might in fact be 0, in which case we change vertex frames (i.e. virtual vertices) without changing the geometric vertex. This sort of virtual move will happen only if the vertex is singular, which means, as I have said, that more than the generic number of faces meet in that vertex; the maximal value of xc will then be 0. Thus our choice of r is made so that ai − (ar/ar,c) ai,c ≥ 0 for all i, since if ai,c ≥ 0 there is no condition (ai, ai,c, and −ar/ar,c are all non-negative), and if ai,c < 0 then by choice of r

−ai/ai,c ≥ −ar/ar,c .

When we change frame, we have to change all occurrences of xc (which becomes just a function fc in our system) by substitution. In short, we pivot. Sometimes, as in the example, there will be several possible choices of fr, if the target vertex isn't simplicial. Indeed, in practice when pivoting is done in an application there may be several possible edges to move out along—i.e. several choices of which variable xc exits from the basis. The most important thing in all cases is to prevent looping. There are two commonly used methods to do this. One is a method of choosing both xc and fr, while the other is fussy only about fr. I'll explain these in the next section.

9. Tracking coordinate changes

As we move among vertices, changing from one vertex basis to another, we'll want to keep track of where we are in the original coordinate system. We can recover the location of a vertex from the basis, since it amounts to solving a system of linear equations. But this is a lot of unnecessary work. We can instead maintain some data as we go along that make this a much simpler task. What data do we have to maintain in order to locate easily the current vertex?
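The substitution itself can be carried out uniformly on every row at once. The following is a sketch under the same assumed layout as before (rows [coefficients..., constant], exact rationals), not anyone's production routine: row r, which involved xc with coefficient p = ar,c ≠ 0, becomes the new coordinate, and each row is rewritten accordingly.

```python
from fractions import Fraction

def pivot(F, r, c):
    """Rewrite every row of F in the coordinates where f_r (row r of F)
    replaces the coordinate x_c.  From row r we get
    x_c = (f_r - other terms of row r) / p with p = F[r][c]."""
    p = Fraction(F[r][c])
    assert p != 0, "f_r must actually involve x_c"
    new = []
    for row in F:
        q = Fraction(row[c])
        # substitute the expression for x_c into this row ...
        nrow = [row[j] - q * F[r][j] / p for j in range(len(row))]
        # ... and record the coefficient of the new coordinate f_r
        nrow[c] = q / p
        new.append(nrow)
    return new
```

Note that the same formula applied to row r itself turns it into the unit row of the new coordinate, as it should.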
A slightly better question: what data do we have to maintain in order to change back and forth between the current basis and the original one? Keeping track of location comes in two halves: changing back and forth between the original coordinate system and the adapted coordinate system we first came up with, and changing back and forth between that frame and the current vertex frame. I have already described the construction of an initial adapted frame. After that, the matrix R is updated at each pivot, substituting fr for xc. This is done by exactly the same column operations that are performed on F, and in programs R is often tacked onto the bottom of F to make this convenient. Its rows are not listed in either of the two lists accompanying the data structure, of course.

10. Finding a vertex basis to start

What remains is the problem of finding an initial vertex basis of a system fi ≥ 0 (or finding out that the system is inconsistent and that no vertex exists). This will be done given an admissible affine coordinate system. One of the more elegant features of the simplex method is that this turns out to be a special case of the original problem that motivated the simplex method, that of finding the maximum value of a linear function on a polytope, given an initial vertex frame. I postpone to the second part the detailed explanation of how this works. At any rate, the solution of the problem here is to use pivoting applied to a simpler problem in a space of one dimension higher, obtained by homogenizing the original one—a problem for which an initial vertex frame is obvious. Homogenize the affine functions with an added variable, say x∗, as the coefficient of the constant term. Write out the matrix of the fi as rows. Add x∗ ≥ 0 and x∗ ≤ 1 to the list of inequalities. The origin is a vertex of the region involved, and the xi form a vertex frame.
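The bookkeeping trick of tacking R onto the bottom of F can be seen in a toy computation; the layout and function name here are assumptions for illustration. One column operation applied to the stacked matrix updates the affine functions and the coordinate change together.

```python
def add_multiple_of_column(M, src, dst, s):
    # column operation: col[dst] += s * col[src], applied to every row
    for row in M:
        row[dst] += s * row[src]

F = [[1, 0, 0], [0, 1, 0], [1, 1, -1]]   # affine functions, constant last
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # current -> original coordinates
stacked = F + R                          # R rides along under F
add_multiple_of_column(stacked, 0, 1, 2) # one operation updates both halves
```

Since the rows of R sit below those of F, they are simply never entered in the two index lists of the frame, exactly as remarked above.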
That is to say, extend F by adding an extra column (the new penultimate column, for the variable x∗) and two extra rows (for x∗ and 1 − x∗). The idea is now to maximize x∗.

[Figure: the homogenization of the segment x ≥ 0, x ≤ 1, with the inequalities x ≥ 0, x∗ ≥ 0, and −x∗ + 1 ≥ 0 marked.]

It should be evident from the picture that the function −x∗ + 1 has to be in the final frame, if the region is not empty. Let's look at what happens with our example. We start with the affine frame whose matrix is that in (6.1):

Fy =
[   1      0      0
    0      1      0
   1/3    1/3   −1/3
    1     −1      1
  −4/3   −1/3   13/3 ]

This becomes the homogeneous system

[   1      0      0     0
    0      1      0     0
   1/3    1/3   −1/3    0
    1     −1      1     0
  −4/3   −1/3   13/3    0
    0      0      1     0
    0      0     −1     1 ]

It is a system in three dimensions. The last two inequalities are

x∗ ≥ 0
−x∗ + 1 ≥ 0 .

We just need to find some edge, any true edge, leading away from the origin. If there is none, the original region is empty. Otherwise the maximum value of x∗ is 1. In this case, −x∗ + 1 will be one of the coordinate functions. We'll recover a vertex frame of the original problem by throwing this away from the basis we have at the end, throwing away x∗ ≥ 0 from the set of inequalities, and setting x∗ = 1. In the example, we wind up with the vertex frame

[ −1   3   0
   1   1   0
  −2   3   1
  −4   1   0
   0   2   3 ]

The new basis functions are those which in the original coordinate system were x + 2y − 1 and x + y − 1, and the vertex is what was originally (1, 0). This process will be a bit unusual from the standpoint of linear programming, since all moves except the final one will be virtual. What we are doing, roughly, is moving around in the ‘web’ of frames defined by the homogenized functions fi.

Part III. Applications

11. Finding maximum values

The basic problem in linear programming is to maximize an affine function z = c + Σ ci xi subject to the conditions fi = ai + Σj ai,j xj ≥ 0. Here, as we change vertices we keep track of the expression for z in the current coordinate system.
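The homogenization step is a purely mechanical extension of the matrix. A sketch, again under the assumed row layout with the constant last, and with the x∗ column placed penultimate as in the text; the function name is mine.

```python
from fractions import Fraction

def homogenize(F):
    """Make the constant term of each row the coefficient of a new variable
    x*, placed in the penultimate column, and append rows for the two new
    inequalities x* >= 0 and 1 - x* >= 0."""
    n1 = len(F[0])                        # n + 1 columns in the input
    H = [row[:-1] + [row[-1], Fraction(0)] for row in F]
    H.append([Fraction(0)] * (n1 - 1) + [Fraction(1), Fraction(0)])   # x*
    H.append([Fraction(0)] * (n1 - 1) + [Fraction(-1), Fraction(1)])  # 1 - x*
    return H
```

Applied to a frame whose coordinate rows are unit rows, the output again has the xi as an obvious vertex frame at the origin, which is what makes the construction useful.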
The basic idea is to choose the edge to follow by requiring that z increase along it, until z is maximized and no choice of edge is possible. But if the move is virtual then something has to be done to avoid looping. One valid process to follow is to choose the exiting and entering variables to have the least possible subscript. It is a theorem of R. Bland, explained in Chvátal's book on pages 37–38, that this will always work. This works because at any vertex that is the intersection of n hyperplanes, either one edge goes down, or the vertex is part of the bottom of the polytope. This is pretty simple.

Not so simple is the perturbation method, which has as a variant the lexicographic method. This also is explained in Chvátal's book. The variant is the practical implementation of the first, which is simpler to understand. In this, as soon as we have found an affine basis, we add to each of the ‘slack’ variables fi a symbolic constant εi, with these subject to the conditions

1 ≫ ε1 ≫ ε2 ≫ · · · ≫ εℓ > 0 ,

in which ℓ is the number of non-basic inequalities. Then whenever we pivot, the constant terms in all the variables and the objective function will be linear combinations of numbers and these εi. In effect, we are setting the constants to be some kind of ε-numbers c0 + Σ ci εi, which we can add, subtract, and multiply by scalars. Such numbers are compared in the natural way, which amounts to lexicographic ordering of the arrays. Furthermore, with these adjustments the system becomes simplicial—none of the functions fi (i.e. no variable other than the basis variables) will ever evaluate to 0 at a vertex, because the different linear combinations of the εi will always be linearly independent (pivoting preserves this property, and the matrix starts as I). Thus, no pivots will ever be virtual. At the end, set all εi = 0.
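The ε-number arithmetic is easy to realize concretely: store c0 + c1 ε1 + · · · + cℓ εℓ as the array (c0, c1, ..., cℓ), and the condition 1 ≫ ε1 ≫ · · · ≫ εℓ > 0 makes comparison exactly lexicographic comparison of arrays, which is what Python's tuple ordering already does. A sketch, with names of my own choosing:

```python
from fractions import Fraction

# An eps-number c0 + c1*eps_1 + ... + cl*eps_l is stored as the tuple
# (c0, c1, ..., cl).  Addition, subtraction, and scalar multiplication
# are componentwise; comparison is lexicographic, i.e. tuple comparison.

def eps_add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def eps_scale(s, a):
    return tuple(s * x for x in a)
```

For instance, 1 + 0·ε1 exceeds 1 − 5·ε1 no matter how small ε1 is, and tuple comparison agrees: (1, 0) > (1, −5).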
This method has the property that we are allowed to go out along any coordinate edge, and need only make a choice among the functions that will become part of the basis. This happens only when there are ties. This feature plays a role in the process discovered by Avis and Fukuda that I shall mention later on.

There is an equivalent description common in the literature, which also uses lexicographic ordering of arrays. The best account I know is in [Apte:2012], although more cryptic accounts are easy enough to find. It does not use ε-numbers, but expands the matrices of dictionaries instead. I prefer the version given here, which seems to be more intuitive even if slightly less efficient. The advantage of either one of these lexicographic strategies is that only the choice of the new variable to set as coordinate uses it. This makes it exactly suitable for the lrs method of Avis and Fukuda for traversing all vertices of a convex polytope from the top down.

12. Finding all vertices of a polytope

Suppose we simply want to locate all possible vertices of a convex region, starting with one vertex frame. The traditional method from this point on is to use a stack to index and locate vertex bases, collecting them into equivalence classes. Note that each vertex will correspond to a subset of the original set of functions, so our search is finite. (Indeed, a good way to specify a vertex, at least when using exact arithmetic, is to tag it by the set of functions vanishing there.) On the other hand, it may happen that we have a singular vertex where several different sets of basic variables give rise to the same geometrical vertex. If we are doing floating point arithmetic, then because of possible round-off we should ignore this possibility and proceed as if it doesn't occur. It shouldn't cause trouble—a single point might decompose into several, but they will all be close together and effectively indistinguishable.
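Since each vertex basis is tagged by a finite set of function indices, the search can be organized as a standard graph traversal. A generic sketch, with the pivoting hidden inside an abstract neighbour function and all names my own:

```python
def enumerate_vertex_bases(start_basis, neighbours):
    """Depth-first search over vertex bases.  Each basis is a frozenset of
    function indices (the functions vanishing there), and `neighbours(b)`
    yields the bases reachable from b by one pivot.  A stack is used, but a
    queue would visit the same bases in a different order."""
    seen = {start_basis}
    stack = [start_basis]
    while stack:
        b = stack.pop()
        yield b
        for nb in neighbours(b):
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
```

The `seen` set is what keeps the search finite even at singular vertices, where several distinct bases tag the same geometric point.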
Of course the region may be unbounded, in which case we should find the infinite edges of the polytope, too. This method locates all edges and vertices by the simplex method, indexing the vertices by the set of basis variables and the edges by the coordinate of the edge. Store vertices and edges as we find them. Chvátal recommends using a queue for this, but I don't see why a stack wouldn't work as well. Vertices are pushed onto the stack as they are met. They are distinguished by the subset of basic functions, so two bases with the same geometric vertex might be distinguished in this scheme. Most serious is the fact that each vertex comes equipped with its dictionary, which must be transformed in going from one vertex to a neighbour through a pivot. This makes the algorithm here of order n² for n inequalities. [Avis-Fukuda:1992] and [Avis:2000] describe a better method. I'll explain this in detail in a later version of this essay. Pass through the questions I ask at the beginning.

There is another problem I'd like to take into consideration—that of finding, for a point of space, the nearest point of the polytope. The first step is to scan all the inequalities, to see which ones the given point satisfies. If it satisfies all of them, then the point lies in the region, and we are through. Otherwise, throw away those satisfied by the point, and calculate perpendicular projections onto the remaining facial supports. Finally, throw away the projections that do not lie strictly inside the corresponding face. This is not very efficient, but is there a better way?

13. Finding the dimension of a convex region

The dimension of a polytope makes sense in computation only when exact arithmetic is used, because the smallest of perturbations can change a set of low dimension to an open one. Start with trying to find a vertex frame. If there is none, the dimension is −1.
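The projection step in the nearest-point scan is elementary linear algebra. Here is a sketch of the perpendicular projection onto a single facial support a·x + b = 0, with exact rationals; the helper name is my own.

```python
from fractions import Fraction

def project_to_hyperplane(point, a, b):
    """Perpendicular projection of `point` onto the hyperplane a.x + b = 0:
    subtract ((a.x + b)/|a|^2) * a, the component along the normal a."""
    excess = sum(ai * xi for ai, xi in zip(a, point)) + b
    norm2 = sum(ai * ai for ai in a)
    return [xi - Fraction(excess * ai, norm2) for ai, xi in zip(a, point)]
```

For example, projecting (2, 0) onto the support x + y − 1 = 0 of our running square gives (3/2, −1/2), which then has to be tested for lying inside the corresponding face.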
Next, throw away all but the fi vanishing at the origin, so we are looking at the tangent cone of the vertex. Take as the function to maximize the last basis variable, and find its maximum. If this is 0, we reduce the problem by eliminating that variable. Or we come across an infinite edge (the axis of some xc) going out. In that case, throw away all the functions involving the variable xc, in effect looking now at the tangent cone of that edge. Recursion: the dimension is one more than the dimension of that tangent cone. The whole process amounts to a succession of one or the other of these two steps, applied recursively on the rank. The case of rank one is straightforward—in the first case the dimension is 0, while in the second it is 1.

14. Implementation

I follow [Chvátal:1983] closely. Classically one uses tableaux, but Chvátal replaces them by what he calls dictionaries. All the operations I have described amount to constructing an initial dictionary, and then proceeding through a sequence of them according to what one is looking for. So in my programs, an LP dictionary is a Python class whose data include an m × (n+1) matrix whose rows represent affine inequalities fi ≥ 0. In initialization, we pass an arbitrary matrix of this type, and from it we immediately construct an affine basis together with R, ρ. After the initialization we have lists of basis variables and dependent variables. Through all subsequent operations we maintain these structures. In effect, each inequality represents one of the possible basic variables (these are called slack variables in the literature). The next step is to turn this affine basis into a vertex basis. After that, what happens depends on what the ultimate goal is. The basic routine is a pivot, whose argument is one of the basis variables. This variable is removed from the basis variables, and replaced by one of the affine functions, which is chosen according to the minimal subscript rule.
In the pivot, the matrix of affine functions is expressed in terms of the new basis.

I have said that the conventional terminology is bizarre. The most bizarre usage is that the affine functions among the inequalities that are not coordinates in a frame are said to make up the basis of that frame. Thus, when a function f(x) ceases to be one of the coordinate system it is said to ‘enter the basis’, and when another is chosen as the new coordinate, it is said to ‘leave the basis’. This terminology is not aimed at mathematicians, but then most consumers of linear programming are not mathematicians, either.

Part IV. References

1. Jayant Apte, ‘Introduction to lexicographic reverse search’, available from a link on the web page https://sites.google.com/site/jayantapteshomepage/

2. David Avis, ‘lrs: a revised implementation of the reverse search vertex enumeration algorithm’, pp. 177–198 in Polytopes—Combinatorics and Computation, edited by Gil Kalai and Günter Ziegler, Birkhäuser Verlag, 2000.

3. —— and Komei Fukuda, ‘A pivoting algorithm for convex hulls and vertex enumeration of arrangements and polyhedra’, Discrete and Computational Geometry 8 (1992), 295–313.

4. Vašek Chvátal, Linear programming, W. H. Freeman, 1983.

5. Hermann Minkowski, ‘Theorie der konvexen Körper, insbesondere Begründung ihres Oberflächenbegriffs’, published only as item XXV in volume II of Gesammelte Abhandlungen, Leipzig, 1911.

6. Stephen Smale, ‘On the average number of steps of the simplex method of linear programming’, Mathematical Programming 27 (1983), 241–262.

7. Daniel Spielman and Shang-Hua Teng, ‘Smoothed analysis: why the simplex method usually takes polynomial time’, Journal of the ACM 51 (2004), 385–463.

8. Hermann Weyl, ‘Elementare Theorie der konvexen Polyeder’, Commentarii Mathematici Helvetici 7 (1935), 290–306. There is a translation of this into English by H. Kuhn: ‘The elementary theory of convex polytopes’, pp.
3–18 in volume I of Contributions to the Theory of Games, Annals of Mathematics Studies 21, Princeton University Press, 1950.