Polytopes and the simplex method

Last revised 10:03 p.m. March 3, 2016
Bill Casselman
University of British Columbia
cass@math.ubc.ca
A region in Euclidean space is called convex if the line segment between any two points in the region is also
in the region. For example, disks, cubes, and the empty set are convex but circles, doughnuts, and most coffee
cups are not.
As I’ll recall later in detail, any convex region is the intersection of all affine half-spaces f ≥ 0 containing it.
This note is mostly concerned with convex polytopes, which are the intersections of a finite number of such
half-spaces. Thus cubes are polytopes but spheres are not. The simplest example, and the model
in some sense for all others, is the coordinate octant xi ≥ 0. Another simple example is the simplex xi ≥ 0, Σ xi ≤ 1.
If Σ is any set of points P in Rn, its convex hull is the set of all points which can be represented as a finite sum Σ ci Pi with ci ≥ 0, Σ ci = 1, and all Pi in Σ. For example, the line segment between two points P and Q is the convex hull of those two points, and the convex hull of a circle is a disk. A convex hull is clearly convex.
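Since the computations later in this note are carried out in exact rational arithmetic, it is worth noting how directly the definition translates into code. A minimal sketch in Python (the helper name `convex_combination` is mine, not the text's):

```python
from fractions import Fraction as F

def convex_combination(coeffs, points):
    """Form the sum of c_i * P_i, checking that c_i >= 0 and sum(c_i) = 1."""
    assert all(c >= 0 for c in coeffs) and sum(coeffs) == 1
    dim = len(points[0])
    return tuple(sum(c * p[i] for c, p in zip(coeffs, points))
                 for i in range(dim))

# the midpoint of the segment from P to Q lies in their convex hull
P, Q = (F(0), F(0)), (F(2), F(4))
mid = convex_combination([F(1, 2), F(1, 2)], [P, Q])
```

Any point of the segment arises this way by varying the two coefficients.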
An extremal point of a convex region is a point that does not lie in the interior of a line segment contained in the region.
In some sense dual to the result that any convex region is the intersection of all half-spaces containing it, any
convex region is the convex hull of its extremal points.
As was proved in [Weyl:1935], the convex hull of any finite set of points and rays is a polytope—i.e. it can
also be specified in terms of a finite set of affine inequalities. I’ll recall Weyl’s proof of this in the first part.
Conversely, any convex polytope is the convex hull of a finite set of points and rays. Defining a convex
polytope by affine inequalities and defining it in terms of a convex hull are dual modes of specification. Thus
we arrive immediately at the most basic computational problem concerned with convex polytopes—given a
polytope specified in one of these ways, to find a specification in the other.
But there are several other things one might ask. A face of a polytope is any affine polytope on its boundary.
A wall (also often called a facet) is a proper face of maximal dimension. Here are some natural questions
about a polytope that one might want to answer:
• Is the region bounded?
• If it is bounded:
◦ What are its extremal points (i.e. its vertices)?
◦ What is its volume?
◦ What is the number of lattice points it contains?
• What is its dimension (taken to be −1 if the region is empty)?
• What is a minimal set of affine inequalities defining it?
• What is the maximum value of a linear function on the polytope (which might be infinite)?
• How to describe the face where a linear function takes its maximum value?
• What is its facial structure?
• What is the dual cone of linear functions bounded on the polytope?
• How to find the intersection of two convex sets?
• How to find the union of two convex sets?
• Given two convex sets, is one contained in the other?
• Given a point exterior to the region, what is the point of the region nearest to it?
Answering many of these questions reduces to answering a few basic ones. For example, to find out whether one region is contained in another, it suffices to know a characterization of one by affine inequalities and of the other by extremal points. Finding the union of two convex regions is easy if the regions are given as convex hulls, while finding their intersection is easy if they are specified by affine inequalities.
These problems are well known to arise in applications of mathematics. But in fact they also arise in trying to
understand the geometry of configurations in high dimensions, which comes up surprisingly often even in
the most theoretical mathematical investigations. For one thing, acquiring a picture of some kind of objects
in high dimensions, or navigating regions of high dimensional space, often amounts to understanding the
structure of convex regions.
Answering these can be daunting, since the complexity of problems increases drastically as dimension
increases. There are now many algorithms known to answer such questions, but for most practical purposes
the one of choice is still the simplex method. This is known to require time exponential in the number of
inequalities for certain somewhat artificial problems. But on the average, as is shown in a series of papers
beginning perhaps with [Smale:1983] and including most spectacularly [Spielman-Teng:2004], it works well.
It is also more flexible than competitors.
Contents
I. Geometry
1. Convex bodies
2. Proof of the basic theorem
II. Basic procedures
3. Introduction
4. Affine coordinate transformations
5. Gauss elimination
6. An example
7. Vertex frames
8. Pivoting
9. Tracking coordinate changes
10. Finding a vertex basis to start
III. Applications
11. Finding maximum values
12. Finding all vertices of a polygon
13. Finding the dimension of a convex region
14. Implementation
IV. References
My main reference has been [Chvátal:1983], although my notation is more geometric than his. Because I am
almost exclusively interested in theoretical questions, I implement my routines with exact rational arithmetic,
which eliminates a lot of technical difficulties in dealing with floating point numbers. The audience for
expositions of linear programming is not, by and large, mathematicians, and the standard notation in linear
programming is, for most mathematicians, rather bizarre. I follow more geometric conventions.
As far as I know, convex bodies were first systematically examined by Minkowski, who used them in
spectacular ways in his geometry of numbers. [Weyl:1927] is a pleasant account of the basic facts, and I
follow his exposition in the first part.
Part I. Geometry
1. Convex bodies
The following is relatively straightforward:
1.1. Proposition. Any closed convex set in Rn is the intersection of all half-spaces containing it.
For example, the unit disk in the plane is the intersection of all half-planes x cos θ + y sin θ ≤ 1.
Proof. Let Ω be a closed convex set, P an arbitrary point not in Ω. It is to be proved that there exists a
hyperplane f (x) = 0 such that f (P ) < 0, f (x) ≥ 0 for all x in Ω.
Since Ω is closed, there exists a point Q in Ω at minimal distance from P . I may as well take Q to be the origin
of the coordinate system.
I claim that the hyperplane passing through Q and perpendicular to the line through P and Q will do. Taking Q as the origin of the coordinate system, this hyperplane has the equation

    f (x) = x • P = 0 ,

and f (P ) = P • P > 0.
It is to be shown that if R lies in Ω then R • P ≤ 0. Suppose instead that R • P > 0. Since Ω is convex and contains Q = 0, it contains tR for all 0 ≤ t ≤ 1, and

    ‖P − tR‖² = ‖P‖² − 2t (P • R) + t² (R • R) .

Since P • R > 0, this is less than ‖P‖² for small t > 0, so that tR is closer to P than Q is, a contradiction.
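The separating hyperplane of the proof can be checked concretely for the unit disk. Taking P = (3, 4), the nearest point of the disk is Q = P/‖P‖ = (3/5, 4/5), and everything stays rational. A small sketch (the sample points and names are my own choices for illustration):

```python
from fractions import Fraction as F

# P lies outside the closed unit disk; the nearest point of the disk
# to P is Q = P/|P| (here |P| = 5, so all coordinates are rational)
P = (F(3), F(4))
Q = (F(3, 5), F(4, 5))

def f(x):
    # affine function vanishing on the hyperplane through Q
    # perpendicular to the segment PQ (whose direction is P itself)
    return sum((xi - qi) * pi for xi, qi, pi in zip(x, Q, P))

# sample rational points of the disk, where f should be <= 0
samples = [(F(0), F(0)), (F(-1), F(0)), (F(0), F(1)), (F(1, 2), F(1, 2)), Q]
```

Here f is positive at P and non-positive on the disk, as the Proposition requires.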
Proposition 1.1 can also be interpreted in terms of duality. There are three slightly different notions of duality
that are useful in this business.
A cone in Rn is a non-empty region that is invariant under multiplication by non-negative scalars.
If Ω is a closed convex subset of Rn , its dual Ω∨ is the set of all affine functions f such that f (Ω) ≥ 0. It is a
closed cone in the space of affine functions, which has dimension n + 1. It is convex, since f ((1 − t)x + ty) =
(1 − t)f (x) + tf (y) for any affine function f . I’ll call Ω∨ simply the dual of Ω. The space of affine functions
is not just a vector space, but one with a particular coordinate, taking f to its constant term.
If A is a closed cone in the vector space of affine functions, I’ll define its affine dual A∧ to be the subset of all
x in the original vector space such that f (x) ≥ 0 for all f in A. It is also convex and closed. Any x in Ω is of
course a point in the affine dual of Ω∨ .
The following is just a rewording of Proposition 1.1.
1.2. Corollary. If Ω is a closed convex subset of Rn , then the embedding of Ω into the affine dual of Ω∨ is a
bijection.
1.3. Corollary. If X and Y are two closed convex subsets of Rn then X = Y if and only if X ∨ = Y ∨ .
Finally, if Ω is a closed convex cone in Rn, its dual cone Ω̂ is the set of all linear functions f such that f (Ω) ≥ 0. It is again a closed convex cone. The cone Ω embeds canonically into the linear dual of Ω̂. Another corollary of the Theorem:
1.4. Corollary. If Ω is a closed convex cone in Rn, then the embedding of Ω into the linear dual of Ω̂ is a bijection.
1.5. Corollary. If X and Y are two closed convex cones in Rn then X = Y if and only if X̂ = Ŷ.
If Ω is any subset of Rn , its affine support is the smallest affine subspace containing it. I’ll call the dimension of
that subspace the affine dimension of Ω. If the affine dimension of a region is n, then its affine support is all
of Rn and it must contain a simplex with n + 1 vertices. This happens if and only if its interior is non-empty.
It may happen that a convex region Ω is invariant under translation by vectors in some linear subspace.
Define Inv(Ω) to be the linear space of all such vectors.
If Ω is the non-negative x-axis in R2, its dual Ω̂ contains the line of all scalar multiples of y. This illustrates a general phenomenon:
1.6. Proposition. The affine dimension of Ω is equal to m ≤ n if and only if Inv(Ω∨) has dimension n − m.
In the rest of this note I’ll be concerned with a limited class of convex sets, those which are the intersections
of a finite number of half-spaces. I shall call these convex polytopes. What I’ll be mainly interested in are
algorithms to deal with convex polytopes, but I’ll begin with a few basic theorems.
In the theory of convex polytopes there is one really basic theorem, from which many others follow.
Suppose X to be a finite set containing n linearly independent points. A linear half-space f ≥ 0 will be called
extremal (for X ) if its boundary contains n − 1 linearly independent points of X . If X is finite, there can be
only a finite number of extremal half-spaces.
The homogeneous convex hull of X ⊆ Rn is the set of all finite sums Σ cx x over x in X with all cx ≥ 0, the convex homogeneous cone spanned by X. For example, if X is a collection of n linearly independent points its homogeneous convex hull is a simplicial cone, and every subset of n − 1 points determines an extremal half-space.
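For a simplicial cone the extremal half-spaces can be written down directly: if the n points of X form the columns of an invertible matrix M, then the cone of combinations Mc with c ≥ 0 is exactly the set of x with M⁻¹x ≥ 0, so the rows of M⁻¹ are the extremal functionals. A sketch of the 2 × 2 case in exact rational arithmetic (helper names mine):

```python
from fractions import Fraction as F

def inverse_2x2(m):
    """Exact inverse of a 2x2 matrix over the rationals."""
    (a, b), (c, d) = m
    det = a * d - b * c
    assert det != 0
    return [[d / det, -b / det], [-c / det, a / det]]

# X = {(1, 0), (1, 2)} spans a simplicial cone in R^2
M = [[F(1), F(1)], [F(0), F(2)]]      # points of X as the columns
rows = inverse_2x2(M)                  # the extremal linear functionals

def in_cone(x):
    # x lies in the cone iff both extremal functionals are >= 0 at x
    return all(r[0] * x[0] + r[1] * x[1] >= 0 for r in rows)
```

The two functionals here are x − y/2 and y/2, and the cone is the set where both are non-negative.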
For homogeneous convex polytopes, we have the following refinement of Proposition 1.1:
1.7. Theorem. If X is a finite set of points in Rn containing n linearly independent points, then its homoge-
neous convex hull is the intersection of all linear half-spaces extremal for X .
In particular, it is the intersection of a finite number of linear half spaces.
I’ll postpone the proof to the next section, but I’ll exhibit here a number of consequences.
1.8. Corollary. Suppose that the finite set X in Rn contains n linearly independent points. Let F = {fi} be the set of extremal linear functions for X. If f is a linear function, then f ≥ 0 on X if and only if f = Σ ci fi with ci ≥ 0.
Proof. This follows from Corollary 1.5, since if Ω is the homogeneous convex hull of X and C is the homogeneous convex hull of F, then according to the Theorem C and Ω̂ both have dual cones equal to Ω.
1.9. Corollary. Any homogeneous convex polytope Ω in Rn is the convex hull of a finite number of rays.
Proof. By Corollary 1.8, this is true if Ω is the dual of a homogeneous cone whose interior is not empty, which
is the case if and only if Inv(Ω) = {0}. But it is straightforward to reduce the general case to this one.
1.10. Corollary. Any closed convex polytope in Rn is the convex hull of a finite number of points and rays.
Proof. I begin with a construction that will be useful elsewhere as well.
The space Rn may be embedded into Rn+1 as an affine subspace: x 7→ (x, 1). Rays in Rn+1 that intersect
xn+1 = 1 correspond to points in Rn , while rays in Rn+1 in the hyperplane xn+1 = 0 correspond to rays in
Rn. If X is a closed convex subset of Rn, let X̄ be the homogeneous cone generated by the image of X. It is the homogenization of X.
This Corollary now follows from Corollary 1.9 applied to X̄.
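The homogenization construction is mechanical enough to code in a few lines (a sketch; the function names are mine):

```python
from fractions import Fraction as F

def homogenize_point(x):
    # a point x of R^n becomes the ray through (x, 1) in R^(n+1)
    return tuple(x) + (F(1),)

def homogenize_ray(v):
    # a ray of R^n lands in the hyperplane x_{n+1} = 0
    return tuple(v) + (F(0),)

def dehomogenize(w):
    # rays meeting x_{n+1} = 1 come back as points of R^n;
    # rays in x_{n+1} = 0 stay rays ("at infinity")
    if w[-1] != 0:
        return tuple(c / w[-1] for c in w[:-1])
    return None
```

Scaling a homogenized point does not change the point it dehomogenizes to, which is exactly the point/ray correspondence of the proof.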
2. Proof of the basic theorem
It may very well happen that the homogeneous convex hull of a set is all of Rn , in which case the only linear
function f with f (X) ≥ 0 is the constant 0. In this case the set of extremal half-spaces is empty. In general,
the only problem is to show that there are enough extremal half-spaces to determine X .
Proof. Let Ω be the homogeneous convex hull of the set X . We are assuming it to contain an open subset of
Rn .
One half of the Theorem is trivial, since Ω is certainly contained in the intersection of all extremal half-spaces.
It remains to be shown that if a point lies in the intersection of extremal half-spaces then it lies in Ω.
The proof proceeds by induction on n. If n = 0 there is nothing to be proven. If n = 1 then the points of X span a non-trivial segment. If this segment contains 0 in its interior, then the homogeneous convex hull is all of R, and the only linear function cx taking non-negative values on all of X is 0 itself. Otherwise, the homogeneous convex hull is a half-line with 0 on its boundary, and again the result is straightforward.
Assume the Theorem to be true for dimension up to n − 1, and suppose X to have affine dimension n. There
are two parts to the argument, which is constructive. The first either produces an extremal half-space or
demonstrates that the homogeneous convex hull of X is all of Rn . The second proves the theorem when it is
known that extremal half-spaces exist.
Part 1. Suppose α = 0 to be any hyperplane containing n − 1 linearly independent points of X. Either all points of X lie on one side of this hyperplane or they do not. If they do, then this hyperplane is extremal, and we can pass on to part 2. Suppose they do not. Let e be a point of X with ⟨α, e⟩ < 0. We may as well assume α to be the coordinate function xn, and e = (0, 0, . . . , −1). Let Y be the projection onto the plane xn = 0 of the points X+ of X with xn ≥ 0. The new system Y is non-degenerate of dimension n − 1, by assumption on α. Either it possesses an extremal half-space ⟨β, x⟩ ≥ 0 with ⟨β, x⟩ = b1x1 + · · · + bn−1xn−1 or it does not. If it does, the half-space ⟨β, x⟩ + xn ≥ 0 is extremal and contains, in addition to the points in X+, the point e.
In the second case, when Y does not possess an extremal half-space, we can apply the induction assumption. Every point in the hyperplane ⟨α, x⟩ = 0 must be in the convex hull of Y. Adding positive multiples of e, we see that every point where xn ≤ 0 is in the convex hull. Since we have eliminated the case where all points of X lie on one side of α = 0, we can find a point with xn > 0; a similar argument implies that every point where xn ≥ 0 is also there.
Part 2. At this point we know there exist extremal half-spaces containing Ω. Let ∆ ≠ ∅ be the set of linear functions α defining them, so that Ω lies in each half-space α ≥ 0.
Choose a point e in the interior of the homogeneous convex hull of X = {xi}. Thus ⟨α, e⟩ > 0 for every α in ∆. Let p be an arbitrary point in the intersection of the extremal half-spaces, so that also ⟨α, p⟩ ≥ 0 for all α in ∆. We want to show that p is in Ω, or that

    p = Σ ci xi

with all ci ≥ 0. We may assume that p ≠ e.
The set of all

    λe + µp

with λ, µ ≥ 0 is a two-dimensional wedge. Any point of the same form with λ < 0, µ ≥ 0 will be in the plane spanned by this wedge, outside this wedge on the far side of the line through p. Since

    ⟨α, p − λe⟩ = ⟨α, p⟩ − λ⟨α, e⟩

and ⟨α, e⟩ > 0, the ray of all q = p − λe with λ > 0 will find itself eventually in the region where α < 0. It will cross the hyperplane α = 0 when

    λ = ⟨α, p⟩ / ⟨α, e⟩ .

If λ is chosen to be the least value of ⟨α, p⟩/⟨α, e⟩ as α ranges over ∆, and q = p − λe, then
(1) p = q + λe, where λ ≥ 0 and e is in the homogeneous convex hull of X;
(2) ⟨α, q⟩ ≥ 0 for all α in ∆;
(3) ⟨α, q⟩ = 0 for some α in ∆.
In order to prove that p is in the homogeneous convex hull of X , it suffices to show that q is in it.
We want to apply the induction hypothesis to q . Let Y be the set of all x in X on the hyperplane Hα where
α = 0, which contains q . Because α is an extremal support, there exist at least n − 1 linearly independent
points in Y , so that it is non-degenerate in n − 1 dimensions. Choose coordinates so that α = xn . By the
induction assumption, the convex hull of Y is the intersection of all its extremal half-spaces in Hα . I claim:
2.1. Lemma. Any extremal hyperplane for Y = X ∩ Hα in Hα can be extended to an extremal hyperplane
for X in Rn .
Proof of the Lemma. Let

    ⟨β, y⟩ = b1x1 + · · · + bn−1xn−1 ≥ 0

be one of the extremal half-spaces for Y in Hα. The function β will vanish on a subset Υ ⊆ Y of essential dimension n − 2. Any extension of β to X will be of the form

    ⟨B, x⟩ = b1x1 + · · · + bn−1xn−1 + cxn

for some constant c.
I claim that c can be chosen so that B ≥ 0 on X and ⟨B, x⟩ vanishes at some point x in X − Y. This means no more and no less than that it defines an extremal half-space for X. The conditions on B are that

    ⟨B, x⟩ = b1x1 + · · · + bn−1xn−1 + cxn ≥ 0

for all x in X − Y, and that ⟨B, x⟩ = 0 for at least one of these x. These require that

    −c ≤ (b1x1 + · · · + bn−1xn−1)/xn

for all x in X − Y, with equality in at least one case. But X is finite and xn > 0 on X − Y, so the right-hand side achieves a minimum as x varies, and we may choose c accordingly.
Part II. Basic procedures
3. Introduction
In the remainder of this note I’ll explain how to implement the basic theorems about convex polytopes in
practical terms.
There are basically three phases to dealing with a polytope defined initially by inequalities. (1) We are
given the matrix of inequalities defined by affine functions, and we find a coordinate system adapted to
the inequalities in the sense that (a) some of the coordinates lie among the affine functions expressing these
inequalities and (b) the other inequalities may be expressed in terms of those. (2) From this, we find a
coordinate system which satisfies in addition the condition that its origin is a vertex of the polytope. This
should be thought of as giving us a handle (almost literally) on the polytope. (3) We use this coordinate
system to explore the polytope, in any one of a dozen ways. We could start traversing through a subset of
vertices, for example, basically running along its edges. Which vertices we examine depend on what our
purpose is. If we want to find the entire facial structure, we must traverse the entire set. If we want to find
the maximum of a linear function, just a restricted subset. If we just want to know the dimension of the
polytope, we can poke around in the tangent cone at one vertex.
If we are given a finite set of inequalities such as
x ≥ 1
y + 2 ≥ 0
y ≤ x − 1,
we can rearrange them so as to be in a particular format:
x − 1 ≥ 0
y + 2 ≥ 0
x − y − 1 ≥ 0.
I.e., from now on I shall assume that we are given a finite set of affine functions fi of the form

    f (x) = Σ_{i=0}^{n−1} ai xi + an

defining the region Ω in which fi ≥ 0. We can even take an equation f (x) = 0 into account, replacing it by the inequalities f (x) ≥ 0 and −f (x) ≥ 0. I should say immediately that I shall work only with functions whose coefficients are rational.
Suppose that we start with m functions of n variables in this system, expressing the inequalities. The first step
in almost all computations with these data is to find a coordinate system in which the coordinate functions
are among the fi . Therefore we begin by looking for a minimal subset of the fi in terms of which all the
others may be affinely expressed. For example, in the system above the first two form an affine basis since
x − y − 1 = (x − 1) − (y + 2) + 2.
However, this might not be possible, because the fi might not be a large enough set. For example, you might have just a single inequality f ≥ c in three dimensions, with f linear. This phenomenon is easy to deal with—we shall find a maximal subset of the fi whose linear components are linearly independent. This problem is related to another phenomenon—the region we are examining might be invariant with respect to a large space of vector translations. For example, the region f (x) ≥ c is invariant under translations along vectors in the plane f = 0.
A coordinate system in which (1) some of the fi are among the coordinates and (2) all the others may be
expressed in terms of those I shall call one adapted to the set of inequalities. It constitutes an affine frame of
the system. Given one, we replace the given set of inequalities by those in which all are expressed in terms of the given basis. Setting X = x − 1, Y = y + 2, the system above thus becomes the simpler
X ≥ 0
Y ≥ 0
X − Y + 2 ≥ 0.
The choice of the first two functions as basis is not canonical—any two of the three given will also serve. As here, in general there may be several choices for such a coordinate change. In any case, finding
an adapted coordinate system amounts to finding an affine transformation that makes the original system
somewhat simpler. In a later section I examine the effect of a simple affine coordinate transformation on a
system of inequalities.
4. Affine coordinate transformations
An affine coordinate change from (xi) to (yi) is one of the form

    [ x ; 1 ] = [ A a ; 0 1 ] [ y ; 1 ] ,

where the semicolon separates block rows of a matrix. Write affine functions as row vectors, so

    f (x) = [ ∇f c ] [ x ; 1 ] ,

in which ∇f is the linear part of f, and the vector representing it is made up of the coordinates of f in this coordinate system. Then

    f (x) = [ ∇f c ] [ A a ; 0 1 ] [ y ; 1 ] ,

so that its expression in y coordinates is

    [ ∇f c ] [ A a ; 0 1 ] .
In our case, we are going to be making very special coordinate changes by a process known as pivoting. We
are given a coordinate system (xi ) and a set of linear functions fj , and we are going to swap some fℓ for a
variable xk . At the same time, we want to express all of the fj , which are given to us in a matrix, in terms of
the new variables. This reduces to the following problem: we are given
    f_ℓ = Σ_{j=0}^{n−1} a^j_ℓ x_j + a^n_ℓ ,

in which a^k_ℓ ≠ 0 (so that the coordinate transformation is legitimate). We are also given the

    f_i = Σ_{j=0}^{n−1} a^j_i x_j + a^n_i   (i ≠ ℓ) ,

each of which we wish to convert to an expression with f_ℓ replacing x_k. We get

    x_k = (1/a^k_ℓ) ( f_ℓ − Σ_{j≠k} a^j_ℓ x_j − a^n_ℓ )

    f_i = Σ_{j≠k} a^j_i x_j + (a^k_i /a^k_ℓ) ( f_ℓ − Σ_{j≠k} a^j_ℓ x_j − a^n_ℓ ) + a^n_i
        = Σ_{j≠k} ( a^j_i − (a^j_ℓ /a^k_ℓ) a^k_i ) x_j + (a^k_i /a^k_ℓ) f_ℓ + a^n_i − (a^n_ℓ /a^k_ℓ) a^k_i .

In effect, we are
(a) dividing the k-th column of F by a^k_ℓ ,
(b) for each j ≠ k (including j = n), replacing the j-th column c_j of F by c_j − p c_k , with p = a^j_ℓ /a^k_ℓ .
If i = ℓ, we get a single 1 in row ℓ, and in fact the column operations are designed to have exactly that
effect. Since column operations are equivalent to multiplication by certain (n + 1) × (n + 1) matrices on the
right, this is consistent with earlier remarks on coordinate change. Explicitly, the matrix by which we finally
multiply is that obtained by performing the same column operations on the identity matrix In+1 . These
effects accumulate.
This operation is called pivoting on the pair (k, ℓ).
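The column operations (a) and (b) translate directly into code. A minimal sketch of such a pivot in exact rational arithmetic (my own sketch, not the author's implementation; the matrix is a list of rows of Fractions, with the constants in the last column and 0-based indices):

```python
from fractions import Fraction

def pivot(F, k, ell):
    """Pivot the m x (n+1) matrix F on the pair (k, ell): swap the
    function f_ell for the variable x_k."""
    p = F[ell][k]
    assert p != 0, "the pivot entry must be non-zero"
    # (a) divide the k-th column by the pivot entry
    for row in F:
        row[k] = row[k] / p
    # (b) subtract a multiple of the k-th column from every other column,
    #     so that row ell becomes a single 1 in column k
    for j in range(len(F[0])):
        if j != k:
            q = F[ell][j]
            for row in F:
                row[j] = row[j] - q * row[k]
    return F
```

Applied to the first coefficient matrix of the example in section 6, a pivot on (0, 0) reproduces the first reduction step shown there.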
5. Gauss elimination
The first step in the process is to find an adapted coordinate system. To begin, let F be the m × (n + 1)
matrix of affine functions at hand (with constants in the last column). Let ∇F be the matrix whose rows are
the linear components ∇fi of the fi (i.e. all but the last column of F ), and let ϕ be the last column. Thus
F = [ ∇F ϕ ] .
The next step is, essentially, to reduce ∇F to Gauss-Jordan form by applying elementary column operations,
or equivalently multiplying F on the right by certain matrices effecting those operations. As we have seen
in the previous section, this amounts to a linear change of coordinates.
Doing this, we are replacing the original coordinates by new ones, and at the same time we are keeping track
of this transformation. At the end we’ll have a matrix E with ∇E in column echelon form.
We shall have found a subset of rows of E with the properties: (a) their linear parts are linearly independent;
(b) they are maximal satisfying (a); (c) the i-th row among them is the highest row linearly independent of the
previous i − 1. If c is the (column) rank of the matrix, then for 1 ≤ j ≤ c we are given a row rj corresponding
to the j -th basis function frj . In row rj there is a 1 in column j and 0 in all but the last column. All entries
ei,j with i < rj also vanish.
Finally, we can arrange by column operations that the final entries in the rows rj also become 0.
At the same time, keep track of the operations performed by applying the same operations to an (n + 1) × (n + 1) identity matrix, which at each step will be of the form

    R = [ A ρ ; 0 1 ] .
Here is a final sample pattern, with r1 = 1, r2 = 2, r3 = 4:

    [ 1  0  0  0  0 ]
    [ 0  1  0  0  0 ]
    [ ∗  ∗  0  0  ∗ ]
    [ 0  0  1  0  0 ]
    [ ∗  ∗  ∗  0  ∗ ]
    [ ∗  ∗  ∗  0  ∗ ]
    [ ∗  ∗  ∗  0  ∗ ]
In other words, embedded in ∇E is a copy of the c × c identity matrix Ic . Here c is what I call the nominal
dimension of the system. The rows of this matrix are the rows corresponding to the subset of the fi we have
found to form part of an affine coordinate system. The coordinates of others with respect to this basis are
laid out in the other rows.
The affine transformation matrix R records the accumulated coordinate changes. Thus Fy = Fx · R, if x
and y are the original and new coordinates. Columns c + 1 through n − 1 of R record a basis of vectors
(in the original coordinate system) whose translations leave the inequalities invariant. The region defined
by the original inequalities is the product of this space with a region in Rc defined by the matrix we get by
eliminating these columns, which we do. We may assume from now on that c = n, or equivalently that the
region does not contain a line.
In the end we shall be locating certain points specified in the new coordinate system. But we shall also need
to know their coordinates in the original coordinates. We have
    [ xold ; 1 ] = [ A ρ ; 0 1 ] [ xnew ; 1 ] .
From now on, we’ll assume that we have the system expressed in terms of an adapted affine coordinate
system. This means that the region is described by coordinate inequalities yi ≥ 0 together with more
inequalities determined by affine functions expressed in terms of the yi .
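The whole procedure of this section can be condensed into one routine: run through the rows, pick a usable pivot in each, apply the column operations to F, and apply the same operations to an identity matrix to accumulate R. A rough sketch under the conventions above (0-based indices; my own code, which elides the removal of invariant-direction columns):

```python
from fractions import Fraction

def column_reduce(F):
    """Reduce the linear part of F (all but the last column) to
    column-echelon form by column operations, accumulating the same
    operations in an affine change-of-coordinates matrix R."""
    m, w = len(F), len(F[0])            # w = n + 1 columns
    n = w - 1
    R = [[Fraction(int(i == j)) for j in range(w)] for i in range(w)]
    basis_rows = []                     # the rows r_j of the text
    col = 0
    for row in range(m):
        if col >= n:
            break
        if F[row][col] == 0:
            # look further right among the linear columns for a pivot
            for j in range(col + 1, n):
                if F[row][j] != 0:
                    for M in (F, R):
                        for i in range(len(M)):
                            M[i][col], M[i][j] = M[i][j], M[i][col]
                    break
            else:
                continue                # row depends on the rows above it
        p = F[row][col]
        for M in (F, R):                # (a) divide the pivot column by p
            for i in range(len(M)):
                M[i][col] = M[i][col] / p
        for j in range(w):              # (b) clear the rest of the row,
            if j == col:                #     constants included
                continue
            q = F[row][j]
            for M in (F, R):
                for i in range(len(M)):
                    M[i][j] = M[i][j] - q * M[i][col]
        basis_rows.append(row)
        col += 1
    return F, R, basis_rows
```

Run on the matrix of the example in the next section, this produces the reduced matrix and the final R shown there, with basis rows 0 and 1.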
6. An example
Suppose given the system of inequalities
2x + y ≥ 1
x + 2y ≥ 1
x + y ≥ 1
x − y ≥ −1
3x + 2y ≤ 6 .
We have the following picture, where the shading along a line f = c indicates the side where f − c ≥ 0. The convex region defined by the inequalities is outlined in heavy lines.
[Figure: the five bounding lines and the region they cut out, with the points (0, 1), (1, 0), (4/5, 9/5), and (5/2, −3/4) marked.]
We start with

    Fx =
    [  2   1  −1 ]
    [  1   2  −1 ]
    [  1   1  −1 ]
    [  1  −1   1 ]
    [ −3  −2   6 ] .
It reduces in three steps. After the first:

    F =
    [   1     0     0  ]
    [  1/2   3/2  −1/2 ]
    [  1/2   1/2  −1/2 ]
    [  1/2  −3/2   3/2 ]
    [ −3/2  −1/2   9/2 ] ,

    R =
    [ 1/2  −1/2  1/2 ]
    [  0     1    0  ]
    [  0     0    1  ] .

At the end:

    F =
    [   1     0     0  ]
    [   0     1     0  ]
    [  1/3   1/3  −1/3 ]
    [   1    −1     1  ]
    [ −4/3  −1/3  13/3 ] ,

    R =
    [  2/3  −1/3  1/3 ]
    [ −1/3   2/3  1/3 ]
    [   0     0    1  ] .
The affine basis chosen is thus the pair of functions
2x + y − 1
x + 2y − 1
and expressed in this basis the system of inequalities is that corresponding to the matrix

(6.1)
    Fy = Fx · R =
    [   1     0     0  ]
    [   0     1     0  ]
    [  1/3   1/3  −1/3 ]
    [   1    −1     1  ]
    [ −4/3  −1/3  13/3 ] .
The origin of the coordinate system with this basis is the point (1/3, 1/3) in the original coordinates, the
intersection of the two lines
x + 2y − 1 = 0
2x + y − 1 = 0 .
It is not a vertex of the region defined by the inequalities. This agrees with the sign of the constant in the
third row above:
[ 1/3  1/3  −1/3 ]
because this means that the affine function originally x + y − 1 is now (x + y − 1)/3, and is negative when
evaluated at the new origin.
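The computation (6.1) is easy to confirm in exact arithmetic (a sketch, with the matrices transcribed from this example):

```python
from fractions import Fraction as F

def matmul(A, B):
    # plain matrix product, exact over the rationals
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

Fx = [[2, 1, -1], [1, 2, -1], [1, 1, -1], [1, -1, 1], [-3, -2, 6]]
R  = [[F(2, 3), F(-1, 3), F(1, 3)],
      [F(-1, 3), F(2, 3), F(1, 3)],
      [0, 0, 1]]

Fy = matmul(Fx, R)
# the third row of Fy is [1/3, 1/3, -1/3]: the function x + y - 1
# becomes (x + y - 1)/3, negative at the new origin (1/3, 1/3)
```

The product agrees with (6.1) row by row.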
7. Vertex frames
The next step in the simplex method is to find a vertex of the region as well as a coordinate system adapted to
the inequalities whose origin is this vertex. I’ll call this a vertex-adapted coordinate system. Such a coordinate
system is equivalent to a frame of the affine space based at the given vertex satisfying the condition that any
n − 1 of the vectors in that frame span one of the spaces fi = 0. I’ll call one of these a vertex frame. This
is the basic structure in the simplex method, and finding one is usually the first step in any exploration of
a convex polytope. The point is that any vertex of the polytope may be reached by progressing through a
succession of vertex frames by following edges of the frames one encounters. One does this by pivoting, in
the sense defined above.
In my programs, a vertex frame amounts to (a) an m × (n + 1) matrix F , each row of which is an affine
function defining the region involved, in the current coordinate system; (b) a list of the rows of this matrix that
represent the current coordinates; (c) for each of these, an index specifying which column in F corresponds to
it; (d) an (n + 1) × (n + 1) matrix R specifying the affine transformation from current to original coordinates.
There is also, for technical reasons, (e) a list of the rows that are not the coordinates.
The simplest vertices are those where the region is locally simplicial—given locally at the origin exactly by the inequalities f ≥ 0 for f in the basis—but this does not always occur. This simple situation is the
generic case, in the sense that perturbing a given system almost always turns any given polytope into one
all of whose vertices are simplicial. A vertex not of this form is called singular. Keep in mind that it is the
system that is singular, not necessarily the polytope itself. In two dimensions, for example, all vertices are
geometrically simplicial, but systems of inequalities may nonetheless be singular, as the example above is at
the vertex (0, 1).
In that example, the system is singular at (0, 1) but not at the other three vertices of the region. The singularity
at (0, 1) arises because the three bounding lines
2x + y − 1 = 0
x+y−1=0
x−y+1=0
all pass through that point. Any two of the corresponding affine functions will form a vertex basis. Only the
last two lines actually bound the region. But any two of these three might be the local coordinate system of
a vertex frame.
Being a vertex basis is a local property—it depends purely on the geometry of the region fi ≥ 0 in a
neighbourhood of the origin.
We’ll see later how to find a vertex-adapted coordinate system, given an adapted system whose origin might
not be a vertex.
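The data (a)–(e) listed in this section bundle naturally into a single structure. A sketch of what such a bundle might look like (the field names are mine, not the author's):

```python
from dataclasses import dataclass, field

@dataclass
class VertexFrame:
    """One bundle holding the data (a)-(e) of a vertex frame."""
    F: list                  # (a) m x (n+1) matrix of affine functions
    basis_rows: list         # (b) rows of F serving as current coordinates
    basis_cols: dict         # (c) row index -> column of F it occupies
    R: list                  # (d) (n+1) x (n+1) map back to original coords
    nonbasis_rows: list = field(default_factory=list)  # (e) the other rows
```

A pivot then updates all five fields together, which is why it is convenient to keep them in one object.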
8. Pivoting
The basic step in using the simplex method is moving from one vertex basis to another—and, hopefully, from
one vertex to another—along a single edge of the polygon. (Or rather, to be precise, along what you might
call a virtual edge.) Geometrically we are moving from one vertex to a neighbouring one, except that in the
case of a singular vertex where more than the generic number of faces meet the move may be a move along
an edge of length 0. In all cases, this basic step is called a pivot. It is executed by pivoting the matrix F .
The explicit problem encountered in pivoting is this: we are given a vertex basis and a variable xc in the basis.
The xc -axis represents an edge (or possibly a virtual edge, as I’ll explain in a moment) of the polygon. We
want to locate the vertex of the polygon next along that edge, and replace the old vertex frame by one adapted
to that vertex. The variable xc is part of the original basis. The new basis will be obtained by replacing that
variable by one of the affine functions fr in the system of inequalities defining the polygon. In applications,
the exiting variable xc is chosen according to various criteria which depend on the goal in mind, as we shall
see in the next section. Sometimes there are several choices for fr as well.
But there are some conditions on the possible fr. The new point must actually be a vertex of the region Ω concerned. This puts a bound on how far out along the xc-axis we can go. Suppose

f(x) = ∑ ai xi + an .

We know already that an ≥ 0, because our vertex does lie in Ω. On the xc-axis

f(x) = ac xc + an .

If ac ≥ 0, there is no limit to xc. But if ac < 0, we must have

ac xc + an ≥ 0,   that is,   xc ≤ −an/ac .

This has to hold for all the functions fr, so the new value of xc is the minimum of the ratio −an/ac taken over those fr for which ac < 0. Given this choice of variable xc, and without further conditions imposed, the possible choices for the new basis variable will be any of the fr producing this minimum value of xc.
In the example we have already looked at, we could start with the coordinates x + 2y − 1, x + y − 1 at the vertex (1, 0) and move along the edge x + y = 1, with xc = x + 2y − 1. We can only go as far as the vertex (0, 1), because beyond it both of the affine functions x − y + 1 and 2x + y − 1 become negative. We can choose either one of them to replace x + 2y − 1 in the vertex coordinates.
When a pivot is made, the current coordinate frame is the given vertex frame, and in the course of the pivot
it is changed to the new vertex frame. This involves expressing all the fi in terms of the new basis.
Polytopes and the simplex method
15
How does it work in general? Given a vertex frame, any of the fi may be expressed as

fi = ai + ∑j ai,j xj
with the ai ≥ 0 since the origin is assumed, by definition of a vertex frame, to lie in the region fi ≥ 0. We
propose to move out along the directed edge where all basic variables but one of them, say xc , are equal to
0, and in the direction where xc takes non-negative values. This amounts to moving along an edge of the
polytope. We shall move along this edge until we come to the vertex at its other end, if there is one, or take
into account the fact that this edge extends to infinity. So we must find the maximum value that xc can take
along the line it is moving on. Along this edge the other xj vanish, and we have each
fi = ai + ai,c xc .
There are two cases to deal with:
(a) If ai,c ≥ 0 there is no restriction on how large xc can be;
(b) if ai,c < 0 then we can only go as far as xc = −ai /ai,c .
Therefore if all the ai,c are non-negative then the edge goes off to infinity, but otherwise the value of xc
is bounded by all of the −ar /ar,c . In that case, we choose r such that ar,c < 0 and −ar /ar,c takes that
minimum value among those where ai,c < 0, and set xc equal to it. We also change bases—replacing xc
by fr . The minimum value might in fact be 0, in which case we change vertex frames (i.e. virtual vertices)
without changing the geometric vertex. This sort of virtual move will happen only if the vertex is singular,
which means as I have said that more than the generic number of faces meet in that vertex, and the maximal
value of xc will be 0.
Thus our choice of r is made so that
ai − (ar /ar,c )ai,c ≥ 0
for all i, since if ai,c ≥ 0 there is no condition (ai, ai,c, and −ar/ar,c are all non-negative), and if ai,c < 0 then by choice of r

−ai/ai,c ≥ −ar/ar,c .
When we change frame, we have to change all occurrences of xc (which becomes just a function fc in our
system) by substitution. In short, we pivot.
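The ratio test and the change of basis just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's actual program: I store each row as [ai,1, ..., ai,n, ai] with the constant term last, and the function names are my own.

```python
def ratio_test(F, c):
    """How far can we move out along the x_c axis?  F is a list of rows
    [a_{i,1}, ..., a_{i,n}, a_i] representing f_i = sum_j a_{i,j} x_j + a_i,
    constant term last.  Returns (r, t), where f_r is the first inequality
    to become tight and t = -a_r/a_{r,c} is the new value of x_c, or
    (None, None) if the edge goes off to infinity."""
    best_r, best_t = None, None
    for i, row in enumerate(F):
        if row[c] < 0:                      # f_i decreases along this edge
            t = -row[-1] / row[c]           # the value of x_c where f_i hits 0
            if best_t is None or t < best_t:
                best_r, best_t = i, t
    return best_r, best_t

def pivot(F, r, c):
    """Replace the coordinate x_c by the function f_r, expressing every f_i
    in terms of the new basis.  Column c now holds the coefficient of f_r.
    These are exactly the column operations described in the text."""
    piv = F[r][c]
    return [[row[c] / piv if j == c else row[j] - row[c] * F[r][j] / piv
             for j in range(len(row))]
            for row in F]
```

For the unit square 0 ≤ x, y ≤ 1 at the vertex (0, 0), moving out along the x-axis, the ratio test picks out the inequality 1 − x ≥ 0 at distance 1, and the pivot makes 1 − x the new coordinate in place of x.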
Sometimes, as in the example, there will be several possible choices of fr , if the target vertex isn’t simplicial.
Indeed, in practice when pivoting is done in an application there may be several possible edges to move
out along—i.e. which variable xc to exit from the basis. The most important thing in all cases is to prevent
looping. There are two commonly used methods to do this. One is a method of choosing both xc and fr ,
while the other is fussy about only fr . I’ll explain these in the next section.
9. Tracking coordinate changes
As we move among vertices, changing from one vertex basis to another, we’ll want to keep track of where we
are in the original coordinate system. We can recover the location of a vertex from the basis, since it amounts
to solving a system of linear equations. But this is a lot of unnecessary work. We can instead maintain some
data as we go along that make this a much simpler task. What data do we have to maintain in order to locate
easily the current vertex? A slightly better question: What data do we have to maintain in order to change
back and forth between the current basis and the original one?
Keeping track of location comes in two halves: changing back and forth between the original coordinate
system and the one determined by the adapted coordinate system we first came up with; changing back and
forth between that frame and the current vertex frame. I have already described the construction of an initial adapted frame. After that, we maintain the matrix R, which records the substitution of fr for xc made at each pivot. This is done by exactly the same column operations that are performed on F, and in programs R is often tacked onto the bottom of F to make this convenient. Its rows are not listed in either of the two lists accompanying the data structure, of course.
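Here is a minimal sketch of this bookkeeping, assuming rows stored as [ai,1, ..., ai,n, ai] with the constant last; the pivot helper and all names are my own, not the author's code. Stacking the rows of R under F means one set of column operations updates both, and the current vertex can then be read off from the constant column of the R rows.

```python
def pivot(F, r, c):
    # Hypothetical helper: the column operations of a pivot, replacing
    # the coordinate x_c by the function f_r.
    piv = F[r][c]
    return [[row[c] / piv if j == c else row[j] - row[c] * F[r][j] / piv
             for j in range(len(row))]
            for row in F]

# The square 0 <= x, y <= 1 at the vertex (0, 0).  Rows 0-3 are the
# inequalities; rows 4-5 are R, expressing the original coordinates x
# and y in the current basis.  Row layout: [x-coeff, y-coeff, constant].
F = [[ 1,  0, 0],   # x >= 0
     [ 0,  1, 0],   # y >= 0
     [-1,  0, 1],   # 1 - x >= 0
     [ 0, -1, 1],   # 1 - y >= 0
     [ 1,  0, 0],   # R: x
     [ 0,  1, 0]]   # R: y

# Pivot along the x-axis: the function 1 - x (row 2) replaces the
# coordinate x (column 0); the same column operations act on the R rows.
F = pivot(F, 2, 0)

# The new vertex, in the original coordinates, is the constant column
# of the R rows: (x, y) = (1, 0).
vertex = (F[4][-1], F[5][-1])
```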
10. Finding a vertex basis to start
What remains is the problem of finding an initial vertex basis of a system fi ≥ 0 (or finding out that the system is inconsistent and that no vertex exists). This will be done assuming that we are already given an admissible affine coordinate system.
One of the more elegant features of the simplex method is that this will turn out to be a special case of the
original problem that motivated the simplex method, that of finding the maximum value of a linear function
on a polytope, given an initial vertex frame. I postpone to the second part the detailed explanation of how
this works. At any rate, the solution of the problem here is to use pivoting applied to a simpler problem in
space of one dimension higher obtained by homogenizing the original one—a problem for which an initial
vertex frame is obvious.
Homogenize the affine functions with an added variable, say x∗, whose coefficient is the old constant term. Write out the matrix of the fi as rows. Add x∗ ≥ 0 and x∗ ≤ 1 to the list of inequalities. The origin is a vertex of the region involved, and the xi form a vertex frame. That is to say, extend F by adding an extra column (the penultimate column, for the new variable x∗) and two extra rows (for x∗ and 1 − x∗). The idea is now to maximize x∗.
[Figure: the homogenized region, with bounding lines including x ≥ 0, x∗ ≥ 0, and −x∗ + 1 ≥ 0.]
It should be evident from the picture that the function −x∗ + 1 has to be in the final frame, if the region is not empty.
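In code, the homogenization step might look like this. It is only a sketch of the construction just described, with the same column layout as the matrices below (constant column last, x∗ in the penultimate column); the function name is mine.

```python
def homogenize(F):
    """Build the homogenized system one dimension up.  Rows of F are
    [a_{i,1}, ..., a_{i,n}, a_i] with the constant term last.  In the
    result the old constant becomes the coefficient of the new variable
    x* (the penultimate column), the constant column is zero, and rows
    for x* >= 0 and 1 - x* >= 0 are appended.  The origin is then a
    vertex of the new region."""
    n = len(F[0]) - 1
    H = [row[:n] + [row[n], 0] for row in F]   # constant -> x* coefficient
    H.append([0] * n + [1, 0])                 # x* >= 0
    H.append([0] * n + [-1, 1])                # 1 - x* >= 0
    return H
```

Applied to the 5 × 3 matrix of the example, this produces a 7 × 4 homogeneous system of the shape displayed below.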
Let’s look at what happens with our example. We start with the affine frame whose matrix is that in (6.1):

Fy = [   1      0      0   ]
     [   0      1      0   ]
     [  1/3    1/3   −1/3  ]
     [   1     −1      1   ]
     [ −4/3   −1/3   13/3  ] .
This becomes the homogeneous system

[   1      0      0     0 ]
[   0      1      0     0 ]
[  1/3    1/3   −1/3    0 ]
[   1     −1      1     0 ]
[ −4/3   −1/3   13/3    0 ]
[   0      0      1     0 ]
[   0      0     −1     1 ] .
It is a system in three dimensions. The last inequalities are
x∗ ≥ 0,   −x∗ + 1 ≥ 0 .
We just need to find some edge, any true edge, leading away from the origin. If there is none, the original region is empty. Otherwise the maximum value of x∗ is 1. In this case, −x∗ + 1 will be one of the coordinate functions. We’ll recover a vertex frame of the original problem by removing it from the basis we have at the end, throwing away x∗ ≥ 0 from the set of inequalities, and setting x∗ = 1. In the example, we wind up with the vertex frame

[ −1    3    1 ]
[  0    1    0 ]
[  1    0    0 ]
[ −2    3    2 ]
[  1   −4    3 ] .
The new basis functions are those which in the original coordinate system were x + 2y − 1 and x + y − 1, and the vertex is what was originally (1, 0).
This process will be a bit unusual from the standpoint of linear programming, since all moves except the
final one will be virtual. What we are doing, roughly, is moving around in the ‘web’ of frames defined by the
homogenized functions fi .
Part III. Applications
11. Finding maximum values
The basic problem in linear programming is to maximize an affine function z = c + ∑ ci xi subject to the conditions fi = ai + ∑ ai,j xj ≥ 0. Here, as we change vertices we keep track of the expression for z in the
current coordinate system. The basic idea is to choose the edge to follow by requiring that z increase on this
edge, until z is maximized and no choice of edge is possible. But if the move is virtual then something has
to be done to avoid looping.
One valid process to follow is to choose the exiting and entering variables to have the least possible subscript.
It is a theorem of R. Bland, explained in Chvátal’s book on pages 37–38, that this will always work. This
works because at any vertex that is the intersection of n hyperplanes, either one edge goes down, or it is part
of the bottom of the polytope.
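Bland's rule can be sketched in this representation (rows [ai,1, ..., ai,n, ai] with the constant last, and the objective z given by its coefficient array in the current coordinates). The function name and the error handling are my own, not Chvátal's code.

```python
def bland_pivot_choice(F, z):
    """Choose a pivot by Bland's least-subscript rule.  The entering edge
    is the smallest-index coordinate x_c along which z increases; the
    leaving inequality f_r is the smallest-index row among those achieving
    the minimum ratio.  Returns (r, c), or None at an optimum."""
    n = len(z) - 1
    improving = [c for c in range(n) if z[c] > 0]
    if not improving:
        return None                          # no improving edge: optimal
    c = min(improving)                       # least subscript enters
    ratios = [(-F[i][-1] / F[i][c], i) for i in range(len(F)) if F[i][c] < 0]
    if not ratios:
        raise ValueError("the edge along x_%d goes off to infinity" % c)
    t_min = min(t for t, _ in ratios)
    return min(i for t, i in ratios if t == t_min), c
```

The least-subscript tie-break on r is what prevents looping at a singular vertex, where several rows achieve the same minimum ratio.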
This is pretty simple. Not so simple is the perturbation method, which has as a variant the lexicographic method. This also is explained in Chvátal’s book. The variant is the practical implementation of the first, which is simpler to understand. In this, as soon as we have found an affine basis, we add to each of the ‘slack’ variables fi a symbolic constant εi, subject to the conditions
1 ≫ ε1 ≫ ε2 ≫ · · · ≫ εℓ > 0 ,
in which ℓ is the number of non-basic inequalities. Then whenever we pivot, the constant terms in all the variables and the objective function will be linear combinations of numbers and these εi. In effect, we are setting the constants to be some kind of ε-numbers c0 + ∑ ci εi, which we can add, subtract, and multiply by scalars. Such numbers are compared in the natural way, which amounts to lexicographic ordering of the coefficient arrays. Furthermore, with these adjustments the system becomes simplicial—none of the functions fi (i.e. no variable other than the basis variables) will ever evaluate to 0 at the vertex, because the different linear combinations of the εi will always be linearly independent (pivoting preserves this property, and the matrix starts with I). Thus, no pivots will ever be virtual.
At the end, set all εi = 0. This method has the property that we are allowed to go out along any coordinate
edge, and need only make a choice among the functions that will become part of the basis. This happens
only when there are ties. This feature plays a role in the process discovered by Avis and Fukuda that I shall
mention later on.
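The ε-number arithmetic can be sketched as a small class; comparison of the coefficient arrays is exactly Python's lexicographic tuple comparison. This is a sketch under the conventions above, and the class and method names are mine.

```python
from functools import total_ordering

@total_ordering
class EpsNumber:
    """A number c0 + c1*eps_1 + ... + cl*eps_l, where
    1 >> eps_1 >> eps_2 >> ... >> eps_l > 0.  Stored as the coefficient
    array (c0, c1, ..., cl); addition, subtraction, and scalar
    multiplication are componentwise, and comparison is lexicographic."""
    def __init__(self, coeffs):
        self.c = tuple(coeffs)
    def __add__(self, other):
        return EpsNumber(a + b for a, b in zip(self.c, other.c))
    def __sub__(self, other):
        return EpsNumber(a - b for a, b in zip(self.c, other.c))
    def scale(self, s):
        return EpsNumber(s * a for a in self.c)
    def __eq__(self, other):
        return self.c == other.c
    def __lt__(self, other):
        return self.c < other.c      # lexicographic comparison of tuples
```

So, for example, any positive multiple of ε2 is smaller than ε1, which in turn is smaller than any positive ordinary number, exactly as the chain of conditions above demands.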
There is an equivalent description common in the literature, which also uses lexicographic ordering of arrays.
The best account I know is in [Apte:2012], although more cryptic accounts are easy enough to find. It does
not use ε-numbers, but expands the matrices of dictionaries instead. I prefer the version given here, which
seems to be more intuitive even if slightly less efficient.
The advantage of either one of these lexicographic strategies is that only the choice of the new variable to set as coordinate uses it. This makes it exactly suitable for the lrs method of Avis and Fukuda for traversing all vertices of a convex polytope from the top down.
12. Finding all vertices of a polygon
Suppose we simply want to locate all possible vertices of a convex region, starting with one vertex frame. The traditional method from this point on is to use a stack to index and locate vertex bases, collecting them into
equivalence classes. Note that each vertex will correspond to a subset of the original set of functions, so our
search is finite. (Indeed, a good way to specify a vertex, at least when using exact arithmetic, is to tag a
vertex by the set of functions vanishing there.) On the other hand, it may happen that we have a singular
vertex where several different sets of basic variables give rise to the same geometrical vertex. If we are doing
floating point arithmetic, because of possible floating point round-off we should ignore this possibility, and
proceed as if it doesn’t occur. It shouldn’t cause trouble—a single point might decompose into several, but
they will all be close together and effectively indistinguishable.
Of course the region may be unbounded, in which case we should find the infinite edges of the polytope, too.
This method locates all edges and vertices by the simplex method, indexing the vertices by the set of basis
variables, edges by the coordinate of the edge. Store vertices and edges as we find them. Chvátal recommends
using a queue for this, but I don’t see why a stack wouldn’t work as well. Vertices are pushed onto the stack as
they are met. They are distinguished by the subset of basic functions, so two vertices with the same geometric
vertex might be distinguished in this scheme. Most serious is the fact that each vertex comes equipped with
its dictionary, which must be transformed in going from one vertex to a neighbour through a pivot. This
makes the algorithm here of order n² for n inequalities.
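The traversal itself can be sketched abstractly. Here neighbors is a hypothetical callback standing in for the pivoting machinery, and bases are tagged by their index sets, so that two bases at the same singular geometric vertex are counted separately, as in the text.

```python
def all_vertex_bases(start, neighbors):
    """Depth-first traversal of vertex bases.  start is a frozenset of
    indices of the functions in the initial vertex basis; neighbors(b)
    yields the bases reachable from b by a single pivot.  Each basis is
    visited exactly once.  (A queue, as Chvatal recommends, would work
    just as well: replace the stack with collections.deque and popleft.)"""
    seen = {start}
    stack = [start]
    while stack:
        basis = stack.pop()
        yield basis
        for nb in neighbors(basis):
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
```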
[Avis-Fukuda:1992] and [Avis:2000] describe a better method. I’ll explain this in detail in a later version of
this essay.
There is another problem I’d also like to take into consideration—that of finding, for a point of space, the nearest point of the polytope. The first step is to scan all the inequalities, to see which the given point satisfies. If it satisfies all of them, then the point lies in the region, and we are through. Otherwise, throw away those satisfied by the point, and calculate perpendicular projections onto the remaining facial supports. Finally, throw away the projections that do not lie strictly inside the corresponding face. This is not very efficient, but is there a better way?
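The perpendicular projection onto a facial support is elementary: for f(x) = a·x + b, the projection of a point p onto the hyperplane f = 0 is p − (f(p)/|a|²) a. A sketch of the scan and the projection (function names are mine):

```python
def project_onto_support(p, a, b):
    """Perpendicular projection of the point p onto the hyperplane
    a . x + b = 0, the support of the face where f = 0 for
    f(x) = a . x + b."""
    t = (sum(ai * pi for ai, pi in zip(a, p)) + b) / sum(ai * ai for ai in a)
    return [pi - t * ai for pi, ai in zip(p, a)]

def violated(p, inequalities):
    """Scan the inequalities, given as pairs (a, b) with
    f(x) = a . x + b >= 0, and keep the ones the point p does not
    satisfy.  If the result is empty, p lies in the region."""
    return [(a, b) for a, b in inequalities
            if sum(ai * pi for ai, pi in zip(a, p)) + b < 0]
```

For the unit square and the outside point (2, 0.5), only 1 − x ≥ 0 is violated, and the projection onto its support is (1, 0.5).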
13. Finding the dimension of a convex region
The dimension of a polytope makes sense in computation only when exact arithmetic is used, because the
smallest of perturbations can change a set of low dimension to an open one.
Start with trying to find a vertex frame. If there is none, the dimension is −1. Next, throw away all but the fi vanishing at the origin, so that we are looking at the tangent cone of the vertex. Take as the function to maximize the last basis variable, and find its maximum. If this is 0, we reduce the problem by eliminating that variable. Or we come across an infinite edge (the axis of some xc) going out. In that case, throw away all the fi containing the variable xc, in effect looking now at the tangent cone of that edge. Recursion: the dimension is one more than the dimension of that tangent cone. The whole process amounts to a succession of one or the other of these two strategies, applied recursively on the rank. The case of rank one is straightforward—if the rank is 1, then in the first case the dimension is 0 while in the second it is 1.
14. Implementation
I follow [Chvátal:1983] closely. Classically one uses tableaux, but Chvátal replaces them by what he calls
dictionaries. All the operations I have described amount to constructing an initial dictionary, and then
proceeding through a sequence of them according to what one is looking for.
So in my programs, an LP dictionary is a Python class whose data is an m × (n + 1) matrix whose rows represent affine inequalities fi ≥ 0. In initialization, we pass in an arbitrary matrix of this type, and immediately turn it into an affine basis together with R, ρ. After the initialization we have lists of basis
variables and dependent variables. Through all subsequent operations we maintain these structures. In
effect, each inequality represents one of the possible basic variables (these are called slack variables in the
literature).
The next step is to turn this affine basis into a vertex basis. After that, what happens depends on what the ultimate goal is. The basic routine is a pivot, whose argument is one of the basis variables. This variable is removed
from the basis variables, and replaced by a new one of the affine functions, which is chosen according to the
minimal subscript rule. In the pivot, the matrix of affine functions is expressed in terms of the new basis.
I have said that the conventional terminology is bizarre. The most bizarre usage is that the affine functions
among the inequalities that are not coordinates in a frame are said to make up the basis of that frame. Thus,
when a function f (x) ceases to be one of the coordinate system it is said to ‘enter the basis’, and when
another is chosen as the new coordinate, it is said to ‘leave the basis’. This terminology is not aimed at
mathematicians, but then most consumers of linear programming are not mathematicians, either.
Part IV. References
1. Jayant Apte, ‘Introduction to lexicographic reverse search’, available from a link on the web page
https://sites.google.com/site/jayantapteshomepage/
2. David Avis, ‘lrs: a revised implementation of the reverse search vertex enumeration algorithm’, pp.
177–198 in Polytopes—Combinatorics and Computation, edited by Gil Kalai and Günter Ziegler, Birkhäuser Verlag, 2000.
3. —— and Komei Fukuda, ‘A pivoting algorithm for convex hulls and vertex enumeration of arrangements
and polyhedra’, Discrete and Computational Geometry 8 (1992), 295–313.
4. Vašek Chvátal, Linear programming, W. H. Freeman, 1983.
5. Hermann Minkowski, ‘Theorie der konvexen Körper, insbesondere Begründung ihres Oberflächenbegriffs’, published only as item XXV in volume II of Gesammelte Abhandlungen, Leipzig, 1911.
6. Stephen Smale, ‘On the average number of steps of the simplex method of linear programming’, Mathematical Programming 27 (1983), 241–262.
7. Daniel Spielman and Shang-Hua Teng, ‘Smoothed analysis: why the simplex method usually takes
polynomial time’, Journal of the ACM 51 (2004), 385–463.
8. Hermann Weyl, ‘Elementare Theorie der konvexen Polyeder’, Commentarii Mathematici Helvetici 7,
1935, pp. 290–306. There is a translation of this into English by H. Kuhn: ‘The elementary theory of convex
polytopes’, pp. 3–18 in volume I of Contributions to the theory of games, Annals of Mathematics Studies 21,
Princeton University Press, 1950.