Approximation Algorithms for Convex Hulls

Technical Notes
Programming Techniques
and Data Structures
Fig. 1. A Point Set and its Convex Hull.
M. Douglas McIlroy*
Editor
Jon Louis Bentley and Mark G. Faust
Carnegie-Mellon University
Franco P. Preparata
University of Illinois at Urbana-Champaign
The problem of constructing the convex hull of a
finite point set in a Euclidean space arises in many
applications. In this paper we study a set of algorithms
for constructing approximate convex hulls. We show that
an ε-approximate hull of n points in the plane can be
constructed in O(n + 1/ε) time. The planar algorithm has
been implemented and is very fast on point sets that
arise in practice. The method can be generalized to
compute hulls of point sets in higher dimensional spaces.
CR Categories and Subject Descriptors: F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems--geometrical problems
and computations; G.2.1 [Discrete Mathematics]: Combinatorics--combinatorial algorithms.
General Terms: Algorithms, Design
Additional Key Words and Phrases: convex hulls,
approximation algorithms
1. Introduction
The convex hull of a point set in Euclidean d-space is
defined to be the smallest convex set containing the
points; a set of points in the plane and its convex hull
are illustrated in Figure 1. Because it is a precise characterization of the boundary of a point set, the convex
hull is used in many applications; for example, Shamos
[8] discusses its use in such areas as robust estimation,
Chebyshev approximation, least-squares isotonic regression, and clustering. In addition to its inherent uses,
computation of the convex hull also arises as an intermediate step in many problems in computational geometry. For all these reasons, the computation of the
convex hull of a finite point set is an interesting algorithmic problem.

This research was supported in part by the Office of Naval
Research under Contract N00014-76-C-0370, in part by the National
Science Foundation under Grant MCS-78-13642, and in part by the
Joint Service Electronics Program under Contract N00014-79-C-0424.
Authors' Present Addresses: Jon Louis Bentley and Mark G. Faust,
Departments of Computer Science and Mathematics, Carnegie-Mellon
University, Pittsburgh, PA 15213; Franco P. Preparata, Coordinated
Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801.
* Former editor of Programming Techniques and Data Structures
of which Ellis Horowitz is the current editor.
Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed for direct
commercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by
permission of the Association for Computing Machinery. To copy
otherwise, or to republish, requires a fee and/or specific permission.
© 1982 ACM 0001-0782/82/0100-0064 $00.75.

Much work has recently been devoted to algorithms
for computing convex hulls. Graham [4] proposed the
first algorithm for computing the hull of n planar points
in O(n lg n) worst-case time. (A number of different
worst-case algorithms have since been proposed for this
problem; see [8].) Preparata and Hong [7] later gave an
algorithm for computing the hull of n points in 3-space
in O(n lg n) worst-case time. Shamos [8] was the first to
show an Ω(n lg n) lower bound on the problem of
computing the polygon that is the boundary of the planar
convex hull; his proof makes crucial use of the fact that
the vertices of the polygon must appear in sorted order.
Yao [10] later achieved a much stronger result showing
that merely identifying the extremal points of the hull
requires Ω(n lg n) time. Because these lower and upper
bounds together establish the worst-case complexity of
the problem to within a constant factor, researchers have
concentrated on other aspects of the problem. A particularly interesting problem is the expected complexity of
computing convex hulls. Bentley and Shamos [1] showed
that the hull of n points in 2-space or 3-space could be
computed in linear time for a large class of probability
distributions. Devroye [3] extended their technique to a
linear-time algorithm in d-space but for a much smaller
class of distributions. Many other aspects of convex hulls
have been discussed in the literature. Preparata [6], for
example, discusses a real-time on-line algorithm for constructing planar hulls.
In this paper we take an approach to the problem of
computing convex hulls that has not been explored in
any of the above papers. Specifically, we will investigate
the problem of computing an approximate convex hull.
Our algorithm runs in worst-case time linear in the
number of input points, and computes an approximation
that is arbitrarily close to the true hull. This algorithm is
useful for applications that must have rapid solutions,
even at the expense of accuracy. Thus the algorithm is
particularly appropriate, for example, in statistical applications where observations are not exact but are
known to a well-defined precision; there is no reason to
spend a great deal of time getting an accurate answer if
that accuracy is not really present in the underlying data!
Communications
of
the ACM
January 1982
Volume 25
Number 1
Our technique is illustrated in the planar case in Sec. 2,
and we examine applications of the algorithm to a number of problems in planar computational geometry in
Sec. 3. Extension of these methods to higher dimensions
is then discussed in Sec. 4, and conclusions are offered
in Sec. 5.
Fig. 2. The Planar Approximate Hull Algorithm. (a) Place points in
bins. (b) Find extremes in bins. (c) Build hull of extremes.
2. The Planar Algorithm
In this section we study an algorithm for computing
an approximate convex hull of a planar point set. The
basic idea of the algorithm is simple: we judiciously
sample some subset of the points and then use the convex
hull of the sample as an approximation to the convex
hull of the points.
The particular sampling scheme that we use is illustrated in Figure 2. The first step of our algorithm is to
find the minimum and the maximum values in the x
dimension, and then divide the area between them into
k equally spaced strips; see Figure 2(a). Next, for each
strip, we find the two points in that strip with minimum
and maximum y value (we call this set of points S); see
Figure 2(b). We also include in S the points with extreme
x values, and if there are several points with extreme x
values, we include both their maximum and minimum
by y value; there are therefore a maximum of 2k + 4
points in S. Finally, we construct the convex hull of S,
and use that as the approximate hull of the set; see
Figure 2(c). Note that the resulting hull is indeed only
an approximation; one of the points of the set lies outside
it.
The method that we just sketched informally is easy
to implement as a computer program. Finding the minimum and maximum x values is trivial, and the k-strips
are implemented as a (k + 2) element array (the zeroth
and (k + 1)st elements of the array contain only the
points with minimum and maximum x value and are
present primarily as a programming convenience.) To
see in which strip a particular point falls, we subtract the
minimum x value from the point's x value, multiply that
difference by the scaling factor of k divided by the difference of the x extremes (that is, the reciprocal of the strip width), and take the floor of the product
as the strip number. To find the minimum and maximum
in each strip, we iterate through the point set, and for
each point check to see if it exceeds either the current
maximum or minimum in the strip, and, if so, update a
value in the array. We then use one of the convex hull
algorithms described by Shamos [8] to compute the hull
of S. (The most natural algorithm for this problem is a
modified version of Graham's [4] algorithm which scans
in right-to-left rather than counter-clockwise order.)
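
The steps just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' Pascal program: the names `approx_hull`, `hull_of_sorted`, and `cross` are hypothetical, points are assumed to be (x, y) tuples, and the final hull is built with a monotone-chain scan (a left-to-right relative of Graham's scan) standing in for whichever hull algorithm from [8] one prefers.

```python
from math import floor

def cross(o, a, b):
    """Signed area test: > 0 if o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull_of_sorted(pts):
    """Convex hull, counter-clockwise, of points already sorted by (x, y)."""
    if len(pts) <= 2:
        return list(pts)

    def chain(seq):
        out = []
        for p in seq:
            # Drop the last kept point while it fails to make a strict left turn.
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out

    lower = chain(pts)
    upper = chain(reversed(pts))
    return lower[:-1] + upper[:-1]

def approx_hull(points, k):
    """Approximate hull of (x, y) tuples using k strips, in O(n + k) time."""
    xmin = min(p[0] for p in points)
    xmax = max(p[0] for p in points)
    scale = k / (xmax - xmin) if xmax > xmin else 0.0
    # Bins 1..k are the strips; bins 0 and k + 1 hold only the x extremes.
    lo = [None] * (k + 2)
    hi = [None] * (k + 2)
    for p in points:
        if p[0] == xmin:
            i = 0
        elif p[0] == xmax:
            i = k + 1
        else:
            i = 1 + min(k - 1, floor((p[0] - xmin) * scale))
        if lo[i] is None or p[1] < lo[i][1]:
            lo[i] = p
        if hi[i] is None or p[1] > hi[i][1]:
            hi[i] = p
    # Bins are visited in x order and each holds at most two points,
    # so the sample comes out sorted without a general sort.
    sample = []
    for a, b in zip(lo, hi):
        if a is not None:
            sample.extend([a] if a == b else sorted((a, b)))
    return hull_of_sorted(sample)
```

Sorting the (at most two) points inside each bin costs constant time per bin, which is what keeps the hull step linear in k in this sketch.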
The performance of this program is easy to analyze.
The space that it takes is proportional to k (if we assume
the input points are already present.) Finding the minimum and maximum x values requires Θ(n) time, and
initializing the array can be accomplished in O(k) time.
Finding the extremes in each strip requires Θ(n) time,
and the convex hull of S can be found in O(k) time
(because the, at most, 2k + 4 points in S are already
sorted by x value.) The running time of the entire
program is therefore O(n + k).
We have just shown that we can rapidly construct
some approximation to the hull, but before we can use it
we must be precise about how accurate an approximation
it is. One fact is easy to observe: because it uses only
points in the input set as hull points, it is a conservative
approximation in the sense that every point within the
approximate hull is also within the true hull. We must
now answer the question, "how far can a point be outside
the approximate hull if it is inside the true hull?" Note
that the maximum distance is realized by one of the
points in the input set. Let us assume that the distance
between the minimum and maximum x values is unity;
this implies that the width of each strip is 1/k. It is now
easy to prove the following fact.
Fact 1. Any point P in the point set that is not inside
the approximate convex hull is within distance 1/k of the
hull.

To prove this, consider the strip in which P falls. Because
P is outside the approximate hull, it cannot have either
the minimum or maximum y value of points in the strip;
P's y value must therefore lie between these two extremes.
The segment joining the strip's minimum and maximum points
lies inside the approximate hull and crosses the horizontal
line through P somewhere within the strip. Now draw a
horizontal line from P to the hull; since the strip is 1/k
wide, this line can be at most that long, and the fact holds.

This approximation error analysis can be used in
either an absolute or a relative sense. If one wishes to
have an approximation accurate to within x units, then
one can use strips of width x. For a relative error analysis,
we can study the ratio of the error induced by the
approximation to the true diameter (see footnote 1) of the
point set, which we will denote by D. We will see in the next
section that this is a natural measure of relative error.
Note that D is at least as great as the distance between the
two points furthest apart in the x dimension. We can now use
Fact 1 (which assumed that the distance between x extremes
is unity) to prove the following.

Fact 2. The relative error of a k-strip approximation
is at most 1/k. That is, any point in the point set that is not
inside the approximate hull is within distance D/k of the
hull, where D is the diameter of the point set.

This fact is an immediate consequence of Fact 1. We will
investigate the use of this relative error approximation in
the next section.

The strip method of this section can be used to yield
various approximations. The approximation we just described yields an approximate hull that is a subset of the
true hull, but it is not hard to modify it to yield a superset
of the true hull. To do this, we just "move the x value"
of every maximal point P to be outermost in its strip (i.e.,
away from the point realizing the maximum y value in
the point set if P is the maximum point in its strip, and
similarly for the minimum points). This can be viewed
as covering the points in each strip with a rectangle
whose left and right sides are those of the strip and
whose top and bottom are given by the minimum and
maximum points. The approximate hull is then taken to
be the convex hull of these covering rectangles. Because
the rectangles contain all the points in their strips, the
approximate convex hull properly contains the true convex hull. An error analysis similar to that above shows
that any point in the approximate hull but not in the
true hull must be within distance 1/k of the true hull.

Yet another variation of the strip approach is to
"align" all points to be in the center of their respective
strip. The resulting hull is then in general neither a superset nor a subset of the true hull. This method, however,
is easier to program, more storage-efficient (since the x
values of the points in S need not be stored), and has an
error radius of 1/(2k), rather than 1/k.

Footnote 1. The diameter of a point set is defined as the maximum distance
between any two points in the set, and is realized by two points on the
convex hull of the set (see [8]).

3. Applications of the Planar Algorithm
In the last section we studied the approximation
algorithm for convex hulls in an abstract setting; in this
section we study its application in a more concrete
context. Specifically, we study a number of computational problems that can be quickly solved by using our
approximation algorithm as though it produced the exact
hull.
The first problem that we will examine is that of
computing the diameter of a planar point set. It is easy
to prove that the diameter is realized by two points on
the convex hull, and Shamos [8] has shown that, given
the hull, the diameter can be found in time linear in the
number of hull points. We can readily employ our
approximate hull algorithm in this context: we first find
the approximate hull, and then use its diameter as an
approximation to the diameter of the entire set. This
requires O(n + k) time for constructing the approximate
hull and O(k) time for finding its diameter, for a total of
O(n + k) time.
To measure the efficacy of the approximation, let us
assume that we did not find the true diameter D (which
is realized by two points outside the approximate convex
hull) but rather that we found the approximate diameter
A, where A < D. By Fact 2, we know that both of the
points realizing the true diameter are within distance
D / k of the approximate hull. Applying this observation
and the triangle inequality twice, we know that
    D ≤ A + 2D/k,
which implies that
    D - A ≤ 2D/k.
We will take as a measure of our approximation the ratio
of the difference of the approximate and the true diameters to the true diameter, which is
    (D - A)/D ≤ 2/k,
from the above equation. Thus we know that the approximate diameter produced by this method is a lower
bound on the true diameter, and not more than 2D/k
less than the true diameter. We can therefore compute ε-approximate diameters in O(n + 1/ε) time and O(1/ε) storage.
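
The bound (D - A)/D ≤ 2/k can be checked numerically with a short Python sketch. The function names here (`strip_sample`, `diameter`, `approx_diameter`) are hypothetical, points are assumed to be (x, y) tuples, and for brevity the diameter of the sample is found by comparing all pairs of sample points, which costs O(k^2) rather than the O(k) of the hull-based method cited above. Since the diameter of a point set equals the diameter of its hull, the value returned is the same.

```python
from itertools import combinations
from math import dist, floor

def strip_sample(points, k):
    """Lowest and highest point of each strip; x extremes get their own bins."""
    xmin = min(p[0] for p in points)
    xmax = max(p[0] for p in points)
    scale = k / (xmax - xmin) if xmax > xmin else 0.0
    lo, hi = {}, {}
    for p in points:
        if p[0] == xmin:
            i = 0
        elif p[0] == xmax:
            i = k + 1
        else:
            i = 1 + min(k - 1, floor((p[0] - xmin) * scale))
        if i not in lo or p[1] < lo[i][1]:
            lo[i] = p
        if i not in hi or p[1] > hi[i][1]:
            hi[i] = p
    return set(lo.values()) | set(hi.values())

def diameter(points):
    """Exact diameter by brute force over all pairs."""
    return max((dist(p, q) for p, q in combinations(points, 2)), default=0.0)

def approx_diameter(points, k):
    """Lower bound A on the true diameter D, with D - A <= 2D/k."""
    return diameter(strip_sample(points, k))
```

To reach a relative error of at most ε with this scheme, one would choose k to be at least 2/ε.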
To illustrate the approximate diameter algorithm, we
consider the problem of finding the diameter of a point
set to within one percent accuracy. For this accuracy, we
let k be 200, giving a maximum of 404 points in the 200
strips. A Pascal program implementing the algorithm
found the approximate diameter of a 25,000 element
point set in 600 milliseconds of PDP-KL10 time. By way
of comparison, a carefully coded Pascal program that
finds the diameter by computing all n(n - 1)/2 interpoint
distances and taking the maximum required approximately an hour and a half of CPU time.
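
The arithmetic behind this example is easy to reproduce. The sketch below uses hypothetical helper names and exact rational arithmetic (so that 2/ε is not disturbed by floating-point rounding); it recovers the strip count, the sample bound of 2k + 4, and the number of interpoint distances the brute-force program must examine.

```python
from fractions import Fraction
from math import ceil, comb

def strips_for_diameter(eps):
    """Smallest k with 2/k <= eps, from the bound (D - A)/D <= 2/k."""
    return ceil(2 / Fraction(eps))

def sample_bound(k):
    """Maximum number of sample points kept by the k-strip method."""
    return 2 * k + 4

k = strips_for_diameter(Fraction(1, 100))   # one percent accuracy
pairs = comb(25000, 2)                      # distances examined by brute force
```

With these definitions k is 200, `sample_bound(k)` is 404, and `pairs` is 312,487,500.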
The diameter approximation algorithm we have just
seen produces a lower bound on the true diameter by
exhibiting two points in the set with distance provably
close to the true diameter. In some applications, it might
be more desirable to have an upper bound on the diameter, and this can be easily achieved by the method
described at the end of Sec. 2. (This involves moving
each point "outward" in its strip, away from the point
realizing minimum or maximum y value.) Finally, an
approximation that is neither an upper nor lower bound
but is within 1/k relative accuracy can be found by using
only k strips and "placing" each point at the center of its
cell. This approach is especially appealing in applications
because the x values of the sample points need not be
stored (reducing the storage required from approximately 4k reals to 2k reals.)
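
A minimal Python sketch of this centered variant follows. It is an illustration under assumptions, not the authors' code: the names `centered_hull` and `chain` are hypothetical, the x extremes are assumed to differ, and the hull is returned as separate lower and upper chains, each listed from left to right. Only two arrays of k reals are kept, because every sample point sits at the center of its strip.

```python
from math import floor

def cross(o, a, b):
    """Signed area test: > 0 if o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def chain(pts, sign):
    """Scan x-sorted points, keeping strict turns of the given sign."""
    out = []
    for p in pts:
        while len(out) >= 2 and sign * cross(out[-2], out[-1], p) <= 0:
            out.pop()
        out.append(p)
    return out

def centered_hull(points, k):
    """Return (lower, upper) hull chains with points moved to strip centers."""
    xmin = min(x for x, _ in points)
    xmax = max(x for x, _ in points)
    width = (xmax - xmin) / k              # assumes xmax > xmin
    ymin = [None] * k                      # only y values are stored
    ymax = [None] * k
    for x, y in points:
        i = min(k - 1, floor((x - xmin) / width))
        if ymin[i] is None or y < ymin[i]:
            ymin[i] = y
        if ymax[i] is None or y > ymax[i]:
            ymax[i] = y
    centers = [xmin + (i + 0.5) * width for i in range(k)]
    bottom = [(c, y) for c, y in zip(centers, ymin) if y is not None]
    top = [(c, y) for c, y in zip(centers, ymax) if y is not None]
    return chain(bottom, +1), chain(top, -1)
```

For the four corners of the unit square with k = 2, the chains run between x = 0.25 and x = 0.75, so the corners fall outside the result while remaining within half a strip width of it.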
Two algorithms for computing approximate diameter
that are substantially different from the above have been
given by Brown [2, Sec. 6.1]. His algorithms are based
on geometric transforms and require the computation of
trigonometric functions. For a given sample size, his
approach yields a slight increase in accuracy over ours,
but at a substantial increase in the constant factor of the
run time.
We turn now from the diameter problem to the
problem of convex hull searching. Precisely, we are given
a planar set of points and must organize it so that we can
quickly tell whether a new point is inside or outside the
convex hull of the set. We will investigate an approximation algorithm for convex hull searching that returns
one of the three answers "yes, the point is definitely
inside the hull," "no, the point is definitely outside the
hull," or "I do not know whether the point is inside or
outside the hull, but it is within E of the hull boundary."
Our first algorithm is based on the approximate hull
algorithm that builds a subset of the true hull. We
construct the hull as before, in O(n + k) time. Consider
now the shape of the intersection of the approximate
hull and any particular vertical strip used to make the
hull; the top of the strip is bounded by one or two lines,
as is the bottom. We can therefore describe either the
top or the bottom by at most four reals (giving the values
of the two y intercepts, and the possible point at which
the value changes.) To perform a hull search for point P,
we first locate the strip in which it lies in constant time
by subtracting, multiplying, and taking the floor function. We then use the stored values to see if it lies within
the approximate hull, and if so, we correctly report that
the point is inside the true hull. If it is further from the
hull than the guaranteed error radius, then we correctly
report that the point is outside the hull. If neither of
these conditions holds then we respond that we do not
know whether the point is inside or outside the hull. This
method requires O(n + k) time to organize the set and
constant time to answer a query. (Note that just as in the
case of computing diameter, we could use other approximations to the hull.)
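
The three-way answer can be illustrated with the following Python sketch. It is deliberately simplified: instead of the per-strip table that gives constant query time, it tests the query point against every edge of the approximate hull, which costs O(k) per query. The names `classify`, `inside_convex`, and `dist_to_segment` are hypothetical; `poly` is assumed to be the approximate hull as a counter-clockwise list of at least three vertices, and `radius` is assumed to be the guaranteed error radius (the strip width for the conservative hull).

```python
from math import hypot

def cross(o, a, b):
    """Signed area test: > 0 if o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def inside_convex(poly, q):
    """True if q is inside or on a counter-clockwise convex polygon."""
    n = len(poly)
    return all(cross(poly[i], poly[(i + 1) % n], q) >= 0 for i in range(n))

def dist_to_segment(q, a, b):
    """Euclidean distance from q to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length2 = dx * dx + dy * dy
    if length2 == 0:
        t = 0.0
    else:
        t = ((q[0] - a[0]) * dx + (q[1] - a[1]) * dy) / length2
        t = max(0.0, min(1.0, t))          # clamp to the segment
    return hypot(q[0] - (a[0] + t * dx), q[1] - (a[1] + t * dy))

def classify(poly, radius, q):
    """Return 'inside', 'outside', or 'unknown' for query point q."""
    if inside_convex(poly, q):
        return "inside"                    # inside the conservative hull
    n = len(poly)
    d = min(dist_to_segment(q, poly[i], poly[(i + 1) % n]) for i in range(n))
    return "outside" if d > radius else "unknown"
```

The "outside" answer is safe because every input point lies within the error radius of the approximate hull, and therefore so does every point of the true hull.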
As a final application of these approximation techniques, we mention the problem of computing all furthest
neighbors in a point set. Specifically, for each point in the
set we must tell which point in the set is furthest from it.
Since such a furthest neighbor must be a convex hull
point, given an approximate convex hull of k points we
can locate the approximate furthest neighbor of a new
point in lg k time (see footnote 2). This method gives us the theoretical
result that we can find all furthest neighbors in a set in
O[(n + k) lg k] time, with each reported neighbor within
D/k of the true furthest neighbor, where D is the diameter of the point set.
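
As a rough illustration, the sketch below answers each query by a linear scan over the vertices of an approximate hull. This is a simplification: it spends O(k) per point, for O(nk) in total, rather than the lg k search time quoted above, and the function name and the `candidates` parameter (assumed to be the approximate hull vertices, computed elsewhere) are hypothetical.

```python
from math import dist

def approx_furthest_neighbors(points, candidates):
    """For each point, the candidate hull vertex furthest from it."""
    return [max(candidates, key=lambda c: dist(p, c)) for p in points]
```

Restricting the scan to hull vertices loses nothing relative to scanning the whole approximate hull, since the distance from a fixed point is maximized over a convex polygon at one of its vertices.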
4. Extensions to Higher Dimensions
In previous sections we have studied approximation
algorithms for planar convex hulls; in this section we
turn our attention to convex hulls in d-space, where d
> 2. We first study the case that d = 3, and then consider
greater values of d.
Our algorithm for computing an approximate convex
hull of a set of points in 3-space is an extension of the
planar algorithm described in Sec. 2. An initial sampling
pass finds the minimum and maximum values in both
the x and the y dimensions; the resulting rectangle is
then partitioned into (at most) a (k + 2) × (k + 2) grid
of squares with lines parallel to the x and y axes. (We
use k of the squares in each dimension for "real" points
in squares, and the two squares on the end for the
extreme values.) Note that one of the dimensions can be
less than k + 2 if the point set does not happen to occupy
a square region. In each square of the grid we find the
points with minimum and maximum z values; the resulting set of points, called S, consists of at most
2(k + 2)^2 points. Finally, we construct the convex hull of S and
use it as the approximate hull of the original set. The
hull is constructed using Preparata and Hong's [7] general hull algorithm for point sets in 3-space. Notice the
striking difference with respect to the two-dimensional
method, where we were able to take advantage of the
known ordering of the points in the x dimension: we do
not exploit the grid structure of the points to yield a
faster algorithm, but rather use a general algorithm. An
intriguing question is whether the regular placement of
the points of S in a two-dimensional grid can be used to
obtain an algorithm more efficient than the general one.
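
The sampling pass of the three-dimensional method can be sketched as follows. This is a simplified illustration under assumptions: the name `grid_sample_3d` is hypothetical, points are (x, y, z) tuples, the larger of the x and y spans is assumed to be positive, and the extra boundary cells reserved for extreme values are omitted, so the grid here is at most k × k. The hull of the returned sample would still have to be built with a general three-dimensional hull algorithm, which is not shown.

```python
from math import floor

def grid_sample_3d(points, k):
    """Lowest and highest point (by z) in each occupied square grid cell."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, ymin = min(xs), min(ys)
    # Square cells: the larger of the two spans is divided into k parts.
    side = max(max(xs) - xmin, max(ys) - ymin) / k
    lo, hi = {}, {}
    for p in points:
        cell = (min(k - 1, floor((p[0] - xmin) / side)),
                min(k - 1, floor((p[1] - ymin) / side)))
        if cell not in lo or p[2] < lo[cell][2]:
            lo[cell] = p
        if cell not in hi or p[2] > hi[cell][2]:
            hi[cell] = p
    return set(lo.values()) | set(hi.values())
```

Using a dictionary keyed by cell avoids allocating empty cells when the point set does not fill a square region.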
The analysis of the three-dimensional algorithm is
somewhat different from that of the two-dimensional
algorithm. The set S can be found in time O(n + k^2), but
computing the convex hull of S using the general algorithm requires time O(k^2 lg k^2) rather than O(k^2), since
we are unable to exploit any natural ordering of the grid
points as we were in the two-dimensional case. Thus, the
total running time of the algorithm is O(n + k^2 lg k).
Using Preparata and Hong's algorithm, the storage space
is O(k^2) beyond the space used for the n input points.

Footnote 2. We are using here the fact that a set of M points can be processed
in O(M lg M) time to yield a furthest-neighbor search time of O(lg
M). Such an algorithm uses the furthest neighbor graph as discussed
in [8, Sec. 6.6] and the searching algorithm of Lipton and Tarjan [5].

To determine the efficacy of the method, take as
unity the larger of the x or y span of the originally given
points in S. Since the diagonal of a grid cell has length
√2/k, this value is an upper bound on the distance from
the approximate hull to any point in the original set that
is external to the hull. As in Sec. 3 this bound can be
used in either an absolute or relative sense. As before,
the approximate convex hull found by this method is a
conservative one; other types of approximation are possible, by obvious generalizations of the cases discussed
in Sec. 2.
We now turn our attention to applications of this
method. A simple way to find an approximate diameter
computes all interpoint distances in S and returns the
maximum; this gives an O(n + k^4) approximate diameter
algorithm. Unfortunately, the best of the known general
algorithms for computing the diameter of a three-dimensional set of M points [9] runs in time O[(M lg M)^1.8],
and thus yields an O[n + (k^2 lg k)^1.8] approximate
diameter algorithm. Here again, it remains an intriguing
question whether the regular grid spacing of the approximate hull points can be exploited in some way to obtain
a more efficient diameter algorithm.
A similarly disappointing conclusion is reached with
regard to the hull searching problem. Indeed, all that can
be said in general is that the polyhedral surface defining
the hull is a triangulation of a set of grid points. It follows
that some cells of this grid may intersect as many as
O(k^2) faces of the polyhedron, so that no apparent
advantage originates from locating a target point in a
cell of the grid. It appears therefore that hull searching
must be solved by the standard technique, which transforms the problem under consideration to a planar point
location and is solved in O(lg k) time (see, for example,
Lipton and Tarjan [5].) Note that although lg k time is
high compared to the constant time of planar approximate hull searching, it can be very small compared to lg n.
The method that we have described to compute
approximate hulls in 2- and 3-space can easily be extended to higher dimensions. In d-space we project the
d-dimensional point set onto a (d - 1)-dimensional array
of (d - 1)-cubes with (at most) k + 2 cubes in each
dimension. The approximate hull is constructed from the
set S of minimum and maximum points in each grid cell,
which is bounded above in size by 2(k + 2)^(d-1) points.
Any point inside the set yet outside the approximate hull
is at most distance √(d - 1)/k from the hull, assuming
unity as the maximum distance between coordinate extremes. Note that a major drawback of this approach is
that S can be very large; we will consider the case of
ensuring one percent accuracy for a set of points in 4-space. We must have √(4 - 1)/k < 0.01, so k must be at least about 173.
Because the size of S is 2(k + 2)^3, it contains approximately 1.04 × 10^7 points!
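
The growth of the sample with dimension can be tabulated with a short Python sketch. It assumes the error bound is √(d - 1)/k, the diagonal of a (d - 1)-cube of side 1/k, and the function names are hypothetical. Rounding k up to the next integer gives a sample bound of the same order of magnitude as the figure quoted above.

```python
from math import ceil, sqrt

def strips_needed(d, eps):
    """Smallest integer k with sqrt(d - 1)/k <= eps."""
    return ceil(sqrt(d - 1) / eps)

def sample_bound(d, k):
    """Upper bound 2(k + 2)^(d - 1) on the size of the sample S."""
    return 2 * (k + 2) ** (d - 1)

k4 = strips_needed(4, 0.01)      # one percent accuracy in 4-space
size4 = sample_bound(4, k4)      # on the order of 10**7 points
```

The same functions show how mild the planar case is by comparison: `sample_bound(2, k)` is simply 2k + 4.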
5. Conclusions
We have discussed an approach to computing convex
hulls of point sets that quickly computes approximate
hulls of controlled accuracy. Precisely, the accuracy of
the result can be made to grow quickly with the expended
computational work. The approach yields efficient
planar algorithms for hull construction, diameter finding,
and hull searching that are quite practical. These algorithms are also novel in a theoretical context; few approximate algorithms have been studied for problems
known to require polynomial worst-case time.
A limitation of the approach is that it fails to yield
algorithms of comparable efficiency and simplicity in
three and higher dimensions. In this connection, there
remain some intriguing open problems regarding point
sets that project onto grids. Is the construction of the
convex hull of a three-dimensional point set that projects
to a rectangular grid easier than the general case? Are
such sets easier for the determination of the diameter, or
for the problem of hull searching?
In addition to the particular problems we have studied, the more general contribution of this paper is the
entire approach of giving approximate answers. The
technique we used to achieve an approximate answer is
straightforward and applicable in many contexts: we
solve a smaller problem comprised of a judiciously chosen sample of the input. The number of points in the
sample controls the accuracy of the approximation, and
the way the sample is selected controls the type of
approximation (i.e., liberal or conservative.) These
methods can prove to be a useful tool for practicing
programmers.
Acknowledgment. The authors would like to thank
M.I. Shamos for suggesting the idea of approximation
algorithms for the problems discussed in this paper.
Received 4/80; revised 10/80; accepted 6/81
References
1. Bentley, J.L. and Shamos, M.I. Divide and conquer for linear
expected time. Information Processing Lett. 7, 2 (Feb. 1978), 87-91.
2. Brown, K.Q. Geometric transforms for fast geometric
algorithms. Ph.D. Thesis, Carnegie-Mellon University, December
1979. Carnegie-Mellon Computer Science Tech. Rept. CMU-CS-80-101.
3. Devroye, L. A note on finding convex hulls via maximal vectors.
Information Processing Lett. 11, 1, 53-56.
4. Graham, R.L. An efficient algorithm for determining the convex
hull of a finite planar set. Information Processing Lett. 1, 132-133.
5. Lipton, R.J. and Tarjan, R.E. Application of a planar separator
theorem. 18th Symp. Foundations of Computer Science (Oct. 1977),
IEEE, pp. 162-170.
6. Preparata, F.P. An optimal real-time algorithm for convex hulls.
Comm. ACM 22, 7 (July 1979), 402-405.
7. Preparata, F.P. and Hong, S.J. Convex hulls of finite sets in two
and three dimensions. Comm. ACM 20, 2 (Feb. 1977), 87-93.
8. Shamos, M.I. Computational geometry. Unpublished Ph.D.
Thesis, Yale University (May 1978), New Haven, Connecticut.
9. Yao, A.C. On constructing minimum spanning trees in k-dimensional space and related problems. Stanford University
Computer Science Department Report STAN-CS-77-642 (Dec.
1977).
10. Yao, A.C. A lower bound to finding convex hulls. Stanford
University Computer Science Department Report STAN-CS-79-733
(April 1979).