Technical Notes
Programming Techniques and Data Structures
M. Douglas McIlroy* Editor

Approximation Algorithms for Convex Hulls

Jon Louis Bentley and Mark G. Faust
Carnegie-Mellon University

Franco P. Preparata
University of Illinois at Urbana-Champaign

The problem of constructing the convex hull of a finite point set in a Euclidean space arises in many applications. In this paper we study a set of algorithms for constructing approximate convex hulls. We show that an $\varepsilon$-approximate hull of n points in the plane can be constructed in $O(n + 1/\varepsilon)$ time. The planar algorithm has been implemented and is very fast on point sets that arise in practice. The method can be generalized to compute hulls of point sets in higher dimensional spaces.

CR Categories and Subject Descriptors: F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems--geometrical problems and computations; G.2.1 [Discrete Mathematics]: Combinatorics--combinatorial algorithms.

General Terms: Algorithms, Design

Additional Key Words and Phrases: convex hulls, approximation algorithms

This research was supported in part by the Office of Naval Research under Contract N00014-76-C-0370, in part by the National Science Foundation under Grant MCS-78-13642, and in part by the Joint Service Electronics Program under Contract N00014-79-C-0424.

Authors' Present Addresses: Jon Louis Bentley and Mark G. Faust, Departments of Computer Science and Mathematics, Carnegie-Mellon University, Pittsburgh, PA 15213; Franco P. Preparata, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801.

* Former editor of Programming Techniques and Data Structures of which Ellis Horowitz is the current editor.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1982 ACM 0001-0782/82/0100-0064 $00.75.

Fig. 1. A Point Set and its Convex Hull.

1. Introduction

The convex hull of a point set in Euclidean d-space is defined to be the smallest convex set containing the points; a set of points in the plane and its convex hull are illustrated in Figure 1. Because it is a precise characterization of the boundary of a point set, the convex hull is used in many applications; for example, Shamos [8] discusses its use in such areas as robust estimation, Chebyshev approximation, least-squares isotonic regression, and clustering. In addition to its inherent uses, computation of the convex hull also arises as an intermediate step in many problems in computational geometry. For all these reasons, the computation of the convex hull of a finite point set is an interesting algorithmic problem.

Much work has recently been devoted to algorithms for computing convex hulls. Graham [4] proposed the first algorithm for computing the hull of n planar points in O(n lg n) worst-case time. (A number of different worst-case algorithms have since been proposed for this problem; see [8].) Preparata and Hong [7] later gave an algorithm for computing the hull of n points in 3-space in O(n lg n) worst-case time. Shamos [8] was the first to show an $\Omega(n \lg n)$ lower bound on the problem of computing the polygon that is the boundary of the planar convex hull; his proof makes crucial use of the fact that the vertices of the polygon must appear in sorted order. Yao [10] later achieved a much stronger result showing that merely identifying the extremal points of the hull requires $\Omega(n \lg n)$ time.
Because these lower and upper bounds together establish the worst-case complexity of the problem to within a constant factor, researchers have concentrated on other aspects of the problem. A particularly interesting problem is the expected complexity of computing convex hulls. Bentley and Shamos [1] showed that the hull of n points in 2-space or 3-space could be computed in linear time for a large class of probability distributions. Devroye [3] extended their technique to a linear-time algorithm in d-space but for a much smaller class of distributions. Many other aspects of convex hulls have been discussed in the literature. Preparata [6], for example, discusses a real-time on-line algorithm for constructing planar hulls.

In this paper we take an approach to the problem of computing convex hulls that has not been explored in any of the above papers. Specifically, we will investigate the problem of computing an approximate convex hull. Our algorithm runs in worst-case time linear in the number of input points, and computes an approximation that is arbitrarily close to the true hull. This algorithm is useful for applications that must have rapid solutions, even at the expense of accuracy. Thus the algorithm is particularly appropriate, for example, in statistical applications where observations are not exact but are known to a well-defined precision; there is no reason to spend a great deal of time getting an accurate answer if that accuracy is not really present in the underlying data!

Our technique is illustrated in the planar case in Sec. 2, and we examine applications of the algorithm to a number of problems in planar computational geometry in Sec. 3. Extension of these methods to higher dimensions is then discussed in Sec. 4, and conclusions are offered in Sec. 5.

Communications of the ACM, January 1982, Volume 25, Number 1

Fig. 2. The Planar Approximate Hull Algorithm. (a) Place points in bins. (b) Find extremes in bins. (c) Build hull of extremes.

2. The Planar Algorithm

In this section we study an algorithm for computing an approximate convex hull of a planar point set. The basic idea of the algorithm is simple: we judiciously sample some subset of the points and then use the convex hull of the sample as an approximation to the convex hull of the points. The particular sampling scheme that we use is illustrated in Figure 2.

The first step of our algorithm is to find the minimum and the maximum values in the x dimension, and then divide the area between them into k equally spaced strips; see Figure 2(a). Next, for each strip, we find the two points in that strip with minimum and maximum y value (we call this set of points S); see Figure 2(b). We also include in S the points with extreme x values, and if there are several points with extreme x values, we include both their maximum and minimum by y value; there are therefore at most 2k + 4 points in S. Finally, we construct the convex hull of S, and use that as the approximate hull of the set; see Figure 2(c). Note that the resulting hull is indeed only an approximation; one of the points of the set lies outside it.

The method that we just sketched informally is easy to implement as a computer program. Finding the minimum and maximum x values is trivial, and the k strips are implemented as a (k + 2) element array (the zeroth and (k + 1)st elements of the array contain only the points with minimum and maximum x value and are present primarily as a programming convenience). To see in which strip a particular point falls, we subtract the minimum x value from the point's x value, multiply that difference by a scaling factor of k divided by the difference of the x extremes (the reciprocal of the strip width), and take the floor of the product as the strip number. To find the minimum and maximum in each strip, we iterate through the point set, and for each point check to see if it exceeds either the current maximum or minimum in the strip, and, if so, update a value in the array.
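The sampling pass described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Pascal program: the function name `strip_sample`, the representation of points as `(x, y)` tuples, and the choice to number the interior strips 1 through k (adding one to the floored product so that bins 0 and k + 1 stay reserved for the x extremes) are all hypothetical.

```python
import math

def strip_sample(points, k):
    """Return the sample S: the lowest and highest point of every strip."""
    xmin = min(x for x, _ in points)
    xmax = max(x for x, _ in points)
    # Bins 1..k are the strips; bins 0 and k+1 hold only the points whose
    # x value equals the minimum or maximum (the convenience bins).
    lowest = [None] * (k + 2)
    highest = [None] * (k + 2)
    for p in points:
        x, y = p
        if x == xmin:
            j = 0
        elif x == xmax:
            j = k + 1
        else:
            # Offset from xmin, scaled by k / (x extent), then floored; the
            # min() guards against floating-point rounding near xmax.
            j = min(k, 1 + math.floor((x - xmin) * k / (xmax - xmin)))
        if lowest[j] is None or y < lowest[j][1]:
            lowest[j] = p
        if highest[j] is None or y > highest[j][1]:
            highest[j] = p
    sample = []
    for lo, hi in zip(lowest, highest):
        if lo is not None:          # skip empty bins
            sample.append(lo)
            if hi != lo:
                sample.append(hi)
    return sample


# Hypothetical example data: seven points, two strips.
points = [(0, 0), (1, 5), (2, -3), (3, 1), (4, 0), (1.5, 2), (2.5, 4)]
S = strip_sample(points, k=2)
print(len(S), "sample points; the bound is", 2 * 2 + 4)
```

The single pass over the input touches each point once and the arrays have k + 2 entries, so the sketch uses time proportional to n + k and space proportional to k beyond the input. The sample comes out ordered by bin, which is what lets a scan-based hull routine skip a separate sort.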
We then use one of the convex hull algorithms described by Shamos [8] to compute the hull of S. (The most natural algorithm for this problem is a modified version of Graham's [4] algorithm which scans in right-to-left rather than counter-clockwise order.)

The performance of this program is easy to analyze. The space that it takes is proportional to k (if we assume the input points are already present). Finding the minimum and maximum x values requires $\Theta(n)$ time, and initializing the array can be accomplished in O(k) time. Finding the extremes in each strip requires $\Theta(n)$ time, and the convex hull of S can be found in O(k) time (because the at most 2k + 4 points in S are already sorted by x value). The running time of the entire program is therefore O(n + k).

We have just shown that we can rapidly construct some approximation to the hull, but before we can use it we must be precise about how accurate an approximation it is. One fact is easy to observe: because it uses only points in the input set as hull points, it is a conservative approximation in the sense that every point within the approximate hull is also within the true hull. We must now answer the question, "how far can a point be outside the approximate hull if it is inside the true hull?" Note that the maximum distance is realized by one of the points in the input set. Let us assume that the distance between the minimum and maximum x values is unity; this implies that the width of each strip is 1/k. It is now easy to prove the following fact.

Fact 1. Any point P in the point set that is not inside the approximate convex hull is within distance 1/k of the hull.

To prove this, consider the strip in which P falls. Because P is outside the approximate hull, it cannot have either the minimum or maximum y value of points in the strip; P's y value must therefore lie between these two extremes. Now draw a horizontal line from P to the hull; since the strip is 1/k wide, this line can be at most that long, and the fact holds.

This approximation error analysis can be used in either an absolute or a relative sense. If one wishes to have an approximation accurate to within x units, then one can use strips of width x. For a relative error analysis, we can study the ratio of the error induced by the approximation to the true diameter¹ of the point set, which we will denote by D. We will see in the next section that this is a natural measure of relative error. Note that D is at least as great as the distance between the two points furthest apart in the x dimension. We can now use Fact 1 (which assumed that the distance between x extremes is unity) to prove the following.

Fact 2. The relative error of a k-strip approximation is at most 1/k. That is, any point in the point set that is not inside the approximate hull is within distance D/k of the hull, where D is the diameter of the point set.

This fact is an immediate consequence of Fact 1. We will investigate the use of this relative error approximation in the next section.

The strip method of this section can be used to yield various approximations. The approximation we just described yields an approximate hull that is a subset of the true hull, but it is not hard to modify it to yield a superset of the true hull. To do this, we just "move the x value" of every maximal point P to be outermost in its strip (i.e., away from the point realizing the maximum y value in the point set if P is the maximum point in its strip, and similarly for the minimum points). This can be viewed as covering the points in each strip with a rectangle whose left and right sides are those of the strip and whose top and bottom are given by the minimum and maximum points. The approximate hull is then taken to be the convex hull of these covering rectangles. Because the rectangles contain all the points in their strips, the approximate convex hull properly contains the true convex hull. An error analysis similar to that above shows that any point in the approximate hull but not in the true hull must be within distance 1/k of the true hull.

Yet another variation of the strip approach is to "align" all points to be in the center of their respective strip. The resulting hull is then provably neither a superset nor a subset of the true hull. This method, however, is easier to program, more storage-efficient (since the x values of the points in S need not be stored), and has an error radius of 1/(2k), rather than 1/k.

¹ The diameter of a point set is defined as the maximum distance between any two points in the set, and is realized by two points on the convex hull of the set (see [8]).

3. Applications of the Planar Algorithm

In the last section we studied the approximation algorithm for convex hulls in an abstract setting; in this section we study its application in a more concrete context. Specifically, we study a number of computational problems that can be quickly solved by using our approximation algorithm as though it produced the exact hull.

The first problem that we will examine is that of computing the diameter of a planar point set. It is easy to prove that the diameter is realized by two points on the convex hull, and Shamos [8] has shown that, given the hull, the diameter can be found in time linear in the number of hull points. We can readily employ our approximate hull algorithm in this context: we first find the approximate hull, and then use its diameter as an approximation to the diameter of the entire set. This requires O(n + k) time for constructing the approximate hull and O(k) time for finding its diameter, for a total of O(n + k) time.
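A minimal Python sketch of the approximate diameter computation follows. It makes one deliberate simplification, which is an assumption of this sketch and not the method of the text: instead of building the hull of the sample and applying the linear-time hull diameter procedure of [8], it scans all pairs of the at most 2k + 4 sample points. The diameter of a set equals the diameter of its hull, so the value returned is the same, but the cost becomes O(n + k^2) rather than O(n + k). The function names and the test data are hypothetical.

```python
import math
from itertools import combinations

def strip_extremes(points, k):
    """Lowest and highest point in each of k strips (plus the two end bins)."""
    xmin = min(x for x, _ in points)
    xmax = max(x for x, _ in points)
    bins = {}  # bin index -> (lowest point, highest point)
    for p in points:
        if p[0] == xmin:
            j = 0
        elif p[0] == xmax:
            j = k + 1
        else:
            j = min(k, 1 + int((p[0] - xmin) * k / (xmax - xmin)))
        lo, hi = bins.get(j, (p, p))
        bins[j] = (p if p[1] < lo[1] else lo, p if p[1] > hi[1] else hi)
    return {q for pair in bins.values() for q in pair}

def approx_diameter(points, k):
    """Diameter of the sample; it can never exceed the true diameter."""
    sample = list(strip_extremes(points, k))
    # Assumption: an all-pairs scan over the sample stands in for hull
    # construction followed by a linear-time hull diameter routine.
    return max((math.dist(a, b) for a, b in combinations(sample, 2)),
               default=0.0)

def exact_diameter(points):
    """Brute-force reference over all pairs of input points."""
    return max((math.dist(a, b) for a, b in combinations(points, 2)),
               default=0.0)


# Deterministic, hypothetical test data scattered over a 101 x 97 box.
points = [((i * 37) % 101, (i * 53) % 97) for i in range(300)]
A = approx_diameter(points, k=20)
D = exact_diameter(points)
print(f"approximate diameter {A:.3f}, exact diameter {D:.3f}")
```

Because the sample consists only of input points, the approximate value is a lower bound on the exact one; comparing the two printed numbers for a few values of k gives a feel for how quickly the gap closes as the number of strips grows.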
To measure the efficacy of the approximation, let us assume that we did not find the true diameter D (which is realized by two points outside the approximate convex hull) but rather that we found the approximate diameter A, where A < D. By Fact 2, we know that both of the points realizing the true diameter are within distance D/k of the approximate hull. Applying this observation and the triangle inequality twice, we know that $D \le A + 2D/k$, which implies that $D - A \le 2D/k$. We will take as a measure of our approximation the ratio of the difference of the approximate and the true diameters to the true diameter, which is $(D - A)/D \le 2/k$, from the above inequality. Thus we know that the approximate diameter produced by this method is a lower bound on the true diameter, and not more than 2D/k less than the true diameter. We can therefore compute $\varepsilon$-approximate diameters in $O(n + 1/\varepsilon)$ time and $O(1/\varepsilon)$ storage.

To illustrate the approximate diameter algorithm, we consider the problem of finding the diameter of a point set to within one percent accuracy. For this accuracy, we let k be 200, giving a maximum of 404 points in the 200 strips. A Pascal program implementing the algorithm found the approximate diameter of a 25,000 element point set in 600 milliseconds of PDP-KL10 time. By way of comparison, a carefully coded Pascal program that finds the diameter by computing all $\binom{n}{2}$ interpoint distances and taking the maximum required approximately an hour and a half of CPU time.

The diameter approximation algorithm we have just seen produces a lower bound on the true diameter by exhibiting two points in the set with distance provably close to the true diameter. In some applications, it might be more desirable to have an upper bound on the diameter, and this can be easily achieved by the method described at the end of Sec. 2. (This involves moving each point "outward" in its strip, away from the point realizing minimum or maximum y value.) Finally, an approximation that is neither an upper nor lower bound but is within 1/k relative accuracy can be found by using only k strips and "placing" each point at the center of its cell. This approach is especially appealing in applications because the x values of the sample points need not be stored (reducing the storage required from approximately 4k reals to 2k reals).

Two algorithms for computing approximate diameter that are substantially different from the above have been given by Brown [2, Sec. 6.1]. His algorithms are based on geometric transforms and require the computation of trigonometric functions. For a given sample size, his approach yields a slight increase in accuracy over ours, but at a substantial increase in the constant factor of the run time.

We turn now from the diameter problem to the problem of convex hull searching. Precisely, we are given a planar set of points and must organize it so that we can quickly tell whether a new point is inside or outside the convex hull of the set. We will investigate an approximation algorithm for convex hull searching that returns one of the three answers "yes, the point is definitely inside the hull," "no, the point is definitely outside the hull," or "I do not know whether the point is inside or outside the hull, but it is within $\varepsilon$ of the hull boundary."

Our first algorithm is based on the approximate hull algorithm that builds a subset of the true hull. We construct the hull as before, in O(n + k) time. Consider now the shape of the intersection of the approximate hull and any particular vertical strip used to make the hull; the top of the strip is bounded by one or two lines, as is the bottom. We can therefore describe either the top or the bottom by at most four reals (giving the values of the two y intercepts, and the possible point at which the value changes).
To perform a hull search for point P, we first locate the strip in which it lies in constant time by subtracting, multiplying, and taking the floor function. We then use the stored values to see if it lies within the approximate hull, and if so, we correctly report that the point is inside the true hull. If it is further from the hull than the guaranteed error radius, then we correctly report that the point is outside the hull. If neither of these conditions holds then we respond that we do not know whether the point is inside or outside the hull. This method requires O(n + k) time to organize the set and constant time to answer a query. (Note that just as in the case of computing diameter, we could use other approximations to the hull.)

As a final application of these approximation techniques, we mention the problem of computing all furthest neighbors in a point set. Specifically, for each point in the set we must tell which point in the set is furthest from it. Since such a furthest neighbor must be a convex hull point, given an approximate convex hull of k points we can locate the approximate furthest neighbor of a new point in lg k time.² This method gives us the theoretical result that we can find all furthest neighbors in a set in O[(n + k) lg k] time, with each reported neighbor within D/k of the true furthest neighbor, where D is the diameter of the point set.

² We are using here the fact that a set of M points can be processed in O(M lg M) time to yield a furthest-neighbor search time of O(lg M). Such an algorithm uses the furthest neighbor graph as discussed in [8, Sec. 6.6] and the searching algorithm of Lipton and Tarjan [5].

4. Extensions to Higher Dimensions

In previous sections we have studied approximation algorithms for planar convex hulls; in this section we turn our attention to convex hulls in d-space, where d > 2. We first study the case that d = 3, and then consider greater values of d.

Our algorithm for computing an approximate convex hull of a set of points in 3-space is an extension of the planar algorithm described in Sec. 2. An initial sampling pass finds the minimum and maximum values in both the x and the y dimensions; the resulting rectangle is then partitioned into (at most) a (k + 2) × (k + 2) grid of squares with lines parallel to the x and y axes. (We use k of the squares in each dimension for "real" points in squares, and the two squares on the end for the extreme values.) Note that one of the dimensions can be less than k + 2 if the point set does not happen to occupy a square region. In each square of the grid we find the points with minimum and maximum z values; the resulting set of points, called S, consists of at most $2(k + 2)^2$ points. Finally, we construct the convex hull of S and use it as the approximate hull of the original set. The hull is constructed using Preparata and Hong's [7] general hull algorithm for point sets in 3-space. Notice the striking difference with respect to the two-dimensional method, where we were able to take advantage of the known ordering of the points in the x dimension: we do not exploit the grid structure of the points to yield a faster algorithm, but rather use a general algorithm. An intriguing question is whether the regular placement of the points of S in a two-dimensional grid can be used to obtain an algorithm more efficient than the general one.

The analysis of the three-dimensional algorithm is somewhat different from that of the two-dimensional algorithm. The set S can be found in time $O(n + k^2)$ but computing the convex hull of S using the general algorithm requires time $O(k^2 \lg k^2)$ rather than $O(k^2)$ since we are unable to exploit any natural ordering of the grid points as we were in the two-dimensional case. Thus, the total running time of the algorithm is $O(n + k^2 \lg k)$. Using Preparata and Hong's algorithm, the storage space is $O(k^2)$ beyond the space used for the n input points.

To determine the efficacy of the method, take as unity the larger of the x or y span of the originally given points in S. Since the diagonal of a grid cell has length $\sqrt{2}/k$, this value is an upper bound on the distance from the approximate hull to any point in the original set that is external to the hull. As in Sec. 3 this bound can be used in either an absolute or relative sense. As before, the approximate convex hull found by this method is a conservative one; other types of approximation are possible, by obvious generalizations of the cases discussed in Sec. 2.

We now turn our attention to applications of this method. A simple way to find an approximate diameter computes the distances between all pairs of points in S and returns the maximum; this gives an $O(n + k^4)$ approximate diameter algorithm. Unfortunately, the best of the known general algorithms for computing the diameter of a three-dimensional set of M points [9] runs in time $O[(M \lg M)^{1.8}]$, and thus yields an $O[n + (k^2 \lg k)^{1.8}]$ approximate diameter algorithm. Here again, it remains an intriguing question whether the regular grid spacing of the approximate hull points can be exploited in some way to obtain a more efficient diameter algorithm.

A similarly disappointing conclusion is reached with regard to the hull searching problem. Indeed, all that can be said in general is that the polyhedral surface defining the hull is a triangulation of a set of grid points. It follows that some cells of this grid may intersect as many as $O(k^2)$ faces of the polyhedron, so that no apparent advantage originates from locating a target point in a cell of the grid.
It appears therefore that hull searching must be solved by the standard technique, which transforms the problem under consideration to a planar point location problem and is solved in O(lg k) time (see, for example, Lipton and Tarjan [5]). Note that although lg k time is high compared to the constant time of planar approximate hull searching, it can be very small compared to lg n.

The method that we have described to compute approximate hulls in 2- and 3-space can easily be extended to higher dimensions. In d-space we project the d-dimensional point set onto a (d - 1)-dimensional array of (d - 1)-cubes with (at most) k + 2 cubes in each dimension. The approximate hull is constructed from the set S of minimum and maximum points in each grid cell, which is bounded above in size by $2(k + 2)^{d-1}$ points. Any point inside the set yet outside the approximate hull is at most distance $(d - 1)^{1/2}/k$ from the hull, assuming unity as the maximum distance between coordinate extremes.

Note that a major drawback of this approach is that S can be very large; we will consider the case of ensuring one percent accuracy for a set of points in 4-space. We must have $(4 - 1)^{1/2}/k < 0.01$, so $k \ge 173$. Because the size of S is $2(k + 2)^3$, it contains approximately $1.04 \times 10^7$ points!

5. Conclusions

We have discussed an approach to computing convex hulls of point sets that quickly computes approximate hulls of controlled accuracy. Precisely, the accuracy of the result can be made to grow quickly with the expended computational work. The approach yields efficient planar algorithms for hull construction, diameter finding, and hull searching that are quite practical. These algorithms are also novel in a theoretical context; few approximate algorithms have been studied for problems known to require polynomial worst-case time. A limitation of the approach is that it fails to yield algorithms of comparable efficiency and simplicity in three and higher dimensions.
In this connection, there remain some intriguing open problems regarding point sets that project onto grids. Is the construction of the convex hull of a three-dimensional point set that projects to a rectangular grid easier than the general case? Are such sets easier for the determination of the diameter, or for the problem of hull searching?

In addition to the particular problems we have studied, the more general contribution of this paper is the entire approach of giving approximate answers. The technique we used to achieve an approximate answer is straightforward and applicable in many contexts: we solve a smaller problem composed of a judiciously chosen sample of the input. The number of points in the sample controls the accuracy of the approximation, and the way the sample is selected controls the type of approximation (i.e., liberal or conservative). These methods can prove to be a useful tool for practicing programmers.

Acknowledgment. The authors would like to thank M.I. Shamos for suggesting the idea of approximation algorithms for the problems discussed in this paper.

Received 4/80; revised 10/80; accepted 6/81

References

1. Bentley, J.L. and Shamos, M.I. Divide and conquer for linear expected time. Information Processing Lett. 7, 2 (Feb. 1978), 87-91.
2. Brown, K.Q. Geometric transforms for fast geometric algorithms. Ph.D. Thesis, Carnegie-Mellon University, December 1979. Carnegie-Mellon Computer Science Tech. Rept. CMU-CS-80-101.
3. Devroye, L. A note on finding convex hulls via maximal vectors. Information Processing Lett. 11, 1, 53-56.
4. Graham, R.L. An efficient algorithm for determining the convex hull of a finite planar set. Information Processing Lett. 1, 132-133.
5. Lipton, R.J. and Tarjan, R.E. Application of a planar separator theorem. 18th Symp. Foundations of Computer Science (Oct. 1977), IEEE, pp. 162-170.
6. Preparata, F.P. An optimal real-time algorithm for convex hulls. Comm. ACM 22, 7 (July 1979), 402-405.
7. Preparata, F.P. and Hong, S.J. Convex hulls of finite sets in two and three dimensions. Comm. ACM 20, 2 (Feb. 1977), 87-93.
8. Shamos, M.I. Computational geometry. Unpublished Ph.D. Thesis, Yale University (May 1978), New Haven, Connecticut.
9. Yao, A.C. On constructing minimum spanning trees in k-dimensional space and related problems. Stanford University Computer Science Department Report STAN-CS-77-642 (Dec. 1977).
10. Yao, A.C. A lower bound to finding convex hulls. Stanford University Computer Science Department Report STAN-CS-79-733 (April 1979).