Volume 0 (1981), Number 0 pp. 1–13 High Performance GPU-based Proximity Queries using Distance Fields T. Morvan1 and M. Reimers2 and E. Samset1,3 1 Interventional Centre, Faculty of Medicine, University of Oslo, Norway of Mathematics for Applications, University of Oslo, Norway 3 Interventional Centre, Rikshospitalet Medical Centre, Norway 2 Centre Abstract Proximity queries such as closest point computation and collision detection have many applications in computer graphics, including computer animation, physics-based modeling, augmented and virtual reality. We present efficient algorithms for proximity queries between a closed rigid object and an arbitrary, possibly deformable, polygonal mesh. Using graphics hardware to densely sample the distance field of the rigid object over the arbitrary mesh, we compute minimal proximity and collision response information on the GPU using blending and depth buffering, as well as parallel reduction techniques, thus minimizing the readback bottleneck. Although limited to image-space resolution, our algorithm provides high and steady performance when compared with other similar algorithms. Proximity queries between arbitrary meshes with hundreds of thousands of triangles and detailed distance fields of rigid objects are computed in a few milliseconds at high sampling resolution, even in situations with large overlap. Categories and Subject Descriptors (according to ACM CCS): I.3.5 [Computer Graphics]: Geometric Algorithms; I.3.7 [Computer Graphics]: Animation, virtual reality 1. Introduction Proximity algorithms such as collision detection have been subject to intensive research during the past decades. Efficient algorithms have been developed, but many challenges remain, especially in the domain of fast proximity queries between deformable objects. We are motivated by safety aspects in surgical applications such as robot- and imageguided surgery where collision or proximity between robotic arms, surgical instruments and critical anatomical structures has to be detected and relevant response such as haptic feedback must be computed. Usually, these applications involve proximity computations between two rigid models or between one rigid and one deformable model. 1.1. Main Contributions In this paper we present algorithms for collision detection and proximity queries between a rigid closed object and an arbitrary polygonal object such as a deformable triangle mesh. We use a signed distance field as the representation for c The Eurographics Association and Blackwell Publishing 2008. Published by Blackwell Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA. the rigid object and exploit the rasterization and texture mapping capabilities of the Graphics Processing Unit (GPU) to sample the distance field over the arbitrary polygonal mesh. Such a GPU-based sampling limits the precision of our algorithm to framebuffer resolution but allows for fast generation and processing of many samples. Depth buffering, blending and GPU parallel reduction techniques are used to produce compact proximity and collision response information in the form of most penetrating or closest points and global penetration forces and torques in the case of rigid objects. We apply the latter to perform dynamic simulation of rigid objects. We also present two optimizations: one using early rejection of fragments based on their depth values, the other using the geometry shader to reduce the number of rendering passes. Our algorithms offer the following benefits as compared to earlier approaches: Low number of rendering passes: Our algorithms perform at most three rendering passes for each test. Compact and global proximity information: We compute compact collision response and characterization T. Morvan, M. Reimers & E. Samset / High Performance GPU-based Proximity Queries using Distance Fields information directly on the GPU: two different techniques compute either the closest or most penetrating point between the objects, or global penetration force and torque characterizing the penetrating volume. Readback minimization The compact information produced by our algorithms leads to minimal readbacks, reducing one of the main bottlenecks in GPU-based algorithms, and allowing our algorithm to be faster than other GPU implementations which read back the whole framebuffer. Small dependence on object configuration: Our approach is less influenced by the relative configuration of objects than bounding volume hierarchy methods since we always process all visible triangles. 2. Background and Related Work 2.1. Collision Detection and Proximity Queries Most collision detection techniques rely on the use of bounding volume hierarchies to quickly cull away groups of primitives from the elementary collision tests. These approaches work particularly well for rigid objects where tight-fitting bounding volumes such as oriented bounding boxes [GM96] can be used. When handling deformable objects, however, the hierarchies have to be updated at each frame, which introduces an additional cost. As a result, less optimal hierarchies are often used in deformable cases, such as axis-aligned bounding boxes (AABBs) [vdB97] or sphere trees [Qui94]. Bounding volume hierarchies become less efficient in situations where many primitives are in contact, since a large number of nodes have to be tested for overlap. An overview of algorithms for collision detection and proximity queries can be found in recent surveys [TKH∗ 05, LM03]. 2.2. Distance Fields A distance field is a scalar field that represents the distance to a geometric object from any point in space. A solid object can be represented as a signed distance field where the sign of the distance function is negative in the interior region of the object and positive outside. Distance fields have been used for proximity query algorithms due to the straightforward distance computations they provide. They have been used to detect collision and proximity between two rigid objects [GBF03], a deformable object and a rigid object [FSG03, BJ07], a rigid object and a particle system [KSW04], and two deformable objects [HZLM02, SGGM06]. An overview of techniques and applications using distance fields can be found in [JBS06]. Some of the main drawbacks of distance fields are their long computation time and their memory requirements. However, recent algorithms using the parallel processing power of graphics hardware are able to compute distance fields at near interactive rates [SPG03, SOM04, SGGM06]. 2.3. GPU-based Algorithms Over the recent years, an increasing number of algorithms using graphics hardware for collision detection have been developed. The vast majority of these algorithms render the objects along a number of selected views. Some techniques involve ray-casting and use the depth and stencil buffers to detect intersections between solid objects [SF91, RMS92, KP03]. Heidelberger et al. [HTG03, HTG04] use the stencil and depth buffers to generate “Layered Depth Images” for closed surfaces and use them to determine volumes of intersection and to perform vertex-in-volume tests. Other techniques compute a distance field along slices of a 3D grid, and then perform collision detection on these slices [HZLM02, SGGM06]. A recent technique builds on these two previous algorithms, and performs N-body distance queries by computing 2nd order Voronoi diagrams on the GPU [SGG∗ 06]. Govindaraju et al. [GLM05] use occlusion-based culling to compute potential colliding sets of objects or primitives. They further added mesh decomposition to their algorithm to perform continuous collision detection between deformable models [GKLM07]. Many of these algorithms involve one or several framebuffer readbacks from the GPU to main memory. Such readbacks are one of the major bottlenecks of current graphics hardware and several techniques try to minimize them [KP03, HTG04, GLM05]. Another drawback of many of these algorithms is a large number of rendering passes. Some GPU-based algorithms do not rely on rendering polygonal meshes for proximity queries. Greß et al. [GGK06] perform collision detection between deformable parametrized surfaces by updating and traversing bounding volumes hierarchies on the GPU using nonuniform stream reduction. Galoppo et al. [GOM∗ 06] perform texture-based collision detection on objects modeled as a rigid core covered by a deformable layer. 2.4. Collision Response One of the main applications of collision detection is physics-based simulation of virtual objects, i.e. computing realistic motions based on the laws of physics. Such simulators are usually divided in three components: dynamic simulation, collision detection and collision response/contact handling. Physics-based simulation of rigid bodies has been extensively studied and several approaches are available to solve the problem of collision response and contact handling: Constraint-based methods avoid interpenetration by computing constrained contact forces [Bar94]. Impulse-based methods as in [MC95] apply impulses on velocities to resolve resting contacts and collisions. c The Eurographics Association and Blackwell Publishing 2008. T. Morvan, M. Reimers & E. Samset / High Performance GPU-based Proximity Queries using Distance Fields Penalty-based methods introduce damped springs at points of penetration between the objects, producing a force proportional to the amount of penetration at this point [MW88, HS04, OL05]. Constraint- and impulse-based methods produce stable and realistic motions and avoid interpenetrations. Their complexity is however highly dependent on the number of contacts. Moreover, parallelization of such methods is not straightforward. Penalty-based methods, on the other hand, are simple to implement and easily parallelizable. Nevertheless, they have several drawbacks: they might lead to stiff equations of motion, requiring small time steps for stable integration. Implicit integration [OL05] can help alleviate this problem. Furthermore, correct simulation of both light and heavy objects might require tuning of the parameters. Moreover, discontinuities in the positions and number of detected contact points affect the stability of the simulation. Finally they allow some interpenetration between objects. 3. Main Algorithms 3.1. Overview Throughout this paper we will use bold-face type to distinguish vector quantities from scalars. We will consider proximity queries and collision detection between : • The surface M of a rigid, solid object. • An arbitrary, possibly deformable, polygonal mesh denoted N. M partitions space into a bounded interior region and an unbounded exterior region. We define DM to be the signed distance field with respect to M, i.e. for all points p in space we have DM (p) = sgn(p) min ||x − p||, x∈M where || · || is the Euclidean norm and ( −1 if p is in the interior region of M; sgn(p) = 1 otherwise. We next gather the required proximity information from the computed fragments. The nature of current graphics hardware imposes limitations on how the proximity information at each point in N S can be processed. We therefore introduce two techniques to produce compact and minimal proximity information. The first technique uses the depth buffer and the classic parallel reduction technique to sort the distance values and extract from them the closest or most penetrating point of N S into M, and the associated local penetration depth or separation distance. Our second technique computes collision response directly on the GPU in the form of penalty forces. At each penetrating point of N S a local penalty force is computed, characterizing its penetration and the surface area of N around the point. These forces are then combined into global penalty force and torque at the center of gravity of the object using the blending functionality of graphics hardware and parallel reduction. The global penalty force and torque correspond to the integrated local penalty forces over the penetrating area of N into M. We finally present two optimizations. The first one is intended as an optimization of the collision detection algorithm. It uses a depth-only rendering of M and the early-Z culling capabilities of graphics hardware to quickly reject points of N S which are not penetrating M. The second optimization uses the geometry shader to sample N in a single rendering pass. 3.2. Image-space Sampling of Meshes (1) We use the GPU to perform a dense and uniform sampling of N through rasterization in real-time, avoiding precomputation of the samples [BJ07] or problems related to vertexbased sampling [FSG03] as illustrated in Figure 1(a). (2) Rendering a triangular mesh produces a dense and efficient sampling of the part of the mesh facing the viewing direction in the form of fragments, which are projected onto pixels in a framebuffer, as illustrated in Figure 1(b). A dense and relatively uniform sampling of the whole mesh can be produced by rendering it along orthogonal directions, see Figure 1(c). This image-space sampling is less affected by deformations of the mesh than object-space sampling based on vertices. We therefore perform three orthographic renderings of N in ΩM along its three orthogonal axes as seen in Figure 1(d). For any type of triangular mesh N these three renderings will sample all triangles of N to image-space precision. Our first task is to compute an approximation of DM on a regular grid over a 3D domain ΩM , typically an expanded bounding box for M. We used the pseudo-angle normal and the acceleration technique based on a hierarchy of oriented bounding boxes as presented in [BA05]. Since this is an offline process, performance is not critical at this stage. In addition to the signed distance to M, we compute at each vertex in the grid the direction to the closest point on M. We then store these values in a 4-component floating point 3D texture on the GPU. The direction is stored in the RGB color channels, whereas DM is stored in the alpha channel. To perform proximity queries between M and N we evaluate DM over N. We use graphics hardware to sample N from c The Eurographics Association and Blackwell Publishing 2008. three orthogonal views, generating a set of image-space samples or fragments N S . We then use the texturing capabilities of the GPU to evaluate DM at each point of N S using a fragment shader program. In this manner we obtain a fairly uniform sampling over N of the distance function DM . Graphics hardware 3D texture mapping capabilities provide fast and efficient trilinear interpolation, thus allowing to compute DM and the associated direction vector at any T. Morvan, M. Reimers & E. Samset / High Performance GPU-based Proximity Queries using Distance Fields viewing direction M d5 viewing direction . (a) d4 (b) d3 d2 d1 d1 Μ Ω d4 d3 d2 view 1 d5 N view 1 (a) view 2 view 2 (c) d5 view 3 (d) point of ΩM . Once the proximity data is known at each point of N S , we need to compute relevant information from these values. In the next two sections we present two techniques to efficiently produce minimal and global proximity and collision information from the computed fragments, using the fragment shader. 3.3. Proximity Queries Using Depth Buffering Our first algorithm computes a point of N S with the minimum signed distance to M, i.e. the closest (when the objects are disjoint) or most penetrating (when the objects are intersecting) point. We process the distance information on the GPU, using depth buffering. The depth buffer provides the ability to sort the incoming fragments projecting onto the same pixels according to their depth. We first set the depth test to pass when a fragment has a smaller depth value than the one present in the depth buffer and clear the depth buffer with the maximal depth value of 1. This ensures that the first fragment rendered at each pixel position passes the depth test. We perform the three orthogonal renderings described previously and compute at each fragment a depth value corresponding to the signed distance at this fragment. Let dmax and dmin be the maximum and minimum distance field values. We first assign at each fragment pi the world position of this fragment in the RGB color channels. We then fetch the distance field value DM (pi ) from the 3D texture containing DM and compute the depth value DM (pi ) − dmin dmax − dmin d5 d4 Figure 1: (a) The vertices of a mesh can yield an irregular and non-uniform set of samples. (b) The faces whose normal is close to the viewing direction are densely sampled using the GPU. (c) Rendering a mesh along several orthogonal directions provides dense sampling for every face. (d) The part of N lying inside ΩM is rendered along the three orthogonal axes of ΩM . zi = (b) (3) d2 d3 d2 d2 (c) Figure 2: (a) While rendering N, the value of DM is computed at each fragment. (b) The fragments are sorted in depth according to their distance to M. (c) The minimum value is extracted form the framebuffer using parallel reduction. which clearly yields zi ∈ [0, 1]. The fragments produced in this way are ordered according to their distance values as illustrated in Figure 2(a) and 2(b). At the end of each rendering pass, each pixel contains the world position of the point (fragment) of N S projecting onto this pixel which has the smallest signed distance value. We reuse the same depth buffer for each of the rendering passes since this allows us to reuse the depth values produced in the previous passes to reject fragments which have no chance to have the minimum signed distance. At the end of the three rendering passes, each pixel corresponds to a candidate for the closest or most penetrating point of N S with respect to M. Next, the parallel reduction technique [Har05] is used to perform minimum reduction of the frame buffer. During each pass, groups of four neighboring pixels are collapsed into one single pixel corresponding to the one with the smallest depth value as in Figure 2(c). Since depth buffer precision is not linear, and often lower than the framebuffer precision, we mirror the depth buffer into the alpha channel of the framebuffer to perform the comparisons: we clear the alpha values to 1 before performing the renderings, and at each fragment produced we copy the depth value into the alpha channel. After log2 (n) passes (for an initial buffer of n2 pixels), we obtain a single pixel containing either the closest or most penetrating point of N S to M and its associated signed distance, which we can finally read back to main memory c The Eurographics Association and Blackwell Publishing 2008. T. Morvan, M. Reimers & E. Samset / High Performance GPU-based Proximity Queries using Distance Fields ni Pi view 1 Si viewing direction N . view 2 Figure 4: Normal and area of a fragment. Figure 3: Fragments are projected into different views according to their normals. acting at qi is then fi = f i P + f i D . using minimal bandwidth. It is also possible to get the corresponding closest point on M by rendering it to a second render target. This force can also be expressed at the center of gravity gM of the rigid object M as the same force fi together with a penalty torque ti = (qi − gM ) × fi , 3.4. Global Penalty Forces for Collision Response Our second algorithm follows an approach similar to the one presented in the previous section but focuses on detecting and responding to collisions between M and N. It computes penalty forces on the GPU from the penetration depths and velocities at penetrating points, in order to untangle collisions. We perform the three orthogonal renderings described in section 3.2. However, we try to avoid rendering any fragment twice by only rendering the fragments for which the largest coordinate of their normal in eye space is along the viewing direction in each rendering pass, as seen in Figure 3. This also ensures that all fragments rendered in each pass are approximately facing the viewing direction. Using texture lookups, we get at each penetrating fragment pi of N S , the closest point qi on M from the distance field. Since we are only interested in collisions, we reject non-penetrating fragments, i.e. whose distance DM (pi ) is positive. We also have the velocities vp i and vq i of pi and qi as additional information. We then compute a local penalty force acting at qi to push M out of collision P N N fi = k(pi − qi ) + b(vp i − vq i ), N (4) N where vp i and vq i are the respective components of vp i and vq i along the vector pi − qi (normal velocities), and k and b are spring and damping constants. We can additionally compute a dynamic friction force acting at qi fi D = µD ||fi P || vp i T − vq i T , ||vp i T − vq i T || (5) where vp i T and vq i T are the respective components of vp i and vq i orthogonal to pi − qi (tangential velocities), and µD is the dynamic friction coefficient. The total penalty force c The Eurographics Association and Blackwell Publishing 2008. (6) (7) as depicted in Figure 5(a). Note that if N is also a rigid object, it is possible to compute similar forces and torques acting at the center of gravity of N to push it out of collision. Let us assume that at each fragment pi produced, the projection of the surrounding surface occupies the whole corresponding pixel. The area Pi of a pixel is constant for all fragments of a given rendering pass but might vary across rendering passes due to different dimensions of ΩM . Let us also denote by ni = (nix , niy , niz ) the interpolated normal at pi expressed in viewing coordinates, i.e. with z along the viewing direction. In this situation the surface around pi is a parallelogram with area Si = Pi , |niz | (8) see Figure 4. We then assign to the fragment pi the areaweighted penalty force applied at qi fiW = Si fi = Pi fi |niz | (9) and area-weighted penalty torque tiW = Si ti = (qi − gM ) × fiW . (10) W The area-weighted penalty force fi can be associated to a small element of penetrating volume between pi and qi . These area-weighted forces and torques are stored in the RGB channels of two render targets. We additionally store Si in the alpha channel. Given several fragments projecting onto the same pixel, we are confronted with the same problem as in the previous section: merging the information of these fragments. The blending functionality of graphics hardware allows us to combine the color value of the current fragment with the value present in the framebuffer. In particular, additive T. Morvan, M. Reimers & E. Samset / High Performance GPU-based Proximity Queries using Distance Fields blending sums the color values. We initialize the framebuffer with zeros, enable additive blending and perform the three orthogonal rendering passes. Each pixel in the framebuffer will then contain the total area of the penetrating fragments projecting to this position in the alpha channel, as well as vectors representing the sum of the area-weighted penetration forces and torques of each fragment in the RGB channels. We use the same framebuffer across all three rendering passes to accumulate the result. We further sum the values contained in the framebuffer using a parallel reduction technique similar to the one outlined in the previous section, see Figure 5(b). The results are the accumulated penalty force at gM FS = n n i=0 i=0 ∑ fiW = ∑ Si fi pi M viewing direction qi fi ti fi gM N (a) (11) and accumulated penalty torque at gM TS = n n i=0 i=0 ∑ tiW = ∑ Si ti (12) in the RGB channels of each render target as well as the total penetration area n ∑ Si S= (13) i=0 in the alpha channel of the resulting pixel in our framebuffer. We read back FS , TS and S to main memory. We then compute the global penalty force at gM F= FS , S (14) and global penalty torque at gM (b) Figure 5: (a) At each fragment, a penalty force and torque acting on the center of gravity are computed. These are then summed using additive blending, yielding one penalty force and torque at each pixel. (b) The pixel penalty forces and torques are summed up to produce the accumulated penetration force and torque. a given direction, we can determine if a fragment is not penetrating by comparing its depth value with the values obtained by rendering M along the same direction. 4.1. Early-Z Culling As a preliminary step, after computing DM , we perform two depth-only renderings of M along the three axes of ΩM . One of the renderings is performed with an initial depth buffer cleared with the maximum value of 1, and a depth test set to pass if an incoming fragment depth is smaller than the one present in the depth buffer. In this way we obtain a depth buffer containing at each pixel either the minimal depth of M, or 1 for pixels onto which M does not project (Figure 6(a)). The second rendering is performed with an initial depth buffer cleared with the minimum value of 0 and a depth test set to pass if an incoming fragment depth is larger than the current depth. This produces a depth buffer containing at each pixel either the maximum depth of M or 0 where M does not project (Figure 6(b)). We store these six depth buffers on the GPU. The collision detection and response algorithm described in Section 3.4 rejects fragments corresponding to sampled points which are exterior to M. Early-Z culling is a functionality present on most current GPUs which provides a way of discarding fragments produced by rasterization before they are processed, by comparing their depth value with the value already present in the depth buffer. When rendering N along For each rendering of N along one of the axes of ΩM , we bind one of the corresponding precomputed depth buffers, enable the depth test and disable writing to the depth buffer. If the depth buffer containing the minimum depth of M is bound, we set the depth test to pass if an incoming fragment depth is larger than the current depth. In this way we cull fragments which are "in front" of M with respect to the TS , (15) T= S which can be considered as area weighted averages of the local penetration forces and torques. These global penalty forces and torques can then be applied to M in a dynamic simulation to resolve collisions. 4. Optimizations In this section, we present two techniques for improving the performance of our algorithms. c The Eurographics Association and Blackwell Publishing 2008. T. Morvan, M. Reimers & E. Samset / High Performance GPU-based Proximity Queries using Distance Fields viewing direction M viewing direction M max depth min depth (a) (b) viewing direction viewing direction min depth culled region culled fragments N (c) max depth culled region culled fragments shader. For a given incoming triangle, the geometry shader computes the triangle normal in the local coordinate frame of ΩM from the incoming positions. We then choose the viewing direction for this triangle which corresponds to the axis of ΩM onto which the normal has the largest projection (largest coordinate). We finally output a triangle whose vertices contain the projected coordinates corresponding to the chosen viewing direction and texture coordinates corresponding to the positions in the local coordinate frame of ΩM . This optimization allows us to produce N S in a single rendering pass instead of three passes as in Section 3.2. N 5. Performance and Comparison (d) Figure 6: M is rendered twice in depth along the viewing direction to obtain two buffers containing its minimal (a) and maximal (b) depth. These buffers are then used to cull fragments of N "in front of" (c) or "behind" (d) M with respect to the viewing direction. viewing direction, or which correspond to a pixel position where M does not project (Figure 6(c)). Conversely, if the depth buffer containing the maximal depth of M is bound, the depth test is set to pass for fragments with smaller depth than the current depth and fragments "behind" M along the current viewing direction are culled, see Figure 6(d). The depth buffer to be bound (maximum or minimum depth) can be determined by looking at the position of the bounding box of N relative to ΩM along the viewing direction, since this gives us a hint as to which mesh is "behind" the other along the viewing direction. It should be noted that this optimization is only possible for the algorithm presented in Section 3.4 since the algorithm of Section 3.3 writes the depth of the fragments which disables Early-Z culling. 4.2. Geometry Shader We present an additional optimization using the geometry shader, which is a recently introduced programmable stage in the rendering pipeline [Bly06]. It occurs after primitive assembly, takes a single primitive as input and generates one or several primitives. We use this stage to sample N in a single rendering pass. We first upload to the GPU three projection matrices corresponding to the orthogonal renderings described in Section 3.2 along the three axes of ΩM . We then render N, and for each of its vertices, compute its projections along the three orthogonal rendering views in the vertex shader, as well as the position of the vertex in the local coordinate frame of ΩM . These positions are passed on to the geometry c The Eurographics Association and Blackwell Publishing 2008. In this section we discuss the performance and characteristics of our algorithms and compare it to similar algorithms. We implemented them on a PC running Linux with an Athlon 64 3800+ X2 CPU with 2 GB of memory and an NVIDIA GeForce 8800 GTX GPU connected via a 16x PCI Express bus. We use the Coin and OpenGL graphics APIs to render polygonal models, and GLSL to program our shaders. The whole graphics pipeline (textures, color, depth buffers, etc.) is in 32 bits floating point precision. We store DM in a 256x256x256 3D texture. The size of the distance field texture appeared to have little influence on performance. 5.1. Comparison with CPU-based methods We benchmark our application using an approach similar to the one presented in [Zac98]: given two identical models at the same position and orientation, we first translate one of them along the x axis until both models are not colliding but in close proximity. In order to benchmark the collision detection algorithm presented in Section 3.4, we then gradually bring the models closer together along the x axis. For each step we rotate the objects entirely around the y and z axes, performing collision detections for 720 different orientations (Figure 7). We then average the collision detection time for all orientations at a given translation step. We use a similar approach to benchmark the proximity query algorithm presented in Section 3.3, but we pull the objects further away instead of closer together. We measure the performance in terms of the average computation time over the whole range of configurations. We compare the performance of our algorithm with the SOLID collision detection library which uses AABBs to detect collisions and proximity [vdB97], as well as a distance field proximity algorithm running on the CPU and using vertex-based sampling. Since there is no algorithm in SOLID which corresponds exactly to the one described in Section 3.4, we compare it to an algorithm that detects collisions between two polygonal models, computes a penetration depth at each colliding couple of primitives and returns the maximum value. We compare the proximity algorithm of Section 3.3 to a similar algorithm in SOLID which computes T. Morvan, M. Reimers & E. Samset / High Performance GPU-based Proximity Queries using Distance Fields y collision time (msec) 60 SOLID deformable CPU distance fields SOLID rigid Our algorithm 50 40 30 20 10 0 0 0.4 0.6 0.8 1 distance z. Figure 7: Benchmarking of our collision detection algorithm: the green model is progressively translated towards the blue along the x axis and rotated around the y and z axis at each translation step. Collision detection is performed for each orientation and the collision time is averaged at each step. the closest point between two polygonal models and the associated separation distance. We measure the performance of SOLID in the case where the two objects are considered rigid, but also in the case where one of them is considered deformable, updating the AABB hierarchy at each frame. When benchmarking the vertex-based distance field algorithm, we compute the sum of penetrating vectors at each vertex for collision detection and the vertex with the minimum signed distance for proximity queries. A framebuffer size of 1024x1024 was used for all benchmarks. Figure 8 shows the collision time for the bunny model (69 451 triangles) for the different collision detection algorithms. The initial distance of 1 corresponds to objects in close proximity and the distance 0 corresponds to coincident objects. The performance of our algorithm stays fairly constant at around 1.8 millisecond per query as distance diminishes and the number of contacts increase, whereas SOLID shows a significant performance drop due to the fact that many AABBs overlap and a large number of contacts have to be processed. For distances close to 1, the models are in close proximity and few contacts are processed. In this situation SOLID is able to efficiently prune computations and becomes more performant than our algorithm for rigid objects. Our algorithm is also consistently more performant than the vertex-based distance field method. Figure 9 shows the query time for computing the closest point between two bunny models at various distances using the different proximity algorithms. The initial distance of 1 correspond to objects in close proximity and a distance of 2 corresponds to objects separated by a distance approximately equal to the length of their bounding box. Again the performance of our algorithm varies little with configuration. . Figure 8: Benchmarking of the collision detection algorithm in Section 3.4 for the bunny model. 60 SOLID deformable CPU distance fields SOLID rigid Our algorithm 50 query time (msec) x 0.2 40 30 20 10 0 1 1.2 1.4 1.6 distance 1.8 2 . Figure 9: Benchmarking of the proximity query algorithm in Section 3.3 for the bunny model. This time our algorithm outperforms both SOLID and the vertex-based distance field algorithm for all configurations and types of objects. This is due to the fact that AABB pruning is less efficient for proximity queries than for collision detection. Compared to the SOLID collision detection algorithm, our methods maintain steady performance even in complex contact scenarios. This is a result of the fact that our algorithm processes all primitives at every step. In the case of close proximity between rigid objects, few AABBs overlap and SOLID is more performant than our algorithm. However, under the assumption of one rigid and one deformable object, our algorithm is likely to outperform SOLID for any configuration due to the cost of updating the AABB tree at each frame. 5.2. Comparison with GPU-based methods Unfortunately no direct comparison was possible due to lack of access to GPU implementations of proximity query algorithms. It is however reasonable to believe that, c The Eurographics Association and Blackwell Publishing 2008. T. Morvan, M. Reimers & E. Samset / High Performance GPU-based Proximity Queries using Distance Fields Readback time (msec) 3 33 345 2701 5423 10849 Table 1: Readback time of uniform 3D grids from the GPU to main memory for different grid sizes 1.95 basic early-Z geometry shader 1.9 collision time (msec) Resolution 64 x 64 x 64 128 x 128 x 128 256 x 256 x 256 512 x 512 x 512 1024 x 1024 x 1024 2048 x 2048 x 2048 1.85 1.8 1.75 1.7 1.65 1.6 1.55 0 0.2 0.4 0.6 distance due to its properties (minimum readbacks, few rendering passes), our method can attain better performance than most similar GPU-based techniques. Given relatively complex scenes with thousands of triangles, many methods, such as [SGG∗ 06], although more complete than ours, report interactive framerates (around 100 to 1000 milliseconds) whereas our method performs in real-time (around 1 to 10 milliseconds). It is useful to highlight certain differences between our algorithm and other GPU techniques to shed light on their impact on performance. The principal reasons behind the efficiency of our algorithm are the fact that it minimizes readbacks to main memory and the number of rendering passes, both of which are typical bottlenecks in GPU-based methods. By constraining one of the objects to be rigid, we are able to perform collision response directly on the GPU, computing global penetration forces and torques as described in Section 3.4, thus dramatically reducing the information to be read back from the GPU. Similarly, by limiting the proximity information to the closest point between the two models we are also able to minimize readbacks. In comparison, methods such as the ones presented in [SGG∗ 06] perform proximity queries on a 3D grid in space, and readback the whole grid to the CPU to process the result. Reading back a 3D grid on the same configuration can take from tens to thousands of milliseconds depending on the grid size as illustrated in Table 1. By comparing these timings to the performance of our algorithm as illustrated in Table 2, it can clearly be seen that our method provides better performance than a method based on reading back a 3D grid at similar resolutions. Moreover, the use of blending and depth buffering allows our algorithm to perform proximity queries using only 3 rendering passes or just one when using the geometry shader, whereas algorithms based on rendering to a 3D grid perform one rendering pass for each slice of the grid. Finally, while we are still limited by the precision of the framebuffer in the x and y directions, the precision in the z direction is that of the depth buffer, and as such is generally higher than the one achieved when using a discrete 3D grid. The efficiency of our method, however, comes at the price of less flexibility in the collision detection information available. Several GPU-based approaches perform collision pruning c The Eurographics Association and Blackwell Publishing 2008. 0.8 1 . Figure 10: Influence of our optimizations on the performance of the collision detection algorithm in Section 3.4 for the bunny model. on the GPU and then exact intersection or proximity tests between primitives on the CPU [GLM05,SGG∗ 06]. This might result in varying performance depending on the configuration of the object and the number of exact intersection tests to be performed. Our method on the other hand always processes all triangles, which allows it to keep a steady performance in every object configuration. The main downsides to this are that our method is approximate, contrary to methods such as the ones presented in [GLM05,SGG∗ 06] and that our method cannot take fully advantage of configurations where there is little overlap between objects. 5.3. Performance The influence of the optimizations on the algorithm from Section 3.4 is illustrated in Figure 10. The optimizations improve performance of this algorithm for any possible configuration of the objects. The influence of the geometry shader optimization on the algorithm of Section 3.3 is illustrated in Figure 11. For this algorithm the geometry shader does not improve performance when the models are far away from each other. Finally, Table 2 summarizes the computation time for both algorithms, averaged over all configurations, for each of the models in Figure 12, with varying framebuffer sizes, with and without early-Z and the geometry shader optimizations. As framebuffer size increases, the number of fragments to be rendered grows and our optimizations have a higher impact on performance. The influence of these optimizations decreases however with model complexity. For thin models such as the blade, the collision algorithm shows better average performance than the proximity algorithm due to the smaller ΩM used which leads to many triangles being clipped in configurations of close proximity. Our optimizations improve performance of the collision detection algorithm in any situation. However, early-Z T. Morvan, M. Reimers & E. Samset / High Performance GPU-based Proximity Queries using Distance Fields Figure 12: Benchmarking models: cow (6K tri.) , bunny (69K tri.), horse (97K tri.), dragon (480K tri.) and blade (1765K tri.). Model Cow Bunny Horse Dragon Blade Basic Framebuffer Size 512 1024 2048 0.68 1.58 5.78 1.17 1.79 5.26 1.21 1.90 5.50 4.86 5.54 9.32 5.63 5.96 7.89 Collision Time (milliseconds) Early-Z Geometry Shader Framebuffer Size Framebuffer Size 512 1024 2048 512 1024 2048 0.57 1.26 4.62 0.58 1.32 4.88 1.14 1.68 4.30 1.15 1.60 4.52 1.19 1.76 4.69 1.11 1.59 4.67 4.84 5.52 8.83 4.63 5.33 8.23 5.61 5.93 7.58 5.01 5.27 6.62 Proximity Query Time (milliseconds) Basic Geometry Shader Framebuffer Size Framebuffer Size 512 1024 2048 512 1024 2048 0.60 1.15 4.03 0.57 1.09 3.82 1.13 1.74 4.47 1.22 1.73 4.34 1.18 1.68 4.12 1.34 1.80 4.18 4.42 4.96 8.07 5.52 6.04 8.67 12.0 12.5 15.6 14.6 15.1 17.6 Table 2: Average computation time for our algorithms, using various framebuffer sizes and optimizations. basic geometry shader query time (msec) 1.78 1.76 1.74 1.72 . 1.7 Figure 13: Dynamic simulation of a complex scene. 1.68 1 1.2 1.4 1.6 distance 1.8 2 . Figure 11: Influence of the geometry shader optimization on the performance of the proximity query algorithm in Section 3.3 for the bunny model. ometry shader since additional computations are introduced before triangle culling. 5.4. Complex Scenes culling becomes less efficient when fragment processing becomes less prominent in the rendering pipeline, as happens with increased model complexity. Our optimization using the geometry shader reduces the three rendering passes to a single pass. It nevertheless does not yield to triple or even double performance as one might expect. We believe this is partly due to the cost of introducing the additional geometry shader stage in the rendering pipeline. Another reason is the fact that we still have to compute three projected coordinates per vertex, as well as triangle normals. The better performance of the three-pass rendering approach at large distances for the proximity algorithm might be due to the fact that many triangles are then outside of ΩM , and are thus culled. This culling has less influence when using the ge- We finally present an application of the algorithm from Section 3.4 to dynamic simulation of a complex scene. In this example 15 dragon models fall onto a procedurally deformed floor. A snapshot of the simulation can be seen in Figure 13. Collisions between all objects in the scene are computed using AABBs for broadphase collision pruning and our algorithm for narrowphase collision detection and response. The scene contains approximately 7.2 millions triangles. We use a texture resolution of 1024x1024 for collision detection. The average measured execution time for a frame of the dynamic simulation (collision detection, response and integration of penalty forces for all objects) was of 70 milliseconds. 6. Discussion In this section we discuss the characteristics, limitations and possible improvements to our algorithms. c The Eurographics Association and Blackwell Publishing 2008. T. Morvan, M. Reimers & E. Samset / High Performance GPU-based Proximity Queries using Distance Fields Our algorithms address the problem of collision detection and proximity queries between a rigid, closed model and a deformable model. It would however be possible to replace the rigid model M with a deformable model and recompute the distance field DM at each deformation, using one of the techniques presented in [SPG03, SOM04, SGGM06]. However, only the algorithm from Section 3.3 would then be relevant, since the global penalty force and torque defined in Section 3.4 only make sense for rigid objects. Although only two different types of proximity queries and collision response schemes have been presented in this paper, other techniques such as different reduction techniques could be devised to produce other types of proximity information. For example, checking whether the objects collide or not can be performed by a simple occlusion query. As mentioned in Section 5, our methods process all triangles of N, and as such do not take advantage of possible pruning of primitives. It would however be possible to combine our algorithm with bounding volume hierarchies, and only render those groups of triangles for which bounding volumes overlap. Since our algorithms perform collision detection and proximity queries based on a discrete sampling of N in image-space, its accuracy is limited by framebuffer resolution. In particular, we assume that every pixel rendered is totally covered when computing the surface area around a fragment. This leads to errors in the computed global penetration force and torque. Nevertheless, it is possible to use much higher precision on the framebuffer than for storing the distance field. Rendering N on the intersection of its bounding box with ΩM instead of the whole domain would increase precision. However, precision would vary with the size of this intersection, and doing this for deformable models involves the extra cost of computing the bounding box of N. Our algorithms are also limited by their discrete nature. Queries are only performed at discrete intervals in time, and thus collisions between objects might be missed. Although techniques such as continuous collision detection [RKC02] can tackle this issue, they are typically more expensive to compute than discrete approaches. On the other hand, the high performance of our algorithms allows us to use high sampling rates which help alleviate this problem. Another issue of our algorithms is the memory required to store distance fields on the GPUs. One way to address this would be to use adaptively sampled distance fields as presented in [FPRJ00], combined with GPU techniques like those in [LSK∗ 06]. One could also reduce memory consumption by storing only the distance in the texture memory and computing the gradient of DM for each fragment as an approximation to the direction to the closest point. This would probably hurt the performance of the algorithm in Section 3.4. c The Eurographics Association and Blackwell Publishing 2008. We believe that our collision detection algorithm is particularly well suited for applications in dynamic simulation using penalty-based collision response and contact handling. Our collision detection from Section 3.4 bears some similarity to the approach of Hasegawa and Sato [HS04]. They pointed out the problem of discontinuities and friction torque computation that appear if too few points are sampled. They proposed integrating penalty forces and torques over the contact plane of interpenetrating convex objects, and obtained global penalty forces and torques based the volume of interpenetration. Although it is not obvious how to compute such a contact plane for non-convex objects and difficult to compute such an integral, our global penetration forces and torques F and T correspond to penalty forces and torques integrated over a contact area. Moreover, the local areaweighted penalty forces computed at each fragment correspond to a small element of volume contained inside the penetration volume. We therefore believe that our method shares many of the advantages of the method presented in [HS04]. Unlike their method however, our method does not handle static friction yet. This is left for future work. Penalty-based methods are especially intuitive as input to haptic feedback. Moreover, haptic feedback requires efficient and steady performance for the collision detection algorithm. Our algorithm provides this and we therefore think that it is particularly well suited for producing haptic feedback using penalty-based methods. Although they map conveniently to our GPU implementation, penalty-based methods have several drawbacks. One of them is their discrete nature, and it would be interesting to explore implementations of different collision response schemes on the GPU. 7. Conclusion In this paper, we have presented highly efficient algorithms for collision detection and proximity queries between a rigid, solid object and an arbitrary polygonal mesh using the parallel computing power of graphics hardware. The computation of penalty-based collision response or compact proximity information directly on the GPU allows our algorithm to minimize readbacks and rendering passes, which leads to high and steady performance compared to several existing algorithms. Acknowledgments This work was supported by the European Community under the Marie Curie Research Training Network ARIS*ER grant number MRTN-CT-2004-512400. The models are courtesy of the Stanford Computer Graphics Laboratory (bunny), Cyberware (horse), UTIA, Academy of Sciences of the Czech Republic, and CGG, Czech Technical University in Prague (dragon). T. Morvan, M. Reimers & E. Samset / High Performance GPU-based Proximity Queries using Distance Fields References [BA05] BAERENTZEN J. A., A ANAES H.: Signed distance computation using the angle weighted pseudonormal. IEEE Transactions on Visualization and Computer Graphics 11, 3 (2005), 243–253. [Bar94] BARAFF D.: Fast contact force computation for nonpenetrating rigid bodies. In Proc. of ACM SIGGRAPH (1994), vol. 28, pp. 23–34. [BJ07] BARBI Č J., JAMES D.: Time-critical distributed contact for 6-dof haptic rendering of adaptively sampled reduced deformable models. In SCA ’07: Proceedings of the 2007 ACM SIGGRAPH/Eurographics symposium on Computer animation (2007), pp. 171–180. [Bly06] B LYTHE D.: The direct3d 10 system. In Proc. of ACM SIGGRAPH (2006), pp. 724–732. [FPRJ00] F RISKEN S. F., P ERRY R. N., ROCKWOOD A. P., J ONES T. R.: Adaptively sampled distance fields: a general representation of shape for computer graphics. In Proc. of ACM SIGGRAPH (2000), pp. 249–254. [FSG03] F UHRMANN A., S OBOTTKA G., G ROSS C.: Distance fields for rapid collision detection in physically based modeling. In Proc. of GraphiCon 2003 (2003), pp. 58–65. [GBF03] G UENDELMAN E., B RIDSON R., F EDKIW R.: Nonconvex rigid bodies with stacking. In Proc. of ACM SIGGRAPH (2003), vol. 22, pp. 871–878. [GGK06] G RESS A., G UTHE M., K LEIN R.: GPU-based collision detection for deformable parameterized surfaces. Computer Graphics Forum (Proc. of EUROGRAPHICS) 25, 3 (2006), 497–506. [HS04] H ASEGAWA S., S ATO M.: Real-time rigid body simulation for haptic interactions based on contact volume of polygonal objects. Computer Graphics Forum (Proc. of EUROGRAPHICS) 23, 3 (2004), 529–538. [HTG03] H EIDELBERGER B., T ESCHNER M., G ROSS M. H.: Real-time volumetric intersections of deforming objects. In Proc. of Vision, Modeling, Visualization VMV’03 (2003), pp. 461–468. [HTG04] H EIDELBERGER B., T ESCHNER M., G ROSS M.: Detection of collisions and self-collisions using image-space techniques. In Proceedings of Computer Graphics, Visualization and Computer Vision WSCG’04 (2004), pp. 145–152. [HZLM02] H OFF K., Z AFERAKIS A., L IN M. C., M ANOCHA D.: Fast 3D geometric proximity queries between rigid and deformable models using graphics hardware acceleration. Technical Report: TR02-004, 2002. [JBS06] J ONES M. W., BAERENTZEN J. A., S RAMEK M.: 3D distance fields: a survey of techniques and applications. IEEE Transactions on Visualization and Computer Graphics 12, 4 (2006), 581–599. [KP03] K NOTT D., PAI D. K.: CInDeR: collision and interference detection in real-time using graphics hardware. In Graphics Interface (2003), pp. 73–80. [KSW04] K IPFER P., S EGAL M., W ESTERMANN R.: UberFlow: a GPU-based particle engine. In Proc. of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware (2004), pp. 115–122. [LM03] L IN M., M ANOCHA D.: Collision and proximity queries. In Handbook of Discrete and Computational Geometry: Collision detection. 2003. [GKLM07] G OVINDARAJU N. K., K ABUL I., L IN M. C., M ANOCHA D.: Fast continuous collision detection among deformable models using graphics processors. Comput. Graph. 31, 1 (2007), 5–14. [LSK∗ 06] [GLM05] G OVINDARAJU N. K., L IN M. C., M ANOCHA D.: Quick-cullide: Fast inter- and intra-object collision culling using graphics hardware. In VR ’05: Proceedings of the 2005 IEEE Conference 2005 on Virtual Reality (2005), pp. 59–66, 319. [MC95] M IRTICH B., C ANNY J. F.: Impulse-based simulation of rigid bodies. In Symposium on Interactive 3D Graphics (1995), pp. 181–188, 217. [GM96] G OTTSCHALK S., M ANOCHA D.: OBBTree: A hierarchical structure for rapid interference detection. In Proc. of ACM SIGGRAPH (1996), pp. 171–180. [GOM∗ 06] G ALOPPO N., OTADUY M. A., M ECKLEN BURG P., G ROSS M., L IN M. C.: Fast simulation of deformable models in contact using dynamic deformation textures. In SCA ’06: Proceedings of the 2006 ACM SIGGRAPH/Eurographics symposium on Computer animation (2006), pp. 73–82. [Har05] H ARRIS M.: Mapping computational concepts to GPUs. In GPU Gems 2, Pharr M., (Ed.). Addison Wesley, March 2005, pp. 493–508. L EFOHN A. E., S ENGUPTA S., K NISS J., S TR R., OWENS J. D.: Glift: Generic, efficient, random-access GPU data structures. ACM Transactions on Graphics 25, 1 (2006), 60–99. ZODKA [MW88] M OORE M., W ILHELMS J.: Collision detection and response for computer animation. In Proc. of ACM SIGGRAPH (1988), pp. 289–298. [OL05] OTADUY M. A., L IN M. C.: Stable and responsive six-degree-of-freedom haptic manipulation using implicit integration. In Proc. World Haptics Conference (2005), pp. 247–256. [Qui94] Q UINLAN S.: Efficient distance computation between non-convex objects. In IEEE Intern. Conf. on Robotics and Automation (1994), pp. 3324–3329. [RKC02] R EDON S., K HEDDAR A., C OQUILLART S.: Fast continuous collision detection between rigid bodies. In Computer Graphics Forum (Proc. of EUROGRAPHICS) (2002), vol. 21. c The Eurographics Association and Blackwell Publishing 2008. T. Morvan, M. Reimers & E. Samset / High Performance GPU-based Proximity Queries using Distance Fields [RMS92] ROSSIGNAC J., M EGAHED A., S CHNEIDER B.-O.: Interactive inspection of solids: cross-sections and interferences. In Proc. of ACM SIGGRAPH (1992), pp. 353–360. [SF91] S HINYA M., F ORGUE M.-C.: Interference detection through rasterization. The Journal of Visualization and Computer Animation 2, 4 (1991), 132–134. [SGG∗ 06] S UD A., G OVINDARAJU N., G AYLE R., K ABUL I., M ANOCHA D.: Fast proximity computation among deformable models using discrete voronoi diagrams. In Proc. of ACM SIGGRAPH (2006), pp. 1144– 1153. [SGGM06] S UD A., G OVINDARAJU N., G AYLE R., M ANOCHA D.: Interactive 3D distance field computation using linear factorization. In Proc. of the 2006 symposium on Interactive 3D graphics and games (2006), pp. 117– 124. [SOM04] S UD A., OTADUY M. A., M ANOCHA D.: DiFi: Fast 3D distance field computation using graphics hardware. In Computer Graphics Forum (Proc. of EUROGRAPHICS) (2004), vol. 23. [SPG03] S IGG C., P EIKERT R., G ROSS M.: Signed distance transform using graphics hardware. In Proc. of IEEE Visualization (2003). [TKH∗ 05] T ESCHNER M., K IMMERLE S., H EIDEL B., Z ACHMANN G., R AGHUPATHI L., F UHRMANN A., C ANI M.-P., FAURE F., M AGNETATT HALMANN N., S TRASSER W., VOLINO P.: Collision detection for deformable objects. Computer Graphics Forum (EUROGRAPHICS State-of-the-Art Report) 24, 1 (2005), 61–81. BERGER [vdB97] VAN DEN B ERGEN G.: Efficient collision detection of complex deformable models using AABB trees. J. Graph. Tools 2, 4 (1997), 1–13. [Zac98] Z ACHMANN G.: Rapid collision detection by dynamically aligned DOP-trees. In Proc. of IEEE Virtual Reality Annual International Symposium (1998), pp. 90– 97. c The Eurographics Association and Blackwell Publishing 2008.