High-Quality Volume Graphics on Consumer PC Hardware

Course Notes 42

Markus Hadwiger    Joe M. Kniss    Klaus Engel    Christof Rezk-Salama

Abstract

Interactive volume visualization in science and entertainment is no longer restricted to expensive workstations and dedicated hardware, thanks to the fast evolution of consumer graphics hardware driven by the entertainment market. Course participants will learn to leverage the new features of modern graphics hardware to build high-quality volume rendering applications using OpenGL. Beginning with basic texture-based approaches, the algorithms are improved and expanded incrementally, covering illumination, non-polygonal isosurfaces, transfer function design, interaction, volumetric effects, and hardware-accelerated filtering. The course is aimed at scientific researchers and entertainment developers alike. Course participants are provided with documented source code covering details usually omitted in publications.

Contact

Joe Michael Kniss (Course Organizer)
Scientific Computing and Imaging, School of Computing
University of Utah
50 S. Central Campus Dr. #3490
Salt Lake City, UT 84112
Email: jmk@cs.utah.edu
Phone: 801-581-7977

Klaus Engel
Visualization and Interactive Systems Group (VIS)
University of Stuttgart
Breitwiesenstraße 20-2
70565 Stuttgart, Germany
Email: Klaus.Engel@informatik.uni-stuttgart.de
Phone: +49 711 7816 208
Fax: +49 711 7816 340

Markus Hadwiger
VRVis Research Center for Virtual Reality and Visualization
Donau-City-Straße 1
A-1220 Vienna, Austria
Email: msh@vrvis.at
Phone: +43 1 20501 30603
Fax: +43 1 20501 30900

Christof Rezk-Salama
Computer Graphics Group
University of Erlangen-Nuremberg
Am Weichselgarten 9
91058 Erlangen, Germany
Email: rezk@cs.fau.de
Phone: +49 9131 85-29927
Fax: +49 9131 85-29931

Lecturers

Klaus Engel is a PhD candidate at the Visualization and Interactive Systems Group at the University of Stuttgart. He received a Diplom (master's degree) in computer science from the University of Erlangen in 1997. From January 1998 to December 2000, he was a research assistant at the Computer Graphics Group at the University of Erlangen-Nuremberg. Since 2000, he has been a research assistant at the Visualization and Interactive Systems Group of Prof. Thomas Ertl at the University of Stuttgart. He has presented the results of his research at international conferences, including IEEE Visualization, the Visualization Symposium, and Graphics Hardware. In 2001, his paper "High-Quality Pre-Integrated Volume Rendering Using Hardware-Accelerated Pixel Shading" won the best paper award at the SIGGRAPH/Eurographics Workshop on Graphics Hardware. He has regularly taught courses and seminars on computer graphics, visualization, and computer games algorithms. His PhD thesis, "Strategies and Algorithms for Distributed Volume Visualization on Different Graphics Hardware Architectures," is currently under review.

Markus Hadwiger is a researcher in the "Basic Research in Visualization" group at the VRVis Research Center in Vienna, Austria, and a PhD student at the Vienna University of Technology. The focus of his current research is exploiting consumer graphics hardware for high-quality visualization at interactive rates, especially volume rendering for scientific visualization. First results on high-quality filtering and reconstruction of volumetric data have been presented as a technical sketch at SIGGRAPH 2001, and as a paper at Vision, Modeling, and Visualization 2001.
He regularly teaches courses and seminars on computer graphics, visualization, and game programming. Before concentrating on scientific visualization, he worked in the area of computer games and interactive entertainment. His master's thesis, "Design and Architecture of a Portable and Extensible Multiplayer 3D Game Engine," describes the game engine of Parsec (http://www.parsec.org/), a still-active cross-platform game project whose early test builds have been downloaded by over 100,000 people and have been included in several Red Hat and SuSE Linux distributions.

Joe Kniss is a master's student at the University of Utah. He is a research assistant at the Scientific Computing and Imaging Institute. His current research focuses on interactive hardware-based volume graphics. A recent paper, "Interactive Volume Rendering Using Multi-dimensional Transfer Functions and Direct Manipulation Widgets," won Best Paper at Visualization 2001. He also participated in the panel "Commodity Graphics Accelerators for Scientific Visualization," which won the Best Panel award at Visualization 2001. His previous work demonstrates a system for large-scale parallel volume rendering using graphics hardware. New results for this work were presented by Al McPherson at the SIGGRAPH 2001 course on Commodity-Based Scalable Visualization. He has also given numerous lectures on introductory and advanced topics in computer graphics, visualization, and volume rendering.

Christof Rezk-Salama received a PhD in Computer Science from the University of Erlangen in 2002. Since January 1999, he has been a research assistant at the Computer Graphics Group and a scholarship holder at the graduate college "3D Image Analysis and Synthesis". The results of his research have been presented at international conferences, including IEEE Visualization, Eurographics, MICCAI, and Graphics Hardware. In 2000, his paper "Interactive Volume Rendering on Standard PC Graphics Hardware" won the best paper award at the SIGGRAPH/Eurographics Workshop on Graphics Hardware. He has regularly taught courses on graphics programming and conceived tutorials and seminars on computer graphics, geometric modeling, and scientific visualization. His PhD thesis, "Volume Rendering Techniques for General Purpose Hardware," is currently in print. He has gained practical experience in several scientific projects in medicine, geology, and archaeology.

Contents

Introduction
1 Motivation
2 Volume Rendering
  2.1 Volume Data
  2.2 Sampling and Reconstruction
  2.3 Direct Volume Rendering
    2.3.1 Optical Models
    2.3.2 The Volume Rendering Integral
    2.3.3 Ray-Casting
    2.3.4 Alpha Blending
    2.3.5 The Shear-Warp Algorithm
  2.4 Non-Polygonal Iso-Surfaces
  2.5 Maximum Intensity Projection
3 Graphics Hardware
  3.1 The Graphics Pipeline
    3.1.1 Geometry Processing
    3.1.2 Rasterization
    3.1.3 Fragment Operations
  3.2 Consumer PC Graphics Hardware
    3.2.1 NVIDIA
    3.2.2 ATI
  3.3 Fragment Shading
    3.3.1 Traditional OpenGL Multi-Texturing
    3.3.2 Programmable Fragment Shading
  3.4 NVIDIA Fragment Shading
    3.4.1 Texture Shaders
    3.4.2 Register Combiners
  3.5 ATI Fragment Shading
  3.6 Other OpenGL Extensions
    3.6.1 GL_EXT_blend_minmax
    3.6.2 GL_EXT_texture_env_dot3
    3.6.3 GL_EXT_paletted_texture, GL_EXT_shared_texture_palette
4 Acknowledgments

Texture-Based Methods
5 Sampling a Volume Via Texture Mapping
  5.1 Proxy Geometry
  5.2 2D-Textured Object-Aligned Slices
  5.3 2D Slice Interpolation
  5.4 3D-Textured View-Aligned Slices
  5.5 3D-Textured Spherical Shells
  5.6 Slices vs. Slabs
6 Components of a Hardware Volume Renderer
  6.1 Volume Data Representation
  6.2 Transfer Function Representation
  6.3 Volume Textures
  6.4 Transfer Function Tables
  6.5 Fragment Shader Configuration
  6.6 Blending Mode Configuration
  6.7 Texture Unit Configuration
  6.8 Proxy Geometry Rendering
7 Acknowledgments

Illumination Techniques
8 Local Illumination
9 Gradient Estimation
10 Non-polygonal Shaded Isosurfaces
11 Per-Pixel Illumination
12 Advanced Per-Pixel Illumination
13 Reflection Maps

Classification
14 Introduction
15 Transfer Functions
16 Extended Transfer Function
  16.1 Optical properties
  16.2 Traditional volume rendering
  16.3 The Surface Scalar
  16.4 Shadows
  16.5 Translucency
  16.6 Summary
17 Transfer Functions
  17.1 Multi-dimensional Transfer Functions
  17.2 Guidance
  17.3 Classification
Advanced Techniques
18 Hardware-Accelerated High-Quality Filtering
  18.1 Basic principle
  18.2 Reconstructing Object-Aligned Slices
  18.3 Reconstructing View-Aligned Slices
  18.4 Volume Rendering
19 Pre-Integrated Classification
  19.1 Accelerated (Approximative) Pre-Integration
20 Texture-based Pre-Integrated Volume Rendering
  20.1 Projection
  20.2 Texel Fetch
21 Rasterization Isosurfaces using Dependent Textures
  21.1 Lighting
22 Volumetric FX
Bibliography

Introduction

Motivation

The huge demand for high-performance 3D computer graphics generated by computer games has led to the availability of extremely powerful 3D graphics accelerators in the consumer marketplace. These graphics cards by now not only rival, but in many areas even surpass, the tremendously expensive graphics workstations of just a couple of years ago. Current state-of-the-art consumer graphics chips, such as the NVIDIA GeForce 4 or the ATI Radeon 8500, offer a level of programmability and performance that not only makes it possible to perform traditional workstation tasks on a cheap personal computer, but even enables the use of rendering algorithms that previously could not be employed for real-time graphics at all.

Volume rendering traditionally has especially high computational demands. One of the major problems of using consumer graphics hardware for volume rendering is the amount of texture memory required to store the volume data, and the corresponding bandwidth consumption when texture fetch operations cause basically all of these data to be transferred over the bus for each rendered frame. However, the increased programmability of consumer graphics hardware today allows high-quality volume rendering, for instance with respect to the application of transfer functions, shading, and filtering. In spite of the tremendous requirements imposed by the sheer amount of data contained in a volume, the flexibility, quality, and performance that can be achieved by volume renderers on consumer graphics hardware are astonishing, and have made possible entirely new algorithms for high-quality volume rendering.

In the introductory part of these notes, we start with a brief review of volume rendering in chapter 2, already with an emphasis on how it can be implemented on graphics hardware, and continue in chapter 3 with an overview of the most important consumer graphics hardware architectures that enable high-quality volume rendering in real time.
Throughout these notes, we are using OpenGL [39] in descriptions of graphics architecture and features, and for showing example code fragments. Currently, all of the advanced features of programmable graphics hardware are exposed through OpenGL extensions, and we introduce their implications and use in chapter 3. Later chapters will make frequent use of many of these extensions.

In this course, we restrict ourselves to volume data defined on rectilinear grids, which is the grid type most conducive to hardware rendering. In such grids, the volume data consist of samples located at grid points that are equispaced along each respective volume axis, and can therefore easily be stored in a texture map. Despite many similarities, hardware-based algorithms for rendering unstructured grids, where volume samples are located at the vertices of an unstructured mesh, e.g., at the vertices of tetrahedra, are radically different in many respects. Thus, they are not covered in this course.

Volume Rendering

The term volume rendering [24, 10] describes a set of techniques for rendering three-dimensional, i.e., volumetric, data. Such data can be acquired from different sources, like medical data from Computed Tomography (CT) or Magnetic Resonance Imaging (MRI) scanners, computational fluid dynamics (CFD), seismic data, or any other data represented as a three-dimensional scalar field. Volume data can, of course, also be generated synthetically, i.e., procedurally [11], which is especially useful for rendering fluids and gaseous objects, natural phenomena like clouds, fog, and fire, visualizing molecular structures, or rendering explosions and other effects in 3D computer games. Although volumetric data can be difficult to visualize and interpret, they are both worthwhile and rewarding to visualize as 3D entities, without falling back to 2D subsets. To summarize succinctly, volume rendering is a very powerful way of visualizing volumetric data and aiding the interpretation process, and it can also be used for rendering high-quality special effects.

2.1 Volume Data

In contrast to surface data, which are inherently two-dimensional (even though surfaces are often embedded in three-space), volumetric data consist of a three-dimensional scalar field:

f(x) \in \mathbb{R} \quad \text{with} \quad x \in \mathbb{R}^3    (2.1)

Although in principle defined over a continuous three-dimensional domain (ℝ³), in the context of volume rendering this scalar field is stored as a 3D array of values, where each of these values is obtained by sampling the continuous domain at a discrete location. The individual scalar data values constituting the sampled volume are referred to as voxels (volume elements), analogously to the term pixels used for denoting the atomic elements of discrete two-dimensional images.

Figure 2.1: Voxels constituting a volumetric object after it has been discretized.

Figure 2.1 shows a depiction of volume data as a collection of voxels, where each little cube represents a single voxel. The corresponding sampling points would usually be assumed to lie in the respective centers of these cubes. Although imagining voxels as little cubes is convenient and helps to visualize the immediate vicinity of individual voxels, it is more accurate to identify each voxel with a sample obtained at a single infinitesimally small point in ℝ³. In this model, the volumetric function is only defined at the exact sampling locations.
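As a concrete illustration of this discrete representation, a volume is typically kept as a flat array of samples indexed along the three axes, in the same x-fastest layout a 3D texture uses. The following is only a sketch; the struct and field names are illustrative and not taken from the accompanying course code.

```cpp
#include <cstdint>
#include <vector>

// Minimal container for a sampled scalar volume: grid dimensions plus a
// flat array of voxel values, laid out x-fastest (as a 3D texture would be).
struct Volume {
    int nx, ny, nz;                  // number of samples along each axis
    std::vector<std::uint8_t> data;  // nx*ny*nz scalar samples (8-bit densities)

    // Value of the voxel at integer grid position (x, y, z).
    std::uint8_t voxel(int x, int y, int z) const {
        return data[(z * ny + y) * static_cast<std::size_t>(nx) + x];
    }
};
```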
From this collection of discrete samples, a continuous function that is once again defined for all locations in ℝ³ (or at least the subvolume of interest) can be attained through a process known as function or signal reconstruction [34].

2.2 Sampling and Reconstruction

When continuous functions need to be stored within a computer, they must be converted to a discrete representation by sampling the continuous domain at usually equispaced, discrete locations [34]. In addition to this discretization with respect to location, the individual samples also have to be quantized in order to map continuous scalars to quantities that can be represented as a discrete number, which is usually stored in either fixed-point or floating-point format. After the continuous function has been converted into a discrete function via sampling, this function is only defined at the exact sampling locations, but not over the original continuous domain. In order to once again be able to treat the function as continuous, a process known as reconstruction must be performed, i.e., reconstructing a continuous function from a discrete one [34].

Reconstruction is performed by applying a reconstruction filter to the discrete function, which is done by convolving the filter kernel (the function describing the filter) with the discrete function. The simplest such filter is the box filter (figure 2.2(A)), which results in nearest-neighbor interpolation. Looking again at figure 2.1, we can now see that this image actually depicts volume data reconstructed with a box filter. Another reconstruction filter that is commonly used, especially in hardware, is the tent filter (figure 2.2(B)), which results in linear interpolation.

Figure 2.2: Different reconstruction filters: box (A), tent (B), and sinc filter (C).

In general, we know from sampling theory that a continuous function can be reconstructed entirely if certain conditions are honored during the sampling process. The original function must be band-limited, i.e., not contain any frequencies above a certain threshold, and the sampling frequency must be at least twice as high as this threshold (which is often called the Nyquist frequency). The requirement for a band-limited input function is usually enforced by applying a low-pass filter before the function is sampled. Low-pass filtering discards frequencies above the Nyquist limit, which would otherwise result in aliasing, i.e., high frequencies being interpreted as much lower frequencies after sampling, due to overlap in the frequency spectrum.

The statement that a function can be reconstructed entirely remains theoretical, however, since, even when disregarding quantization artifacts, the reconstruction filter used would have to be perfect. The "perfect," or ideal, reconstruction filter is known as the sinc filter [34], whose frequency spectrum is box-shaped, and which is described in the spatial domain by the following equation:

\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}    (2.2)

A graph of this function is depicted in figure 2.2(C). The simple reason why the sinc filter cannot be implemented in practice is that it has infinite extent, i.e., the filter function is non-zero from minus infinity to plus infinity. Thus, a trade-off between reconstruction time, which depends on the extent of the reconstruction filter, and reconstruction quality must be found.
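To make the convolution view of reconstruction concrete, the following sketch evaluates a discretely sampled 1D signal at an arbitrary position using a tent kernel of width two, which is exactly linear interpolation. The function names are illustrative only and not part of the course code.

```cpp
#include <cmath>
#include <vector>

// Tent (triangle) filter kernel of width 2: non-zero for |x| < 1.
static float tent(float x) {
    x = std::fabs(x);
    return x < 1.0f ? 1.0f - x : 0.0f;
}

// Reconstruct the continuous signal at position t (in sample units) by
// convolving the discrete samples with the tent kernel. With this kernel,
// at most two samples contribute, and the result is linear interpolation.
static float reconstruct(const std::vector<float>& samples, float t) {
    float result = 0.0f;
    int first = static_cast<int>(std::floor(t)) - 1;
    for (int i = first; i <= first + 2; ++i) {
        if (i < 0 || i >= static_cast<int>(samples.size()))
            continue;                 // treat samples outside the data as zero
        result += samples[i] * tent(t - static_cast<float>(i));
    }
    return result;
}
```

Swapping in a box kernel of width one reproduces nearest-neighbor interpolation, and wider kernels such as cubic splines follow the same pattern, with more samples contributing to each reconstructed value.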
In hardware rendering, linear interpolation is usually considered to be a reasonable trade-off between performance and reconstruction quality. High-quality filters, usually of width four (i.e., cubic functions), like the family of cardinal splines [20] (which includes the Catmull-Rom spline) or the BC-splines [30], are usually only employed when the filtering operation is done in software. However, it has recently been shown that cubic reconstruction filters can indeed be used for high-performance rendering on today's consumer graphics hardware [15, 16].

2.3 Direct Volume Rendering

Direct volume rendering (DVR) methods [25] create images of an entire volumetric data set, without concentrating on, or even explicitly extracting, surfaces corresponding to certain features of interest, e.g., iso-contours. In order to do so, direct volume rendering requires an optical model for describing how the volume emits, reflects, scatters, or occludes light [27]. Different optical models that can be used for direct volume rendering are described in more detail in section 2.3.1. In general, direct volume rendering maps the scalar field constituting the volume to optical properties such as color and opacity, and integrates the corresponding optical effects along viewing rays into the volume, in order to generate a projected image directly from the volume data. The corresponding integral is known as the volume rendering integral, which is described in section 2.3.2. Naturally, under real-world conditions this integral is solved numerically.

For real-time volume rendering, usually the emission-absorption optical model is used, in which a volume is viewed as being comprised of particles that are only able to emit and absorb light. In this case, the scalar data constituting the volume are said to denote the density of these particles. The mapping to optical properties is achieved via a transfer function, the application of which is also known as classification; this is covered in more detail in chapter 17. Basically, a transfer function is a lookup table that maps scalar density values to RGBA values, which subsume both the emission (RGB) and the absorption (A) of the optical model. Additionally, the volume can be shaded according to the illumination from external light sources, which is the topic of chapter 8.

2.3.1 Optical Models

Although most direct volume rendering algorithms, specifically real-time methods, consider the volume to consist of particles of a certain density, and map these densities more or less directly to RGBA information, which is subsequently processed as color and opacity for alpha blending, the underlying physical background is subsumed in an optical model. More sophisticated models than the ones usually used for real-time rendering also include support for scattering of light among the particles of the volume itself, and account for shadowing effects. The most important optical models for direct volume rendering are described in a survey paper by Nelson Max [27], and we only briefly summarize these models here:

• Absorption only. The volume is assumed to consist of cold, perfectly black particles that absorb all the light that impinges on them. They do not emit or scatter light.
• Emission only. The volume is assumed to consist of particles that only emit light, but do not absorb any, since the absorption is negligible.
• Absorption plus emission. This optical model is the most common one in direct volume rendering. Particles emit light, and occlude, i.e., absorb, incoming light.
However, there is no scattering or indirect illumination.
• Scattering and shading/shadowing. This model includes scattering of illumination that is external to a voxel. Light that is scattered can either be assumed to impinge unimpeded from a distant light source, or it can be shadowed by particles between the light and the voxel under consideration.
• Multiple scattering. This sophisticated model includes support for incident light that has already been scattered by multiple particles.

In this course, we are concerned with rendering volumes defined on rectilinear grids, using an emission-absorption model together with local illumination for rendering, and do not consider complex lighting situations and effects like single or multiple scattering. However, real-time methods taking such effects into account are currently becoming available [17]. To summarize, from here on the optical model used in all considerations will be the one of particles simultaneously emitting and absorbing light, and the volume rendering integral described below also assumes this particular optical model.

2.3.2 The Volume Rendering Integral

All direct volume rendering algorithms share the property that they evaluate the volume rendering integral, which integrates optical effects such as color and opacity along viewing rays cast into the volume, even if no explicit rays are actually employed by the algorithm. Section 2.3.3 covers ray-casting, which for this reason could be seen as the "most direct" numerical method for evaluating this integral. More details are covered below, but for this section it suffices to view ray-casting as a process that, for each pixel in the image to render, casts a single ray from the eye through the pixel's center into the volume, and integrates the optical properties obtained from the encountered volume densities along the ray.

Note that this general description assumes both the volume and the mapping to optical properties to be continuous. In practice, of course, the evaluation of the volume rendering integral is usually done numerically, together with several additional approximations, and the integration operation becomes a simple summation. Remember that the volume itself is also described by a collection of discrete samples, and thus interpolation, or filtering, has to be used in practice to reconstruct a continuous volume, which is also only an approximation.

We denote a ray cast into the volume by x(t), parameterized by the distance t to the eye. The scalar value corresponding to this position on the ray is denoted by s(x(t)). Since we employ an emission-absorption optical model, the volume rendering integral we are using integrates absorption coefficients τ(s(x(t))) (accounting for the absorption of light) and colors c(s(x(t))) (accounting for light emitted by particles) along a ray. The volume rendering integral can now be used to obtain the integrated "output" color C, subsuming both color (emission) and opacity (absorption) contributions along a ray up to a certain distance D into the volume:

C = \int_0^D c(s(x(t)))\, e^{-\int_0^t \tau(s(x(t')))\, dt'}\, dt    (2.3)

This integral can be understood more easily by looking at its different parts individually:

• In order to obtain the color for a pixel (C), we cast a ray into the volume and perform integration along it (\int_0^D \dots\, dt), i.e., for all locations x(t) along this ray.
• It is sufficient if the integration is performed until the ray exits the volume on the other side, which happens after a certain distance D, where t = D.
• The color contribution of the volume at a certain position x(t) consists of the color emitted there, c(s(x(t))), multiplied by the cumulative (i.e., integrated) absorption up to the position of emission. The cumulative absorption for that position x(t) is e^{-\int_0^t \tau(s(x(t')))\, dt'}.

In practice, this integral is evaluated numerically through either back-to-front or front-to-back compositing (i.e., alpha blending) of samples along the ray, which is most easily illustrated in the method of ray-casting.

2.3.3 Ray-Casting

Ray-casting [24] is a method for direct volume rendering that can be seen as a straightforward numerical evaluation of the volume rendering integral (equation 2.3). For each pixel in the image, a single ray is cast into the volume (assuming super-sampling is not used). At equispaced intervals along the ray (the sampling distance), the discrete volume data is resampled, usually using tri-linear interpolation as the reconstruction filter. That is, for each resampling location, the scalar values of the eight neighboring voxels are weighted according to their distance to the actual location for which a data value is needed. After resampling, the scalar data value is mapped to optical properties via a lookup table, which yields an RGBA value for this location within the volume that subsumes the corresponding emission and absorption coefficients [24], and the volume rendering integral is approximated via alpha blending in back-to-front or front-to-back order.

We will now briefly outline why the volume rendering integral can conveniently be approximated with alpha blending. First, the cumulative absorption up to a certain position x(t) along the ray, from equation 2.3,

e^{-\int_0^t \tau(s(x(t')))\, dt'}    (2.4)

can be approximated by (denoting the distance between successive resampling locations with d):

e^{-\sum_{i=0}^{t/d} \tau(s(x(i\,d)))\, d}    (2.5)

The summation in the exponent can immediately be substituted by a product of exponential terms:

\prod_{i=0}^{t/d} e^{-\tau(s(x(i\,d)))\, d}    (2.6)

Now, we can introduce the opacity values A "well-known" from alpha blending by defining

A_i = 1 - e^{-\tau(s(x(i\,d)))\, d}    (2.7)

and rewriting equation 2.6 as:

\prod_{i=0}^{t/d} (1 - A_i)    (2.8)

This allows us to use A_i as an approximation for the absorption of the i-th ray segment, instead of the absorption at a single point. Similarly, the color (emission) of the i-th ray segment can be approximated by:

C_i = c(s(x(i\,d)))\, d    (2.9)

Having approximated both the emissions and absorptions along a ray, we can now state the approximate evaluation of the volume rendering integral as (denoting the number of samples by n = D/d):

C_{approx} = \sum_{i=0}^{n} C_i \prod_{j=0}^{i-1} (1 - A_j)    (2.10)

Equation 2.10 can be evaluated iteratively by alpha blending in either back-to-front or front-to-back order.

2.3.4 Alpha Blending

The following iterative formulation evaluates equation 2.10 in back-to-front order by stepping i from n-1 to 0:

C'_i = C_i + (1 - A_i)\, C'_{i+1}    (2.11)

A new value C'_i is calculated from the color C_i and opacity A_i at the current location i, and the composited color C'_{i+1} from the previous location i+1. The starting condition is C'_n = 0.

Note that in all blending equations we are using opacity-weighted colors [42], which are also known as associated colors [7]. An opacity-weighted color is a color that has been pre-multiplied by its associated opacity. This is a very convenient notation, and especially important for interpolation purposes.
It can be shown that interpolating color and opacity separately leads to artifacts, whereas interpolating opacity-weighted colors achieves correct results [42].

The following alternative iterative formulation evaluates equation 2.10 in front-to-back order by stepping i from 1 to n:

C'_i = C'_{i-1} + (1 - A'_{i-1})\, C_i    (2.12)
A'_i = A'_{i-1} + (1 - A'_{i-1})\, A_i    (2.13)

New values C'_i and A'_i are calculated from the color C_i and opacity A_i at the current location i, and the composited color C'_{i-1} and opacity A'_{i-1} from the previous location i-1. The starting conditions are C'_0 = 0 and A'_0 = 0.

Note that front-to-back compositing requires tracking alpha values, whereas back-to-front compositing does not. In a hardware implementation, this means that destination alpha must be supported by the frame buffer (i.e., an alpha value must be stored in the frame buffer, and it must be possible to use it as a multiplication factor in blending operations) when front-to-back compositing is used. However, the major advantage of front-to-back compositing is an optimization commonly called early ray termination, where the progression along a ray is terminated as soon as the cumulative alpha value reaches 1.0, and this cannot easily be done with hardware alpha blending. Hardware volume rendering therefore usually uses back-to-front compositing.

2.3.5 The Shear-Warp Algorithm

The shear-warp algorithm [22] is a very fast approach for evaluating the volume rendering integral. In contrast to ray-casting, no rays are cast into the volume; instead, the volume itself is projected slice by slice onto the image plane. This projection uses bi-linear interpolation within two-dimensional slices, instead of the tri-linear interpolation used by ray-casting.

Figure 2.3: The shear-warp algorithm for orthogonal projection.

The basic idea of shear-warp is illustrated in figure 2.3 for the case of orthogonal projection. The projection does not take place directly on the final image plane, but on an intermediate image plane, called the base plane, which is aligned with the volume instead of the viewport. Furthermore, the volume itself is sheared in order to turn the oblique projection direction into a direction that is perpendicular to the base plane, which allows for an extremely fast implementation of this projection. In such a setup, an entire slice can be projected by simple two-dimensional image resampling. Finally, the base plane image has to be warped to the final image plane. Note that this warp is only necessary once per generated image, not once per slice. Perspective projection can be accommodated similarly, by scaling the volume slices in addition to shearing them, as depicted in figure 2.4.

Figure 2.4: The shear-warp algorithm for perspective projection.

The clever approach outlined above, together with additional optimizations like run-length encoding of the volume data, is what makes the shear-warp algorithm probably the fastest software method for volume rendering. Although originally developed for software rendering, we will encounter a principle similar to shear-warp in hardware volume rendering, specifically in the chapter on 2D-texture based hardware volume rendering (section 5.2).
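Before moving on to the hardware texture-based approaches, the following CPU sketch ties sections 2.3.2–2.3.4 together: densities resampled along one ray are classified through an RGBA transfer function lookup table (with opacity-weighted colors) and composited front-to-back according to equations 2.12 and 2.13, including early ray termination. It is an illustration only; the data structures and names are not taken from the accompanying course code.

```cpp
#include <array>
#include <cstdint>
#include <vector>

struct RGBA { float r, g, b, a; };   // opacity-weighted (associated) color

// Composite one ray front-to-back. 'densities' holds the scalar values
// already resampled at equispaced locations along the ray (nearest the eye
// first), and 'tf' is a 256-entry transfer function lookup table mapping
// density to emission (RGB) and opacity (A).
RGBA compositeRay(const std::vector<std::uint8_t>& densities,
                  const std::array<RGBA, 256>& tf)
{
    RGBA dst = {0.0f, 0.0f, 0.0f, 0.0f};
    for (std::uint8_t d : densities) {
        const RGBA& src = tf[d];          // classification
        float w = 1.0f - dst.a;           // remaining transparency
        dst.r += w * src.r;               // eq. 2.12
        dst.g += w * src.g;
        dst.b += w * src.b;
        dst.a += w * src.a;               // eq. 2.13
        if (dst.a > 0.99f) break;         // early ray termination
    }
    return dst;
}
```

The back-to-front variant of equation 2.11 simply walks the samples in reverse order and accumulates dst = src + (1 - src.a) * dst, with no need to track the accumulated alpha — which is exactly why it maps so easily onto hardware alpha blending.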
When 2D textures are used to store slices of the volume data, and a stack of such slices is texture-mapped and blended in hardware, bilinear interpolation is likewise substituted for tri-linear interpolation, similarly to shear-warp. This is possible because this hardware method also employs object-aligned slices. Also, both shear-warp and 2D-texture based hardware volume rendering require three slice stacks to be stored and switched according to the current viewing direction. Further details are provided in section 5.2.

2.4 Non-Polygonal Iso-Surfaces

In the context of volume rendering, the term iso-surface denotes a contour surface extracted from a volume that corresponds to a given constant value, i.e., the iso-value. Boundary surfaces of regions of the volume that are homogeneous with respect to certain attributes are usually also called iso-surfaces. For example, an explicit iso-surface could be used to depict a region where the density is above a given threshold. As the name suggests, an iso-surface is usually constituted by an explicit surface. In contrast to direct volume rendering, where no surfaces exist at all, these explicit surfaces are usually extracted from the volume data in a preprocess. This is commonly done using a variant of the marching cubes algorithm [25], in which the volume data is processed and an explicit geometric representation (usually thousands of triangles) is generated for the feature of interest, i.e., the iso-surface corresponding to a given iso-value.

However, iso-surfaces can also be rendered without the presence of explicit geometry. In this case, we refer to them as non-polygonal iso-surfaces. One approach for doing this is to use ray-casting with special transfer functions [24]. On graphics hardware, non-polygonal iso-surfaces can be rendered by exploiting the OpenGL alpha test [41]. In this approach, the volume is stored as an RGBA volume. Local gradient information is precomputed and stored in the RGB channels, and the volume density itself is stored in the alpha channel. The density in conjunction with alpha testing is used to select the pixels where the corresponding ray pierces the iso-surface, and the gradient information is used as the "surface normal" for shading. Implementation details of the hardware approach for rendering non-polygonal iso-surfaces are given in chapter 10.

Figure 2.5: A comparison of direct volume rendering (A) and maximum intensity projection (B).

2.5 Maximum Intensity Projection

Maximum intensity projection (MIP) is a variant of direct volume rendering where, instead of compositing optical properties, the maximum value encountered along a ray is used to determine the color of the corresponding pixel. An important application area of such a rendering mode are medical data sets obtained from MRI (magnetic resonance imaging) scanners. Such data sets usually exhibit a significant amount of noise that can make it hard to extract meaningful iso-surfaces, or to define transfer functions that aid the interpretation. When MIP is used, however, the fact that within angiography data sets the data values of vascular structures are higher than the values of the surrounding tissue can easily be exploited for visualizing them. Figure 2.5 shows a comparison of direct volume rendering and MIP applied to the same data set. In graphics hardware, MIP can be implemented by using a maximum operator when blending into the frame buffer, instead of standard alpha blending.
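As a small illustration (assuming a driver that exposes the GL_EXT_blend_minmax extension discussed in section 3.6, and that the extension's entry point has been obtained through the platform's usual mechanism), replacing the blend equation with a maximum operator is essentially all that distinguishes a MIP slice renderer from a compositing one:

```cpp
#include <GL/gl.h>
#include <GL/glext.h>   // GL_MAX_EXT, PFNGLBLENDEQUATIONEXTPROC

// Assumed to have been obtained via the platform's extension mechanism,
// e.g. wglGetProcAddress("glBlendEquationEXT") or glXGetProcAddressARB.
extern PFNGLBLENDEQUATIONEXTPROC glBlendEquationEXT;

// Configure blending for maximum intensity projection: a textured slice
// replaces a frame buffer pixel only where the slice is brighter.
void setupMIPBlending()
{
    glEnable(GL_BLEND);
    glBlendEquationEXT(GL_MAX_EXT);  // max(src, dst) instead of src + dst
    glBlendFunc(GL_ONE, GL_ONE);     // blend factors are ignored for min/max
}
```

Rendering order becomes irrelevant in this mode, since the maximum operator is commutative.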
The corresponding OpenGL extension (GL_EXT_blend_minmax) is covered in section 3.6.

Graphics Hardware

This chapter begins with a brief overview of the operation of graphics hardware in general, before describing the kind of graphics hardware that is most interesting to us in the context of this course, i.e., consumer PC graphics hardware like the NVIDIA GeForce family [32] and the ATI Radeon graphics cards [2]. We are using OpenGL [39] as the application programming interface (API), and the sections on specific consumer graphics hardware architectures describe the OpenGL extensions needed for high-quality volume rendering, especially focusing on per-fragment, or per-pixel, programmability.

3.1 The Graphics Pipeline

For hardware-accelerated rendering, the geometry of a virtual scene consists of a set of planar polygons, which are ultimately turned into pixels during display traversal. The majority of 3D graphics hardware implements this process as a fixed sequence of processing stages. The order of operations is usually described as a graphics pipeline, which is illustrated in figure 3.1. The input of this pipeline is a stream of vertices that can be joined together to form geometric primitives such as lines, triangles, and polygons. The output is a raster image of the virtual scene, which can be displayed on the screen. The graphics pipeline can roughly be divided into three different stages:

Geometry Processing computes transformations of the incoming vertices in the 3D spatial domain, such as rotation, translation, and scaling. Through their vertices, the primitives themselves are transformed along naturally.

Rasterization decomposes the geometric primitives into fragments. Note that although a fragment is closely related to a pixel on the screen, it may be discarded by one of several tests before it is finally turned into an actual pixel (see below). After a fragment has initially been generated by the rasterizer, colors fetched from texture maps are applied, followed by further color operations, often subsumed under the term fragment shading. On today's programmable consumer graphics hardware, both fetching colors from textures and the additional color operations applied to a fragment are programmable to a large extent.

Fragment Operations After fragments have been generated and shaded, several tests are applied, which finally decide whether the incoming fragment is discarded or displayed on the screen as a pixel. These tests usually are alpha testing, stencil testing, and depth testing. If the fragment has not been discarded by these tests, it is combined with the previous contents of the frame buffer, a process known as alpha blending. After this, the fragment has become a pixel.

Figure 3.1: The graphics pipeline for display traversal.

For understanding the algorithms presented in this course, it is important to have a grasp of the exact order of operations in the graphics pipeline. In the following sections, we will have a closer look at its different stages.

3.1.1 Geometry Processing

The geometry processing unit performs per-vertex operations, i.e., operations that modify the incoming stream of vertices. The geometry engine computes transformations such as translation, rotation, and projection. Local illumination models are also evaluated on a per-vertex basis at this stage of the pipeline.
This is the reason why geometry processing is often referred to as the transform & lighting (T&L) unit. For a more detailed description, the geometry engine can be further subdivided into several subunits, as depicted in figure 3.2:

Figure 3.2: Geometry processing as part of the graphics pipeline.

Modeling Transformation: Transformations that are used to arrange objects and specify their placement within the virtual scene are called modeling transformations. They are specified as 4 × 4 matrices using homogeneous coordinates.

Viewing Transformation: The transformation that is used to specify the camera position and viewing direction is called the viewing transformation. This transformation is also specified as a 4 × 4 matrix. Modeling and viewing matrices can be pre-multiplied to form a single modelview matrix, which is the term used by OpenGL.

Lighting: After the vertices are correctly placed within the virtual scene, a local illumination model is evaluated for each vertex, for example the Phong model [35]. Since this requires information about normal vectors and the final viewing direction, it must be performed after the modeling and viewing transformations.

Primitive Assembly: Rendering primitives are generated from the incoming vertex stream. Vertices are connected to form lines, and lines are joined together to form polygons. Arbitrary polygons are usually tessellated into triangles to ensure planarity and to enable interpolation using barycentric coordinates.

Clipping: Polygon and line clipping is applied after primitive assembly in order to remove those portions of the geometry that cannot be visible on the screen, because they lie outside the viewing frustum.

Perspective Transformation: The perspective transformation computes the projection of a geometric primitive onto the image plane. It is the final step of the geometry processing stage; all operations that take place after the projection step are performed within the two-dimensional space of the image plane.

The geometry processing stage is also where vertex programs [32], or vertex shaders, are executed when they are enabled, substituting large parts of the fixed-function geometry pipeline with a user-supplied assembly language program.

3.1.2 Rasterization

Rasterization is the conversion of geometric data into fragments. Each fragment eventually corresponds to a square pixel in the resulting image, if it has not been discarded by one of several per-fragment tests, such as alpha or depth testing. The process of rasterization can be further subdivided into three different subtasks, as displayed in figure 3.3:

Figure 3.3: Rasterization as part of the graphics pipeline.

Polygon Rasterization: In order to display filled polygons, rasterization determines the set of pixels that lie in the interior of the polygon. This also comprises the interpolation of visual attributes such as color, illumination terms, and texture coordinates given at the vertices.

Texture Fetch: Textures are mapped onto a polygon according to texture coordinates specified at the vertices. For each fragment, these texture coordinates must be interpolated, and a texture lookup is performed at the resulting coordinate. This process yields an interpolated color value fetched from the texture map.
In today's consumer graphics hardware, from two to six textures can be fetched simultaneously for a single fragment. Furthermore, the lookup process itself can be controlled, for example by routing colors back into texture coordinates, which is known as dependent texturing.

Fragment Shading: After all the enabled textures have been sampled, further color operations are applied in order to shade a fragment. A simple example would be the combination of the texture color and the primary, i.e., diffuse, color. Today's consumer graphics hardware allows highly flexible control of the entire fragment shading process. Since fragment shading is extremely important for volume rendering on such hardware, sections 3.3, 3.4, and 3.5 are devoted to this stage of the graphics pipeline as it is implemented in state-of-the-art architectures. Note that recently the line between texture fetch and fragment shading has been getting blurred, and the texture fetch stage is becoming a part of the fragment shading stage.

3.1.3 Fragment Operations

After a fragment has been shaded, but before it is turned into an actual pixel, which is stored in the frame buffer and ultimately displayed on the screen, several fragment tests are performed, followed by alpha blending. The outcome of these tests determines whether the fragment is discarded, e.g., because it is occluded, or becomes a pixel. The sequence of fragment operations is illustrated in figure 3.4.

Figure 3.4: Fragment operations as part of the graphics pipeline.

Alpha Test: The alpha test allows discarding a fragment depending on the outcome of a comparison between the fragment's opacity A (the alpha value) and a specified reference value.

Stencil Test: The stencil test allows the application of a pixel stencil to the frame buffer. This pixel stencil is contained in the stencil buffer, which is also a part of the frame buffer. The stencil test conditionally discards a fragment depending on a comparison of a reference value with the corresponding pixel in the stencil buffer, optionally also taking the depth value into account.

Depth Test: Since primitives may be generated in arbitrary sequence, the depth test provides a convenient mechanism for correct depth ordering of partially occluded objects. The depth value of a pixel is stored in a depth buffer. The depth test decides whether an incoming fragment is occluded by a pixel that has previously been written, by comparing the incoming depth value to the value in the depth buffer. This allows discarding occluded fragments on a per-fragment level.

Alpha Blending: To allow for semi-transparent objects and other compositing modes, alpha blending combines the color of the incoming fragment with the color of the corresponding pixel currently stored in the frame buffer.

After the scene description has completely passed through the graphics pipeline, the resulting raster image contained in the frame buffer can be displayed on the screen. Different hardware architectures, ranging from expensive high-end workstations to consumer PC graphics boards, provide different implementations of this graphics pipeline. Thus, consistent access to multiple hardware architectures requires a level of abstraction that is provided by an additional software layer called an application programming interface (API). In these course notes, we are using OpenGL as the graphics API. Details on the standard OpenGL rendering pipeline can be found in [39, 28].
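To connect these per-fragment stages to the volume rendering application of the later chapters, a typical OpenGL state setup for back-to-front blending of textured volume slices with opacity-weighted colors might look as follows. This is an illustrative sketch, not the course's canonical configuration.

```cpp
#include <GL/gl.h>

// Typical fragment-operation state for compositing textured volume slices
// in back-to-front order, assuming opacity-weighted (pre-multiplied) colors.
void setupSliceCompositing()
{
    // Discard completely transparent fragments early via the alpha test.
    glEnable(GL_ALPHA_TEST);
    glAlphaFunc(GL_GREATER, 0.0f);

    // Keep depth testing against opaque scene geometry, but do not write
    // depth for the semi-transparent slices themselves.
    glEnable(GL_DEPTH_TEST);
    glDepthMask(GL_FALSE);

    // Back-to-front compositing (equation 2.11): dst = src + (1 - srcA) * dst.
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
}
```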
3.2 Consumer PC Graphics Hardware

In this section, we briefly discuss the consumer graphics chips that we are using for high-quality volume rendering, and that most of the algorithms discussed in later sections are built upon. The following sections discuss important features of these architectures in detail. At the time of this writing (spring 2002), the two most important vendors of programmable consumer graphics hardware are NVIDIA and ATI. The current state-of-the-art consumer graphics chips are the NVIDIA GeForce 4 and the ATI Radeon 8500.

3.2.1 NVIDIA

In late 1999, the GeForce 256 introduced hardware-accelerated geometry processing to the consumer marketplace. Before this, transformation and projection were either done by the OpenGL driver, or even by the application itself. The first GeForce also offered a flexible mechanism for fragment shading, i.e., the register combiners OpenGL extension (GL_NV_register_combiners). The focus on programmable fragment shading became even more pronounced with the introduction of the GeForce 2 in early 2000, although it brought no major architectural changes from a programmer's point of view. On the first two GeForce architectures, it was possible to use two textures simultaneously in a single pass (multi-texturing). Usual boards had 32MB of on-board RAM, although GeForce 2 configurations with 64MB were also available.

The next major architectural step came with the introduction of the GeForce 3 in early 2001. Moving away from a fixed-function pipeline for geometry processing, the GeForce 3 introduced vertex programs, which allow the programmer to write custom assembly language code operating on vertices. The number of simultaneous textures was increased to four, the register combiners capabilities were improved (GL_NV_register_combiners2), and the introduction of texture shaders (GL_NV_texture_shader) brought dependent texturing to a consumer graphics platform for the first time. Additionally, the GeForce 3 also supports 3D textures (GL_NV_texture_shader2) in hardware. Usual GeForce 3 configurations have 64MB of on-board RAM, although boards with 128MB are also available.

The GeForce 4, introduced in early 2002, extends the modes for dependent texturing (GL_NV_texture_shader3), and offers point sprites, hardware occlusion culling support, and flexible support for rendering directly into a texture (the latter also being possible on a GeForce 3 with the OpenGL drivers released at the time of the GeForce 4). The standard amount of on-board RAM on GeForce 4 boards is 128MB, which is also the maximum amount supported by the chip itself. The NVIDIA feature set most relevant in these course notes is the one offered by the GeForce 3, although the GeForce 4 is able to execute it much faster.

3.2.2 ATI

In mid-2000, the Radeon was the first consumer graphics hardware to support 3D textures natively. For multi-texturing, it was able to use three 2D textures, or one 2D and one 3D texture, simultaneously. However, fragment shading capabilities were constrained to a few extensions of the standard OpenGL texture environment. The usual on-board configuration was 32MB of RAM.

The Radeon 8500, introduced in mid-2001, was a huge leap ahead of the original Radeon, especially with respect to fragment programmability (GL_ATI_fragment_shader), which offers a unified model for texture fetching (including flexible dependent textures) and color combination.
This architecture also supports programmable vertex operations (GL_EXT_vertex_shader), and six simultaneous textures with full functionality, i.e., even six 3D textures can be used in a single pass. The fragment shading capabilities of the Radeon 8500 are exposed via an assembly-language level interface, and are very easy to use. Rendering directly into a texture is also supported. On-board memory of Radeon 8500 boards is usually either 64MB or 128MB.

A minor drawback of the Radeon OpenGL drivers (for both architectures) is that paletted textures (GL_EXT_paletted_texture, GL_EXT_shared_texture_palette) are not supported. These otherwise provide a nice fallback for volume rendering when post-classification via dependent textures is not used, and downloading a full RGBA volume instead of a single-channel volume is not desired due to the memory overhead incurred.

3.3 Fragment Shading

Building on the general discussion of section 3.1.2, this and the following two sections are devoted to a more detailed discussion of the fragment shading stage of the graphics pipeline, which of all the pipeline stages is the most important one for building a consumer hardware volume renderer. Although in section 3.1.2 texture fetch and fragment shading are still shown as two separate stages, we will now discuss texture fetching as part of overall fragment shading. The major reason for this is that consumer graphics hardware is rapidly moving toward a unified model, where a texture fetch is just another way of coloring fragments, in addition to performing other color operations. While on the GeForce architecture the two stages are still conceptually separate (at least under OpenGL, i.e., via the texture shader and register combiners extensions), the Radeon 8500 has already dropped this distinction entirely, and exposes the corresponding functionality through a single OpenGL extension, which is actually called fragment shader.

The terminology related to fragment shading and the corresponding stages of the graphics pipeline only began to change after the introduction of the first highly programmable graphics hardware architecture, i.e., the NVIDIA GeForce family. Before this, fragment shading was so simple that no general name for the corresponding operations was used. The traditional OpenGL model assumes a linearly interpolated primary color (the diffuse color) to be fed into the first texture unit, and subsequent units (if supported at all) to take their input from the immediately preceding unit. Optionally, after all the texture units, a second linearly interpolated color (the specular color) can be added in the color sum stage (if supported), followed by the application of fog. The shading pipeline just outlined is commonly known as the traditional OpenGL multi-texturing pipeline.

3.3.1 Traditional OpenGL Multi-Texturing

Before the advent of programmable fragment shading (see below), the prevalent model for shading fragments was the traditional OpenGL multi-texturing pipeline, which is depicted in figure 3.5.

Figure 3.5: The traditional OpenGL multi-texturing pipeline. Conceptually identical texture units (left) are cascaded up to the number of supported units (right).

The primary (or diffuse) color, which has been specified at the vertices and linearly interpolated over the interior of a triangle by the rasterizer, is the initial color input to the pipeline.
The pipeline itself consists of several texture units (corresponding to the maximum number of units supported and the number of enabled textures), each of which has exactly one external input (the color from the immediately preceding unit, or the initial fragment color in the case of unit zero) and one internal input (the color sampled from the corresponding texture). The texture environment of each unit (specified via glTexEnv*()) determines how the external and the internal color are combined. The combined color is then routed on to the next unit. After the last unit, a second linearly interpolated color can be added in a color sum stage (if GL_EXT_separate_specular_color is supported), followed by optional fog application. The output of this cascade of texture units and the color sum and fog stages becomes the shaded fragment color, i.e., the output of the "fragment shader."

Standard OpenGL supports only very simple texture environments, i.e., modes of color combination, such as multiplication and blending. For this reason, several extensions have been introduced that add more powerful operations, for example dot-product computation via GL_EXT_texture_env_dot3 (see section 3.6).

3.3.2 Programmable Fragment Shading

Although entirely sufficient only a few years ago, the OpenGL multi-texturing pipeline has a lot of drawbacks, is very inflexible, and cannot accommodate the capabilities of today's consumer graphics hardware. Most of all, colors cannot be routed arbitrarily, but are forced to be applied in a fixed order, and the number of available color combination operations is very limited. Furthermore, the color combination not only depends on the setting of the corresponding texture environment, but also on the internal format of the texture itself, which prevents using the same texture for radically different purposes, especially with respect to treating the RGB and alpha channels separately.

For these and other reasons, fragment shading is currently in the process of becoming programmable in its entirety. Starting with the original NVIDIA register combiners, which are comprised of a register-based execution model and programmable input and output routing and operations, the current trend is toward writing a fragment shader in an assembly language that is downloaded to the graphics hardware and executed for each fragment.

Figure 3.6: The register combiners unit bypasses the standard fragment shading stage (excluding texture fetch) of the graphics pipeline. See also figure 3.3.

The major problem of the current situation with respect to writing fragment shaders is that under OpenGL they are exposed via different (and highly incompatible) vendor-specific extensions. Thus, even this flexible, but still rather low-level, model of using an assembly language for writing these shaders will be substituted by a shading language similar to the C programming language in the upcoming OpenGL 2.0 [38].

3.4 NVIDIA Fragment Shading

The NVIDIA model for programmable fragment shading currently consists of a two-stage model comprised of the distinct stages of texture shaders and register combiners.
3.3.2 Programmable Fragment Shading

Although entirely sufficient only a few years ago, the OpenGL multi-texturing pipeline has a lot of drawbacks, is very inflexible, and cannot accommodate the capabilities of today's consumer graphics hardware. Most of all, colors cannot be routed arbitrarily, but are forced to be applied in a fixed order, and the number of available color combination operations is very limited. Furthermore, the color combination not only depends on the setting of the corresponding texture environment, but also on the internal format of the texture itself, which prevents using the same texture for radically different purposes, especially with respect to treating the RGB and alpha channels separately. For these and other reasons, fragment shading is currently in the process of becoming programmable in its entirety. Starting with the original NVIDIA register combiners, which are comprised of a register-based execution model and programmable input and output routing and operations, the current trend is toward writing a fragment shader in an assembly language that is downloaded to the graphics hardware and executed for each fragment.

Figure 3.6: The register combiners unit bypasses the standard fragment shading stage (excluding texture fetch) of the graphics pipeline. See also figure 3.3.

The major problem of the current situation with respect to writing fragment shaders is that under OpenGL they are exposed via different (and highly incompatible) vendor-specific extensions. Thus, even this flexible, but still rather low-level, model of using an assembly language for writing these shaders will be substituted by a shading language similar to the C programming language in the upcoming OpenGL 2.0 [38].

3.4 NVIDIA Fragment Shading

The NVIDIA model for programmable fragment shading is currently a two-stage model comprised of the distinct stages of texture shaders and register combiners. Texture shaders are the interface for programmable texture fetch operations, whereas register combiners can be used to read colors from a register file, perform color combination operations, and store the result back to the register file. A final combiner stage generates the fragment output, which is passed on to the fragment testing and alpha blending stage, and finally into the frame buffer. The texture registers of the register combiners register file are initialized by a texture shader before the register combiners stage is executed. Therefore, the result of color computations cannot be used in a dependent texture fetch. Dependent texturing on NVIDIA chips is exposed via a set of fixed-function texture shader operations.

3.4.1 Texture Shaders

The texture shader interface is exposed through three OpenGL extensions: GL_NV_texture_shader, GL_NV_texture_shader2, and GL_NV_texture_shader3, the latter of which is only supported on GeForce 4 cards. Analogously to the traditional OpenGL texture environments, each texture unit is assigned a texture shader, which determines the texture fetch operation executed by this unit. On GeForce 3 chips, one of 23 pre-defined texture shader programs can be selected for each texture shader, whereas the GeForce 4 offers 37 different such programs. An example for one of these texture shader programs would be dependent alpha-red texturing, where the texture unit for which it is selected takes the alpha and red outputs from a previous texture unit and uses these as 2D texture coordinates, thus performing a dependent texture fetch, i.e., a texture fetch operation that depends on the outcome of a fetch executed by another unit. The major drawback of the texture shaders model is that it requires one of several fixed-function programs to be used, instead of allowing arbitrary programmability.
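As a minimal sketch of the dependent alpha-red example (an illustration under the assumption of two hypothetical texture objects, coordinate_texture_name whose alpha and red channels hold lookup coordinates, and lookup_texture_name as the table being indexed), the texture shader operations might be selected like this:

// unit 0: ordinary 2D fetch; its alpha/red result drives unit 1
glActiveTextureARB( GL_TEXTURE0_ARB );
glBindTexture( GL_TEXTURE_2D, coordinate_texture_name );
glTexEnvi( GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_TEXTURE_2D );
// unit 1: dependent alpha-red fetch using unit 0's (alpha, red) as 2D coordinates
glActiveTextureARB( GL_TEXTURE1_ARB );
glBindTexture( GL_TEXTURE_2D, lookup_texture_name );
glTexEnvi( GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_DEPENDENT_AR_TEXTURE_2D_NV );
glTexEnvi( GL_TEXTURE_SHADER_NV, GL_PREVIOUS_TEXTURE_INPUT_NV, GL_TEXTURE0_ARB );
// activate the texture shader stage
glEnable( GL_TEXTURE_SHADER_NV );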
3.4.2 Register Combiners

After all texture fetch operations have been executed (either by standard OpenGL texturing, or using texture shaders), the register combiners mechanism can be used for flexible color combination operations, employing a register-based execution model. The register combiners interface is exposed through two OpenGL extensions: GL_NV_register_combiners and GL_NV_register_combiners2. Using the terminology of section 3.1.2, the standard fragment shading stage (excluding texture fetch) is bypassed by a register combiners unit, as illustrated in figure 3.6. This is in contrast to the traditional model of figure 3.3.

The three fundamental building blocks of the register combiners model are the register file, the general combiner stage (figure 3.8), and the final combiner stage (figure 3.7).

Figure 3.7: Register combiners final combiner stage.

Figure 3.8: Register combiners general combiner stage.

All stages operate on a single register file, which can be seen on the left-hand side of figure 3.7. Color combination operations are executed by a series of general combiner stages, reading colors from the register file, executing specified operations, and storing back into the register file. The input for the next general combiner is the register file as it has been modified by the previous stage. The operations that can be executed are component-wise multiplication, three-component dot-product, and multiplexing two inputs, i.e., conditionally selecting one of them depending on the alpha component of a specific register. Since the introduction of the GeForce 3, eight such general combiner stages are available, whereas on older architectures just two such stages were supported. Also, since the GeForce 3, four texture registers are available, as opposed to just two. After all enabled general combiner stages have been executed, a single final combiner stage generates the final fragment color, which is then passed on to fragment tests and alpha blending.

3.5 ATI Fragment Shading

In contrast to the NVIDIA approach, fragment shading on the Radeon 8500 uses a unified model that subsumes both texture fetch and color combination operations in a single fragment shader. The fragment shader interface is exposed through a single OpenGL extension: GL_ATI_fragment_shader. In order to facilitate flexible dependent texturing operations, colors and texture coordinates are conceptually identical, although colors are represented with significantly less precision and range. Still, fetching a texture can easily be done using a register, or the interpolated texture coordinates of a specified texture unit. On the Radeon 8500, the register file used by a fragment shader contains six RGBA registers (GL_REG_0_ATI to GL_REG_5_ATI), corresponding to this architecture's six texture units. Furthermore, two interpolated colors and eight constant RGBA registers (GL_CON_0_ATI to GL_CON_7_ATI) can be used to provide additional color input to a fragment shader. The execution model consists of this register file and eleven different instructions (note that all registers consist of four components, and thus all instructions in principle take all of them into account; e.g., the MUL instruction actually performs four simultaneous multiplications):

• MOV: Moves one register into another.
• ADD: Adds one register to another and stores the result in a third register.
• SUB: Subtracts one register from another and stores the result in a third register.
• MUL: Multiplies two registers component-wise and stores the result in a third register.
• MAD: Multiplies two registers component-wise, adds a third, and stores the result in a fourth register.
• LERP: Performs linear interpolation between two registers, getting interpolation weights from a third, and stores the result in a fourth register.
• DOT3: Performs a three-component dot-product, and stores the replicated result in a third register.
• DOT4: Performs a four-component dot-product, and stores the replicated result in a third register.
• DOT2_ADD: The same as DOT3, however the third component is assumed to be 1.0 and therefore not actually multiplied.
• CND: Moves one of two registers into a third, depending on whether the corresponding component in a fourth register is greater than 0.5.
• CND0: The same as CND, but the conditional is a comparison with 0.0.

The components of input registers to each of these instructions can be replicated, and the output can be masked for each component, which allows for flexible routing of color components. Scaling, bias, negation, complementation, and saturation (clamp against 0.0) are also supported. Furthermore, instructions are issued separately for RGB and alpha components, although a single pair of RGB and alpha instructions counts as a single instruction. An actual fragment shader consists of up to two times eight such instructions, where up to eight instructions are allowed in each of two stages. The first stage is only able to execute texture fetch operations using interpolated coordinates, whereas the second stage can use registers computed in the preceding stage as texture coordinates, thus allowing dependent fetch operations. These two stages allow for a single "loop-back," i.e., routing color components into texture coordinates once. Hence only a single level of dependent fetches is possible.

Fragment shaders are specified similarly to OpenGL texture objects. They are specified only once, and then reused as often as needed by simply binding a shader referenced by a unique integer id. Instructions are added (in order) to a fragment shader by using one OpenGL function call for the specification of a single instruction, after the initial creation of the shader.

In general, it can be said that the ATI fragment shader model is much easier to use than the NVIDIA extensions providing similar functionality, and also offers more flexibility with regard to dependent texture fetches. However, both models allow specific operations that the other is not able to do.
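For illustration, a minimal, self-contained sketch of a complete (if trivial) fragment shader that fetches texture unit 0 with its interpolated coordinates and multiplies the result with a constant color; the tint value is a hypothetical placeholder, and this is not the shader used by the renderer described later:

// create and define the shader object
GLuint shader_name = glGenFragmentShadersATI( 1 );
glBindFragmentShaderATI( shader_name );
glBeginFragmentShaderATI();
// first stage: sample texture unit 0 into register 0
glSampleMapATI( GL_REG_0_ATI, GL_TEXTURE0_ARB, GL_SWIZZLE_STR_ATI );
// arithmetic: REG_0 = REG_0 * CON_0 (RGB and alpha are issued separately)
glColorFragmentOp2ATI( GL_MUL_ATI, GL_REG_0_ATI, GL_NONE, GL_NONE,
                       GL_REG_0_ATI, GL_NONE, GL_NONE,
                       GL_CON_0_ATI, GL_NONE, GL_NONE );
glAlphaFragmentOp2ATI( GL_MUL_ATI, GL_REG_0_ATI, GL_NONE,
                       GL_REG_0_ATI, GL_NONE, GL_NONE,
                       GL_CON_0_ATI, GL_NONE, GL_NONE );
glEndFragmentShaderATI();

// set the constant and enable the shader; REG_0 becomes the fragment color
const GLfloat tint[4] = { 1.0f, 0.5f, 0.5f, 1.0f };   // hypothetical value
glSetFragmentShaderConstantATI( GL_CON_0_ATI, tint );
glEnable( GL_FRAGMENT_SHADER_ATI );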
3.6 Other OpenGL Extensions

This section briefly summarizes additional OpenGL extensions that are useful for hardware-accelerated volume rendering and that will be used in later chapters.

3.6.1 GL_EXT_blend_minmax

This extension augments the OpenGL alpha blending capabilities by minimum (GL_MIN_EXT) and maximum (GL_MAX_EXT) operators, which can be activated via the glBlendEquationEXT() function. When one of these special alpha blending modes is used, a fragment is combined with the previous contents of the frame buffer by taking the minimum, or the maximum value, respectively. For volume rendering, this capability is needed for maximum intensity projection (MIP), where pixels are set to the maximum density along a "ray."

3.6.2 GL_EXT_texture_env_dot3

Although more flexible and powerful functionality is exposed by the register combiners and fragment shader extensions, it is not always desired to incur the development overhead of specifying a full register combiner setup or fragment shader when only a simple per-fragment dot-product is needed. This extension extends the modes that can be used in the texture environment for combining the incoming color with the texture color by a simple three-component dot-product. This functionality is used in chapter 8.

3.6.3 GL_EXT_paletted_texture, GL_EXT_shared_texture_palette

In hardware-accelerated volume rendering, the volume itself is usually stored in texture maps with only a single channel. Basically, there are two OpenGL texture formats that are used for these textures, both of which consume one byte per voxel. First, a volume can be stored in intensity textures (GL_LUMINANCE as external, GL_INTENSITY8 as internal format). In this case, each voxel contains the original density value, which is subsequently mapped to RGBA values by post-classification (see chapter 17), or pre-integration (see chapter 19), for example. Second, a volume can be used for rendering with pre-classification (see chapter 17). In this case, storing the volume in an RGBA texture (GL_RGBA as external, GL_RGBA8 as internal format) would be possible. However, this consumes four times the texture memory that is actually necessary, since the mapping from density to RGBA can easily be performed by the hardware itself. In order to make this possible, paletted textures need to be supported via GL_EXT_paletted_texture. Using this extension, a texture is stored as 8-bit indexes into a color palette (GL_COLOR_INDEX as external, GL_COLOR_INDEX8_EXT as internal format). The palette itself consists of 256 entries of four bytes per entry (for RGBA). The GL_EXT_paletted_texture extension by itself needs a single palette for each individual texture, which must be downloaded via a glColorTableEXT() function call. However, in volume rendering with 2D slices (see section 5.2), all slice textures actually use the same palette. In order to share a single palette among multiple textures and download it only once, the GL_EXT_shared_texture_palette extension can be used. Using this extension, only a single palette need be downloaded with a glColorTableEXT() function call in conjunction with the GL_SHARED_TEXTURE_PALETTE_EXT parameter.

Acknowledgments

I would like to express a very special thank you to Christof Rezk-Salama for the diagrams and figures in this chapter, as well as most of section 3.1. Robert Kosara and Michael Kalkusch provided valuable comments and proof-reading. Thanks are also due to the VRVis Research Center for supporting the preparation of these course notes in the context of the basic research on visualization (http://www.VRVis.at/vis/). The VRVis Research Center is funded by an Austrian research program called K plus.

Texture-Based Methods

Sampling a Volume Via Texture Mapping

As illustrated in the introduction to these course notes, the most fundamental operation in volume rendering is sampling the volumetric data (section 2.2). Since these data are already discrete, the sampling task performed during rendering is actually a resampling task, i.e., resampling sampled volume data from one set of discrete locations to another. In order to render a high-quality image of the entire volume, these resampling locations have to be chosen carefully, followed by mapping the obtained values to optical properties, such as color and opacity, and compositing them in either back-to-front or front-to-back order. Ray-casting is probably the simplest approach for accomplishing this task (section 2.3.3). Because it casts rays from the eye through image plane pixels back into the volume, ray-casting is usually called an image-order approach.
That is, each ray is cast into the volume, which is then resampled at (usually equispaced) intervals along that ray. The values obtained via resampling are mapped to color and opacity, and composited in order along the ray (from the eye into the volume, or from behind the volume toward the eye) via alpha blending (section 2.3.4). Texture mapping operations basically perform a similar task, i.e., resampling a discrete grid of texels to obtain texture values at locations that do not coincide with the original grid. Thus, texture mapping in many ways is an ideal candidate for performing repetitive resampling tasks. Compositing individual samples can easily be done by exploiting hardware alpha blending (section 3.1.3). The major question with regard to hardware-accelerated volume rendering is how to achieve the same, or a sufficiently similar, result as compositing samples taken along a ray cast into the volume.

Figure 5.1: Rendering a volume by compositing a stack of 2D texture-mapped slices in back-to-front order. If the number of slices is too low, they become visible as artifacts.

The major way in which hardware texture mapping can be applied to volume rendering is to use an object-order approach, instead of the image-order approach of ray-casting. The resampling locations are generated by rendering proxy geometry with interpolated texture coordinates (usually comprised of slices rendered as texture-mapped quads), and compositing all the parts (slices) of this proxy geometry from back to front via alpha blending. The volume data itself is stored in one or several textures of two or three dimensions, respectively. For example, if only a density volume is required, it can be stored in a single 3D texture, where a single texel corresponds to a single voxel. Alternatively, volume data can be stored in a stack of 2D textures, each of which corresponds to an axis-aligned slice through the volume. By rendering geometry mapped with these textures, the original volume can be sampled at specific locations, blending the generated fragments with the previous contents of the frame buffer. Such an approach is called object-order, because the algorithm does not iterate over individual pixels of the image plane, but over parts of the "object," i.e., the volume itself. These parts are usually constituted by slices through the volume, and the final result for each pixel is only available after all slices contributing to this pixel have been processed.

5.1 Proxy Geometry

In all approaches that render volumetric data directly, i.e., without any geometry that has been extracted along certain features (e.g., polygons corresponding to an iso-surface, generated by a variant of the marching cubes algorithm [25]), there exists no geometry at all, at least not per se. However, geometry is the only thing graphics hardware with standard texture mapping capabilities is actually able to render. In this sense, all the fragments and ultimately pixels rendered by graphics hardware are generated by rasterizing geometric primitives, in most cases triangles. That is, sampling a texture has to take place in the interior of such primitives, specified by their vertices. When we think about the three-dimensional scalar field that constitutes our volume data, we can imagine placing geometry in this field.

Figure 5.2: Object-aligned slices used as proxy geometry with 2D texture mapping.
When this geometry is rendered, several attributes like texture coordinates are interpolated over the interior of primitives, and each fragment generated is assigned its corresponding set of texture coordinates. Subsequently, these coordinates can be used for resampling one or several textures at the corresponding locations. If we assign texture coordinates that correspond to the coordinates in the scalar field, and store the field itself in a texture map (or several texture maps), we can sample the field at arbitrary locations, as long as these are obtained from interpolated texture coordinates. The collective geometry used for obtaining all resampling locations needed for sampling the entire volume is commonly called proxy geometry, since it has no inherent relation to the data contained in the volume itself, and exists solely for the purpose of generating resampling locations, and subsequently sampling texture maps at these locations.

The conceptually simplest example of proxy geometry is a set of view-aligned slices (quads that are parallel to the viewport, usually also clipped against the bounding box of the volume, see figure 5.3), with 3D texture coordinates that are interpolated over the interior of these slices, and ultimately used to sample a single 3D texture map at the corresponding locations. However, 3D texture mapping is not supported by all of the graphics hardware we are targeting, and even on hardware that does support it, 3D textures incur a performance penalty in comparison to 2D textures. This penalty is mostly due to the tri-linear interpolation used when sampling a 3D texture map, as opposed to bi-linear interpolation for sampling a 2D texture map.

Figure 5.3: View-aligned slices used as proxy geometry with 3D texture mapping.

Figure 5.4: Switching the slice stack of object-aligned slices according to the viewing direction. Between image (C) and (D) the slice stack used for rendering has been switched.

One of the most important things to remember about proxy geometry is that it is intimately related to the kind of texture mapping (2D or 3D) used. When the orientation of slices with respect to the original volume data (i.e., the texture) can be arbitrary, 3D texture mapping is mandatory, since a single slice would have to fetch data from several different 2D textures. If, however, the proxy geometry is aligned with the original volume data, texture fetch operations for a single slice can be guaranteed to stay within the same 2D texture. In this case, the proxy geometry is comprised of a set of object-aligned slices (see figure 5.2), for which 2D texture mapping capabilities suffice. The following sections describe different kinds of proxy geometry and the corresponding resampling approaches in more detail.

5.2 2D-Textured Object-Aligned Slices

If only 2D texture mapping capabilities are used, the volume data must be stored in several two-dimensional texture maps. A major implication of the use of 2D textures is that the hardware is only able to resample two-dimensional subsets of the original volumetric data. The proxy geometry in this case is a stack of planar slices, all of which are required to be aligned with one of the major axes of the volume (either the x, y, or z axis), mapped with 2D textures, which in turn are resampled by the hardware-native bi-linear interpolation [8].
The reason for the requirement that slices be aligned with a major axis is that each time a slice is rendered, only two dimensions are available for texture coordinates, and the third coordinate must therefore be constant. Also, bi-linear interpolation would not be sufficient for resampling otherwise. Now, instead of being used as an actual texture coordinate, the third coordinate selects the texture to use from the stack of slices, and the other two coordinates become the actual 2D texture coordinates used for rendering the slice. Rendering proceeds from back to front, blending one slice on top of the other (see figure 5.2).

Although a single stack of 2D slices can store the entire volume, one slice stack does not suffice for rendering. When the viewpoint is rotated about the object, it would be possible to see between individual slices, which cannot be prevented with only one slice stack. The solution to this problem is to store three slice stacks, one for each of the major axes. During rendering, the stack with slices most parallel to the viewing direction is chosen (see figure 5.4).

Figure 5.5: The location of sampling points changes abruptly (C) when switching from one slice stack (A) to the next (B).

Under-sampling typically occurs most visibly along the major axis of the slice stack currently in use, which can be seen in figure 5.1. Additional artifacts become visible when the slice stack in use is switched from one stack to the next. The reason for this is that the actual locations of sampling points change abruptly when the stacks are switched, which is illustrated in figure 5.5. To summarize, an obvious drawback of using object-aligned 2D slices is the requirement for three slice stacks, which consume three times the texture memory a single 3D texture would consume.

When choosing a stack for rendering, an additional consideration must be taken into account: after selecting the slice stack, it must be rendered in one of two directions, in order to guarantee actual back-to-front rendering. That is, if a stack is viewed from the back (with respect to the stack itself), it has to be rendered in reversed order to achieve the desired result. The following code fragment shows how both of these decisions, depending on the current viewing direction with respect to the volume, could be implemented:

GLfloat model_view_matrix[16];
GLfloat model_view_rotation_matrix[16];

// obtain the current viewing transformation from the OpenGL state
glGetFloatv( GL_MODELVIEW_MATRIX, model_view_matrix );
// extract the rotation from the matrix
GetRotation( model_view_matrix, model_view_rotation_matrix );
// rotate the initial viewing direction
GLfloat view_vector[3] = { 0.0f, 0.0f, -1.0f };
MatVecMultiply( model_view_rotation_matrix, view_vector );
// find the largest absolute vector component
int max_component = FindAbsMaximum( view_vector );

// render slice stack according to viewing direction
switch ( max_component ) {
    case X:
        if ( view_vector[X] > 0.0f )
            DrawSliceStack_PositiveX();
        else
            DrawSliceStack_NegativeX();
        break;
    case Y:
        if ( view_vector[Y] > 0.0f )
            DrawSliceStack_PositiveY();
        else
            DrawSliceStack_NegativeY();
        break;
    case Z:
        if ( view_vector[Z] > 0.0f )
            DrawSliceStack_PositiveZ();
        else
            DrawSliceStack_NegativeZ();
        break;
}
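The DrawSliceStack_*() helpers above are application code that is not spelled out here; a minimal sketch of one of them might look as follows. It assumes num_slices_z pre-created 2D texture objects stored in volume_texture_names_stack_z[], slices spanning the unit cube, the slice index growing with the z coordinate, and that a view direction with negative z component therefore sees slices with larger z in front, so back-to-front order walks from slice 0 upward:

// minimal sketch under the assumptions stated above
void DrawSliceStack_NegativeZ( void )
{
    for ( int slice = 0; slice < num_slices_z; ++slice ) {
        float axis_pos_z = (float)slice / (float)( num_slices_z - 1 );
        glBindTexture( GL_TEXTURE_2D, volume_texture_names_stack_z[slice] );
        glBegin( GL_QUADS );
            glTexCoord2f( 0.0f, 0.0f ); glVertex3f( 0.0f, 0.0f, axis_pos_z );
            glTexCoord2f( 0.0f, 1.0f ); glVertex3f( 0.0f, 1.0f, axis_pos_z );
            glTexCoord2f( 1.0f, 1.0f ); glVertex3f( 1.0f, 1.0f, axis_pos_z );
            glTexCoord2f( 1.0f, 0.0f ); glVertex3f( 1.0f, 0.0f, axis_pos_z );
        glEnd();
    }
}

The mirrored variant for the opposite viewing direction simply traverses the loop in reverse order; the x and y stacks are handled analogously with their respective texture arrays.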
Opacity Correction

In hardware-accelerated texture-based volume rendering, hardware alpha blending is used to achieve the same effect as compositing samples along a ray in ray-casting. This alpha blending operation actually is a method for performing a numerical integration of the volume rendering integral (section 2.3.4). The distance between successive resampling locations along a "ray," i.e., the distance at which the integral is approximated by a summation, most of all depends on the distance between adjacent slices. The sampling distance is easiest to account for if it is constant for all "rays" (i.e., pixels). In this case, it can be incorporated into the numerical integration in a preprocess, which is usually done by simply adjusting the transfer function lookup table accordingly.

Figure 5.6: The distance between adjacent sampling points depends on the viewing angle.

In the case of 3D-textured slices (and orthogonal projection), the slice distance is equal to the sampling distance, which is also equal for all "rays" (i.e., pixels). Thus, it can be accounted for in a preprocess. When 2D-textured slices are used, however, the distance between successive samples for each pixel not only depends on the slice distance, but also on the viewing direction. This is shown in figure 5.6 for two adjacent slices. The sampling distance is only equal to the slice distance when the stack is viewed perpendicularly to its major axis. When the view is rotated, the sampling distance increases. For this reason, the lookup table for numerical integration (the transfer function table, see chapter 17) has to be updated on each change of the viewing direction. The correction of the transfer function in order to account for the viewing direction is usually done in an approximate manner, by simply multiplying the stored opacities by the reciprocal of the cosine between the viewing vector and the stack direction vector:

// determine cosine via dot-product; vectors must be normalized!
float correction_cosine = DotProduct3( view_vector, stack_vector );
// determine correction factor
float opacity_correction_factor =
    ( correction_cosine != 0.0f ) ? 1.0f / correction_cosine : 1.0f;

Note that although this correction factor is used for correcting opacity values, it must also be applied to the respective RGB colors, if these are stored as opacity-weighted colors, which usually is the case [42].
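How the factor is applied to the lookup table is not shown above. A minimal sketch, assuming a hypothetical 256-entry RGBA transfer function table stored as floats in [0,1] with opacity-weighted colors, could scale each entry before the table is (re)downloaded:

// hypothetical helper: apply the correction factor to an opacity-weighted
// RGBA transfer function table before downloading it again
void ApplyOpacityCorrection( float table[256][4], float opacity_correction_factor )
{
    for ( int i = 0; i < 256; ++i ) {
        for ( int c = 0; c < 4; ++c ) {
            // opacity-weighted colors: RGB and A are scaled alike
            table[i][c] *= opacity_correction_factor;
            if ( table[i][c] > 1.0f )
                table[i][c] = 1.0f;   // clamp to the representable range
        }
    }
}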
The major disadvantages of this approach are the high memory requirements, due to the three slice stacks that are required, and the restriction to using two-dimensional, i.e., usually bi-linear, interpolation for texture reconstruction. The use of object-aligned slice stacks also leads to sampling and stack switching artifacts, as well as inconsistent sampling rates for different viewing directions. A brief summary is contained in table 5.1. 5.3 2D Slice Interpolation Figure 5.1 shows a fundamental problem of using 2D texture-mapped slices as proxy geometry for volume rendering. In contrast to view-aligned 3D texture-mapped slices (section 5.4), the number of slices cannot be changed easily, because each slice corresponds to exactly one slice from the slice stack. Furthermore, no interpolation between slices is performed at all, since only bi-linear interpolation is used within each slice. Because of these two properties of that algorithm, artifacts can become visible when there are too few slices, and thus the sampling frequency is too low with respect to frequencies contained in the volume and the transfer function. In order to increase the sampling frequency without enlarging the volume itself (e.g., by generating additional interpolated slices before downloading them to the graphics hardware), inter-slice interpolation has to be performed on-the-fly by the graphics hardware itself. On the consumer hardware we are targeting (i.e., NVIDIA GeForce or later, and ATI Radeon 8500 or later), this can be achieved by using two simultaneous textures when rendering a single slice, instead of just one texture, and performing linear interpolation between these two textures [36]. In order to do this, we have to specify fractional slice positions, where the integers correspond to slices that actually exist in the source slice stack, and the fractional part determines the position between two adjacent slices. The number of rendered slices is now independent from the number of slices contained in the volume, and can be adjusted arbitrarily. For each slice to be rendered, two textures are activated, which correspond to the two 44 neighboring original slices from the source slice stack. The fractional position between these slices is used as weight for the inter-slice interpolation. This method actually performs trilinear interpolation within the volume. Standard bi-linear interpolation is employed for each of the two neighboring slices, and the interpolation between the two obtained results altogether achieves tri-linear interpolation. A register combiners setup for on-the-fly interpolation of intermediate slices can be seen in figure 5.7. The two source slices that enclose the position of the slice to be rendered are configured as texture 0 and texture 1, respectively. An interpolation value between 0.0 (corresponding to slice 0) and 1.0 (corresponding to slice 1), determines the weight for linear interpolation between these two textures, and is stored in a constant color register. The final fragment contains the linearly interpolated result corresponding to the specified fractional slice position. Discussion The biggest advantage of using object-aligned slices together with on-the-fly interpolation between two 2D textures for volume rendering is that this method combines the advantages of using only 2D textures with the capability of arbitrarily controlling the sampling rate, i.e., the number of slices. 
Discussion

The biggest advantage of using object-aligned slices together with on-the-fly interpolation between two 2D textures for volume rendering is that this method combines the advantages of using only 2D textures with the capability of arbitrarily controlling the sampling rate, i.e., the number of slices. Although not entirely equivalent to tri-linear interpolation in a 3D texture, the combination of bi-linear interpolation and a second linear interpolation step ultimately achieves tri-linear interpolation within the volume. The necessary features of consumer hardware, i.e., multi-texturing with at least two simultaneous textures and the ability to interpolate between them, are widely available on consumer graphics hardware. Disadvantages inherent to the use of object-aligned slice stacks still apply, though, for example the undesired visible effects when switching slice stacks, and the memory consumption of the three slice stacks. A brief summary is contained in table 5.2.

Table 5.2: Summary of 2D slice interpolation volume rendering.
Pros: high performance; tri-linear interpolation; available on consumer hardware.
Cons: high memory requirements; switching effects; inconsistent sampling rate for perspective projection.

5.4 3D-Textured View-Aligned Slices

In many respects, 3D-textured view-aligned slices are the simplest kind of proxy geometry (see figure 5.3). In this case, the volume is stored in a single 3D texture, and 3D texture coordinates are interpolated over the interior of proxy geometry polygons. These texture coordinates are then used directly for indexing the 3D texture map at the corresponding location, and thus resampling the volume.

Figure 5.8: Sampling locations on view-aligned slices for parallel (A) and perspective projection (B), respectively.

The big advantage of 3D texture mapping is that it allows slices to be oriented arbitrarily with respect to the 3D texture domain, i.e., the volume itself. Thus, it is natural to use slices aligned with the viewport, since such slices closely mimic the ray-casting algorithm. They offer constant distance between samples for orthogonal projection and all viewing directions, see figure 5.8(A). Since the graphics hardware is already performing completely general tri-linear interpolation within the volume for each resampling location, proxy slices are not bound to original slices at all. Thus, the number of slices can easily be adjusted on-the-fly and without any restrictions, or the need for separately configuring inter-slice interpolation. In the case of perspective projection, however, the distance between successive samples is different for adjacent pixels, which is depicted in figure 5.8(B). If the artifacts caused by a not entirely accurate compensation for sampling distance are deemed noticeable, spherical shells (section 5.5) can be employed instead of planar slices.

Table 5.3: Summary of 3D-texture based volume rendering.
Pros: high performance; tri-linear interpolation.
Cons: availability still limited; inconsistent sampling rate for perspective projection.

Discussion

The biggest advantage of using view-aligned slices and 3D textures for volume rendering is that tri-linear interpolation can be employed for resampling the volume at arbitrary locations. Apart from better image quality than with bi-linear interpolation, this allows slices to be rendered with arbitrary orientation with respect to the volume, which makes it possible to maintain a constant sampling rate for all pixels and viewing directions. Additionally, a single 3D texture suffices for storing the entire volume.
The major disadvantage of this approach is that it requires hardware-native support for 3D textures, which is not yet widely available. Tri-linear interpolation is also significantly slower than bi-linear interpolation, due to the requirement of fetching eight texels for every single output sample, and due to texture fetch patterns that decrease the efficiency of texture caches. A brief summary is contained in table 5.3.

5.5 3D-Textured Spherical Shells

All types of proxy geometry that use planar slices (irrespective of whether they are object-aligned or view-aligned) share the basic problem that the distance between successive samples used to determine the color of a single pixel is different from one pixel to the next in the case of perspective projection. This fact is illustrated in figure 5.8(B). When incorporating the sampling distance in the numerical approximation of the volume rendering integral, this pixel-to-pixel difference cannot easily be accounted for. One solution to this problem is to use spherical shells instead of planar slices [23]. In order to attain a constant sampling distance for all pixels, the proxy geometry has to be spherical, i.e., be comprised of concentric spheres, or parts thereof. In practice, these shells are generated by clipping tessellated spheres against both the viewing frustum and the bounding box of the volume data. The major drawback of using spherical shells as proxy geometry is that they are more complicated to set up than planar slice stacks, and they also require more geometry to be rendered, i.e., parts of tessellated spheres. This kind of proxy geometry is only useful when perspective projection is used, and can only be used in conjunction with 3D texture mapping. Furthermore, the artifacts caused by pixel-to-pixel differences in sampling distance are often hardly noticeable, and planar slice stacks usually suffice even when perspective projection is used.

5.6 Slices vs. Slabs

An inherent problem of using slices as proxy geometry is that the number of slices directly determines the (re)sampling frequency, and thus the quality of the rendered result. Especially when high frequencies are contained in the employed transfer functions, the required number of slices can become very high. Thus, even though the number of slices can be increased on-the-fly via interpolation done by the graphics hardware itself, the fill rate demands increase significantly. A very elegant solution to this problem is to use slabs instead of slices, together with pre-integrated classification [12], which is described in more detail in chapter 19. A slab is not a new geometric primitive, but simply the space between two adjacent slices. During rendering, this space is properly accounted for by looking up the pre-integrated result of the volume rendering integral from the back slice to the front slice in a lookup table, i.e., a texture, instead of simply rendering infinitesimally thin slices. Geometrically, a slab can be rendered as a slice with its immediately neighboring slice (either behind or in front of it) projected onto it. For details on rendering with slabs instead of slices, we refer you to chapter 19.

Components of a Hardware Volume Renderer

This chapter presents an overview of the major components of a texture-based hardware volume renderer from an implementation-centric point of view.
The goal of this chapter is to convey a feeling for where the individual components of such a renderer fit in and in what order they are executed, and to leave the details to later chapters. The component structure presented here is modeled after separate portions of code that can be found in an actual implementation of a volume renderer for consumer graphics cards. The components are listed in the order in which they are executed by the application code, which is not the same as the order in which they are "executed" by the graphics hardware itself!

6.1 Volume Data Representation

Volume data has to be stored in memory in a suitable format, usually already prepared for download to the graphics hardware as textures. Depending on the kind of proxy geometry used, the volume can either be stored in a single block, when view-aligned slices together with a single 3D texture are used, or split up into three stacks of 2D slices, when object-aligned slices together with multiple 2D textures are used. Usually, it is convenient to store the volume only in a single 3D array, which can be downloaded as a single 3D texture, and extract data for 2D textures on-the-fly, just as needed. Depending on the complexity of the rendering mode, classification, and illumination, there may even be several volumes containing all the information needed. Likewise, the actual storage format of voxels depends on the rendering mode and the type of volume, e.g., whether the volume stores densities, gradients, gradient magnitudes, and so on. Conceptually different volumes may also be combined into the same actual volume, if possible, for example by combining gradient and density data in RGBA voxels. Although it is often the case that the data representation issue is part of a preprocessing step, this is not necessarily so, since new data may have to be generated on-the-fly when the rendering mode or specific parameters are changed. This component is usually executed only once at startup, or only when the rendering mode changes.
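As an illustration of the single-3D-array convention, a minimal sketch of extracting one slice of the x stack into a contiguous buffer suitable for a 2D texture download; it assumes an 8-bit density volume of dimensions size_x by size_y by size_z stored with x varying fastest, then y, then z (all names here are hypothetical):

// copy the slice at a fixed x coordinate into a (y, z)-indexed 2D buffer
void ExtractSliceX( const unsigned char *volume_data_3d,
                    int size_x, int size_y, int size_z,
                    int x, unsigned char *slice_data_2d )
{
    for ( int z = 0; z < size_z; ++z )
        for ( int y = 0; y < size_y; ++y )
            slice_data_2d[ z * size_y + y ] =
                volume_data_3d[ ( z * size_y + y ) * size_x + x ];
}

Slices of the z stack are simply contiguous ranges of the 3D array, so they can be copied directly (or even downloaded in place); only the x and y stacks require a strided gather like the one above.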
6.2 Transfer Function Representation

Transfer functions are usually represented by color lookup tables. They can be one-dimensional or multi-dimensional, and are usually stored as simple arrays. This component is usually user-triggered, i.e., executed when the user changes the transfer function.

6.3 Volume Textures

In order for the graphics hardware to be able to access all the required volume information, the volume data must be downloaded and stored in textures. At this stage, a translation from data format (external texture format) to texture format (internal texture format) might take place, if the two are not identical. This component is usually executed only once at startup, or only when the rendering mode changes. How and what textures containing the actual volume data have to be downloaded to the graphics hardware depends on a number of factors, most of all the rendering mode and type of classification, and whether 2D or 3D textures are used. The following example code fragment downloads a single 3D texture for rendering with view-aligned slices. The internal format consists of 8-bit color indexes, and for this reason a color lookup table must also be downloaded subsequently (section 6.4):

// bind 3D texture target
glBindTexture( GL_TEXTURE_3D, volume_texture_name_3d );
glTexParameteri( GL_TEXTURE_3D, GL_TEXTURE_WRAP_S, GL_CLAMP );
glTexParameteri( GL_TEXTURE_3D, GL_TEXTURE_WRAP_T, GL_CLAMP );
glTexParameteri( GL_TEXTURE_3D, GL_TEXTURE_WRAP_R, GL_CLAMP );
glTexParameteri( GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR );
glTexParameteri( GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR );
// download 3D volume texture for pre-classification
glTexImage3D( GL_TEXTURE_3D, 0, GL_COLOR_INDEX8_EXT,
              size_x, size_y, size_z, 0,
              GL_COLOR_INDEX, GL_UNSIGNED_BYTE, volume_data_3d );

When pre-classification is not used, an intensity volume texture is usually downloaded instead, shown here for a single 3D texture (texture target binding identical to above):

// download 3D volume texture for post-classification/pre-integration
glTexImage3D( GL_TEXTURE_3D, 0, GL_INTENSITY8,
              size_x, size_y, size_z, 0,
              GL_LUMINANCE, GL_UNSIGNED_BYTE, volume_data_3d );

If 2D textures are used instead of 3D textures, similar commands have to be used in order to download all the slices of all three slice stacks.

6.4 Transfer Function Tables

Transfer functions may be downloaded to the hardware in basically one of two formats: in the case of pre-classification, transfer functions are downloaded as texture palettes for on-the-fly expansion of palette indexes to RGBA colors. If post-classification is used, transfer functions are downloaded as 1D, 2D, or even 3D textures (the latter two for multi-dimensional transfer functions). If pre-integration is used, the transfer function itself is not downloaded to the hardware; it is only used to calculate a pre-integration table, which is downloaded instead. This component might not be used at all, which is the case when the transfer function has already been applied to the volume textures themselves, and they are already in RGBA format. This component is usually only executed when the transfer function or rendering mode changes.

How and what transfer function tables have to be downloaded to the graphics hardware depends on the type of classification that is used. The following code fragment downloads a single texture palette that can be used in conjunction with an indexed volume texture for pre-classification. The same code can be used for rendering with either 2D or 3D slices, respectively:

// download color table (256 RGBA entries) for pre-classification
glColorTableEXT( GL_SHARED_TEXTURE_PALETTE_EXT, GL_RGBA8, 256,
                 GL_RGBA, GL_UNSIGNED_BYTE, opacity_corrected_palette );

If post-classification is used instead, the same transfer function table can be used, but it must be downloaded as a 1D texture instead of a texture palette:

// bind 1D texture target
glBindTexture( GL_TEXTURE_1D, palette_texture_name );
glTexParameteri( GL_TEXTURE_1D, GL_TEXTURE_WRAP_S, GL_CLAMP );
glTexParameteri( GL_TEXTURE_1D, GL_TEXTURE_MAG_FILTER, GL_LINEAR );
glTexParameteri( GL_TEXTURE_1D, GL_TEXTURE_MIN_FILTER, GL_LINEAR );
// download 1D transfer function texture for post-classification
glTexImage1D( GL_TEXTURE_1D, 0, GL_RGBA8, 256, 0,
              GL_RGBA, GL_UNSIGNED_BYTE, opacity_corrected_palette );

If pre-integration is used, a pre-integration texture is downloaded instead of the transfer function table itself (chapter 19).
6.5 Fragment Shader Configuration

Before the volume can be rendered using a specific rendering mode, the fragment shader has to be configured accordingly. How textures are stored and what they contain is crucial for the fragment shader. Likewise, the format of the shaded fragment has to correspond to what is expected by the alpha blending stage (section 6.6). This component is usually executed once per frame, i.e., the entire volume can be rendered with the same fragment shader configuration.

The code that determines the operation of the fragment shader is highly dependent on the actual hardware architecture used (section 3.3.2). The following code fragment roughly illustrates the sequence of operations on the GeForce architecture:

// configure texture shaders
glTexEnvi( GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, ... );
...
// enable texture shaders
glEnable( GL_TEXTURE_SHADER_NV );
// configure register combiners
glCombinerParameteriNV( GL_NUM_GENERAL_COMBINERS_NV, 1 );
glCombinerInputNV( GL_COMBINER0_NV, ... );
glCombinerOutputNV( GL_COMBINER0_NV, ... );
glFinalCombinerInputNV( ... );
...
// enable register combiners
glEnable( GL_REGISTER_COMBINERS_NV );

A "similar" code fragment for configuring a fragment shader on the Radeon 8500 architecture could look like this:

// configure fragment shader
GLuint shader_name = glGenFragmentShadersATI( 1 );
glBindFragmentShaderATI( shader_name );
glBeginFragmentShaderATI();
glSampleMapATI( GL_REG_0_ATI, GL_TEXTURE0_ARB, GL_SWIZZLE_STR_ATI );
glColorFragmentOp2ATI( GL_MUL_ATI, ... );
...
glEndFragmentShaderATI();
// enable fragment shader
glEnable( GL_FRAGMENT_SHADER_ATI );

6.6 Blending Mode Configuration

The blending mode determines how a fragment is combined with the corresponding pixel in the frame buffer. In addition to the configuration of alpha blending, we also configure alpha testing in this component, if it is needed for discarding fragments that do not correspond to the desired iso-surface. Although the alpha test and alpha blending are the last two steps that are actually executed by the graphics hardware in our volume rendering pipeline, they have to be configured before any geometry is actually rendered. This configuration usually stays the same for an entire frame. This component is usually executed once per frame, i.e., the entire volume can be rendered with the same blending mode configuration.

For direct volume rendering, the blending mode is more or less standard alpha blending. Since color values are usually pre-multiplied by the corresponding opacity (also known as opacity-weighted [42], or associated [7] colors), the factor for multiplication with the source color is one:

// enable blending
glEnable( GL_BLEND );
// set blend function
glBlendFunc( GL_ONE, GL_ONE_MINUS_SRC_ALPHA );

For non-polygonal iso-surfaces, alpha testing has to be configured for selection of fragments corresponding to the desired iso-values. The comparison operator for comparing a fragment's density value with the reference value is usually GL_GREATER or GL_LESS, since using GL_EQUAL is not well suited to producing a smooth surface appearance (not many interpolated density values are exactly equal to a given reference value). Alpha blending must be disabled for this rendering mode. More details about rendering non-polygonal iso-surfaces, especially with regard to illumination, can be found in chapter 8.
// disable blending
glDisable( GL_BLEND );
// enable alpha testing
glEnable( GL_ALPHA_TEST );
// configure alpha test function
glAlphaFunc( GL_GREATER, isovalue );

For maximum intensity projection, an alpha blending equation of GL_MAX_EXT must be supported, which is either a part of the imaging subset, or of the separate GL_EXT_blend_minmax extension. On consumer graphics hardware, querying for the latter extension is the best way to determine availability of the maximum operator.

// enable blending
glEnable( GL_BLEND );
// set blend function to identity (not really necessary)
glBlendFunc( GL_ONE, GL_ONE );
// set blend equation to max
glBlendEquationEXT( GL_MAX_EXT );

6.7 Texture Unit Configuration

The use of texture units corresponds to the inputs required by the fragment shader. Before rendering any geometry, the corresponding textures have to be bound. When 3D textures are used, the entire configuration of texture units usually stays the same for an entire frame. In the case of 2D textures, the textures that are bound change for each slice. This component is usually executed once per frame, or once per slice, depending on whether 3D or 2D textures are used. The following code fragment shows an example of configuring two texture units for interpolation of two neighboring 2D slices from the z slice stack (section 5.3):

// configure texture unit 1
glActiveTextureARB( GL_TEXTURE1_ARB );
glBindTexture( GL_TEXTURE_2D, volume_texture_names_stack_z[sliceid1] );
glEnable( GL_TEXTURE_2D );
// configure texture unit 0
glActiveTextureARB( GL_TEXTURE0_ARB );
glBindTexture( GL_TEXTURE_2D, volume_texture_names_stack_z[sliceid0] );
glEnable( GL_TEXTURE_2D );

6.8 Proxy Geometry Rendering

The last component of the execution sequence outlined in this chapter is getting the graphics hardware to render geometry. This is what actually causes the generation of fragments to be shaded and blended into the frame buffer, after resampling the volume data accordingly. This component is executed once per slice, irrespective of whether 3D or 2D textures are used.

Explicit texture coordinates are usually only specified when rendering 2D texture-mapped, object-aligned slices. In the case of view-aligned slices, texture coordinates can easily be generated automatically by exploiting OpenGL's texture coordinate generation mechanism, which has to be configured before the actual geometry is rendered:

// configure texture coordinate generation for view-aligned slices;
// coordinates are derived from object-space vertex positions, so the
// GL_OBJECT_LINEAR mode must be selected for the object planes to be used
float plane_x[] = { 1.0f, 0.0f, 0.0f, 0.0f };
float plane_y[] = { 0.0f, 1.0f, 0.0f, 0.0f };
float plane_z[] = { 0.0f, 0.0f, 1.0f, 0.0f };
glTexGeni( GL_S, GL_TEXTURE_GEN_MODE, GL_OBJECT_LINEAR );
glTexGeni( GL_T, GL_TEXTURE_GEN_MODE, GL_OBJECT_LINEAR );
glTexGeni( GL_R, GL_TEXTURE_GEN_MODE, GL_OBJECT_LINEAR );
glTexGenfv( GL_S, GL_OBJECT_PLANE, plane_x );
glTexGenfv( GL_T, GL_OBJECT_PLANE, plane_y );
glTexGenfv( GL_R, GL_OBJECT_PLANE, plane_z );
glEnable( GL_TEXTURE_GEN_S );
glEnable( GL_TEXTURE_GEN_T );
glEnable( GL_TEXTURE_GEN_R );

The following code fragment shows an example of rendering a single slice as an OpenGL quad.
Texture coordinates are specified explicitly, since this code fragment is intended for rendering a slice from a stack of object-aligned slices with z as its major axis:

// render a single slice as a quad (four vertices)
glBegin( GL_QUADS );
    glTexCoord2f( 0.0f, 0.0f ); glVertex3f( 0.0f, 0.0f, axis_pos_z );
    glTexCoord2f( 0.0f, 1.0f ); glVertex3f( 0.0f, 1.0f, axis_pos_z );
    glTexCoord2f( 1.0f, 1.0f ); glVertex3f( 1.0f, 1.0f, axis_pos_z );
    glTexCoord2f( 1.0f, 0.0f ); glVertex3f( 1.0f, 0.0f, axis_pos_z );
glEnd();

Vertex coordinates are specified in object space and transformed to view space using the modelview matrix. In the case of view-aligned slices with texture coordinate generation, all the glTexCoord2f() commands can simply be left out. If multi-texturing is used, a simple vertex program can be exploited to generate the texture coordinates for the additional units, instead of downloading the same texture coordinates to multiple units. On the Radeon 8500 it is also possible to use the texture coordinates from unit zero for texture fetch operations at any of the other units, which solves the problem of duplicate texture coordinates in a very simple way, without requiring a vertex shader or wasting bandwidth.

Acknowledgments

I would like to express a very special thank you to Christof Rezk-Salama for the diagrams and figures in this chapter. Berk Özer provided valuable comments and proof-reading. Thanks are also due to the VRVis Research Center for supporting the preparation of these course notes in the context of the basic research on visualization (http://www.VRVis.at/vis/). The VRVis Research Center is funded by an Austrian research program called K plus.

Illumination Techniques

Local Illumination

Local illumination models allow the approximation of the light intensity reflected from a point on the surface of an object. This intensity is evaluated as a function of the (local) orientation of the surface with respect to the position of a point light source and of some material properties. In comparison to global illumination models, indirect light, shadows, and caustics are not taken into account. Local illumination models are simple and easy to evaluate, and do not require the computational complexity of global illumination. The most popular local illumination model is the Phong model [35, 5], which computes the lighting as a linear combination of three different terms, an ambient, a diffuse and a specular term,

I_Phong = I_ambient + I_diffuse + I_specular.

Ambient illumination is modeled by a constant term, I_ambient = k_a = const. Without the ambient term, parts of the geometry that are not directly lit would be completely black. In the real world, such indirect illumination effects are caused by light intensity which is reflected from other surfaces.

Diffuse reflection refers to light which is reflected with equal intensity in all directions (Lambertian reflection). The brightness of a dull, matte surface is independent of the viewing direction and depends only on the angle of incidence ϕ between the direction l of the light source and the surface normal n. The diffuse illumination term is written as

I_diffuse = I_p k_d cos ϕ = I_p k_d (l · n).

I_p is the intensity emitted from the light source. The surface property k_d is a constant between 0 and 1 specifying the amount of diffuse reflection as a material-specific constant.
Specular reflection is exhibited by every shiny surface and causes so-called highlights. The specular lighting term incorporates the vector v that runs from the object to the viewer's eye into the lighting computation. Light is reflected in the direction of reflection r, which is the direction of the light l mirrored about the surface normal n. For efficiency, the reflection vector r can be replaced by the halfway vector h,

I_specular = I_p k_s cos^n α = I_p k_s (h · n)^n.

The material property k_s determines the amount of specular reflection. The exponent n is called the shininess of the surface and is used to control the size of the highlights.

Gradient Estimation

The Phong illumination model uses the normal vector to describe the local shape of an object and is primarily used for lighting of polygonal surfaces. To include the Phong illumination model in direct volume rendering, the local shape of the volumetric data set must be described by an appropriate type of vector. For scalar fields, the gradient vector is an appropriate substitute for the surface normal, as it represents the normal vector of the isosurface for each point. The gradient vector is the first-order derivative of a scalar field I(x, y, z), defined as

∇I = (I_x, I_y, I_z) = (∂I/∂x, ∂I/∂y, ∂I/∂z),    (9.1)

using the partial derivatives of I in x-, y- and z-direction, respectively. The scalar magnitude of the gradient measures the local variation of intensity quantitatively. It is computed as the absolute value (length) of the vector,

||∇I|| = sqrt(I_x^2 + I_y^2 + I_z^2).    (9.2)

For illumination purposes, only the direction of the gradient vector is of interest. There are several approaches to estimate the directional derivatives for discrete voxel data. One common technique, based on the first terms of a Taylor expansion, is the central differences method. According to this, the directional derivative in x-direction is calculated as

I_x(x, y, z) = I(x + 1, y, z) − I(x − 1, y, z),  with x, y, z ∈ IN.    (9.3)

Derivatives in the other directions are computed analogously. Central differences are usually the method of choice for gradient pre-computation. There also exist some gradientless shading techniques which do not require explicit knowledge of the gradient vectors. Such techniques usually approximate the dot product with the light direction by a forward difference in the direction of the light source.
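A minimal sketch of such a gradient pre-computation pass is given below. It is plain C under stated assumptions (8-bit density volume, x varying fastest, clamping of neighbor indices at the borders, sqrt() from the C math library), and it already packs the result in the scaled-and-biased RGBA layout described in the next chapter; it is an illustration, not the exact code of the implementation discussed in these notes:

// pre-compute gradients with central differences and pack them into an
// RGBA volume: gradient scaled and biased into [0,255] in RGB, density in A
void ComputeGradientVolume( const unsigned char *density, unsigned char *rgba,
                            int size_x, int size_y, int size_z )
{
    #define VOXEL( x, y, z )  density[ ((z) * size_y + (y)) * size_x + (x) ]
    for ( int z = 0; z < size_z; ++z )
    for ( int y = 0; y < size_y; ++y )
    for ( int x = 0; x < size_x; ++x ) {
        // clamp neighbor indices at the volume borders
        int xp = ( x < size_x - 1 ) ? x + 1 : x;   int xm = ( x > 0 ) ? x - 1 : x;
        int yp = ( y < size_y - 1 ) ? y + 1 : y;   int ym = ( y > 0 ) ? y - 1 : y;
        int zp = ( z < size_z - 1 ) ? z + 1 : z;   int zm = ( z > 0 ) ? z - 1 : z;
        // central differences (equation 9.3)
        float gx = (float)VOXEL( xp, y, z ) - (float)VOXEL( xm, y, z );
        float gy = (float)VOXEL( x, yp, z ) - (float)VOXEL( x, ym, z );
        float gz = (float)VOXEL( x, y, zp ) - (float)VOXEL( x, y, zm );
        // normalize; keep a zero vector in homogeneous regions
        float len = (float)sqrt( gx * gx + gy * gy + gz * gz );
        if ( len > 0.0f ) { gx /= len; gy /= len; gz /= len; }
        // scale and bias from [-1,1] to [0,255] and pack as RGBA
        unsigned char *out = &rgba[ 4 * ( ( z * size_y + y ) * size_x + x ) ];
        out[0] = (unsigned char)( ( gx * 0.5f + 0.5f ) * 255.0f );
        out[1] = (unsigned char)( ( gy * 0.5f + 0.5f ) * 255.0f );
        out[2] = (unsigned char)( ( gz * 0.5f + 0.5f ) * 255.0f );
        out[3] = VOXEL( x, y, z );
    }
    #undef VOXEL
}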
In this case, the number of slices must be increased considerably to obtain satisfying images. Alternatively, the alpha test can be set up to check for GL_GREATER or GL_LESS instead of GL_EQUAL, allowing a considerable reduction of the sampling rate.

Figure 10.1: Non-polygonal isosurface without illumination (left), with diffuse illumination (middle) and with specular light (right).

glDisable(GL_BLEND);               // Disable alpha blending
glEnable(GL_ALPHA_TEST);           // Enable alpha test for isosurface
glAlphaFunc(GL_EQUAL, fIsoValue);

What is still missing is the calculation of the Phong illumination model. Current graphics hardware provides functionality for dot product computation in the texture application step, which is performed during rasterization. Several different OpenGL extensions have been proposed by different manufacturers, two of which will be outlined in the following. The original implementation of non-polygonal isosurfaces was presented by Westermann and Ertl [41]. The algorithm was expanded to volume shading by Meissner et al. [29]. Efficient implementations on PC hardware are described in [36].

Per-Pixel Illumination

The integration of the Phong illumination model into a single-pass volume rendering procedure requires a mechanism that allows the computation of dot products and component-wise products in hardware. This mechanism is provided by the pixel-shader functionality of modern consumer graphics boards. For each voxel, the x-, y- and z-components of the (normalized) gradient vector are pre-computed and stored as color components in an RGB texture. The dot product calculations are directly performed within the texture unit during rasterization. A simple mechanism that supports dot product calculation is provided by the standard OpenGL extension EXT_texture_env_dot3. This extension to the OpenGL texture environment defines a new way to combine the color and texture values during texture application. As shown in the code sample, the extension is activated by setting the texture environment mode to GL_COMBINE_EXT. The dot product computation must be enabled by selecting GL_DOT3_RGB_EXT as combination mode. In the sample code, the color values (GL_SRC_COLOR) of the primary color and the texel color are used as arguments.

#if defined GL_EXT_texture_env_dot3
// enable the extension
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_COMBINE_EXT);
// preserve the alpha value
glTexEnvi(GL_TEXTURE_ENV, GL_COMBINE_ALPHA_EXT, GL_REPLACE);
// enable dot product computation
glTexEnvi(GL_TEXTURE_ENV, GL_COMBINE_RGB_EXT, GL_DOT3_RGB_EXT);
// first argument: light direction stored in primary color
glTexEnvi(GL_TEXTURE_ENV, GL_SOURCE0_RGB_EXT, GL_PRIMARY_COLOR_EXT);
glTexEnvi(GL_TEXTURE_ENV, GL_OPERAND0_RGB_EXT, GL_SRC_COLOR);
// second argument: voxel gradient stored in RGB texture
glTexEnvi(GL_TEXTURE_ENV, GL_SOURCE1_RGB_EXT, GL_TEXTURE);
glTexEnvi(GL_TEXTURE_ENV, GL_OPERAND1_RGB_EXT, GL_SRC_COLOR);
#endif

This simple implementation accounts neither for the specular illumination term nor for multiple light sources. More flexible illumination effects with multiple light sources can be achieved using the NVIDIA register combiners or similar extensions.

Advanced Per-Pixel Illumination

The drawback of the simple implementation described in the previous section is its restriction to a single diffuse light source. Current rasterization hardware, however, allows the computation of the diffuse and specular terms of multiple light sources.
To access these features, hardware-specific OpenGL extensions such as NVIDIA's register combiners or ATI's fragment shaders are required. Examples of such more flexible implementations are illustrated using the NVIDIA register combiners extension. The combiner setup for diffuse illumination with two independent light sources is displayed in Figure 12.1.

Figure 12.1: NVIDIA register combiner setup for diffuse illumination with two independent light sources. The gradient is read from texture 0 and the intensity from texture 1; the light directions are stored in constant colors 0 and 1, and the light source colors in the primary and secondary color registers.

Figure 12.2: NVIDIA register combiner setup for diffuse and specular illumination. The additional sum (+) is achieved using the spare0 and secondary color registers of the final combiner stage.

Activating additional combiner stages also allows the computation of the diffuse terms for more than two light sources. Specular and diffuse illumination can be achieved using the register combiner setup displayed in Figure 12.2. Both implementations assume that the pre-computed gradient and the emission/absorption coefficients are kept in separate textures. Texture 0 stores the normalized gradient vectors. The emission and absorption values are generated from the original intensity values stored in texture 1 by a color table lookup. All methods can alternatively be implemented using ATI's OpenGL extensions.

Figure 12.3: CT data of a human hand without illumination (A), with diffuse illumination (B) and with specular illumination (C). Non-polygonal isosurfaces with diffuse (D), specular (E) and diffuse and specular (F) illumination.

Reflection Maps

If the illumination computation becomes too complex for on-the-fly computation, alternative lighting techniques such as reflection mapping come into play. The idea of reflection mapping originates from 3D computer games and represents a method to pre-compute complex illumination scenarios. The usefulness of this approach derives from its ability to realize local illumination with an arbitrary number of light sources and different illumination parameters at low computational cost. A reflection map caches the incident illumination from all directions at a single point in space. The idea of reflection mapping was first suggested by Blinn [6]. The term environment mapping was coined by Greene [14] in 1986. Closely related to the diffuse and specular terms of the Phong illumination model, reflection mapping can be performed with diffuse maps or reflective environment maps. The indices into a diffuse reflection map are directly computed from the normal vector, whereas the coordinates for an environment map are a function of both the normal vector and the viewing direction. Reflection maps in general assume that the illuminated object is small with respect to the environment that contains it.

Figure 13.1: Example of an environment cube map.
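The following fragment is a hedged C sketch of a register combiner configuration for two diffuse light sources in the spirit of Figure 12.1. It uses only entry points and tokens from NV_register_combiners, but the exact wiring (in particular the final combiner) is an illustrative assumption and differs in detail from the figure; the light direction arrays are placeholders and must be range-compressed to [0, 1] before upload.

// assumed: float lightDir1[4], lightDir2[4] hold range-compressed directions
glEnable(GL_REGISTER_COMBINERS_NV);
glCombinerParameteriNV(GL_NUM_GENERAL_COMBINERS_NV, 2);
glCombinerParameterfvNV(GL_CONSTANT_COLOR0_NV, lightDir1);
glCombinerParameterfvNV(GL_CONSTANT_COLOR1_NV, lightDir2);

// combiner 0: two dot products in parallel
// spare0 = gradient (texture 0) . light direction 1 (constant color 0)
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_A_NV, GL_TEXTURE0_ARB,        GL_EXPAND_NORMAL_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_B_NV, GL_CONSTANT_COLOR0_NV,  GL_EXPAND_NORMAL_NV, GL_RGB);
// spare1 = gradient . light direction 2 (constant color 1)
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_C_NV, GL_TEXTURE0_ARB,        GL_EXPAND_NORMAL_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_D_NV, GL_CONSTANT_COLOR1_NV,  GL_EXPAND_NORMAL_NV, GL_RGB);
glCombinerOutputNV(GL_COMBINER0_NV, GL_RGB, GL_SPARE0_NV, GL_SPARE1_NV, GL_DISCARD_NV,
                   GL_NONE, GL_NONE, GL_TRUE, GL_TRUE, GL_FALSE);

// combiner 1: weight each (clamped) dot product by its light color and sum
// spare0 = spare0 * primary color + spare1 * secondary color
glCombinerInputNV(GL_COMBINER1_NV, GL_RGB, GL_VARIABLE_A_NV, GL_SPARE0_NV,          GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER1_NV, GL_RGB, GL_VARIABLE_B_NV, GL_PRIMARY_COLOR_NV,   GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER1_NV, GL_RGB, GL_VARIABLE_C_NV, GL_SPARE1_NV,          GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER1_NV, GL_RGB, GL_VARIABLE_D_NV, GL_SECONDARY_COLOR_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glCombinerOutputNV(GL_COMBINER1_NV, GL_RGB, GL_DISCARD_NV, GL_DISCARD_NV, GL_SPARE0_NV,
                   GL_NONE, GL_NONE, GL_FALSE, GL_FALSE, GL_FALSE);

// final combiner: modulate the summed diffuse term with the texture-1 color
// (emission/absorption from the lookup): out = spare0 * texture1
glFinalCombinerInputNV(GL_VARIABLE_A_NV, GL_SPARE0_NV,    GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glFinalCombinerInputNV(GL_VARIABLE_B_NV, GL_TEXTURE1_ARB, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glFinalCombinerInputNV(GL_VARIABLE_C_NV, GL_ZERO,         GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glFinalCombinerInputNV(GL_VARIABLE_D_NV, GL_ZERO,         GL_UNSIGNED_IDENTITY_NV, GL_RGB);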
A special parameterization of the normal direction is used in order to construct a cube map, as displayed in Figure 13.1. In this case the environment is projected onto the six sides of a surrounding cube. The largest component of the reflection vector indicates the appropriate side of the cube, and the remaining vector components are used as coordinates for the corresponding texture map. Cubic mapping is popular because the required reflection maps can easily be constructed using conventional rendering systems and photography.

The implementation of cubic diffuse and reflective environment maps can be accomplished using the OpenGL extension GL_NV_texture_shader. The setup is displayed in the following code sample. Four texture units are involved in this configuration. Texture 0 is a 3D texture which contains the pre-computed gradient vectors. In texture unit 0 a normal vector is interpolated from this texture. Since the reflection map is generated in world coordinate space, accurate application of a normal map requires accounting for the local transformation represented by the current modeling matrix. For reflective maps the viewing direction must also be taken into account. In the OpenGL extension, the local 3 × 3 modeling matrix and the camera position are specified as texture coordinates for texture units 1, 2 and 3. From this information the GPU constructs the viewing direction and valid normal vectors in world coordinates in texture unit 1. The diffuse and the reflective cube maps are applied in texture unit 2 and texture unit 3, respectively. As a result, the texture registers 2 and 3 contain the appropriately sampled diffuse and reflective environment maps. These values are finally combined to form the final color of the fragment using the register combiner extension.

Figure 13.2: Isosurface of the engine block with diffuse reflection map (left) and specular environment map (right).
#if defined GL_NV_texture_shader
// texture unit 0 - sample normal vector from 3D texture
glActiveTextureARB(GL_TEXTURE0_ARB);
glEnable(GL_TEXTURE_3D_EXT);
glEnable(GL_TEXTURE_SHADER_NV);
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_TEXTURE_3D);

// texture unit 1 - dot product computation
glActiveTextureARB(GL_TEXTURE1_ARB);
glEnable(GL_TEXTURE_SHADER_NV);
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_DOT_PRODUCT_NV);
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_PREVIOUS_TEXTURE_INPUT_NV, GL_TEXTURE0_ARB);
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_RGBA_UNSIGNED_DOT_PRODUCT_MAPPING_NV, GL_EXPAND_NORMAL_NV);

// texture unit 2 - diffuse cube map
glActiveTextureARB(GL_TEXTURE2_ARB);
glEnable(GL_TEXTURE_SHADER_NV);
glBindTexture(GL_TEXTURE_CUBE_MAP_EXT, m_nDiffuseCubeMapTexName);
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_DOT_PRODUCT_DIFFUSE_CUBE_MAP_NV);
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_PREVIOUS_TEXTURE_INPUT_NV, GL_TEXTURE0_ARB);
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_RGBA_UNSIGNED_DOT_PRODUCT_MAPPING_NV, GL_EXPAND_NORMAL_NV);

// texture unit 3 - reflective cube map
glActiveTextureARB(GL_TEXTURE3_ARB);
glEnable(GL_TEXTURE_CUBE_MAP_EXT);
glBindTexture(GL_TEXTURE_CUBE_MAP_EXT, m_nReflectiveCubeMapTexName);
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_DOT_PRODUCT_REFLECT_CUBE_MAP_NV);
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_PREVIOUS_TEXTURE_INPUT_NV, GL_TEXTURE0_ARB);
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_RGBA_UNSIGNED_DOT_PRODUCT_MAPPING_NV, GL_EXPAND_NORMAL_NV);
#endif

Classification

Introduction

The role of the transfer function in direct volume rendering is essential. Its job is to assign optical properties to more abstract data values. It is these optical properties that we use to render a meaningful image. While the process of transforming data values into optical properties is simply implemented as a table lookup, specifying a good transfer function can be a very difficult task. In this section we will identify and explain the optical properties used in the traditional volume rendering pipeline, explore the use of shadows in volume rendering, demonstrate the utility of an expanded transfer function, and discuss the process of setting a good transfer function.

Figure 14.1: The Bonsai Tree CT. Volume shading with an extended transfer function, described clockwise from top. Upper-left: Surface shading. Upper-right: Direct attenuation, or volume shadows. Lower-right: Direct and indirect lighting. Lower-left: Direct and indirect lighting with surface shading only on the leaves.

Transfer Functions

Evaluating a transfer function using graphics hardware effectively amounts to an arbitrary function evaluation of a data value via a table lookup. There are two methods to accomplish this. The first method uses glColorTable() to store a user-defined 1D lookup table, which encodes the transfer function. When GL_COLOR_TABLE is enabled, this function replaces an 8-bit texel with the RGBA components at that 8-bit value's position in the lookup table. Some high-end graphics cards permit lookups based on 12-bit texels. On some commodity graphics cards, such as the NVIDIA GeForce, the color table is an extension known as paletted texture.
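The modeling matrix and camera position mentioned above enter this setup through the texture coordinates of units 1 to 3. The following lines are a hedged per-vertex sketch under the assumption that the rows of the local 3 × 3 matrix go into the (s, t, r) components and the eye position is distributed over the q components of the three dot-product stages; the variable names m, eyePos, texCoord and vertex are placeholders.

// rows of the local 3x3 modeling matrix in (s,t,r); the q components of
// units 1-3 together provide the eye position for the reflect stage
glMultiTexCoord4fARB(GL_TEXTURE1_ARB, m[0][0], m[0][1], m[0][2], eyePos[0]);
glMultiTexCoord4fARB(GL_TEXTURE2_ARB, m[1][0], m[1][1], m[1][2], eyePos[1]);
glMultiTexCoord4fARB(GL_TEXTURE3_ARB, m[2][0], m[2][1], m[2][2], eyePos[2]);
// unit 0 receives the 3D texture coordinate of the slice vertex
glMultiTexCoord3fARB(GL_TEXTURE0_ARB, texCoord[0], texCoord[1], texCoord[2]);
glVertex3fv(vertex);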
On these platforms, the use of the color table requires that the data texture have an internal format of GL_COLOR_INDEX*_EXT, where * is the number of bits of precision that the data texture will have (1, 2, 4, or 8). Other platforms may require that the data texture's internal format be GL_INTENSITY8.

The second method uses dependent texture reads. A dependent texture read is the process by which the color components from one texture are converted to texture coordinates and used to read from a second texture. In volume rendering, the first texture is the data texture and the second is the transfer function. The GL extensions and function calls that enable this feature vary depending on the hardware, but the functionality is equivalent. On the GeForce3 and GeForce4, this functionality is part of the Texture Shader extensions. On the ATI Radeon 8500, dependent texture reads are part of the Fragment Shader extension. While dependent texture reads can be slower than using a color table, they are much more flexible. Dependent texture reads can be used to evaluate multi-dimensional transfer functions, discussed later in this chapter, or they can be used for pre-integrated transfer function evaluations, discussed in the next chapter. Since the transfer function can be stored as a regular texture, dependent texture reads also permit transfer functions which define more than four optical properties.

Figure 15.1: The action of the transfer function.

Figure 15.2: Pre-classification (left) versus post-classification (right).

Why do we need a transfer function at all, i.e., why not store the optical properties in the volume directly? There are at least two answers to this question. First, it is inefficient to update the entire volume and reload it each time the transfer function changes. It is much faster to load the smaller lookup table and let the hardware handle the transformation from data value to optical properties. Second, evaluating the transfer function at each sample prior to interpolation is referred to as pre-classification. Pre-classification can cause significant artifacts in the final rendering, especially when there is a sharp peak in the transfer function. An example of pre-classification can be seen on the left side of Figure 15.2. A similar rendering using post-classification is seen on the right. It should be no surprise that interpolating colors from the volume sample points does not adequately capture the behavior of the data.

In the traditional volume rendering pipeline, the transfer function returns color (RGB) and opacity (α). User interfaces for transfer function specification will be discussed later in this chapter. Figure 15.3 shows an example of an arbitrary transfer function. While this figure shows RGBα varying as piecewise linear ramps, the transfer function can also be created using more continuous segments. The goal in specifying a transfer function is to isolate the ranges of data values, in the transfer function domain, that correspond to features in the spatial domain. Figure 15.4 shows an example transfer function that isolates the bone in the Visible Male's skull. On the left, we see the transfer function. The alpha ramp is responsible for making the bone visible, whereas the color is constant for all of the bone. The problem with this type of visualization is that the shape and structure is

Figure 15.3: An arbitrary transfer function showing how red, green, blue, and alpha vary as a function of data value f(x, y, z).
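As a hedged sketch of the first method, the following fragment uploads a 256-entry RGBA transfer function as a shared texture palette using EXT_paletted_texture / EXT_shared_texture_palette; the palette array and the availability of both extensions are assumptions.

#if defined GL_EXT_paletted_texture && defined GL_EXT_shared_texture_palette
GLubyte palette[256 * 4];        /* RGBA transfer function entries        */
/* ... fill palette[] from the transfer function editor ...               */
glColorTableEXT(GL_SHARED_TEXTURE_PALETTE_EXT, GL_RGBA8, 256,
                GL_RGBA, GL_UNSIGNED_BYTE, palette);
glEnable(GL_SHARED_TEXTURE_PALETTE_EXT);
/* the data texture itself is uploaded with internal format
   GL_COLOR_INDEX8_EXT, so that each texel is a palette index              */
#endif

Changing the transfer function then only requires re-uploading the 256-entry palette rather than the entire volume.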
not readily visible, as seen on the right side of Figure 15.4.

Figure 15.4: An example transfer function for the bone of the Visible Male (left), and the resulting rendering (right).

One solution to this problem involves a simple modification of the transfer function, called faux shading. By forcing the color to ramp to black proportionally to the alpha ramping to zero, we can effectively create silhouette edges in the resulting volume rendering, as seen in Figure 15.5. On the left we see the modified transfer function. In the center, we see the resulting volume rendered image. Notice how much clearer the features are in this image. This approach works because the darker colors are only applied at low opacities. This means that they will only accumulate enough to be visible when a viewing ray grazes a classified feature, as seen on the right side of Figure 15.5. While this approach may not produce images as compelling as surface shaded or shadowed renderings, as seen in Figure 15.6, it is advantageous because it doesn't require any extra computation in the rendering phase.

Figure 15.5: Faux shading. Modify the transfer function to create silhouette edges.

Figure 15.6: Surface shading.

Extended Transfer Function

16.1 Optical properties

The traditional volume rendering equation proposed by Levoy [24] is a simplified approximation of a more general volumetric light transport equation. This equation was first used in computer graphics by Kajiya [19]. It describes the interaction of light and matter, in the form of small particles, as a series of scattering and absorption events. Unfortunately, solutions to this equation are difficult and very time consuming. A survey of this problem in the context of volume rendering can be found in [27]. The optical properties required to describe the interaction of light with a material are spectral, i.e. each wavelength of light may interact with the material differently. The most commonly used optical properties are absorption, scattering, and the phase function. Other important optical properties are index of refraction and emission. Volume rendering models that take into account scattering effects are complicated by the fact that each element in the volume can potentially contribute light to each other element. This is similar to other global illumination problems in computer graphics. For this reason, the traditional volume rendering equation ignores scattering effects and focuses on emission and absorption only. In this section we discuss the value and application of adding additional optical properties to the transfer function.

16.2 Traditional volume rendering

The traditional volume rendering equation is:

I_{eye} = I_B \, T_e(0) + \int_0^{eye} T_e(s) \, g(s) \, f_s(s) \, ds   (16.1)

T_e(s) = \exp\!\left( - \int_s^{eye} \tau(x) \, dx \right)   (16.2)

where I_B is the background light intensity, g(s) is the emission term at sample s, f_s(s) is the Blinn-Phong surface shading model evaluated using the normalized gradient of the scalar data field at s, and τ(x) is an achromatic extinction coefficient at the sample x. For a concise derivation of this equation and the discrete solution used in volume rendering, see [27]. The solution to the equation and its use in hardware volume rendering was also presented in Chapter 2 of these course notes. The extinction term in Equation 16.2 is achromatic, meaning that it affects all wavelengths of light equally. This term can be expanded to attenuate light spectrally. The details of this implementation are in [31].
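A minimal C sketch of how such a faux-shaded 1D transfer function could be assembled on the CPU follows; the ramp positions, the 256-entry table layout, and the base color are illustrative assumptions.

/* Build a 256-entry RGBA lookup table: alpha ramps linearly from 0 to 1
   between rampStart and rampEnd, and the base color is scaled toward
   black proportionally to alpha, creating silhouette edges.            */
void buildFauxShadedTF(unsigned char tf[256][4],
                       int rampStart, int rampEnd,
                       const unsigned char baseColor[3])
{
    for (int i = 0; i < 256; ++i) {
        float a;
        if (i <= rampStart)      a = 0.0f;
        else if (i >= rampEnd)   a = 1.0f;
        else a = (float)(i - rampStart) / (float)(rampEnd - rampStart);

        tf[i][0] = (unsigned char)(baseColor[0] * a);  /* color ramps to black */
        tf[i][1] = (unsigned char)(baseColor[1] * a);  /* as alpha ramps to 0  */
        tf[i][2] = (unsigned char)(baseColor[2] * a);
        tf[i][3] = (unsigned char)(255.0f * a);
    }
}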
The implementation of this process requires additional buffers and passes. A transfer function for spectral volume rendering using only the wavelengths of light for red, green, and blue would require a separate alpha for each of these wavelengths.

16.3 The Surface Scalar

The emission term is the material color, and the opacity (α) in the transfer function is derived from the extinction τ(x):

α = e^{-τ(x)}   (16.3)

Since the traditional volume rendering model includes local surface shading (f_s(s)), the emission term is misleading. This model really implies that the volume is illuminated by an external light source and that the light arrives at a sample unimpeded by the intervening volume. In this case the emission term can be thought of as a reflective color. While surface shading can dramatically enhance the visual quality of the rendering, it cannot adequately light homogeneous regions. Since we use the normalized gradient of the scalar field as the surface normal for shading, we can have problems when we try to shade regions where the normal cannot be measured. The gradient is essentially zero in homogeneous regions where there is very little or no local change in data value, making the normal undefined. In practice, data sets contain noise which further complicates the use of the gradient as a normal. This problem can be easily handled, however, by introducing a surface scalar S(s) into the rendering equation. The role of this term is to interpolate between shaded and unshaded rendering per sample:

I_{eye} = I_B \, T_e(0) + \int_0^{eye} T_e(s) \, C(s) \, ds   (16.4)

C(s) = g(s) \bigl( (1 - S(s)) + f_s(s) \, S(s) \bigr)   (16.5)

S(s) can be computed in a variety of ways. If the gradient magnitude is available at each sample, we can use it to compute S(s). This usage implies that only regions with a high enough gradient magnitude should be shaded. This is reasonable since homogeneous regions have a very low gradient magnitude. This term loosely correlates to the index of refraction. In practice we use:

S(s) = 1 - (1 - \|\nabla f(s)\|)^2   (16.6)

Figure 16.1 demonstrates the use of the surface scalar S(s). The image on the left is a volume rendering of the Visible Male with the soft tissue (a relatively homogeneous material) surface shaded, illustrating how this region is poorly illuminated. On the right, only samples with high gradient magnitudes are surface shaded.

16.4 Shadows

Surface shading improves the visual quality of volume renderings. However, the lighting model is rather unrealistic since it assumes that light arrives at a sample without interacting with the portions of the volume between it and the light.

Figure 16.1: Surface shading without (left) and with (right) the surface scalar.

Volumetric shadows can be added to the equation:

I_{eye} = I_B \, T_e(0) + \int_0^{eye} T_e(s) \, C(s) \, f_s(s) \, I_l(s) \, ds   (16.7)

I_l(s) = I_l(0) \, \exp\!\left( - \int_s^{light} \tau(x) \, dx \right)   (16.8)

where I_l(0) is the light intensity, and I_l(s) is the light intensity at the sample s. Notice that I_l(s) is essentially the same as T_e(s), except that the integral is computed toward the light rather than the eye. A hardware model for computing shadows was presented by Behrens and Ratering [4]. This model computes a second volume for storing the amount of light arriving at each sample. The second volume is then sliced, and the values at each sample are multiplied by the colors from the original volume after the transfer function has been evaluated. This approach, however, suffers from an artifact referred to as attenuation leakage.
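For illustration, Equations 16.5 and 16.6 applied per sample and per channel might look like the following sketch; the gradient magnitude is assumed to be pre-normalized to [0, 1], and the scalar-valued interface is a simplification of the RGB case.

#include <math.h>

/* S(s) = 1 - (1 - |grad f(s)|)^2, Equation (16.6) */
float surfaceScalar(float gradMag)          /* gradMag in [0,1] */
{
    float t = 1.0f - gradMag;
    return 1.0f - t * t;
}

/* C(s) = g(s) * ((1 - S) + phong * S), Equation (16.5), one channel */
float shadeSample(float emission, float phong, float gradMag)
{
    float S = surfaceScalar(gradMag);
    return emission * ((1.0f - S) + phong * S);
}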
The visual consequences of this are blurry shadows and surfaces which appear much darker than they should, due to the image-space high frequencies introduced by the transfer function. The attenuation at a given sample point is blurred when light intensity is stored at a coarse resolution and interpolated during the observer rendering phase.

A simple and efficient alternative was proposed in [21]. First, rather than creating a volumetric shadow map, an off-screen render buffer is utilized to accumulate the amount of light attenuated from the light's point of view. Second, the slice axis is modified to be the direction halfway between the view and light directions. This allows the same slice to be rendered from both the eye and light points of view. Consider the situation for computing shadows when the view and light directions are the same, as seen in Figure 16.2(a). Since the slices for both the eye and light have a one-to-one correspondence, it is not necessary to pre-compute a volumetric shadow map. The amount of light arriving at a particular slice is equal to one minus the accumulated opacity of the slices rendered before it. Naturally, if the projection matrices for the eye and light differ, we need to maintain a separate buffer for the attenuation from the light's point of view. When the eye and light directions differ, the volume would have to be sliced along each direction independently. The worst case happens when the view and light directions are perpendicular, as seen in Figure 16.2(b). In this case, it would seem necessary to save a full volumetric shadow map which can be re-sliced with the data volume from the eye's point of view to provide shadows; this, however, suffers from the attenuation leakage artifact described above.

Rather than slice along the vector defined by the view direction or the light direction, we can modify the slice axis to allow the same slice to be rendered from both points of view. When the dot product of the light and view directions is positive, we slice along the vector halfway between the light and view directions, as seen in Figure 16.2(c). In this case, the volume is rendered in front-to-back order with respect to the observer. When the dot product is negative, we slice along the vector halfway between the light and the inverted view directions, as seen in Figure 16.2(d). In this case, the volume is rendered in back-to-front order with respect to the observer. In both cases the volume is rendered in front-to-back order with respect to the light. Care must be taken to ensure that the slice spacing along the view and light directions is maintained when the light or eye positions change. If the desired slice spacing along the view direction is d_v and the angle between v and l is θ, then the slice spacing along the slice direction is

d_s = \cos\!\left(\frac{\theta}{2}\right) d_v.   (16.9)

This is a multi-pass approach. Each slice is first rendered from the observer's point of view, using the results of the previous pass from the light's point of view to modulate the brightness of samples in the current slice. The same slice is then rendered from the light's point of view to calculate the intensity of the light arriving at the next layer.
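A small sketch of the slice-axis selection and the spacing correction of Equation 16.9 is given below. The minimal vector type and helpers are assumptions; applying Equation 16.9 with the (possibly inverted) view direction in the back-to-front case is also an assumption of this sketch.

#include <math.h>

typedef struct { float x, y, z; } vec3;

static vec3  add3(vec3 a, vec3 b) { vec3 r = { a.x + b.x, a.y + b.y, a.z + b.z }; return r; }
static vec3  neg3(vec3 a)         { vec3 r = { -a.x, -a.y, -a.z }; return r; }
static float dot3(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static vec3  norm3(vec3 a)        { float l = sqrtf(dot3(a, a)); vec3 r = { a.x / l, a.y / l, a.z / l }; return r; }

/* v, l: normalized view and light directions; dv: desired view slice spacing.
   Returns the slice axis and writes the corrected spacing to *ds.            */
vec3 sliceAxis(vec3 v, vec3 l, float dv, float *ds)
{
    vec3  vv    = (dot3(v, l) > 0.0f) ? v : neg3(v);   /* Fig. 16.2(c) vs (d) */
    vec3  s     = norm3(add3(vv, l));                  /* halfway vector      */
    float theta = acosf(fminf(1.0f, dot3(vv, l)));
    *ds = cosf(theta * 0.5f) * dv;                     /* Equation (16.9)     */
    return s;
}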
Since we must keep track of the amount of light attenuated at each slice, we utilize an off-screen render buffer, known as a pixel buffer. This buffer is initialized to 1 − light intensity. It can also be initialized using an arbitrary image to create effects such as spotlights. The projection matrix for the light's point of view need not be orthographic; a perspective projection matrix can be used for point light sources. However, the entire volume must fit in the light's view frustum. Light is attenuated by simply accumulating the opacity for each sample using the over operator. The results are then copied to a texture which is multiplied with the next slice from the eye's point of view before it is blended into the frame buffer.

Figure 16.2: Modified slice axis for light transport.

While this copy-to-texture operation has been highly optimized on the current generation of graphics hardware, we have achieved a dramatic increase in performance using a hardware extension known as render to texture. This extension allows us to directly bind a pixel buffer as a texture, avoiding the unnecessary copy operation. The two-pass process is illustrated in Figure 16.3.

16.5 Translucency

Shadows can add valuable depth cues and dramatic effects to a volume rendered scene. Even if the technique for rendering shadows avoids attenuation leakage, the images can still appear too dark. This is not an artifact; it is an accurate rendering of materials which only absorb light and do not scatter it. As noted at the beginning of this chapter, volume rendering models that account for scattering effects are too computationally expensive for interactive hardware-based approaches. This means that approximations are needed to capture some of the effects of scattering.

Figure 16.3: Two-pass shadows. Step 1 (left): render a slice for the eye, multiplying it by the attenuation in the light buffer. Step 2 (right): render the slice into the light buffer to update the attenuation for the next pass.

Figure 16.4: Translucent volume shading. (a) is a photograph of a wax block illuminated from above with a focused flashlight. (b) is a volume rendering with a white reflective color and a desaturated orange transport color (1 − indirect attenuation). (c) has a bright blue reflective color and the same transport color as the upper right image. (d) shows the effect of light transport that only takes into account direct attenuation.

One such visual consequence of scattering in volumes is translucency. Translucency is the effect of light propagating deep into a material even though objects occluded by it cannot be clearly distinguished. Figure 16.4(a) shows a common translucent object, wax. Other translucent objects are skin, smoke, and clouds. Several simplified optical models for hardware-based rendering of clouds have been proposed [17, 9]. These models are capable of producing realistic images of clouds, but do not easily extend to general volume rendering applications. The previously presented model for computing shadows can easily be extended to achieve the effect of translucency. Two modifications are required. First, we require a second alpha value (α_i) which represents the amount of indirect attenuation. This value should be less than or equal to the alpha value for the direct attenuation. Second, we require an additional light buffer for blurring the indirect attenuation.
The translucent volume rendering model is:

I_{eye} = I_0 \, T_e(0) + \int_0^{eye} T_e(s) \, C(s) \, I_l(s) \, ds   (16.10)

I_l(s) = I_l(0) \, \exp\!\left( - \int_s^{light} \tau(x) \, dx \right) + Blur_\theta\!\left( I_l(0) \, \exp\!\left( - \int_s^{light} \tau_i(x) \, dx \right) \right)   (16.11)

Figure 16.5: On the left (a, general light transport) is the general case of direct illumination I_d and scattered indirect illumination I_i. On the right (b, translucency approximation) is a translucent shading model which includes the direct illumination I_d and approximates the indirect illumination I_i by blurring within the shaded region. θ is the angle indicated by the shaded region.

where τ_i(s) is the indirect light extinction term, C(s) is the reflective color at the sample s, S(s) is a surface shading parameter, and I_l is the sum of the direct light and the indirect light contributions. The indirect extinction term is spectral, meaning that it describes the indirect attenuation of light for each of the R, G, and B color components. Similar to the direct extinction, the indirect attenuation can be specified in terms of an indirect alpha:

α_i = \exp(-τ_i(x))   (16.12)

While this is useful for computing the attenuation, we have found it non-intuitive for user specification. We prefer to specify a transport color, which is 1 − α_i, since this is the color the indirect light will become as it is attenuated by the material.

In general, light transport in participating media must take into account the incoming light from all directions, as seen in Figure 16.5(a). However, the net effect of multiple scattering in volumes is a blurring of light. The diffusion approximation [40, 13] models the light transport in multiple scattering media as a random walk. This results in the light being diffused within the volume. The Blur(θ) operation in Equation 16.11 averages the incoming light within a cone with an apex angle θ in the direction of the light (Figure 16.5(b)). The indirect lighting at a particular sample is only dependent on a local neighborhood of samples computed in the previous iteration, shown as the arrows between slices. This operation models light diffusion by convolving several random sampling points with a Gaussian filter.

The process of rendering using translucency is essentially the same as rendering shadows. In the first pass, a slice is rendered from the point of view of the light. However, rather than simply multiplying the sample's color by one minus the direct attenuation, we sum one minus the direct and one minus the indirect attenuation to compute the light intensity at the sample. In the second pass, a slice is rendered into the next light buffer from the light's point of view to compute the lighting for the next iteration. Two light buffers are maintained to accommodate the blur operation required for the indirect attenuation: next is the one being rendered to and current is the one bound as a texture. Rather than blend slices using a standard OpenGL blend operation, we explicitly compute the blend in the fragment shading stage. The current light buffer is sampled once in the first pass, for the observer, and multiple times in the second pass, for the light, using the render-to-texture OpenGL extension, whereas the next light buffer is rendered to only in the second pass. This relationship changes after the second pass, so that the next buffer becomes the current and vice versa. We call this approach ping-pong blending.
In the fragment shading stage, the texture coordinates for the current light buffer, in all but one texture unit, are modified per-pixel using a random noise texture, as discussed in the last chapter of these course notes. The number of samples used for the computation of the indirect light is limited by the number of texture units. Currently, we use four samples. Randomizing the sample offsets masks some artifacts caused by this coarse sampling. The amount of this offset is bounded based on a user-defined blur angle (θ) and the sample distance (d):

offset \le d \tan\!\left(\frac{\theta}{2}\right)   (16.13)

The current light buffer is then read using the new texture coordinates. These values are weighted and summed to compute the blurred inward flux at the sample. The transfer function is evaluated for the incoming slice data to obtain the indirect attenuation (α_i) and direct attenuation (α) values for the current slice. The blurred inward flux is attenuated using α_i and written to the RGB components of the next light buffer. The alpha value from the current light buffer, read with the unmodified texture coordinates, is blended with the α value from the transfer function to compute the direct attenuation and stored in the alpha component of the next light buffer. This process is enumerated below:

1. Clear the color buffer.
2. Initialize the pixel buffer with 1 − light color (or a light map).
3. Set the slice direction to the halfway vector between the light and observer view directions.
4. For each slice:
   (a) Determine the locations of the slice vertices in the light buffer.
   (b) Convert these light buffer vertex positions to texture coordinates.
   (c) Bind the light buffer as a texture using these texture coordinates.
   (d) In the per-fragment blend stage:
       i. Evaluate the transfer function for the reflective color and direct attenuation.
       ii. Evaluate the surface shading model if desired; this replaces the reflective color.
       iii. Evaluate the phase function, looked up using the dot product of the view and light directions.
       iv. Multiply the reflective color by 1 − direct attenuation from the light buffer.
       v. Multiply the reflective*direct color by the phase function.
       vi. Multiply the reflective color by 1 − indirect attenuation from the light buffer.
       vii. Sum direct*reflective*phase and indirect*reflective to get the final sample color.
       viii. The alpha value is the direct attenuation from the transfer function.
   (e) Render and blend the slice into the frame buffer for the observer's point of view.
   (f) Render the slice from the light's point of view: render the slice to the position in the light buffer used for the observer slice.
   (g) In the per-fragment blend stage:
       i. Evaluate the transfer function for the direct and indirect attenuation.
       ii. Sample the light buffer at multiple locations.
       iii. Weight and sum the samples to compute the blurred indirect attenuation; the weight is the blur kernel and the indirect attenuation at that sample.
       iv. Blend the blurred indirect and un-blurred direct attenuation with the values from the transfer function.
   (h) Render the slice into the correct light buffer.

While this process may seem quite complicated, it is straightforward to implement. The render-to-texture extension is part of the WGL_ARB_render_texture OpenGL extensions. The key functions are wglBindTexImageARB(), which binds a P-buffer as a texture, and wglReleaseTexImageARB(), which releases a bound P-buffer so that it may be rendered to again; the basic binding pattern is sketched below.
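The following C sketch shows one possible ping-pong binding pattern under the assumption that the two P-buffers and their texture objects have already been created with WGL_ARB_pbuffer / WGL_ARB_render_texture; the handles, the context switch to the light buffer, and the rendering calls themselves are placeholders.

HPBUFFERARB lightBuf[2];   /* ping-pong light buffers (assumed created) */
GLuint      lightTex[2];   /* texture objects associated with them      */
int cur = 0, nxt = 1;

/* Pass 1 (observer): sample the current light buffer while blending the slice. */
glBindTexture(GL_TEXTURE_2D, lightTex[cur]);
wglBindTexImageARB(lightBuf[cur], WGL_FRONT_LEFT_ARB);
/* ... render the slice from the eye's point of view ... */
wglReleaseTexImageARB(lightBuf[cur], WGL_FRONT_LEFT_ARB);

/* Pass 2 (light): make the 'next' P-buffer current as render target, then
   sample the current buffer again for the blurred indirect term.            */
glBindTexture(GL_TEXTURE_2D, lightTex[cur]);
wglBindTexImageARB(lightBuf[cur], WGL_FRONT_LEFT_ARB);
/* ... render the slice from the light's point of view into lightBuf[nxt] ... */
wglReleaseTexImageARB(lightBuf[cur], WGL_FRONT_LEFT_ARB);

/* Swap roles for the following slice (ping-pong blending). */
cur ^= 1; nxt ^= 1;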
The texture coordinates of a slice's light intensities in a light buffer are the 2D positions that the slice's vertices project to in the light buffer, scaled and biased so that they are in the range zero to one. Computing volumetric light transport in screen space is advantageous because the resolution of these calculations and the resolution of the volume rendering can match. This means that the resolution of the light transport is decoupled from that of the data volume's grid, permitting procedural volumetric texturing, which will be described in the following chapters.

16.6 Summary

Rendering and shading techniques are important for volume graphics, but they would not be useful unless we had a way to transform interpolated data into optical properties. While the traditional volume rendering model only takes into account a few basic optical properties, it is important to consider additional optical properties. Even if these optical properties imply a much more complicated rendering model than is possible with current rendering techniques, adequate approximations can be developed which add considerably to the visual quality. We anticipate that the development of multiple scattering volume shading models will be an active area of research in the future. In the next chapter we discuss techniques for specifying a good transfer function.

Figure 16.6: Example volume renderings using an extended transfer function. (a) Carp CT; (b) Stanford Bunny; (c) Joseph the Convicted.

Transfer Functions

Now that we have identified the basic and more exotic optical properties that describe the visual appearance of a material, we need a way of specifying a transfer function. User interfaces for transfer function specification should fulfill some basic requirements. Such an interface should have mechanisms that guide the user toward setting a good transfer function. It should also be expressive, in that it should permit materials to be identified precisely. Finally, it needs to be interactive, since in the end there may not be an automatic method suitable for specifying a desired transfer function.

17.1 Multi-dimensional Transfer Functions

A single scalar data value need not be the only quantity used to identify the difference between materials in a transfer function. Levoy's volume rendering model includes a 2D transfer function. This model allows each sample to contain multiple values. These values are the axes of a multi-dimensional transfer function. Multi-dimensional transfer functions are implemented using dependent texture reads. If there are two values available per data sample, the transfer function should be 2D and is stored on the graphics card using a 2D texture. See [21] for examples of multi-dimensional transfer functions applied to both scalar data with derivative measurements and multivariate data. Adding the gradient magnitude of a scalar dataset to the transfer function can improve our ability to isolate material boundaries and the materials themselves. Figures 17.1(c) and 17.1(d) show how this kind of 2D transfer function can help isolate the leaf material from the bark material of the Bonsai Tree CT dataset. A naive transfer function editor may simply give the user access to all of the optical properties directly.

Figure 17.1: 1D (a and c) versus 2D (b and d) transfer functions.

Figure 17.2: The Design Gallery transfer function interface.
This approach can make specifying a transfer function a tedious trial-and-error process. Naturally, adding dimensions to the transfer function can further complicate the user interface.

17.2 Guidance

The effectiveness of a transfer function editor can be enhanced with features that guide the user with data-specific information. He et al. [18] generated transfer functions with genetic algorithms driven either by user selection of thumbnail renderings or by some objective image fitness function. The purpose of this interface is to suggest an appropriate transfer function to the user based on how well the user feels the rendered images capture the important features. The Design Gallery [26] creates an intuitive interface to the entire space of all possible transfer functions based on automated analysis and layout of rendered images. This approach basically parameterizes the space of all possible transfer functions, stochastically samples it, renders the volume, and groups the images based on similarity. While this can be a time consuming process, it is fully automated. Figure 17.2 shows an example of this user interface. A more data-centric approach is the Contour Spectrum [3], which visually summarizes the space of isosurfaces in terms of metrics like surface area and mean gradient magnitude, thereby guiding the choice of isovalue for isosurfacing, but also providing information useful for transfer function generation. Another recent paper [1] presents a novel transfer function interface in which small thumbnail renderings are arranged according to their relationship with the spaces of data values, color, and opacity. This kind of editor can be seen in Figure 17.3.

Figure 17.3: A thumbnail transfer function interface.

One of the simplest and most effective features that a transfer function interface can include is a histogram. A histogram shows the user the behavior of data values in the transfer function domain. In time a user can learn to read the histogram and quickly identify features. Figure 17.4(b) shows a 2D joint histogram of the Chapel Hill CT dataset. Notice the arches: they identify material boundaries, while the dark blobs at the bottom identify the materials themselves. Volume probing can also help the user identify features. This approach gives the user a mechanism for pointing at a feature in the spatial domain. The values at this point are then presented graphically in the transfer function interface, indicating to the user which ranges of data values identify the feature. This approach can be tied to a mechanism that automatically sets the transfer function based on the data values at the feature being pointed at. This technique is called dual-domain interaction [21]. The action of this process can be seen in Figure 17.5.

17.3 Classification

It is often helpful to identify discrete regions in the transfer function domain that correspond to individual features. Figure 17.6 shows an integrated 2D transfer function interface.

Figure 17.4 panels: (a) A 1D histogram. The black region represents the number of data value occurrences on a linear scale, the grey on a log scale. The colored regions (A, B, C) identify basic materials. (b) A log-scale 2D joint histogram over data value and gradient magnitude (f′). The lower image shows the location of materials (A, B, C) and material boundaries (D, E, F). (c) A volume rendering showing all of the materials and boundaries identified above, except air (A), using a 2D transfer function.
Figure 17.4: Material and boundary identification of the Chapel Hill CT Head with data value alone (a) versus data value and gradient magnitude (f′), seen in (b). The basic materials captured by CT, air (A), soft tissue (B), and bone (C), can be identified using a 1D transfer function, as seen in (a). 1D transfer functions, however, cannot capture the complex combinations of material boundaries: the air and soft tissue boundary (D), the soft tissue and bone boundary (E), and the air and bone boundary (F), as seen in (b) and (c).

Figure 17.5: Probing and dual-domain interaction.

Figure 17.6: Classification widgets.

This type of interface constructs a transfer function using direct manipulation widgets. Classified regions are modified by manipulating control points. These control points change high-level parameters such as position, width, and optical properties. The widgets define a specific type of classification function, such as a Gaussian ellipsoid, an inverted triangle, or a linear ramp. This approach is advantageous because it frees the user to focus more on feature identification and less on the shape of the classification function. We have also found it useful to allow the user to paint directly into the transfer function domain.

In all, our experience has shown that the best transfer functions are specified using an iterative process. When a volume is first encountered, it is important to get an immediate sense of the structures contained in the data. In many cases, a default transfer function can achieve this. By assigning higher opacity to higher gradient magnitudes and varying color based on data value, as seen in Figure 17.7, most of the important features of the dataset are visualized. The process of probing allows the user to identify the location of data values in the transfer function domain that correspond to these features. Dual-domain interaction allows the user to set the transfer function by simply pointing at a feature. By having simple control points on discrete classification widgets, the user can manipulate the transfer function directly to expose a feature as best they can. By iterating through this process of exploration, specification, and refinement, a user can efficiently specify a transfer function that produces a high quality visualization.

Figure 17.7: The “default” transfer function.

Advanced Techniques

Hardware-Accelerated High-Quality Filtering

An important step in volume rendering is the reconstruction of the original signal from the sampled volume data (Section 2.2). This step involves the convolution of the sampled signal with a reconstruction kernel. Unfortunately, current graphics hardware only supports linear filters, which do not provide sufficient quality for a high-quality reconstruction of the original signal. Although higher-order filters are able to achieve much better quality than linear interpolation, they are usually only used for filtering in software algorithms. However, by exploiting the features of programmable consumer graphics hardware, high-quality filtering with filter kernels of higher order than linear interpolation can be done in real time, even though the hardware itself does not support such filtering operations natively [15, 16]. This chapter gives an overview of hardware-accelerated high-quality filtering.
Examples of high-quality filters that achieve a good trade-off between speed and quality can be seen in Figure 18.2. Basically, input textures are filtered by convolving them with an arbitrary filter kernel, which itself is stored in several texture maps. Since the filter function is represented by an array of sampled values, the basic algorithm works irrespective of the shape of this function. However, kernel properties such as separability and symmetry can be exploited to gain higher performance. The basic algorithm is also independent of the dimensionality of input textures and filter kernels. Thus, in the context of volume rendering, it can be used in conjunction with all kinds of proxy geometry, regardless of whether 2D or 3D textures are used.

18.1 Basic principle

In order to be able to employ arbitrary filter kernels for reconstruction, we have to evaluate the well-known filter convolution sum:

g(x) = (f * h)(x) = \sum_{i=\lfloor x \rfloor - m + 1}^{\lfloor x \rfloor + m} f[i] \, h(x - i)   (18.1)

This equation describes a convolution of the discrete input samples f[x] with a continuous reconstruction filter h(x). In the case of reconstruction, this is essentially a sliding average of the samples and the reconstruction filter. In Equation 18.1, the (finite) half-width of the filter kernel is denoted by m.

In order to be able to exploit standard graphics hardware for performing this computation, we do not use the evaluation order commonly employed, e.g., in software-based filtering. The convolution sum is usually evaluated in its entirety for a single output sample at a time. That is, all the contributions of neighboring input samples (their values multiplied by the corresponding filter values) are gathered and added up in order to calculate the final value of a certain output sample. This “gathering” of contributions is shown in Figure 18.3(a). This figure uses a simple tent filter as an example. It shows how a single output sample is calculated by adding up two contributions. The first contribution is gathered from the neighboring input sample on the left-hand side, and the second one is gathered from the input sample on the right-hand side. For generating the desired output data in its entirety, this is done for all corresponding resampling points (output sample locations). In the case of this example, the convolution results in linear interpolation, due to the tent filter employed. However, we are only using a tent filter to simplify the explanation. In practice, kernels of arbitrary width and shape can be used, and using a tent filter for hardware-accelerated high-quality filtering would not make much sense.

Figure 18.1: Using a high-quality reconstruction filter for volume rendering. This image compares bi-linear interpolation of object-aligned slices (A) with bi-cubic filtering using a B-spline filter kernel (B).

Figure 18.2: Example filter kernels of width four: (a) Cubic B-spline and Catmull-Rom spline; (b) Blackman windowed sinc, also depicting the window itself (width = 4).

In contrast to the evaluation order just outlined, hardware-accelerated high-quality filtering uses a different order. Instead of focusing on a single output sample at any one time, it calculates the contribution of a single input sample to all corresponding output sample locations (resampling points) first.
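For reference, a minimal C sketch of the conventional gather-style evaluation of Equation 18.1 follows; the function interface, the boundary handling, and the tent kernel are illustrative assumptions rather than part of the hardware algorithm.

#include <math.h>

/* Gather-style evaluation of the convolution sum (Eq. 18.1):
   g(x) = sum over i of f[i] * h(x - i), with i in the 2m-sample
   neighborhood of x. 'f' holds n discrete input samples, 'h' is a
   continuous reconstruction kernel of half-width m.                */
float reconstruct(const float *f, int n, float (*h)(float), int m, float x)
{
    int   lo = (int)floorf(x) - m + 1;
    int   hi = (int)floorf(x) + m;
    float g  = 0.0f;
    for (int i = lo; i <= hi; ++i) {
        if (i < 0 || i >= n) continue;   /* ignore samples outside the data */
        g += f[i] * h(x - (float)i);
    }
    return g;
}

/* Example kernel: tent filter of width two (m = 1), i.e. linear interpolation. */
float tent(float t) { return (fabsf(t) < 1.0f) ? 1.0f - fabsf(t) : 0.0f; }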
That is, the contribution of an input sample is distributed to its neighboring output samples, instead of the other way around. This “distribution” of contributions is shown in Figure 18.3(b). In this case, the final value of a single output sample is only available when all corresponding contributions of input samples have been distributed to it.

Figure 18.3: Gathering vs. distribution of input sample contributions (tent filter): (a) gathering all contributions to a single output sample; (b) distributing a single input sample's contribution.

Figure 18.4: Distributing the contributions of all “left-hand” (a) and all “right-hand” (b) neighbors, when using a tent filter as a simple example for the algorithm.

The convolution sum is evaluated in this particular order because the distribution of the contributions of a single relative input sample can be done in hardware for all output samples (pixels) simultaneously. The final result is gradually built up over multiple rendering passes. The term relative input sample location denotes a relative offset of an input sample with respect to a set of output samples. In the example of a one-dimensional tent filter (as in Figure 18.3(a)), there are two relative input sample locations. One could be called the “left-hand neighbor,” the other the “right-hand neighbor.” In the first pass, the contribution of all respective left-hand neighbors is calculated. The second pass then adds the contribution of all right-hand neighbors. Note that the number of passes depends on the filter kernel used, see below. Thus, the same part of the filter convolution sum is added to the previous result for each pixel at the same time, yielding the final result after all parts have been added up. From this point of view, the graph in Figure 18.3(b) depicts both rendering passes that are necessary for reconstruction with a one-dimensional tent filter, but only with respect to the contribution of a single input sample. The contributions distributed simultaneously in a single pass are depicted in Figure 18.4. In the first pass, the contributions of all relative left-hand neighbors are distributed. Consequently, the second pass distributes the contributions of all relative right-hand neighbors. Adding up the distributed contributions of these two passes yields the final result for all resampling points (i.e., linearly interpolated output values in this example).

Figure 18.5 shows this from a more hardware-oriented perspective. We call a segment of the filter kernel from one integer location to the next a filter tile. Naturally, such a tile has length one. Thus, a one-dimensional tent filter has two filter tiles, corresponding to the fact that it has width two.

Figure 18.5: Tent filter (width two) used for reconstruction of a one-dimensional function in two passes. In each pass, the shifted input samples (texture 0, nearest-neighbor interpolation) are multiplied by the replicated, mirrored filter tile (texture 1); imagine the values of the output samples added together from top to bottom.

Each pass uses two simultaneous textures: one texture unit point-sampling the original input texture, and the second unit using the current filter tile texture. These two textures are superimposed, multiplied, and added to the frame buffer.
In this way, the contribution of a single specific filter tile to all output samples is calculated in a single rendering pass. The input samples used in a single pass correspond to a specific relative input sample location or offset with regard to the output sample locations. That is, in one pass the input samples with relative offset zero are used for all output samples, then the samples with offset one in the next pass, and so on. The number of passes necessary is equal to the number of filter tiles the filter kernel consists of. Note that the subdivision of the filter kernel into its tiles is crucial to hardware-accelerated high-quality filtering, and necessary in order to attain a correct mapping between locations in the input data and the filter kernel, and to achieve a consistent evaluation order of passes everywhere.

A convolution sum can be evaluated in this way since it needs only two basic inputs: the input samples and the filter kernel. Because we change only the order of summation but leave the multiplication untouched, we need these two available at the same time. Therefore, we employ multi-texturing with (at least) two textures and retrieve input samples from the first texture and filter kernel values from the second texture. Actually, because only a single filter tile is needed during a single rendering pass, all tiles are stored and downloaded to the graphics hardware as separate textures. The required replication of tiles over the output sample grid is easily achieved by configuring the hardware to automatically extend the texture domain beyond [0, 1] by simply repeating the texture via a wrap mode of GL_REPEAT. In order to fetch input samples in unmodified form, nearest-neighbor interpolation has to be used for the input texture. If a given hardware architecture is able to support 2n textures at the same time, the number of passes can be reduced by a factor of n. That is, with two-texture multi-texturing four passes are needed for filtering with a cubic kernel in one dimension, whereas with four-texture multi-texturing only two passes are needed, etc.

This algorithm is not limited to symmetric filter kernels, although symmetry can be exploited in order to save texture memory for the filter tile textures. It is also not limited to separable filter kernels, although exploiting separability can greatly enhance performance. Additionally, the algorithm is identical for orthogonal and perspective projections of the resulting images. Basically, it reconstructs at single locations in texture space, which can be viewed as happening before projection. Thus, it is independent of the projection used. Note that the method outlined above does not consider area-averaging filters, since we are assuming that magnification is desired instead of minification. This is in the vein of graphics hardware using bi-linear interpolation for magnification, and other approaches, usually mip-mapping, to deal with minification.

18.2 Reconstructing Object-Aligned Slices

When enlarging images or reconstructing object-aligned slices through volumetric data taken directly from a stack of such slices, high-order two-dimensional filters can be used in order to achieve high-quality results. The basic algorithm outlined in the previous section for one dimension can easily be applied in two dimensions, exploiting two-texture multi-texturing hardware in multiple rendering passes.
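A minimal sketch of the multi-pass loop for a 1D kernel is given below. It assumes hypothetical names (inputTex, tileTex[], inputSize, drawQuad()), that the texture environments are already configured to multiply unit 0 by unit 1, and that additive frame-buffer blending accumulates the passes; the per-pass offset shown is only an example for a cubic kernel with four tiles.

/* numTiles passes, one per filter tile (e.g. 4 for a cubic kernel). */
glEnable(GL_BLEND);
glBlendFunc(GL_ONE, GL_ONE);                    /* add the passes together */

for (int pass = 0; pass < numTiles; ++pass) {
    /* texture unit 0: input samples, point-sampled and shifted per pass */
    glActiveTextureARB(GL_TEXTURE0_ARB);
    glBindTexture(GL_TEXTURE_1D, inputTex);
    glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    /* texture unit 1: current filter tile, replicated via GL_REPEAT */
    glActiveTextureARB(GL_TEXTURE1_ARB);
    glBindTexture(GL_TEXTURE_1D, tileTex[pass]);
    glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_WRAP_S, GL_REPEAT);

    /* relative input sample offset for this pass, in input texels
       (illustrative: -1, 0, +1, +2 for a width-four cubic kernel)  */
    float offset = (float)(pass - numTiles / 2 + 1) / (float)inputSize;

    /* draw a screen-filling quad: unit 0 gets shifted coordinates,
       unit 1 gets coordinates scaled so one tile spans one texel    */
    drawQuad(offset, inputSize);
}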
For each output pixel and pass, the algorithm takes two inputs: unmodified (i.e., unfiltered) slice values, and filter kernel values. That is, two 2D textures are used simultaneously. One texture contains the entire source slice, and the other texture contains the filter tile needed in the current pass, which in this case is a two-dimensional unit square. In addition to using the appropriate filter tile, in each pass an appropriate offset has to be applied to the texture coordinates of the texture containing the input image. As explained in the previous section, each pass corresponds to a specific relative location of an input sample. Thus, the slice texture coordinates have to be offset and scaled in order to match the point-sampled input image grid with the grid of replicated filter tiles. In the case of a cubic filter kernel for bi-cubic filtering, sixteen passes need to be performed on two-texture multi-texturing hardware. However, the number of rendering passes actually needed can be reduced through several optimizations [16].

18.3 Reconstructing View-Aligned Slices

When planar slices through 3D volumetric data are allowed to be located and oriented arbitrarily, three-dimensional filtering has to be performed, although the result is still two-dimensional. On graphics hardware, this is usually done by tri-linearly interpolating within a 3D texture. Hardware-accelerated high-quality filtering can also be applied in this case in order to improve reconstruction quality considerably. The conceptually straightforward extension of the 2D approach described in the previous section (simultaneously using two 2D textures) achieves the equivalent for three-dimensional reconstruction by simultaneously using two 3D textures. The first 3D texture contains the input volume in its entirety, whereas the second 3D texture contains the current filter tile, which in this case is a three-dimensional unit cube. In the case of a cubic filter kernel for tri-cubic filtering, 64 passes need to be performed on two-texture multi-texturing hardware. If such a kernel is symmetric, downloading eight 3D textures for the filter tiles suffices; the remaining 56 are generated without any performance loss by mirroring texture coordinates. Due to the high memory consumption of 3D textures, it is especially important that the filter kernel need not be downloaded to the graphics hardware in its entirety if it is symmetric. If the filter kernel is separable, which fortunately many filters used for reconstruction purposes are, no 3D textures are required for storing the kernel [16]. If the kernel is both symmetric and separable, tri-cubic filtering can be achieved with just two 1D filter tile textures, each of which usually contains between 64 and 128 samples.

18.4 Volume Rendering

Since hardware-accelerated high-quality filtering is able to reconstruct axis-aligned slices as well as arbitrarily oriented slices, it can naturally be used for rendering all kinds of proxy geometry for direct volume rendering. Figure 18.1 shows an example of volume rendering with high-quality filtered slices. The algorithm can also be used to reconstruct gradients in high quality, in addition to reconstructing density values. This is possible in combination with hardware-accelerated methods that store gradients in the RGB components of a texture [41].
Pre-Integrated Classification

High accuracy in direct volume rendering is usually achieved by very high sampling rates, because the discrete approximation of the volume rendering integral converges to the correct result only for a small slice-to-slice distance d → 0, i.e., for high sampling rates n/D = 1/d. However, high sampling rates result in heavy performance losses: as the rasterization requirements of the graphics hardware increase, the frame rates drop accordingly. According to the sampling theorem, a correct reconstruction is only possible with sampling rates higher than the Nyquist frequency.

Before the data is rendered, the scalar values from the volume are mapped to RGBA values. This classification step is achieved by introducing transfer functions for color densities c̃(s) and extinction densities τ(s), which map scalar values s = s(x) to colors and extinction coefficients. However, non-linear features of transfer functions may considerably increase the sampling rate required for a correct evaluation of the volume rendering integral, as the Nyquist frequency of the fields c̃(s(x)) and τ(s(x)) for the sampling along the viewing ray is approximately the product of the Nyquist frequency of the scalar field s(x) and the maximum of the Nyquist frequencies of the two transfer functions c̃(s) and τ(s). Therefore, it is by no means sufficient to sample a volume with the Nyquist frequency of the scalar field if non-linear transfer functions are allowed. Artifacts resulting from this kind of undersampling are frequently observed unless they are avoided by very smooth transfer functions.

In order to overcome the limitations discussed above, the approximation of the volume rendering integral has to be improved. In fact, many improvements have been proposed, e.g., higher-order integration schemes, adaptive sampling, etc. However, these methods do not explicitly address the problem of high Nyquist frequencies of c̃(s(x)) and τ(s(x)) resulting from non-linear transfer functions. The goal of pre-integrated classification, on the other hand, is to split the numerical integration into two integrations: one for the continuous scalar field s(x) and one for the transfer functions c̃(s) and τ(s), in order to avoid the problematic product of Nyquist frequencies.

The first step is the sampling of the continuous scalar field s(x) along a viewing ray. Note that the Nyquist frequency for this sampling is not affected by the transfer functions. For the purpose of pre-integrated classification, the sampled values define a one-dimensional, piecewise linear scalar field. The volume rendering integral for this piecewise linear scalar field is efficiently computed by one table lookup for each linear segment. The three arguments of the table lookup are the scalar value at the start (front) of the segment s_f := s(x(i d)), the scalar value at the end (back) of the segment s_b := s(x((i+1) d)), and the length of the segment d. (See Figure 19.1.)

Figure 19.1: Scheme of the parameters determining the color and opacity of the i-th ray segment.

More precisely, the opacity α_i of the i-th segment is approximated by
\[
\alpha_i = 1 - \exp\left(-\int_{i d}^{(i+1)d} \tau\big(s(\mathbf{x}(\lambda))\big)\, d\lambda\right)
\approx 1 - \exp\left(-\int_0^1 \tau\big((1-\omega)s_f + \omega s_b\big)\, d\, d\omega\right).
\tag{19.1}
\]
Thus, α_i is a function of s_f, s_b, and d (or of s_f and s_b alone, if the lengths of the segments are equal). The (associated) colors C̃_i are approximated correspondingly:
\[
\tilde{C}_i \approx \int_0^1 \tilde{c}\big((1-\omega)s_f + \omega s_b\big)\,
\exp\left(-\int_0^{\omega} \tau\big((1-\omega')s_f + \omega' s_b\big)\, d\, d\omega'\right) d\, d\omega.
\tag{19.2}
\]
Analogously to α_i, C̃_i is a function of s_f, s_b, and d. Thus, pre-integrated classification approximates the volume rendering integral by evaluating
\[
I \approx \sum_{i=0}^{n} \tilde{C}_i \prod_{j=0}^{i-1} (1 - \alpha_j),
\]
with colors C̃_i pre-computed according to Equation (19.2) and opacities α_i pre-computed according to Equation (19.1). For non-associated color transfer functions, i.e., when substituting c̃(s) by τ(s)c(s), we also employ Equation (19.1) for the approximation of α_i and the following approximation of the associated color C̃_i^τ:
\[
\tilde{C}_i^{\tau} \approx \int_0^1 \tau\big((1-\omega)s_f + \omega s_b\big)\, c\big((1-\omega)s_f + \omega s_b\big)\,
\exp\left(-\int_0^{\omega} \tau\big((1-\omega')s_f + \omega' s_b\big)\, d\, d\omega'\right) d\, d\omega.
\tag{19.3}
\]
Note that pre-integrated classification always computes associated colors, whether a transfer function for associated colors c̃(s) or for non-associated colors c(s) is employed. In either case, pre-integrated classification allows us to sample a continuous scalar field s(x) without the need to increase the sampling rate for any non-linear transfer function. Therefore, pre-integrated classification has the potential to improve the accuracy (less undersampling) and the performance (fewer samples) of a volume renderer at the same time.

19.1 Accelerated (Approximative) Pre-Integration

The primary drawback of pre-integrated classification is the pre-integration required to compute the lookup tables, which map the three integration parameters (scalar value at the front s_f, scalar value at the back s_b, and length of the segment d) to pre-integrated colors C̃ = C̃(s_f, s_b, d) and opacities α = α(s_f, s_b, d). As these tables depend on the transfer functions, any modification of the transfer functions requires an update of the lookup tables. This may be of little concern for games and entertainment applications, but it strongly limits the interactivity of applications in the domain of scientific volume visualization, which often depend on user-specified transfer functions. Therefore, we suggest three methods to accelerate the pre-integration step.

Firstly, under some circumstances it is possible to reduce the dimensionality of the tables from three to two (only s_f and s_b) by assuming a constant length of the segments. Obviously, this applies to ray-casting with equidistant samples. It also applies to 3D texture-based volume visualization with orthographic projection and is a good approximation for most perspective projections. It is less appropriate for axis-aligned 2D texture-based volume rendering. Even if very different lengths occur, the complicated dependency on the segment length might be approximated by a linear dependency as suggested in [37]; thus, the lookup tables may be calculated for a single segment length.

Secondly, a local modification of the transfer functions for a particular scalar value s does not require updating the whole lookup table. In fact, only the values C̃(s_f, s_b, d) and α(s_f, s_b, d) with s_f ≤ s ≤ s_b or s_f ≥ s ≥ s_b have to be recomputed; i.e., in the worst case about half of the lookup table has to be recomputed.

Finally, the pre-integration may be greatly accelerated by evaluating the integrals in Equations (19.1), (19.2), and (19.3) by employing integral functions for τ(s), c̃(s), and τ(s)c(s), respectively. More specifically, Equation (19.1) for α_i = α(s_f, s_b, d) can be rewritten as
\[
\alpha(s_f, s_b, d) \approx 1 - \exp\left(-\frac{d}{s_b - s_f}\,\big(T(s_b) - T(s_f)\big)\right)
\tag{19.4}
\]
with the integral function T(s) := ∫_0^s τ(s') ds', which is easily computed in practice since the scalar values s are usually quantized.
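As an illustration of Equation (19.4), the following sketch fills the opacity part of a two-dimensional lookup table (constant segment length d) from the integral function T(s); the 8-bit quantization of the scalar range and the trapezoidal integration are assumptions of this sketch.

#include <math.h>

#define NBINS 256                   /* 8-bit quantized scalar values (assumed) */

/* Fill an NBINS x NBINS opacity table alpha[s_b][s_f] for a fixed segment
   length d, using the integral function T(s) of Equation (19.4);
   tau[] is the tabulated extinction transfer function. */
void buildOpacityTable(const float tau[NBINS], float d,
                       float alpha[NBINS][NBINS])
{
    float T[NBINS];                 /* T(s): integral of tau from 0 to s */
    T[0] = 0.0f;
    for (int s = 1; s < NBINS; ++s)
        T[s] = T[s - 1] + 0.5f * (tau[s - 1] + tau[s]);   /* trapezoidal rule */

    for (int sb = 0; sb < NBINS; ++sb) {
        for (int sf = 0; sf < NBINS; ++sf) {
            float depth;                           /* optical depth of the segment */
            if (sf == sb)
                depth = tau[sf] * d;               /* limit case s_f = s_b */
            else                                   /* Equation (19.4) */
                depth = d * (T[sb] - T[sf]) / (float)(sb - sf);
            alpha[sb][sf] = 1.0f - expf(-depth);
        }
    }
}

The color entries of the table can be filled in exactly the same way from the integral functions for the color transfer function introduced below.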
Equation (19.2) for C̃_i = C̃(s_f, s_b, d) may be approximated analogously:
\[
\tilde{C}(s_f, s_b, d) \approx \frac{d}{s_b - s_f}\,\big(K(s_b) - K(s_f)\big)
\tag{19.5}
\]
with the integral function K(s) := ∫_0^s c̃(s') ds'. However, this requires neglecting the attenuation within a ray segment. As mentioned above, this is a common approximation for post-classified volume rendering and is well justified for small products τ(s)d. For the non-associated color transfer function c(s), we approximate Equation (19.3) by
\[
\tilde{C}^{\tau}(s_f, s_b, d) \approx \frac{d}{s_b - s_f}\,\big(K^{\tau}(s_b) - K^{\tau}(s_f)\big)
\tag{19.6}
\]
with K^τ(s) := ∫_0^s τ(s') c(s') ds'. Thus, instead of numerically computing the integrals in Equations (19.1), (19.2), and (19.3) for each combination of s_f, s_b, and d, we compute the integral functions T(s), K(s), or K^τ(s) only once and employ them to evaluate colors and opacities according to Equations (19.4), (19.5), or (19.6) without any further integration.

Texture-based Pre-Integrated Volume Rendering

Based on the description of pre-integrated classification in Section 19, we now present a novel texture-based algorithm that implements pre-integrated classification. It employs dependent textures, i.e., it relies on the possibility to convert fragment (or pixel) colors into texture coordinates. In contrast to paletted textures, dependent textures allow post-classification shading using a one-dimensional lookup texture. Here, however, we use dependent textures to look up pre-integrated ray-segment values. The volume texture maps (either three-dimensional or two-dimensional textures) contain the scalar values of the volume, just as for post-classification. As each pair of adjacent slices (either view-aligned or object-aligned) corresponds to one slab of the volume (see Figure 20.1), the texture maps of two adjacent slices have to be mapped onto one slice (either the front or the back slice) by means of multiple textures (see Section 20.1). Thus, the scalar values of both slices (front and back) are fetched from texture maps during the rasterization of the polygon for one slab (see Section 20.2). These two scalar values are required for a third texture fetch operation, which performs the lookup of pre-integrated colors and opacities from a two-dimensional texture map. This texture fetch depends on previously fetched texels; therefore, this third texture map is called a dependent texture map.

Figure 20.1: A slab of the volume between two slices. The scalar value on the front (back) slice for a particular viewing ray is called s_f (s_b).

The opacities of this dependent texture map are calculated according to Equation (19.1), while the colors are computed according to Equation (19.2) if the transfer function specifies associated colors c̃(s), and according to Equation (19.3) if it specifies non-associated colors c(s). In either case, a back-to-front compositing algorithm is used for blending the ray segments into the framebuffer. Obviously, a hardware implementation of these algorithms depends on rather complicated texture fetch operations. Fortunately, the recently proposed OpenGL texture shader extension can in fact be customized to implement these algorithms. The details of this implementation are discussed in the following section. Our current implementation is based on NVidia's GeForce3 graphics chip. NVidia introduced a flexible multi-texturing unit in their GeForce2 graphics processor via the register combiners OpenGL extension [33].
This unit allows the programming of per-pixel shading operations using three stages: two general and one final combiner stage. The register combiners extension is located behind the texel fetch unit in the rendering pipeline. Recently, NVidia extended the register combiners in the GeForce3 graphics chip by providing eight general and one final combiner stage with per-combiner constants via the register combiners2 extension. Additionally, the GeForce3 provides a programmable texture fetch unit [33] allowing four texture fetch operations via 21 possible commands, among them several dependent texture operations. This so-called texture shader OpenGL extension and the register combiners are merged together in Microsoft's DirectX8 API to form the pixel shader API. The texture shader extension refers to 2D textures only; NVidia proposed an equivalent extension for 3D texture fetches via the texture shader2 extension. Equivalent functionality is also available on ATI's R200 graphics processor: ATI proposed the fragment shader OpenGL extension, which combines texture fetch operations and per-fragment calculations in a single API.

The pre-integrated volume rendering algorithm consists of three basic steps. First, two adjacent texture slices are projected onto one of them, either the back slice onto the front slice or vice versa. Thereby, two texels along each ray (one from the front and one from the back slice) are projected onto each other. They are fetched using the texture shader extension and then used as texture coordinates for a dependent texture fetch into a texture containing pre-integrated values for each combination of back and front texels. For isosurface rendering, the dependent texture contains color, transparency, and interpolation values for the case that the isovalue lies between the front and back texel values. The gradient and voxel values are stored in RGBA textures. In the register combiners, gradients are interpolated and dot-product lighting calculations are performed. The following sub-sections explain all these steps in detail.

20.1 Projection

The texture-based volume rendering algorithm usually blends object-aligned texture slices of one of the three texture stacks back-to-front into the frame buffer using the over operator. Instead of this slice-by-slice approach, we render slab-by-slab (see Figure 20.1) from back to front into the frame buffer. A single polygon is rendered for each slab with the two corresponding textures as texture maps. In order to have texels along all viewing rays projected onto each other for the texel fetch operation, either the back slice must be projected onto the front slice or vice versa. The projection is accomplished by adapting the texture coordinates of the projected texture slice and retaining the texture coordinates of the other texture slice. Figure 20.2 shows the projection for the object- and view-aligned rendering algorithms. For direct volume rendering without lighting, textures are defined in the OpenGL texture format GL_LUMINANCE8. For volume shading and shaded isosurfaces, GL_RGBA textures are used, which contain the pre-calculated volume gradient and the scalar values.

20.2 Texel Fetch

For each fragment, texels of two adjacent slices along each ray through the volume are projected onto each other. Thus, we can fetch the texels with their given per-fragment texture coordinates.
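A slab-by-slab rendering loop along the lines of Section 20.1 might look as follows; sliceTex[], drawSlabPolygon(), and the texture-coordinate handling are assumptions made for this sketch.

#include <GL/gl.h>

extern void drawSlabPolygon(int slab);   /* hypothetical helper: renders one slab
                                            polygon with adapted texture coords */

/* Back-to-front slab rendering: for slab i, the front slice texture goes to
   unit 0 and the back slice texture to unit 1 (compare Figure 20.1); associated
   colors are composited with glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA). */
void renderSlabs(const GLuint sliceTex[], int numSlices)
{
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);

    for (int i = numSlices - 1; i > 0; --i) {
        glActiveTexture(GL_TEXTURE0);
        glBindTexture(GL_TEXTURE_2D, sliceTex[i - 1]);   /* front slice (s_f) */
        glActiveTexture(GL_TEXTURE1);
        glBindTexture(GL_TEXTURE_2D, sliceTex[i]);       /* back slice  (s_b) */

        drawSlabPolygon(i);   /* texture coordinates of the projected slice are
                                 adapted as shown in Figure 20.2 */
    }
    glDisable(GL_BLEND);
}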
Then the two fetched texels are used as lookup coordinates into a dependent 2D texture, containing pre-integrated values for each of the possible combinations of front and back scalar values. NVidia's texture shader extension provides a texture shader operation that employs the previous texture shader's green and blue (or red and alpha) colors as the (s, t) coordinates for a non-projective 2D texture lookup. Unfortunately, we cannot use this operation, as our coordinates are fetched from two separate 2D textures. Instead, as a workaround, we use the dot product texture shader, which computes the dot product of the stage's (s, t, r) coordinates and a vector derived from a previous stage's texture lookup (see Figure 20.3). The results of two such dot product texture shader operations are employed as coordinates for a dependent texture lookup.

Figure 20.2: Projection of texture slice vertices onto adjacent slice polygons for object-aligned slices (left) and view-aligned slices (right).

Figure 20.3: Texture shader setup for dependent 2D texture lookup with texture coordinates obtained from two source textures.

Here the dot product is only required to extract the front and back volume scalars. This is achieved by storing the volume scalars in the red components of the textures and applying a dot product with a constant vector v = (1, 0, 0)^T. The texture shader extension allows us to define which previous texture fetch the dot product refers to with the GL_PREVIOUS_TEXTURE_INPUT_NV texture environment. The first dot product is set to use the fetched front texel values as its previous texture stage; the second uses the back texel value. In this approach, the second dot product performs the texture lookup into our dependent texture via texture coordinates obtained from two different textures. For direct volume rendering without lighting, the fetched texel from the last dependent texture operation is routed through the register combiners without further processing and blended into the frame buffer with the OpenGL blending function glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA).

Isosurfaces using Dependent Textures

As discussed in [37], pre-integrated volume rendering can be employed to render multiple isosurfaces. The basic idea is to color each ray segment according to the first isosurface intersected by the ray segment. Examples of such dependent textures are depicted in Figure 21.3. For shading calculations, RGBA textures are usually employed that contain the volume gradient in the RGB components and the volume scalar in the ALPHA component. As we use dot products to extract the front and back volume scalar, and the dot product refers only to the first three components of a vector, we store the scalar data in the RED component; the first gradient component is stored in the ALPHA component instead. For lighting purposes, the gradient of the front and back slice has to be rebuilt in the RGB components (ALPHA has to be routed back to RED), and the two gradients have to be interpolated depending on a given isovalue (see Figure 21.1). The interpolation value for the back slice is given by IP = (s_iso − s_f)/(s_b − s_f); the interpolation value for the front slice is 1 − IP (see also [37]). IP could be calculated on-the-fly for each given isovalue, back scalar, and front scalar. Unfortunately, this requires a division in the register combiners, which is not available.
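The dependent texture lookup of Section 20.2 could be configured roughly as sketched below, using tokens from the ARB_multitexture and NV_texture_shader extensions; the texture objects and the unit assignment are assumptions of this sketch, and the constant texture coordinate (1, 0, 0) has to be supplied for units 2 and 3 when the slab polygon is rendered.

#include <GL/gl.h>
#include <GL/glext.h>   /* ARB_multitexture and NV_texture_shader tokens */

/* Units 0/1 fetch the front and back slice texels; unit 2 extracts s_f via a
   dot product with (1,0,0); unit 3 extracts s_b the same way and uses
   (s_f, s_b) as coordinates for the dependent pre-integration texture. */
void setupDependentLookup(GLuint frontTex, GLuint backTex, GLuint preIntTex)
{
    glActiveTextureARB(GL_TEXTURE0_ARB);                 /* front slice: s_f */
    glBindTexture(GL_TEXTURE_2D, frontTex);
    glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_TEXTURE_2D);

    glActiveTextureARB(GL_TEXTURE1_ARB);                 /* back slice: s_b */
    glBindTexture(GL_TEXTURE_2D, backTex);
    glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_TEXTURE_2D);

    glActiveTextureARB(GL_TEXTURE2_ARB);                 /* extract s_f */
    glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_DOT_PRODUCT_NV);
    glTexEnvi(GL_TEXTURE_SHADER_NV, GL_PREVIOUS_TEXTURE_INPUT_NV, GL_TEXTURE0_ARB);

    glActiveTextureARB(GL_TEXTURE3_ARB);                 /* (s_f, s_b) lookup */
    glBindTexture(GL_TEXTURE_2D, preIntTex);
    glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV,
              GL_DOT_PRODUCT_TEXTURE_2D_NV);
    glTexEnvi(GL_TEXTURE_SHADER_NV, GL_PREVIOUS_TEXTURE_INPUT_NV, GL_TEXTURE1_ARB);

    glEnable(GL_TEXTURE_SHADER_NV);
}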
Since this division is not available, we pre-calculate the interpolation values for each combination of back and front scalar values and store them in the dependent texture. Ideally, this interpolation value would be looked up using a second dependent texture. Unfortunately, NVidia's texture shader extension only allows four texture operations, which are already in use. Hence, we have to store the interpolation value IP in the first and only dependent texture. There are two possible ways to store these interpolation values.

The first approach stores the interpolation value IP in the ALPHA component of the dependent texture (R,G,B,IP). The main disadvantage of this method is that the transparency, which is usually freely definable for each isosurface's back and front face, is now constant for all isosurfaces' faces. In order to obtain a transparency value of zero for ray segments that do not intersect the isosurface and a constant transparency for ray segments that intersect the isosurface, the interpolation values are stored in the ALPHA channel in the range 128 to 255 (7 bits). An interpolation value of 0 is stored for ray segments that do not intersect the isosurface. This allows us to scale the ALPHA channel by a factor of two, to get an ALPHA of 1.0 for ray segments intersecting the isosurface and an ALPHA of 0 otherwise. Afterwards, a multiplication of the result with the constant transparency can be performed. For the interpolation, the second general combiner's input mappings are set to GL_HALF_BIAS_NORMAL_NV and GL_UNSIGNED_INVERT_NV to map the interpolation value to the ranges 0 to 0.5 and 0.5 to 0 (see Figure 21.1). After the interpolation, the result is scaled by two in order to get the correct interpolation result.

Figure 21.1: Register combiner setup for gradient reconstruction and interpolation with interpolation values stored in ALPHA. Note that the interpolation values are stored in the range of 0.5 to 1.0, which requires proper input and output mappings for general combiner 2 to obtain a correct interpolation. M denotes the gradient of the back slice, N the gradient of the front slice.

Figure 21.2: Register combiner setup for gradient reconstruction and interpolation with interpolation values stored in BLUE. Note that the interpolation values are routed into the ALPHA portion and back into the RGB portion to distribute the values onto RGB for interpolation. M denotes the gradient of the back slice, N the gradient of the front slice.

Our second approach stores the interpolation value IP in the BLUE component of the dependent texture (R,G,IP,A). Now the transparency can be freely defined for each isosurface and each back and front face of the isosurface, but the register combiners are used to fill the blue color channel with a constant value that is equal for all isosurfaces' back and front faces. We can also use all 8 bits of the BLUE color channel for the interpolation value. In order to distribute the interpolation value from the BLUE color channel onto all RGB components for the interpolation, BLUE is first routed into the ALPHA portion of a general combiner stage and then routed back into the RGB portion (see Figure 21.2).

21.1 Lighting

After the per-fragment calculation of the isosurfaces' gradient in the first three general combiner stages, the remaining five general combiners and the final combiner can be used for lighting computations.
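Before turning to the lighting setup, the following sketch illustrates how the isosurface dependent texture of the first storage scheme (IP in ALPHA, mapped to the range 128 to 255) could be filled on the CPU; the table size and the single, constant isosurface color are assumptions of this sketch, not the course's code.

/* Fill a 256x256 RGBA dependent texture for isosurface rendering: the RGB
   components hold the isosurface color, ALPHA holds the interpolation value
   IP = (s_iso - s_f) / (s_b - s_f) remapped to 128..255 for segments that
   contain the isovalue, and 0 for segments that do not. */
void buildIsoDependentTexture(unsigned char isovalue,
                              const unsigned char color[3],
                              unsigned char tex[256][256][4])
{
    for (int sb = 0; sb < 256; ++sb) {          /* back scalar s_b  */
        for (int sf = 0; sf < 256; ++sf) {      /* front scalar s_f */
            unsigned char *texel = tex[sb][sf];
            int hit = (sf <= isovalue && isovalue <= sb) ||
                      (sb <= isovalue && isovalue <= sf);
            texel[0] = color[0];
            texel[1] = color[1];
            texel[2] = color[2];
            if (hit && sf != sb) {
                float ip = (float)(isovalue - sf) / (float)(sb - sf);
                texel[3] = (unsigned char)(128.0f + ip * 127.0f);
            } else if (hit) {
                texel[3] = 128;                 /* s_f = s_b = isovalue: IP = 0 */
            } else {
                texel[3] = 0;                   /* no intersection */
            }
        }
    }
}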
Diffuse and specular lighting with a maximum exponent of 256 is possible by utilizing the dot product of the register combiners and increasing the exponent by repeatedly multiplying the dot product with itself. Currently we calculate I = I_a + I_d C (n·l_1) + I_s (n·l_2)^16, where n denotes the interpolated normal, l_1 the diffuse light source direction, l_2 the specular light source direction, and C the color of the isosurface. A visualization of a CT scan of a human head at different thresholds is shown in Figure 21.3.

The same approach can also be employed for volume shading. For lighting, the average gradient at the front and back slice is used; thus, no interpolation values have to be stored in the dependent texture. The dependent texture holds pre-integrated opacity and color values; the latter are employed for diffuse and specular lighting calculations. The implemented lighting model computes I = I_d C (n·l_1) + I_s C (n·l_2)^16, where n denotes the interpolated normal, l_1 the diffuse light source direction, l_2 the specular light source direction, and C the pre-integrated color of the ray segment. Dynamic lighting as described above requires RGBA textures, which consume a lot of texture memory. Alternatively, static lighting is possible by storing pre-calculated dot products of gradient and light vectors for each voxel in the textures. The dot products at the start and end of a ray segment are then interpolated for a given isovalue in the register combiners. For this purpose, LUMINANCE_ALPHA textures can be employed, which consume only half of the memory of RGBA textures.

The intermixing of semi-transparent volumes and isosurfaces is performed by a multi-pass approach that first renders a slice with a pre-integrated dependent texture and then renders the slice again with an isosurface dependent texture. Without the need to store the interpolation values in the dependent texture, a single-pass approach could also be implemented that handles isosurfaces and the semi-transparent volume within a slab at the same time. Examples of dependent textures for direct and isosurface volume rendering are presented in Figure 21.3.

Figure 21.3: Pre-integrated isosurfaces. Left to right: multiple colored isosurfaces of a synthetic data set with the corresponding dependent texture; isosurfaces of a human head CT scan (256^3): skin, skull, semi-transparent skin with opaque skull, and the dependent texture for the latter image.

Figure 21.4: Images showing a comparison of (a) pre-shaded, (b) post-shaded without additional slices, (c) post-shaded with additional slices, and (d) pre-integrated volume visualization of tiny structures of the inner ear (128 × 128 × 30) rendered with 128 slices. Note that the slicing artifacts in (b) can be removed (c) by rendering additional slices. With pre-integrated volume rendering (d), no slicing artifacts are visible with the original number of slices.

Figure 21.5: Comparison of the results of pre-classification (top), post-classification (middle), and pre-integrated classification (bottom) for direct volume rendering of a spherical harmonic (Legendre's) function (16^3 voxels) with random transfer functions.

Volumetric FX

One drawback of volume-based graphics is that high-frequency details cannot be represented in small volumes. These high-frequency details are essential for capturing the characteristics of many volumetric objects such as clouds, smoke, trees, hair, and fur.
Procedural noise simulation is a very powerful tool to use with small volumes to produce visually compelling simulations of these types of volumetric objects. Our approach is similar to Ebert's approach for modeling clouds [11]: use a coarse technique for modeling the macrostructure and procedural noise-based simulations for the microstructure. We have adapted this approach to interactive volume rendering through two volume perturbation approaches which are efficient on modern graphics hardware. The first approach is used to perturb optical properties in the shading stage, while the second approach is used to perturb the volume itself.

Both volume perturbation approaches employ a small 3D perturbation volume with 32^3 voxels. Each texel is initialized with four random 8-bit numbers, stored as RGBA components, and blurred slightly to hide the artifacts caused by trilinear interpolation. Texel access is then set to repeat. An additional pass is required for both approaches due to limitations imposed on the number of textures which can be simultaneously applied to a polygon, and on the number of sequential dependent texture reads permitted. The additional pass occurs before the steps outlined in the previous section. Multiple copies of the noise texture are applied to each slice at different scales. They are then weighted and summed per pixel. To animate the perturbation, we add a different offset to each noise texture's coordinates and update it each frame.

The first approach is similar to Ebert's lattice-based noise approach [11]. It uses the four per-pixel noise components to modify the optical properties of the volume after the transfer function has been evaluated. This approach makes the materials appear to have inhomogeneities. We allow the user to select which optical properties are modified. This technique is used to get the subtle iridescence effects seen in Figure 22.1 (bottom).

The second approach is closely related to Peachey's vector-based noise simulation technique [11]. It uses the noise to modify the location of the data access for the volume. In this case, three components of the noise texture form a vector, which is added to the texture coordinates for the volume data per pixel. The data is then read using a dependent texture read. The perturbed data is rendered to a pixel buffer that is used instead of the original volume data. Figure 22.2 illustrates this process. A shows the original texture data. B shows how the perturbation texture is applied to the polygon twice, once to achieve low-frequency, high-amplitude perturbations (large arrows) and again to achieve high-frequency, low-amplitude perturbations (small arrows). Notice that the high-frequency content is created by allowing the texture to repeat. Figure 22.2 C shows the resulting texture coordinate perturbation field when the multiple displacements are weighted and summed. D shows the image generated when the texture is read using the perturbed texture coordinates.

Figure 22.1: Procedural clouds. The image on the top shows the underlying data (64^3). The center image shows the perturbed volume. The bottom image shows the perturbed volume lit from behind with low-frequency noise added to the indirect attenuation to achieve subtle iridescence effects.

Figure 22.2: An example of texture coordinate perturbation in 2D. A shows a square polygon mapped with the original texture that is to be perturbed.
B shows a low-resolution perturbation texture applied to the polygon multiple times at different scales. These offset vectors are weighted and summed to offset the original texture coordinates, as seen in C. The texture is then read using the modified texture coordinates, producing the image seen in D.

Figure 22.3: Procedural fur. Left: original teddy bear CT scan. Right: teddy bear with fur created using high-frequency texture coordinate perturbation.

Figure 22.1 shows how a coarse volume model can be combined with our volume perturbation technique to produce an extremely detailed, interactively rendered cloud. The original 64^3 voxel dataset is generated from a simple combination of volumetrically blended implicit ellipses and defines the cloud macrostructure [11]. The final rendered image in Figure 22.1(c), produced with our volume perturbation technique, shows detail that would be equivalent to an unperturbed voxel dataset of at least one hundred times the resolution. Figure 22.3 demonstrates this technique on another example: by perturbing the volume with a high-frequency noise, we can obtain a fur-like surface on the teddy bear.

Figure 22.4: Pre-integrated volume rendering of a fireball. The fireball effect is achieved by mixing different volumes during rendering. (a) Radial distance volume with high-frequency fire transfer function. (b) Perlin noise volume with fire transfer function. (c) Weighted combination of the distance volume and two Perlin noise volumes. (d) Like (c), but with higher weights for the Perlin noise volumes.

Bibliography

[1] Andreas H. König and Eduard M. Gröller. Mastering transfer function specification by using VolumePro technology. Technical Report TR-186-2-00-07, Vienna University of Technology, March 2000.
[2] ATI web page. http://www.ati.com/.
[3] Chandrajit L. Bajaj, Valerio Pascucci, and Daniel R. Schikore. The Contour Spectrum. In Proceedings IEEE Visualization 1997, pages 167–173, 1997.
[4] Uwe Behrens and Ralf Ratering. Adding Shadows to a Texture-Based Volume Renderer. In 1998 Volume Visualization Symposium, pages 39–46, 1998.
[5] J. Blinn. Models of Light Reflection for Computer Synthesized Pictures. Computer Graphics, 11(2):192–198, 1977.
[6] J. Blinn and M. Newell. Texture and Reflection in Computer Generated Images. Communications of the ACM, 19(10):362–367, 1976.
[7] J. F. Blinn. Jim Blinn's corner: Image compositing - theory. IEEE Computer Graphics and Applications, 14(5), 1994.
[8] B. Cabral, N. Cam, and J. Foran. Accelerated volume rendering and tomographic reconstruction using texture mapping hardware. In Proc. of IEEE Symposium on Volume Visualization, pages 91–98, 1994.
[9] Yoshinori Dobashi, Kazufumi Kaneda, Hideo Yamashita, Tsuyoshi Okita, and Tomoyuki Nishita. A Simple, Efficient Method for Realistic Animation of Clouds. In SIGGRAPH 2000, pages 19–28, 2000.
[10] R. A. Drebin, L. Carpenter, and P. Hanrahan. Volume rendering. In Proc. of SIGGRAPH '88, pages 65–74, 1988.
[11] D. Ebert, F. K. Musgrave, D. Peachey, K. Perlin, and S. Worley. Texturing and Modeling: A Procedural Approach. Academic Press, July 1998.
[12] K. Engel, M. Kraus, and T. Ertl. High-Quality Pre-Integrated Volume Rendering Using Hardware-Accelerated Pixel Shading. In Proc. Graphics Hardware, 2001.
[13] T. J. Farrell, M. S. Patterson, and B. C. Wilson.
A diffusion theory model of spatially resolved, steady-state diffuse reflectance for the non-invasive determination of tissue optical properties in vivo. Medical Physics, 19:879–888, 1992.
[14] N. Greene. Environment Mapping and Other Applications of World Projection. IEEE Computer Graphics and Applications, 6(11):21–29, 1986.
[15] M. Hadwiger, T. Theußl, H. Hauser, and E. Gröller. Hardware-accelerated high-quality filtering on PC hardware. In Proc. of Vision, Modeling, and Visualization 2001, pages 105–112, 2001.
[16] M. Hadwiger, I. Viola, and H. Hauser. Fast convolution with high-resolution filters. Technical Report TR-VRVis-2002-001, VRVis Research Center for Virtual Reality and Visualization, 2002.
[17] M. J. Harris and A. Lastra. Real-time cloud rendering. In Proc. of Eurographics 2001, pages 76–84, 2001.
[18] Taosong He, Lichan Hong, Arie Kaufman, and Hanspeter Pfister. Generation of Transfer Functions with Stochastic Search Techniques. In Proceedings IEEE Visualization 1996, pages 227–234, 1996.
[19] James T. Kajiya and Brian P. Von Herzen. Ray Tracing Volume Densities. In ACM Computer Graphics (SIGGRAPH '84 Proceedings), pages 165–173, July 1984.
[20] R. G. Keys. Cubic convolution interpolation for digital image processing. IEEE Trans. Acoustics, Speech, and Signal Processing, ASSP-29(6):1153–1160, December 1981.
[21] Joe Kniss, Gordon Kindlmann, and Charles Hansen. Multi-Dimensional Transfer Functions for Interactive Volume Rendering. TVCG, 2002, to appear.
[22] P. Lacroute and M. Levoy. Fast volume rendering using a shear-warp factorization of the viewing transformation. In Proc. of SIGGRAPH '94, pages 451–458, 1994.
[23] E. LaMar, B. Hamann, and K. Joy. Multiresolution Techniques for Interactive Texture-based Volume Visualization. In Proc. IEEE Visualization, 1999.
[24] M. Levoy. Display of surfaces from volume data. IEEE Computer Graphics and Applications, 8(3):29–37, May 1988.
[25] W. E. Lorensen and H. E. Cline. Marching cubes: A high resolution 3D surface construction algorithm. In Proc. of SIGGRAPH '87, pages 163–169, 1987.
[26] J. Marks, B. Andalman, P. A. Beardsley, H. Pfister, et al. Design Galleries: A General Approach to Setting Parameters for Computer Graphics and Animation. In ACM Computer Graphics (SIGGRAPH '97 Proceedings), pages 389–400, August 1997.
[27] N. Max. Optical models for direct volume rendering. IEEE Transactions on Visualization and Computer Graphics, 1(2):99–108, 1995.
[28] T. McReynolds, D. Blythe, B. Grantham, and S. Nelson. Advanced graphics programming techniques using OpenGL. In SIGGRAPH 2000 course notes, 2000.
[29] M. Meißner, U. Hoffmann, and W. Straßer. Enabling Classification and Shading for 3D-texture Based Volume Rendering Using OpenGL and Extensions. In Proc. IEEE Visualization, 1999.
[30] D. P. Mitchell and A. N. Netravali. Reconstruction filters in computer graphics. In Proc. of SIGGRAPH '88, pages 221–228, 1988.
[31] Herke Jan Noordmans, Hans T. M. van der Voort, and Arnold W. M. Smeulders. Spectral Volume Rendering. In IEEE Transactions on Visualization and Computer Graphics, volume 6. IEEE, July–September 2000.
[32] NVIDIA web page. http://www.nvidia.com/.
[33] NVIDIA OpenGL extension specifications document. http://www.nvidia.com/developer.
[34] A. V. Oppenheim and R. W. Schafer. Digital Signal Processing. Prentice Hall, Englewood Cliffs, 1975.
[35] B. T. Phong. Illumination for Computer Generated Pictures. Communications of the ACM, 18(6):311–317, June 1975.
[36] C. Rezk-Salama, K. Engel, M. Bauer, G.
Greiner, and T. Ertl. Interactive Volume Rendering on Standard PC Graphics Hardware Using Multi-Textures and Multi-Stage Rasterization. In Proc. SIGGRAPH/Eurographics Workshop on Graphics Hardware, 2000.
[37] S. Röttger, M. Kraus, and T. Ertl. Hardware-accelerated volume and isosurface rendering based on cell-projection. In Proc. of IEEE Visualization 2000, pages 109–116, 2000.
[38] J. Schimpf. 3Dlabs OpenGL 2.0 white papers. http://www.3dlabs.com/support/developer/ogl2/.
[39] M. Segal and K. Akeley. The OpenGL Graphics System: A Specification. http://www.opengl.org.
[40] Lihong V. Wang. Rapid modelling of diffuse reflectance of light in turbid slabs. J. Opt. Soc. Am. A, 15(4):936–944, 1998.
[41] R. Westermann and T. Ertl. Efficiently using graphics hardware in volume rendering applications. In Proc. of SIGGRAPH '98, pages 169–178, 1998.
[42] C. M. Wittenbrink, T. Malzbender, and M. E. Goss. Opacity-weighted color interpolation for volume sampling. In Proc. of IEEE Symposium on Volume Visualization, pages 135–142, 1998.