High-Quality Volume Graphics on Consumer PC Hardware

Markus Hadwiger
Joe M. Kniss
Klaus Engel
Christof Rezk-Salama
Course Notes 42
Abstract
Interactive volume visualization in science and entertainment is no longer restricted to
expensive workstations and dedicated hardware thanks to the fast evolution of consumer
graphics driven by entertainment markets. Course participants will learn to leverage new
features of modern graphics hardware to build high-quality volume rendering applications using OpenGL. Beginning with basic texture-based approaches, the algorithms are
improved and expanded incrementally, covering illumination, non-polygonal isosurfaces,
transfer function design, interaction, volumetric effects, and hardware-accelerated filtering. The course is aimed at scientific researchers and entertainment developers. Course
participants are provided with documented source code covering details usually omitted in
publications.
Contact
Joe Michael Kniss (Course Organizer)
Scientific Computing and Imaging,
School of Computing
University of Utah
50 S. Central Campus Dr. #3490
Salt Lake City, UT 84112
Email: jmk@cs.utah.edu
Phone: 801-581-7977
Klaus Engel
Visualization and Interactive Systems Group (VIS)
University of Stuttgart
Breitwiesenstraße 20-2
70565 Stuttgart, Germany
Email: Klaus.Engel@informatik.uni-stuttgart.de
Phone: +49 711 7816 208
Fax: +49 711 7816 340
Markus Hadwiger
VRVis Research Center for Virtual Reality and Visualization
Donau-City-Straße 1
A-1220 Vienna, Austria
Email: msh@vrvis.at
Phone: +43 1 20501 30603
Fax: +43 1 20501 30900
Christof Rezk-Salama
Computer Graphics Group
University of Erlangen-Nuremberg
Am Weichselgarten 9
91058 Erlangen, Germany
Email: rezk@cs.fau.de
Phone: +49 9131 85-29927
Fax: +49 9131 85-29931
Lecturers
Klaus Engel is a PhD candidate at the Visualization and Interactive Systems Group at
the University of Stuttgart. He received a Diplom (Masters) in computer science from the
University of Erlangen in 1997. From January 1998 to December 2000, he was a research
assistant at the Computer Graphics Group at the University of Erlangen-Nuremberg. Since
2000, he has been a research assistant at the Visualization and Interactive Systems Group of Prof.
Thomas Ertl at the University of Stuttgart. He has presented the results of his research
at international conferences, including IEEE Visualization, Visualization Symposium and
Graphics Hardware. In 2001, his paper "High-Quality Pre-Integrated Volume Rendering Using Hardware-Accelerated Pixel Shading" won the best paper award at the SIGGRAPH/Eurographics Workshop on Graphics Hardware. He has regularly taught courses
and seminars on computer graphics, visualization and computer games algorithms. His
PhD thesis with the title "Strategies and Algorithms for Distributed Volume-Visualization on Different Graphics-Hardware Architectures" is currently under review.
Markus Hadwiger is a researcher in the "Basic Research in Visualization" group at the
VRVis Research Center in Vienna, Austria, and a PhD student at the Vienna University
of Technology. The focus of his current research is exploiting consumer graphics hardware
for high quality visualization at interactive rates, especially volume rendering for scientific
visualization. First results on high quality filtering and reconstruction of volumetric data
have been presented as a technical sketch at SIGGRAPH 2001, and as a paper at Vision, Modeling, and Visualization 2001. He regularly teaches courses and seminars on computer graphics, visualization, and game programming. Before concentrating on scientific visualization, he worked in the area of computer games and interactive entertainment. His master's thesis "Design and Architecture of a Portable and Extensible Multiplayer 3D Game Engine" describes the game engine of Parsec (http://www.parsec.org/), a still active cross-platform game project, whose early test builds have been downloaded by over 100,000 people and were also included in several Red Hat and SuSE Linux distributions.
Joe Kniss is a master's student at the University of Utah. He is a research assistant in the Scientific Computing and Imaging Institute. His current research has focused on interactive hardware-based volume graphics. A recent paper, "Interactive Volume Rendering Using Multi-dimensional Transfer Functions and Direct Manipulation Widgets," won the Best Paper award at Visualization 2001. He also participated in the Commodity Graphics Accelerators for Scientific Visualization panel, which won the Best Panel award at Visualization 2001.
His previous work demonstrates a system for large scale parallel volume rendering using
graphics hardware. New results for this work were presented by Al McPherson at the
SIGGRAPH 2001 course on Commodity-Based Scalable Visualization. He has also given
numerous lectures on introductory and advanced topics in computer graphics, visualization,
and volume rendering.
Christof Rezk-Salama received a PhD in Computer Science from the University of Erlangen in 2002. Since January 1999, he has been a research assistant at the Computer Graphics Group and a scholarship holder at the graduate college "3D Image Analysis and Synthesis". The results of his research have been presented at international conferences, including IEEE Visualization, Eurographics, MICCAI and Graphics Hardware. In 2000, his paper "Interactive Volume Rendering on Standard PC Graphics Hardware" won the best paper award at the SIGGRAPH/Eurographics Workshop
on Graphics Hardware. He has regularly taught courses on graphics programming and
conceived tutorials and seminars on computer graphics, geometric modeling and scientific
visualization. His PhD thesis with the title "Volume Rendering Techniques for General Purpose Hardware" is currently in print. He has gained practical experience in several
scientific projects in medicine, geology and archeology.
Contents
Introduction

1 Motivation
2 Volume Rendering
  2.1 Volume Data
  2.2 Sampling and Reconstruction
  2.3 Direct Volume Rendering
    2.3.1 Optical Models
    2.3.2 The Volume Rendering Integral
    2.3.3 Ray-Casting
    2.3.4 Alpha Blending
    2.3.5 The Shear-Warp Algorithm
  2.4 Non-Polygonal Iso-Surfaces
  2.5 Maximum Intensity Projection
3 Graphics Hardware
  3.1 The Graphics Pipeline
    3.1.1 Geometry Processing
    3.1.2 Rasterization
    3.1.3 Fragment Operations
  3.2 Consumer PC Graphics Hardware
    3.2.1 NVIDIA
    3.2.2 ATI
  3.3 Fragment Shading
    3.3.1 Traditional OpenGL Multi-Texturing
    3.3.2 Programmable Fragment Shading
  3.4 NVIDIA Fragment Shading
    3.4.1 Texture Shaders
    3.4.2 Register Combiners
  3.5 ATI Fragment Shading
  3.6 Other OpenGL Extensions
    3.6.1 GL_EXT_blend_minmax
    3.6.2 GL_EXT_texture_env_dot3
    3.6.3 GL_EXT_paletted_texture, GL_EXT_shared_texture_palette
4 Acknowledgments

Texture-Based Methods

5 Sampling a Volume Via Texture Mapping
  5.1 Proxy Geometry
  5.2 2D-Textured Object-Aligned Slices
  5.3 2D Slice Interpolation
  5.4 3D-Textured View-Aligned Slices
  5.5 3D-Textured Spherical Shells
  5.6 Slices vs. Slabs
6 Components of a Hardware Volume Renderer
  6.1 Volume Data Representation
  6.2 Transfer Function Representation
  6.3 Volume Textures
  6.4 Transfer Function Tables
  6.5 Fragment Shader Configuration
  6.6 Blending Mode Configuration
  6.7 Texture Unit Configuration
  6.8 Proxy Geometry Rendering
7 Acknowledgments

Illumination Techniques

8 Local Illumination
9 Gradient Estimation
10 Non-polygonal Shaded Isosurfaces
11 Per-Pixel Illumination
12 Advanced Per-Pixel Illumination
13 Reflection Maps

Classification

14 Introduction
15 Transfer Functions
16 Extended Transfer Function
  16.1 Optical properties
  16.2 Traditional volume rendering
  16.3 The Surface Scalar
  16.4 Shadows
  16.5 Translucency
  16.6 Summary
17 Transfer Functions
  17.1 Multi-dimensional Transfer Functions
  17.2 Guidance
  17.3 Classification

Advanced Techniques

18 Hardware-Accelerated High-Quality Filtering
  18.1 Basic principle
  18.2 Reconstructing Object-Aligned Slices
  18.3 Reconstructing View-Aligned Slices
  18.4 Volume Rendering
19 Pre-Integrated Classification
  19.1 Accelerated (Approximative) Pre-Integration
20 Texture-based Pre-Integrated Volume Rendering
  20.1 Projection
  20.2 Texel Fetch
21 Rasterization Isosurfaces using Dependent Textures
  21.1 Lighting
22 Volumetric FX

Bibliography
Introduction
Motivation
The huge demand for high-performance 3D computer graphics generated by computer
games has led to the availability of extremely powerful 3D graphics accelerators in the
consumer marketplace. These graphics cards by now not only rival, but in many areas
even surpass, tremendously expensive graphics workstations from just a couple of years
ago.
Current state-of-the-art consumer graphics chips like the NVIDIA GeForce 4, or the
ATI Radeon 8500, offer a level of programmability and performance that not only makes it
possible to perform traditional workstation tasks on a cheap personal computer, but even
enables the use of rendering algorithms that previously could not be employed for real-time
graphics at all.
Traditionally, volume rendering has especially high computational demands. One of
the major problems of using consumer graphics hardware for volume rendering is the
amount of texture memory required to store the volume data, and the corresponding
bandwidth consumption when texture fetch operations cause basically all of these data to
be transferred over the bus for each rendered frame.
However, the increased programmability of consumer graphics hardware today allows
high-quality volume rendering, for instance with respect to application of transfer functions,
shading, and filtering. In spite of the tremendous requirements imposed by the sheer
amount of data contained in a volume, the flexibility and quality, but also the performance,
that can be achieved by volume renderers for consumer graphics hardware is astonishing,
and has made possible entirely new algorithms for high-quality volume rendering.
In the introductory part of these notes, we will start with a brief review of volume
rendering, already with an emphasis on how it can be implemented on graphics hardware,
in chapter 2, and continue with an overview of the most important consumer graphics
hardware architectures that enable high-quality volume rendering in real-time, in chapter 3.
Throughout these notes, we are using OpenGL [39] in descriptions of graphics architecture and features, and for showing example code fragments. Currently, all of the advanced
features of programmable graphics hardware are exposed through OpenGL extensions, and
we introduce their implications and use in chapter 3. Later chapters will make frequent
use of many of these extensions.
In this course, we restrict ourselves to volume data defined on rectilinear grids, which
is the grid type most conducive to hardware rendering. In such grids, the volume data
are comprised of samples located at grid points that are equispaced along each respective
volume axis, and can therefore easily be stored in a texture map. Despite many similarities,
hardware-based algorithms for rendering unstructured grids, where volume samples are
located at the vertices of an unstructured mesh, e.g., at the vertices of tetrahedra, are
radically different in many respects. Thus, they are not covered in this course.
Volume Rendering
The term volume rendering [24, 10] describes a set of techniques for rendering three-dimensional, i.e., volumetric, data. Such data can be acquired from different sources, like
medical data from Computed Tomography (CT) or Magnetic Resonance Imaging (MRI)
scanners, computational fluid dynamics (CFD), seismic data, or any other data represented
as a three-dimensional scalar field. Volume data can, of course, also be generated synthetically, i.e., procedurally [11], which is especially useful for rendering fluids and gaseous
objects, natural phenomena like clouds, fog, and fire, visualizing molecular structures, or
rendering explosions and other effects in 3D computer games.
Although volumetric data can be difficult to visualize and interpret, it is both worthwhile and rewarding to visualize them as 3D entities without falling back to 2D subsets. To
summarize succinctly, volume rendering is a very powerful way for visualizing volumetric
data and aiding the interpretation process, and can also be used for rendering high-quality
special effects.
2.1 Volume Data
In contrast to surface data, which are inherently two-dimensional, even though surfaces
are often embedded in three-space, volumetric data are comprised of a three-dimensional
scalar field:
$f(\mathbf{x}) \in \mathbb{R}$ with $\mathbf{x} \in \mathbb{R}^3$    (2.1)

Although in principle defined over a continuous three-dimensional domain ($\mathbb{R}^3$), in the
context of volume rendering this scalar field is stored as a 3D array of values, where each
of these values is obtained by sampling the continuous domain at a discrete location.

Figure 2.1: Voxels constituting a volumetric object after it has been discretized.

The
individual scalar data values constituting the sampled volume are referred to as voxels
(volume elements), analogously to the term pixels used for denoting the atomic elements
of discrete two-dimensional images.
Figure 2.1 shows a depiction of volume data as a collection of voxels, where each
little cube represents a single voxel. The corresponding sampling points would usually be
assumed to lie in the respective centers of these cubes.
Although imagining voxels as little cubes is convenient and helps to visualize the immediate vicinity of individual voxels, it is more accurate to identify each voxel with a
sample obtained at a single infinitesimally small point in $\mathbb{R}^3$. In this model, the volumetric function is only defined at the exact sampling locations. From this collection of discrete samples, a continuous function that is once again defined for all locations in $\mathbb{R}^3$ (or at least
the subvolume of interest), can be attained through a process known as function or signal
reconstruction [34].
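To make this storage scheme concrete, the following minimal C++ sketch (hypothetical names, not taken from the course code) represents such a sampled volume as a flat array of scalar values addressed by integer grid coordinates:

    #include <cstddef>
    #include <vector>

    // Minimal sketch: a regular voxel grid stored as a flat array of scalar
    // samples, addressed by integer grid coordinates (x, y, z).
    struct Volume {
        std::size_t dimX, dimY, dimZ;
        std::vector<float> voxels;   // dimX * dimY * dimZ samples

        Volume(std::size_t x, std::size_t y, std::size_t z)
            : dimX(x), dimY(y), dimZ(z), voxels(x * y * z, 0.0f) {}

        // The discrete function is only defined at exact grid positions;
        // values in between require a reconstruction filter (section 2.2).
        float voxel(std::size_t x, std::size_t y, std::size_t z) const {
            return voxels[(z * dimY + y) * dimX + x];
        }
    };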
2.2 Sampling and Reconstruction
When continuous functions need to be stored within a computer, they must be converted
to a discrete representation by sampling the continuous domain at – usually equispaced –
discrete locations [34]. In addition to the discretization that is necessary with respect to
location, these individual samples also have to be quantized in order to map continuous
scalars to quantities that can be represented as a discrete number, which is usually stored
in either fixed-point, or floating-point format.
After the continuous function has been converted into a discrete function via sampling,
this function is only defined at the exact sampling locations, but not over the original
continuous domain. In order to once again be able to treat the function as being continuous,
a process known as reconstruction must be performed, i.e., reconstructing a continuous
function from a discrete one [34].
Figure 2.2: Different reconstruction filters: box (A), tent (B), and sinc filter (C).

Reconstruction is performed by applying a reconstruction filter to the discrete function,
which is done by performing a convolution of the filter kernel (the function describing
the filter) with the discrete function. The simplest such filter is known as the box filter
(figure 2.2(A)), which results in nearest-neighbor interpolation. Looking again at figure 2.1,
we can now see that this image actually depicts volume data reconstructed with a box filter.
Another reconstruction filter that is commonly used, especially in hardware, is the tent
filter (figure 2.2(B)), which results in linear interpolation.
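As a small illustration of these two filters, the following C++ sketch (hypothetical names, 1D case with unit sample spacing) reconstructs a value at an arbitrary position by convolving the discrete samples with a box or tent kernel:

    #include <cmath>
    #include <vector>

    // Sketch: 1D reconstruction by convolving discrete samples (unit spacing)
    // with a reconstruction filter kernel. A box kernel yields nearest-neighbor
    // lookup, a tent kernel yields linear interpolation.
    static float boxKernel(float t)  { return std::fabs(t) < 0.5f ? 1.0f : 0.0f; }
    static float tentKernel(float t) { float a = std::fabs(t); return a < 1.0f ? 1.0f - a : 0.0f; }

    float reconstruct(const std::vector<float>& samples, float x,
                      float (*kernel)(float), int halfWidth)
    {
        float result = 0.0f;
        int center = static_cast<int>(std::floor(x));
        for (int i = center - halfWidth + 1; i <= center + halfWidth; ++i) {
            if (i < 0 || i >= static_cast<int>(samples.size())) continue;
            result += samples[i] * kernel(x - static_cast<float>(i));
        }
        return result;
    }
    // reconstruct(data, 3.4f, tentKernel, 1) interpolates linearly between
    // samples 3 and 4; using boxKernel instead snaps to the nearest sample.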
In general, we know from sampling theory that a continuous function can be reconstructed entirely if certain conditions are honored during the sampling process. The original
function must be band-limited, i.e., not contain any frequencies above a certain threshold,
and the sampling frequency must be at least twice as high as this threshold (which is often
called the Nyquist frequency). The requirement for a band-limited input function is usually
enforced by applying a low-pass filter before the function is sampled. Low-pass filtering
discards frequencies above the Nyquist limit, which would otherwise result in aliasing, i.e.,
high frequencies being interpreted as much lower frequencies after sampling, due to overlap
in the frequency spectrum.
The statement that a function can be reconstructed entirely stays theoretical, however,
since, even when disregarding quantization artifacts, the reconstruction filter used would
have to be perfect. The “perfect,” or ideal, reconstruction filter is known as the sinc
filter [34], whose frequency spectrum is box-shaped, and described in the spatial domain
by the following equation:
$\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}$    (2.2)
A graph of this function is depicted in figure 2.2(C). The simple reason why the sinc filter
cannot be implemented in practice is that it has infinite extent, i.e., the filter function
is non-zero from minus infinity to plus infinity. Thus, a trade-off between reconstruction
time, depending on the extent of the reconstruction filter, and reconstruction quality must
be found.
In hardware rendering, linear interpolation is usually considered to be a reasonable
trade-off between performance and reconstruction quality. High-quality filters, usually of
width four (i.e., cubic functions), like the family of cardinal splines [20], which includes the
Catmull-Rom spline, or the BC-splines [30], are usually only employed when the filtering
operation is done in software. However, it has recently been shown that cubic reconstruction filters can indeed be used for high-performance rendering on today’s consumer graphics
hardware [15, 16].
2.3 Direct Volume Rendering
Direct volume rendering (DVR) methods [25] create images of an entire volumetric data set,
without concentrating on, or even explicitly extracting, surfaces corresponding to certain
features of interest, e.g., iso-contours. In order to do so, direct volume rendering requires an
optical model for describing how the volume emits, reflects, scatters, or occludes light [27].
Different optical models that can be used for direct volume rendering are described in more
detail in section 2.3.1.
In general, direct volume rendering maps the scalar field constituting the volume to
optical properties such as color and opacity, and integrates the corresponding optical effects
along viewing rays into the volume, in order to generate a projected image directly from
the volume data. The corresponding integral is known as the volume rendering integral,
which is described in section 2.3.2. Naturally, under real-world conditions this integral is
solved numerically.
For real-time volume rendering, usually the emission-absorption optical model is used,
in which a volume is viewed as being comprised of particles that are only able to emit and
absorb light. In this case, the scalar data constituting the volume is said to denote the
density of these particles. Mapping to optical properties is achieved via a transfer function,
the application of which is also known as classification, which is covered in more detail in
part 17. Basically, a transfer function is a lookup table that maps scalar density values
to RGBA values, which subsume both the emission (RGB), and the absorption (A) of the
optical model. Additionally, the volume can be shaded according to the illumination from
external light sources, which is the topic of chapter 8.
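As a minimal sketch of this idea (hypothetical names, assuming 8-bit density values), classification then amounts to a single table lookup per resampled density value:

    #include <array>
    #include <cstdint>

    // Sketch: a transfer function as a lookup table mapping 8-bit density
    // values to RGBA, where RGB encodes emission and A encodes absorption.
    struct RGBA { float r, g, b, a; };
    using TransferFunction = std::array<RGBA, 256>;

    inline RGBA classify(const TransferFunction& tf, std::uint8_t density)
    {
        return tf[density];   // one table lookup per resampled density value
    }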
2.3.1 Optical Models
Although most direct volume rendering algorithms, specifically real-time methods, consider
the volume to consist of particles of a certain density, and map these densities more or less
directly to RGBA information, which is subsequently processed as color and opacity for
alpha blending, the underlying physical background is subsumed in an optical model. More
sophisticated models than the ones usually used for real-time rendering also include support
for scattering of light among particles of the volume itself, and account for shadowing
effects.
The most important optical models for direct volume rendering are described in a survey
paper by Nelson Max [27], and we only briefly summarize these models here:
• Absorption only. The volume is assumed to consist of cold, perfectly black particles
that absorb all the light that impinges on them. They do not emit or scatter light.
• Emission only. The volume is assumed to consist of particles that only emit light,
but do not absorb any, since the absorption is negligible.
• Absorption plus emission. This optical model is the most common one in direct
volume rendering. Particles emit light, and occlude, i.e., absorb, incoming light.
However, there is no scattering or indirect illumination.
• Scattering and shading/shadowing. This model includes scattering of illumination that is external to a voxel. Light that is scattered can either be assumed to
impinge unimpeded from a distant light source, or it can be shadowed by particles
between the light and the voxel under consideration.
• Multiple scattering. This sophisticated model includes support for incident light
that has already been scattered by multiple particles.
In this course we are concerned with rendering volumes defined on rectilinear grids, using
an emission-absorption model together with local illumination for rendering, and do not
consider complex lighting situations and effects like single or multiple scattering. However,
real-time methods taking such effects into account are currently becoming available [17].
To summarize, from here on out the optical model used in all considerations will be
the one of particles simultaneously emitting and absorbing light, and the volume rendering
integral described below also assumes this particular optical model.
2.3.2 The Volume Rendering Integral
All direct volume rendering algorithms share the property that they evaluate the volume
rendering integral, which integrates optical effects such as color and opacity along viewing
rays cast into the volume, even if no explicit rays are actually employed by the algorithm.
Section 2.3.3 covers ray-casting, which for this reason could be seen as the “most direct”
numerical method for evaluating this integral. More details are covered below, but for
this section it suffices to view ray-casting as a process that, for each pixel in the image
to render, casts a single ray from the eye through the pixel’s center into the volume, and
integrates the optical properties obtained from the encountered volume densities along the
ray.
Note that this general description assumes both the volume and the mapping to optical
properties to be continuous. In practice, of course, the evaluation of the volume rendering
integral is usually done numerically, together with several additional approximations, and
the integration operation becomes a simple summation. Remember that the volume itself
is also described by a collection of discrete samples, and thus interpolation, or filtering,
for reconstructing a continuous volume has to be used in practice, which is also only an
approximation.
We denote a ray cast into the volume by x(t), and parameterize it by the distance t
to the eye. The scalar value corresponding to this position on a ray is denoted by s(x(t)).
Since we employ an emission-absorption optical model, the volume rendering integral we
are using integrates absorption coefficients τ (s(x(t))) (accounting for the absorption of
light), and colors c(s(x(t))) (accounting for light emitted by particles) along a ray. The
volume rendering integral can now be used to obtain the integrated “output” color C,
subsuming both color (emission) and opacity (absorption) contributions along a ray up to
a certain distance D into the volume:
$C = \int_0^D c(s(\mathbf{x}(t)))\, e^{-\int_0^t \tau(s(\mathbf{x}(t')))\, dt'}\, dt$    (2.3)
This integral can be understood more easily by looking at different parts individually:
• In order to obtain the color for a pixel ($C$), we cast a ray into the volume and perform integration along it ($\int_0^D \dots\, dt$), i.e., for all locations $\mathbf{x}(t)$ along this ray.
• It is sufficient if the integration is performed until the ray exits the volume on the
other side, which happens after a certain distance D, where t = D.
• The color contribution of the volume at a certain position x(t) consists of the color
emitted there, c(s(x(t))), multiplied by the cumulative (i.e., integrated) absorption
up to the position of emission. The cumulative absorption for that position x(t) is
$e^{-\int_0^t \tau(s(\mathbf{x}(t')))\, dt'}$.
In practice, this integral is evaluated numerically through either back-to-front or front-to-back compositing (i.e., alpha blending) of samples along the ray, which is most easily
illustrated in the method of ray-casting.
2.3.3 Ray-Casting
Ray-casting [24] is a method for direct volume rendering, which can be seen as straightforward numerical evaluation of the volume rendering integral (equation 2.3). For each
pixel in the image, a single ray is cast into the volume (assuming super-sampling is not
used). At equispaced intervals along the ray (the sampling distance), the discrete volume
data is resampled, usually using tri-linear interpolation as reconstruction filter. That is,
for each resampling location, the scalar values of eight neighboring voxels are weighted
according to their distance to the actual location for which a data value is needed. After
resampling, the scalar data value is mapped to optical properties via a lookup table, which
yields an RGBA value for this location within the volume that subsumes the corresponding
emission and absorption coefficients [24], and the volume rendering integral is approximated
via alpha blending in back-to-front or front-to-back order.
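The following C++ sketch (hypothetical names, assuming the resampling position lies at least one voxel away from the upper volume boundary) illustrates such a tri-linear reconstruction from the eight surrounding voxels:

    #include <cmath>
    #include <cstddef>

    // Sketch: tri-linear reconstruction at an arbitrary position inside the
    // volume. The eight surrounding voxels are weighted according to their
    // distance to the resampling location.
    float sampleTrilinear(const float* voxels, std::size_t dimX, std::size_t dimY,
                          float x, float y, float z)
    {
        std::size_t x0 = static_cast<std::size_t>(std::floor(x));
        std::size_t y0 = static_cast<std::size_t>(std::floor(y));
        std::size_t z0 = static_cast<std::size_t>(std::floor(z));
        float fx = x - x0, fy = y - y0, fz = z - z0;

        auto at = [&](std::size_t i, std::size_t j, std::size_t k) {
            return voxels[(k * dimY + j) * dimX + i];
        };
        // Interpolate along x, then y, then z.
        float c00 = at(x0, y0,     z0    ) * (1 - fx) + at(x0 + 1, y0,     z0    ) * fx;
        float c10 = at(x0, y0 + 1, z0    ) * (1 - fx) + at(x0 + 1, y0 + 1, z0    ) * fx;
        float c01 = at(x0, y0,     z0 + 1) * (1 - fx) + at(x0 + 1, y0,     z0 + 1) * fx;
        float c11 = at(x0, y0 + 1, z0 + 1) * (1 - fx) + at(x0 + 1, y0 + 1, z0 + 1) * fx;
        float c0 = c00 * (1 - fy) + c10 * fy;
        float c1 = c01 * (1 - fy) + c11 * fy;
        return c0 * (1 - fz) + c1 * fz;
    }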
We will now briefly outline why the volume rendering integral can conveniently be approximated with alpha blending. First, the cumulative absorption up to a certain position
x(t) along the ray, from equation 2.3,
$e^{-\int_0^t \tau(s(\mathbf{x}(t')))\, dt'}$    (2.4)
can be approximated by (denoting the distance between successive resampling locations
with d):
$e^{-\sum_{i=0}^{t/d} \tau(s(\mathbf{x}(i\,d)))\, d}$    (2.5)
The summation in the exponent can immediately be substituted by a multiplication of
exponentiation terms:
$\prod_{i=0}^{t/d} e^{-\tau(s(\mathbf{x}(i\,d)))\, d}$    (2.6)
Now, we can introduce the opacity values A, "well-known" from alpha blending, by defining
$A_i = 1 - e^{-\tau(s(\mathbf{x}(i\,d)))\, d}$    (2.7)
and rewriting equation 2.6 as:
$\prod_{i=0}^{t/d} (1 - A_i)$    (2.8)
This allows us to use Ai as an approximation for the absorption of the i-th ray segment,
instead of absorption at a single point.
Similarly, the color (emission) of the i-th ray segment can be approximated by:
$C_i = c(s(\mathbf{x}(i\,d)))\, d$    (2.9)
Having approximated both the emissions and absorptions along a ray, we can now state
the approximate evaluation of the volume rendering integral as (denoting the number of
samples by n = D/d):
$C_{\mathrm{approx}} = \sum_{i=0}^{n} C_i \prod_{j=0}^{i-1} (1 - A_j)$    (2.10)
Equation 2.10 can be evaluated iteratively by alpha blending in either back-to-front, or
front-to-back order.
2.3.4 Alpha Blending
The following iterative formulation evaluates equation 2.10 in back-to-front order by stepping i from n − 1 to 0:
$C_i' = C_i + (1 - A_i)\, C_{i+1}'$    (2.11)

A new value $C_i'$ is calculated from the color $C_i$ and opacity $A_i$ at the current location $i$, and the composited color $C_{i+1}'$ from the previous location $i+1$. The starting condition is $C_n' = 0$.
Note that in all blending equations, we are using opacity-weighted colors [42], which
are also known as associated colors [7]. An opacity-weighted color is a color that has been
pre-multiplied by its associated opacity. This is a very convenient notation, and especially
important for interpolation purposes. It can be shown that interpolating color and opacity
separately leads to artifacts, whereas interpolating opacity-weighted colors achieves correct
results [42].
The following alternative iterative formulation evaluates equation 2.10 in front-to-back
order by stepping i from 1 to n:
$C_i' = C_{i-1}' + (1 - A_{i-1}')\, C_i$    (2.12)

$A_i' = A_{i-1}' + (1 - A_{i-1}')\, A_i$    (2.13)

New values $C_i'$ and $A_i'$ are calculated from the color $C_i$ and opacity $A_i$ at the current location $i$, and the composited color $C_{i-1}'$ and opacity $A_{i-1}'$ from the previous location $i-1$. The starting condition is $C_0' = 0$ and $A_0' = 0$.
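The two recurrences can be implemented directly. The following C++ sketch (hypothetical names) composites a precomputed array of opacity-weighted RGBA samples along one ray in both orders; the front-to-back variant also shows the early ray termination optimization discussed below:

    #include <cstddef>
    #include <vector>

    // Sketch: iterative compositing of one ray's samples. Colors are
    // opacity-weighted (associated) colors, as in the text.
    struct RGBA { float r, g, b, a; };

    // Back-to-front compositing (equation 2.11): no alpha tracking needed.
    RGBA compositeBackToFront(const std::vector<RGBA>& s)
    {
        RGBA acc = { 0.0f, 0.0f, 0.0f, 0.0f };
        for (int i = static_cast<int>(s.size()) - 1; i >= 0; --i) {
            acc.r = s[i].r + (1.0f - s[i].a) * acc.r;
            acc.g = s[i].g + (1.0f - s[i].a) * acc.g;
            acc.b = s[i].b + (1.0f - s[i].a) * acc.b;
        }
        return acc;
    }

    // Front-to-back compositing (equations 2.12 and 2.13), with early ray
    // termination once the accumulated opacity is nearly saturated.
    RGBA compositeFrontToBack(const std::vector<RGBA>& s)
    {
        RGBA acc = { 0.0f, 0.0f, 0.0f, 0.0f };
        for (std::size_t i = 0; i < s.size(); ++i) {
            float t = 1.0f - acc.a;
            acc.r += t * s[i].r;
            acc.g += t * s[i].g;
            acc.b += t * s[i].b;
            acc.a += t * s[i].a;
            if (acc.a > 0.99f) break;   // early ray termination
        }
        return acc;
    }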
Figure 2.3: The shear-warp algorithm for orthogonal projection.
Note that front-to-back compositing requires tracking alpha values, whereas back-to-front compositing does not. In a hardware implementation, this means that destination alpha must be supported by the frame buffer (i.e., an alpha value must be stored in the frame buffer, and it must be possible to use it as a multiplication factor in blending
operations), when front-to-back compositing is used. However, since the major advantage
of front-to-back compositing is an optimization commonly called early ray termination,
where the progression along a ray is terminated as soon as the cumulative alpha value
reaches 1.0, and this cannot easily be done in hardware alpha blending, hardware volume
rendering usually uses back-to-front compositing.
2.3.5 The Shear-Warp Algorithm
The shear-warp algorithm [22] is a very fast approach for evaluating the volume rendering
integral. In contrast to ray-casting, no rays are cast into the volume, but the volume itself is projected slice by slice onto the image plane. This projection uses bi-linear
interpolation within two-dimensional slices, instead of the tri-linear interpolation used by
ray-casting.
Figure 2.4: The shear-warp algorithm for perspective projection.
The basic idea of shear-warp is illustrated in figure 2.3 for the case of orthogonal
projection. The projection does not take place directly on the final image plane, but on an
intermediate image plane, called the base plane, which is aligned with the volume instead
of the viewport. Furthermore, the volume itself is sheared in order to turn the oblique
projection direction into a direction that is perpendicular to the base plane, which allows
for an extremely fast implementation of this projection. In such a setup, an entire slice can
be projected by simple two-dimensional image resampling. Finally, the base plane image
has to be warped to the final image plane. Note that this warp is only necessary once per
generated image, not once per slice.
Perspective projection can be accommodated similarly, by scaling the volume slices, in
addition to shearing them, as depicted in figure 2.4.
The clever approach outlined above, together with additional optimizations, like run-length encoding the volume data, is what makes the shear-warp algorithm probably the
fastest software method for volume rendering.
Although originally developed for software rendering, we will encounter a principle
similar to shear-warp in hardware volume rendering, specifically in the chapter on 2D-texture based hardware volume rendering (5.2). When 2D textures are used to store slices of the volume data, and a stack of such slices is texture-mapped and blended in hardware, bi-linear interpolation is also substituted for tri-linear interpolation, similarly to shear-warp.
This is once again possible, because this hardware method also employs object-aligned
slices. Also, both shear-warp and 2D-texture based hardware volume rendering require
three slice stacks to be stored, and switched according to the current viewing direction.
Further details are provided in chapter 5.2.
2.4 Non-Polygonal Iso-Surfaces
In the context of volume rendering, the term iso-surface denotes a contour surface extracted
from a volume that corresponds to a given constant value, i.e., the iso-value. Boundary
surfaces of regions of the volume that are homogeneous with respect to certain attributes
are usually also called iso-surfaces. For example, an explicit iso-surface could be used to
depict a region where the density is above a given threshold.
As the name suggests, an iso-surface is usually constituted by an explicit surface. In
contrast to direct volume rendering, where no surfaces exist at all, these explicit surfaces
are usually extracted from the volume data in a preprocess. This is commonly done by
using a variant of the marching cubes algorithm [25], in which the volume data is processed
and an explicit geometric representation (in this case, usually thousands of triangles) for
the feature of interest, i.e., the iso-surface corresponding to a given iso-value, is generated.
However, iso-surfaces can also be rendered without the presence of explicit geometry.
In this case, we will refer to them as non-polygonal iso-surfaces. One approach for doing
this, is to use ray-casting with special transfer functions [24].
On graphics hardware, non-polygonal iso-surfaces can be rendered by exploiting the
OpenGL alpha test [41]. In this approach, the volume is stored as an RGBA volume.
Figure 2.5: A comparison of direct volume rendering (A), and maximum intensity projection (B).
Local gradient information is precomputed and stored in the RGB channels, and the volume
density itself is stored in the alpha channel. The density in conjunction with alpha testing
is used in order to select pixels where the corresponding ray pierces the iso-surface, and
the gradient information is used as “surface normal” for shading. Implementation details
of the hardware approach for rendering non-polygonal iso-surfaces are given in chapter 10.
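A minimal sketch of the corresponding OpenGL state, assuming the RGBA volume (gradients in RGB, density in alpha) is already bound and the iso-value is given in the normalized [0,1] alpha range, could look as follows:

    // Sketch: OpenGL state for the alpha-test based iso-surface approach.
    #include <GL/gl.h>

    void setupIsoSurfaceState(float isoValue)
    {
        // Discard all fragments whose density (alpha) is below the iso-value.
        glEnable(GL_ALPHA_TEST);
        glAlphaFunc(GL_GEQUAL, isoValue);

        // No blending: the front-most surviving fragment wins via the depth test.
        glDisable(GL_BLEND);
        glEnable(GL_DEPTH_TEST);
    }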
2.5 Maximum Intensity Projection
Maximum intensity projection (MIP) is a variant of direct volume rendering, where, instead
of compositing optical properties, the maximum value encountered along a ray is used to
determine the color of the corresponding pixel. An important application area of such
a rendering mode, are medical data sets obtained by MRI (magnetic resonance imaging)
scanners. Such data sets usually exhibit a significant amount of noise that can make it hard
to extract meaningful iso-surfaces, or define transfer functions that aid the interpretation.
When MIP is used, however, the fact that within angiography data sets the data values of
vascular structures are higher than the values of the surrounding tissue can be exploited
easily for visualizing them. Figure 2.5 shows a comparison of direct volume rendering and
MIP used with the same data set.
In graphics hardware, MIP can be implemented by using a maximum operator when
blending into the frame buffer, instead of standard alpha blending. The corresponding
OpenGL extension is covered in section 3.6.
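A minimal sketch of this blending setup, assuming GL_EXT_blend_minmax is supported and the extension entry points are resolved by the GL library (on Windows they would have to be loaded via wglGetProcAddress), could look as follows:

    // Sketch: MIP by replacing standard alpha blending with the maximum
    // operator from GL_EXT_blend_minmax.
    #define GL_GLEXT_PROTOTYPES 1
    #include <GL/gl.h>
    #include <GL/glext.h>

    void setupMaximumIntensityProjection()
    {
        glEnable(GL_BLEND);
        glBlendEquationEXT(GL_MAX_EXT);   // keep the maximum of source and destination
        glBlendFunc(GL_ONE, GL_ONE);      // use the unmodified fragment and pixel values
        glDisable(GL_DEPTH_TEST);         // the maximum is order-independent
    }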
Graphics Hardware
This chapter begins with a brief overview of the operation of graphics hardware in general,
before it continues by describing the kind of graphics hardware that is most interesting to
us in the context of this course, i.e., consumer PC graphics hardware, like the NVIDIA
GeForce family [32], and the ATI Radeon graphics cards [2]. We are using OpenGL [39] as
application programming interface (API), and the sections on specific consumer graphics
hardware architectures describe the OpenGL extensions needed for high-quality volume
rendering, especially focusing on per-fragment, or per-pixel, programmability.
3.1 The Graphics Pipeline
For hardware-accelerated rendering, the geometry of a virtual scene consists of a set of
planar polygons, which are ultimately turned into pixels during display traversal. The
majority of 3D graphics hardware implements this process as a fixed sequence of processing stages. The order of operations is usually described as a graphics pipeline, which is
illustrated in figure 3.1. The input of this pipeline is a stream of vertices that can be joined
together to form geometric primitives, such as lines, triangles, and polygons. The output
is a raster image of the virtual scene, which can be displayed on the screen. The graphics
pipeline can roughly be divided into three different stages:
Geometry Processing computes linear transformations of the incoming vertices in the
3D spatial domain such as rotation, translation, and scaling. Through their vertices,
the primitives themselves are naturally transformed along with them.
Rasterization decomposes the geometric primitives into fragments. Note that although
a fragment is closely related to a pixel on the screen, it may be discarded by one
of several tests before it is finally turned into an actual pixel, see below. After a
fragment has initially been generated by the rasterizer, colors fetched from texture
maps are applied, followed by further color operations, often subsumed under the
term fragment shading. On today’s programmable consumer graphics hardware, both
fetching colors from textures and additional color operations applied to a fragment
are programmable to a large extent.
Fragment Operations After fragments have been generated and shaded, several tests are
applied, which finally decide whether the incoming fragment is discarded or displayed
on the screen as a pixel. These tests usually are alpha testing, stencil testing, and
depth testing. After fragment tests have been applied and the fragment has not been
discarded, it is combined with the previous contents of the frame buffer, a process
known as alpha blending. After this, the fragment has become a pixel.
Figure 3.1: The graphics pipeline for display traversal.
For understanding the algorithms presented in this course, it is important to have a grasp
of the exact order of operations in the graphics pipeline. In the following sections, we will
have a closer look at its different stages.
3.1.1 Geometry Processing
The geometry processing unit performs per-vertex operations, i.e., operations that modify
the incoming stream of vertices. The geometry engine computes linear transformations,
such as translation, rotation, and projection. Local illumination models are also evaluated on a per-vertex basis at this stage of the pipeline. This is the reason why geometry
processing is often referred to as transform & lighting unit (T&L). For a more detailed description, the geometry engine can be further subdivided into several subunits, as depicted
in figure 3.2:
Modeling Transformation: Transformations that are used to arrange objects and specify their placement within the virtual scene are called modeling transformations. They
are specified as a 4 × 4 matrix using homogeneous coordinates.
Figure 3.2: Geometry processing as part of the graphics pipeline.
Viewing Transformation: A transformation that is used to specify the camera position
and viewing direction is called viewing transformation. This transformation is also
specified as a 4 × 4 matrix. Modeling and viewing matrices can be pre-multiplied to
form a single modelview matrix, which is the term used by OpenGL.
Lighting: After the vertices are correctly placed within the virtual scene, a local illumination model is evaluated for each vertex, for example the Phong model [35]. Since this
requires information about normal vectors and the final viewing direction, it must
be performed after the modeling and viewing transformation.
Primitive Assembly: Rendering primitives are generated from the incoming vertex
stream. Vertices are connected to lines, lines are joined together to form polygons.
Arbitrary polygons are usually tessellated into triangles to ensure planarity and to
enable interpolation using barycentric coordinates.
Clipping: Polygon and line clipping is applied after primitive assembly in order to remove
those portions of the geometry that cannot be visible on the screen, because they lie
outside the viewing frustum.
Perspective Transformation: Perspective transformation computes the projection of a
geometric primitive onto the image plane.
Perspective transformation is the final step of the geometry processing stage. All operations
that take place after the projection step are performed within the two-dimensional space
of the image plane. This is also the stage where vertex programs [32], or vertex shaders,
are executed when they are enabled in order to substitute large parts of the fixed-function
geometry processing pipeline by a user-supplied assembly language program.
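As a brief illustration of how modeling, viewing, and projective transformations are specified in OpenGL (the particular values here are arbitrary and only for this sketch):

    // Sketch: specifying the transformations of the geometry processing stage.
    // Modeling and viewing matrices are combined into the single modelview matrix.
    #include <GL/gl.h>
    #include <GL/glu.h>

    void setupTransformations()
    {
        glMatrixMode(GL_MODELVIEW);
        glLoadIdentity();
        // Viewing transformation: eye position, look-at point, up vector.
        gluLookAt(0.0, 0.0, 3.0,  0.0, 0.0, 0.0,  0.0, 1.0, 0.0);
        // Modeling transformation: place and orient the object in the scene.
        glTranslatef(0.5f, 0.0f, 0.0f);
        glRotatef(30.0f, 0.0f, 1.0f, 0.0f);

        // Projective (perspective) transformation onto the image plane.
        glMatrixMode(GL_PROJECTION);
        glLoadIdentity();
        gluPerspective(45.0, 1.0, 0.1, 100.0);
    }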
3.1.2 Rasterization
Rasterization is the conversion of geometric data into fragments. Each fragment eventually
corresponds to a square pixel in the resulting image, if it has not been discarded by one of several per-fragment tests, such as alpha or depth testing.

Figure 3.3: Rasterization as part of the graphics pipeline.

The process of rasterization
can be further subdivided into three different subtasks, as displayed in figure 3.3:
Polygon Rasterization: In order to display filled polygons, rasterization determines the
set of pixels that lie in the interior of the polygon. This also comprises the interpolation of visual attributes such as color, illumination terms, and texture coordinates
given at the vertices.
Texture Fetch: Textures are mapped onto a polygon according to texture coordinates
specified at the vertices. For each fragment, these texture coordinates must be interpolated and a texture lookup is performed at the resulting coordinate. This process
yields an interpolated color value fetched from the texture map. In today's consumer graphics hardware, from two to six textures can be fetched simultaneously for a single
fragment. Furthermore, the lookup process itself can be controlled, for example by
routing colors back into texture coordinates, which is known as dependent texturing.
Fragment Shading: After all the enabled textures have been sampled, further color operations are applied in order to shade a fragment. A simple example would be the
combination of texture color and primary, i.e., diffuse, color. Today’s consumer
graphics hardware allows highly flexible control of the entire fragment shading process. Since fragment shading is extremely important for volume rendering on such
hardware, sections 3.3, 3.4, and 3.5 are devoted to this stage of the graphics pipeline
as it is implemented in state-of-the-art architectures. Note that recently the line
between texture fetch and fragment shading is getting blurred, and the texture fetch
stage is becoming a part of the fragment shading stage.
3.1.3 Fragment Operations
After a fragment has been shaded, but before it is turned into an actual pixel, which is
stored in the frame buffer and ultimately displayed on the screen, several fragment tests
are performed, followed by alpha blending.

Figure 3.4: Fragment operations as part of the graphics pipeline.

The outcome of these tests determines whether
the fragment is discarded, e.g., because it is occluded, or becomes a pixel. The sequence
of fragment operations is illustrated in figure 3.4.
Alpha Test: The alpha test allows discarding a fragment depending on the outcome of
a comparison between the fragment’s opacity A (the alpha value), and a specified
reference value.
Stencil Test: The stencil test allows the application of a pixel stencil to the frame buffer.
This pixel stencil is contained in the stencil buffer, which is also a part of the frame
buffer. The stencil test conditionally discards a fragment depending on a comparison
of a reference value with the corresponding pixel in the stencil buffer, optionally also
taking the depth value into account.
Depth Test: Since primitives may be generated in arbitrary sequence, the depth test
provides a convenient mechanism for correct depth ordering of partially occluded
objects. The depth value of a pixel is stored in a depth buffer. The depth test
decides whether an incoming fragment is occluded by a pixel that has previously
been written, by comparing the incoming depth value to the value in the depth
buffer. This allows discarding occluded fragments on a per-fragment level.
Alpha Blending: To allow for semi-transparent objects and other compositing modes,
alpha blending combines the color of the incoming fragment with the color of the
corresponding pixel currently stored in the frame buffer.
After the scene description has completely passed through the graphics pipeline, the resulting raster image contained in the frame buffer can be displayed on the screen. Different hardware architectures ranging from expensive high-end workstations to consumer
PC graphics boards provide different implementations of this graphics pipeline. Thus,
consistent access to multiple hardware architectures requires a level of abstraction that is
provided by an additional software layer called application programming interface (API).
In these course notes, we are using OpenGL as the graphics API. Details on the standard
OpenGL rendering pipeline can be found in [39, 28].
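As a small example tying the tests and the blending stage described above together, a typical OpenGL configuration for back-to-front compositing of translucent volume slices might look as follows (a sketch; opacity-weighted colors would use GL_ONE as source factor instead of GL_SRC_ALPHA):

    // Sketch: per-fragment state for back-to-front compositing of translucent,
    // textured volume slices.
    #include <GL/gl.h>

    void setupFragmentOperations()
    {
        glDisable(GL_ALPHA_TEST);     // keep all fragments
        glDisable(GL_STENCIL_TEST);
        glEnable(GL_DEPTH_TEST);      // still test against opaque scene geometry
        glDepthMask(GL_FALSE);        // but do not write depth for translucent slices
        glEnable(GL_BLEND);
        glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);   // standard alpha blending
    }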
3.2 Consumer PC Graphics Hardware
In this section, we briefly discuss the consumer graphics chips that we are using for high-quality volume rendering, and that most of the algorithms discussed in later sections are built upon. The following sections discuss important features of these architectures in detail. At
the time of this writing (spring 2002), the two most important vendors of programmable
consumer graphics hardware are NVIDIA and ATI. The current state-of-the-art consumer
graphics chips are the NVIDIA GeForce 4, and the ATI Radeon 8500.
3.2.1 NVIDIA
In late 1999, the GeForce 256 introduced hardware-accelerated geometry processing to
the consumer marketplace. Before this, transformation and projection were either done
by the OpenGL driver, or even by the application itself. The first GeForce also offered
a flexible mechanism for fragment shading, i.e., the register combiners OpenGL extension (GL_NV_register_combiners). The focus on programmable fragment shading was
even more pronounced with the introduction of the GeForce 2 in early 2000, although it
brought no major architectural changes from a programmer’s point of view. On the first
two GeForce architectures it was possible to use two textures simultaneously in a single
pass (multi-texturing). Usual boards had 32MB of on-board RAM, although GeForce 2
configurations with 64MB were also available.
The next major architectural step came with the introduction of the GeForce 3 in early
2001. Moving away from a fixed-function pipeline for geometry processing, the GeForce 3
introduced vertex programs, which allow the programmer to write custom assembly language code operating on vertices. The number of simultaneous textures was increased
to four, the register combiners capabilities were improved (GL_NV_register_combiners2), and the introduction of texture shaders (GL_NV_texture_shader) introduced dependent texturing on a consumer graphics platform for the first time. Additionally, the GeForce 3 also supports 3D textures (GL_NV_texture_shader2) in hardware. Usual GeForce 3 configurations have 64MB of on-board RAM, although boards with 128MB are also available.
The GeForce 4, introduced in early 2002, extends the modes for dependent texturing
(GL_NV_texture_shader3), offers point sprites, hardware occlusion culling support, and
flexible support for rendering directly into a texture (the latter also being possible on a
GeForce 3 with the OpenGL drivers released at the time of the GeForce 4). The standard
amount of on-board RAM of GeForce 4 boards is 128MB, which is also the maximum
amount supported by the chip itself.
The NVIDIA feature set most relevant in these course notes is the one offered by the
GeForce 3, although the GeForce 4 is able to execute it much faster.
3.2.2 ATI
In mid-2000, the Radeon was the first consumer graphics hardware to support 3D textures
natively. For multi-texturing, it was able to use three 2D textures, or one 2D and one 3D
texture simultaneously. However, fragment shading capabilities were constrained to a few
extensions of the standard OpenGL texture environment. The usual on-board configuration
was 32MB of RAM.
The Radeon 8500, introduced in mid-2001, was a huge leap ahead of the original
Radeon, especially with respect to fragment programmability (GL_ATI_fragment_shader),
which offers a unified model for texture fetching (including flexible dependent textures),
and color combination. This architecture also supports programmable vertex operations
(GL_EXT_vertex_shader), and six simultaneous textures with full functionality, i.e., even
six 3D textures can be used in a single pass. The fragment shading capabilities of the
Radeon 8500 are exposed via an assembly-language level interface, and very easy to use.
Rendering directly into a texture is also supported. On-board memory of Radeon 8500
boards usually is either 64MB, or 128MB.
A minor drawback of Radeon OpenGL drivers (for both architectures) is that
paletted textures (GL_EXT_paletted_texture, GL_EXT_shared_texture_palette) are not supported, which otherwise provide a nice fallback for volume rendering when post-classification via dependent textures is not used, and downloading a full RGBA volume
instead of a single-channel volume is not desired due to the memory overhead incurred.
3.3 Fragment Shading
Building on the general discussion of section 3.1.2, this and the following two sections
are devoted to a more detailed discussion of the fragment shading stage of the graphics
pipeline, which of all the pipeline stages is the most important one for building a consumer
hardware volume renderer.
Although in section 3.1.2, texture fetch and fragment shading are still shown as two
separate stages, we will now discuss texture fetching as part of overall fragment shading.
The major reason for this is that consumer graphics hardware is rapidly moving toward a
unified model, where a texture fetch is just another way of coloring fragments, in addition
to performing other color operations. While on the GeForce architecture the two stages are
still conceptually separate (at least under OpenGL, i.e., via the texture shader and register
combiners extensions), the Radeon 8500 has already dropped this distinction entirely,
and exports the corresponding functionality through a single OpenGL extension, which is
actually called fragment shader.
The terminology related to fragment shading and the corresponding stages of the
graphics pipeline has only begun to change after the introduction of the first highly programmable graphics hardware architecture, i.e., the NVIDIA GeForce family. Before
this, fragment shading was so simple that no general name for the corresponding operations was used. The traditional OpenGL model assumes a linearly interpolated primary
color (the diffuse color) to be fed into the first texture unit, and subsequent units (if at all
supported) to take their input from the immediately preceding unit. Optionally, after all
the texture units, a second linearly interpolated color (the specular color) can be added in
the color sum stage (if supported), followed by application of fog. The shading pipeline
just outlined is commonly known as the traditional OpenGL multi-texturing pipeline.
3.3.1 Traditional OpenGL Multi-Texturing
Before the advent of programmable fragment shading (see below), the prevalent model for
shading fragments was the traditional OpenGL multi-texturing pipeline, which is depicted
in figure 3.5.
The primary (or diffuse) color, which has been specified at the vertices and linearly
interpolated over the interior of a triangle by the rasterizer, is the initial color input to the pipeline.

Figure 3.5: The traditional OpenGL multi-texturing pipeline. Conceptually identical texture units (left) are cascaded up to the number of supported units (right).

The pipeline itself consists of several texture units (corresponding to the maximum number of units supported and the number of enabled textures), each of which has
exactly one external input (the color from the immediately preceding unit, or the initial
fragment color in the case of unit zero), and one internal input (the color sampled from the
corresponding texture). The texture environment of each unit (specified via glTexEnv*())
determines how the external and the internal color are combined. The combined color
is then routed on to the next unit. If the unit was the last one, a second linearly interpolated color can be added in a color sum stage (if GL_EXT_separate_specular_color is
supported), followed by optional fog application. The output of this cascade of texture
units and the color sum and fog stage becomes the shaded fragment color, i.e., the output
of the “fragment shader.”
Standard OpenGL supports only very simple texture environments, i.e., modes of color
combination, such as multiplication and blending. For this reason, several extensions have
been introduced that add more powerful operations, for example dot-product computation via GL_EXT_texture_env_dot3 (see section 3.6).
3.3.2  Programmable Fragment Shading
Although entirely sufficient only a few years ago, the OpenGL multi-texturing pipeline has
a lot of drawbacks, is very inflexible, and cannot accommodate the capabilities of today’s
consumer graphics hardware.
Most of all, colors cannot be routed arbitrarily, but are forced to be applied in a
fixed order, and the number of available color combination operations is very limited.
Furthermore, the color combination not only depends on the setting of the corresponding
texture environment, but also on the internal format of the texture itself, which prevents
using the same texture for radically different purposes, especially with respect to treating
the RGB and alpha channels separately.
For these and other reasons, fragment shading is currently in the process of becoming programmable in its entirety. Starting with the original NVIDIA register combiners,
which are comprised of a register-based execution model and programmable input and
output routing and operations, the current trend is toward writing a fragment shader in
Figure 3.6: The register combiners unit bypasses the standard fragment shading stage
(excluding texture fetch) of the graphics pipeline. See also figure 3.3.
an assembly language that is downloaded to the graphics hardware and executed for each
fragment.
The major problem of the current situation with respect to writing fragment shaders
is that under OpenGL they are exposed via different (and highly incompatible) vendor-specific extensions. Thus, even this flexible, but still rather low-level, model of using an
assembly language for writing these shaders will be substituted by a shading language
similar to the C programming language in the upcoming OpenGL 2.0 [38].
3.4  NVIDIA Fragment Shading
The NVIDIA model for programmable fragment shading currently consists of two distinct
stages: texture shaders and register combiners.
Texture shaders are the interface for programmable texture fetch operations, whereas register combiners can be used to read colors from a register file, perform color combination
operations, and store the result back to the register file. A final combiner stage generates
the fragment output, which is passed on to the fragment testing and alpha blending stage,
and finally into the frame buffer.
The texture registers of the register combiners register file are initialized by a texture
shader before the register combiners stage is executed. Since all texture fetches thus take place before any color combination, the result of color computations cannot be used in a dependent texture fetch. Dependent texturing on NVIDIA
chips is exposed via a set of fixed-function texture shader operations.
3.4.1  Texture Shaders
The texture shader interface is exposed through three OpenGL extensions:
GL_NV_texture_shader, GL_NV_texture_shader2, and GL_NV_texture_shader3, the latter
of which is only supported on GeForce 4 cards.
Analogously to the traditional OpenGL texture environments, each texture unit is
assigned a texture shader, which determines the texture fetch operation executed by this
unit. On GeForce 3 chips, one of 23 pre-defined texture shader programs can be selected
for each texture shader, whereas the GeForce 4 offers 37 different such programs.
An example for one of these texture shader programs would be dependent alpha-red
texturing, where the texture unit for which it is selected takes the alpha and red outputs
from a previous texture unit, and uses these as 2D texture coordinates, thus performing a
dependent texture fetch, i.e., a texture fetch operation that depends on the outcome of a
fetch executed by another unit.
The major drawback of the texture shader model is that it requires selecting
one of several fixed-function programs instead of allowing arbitrary programmability.
3.4.2  Register Combiners
After all texture fetch operations have been executed (either by standard OpenGL texturing, or using texture shaders), the register combiners mechanism can be used for flexible
color combination operations, employing a register-based execution model.
The register combiners interface is exposed through two OpenGL extensions:
GL_NV_register_combiners, and GL_NV_register_combiners2. Using the terminology of
section 3.1.2, the standard fragment shading stage (excluding texture fetch) is bypassed by
a register combiners unit, as illustrated in figure 3.6. This is in contrast to the traditional
model of figure 3.3.
Figure 3.7: Register combiners final combiner stage.

The three fundamental building blocks of the register combiners model are the register
file, the general combiner stage (figure 3.8), and the final combiner stage (figure 3.7). All
stages operate on a single register file, which can be seen on the left-hand side of figure 3.7.
Color combination operations are executed by a series of general combiner stages, reading colors from the register file, executing specified operations, and storing back into the
register file. The input for the next general combiner is the register file as it has been
modified by the previous stage.

Figure 3.8: Register combiners general combiner stage.

The operations that can be executed are component-wise multiplication, a three-component dot-product, and multiplexing of two inputs, i.e., conditionally selecting one of them depending on the alpha component of a specific register. Since
the introduction of the GeForce 3, eight such general combiner stages are available, whereas
on older architectures just two such stages were supported. Also, since the GeForce 3, four
texture registers are available, as opposed to just two.
After all enabled general combiner stages have been executed, a single final combiner
stage generates the final fragment color, which is then passed on to fragment tests and
alpha blending.
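As an illustrative sketch (assuming the GL_NV_register_combiners extension is present; the particular color combination is arbitrary and not taken from these course notes), the following fragment configures a single general combiner that modulates the color fetched by texture unit 0 with the interpolated primary color, and a final combiner that passes the result through unchanged:

// use one general combiner stage
glCombinerParameteriNV( GL_NUM_GENERAL_COMBINERS_NV, 1 );

// general combiner 0, RGB portion: spare0.rgb = texture0.rgb * primary.rgb
glCombinerInputNV( GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_A_NV,
                   GL_TEXTURE0_ARB, GL_UNSIGNED_IDENTITY_NV, GL_RGB );
glCombinerInputNV( GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_B_NV,
                   GL_PRIMARY_COLOR_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB );
glCombinerOutputNV( GL_COMBINER0_NV, GL_RGB,
                    GL_SPARE0_NV, GL_DISCARD_NV, GL_DISCARD_NV,
                    GL_NONE, GL_NONE, GL_FALSE, GL_FALSE, GL_FALSE );

// general combiner 0, alpha portion: spare0.a = texture0.a * primary.a
glCombinerInputNV( GL_COMBINER0_NV, GL_ALPHA, GL_VARIABLE_A_NV,
                   GL_TEXTURE0_ARB, GL_UNSIGNED_IDENTITY_NV, GL_ALPHA );
glCombinerInputNV( GL_COMBINER0_NV, GL_ALPHA, GL_VARIABLE_B_NV,
                   GL_PRIMARY_COLOR_NV, GL_UNSIGNED_IDENTITY_NV, GL_ALPHA );
glCombinerOutputNV( GL_COMBINER0_NV, GL_ALPHA,
                    GL_SPARE0_NV, GL_DISCARD_NV, GL_DISCARD_NV,
                    GL_NONE, GL_NONE, GL_FALSE, GL_FALSE, GL_FALSE );

// final combiner: out.rgb = A*B + (1-A)*C + D with A = spare0, B = 1,
// C = D = 0; out.a = G = spare0.a
glFinalCombinerInputNV( GL_VARIABLE_A_NV, GL_SPARE0_NV,
                        GL_UNSIGNED_IDENTITY_NV, GL_RGB );
glFinalCombinerInputNV( GL_VARIABLE_B_NV, GL_ZERO,
                        GL_UNSIGNED_INVERT_NV, GL_RGB );
glFinalCombinerInputNV( GL_VARIABLE_C_NV, GL_ZERO,
                        GL_UNSIGNED_IDENTITY_NV, GL_RGB );
glFinalCombinerInputNV( GL_VARIABLE_D_NV, GL_ZERO,
                        GL_UNSIGNED_IDENTITY_NV, GL_RGB );
glFinalCombinerInputNV( GL_VARIABLE_G_NV, GL_SPARE0_NV,
                        GL_UNSIGNED_IDENTITY_NV, GL_ALPHA );

glEnable( GL_REGISTER_COMBINERS_NV );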
3.5  ATI Fragment Shading
In contrast to the NVIDIA approach, fragment shading on the Radeon 8500 uses a unified
model that subsumes both texture fetch and color combination operations in a single
fragment shader. The fragment shader interface is exposed through a single OpenGL
extension: GL_ATI_fragment_shader.
In order to facilitate flexible dependent texturing operations, colors and texture coordinates are conceptually identical, although colors are represented with significantly less
precision and range. Still, fetching a texture can easily be done using a register, or the
interpolated texture coordinates of a specified texture unit.
On the Radeon 8500, the register file used by a fragment shader contains six RGBA
registers (GL_REG_0_ATI to GL_REG_5_ATI), corresponding to this architecture’s six texture units. Furthermore, two interpolated colors and eight constant RGBA registers
(GL_CON_0_ATI to GL_CON_7_ATI) can be used to provide additional color input to a fragment
shader. The execution model consists of this register file and eleven different instructions
(note that all registers consist of four components, and thus all instructions in principle take
all of them into account, e.g., the MUL instruction actually performs four simultaneous
multiplications):
• MOV: Moves one register into another.
• ADD: Adds one register to another and stores the result in a third register.
• SUB: Subtracts one register from another and stores the result in a third register.
• MUL: Multiplies two registers component-wise and stores the result in a third register.
• MAD: Multiplies two registers component-wise, adds a third, and stores the result
in a fourth register.
• LERP: Performs linear interpolation between two registers, getting interpolation
weights from a third, and stores the result in a fourth register.
• DOT3: Performs a three-component dot-product, and stores the replicated result
in a third register.
• DOT4: Performs a four-component dot-product, and stores the replicated result in
a third register.
• DOT2_ADD: The same as DOT3, however the third component is assumed to be
1.0 and therefore not actually multiplied.
• CND: Moves one of two registers into a third, depending on whether the corresponding component in a fourth register is greater than 0.5.
• CND0: The same as CND, but the conditional is a comparison with 0.0.
The components of input registers to each of these instructions can be replicated, and
the output can be masked for each component, which allows for flexible routing of color
components. Scaling, bias, negation, complementation, and saturation (clamping to the range [0, 1])
are also supported. Furthermore, instructions are issued separately for RGB and alpha
components, although a single pair of RGB and alpha instructions counts as a single
instruction.
An actual fragment shader consists of up to two times eight such instructions, where
up to eight instructions are allowed in one of two stages. The first stage is only able to
execute texture fetch operations using interpolated coordinates, whereas the second stage
can use registers computed in the preceding stage as texture coordinates, thus allowing
dependent fetch operations. These two stages allow for a single “loop-back,” i.e., routing
color components into texture coordinates once. Hence only a single level of dependent
fetches is possible.
Fragment shaders are specified similarly to OpenGL texture objects. They are only
specified once, and then reused as often as needed by simply binding a shader referenced
by a unique integer id. Instructions are added (in order) to a fragment shader by using one
OpenGL function call for the specification of a single instruction, after the initial creation
of the shader.
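A minimal sketch of such a specification (assuming GL_ATI_fragment_shader is present; the shader arbitrarily samples texture unit 0 and modulates it with constant color 0, and the names are illustrative) could look like this:

const GLfloat constant_color[4] = { 1.0f, 1.0f, 1.0f, 1.0f };

GLuint shader_name = glGenFragmentShadersATI( 1 );
glBindFragmentShaderATI( shader_name );

glBeginFragmentShaderATI();
// first stage: fetch texture 0 using its interpolated texture coordinates
glSampleMapATI( GL_REG_0_ATI, GL_TEXTURE0_ARB, GL_SWIZZLE_STR_ATI );
// RGB and alpha instructions are issued separately
glColorFragmentOp2ATI( GL_MUL_ATI, GL_REG_0_ATI, GL_NONE, GL_NONE,
                       GL_REG_0_ATI, GL_NONE, GL_NONE,
                       GL_CON_0_ATI, GL_NONE, GL_NONE );
glAlphaFragmentOp2ATI( GL_MUL_ATI, GL_REG_0_ATI, GL_NONE,
                       GL_REG_0_ATI, GL_NONE, GL_NONE,
                       GL_CON_0_ATI, GL_NONE, GL_NONE );
glEndFragmentShaderATI();

glSetFragmentShaderConstantATI( GL_CON_0_ATI, constant_color );
glEnable( GL_FRAGMENT_SHADER_ATI );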
In general, it can be said that the ATI fragment shader model is much easier to use
than the NVIDIA extensions providing similar functionality, and also offers more flexibility
with regard to dependent texture fetches. However, each model allows specific operations
that the other one cannot perform.
3.6  Other OpenGL Extensions
This section briefly summarizes additional OpenGL extensions that are useful for hardware-accelerated volume rendering and that will be used in later chapters.
3.6.1  GL_EXT_blend_minmax
This extension augments the OpenGL alpha blending capabilities by minimum
(GL_MIN_EXT) and maximum (GL_MAX_EXT) operators, which can be activated via the
glBlendEquationEXT() function. When one of these special alpha blending modes is
used, a fragment is combined with the previous contents of the frame buffer by taking the
minimum, or the maximum value, respectively.
For volume rendering, this capability is needed for maximum intensity projection (MIP),
where pixels are set to the maximum density along a “ray.”
3.6.2  GL_EXT_texture_env_dot3
Although more flexible and powerful functionality is exposed by the register combiners and
fragment shader extensions, it is not always desired to incur the development overhead of
specifying a full register combiner setup or fragment shader when only a simple per-fragment dot-product is needed. This extension adds a three-component dot-product to the modes
that can be used in the texture environment for combining the incoming color with the
texture color. This functionality is used in chapter 8.
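As a rough sketch (availability of GL_EXT_texture_env_combine and GL_EXT_texture_env_dot3 must be checked at runtime; the choice of sources is illustrative), a per-fragment dot-product between the texture color and the interpolated primary color can be configured for the active texture unit like this:

// switch the texture environment of the active unit to the combine model
glTexEnvi( GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_COMBINE_EXT );
// select the three-component dot-product as RGB combine operation
glTexEnvi( GL_TEXTURE_ENV, GL_COMBINE_RGB_EXT, GL_DOT3_RGB_EXT );
// argument 0: the texture color (e.g., a normal stored scaled and biased in RGB)
glTexEnvi( GL_TEXTURE_ENV, GL_SOURCE0_RGB_EXT, GL_TEXTURE );
glTexEnvi( GL_TEXTURE_ENV, GL_OPERAND0_RGB_EXT, GL_SRC_COLOR );
// argument 1: the interpolated primary color (e.g., an encoded light direction)
glTexEnvi( GL_TEXTURE_ENV, GL_SOURCE1_RGB_EXT, GL_PRIMARY_COLOR_EXT );
glTexEnvi( GL_TEXTURE_ENV, GL_OPERAND1_RGB_EXT, GL_SRC_COLOR );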
3.6.3  GL_EXT_paletted_texture, GL_EXT_shared_texture_palette
In hardware-accelerated volume rendering, the volume itself is usually stored in texture
maps with only a single channel. Basically, there are two OpenGL texture formats that
are used for these textures, both of which consume one byte per voxel.
First, a volume can be stored in intensity textures (GL_LUMINANCE as external,
GL_INTENSITY8 as internal format). In this case, each voxel contains the original density values, which are subsequently mapped to RGBA values by post-classification (see
chapter 17), or pre-integration (see chapter 19), for example.
Second, a volume can be used for rendering with pre-classification (see chapter 17).
In this case, storing the volume in an RGBA texture (GL_RGBA as external, GL_RGBA8
as internal format) would be possible. However, this consumes four times the texture
memory that is actually necessary, since the mapping from density to RGBA can easily
be performed by the hardware itself. In order to make this possible, paletted textures
need to be supported via GL_EXT_paletted_texture. Using this extension, a texture is
stored as 8-bit indexes into a color palette (GL_COLOR_INDEX as external, GL_COLOR_INDEX8
as internal format). The palette itself consists of 256 entries of four bytes per entry (for
RGBA).
The GL_EXT_paletted_texture extension by itself needs a single palette for each individual texture, which must be downloaded via a glColorTableEXT() function call. However, in volume rendering with 2D slices (see section 5.2), all slice textures actually use the
same palette.
In order to share a single palette among multiple textures and download it only once,
the GL_EXT_shared_texture_palette extension can be used. Using this extension, only a
single palette needs to be downloaded, with a glColorTableEXT() function call in conjunction
with the GL_SHARED_TEXTURE_PALETTE_EXT target.
Acknowledgments
I would like to express a very special thank you to Christof Rezk-Salama for the diagrams
and figures in this chapter, as well as most of section 3.1. Robert Kosara and Michael
Kalkusch provided valuable comments and proof-reading. Thanks are also due to the
VRVis Research Center for supporting the preparation of these course notes in the context
of the basic research on visualization (http://www.VRVis.at/vis/). The VRVis Research
Center is funded by an Austrian research program called K plus.
Texture-Based Methods
Sampling a Volume Via Texture Mapping
As illustrated in the introduction to these course notes, the most fundamental operation
in volume rendering is sampling the volumetric data (section 2.2). Since these data are
already discrete, the sampling task performed during rendering is actually a resampling
task, i.e., resampling sampled volume data from one set of discrete locations to another.
In order to render a high-quality image of the entire volume, these resampling locations
have to be chosen carefully, followed by mapping the obtained values to optical properties,
such as color and opacity, and compositing them in either back-to-front or front-to-back
order.
Ray-casting is probably the simplest approach for accomplishing this task (section 2.3.3). Because it casts rays from the eye through image plane pixels back into the
volume, ray-casting is usually called an image-order approach. That is, each ray is cast
into the volume, which is then resampled at – usually equispaced – intervals along that
ray. The values obtained via resampling are mapped to color and opacity, and composited
in order along the ray (from the eye into the volume, or from behind the volume toward
the eye) via alpha blending (section 2.3.4).
Texture mapping operations basically perform a similar task, i.e., resampling a discrete
grid of texels to obtain texture values at locations that do not coincide with the original
grid. Thus, texture mapping in many ways is an ideal candidate for performing repetitive resampling tasks. Compositing individual samples can easily be done by exploiting
hardware alpha blending (section 3.1.3). The major question with regard to hardware-accelerated volume rendering is how to achieve the same – or a sufficiently similar – result
as compositing samples taken along a ray cast into the volume.
Figure 5.1: Rendering a volume by compositing a stack of 2D texture-mapped slices in
back-to-front order. If the number of slices is too low, they become visible as artifacts.
The major way in which hardware texture mapping can be applied to volume rendering
is to use an object-order approach, instead of the image-order approach of ray-casting.
The resampling locations are generated by rendering proxy geometry with interpolated
texture coordinates (usually comprised of slices rendered as texture-mapped quads), and
compositing all the parts (slices) of this proxy geometry from back to front via alpha
blending. The volume data itself is stored in one to several textures of two or three
dimensions, respectively. For example, if only a density volume is required, it can be stored
in a single 3D texture, where a single texel corresponds to a single voxel. Alternatively,
volume data can be stored in a stack of 2D textures, each of which corresponds to an
axis-aligned slice through the volume.
By rendering geometry mapped with these textures, the original volume can be sampled
at specific locations, blending the generated fragments with the previous contents of the
frame buffer. Such an approach is called object-order, because the algorithm does not
iterate over individual pixels of the image plane, but over parts of the “object,” i.e., the
volume itself. That is, these parts are usually constituted by slices through the volume,
and the final result for each pixel is only available after all slices contributing to this pixel
have been processed.
5.1  Proxy Geometry
In all approaches that render volumetric data directly, i.e., without extracting geometry along certain features (e.g., polygons corresponding to an iso-surface, generated by a variant of the marching cubes algorithm [25]), there exists no geometry at all,
at least not per se. However, geometry is the only thing graphics hardware with standard
texture mapping capabilities is actually able to render. In this sense, all the fragments and
ultimately pixels rendered by graphics hardware are generated by rasterizing geometric primitives, in most cases triangles. That is, sampling a texture has to take place in the interior of such primitives, specified by their vertices.

Figure 5.2: Object-aligned slices used as proxy geometry with 2D texture mapping.
When we think about the three-dimensional scalar field that constitutes our volume
data, we can imagine placing geometry in this field. When this geometry is rendered,
several attributes like texture coordinates are interpolated over the interior of primitives,
and each fragment generated is assigned its corresponding set of texture coordinates. Subsequently, these coordinates can be used for resampling one or several textures at the
corresponding locations. If we assign texture coordinates that correspond to the coordinates in the scalar field, and store the field itself in a texture map (or several texture
maps), we can sample the field at arbitrary locations, as long as these are obtained from
interpolated texture coordinates. The collective geometry used for obtaining all resampling
locations needed for sampling the entire volume is commonly called proxy geometry, since
it has no inherent relation to the data contained in the volume itself, and exists solely for
the purpose of generating resampling locations, and subsequently sampling texture maps
at these locations.
The conceptually simplest example of proxy geometry is a set of view-aligned slices
(quads that are parallel to the viewport, usually also clipped against the bounding box
of the volume, see figure 5.3), with 3D texture coordinates that are interpolated over
the interior of these slices, and ultimately used to sample a single 3D texture map at
the corresponding locations. However, 3D texture mapping is not supported by all of
the graphics hardware we are targeting, and even on hardware that does support it, 3D
textures incur a performance penalty in comparison to 2D textures. This penalty is mostly
due to the tri-linear interpolation used when sampling a 3D texture map, as opposed to
bi-linear interpolation for sampling a 2D texture map.
Figure 5.3: View-aligned slices used as proxy geometry with 3D texture mapping.
Figure 5.4: Switching the slice stack of object-aligned slices according to the viewing
direction. Between image (C) and (D) the slice stack used for rendering has been switched.
One of the most important things to remember about proxy geometry is that it is
intimately related to the kind of texture mapping (2D or 3D) used. When the orientation
of slices with respect to the original volume data (i.e., the texture) can be arbitrary, 3D
texture mapping is mandatory, since a single slice would have to fetch data from several
different 2D textures. If, however, the proxy geometry is aligned with the original volume
data, texture fetch operations for a single slice can be guaranteed to stay within the same
2D texture. In this case, the proxy geometry is comprised of a set of object-aligned slices
(see figure 5.2), for which 2D texture mapping capabilities suffice. The following sections
describe different kinds of proxy geometry and the corresponding resampling approaches
in more detail.
5.2  2D-Textured Object-Aligned Slices
If only 2D texture mapping capabilities are used, the volume data must be stored in several
two-dimensional texture maps. A major implication of the use of 2D textures is that the
hardware is only able to resample two-dimensional subsets of the original volumetric data.
The proxy geometry in this case is a stack of planar slices, all of which are required to be
aligned with one of the major axes of the volume (either the x, y, or z axis), mapped with
2D textures, which in turn are resampled by the hardware-native bi-linear interpolation [8].
The reason for the requirement that slices be aligned with a major axis is that each time a
slice is rendered, only two dimensions are available for texture coordinates, and the third
coordinate must therefore be constant. Also, bi-linear interpolation would not be sufficient
for resampling otherwise. Now, instead of being used as an actual texture coordinate,
the third coordinate selects the texture to use from the stack of slices, and the other
two coordinates become the actual 2D texture coordinates used for rendering the slice.
Rendering proceeds from back to front, blending one slice on top of the other (see figure 5.2).
Figure 5.5: The location of the sampling points changes abruptly (C) when switching from one slice stack (A) to the next (B).

Although a single stack of 2D slices can store the entire volume, one slice stack does not
suffice for rendering. When the viewpoint is rotated about the object, it would be possible
to see between individual slices, which cannot be prevented with only one slice stack. The
solution for this problem is to actually store three slice stacks, one for each of the major
axes. During rendering, the stack with slices most parallel to the viewing direction is
chosen (see figure 5.4). Under-sampling typically occurs most visibly along the major axis
of the slice stack currently in use, which can be seen in figure 5.1. Additional artifacts
become visible when the slice stack in use is switched from one stack to the next. The
reason for this is that the actual locations of sampling points change abruptly when the
stacks are switched, which is illustrated in figure 5.5. To summarize, an obvious drawback
of using object-aligned 2D slices is the requirement for three slice stacks, which consume
three times the texture memory a single 3D texture would consume. When choosing a
stack for rendering, an additional consideration must also be taken into account: After
selecting the slice stack, it must be rendered in one of two directions, in order to guarantee
actual back-to-front rendering. That is, if a stack is viewed from the back (with respect to
the stack itself), it has to be rendered in reversed order, to achieve the desired result.
The following code fragment shows how both of these
decisions, depending on the current viewing direction with respect to the volume, could be
implemented:
GLfloat model_view_matrix[16];
GLfloat model_view_rotation_matrix[16];

// obtain the current viewing transformation from the OpenGL state
glGetFloatv( GL_MODELVIEW_MATRIX, model_view_matrix );

// extract the rotation from the matrix
GetRotation( model_view_matrix, model_view_rotation_matrix );

// rotate the initial viewing direction
GLfloat view_vector[3] = { 0.0f, 0.0f, -1.0f };
MatVecMultiply( model_view_rotation_matrix, view_vector );

// find the largest absolute vector component
int max_component = FindAbsMaximum( view_vector );

// render slice stack according to viewing direction
switch ( max_component ) {
case X:
    if ( view_vector[X] > 0.0f )
        DrawSliceStack_PositiveX();
    else
        DrawSliceStack_NegativeX();
    break;
case Y:
    if ( view_vector[Y] > 0.0f )
        DrawSliceStack_PositiveY();
    else
        DrawSliceStack_NegativeY();
    break;
case Z:
    if ( view_vector[Z] > 0.0f )
        DrawSliceStack_PositiveZ();
    else
        DrawSliceStack_NegativeZ();
    break;
}
Opacity Correction
In hardware-accelerated texture-based volume rendering, hardware alpha blending is used
to achieve the same effect as compositing samples along a ray in ray-casting. This alpha
blending operation actually is a method for performing a numerical integration of the
volume rendering integral (section 2.3.4). The distance between successive resampling
locations along a “ray,” i.e., the distance at which the integral is approximated by a
summation, most of all depends on the distance between adjacent slices.
Figure 5.6: The distance between adjacent sampling points depends on the viewing angle.

The sampling distance is easiest to account for if it is constant for all “rays” (i.e., pixels). In this case, it can be incorporated into the numerical integration in a preprocess,
which is usually done by simply adjusting the transfer function lookup table accordingly.
In the case of 3D-textured slices (and orthogonal projection), the slice distance is equal
to the sampling distance, which is also equal for all “rays” (i.e., pixels). Thus, it can be
accounted for in a preprocess. When 2D-textured slices are used, however, the distance
between successive samples for each pixel not only depends on the slice distance, but also
on the viewing direction. This is shown in figure 5.6 for two adjacent slices. The sampling
distance is only equal to the slice distance when the stack is viewed perpendicularly to its
major axis. When the view is rotated, the sampling distance increases. For this reason,
the lookup table for numerical integration (the transfer function table, see chapter 17) has
to be updated on each change of the viewing direction.
The correction of the transfer function in order to account for the viewing direction is
usually done in an approximate manner, by simply multiplying the stored opacities by the
reciprocal of the cosine of the angle between the viewing vector and the stack direction vector:
// determine cosine via dot-product; vectors must be normalized!
float correction_cosine = DotProduct3( view_vector, stack_vector );

// determine correction factor
float opacity_correction_factor =
    ( correction_cosine != 0.0f ) ? 1.0f / correction_cosine : 1.0f;
Note that although this correction factor is used for correcting opacity values, it must also
be applied to the respective RGB colors, if these are stored as opacity-weighted colors,
which usually is the case [42].
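As a sketch of how an opacity-corrected palette could be built with this approximate correction (assuming a 256-entry RGBA byte array transfer_function holding opacity-weighted colors; the array names are illustrative, and opacity_correction_factor is the value computed above):

GLubyte opacity_corrected_palette[256 * 4];

// scale the opacity and the opacity-weighted colors by the correction factor
for ( int i = 0; i < 256; ++i ) {
    for ( int c = 0; c < 4; ++c ) {
        float value = ( transfer_function[4 * i + c] / 255.0f )
                      * opacity_correction_factor;
        if ( value > 1.0f )
            value = 1.0f;
        opacity_corrected_palette[4 * i + c] = (GLubyte)( value * 255.0f + 0.5f );
    }
}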
Discussion
The biggest advantage of using object-aligned slices and 2D textures for volume rendering
is that 2D textures and the corresponding bi-linear interpolation are a standard feature
of all 3D graphics hardware architectures, and therefore this approach can practically
be implemented anywhere. Also, the rendering performance is extremely high, since bi-linear interpolation requires only a lookup and weighting of four texels for each resampling
operation.

Table 5.1: Summary of volume rendering with object-aligned slices and 2D textures.
    Pros: very high performance; high availability.
    Cons: high memory requirements; bi-linear interpolation only; sampling and switching artifacts; inconsistent sampling rate.

Figure 5.7: Register combiners setup for interpolation of intermediate slices.
The major disadvantages of this approach are the high memory requirements, due to
the three slice stacks that are required, and the restriction to using two-dimensional, i.e.,
usually bi-linear, interpolation for texture reconstruction. The use of object-aligned slice
stacks also leads to sampling and stack switching artifacts, as well as inconsistent sampling
rates for different viewing directions. A brief summary is contained in table 5.1.
5.3  2D Slice Interpolation
Figure 5.1 shows a fundamental problem of using 2D texture-mapped slices as proxy geometry for volume rendering. In contrast to view-aligned 3D texture-mapped slices (section 5.4), the number of slices cannot be changed easily, because each slice corresponds
to exactly one slice from the slice stack. Furthermore, no interpolation between slices is
performed at all, since only bi-linear interpolation is used within each slice. Because of
these two properties of that algorithm, artifacts can become visible when there are too few
slices, and thus the sampling frequency is too low with respect to frequencies contained in
the volume and the transfer function.
In order to increase the sampling frequency without enlarging the volume itself (e.g.,
by generating additional interpolated slices before downloading them to the graphics hardware), inter-slice interpolation has to be performed on-the-fly by the graphics hardware
itself. On the consumer hardware we are targeting (i.e., NVIDIA GeForce or later, and
ATI Radeon 8500 or later), this can be achieved by using two simultaneous textures when
rendering a single slice, instead of just one texture, and performing linear interpolation
between these two textures [36].
In order to do this, we have to specify fractional slice positions, where the integers
correspond to slices that actually exist in the source slice stack, and the fractional part
determines the position between two adjacent slices. The number of rendered slices is
now independent from the number of slices contained in the volume, and can be adjusted
arbitrarily.
For each slice to be rendered, two textures are activated, which correspond to the two
neighboring original slices from the source slice stack. The fractional position between these
slices is used as weight for the inter-slice interpolation. This method actually performs tri-linear interpolation within the volume. Standard bi-linear interpolation is employed for
each of the two neighboring slices, and the interpolation between the two obtained results
altogether achieves tri-linear interpolation.
A register combiners setup for on-the-fly interpolation of intermediate slices can be seen
in figure 5.7. The two source slices that enclose the position of the slice to be rendered
are configured as texture 0 and texture 1, respectively. An interpolation value between
0.0 (corresponding to slice 0) and 1.0 (corresponding to slice 1) determines the weight for
linear interpolation between these two textures, and is stored in a constant color register.
The final fragment contains the linearly interpolated result corresponding to the specified
fractional slice position.
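A sketch of the RGB portion of such a setup (assuming GL_NV_register_combiners; the weight variable frac, the alpha portion, and the pass-through final combiner are assumptions and not shown in full) computes spare0 = (1 - frac) * texture0 + frac * texture1 in a single general combiner:

GLfloat frac = 0.25f;  /* fractional slice position in [0,1] (illustrative) */
GLfloat weight_color[4] = { frac, frac, frac, frac };

// store the interpolation weight in constant color 0
glCombinerParameterfvNV( GL_CONSTANT_COLOR0_NV, weight_color );
glCombinerParameteriNV( GL_NUM_GENERAL_COMBINERS_NV, 1 );

// RGB portion: spare0 = A*B + C*D = (1 - frac)*texture0 + frac*texture1
glCombinerInputNV( GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_A_NV,
                   GL_CONSTANT_COLOR0_NV, GL_UNSIGNED_INVERT_NV, GL_RGB );
glCombinerInputNV( GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_B_NV,
                   GL_TEXTURE0_ARB, GL_UNSIGNED_IDENTITY_NV, GL_RGB );
glCombinerInputNV( GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_C_NV,
                   GL_CONSTANT_COLOR0_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB );
glCombinerInputNV( GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_D_NV,
                   GL_TEXTURE1_ARB, GL_UNSIGNED_IDENTITY_NV, GL_RGB );
glCombinerOutputNV( GL_COMBINER0_NV, GL_RGB,
                    GL_DISCARD_NV, GL_DISCARD_NV, GL_SPARE0_NV,
                    GL_NONE, GL_NONE, GL_FALSE, GL_FALSE, GL_FALSE );
// the alpha portion is configured analogously, and the final combiner
// simply outputs spare0 (not shown)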
Discussion
The biggest advantage of using object-aligned slices together with on-the-fly interpolation
between two 2D textures for volume rendering is that this method combines the advantages
of using only 2D textures with the capability of arbitrarily controlling the sampling rate,
i.e., the number of slices. Although not entirely comparable to tri-linear interpolation in
a 3D texture (the second interpolation step is performed with limited fixed-point fragment precision), the combination of bi-linear interpolation and a second linear interpolation
step ultimately achieves tri-linear interpolation in the volume. The necessary features,
i.e., multi-texturing with at least two simultaneous textures and the ability to interpolate
between them, are widely available on consumer graphics hardware.
Disadvantages inherent to the use of object-aligned slice stacks still apply, though.
For example, the undesired visible effects when switching slice stacks, and the memory
consumption of the three slice stacks. A brief summary is contained in table 5.2.
Table 5.2: Summary of 2D slice interpolation volume rendering.
    Pros: high performance; tri-linear interpolation; available on consumer hardware.
    Cons: high memory requirements; switching effects; inconsistent sampling rate for perspective projection.

5.4  3D-Textured View-Aligned Slices

In many respects, 3D-textured view-aligned slices are the simplest kind of proxy geometry (see figure 5.3). In this case, the volume is stored in a single 3D texture, and 3D texture coordinates are interpolated over the interior of proxy geometry polygons. These texture
coordinates are then used directly for indexing the 3D texture map at the corresponding location, and thus resampling the volume.

Figure 5.8: Sampling locations on view-aligned slices for parallel (A) and perspective projection (B), respectively.
The big advantage of 3D texture mapping is that it allows slices to be oriented arbitrarily with respect to the 3D texture domain, i.e., the volume itself. Thus, it is natural to use
slices aligned with the viewport, since such slices closely mimic the ray-casting algorithm.
They offer constant distance between samples for orthogonal projection and all viewing
directions, see figure 5.8(A). Since the graphics hardware is already performing completely
general tri-linear interpolation within the volume for each resampling location, proxy slices
are not bound to original slices at all. Thus, the number of slices can easily be adjusted
on-the-fly and without any restrictions, or the need for separately configuring inter-slice
interpolation.
In the case of perspective projection, the distance between successive samples is different
for adjacent pixels, however, which is depicted in figure 5.8(B). If the artifacts caused by a
not entirely accurate compensation for the sampling distance are deemed noticeable, spherical
shells (section 5.5) can be employed instead of planar slices.
Table 5.3: Summary of 3D-texture based volume rendering.
    Pros: high performance; tri-linear interpolation.
    Cons: availability still limited; inconsistent sampling rate for perspective projection.
Discussion
The biggest advantage of using view-aligned slices and 3D textures for volume rendering
is that tri-linear interpolation can be employed for resampling the volume at arbitrary
locations. Apart from better image quality than with bi-linear interpolation, this
allows rendering slices with arbitrary orientation with respect to the volume, which makes
it possible to maintain a constant sampling rate for all pixels and viewing directions.
Additionally, a single 3D texture suffices for storing the entire volume.
The major disadvantage of this approach is that it requires hardware-native support
for 3D textures, which is not yet widely available, and tri-linear interpolation is also significantly slower than bi-linear interpolation, due to the requirement for using eight texels
for every single output sample, and texture fetch patterns that decrease the efficiency of
texture caches. A brief summary is contained in table 5.3.
5.5  3D-Textured Spherical Shells
All types of proxy geometry that use planar slices (irrespective of whether they are object-aligned or view-aligned) share the basic problem that the distance between successive
samples used to determine the color of a single pixel is different from one pixel to the next
in the case of perspective projection. This fact is illustrated in figure 5.8(B).
When incorporating the sampling distance in the numerical approximation of the volume rendering integral, this pixel-to-pixel difference cannot easily be accounted for. One
solution to this problem is to use spherical shells instead of planar slices [23]. In order to
attain a constant sampling distance for all pixels, the proxy geometry has to be spherical,
i.e., be comprised of concentric spheres, or parts thereof. In practice, these shells are generated by clipping tessellated spheres against both the viewing frustum and the bounding
box of the volume data.
The major drawback of using spherical shells as proxy geometry is that they are more
complicated to set up than planar slice stacks, and they also require more geometry to be
rendered, i.e., parts of tessellated spheres.
This kind of proxy geometry is only useful when perspective projection is used, and
can only be used in conjunction with 3D texture mapping. Furthermore, the artifacts of
pixel-to-pixel differences in sampling distance are often hardly noticeable, and planar slice
stacks usually suffice also when perspective projection is used.
5.6  Slices vs. Slabs
An inherent problem of using slices as proxy geometry is that the number of slices directly
determines the (re)sampling frequency, and thus the quality of the rendered result. Especially when high frequencies are contained in the employed transfer functions, the required
number of slices can become very high. Thus, even though the number of slices can be
increased on-the-fly via interpolation done by the graphics hardware itself, the fill rate
demands increase significantly.
A very elegant solution to this problem is to use slabs instead of slices, together with
pre-integrated classification [12], which is described in more detail in chapter 19. A slab
is no new geometrical primitive, but simply the space between two adjacent slices. During
rendering, this space is properly accounted for, instead of simply rendering infinitesimally
thin slices, by looking up the pre-integrated result of the volume rendering integral from
the back slice to the front slice in a lookup table, i.e., a texture. Geometrically, a slab
can be rendered as a slice with its immediately neighboring slice (either in the back, or in
front) projected onto it.
For details on rendering with slabs instead of slices, we refer you to chapter 19.
Components of a Hardware Volume Renderer
This chapter presents an overview of the major components of a texture-based hardware
volume renderer from an implementation-centric point of view. The goal of this chapter
is to convey a feeling of where the individual components of such a renderer fit in, and in
what order they are executed, and leave the details up to later chapters. The component
structure presented here is modeled after separate portions of code that can be found in an
actual implementation of a volume renderer for consumer graphics cards. They are listed
in the order in which they are executed by the application code, which is not the same as
the order in which they are “executed” by the graphics hardware itself!
6.1  Volume Data Representation
Volume data has to be stored in memory in a suitable format, usually already prepared for
download to the graphics hardware as textures. Depending on the kind of proxy geometry
used, the volume can either be stored in a single block, when view-aligned slices together
with a single 3D texture are used, or split up into three stacks of 2D slices, when object-aligned slices together with multiple 2D textures are used. Usually, it is convenient to store
the volume only in a single 3D array, which can be downloaded as a single 3D texture, and
extract data for 2D textures on-the-fly, just as needed.
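As a minimal sketch of such on-the-fly extraction (array layout and names are illustrative assumptions; slices for the x and y stacks would require strided gathers instead of a contiguous copy), a single z-aligned slice could be pulled out of the 3D array like this:

// copy slice z of a size_x * size_y * size_z density volume into a
// contiguous 2D array suitable for a glTexImage2D() download
void ExtractSliceZ( const GLubyte *volume_data_3d, GLubyte *slice_data,
                    int size_x, int size_y, int z )
{
    for ( int y = 0; y < size_y; ++y )
        for ( int x = 0; x < size_x; ++x )
            slice_data[y * size_x + x] =
                volume_data_3d[(z * size_y + y) * size_x + x];
}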
Depending on the complexity of the rendering mode, classification, and illumination,
there may even be several volumes containing all the information needed. Likewise, the
actual storage format of voxels depends on the rendering mode and the type of volume,
e.g., whether the volume stores densities, gradients, gradient magnitudes, and so on. Conceptually different volumes may also be combined into the same actual volume if possible,
for example by combining gradient and density data in RGBA voxels.
Although it is often the case that the data representation issue is part of a preprocessing
step, this is not necessarily so, since new data may have to be generated on-the-fly when
the rendering mode or specific parameters are changed.
This component is usually executed only once at startup, or only executed when the
rendering mode changes.
6.2  Transfer Function Representation
Transfer functions are usually represented by color lookup tables. They can be one-dimensional or multi-dimensional, and are usually stored as simple arrays.
This component is usually user-triggered, when the user is changing the transfer function.
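Such an array can be as plain as the following sketch (256 RGBA byte entries; the gray-scale color ramp and linear opacity ramp are purely illustrative):

GLubyte transfer_function[256 * 4];

// fill the table with a gray-scale color ramp and a linear opacity ramp
for ( int i = 0; i < 256; ++i ) {
    transfer_function[4 * i + 0] = (GLubyte)i;   // R
    transfer_function[4 * i + 1] = (GLubyte)i;   // G
    transfer_function[4 * i + 2] = (GLubyte)i;   // B
    transfer_function[4 * i + 3] = (GLubyte)i;   // A
}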
6.3  Volume Textures
In order for the graphics hardware to be able to access all the required volume information,
the volume data must be downloaded and stored in textures. At this stage, a translation
from data format (external texture format) to texture format (internal texture format)
might take place, if the two are not identical.
This component is usually executed only once at startup, or only executed when the
rendering mode changes.
How and what textures containing the actual volume data have to be downloaded to
the graphics hardware depends on a number of factors, most of all the rendering mode and
type of classification, and whether 2D or 3D textures are used.
The following example code fragment downloads a single 3D texture for rendering with
view-aligned slices. The internal format consists of 8-bit color indexes, and for this reason
a color lookup table must also be downloaded subsequently (section 6.4):
// bind 3D texture target
glBindTexture( GL_TEXTURE_3D, volume_texture_name_3d );
glTexParameteri( GL_TEXTURE_3D, GL_TEXTURE_WRAP_S, GL_CLAMP );
glTexParameteri( GL_TEXTURE_3D, GL_TEXTURE_WRAP_T, GL_CLAMP );
glTexParameteri( GL_TEXTURE_3D, GL_TEXTURE_WRAP_R, GL_CLAMP );
glTexParameteri( GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR );
glTexParameteri( GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR );

// download 3D volume texture for pre-classification
glTexImage3D( GL_TEXTURE_3D, 0, GL_COLOR_INDEX8_EXT,
              size_x, size_y, size_z, 0,
              GL_COLOR_INDEX, GL_UNSIGNED_BYTE, volume_data_3d );
When pre-classification is not used, an intensity volume texture is usually downloaded
instead, shown here for a single 3D texture (texture target binding identical to above):
// download 3D volume texture for post-classification/pre-integration
glTexImage3D( GL_TEXTURE_3D, 0, GL_INTENSITY8,
              size_x, size_y, size_z, 0,
              GL_LUMINANCE, GL_UNSIGNED_BYTE, volume_data_3d );
If 2D textures are used instead of 3D textures, similar commands have to be used in order
to download all the slices of all three slice stacks.
6.4  Transfer Function Tables
Transfer functions may be downloaded to the hardware in basically one of two formats:
In the case of pre-classification, transfer functions are downloaded as texture palettes for
on-the-fly expansion of palette indexes to RGBA colors. If post-classification is used,
transfer functions are downloaded as 1D, 2D, or even 3D textures (the latter two for multi-dimensional transfer functions). If pre-integration is used, the transfer function is only
used to calculate a pre-integration table, but not downloaded to the hardware itself. Then,
this pre-integration table is downloaded instead. This component might even not be used
at all, which is the case when the transfer function has already been applied to the volume
textures themselves, and they are already in RGBA format.
This component is usually only executed when the transfer function or rendering mode
changes.
How and what transfer function tables have to be downloaded to the graphics hardware
depends on the type of classification that is used.
The following code fragment downloads a single texture palette that can be used in
conjunction with an indexed volume texture for pre-classification. The same code can be
used for rendering with either 2D, or 3D slices, respectively:
// download color table (256 RGBA entries) for pre-classification
glColorTableEXT( GL_SHARED_TEXTURE_PALETTE_EXT,
                 GL_RGBA8, 256, GL_RGBA,
                 GL_UNSIGNED_BYTE, opacity_corrected_palette );
If post-classification is used instead, the same transfer function table can be used, but it
must be downloaded as a 1D texture instead of a texture palette:
// bind 1D texture target
glBindTexture( GL_TEXTURE_1D, palette_texture_name );
glTexParameteri( GL_TEXTURE_1D, GL_TEXTURE_WRAP_S, GL_CLAMP );
glTexParameteri( GL_TEXTURE_1D, GL_TEXTURE_MAG_FILTER, GL_LINEAR );
glTexParameteri( GL_TEXTURE_1D, GL_TEXTURE_MIN_FILTER, GL_LINEAR );

// download 1D transfer function texture (256 RGBA entries) for post-classification
glTexImage1D( GL_TEXTURE_1D, 0, GL_RGBA8, 256, 0,
              GL_RGBA, GL_UNSIGNED_BYTE, opacity_corrected_palette );
If pre-integration is used, a pre-integration texture is downloaded instead of the transfer
function table itself (chapter 19).
6.5  Fragment Shader Configuration
Before the volume can be rendered using a specific rendering mode, the fragment shader
has to be configured accordingly. How textures are stored and what they contain is crucial
for the fragment shader. Likewise, the format of the shaded fragment has to correspond
to what is expected by the alpha blending stage (section 6.6).
This component is usually executed once per frame, i.e., the entire volume can be rendered with the same fragment shader configuration.
The code that determines the operation of the fragment shader is highly dependent on
the actual hardware architecture used (section 3.3.2). The following code fragment roughly
illustrates the sequence of operations on the GeForce architecture:
// configure texture shaders
glTexEnvi( GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, ... );
...
// enable texture shaders
glEnable( GL_TEXTURE_SHADER_NV );

// configure register combiners
glCombinerParameteriNV( GL_NUM_GENERAL_COMBINERS_NV, 1 );
glCombinerInputNV( GL_COMBINER0_NV, ... );
glCombinerOutputNV( GL_COMBINER0_NV, ... );
glFinalCombinerInputNV( ... );
...
// enable register combiners
glEnable( GL_REGISTER_COMBINERS_NV );
A “similar” code fragment for configuring a fragment shader on the Radeon 8500 architecture could look like this:
// configure fragment shader
GLuint shader_name = glGenFragmentShadersATI( 1 );
glBindFragmentShaderATI( shader_name );
glBeginFragmentShaderATI();
glSampleMapATI( GL_REG_0_ATI, GL_TEXTURE0_ARB, GL_SWIZZLE_STR_ATI );
glColorFragmentOp2ATI( GL_MUL_ATI, ... );
...
glEndFragmentShaderATI();

// enable fragment shader
glEnable( GL_FRAGMENT_SHADER_ATI );
6.6  Blending Mode Configuration
The blending mode determines how a fragment is combined with the corresponding pixel in
the frame buffer. In addition to the configuration of alpha blending, we also configure alpha
testing in this component, if it is needed for discarding fragments that do not correspond to
the desired iso-surface. Although the alpha test and alpha blending are the last two steps
that are actually executed by the graphics hardware in our volume rendering pipeline, they
have to be configured before actually rendering any geometry. This configuration usually
stays the same for an entire frame.
This component is usually executed once per frame, i.e., the entire volume can be rendered with the same blending mode configuration.
For direct volume rendering, the blending mode is more or less standard alpha
blending. Since color values are usually pre-multiplied by the corresponding opacity (also
known as opacity-weighted [42], or associated [7] colors), the factor for multiplication with
the source color is one:
// enable blending
glEnable( GL_BLEND );
// set blend function
glBlendFunc( GL_ONE, GL_ONE_MINUS_SRC_ALPHA );
For non-polygonal iso-surfaces, alpha testing has to be configured for selection of fragments corresponding to the desired iso-values. The comparison operator for comparing a
fragment’s density value with the reference value is usually GL_GREATER or GL_LESS, since
using GL_EQUAL is not well suited to producing a smooth surface appearance (not many
interpolated density values are exactly equal to a given reference value). Alpha blending
must be disabled for this rendering mode. More details about rendering non-polygonal
iso-surfaces, especially with regard to illumination, can be found in chapter 8.
// disable blending
glDisable( GL_BLEND );
// enable alpha testing
glEnable( GL_ALPHA_TEST );
// configure alpha test function
glAlphaFunc( GL_GREATER, isovalue );
For maximum intensity projection, an alpha blending equation of GL_MAX_EXT
must be supported, which is either a part of the imaging subset, or the separate
GL_EXT_blend_minmax extension. On consumer graphics hardware, querying for the latter extension is the best way to determine availability of the maximum operator.
// enable blending
glEnable( GL_BLEND );
// set blend function to identity (not really necessary)
glBlendFunc( GL_ONE, GL_ONE );
// set blend equation to max
glBlendEquationEXT( GL_MAX_EXT );
6.7  Texture Unit Configuration
The use of texture units corresponds to the inputs required by the fragment shader. Before
rendering any geometry, the corresponding textures have to be bound. When 3D textures
are used, the entire configuration of texture units usually stays the same for an entire
frame. In the case of 2D textures, the textures that are bound change for each slice.
This component is usually executed once per frame, or once per slice, depending on
whether 3D, or 2D textures are used.
The following code fragment shows an example for configuring two texture units for
interpolation of two neighboring 2D slices from the z slice stack (section 5.3):
// configure texture unit 1
glActiveTextureARB( GL_TEXTURE1_ARB );
glBindTexture( GL_TEXTURE_2D, volume_texture_names_stack_z[sliceid1] );
glEnable( GL_TEXTURE_2D );

// configure texture unit 0
glActiveTextureARB( GL_TEXTURE0_ARB );
glBindTexture( GL_TEXTURE_2D, volume_texture_names_stack_z[sliceid0] );
glEnable( GL_TEXTURE_2D );
6.8  Proxy Geometry Rendering
The last component of the execution sequence outlined in this chapter, is getting the
graphics hardware to render geometry. This is what actually causes the generation of
fragments to be shaded and blended into the frame buffer, after resampling the volume
data accordingly.
This component is executed once per slice, irrespective of whether 3D or 2D textures
are used.
Explicit texture coordinates are usually only specified when rendering 2D texture-mapped, object-aligned slices. In the case of view-aligned slices, texture coordinates can
easily be generated automatically, by exploiting OpenGL’s texture coordinate generation
mechanism, which has to be configured before the actual geometry is rendered:
// configure texture coordinate generation for view-aligned slices
float plane_x[] = { 1.0f, 0.0f, 0.0f, 0.0f };
float plane_y[] = { 0.0f, 1.0f, 0.0f, 0.0f };
float plane_z[] = { 0.0f, 0.0f, 1.0f, 0.0f };
// select object-linear generation (the default mode is eye-linear)
glTexGeni( GL_S, GL_TEXTURE_GEN_MODE, GL_OBJECT_LINEAR );
glTexGeni( GL_T, GL_TEXTURE_GEN_MODE, GL_OBJECT_LINEAR );
glTexGeni( GL_R, GL_TEXTURE_GEN_MODE, GL_OBJECT_LINEAR );
glTexGenfv( GL_S, GL_OBJECT_PLANE, plane_x );
glTexGenfv( GL_T, GL_OBJECT_PLANE, plane_y );
glTexGenfv( GL_R, GL_OBJECT_PLANE, plane_z );
glEnable( GL_TEXTURE_GEN_S );
glEnable( GL_TEXTURE_GEN_T );
glEnable( GL_TEXTURE_GEN_R );
The following code fragment shows an example of rendering a single slice as an OpenGL
quad. Texture coordinates are specified explicitly, since this code fragment is intended for
rendering a slice from a stack of object-aligned slices with z as its major axis:
// render a single slice as a quad (four vertices)
glBegin( GL_QUADS );
    glTexCoord2f( 0.0f, 0.0f );
    glVertex3f( 0.0f, 0.0f, axis_pos_z );
    glTexCoord2f( 0.0f, 1.0f );
    glVertex3f( 0.0f, 1.0f, axis_pos_z );
    glTexCoord2f( 1.0f, 1.0f );
    glVertex3f( 1.0f, 1.0f, axis_pos_z );
    glTexCoord2f( 1.0f, 0.0f );
    glVertex3f( 1.0f, 0.0f, axis_pos_z );
glEnd();
Vertex coordinates are specified in object-space, and transformed to view-space using the
modelview matrix. In the case of view-aligned slices with texture coordinate generation, all the glTexCoord2f() commands
can simply be left out. If multi-texturing is used, a simple vertex program can be exploited
for generating the texture coordinates for the additional units, instead of downloading the
same texture coordinates to multiple units. On the Radeon 8500 it is also possible to use
the texture coordinates from unit zero for texture fetch operations at any of the other units,
which solves the problem of duplicate texture coordinates in a very simple way, without
requiring a vertex shader or wasting bandwidth.
Acknowledgments
I would like to express a very special thank you to Christof Rezk-Salama for the
diagrams and figures in this chapter. Berk Özer provided valuable comments and
proof-reading. Thanks are also due to the VRVis Research Center for supporting the
preparation of these course notes in the context of the basic research on visualization
(http://www.VRVis.at/vis/). The VRVis Research Center is funded by an Austrian
research program called K plus.
Illumination Techniques
Local Illumination
Local illumination models allow the approximation of the light intensity reflected from a
point on the surface of an object. This intensity is evaluated as a function of the (local)
orientation of the surface with respect to the position of a point light source and some
material properties. In comparison to global illumination models indirect light, shadows
and caustics are not taken into account. Local illumination models are simple, easy to
evaluate and do not require the computational complexity of global illumination. The most
popular local illumination model is the Phong model [35, 5], which computes the lighting
as a linear combination of three different terms, an ambient, a diffuse and a specular term,
\[ I_{\mathrm{Phong}} = I_{\mathrm{ambient}} + I_{\mathrm{diffuse}} + I_{\mathrm{specular}}. \]
Ambient illumination is modeled by a constant term,
\[ I_{\mathrm{ambient}} = k_a = \mathrm{const.} \]
Without the ambient term parts of the geometry that are not directly lit would be
completely black. In the real world such indirect illumination effects are caused by light
intensity which is reflected from other surfaces.
Diffuse reflection refers to light which is reflected with equal intensity in all directions (Lambertian reflection). The brightness of a dull, matte surface is independent of
the viewing direction and depends only on the angle of incidence ϕ between the direction
l of the light source and the surface normal n. The diffuse illumination term is written as
\[ I_{\mathrm{diffuse}} = I_p \, k_d \cos\varphi = I_p \, k_d \, (l \cdot n). \]
Ip is the intensity emitted from the light source. The surface property kd is a constant
between 0 and 1 specifying the amount of diffuse reflection as a material specific constant.
Specular reflection is exhibited by every shiny surface and causes so-called highlights. The
specular lighting term incorporates the vector v that runs from the object to the viewer’s
eye into the lighting computation. Light is reflected in the direction of reflection r which
is the direction of light l mirrored about the surface normal n. For efficiency the reflection
vector r can be replaced by the halfway vector h,
Ispecular = Ip ks cosn α = Ip ks (h • n)n .
The material property ks determines the amount of specular reflection. The exponent n is
called the shininess of the surface and is used to control the size of the highlights.
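Assuming normalized vectors n, l, and h, the three terms translate into a few lines of C. The function below is a minimal sketch of evaluating a single light's contribution; it is an illustration, not code from the accompanying source.

#include <math.h>

// dot product of two 3-vectors
static float dot3(const float a[3], const float b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Phong intensity for one light: ambient + diffuse + specular,
// with n, l and h assumed to be normalized
float phong(const float n[3], const float l[3], const float h[3],
            float Ip, float ka, float kd, float ks, float shininess)
{
    float diff = dot3(l, n); if (diff < 0.0f) diff = 0.0f;
    float spec = dot3(h, n); if (spec < 0.0f) spec = 0.0f;
    return ka + Ip * kd * diff + Ip * ks * powf(spec, shininess);
}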
Gradient Estimation
The Phong illumination model uses the normal vector to describe the local shape of an
object and is primarily used for lighting polygonal surfaces. To include the Phong
illumination model in direct volume rendering, the local shape of the volumetric data set
must be described by an appropriate type of vector.
For scalar fields, the gradient vector is an appropriate substitute for the surface normal
as it represents the normal vector of the isosurface for each point. The gradient vector is
the first order derivative of a scalar field I(x, y, z), defined as
∇I = (Ix, Iy, Iz) = ( ∂I/∂x, ∂I/∂y, ∂I/∂z ),   (9.1)

using the partial derivatives of I in x-, y- and z-direction, respectively. The scalar magnitude
of the gradient measures the local variation of intensity quantitatively. It is computed as
the absolute value of the vector,

||∇I|| = √( Ix² + Iy² + Iz² ).   (9.2)
For illumination purposes only the direction of the gradient vector is of interest.
There are several approaches to estimate the directional derivatives for discrete voxel
data. One common technique, based on the first terms of a Taylor expansion, is the
central differences method. According to this, the directional derivative in x-direction is
calculated as

Ix(x, y, z) = I(x + 1, y, z) − I(x − 1, y, z)   with x, y, z ∈ ℕ.   (9.3)

Derivatives in the other directions are computed analogously. Central differences are
usually the method of choice for gradient pre-computation. There also exist gradient-less
shading techniques which do not require explicit knowledge of the gradient vectors.
Such techniques usually approximate the dot product with the light direction by a forward
difference in the direction of the light source.
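The pre-computation itself only takes a few lines of C. The sketch below assumes an 8-bit volume stored as a flat array and uses illustrative names; it normalizes the gradient because only its direction is needed for illumination.

#include <math.h>

/* central-differences gradient (Equation 9.3) at an interior voxel of an
   8-bit volume of size dimX x dimY x dimZ; names are illustrative */
void gradient(const unsigned char *volume, int dimX, int dimY,
              int x, int y, int z, float g[3])
{
    #define VOXEL(i, j, k) ((float)volume[((k) * dimY + (j)) * dimX + (i)])
    g[0] = VOXEL(x + 1, y, z) - VOXEL(x - 1, y, z);
    g[1] = VOXEL(x, y + 1, z) - VOXEL(x, y - 1, z);
    g[2] = VOXEL(x, y, z + 1) - VOXEL(x, y, z - 1);
    #undef VOXEL

    /* normalize; the magnitude (Equation 9.2) is not needed for illumination */
    float len = sqrtf(g[0] * g[0] + g[1] * g[1] + g[2] * g[2]);
    if (len > 0.0f) { g[0] /= len; g[1] /= len; g[2] /= len; }
}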
Non-polygonal Shaded Isosurfaces
Rendering a volume data set with opacity values of only 0 and 1 will result in an isosurface
or an isovolume. Without illumination, however, the resulting image will show nothing but
the silhouette of the object, as displayed in Figure 10.1 (left). It is obvious that illumination
techniques are required to display the surface structures (middle and right).
In a pre-processing step the gradient vector is computed for each voxel using the central
differences method or any other gradient estimation scheme. The three components of the
normalized gradient vector together with the original scalar value of the data set are stored
as an RGBA quadruplet in a 3D texture:

∇I = (Ix, Iy, Iz)  →  (R, G, B),      I  →  A
The vector components must be normalized, scaled and biased to adjust their signed range
[−1, 1] to the unsigned range [0, 1] of the color components. In our case the alpha channel
contains the scalar intensity value, and the OpenGL alpha test is used to discard all
fragments that do not belong to the isosurface specified by the reference alpha value. The
setup for the OpenGL alpha test is displayed in the following code sample. In this case, the
number of slices must be increased drastically to obtain satisfying images. Alternatively the
Figure 10.1: Non-polygonal isosurface without illumination (left), with diffuse illumination
(middle) and with specular light (right)
alpha test can be set up to check for GL_GREATER or GL_LESS instead of GL_EQUAL, allowing
a considerable reduction of the sampling rate.
glDisable(GL_BLEND);               // disable alpha blending
glEnable(GL_ALPHA_TEST);           // enable alpha test for isosurface
glAlphaFunc(GL_EQUAL, fIsoValue);
What is still missing is the evaluation of the Phong illumination model. Current
graphics hardware provides functionality for dot product computation in the texture
application step, which is performed during rasterization. Several different OpenGL
extensions have been proposed by different manufacturers, two of which will be outlined in the
following.
The original implementation of non-polygonal isosurfaces was presented by Westermann
and Ertl [41]. The algorithm was expanded to volume shading by Meissner et al. [29].
Efficient implementations on PC hardware are described in [36].
Per-Pixel Illumination
The integration of the Phong illumination model into a single-pass volume rendering
procedure requires a mechanism that allows the computation of dot products and
component-wise products in hardware. This mechanism is provided by the pixel-shader
functionality of modern consumer graphics boards. For each voxel, the x-, y- and z-components of the
(normalized) gradient vector are pre-computed and stored as color components in an RGB
texture. The dot product calculations are directly performed within the texture unit during
rasterization.
A simple mechanism that supports dot product calculation is provided by the standard
OpenGL extension EXT_texture_env_dot3. This extension to the OpenGL texture
environment defines a new way to combine the color and texture values during texture
application. As shown in the code sample, the extension is activated by setting the texture
environment mode to GL_COMBINE_EXT. The dot product computation must be enabled by
selecting GL_DOT3_RGB_EXT as combination mode. In the sample code the RGBA quadruplets
(GL_SRC_COLOR) of the primary color and the texel color are used as arguments.
#if defined GL_EXT_texture_env_dot3
// enable the extension
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_COMBINE_EXT);
// preserve the alpha value
glTexEnvi(GL_TEXTURE_ENV, GL_COMBINE_ALPHA_EXT, GL_REPLACE);
// enable dot product computation
glTexEnvi(GL_TEXTURE_ENV, GL_COMBINE_RGB_EXT, GL_DOT3_RGB_EXT);
// first argument: light direction stored in primary color
glTexEnvi(GL_TEXTURE_ENV, GL_SOURCE0_RGB_EXT, GL_PRIMARY_COLOR_EXT);
glTexEnvi(GL_TEXTURE_ENV, GL_OPERAND0_RGB_EXT, GL_SRC_COLOR);
// second argument: voxel gradient stored in RGB texture
glTexEnvi(GL_TEXTURE_ENV, GL_SOURCE1_RGB_EXT, GL_TEXTURE);
glTexEnvi(GL_TEXTURE_ENV, GL_OPERAND1_RGB_EXT, GL_SRC_COLOR);
#endif
This simple implementation accounts neither for the specular illumination term
nor for multiple light sources. More flexible illumination effects with multiple light sources
can be achieved using the NVidia register combiners or similar extensions.
Advanced Per-Pixel Illumination
The drawback of the simple implementation described in the previous section is its
restriction to a single diffuse light source. Current rasterization hardware, however, allows
the computation of the diffuse and specular terms for multiple light sources. To access
these features, hardware-specific OpenGL extensions such as NVidia's register combiners
or ATI's fragment shaders are required. Examples of such more flexible implementations
are illustrated using the NVidia register combiners extension. The combiner setup for diffuse
illumination with two independent light sources is displayed in Figure 12.1. Activating
Figure 12.1: NVidia register combiner setup for diffuse illumination with two independent
light sources.
Figure 12.2: NVidia register combiner setup for diffuse and specular illumination. The
additional sum (+) is achieved using the spare0 and secondary color registers of the
final combiner stage.
additional combiner stages also allows the computation of the diffuse terms for more than
two light sources. Specular and diffuse illumination can be achieved using the register
combiner setup displayed in Figure 12.2. Both implementations assume that the pre-computed
gradient and the emission/absorption coefficients are kept in separate textures. Texture 0
stores the normalized gradient vectors. The emission and absorption values are generated
from the original intensity values stored in texture 1 by a color table lookup. All methods
can alternatively be implemented using ATI's fragment shader extension.
Figure 12.3: CT data of a human hand without illumination (A), with diffuse illumination (B),
and with specular illumination (C). Non-polygonal isosurfaces with diffuse (D),
specular (E), and diffuse and specular (F) illumination.
Reflection Maps
If the illumination computation becomes too complex for on-the-fly evaluation, alternative
lighting techniques such as reflection mapping come into play. The idea of reflection
mapping originates from 3D computer games and represents a method to pre-compute
complex illumination scenarios. The usefulness of this approach derives from its ability
to realize local illumination with an arbitrary number of light sources and different
illumination parameters at low computational cost. A reflection map caches the incident
illumination from all directions at a single point in space.
The idea of reflection mapping was first suggested by Blinn [6]. The term environment
mapping was coined by Greene [14] in 1986. Closely related to the diffuse and
specular terms of the Phong illumination model, reflection mapping can be performed with
diffuse maps or reflective environment maps. The indices into a diffuse reflection map are
directly computed from the normal vector, whereas the coordinates for an environment
map are a function of both the normal vector and the viewing direction. Reflection maps
in general assume that the illuminated object is small with respect to the environment that
contains it.
Figure 13.1: Example of an environment cube map.
A special parameterization of the normal direction is used in order to construct a cube
map as displayed in Figure 13.1. In this case the environment is projected onto the six
sides of a surrounding cube. The largest component of the reflection vector indicates the
appropriate side of the cube, and the remaining vector components are used as coordinates
for the corresponding texture map. Cube mapping is popular because the required
reflection maps can easily be constructed using conventional rendering systems and
photography. The implementation of cubic diffuse and reflective environment maps can be
accomplished using the OpenGL extension GL_NV_texture_shader. The setup is displayed
in the following code sample. Four texture units are involved in this configuration. Texture 0
is a 3D texture which contains the pre-computed gradient vectors. In texture unit
0 a normal vector is interpolated from this texture. Since the reflection map is generated
in world coordinate space, accurate application of a normal map requires accounting
for the local transformation represented by the current modeling matrix. For reflective
maps the viewing direction must also be taken into account. In the OpenGL extension,
the local 3 × 3 modeling matrix and the camera position are specified as texture coordinates
for texture units 1, 2 and 3. From this information the GPU constructs the viewing
direction and valid normal vectors in world coordinates in texture unit 1. The diffuse and
the reflective cube maps are applied in texture unit 2 and texture unit 3, respectively.
As a result, the texture registers 2 and 3 contain the appropriately sampled diffuse and
reflective environment map. These values are finally combined to form the final color of
the fragment using the register combiners extension.
Figure 13.2: Isosurface of the engine block with diffuse reflection map (left) and specular
environment map (right).
#if defined GL_NV_texture_shader
// texture unit 0 - sample normal vector from 3D-texture
glActiveTextureARB(GL_TEXTURE0_ARB);
glEnable(GL_TEXTURE_3D_EXT);
glEnable(GL_TEXTURE_SHADER_NV);
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_TEXTURE_3D);

// texture unit 1 - dot product computation
glActiveTextureARB( GL_TEXTURE1_ARB );
glEnable(GL_TEXTURE_SHADER_NV);
glTexEnvi(GL_TEXTURE_SHADER_NV,
          GL_SHADER_OPERATION_NV, GL_DOT_PRODUCT_NV);
glTexEnvi(GL_TEXTURE_SHADER_NV,
          GL_PREVIOUS_TEXTURE_INPUT_NV, GL_TEXTURE0_ARB);
glTexEnvi(GL_TEXTURE_SHADER_NV,
          GL_RGBA_UNSIGNED_DOT_PRODUCT_MAPPING_NV,
          GL_EXPAND_NORMAL_NV);

// texture unit 2 - diffuse cube map
glActiveTextureARB( GL_TEXTURE2_ARB );
glEnable(GL_TEXTURE_SHADER_NV);
glBindTexture(GL_TEXTURE_CUBE_MAP_EXT, m_nDiffuseCubeMapTexName);
glTexEnvi(GL_TEXTURE_SHADER_NV,
          GL_SHADER_OPERATION_NV, GL_DOT_PRODUCT_DIFFUSE_CUBE_MAP_NV);
glTexEnvi(GL_TEXTURE_SHADER_NV,
          GL_PREVIOUS_TEXTURE_INPUT_NV, GL_TEXTURE0_ARB);
glTexEnvi(GL_TEXTURE_SHADER_NV,
          GL_RGBA_UNSIGNED_DOT_PRODUCT_MAPPING_NV,
          GL_EXPAND_NORMAL_NV);

// texture unit 3 - reflective cube map
glActiveTextureARB( GL_TEXTURE3_ARB );
glEnable(GL_TEXTURE_CUBE_MAP_EXT);
glBindTexture(GL_TEXTURE_CUBE_MAP_EXT, m_nReflectiveCubeMapTexName);
glTexEnvi(GL_TEXTURE_SHADER_NV,
          GL_SHADER_OPERATION_NV, GL_DOT_PRODUCT_REFLECT_CUBE_MAP_NV);
glTexEnvi(GL_TEXTURE_SHADER_NV,
          GL_PREVIOUS_TEXTURE_INPUT_NV, GL_TEXTURE0_ARB);
glTexEnvi(GL_TEXTURE_SHADER_NV,
          GL_RGBA_UNSIGNED_DOT_PRODUCT_MAPPING_NV,
          GL_EXPAND_NORMAL_NV);
#endif
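The texture-shader setup above still needs per-vertex texture coordinates for units 1 to 3 that carry the rows of the 3 × 3 modeling matrix and the camera position, as described in the text. The fragment below is a hedged sketch of that per-vertex setup; the exact packing of the matrix rows and the eye position into the (s, t, r, q) components is an assumption and should be checked against the GL_NV_texture_shader specification and the accompanying source code.

// hedged sketch: rowX[3] are the rows of the local 3x3 modeling matrix and
// eye[3] is the camera position; the (s,t,r,q) packing is an assumption
void emitSliceVertex(const float row0[3], const float row1[3],
                     const float row2[3], const float eye[3],
                     const float texcoord[3], const float vertex[3])
{
    glMultiTexCoord3fARB(GL_TEXTURE0_ARB, texcoord[0], texcoord[1], texcoord[2]);
    glMultiTexCoord4fARB(GL_TEXTURE1_ARB, row0[0], row0[1], row0[2], eye[0]);
    glMultiTexCoord4fARB(GL_TEXTURE2_ARB, row1[0], row1[1], row1[2], eye[1]);
    glMultiTexCoord4fARB(GL_TEXTURE3_ARB, row2[0], row2[1], row2[2], eye[2]);
    glVertex3fv(vertex);
}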
High-Quality Volume Graphics
on Consumer PC Hardware
Classification
Klaus Engel
Markus Hadwiger
Joe M. Kniss
Christof Rezk-Salama
Course Notes 42
Introduction
The role of the transfer function in direct volume rendering is essential. Its job is to assign
optical properties to more abstract data values. It is these optical properties that we use
to render a meaningful image. While the process of transforming data values into optical
properties is simply implemented as a table lookup, specifying a good transfer function can
be a very difficult task. In this section we will identify and explain the optical properties
used in the traditional volume rendering pipeline, explore the use of shadows in volume
rendering, demonstrate the utility of an expanded transfer function, and discuss the process
of setting a good transfer function.
Figure 14.1: The Bonsai Tree CT. Volume shading with an extended transfer function,
described clockwise from top. Upper-left: Surface shading. Upper-right: Direct attenuation, or volume shadows. Lower-right: Direct and Indirect lighting. Lower-left: Direct and
Indirect lighting with surface shading only on the leaves.
Transfer Functions
Evaluating a transfer function using graphics hardware effectively amounts to an arbitrary
function evaluation of data value via a table lookup. There are two methods to accomplish
this.
The first method uses glColorTable() to store a user-defined 1D lookup table,
which encodes the transfer function. When GL_COLOR_TABLE is enabled, this function
replaces an 8-bit texel with the RGBA components at that 8-bit value's position in
the lookup table. Some high-end graphics cards permit lookups based on 12-bit texels.
On some commodity graphics cards, such as the NVIDIA GeForce, the color table is an
extension known as paletted texture. On these platforms, the use of the color table
requires that the data texture have an internal format of GL_COLOR_INDEX*_EXT,
where * is the number of bits of precision that the data texture will have (1, 2, 4, or 8). Other
platforms may require that the data texture's internal format be GL_INTENSITY8.
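As an illustration of the first method, the following sketch downloads an 8-bit volume with a paletted internal format and uploads the transfer function as its color table. It assumes the EXT_paletted_texture and EXT_shared_texture_palette extensions; the variable names are placeholders and not taken from the accompanying source code.

/* sketch, assuming EXT_paletted_texture / EXT_shared_texture_palette */
void uploadVolumeWithPalette(const GLubyte lut[256][4], const GLubyte *volumeData,
                             int sx, int sy, int sz)
{
    /* upload the 256-entry transfer function as a shared texture palette */
    glEnable(GL_SHARED_TEXTURE_PALETTE_EXT);
    glColorTableEXT(GL_SHARED_TEXTURE_PALETTE_EXT, GL_RGBA8, 256,
                    GL_RGBA, GL_UNSIGNED_BYTE, lut);

    /* download the data texture with a paletted internal format */
    glTexImage3D(GL_TEXTURE_3D, 0, GL_COLOR_INDEX8_EXT, sx, sy, sz, 0,
                 GL_COLOR_INDEX, GL_UNSIGNED_BYTE, volumeData);

    /* when the transfer function changes, only the small table is re-uploaded */
}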
The second method uses dependent texture reads. A dependent texture read is
the process by which the color components from one texture are converted to texture
coordinates and used to read from a second texture. In volume rendering, the first texture
is the data texture and the second is the transfer function. The GL extensions and function
calls that enable this feature vary depending on the hardware, but the functionality is
Figure 15.1: The action of the transfer function.
Figure 15.2: Pre-classification (left) versus post-classification (right)
equivalent. On the GeForce3 and GeForce4, this functionality is part of the Texture
Shader extensions. On the ATI Radeon 8500, dependent texture reads are part of the
Fragment Shader extension. While dependent texture reads can be slower than using
a color table, they are much more flexible. Dependent texture reads can be used to evaluate
multi-dimensional transfer functions, discussed later in this chapter, or they can be used
for pre-integrated transfer function evaluations, discussed in the next chapter. Since the
transfer function can be stored as a regular texture, dependent texture reads also permit
transfer functions which define more than four optical properties.
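As a sketch of the second method on the GeForce3/4 texture shaders, the configuration below performs a dependent lookup into a 2D transfer function texture using the alpha and red components returned by unit 0 (for example, data value and gradient magnitude). The texture handle is a placeholder, and the exact shader-operation choice should be checked against the accompanying source code.

#if defined GL_NV_texture_shader
// texture unit 0: sample the data texture (values in the alpha/red channels)
glActiveTextureARB(GL_TEXTURE0_ARB);
glEnable(GL_TEXTURE_SHADER_NV);
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_TEXTURE_3D);

// texture unit 1: dependent read into the 2D transfer function texture,
// using the (alpha, red) result of unit 0 as texture coordinates
glActiveTextureARB(GL_TEXTURE1_ARB);
glEnable(GL_TEXTURE_SHADER_NV);
glBindTexture(GL_TEXTURE_2D, transferFunctionTexName);   // placeholder handle
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV,
          GL_DEPENDENT_AR_TEXTURE_2D_NV);
glTexEnvi(GL_TEXTURE_SHADER_NV, GL_PREVIOUS_TEXTURE_INPUT_NV, GL_TEXTURE0_ARB);
#endif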
Why do we need a transfer function, i.e. why not store the optical properties in the
volume directly? There are at least two answers to this question. First, it is inefficient to
update the entire volume and reload it each time the transfer function changes. It is much
faster to load the smaller lookup table and let the hardware handle the transformation
from data value to optical properties. Second, evaluating the transfer function at each
sample prior to interpolation is referred to as pre-classification. Pre-classification can
cause significant artifacts in the final rendering, especially when there is a sharp peak
in the transfer function. An example of pre-classification can be seen on the left side of
Figure 15.2. A similar rendering using post-classification is seen on the right. It should be
no surprise that interpolating colors from the volume sample points does not adequately
capture the behavior of the data.
In the traditional volume rendering pipeline, the transfer function returns color (RGB)
and opacity (α). User interfaces for transfer function specification will be discussed later
in this chapter. Figure 15.3 shows an example of an arbitrary transfer function. While
this figure shows RGBα varying as piece-wise linear ramps, the transfer function can also
be created using more continuous segments. The goal in specifying a transfer function
is to isolate the ranges of data values, in the transfer function domain, that correspond
to features in the spatial domain. Figure 15.4 shows an example transfer function that
isolates the bone in the Visible Male's skull. On the left, we see the transfer function. The
alpha ramp is responsible for making the bone visible, whereas the color is constant for all
of the bone. The problem with this type of visualization is that the shape and structure are
Figure 15.3: An arbitrary transfer function showing how red, green, blue, and alpha vary
as a function of data value f(x,y,z).
Figure 15.4: An example transfer function for the bone of the Visible Male (left), and the
resulting rendering (right).
not readily visible, as seen on the right side of Figure 15.4. One solution to this problem
involves a simple modification of the transfer function, called faux shading. By forcing
the color to ramp to black proportionally to the alpha ramping to zero, we can effectively
create silhouette edges in the resulting volume rendering, as seen in Figure 15.5. On the left
we see the modified transfer function. In the center, we see the resulting volume rendered
image. Notice how much clearer the features are in this image. This approach works
because the darker colors are only applied at low opacities. This means that they will only
accumulate enough to be visible when a viewing ray grazes a classified feature, as seen on
the right side of Figure 15.5. While this approach may not produce images as compelling
as surface shaded or shadowed renderings, as seen in Figure 15.6, it is advantageous because
it doesn't require any extra computation in the rendering phase.
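Faux shading can be implemented entirely as a post-process on the lookup table before it is downloaded. The fragment below is a minimal sketch; the table layout and the ramp parameter are placeholders, and the only essential part is scaling RGB toward black in proportion to the alpha ramp.

// darken the color toward black where the alpha ramp falls toward zero
void fauxShade(GLubyte lut[256][4],
               float rampTopAlpha)   /* alpha at the top of the ramp, in (0,1]; placeholder */
{
    for (int i = 0; i < 256; ++i) {
        float a = lut[i][3] / 255.0f;
        float s = (rampTopAlpha > 0.0f) ? a / rampTopAlpha : 1.0f;
        if (s > 1.0f) s = 1.0f;
        lut[i][0] = (GLubyte)(lut[i][0] * s);   // RGB ramps to black as alpha goes to zero
        lut[i][1] = (GLubyte)(lut[i][1] * s);
        lut[i][2] = (GLubyte)(lut[i][2] * s);
    }
}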
Figure 15.5: Faux shading. Modify the transfer function to create silhouette edges.
Figure 15.6: Surface shading.
Extended Transfer Function
16.1 Optical properties
The traditional volume rendering equation proposed by Levoy [24] is a simplified approximation of a more general volumetric light transport equation. This equation was first
used in computer graphics by Kajiya [19]. It describes the interaction of light
and matter, in the form of small particles, as a series of scattering and absorption events.
Unfortunately, solutions to this equation are difficult and very time consuming to compute. A survey
of this problem in the context of volume rendering can be found in [27]. The optical properties required to describe the interaction of light with a material are spectral, i.e. each
wavelength of light may interact with the material differently. The most commonly used
optical properties are absorption, scattering, and phase function. Other important optical
properties are index of refraction and emission. Volume rendering models that take into
account scattering effects are complicated by the fact that each element in the volume can
potentially contribute light to each other element. This is similar to other global illumination problems in computer graphics. For this reason, the traditional volume rendering
equation ignores scattering effects and focuses on emission and absorption only. In this
section we discuss the value and application of adding additional optical properties to the
transfer function.
16.2 Traditional volume rendering
The traditional volume rendering equation is:
Ieye = IB ∗ Te(0) + ∫_0^eye Te(s) ∗ g(s) ∗ fs(s) ds   (16.1)

Te(s) = exp( − ∫_s^eye τ(x) dx )   (16.2)
Where IB is the background light intensity, g(s) is the emission term at sample s, fs (s)
is the Blinn-Phong surface shading model evaluated using the normalized gradient of the
scalar data field at s, and τ (x) is an achromatic extinction coefficient at the sample x. For
a concise derivation of this equation and the discrete solution used in volume rendering,
see [27]. The solution to the equation and its use in hardware volume rendering was also
presented in Chapter 2 of these course notes.
The extinction term in Equation 16.2 is achromatic, meaning that it affects all wavelengths of light equally. This term can be expanded to attenuate light spectrally. The
details of this implementation are in [31]. The implementation of this process requires
additional buffers and passes. The transfer function for spectral volume rendering only
using the wavelengths of light for red, green, and blue would require a separate alpha for
each of these wavelengths.
16.3 The Surface Scalar
The emission term is the material color, and the opacity (α) in the transfer function is
derived from the extinction τ(x):

α = exp(−τ(x))   (16.3)
Since the traditional volume rendering model includes local surface shading (fs (s)), the
emission term is misleading. This model really implies that the volume is illuminated by
an external light source and the light arrives at a sample unimpeded by the intervening
volume. In this case the emission term can be thought of as a reflective color. While surface
shading can dramatically enhance the visual quality of the rendering, it cannot adequately
light homogeneous regions. Since we use the normalized gradient of the scalar field as the
surface normal for shading, we can have problems when we try to shade regions where
the normal cannot be measured. The gradient should be zero in homogeneous regions
where there is very little or no local change in data value, making the normal undefined.
In practice, data sets contain noise which further complicates the use of the gradient as
a normal. This problem can be easily handled, however, by introducing a surface scalar
(S(s)) to the rendering equation. The role of this term is to interpolate between shaded
and unshaded rendering per sample.
Ieye = IB ∗ Te(0) + ∫_0^eye Te(s) ∗ C(s) ds   (16.4)

C(s) = g(s) ((1 − S(s)) + fs(s) S(s))   (16.5)
S(s) can be computed in a variety of ways. If the gradient magnitude is available at
each sample, we can use it to compute S(s). This usage implies that only regions with a
high enough gradient magnitude should be shaded. This is reasonable since homogeneous
regions should have a very low gradient magnitude. This term loosely correlates to the
index of refraction. In practice we use:

S(s) = 1 − (1 − ||∇f(s)||)²   (16.6)
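Per sample, Equations 16.5 and 16.6 amount to a simple blend; a minimal sketch (per color channel, with the gradient magnitude assumed to be normalized to [0, 1]) is:

// blend between unshaded and surface-shaded emission (Equations 16.5 / 16.6)
float shadeSample(float g,        /* emission term                      */
                  float fs,       /* surface shading result             */
                  float gradMag)  /* normalized gradient magnitude      */
{
    float S = 1.0f - (1.0f - gradMag) * (1.0f - gradMag);   // surface scalar, Eq. 16.6
    return g * ((1.0f - S) + fs * S);                       // Eq. 16.5
}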
Figure 16.1 demonstrates the use of the surface scalar (S(s)). The image on the left
is a volume rendering of the Visible Male with the soft tissue (a relatively homogeneous
material) surface shaded, illustrating how this region is poorly illuminated. On the right,
only samples with high gradient magnitudes are surface shaded.
16.4 Shadows
Surface shading improves the visual quality of volume renderings. However, the lighting
model is rather unrealistic since it assumes that light arrives at a sample without interacting
Figure 16.1: Surface shading without (left) and with (right) the surface scalar.
with the portions of the volume between it and the light. Volumetric shadows can be added
to the equation:
Ieye = IB ∗ Te(0) + ∫_0^eye Te(s) ∗ C(s) ∗ fs(s) ∗ Il(s) ds   (16.7)

Il(s) = Il(0) ∗ exp( − ∫_s^light τ(x) dx )   (16.8)
Where Il (0) is the light intensity, and Il (s) is the light intensity at the sample s. Notice
that Il (s) is essentially the same as Te (s) except that the integral is computed toward the
light rather than the eye.
A hardware model for computing shadows was presented by Behrens and Ratering [4].
This model computes a second volume for storing the amount of light arriving at each
sample. The second volume is then sliced and the values at each sample are multiplied by
the colors from the original volume after the transfer function has been evaluated. This
approach, however, suffers from an artifact referred to as attenuation leakage. The visual
consequences of this are blurry shadows and surfaces which appear much darker than they
should due to the image space high frequencies introduced by the transfer function. The
attenuation at a given sample point is blurred when light intensity is stored at a coarse
resolution and interpolated during the observer rendering phase.
A simple and efficient alternative was proposed in [21]. First, rather than creating a
volumetric shadow map, an off screen render buffer is utilized to accumulate the amount of
light attenuated from the light’s point of view. Second, we modify the slice axis to be the
direction halfway between the view and light directions. This allows the same slice to be
rendered from both the eye and light points of view. Consider the situation for computing
shadows when the view and light directions are the same, as seen in Figure 16.2(a). Since
the slices for both the eye and light have a one to one correspondence, it is not necessary to
pre-compute a volumetric shadow map. The amount of light arriving at a particular slice
is equal to one minus the accumulated opacity of the slices rendered before it. Naturally if
the projection matrices for the eye and light differ, we need to maintain a separate buffer
for the attenuation from the light’s point of view. When the eye and light directions differ,
the volume would be sliced along each direction independently. The worst case scenario
happens when the view and light directions are perpendicular, as seen in Figure 16.2(b).
In this case, it would seem necessary to save a full volumetric shadow map which can
be re-sliced with the data volume from the eye's point of view to provide shadows. Such a
volumetric shadow map, however, would again suffer from the attenuation leakage described
above: blurry shadows and overly dark surfaces, caused by storing the attenuation at a coarse
resolution and interpolating it during the observer rendering phase.
Rather than slice along the vector defined by the view direction or the light direction,
we can modify the slice axis to allow the same slice to be rendered from both points of
view. When the dot product of the light and view directions is positive, we slice along the
vector halfway between the light and view directions, seen in Figure 16.2(c). In this case,
the volume is rendered in front to back order with respect to the observer. When the dot
product is negative, we slice along the vector halfway between the light and the inverted
view directions, seen in Figure 16.2(d). In this case, the volume is rendered in back to
front order with respect to the observer. In both cases the volume is rendered in front to
back order with respect to the light. Care must be taken to ensure that the slice spacing
along the view and light directions is maintained when the light or eye positions change.
If the desired slice spacing along the view direction is dv and the angle between v and l is
θ, then the slice spacing along the slice direction is

ds = cos(θ/2) dv.   (16.9)
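A small sketch of the slice-axis selection and the corrected spacing (Equation 16.9) is given below; v and l are assumed to be normalized, and the half-angle is computed from the dot product via cos(θ/2) = sqrt((1 + cos θ)/2).

#include <math.h>

// choose the slicing axis halfway between view and light and adjust the spacing
void sliceAxis(const float v[3], const float l[3], float dv,
               float axis[3], float *ds)
{
    float d    = v[0]*l[0] + v[1]*l[1] + v[2]*l[2];   // cos(theta) between v and l
    float sign = (d >= 0.0f) ? 1.0f : -1.0f;          // invert v for the back-to-front case
    for (int i = 0; i < 3; ++i)
        axis[i] = sign * v[i] + l[i];
    float len = sqrtf(axis[0]*axis[0] + axis[1]*axis[1] + axis[2]*axis[2]);
    for (int i = 0; i < 3; ++i)
        axis[i] /= len;
    *ds = dv * sqrtf(0.5f * (1.0f + fabsf(d)));       // cos(theta/2) * dv, Equation 16.9
}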
This is a multi-pass approach. Each slice is first rendered from the observer's point of
view using the results of the previous pass from the light's point of view, which modulate
the brightness of samples in the current slice. The same slice is then rendered from the
light's point of view to calculate the intensity of the light arriving at the next layer.
Since we must keep track of the amount of light attenuated at each slice, we utilize an
off-screen render buffer, known as a pixel buffer. This buffer is initialized to 1 − light intensity.
It can also be initialized using an arbitrary image to create effects such as spotlights. The
projection matrix for the light's point of view need not be orthographic; a perspective
projection matrix can be used for point light sources. However, the entire volume must fit
in the light's view frustum. Light is attenuated by simply accumulating the opacity for each
sample using the over operator. The results are then copied to a texture which is multiplied
Figure 16.2: Modified slice axis for light transport.
with the next slice from the eye's point of view before it is blended into the frame buffer.
While this copy-to-texture operation has been highly optimized on the current generation of
graphics hardware, we have achieved a dramatic increase in performance using a hardware
extension known as render to texture. This extension allows us to directly bind a pixel
buffer as a texture, avoiding the unnecessary copy operation. The two-pass process is
illustrated in Figure 16.3.
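The per-slice structure of the two-pass algorithm can be summarized as follows; all helper functions are placeholders for the application's own slicing and pbuffer code, not part of any specific API.

/* placeholders for the application's own slicing and pbuffer handling */
void bindLightBufferAsTexture(void);
void releaseLightBufferTexture(void);
void renderSliceForEye(int slice);      /* pass 1: blend into the frame buffer    */
void renderSliceForLight(int slice);    /* pass 2: accumulate opacity (over op.)  */
void makeLightBufferCurrent(void);
void makeFrameBufferCurrent(void);

/* hedged per-frame sketch of the two-pass shadow algorithm */
void renderShadowedVolume(int numSlices)
{
    for (int i = 0; i < numSlices; ++i) {
        /* pass 1: eye view, modulated by the attenuation accumulated so far */
        bindLightBufferAsTexture();
        renderSliceForEye(i);
        releaseLightBufferTexture();

        /* pass 2: light view, update the attenuation for the next slice */
        makeLightBufferCurrent();
        renderSliceForLight(i);
        makeFrameBufferCurrent();
    }
}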
16.5 Translucency
Shadows can add valuable depth cues and dramatic effects to a volume rendered scene.
Even if the technique for rendering shadows can avoid attenuation leakage, the images can
still appear too dark. This is not an artifact; it is an accurate rendering of materials which
only absorb light and do not scatter it. As noted at the beginning of this chapter, volume
rendering models that account for scattering effects are too computationally expensive
for interactive hardware-based approaches. This means that approximations are needed
Figure 16.3: Two-pass shadows. Step 1 (left): render a slice for the eye, multiplying it by
the attenuation from the light buffer. Step 2 (right): render the slice into the light buffer to
update the attenuation for the next pass.
(a) Wax
(b) Translucent rendering
(c) Different reflective color
(d) Just shadows
Figure 16.4: Translucent volume shading. (a) is a photograph of a wax block illuminated
from above with a focused flashlight. (b) is a volume rendering with a white reflective
color and a desaturated orange transport color (1 − indirect attenuation). (c) has a bright
blue reflective color and the same transport color as (b). (d) shows the
effect of light transport that only takes into account direct attenuation.
to capture some of the effects of scattering. One such visual consequence of scattering
in volumes is translucency. Translucency is the effect of light propagating deep into a
material even though objects occluded by it cannot be clearly distinguished. Figure 16.4(a)
shows a common translucent object, wax. Other translucent objects are skin, smoke, and
clouds. Several simplified optical models for hardware-based rendering of clouds have been
proposed [17, 9]. These models are capable of producing realistic images of clouds, but do
not easily extend to general volume rendering applications.
The previously presented model for computing shadows can easily be extended to
achieve the effect of translucency. Two modifications are required. First, we require a
second alpha value (αi ) which represents the amount of indirect attenuation. This value
should be less than or equal to the alpha value for the direct attenuation. Second, we
require an additional light buffer for blurring the indirect attenuation. The translucent
(a) General Light Transport
(b) Translucency Approximation
Figure 16.5: On the left is the general case of direct illumination Id and scattered indirect
illumination Ii . On the right is a translucent shading model which includes the direct
illumination Id and approximates the indirect, Ii , by blurring within the shaded region.
Theta is the angle indicated by the shaded region.
volume rendering model is:

Ieye = I0 ∗ Te(0) + ∫_0^eye Te(s) ∗ C(s) ∗ Il(s) ds   (16.10)

Il(s) = Il(0) ∗ exp( − ∫_s^light τ(x) dx ) + Il(0) ∗ exp( − ∫_s^light τi(x) dx ) Blur(θ)   (16.11)
Where τi (s) is the indirect light extinction term, C(s) is the reflective color at the sample
s, S(s) is a surface shading parameter, and Il is the sum of the direct light and the indirect
light contributions.
The indirect extinction term is spectral, meaning that it describes the indirect attenuation of light for each of the R, G, and B color components. Similar to the direct extinction,
the indirect attenuation can be specified in terms of an indirect alpha:
αi = exp(−τi(x))   (16.12)
While this is useful for computing the attenuation, we have found it non-intuitive for user
specification. We prefer to specify a transport color which is 1 − αi since this is the color
the indirect light will become as it is attenuated by the material.
In general, light transport in participating media must take into account the incoming
light from all directions, as seen in Figure 16.5(a). However, the net effect of multiple
scattering in volumes is a blurring of light. The diffusion approximation [40, 13] models
the light transport in multiple scattering media as a random walk. This results in the
light being diffused within the volume. The Blur(θ) operation in Equation 16.11 averages the incoming light within the cone with an apex angle θ in the direction of the light
(Figure 16.5(b)). The indirect lighting at a particular sample is only dependent on a local
neighborhood of samples computed in the previous iteration and shown as the arrows between slices. This operation models light diffusion by convolving several random sampling
points with a Gaussian filter.
The process of rendering using translucency is essentially the same as rendering shadows.
In the first pass, a slice is rendered from the point of view of the observer. However,
rather than simply multiplying the sample's color by one minus the direct attenuation, we sum
one minus the direct and one minus the indirect attenuation to compute the light intensity
at the sample. In the second pass, a slice is rendered into the next light buffer from the
light's point of view to compute the lighting for the next iteration. Two light buffers are
maintained to accommodate the blur operation required for the indirect attenuation: next
is the one being rendered to and current is the one bound as a texture. Rather than
blend slices using a standard OpenGL blend operation, we explicitly compute the blend in
the fragment shading stage. The current light buffer is sampled once in the first pass,
for the observer, and multiple times in the second pass, for the light, using the render-to-texture
OpenGL extension. The next light buffer, in contrast, is rendered to only in the second
pass. This relationship changes after the second pass so that the next buffer becomes
the current and vice versa. We call this approach ping-pong blending. In the fragment
shading stage, the texture coordinates for the current light buffer, in all but one texture
unit, are modified per-pixel using a random noise texture, as discussed in the last chapter of
these course notes. The number of samples used for the computation of the indirect light
is limited by the number of texture units. Currently, we use four samples. Randomizing
the sample offsets masks some artifacts caused by this coarse sampling. The amount of
this offset is bounded based on a user-defined blur angle (θ) and the sample distance (d):

offset ≤ d tan(θ/2)   (16.13)
The current light buffer is then read using the new texture coordinates. These values
are weighted and summed to compute the blurred inward flux at the sample. The transfer
function is evaluated for the incoming slice data to obtain the indirect attenuation (αi ) and
direct attenuation (α) values for the current slice. The blurred inward flux is attenuated
using αi and written to the RGB components of the next light buffer. The alpha value
from the current light buffer with the unmodified texture coordinates is blended with the
α value from the transfer function to compute the direct attenuation and stored in the
alpha component of the next light buffer.
This process is enumerated below:
1. Clear the color buffer.
2. Initialize the pixel buffer with 1 − light color (or a light map).
3. Set the slice direction to halfway between the light and observer view directions.
4. For each slice:
(a) Determine the locations of the slice vertices in the light buffer.
(b) Convert these light buffer vertex positions to texture coordinates.
(c) Bind the light buffer as a texture using these texture coordinates.
(d) In the per-fragment blend stage:
i. Evaluate the transfer function for the reflective color and direct attenuation.
ii. Evaluate the surface shading model if desired; this replaces the reflective color.
iii. Evaluate the phase function, a lookup using the dot product of the view and light
directions.
iv. Multiply the reflective color by 1 − (direct attenuation) from the light buffer.
v. Multiply the reflective*direct color by the phase function.
vi. Multiply the reflective color by 1 − (indirect attenuation) from the light buffer.
vii. Sum the direct*reflective*phase and indirect*reflective terms to get the final sample color.
viii. The alpha value is the direct attenuation from the transfer function.
(e) Render and blend the slice into the frame buffer for the observer's point of view.
(f) Render the slice from the light's point of view: render the slice to the position in the
light buffer used for the observer slice.
(g) In the per-fragment blend stage:
i. Evaluate the transfer function for the direct and indirect attenuation.
ii. Sample the light buffer at multiple locations.
iii. Weight and sum the samples to compute the blurred indirect attenuation; the
weights are given by the blur kernel applied to the indirect attenuation at each sample.
iv. Blend the blurred indirect and un-blurred direct attenuation with the values
from the transfer function.
(h) Render the slice into the correct light buffer.
While this process may seem quite complicated, it is straightforward to implement.
The render-to-texture extension is part of the WGL_ARB_render_texture OpenGL extension.
The key functions are wglBindTexImageARB(), which binds a P-Buffer as a
texture, and wglReleaseTexImageARB(), which releases a bound P-Buffer so that it
may be rendered to again. The texture coordinates of a slice's light intensities from a light
buffer are the 2D positions that the slice's vertices project to in the light buffer, scaled and
biased so that they are in the range zero to one.
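A minimal sketch of the ping-pong handling with WGL_ARB_render_texture is given below; the pbuffer variables are placeholders (their creation is omitted), and the header name is an assumption.

#include <windows.h>
#include <GL/gl.h>
#include "wglext.h"   /* assumed header for HPBUFFERARB, wglBindTexImageARB, ... */

/* the two light pbuffers; creation via WGL_ARB_pbuffer is omitted here */
static HPBUFFERARB currentBuf, nextBuf;

/* hedged sketch: after each slice the two buffers swap roles ("ping pong blending") */
void pingPongLightBuffers(void)
{
    /* 'currentBuf' is bound as a texture while 'nextBuf' is rendered to */
    wglBindTexImageARB(currentBuf, WGL_FRONT_LEFT_ARB);
    /* ... render the slice into 'nextBuf' from the light's point of view ... */
    wglReleaseTexImageARB(currentBuf, WGL_FRONT_LEFT_ARB);

    /* exchange roles for the following slice */
    HPBUFFERARB tmp = currentBuf;
    currentBuf = nextBuf;
    nextBuf = tmp;
}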
Computing volumetric light transport in screen space is advantageous because the resolution of these calculations and the resolution of the volume rendering can match. This
means that the resolution of the light transport is decoupled from that of the data volume’s
grid, permitting procedural volumetric texturing, which will be described in the following
chapters.
16.6 Summary
Rendering and shading techniques are important for volume graphics, but they would
not be useful unless we had a way to transform interpolated data into optical properties.
While the traditional volume rendering model only takes into account a few basic optical
properties, it is important to consider additional optical properties. Even if these optical
properties imply a much more complicated rendering model than is possible with current
rendering techniques, adequate approximations can be developed which add considerably
to the visual quality. We anticipate that the development of multiple scattering volume
shading models will be an active area of research in the future.
In the next chapter we discuss techniques for specifying a good transfer function.
(a) Carp CT
(b) Stanford Bunny
(c) Joseph the Convicted
Figure 16.6: Example volume renderings using an extended transfer function.
Transfer Functions
Now that we have identified the basic and more exotic optical properties that describe
the visual appearance of a material, we need a way of specifying a transfer function. A user
interface for transfer function specification should fulfill some basic requirements. It should
have mechanisms that guide the user toward setting a good transfer function. It should
also be expressive, in that it should permit materials to be identified precisely. Finally, it
needs to be interactive, since in the end there may not be an automatic method suitable
for specifying a desired transfer function.
17.1 Multi-dimensional Transfer Functions
A single scalar data value need not be the only quantity used to identify the difference
between materials in a transfer function. Levoy's volume rendering model includes a 2D
transfer function. This model allows each sample to contain multiple values. These values
are the axes of a multi-dimensional transfer function. Multi-dimensional transfer functions
are implemented using dependent texture reads. If there are two values available per data
sample, the transfer function should be 2D and is stored on the graphics card as a
2D texture. See [21] for examples of multi-dimensional transfer functions applied to both
scalar data with derivative measurements and multivariate data. Adding the gradient
magnitude of a scalar dataset to the transfer function can improve our ability to isolate
material boundaries and the materials themselves. Figures 17.1(c) and 17.1(d) show how
this kind of 2D transfer function can help isolate the leaf material from the bark material
of the Bonsai Tree CT dataset.
A naive transfer function editor may simply give the user access to all of the optical
Figure 17.1: 1D (a and c) versus 2D (b and d) transfer functions.
Figure 17.2: The Design Gallery transfer function interface.
properties directly as a series of control points that define piece-wise linear (or higher order)
ramps. This can be seen in Figure 15.3. This approach can make specifying a transfer
function a tedious trial and error process. Naturally, adding dimensions to the transfer
function can further complicate a user interface.
17.2 Guidance
The effectiveness of a transfer function editor can be enhanced with features that guide the
user with data-specific information. He et al. [18] generated transfer functions with genetic
algorithms driven either by user selection of thumbnail renderings or by some objective image
fitness function. The purpose of this interface is to suggest an appropriate transfer function to
the user based on how well the user feels the rendered images capture the important features.
The Design Gallery [26] creates an intuitive interface to the entire space of all possible
transfer functions based on automated analysis and layout of rendered images. This
approach basically parameterizes the space of all possible transfer functions and stochastically
samples it, renders the volume, and groups the images based on similarity. While
this can be a time-consuming process, it is fully automated. Figure 17.2 shows an example
of this user interface.
A more data-centric approach is the Contour Spectrum [3], which visually summarizes
the space of isosurfaces in terms of metrics like surface area and mean gradient magnitude,
thereby guiding the choice of isovalue for isosurfacing, but also providing information useful
for transfer function generation. Another recent paper [1] presents a novel transfer function
Figure 17.3: A thumbnail transfer function interface.
interface in which small thumbnail renderings are arranged according to their relationship
with the spaces of data values, color, and opacity. This kind of editor can be seen in
Figure 17.3.
One of the simplest and most effective features that a transfer function interface can
include is a histogram. A histogram shows a user the behavior of data values in the
transfer function domain. In time a user can learn to read the histogram and quickly
identify features. Figure 17.4(b) shows a 2D joint histogram of the Chapel Hill CT dataset.
Notice the arches: they identify material boundaries, while the dark blobs at the bottom identify
the materials themselves.
Volume probing can also help the user identify features. This approach gives the user
a mechanism for pointing at a feature in the spatial domain. The values at this point are
then presented graphically in the transfer function interface, indicating to the user which
ranges of data values identify the feature. This approach can be tied to a mechanism
that automatically sets the transfer function based on the data values at the feature being
pointed at. This technique is called dual-domain interaction [21]. The action of this process
can be seen in Figure 17.5.
17.3 Classification
It is often helpful to identify discrete regions in the transfer function domain that correspond
to individual features. Figure 17.6 shows an integrated 2D transfer function interface.
(a) A 1D histogram. The black region represents the number of data value occurrences on
a linear scale, the grey is on a log scale. The colored regions (A,B,C) identify basic materials.
(b) A log-scale 2D joint histogram over data value and gradient magnitude (f '). The
lower image shows the location of materials (A,B,C) and material boundaries (D,E,F).
(c) A volume rendering showing all of the materials and boundaries identified above,
except air (A), using a 2D transfer function.
Figure 17.4: Material and boundary identification of the Chapel Hill CT Head with data
value alone (a) versus data value and gradient magnitude (f ’), seen in (b). The basic
materials captured by CT, air (A), soft tissue (B), and bone (C) can be identified using
a 1D transfer function as seen in (a). 1D transfer functions, however, cannot capture the
complex combinations of material boundaries; air and soft tissue boundary (D), soft tissue
and bone boundary (E), and air and bone boundary (F) as seen in (b) and (c).
Figure 17.5: Probing and dual-domain interaction.
Figure 17.6: Classification widgets
This type of interface constructs a transfer function using direct manipulation widgets.
Classified regions are modified by manipulating control points. These control points change
high-level parameters such as position, width, and optical properties. The widgets define
a specific type of classification function such as a Gaussian ellipsoid, inverted triangle, or
linear ramp. This approach is advantageous because it frees up the user to focus more
on feature identification and less on the shape of the classification function. We have also
found it useful to allow the user to paint directly into the transfer function
domain.
In all, our experience has shown that the best transfer functions are specified using an
iterative process. When a volume is first encountered, it is important to get an immediate
sense of the structures contained in the data. In many cases, a default transfer function
Figure 17.7: The “default” transfer function.
can achieve this. By assigning higher opacity to higher gradient magnitudes and varying
color based on data value, as seen in Figure 17.7, most of the important features of the
dataset are visualized. The process of probing allows the user to identify the location of
data values in the transfer function domain that correspond to these features. Dual-domain
interaction allows the user to set the transfer function by simply pointing at a feature. By
having simple control points on discrete classification widgets, the user can manipulate the
transfer function directly to expose a feature as well as they can. By iterating through
this process of exploration, specification, and refinement, a user can efficiently specify a
transfer function that produces a high-quality visualization.
High-Quality Volume Graphics
on Consumer PC Hardware
Advanced Techniques
Klaus Engel
Markus Hadwiger
Joe M. Kniss
Christof Rezk-Salama
Course Notes 42
Hardware-Accelerated
High-Quality Filtering
An important step in volume rendering is the reconstruction of the original signal from the
sampled volume data (section 2.2). This step involves the convolution of the sampled signal
with a reconstruction kernel. Unfortunately, current graphics hardware only supports
linear filters, which do not provide sufficient quality for a high-quality reconstruction of
the original signal. Although higher-order filters are able to achieve much better quality
than linear interpolation, they are usually only used for filtering in software algorithms.
However, by exploiting the features of programmable consumer graphics hardware,
high-quality filtering with filter kernels of higher order than linear interpolation can be
done in real-time, although the hardware itself does not support such filtering operations
natively [15, 16]. This chapter gives an overview of hardware-accelerated high-quality
filtering. Examples for high-quality filters that achieve a good trade-off between speed and
quality can be seen in figure 18.2.
Basically, input textures are filtered by convolving them with an arbitrary filter kernel,
which itself is stored in several texture maps. Since the filter function is represented
by an array of sampled values, the basic algorithm works irrespective of the shape of this
function. However, kernel properties such as separability and symmetry can be exploited to
gain higher performance. The basic algorithm is also independent from the dimensionality
of input textures and filter kernels. Thus, in the context of volume rendering, it can be used
in conjunction with all kinds of proxy geometry, regardless of whether 2D or 3D textures
are used.
18.1 Basic principle
In order to be able to employ arbitrary filter kernels for reconstruction, we have to evaluate
the well-known filter convolution sum:
g(x) = (f ∗ h)(x) = Σ_{i=x−m+1}^{x+m} f[i] h(x − i)   (18.1)
This equation describes a convolution of the discrete input samples f [x] with a continuous
reconstruction filter h(x). In the case of reconstruction, this is essentially a sliding average
of the samples and the reconstruction filter. In equation 18.1, the (finite) half-width of the
filter kernel is denoted by m.
In order to be able to exploit standard graphics hardware for performing this computation, we do not use the evaluation order commonly employed, i.e., in software-based
filtering. The convolution sum is usually evaluated in its entirety for a single output sample
Figure 18.1: Using a high-quality reconstruction filter for volume rendering. This image
compares bi-linear interpolation of object-aligned slices (A) with bi-cubic filtering using a
B-spline filter kernel (B).
at a time. That is, all the contributions of neighboring input samples (their values multiplied by the corresponding filter values) are gathered and added up in order to calculate
the final value of a certain output sample. This “gathering” of contributions is shown in
figure 18.3(a). This figure uses a simple tent filter as an example. It shows how a single output sample is calculated by adding up two contributions. The first contribution is
gathered from the neighboring input sample on the left-hand side, and the second one is
gathered from the input sample on the right-hand side. For generating the desired output
data in its entirety, this is done for all corresponding resampling points (output sample
locations).
In the case of this example, the convolution results in linear interpolation, due to the
Figure 18.2: Example filter kernels of width four: (a) Cubic B-spline and Catmull-Rom
spline; (b) Blackman windowed sinc, depicting also the window itself.
tent filter employed. However, we are only using a tent filter for simplifying the explanation.
In practice, kernels of arbitrary width and shape can be used, and using a tent filter for
hardware-accelerated high-quality filtering would not make much sense.
In contrast to the evaluation order just outlined, hardware-accelerated high-quality
filtering uses a different order. Instead of focusing on a single output sample at any one
time, it calculates the contribution of a single input sample to all corresponding output
sample locations (resampling points) first. That is, the contribution of an input sample
is distributed to its neighboring output samples, instead of the other way around. This
“distribution” of contributions is shown in figure 18.3(b). In this case, the final value
Figure 18.3: Gathering vs. distribution of input sample contributions (tent filter): (a)
Gathering all contributions to a single output sample (b) Distributing a single input sample’s contribution.
Figure 18.4: Distributing the contributions of all “left-hand” (a), and all “right-hand” (b)
neighbors, when using a tent filter as a simple example for the algorithm.
of a single output sample is only available when all corresponding contributions of input
samples have been distributed to it.
The convolution sum is evaluated in this particular order, since the distribution of the
contributions of a single relative input sample can be done in hardware for all output samples (pixels) simultaneously. The final result is gradually built up over multiple rendering
passes. The term relative input sample location denotes a relative offset of an input sample
with respect to a set of output samples.
In the example of a one-dimensional tent filter (like in figure 18.3(a)), there are two
relative input sample locations. One could be called the “left-hand neighbor,” the other
the “right-hand neighbor.” In the first pass, the contribution of all respective left-hand
neighbors is calculated. The second pass then adds the contribution of all right-hand
neighbors. Note that the number of passes depends on the filter kernel used, see below.
Thus, the same part of the filter convolution sum is added to the previous result for
each pixel at the same time, yielding the final result after all parts have been added up.
From this point of view, the graph in figure 18.3(b) depicts both rendering passes that are
necessary for reconstruction with a one-dimensional tent filter, but only with respect to the
contribution of a single input sample. The contributions distributed simultaneously in a
single pass are depicted in figure 18.4, respectively. In the first pass, the contributions of all
relative left-hand neighbors are distributed. Consequently, the second pass distributes the
contributions of all relative right-hand neighbors. Adding up the distributed contributions
of these two passes yields the final result for all resampling points (i.e., linearly interpolated
output values in this example).
Figure 18.5 shows this from a more hardware-oriented perspective. We call a segment
of the filter kernel from one integer location to the next a filter tile. Naturally, such a tile
has length one. Thus, a one-dimensional tent filter has two filter tiles, corresponding to
Figure 18.5: Tent filter (width two) used for reconstruction of a one-dimensional function in
two passes. Imagine the values of the output samples added together from top to bottom.
the fact that it has width two.
Each pass uses two simultaneous textures, one texture unit point-sampling the original
input texture, and the second unit using the current filter tile texture. These two textures
are superimposed, multiplied, and added to the frame buffer. In this way, the contribution
of a single specific filter tile to all output samples is calculated in a single rendering pass.
The input samples used in a single pass correspond to a specific relative input sample
location or offset with regard to the output sample locations. That is, in one pass the
input samples with relative offset zero are used for all output samples, then the samples
with offset one in the next pass, and so on. The number of passes necessary is equal to the
number of filter tiles the filter kernel used consists of.
Note that the subdivision of the filter kernel into its tiles is crucial to hardware-accelerated high-quality filtering, and necessary in order to attain a correct mapping between locations in the input data and the filter kernel, and to achieve a consistent evaluation
order of passes everywhere.
A convolution sum can be evaluated in this way, since it needs only two basic inputs:
the input samples, and the filter kernel. Because we change only the order of summation
but leave the multiplication untouched, we need these two available at the same time.
Therefore, we employ multi-texturing with (at least) two textures and retrieve input samples from the first texture, and filter kernel values from the second texture. Actually, due
to the fact that only a single filter tile is needed during a single rendering pass, all tiles
are stored and downloaded to the graphics hardware as separate textures. The required
replication of tiles over the output sample grid is easily achieved by configuring the hardware to automatically extend the texture domain beyond [0, 1] by simply repeating the
texture via the texture wrap mode GL_REPEAT. In order to fetch input samples in unmodified form, nearest-neighbor interpolation has to be used for the input texture. If a given hardware architecture supports 2n simultaneous textures, the number of passes can be reduced by a factor of n. That is, with two-texture multi-texturing four passes are needed for
filtering with a cubic kernel in one dimension, whereas with four-texture multi-texturing
only two passes are needed, etc.
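To make the evaluation order concrete, the following CPU sketch performs a one-dimensional resampling in exactly this tile-by-tile order. It assumes a cubic B-spline kernel of width four and a constant fractional resampling offset; the function and variable names are ours and do not appear in the accompanying source code.

// CPU reference of the pass ordering used by hardware filter-tile evaluation.
// One "pass" adds the contribution of one relative input sample (one filter
// tile) to all resampling points at once, just like the additive render passes.
#include <cmath>
#include <vector>

// Cubic B-spline, nonzero on [-2, 2] (an assumed example kernel of width four).
static float cubicBSpline(float t)
{
    t = std::fabs(t);
    if (t < 1.0f) return (4.0f - 6.0f * t * t + 3.0f * t * t * t) / 6.0f;
    if (t < 2.0f) { float u = 2.0f - t; return u * u * u / 6.0f; }
    return 0.0f;
}

// Reconstruct the input at positions k + frac (k = 0 .. numOut-1), one filter
// tile per pass, accumulating into the output exactly as the GPU algorithm does.
std::vector<float> resampleTileByTile(const std::vector<float>& input,
                                      float frac, int numOut)
{
    std::vector<float> output(numOut, 0.0f);
    const int numTiles = 4;                        // kernel width four -> four passes
    for (int pass = 0; pass < numTiles; ++pass) {
        int   offset = pass - 1;                   // relative input sample: -1, 0, +1, +2
        float weight = cubicBSpline(frac - float(offset)); // tile value (constant here,
                                                           // since frac is constant in this sketch)
        for (int k = 0; k < numOut; ++k) {
            int idx = k + offset;                  // nearest-neighbor "fetch" of the input sample
            if (idx < 0 || idx >= int(input.size())) continue;
            output[k] += weight * input[idx];      // additive blend into the "frame buffer"
        }
    }
    return output;
}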
This algorithm is not limited to symmetric filter kernels, although symmetry can be
exploited in order to save texture memory for the filter tile textures. It is also not limited to
separable filter kernels, although exploiting separability can greatly enhance performance.
Additionally, the algorithm is identical for orthogonal and perspective projections of
the resulting images. Basically, it reconstructs at single locations in texture space, which
can be viewed as happening before projection. Thus, it is independent of the projection
used.
Note that the method outlined above does not consider area-averaging filters, since we
are assuming that magnification is desired instead of minification. This is in the vein of
graphics hardware using bi-linear interpolation for magnification, and other approaches,
usually mip-mapping, to deal with minification.
18.2 Reconstructing Object-Aligned Slices
When enlarging images or reconstructing object-aligned slices through volumetric data
taken directly from a stack of such slices, high-order two-dimensional filters can be used in
order to achieve high-quality results.
The basic algorithm outlined in the previous section for one dimension can easily be
applied in two dimensions, exploiting two-texture multi-texturing hardware in multiple
rendering passes. For each output pixel and pass, the algorithm takes two inputs: Unmodified (i.e., unfiltered) slice values, and filter kernel values. That is, two 2D textures are
used simultaneously. One texture contains the entire source slice, and the other texture
contains the filter tile needed in the current pass, which in this case is a two-dimensional
unit square.
In addition to using the appropriate filter tile, in each pass an appropriate offset has
to be applied to the texture coordinates of the texture containing the input image. As
explained in the previous section, each pass corresponds to a specific relative location of
an input sample. Thus, the slice texture coordinates have to be offset and scaled in order
to match the point-sampled input image grid with the grid of replicated filter tiles. In the
case of a cubic filter kernel for bi-cubic filtering, sixteen passes need to be performed on
two-texture multi-texturing hardware. However, the number of rendering passes actually
needed can be reduced through several optimizations [16].
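As a sketch of how the sixteen passes might be organized, the following (hypothetical) helper enumerates one pass per 2D filter tile together with the texel offset that is applied to the slice texture coordinates; the struct and function names are purely illustrative.

// Sketch: per-pass parameters for bi-cubic filtering on two-texture hardware.
#include <vector>

struct FilterPass {
    int   tileX, tileY;                 // which 2D filter tile texture to bind
    float texelOffsetX, texelOffsetY;   // offset (in texels) for the point-sampled slice
};

// A cubic kernel of width four yields 4 x 4 = 16 tiles, i.e. 16 passes.
std::vector<FilterPass> makeBicubicPasses()
{
    std::vector<FilterPass> passes;
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            // relative input sample locations range from -1 to +2 in each axis
            passes.push_back({ i, j, float(i - 1), float(j - 1) });
    return passes;
}
// Per pass, the slice texture coordinates are shifted by
// (texelOffsetX / width, texelOffsetY / height); the tile texture reuses the
// slice coordinates scaled by (width, height) with GL_REPEAT, so that one
// replicated tile covers exactly one input texel.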
18.3 Reconstructing View-Aligned Slices
When planar slices through 3D volumetric data are allowed to be located and oriented
arbitrarily, three-dimensional filtering has to be performed although the result is still two-dimensional. On graphics hardware, this is usually done by tri-linearly interpolating within
a 3D texture. Hardware-accelerated high-quality filtering can also be applied in this case
in order to improve reconstruction quality considerably.
The conceptually straightforward extension of the 2D approach described in the previous section (simultaneously using two 2D textures) achieves the equivalent for three-dimensional reconstruction by simultaneously using two 3D textures. The first 3D texture
contains the input volume in its entirety, whereas the second 3D texture contains the
current filter tile, which in this case is a three-dimensional unit cube.
In the case of a cubic filter kernel for tri-cubic filtering, 64 passes need to be performed
on two-texture multi-texturing hardware. If such a kernel is symmetric, downloading eight
3D textures for the filter tiles suffices, generating the remaining 56 without any performance
loss by mirroring texture coordinates. Due to the high memory consumption of 3D textures,
it is especially important that the filter kernel need not be downloaded to the graphics
hardware in its entirety if it is symmetric.
If the filter kernel is separable, which fortunately many filters used for reconstruction
purposes are, no 3D textures are required for storing the kernel [16]. If the kernel is both
symmetric and separable, tri-cubic filtering can be achieved with just two 1D filter tile
textures, each of which usually contains between 64 and 128 samples!
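For a symmetric and separable kernel, the tile textures themselves are tiny. The sketch below samples the two distinct 1D tiles of a cubic B-spline into small arrays that could be uploaded as 1D filter tile textures; the kernel choice and the 64-sample resolution are assumptions, and the remaining two tiles are obtained by mirroring.

// Sketch: sampling the two distinct 1D tiles of a symmetric cubic kernel.
#include <cmath>
#include <vector>

static float cubicBSpline(float t)      // assumed example kernel, width four
{
    t = std::fabs(t);
    if (t < 1.0f) return (4.0f - 6.0f * t * t + 3.0f * t * t * t) / 6.0f;
    if (t < 2.0f) { float u = 2.0f - t; return u * u * u / 6.0f; }
    return 0.0f;
}

// Tile 0 covers kernel arguments [0,1), tile 1 covers [1,2); the tiles for
// [-1,0) and [-2,-1) are their mirror images and need not be stored.
std::vector<float> sampleFilterTile(int tile, int resolution = 64)
{
    std::vector<float> samples(resolution);
    for (int i = 0; i < resolution; ++i) {
        float t = (float(i) + 0.5f) / float(resolution);  // position inside the tile
        samples[i] = cubicBSpline(float(tile) + t);
    }
    return samples;   // upload e.g. as a small 1D luminance texture
}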
18.4 Volume Rendering
Since hardware-accelerated high-quality filtering is able to reconstruct axis-aligned slices,
as well as arbitrarily oriented slices, it can naturally be used for rendering all kinds of proxy
geometry for direct volume rendering. Figure 18.1 shows an example of volume rendering
with high-quality filtered slices.
The algorithm can also be used to reconstruct gradients in high quality, in addition to
reconstructing density values. This is possible in combination with hardware-accelerated
methods that store gradients in the RGB components of a texture [41].
Pre-Integrated Classification
High accuracy in direct volume rendering is usually achieved by very high sampling
rates, because the discrete approximation of the volume rendering integral will converge
to the correct result for a small slice-to-slice distance d → 0, i.e., for high sampling rates
n/D = 1/d. However, high sampling rates result in heavy performance losses: as the rasterization requirements of the graphics hardware increase, the frame rates drop accordingly.
According to the sampling theorem, a correct reconstruction is only possible with sampling
rates larger than the Nyquist frequency. Before the data is rendered, the scalar values from
the volume are mapped to RGBA values. This classification step is achieved by introducing transfer functions for color densities c̃(s) and extinction densities τ (s), which map
scalar values s = s(x) to colors and extinction coefficients. However, non-linear features
of transfer functions may considerably increase the sampling rate required for a correct evaluation of the volume rendering integral, as the Nyquist frequency of the fields c̃(s(x)) and τ(s(x)) for the sampling along the viewing ray is approximately the product of the Nyquist frequency of the scalar field s(x) and the maximum of the Nyquist frequencies of
the two transfer functions c̃(s) and τ (s). Therefore, it is by no means sufficient to sample
a volume with the Nyquist frequency of the scalar field if non-linear transfer functions are
allowed. Artifacts resulting from this kind of undersampling are frequently observed unless
they are avoided by very smooth transfer functions.
In order to overcome the limitations discussed above, the approximation of the volume
rendering integral has to be improved. In fact, many improvements have been proposed,
e.g., higher-order integration schemes, adaptive sampling, etc. However, these methods do not explicitly address the problem of high Nyquist frequencies of c̃(s(x)) and τ(s(x))
resulting from non-linear transfer functions. On the other hand, the goal of pre-integrated
classification is to split the numerical integration into two integrations: one for the continuous scalar field s(x) and one for the transfer functions c̃(s) and τ (s) in order to avoid the
problematic product of Nyquist frequencies.
The first step is the sampling of the continuous scalar field s(x) along a viewing ray.
Note that the Nyquist frequency for this sampling is not affected by the transfer functions. For the purpose of pre-integrated classification, the sampled values define a one-dimensional, piecewise linear scalar field. The volume rendering integral for this piecewise
linear scalar field is efficiently computed by one table lookup for each linear segment. The
three arguments of the table lookup are the scalar value at the start (front) of the segment, sf := s(x(i·d)), the scalar value at the end (back) of the segment, sb := s(x((i + 1)·d)), and the length of the segment d (see Figure 19.1). More precisely, the opacity αi of
Figure 19.1: Scheme of the parameters determining the color and opacity of the i-th ray
segment.
the i-th segment is approximated by
\[
\alpha_i \;=\; 1 - \exp\!\left(-\int_{i d}^{(i+1) d} \tau\big(s(\mathbf{x}(\lambda))\big)\, d\lambda\right)
\;\approx\; 1 - \exp\!\left(-\int_{0}^{1} \tau\big((1-\omega)\,s_f + \omega\, s_b\big)\; d\, d\omega\right). \tag{19.1}
\]
Thus, αi is a function of sf , sb , and d. (Or of sf and sb , if the lengths of the segments are
equal.) The (associated) colors C̃i are approximated correspondingly:
\[
\tilde{C}_i \;\approx\; \int_{0}^{1} \tilde{c}\big((1-\omega)\,s_f + \omega\, s_b\big)\,
\exp\!\left(-\int_{0}^{\omega} \tau\big((1-\omega')\,s_f + \omega'\, s_b\big)\; d\, d\omega'\right) d\, d\omega. \tag{19.2}
\]
Analogously to αi , C̃i is a function of sf , sb , and d. Thus, pre-integrated classification will
approximate the volume rendering integral by evaluating the following Equation:
\[
I \;\approx\; \sum_{i=0}^{n} \tilde{C}_i \prod_{j=0}^{i-1} \left(1 - \alpha_j\right)
\]
with colors C̃i pre-computed according to Equation (19.2) and opacities αi pre-computed
according to Equation (19.1). For a non-associated color transfer function, i.e., when substituting c̃(s) by τ(s)c(s), we also employ Equation (19.1) for the approximation of αi
and the following approximation of the associated color C̃iτ :
\[
\tilde{C}_i^{\tau} \;\approx\; \int_{0}^{1} \tau\big((1-\omega)\,s_f + \omega\, s_b\big)\, c\big((1-\omega)\,s_f + \omega\, s_b\big)\,
\exp\!\left(-\int_{0}^{\omega} \tau\big((1-\omega')\,s_f + \omega'\, s_b\big)\; d\, d\omega'\right) d\, d\omega. \tag{19.3}
\]
Note that pre-integrated classification always computes associated colors, whether a transfer function for associated colors c̃(s) or for non-associated colors c(s) is employed.
In either case, pre-integrated classification allows us to sample a continuous scalar field
s(x) without the need to increase the sampling rate for any non-linear transfer function.
Therefore, pre-integrated classification has the potential to improve the accuracy (less
undersampling) and the performance (fewer samples) of a volume renderer at the same
time.
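A brute-force pre-integration of such a table (for a constant segment length d) could look like the following CPU sketch, which evaluates Equations (19.1) and (19.2) with a simple midpoint rule; the transfer-function arrays, the table layout, and the 64 integration steps are assumptions for illustration.

// Sketch: brute-force numerical pre-integration of the 2D lookup table.
#include <cmath>
#include <vector>

struct PreIntEntry { float r, g, b, a; };

std::vector<PreIntEntry> preIntegrate(const std::vector<float>& tau,  // extinction per scalar value
                                      const std::vector<float>& cr,   // associated color (red)
                                      const std::vector<float>& cg,   // associated color (green)
                                      const std::vector<float>& cb,   // associated color (blue)
                                      float d, int steps = 64)
{
    const int n = int(tau.size());                 // e.g. 256 quantized scalar values
    std::vector<PreIntEntry> table(n * n);
    for (int sf = 0; sf < n; ++sf) {
        for (int sb = 0; sb < n; ++sb) {
            float attenuation = 0.0f;              // inner integral of Eqs. (19.1)/(19.2)
            float R = 0.0f, G = 0.0f, B = 0.0f;
            const float dw = d / float(steps);
            for (int k = 0; k < steps; ++k) {      // midpoint rule along the segment
                float w  = (k + 0.5f) / float(steps);
                int   si = int((1.0f - w) * sf + w * sb + 0.5f);
                float trans = std::exp(-attenuation);
                R += cr[si] * trans * dw;          // Eq. (19.2), associated colors
                G += cg[si] * trans * dw;
                B += cb[si] * trans * dw;
                attenuation += tau[si] * dw;       // accumulate extinction
            }
            table[sb * n + sf] = { R, G, B, 1.0f - std::exp(-attenuation) };  // Eq. (19.1)
        }
    }
    return table;
}

Its cost grows with the square of the number of scalar values times the number of integration steps, which is exactly what the accelerated methods of the next section avoid.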
19.1 Accelerated (Approximative) Pre-Integration
The primary drawback of pre-integrated classification in general is actually the pre-integration required to compute the lookup tables, which map the three integration parameters (scalar value at the front sf , scalar value at the back sb , and length of the segment
d) to pre-integrated colors C̃ = C̃(sf , sb , d) and opacities α = α(sf , sb , d). As these tables
depend on the transfer functions, any modification of the transfer functions requires an
update of the lookup tables. This might be no concern for games and entertainment applications, but it strongly limits the interactivity of applications in the domain of scientific
volume visualization, which often depend on user-specified transfer functions. Therefore,
we will suggest three methods to accelerate the pre-integration step.
Firstly, under some circumstances it is possible to reduce the dimensionality of the
tables from three to two (only sf and sb ) by assuming a constant length of the segments.
Obviously, this applies to ray-casting with equidistant samples. It also applies to 3D
texture-based volume visualization with orthographic projection and is a good approximation for most perspective projections. It is less appropriate for axis-aligned 2D texture-based volume rendering. Even if very different lengths occur, the complicated dependency
on the segment length might be approximated by a linear dependency as suggested in [37];
thus, the lookup tables may be calculated for a single segment length.
Secondly, a local modification of the transfer functions for a particular scalar value s
does not require updating the whole lookup table. In fact, only the values C̃(sf , sb , d) and
α(sf , sb , d) with sf ≤ s ≤ sb or sf ≥ s ≥ sb have to be recomputed; i.e., in the worst case
about half of the lookup table has to be recomputed.
Finally, the pre-integration may be greatly accelerated by evaluating the integrals in
Equations (19.1), (19.2), and (19.3) by employing integral functions for τ (s), c̃(s), and
τ (s)c(s), respectively. More specifically, Equation (19.1) for αi = α(sf , sb , d) can be rewritten as
\[
\alpha(s_f, s_b, d) \;\approx\; 1 - \exp\!\left(-\frac{d}{s_b - s_f}\,\big(T(s_b) - T(s_f)\big)\right) \tag{19.4}
\]
with the integral function $T(s) := \int_0^s \tau(s')\, ds'$, which is easily computed in practice as the
scalar values s are usually quantized.
Equation (19.2) for C̃i = C̃(sf , sb , d) may be approximated analogously:
\[
\tilde{C}(s_f, s_b, d) \;\approx\; \frac{d}{s_b - s_f}\,\big(K(s_b) - K(s_f)\big) \tag{19.5}
\]
with the integral function $K(s) := \int_0^s \tilde{c}(s')\, ds'$. However, this requires neglecting the attenuation within a ray segment. As mentioned above, this is a common approximation for
post-classified volume rendering and well justified for small products τ (s)d.
For the non-associated color transfer function c(s) we approximate Equation (19.3) by
\[
\tilde{C}^{\tau}(s_f, s_b, d) \;\approx\; \frac{d}{s_b - s_f}\,\big(K^{\tau}(s_b) - K^{\tau}(s_f)\big) \tag{19.6}
\]
with $K^{\tau}(s) := \int_0^s \tau(s')\, c(s')\, ds'$.
Thus, instead of numerically computing the integrals in Equations (19.1), (19.2), and
(19.3) for each combination of sf , sb , and d, we compute the integral functions T(s), K(s), and Kτ(s) only once and employ them to evaluate colors and opacities according to
Equations (19.4), (19.5), or (19.6) without any further integration.
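A possible implementation of this acceleration is sketched below: the integral functions are built once by running sums over the quantized scalar range, and each table entry then costs only two lookups, a subtraction, and an exponential, following Equations (19.4) and (19.5); Kτ(s) for Equation (19.6) would be built analogously. Function names and the handling of the degenerate case sf = sb are our assumptions.

// Sketch: accelerated pre-integration via integral functions.
#include <cmath>
#include <vector>

// Build T(s) = integral of tau and K(s) = integral of the associated color
// over the quantized scalar range (step size 1, trapezoidal running sums).
void buildIntegralFunctions(const std::vector<float>& tau,
                            const std::vector<float>& ctilde,
                            std::vector<float>& T, std::vector<float>& K)
{
    const int n = int(tau.size());
    T.assign(n, 0.0f);
    K.assign(n, 0.0f);
    for (int s = 1; s < n; ++s) {
        T[s] = T[s - 1] + 0.5f * (tau[s - 1] + tau[s]);
        K[s] = K[s - 1] + 0.5f * (ctilde[s - 1] + ctilde[s]);
    }
}

// Evaluate one table entry without any further integration.
void lookupEntry(const std::vector<float>& tau, const std::vector<float>& ctilde,
                 const std::vector<float>& T,   const std::vector<float>& K,
                 int sf, int sb, float d, float& alpha, float& color)
{
    if (sf == sb) {                         // limit case: constant scalar along the segment
        alpha = 1.0f - std::exp(-tau[sf] * d);
        color = ctilde[sf] * d;
        return;
    }
    const float scale = d / float(sb - sf);
    alpha = 1.0f - std::exp(-scale * (T[sb] - T[sf]));   // Eq. (19.4)
    color = scale * (K[sb] - K[sf]);                     // Eq. (19.5)
}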
Texture-based Pre-Integrated Volume
Rendering
Based on the description of pre-integrated classification in Section 19, we will now present
a novel texture-based algorithm that implements pre-integrated classification. It employs dependent textures, i.e., it relies on the possibility to convert fragment (or pixel) colors into texture coordinates. In contrast to paletted textures, dependent textures allow post-classification shading using a one-dimensional lookup texture. Here, however, we will use dependent textures to look up pre-integrated ray-segment values.
The volume texture maps (either three-dimensional or two-dimensional textures) contain the scalar values of the volume, just as for post-classification. As each pair of adjacent
slices (either view-aligned or object-aligned) corresponds to one slab of the volume (see
Figure 20.1), the texture maps of two adjacent slices have to be mapped onto one slice
(either the front or the back slice) by means of multiple textures (see Section 20.1). Thus,
the scalar values of both slices (front and back) are fetched from texture maps during the
rasterization of the polygon for one slab (see Section 20.2). These two scalar values are
required for a third texture fetch operation, which performs the lookup of pre-integrated
colors and opacities from a two-dimensional texture map. This texture fetch depends on
previously fetched texels; therefore, this third texture map is called a dependent texture
map.
Figure 20.1: A slab of the volume between two slices. The scalar value on the front (back)
slice for a particular viewing ray is called sf (sb ).
The opacities of this dependent texture map are calculated according to Equation (19.1), while the colors are computed according to Equation (19.2) if the transfer
function specifies associated colors c̃(s), and Equation (19.3) if it specifies non-associated
colors c(s). In either case a back-to-front compositing algorithm is used for blending the
ray segments into the framebuffer.
Obviously, a hardware implementation of these algorithms depends on rather complicated texture fetch operations. Fortunately, the OpenGL texture shader extension recently
proposed can in fact be customized to implement these algorithms. The details of this implementation are discussed in the following section.
Our current implementation is based on NVidia’s GeForce3 graphics chip. NVidia
introduced a flexible multi-texturing unit in their GeForce2 graphics processor via the
register combiners OpenGL extension [33]. This unit allows the programming of per-pixel
shading operations using three stages, two general and one final combiner stage. This
register combiner extension is located behind the texel fetch unit in the rendering pipeline.
Recently, NVidia extended the register combiners in the GeForce3 graphics chip by providing eight general and one final combiner stage with per-combiner constants via the register combiners2 extension. Additionally, the GeForce3 provides a programmable texture fetch
unit [33] allowing four texture fetch operations via 21 possible commands, among them
several dependent texture operations. This so-called texture shader OpenGL extension
and the register combiners are merged together in Microsoft’s DirectX8 API to form the
pixel shader API. The texture shader extension refers to 2D textures only. NVidia proposed an equivalent extension for 3D texture fetches via the texture shader2 extension.
The equivalent functionality is also available on ATI’s R200 graphics processor. ATI proposed the fragment shader OpenGL extension, which combines texture fetch operations and
per-fragment calculations in a single API.
The pre-integrated volume rendering algorithm consists of three basic steps: First two
adjacent texture slices are projected onto one of them, either the back slice onto the front
slice or vice versa. Thereby, two texels along each ray (one from the front and one from
the back slice) are projected onto each other. They are fetched using the texture shader
extension and then used as texture coordinates for a dependent texture fetch containing pre-integrated values for each combination of back and front texels. For isosurface rendering,
the dependent texture contains color, transparency, and interpolation values, if the isovalue
is in between the front and back texel value. The gradient and voxel values are stored in
RGBA textures. In the register combiners gradients are interpolated and dot product
lighting calculations are performed. The following sub-sections explain all these steps in
detail.
20.1 Projection
The texture-based volume rendering algorithm usually blends object-aligned texture slices
of one of the three texture stacks back-to-front into the frame buffer using the over operator. Instead of this slice-by-slice approach, we render slab-by-slab (see Figure 20.1)
from back to front into the frame buffer. A single polygon is rendered for each slab with
the two corresponding textures as texture maps. In order to have texels along all viewing
rays projected upon each other for the texel fetch operation, either the back slice must
be projected onto the front slice or vice versa. The projection is thereby accomplished
by adapting texture coordinates for the projected texture slice and retaining the texture
coordinates of the other texture slice. Figure 20.2 shows the projection for the object- and
view-aligned rendering algorithms.
For direct volume rendering without lighting, textures are defined in the OpenGL texture format GL_LUMINANCE8. For volume shading and shaded isosurfaces, GL_RGBA textures
are used, which contain the pre-calculated volume gradient and the scalar values.
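The following sketch illustrates such a back-to-front slab loop for object-aligned slices with standard ARB multi-texturing. The slab structure, the precomputed texture-coordinate offset that projects the back slice onto the front slice, and the omission of the dependent-texture setup (Section 20.2) are simplifications of ours, not the course's actual source code.

// Sketch of the back-to-front slab loop; texture shader / register combiner
// state is assumed to be configured by the caller.
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>

struct Slab {
    GLuint frontTex, backTex;   // 2D textures of the two adjacent slices
    float  verts[4][3];         // polygon of the front slice
    float  backOffset[2];       // texcoord shift projecting the back slice onto the front one
};

void renderSlabsBackToFront(const Slab* slabs, int numSlabs)
{
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);     // compositing of associated colors
    static const float st[4][2] = { {0,0}, {1,0}, {1,1}, {0,1} };

    for (int i = numSlabs - 1; i >= 0; --i) {        // back to front
        glActiveTextureARB(GL_TEXTURE0_ARB);         // front slice: unmodified coordinates
        glBindTexture(GL_TEXTURE_2D, slabs[i].frontTex);
        glActiveTextureARB(GL_TEXTURE1_ARB);         // back slice: projected coordinates
        glBindTexture(GL_TEXTURE_2D, slabs[i].backTex);

        glBegin(GL_QUADS);
        for (int v = 0; v < 4; ++v) {
            glMultiTexCoord2fARB(GL_TEXTURE0_ARB, st[v][0], st[v][1]);
            glMultiTexCoord2fARB(GL_TEXTURE1_ARB,
                                 st[v][0] + slabs[i].backOffset[0],
                                 st[v][1] + slabs[i].backOffset[1]);
            glVertex3fv(slabs[i].verts[v]);
        }
        glEnd();
    }
}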
20.2 Texel Fetch
For each fragment, texels of two adjacent slices along each ray through the volume are projected onto each other. Thus, we can fetch the texels with their given per-fragment texture
coordinates. Then the two fetched texels are used as lookup coordinates into a dependent
2D texture, containing pre-integrated values for each of the possible combinations of front
and back scalar values. NVidia’s texture shader extension provides a texture shader operation that employs the previous texture shader’s green and blue (or red and alpha) colors
as the (s, t) coordinates for a non-projective 2D texture lookup. Unfortunately, we cannot
Figure 20.2: Projection of texture slice vertices onto adjacent slice polygons for object-aligned slices (left) and view-aligned slices (right).
Figure 20.3: Texture shader setup for dependent 2D texture lookup with texture coordinates obtained from two source textures.
use this operation as our coordinates are fetched from two separate 2D textures. Instead,
as a workaround, we use the dot product texture shader, which computes the dot product
of the stage’s (s, t, r) coordinates and a vector derived from a previous stage’s texture lookup (see Figure 20.3). The results of two such dot product texture shader operations are employed
as coordinates for a dependent texture lookup. Here the dot product is only required to
extract the front and back volume scalars. This is achieved by storing the volume scalars
in the red components of the textures and applying a dot product with a constant vector
v = (1, 0, 0)^T. The texture shader extension allows us to define which previous texture fetch the dot product refers to via the GL_PREVIOUS_TEXTURE_INPUT_NV texture environment.
The first dot product is set to use the fetched front texel values as previous texture stage,
the second uses the back texel value. In this approach, the second dot product performs
the texture lookup into our dependent texture via texture coordinates obtained from two
different textures.
For direct volume rendering without lighting the fetched texel from the last dependent texture operation is routed through the register combiners without further
processing and blended into the frame buffer with the OpenGL blending function
glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA).
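A possible NV_texture_shader configuration for this texel fetch is sketched below. The unit assignment follows the description above (two 2D fetches, two dot products against the constant vector (1, 0, 0), and the dependent 2D lookup on the last unit); extension loading, register combiner setup, and error handling are omitted, and the exact setup should be taken as an illustrative sketch rather than the course's reference code.

// Sketch: texture shader setup for the dependent pre-integration lookup.
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>

void setupPreIntegrationTexelFetch(GLuint frontSlice, GLuint backSlice, GLuint preIntTable)
{
    glEnable(GL_TEXTURE_SHADER_NV);

    glActiveTextureARB(GL_TEXTURE0_ARB);             // stage 0: fetch front slice texel
    glBindTexture(GL_TEXTURE_2D, frontSlice);
    glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_TEXTURE_2D);

    glActiveTextureARB(GL_TEXTURE1_ARB);             // stage 1: fetch back slice texel
    glBindTexture(GL_TEXTURE_2D, backSlice);
    glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_TEXTURE_2D);

    glActiveTextureARB(GL_TEXTURE2_ARB);             // stage 2: dot product with stage 0 result
    glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_DOT_PRODUCT_NV);
    glTexEnvi(GL_TEXTURE_SHADER_NV, GL_PREVIOUS_TEXTURE_INPUT_NV, GL_TEXTURE0_ARB);

    glActiveTextureARB(GL_TEXTURE3_ARB);             // stage 3: dot product + dependent 2D lookup
    glBindTexture(GL_TEXTURE_2D, preIntTable);       // pre-integration table
    glTexEnvi(GL_TEXTURE_SHADER_NV, GL_SHADER_OPERATION_NV, GL_DOT_PRODUCT_TEXTURE_2D_NV);
    glTexEnvi(GL_TEXTURE_SHADER_NV, GL_PREVIOUS_TEXTURE_INPUT_NV, GL_TEXTURE1_ARB);

    // Per vertex, units 2 and 3 receive the constant coordinate (1, 0, 0) so that
    // each dot product simply extracts the red (scalar) component, e.g.:
    //   glMultiTexCoord3fARB(GL_TEXTURE2_ARB, 1.0f, 0.0f, 0.0f);
    //   glMultiTexCoord3fARB(GL_TEXTURE3_ARB, 1.0f, 0.0f, 0.0f);
}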
Rasterization of Isosurfaces using Dependent Textures
As discussed in [37], pre-integrated volume rendering can be employed to render multiple
isosurfaces. The basic idea is to color each ray segment according to the first isosurface
intersected by the ray segment. Examples for such dependent textures are depicted in
Figure 21.3.
For shading calculations, RGBA textures are usually employed, which contain the volume
gradient in the RGB components and the volume scalar in the ALPHA component. As we use
dot products to extract the front and back volume scalar and the dot product refers only
to the first three components of a vector, we store the scalar data in the RED component.
The first gradient component is stored in the ALPHA component instead.
For lighting purposes the gradient of the front and back slice has to be rebuilt in the
RGB components (ALPHA has to be routed back to RED) and the two gradients have to be
interpolated depending on a given isovalue (see Figure 21.1). The interpolation value for
the back slice is given by IP = (siso − sf )/(sb − sf ); the interpolation value for the front
slice is 1−IP (see also [37]). IP could be calculated on-the-fly for each given isovalue, back
and front scalar. Unfortunately, this requires a division in the register combiners, which
is not available. For this reason we have to pre-calculate the interpolation values for each
combination of back and front scalar and store them in the dependent texture. Ideally, this
interpolation value would be looked up using a second dependent texture. Unfortunately,
NVidia’s texture shader extension only allows four texture operations, all of which are already in use. Hence, we have to store the interpolation value IP in the first and only dependent
texture.
There are two possible ways to store these interpolation values. The first approach
stores the interpolation value (IP) in the ALPHA component of the dependent texture
(R,G,B,IP). The main disadvantage of this method is that the transparency, which is
usually freely definable for each isosurface’s back and front face, is now constant for all
isosurfaces’ faces. In order to obtain a transparency value of zero for ray segments that
do not intersect the isosurface and a constant transparency for ray segments that intersect
the isosurface, the interpolation values are stored in the ALPHA channel in the range 128 to
255 (7 bit). An interpolation value of 0 is stored for ray segments that do not intersect the
isosurface. This allows us to scale the ALPHA channel with a factor of 2, to get an ALPHA of
1.0 for ray segments intersecting the isosurface and an ALPHA of 0 otherwise. Afterwards,
a multiplication of the result with the constant transparency can be performed. For the
interpolation the second general combiner’s input mapping for the interpolation is set to
GL_HALF_BIAS_NORMAL_NV and GL_UNSIGNED_INVERT_NV to map the interpolation value
to the ranges 0 to 0.5 and 0.5 to 0 (see Figure 21.1). After the interpolation the result is
scaled by 2 in order to get the correct interpolation result.
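Filling such a dependent texture on the CPU might look as follows; the 256×256 table size, the single fixed isosurface color, and the function name are assumptions for illustration.

// Sketch: isosurface dependent texture with the interpolation value IP stored
// in the alpha channel (128..255), and alpha = 0 for non-intersecting segments.
#include <cstdint>
#include <vector>

std::vector<uint8_t> buildIsoDependentTexture(int isoValue,
                                              uint8_t r, uint8_t g, uint8_t b)
{
    const int n = 256;
    std::vector<uint8_t> tex(n * n * 4, 0);        // RGBA, zero alpha = no intersection
    for (int sf = 0; sf < n; ++sf) {
        for (int sb = 0; sb < n; ++sb) {
            bool hit = (sf <= isoValue && isoValue <= sb) ||
                       (sf >= isoValue && isoValue >= sb);
            if (!hit || sf == sb) continue;
            float ip = float(isoValue - sf) / float(sb - sf);   // IP = (s_iso - s_f)/(s_b - s_f)
            uint8_t* texel = &tex[(sb * n + sf) * 4];
            texel[0] = r;  texel[1] = g;  texel[2] = b;         // isosurface color
            texel[3] = uint8_t(128.0f + ip * 127.0f);           // map IP into 128..255
        }
    }
    return tex;
}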
Figure 21.1: Register Combiner setup for gradient reconstruction and interpolation with
interpolation values stored in alpha. Note that the interpolation values are stored in the
range of 0.5 to 1.0, which requires proper input and output mappings for general combiner
2 to obtain a correct interpolation. M denotes the gradient of the back slice and N that of the front slice.
Figure 21.2: Register Combiner setup for gradient reconstruction and interpolation with
interpolation values stored in blue. Note that the interpolation values are routed into the alpha portion and back into the RGB portion to distribute them onto RGB for the interpolation. M denotes the gradient of the back slice and N that of the front slice.
Our second approach stores the interpolation value IP in the BLUE component of the
dependent texture (R,G,IP,A). Now the transparency can be freely defined for each isosurface and each back and front face of the isosurface, but the register combiners are used
to fill the blue color channel with a constant value that is the same for all isosurfaces’ back and front faces. Also, we can use all 8 bits of the BLUE color channel for the interpolation
value. In order to distribute the interpolation value from the BLUE color channel on all RGB
components for the interpolation, BLUE is first routed into the ALPHA portion of a general
combiner stage and then routed back into the RGB portion (see Figure 21.2).
21.1 Lighting
After the per-fragment calculation of the isosurfaces’ gradient in the first three general
combiner stages, the remaining five general combiners and the final combiner can be used
for lighting computations. Diffuse and specular lighting with a maximum power of 256 is
possible by utilizing the dot product of the register combiners and increasing the power by
multiplying the dot product with itself. Currently we calculate I = Ia + Id C (n · l1) + Is (n · l2)^16, where n denotes the interpolated normal, l1 the diffuse light source direction, l2 the
specular light source direction, and C the color of the isosurface. A visualization of a CT
scan of a human head at different thresholds is shown in Figure 21.3.
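As a plain CPU reference of this lighting model, the following sketch evaluates I = Ia + Id C (n · l1) + Is (n · l2)^16 and obtains the exponent of 16 by squaring the specular term four times, mirroring how high powers are built up by repeated multiplication in the combiners; the vector type and the clamping to zero are our additions.

// CPU reference of the isosurface lighting model.
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

static float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

float isoLighting(const Vec3& n, const Vec3& l1, const Vec3& l2,
                  float Ia, float Id, float Is, float C)
{
    float diffuse  = std::max(dot(n, l1), 0.0f);
    float specular = std::max(dot(n, l2), 0.0f);
    for (int i = 0; i < 4; ++i)            // (n.l2)^2 -> ^4 -> ^8 -> ^16 by repeated squaring
        specular *= specular;
    return Ia + Id * C * diffuse + Is * specular;
}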
The same approach can also be employed for volume shading. For lighting, the average
gradient at the front and back slice is used, thus no interpolation values have to be stored
in the dependent texture. The dependent texture holds pre-integrated opacity and color
values; the latter are employed for diffuse and specular lighting calculations. The implemented lighting model computes I = Id C (n · l1) + Is C (n · l2)^16, where n denotes the interpolated
normal, l1 the diffuse light source direction, l2 the specular light source direction and C
the pre-integrated color of the ray segment.
Dynamic lighting as described above requires RGBA textures, which consume a lot of
texture memory. Alternatively, static lighting is possible by storing pre-calculated dot
products of gradient and light vectors for each voxel in the textures. The dot products at
the start and end of a ray segment are then interpolated for a given isovalue in the register
combiners. For this purpose, LUMINANCE_ALPHA textures can be employed, which consume
only half of the memory of RGBA textures.
The intermixing of semi-transparent volumes and isosurfaces is performed by a multi-pass approach that first renders a slice with a pre-integrated dependent texture and then renders the slice again with an isosurface dependent texture. If the interpolation values did not have to be stored in the dependent texture, a single-pass approach could also be implemented that handles isosurfaces and semi-transparent volumes in a slab at the same time. Examples of dependent textures for direct and isosurface volume rendering are
presented in Figure 21.3.
Figure 21.3: Pre-integrated isosurfaces. Left to right: multiple colored isosurfaces of a
synthetic data set with the corresponding dependent texture. Isosurfaces of a human head
CT scan (256³): skin, skull, semi-transparent skin with opaque skull, and the dependent
texture for the latter image.
Figure 21.4: Images showing a comparison of a) pre-shaded, b) post-shaded without additional slices, c) post-shaded with additional slices and d) pre-integrated volume visualization of tiny structures of the inner ear (128 × 128 × 30) rendered with 128 slices. Note
that the slicing artifacts in (b) can be removed (c) by rendering additional slices. With
pre-integrated volume rendering (d) there are no slicing artifacts visible with the original
number of slices.
Figure 21.5: Comparison of the results of pre-classification (top), post-classification (middle) and pre-integrated classification (bottom) for direct volume rendering of a spherical
harmonic (Legendre’s) function (16³ voxels) with random transfer functions.
Volumetric FX
One drawback of volume based graphics is that high frequency details cannot be represented
in small volumes. These high frequency details are essential for capturing the characteristics of many volumetric objects such as clouds, smoke, trees, hair, and fur. Procedural
noise simulation is a very powerful tool to use with small volumes to produce visually compelling simulations of these types of volumetric objects. Our approach is similar to Ebert’s approach for modeling clouds [11]: use a coarse technique for modeling the macrostructure and procedural noise-based simulations for the microstructure. We have adapted this
approach to interactive volume rendering through two volume perturbation approaches
which are efficient on modern graphics hardware. The first approach is used to perturb
optical properties in the shading stage while the second approach is used to perturb the
volume itself.
Both volume perturbation approaches employ a small 3D perturbation volume of 32³ voxels. Each texel is initialized with four random 8-bit numbers, stored as RGBA
components, and blurred slightly to hide the artifacts caused by trilinear interpolation.
Texel access is then set to repeat. An additional pass is required for both approaches due
to limitations imposed on the number of textures which can be simultaneously applied to a
polygon, and the number of sequential dependent texture reads permitted. The additional
pass occurs before the steps outlined in the previous section. Multiple copies of the noise
texture are applied to each slice at different scales. They are then weighted and summed
per pixel. To animate the perturbation, we add a different offset to each noise texture’s
coordinates and update it each frame.
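Generating such a noise volume on the CPU might look like the following sketch; the single-axis box blur and the suggested glTexImage3D upload are assumptions chosen for brevity.

// Sketch: building the 32^3 RGBA perturbation volume (random 8-bit values,
// slightly blurred, intended for repeat wrapping).
#include <cstdint>
#include <cstdlib>
#include <vector>

std::vector<uint8_t> makeNoiseVolume(int size = 32)
{
    std::vector<uint8_t> raw(size * size * size * 4), blurred(raw.size());
    for (size_t i = 0; i < raw.size(); ++i)
        raw[i] = uint8_t(std::rand() & 0xFF);            // four random 8-bit numbers per texel

    // slight blur: 3-tap average along one axis with wrap-around, a crude
    // stand-in that hides the worst trilinear interpolation artifacts
    for (int z = 0; z < size; ++z)
      for (int y = 0; y < size; ++y)
        for (int x = 0; x < size; ++x)
          for (int c = 0; c < 4; ++c) {
              int sum = 0;
              for (int dx = -1; dx <= 1; ++dx) {
                  int xi = (x + dx + size) % size;       // repeat wrapping
                  sum += raw[((z * size + y) * size + xi) * 4 + c];
              }
              blurred[((z * size + y) * size + x) * 4 + c] = uint8_t(sum / 3);
          }
    return blurred;  // upload e.g. with glTexImage3D(..., GL_RGBA8, size, size, size, ...)
                     // and set GL_TEXTURE_WRAP_S/T/R to GL_REPEAT
}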
The first approach is similar to Ebert’s lattice-based noise approach [11]. It uses the four per-pixel noise components to modify the optical properties of the volume after the transfer function has been evaluated. This approach makes the materials appear to
have inhomogeneities. We allow the user to select which optical properties are modified.
This technique is used to get the subtle iridescence effects seen in Figure 22.1 (bottom).
The second approach is closely related to Peachey’s vector-based noise simulation technique [11]. It uses the noise to modify the location of the data access for the volume.
In this case three components of the noise texture form a vector, which is added to the
texture coordinates for the volume data per pixel. The data is then read using a dependent
texture read. The perturbed data is rendered to a pixel buffer that is then used instead of
the original volume data. Figure 22.2 illustrates this process. A shows the original texture
data. B shows how the perturbation texture is applied to the polygon twice, once to achieve
low frequency with high amplitude perturbations (large arrows) and again to achieve high
frequency with low amplitude perturbations (small arrows). Notice that the high frequency
content is created by allowing the texture to repeat. Figure 22.2 C shows the resulting
texture coordinate perturbation field when the multiple displacements are weighted and
summed. D shows the image generated when the texture is read using the perturbed
texture coordinates. Figure 22.1 shows how a coarse volume model can be combined with
Figure 22.1: Procedural clouds. The image on the top shows the underlying data, 64³. The
center image shows the perturbed volume. The bottom image shows the perturbed volume
lit from behind with low frequency noise added to the indirect attenuation to achieve subtle
iridescence effects.
Figure 22.2: An example of texture coordinate perturbation in 2D. A shows a square
polygon mapped with the original texture that is to be perturbed. B shows a low resolution
perturbation texture applied to the polygon multiple times at different scales. These offset
vectors are weighted and summed to offset the original texture coordinates as seen in C.
The texture is then read using the modified texture coordinates, producing the image seen
in D.
Figure 22.3: Procedural fur. Left: Original teddy bear CT scan. Right: teddy bear with
fur created using high frequency texture coordinate perturbation.
our volume perturbation technique to produce an extremely detailed interactively rendered
cloud. The original 64³ voxel dataset is generated from a simple combination of volumetric
blended implicit ellipses and defines the cloud macrostructure[11].
The final rendered image in Figure 22.1(c), produced with our volume perturbation
technique, shows detail that would be equivalent to an unperturbed voxel dataset of at least
one hundred times the resolution. Figure 22.3 demonstrates this technique on another
example. By perturbing the volume with a high frequency noise, we can obtain a fur-like
surface on the teddy bear.
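The texture coordinate perturbation of Figure 22.2 can be summarized by the following 2D CPU sketch: several scaled copies of a small, repeating noise texture are weighted, summed, and added to the data texture coordinates before the data lookup. The two-octave setup, the scales, and the weights are arbitrary example values.

// Sketch: 2D texture coordinate perturbation with two noise octaves.
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };

// repeat-wrapped nearest lookup into a small 2-component noise texture
static Vec2 sampleNoise(const std::vector<Vec2>& noise, int size, float u, float v)
{
    int x = int(std::floor(u * size)) % size; if (x < 0) x += size;
    int y = int(std::floor(v * size)) % size; if (y < 0) y += size;
    return noise[y * size + x];
}

Vec2 perturbTexCoord(const std::vector<Vec2>& noise, int noiseSize, Vec2 uv)
{
    // low frequency / high amplitude plus high frequency / low amplitude
    const float scale[2]  = { 1.0f, 8.0f };
    const float weight[2] = { 0.10f, 0.02f };
    Vec2 offset = { 0.0f, 0.0f };
    for (int octave = 0; octave < 2; ++octave) {
        Vec2 n = sampleNoise(noise, noiseSize, uv.x * scale[octave], uv.y * scale[octave]);
        offset.x += weight[octave] * (n.x - 0.5f);   // center the noise around zero
        offset.y += weight[octave] * (n.y - 0.5f);
    }
    return { uv.x + offset.x, uv.y + offset.y };     // use these coords for the data lookup
}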
(a) Radial distance volume with high-frequency fire transfer function
(b) Perlin noise volume with fire transfer function
(c) Weighted combination of the distance volume and two Perlin noise volumes
(d) Like (c), but with higher weights for the Perlin noise volumes
Figure 22.4: Pre-Integrated volume rendering of a fireball. The fireball effect is achieved
by mixing different volumes during rendering.
Bibliography
[1] Andreas H. König and Eduard M. Gröller. Mastering transfer function specification
by using volumepro technology. Technical Report TR-186-2-00-07, Vienna University
of Technology, March 2000.
[2] ATI web page. http://www.ati.com/.
[3] Chandrajit L. Bajaj, Valerio Pascucci, and Daniel R. Schikore. The Contour Spectrum.
In Proceedings IEEE Visualization 1997, pages 167–173, 1997.
[4] Uwe Behrens and Ralf Ratering. Adding Shadows to a Texture-Based Volume Renderer. In 1998 Volume Visualization Symposium, pages 39–46, 1998.
[5] J. Blinn. Models of Light Reflection for Computer Synthesized Pictures. Computer
Graphics, 11(2):192–198, 1977.
[6] J. Blinn and M. Newell. Texture and Reflection in Computer Generated Images.
Communications of the ACM, 19(10):362–367, 1976.
[7] J. F. Blinn. Jim blinn’s corner: Image compositing–theory. IEEE Computer Graphics
and Applications, 14(5), 1994.
[8] B. Cabral, N. Cam, and J. Foran. Accelerated volume rendering and tomographic
reconstruction using texture mapping hardware. In Proc. of IEEE Symposium on
Volume Visualization, pages 91–98, 1994.
[9] Yoshinori Dobashi, Kazufumi Kaneda, Hideo Yamashita, Tsuyoshi Okita, and Tomoyuki Nishita. A Simple, Efficient Method for Realistic Animation of Clouds. In SIGGRAPH 2000, pages 19–28, 2000.
[10] R. A. Drebin, L. Carpenter, and P. Hanrahan. Volume rendering. In Proc. of SIGGRAPH ’88, pages 65–74, 1988.
[11] D. Ebert, F. K. Musgrave, D. Peachey, K. Perlin, and S. Worley. Texturing and
Modeling: A Procedural Approach. Academic Press, July 1998.
[12] K. Engel, M. Kraus, and T. Ertl. High-Quality Pre-Integrated Volume Rendering
Using Hardware-Accelerated Pixel Shading. In Proc. Graphics Hardware, 2001.
[13] T. J. Farrell, M. S. Patterson, and B. C. Wilson. A diffusion theory model of spatially
resolved, steady-state diffuse reflectance for the non-invasive determination of tissue
optical properties in vivo. Medical Physics, 19:879–888, 1992.
[14] N. Greene. Environment Mapping and Other Applications of World Projection. IEEE
Computer Graphics and Applications, 6(11):21–29, 1986.
[15] M. Hadwiger, T. Theußl, H. Hauser, and E. Gröller. Hardware-accelerated high-quality filtering on PC hardware. In Proc. of Vision, Modeling, and Visualization
2001, pages 105–112, 2001.
[16] M. Hadwiger, I. Viola, and H. Hauser. Fast convolution with high-resolution filters.
Technical Report TR-VRVis-2002-001, VRVis Research Center for Virtual Reality and
Visualization, 2002.
[17] M.J. Harris and A. Lastra. Real-time cloud rendering. In Proc. of Eurographics 2001,
pages 76–84, 2001.
[18] Taosong He, Lichan Hong, Arie Kaufman, and Hanspeter Pfister. Generation of Transfer Functions with Stochastic Search Techniques. In Proceedings IEEE Visualization
1996, pages 227–234, 1996.
[19] James T. Kajiya and Brian P. Von Herzen. Ray Tracing Volume Densities. In ACM
Computer Graphics (SIGGRAPH ’84 Proceedings), pages 165–173, July 1984.
[20] R. G. Keys. Cubic convolution interpolation for digital image processing. IEEE Trans.
Acoustics, Speech, and Signal Processing, ASSP-29(6):1153–1160, December 1981.
[21] Joe Kniss, Gordon Kindlmann, and Charles Hansen. Multi-Dimensional Transfer
Functions for Interactive Volume Rendering. TVCG, 2002 to appear.
[22] P. Lacroute and M. Levoy. Fast volume rendering using a shear-warp factorization of
the viewing transformation. In Proc. of SIGGRAPH ’94, pages 451–458, 1994.
[23] E. LaMar, B. Hamann, and K. Joy. Multiresolution Techniques for Interactive Texture-based Volume Visualization. In Proc. IEEE Visualization, 1999.
[24] M. Levoy. Display of surfaces from volume data. IEEE Computer Graphics and
Applications, 8(3):29–37, May 1988.
[25] W. E. Lorensen and H. E. Cline. Marching cubes: A high resolution 3D surface
construction algorithm. In Proc. of SIGGRAPH ’87, pages 163–169, 1987.
[26] J. Marks, B. Andalman, P.A. Beardsley, and H. Pfister et al. Design Galleries: A
General Approach to Setting Parameters for Computer Graphics and Animation. In
ACM Computer Graphics (SIGGRAPH ’97 Proceedings), pages 389–400, August 1997.
[27] N. Max. Optical models for direct volume rendering. IEEE Transactions on Visualization and Computer Graphics, 1(2):99–108, 1995.
[28] T. McReynolds, D. Blythe, B. Grantham, and S. Nelson. Advanced graphics programming techniques using OpenGL. In SIGGRAPH 2000 course notes, 2000.
[29] M. Meißner, U. Hoffmann, and W. Straßer. Enabling Classification and Shading for
3D-texture Based Volume Rendering Using OpenGL and Extensions. In Proc. IEEE
Visualization, 1999.
[30] D. P. Mitchell and A. N. Netravali. Reconstruction filters in computer graphics. In
Proc. of SIGGRAPH ’88, pages 221–228, 1988.
[31] Herke Jan Noordmans, Hans T.M. van der Voort, and Arnold W.M. Smeulders. Spectral Volume Rendering. In IEEE Transactions on Visualization and Computer Graphics, volume 6. IEEE, July-September 2000.
[32] NVIDIA web page. http://www.nvidia.com/.
[33] NVIDIA OpenGL extension specifications document. http://www.nvidia.com/developer.
[34] A. V. Oppenheim and R. W. Schafer. Digital Signal Processing. Prentice Hall, Englewood Cliffs, 1975.
[35] B.T. Phong. Illumination for Computer Generated Pictures. Communications of the
ACM, 18(6):311–317, June 1975.
[36] C. Rezk-Salama, K. Engel, M. Bauer, G. Greiner, and T. Ertl. Interactive Volume
Rendering on Standard PC Graphics Hardware Using Multi-Textures and Multi-Stage
Rasterization. In Proc. SIGGRAPH/Eurographics Workshop on Graphics Hardware,
2000.
[37] S. Röttger, M. Kraus, and T. Ertl. Hardware-accelerated volume and isosurface rendering based on cell-projection. In Proc. of IEEE Visualization 2000, pages 109–116,
2000.
[38] J. Schimpf. 3Dlabs OpenGL 2.0 white papers. http://www.3dlabs.com/support/developer/ogl2/.
[39] M. Segal and K. Akeley. The OpenGL Graphics System: A Specification. http://www.opengl.org.
[40] Lihong V. Wang. Rapid modelling of diffuse reflectance of light in turbid slabs. J.
Opt. Soc. Am. A, 15(4):936–944, 1998.
[41] R. Westermann and T. Ertl. Efficiently using graphics hardware in volume rendering
applications. In Proc. of SIGGRAPH ’98, pages 169–178, 1998.
[42] C. M. Wittenbrink, T. Malzbender, and M. E. Goss. Opacity-weighted color interpolation for volume sampling. In Proc. of IEEE Symposium on Volume Visualization,
pages 135–142, 1998.