Tone Mapping and High Dynamic Range Imaging

Development of a Perceptual Tone
Mapping Operator
Workpackage 4 (Task 4.2)
Deliverable 4.1a
Patrick Ledda
University of Bristol
1 INTRODUCTION
2 GENERATING HIGH DYNAMIC RANGE IMAGING
3 TONE MAPPING
3.1 Tone Mapping Operators
3.2 Local operators
3.3 Global operators
3.4 Perceptual vs. Non-Perceptual
4 A PERCEPTUAL TMO FOR ARIS
4.1 Threshold versus Intensity studies
4.2 Implementation of the Algorithm
4.3 A local approach
4.4 Introducing Human Visibility Effects
5 VALIDATION OF TONE MAPPING OPERATORS
6 REFERENCES
1 Introduction
The natural world presents our visual system with a wide range of colours and
intensities. A starlit night has an average luminance level of around 10^-3 cd/m^2,
and daylight scenes are close to 10^5 cd/m^2.
Humans can see detail in regions that vary by 1:10^4 at any given adaptation level, beyond
which the eye is swamped by stray light (i.e., disability glare) and details are lost.
Modern camera lenses, even with their clean-room construction and coated optics,
cannot rival human vision when it comes to low flare and absence of multiple paths
("sun dogs") in harsh lighting environments. Even if they could, conventional
negative film cannot capture much more range than this, and most digital image
formats do not even come close. With the possible exception of cinema, there has
been little push for achieving greater dynamic range in the image capture stage,
because common displays and viewing environments limit the range of what can be
presented to about two orders of magnitude between minimum and maximum
luminance. A well-designed CRT monitor may do slightly better than this in a
darkened room, but the maximum display luminance is only around 100 cd/m2, which
does not begin to approach daylight levels. A high-quality xenon film projector may
be a few times brighter than this, but it is still two orders of magnitude away
from the optimal light level for human acuity and colour perception.
Figure 1 – Range of luminances in the real world compared to RGB
As a result of global illumination, images with huge dynamic ranges have become
more common. Dealing with such values requires new file formats and, more
importantly, devices able to display such a range. The first requirement has been
addressed by the development of High Dynamic Range (HDR) file formats, which store
the images more efficiently than keeping three floating-point numbers per RGB pixel.
The RGBE file format [16], for example, requires only 32 bits per pixel to store the
full luminance information.
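For illustration, a shared-exponent pixel of this kind might be packed and unpacked as in the minimal Python sketch below. The function names and details are ours, not the format specification, and Ward's actual implementation differs in detail:

    import math

    def float_to_rgbe(r, g, b):
        """Pack three floating-point primaries into one 32-bit RGBE pixel:
        an 8-bit mantissa per channel plus a shared 8-bit exponent."""
        v = max(r, g, b)
        if v < 1e-32:
            return (0, 0, 0, 0)
        mantissa, exponent = math.frexp(v)   # v == mantissa * 2**exponent
        scale = mantissa * 256.0 / v         # == 256 / 2**exponent
        return (int(r * scale), int(g * scale), int(b * scale), exponent + 128)

    def rgbe_to_float(r, g, b, e):
        """Recover approximate floating-point primaries from an RGBE pixel."""
        if e == 0:
            return (0.0, 0.0, 0.0)
        scale = math.ldexp(1.0, e - (128 + 8))   # 2**(stored exponent - 8)
        return (r * scale, g * scale, b * scale)

Sharing one exponent across the three channels is what brings the cost down to 32 bits per pixel while still covering a huge luminance range.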
Unfortunately it is still practically impossible to display these luminances on standard
devices such as CRT monitors or printers. So how can the appearance of extremes of
light and shadow be reproduced using only the tiny range of available display
outputs?
Appearance-preserving transformations from scene to display, or tone reproduction
operators, can solve this problem and were first described in the computer graphics
literature by Tumblin and Rushmeier [12] as shown in Figure 2.
Figure 2 - Simple diagram of Tone Mapping
2 Generating High Dynamic Range Imaging
Most computer graphics software works in a 24-bit RGB space with 8 bits allocated to
each of the three primaries. The advantage of this is that no tone mapping is required
and the result can be accurately reproduced on a standard CRT. The disadvantage is
that colours outside the sRGB gamut cannot be represented (especially very light or
dark ones). So how can images whose luminances resemble reality be created?
There are two main methods for generating HDR imaging. The first is to use
physically based renderers, which produce high dynamic range images covering
essentially all visible colours. The other is to take photographs of a particular
scene at different exposure times [2]. By taking a series of photographs at
different exposures, all the luminances in the scene can be captured, as shown in
figure 3a. After the images have been aligned and combined into one single image,
the camera’s response function for each of the RGB channels can be recovered and
stored.
Figure 3 (a,b) – Taking pictures at multiple exposure times and camera response
function.
Once the response function is known (Figure 3b), an HDR photograph of a scene can be
produced quickly from a few different exposures (most digital cameras have an
auto-bracketing function which allows photographs to be taken at different exposure
times in quick succession).
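For illustration, the merging step might look as follows, assuming the inverted response function is already available as a 256-entry lookup table (the recovery itself is described in [2]); the hat-shaped weighting is a common choice, not necessarily that of the original paper:

    import numpy as np

    def merge_exposures(images, exposure_times, inv_response):
        """Combine differently exposed 8-bit images into one HDR radiance map.
        images: uint8 arrays of identical shape; exposure_times: seconds;
        inv_response: 256-entry array mapping pixel value -> exposure (radiance * time).
        """
        num = np.zeros(images[0].shape, dtype=np.float64)
        den = np.zeros_like(num)
        for img, t in zip(images, exposure_times):
            z = img.astype(np.int32)
            # Trust mid-range pixels most; clipped values carry little information.
            w = np.minimum(z, 255 - z).astype(np.float64)
            num += w * inv_response[z] / t
            den += w
        return num / np.maximum(den, 1e-6)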
3 Tone Mapping
Tone-mapping algorithms rely on observer models that mathematically transform
scene luminances into all the visual sensations experienced by a human observer
viewing the scene, estimating the brain's own visual assessments. A Tone Mapping
Operator (TMO) tries to match the outputs of one observer model applied to the scene
to the outputs of another observer model applied to the desired display image.
Tumblin and Rushmeier were the first to bring the issue of tone mapping to the
computer graphics community. They offered a general framework for tone
reproduction operators by concatenating a scene observer model with an inverse
display observer model; when properly constructed, such operators should
guarantee that the displayed image is veridical: it causes the display to recreate the
visual appearance of the original scene exactly, showing no more and no less visual
content than would be discernible to an observer actually viewing the original scene.
Unfortunately, visual appearance is still quite mysterious, especially for high contrast
scenes, making precise and verifiable tone reproduction operators difficult to
construct and evaluate. Appearance, the ensemble of visual sensations evoked by a
viewed image or scene, is not a simple one-to-one mapping from scene radiance to
perceived radiance, but instead is the result of a complex combination of sensations
and judgments, a set of well-formed mental estimates of scene illumination,
reflectance, shapes, objects and positions, material properties, and textures. Though
all these quantities are directly measurable in the original scene, the mental estimates
that make up visual appearance are not.
The most troublesome task of any basic tone reproduction operator is detail-preserving contrast reduction. The Human Visual System (HVS) copes with large dynamic
ranges through a process known as visual adaptation. Local adaptation, the ensemble
of local sensitivity-adjusting mechanisms in the human visual system, reveals visible
details almost everywhere in a viewed scene, even when embedded in scenes of very
high contrast. Although most sensations that humans perceive from scene contents,
such as reflectance, shape, colour and movement can be directly evoked by the
display outputs, large contrasts cannot. As shown in Figure 4, high contrasts must be
drastically reduced for display, yet somehow must retain a high contrast appearance
and at the same time keep visible in the displayed image all the low contrast details
and textures revealed by local adaptation processes.
Figure 4 – The range of luminances cannot be reproduced on a CRT monitor
There are several reasons why the tone mapping problem is not always easy to
solve. The most obvious is that, as mentioned above, the contrast ratio that can
be produced by a standard CRT monitor is only about 100:1, which is much smaller
than what can exist in the real world. Newspaper photographs achieve a maximum
contrast of about 30:1; the best photographic prints can provide contrasts as high as
1000:1. In comparison, scenes that include visible light sources, deep shadows, and
highlights can reach contrasts of 100000:1. Another reason that makes tone mapping
operators fail in some cases is that the simplest ways to adjust scene intensities for
display will usually reduce or destroy important details and textures.
3.1 Tone Mapping Operators
In the past decade quite a few authors have developed tone mapping operators to
display HDR imagery. These algorithms can all be classified into two main categories:
spatially uniform (non-local) and spatially varying (local). This is shown in Figure 5
below.
Figure 5 – List of TM operators
3.2 Local operators
Humans are capable of viewing high-contrast scenes thanks to local sensitivity
control in the retina. This suggests that a position-dependent scale factor might
reduce scene contrasts acceptably and allow displaying them on a low dynamic range
device. This approach converts the original scene or real-world intensities to the
displayed image intensities, using a position-dependent multiplying term.
Chiu et al. [1] addressed the problem of global visibility loss by scaling luminance
values based on a spatial average of luminances in pixel neighbourhoods. Very dark
or bright areas are not clamped (like in the very first models) but are scaled according
to their spatial location. Their approach provides excellent results on smoothly shaded
portions of an image; however, any small bright feature in the image will cause strong
attenuation of the neighbouring pixels and surround the feature or high-contrast edge
with a noticeable dark band or halo. This error occurs because the human eye is very
sensitive to variation at high spatial frequencies.
Figure 6 – Contrast reversal causes halo artefacts
Schlick [9] followed the work of Chiu, but his algorithm exhibited similar halo
artifacts. Schlick used a first-degree rational polynomial
function to map high-contrast scene luminances to display system values. This
function works well when applied uniformly to each pixel of a high-contrast scene,
and is especially good for scenes containing strong highlights. Next, he made an
attempt to mimic local adaptation by locally varying a mapping function parameter;
one method caused halo artifacts. Schlick concentrated mainly on efficiency and
simplicity rather than improving the method mentioned above.
Figure 7 – Top images show the dynamic range, bottom is the tone mapped image
(Schlick’s operator)
Rahman et al. [8] recently devised a full-colour local scaling and contrast reduction
method using a multiscale version of Land’s “retinex” theory of colour vision.
Retinex theory estimates scene reflectances from the ratios of scene intensities to their
local intensity averages. Jobson, Rahman, and colleagues also use Gaussian low-pass
filtering to find local multiplying factors, making their method susceptible to halo
artifacts. They divide each point in the image by its low-pass filtered value, then take
the logarithm of the result to form a reduced contrast “single-scale retinex.” To further
reduce halo artifacts they construct a “multiscale retinex” from a weighted sum of
three single-scale retinexes, each computed with different sized filter kernels, then
apply scaling and offset constants to produce the display image. These and other
constants give excellent results for the wide variety of 24-bit RGB images used to test
their method, but it is unclear whether these robust results will extend to floating-point images whose maximum contrasts can greatly exceed 255:1.
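The single- and multi-scale retinex just described can be sketched as follows; the kernel sizes and weights below are illustrative placeholders, not the published constants:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def single_scale_retinex(lum, sigma):
        """log(pixel / its local low-pass average): a reduced-contrast image."""
        blurred = gaussian_filter(lum, sigma)
        return np.log(lum + 1e-6) - np.log(blurred + 1e-6)

    def multi_scale_retinex(lum, sigmas=(15, 80, 250), weights=(1/3, 1/3, 1/3)):
        """Weighted sum of single-scale retinexes at several kernel sizes,
        which reduces (but does not eliminate) halo artifacts."""
        return sum(w * single_scale_retinex(lum, s)
                   for w, s in zip(weights, sigmas))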
Pattanaik et al.[7] proposed a tone reproduction algorithm that takes into account
representations of pattern, luminance and colour processing in the Human Visual
System. The model accounts for changes of perception at threshold and
suprathreshold levels of brightness. This tone mapping algorithm also allows for
chromatic adaptation as well as luminance adaptation (see figure below). However, it
does not include any time-adaptation model.
Figure 8 – Colour sensitivity and visual acuity at different luminance levels
Recently Reinhard [] proposed an operator based on photographic practice, using a
system called the zone system, which divides a scene’s luminances into 11
printing zones running from black (zone 0) to white (zone 10). A luminance
reading for a middle grey is taken and assigned to zone 5, and the dynamic
range is captured by reading light and dark regions. The operator first applies a
scaling to the entire image to reduce the dynamic range and then locally modifies the
contrast of some regions, lightening or darkening them to improve overall
visibility; this local stage simulates the “dodging and burning” techniques used in
photography. A simpler, purely global version of the operator also exists.
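The initial global scaling and compression stage can be sketched as below; the key value of 0.18 and the small luminance floor are conventional assumptions on our part, and the local dodging-and-burning stage is omitted:

    import numpy as np

    def reinhard_global(lum_world, key=0.18):
        """Scale so the log-average luminance maps to a middle-grey 'key'
        (zone 5), then compress with L/(1+L), which levels off at white."""
        log_avg = np.exp(np.mean(np.log(lum_world + 1e-6)))
        scaled = key * lum_world / log_avg
        return scaled / (1.0 + scaled)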
Figure 9 – Two scenes mapped with Reinhard’s operator
3.3 Global operators
Most imaging systems do not imitate local adaptation. Instead, almost all image
synthesis, recording, and display processes use an implicit normalizing step to map
the original scene intensities to the target display intensities without disturbing any
scene contrasts that fall within the range of the display device. This normalizing
consists of a single constant multiplier. Image normalizing has two important
properties: it preserves all reproducible scene contrasts and it discards the intensities
of the original scene or image. Contrast, the ratio of any two intensities, is not
changed if the same multiplier scales both intensities. Normalizing implicitly assumes
that scaling does not change the appearance, as if all the perceptually important
information were carried by the contrasts alone, but scaling display intensities can
strongly affect a viewer’s estimates of scene contrasts and intensities. Although this
scaling is not harmful for many well-lit images or scenes, discarding the original
intensities can make two scenes with different illumination levels appear identical.
Normalizing also fails to capture dramatic appearance changes at the extremes of
lighting, such as gradual loss of colour vision, changes in acuity, and changes in
contrast sensitivity.
Tumblin and Rushmeier [12] tried to capture some of these light dependent changes in
appearance by describing a “tone reproduction operator,” which was built from
models of human vision, to convert scene intensities to display intensities. They
offered an example operator based on the suprathreshold brightness measurements
made by Stevens and Stevens [11] who claimed that an elegant power-law relation
exists between luminance, adaptation luminance, and perceived brightness.
Tumblin and Rushmeier were the first to draw attention to the tone reproduction
problem. They used the results of Stevens and Stevens and tried to preserve brightness
in a scene. However, the method had significant limitations: images or scenes that
approach total darkness are displayed as anomalous middle-grey images instead of
black, and display contrasts for very bright images (>100 cd/m2) are unrealistically
exaggerated.
Soon afterwards Ward [15] presented a much simpler approach to appearance
modelling that also provided a better way to make dark scenes appear dark and bright
scenes appear bright on the display. The idea behind this operator is that visibility is
preserved: the smallest perceptible difference in the real scene corresponds to the
smallest perceptible difference in the displayed image.
Ferwerda et al. [3] offered an extended appearance model for adaptation that
successfully captured several of its most important visual effects. The operator takes
into account the transition between achromatic night vision and chromatic day vision,
modelling the gradual shift from cone-mediated daylight vision to rod-mediated
night vision, and accounts for changes in colour sensitivity and acuity as a
function of scene intensity. Like Ward, they converted original scene
or image intensities to display intensities with a multiplicative scale factor, but they
determined their multiplier values from a smooth blending of increment threshold data
for both rods and cones in the retina, as shown in Figure 8. This method also provides
a simple method to mimic the time course of adaptation for both dark-to-light and
light-to-dark transitions (Figure 10).
Figure 10 – Ferwerda’s model
More recently Ward et al. [17] published a new and impressively comprehensive tone
reproduction operator based on iterative histogram adjustment and spatial filtering
processes. Their operator reduces high scene contrasts to match display abilities, and
also ensures that contrasts that exceed human visibility thresholds in the scene will
remain visible on the display (Figure 11). They model some foveally dominated local
adaptation effects, yet completely avoid halo artifacts or other forms of local gradient
reversals, and include new locally adapted models of glare, colour sensitivity, and
acuity similar to those used by Ferwerda et al. [3].
Figure 11 – Two scenes mapped with Ward’s operator
In 1999 Tumblin et al [14] proposed two methods to display high contrast images on
low dynamic range displays by imitating some of the human visual systems’
properties. One method, based on HVS layer models, separates images into lighting
layers and surface-property layers. The algorithm aims to preserve scene visibility;
this is achieved by scaling and compressing the luminance levels while preserving
the reflectance and transparency layers. The main limitation of this process is that it
only works with rendered images where all the layer information can be retrieved
during the rendering process.
Tumblin’s second method, known as the foveal method, interactively adjusts detail
visibility in the foveal area whilst compressing other parts of the image. The user can
click on any area of the image with the mouse, and the algorithm tone maps the
surrounding area based on the local luminance levels.
A more recent operator was proposed by Pattanaik et al. [6]. This time-dependent
algorithm can tone map either static or dynamic images (photographed or rendered)
and is based on a perceptual model proposed by Tumblin and Rushmeier. It also
includes an eye-adaptation model to represent lightness and colour.
Figure 12 – Pattanaik model of adaptation
This operator is original in that it accepts a variety of scenes and luminance levels
and takes various adaptation factors into account. All the properties of the human eye
are obtained from widely accepted colour science and psychology literature, making
this operator well suited to dynamic scenes. Using Hunt’s [4] colour model for static
vision, they include time-dependent effects such as neural response and
colour-bleaching effects. The main limitation of this operator, however, is that it does
not include a local eye-adaptation approach, which is very important for faithfully
representing visual appearance.
Recently Drago et al. implemented a simple method based on logarithmic
compression of luminance values, imitating the human response to light. A bias power
function is introduced to adaptively vary logarithmic bases, resulting in good
preservation of details and contrast. This adaptive logarithmic mapping technique is
capable of producing perceptually tuned images with high dynamic content and works
at interactive speed. The image below shows a tone mapped image with this operator.
Although it is difficult to see from the image below, one of the main problems with
this operator is that it appears to reduce contrast in the image (Figure 13).
Figure 13 – Two scenes mapped with Drago’s operator
A few other computer graphics researchers have modelled the appearance of
extremely bright, high-contrast scene features by adding halos, streaks, and blooming
effects to create the appearance of intensities well beyond the abilities of the display.
Nakamae et al. proposed that the star-like streaks seen around bright lights at night are
partly due to diffraction by eyelashes and pupils, and they presented a method to
calculate these streaks in RGB units, implicitly normalizing them for display. Later
Spencer et al. [10] presented an extensive summary of the optical causes and visual
effects of glare and modelled their appearance by using several adjustable low-pass
filters on the intensities of the original scene (Figure 14). Small, extremely bright light
sources that cover only a few pixels, such as street lights at night or the sun leaking
through a thicket of trees, are expanded into large, faintly coloured, glare-like image
features that have a convincing and realistic appearance.
Figure 14 – Discomfort glare simulated
Despite progress in modelling the light-dependent changes in appearance that occur
over the entire range of human vision, few methods offer the substantial contrast
reduction needed to display these images without truncation or halo artifacts.
3.4 Perceptual vs. Non-Perceptual
Some algorithms use perceptual data, usually based on psychophysical experiments,
to simulate reality; others simply attempt to compress the range purely by a
mathematical approach with the aim of obtaining the maximum visibility on the
display device. This latter approach can be useful if the TMO is simply used
for visualization purposes, in which case displaying all the possible values can be
satisfactory. However, in all those cases where tone mapping tries to simulate reality,
the implementation of the algorithm should be based on perceptual data. A few of the
published operators try to simulate human visibility by mathematically modelling
properties of the HVS such as eye adaptation, colour visibility at different photopic
or scotopic light levels, and visual acuity.
One of the main limitations of tone mapping is that the displayed result is static.
Although some algorithms take into account human visibility factors, only very few
operators allow the image to be modified dynamically based on human eye models. It is
important to decide what the purposes of a particular algorithm are. If, for example,
it is important to visualize all the luminance levels in a scene “in one go”, then most
operators satisfy this. However, if an eye simulation is required, then a dynamic model
based on adaptation may be more accurate.
4 A perceptual TMO for ARIS
If we have a High Dynamic Range photograph of a scene (which will eventually be
augmented with some artificial object) or a CG image, it has to be tone mapped in
order to be displayed on a monitor (or printed). As mentioned in the section above,
there are various algorithms in the literature that accomplish this; however, very few
try to give a perceptual match with the real scene. After implementing and
testing several Tone Mapping Operators, we came to the conclusion that a
perceptual operator, which took into account Human Visibility effects, was needed.
We also realized that none of the operators mentioned in the previous sections had
been validated against reality. We believe that this is a very important area which
requires investigation.
Humans make judgements about environments mainly on the basis of luminance contrast;
therefore, keeping the correct contrast between pixels in the input and output images
is crucial if a perceptual TMO is to be developed. Developing a perceptual
operator also allows us to include various human visibility effects such as time
adaptation, colour sensitivity, and visual acuity at different light levels. This means
that not only can we have an operator that maps high dynamic range images (or videos)
perceptually, but we can also, to some extent, simulate other aspects of vision.
We base our operator on Ward’s global contrast-preserving algorithm; however, we
extend the model to support a wider range of luminances by making the operator map
luminance locally. We have also started to include human visibility effects as well as
a time-domain aspect.
4.1 Threshold versus Intensity studies
Visual sensitivity is often measured psychophysically in a detection threshold
experiment. In the typical experimental paradigm, an observer is seated in front of a
blank screen that fills their field of view. To determine the absolute threshold the
screen is made dark. To determine the contrast threshold a large region of the screen
is illuminated to a particular background luminance level. Before testing begins, the
observer fixates the centre of the screen until they are completely adapted to the
background level. On each trial a disk of light is flashed near the centre of fixation for
a few hundred milliseconds. The observer reports whether they see the disk or not. If
the disk is not seen its intensity is increased on the next trial. If it is seen, its intensity
is decreased. In this way, the detection threshold for the target disk against the
background is measured.
Figure 15 – T.v.i. data
As the luminance of the background in a detection threshold experiment is increased
from zero, the luminance difference between target and background required for
detection increases in direct proportion to the background luminance.
Plotting the detection threshold against the corresponding background luminance
gives a threshold versus intensity (t.v.i.) function as shown in Figure 15.
Over a wide middle range covering 3.5 log units of background luminance the
function is linear; this relationship can be described by the equation ΔL = kL and is
known as Weber's law (Riggs 1971). Weber's law behaviour is
indicative of a system that has constant contrast sensitivity, since the proportional
increase in threshold with increasing background luminance corresponds to a
luminance pattern with constant contrast.
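For illustration, the t.v.i. curve can be approximated by a simple piecewise function of this shape; the constants below are placeholders rather than fitted psychophysical data:

    def tvi(adaptation_lum, k=0.01, floor=1e-4):
        """Illustrative threshold-versus-intensity function (cd/m^2):
        a constant absolute threshold in the dark, and Weber's-law
        behaviour (deltaL = k * L) over the middle range."""
        return max(floor, k * adaptation_lum)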
4.2 Implementation of the Algorithm
Ward’s contrast preserving algorithm (modified later by Ferwerda et al.) tries to
match the JND (Just Noticeable Difference) in contrast in the real scene and on the
monitor. To do this, he assumes that the display luminance (Ld) can be calculated as
the world luminance (Lw) multiplied by some value m:

Ld(Lw) = m · Lw    (Eq 1)
The multiplier m is chosen so that the visibility in the real scene matches the visibility
on the display device. To achieve this we can assume that we have a tvi function (see
Figure above) that gives the threshold of visibility at a given adaptation level (in
luminance). Therefore, writing m as a function of the world adaptation level (Lwa) and
the display adaptation level (Lda):

t(Lda) = t(Lwa) · m(Lwa, Lda)    (Eq 2)

so

m(Lwa, Lda) = t(Lda) / t(Lwa)    (Eq 3)
From the graph above (Figure 15), which is based on accepted psychophysical data,
we can obtain a threshold of visibility for both the display observer and the world
observer. Therefore, if we know the display and world adaptation luminances, a
threshold of visibility can be determined, and this in turn determines the
multiplicative factor m. Because m is a multiplier, contrast is maintained between the
input and output images (multiplying or dividing luminances preserves contrast). The
mapping produced by this approach to matching visibility between display and world
luminances is shown graphically below:
[Plot: mapped output plotted against World Luminance (0–1000)]
Figure 16 – Mapping caused by different values of m
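Under these definitions, the global mapping can be sketched in a few lines, reusing the illustrative tvi function from Section 4.1 (again a sketch, not our production implementation):

    def tvi(lum, k=0.01, floor=1e-4):
        # Illustrative t.v.i. as sketched in Section 4.1.
        return max(floor, k * lum)

    def tone_map_global(lum_world, world_adaptation, display_adaptation):
        """Eq 3 gives one multiplier for the whole image; Eq 1 applies it."""
        m = tvi(display_adaptation) / tvi(world_adaptation)   # Eq 3
        return lum_world * m                                  # Eq 1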
Although this approach can be considered perceptual in the sense that it is contrast
preserving, the operator only works over a limited dynamic range. The operator requires
threshold levels to be determined for the display and world adaptation levels; this
ensures that contrast is preserved and maps the world adaptation level to the
mid-luminance displayable on the monitor. This suffices for all scene luminance values
that lie within a limited range of the world adaptation level; very high values,
however, will still be too bright even after being multiplied by a small m. This is
illustrated in Figure 17.
[Plot “Perceptual TMO”: Display L (cd/m2, 0–120) against World L (cd/m2, 0–700)]
Figure 17 – High luminances are clamped. This can be seen in the loss of detail in the
window.
As can be seen from the above graph, clamping will occur if the dynamic range is
too large. This is the main weakness of the operator as it stands. One way to solve this
problem is to use a sigmoid (S-shaped) function, which allows huge
luminances to be mapped to the small range allowed by CRT monitors. This method gives
very good results in terms of visibility, but it can distort contrasts in the scene to
achieve this, making the operator a poor representation of reality. If the
correct sigmoid is chosen, however, a large part of the range can be mapped whilst
mostly preserving contrast. One reason why the sigmoid may not be
ideal is that it gives very little control over the shape of the curve: only changes in
the gradient can be made. This is illustrated in Figure 18 below:
Figure 18 – Sigmoid Transfer Functions with different gradients
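One possible sigmoid of this kind, sketched below, works in log luminance and exposes a single gradient parameter; the functional form is our illustration, not necessarily the exact curve we tested:

    import numpy as np

    def sigmoid_map(lum_world, mid_lum, gradient=1.0, display_max=100.0):
        """S-shaped transfer curve in log space, centred on the adaptation
        (mid) luminance and saturating at the display maximum (cd/m^2).
        The gradient steepens or flattens the curve but cannot reshape it."""
        x = gradient * (np.log10(lum_world + 1e-6) - np.log10(mid_lum))
        return display_max / (1.0 + np.power(10.0, -x))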
After testing this approach, it was decided that more control over the shape of the
curve would give better results whilst still preserving contrast in most pixels.
This led to the use of a spline as the transfer function. We can set a series of control
points, which define the general shape of the transfer function, and subsequently
determine the curve that passes through these points. This is shown in Figure 19.
Figure 19 – Functions passing through control points
In Figure 19, we have determined three functions that pass through the control points
specified for each scene. Once these curves have been obtained, the mapping can be
thought of as selecting the correct curve according to the input value. This means
finding a single curve (the spline) from the three individual curves, as shown below:
Figure 20 –Transfer Functions originated from different control points
Determining the control points is very important, as different choices will generate
different splines.
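A sketch of this spline stage is given below; we use a monotone (PCHIP) interpolant through control points in log luminance so that luminance ordering is never reversed. The interpolant choice and the control-point values are illustrative assumptions:

    import numpy as np
    from scipy.interpolate import PchipInterpolator

    def spline_transfer(control_world, control_display):
        """Build a transfer function through user-chosen control points.
        A monotone interpolant keeps the curve order-preserving."""
        log_w = np.log10(np.asarray(control_world, dtype=float))
        spline = PchipInterpolator(log_w, control_display)
        def tone_map(lum_world):
            x = np.clip(np.log10(lum_world + 1e-6), log_w[0], log_w[-1])
            return spline(x)
        return tone_map

    # Illustrative control points mapping world cd/m^2 to display cd/m^2.
    tm = spline_transfer([0.01, 1.0, 100.0, 10000.0], [0.5, 20.0, 80.0, 100.0])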
The mapping is essentially the same as the mapping for the sigmoid curve described
above; however, with this method we have more control over the function at the
high and low ends of the range. Some images produced with this operator are displayed
below.
Figure 21– The original scene (left) and tone mapped with the global model (right)
Figure 22– The global model with two more scenes
4.3 A local approach
Although the operator produces decent results in most cases, it fails in those
circumstances where a scene contains both very high and very low luminance values. It
will still produce a visually pleasing image, but contrast will not be preserved at
most pixels. A local model should solve this problem. The operator is based on the
fact that the eye adapts locally rather than globally, which allows us to introduce a
local version of the mapping (cf. Eq 1 and Eq 3):

Ld = m(x, y) · Lw    (Eq 4)
The task is to determine a different m for different areas of the image. If we convolve
the input image with a low-pass filter (a Gaussian function), we can simulate the
locality of adaptation. A low-pass Gaussian filter can be represented by the following
function, where σ is the radius of the filter, determining the locality of adaptation:

G(x, y) = (1 / (2πσ^2)) · exp(−(x^2 + y^2) / (2σ^2))    (Eq 5)
Convolving the input image with this filter results in a blurred version of the original
image as shown below:
Figure 23 – Adaptation Map
The resulting blurred image is then used as an “adaptation map”, from which different
adaptation levels can be determined, producing per-pixel values of m(x, y).
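Putting Eq 4 and Eq 5 together, the local stage can be sketched as follows, again using the illustrative t.v.i. from Section 4.1; the filter radius below is an assumption:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def tvi_arr(lum, k=0.01, floor=1e-4):
        # Array-friendly version of the illustrative t.v.i. function.
        return np.maximum(floor, k * lum)

    def tone_map_local(lum_world, display_adaptation, sigma=25.0):
        """Eq 4: Ld(x,y) = m(x,y) * Lw(x,y). The Gaussian-blurred image
        (Eq 5) is the adaptation map from which per-pixel m values follow."""
        adaptation_map = gaussian_filter(lum_world, sigma)  # locality set by sigma
        m = tvi_arr(display_adaptation) / tvi_arr(adaptation_map)
        return lum_world * m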
4.4 Introducing Human Visibility Effects
When humans view an environment, the way it is perceived is greatly
affected by the luminances in the scene. When going from a bright environment to a
dark one (and vice versa), the scene that we perceive can have a very different visual
appearance. This phenomenon is known as adaptation. The process, however, is time
dependent and can vary from a few seconds to several minutes. The time course of
dark/light adaptation is well known. We are planning to develop our TMO further to
include this effect, since we think it would be a very valuable tool. If the lighting
in a scene changes dramatically, the representation of the image should change too;
however, this should take time and not be instantaneous.
Another important aspect of human vision that we would like to take into account is
the loss of colour and visual acuity at low light levels (scotopic vision). When the
light is dim enough, the only photoreceptors active in our eyes are the rods. These
receptors have very poor acuity and therefore cannot resolve fine detail, which is
why we find it hard to read at low light levels. Another property of scotopic
vision is that we have no perception of colour, because the rod system is incapable
of detecting colour.
All the effects mentioned can greatly affect the perception of a scene. Thus, if we
want to develop a perceptual operator, these properties should be taken into account
and included in the model. During the next 12 months we would like to concentrate
on these aspects of vision.
5 Validation of Tone Mapping Operators
As part of ARIS we are not only interested in developing a perceptual operator, which
takes into account human visibility, but we are also very interested in validating this
operator and other published ones against reality. It is in fact very important that the
operator gives a high-fidelity result, so that we can use it with confidence, knowing
that its output is accurate.
At the University of Bristol, we have recently purchased a High Dynamic Range
monitor capable of a 30,000:1 dynamic range, which is 300 times more than
standard monitors (or paper) are capable of. It can also display very
low luminance values, which is ideal in those circumstances where many shadows are
present in the scene. The range that can be achieved on this monitor is much closer to
what can occur in the real world: although the dynamic range of a real scene can
exceed 100,000:1, most scenes rarely reach such a huge range. Having
such a device allows us to display high dynamic range photographs without the need
for any advanced tone mapping technique, because in most scenes the monitor is
capable of displaying all the values directly.
It is possible that in the future these monitors could become the standard (reducing
the role of tone mapping techniques), but this is unlikely since for most applications
they are simply too bright and impractical. However, we believe that they can be very
useful for validation purposes, because we can use them as a reference for reality.
Thus validating tone mapping algorithms becomes a much more practical and feasible
procedure.
The validation will first be in the form of a pilot study with a few scenes and a fairly
small number of participants. The purpose of the pilot study is to get an overall idea of
what the results will be. The pilot study is also very important because it will give us
the chance to notice any errors before the lengthy validation begins. The main
validation will take place in December 2003 and will last two to three weeks.
The idea of the validation is novel, and we believe that it is very important. In the
past 10 years many algorithms have been published, some perceptual and some not;
however, none of them have been rigorously validated. There are many operators, but
no one knows which algorithms perform best and under what conditions. Hopefully the
results will be interesting and useful for the High Dynamic Range/Tone Mapping
community. The absence of validation experiments is one of the main reasons why TMOs
are still very difficult to use across all kinds of scene: an algorithm may perform
well on a specific type of scene, and sometimes manual adjustments have to be made in
order to obtain a decent result. We believe that an accurate validation will give the
community some answers and will help us to develop a unique TMO which is, hopefully,
both automatic and realistic.
The validation will consist of different parts: we will be asking participants not
only specific questions (using questionnaires) but also to compare contrast
charts of the kind widely used in optometry. Two examples of the charts used
are illustrated below; in these cases the viewers may be asked to compare the
visibility of certain letters or circles in the scene displayed on the HDR monitor and
in the tone-mapped image on a standard CRT display.
Figure 24: Examples of contrast charts used for the validation
We cannot simply ask participants to observe a series of images and determine which
one is better. We believe that it is more important to analyse the results produced by
the different TMOs at a lower level. We will try to investigate the following:
• Naturalness
• Visibility
• Colour Accuracy
• Brightness
• Contrast
The validation will be a 3-way comparison between two algorithms (at any one time)
and the HDR Monitor.
From the validation we hope to understand the weaknesses and strengths of some of
the most popular algorithms. We believe that with this knowledge, and knowing the
mathematical functions they are based on, we should be able to develop an operator
which combines the strengths of the successful TMOs and is as universal and realistic
as possible.
Figure 25: Some images used for the validation
6 References
[1] Chiu K., Herf M., Shirley P., Swamy S., Wang C., Zimmerman K. 1993. Spatially
Nonuniform Scaling Functions for High Contrast Images. Proceedings Graphics
Interface ’93, 245-254.
[2] Debevec P.E. and Malik J. 1997. Recovering High Dynamic Range Radiance Maps
from Photographs. Proceedings SIGGRAPH 97, 369-378.
[3] Ferwerda J., Pattanaik SN., Shirley P. and Greenberg DP. 1996. A Model of
Visual Adaptation for Realistic Image Synthesis. In Proceedings of SIGGRAPH 1996,
ACM Press / ACM SIGGRAPH, New York. H. Rushmeier, Ed., Computer Graphics
Proceedings, Annual Conference Series, ACM, 249-258.
[4] Hunt R.W.G. The Reproduction of Colour. Kings Langley: Fountain Press, 3rd
edition, 1975. (5th edition is now available.)
[5] Mitsunaga T. and Nayar S.K. 1999. Radiometric Self Calibration. Proc. IEEE
Conference on Computer Vision and Pattern Recognition.
[6] Pattanaik SN., Tumblin JE.,Yee H. and Greenberg DP. 2000 Time-Dependent
Visual Adaptation for Realistic Real-Time Image Display, In Proceedings of
SIGGRAPH 2000, ACM Press / ACM SIGGRAPH, New York. K. Akeley, Ed.,
Computer Graphics Proceedings, Annual Conference Series, ACM, 47-54.
[7] Pattanaik SN., Ferwerda J, Fairchild MD., and Greenberg DP. 1998 A Multiscale
Model of Adaptation and Spatial Vision for Realistic Image Display, in Proceedings
of SIGGRAPH 1998, ACM Press / ACM SIGGRAPH, New York. M. Cohen, Ed.,
Computer Graphics Proceedings, Annual Conference Series, ACM, 287-298.
[8] Rahman Z., Jobson D.J. and Woodell G.A. 1996. Multi-scale retinex for colour
image enhancement. In Proceedings, International Conference on Image Processing,
volume 3, 1003-1006. Held in Lausanne, Switzerland, 16-19 September 1996.
[9] Schlick C. 1995. Quantization Techniques for High Dynamic Range Pictures. In
G. Sakas, P. Shirley and S. Mueller (eds), Photorealistic Rendering Techniques,
Berlin: Springer-Verlag, 7-20.
[10] Spencer G., Shirley P., Zimmerman K. and Greenberg D.P. 1995. Physically-based
glare effects for digital images. In SIGGRAPH 95, Annual Conference Series,
325-334, New York, NY. ACM SIGGRAPH, Addison-Wesley. Held in Los Angeles, CA,
6-11 August 1995.
[11] Stevens S.S. and Stevens J.C. 1960. Brightness function: Parametric effects of
adaptation and contrast. Journal of the Optical Society of America, 50(11):1139,
November 1960. Program of the 1960 Annual Meeting.
[12] Tumblin J., Rushmeier H. 1993. Tone Reproduction for Realistic Images. IEEE
Computer Graphics and Applications, 13(6), 42-48.
[13] Tumblin J., Hodgins J. and Guenter B. 1997. Display of High Contrast Images
Using Models of Visual Adaptation. Visual Proceedings SIGGRAPH 97, 154.
[14] Tumblin J., Hodgins J.K. and Guenter B. 1999. Two Methods for Display of
High Contrast Images. ACM Transactions on Graphics, 18(1), 56-94.
[15] Ward G. 1994. A contrast-based scalefactor for luminance display. In Graphics
Gems IV, P. Heckbert, Ed. Academic Press, Boston, 415-421.
[16] Ward G. and Shakespeare R. 1997. Rendering with Radiance: The Art and
Science of Lighting Visualisation. Morgan Kaufmann Publishers.
[17] Ward Larson G., Rushmeier H., Piatko C. 1997. A Visibility Matching Tone
Reproduction Operator for High Dynamic Range Scenes. IEEE Transactions on
Visualization and Computer Graphics, Vol. 3, No. 4.