Development of a Perceptual Tone Mapping Operator
Workpackage 4 (Task 4.2)
Deliverable 4.1a
Patrick Ledda
University of Bristol

1 Introduction
2 Generating High Dynamic Range Imaging
3 Tone Mapping
3.1 Tone Mapping Operators
3.2 Local operators
3.3 Global operators
3.4 Perceptual vs. Non-Perceptual
4 A Perceptual TMO for ARIS
4.1 Threshold versus Intensity studies
4.2 Implementation of the Algorithm
4.3 A local approach
4.4 Introducing Human Visibility Effects
5 Validation of Tone Mapping Operators
6 References

1 Introduction

The natural world presents our visual system with a wide range of colours and intensities. A starlit night has an average luminance level of around 10^-3 candelas per square metre (cd/m^2), while daylight scenes are close to 10^5 cd/m^2. Humans can see detail in regions that vary by 1:10^4 at any given adaptation level, beyond which the eye is swamped by stray light (i.e., disability glare) and details are lost. Modern camera lenses, even with their clean-room construction and coated optics, cannot rival human vision when it comes to low flare and absence of multiple paths ("sun dogs") in harsh lighting environments. Even if they could, conventional negative film cannot capture much more range than this, and most digital image formats do not even come close. With the possible exception of cinema, there has been little push for greater dynamic range at the image capture stage, because common displays and viewing environments limit the range of what can be presented to about two orders of magnitude between minimum and maximum luminance. A well-designed CRT monitor may do slightly better than this in a darkened room, but its maximum luminance is only around 100 cd/m^2, which does not begin to approach daylight levels. A high-quality xenon film projector may get a few times brighter than this, but is still two orders of magnitude away from the optimal light level for human acuity and colour perception.

Figure 1 – Range of luminances in the real world compared to RGB

With the spread of global illumination rendering, images with huge dynamic ranges have become increasingly common. Dealing with such values requires new file formats and, more importantly, devices able to display such a range. The first requirement has been met by the development of High Dynamic Range (HDR) file formats, which store the images more efficiently than keeping three floating-point numbers per RGB pixel. The RGBE file format [16], for example, requires only 32 bits per pixel to store the full luminance information. Unfortunately, it is still practically impossible to display these luminances on standard devices such as CRT monitors or printers. So how can the appearance of extremes of light and shadow be reproduced using only the tiny range of available display outputs? Appearance-preserving transformations from scene to display, or tone reproduction operators, can solve this problem; they were first described in the computer graphics literature by Tumblin and Rushmeier [12], as shown in Figure 2.

Figure 2 – Simple diagram of Tone Mapping

2 Generating High Dynamic Range Imaging

Most computer graphics software works in a 24-bit RGB space, with 8 bits allocated to each of the three primaries. The advantage of this is that no tone mapping is required and the result can be accurately reproduced on a standard CRT. The disadvantage is that colours outside the sRGB gamut cannot be represented (especially very light or dark ones).
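As a brief aside on storage, the following is a minimal sketch of the shared-exponent scheme behind the RGBE encoding mentioned in the introduction: an 8-bit mantissa per channel plus one common 8-bit exponent, giving 32 bits per pixel. The function names are ours, for illustration only.

```python
import math

def float_to_rgbe(r, g, b):
    """Pack three floating-point primaries into 4 bytes (RGBE)."""
    v = max(r, g, b)
    if v < 1e-32:                       # too dark to represent: store zeros
        return (0, 0, 0, 0)
    mantissa, exponent = math.frexp(v)  # v = mantissa * 2**exponent, mantissa in [0.5, 1)
    scale = mantissa * 256.0 / v        # one common scale for all three channels
    return (int(r * scale), int(g * scale), int(b * scale), exponent + 128)

def rgbe_to_float(r8, g8, b8, e8):
    """Unpack 4 RGBE bytes back to floating-point primaries."""
    if e8 == 0:
        return (0.0, 0.0, 0.0)
    f = math.ldexp(1.0, e8 - (128 + 8))  # 2**(exponent - 8), undoing the packing
    return (r8 * f, g8 * f, b8 * f)

# A pixel of ~5000 cd/m^2 survives the round trip to within mantissa precision:
print(rgbe_to_float(*float_to_rgbe(5000.0, 3000.0, 1000.0)))
```

The shared exponent is what keeps the format compact: absolute luminance can span dozens of orders of magnitude while each channel still only needs 8 bits of precision relative to the brightest primary.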
So, how can images with luminances that resemble reality be created? There are two main methods for generating HDR imagery. The first is to use physically based renderers, which produce high dynamic range images covering essentially all visible colours. The other is to photograph a scene at several different exposure times [2]. By taking a series of photographs at different exposures, all the luminances in the scene can be captured, as shown in Figure 3a. After the images have been aligned and combined into a single image, the camera's response function for each of the RGB channels can be recovered and stored.

Figure 3 (a,b) – Taking pictures at multiple exposure times and camera response function.

Once the response function is known (Figure 3b), an HDR photograph of a scene can be assembled quickly from a few different exposures (most digital cameras have an auto-bracketing function that takes a burst of photographs at different exposure times).
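A minimal sketch of this combination step, assuming the recovered response curve has already been applied so that pixel values are linear: each exposure is divided by its exposure time, and the per-exposure estimates are blended with a hat weight that distrusts under- and over-exposed pixels. The weighting choice follows common practice rather than a specific published recipe.

```python
import numpy as np

def merge_exposures(images, exposure_times):
    """Merge linearised LDR exposures (floats in [0, 1]) into one radiance map."""
    num = np.zeros_like(images[0], dtype=np.float64)
    den = np.zeros_like(images[0], dtype=np.float64)
    for img, dt in zip(images, exposure_times):
        w = img * (1.0 - img)            # hat weight: zero at both extremes
        num += w * (img / dt)            # radiance estimate from this exposure
        den += w
    return num / np.maximum(den, 1e-8)   # avoid division by zero

# e.g. three bracketed exposures at 1/60 s, 1/15 s and 1/4 s:
# hdr = merge_exposures([im1, im2, im3], [1/60, 1/15, 1/4])
```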
3 Tone Mapping

Tone-mapping algorithms rely on observer models that mathematically transform scene luminances into the visual sensations experienced by a human observer viewing the scene, estimating the brain's own visual assessments. A Tone Mapping Operator (TMO) tries to match the outputs of one observer model applied to the scene to the outputs of another observer model applied to the desired display image. Tumblin and Rushmeier were the first to bring the issue of tone mapping to the computer graphics community. They offered a general framework for tone reproduction operators by concatenating a scene observer model with an inverse display observer model; when properly constructed, such operators should guarantee that the displayed image is veridical: it causes the display to recreate the visual appearance of the original scene, showing no more and no less visual content than would be discernible if the original scene were actually present. Unfortunately, visual appearance is still quite mysterious, especially for high-contrast scenes, making precise and verifiable tone reproduction operators difficult to construct and evaluate. Appearance, the ensemble of visual sensations evoked by a viewed image or scene, is not a simple one-to-one mapping from scene radiance to perceived radiance; it is the result of a complex combination of sensations and judgements, a set of well-formed mental estimates of scene illumination, reflectance, shapes, objects and positions, material properties, and textures. Though all these quantities are directly measurable in the original scene, the mental estimates that make up visual appearance are not.

The most troublesome task of any basic tone reproduction operator is detail-preserving contrast reduction. The Human Visual System (HVS) copes with large dynamic ranges through a process known as visual adaptation. Local adaptation, the ensemble of local sensitivity-adjusting mechanisms in the human visual system, reveals visible details almost everywhere in a viewed scene, even when they are embedded in regions of very high contrast. Although most sensations that humans perceive from scene contents, such as reflectance, shape, colour and movement, can be directly evoked by the display outputs, large contrasts cannot. As shown in Figure 4, high contrasts must be drastically reduced for display, yet must somehow retain a high-contrast appearance while keeping visible in the displayed image all the low-contrast details and textures revealed by local adaptation processes.

Figure 4 – The range of luminances cannot be reproduced on a CRT monitor

Several factors make the tone mapping problem hard to solve. The most obvious, as mentioned above, is that the contrast ratio a standard CRT monitor can produce is only about 100:1, far smaller than what can occur in the real world. Newspaper photographs achieve a maximum contrast of about 30:1; the best photographic prints can provide contrasts as high as 1000:1. In comparison, scenes that include visible light sources, deep shadows and highlights can reach contrasts of 100,000:1. Another reason tone mapping operators fail in some cases is that the simplest ways to adjust scene intensities for display usually reduce or destroy important details and textures.

3.1 Tone Mapping Operators

In the past decade quite a few authors have developed tone mapping operators to display HDR imagery. These algorithms can all be classified into two main categories: spatially uniform (global) and spatially varying (local). This is shown in Figure 5 below.

Figure 5 – List of TM operators

3.2 Local operators

Humans are able to view high-contrast scenes thanks to local sensitivity control in the retina. This suggests that a position-dependent scale factor might reduce scene contrasts acceptably and allow them to be displayed on a low dynamic range device. This approach converts the original scene or real-world intensities to the displayed image intensities using a position-dependent multiplying term.

Chiu et al. [1] addressed the problem of global visibility loss by scaling luminance values based on a spatial average of luminances in pixel neighbourhoods. Very dark or bright areas are not clamped (as in the very first models) but are scaled according to their spatial location. Their approach provides excellent results on smoothly shaded portions of an image; however, any small bright feature in the image will cause strong attenuation of the neighbouring pixels, surrounding the feature or high-contrast edge with a noticeable dark band or halo. The error is conspicuous because the human eye is very sensitive to variation at high spatial frequencies.

Figure 6 – Contrast reversal causes halo artefacts

Schlick [9] followed the work proposed by Chiu, and his algorithm also suffered from similar halo artefacts. Schlick used a first-degree rational polynomial function to map high-contrast scene luminances to display system values. This function works well when applied uniformly to each pixel of a high-contrast scene, and is especially good for scenes containing strong highlights. He then attempted to mimic local adaptation by locally varying a mapping function parameter; one such method caused halo artefacts. Schlick concentrated mainly on efficiency and simplicity rather than on improving the method mentioned above.

Figure 7 – Top images show the dynamic range, bottom is the tone mapped image (Schlick's operator)
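Before moving on, here is a minimal sketch of the neighbourhood-based scaling these operators build on, in the spirit of Chiu et al. but with our own simplified weighting: each pixel is divided by a blurred copy of the luminance image. This compresses large-scale contrast, but, exactly as described above, a small bright feature inflates the local average around it and darkens its neighbours, producing halos.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_scale_tonemap(lum, sigma=30.0, strength=0.8):
    """Spatially varying scaling: divide by a local luminance average.

    lum      -- 2-D array of scene luminances (cd/m^2)
    sigma    -- radius of the neighbourhood, in pixels
    strength -- 0 gives no compression, 1 divides fully by the local average
    """
    local_avg = gaussian_filter(lum, sigma)           # spatial average around each pixel
    # Bright features raise local_avg nearby, attenuating neighbours: the halo source.
    scale = 1.0 / np.power(local_avg + 1e-6, strength)
    display = lum * scale                             # position-dependent multiplier
    return display / display.max()                    # normalise into [0, 1]
```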
Rahman et al. [8] devised a full-colour local scaling and contrast reduction method using a multiscale version of Land's "retinex" theory of colour vision. Retinex theory estimates scene reflectances from the ratios of scene intensities to their local intensity averages. Jobson, Rahman and colleagues also use Gaussian low-pass filtering to find local multiplying factors, making their method susceptible to halo artefacts. They divide each point in the image by its low-pass filtered value, then take the logarithm of the result to form a reduced-contrast "single-scale retinex". To further reduce halo artefacts they construct a "multiscale retinex" from a weighted sum of three single-scale retinexes, each computed with a different filter kernel size, then apply scaling and offset constants to produce the display image. These and other constants give excellent results for the wide variety of 24-bit RGB images used to test their method, but it is unclear whether these robust results extend to floating-point images whose maximum contrasts can greatly exceed 255:1.

Pattanaik et al. [7] proposed a tone reproduction algorithm that takes into account representations of pattern, luminance and colour processing in the Human Visual System. The model accounts for changes of perception at threshold and suprathreshold levels of brightness. The algorithm also handles chromatic adaptation as well as luminance adaptation (see Figure 8). It does not, however, include any model of adaptation over time.

Figure 8 – Colour sensitivity and visual acuity at different luminance levels

More recently Reinhard [] proposed an operator based on photographic practice, using the "zone system", which divides scene luminances into eleven printing zones running from black (zone 0) to white (zone 10). A luminance reading for a middle grey is taken and assigned to zone 5, and the dynamic range is captured by reading light and dark regions. The operator first applies a scaling to the entire image to reduce the dynamic range, and then locally modifies the contrast of some regions, lightening or darkening them to improve overall visibility; this local stage simulates the "dodging and burning" techniques used in photography. A purely global version of the operator also exists.

Figure 9 – Two scenes mapped with Reinhard's operator

3.3 Global operators

Most imaging systems do not imitate local adaptation. Instead, almost all image synthesis, recording and display processes use an implicit normalising step to map the original scene intensities to the target display intensities without disturbing any scene contrasts that fall within the range of the display device. This normalising consists of a single constant multiplier. Image normalising has two important properties: it preserves all reproducible scene contrasts, and it discards the intensities of the original scene or image. Contrast, the ratio of any two intensities, is not changed if the same multiplier scales both intensities. Normalising implicitly assumes that scaling does not change appearance, as if all the perceptually important information were carried by the contrasts alone, yet scaling display intensities can strongly affect a viewer's estimates of scene contrasts and intensities. Although this scaling is not harmful for many well-lit images or scenes, discarding the original intensities can make two scenes with very different illumination levels appear identical. Normalising also fails to capture dramatic appearance changes at the extremes of lighting, such as the gradual loss of colour vision, changes in acuity, and changes in contrast sensitivity.
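A sketch of this implicit normalising step (our illustration): a single multiplier maps the brightest scene luminance to the display maximum, so every reproducible contrast is preserved but all information about the absolute level of illumination is thrown away.

```python
import numpy as np

def normalise(lum, display_max=100.0):
    """Map scene luminances to display luminances with one constant multiplier."""
    m = display_max / lum.max()        # single scale factor for the whole image
    return lum * m

# A sunlit scene and the same scene at dusk (1000x darker) normalise to
# identical displayed images -- the absolute illumination level is discarded:
day = np.array([[20000.0, 400.0], [4000.0, 40.0]])
dusk = day / 1000.0
print(np.allclose(normalise(day), normalise(dusk)))   # True
```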
Tumblin and Rushmeier [12] tried to capture some of these light-dependent changes in appearance by describing a "tone reproduction operator", built from models of human vision, to convert scene intensities to display intensities. They offered an example operator based on the suprathreshold brightness measurements of Stevens and Stevens [11], who claimed that an elegant power-law relation exists between luminance, adaptation luminance and perceived brightness. Tumblin and Rushmeier were the first to draw attention to the tone reproduction problem. They used the results of Stevens and Stevens to try to preserve brightness in a scene. The method had some significant limitations, however: images or scenes that approach total darkness are displayed as anomalous middle-grey images instead of black, and display contrasts for very bright scenes (>100 cd/m^2) are unrealistically exaggerated.

Soon afterwards Ward [15] presented a much simpler approach to appearance modelling that also provided a better way to make dark scenes appear dark and bright scenes appear bright on the display. The idea behind this operator is to preserve visibility: the smallest perceptible luminance difference in the real scene corresponds to the smallest perceptible difference in the displayed image.

Ferwerda et al. [3] offered an extended appearance model for adaptation that captured several of its most important visual effects. The operator accounts for the transition between achromatic night vision and chromatic day vision by modelling the gradual handover from cone-mediated daylight vision to rod-mediated night vision, and so captures the change in colour sensitivity and acuity as a function of scene intensity. Like Ward, they converted original scene or image intensities to display intensities with a multiplicative scale factor, but they determined their multiplier values from a smooth blending of increment threshold data for both rods and cones in the retina, as shown in Figure 8. The method also provides a simple way to mimic the time course of adaptation for both dark-to-light and light-to-dark transitions (Figure 10).

Figure 10 – Ferwerda's model

More recently Ward et al. [17] published an impressively comprehensive tone reproduction operator based on iterative histogram adjustment and spatial filtering. Their operator reduces high scene contrasts to match display abilities, and also ensures that contrasts that exceed human visibility thresholds in the scene remain visible on the display (Figure 11). They model some foveally dominated local adaptation effects, yet completely avoid halo artefacts and other forms of local gradient reversal, and include locally adapted models of glare, colour sensitivity and acuity similar to those used by Ferwerda et al. [3].

Figure 11 – Two scenes mapped with Ward's operator
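A much simplified sketch of the histogram-adjustment idea (ours; the published operator additionally caps the slope of the mapping so that displayed contrast never exceeds what a human observer could actually see in the scene): the cumulative histogram of log luminances becomes the transfer curve, so heavily populated luminance ranges receive most of the display range.

```python
import numpy as np

def histogram_adjust(lum, bins=100, display_min=1.0, display_max=100.0):
    """Naive histogram adjustment on log luminance (no visibility ceiling)."""
    blum = np.log10(lum + 1e-6)                       # work in log luminance
    hist, edges = np.histogram(blum, bins=bins)
    cdf = np.cumsum(hist) / hist.sum()                # cumulative distribution in [0, 1]
    # Transfer curve: log display luminance follows the cumulative histogram.
    ld_log = np.log10(display_min) + cdf * (np.log10(display_max) - np.log10(display_min))
    centers = 0.5 * (edges[:-1] + edges[1:])
    return 10.0 ** np.interp(blum, centers, ld_log)   # display luminances, cd/m^2
```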
In 1999 Tumblin et al. [14] proposed two methods for displaying high-contrast images on low dynamic range displays by imitating some of the human visual system's properties. The first, based on an HVS-inspired layering model, separates the image into lighting and surface-property layers. The algorithm aims to preserve scene visibility by scaling and compressing the lighting layers while preserving the reflectance and transparency layers. The main limitation of this process is that it only works with rendered images, where all the layer information can be retrieved during the rendering process. Tumblin's second method, known as the foveal method, interactively adjusts detail visibility in the foveal area whilst compressing other parts of the image. The user can click on any area of an image, and the algorithm tone maps the surrounding area based on the local luminance levels.

A more recent operator was proposed by Pattanaik et al. [6]. This time-dependent algorithm can tone map both static and dynamic images (photographs or rendered) and is based on a perceptual model proposed by Tumblin and Rushmeier. It also includes an eye adaptation model to represent lightness and colour.

Figure 12 – Pattanaik model of adaptation

This operator is notable in that it accepts a wide variety of scenes and luminance levels and takes various adaptation factors into account. The human visual properties it uses are drawn from widely accepted colour science and psychology literature, making the operator well suited to dynamic scenes. Using Hunt's [4] colour model for static vision, it includes time-dependent effects such as neural response and bleaching. Its main limitation, however, is that it does not include local eye adaptation, which is very important for faithfully representing visual appearance.

Recently Drago et al. implemented a simple method based on logarithmic compression of luminance values, imitating the human response to light. A bias power function is introduced to adaptively vary the logarithm bases, resulting in good preservation of detail and contrast. This adaptive logarithmic mapping is capable of producing perceptually tuned images with high dynamic content and works at interactive speed. Although it is difficult to see from the images below, one of the main problems with this operator is that it appears to reduce contrast in the image (Figure 13).

Figure 13 – Two scenes mapped with Drago's operator

A few other computer graphics researchers have modelled the appearance of extremely bright, high-contrast scene features by adding halos, streaks and blooming effects to create the appearance of intensities well beyond the abilities of the display. Nakamae et al. proposed that the star-like streaks seen around bright lights at night are partly due to diffraction by eyelashes and the pupil, and presented a method to calculate these streaks in RGB units, implicitly normalising them for display. Later, Spencer et al. [10] presented an extensive summary of the optical causes and visual effects of glare, and modelled its appearance by applying several adjustable low-pass filters to the intensities of the original scene (Figure 14). Small, extremely bright light sources that cover only a few pixels, such as street lights at night or the sun leaking through a thicket of trees, are expanded into large, faintly coloured, glare-like image features with a convincing and realistic appearance.

Figure 14 – Discomfort glare simulated
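A toy version of this low-pass glare idea (ours, far simpler than Spencer et al.'s psychophysically derived filters): luminance above a brightness threshold is spread by a wide Gaussian and added back, so a few saturated pixels bloom into a soft, convincing glow.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def add_bloom(lum, threshold=1000.0, sigma=15.0, gain=0.3):
    """Spread energy from very bright pixels into a glare-like halo.

    lum       -- 2-D scene luminance array (cd/m^2)
    threshold -- luminance above which a pixel is treated as a glare source
    sigma     -- width of the low-pass (Gaussian) glare filter, in pixels
    gain      -- fraction of the excess energy redistributed as bloom
    """
    excess = np.maximum(lum - threshold, 0.0)   # only glare sources contribute
    halo = gaussian_filter(excess, sigma)       # low-pass filtered glare field
    return lum + gain * halo                    # original image plus bloom
```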
Despite progress in modelling the light-dependent changes in appearance that occur over the entire range of human vision, few methods offer the substantial contrast reduction needed to display these images without truncation or halo artefacts.

3.4 Perceptual vs. Non-Perceptual

Some algorithms use perceptual data, usually based on psychophysical experiments, to simulate reality; others simply attempt to compress the range by a purely mathematical approach, with the aim of obtaining maximum visibility on the display device. The latter approach can be useful if the TMO is used purely for visualisation, in which case displaying all the possible values can be satisfactory. However, wherever tone mapping tries to simulate reality, the algorithm should be based on perceptual data. A few of the published operators try to simulate human visibility by mathematically modelling properties of the HVS such as eye adaptation, colour visibility at photopic and scotopic light levels, and visual acuity.

One of the main limitations of tone mapping is that the displayed result is static. Although some algorithms take human visibility factors into account, only very few operators can modify the image dynamically based on human eye models. It is important to decide what the purposes of a particular algorithm are. If, for example, it is important to visualise all the luminance levels in a scene "in one go", then most operators satisfy this. However, if an eye simulation is required, then a dynamic model based on adaptation may be more accurate.

4 A perceptual TMO for ARIS

If we have a high dynamic range photograph of a scene (which will eventually be augmented with some artificial object), or a CG image, it has to be tone mapped in order to be displayed (or printed) on a monitor. As discussed in the previous section, there are different algorithms in the literature that accomplish this; however, very few try to give a perceptual match with the real scene. After implementing and testing several tone mapping operators, we concluded that a perceptual operator, one that takes Human Visibility effects into account, was needed. We also realised that none of the operators mentioned in the previous sections had been validated against reality. We believe this is a very important area that requires investigation.

Humans judge environments mainly on the basis of luminance contrast, so keeping the correct contrast between pixels of the input and output images is crucial if a perceptual TMO is to be developed. Developing a perceptual operator also allows us to include various human visibility effects such as time-course adaptation, colour sensitivity, and visual acuity at different light levels. This means that not only can we have an operator that maps high dynamic range images (or videos) perceptually, but we can also, to some extent, simulate other aspects of vision. We base our operator on Ward's global contrast-preserving algorithm, but we extend the model to support a wider range of luminances by making the operator map luminance locally. We have also started to include human visibility effects as well as a time-domain aspect.
4.1 Threshold versus Intensity studies

Visual sensitivity is often measured psychophysically in a detection threshold experiment. In the typical experimental paradigm, an observer is seated in front of a blank screen that fills their field of view. To determine the absolute threshold the screen is made dark; to determine the contrast threshold a large region of the screen is illuminated to a particular background luminance level. Before testing begins, the observer fixates the centre of the screen until they are completely adapted to the background level. On each trial a disk of light is flashed near the centre of fixation for a few hundred milliseconds, and the observer reports whether they saw it. If the disk is not seen, its intensity is increased on the next trial; if it is seen, its intensity is decreased. In this way the detection threshold for the target disk against the background is measured.

Figure 15 – T.v.i. data

As the luminance of the background in a detection threshold experiment is increased from zero, the luminance difference between target and background required for detection increases in proportion to the background luminance. Plotting the detection threshold against the corresponding background luminance gives a threshold-versus-intensity (t.v.i.) function, as shown in Figure 15. Over a wide middle range covering some 3.5 log units of background luminance the function is linear, and the relationship can be described by ΔL = kL. This relationship is known as Weber's law (Riggs 1971). Weber's law behaviour is indicative of a system with constant contrast sensitivity, since a threshold that grows in proportion to the background luminance corresponds to a luminance pattern of constant contrast.

4.2 Implementation of the Algorithm

Ward's contrast-preserving algorithm (modified later by Ferwerda et al.) tries to match the JND (Just Noticeable Difference) in contrast in the real scene and on the monitor. To do this, he assumes that the display luminance L_d can be calculated from the world luminance L_w multiplied by some value m:

L_d(L_w) = m · L_w    (Eq 1)

The multiplier m is chosen so that visibility in the real scene matches visibility on the display device. Assume we have a t.v.i. function t (see Figure 15) that gives the threshold of visibility at a given adaptation luminance. Then, for m as a function of the world adaptation level L_wa and the display adaptation level L_da:

t(L_da) = t(L_wa) · m(L_wa, L_da)    (Eq 2)

so

m(L_wa, L_da) = t(L_da) / t(L_wa)    (Eq 3)

From the graph above (Figure 15), which is based on accepted psychophysical data, we can obtain a visibility threshold for the display observer and for the world observer. Therefore, if we know the display and world adaptation luminances, the thresholds of visibility, and hence the multiplicative factor m, can be determined. Because m is a multiplier, contrast is maintained between the input and output images (multiplying or dividing luminances preserves contrast). The mapping produced by this approach to matching visibility between display and world luminances is shown graphically in Figure 16.

Figure 16 – Mapping caused by different values of m (display luminance plotted against world luminance)
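As a concrete sketch (ours), using Blackwell's threshold data in the analytic form adopted by Ward, t(L) ≈ 0.0594 (1.219 + L^0.4)^2.5, the multiplier of Eq 3 and the mapping of Eq 1 become a few lines of code. The constant 0.0594 cancels in the ratio; the adaptation choices below are illustrative.

```python
def tvi(la):
    """Blackwell-style t.v.i.: smallest visible luminance step at adaptation la."""
    return 0.0594 * (1.219 + la ** 0.4) ** 2.5

def ward_multiplier(lwa, lda):
    """Eq 3: m = t(L_da) / t(L_wa), matching JNDs between display and world."""
    return tvi(lda) / tvi(lwa)

def tonemap(lw, lwa, ld_max=100.0):
    """Eq 1: L_d = m * L_w, with the display observer adapted to half its maximum."""
    m = ward_multiplier(lwa, ld_max / 2.0)
    return min(lw * m, ld_max)           # values beyond the display range clamp

# World adaptation of ~400 cd/m^2 mapped onto a 100 cd/m^2 monitor:
for lw in (10.0, 100.0, 400.0, 5000.0):
    print(lw, "->", round(tonemap(lw, lwa=400.0), 1))
```

Note how the 5000 cd/m^2 input clamps at the display maximum, which is exactly the limitation discussed next.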
Although this approach can be considered perceptual in the sense that it is contrast-preserving, the operator only works over a limited dynamic range. The operator requires threshold levels for the display and world adaptation levels; this ensures that contrast is preserved and maps the world adaptation level to the mid-luminance that can be displayed on the monitor. This suffices for all scene luminances within a limited range around the world adaptation level, but very high values will still be too bright even after being multiplied by a small m. This is illustrated in Figure 17.

Figure 17 – High luminances are clamped. This can be seen in the loss of detail in the window. (Display luminance in cd/m^2 plotted against world luminance in cd/m^2.)

As the graph shows, clamping will occur if the dynamic range is too large. This is the main weakness of the operator as it stands. One way to solve the problem is to use a sigmoid (S-shaped) function, which maps huge luminance ranges into the small range allowed by CRT monitors. This gives very good results in terms of visibility, but it can distort scene contrasts in the process, making the operator a poor representation of reality. If the right sigmoid is chosen, however, a large part of the range can be mapped while preserving contrast for the most part. One reason the sigmoid may not be ideal is that it gives very little control over the shape of the curve: only its gradient can be changed. This is illustrated in Figure 18 below.

Figure 18 – Sigmoid Transfer Functions with different gradients

After testing this approach, we decided that more control over the shape of the curve would give better results whilst still preserving contrast for most pixels. This led to using a spline as the transfer function. We set a series of control points, which define the general shape of the transfer function, and then determine the curve that passes through these points. This is shown in Figure 19.

Figure 19 – Functions passing through control points

In Figure 19 we have determined three functions that pass through the control points specified for each scene. Once these curves have been obtained, the mapping can be thought of as selecting the correct curve according to the input value: that is, finding a single curve (the spline) from the three individual curves, as shown below.

Figure 20 – Transfer Functions originated from different control points

Choosing the control points is very important, as different choices generate different splines. The mapping is essentially the same as for the sigmoid curve described above, but with this method we have more control over the function at the high and low ends of the range. Some images produced with this operator are displayed below.

Figure 21 – The original scene (left) and tone mapped with the global model (right)

Figure 22 – The global model with two more scenes

4.3 A local approach

Although the operator produces decent results in most cases, it fails when a scene contains many high and low luminance values. It will still produce a visually pleasing image, but most pixels will not preserve contrast. A local model should solve this problem. The operator is based on the fact that the eye adapts locally, not globally, which allows us to introduce a local version of Eq 3:

L_d = m(x, y) · L_w    (Eq 4)

The task is to determine a different m for different areas of the image. If we convolve the input image with a low-pass (Gaussian) filter we can simulate the locality of adaptation. The filter can be represented by the following function, where σ is the radius of the filter, determining the locality of adaptation:

G(x, y) = 1/(2πσ^2) · exp(−(x^2 + y^2)/(2σ^2))    (Eq 5)

Convolving the input image with this filter yields a blurred version of the original image, as shown below:

Figure 23 – Adaptation Map

The resulting blurred image is then used as an "adaptation map": from it, local adaptation levels can be determined, which in turn produce the per-pixel values of m(x, y).
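A sketch of this local version (ours), reusing the Blackwell-style threshold function from the earlier global sketch: the Gaussian-blurred luminance serves as the adaptation map, and Eq 3 is evaluated per pixel to give m(x, y).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def tvi(la):
    """Blackwell-style t.v.i. function (as in the global sketch)."""
    return 0.0594 * (1.219 + la ** 0.4) ** 2.5

def local_tonemap(lum, sigma=25.0, ld_max=100.0):
    """Eq 4: L_d = m(x, y) * L_w, with adaptation taken from a blurred copy.

    lum   -- 2-D array of world luminances (cd/m^2)
    sigma -- Gaussian radius controlling the locality of adaptation (Eq 5)
    """
    adaptation_map = gaussian_filter(lum, sigma)    # local adaptation levels
    m = tvi(ld_max / 2.0) / tvi(adaptation_map)     # per-pixel multiplier, Eq 3
    return np.minimum(lum * m, ld_max)              # display luminances
```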
4.4 Introducing Human Visibility Effects

When humans view an environment, the way it is perceived is greatly affected by the luminances in the scene. When going from a bright environment to a dark one (and vice versa) the scene we perceive can have a very different visual appearance. This phenomenon is known as adaptation. The process is time-dependent and can take from a few seconds to several minutes; the time course of dark/light adaptation is well known. We plan to develop our TMO further to include this effect, since we think it would be a very valuable tool. If the lighting in a scene changes dramatically, the representation of the image should change too, but the change should take time rather than being instantaneous.

Another important aspect of human vision that we would like to take into account is the loss of colour and visual acuity at low light levels (scotopic vision). When the light is dim enough, the only active photoreceptors in our eyes are the rods. These receptors have very poor acuity and therefore cannot resolve fine detail, which is why we find it hard to read in low light. Another property of scotopic vision is that we have no perception of colour, because the rod system is incapable of detecting colour. All these effects can greatly affect the perception of a scene. Thus, if we want to develop a perceptual operator, these properties should be taken into account and included in the model. During the next 12 months we intend to concentrate on these aspects of vision.
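One simple way to approximate these scotopic effects in an operator (our illustration, not the final model; the mesopic blend limits of roughly 0.01 and 3 cd/m^2 are assumptions): as the adaptation luminance falls, blend the image towards a colourless version to mimic rod vision, and low-pass filter it to imitate the loss of acuity.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scotopic_appearance(rgb, adaptation_lum):
    """Desaturate and blur an image as adaptation drops into the scotopic range.

    rgb            -- H x W x 3 array of linear display values
    adaptation_lum -- adaptation luminance in cd/m^2
    """
    # Rough mesopic blend: fully photopic above ~3 cd/m^2, fully scotopic below ~0.01.
    s = np.clip((np.log10(adaptation_lum) - np.log10(0.01))
                / (np.log10(3.0) - np.log10(0.01)), 0.0, 1.0)
    grey = rgb.mean(axis=2, keepdims=True)   # rods carry no colour signal
    mixed = s * rgb + (1.0 - s) * grey       # gradual loss of colour vision
    blur = 2.0 * (1.0 - s)                   # loss of acuity at low light
    return gaussian_filter(mixed, sigma=(blur, blur, 0))
```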
5 Validation of Tone Mapping Operators

As part of ARIS we are not only interested in developing a perceptual operator that takes human visibility into account; we are also very interested in validating this operator, and other published ones, against reality. It is very important that an operator give a high-fidelity result, so that we can use it with confidence, knowing that the result is accurate. At the University of Bristol we have recently purchased a High Dynamic Range monitor capable of a 30,000:1 dynamic range, about 300 times more than standard monitors (or paper) can achieve. It can also display very low luminance values, which is ideal when many shadows are present in the scene. The range achievable on this monitor is much closer to what can exist in the real world: although the dynamic range of a real scene can exceed 100,000:1, most scenes rarely reach such a huge range. Having such a device allows us to display high dynamic range photographs without any advanced tone mapping technique, because for most scenes the monitor can display all the values directly. It is possible that in the future these monitors could become the standard (reducing the role of tone mapping techniques), but this is unlikely, since for most applications they are simply too bright and impractical. However, we believe that they can be very useful for validation purposes, because we can use them as a reference for reality. Validating tone mapping algorithms thus becomes a much more practical and feasible procedure.

The validation will first take the form of a pilot study with a few scenes and a fairly small number of participants. The purpose of the pilot study is to get an overall idea of what the results will be; it is also very important because it gives us the chance to notice any errors before the lengthy main validation begins. The main validation will take place in December 2003 and will last two to three weeks.

The idea of the validation is novel and we believe it is very important. In the past ten years many algorithms have been published, some perceptual and some not, yet none of them have been rigorously validated: there are many operators, but no one knows which algorithms perform best and under what conditions. We hope the results will be interesting and useful for the High Dynamic Range/tone mapping community. The absence of validation experiments is one of the main reasons why TMOs remain difficult to use across all kinds of scene: an algorithm may perform well on a specific type of scene, and sometimes manual adjustments have to be made to obtain a decent result. We believe that an accurate validation will give the community some answers and will help us develop a TMO that is, as far as possible, both automatic and realistic.

The validation will consist of different parts: we will ask participants not only specific questions (using questionnaires) but also to compare contrast charts of the kind widely used in optometry. Two examples of the charts are illustrated below; viewers may be asked to compare the visibility of certain letters or circles in the scene displayed on the HDR monitor and in the tone mapped image on a standard CRT display.

Figure 24: Examples of contrast charts used for the validation

We cannot simply ask participants to observe a series of images and decide which one is better; we believe it is more important to analyse the results produced by the different TMOs at a lower level. We will investigate the following:

• Naturalness
• Visibility
• Colour Accuracy
• Brightness
• Contrast

The validation will be a three-way comparison between two algorithms (at any one time) and the HDR monitor. From it we hope to understand the weaknesses and strengths of some of the most popular algorithms. With this knowledge, and knowing the mathematical functions on which they are based, we should be able to develop an operator that combines the strengths of the successful TMOs and is as universal and realistic as possible.

Figure 25: Some images used for the validation

6 References

[1] Chiu K., Herf M., Shirley P., Swamy S., Wang C. and Zimmerman K. 1993. Spatially Nonuniform Scaling Functions for High Contrast Images. Proceedings of Graphics Interface '93, 245-254.

[2] Debevec P.E. and Malik J. 1997. Recovering High Dynamic Range Radiance Maps from Photographs. Proceedings of SIGGRAPH 97, 369-378.

[3] Ferwerda J., Pattanaik S.N., Shirley P. and Greenberg D.P. 1996. A Model of Visual Adaptation for Realistic Image Synthesis. Proceedings of SIGGRAPH 96, ACM Press / ACM SIGGRAPH, New York, 249-258.
[4] Hunt R.W.G. 1975. The Reproduction of Colour, 3rd edition. Fountain Press, Kings Langley. (A 5th edition is now available.)

[5] Mitsunaga T. and Nayar S.K. 1999. Radiometric Self Calibration. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[6] Pattanaik S.N., Tumblin J.E., Yee H. and Greenberg D.P. 2000. Time-Dependent Visual Adaptation for Realistic Real-Time Image Display. Proceedings of SIGGRAPH 2000, ACM Press / ACM SIGGRAPH, New York, 47-54.

[7] Pattanaik S.N., Ferwerda J., Fairchild M.D. and Greenberg D.P. 1998. A Multiscale Model of Adaptation and Spatial Vision for Realistic Image Display. Proceedings of SIGGRAPH 98, ACM Press / ACM SIGGRAPH, New York, 287-298.

[8] Rahman Z., Jobson D.J. and Woodell G.A. 1996. Multi-scale Retinex for Colour Image Enhancement. Proceedings of the International Conference on Image Processing, Lausanne, Switzerland, volume 3, 1003-1006.

[9] Schlick C. 1995. Quantization Techniques for High Dynamic Range Pictures. In Sakas G., Shirley P. and Mueller S. (eds), Photorealistic Rendering Techniques, Springer-Verlag, Berlin, 7-20.

[10] Spencer G., Shirley P., Zimmerman K. and Greenberg D.P. 1995. Physically-Based Glare Effects for Digital Images. Proceedings of SIGGRAPH 95, ACM SIGGRAPH / Addison-Wesley, Los Angeles, 325-334.

[11] Stevens S.S. and Stevens J.C. 1960. Brightness Function: Parametric Effects of Adaptation and Contrast. Journal of the Optical Society of America, 50(11), 1139.

[12] Tumblin J. and Rushmeier H. 1993. Tone Reproduction for Realistic Images. IEEE Computer Graphics and Applications, 13(6), 42-48.

[13] Tumblin J., Hodgins J.K. and Guenter B. 1997. Display of High Contrast Images Using Models of Visual Adaptation. Visual Proceedings of SIGGRAPH 97, 154.

[14] Tumblin J., Hodgins J.K. and Guenter B. 1999. Two Methods for Display of High Contrast Images. ACM Transactions on Graphics, 18(1), 56-94.

[15] Ward G. 1994. A Contrast-Based Scalefactor for Luminance Display. In Heckbert P. (ed), Graphics Gems IV, Academic Press, Boston, 415-421.

[16] Ward G. and Shakespeare R. 1997. Rendering with Radiance: The Art and Science of Lighting Visualisation. Morgan Kaufmann.

[17] Ward Larson G., Rushmeier H. and Piatko C. 1997. A Visibility Matching Tone Reproduction Operator for High Dynamic Range Scenes. IEEE Transactions on Visualization and Computer Graphics, 3(4).