JOHANNES KEPLER UNIVERSITÄT LINZ (JKU)
Technisch-Naturwissenschaftliche Fakultät

Stereopsis in the Context of High Dynamic Range Stereo Displays

MASTER'S THESIS
submitted in partial fulfillment of the requirements for the academic degree Diplomingenieur in the master's program Computer Science

Submitted by: Philipp R. Aumayr
Carried out at: Institut für Computergrafik
Supervisor: Univ. Prof. Dr.-Ing. habil. Oliver Bimber
Linz, July 2012

To Granddad, the engineer.

ABSTRACT

There are two major trends in the display industry: increasing contrast and (auto-)stereoscopic content presentation. While it is obvious that both trends have an impact on perception, the relation between high dynamic range contrast and stereoscopic viewing is not well established in the literature. The goal of this thesis was to construct a high dynamic range display capable of presenting stereoscopic content, in order to perform a user study testing the response to such a viewing experience, especially when considering multiplexing side effects such as crosstalk. The construction process and the many setbacks encountered while building such a display, such as polarization and thermal design issues, are described in this thesis. Even though the display prototype did not exhibit the highly anticipated contrast range, the user study did provide valuable feedback on how far stereopsis benefits from a higher dynamic range. The user study also included an attempt to uncover the role of crosstalk, and of its perceived counterpart ghosting, in the process of stereopsis.

ZUSAMMENFASSUNG

The display industry is driven by two major trends: increasing contrast and the ability to present (auto-)stereoscopic content. Although it is obvious that both developments have a large influence on the perception of visual content, the connection between contrast and stereo vision is only sparsely documented. The goal of this thesis was to develop a high-contrast display that also offers the ability to present stereoscopic content, in order to conduct a user study intended to uncover these relationships. Of interest was also the influence of crosstalk between the image channels, and of its optical shadow effect, on perception. The development, as well as the many mistakes made in the course of building the prototype, are documented herein. Although the prototype fell far short of the hoped-for contrast values, the results of the user study are nevertheless of significance. The tests used were also aimed at uncovering the role of crosstalk between the image channels, and of its optical counterpart, in the context of stereopsis.

ACKNOWLEDGMENTS

Prof. Bimber endured my slow progress on this thesis for almost two and a half years. It would be a stretch to say that our correspondence was always fun and motivating, but instead of giving up on me he encouraged me to go on when I lost confidence that I would ever finish this thesis. So thank you, Prof. Bimber, for all the valuable time, feedback and support!

A great, big, heartfelt "thank you" to my parents and family for their continued support, especially for dragging me all the way through gymnasium and accepting my nerdish preference for CRT rays over real sunshine. I especially want to thank Dad for encouraging me to go my own way instead of following the obvious path to medicine.

I also need to thank my friends at Rarebyte. I definitely would not have studied computer science if George hadn't had the patience to teach me basic programming skills (and OpenGL for the fancy stuff).

Thank you to Rainer, Karin and Alex from timecockpit.com for their continued support and motivation. It is a real pleasure to work with you and I am looking forward to the adventures down the road.
Thanks to Simon for his continuous stream of wisdom and unfiltered, sometimes radical criticism.

A great, big thank you also goes out to all of the guinea pigs that participated in the user study for enduring the heat in the "HDR cabin" and providing valuable feedback!

Thank you to the ZID at the Johannes Kepler University for providing hardware and access to the VRC.

Finally, thank you to Elisabeth for accepting all of the geek talk and being there.

CONTENTS

1 Introduction
  1.1 Defining Contrast
  1.2 Tonal Resolution
  1.3 Refresh Rates
  1.4 Motivation and Preview of Contribution and Results
  1.5 Outline of the Thesis
2 Related Work
  2.1 High Dynamic Range Displays
  2.2 High Dynamic Range Perception
  2.3 HDR Capturing and Processing
    2.3.1 Capturing
    2.3.2 Tonemapping
    2.3.3 Storage
  2.4 Stereoscopic Displays and Ghosting
    2.4.1 Crosstalk Compensation
  2.5 Crosstalk Perception
  2.6 Stereopsis and the Correspondence Problem
3 Depth Perception
  3.1 Perspective
  3.2 Occlusion
  3.3 Vergence
  3.4 Accommodation
  3.5 Stereopsis
    3.5.1 Horopter, Vieth-Müller Circle and Panum's Area
    3.5.2 Neurophysical Basis
  3.6 Perception of Crosstalk
4 Construction of a HDR Stereo Display
  4.1 LCD - Projector Approach
  4.2 Dual Layer LCD Approach
  4.3 Polarization
    4.3.1 Rotated around Y-Axis
    4.3.2 Rotated around Z-Axis
    4.3.3 Using a Wave Retarder
    4.3.4 Inverted Mode
    4.3.5 Center Analyzer Removed, Back Panel Polarizer Rotated
  4.4 Backlight Construction
    4.4.1 Light Sources
    4.4.2 Layout
  4.5 Visual Artefacts
5 Calibration
  5.1 Using HDR Image Recovery
  5.2 Using an Intensity Measurement Device
  5.3 Crosstalk Calibration
  5.4 Just-Noticeable-Difference Mapping
  5.5 Results
6 User Study
  6.1 Constructing Random Dot Stereograms
  6.2 Test Patterns and Parameter Discussion
  6.3 User Study Environment and Participants
  6.4 Results
7 Summary and Future Work
  7.1 Future Work
  7.2 A Better Time-Multiplexed Double Modulation Display
References

LIST OF FIGURES

1.1 Min-max contrast. I_max refers to the maximum luminance, whereas I_min denotes the minimum luminance possible.
1.2 The definition of Weber contrast. I refers to the luminance of a feature, I_b to the luminance of the background.
1.3 The definition of Michelson contrast. I_max and I_min refer to the maximum and minimum luminance, respectively.
1.4 Root-mean-square contrast.
2.1 Projector-LCD approach as presented by Seetzen et al. [54].
2.2 A stereoscopic pair of images of a scene with a sphere. From left to right: the image destined for the left eye, the image destined for the right eye, and the fused image, with shadows of the original image visible due to ghosting.
2.3 The intensity reaching the eye is the sum of the intended signal and the unintended signal.
2.4 The amount of unintended signal usually depends on the pixel position (x, y) and the viewing angle (α, β) of the observer, as well as the image content of the opposing image I at the given position.
2.5 The setup of the display presented in Hoffman et al. [25]. Two semi-transparent mirrors and one front-side mirror allow the eye to focus and converge at multiple image planes at different depths.
3.1 Occlusion can be ambiguous in the monocular case, but is usually resolved very well by binocular vision. The first image suggests that the red rectangle is behind the blue rectangle, as its side aligns perfectly, which would be an uncommon situation. The second and third images show the views perceived by the left and right eye, which reveal that the red rectangle is actually in front of the blue rectangle and that, in the monocular image, its border aligned with the blue rectangle by accident.
3.2 Vergence controls the orientation of the eyes. At the same time it acts as a depth cue by providing the angle of orientation to the vision system.
3.3 The Vieth-Müller circle, also known as the theoretical horopter, and the empirical horopter. Points on the horopter result in single, fused images. Points too far from the (empirical) horopter are seen as double images (diplopia).
3.4 The Gabor function.
3.5 A Gabor patch oriented at 45° with a Gaussian envelope (50 pixels standard deviation) and a frequency of 0.05 cycles per pixel. The original image had a width and height of 500 pixels. The patch was generated using the online Gabor patch generator found at [7].
3.6 A stereoscopic contrast curve. The top images are the intended signals for the left and right eye, the bottom images the actually perceived images.
3.7 First derivative of the contrast curve presented in Figure 3.6. Even though the amount of crosstalk is the same, the absolute brightness levels differ for the left and the right eye. This difference causes the bright contrast edge to be easier to detect than the dark contrast edge.
4.1 The final (second) prototype, built from two stereo LCD panels.
4.2 Schematic of the dual layer LCD approach.
4.3 Malus' law. I_0 corresponds to the initial intensity and θ to the angle between the polarization directions of the polarizers.
4.4 If the polarization of the back panel analyzer and the front panel polarizer have to match and the polarization is aligned or perpendicular to the y-axis of the display, one of the panels can be rotated. This causes the polarizer and analyzer to swap roles and invert the flow of light.
4.5 In practice, polarization foil is not aligned with the display axis but oriented at 45°. This causes the polarization to remain rotation invariant.
4.6 Rotating one of the panels by 90° would reduce the usable HDR area to the square of the height of the display.
4.7 Each individual pixel on a screen consists of three sub-pixels for red, green and blue, which are aligned as stripes. Rotating one of the panels would cause the overlapping sub-pixels to mismatch and therefore reduce brightness.
4.8 Final working approach: removing the back panel analyzer and replacing the back panel polarizer with polarization foil rotated by 90°.
4.9 The final backlight, consisting of 24 LEDs aligned in a six by four grid.
4.10 Closeup of the LED light sources.
4.11 Visual result of the backlight simulation. The white hotspots represent the LEDs, which are aligned in a 6 by 4 grid in this iteration. The simulation takes the layout, the type and the light intensity distribution of the individual LEDs into account.
5.1 General transfer function mapping a pixel drive value to a presented luminance, as well as the inverted function mapping from a luminance to a pixel value.
5.2 In order to find the lens distortion parameters as well as a homography transform, a checkerboard pattern is presented, captured as a standard single JPEG image, and compared to the presented checkerboard pattern using OpenCV. These parameters are then used to undistort the captured HDR images.
5.3 The images captured using multiple exposure times are recombined into an HDR image, then undistorted and homography-warped in order to align the captured pixels with the calibration image.
5.4 Image for the left eye, image for the right eye and the reference map: a) denotes the area with an intended absolute black and without ghosting, b) the area with absolute white (no ghosting), c) the area of an intended black ghosted with pure white and d) the area with intended black ghosted with pure black. The ghosted areas of course swap roles if the calibration image is taken through the opposite eye.
5.5 Using the calibration chart presented in Figure 5.4, the amount of crosstalk can be calculated independently for an intended white as well as an intended black image. I_a, I_b, I_c and I_d denote the intensities measured at areas a, b, c and d in the pattern; CT_BI defines the crosstalk for the black intended intensity, whereas CT_WI denotes the amount of crosstalk for the white intended intensity.
5.6 Weber's law. ΔI/I denotes the Weber (also known as Fechner) fraction. The law states that the incremental threshold step over a background intensity relates linearly to the background intensity.
5.7 The front/back panel combinations chosen by the JND calibration algorithm presented in [5]. The value on the x-axis is associated with the display drive level of the front panel, the value on the y-axis corresponds to the value of the back panel. The color-coded value corresponds to the intensity of the front panel in cd/m². The blue, red and green paths correspond to hull values of 0.025, 0.05 and 0.5.
5.8 The influence of space between the color filters of the back and the front panel. The left image indicates the ideal situation, in which separation is minimal, causing few light rays to be terminated by passing through color filters of different wavelength.
The right image explains the current situation, which causes a sinusoidal grating to be visible, depending on the viewing position, if not compensated by a front diffuser.
6.1 Process of constructing a random dot stereogram. First, both images are filled with the same pattern. Then, the region(s) that should have disparity are displaced, and the resulting holes are finally filled with new random dot patterns.
6.2 The actual crosstalk measured depends on the contrast. The plot describes the ranges of crosstalk that were measured for various contrast settings at a given crosstalk label.
6.3 Plots containing the results with increasing crosstalk along the images. The two rows represent results separated into stimuli with crossed (top row) and uncrossed (bottom row) disparity. The axes of the images denote the measured contrast as well as the disparity used. The color-coded value designates the rate at which the participants answered incorrectly, normalized from 0 to 50 % (red being equal to 50 % incorrect answers).
6.4 Boxplot showing the increasing number of correct answers as contrast is increased. The values were evaluated over all crosstalk, disparity and contrast settings. While the plots in Figure 6.3 show that there is no real improvement beyond a contrast level of 110:1, this contrast ratio may be even lower considering the aggregated results shown in this plot.

1 INTRODUCTION

There have been few but major trends in the display technology industry: increasing screen sizes result in bigger screens with a greater image plane; increasing pixel densities cause more pixels to be crowded into the same physical space; increasing contrast leads to a brighter peak luminance than ever before, with black levels dropping ever closer to that of starlight.
The pixel refresh time of a screen is constantly being lowered, manifesting itself in higher frame rates and enabling channel multiplexing for image separation technologies based on time multiplexing. At the same time, tonal resolution increases only slowly, as the benefits for an end user are not as apparent and additional support is required on the software side to make good use of such an improvement.

Screen size and pixel density have a very close relationship: increasing the pixel density of a display reduces the size of the screen if the total number of pixels is kept the same. Increasing the screen size, on the other hand, while keeping the total pixel count the same, reduces pixel density. Tonal resolution and contrast behave in a somewhat similar way if the step size between two intensities is considered: increasing the maximal contrast without also increasing the tonal resolution causes the step size to increase. As with images appearing pixelated, this causes gradients to show detectable steps instead of a smooth transition from one intensity level to the next.

While there are many different types of visual displays, this thesis is concerned with the design and the perception of rectangular displays capable of presenting a two-dimensional matrix of picture elements, referred to as an image. A picture element, or pixel for short, can be either achromatic or consist of chromatic components. The display prototypes presented in this work support chromaticity, but the perception research part deals primarily with the achromatic case, in which the RGB channels are set to equal amounts such that the pixel appears colorless.

1.1 Defining Contrast

Contrast can be defined in multiple ways, all of which to some extent describe how much brighter the whitest white is compared to the blackest black. For an image, the minimum and the peak luminance are very primitive descriptors of the image content.
The most basic formula for contrast uses the ratio between the maximum and the minimum brightness, as shown in formula 1.1. This translates to dividing the intensity measured for white by the intensity measured for black and is often given as an N:1 number, indicating that white is N times brighter than black.

C = I_max / I_min

Figure 1.1: Min-max contrast. I_max refers to the maximum luminance, whereas I_min denotes the minimum luminance possible.

Another notion of contrast is the Weber contrast, given in formula 1.2. In this formula, I represents the luminance of a feature, whereas I_b represents the luminance of the background.

C_W = (I - I_b) / I_b

Figure 1.2: The definition of Weber contrast. I refers to the luminance of a feature, I_b to the luminance of the background.

The results of both formulas, the basic ratio and the Weber contrast, become very large when the black level becomes very low. On the other hand, they quickly become very small when a constant is added to both terms, such as ambient light. As an example, consider a display with a peak luminance of 200 cd/m² and a minimum luminance of 0.5 cd/m². Using the basic ratio, this display has a contrast of 400:1 and a Weber contrast of 399. Adding an ambient light source that increases both measured levels by 1 cd/m² decreases the contrast ratio to 134:1 and the Weber contrast to 133.

Another definition of contrast is the Michelson contrast, which is less sensitive to a constant additive factor such as the one in the example described above. The definition is given in formula 1.3 and always yields a number between 0 and 1, which is usually interpreted as a percentage.

C_M = (I_max - I_min) / (I_max + I_min)

Figure 1.3: The definition of Michelson contrast. I_max and I_min refer to the maximum and minimum luminance, respectively.
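The three definitions behave quite differently under ambient light, which is easy to check numerically. A minimal Python sketch (function names chosen for illustration) reproduces the numbers used in the example above:

```python
def minmax_contrast(i_max, i_min):
    """Basic contrast ratio, reported as N in an 'N:1' figure."""
    return i_max / i_min

def weber_contrast(i, i_b):
    """Weber contrast of a feature of luminance i on background i_b."""
    return (i - i_b) / i_b

def michelson_contrast(i_max, i_min):
    """Michelson contrast; always between 0 and 1."""
    return (i_max - i_min) / (i_max + i_min)

# Example display from the text: 200 cd/m^2 peak, 0.5 cd/m^2 black level.
peak, black = 200.0, 0.5
print(minmax_contrast(peak, black))               # 400.0 -> "400:1"
print(weber_contrast(peak, black))                # 399.0
print(round(michelson_contrast(peak, black), 3))  # 0.995

# Adding 1 cd/m^2 of ambient light to both levels collapses the first
# two measures but barely changes the Michelson contrast.
ambient = 1.0
print(round(minmax_contrast(peak + ambient, black + ambient)))       # 134
print(round(michelson_contrast(peak + ambient, black + ambient), 3)) # 0.985
```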
Using the same hypothetical display as in the example before, with a peak luminance of 200 cd/m² and a minimum luminance of 0.5 cd/m², the Michelson contrast gives a value of 0.995, corresponding to 99.5 %. With the ambient light increasing both levels by 1 cd/m², the resulting percentage is still at about 99 %.

One other popular measure of the contrast of an image is the root-mean-square (RMS) contrast, which is defined as the standard deviation of the pixel intensities (figure 1.4):

C_RMS = sqrt( (1 / (M N)) * Σ_{i=0}^{N-1} Σ_{j=0}^{M-1} (I_ij - Ī)² )

where Ī denotes the mean intensity.

Figure 1.4: Root-mean-square contrast.

Describing the contrast a display can produce is different from measuring the contrast of an image: in order for a display system to reproduce an image accurately, it has to have at least the contrast that the image content requires. The contrast of the image, on the other hand, can be lower, as it is usually not a problem to represent an image of lower contrast on a display system capable of reproducing a higher dynamic range, as long as the tonal resolution of the display system is fine enough to reproduce the small differences in the tonal steps of the image.

1.2 Tonal Resolution

The higher the difference between the brightest white and the darkest black, the more unique shades of grey can be represented by the display. The number of steps that can be uniquely addressed by a display system represents its tonal resolution. A standard, commercially available display system today supports between six and ten bits of tonal resolution in every color channel. The tonal resolution of a chromatic display system directly relates to the number of colors that can be displayed by the system. LCD panels with a lower bit depth usually employ some kind of dithering mechanism, such as Frame Rate Control (FRC), in order to artificially increase the number of possible colors [9].
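The idea behind FRC can be sketched in a few lines. This is a simplified illustration, assuming a 6-bit panel and a fixed four-frame dithering cycle; real FRC implementations also vary the dithering pattern spatially and per pixel:

```python
def frc_frames(level_8bit, n_frames=4):
    """Approximate an 8-bit grey level on a 6-bit panel by temporal dithering.

    One 6-bit step spans four 8-bit steps, so the fractional remainder is
    spread over n_frames; the time-averaged output then matches the target.
    """
    base, frac = divmod(level_8bit, 4)  # 6-bit base level plus remainder 0..3
    # Drive 'base + 1' in 'frac' of every four frames and 'base' otherwise,
    # clamped to the panel's 6-bit maximum of 63.
    return [min(base + (1 if f < frac else 0), 63) for f in range(n_frames)]

frames = frc_frames(130)              # 130 = 4 * 32 + 2
print(frames)                         # [33, 33, 32, 32]
print(4 * sum(frames) / len(frames))  # 130.0 -> average matches the target
```

Note that the top three 8-bit levels (253 to 255) clamp to the 6-bit maximum and cannot be reached this way; with 253 achievable levels per channel, 253³ ≈ 16.2 million colors, which is one plausible reading of the marketed 16.2 million figure mentioned below.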
Most Twisted-Nematic (TN) liquid crystal panels, for instance, have a physical bit depth of 6 bits per color channel, giving a total of 18 bits of color depth. This translates to 262,144 different physical colors. Using temporal dithering, 18-bit display systems are usually marketed as being able to display 16.2 million colors, instead of the full 16.7 million colors that a true 24-bit color display would be able to reproduce.

Tonal resolution is considered especially important in medical applications, where traditional X-ray film is still dominant, as a trained radiologist is capable of detecting features well beyond the tonal resolution of a classic 8-bit screen. X-ray film offers a dynamic range of about 10000:1 as well as a near infinite number of tonal resolution steps due to its analog nature. Still, there is a demand for a digital replacement due to the benefits of a digital display device, such as easier image transport, archiving and dynamic content.

1.3 Refresh Rates

Ever rising refresh rates are at the other end of the spectrum. Liquid crystal displays supporting a refresh rate of 120 Hz and higher have appeared recently. Since activated LCD screens do not flash pixels to black between frames, there is no flicker that can be perceived by the human eye. One could therefore question why a higher refresh rate is important for a liquid crystal display (LCD) panel. The main answer is that being flicker-free at 30 frames per second only refers to the human eye's ability to differentiate between a constant grey image and an image alternating between black and white. The point at which these become indistinguishable is known as the flicker fusion frequency [24]. The flicker fusion frequency, however, is only one part of the story: with screen sizes becoming larger, the distance an object can physically move on a screen from one frame to the next grows as well and can cause a disorienting effect.
Another effect is the wagon-wheel effect [20]. This effect causes a rotating wheel, like the wheel of a wagon in a western movie, to appear to rotate slower, faster or even in the opposite direction of its true rotation. The real cause of this effect is still to be determined, but two theories have been established: one states that the human vision system partitions the continuous visual stream into frames, causing the frequency of the rotation to interfere with the frequency of the vision system [50], resulting in temporal aliasing. The other, currently favored theory argues that, through temporal aliasing, true motion stimulates not only visual detectors sensitive to that motion but also detectors sensitive to the opposite motion [53]. Higher refresh rates will not resolve these visual effects, but will allow a reproduction closer to what the real scene would look like.

Another use case for higher refresh rates is to support time and space multiplexed stereoscopic content. The goal in such a setup is to present a different picture to the left and the right eye using shutter glasses that are synchronized with the display's refresh rate. Every pixel therefore has to change its projected intensity from frame to frame, which can cause flicker to become apparent if the refresh rate is not high enough. The de facto standard refresh rate for time multiplexed stereoscopic displays is 60 Hz per eye.

1.4 Motivation and Preview of Contribution and Results

Even though it has been shown that stereo acuity improves with higher contrast (see [23]), it is unclear whether this improvement continues to hold for higher dynamic range displays, as the tests performed previously used cathode ray tube (CRT) screens with a lower dynamic range. The hypothesis of this work is that stereo acuity will decrease for higher dynamic range scenes, due to simultaneous contrast effects.
The main idea stems from the fact that X-ray radiologists cover parts of the picture in order to allow the eye to adapt to the specific part of the image that is of interest. The results of this thesis could be used to further improve disparity mapping algorithms by incorporating more knowledge of the behavior of stereopsis in scenes with higher dynamic range.

The display prototype constructed to perform the user study did not exhibit the anticipated contrast ratios due to technical constraints. Nevertheless, the outcome of the user study shows that stereo acuity does not necessarily improve any further with increasing contrast.

1.5 Outline of the Thesis

The thesis starts with a discussion of related work in chapter 2, covering high dynamic range as well as stereoscopic display systems. The related work chapter also covers publications from the field of perception, especially those dealing with the perception of high dynamic range images and stereo perception. Selected items from the related field of high dynamic range capturing and processing are discussed as well.

The related work chapter is followed by an introductory chapter on depth perception, discussing depth cues such as perspective, occlusion, vergence, accommodation and especially stereopsis, as well as some aspects of the physiological basis in the human vision system.

Chapter 4 continues by describing the prototypical construction of a stereoscopic high dynamic range display, including issues concerning polarization, the construction of a suitable backlight and potential explanations for the visual artifacts apparent in the second prototype that was used for the user study.

Chapter 5 discusses the calibration of the display prototype. Both approaches, one using an HDR recovery method with a standard camera and the other using a commercial color calibration system for standard
Both approaches, one using a HDR recovery approach with a standard camera and one using a commercial color calibration system for standard -4- Chapter 1 - Introduction LDR displays that have been attempted are discussed in detail. The benefits and downsides of those are explained therein as well. The characteristics of the display prototype are also presented in this section. Chapter 6 explains the performed user study and the results thereof. It discusses the motivation behind such a user study, the design of the stimuli and test patterns as well as the candidates and the environment in which it took place. The chapter finishes in a discussion of the results attained from the user study. Finally the thesis finishes with a summary of the presented work and an outlook into future work. -5- Chapter 2 - Related Work 2 R ELATED W ORK The topic of this thesis reaches into various areas, not only of computer graphics and display technology, but also those of psychology and especially perception. In this section, related work concerning high dynamic range displays, stereoscopic displays and those dealing with crosstalk, the perception of stereoscopic content and contrast are discussed. Related work concerning the calibration of displays in general is discussed herein as well. 2.1 High Dynamic Range Displays While consumer displays, particulary television sets advertise high contrasts ratios of up and beyond a few million to one, the promoted value defines the dynamic range of a display when viewing either a completely white or completely black screen (on/off contrast). In comparison, the static contrast defines the contrast ratio within a single image. More recent displays include a light emitting diode (LED) backlight system composed of multiple independently controllable LEDs. This allows a display to adapt the intensity of a backlight to the average luminosity of the display area influenced by the corresponding backlight LED. 
Of course, the more controllable backlight LEDs there are, the more accurately the backlight intensity for each individual pixel can be controlled. There have been multiple approaches to placing the LEDs in quad and hexagonal grids (see Seetzen [54]). Even though the term resolution is usually associated with rectangular grids, it is applicable to all kinds of alignments. In order to differentiate between the actual pixels on the screen and the resolution of the backlight, the terms image resolution and contrast resolution are introduced. Image resolution defines the resolution of pixels on the screen, whereas contrast resolution defines the resolution of the backlight. The maximal contrast resolution is limited by the image resolution. Whether aimed at static or dynamic contrast, all of today's display prototypes considered capable of displaying high dynamic range content share a double modulation approach. This means that the amount of light passing through one pixel is controlled at two places by two separate modulators in the light path. The combination of modulators used differentiates the approaches from each other: The first display prototype to be considered of higher dynamic range was presented by Ledda in 2003 [37] and featured a stereoscopic wide-field viewer presenting an HDR stereo image pair to the viewer without any crosstalk due to the physical separation of the image channels. This prototype used transparencies and was able to produce a contrast of up to 10,000 to 1. Due to the nature of printed images, the display could only present static images. The first interactive HDR displays were presented by Seetzen [54] with two prototypes: One was constructed using a projector and an LCD panel, with the projector acting as an intelligent light source.
The projector directed only as much light onto the individual pixels as required to produce the intended intensity, instead of a classic uniform backlight where the same amount of light reaches every pixel. An illustration of the projector-LCD panel setup can be found in figure 2.1.

Figure 2.1: Projector-LCD approach as presented by Seetzen et al. [54]: a projector illuminates an LCD panel through a Fresnel lens and diffuser, driven by a dual-VGA graphics card in a PC and an LCD controller.

The contrast resolution therefore was close to that of the image resolution itself. The setup itself is in general straightforward to construct, but as described in chapter 4, the approach is infeasible for stereo setups due to synchronization issues. The second prototype consisted of an LCD panel with a backlight composed of 760 individually controllable LEDs aligned in a hexagonal grid. The prototype has since evolved into a product using 1,838 LEDs, producing a peak brightness greater than 4,000 cd/m2 and a 16 bit tonal resolution [14]. Another approach was presented by Bimber et al. in 2008 [5], combining a projector-camera system with a paper printout of the image to view. The paper was annotated with a marker that was visually tracked by a camera. By knowing all parameters of the projector-camera system, the projector could superimpose an image on top of the tracked printout and augment the image. While the printout was limited to a static image, experiments using electronic paper allowed more rapid changes of the image, even though interactive rates were not possible with the electronic ink technology at that time. The approach reached a contrast ratio of up to 61,000 to 1 when using an LED projector with special photo paper. Guarnieri [22] presents a display prototype that uses two monochromatic LCD panels.
Even though the panels are placed immediately on top of each other, the high resolution causes the back panel image to appear as a shadow at contours, making the image appear blurry. The author presents two methods for compensating the view dependence: blurring the back panel image while sharpening the front panel image, and a constrained low-pass filtering approach which assures that the front panel pixel value does not clip after compensating for the blur of the back panel image. While the first prototype in this thesis is based on the same principle as Seetzen's [54] display, using a projector backlight and an LCD front panel, the prototypes presented here always aimed at being stereo capable. As discussed in chapter 4, the projector-LCD approach bears many pitfalls, which led to the approach of using two identical LCD panels stacked behind each other; this is closer to the prototype that Guarnieri [22] worked on. Yet, Guarnieri's main contribution is on reducing the shadowing due to the parallax effect. The LCD panels used in the prototypes presented here are of a lower resolution and did not exhibit the same shadowing issues that Guarnieri discussed.

2.2 High Dynamic Range Perception

Visual perception is an intensively studied field of research. From this field, stereo perception is especially relevant to this thesis. Various attempts have been made to construct a model of the visual cortex of the human vision system (HVS), focusing on the adjustments the HVS performs in order to be able to perceive HDR content [8], [18], [45], [46] and [40]. One of the main use cases for such models is to compare images for visual equality and thereby be able to compare the performance of lossy image and video compression algorithms against each other. Daly [8] was the first to present such a visible differences predictor (VDP).
The difference predictor takes two images and information about the presentation environment (such as the distance of the viewer and the pixel density) and uses a model of the human vision system to mark the pixels that differ enough between the two images to be detected by the HVS. The model takes into account the contrast sensitivity function, the amplitude nonlinearity of the intensity detectors and contrast masking effects. The algorithm was later extended to support high dynamic range images [39]. In medical display devices it is often important to assure that all shades of gray can actually be perceived by the observer. In 1992, Barten [3] introduced the notion of a just noticeable difference (JND) to the field of visual perception. In order for all shades to be distinguishable, each successive increase in display brightness has to be just large enough to be perceivable by the viewer. Later work by Barten [4] extends this with a temporal component, taking the adaptation luminance of the HVS into account. The amount of additional brightness needed to reach the consecutive step is not constant but grows exponentially. As a result, the theoretical maximum number of JND steps that can be achieved is quite low: with the first JND step starting at low starlight and the highest one at direct sunlight, about 2,000 steps can be achieved according to Barten's model [3]. For comparison, the display prototypes presented in [54] cover 962 JND steps (for the projector-LCD prototype) and 1,139 JND steps (for the LED based prototype), yet not all of the JND steps are actually reachable by the prototypes due to a lack of accuracy of both modulators. The work presented here differs from related work that tries to present an accurate model of the human vision system, in that the goal is not to find a complete model, but rather an answer to whether higher contrast is beneficial to stereopsis, as well as questioning the role that crosstalk plays in this context.
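The logarithmic relation between dynamic range and the JND step count described above can be illustrated with a short calculation. This is a deliberate simplification: a constant Weber fraction of 1% is assumed here purely for illustration, whereas Barten's actual model derives a luminance-dependent threshold from the contrast sensitivity function, so the real step counts differ.

```python
import math

def jnd_step_count(l_min, l_max, weber_fraction=0.01):
    """Count just-noticeable-difference steps between two luminances,
    assuming each step is a constant fractional (Weber) increase.
    Illustrative only: Barten's threshold varies with the adaptation
    luminance, so this merely shows why the count grows with the
    logarithm of the dynamic range rather than linearly."""
    return int(math.log(l_max / l_min) / math.log(1.0 + weber_fraction))

# From starlight (~0.001 cd/m2) to direct sunlight (~10,000 cd/m2),
# seven orders of magnitude collapse into on the order of a thousand
# steps under this simplified assumption.
steps = jnd_step_count(1e-3, 1e4)
```

With these assumed numbers the count lands in the same order of magnitude as the roughly 2,000 steps quoted above, which is the point of the exercise: multiplying the luminance range by a fixed factor only ever adds a fixed number of steps.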
JND calibration is an important characteristic of the accuracy of a display and was performed on the display prototypes presented here. The results of the user study could be used to verify the reliability of the visible differences predictor.

2.3 HDR Capturing and Processing

2.3.1 Capturing

In order to validate the output and calibrate the display, it is required to capture the actual luminosities presented by the display. One way to achieve this is to use a colorimeter and measure the display values directly. Another approach is to capture an HDR image using a camera. Because cameras capable of capturing HDR scenes in a single shot are not readily available, algorithms that recover HDR images from multiple low dynamic range pictures have been developed. These work by first recovering the response function (up to scale) of the camera by combining the responses of multiple pictures of the same scene taken with different exposure times. Knowing this response function, the value of a pixel in a picture of a known exposure time can be quickly converted to the relative irradiance value originally captured by the camera. For robustness, pixel values from all taken images are combined using weights according to how reliable the taken image at the given location is believed to be. The weighting function is usually a Gaussian with increasing weights towards the center of the output value range of 0 to 255 ([52], [12]).

2.3.2 Tone Mapping

Since most displays currently available are not able to correctly represent HDR content, algorithms have been developed to simulate visual effects (bloom, for instance) that would be apparent to the eye if the original scene were observed. One of the first tone reproduction/mapping algorithms was presented by Tumblin [58]. It models an observer as well as the display device and adjusts the display drive values by a delta calculated from the inverse of those models.
This means that if a given luminance L is to be observed by an observer, specifics about the display system (such as the transfer function) and the observer (such as the adaptation luminance of the human eye) need to be known. These models then need to be inverted and applied to the intended luminance L in order to compute the level that has to be set in the pixel buffer. This has to be done in such a way that the display device produces an intensity value that the human vision system perceives as the intended luminance L (at a given adaptation level). All following tone mapping operators improve on the models used in the work done by Tumblin and Rushmeier ([34], [47], [16], [17], [51], [27], [15] and [36]). For an overview of state-of-the-art tone mapping algorithms, refer to the work of Kuang [32], where comparisons have been performed, and to [35], where some of them are benchmarked against real high dynamic range displays.

2.3.3 Storage

High dynamic range imaging requires more than just an improvement of the output and recording devices. Image storage formats for standard dynamic range images currently do not support high dynamic range content. HDR images naturally require more space to store, as every channel of every pixel needs to represent a greater number of values. [59] presents an extension to the JPEG image standard that stores the information required to restore the high dynamic range components of the image in an application specific marker of the standard JPEG format. This way, high dynamic range information can be stored along with the JPEG image, keeping it compatible with existing software. One of the standard high dynamic range image formats used in the industry is OpenEXR. It is described in detail in [30] and a lot of material, including an SDK, can be found at [26]. EXR was the main image format used throughout this work when handling high dynamic range content.
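The exposure-merging scheme described above can be sketched in a few lines. The sketch assumes 8-bit samples that have already been linearized (a real pipeline, such as the one by Robertson et al. [52], first recovers the camera response function); the Gaussian weighting follows the description in section 2.3.1, while the function name and the envelope width of 50 are my own illustrative choices.

```python
import math

def merge_exposures(samples, exposure_times):
    """Combine samples of one scene point, taken at different exposure
    times, into a single relative irradiance value.  Each sample is
    divided by its exposure time to estimate irradiance, and the
    estimates are averaged with a Gaussian weight centered on the
    middle of the 0..255 range, so clipped or noisy samples near the
    ends of the range contribute little."""
    def weight(z):
        return math.exp(-((z - 127.5) ** 2) / (2.0 * 50.0 ** 2))

    num = den = 0.0
    for z, t in zip(samples, exposure_times):
        w = weight(z)
        num += w * (z / t)
        den += w
    return num / den

# The same scene point captured at 1/100 s and 1/10 s: both exposures
# agree on a relative irradiance of 2000 (in arbitrary units).
irradiance = merge_exposures([20, 200], [0.01, 0.1])
```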
The HDR recovery algorithm presented by Robertson [52] was used for capturing HDR images of the patterns used in the first approach to calibrating the display prototype. Tone mapping operators also remain relevant: even though the prototype is capable of producing a higher contrast, the provided dynamic range is not enough to display arbitrary HDR images. In particular, the peak luminance of the display is not improved over conventional displays, so tone mapping mechanisms are still required for the reproduction of high dynamic range images.

2.4 Stereoscopic Displays and Ghosting

One of the main causes of headaches and stereo sickness during prolonged consumption of stereoscopic content is ghosting, which in turn is triggered by crosstalk between the image channels. Ghosting manifests itself as a semitransparent shadow image of the content destined for the other eye. The distance between the contours depends on the disparity of the image, which for a projective camera transform in turn depends on the depth of the scene at a given contour border. An exaggerated illustration of how ghosting appears can be found in figure 2.2. Crosstalk is caused by the inability of the multiplexing system to either correctly generate or decompose the multiplexed signal.

Figure 2.2: A stereoscopic pair of images of a scene with a sphere. From left to right: the image destined for the left eye, the image destined for the right eye and the fused image with shadows of the original images visible due to ghosting.

The amount of crosstalk certainly depends on the display technology used. Two image channels are usually multiplexed either in time (active) or polarization (passive).
While both technologies de-multiplex the image channels using glasses worn by the observer, the technologies are inherently different: Active, time-multiplexed stereo setups use shutter glasses whose left and right shutters are synchronized to the left/right refresh of the display. Passive, polarization-multiplexed setups use polarization filters, and therefore no synchronization is required. The amount of crosstalk thus depends on the quality of the polarization filters for passive systems and on the accuracy of synchronization for active setups. Perfect image channel separation is possible by using a haploscope, thereby not multiplexing the image channels at all, but keeping them separated. When talking about crosstalk, the literature usually refers to the image channels as the intended and the unintended stimuli. The intended stimulus represents the signal that is supposed to reach the targeted eye, whereas the unintended stimulus is the signal that leaks from one channel to the other. The resulting signal can be modeled as a simple addition of the intended and unintended stimuli, as described in formula 2.3.

$I = I_{\mathrm{intended}} + I_{\mathrm{unintended}}$

Figure 2.3: The intensity reaching the eye is the sum of the intended signal and the unintended signal.

Further refining the model for crosstalk reveals that the crosstalk is usually dependent on factors like the pixel position or the viewing angle. The unintended stimulus can therefore be defined as a function of these parameters (see formula 2.4).

$I_{\mathrm{unintended}} = f(x, y, \alpha, \beta) \cdot I(x, y)$

Figure 2.4: The amount of unintended signal is usually dependent on the pixel position (x, y) and the viewing angle (α, β) of the observer, as well as the image content of the opponent image I at the given position.

2.4.1 Crosstalk Compensation

If the amount of unintended signal is known, it can be compensated for by subtracting it from the intended signal.
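The additive model of formulas 2.3 and 2.4, together with the subtraction just described, can be sketched as follows. For brevity the leak is modeled here as a single constant factor; as formula 2.4 states, it generally depends on pixel position and viewing angle, so a real implementation would look the factor up per pixel.

```python
def received_intensity(i_intended, i_opponent, leak):
    """Formula 2.3 with a constant leak factor: the eye receives the
    intended signal plus a fraction of the opponent channel."""
    return i_intended + leak * i_opponent

def compensate(i_intended, i_opponent, leak):
    """Subtractive compensation: pre-darken the intended signal by the
    expected leak.  The clamp at zero reflects the physical limit that
    a pixel cannot emit negative light."""
    return max(0.0, i_intended - leak * i_opponent)

# A mid-gray pixel opposite a white one with 5% crosstalk is fully
# correctable; a black pixel opposite white is not, so residual
# ghosting remains there.
ok = received_intensity(compensate(0.5, 1.0, 0.05), 1.0, 0.05)
residual = received_intensity(compensate(0.0, 1.0, 0.05), 1.0, 0.05)
```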
There are two challenges in doing this: First, knowing the accurate amount of crosstalk usually requires calibration, which can become cumbersome to handle if all parameters are to be considered. Second, due to the physical properties of light, crosstalk cannot be compensated for if the intended signal does not contribute, since the intended signal cannot be set to less than zero. The approaches described here differ in the way they perform the subtraction, in how calibration data is looked up, and in the parameters that are taken into account when calibrating. Lipscomb and Wooten [38] described a method to decrease crosstalk by first increasing the background intensity and then decreasing pixel intensities according to a specifically constructed function. The amount of crosstalk correction applied is not uniform across the screen; instead, the screen is separated into 16 horizontal bands and the function is adjusted for each of the bands individually. Konrad et al. [31] presented a model that takes into account the intended and the unintended signal. The calibration phase consists of a psycho-visual experiment for estimating the crosstalk factors, which are later used for compensation. The experiment works by presenting two rectangles in the middle of the screen to the user. One of the rectangles contains no crosstalk, as the same stimulus is presented on both eye channels, while the other rectangle contains crosstalk, as an unintended stimulus is presented on the other eye channel. The user then has to adjust the color patches by changing the crosstalk correction factor. The procedure is repeated using multiple combinations of intended and unintended stimuli and the results are stored in a two dimensional lookup table. Upon rendering the image, the table is inverted in a preprocessing stage and the necessary amount of crosstalk correction is retrieved from the lookup table for every pixel in the stereo pair.
The major downside of this algorithm is that it does not take the position of the corrected pixel into account. Smit et al. [56] extend the crosstalk model presented by Konrad [31] by taking the vertical position of the pixels into account. As the calibration table would become infeasible to handle due to having three dimensions instead of two, the values encoded in the table are parameters to a function that additionally takes the vertical position of the pixel pair to be corrected. The calibration process presented in [31] is also extended to include the vertical position. Later that year, Smit et al. presented three extensions to subtractive crosstalk reduction [57]. Firstly, Smit proposes to use the CIELAB color space instead of RGB, adjusting the lightness component when the resulting corrected RGB value would be clamped. The second proposal uses a geometric approach, adjusting the intensity of the other pixel constituting the fused pixel instead of the pixel at the same position in the other eye channel. Doing so requires finding the corresponding fused pixel by retrieving the depth value of the pixel and calculating the disparity at the given depth. Another approach to dealing with crosstalk is presented by Siegel [55]. In his work it is argued that if the disparity of a stereo pair is small enough, the appearing borders sometimes cannot be easily detected but are instead perceived as a blur on contour borders. This blur can be unobjectionable if it is similar enough to depth-of-focus blur and would then allow for a relaxation of the strict zero-or-little-crosstalk requirement for virtual reality applications. The related work on stereoscopic crosstalk differs from this thesis in that the main goal here is not to improve crosstalk compensation mechanisms but to understand the influence of crosstalk on stereoscopic perception.
Such compensation mechanisms could benefit from the findings presented here: if higher contrast does not improve image perception, contrast can be reduced in favor of less inflicted crosstalk and therefore less visible ghosting.

2.5 Crosstalk Perception

Models of the human vision system (HVS) help to understand how our vision system basically works, even though side effects such as headaches and stereo sickness are usually not yet incorporated into these models, because not enough is known about what exactly causes those symptoms. [1], [25] and [44] have performed user studies trying to uncover these causes. Pastoor [44] describes factors involved in 3D imaging, especially visibility thresholds for ghosted contours at various disparities. Hoffman et al. [25] focuses on the vergence-accommodation mismatch: In a stereoscopic setup, the depth we perceive is caused by various depth indicators: stereopsis and vergence, parallax effects, as well as shadows. Yet one of these indicators, accommodation, cannot be tricked into registering the correct depth using a standard flat display surface. The reason for this is that both eyes still individually focus on the display surface, giving the HVS a hint on how far away an object in the scene really is. Therefore, even if an object is supposed to appear in front of the scene and disparity, parallax and stereopsis indicate the expected depth, the depth of focus that monocular vision indicates is still at the depth of the display surface. Hoffman et al. [25] therefore constructed a display prototype that uses three layers of semitransparent mirrors at three different depths. The display interpolates the intensity of the projected image pixels depending on the requested depth of a pixel, thereby forcing monocular vision to focus on a depth layer closer to the expected depth and minimizing the vergence-accommodation mismatch.
The setup is also stereoscopic, and therefore both eyes can receive individual images where a pixel may lie on planes at different distances to the eye. A schematic of the display can be found in figure 2.5.

Figure 2.5: The setup of the display presented in Hoffman et al. [25], based on an IBM T221 TFT display (3840x2400 pixels). Two semi-transparent mirrors and one front-surface mirror allow the eye to focus and converge at multiple image planes at different depths.

Hoffman's paper describes three different user studies in which the display was used. All of them use sinusoidal gratings whose orientation has to be detected by the participant.

2.6 Stereopsis and the Correspondence Problem

Stereo matching is the first and most important part of the depth reconstruction process performed by the HVS. Because there are two images available, pixels corresponding to the same physical location in space have to be matched. This task is referred to as the correspondence problem. In [48] it is shown that, given the choice between a global match to a monocular image of the same contrast or to an image of a higher contrast, the higher contrast match is preferred. A global match is a match where all possible candidates in the neighborhood have to be considered collectively and only the matches that fall within preferably smooth surface(s) are selected. A computational model for disparity processing in the human vision system is presented by Mansson [41]. The model is inherently different from previous models in that it does not rely on a set of predefined higher level features such as edges (zero crossings in the second derivative of the image) or bars. Instead it uses a hierarchical model of sub-regions and compares the overall configuration of contrast within limited regions.
The information gained from the coarser levels is used to restrict the set of potential matches for the finer levels. It is well known that the processing of disparity is based on the first and second derivative of the image and not on the actual absolute intensities. The receptive fields encoding the intensities are best described by the Gabor function (see figure 3.4). It is still unclear how the human vision system encodes binocular disparity. Two main theories, phase and positional encoding, exist: "In the position difference model, the receptive field profiles are assumed to have identical shape in the two eyes but are centered at noncorresponding points in the two retinas ... In the phase difference model, the Gaussian envelopes of the receptive fields are constrained to be at corresponding retinal points, but the receptive fields are allowed to have different shapes or phases." [10]

3 Depth Perception

Depth perception is an important part of the vision system, not only for humans but for all predators, as it supports estimating the distance to an object and therefore massively aids moving in a three dimensional environment. Even though the image captured by an individual eye is two dimensional, it contains enough information to recover a qualitative and relative depth image, even allowing the estimation of absolute depth values. A single piece of such information is called a depth cue. Depth cues can be split into two distinct groups: monocular and binocular cues. Monocular depth cues are available to a single eye, whereas binocular depth cues require the relationship between the two distinct images projected on the individual retinas. Binocular cues rely on the fact that the eyes of a human are separated horizontally, which gives rise to what is generally referred to as binocular disparity.
Depth cues have been studied for a long time and have been employed by artists in order to achieve an effect of depth in two dimensional paintings.

3.1 Perspective

One of the depth cues that form the basis for several other cues is perspective. Perspective itself stems from the fact that the lens of the human eye produces a perspective instead of an orthographic projection of the scene onto the retina. This causes objects further away from the eye to have a smaller projection area on the retina than an object of the same size closer to the eye. This gives rise to derived depth cues such as motion parallax. Motion parallax occurs if the relative position of the eye towards two objects at different distances changes. Consider two objects in space moving at the same absolute speed. Due to perspective, the projected images move at different speeds, with the object at the further distance moving more slowly than the object closer to the image plane. Motion parallax is very similar to depth from motion, which states that the perspective projection of an object becomes smaller the further the object moves away from the eye, even though the size of the object does not change. The same effect is apparent if two objects of known size are positioned at different depths: depending on the distance, the relative size between the objects varies.

3.2 Occlusion

The visual cue that one object covers another object completely or only partially does not reveal any absolute depth information. Yet it is critical to depth perception, as it allows for an unambiguous depth ordering and thereby helps estimating depth values. Occlusion may at first seem to be a very simple computation for the brain, as it only requires information about which object covers another object. On second thought, though, it becomes clear that the notion of what defines the boundary of an object requires higher knowledge of the objects in the scene.
The monocular occlusion information is rather ambiguous, as it is not apparent from a single image whether the object actually continues behind the edge of the occluding element. It could, although very unlikely in a real world scenario, be the case that the object actually ends exactly at the edge and is actually closer to the observer than anticipated by the occlusion information. An artificial example of such a situation is shown in figure 3.1. There are two distinct aids in human vision that help make occlusion more robust: temporal consistency and binocular vision. Temporal consistency allows the brain to correlate images following each other temporally. If, as in the example given above, either one of the objects or the observer changes position, the optically ambiguous situation is resolved. In most situations binocular vision resolves this problem as well, as it provides a different perspective onto the same scene.

Figure 3.1: Occlusion can be ambiguous in the monocular case, but is usually very well resolved by binocular vision. The first image lets us conclude that the red rectangle is behind the blue rectangle, as its side aligns perfectly, which would be an uncommon situation when viewing a real scene. The second and third image show the images as perceived by the left and right eye, which reveal that the red rectangle is actually in front of the blue rectangle and that the monocular image, by accident, aligns its border with the blue rectangle.

3.3 Vergence

Humans, like most other predators, have binocular vision with eyes separated horizontally. As both eyes focus on the same object, the view axes of the eyes have to point towards the object in question. The angle under which each eye views the object is offset due to the horizontal displacement. The difference between these angles directly relates to the distance of the object and therefore serves as a source of depth information.
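The dependence of the convergence angle on object distance described above follows from simple trigonometry. A short sketch for the symmetric case (object fixated straight ahead; the function name and the example eye separation of 63 mm are illustrative choices):

```python
import math

def vergence_angle_deg(eye_separation_mm, distance_mm):
    """Angle between the two view axes when fixating an object straight
    ahead at the given distance: each view axis is rotated inward by
    atan((e/2)/d), so the full convergence angle is twice that."""
    return math.degrees(2.0 * math.atan(eye_separation_mm / (2.0 * distance_mm)))

# The angle falls off quickly with distance, which is one reason
# vergence is mainly useful as a depth cue at near range:
near = vergence_angle_deg(63.0, 300.0)     # reading distance, ~12 degrees
far = vergence_angle_deg(63.0, 10000.0)    # 10 m, well under half a degree
```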
With stereoscopic image content, vergence is directly related to the disparity of fused image features, as the disparity controls how far apart fused pixels in the image plane are. The distance between the pixels can have a positive as well as a negative sign, resulting in a fused image appearing either in front of or behind the focused image plane. When relating this to the vergence of the eyes, a stereoscopic stimulus with a positive disparity is usually referred to as an uncrossed stimulus, whereas one with a negative disparity is referred to as crossed. Figure 3.2 visually describes the change of the vergence angle according to the distance of the object. It is important to note that uncrossed disparity on an image plane has a maximum separation equal to the distance between the eyes. This is due to the fact that the view directions of the two eyes approach parallel the further the object moves away from the observer.

Figure 3.2: Vergence controls the orientation of the eyes. At the same time it acts as a depth cue by providing the angle of orientation to the vision system.

3.4 Accommodation

Since the eye is not a simple pinhole camera but rather consists of a lens and a retina, the eye is required to focus the image to a specific depth. In the case of the human eye, the curvature of the lens can be adjusted by the ciliary muscles, an act that is called accommodation. Accommodation is used to focus on an object at a specific depth. The information of how much the lens is contracted acts as a depth cue to the vision system. Accommodation is believed to be one of the main problems in stereoscopic displays employing a single surface.
The eye always accommodates to the display surface while both eyes converge to an object believed to be at a different depth. This problem is referred to in the literature as the accommodation-vergence mismatch. A solution to the problem would be a true light field display, causing the eye to focus at the same depth towards which the eyes converge.

3.5 Stereopsis

Even though a human typically has two eyes, the brain is usually confronted with a single image. This image is considered to be the fused image of the two eyes, and the process is referred to as binocular fusion.

3.5.1 Horopter, Vieth-Müller Circle and Panum's Area

The horopter describes a surface containing all points that are fused to a single vision when focusing on a specific point in depth. Points belonging to the theoretical horopter all subtend the same vergence angle at the two eyes. Initially it was believed that the area of single fusion is formed by the circle defined by the fixation point and the two centers of the lenses.

Figure 3.3: The Vieth-Müller circle, also known as the theoretical horopter, and the empirical horopter. Points on the horopter represent points that result in single, fused images. Points too far from the (empirical) horopter are seen as double images (diplopia).

By observing images in his newly invented haploscope, Wheatstone empirically found that the actual area of single vision is larger than the theoretical horopter, which led to the distinction between the empirical and the theoretical or geometrical horopter. The geometrical horopter is often referred to as the Vieth-Müller circle (see figure 3.3), named after Gerhard Vieth and Johannes Müller. In 1858, Peter L. Panum [43] found that points within an area around the horopter can also be fused to a single vision.
This area is known as Panum's area of single vision. Points in front of, or behind, Panum's area will appear as two separate objects, a phenomenon generally referred to as diplopia.

3.5.2 Neurophysical Basis

Since Sir Charles Wheatstone presented the first Mirror-Stereoscope in 1833, many theories have been formulated as to how the human vision system combines the two separate streams of information arriving from both eyes. Wheatstone himself believed that the images were analyzed independently of each other. This analysis included finding contours and higher order shapes which, at a later stage in the human visual cortex, are combined into a single fused view. It was considered that stereopsis used many of the available depth cues in order to fuse an image. It was not until Béla Julesz presented Random Dot Stereograms [29] in 1959 that the theory of late binocular fusion was displaced. The reason for this change in belief was that Random Dot Stereograms do not contain any depth cue other than pixel patterns displaced by some disparity, and yet allow the perception of depth. A random dot stereogram is created by taking an image of random black and white dots and displacing parts of it by some pixels for the opposing eye. The amount the random pixel pattern is shifted directly relates to the fused depth.

An important aspect of perception is the distance between the two eyes, called the interpupillary distance (IPD). The IPD has a major influence on the perception of stereoscopic content. The vast majority of adults have an IPD between 50 and 75 mm, with the average being at 63 mm [13]. An important aspect when designing stereoscopic systems is that the IPD for children of age 5 to 15 years is in the range from 40 mm to 65 mm. If this is not considered, the perceived depth may be well beyond the limits of stereo fusion for children.
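The random dot stereogram construction described above can be sketched in a few lines; the image size, patch size and disparity below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_dot_stereogram(size=256, patch=64, disparity=8):
    """Generate a left/right image pair hiding a central square.
    The square carries no monocular depth cue; only the horizontal
    shift (disparity) between the two images encodes its depth."""
    base = rng.integers(0, 2, (size, size)).astype(np.uint8) * 255
    left = base.copy()
    right = base.copy()
    y0 = x0 = (size - patch) // 2
    region = base[y0:y0 + patch, x0:x0 + patch]
    # Shift the central patch horizontally in the right eye's image only.
    right[y0:y0 + patch, x0 - disparity:x0 - disparity + patch] = region
    # Refill the uncovered strip with fresh random dots so that no
    # monocular edge betrays the hidden square.
    right[y0:y0 + patch, x0 + patch - disparity:x0 + patch] = \
        rng.integers(0, 2, (patch, disparity)) * 255
    return left, right

left, right = random_dot_stereogram()
```

Viewed monocularly, both images look like pure noise; fused binocularly, the shifted patch appears at a depth proportional to the disparity.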
Neurophysical research has been performed on cats and macaque monkeys in order to identify the individual cells involved in decoding the depth information available in the stereoscopic image streams. In the human vision system the neural signals from the ganglion cells first pass through the lateral geniculate nucleus (LGN) of the thalamus before arriving at the visual cortex. There seems to be no direct evidence for disparity-selective cells in the LGN. Instead, disparity selectivity emerges first in the primary visual cortex (V1), where signals from the two eyes converge upon single neurons [11]. Most of the input to V1 arrives at so-called simple cells, which are orientation-selective and consist of receptive fields whose profiles are found to be well described by a Gabor function [28], [19], [10]. A Gabor function is the product of a Gaussian bell curve and a sinusoid, as defined in figure 3.4, where x0 and ω correspond to the center position and width of the Gaussian envelope, f and Φ denote the spatial frequency and phase of the sinusoid, and k is an arbitrary scaling factor [11]. The resulting visual stimulus of a Gabor function with specific assignments to the center position, width, frequency, phase and scaling factor is called a Gabor patch (such as the one shown in figure 3.5).

G(x) = k · exp(−(2(x − x0)/ω)²) · cos(2πf(x − x0) + Φ)

Figure 3.4: The Gabor function

Figure 3.5: A Gabor patch oriented at 45° with a Gaussian envelope (50 pixels standard deviation) and a frequency of 0.05 cycles per pixel. The original image had a width and height of 500 pixels. The patch was generated using the online Gabor patch generator found at [7].

It can be argued that V1 simple cells are responsible for the first stage of disparity processing in the brain. One of the remaining questions is how the simple cells encode binocular disparity.
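A minimal sketch evaluating the Gabor function of figure 3.4 (the function name and parameter defaults are ours):

```python
import numpy as np

def gabor_1d(x, k=1.0, x0=0.0, w=1.0, f=1.0, phi=0.0):
    """1D Gabor function: a Gaussian envelope times a sinusoid, following
    the definition in figure 3.4 (k: scale, x0: center, w: envelope
    width, f: spatial frequency, phi: phase)."""
    envelope = np.exp(-(2.0 * (x - x0) / w) ** 2)
    carrier = np.cos(2.0 * np.pi * f * (x - x0) + phi)
    return k * envelope * carrier

x = np.linspace(-2, 2, 401)
g = gabor_1d(x, w=1.5, f=2.0)   # a localized, oriented "wavelet" profile
```

Sampling this function on a 2D grid along a rotated axis and mapping the values to grey levels yields a Gabor patch like the one in figure 3.5.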
According to [11] there exist two main theories: the theory of position difference and the theory of phase difference. In the position difference model, the layout of the receptive field is the same for both eyes but its center is located at noncorresponding points on the two retinas. In the phase difference model, the receptive fields are centered at corresponding retinal points, but the two receptive fields are allowed to have different shapes or phases [11].

Studies related to depth perception also to some extent incorporate findings of effects from binocular rivalry. Binocular rivalry (BR) occurs if the image pair is too different in each eye to fuse a single image. In that case, the human vision system switches between the two images. The study of binocular rivalry aids the understanding of binocular depth perception as it represents a state of failed binocular fusion. With functional magnetic resonance imaging (fMRI) becoming more widely available (see [6]), perception studies relying on brain scans using fMRI are becoming more and more common, as the technique is non-invasive and allows interactivity to some extent. Yet, most of those studies try to resolve the innermost workings of perception, such as the perception of motion, well beyond the scope of this thesis.

3.6 Perception of Crosstalk

Since most available stereo systems use some kind of multiplexing to deliver individual images to the two eyes, the perception of crosstalk is an important topic to discuss. Consider a simple scene with one high-contrast border at some disparity, as depicted in figure 3.6.

Figure 3.6: A stereoscopic contrast curve. The top images are the intended signals for the left and right eye, the bottom images the actually perceived images.
It is usually assumed that the amount of crosstalk is symmetric, such that white becomes darker by a certain amount and black becomes lighter by that same amount. While the amount of crosstalk is equal, the amount of ghosting which is perceivable is not. Due to Weber's Law (see Figure 5.6), the ghosting in figure 3.6 is less visible for the right eye than for the left eye. This is due to the fact that the brightness difference required for a contrast edge to be detectable by the eye is greater at brighter intensities. The actual perceived image naturally depends on the medium presenting the image, due to varying transfer functions, but the effect is still apparent even if linearity is given.

Figure 3.7: First derivative of the contrast curve presented in Figure 3.6. Even though the amount of crosstalk is the same, the absolute brightness levels differ for the left eye and the right eye. This difference causes the bright contrast edge to be easier to detect than the dark contrast edge.

Figure 3.7 visualizes the first derivative of the contrast edges in figure 3.6, indicating that the amount of crosstalk is the same for both cases. Please note that the effect requires a linear transfer function of the output medium to be objectively judged. This fact can be easily reproduced on any stereoscopic display by displaying a contrast curve with some disparity and allowing the observer to adjust the intensity of the black level. If the observer first adjusts the black level with only one eye observing the contrast curve, such that the ghosted area is not visible, the ghosted area will still be apparent to the other eye. Most ghosting compensation algorithms do exploit the fact that lowering the contrast ratio will also lower the amount of perceivable ghosting, but lack an explanation of this asymmetry.
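The asymmetry can be illustrated numerically with Weber contrast (ΔL/L). The luminance values below are invented for illustration; only the ratio between the two cases matters:

```python
def weber_contrast(delta_l, background_l):
    """Weber contrast of a luminance increment against its background."""
    return delta_l / background_l

# Assumed example display: white at 200 cd/m2, black at 2 cd/m2, and 5%
# of the unintended channel leaking into each eye (identical both ways).
white, black = 200.0, 2.0
leak = 0.05 * (white - black)

ghost_on_black = weber_contrast(leak, black)   # bright ghost on dark area
ghost_on_white = weber_contrast(leak, white)   # dark ghost on bright area
# ghost_on_black >> ghost_on_white: the identical absolute leak is far
# more visible against the dark region, matching the asymmetry above.
```

With these numbers the ghost on the dark side has a Weber contrast two orders of magnitude above the ghost on the bright side, even though the leaked luminance is the same.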
4 Construction of a HDR Stereo Display

Since the goal of this thesis is to evaluate the correlation between higher contrast ratios and crosstalk concerning stereopsis, a display capable of producing a higher dynamic range while still being able to display stereoscopic content became a necessity. It should be noted that at the time this project started (late 2009), the first LCD panels capable of refreshing the screen fast enough to support stereoscopic viewing at 120 Hz became commercially available. The construction of such a HDR stereo display manifested itself in two prototypes: the first one is based on the approach presented in [54], using a projector as the back panel modulator and a stereo LCD panel as a front modulator. The second prototype uses two identical stereo LCD panels and a bright LED backlight and is shown in figure 4.1.

Figure 4.1: The final (second) prototype built from two stereo LCD panels.

4.1 LCD - Projector Approach

The first prototype borrows its approach from Seetzen's prototype [54], using an InFocus DepthQ projector as the back modulator and the LCD panel of a Samsung 2233RZ as the front modulator. As the projector is able to direct the amount of light at every individual pixel, it acts as an intelligent light source, whereas a standard backlight would direct the same amount of light towards all pixels. Since the resolutions as well as the aspect ratios of both devices differ, a homographic transform was used to stretch the projector image accordingly. The setup worked for monoscopic viewing, whereas for stereo viewing, synchronization issues appeared: the stereo setup used is based on NVidia's consumer-line stereo system, 3D Vision. The problem with this system is that the graphics card is able to detect the model and brand of the display device, and the driver adjusts the shutter delay of the glasses accordingly.
Since we were using two different devices, both devices had different delay timings and it was therefore not possible to get a single synchronized stereo delay for both display systems. While this problem could theoretically have been solved by adding a delay to the LCD panel, it was not the only issue this first approach suffered from: the InFocus DepthQ projector is a DLP device, whereas the Samsung 2233RZ uses an LCD panel. Not only do the technologies used here inherently differ, the resulting refresh cycles also differ: the single-chip DLP projector sends the red, green and blue components of the image sequentially towards the front modulator. The LCD panel on the other hand refreshes all of its pixels from top to bottom, where the red, green and blue components are refreshed at the same time. When comparing the two timelines next to each other, it becomes obvious that, depending on the order in which the DLP projects its color fields, parts of the LCD do not receive any light in a given color. The solution would have been to remove the color wheel altogether and only use a monochromatic back modulator (as in [54]), or to use a three-chip DLP projector, solving the display technology synchronization issue; the synchronization delay issue on the software/driver side would, however, still have been apparent. The approach of using a projector as the back modulator for a HDR stereo display was therefore abandoned and another approach, using layered LCD panels, was pursued.

4.2 Dual Layer LCD Approach

Due to the multiplicative nature of light modulators, all known approaches to producing a HDR display use two (or more) modulators. In the first prototype the DLP represented the first modulator, the LCD panel the second. Dissecting a DLP projector more closely reveals that it in fact consists of a bright light source and a Digital Micromirror Device (DMD).
Instead of the DMD and a single LCD panel, the second prototype uses two Samsung 2233RZ LCD panels mounted directly in front of each other. This moves the back modulator closer to the front modulator and therefore does not require depth between the two to achieve a decently sized image. A schematic of the dual layer LCD approach is presented in figure 4.2. An array of 6 by 4 high power LEDs acts as a very strong, uniform backlight, which is described in more detail in section 4.4. A major advantage of using two LCD panels is that the same model can be used as the front as well as the back panel. The pixel alignment also becomes very simple since the resolutions match up and the panels can be aligned along a common edge. Using the same panel twice also allows stereoscopic content to be viewed with fewer synchronization issues, as the internal image delay of both panels is very likely to be at least nearly identical.

4.3 Polarization

Due to the stacking of multiple LCD layers, it is important to understand the polarization effects that occur as the layers of polarization foil, of which liquid crystal displays are built, interact optically.

Figure 4.2: Schematic of the dual layer LCD approach.

In order to fully comprehend the further steps taken, it is important to understand the basic principle of how LCD panels work. A chromatic LCD panel typically consists of four layers:

• A polarizer (polarization foil)
• A color filter depending on the color of the pixel (red, green or blue)
• A spatial light modulator (SLM), in the case of this prototype based on Twisted Nematic (TN) cells
• An analyzer (polarization foil rotated 90° to the polarizer)

Light entering the LCD panel is linearly polarized by the polarizer, and wavelengths of unwanted colors are filtered out by the color filters of the individual pixels.
The spatial light modulator then rotates the polarization of the light depending on the state of the twisted nematic cells. The more the polarization of the light is rotated, the more of it is aligned with the direction that is allowed to pass the following analyzer. The more light passes the analyzer, the brighter the final pixel will appear. As an example, given that the twisted nematic layer rotates the polarization by 45°, 50% of the light is absorbed and transformed to heat, whereas the other 50% passes the analyzer and hopefully reaches the observer's eyes. Due to this, light coming from the display is linearly polarized in a way which matches the polarizer alignment of the shutter glasses. Even though the example given stated that a rotation of 45° manifests itself as a reduction of half of the light, the relation is not linear but instead follows Malus' Law (figure 4.3):

I = I0 · cos²(θ)

Figure 4.3: Malus' Law. I0 corresponds to the initial intensity and θ to the angle between the polarization directions of the polarizers.

4.3.1 Rotated around Y-Axis

When stacking two LCD panels on top of each other, it becomes clear that the analyzer of the back panel and the polarizer of the front panel are aligned perpendicularly, preventing any light from passing through. The obvious solution would be to rotate the front panel by 180° around its vertical axis, causing the analyzers to be next to each other (see figure 4.4). This will only work if the linear polarization filters are aligned with the rotation axis. In that case, the analyzer of the front panel and the analyzer of the back panel would align in polarization direction and would, when using a perfect polarizer, let all of the light pass through. Of course there is always a loss of light, even if polarizers are aligned perfectly.
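Malus' law from figure 4.3 can be evaluated directly; the following sketch (function name is ours) reproduces the 45°/50% example:

```python
import math

def malus_transmission(i0, theta_deg):
    """Transmitted intensity through an ideal analyzer at angle theta
    to the incoming linear polarization (Malus' law: I = I0 cos^2 theta)."""
    return i0 * math.cos(math.radians(theta_deg)) ** 2

aligned = malus_transmission(100.0, 0.0)    # all light passes
halfway = malus_transmission(100.0, 45.0)   # exactly half passes
crossed = malus_transmission(100.0, 90.0)   # crossed polarizers block all
```

The cos² relation is why intermediate rotations of the twisted nematic cells map non-linearly to transmitted intensity.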
Figure 4.4: If the polarization of the back panel analyzer and the front panel polarizer have to match and the polarization is aligned or perpendicular to the y-axis of the display, one of the panels can be rotated. This causes the polarizer and analyzer to swap roles and invert the flow of light.

In general though, the polarizers and analyzers of any modern LCD panel are not oriented along the vertical axis. Therefore, the trick of placing both analyzers next to each other by rotating the front (or back) panel will not work (as explained in figure 4.5).

Figure 4.5: In practice, polarization foil is not aligned with the display axis but oriented at 45°. This causes the polarization conflict to remain under rotation.

4.3.2 Rotated around Z-Axis

Another quite viable solution is to rotate one of the displays 90° around the z-axis, as shown in figure 4.6. This has three negative aspects: firstly, the visible area is reduced to a square with a side length equal to the height of the original panel. The remaining areas of the panels then lie on either side, or on top and bottom, of the display and cannot be used for double modulation, as the light path would only pass through one display. Secondly, the fact that the LCD panels used are chromatic and not greyscale adds another problem to this approach, visually explained in figure 4.7: with chromatic displays, pixels are not square but consist of rectangular stripes of red, green and blue sub pixels, or are set up in a Bayer grid pattern. Aligning the displays perpendicularly causes the sub pixels to cover each other, further reducing light throughput, as the light path for a majority of the pixels would pass through two different color filters, essentially blocking most of the light.
Finally, the alignment of the panels becomes far more difficult as they would not share a common edge to align with.

Figure 4.6: Rotating one of the panels by 90° would reduce the usable HDR area to a square with a side length equal to the height of the display.

4.3.3 Using a Wave Retarder

Since disassembling the LCD panels and removing the polarization foil seemed too invasive and too influential on image quality itself, it was first decided to try using a wave retarder that would re-orient the polarization in between the analyzer of the back panel and the polarizer of the front panel. Since optical wave retarders the size of a 22-inch LCD panel are rather expensive, cellophane film was tried as a wave retarder. Ortiz-Gutiérrez et al. [42] describe the optical aspects of cellophane film in the context of polarization and praise the wide band of wavelengths that cellophane film rotates by a half-wave.

Figure 4.7: Each individual pixel on a screen consists of 3 sub pixels for red, green and blue which are aligned as stripes. Rotating one of the panels would cause the overlapping sub pixels to mismatch and therefore reduce brightness.

While the effect could be reproduced, the cellophane film's spectrum of operation was not as wide as required, leaving longer wavelengths in their original state and therefore not rotating the full spectrum of white light. This manifested itself as a yellow tint, as the bluish components of white light were absorbed by the front panel polarizer.

4.3.4 Inverted Mode

The next idea involved removing the analyzer of the back panel and operating the back panel in an inverted mode.
This also inverted the task of the twisted nematic cells: instead of rotating the polarization of the light that is supposed to pass the analyzer towards the orientation of the analyzer, the cells were now responsible for manipulating the polarization of the light which was supposed to be absorbed by the analyzer. Therefore, following the light path from the light source through the display to the eye, the following is supposed to happen:

• Light passes through the polarizer, removing light not aligned vertically.
• The remaining light passes through the spatial light modulator (SLM), which rotates light away from the vertical polarization direction when activated and lets the light pass through with its vertical polarization when deactivated.
• The polarizer of the front panel acts as the analyzer for the back panel. As it is also aligned vertically, it absorbs light that was rotated by the SLM.
• The front panel spatial light modulator rotates light towards the horizontal polarization direction of the front panel analyzer.
• Finally, the front panel analyzer absorbs light that is not aligned horizontally. This is light that was not affected by the SLM.

In this setup the two LCD panels operate in two different modes: the front panel works in the classic mode, where the front panel and back panel polarizer orientations are perpendicular to each other. The back panel acts in an inverted mode, where the analyzer and polarizer are aligned to each other. Therefore, in order to produce a white pixel on the back panel, the intensity value has to be set to zero, whereas a white pixel becomes visible if a front panel pixel is set to 255. This can be achieved transparently for all applications by modifying the color lookup table in the graphics driver. This table is used by the graphics driver to map 8 bit red-green-blue tuples to the values that are then sent to the display device.
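Such a driver-level remapping amounts to a declining lookup table; a minimal sketch (the helper name is ours):

```python
import numpy as np

def inverted_lut(bits=8):
    """Declining color lookup table for a panel operated in inverted
    mode: drive level v is mapped to (2^bits - 1) - v, so an application
    requesting bright values drives the panel into its bright state."""
    top = 2 ** bits - 1
    return np.arange(top, -1, -1, dtype=np.uint8)

lut = inverted_lut()
```

With this table installed, an application writing 255 (white) ends up sending 0 to the inverted back panel, which is exactly the drive level that produces a white pixel there.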
In order to invert the colors back to normal, it was therefore possible to set a color lookup table that declined rather than increased with increasing index value, mapping higher index values to lower intensity levels, which in inverted mode caused the twisted nematic to rotate less light away from the analyzer's polarization orientation. The main problem with this approach was that a lot of light passes the twisted nematic cells without being influenced by their structure, maintaining the polarization as mandated by the polarizer. In the normal case this is not a problem, as the unaffected light would not pass the analyzer but would instead be absorbed. Since the analyzer and the polarizer of the back panel were now aligned, this was no longer the case, and contrast was reduced by two orders of magnitude.

4.3.5 Center Analyzer Removed, Back Panel Polarizer Rotated

The final approach was to rotate the polarization of the back panel by 90°. This involved removing the analyzer as well as the polarizer of the back panel. Since the polarizer of the front panel has the same alignment as the analyzer of the back panel, it was not necessary to add another (rotated) analyzer. This allowed the polarization foil in between the two spatial light modulators to act as both polarizer and analyzer at the same time. Completely removing one analyzer improved light throughput (even if the incident light is perfectly aligned with the polarization foil, some amount of light is still absorbed) and reduced the chance of the back panel being misaligned with the front panel, which would have further reduced transmissivity (see figure 4.8).

4.4 Backlight Construction

A typical color LCD panel has a transmissivity of about four percent. Double modulation using two LCD panels therefore requires twenty-five times the amount of light in order to achieve the same brightness.
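The twenty-five-times figure follows directly from the stated transmissivity:

```python
# Back-of-the-envelope light throughput of stacked chromatic LCD panels,
# using the ~4% single-panel transmissivity figure cited in the text.
single_panel = 0.04
double_modulation = single_panel ** 2           # ~0.0016 of the light survives

# To match the brightness of a single-panel display, the backlight must
# be brighter by the transmissivity of the extra panel:
required_boost = single_panel / double_modulation   # = 1 / 0.04 = 25x
```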
In the demonstrated setup this was improved slightly by removing a redundant analyzer and changing the polarization of the back panel. Nonetheless, a strong backlight is an essential part of building a high dynamic range display with two stacked layers of chromatic LCD panels. Additionally, using double modulation allows a finer control of pixel intensities, since the transmissivity level of every pixel is controlled by two, instead of only one, 8 bit values per color channel. This additional bit depth can be used either to increase the accuracy of the displayed intensity or to support a brighter peak luminance while keeping the accuracy the same. Due to the light loss of the two LCD panels it was decided to build the brightest possible backlight within the cooling and budgetary constraints. A picture of the final backlight unit can be found in figure 4.9.

Figure 4.8: Final working approach: removing the back panel analyzer and replacing the back panel polarizer with polarization foil rotated by 90°.

Figure 4.9: The final backlight consisting of 24 LEDs aligned in a six by four grid.

4.4.1 Light Sources

In recent years, Light Emitting Diodes (LEDs) have drastically improved their brightness per watt and were an obvious choice for such a backlight. The final LED array used was the Bridgelux BXRAC2002 (as shown in figure 4.10), because two of those LEDs can be powered by one Mean Well LPC 60-1570 constant current power supply. In this configuration every LED produces about 2270 lumen of white light while using 28.875 watts of power. In order to fit the aspect ratio of the screen and keep a smooth lighting across the surface, twenty-four LEDs were aligned in a six by four regular grid.
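From the per-LED figures above, the aggregate output and thermal load of the backlight can be estimated:

```python
# Aggregate figures for the 6x4 LED backlight described in the text.
num_leds = 6 * 4
lumens_per_led = 2270.0      # approximate white-light output per LED
watts_per_led = 28.875       # electrical power per LED

total_lumens = num_leds * lumens_per_led   # raw backlight flux
total_watts = num_leds * watts_per_led     # electrical load, mostly heat
# With one 65 W TDP CPU cooler per LED, the cooling capacity
# (24 * 65 = 1560 W) comfortably exceeds the ~693 W electrical load.
```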
Figure 4.10: Closeup of the LED light sources.

Even though LEDs are highly efficient at producing bright white light, the vast majority of the energy is still converted to heat, which has to be accounted for and dissipated. Custom-building a heat spreader that would fit the screen perfectly would have been the optimal solution but was out of the scope of the project. Instead, twenty-four standard CPU socket coolers, one per LED, were mounted together in a rectangular shape. Since the luminous output of an LED decreases with the temperature the LED is operated at, excess cooling would only manifest itself as a brighter final image. The coolers used are designed for processors with a thermal design power (TDP) of 65 watts and provided good enough cooling, even in a vertical alignment of multiple heat spreaders.

4.4.2 Layout

In order to determine the number of LEDs and their layout, an application was written to simulate the light pattern reaching the display if it were to be placed at a specific depth. The application considered parameters including the number of LEDs used, the layout and size of the grid, as well as the type of LED used. The simulator went as far as considering the light distribution pattern of the LED, as the light intensity is not evenly distributed across the hemisphere in front of the LED. The output of such a simulation included the light intensity in cd/m², the variance of the light across the surface, information on the required power supply, and the cost of the parts. The resulting light pattern was also presented, as shown in figure 4.11. The final placement of the backlight was determined by practical constraints: it was placed at about seven centimeters from the panel to spread the light contribution further across the backlight diffuser, thereby avoiding luminous hotspots. It also aided cooling, as it provided enough space for air to flow freely between the LEDs and the panels.
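The core of such a simulation can be sketched as follows. The emitter model (point-like Lambertian sources), the LED pitch and the plane size are simplifying assumptions of ours, not the parameters of the original application:

```python
import numpy as np

def led_grid_irradiance(nx=6, ny=4, pitch=0.08, depth=0.07,
                        plane_w=0.47, plane_h=0.30, res=64):
    """Relative irradiance on a diffuser plane from a grid of point-like
    Lambertian LEDs (all dimensions in metres). Returns an unnormalized
    res x res irradiance map."""
    lx = (np.arange(nx) - (nx - 1) / 2) * pitch
    ly = (np.arange(ny) - (ny - 1) / 2) * pitch
    gx, gy = np.meshgrid(np.linspace(-plane_w / 2, plane_w / 2, res),
                         np.linspace(-plane_h / 2, plane_h / 2, res))
    total = np.zeros_like(gx)
    for x in lx:
        for y in ly:
            r2 = (gx - x) ** 2 + (gy - y) ** 2 + depth ** 2
            # Lambertian emitter onto a parallel plane: cosine falloff at
            # emission and incidence plus 1/r^2, giving E ~ depth^2 / r^4.
            total += depth ** 2 / r2 ** 2
    return total

irr = led_grid_irradiance()
hotspot_ratio = float(irr.max() / irr.min())   # > 1: residual non-uniformity
```

Sweeping the grid layout and the backlight-to-panel depth in such a model is enough to compare candidate configurations by their hotspot ratio, which is essentially what the original simulator reported as variance across the surface.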
Figure 4.11: Visual result of the backlight simulation. The white hotspots represent the LEDs, which are aligned in a 6 by 4 grid in this iteration. The simulation takes the layout, the type and the light intensity distribution of the individual LEDs into account.

4.5 Visual Artefacts

Distributing the task of producing enough luminous power among twenty-four LED arrays made the backlight quite diffuse, but it still produced visible hotspots on the final image. An additional three-millimeter-thick diffuser with a transparency of 42% was therefore added, which removed the hotspots completely and made them undetectable to human vision. Another disturbing artifact was the varying brightness depending on the viewing angle, which was not uniform across the screen. It appeared as a sinusoidal grating repeating itself multiple times; the frequency of the grating depended on the distance of the viewer to the screen. It is important to note that the effect was only visible horizontally and not vertically. A closer observation of an LCD panel provided a plausible explanation for this effect, but no proof has been found to verify it: viewing a single LCD panel under a microscope reveals the red, green and blue sub pixels, which are aligned in vertical stripes. Stacking two panels perfectly on top of each other will align the vertical stripes as well. Because physical pixels do not only have a width and a height, but also a depth that light has to pass through, a light ray from the backlight may traverse into neighboring sub pixels instead of the corresponding front panel sub pixel. Therefore, a light ray perpendicular to the screen will probably pass through the corresponding sub pixels in both panels, but a light ray at a different angle will, depending on the entry position of the ray at the back panel, traverse a sub pixel neighboring the corresponding front panel pixel.
As the retina of the human eye is smaller than the visible area of the display, all but the central vertical pixel stripe will be at a non-perpendicular angle to the viewer. In a monochromatic display this would not be a big problem and could be accounted for digitally (see Guarnieri's work [22]) if the position of the eye is known, because the correspondences between the back panel and the front panel pixels can be computed. The LCD panels used, however, have built-in color filters instead of a separable layer, and their removal was not possible. Therefore, the only solution to this problem was to diffuse the resulting image using a diffuser. Of course this reduced contrast by a factor of about three, but it resulted in a much more appealing image.

5 Calibration

The motivation for calibrating any output device, such as a display, a printer or a projector, is to know the exact luminance presented for a given display drive level. The goal of calibration therefore is to find a function f that transforms the drive level to the output luminance L. The form of such a function is given in figure 5.1. Such a function is called the transfer function of a device. A feasible implementation of a transfer function is a lookup table that maps drive levels to output luminances.

L = f(PDL), PDL = f⁻¹(L)

Figure 5.1: General transfer function mapping a pixel drive value to a presented luminance, as well as the inverse function, mapping from a luminance to a pixel value.

An application intending to display a luminance of a given level L will, on the other hand, use the inverse transfer function to look up the display drive level that needs to be set in order to produce that luminance level. Inverting a transfer function generally incurs a loss of accuracy, as the resolution of the pixel drive level is usually in the range of eight to ten bits, whereas the luminance requested is of analog nature.
Therefore, luminances are usually mapped to their closest possible representation in the inverted transfer function. Depending on the accuracy of the calibration algorithm, other factors such as the location of the pixel to be calibrated, the position of the viewer, the temperature, and possibly the luminances of neighboring pixels can be additional parameters to the transfer function. A tradeoff has to be made between the size and complexity of the lookup table and the generality of the transfer function. Environmental influences such as ambient light are implicitly measured and therefore included in the calibration without requiring special care. Other factors such as the viewer's position are usually ignored altogether.

Calibrating a display therefore consists of sampling the color space by displaying those samples and measuring the corresponding response. Depending on the display device, it might be infeasible to completely sample the full color space. In such a case, linear interpolation or more advanced curve fitting can be used to fill missing samples with estimated measurements. The transfer function of a double modulator display, such as the one presented in this thesis, is different in that the pixel drive level is not a single value but a tuple of two values, representing the front and the back panel. Full calibration of a double modulation display thereby requires iterating over both 8 bit drive levels, resulting in 2¹⁶ (65,536) required measurements per color component. Even though LCD panels consist of separate red, green and blue sub pixels, color independence of those channels is assumed. Two approaches to calibrate the display were attempted, which differ in the way the presented luminosity is measured.
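The nearest-representation lookup described above can be sketched as follows; the gamma-2.2 curve with a 250 cd/m² peak merely stands in for a measured transfer function:

```python
import numpy as np

# Hypothetical measured transfer function: luminance per 8 bit drive
# level. A real calibration would use the measured lookup table instead.
drive_levels = np.arange(256)
measured_l = 250.0 * (drive_levels / 255.0) ** 2.2

def luminance_to_drive(target):
    """Inverse transfer function via nearest-neighbour lookup: return
    the drive level whose measured luminance is closest to the target."""
    return int(np.argmin(np.abs(measured_l - target)))

mid = luminance_to_drive(125.0)   # drive level nearest to half peak
```

The quantization step of this lookup is exactly the accuracy loss mentioned above: any requested luminance between two measured entries is rounded to the nearer one.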
The second attempt used a colorimeter with a modified software stack, whereas the first attempt used a consumer grade DSLR camera and is described in the section below.

5.1 Using HDR Image Recovery

A digital camera is at its core a light intensity measurement device capable of recording multiple samples at the same time. An image of a display presenting a calibration pattern contains the relative luminance values of the perceived calibration pattern. In order to recover those luminance values from the captured image, the locations of the individual pixels have to be associated with the pixels in the calibration pattern. This association is created by determining the intrinsic camera parameters and a homography representing the linear transformation from the display to the camera. Both can be measured by capturing the image of a checkerboard pattern. First, corners in the reference checkerboard image and the captured image are determined using OpenCV's findChessboardCorners. For further accuracy, the corners are refined using OpenCV's cornerSubPix, which improves the position of each corner by inspecting the surrounding gradients. After the exact corners have been located in the reference as well as the captured image, a 2D camera matrix is created using initCameraMatrix2D and calibrateCamera, providing the corners of the reference image as object points (extended to three dimensions) and the corners found in the captured image as the image points. calibrateCamera estimates the intrinsic and extrinsic camera parameters from multiple views, but only a single view is used here. The resulting intrinsic parameters are used to undistort the captured image; the extrinsic parameters are ignored. Finally, a homography is calculated using findHomography with the corners found by a call to findChessboardCorners on the undistorted captured image.
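The geometric core of this pipeline is the homography estimated by findHomography. Without a camera at hand the OpenCV calls cannot be reproduced here, but the underlying linear problem can be sketched with the direct linear transform (a minimal version without the point normalization and robust estimation OpenCV adds; names are illustrative):

```python
import numpy as np

def find_homography(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src via the direct
    linear transform -- the same linear problem cv2.findHomography
    solves, as a bare-bones sketch (no normalization, no RANSAC).
    src, dst: (N, 2) arrays of corresponding points, N >= 4.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # each correspondence contributes two linear constraints on H
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the homography is the null-space vector: last row of V^T
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def warp_point(H, pt):
    """Apply H to a single 2D point, including the homogeneous division."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w
```

Warping every calibration-pattern coordinate through the recovered H (as cv2.warpPerspective does per pixel) is what aligns the captured image with the displayed pattern.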
The parameters are then stored as calibration data which can be reused for any calibration image captured, as long as the relative position between the camera and the display is not changed. The process of measuring the intrinsic and extrinsic camera parameters is explained in figure 5.2.

Figure 5.2: In order to find the lens distortion parameters as well as a homography transform, a checkerboard pattern is presented, captured as a standard, single JPEG image and compared to the presented checkerboard pattern using OpenCV. These parameters are then used to undistort the captured HDR images.

Every calibration image captured is undistorted using the camera distortion parameters, homography warped and cropped, aligning the pixels in the captured image with the pixels in the source calibration pattern (explained in figure 5.3).

Figure 5.3: The images captured using multiple exposure times are recombined into an HDR image, then undistorted and homography warped in order to align the captured pixels with the calibration image.

Since the dynamic range of a consumer DSLR does not suffice to capture high dynamic range image content, multiple exposures are used to recover such content. Every HDR image was captured using 16 exposures ranging from 1/500 s to 30 s, the maximum range possible with the Canon EOS 400D used in this project. The HDR recombination uses the algorithm presented by Robertson et al. [52], which is implemented in the PFStools package. The implementation claims to be able to recover the actual absolute candela values, provided by the Y component of the XYZ color space stored in the resulting OpenEXR files. The calibration pattern used consisted of multiple patches of size 256 by 256 pixels.
The red component was increased by a step size of 4 units in the x dimension of the patch, whereas the green component was incremented by the same step size in the y direction of the patch. From patch to patch, the blue component was also increased by a step size of 4 units. The larger step size allowed for a slight misalignment of the registration, because a single pixel measurement corresponded to four pixels in the captured image. In this configuration, two calibration images consisting of 24 patches and one consisting of 16 patches were required, allowing a full calibration to be performed by capturing three high dynamic range pictures using 16 exposures each. Unfortunately the result of this calibration approach was unsatisfying, as blooming effects from the longer-exposed pictures caused inaccurate measurements of the black levels. Also, the absolute luminosity values reported by the HDR recovery algorithm were questionable. It was therefore decided to try another approach using a device better suited to performing accurate measurements: a colorimeter.

5.2 Using an Intensity Measurement Device

A colorimeter is a device created for sensing luminosities that allows the measurement of color. At the hardware level a colorimeter is similar to a digital camera, as both are built using CMOS sensors, but the way the measurements are made differs drastically: a digital camera counts the number of photons reaching the individual CMOS cells in a given time frame called the exposure time. A colorimeter, on the other hand, measures the time taken until a specific number of photons reach the CMOS sensor. The time required for a measurement using a colorimeter therefore depends on the measured brightness, with dark luminances (fewer photons) requiring a longer time to measure than bright luminances (more photons). A colorimeter reports a single measurement at a time.
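The inverse relationship between patch luminance and measurement time can be illustrated with a toy model (the photon rate and threshold constants are purely illustrative, not Spyder 3 specifications):

```python
def measurement_time(luminance_cd_m2, photons_per_cd=1e6, threshold=1e5):
    """Toy model of a time-to-threshold colorimeter: integrate until
    `threshold` photons have arrived, so dark patches take
    proportionally longer to measure. All constants are illustrative,
    not specifications of any real device.
    """
    rate = luminance_cd_m2 * photons_per_cd  # photons per second
    return threshold / rate                  # seconds until threshold
```

Under this model, a patch ten times darker takes ten times longer to measure, which is why a full sweep of dark drive-level combinations is so time-consuming.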
Colorimeters are usually bundled with software for calibrating standard display devices. The calibration algorithms sample the color space at various locations and fill the missing measurements by estimation. Unfortunately this software is not suited for use with double modulation displays, so custom calibration software was required. Fortunately, the open source color management suite Argyll [21] includes support for the Datacolor Spyder 3 device that was chosen. Parts of the calibration software were modified to allow triggering a single measurement with the device and retrieving the resulting color. Because of the time required per measurement, it was decided to calibrate for colorless luminosities only, which is sufficient to enable JND calibration. Using this interface, a Java application was developed that sampled the 256 by 256 (8-bit front panel, 8-bit back panel) calibration space using recursive refinement. The algorithm first measures the most extreme points of the calibration space, namely (0, 0), (255, 0), (0, 255) and (255, 255). After this first iteration the sampling space is subdivided on both axes and measurements are taken at (0, 127), (127, 0), (127, 127), (127, 255) and (255, 127). Further iterations continue the subdivision in the same way. In order to compensate for the increasing temperature of the backlight causing the brightness to drop over time, each measurement was preceded by a measurement of the current peak white luminance. This allowed the measured intensities to be represented relative to the maximum instead of as absolute values and therefore reduced the effect of changing backlight intensities.

5.3 Crosstalk Calibration

Multiplexed stereoscopic display setups have an additional parameter that can be measured and for which compensation techniques can be calibrated: crosstalk. Crosstalk is caused by the fact that multiplexed stereo setups have a component interleaving the signal (e.g.
the display) and a component separating the interleaved channel into two distinct channels (the glasses). Ghosting is the visual result if this multiplexing or de-multiplexing step does not perform accurately. The term ghosting stems from the semitransparent appearance of the same object to the left and the right of the object, depending on the distance and therefore the disparity of the pixels at a given depth. An illustration of what ghosting looks like can be found in figure 2.2. If the amount of crosstalk for a given pixel is known, a compensation technique can be employed to reduce this effect. A few common crosstalk compensation techniques are discussed in the related work. There are two categories of crosstalk: system crosstalk and perceived crosstalk. System crosstalk is the amount of physical light that unintentionally leaks from one channel to the other. The amount of system crosstalk can be measured and depends on the intensity values of the pixels at the same physical location, but is usually independent of the image content in general. Perceived crosstalk, on the other hand, is the amount of crosstalk that a user actually perceives and therefore heavily depends on the image content. In a high-frequency image, for instance, a ghosted contour might not be as visible as the ghosted contour in an otherwise low-frequency image. Due to this dependency on image content, calibrating for perceived crosstalk is usually not attempted. The physical crosstalk of a stereo display system can be measured by capturing an image through the shutter glasses. For every combination of left and right intensity values, the calibration image requires two samples: one containing the intended intensity on both eye channels, representing the state without ghosting, and one containing the intended intensity on one channel and the unintended one on the other channel.
An example of what such a calibration pattern could look like is given in figure 5.4.

Figure 5.4: Image for the left eye, image for the right eye and the reference map: a) denotes the area with an intended absolute black and without ghosting, b) denotes the area with absolute white (no ghosting), c) denotes the area of an intended black ghosted with pure white and d) the area with intended white ghosted with pure black. The ghosted areas of course swap roles if the calibration image is taken through the opposite eye.

As an example, if the crosstalk of the display at the maximum inter-ocular contrast (one eye white, the other one black) is to be measured, four color samples need to be presented on the display, captured by the camera and retrieved from the captured image: the white value without any crosstalk, the white value with crosstalk, the black value without any crosstalk and the black value with crosstalk. The intensity ratio between the sample with ghosting and the one without ghosting then denotes the amount of system crosstalk produced by the pair of intensities. The formulas for calculating the amount of crosstalk for the white-intended and black-intended intensities are given in figure 5.5. Due to the high number of samples required to fully calibrate for crosstalk, only the crosstalk values for the contrast combinations used in the user study have been measured using this approach.

CT_BI = I_c / I_d, CT_WI = 1 - (I_d - I_a) / (I_b - I_a)

Figure 5.5: Using the calibration chart presented in figure 5.4, the amount of crosstalk can be calculated independently for an intended white as well as an intended black image. I_a, I_b, I_c and I_d denote the intensities measured at areas a, b, c and d in the pattern; CT_BI defines the crosstalk for the black-intended intensity, whereas CT_WI denotes the amount of crosstalk for the white-intended intensity.
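Assuming the reading of figure 5.4 under which area d measures the white-intended patch ghosted with black (the reading under which the two formulas of figure 5.5 are consistent with the four samples listed above), the crosstalk computation amounts to a few lines (a sketch; names are illustrative):

```python
def crosstalk(I_a, I_b, I_c, I_d):
    """System crosstalk from the four measured pattern intensities.

    Assumed meaning of the areas (see figure 5.4):
      I_a: intended black, no ghosting
      I_b: intended white, no ghosting (white on both channels)
      I_c: intended black, ghosted with pure white
      I_d: intended white, ghosted with pure black

    Returns (CT_BI, CT_WI): leakage visible in a black-intended patch
    relative to the direct white, and the fraction of the full white
    that is contributed by leakage, respectively.
    """
    ct_black = I_c / I_d
    ct_white = 1.0 - (I_d - I_a) / (I_b - I_a)
    return ct_black, ct_white
```

With, say, I_a = 0.1, I_b = 100, I_c = 2 and I_d = 98 cd/m², both values come out at about 2 %, i.e. roughly 2 % of one channel leaks into the other.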
5.4 Just-Noticeable-Difference Mapping

Even though a display might be able to present a great number of grey levels, not all of them may be distinguishable. In medical display scenarios it is important for the observer to be able to distinguish between two grey levels, even if they are close to each other in the tonal space. The rendering algorithm should therefore map every single grey level to a level that is distinguishable from all other grey levels in the same image. The minimal brightness step required to distinguish one brightness level from the next is called the Just Noticeable Difference (JND) and is described by Weber's Law (figure 5.6). The size of the just noticeable difference step therefore grows with increasing absolute intensity.

ΔI / I = k

Figure 5.6: Weber's Law. ΔI / I denotes the Weber (also known as Fechner) fraction. The law states that the incremental threshold step over a background intensity relates in a linear way to the background intensity.

In order to present a JND-mapped image, every distinct grey level in the image is assigned to a JND step from the calibration table. If there are more distinct grey levels in the source image than the display is capable of presenting, some levels have to be assigned to the same JND step. This causes some JND steps to become indistinguishable from each other. Producing a just noticeable difference (JND) table from the calibration data of a display is a matter of finding a sequence of intensities where every subsequent intensity produces a luminance that is at least one JND step greater than the luminance produced by the previous intensity. In the case of a double modulator display, there might be multiple combinations producing the same or similar luminosity values. In such cases, the JND calibration involves choosing among multiple combinations of front and back panel drive levels.
The criterion that has to be met is that the luminance increase from one JND step to the next must be at least the size of the JND at the given intensity. Choosing a value at a given step changes the possible combinations at the next step. Therefore, there are multiple paths through the two-dimensional array of luminances which obey the rules of JND calibration, but the number of possible JND steps can vary greatly. Bimber et al. [5] present an algorithm that optimizes the path through the two-dimensional array. It works by sampling multiple curves of the form y = x^σ – referred to as the basis function – through the two-dimensional array. The basis function is chosen such that all possible values can be reached by optimizing for a single parameter (σ). "For each theoretically possible JND step (j) with luminance L_j we choose a set (C_j) of gray scale candidates (c ∈ C_j) that leads to reproducible luminance levels (L_c) larger than or equal to L_j, and whose shortest (x, y)-distance (Δ_c) to our basis function is not larger than a predefined maximum (Δ). From each C_j, we select the candidate s_j ∈ C_j that is closest to L_j." [5]. The algorithm used for this thesis is the same, but the constraint of keeping front and back panel values close to the basis function is relaxed, as that constraint is only beneficial if the image should remain recognizable without the second modulator. In Bimber et al.'s work one of the modulators is a printout and the second modulation is produced by a projector tracking the printout and aligning the projected image with it. In that case, the proximity of the front and back values compensates for a slight misalignment of the tracking algorithm. For this prototype, as the LCD panels are tightly stacked on top of each other, the amount of misalignment is minimal and constant. Visual banding effects such as those described in [5] are not apparent.
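A greedy variant of this path selection can be sketched as follows (modelled on, but not identical to, the algorithm of [5]; the vertical distance to the basis curve stands in for the shortest (x, y)-distance, and all parameter defaults are illustrative):

```python
import numpy as np

def build_jnd_path(lum, k=0.02, delta=0.5, sigma=1.0):
    """Greedy sketch of JND path selection through a 2D luminance table.

    lum[f, b] is the measured luminance for front drive level f and back
    drive level b. Starting at the darkest combination, each step picks
    the dimmest candidate at least one Weber fraction k brighter than
    the previous step, restricted to candidates whose normalized (f, b)
    position lies within `delta` of the basis curve y = x**sigma.
    """
    n_f, n_b = lum.shape
    fx = (np.arange(n_f) / (n_f - 1))[:, None]
    by = (np.arange(n_b) / (n_b - 1))[None, :]
    # vertical distance to the basis curve, as a simple stand-in for
    # the shortest (x, y)-distance used in [5]
    near_curve = np.abs(by - fx ** sigma) <= delta

    start = np.unravel_index(np.argmin(lum), lum.shape)
    path, current = [start], lum[start]
    while True:
        target = current * (1.0 + k) if current > 0 else lum.max() * 1e-6
        cand = np.where((lum >= target) & near_curve, lum, np.inf)
        if np.isinf(cand.min()):
            break  # no candidate one JND above the current level
        nxt = np.unravel_index(np.argmin(cand), lum.shape)
        path.append(nxt)
        current = lum[nxt]
    return path
```

The length of the returned path is the number of reproducible JND steps; enlarging `delta` admits more candidates per step, mirroring the curve-hull parameter discussed in the results below.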
5.5 Results

The calibration of the display yielded a maximum contrast of 2400:1. The peak luminance is at 300 cd/m², but quickly declines to about 240 cd/m² when the backlight starts heating up. The contrast then decreases to about 1900:1, which still corresponds to about three times the contrast of a single original panel. Using the Just Noticeable Difference algorithm presented above, the display allows displaying 446 JND steps, about one third more than the original display. Figure 5.7 shows the resulting paths in the two-dimensional array of display drive levels. The luminance is color coded: red is associated with the maximum luminance of 296.82 cd/m² and dark blue with the lowest luminance of 0.148 cd/m². The different paths correspond to the curves determined by the algorithm presented in [5] using different curve hulls of 0.025, 0.05 and 0.5. The curve hull (Δ) value limits the maximum Cartesian distance between the front and back panel drive values. The greater this limit, the more candidates can be chosen from at each JND step, allowing for more potential combinations of front and back panel drive values. Accordingly, the path with the largest delta can recreate 446 of 463 possible JND steps within that range and was accepted, since no side effects from the higher delta could be observed. There are multiple reasons why the contrast of the display is not as high as originally anticipated. First, the manually reapplied polarization filters are neither as well aligned nor of the same quality as the original polarization filters on the LCD panel. If this approach were applied using the same process that was employed to construct the original panel, the achieved contrast would be substantially higher. Secondly, the display was not constructed in a cleanroom environment. There is always some amount of dust destroying the polarization of the light between the layers.
This could also be avoided if the modulator stack were built in a production environment similar to that of the original panel.

Figure 5.7: The front / back panel combinations chosen by the JND calibration algorithm presented in [5]. The value on the x-axis is associated with the display drive level for the front panel, the value on the y-axis corresponds to the value of the back panel. The color-coded value corresponds to the resulting intensity in cd/m². The blue, red and green paths correspond to hull values of 0.025, 0.05 and 0.5.

Another reason stems from the alignment of the panels. Rows of pixels within an LCD panel have some amount of spacing between them. The transmissivity of this area does not change with the values set for the pixels; it completely blocks the light. The corresponding area of the front panel therefore does not receive any light from the back panel. If the pixels were all perfectly aligned and the modulators absolutely flat, this would not be a problem, as the light would pass through the pixel completely and not pass from a pixel area to a blocked area and vice versa. In the prototype the two layers are placed as closely to each other as possible, but the physical thickness of the panels, and especially of the polarization foil, causes a pixel to have a depth of some millimeters. If the observer is positioned perpendicular to the display surface, the light passes through the pixel perfectly. The problem is that the observer's eye is much smaller than the display surface, so every pixel but the one that the observer is directly in front of will be viewed from an angle. The effect is even worse in the horizontal direction. Color in LCD panels is achieved by having separate sub-pixels filtered by either a red, green or blue color filter.
The sub-pixel elements are aligned next to each other and are small enough to be indistinguishable by the human visual system. The three separated image channels are merged in an additive color mixing fashion, allowing colors other than red, green or blue to be presented. Each sub-pixel is a rectangle three times as tall as it is wide, so the three sub-pixels form a square when put next to each other. When placing two LCD panels on top of each other as described before, the alignment of the sub-pixels only matches in the perpendicular case. At all other angles, light passes from a sub-pixel element to a sub-pixel element filtered by a different color filter. The result is that the initially white light is filtered by two different color filters absorbing most of the spectrum. The problem is explained visually in figure 5.8.

Figure 5.8: The influence of the space between the color filters of the back and the front panel. The left image indicates the ideal situation, in which separation is minimal, causing few light rays to be terminated by passing through two color filters of different wavelength. The right image shows the actual situation, in which many light rays are blocked due to the greater spacing between the color filters, causing a sinusoidal grating to be visible, depending on the viewing position, if not compensated by a front diffuser.

6 USER STUDY

Related studies have shown that crosstalk leads to discomfort, headaches and stereo sickness ([25], [44]). It has also been shown that stereo pairs with a higher disparity are more difficult to fuse into a single image, because the accommodation-vergence mismatch becomes greater with greater disparities.
If the disparity is beyond Percival's zone of comfort, and the coupling of vergence and accommodation therefore provides inconsistent signals, the viewer will not fuse the images without discomfort [2]. Crosstalk, on the other hand, is known to have a high impact on stereo perception, and ghosting becomes visible and disturbing even for very low amounts of leakage between the two image channels. What – to our knowledge – is missing is the correlation between all of those factors. How does contrast relate to crosstalk? Is a higher contrast really beneficial, or is the technical effort required better invested into reducing the amount of crosstalk? Does artificially added crosstalk influence crossed and uncrossed stereoscopic viewing in the same way? Most user studies concerning stereoscopic displays involve questionnaires trying to estimate the discomfort a user feels, e.g. while watching a movie over a longer period of time. One goal of this user study was to retrieve objective measurements from the user instead of subjective ones. This is important because subjective measurements involve factors that objective ones do not, such as the motivation of the participant. Furthermore, considerable effort was made to keep the parameter space as compact as possible, so as not to complicate matters further and to leave less room for interpretation. As the contrast levels produced by the high dynamic range stereo display are not as high as initially anticipated, the user study cannot give definite answers on how contrast, crosstalk, disparity and the kind of stereo image (crossed / uncrossed) relate to each other, but it should at least give some indication of how these various factors affect each other. One of the key aspects of stereoscopic viewing is stereopsis. Stereopsis matches features between the two monocular images and works even if all depth cues other than features offset by disparity are missing.
It was assumed that – as detecting features in both images is required – contrast plays a major role in stereopsis, as previous related work [23] has shown that – albeit at very low contrast ratios – stereo acuity depends on contrast. A well-known approach to presenting depth images that contain no depth cues other than stereopsis are Random Dot Stereograms (RDS).

6.1 Constructing Random Dot Stereograms

First presented by Béla Julesz in 1960, Random Dot Stereograms have been employed in many user studies and have also found their way into books titled "The Magic Eye". Variations including color and even animated Random Dot Stereograms are possible. An RDS is usually created in two steps: First, the left image is filled with random black and white dots. Then, a copy of that image is created for the right image. In this right image, the area to be offset in depth is shifted by the desired amount of disparity. Finally, the space between the original area and the shifted area is filled with new random dots in order to avoid having the same random dot pattern twice. The process is explained visually in figure 6.1.

Figure 6.1: Process of constructing a Random Dot Stereogram. First, both images are filled with the same pattern. Then, the region(s) that should have disparity are displaced and the resulting holes are finally filled with new random dot patterns.

A single RDS can have multiple areas placed at different depths, but it is important to note that the depth has to be constant across each surface. This means that only flat surfaces, perpendicular to the viewing axis, can be represented effectively within RDSs. The approach to constructing an RDS presented here is the most basic one; more advanced algorithms exist that accept a depth image as input and deal with texture patterns to construct more or less visually appealing stereograms in real-time on a GPU [49].
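The two-step construction described above can be sketched as follows (a minimal version for a single rectangular region; function and parameter names are illustrative):

```python
import numpy as np

def make_rds(height, width, rect, disparity, rng=None):
    """Construct a basic random-dot stereogram pair.

    `rect` = (top, left, h, w) of the region to be offset in depth;
    positive `disparity` shifts the patch to the left in the right
    image. Returns (left_image, right_image) as 0/1 arrays.
    """
    rng = np.random.default_rng(rng)
    # step 1: fill the left image with random dots, copy it for the right
    left = rng.integers(0, 2, size=(height, width), dtype=np.uint8)
    right = left.copy()
    top, lft, h, w = rect
    # step 2: shift the patch horizontally by `disparity` pixels
    right[top:top + h, lft - disparity:lft - disparity + w] = (
        left[top:top + h, lft:lft + w])
    # fill the uncovered strip with fresh random dots so the same
    # pattern does not appear twice
    right[top:top + h, lft + w - disparity:lft + w] = (
        rng.integers(0, 2, size=(h, disparity), dtype=np.uint8))
    return left, right
```

Viewed stereoscopically, the shifted patch appears in front of (or, with the shift direction reversed, behind) the surrounding random-dot plane, despite neither image containing any monocular depth cue.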
6.2 Test Patterns and Parameter Discussion

Initially, it was planned to confront the user with a task that involved detecting multiple square surfaces and determining whether they are in front of or behind a reference surface. While the output of this test is a value with a high resolution (the number of squares detected correctly), it also has many parameters influencing the result: the size of the individual squares, their position on the screen, as well as possible relations to each other. This kind of test pattern also allows comparing multiple squares to each other, so a user could detect a square in front of the screen merely because it is different from the other squares, rather than by correctly fusing the image. The task would also involve other cognitive subtasks such as counting. It was therefore decided to reduce the recognition task to detecting a single quadratic surface and to generate the required precision by repeating the task multiple times. This allowed narrowing down the parameter space to the settings of interest alone, as the other parameters, such as the size and position of the square, were fixed and remained the same for all tasks that the user was confronted with. After the presentation of each single stimulus image, the participant decided whether the square was in front of or behind the reference surface by pressing the up key if the surface appeared further away and the down key if it appeared closer than the reference surface, which was always at a single pixel disparity. This single pixel disparity for the reference surface is necessary to force the display to switch pixels; without it, the contrast ratios of the quadratic surface and the reference plane would differ.
The stimulus was presented for one second and in between the stimuli, a cues-consistent cross with no disparity was presented as a fixation stimulus, allowing the participant's eyes to converge at the image plane. The fixation stimulus was presented until the user answered the previous task with a key press and pressed the space bar to trigger the next stimulus. This allowed users to take a break if needed, but only one of the participants requested a short break of about 5 minutes. A session for one participant consisted of 750 repetitions of the same task, with the display parameters varying in three dimensions: contrast, crosstalk and disparity. All pixel values for the front and back panel used for presentation were manually selected in order to achieve uniform display brightness across the different contrast, crosstalk and disparity levels. The actual amount of crosstalk and contrast was measured afterwards (using the same methods as described in chapter 5), which is why the graphs have a non-orthogonal area containing the actual sample values. The highest contrast level was measured at 600:1, with the peak luminance reaching an estimated 70 cd/m² when observed through the shutter glasses. The contrast level ranged from 0.1 percent of the maximum contrast (about 2:1) up to the maximal contrast in five discrete steps at 0.1, 1.0, 10.0, 50.0 and 100.0 percent. The 50.0 percent step was introduced to provide finer resolution at the higher contrast levels, which were of particular interest. The actual contrast levels varied due to the influence of the crosstalk parameter. Measurements of the chosen contrast levels revealed contrast values of 5:1, 18:1, 61:1, 130:1 and 540:1 when averaged over all crosstalk levels. The amount of crosstalk is a rather unreliable parameter, as the actual crosstalk caused by the display system is not known and varies with the contrast that the display is required to display.
The crosstalk dimension was sampled at five different locations and included crosstalk levels that are far above what any user would accept when viewing a stereoscopic image. Figure 6.2 depicts how the crosstalk steps (labeled 0.0, 0.1, through to 0.4) correspond to the actual crosstalk measured using the approach described in the calibration chapter (chapter 5). The percentage in the graph indicates the relative amount of luminance that leaks from one eye channel to the other. A percentage of 100 % indicates that the unintended pixel luminance is the same as the intended pixel luminance and therefore no image separation is taking place. A percentage of 0 %, on the other hand, corresponds to perfect image channel separation, where only the intended luminance is visible to the intended eye. The disparity was varied from 5 pixels to 45 pixels in 10 pixel increments, which corresponded to a disparity angle ranging from 0.1 to 0.9 degrees. The participants' heads were not fixated but had an approximate viewing distance of 70 to 90 centimeters, depending on the viewer's preferred pose while sitting in front of the image plane. The disparity was actually varied in two directions, once for crossed and once for uncrossed stimuli. The disparity steps and ranges were kept the same for both kinds of stimuli.

Figure 6.2: The actual crosstalk measured depends on the contrast. The plot describes the ranges of crosstalk that were measured for various contrast settings at a given crosstalk label.

The visual acuity of the random dot stereograms was about 24 CPD and therefore well resolvable by the human visual system.
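The reported disparity angles can be sanity-checked from the pixel disparities and the viewing distance. The pixel pitch of the panel is not stated in the text; a pitch of 0.28 mm is assumed below purely because it reproduces the reported 0.1–0.9 degree range at an 80 cm viewing distance:

```python
import math

def disparity_angle_deg(disparity_px, pixel_pitch_mm, distance_mm):
    """Visual angle subtended by a screen disparity of `disparity_px`
    pixels at viewing distance `distance_mm`. The pixel pitch is an
    assumed value, not taken from the display's specification.
    """
    d_mm = disparity_px * pixel_pitch_mm
    return math.degrees(2.0 * math.atan(d_mm / (2.0 * distance_mm)))
```

With the assumed 0.28 mm pitch, 45 pixels at 800 mm give roughly 0.9 degrees and 5 pixels roughly 0.1 degrees, matching the stated range.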
6.3 User Study Environment and Participants

The user study itself was set up at the Virtual Reality Center of the Johannes Kepler University Linz, as the room has no windows and therefore guaranteed consistent lighting conditions independent of the time of day. The participant and the screen were enclosed in a cabin covered by light-absorbing blankets, further reducing the amount of ambient light that could possibly interfere with the user's perception. The lighting in the room was reduced to a bare minimum, but an emergency exit light – which was not directly visible from the test setup – had to remain on. The user study was performed by 44 participants aged between 24 and 59. 19 of them were female and 25 were male. All of the participants had normal or corrected-to-normal vision and, if required, wore either contact lenses or glasses along with the shutter glasses. On request, the varying parameters were explained to participants only after the experiment was conducted, in order not to influence the result.

6.4 Results

Due to limiting the number of varying variables to four (contrast, crosstalk, disparity and crossed/uncrossed), the output of the user study is a five-dimensional data structure with one dependent and four independent variables. The number of dimensions inhibits a visualization of all dimensions within one image. Due to possible errors of interpretation and the lack of navigation possibilities when presenting three-dimensional perspective images on a static display, two-dimensional color maps are used here. The color coded intensity value represents the error rate. While observing the data, all graphs were arranged in a grid structure, such that two dimensions were represented along the sides of the grid and two dimensions were mapped within each graph.
The observer was then able to step along an outer dimension by switching to another graph, either on the horizontal or on the vertical axis, depending on which parameter should be increased. Additionally, animations of the graphs were used to employ temporal vision to find trends within the data. Every combination of contrast, crosstalk, disparity and kind (crossed/uncrossed) of stimulus was observed 3 times by every user. Therefore, at the lowest level of detail, where every parameter is set to a specific value, the percentual step size is 1/132, giving a resolution of 0.75 percent.

Figure 6.3: Plots containing the results with increasing crosstalk (0 to 0.4) along the images. The two rows represent results separated into stimuli with crossed (top row) and uncrossed (bottom row) disparity.
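The resolution figure follows directly from the study design, 44 participants each seeing every parameter combination 3 times (the thesis rounds the resulting step to 0.75 percent):

```python
# Smallest change in the error rate that a single answer can cause,
# given 44 participants x 3 repetitions per parameter combination.
participants = 44
repetitions = 3
observations_per_cell = participants * repetitions  # 132 answers per cell
step_percent = 100.0 / observations_per_cell        # one answer's weight in percent
print(observations_per_cell, round(step_percent, 2))
```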
The axes of the images denote the measured contrast as well as the disparity used. The color-coded value designates the rate at which the participants answered incorrectly, normalized from 0 to 50 % (red being equal to 50 % incorrect answers). Several observations can be derived by visually examining the gathered data shown in figure 6.3. Firstly, as expected and well known in the literature (such as in [2]), the amount of errors increases with increasing disparity. This is due to the fact that the vergence-accommodation mismatch becomes greater with increased disparity. Secondly, it is clear that increasing crosstalk also increases the error rate due to disturbing artifacts. It is known from the literature that stereopsis works even with a lot of crosstalk disturbing image perception. Yet, it is astonishing how much crosstalk is actually tolerated before binocular fusion is given up and diplopia is accepted. Of course, the amount of added crosstalk is well beyond what any display engineer would accept. The amount of visible ghosting at the higher sampling steps in the crosstalk dimension would also not be tolerable when watching a movie. Rather, the argument for including this dimension in the user study is to find out what effect such disturbance has on stereopsis and possibly what causes stereopsis to be so stable. Another very interesting conclusion from the gathered data is that the error rate does not seem to improve with increasing contrast once it has reached a contrast level of about 110 : 1. This value was determined by visual examination of the graphs displayed. The boxplot in Figure 6.4, with data aggregated over all test settings, reveals that this level might be even closer to 60 : 1 for low-crosstalk settings. It is also interesting that this holds for different levels of crosstalk as well.
While the exact neurophysiological reasons for this are unknown to the author, a reasonable explanation would be that the border-detecting cells of the human visual system that feed binocular cells saturate at this contrast level. Applying the t-test to the values aggregated into groups by contrast reveals a significant increase, at a confidence level of 95 %, only between the lowest (5 : 1) and the second lowest (18 : 1) contrast level; here the null hypothesis can be rejected because the p value is well below 0.05. For all contrast levels above the second lowest, the t-test fails to reject the null hypothesis, with p values greater than 0.05. Up to which value stereo acuity actually improves cannot be determined by this user study, due to the limited accuracy and the variance of the measurements. Possible improvements on how this could be measured with more confidence are described in chapter 7. As described before, an improvement in stereo acuity with rising contrast can be derived only from the lowest contrast level to the second lowest; this improvement does not hold statistically for the higher levels. These results allow two possible interpretations: either stereo acuity does not improve beyond a contrast level of about 110 : 1, or the stimuli used in the user study were too easy to resolve and the false answers at higher contrast levels actually stem from user input errors. It should be noted that the user study was performed in the hope of finding decreased stereo acuity. While visual hints might reveal indications of decreasing stereo acuity, the statistical tests do not reveal any signs of such a trend. When separating the results into crossed and uncrossed stimuli, another interesting effect becomes apparent: it seems as if crosstalk has a higher influence on uncrossed stimuli than on crossed stimuli.
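The group comparison described above can be reproduced with a standard two-sample (Welch) t statistic. The per-participant scores below are invented placeholders, not the thesis data; the sketch only shows the mechanics of comparing two contrast groups and checking the statistic against a critical value.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Two-sample t statistic (Welch's form, unequal variances),
    as used to compare scores grouped by contrast level."""
    na, nb = len(sample_a), len(sample_b)
    se = math.sqrt(variance(sample_a) / na + variance(sample_b) / nb)
    return (mean(sample_a) - mean(sample_b)) / se

# Hypothetical per-participant %-correct scores at the two lowest
# contrast levels (5:1 and 18:1).
low_contrast  = [52, 55, 49, 58, 61, 50, 54, 57]
next_contrast = [66, 70, 63, 72, 68, 65, 71, 69]

t = welch_t(next_contrast, low_contrast)
# |t| far above the roughly 2.1 critical value for samples of this
# size at alpha = 0.05 (two-tailed) -> the null hypothesis of equal
# means would be rejected for this made-up data.
print(round(t, 2))
```

For the higher contrast groups, the same computation would yield a |t| below the critical value, mirroring the failure to reject the null hypothesis reported above.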
Without artificial crosstalk added, the graphs depicting the results reveal that fewer errors were made in the uncrossed case than in the crossed scenario. As explained previously, this makes sense, as perspective causes a greater perceived depth for the same amount of disparity. When adding artificial crosstalk to both scenarios, the graphs reveal that this has a higher impact on the uncrossed scenario, whereas the crossed results remain nearly the same.

Figure 6.4: Boxplot showing the increasing amount of correct answers when contrast is increased (percent correct over contrast levels 5:1, 18:1, 61:1, 130:1 and 540:1). The values were evaluated over all crosstalk, disparity and contrast settings. While the plots in Figure 6.3 show that there isn't any real improvement beyond a contrast level of 110 : 1, this contrast ratio may be even lower considering the aggregated results shown in this plot.

The only difference between the crossed and uncrossed scenarios is the sign of the disparity, and therefore any influence of contrast or crosstalk is the same for both tests. A possible explanation is that the difficulty of the test pattern becomes too low at higher contrast ratios, such that the crossed patterns do not suffer any further. This, though, can be argued for all tests performed in the user study.

7 Summary and Future Work

In this thesis, work on high dynamic range stereo perception was presented. The goal of this thesis was to find an indication of whether higher contrast in stereoscopic image pairs could have a negative impact on stereo acuity, rather than to build the perfect display. It should be considered an attempt to build a display in order to answer one essential question: does a contrast ratio beyond 110 : 1 really help when viewing stereoscopic content?
Unfortunately, a definitive answer cannot be derived from this work, but there are indicators that display development may very well have reached the limit of what stereopsis in the human visual system can benefit from within a single image. The user study in this thesis has shown that stereo acuity does not improve, but not that stereo acuity actually decreases with further increased contrast. A third, simple prototype was developed using transparencies with higher contrast values. This prototype reused the backlight from the second prototype and provided a higher peak luminance. Even with the higher contrast values it was still possible to fuse the stereo pair, so this prototype also did not reveal the intended effect. Nonetheless, light leaking around the transparency seemed to make fusion more difficult, and covering the light leak seemed to make the task easier. Even if this is an indication, the simultaneous contrast required is still orders of magnitude beyond the current state of the art, and it is debatable whether any real impact would occur in real-world scenarios. In such cases, the effect of decreasing stereo acuity may even be intended, as the ultimate goal of display designers is to reproduce scenes from reality as accurately as possible. Does that mean display research is done? It is safe to say that this is definitely not the case, since this work only covers stereopsis, a small part of depth perception. Understanding the full effects of a higher dynamic range on depth perception, including the influence of gradients and texture, is well beyond this work but definitely of interest, as it is still not fully understood how the individual parts of the human visual system work together to perceive depth. The maximum contrast that stereopsis can benefit from could also be used to further improve disparity mapping algorithms.
Disparity mapping tries either to extract depth information from a single image and create a stereo pair exhibiting disparity, or to remap a stereoscopic pair into the visual comfort zone of the observer. While even some of the latest disparity mapping algorithms (such as the one presented in [33]) do not directly incorporate contrast information, it could prove beneficial in future disparity mapping algorithms: more detailed information on how the human visual system responds to stereoscopic content at a given contrast could influence the way maximum disparity values are chosen in such algorithms. Of course, there is still a lot more to accomplish in understanding the contrast limits of stereopsis before such work can be incorporated, and this thesis can only represent a small step towards such an understanding.

7.1 Future Work

For scientific studies there also exists a better approach to constructing a display capable of producing an image that is both of high dynamic range quality and stereoscopic: the classic haploscope. This type of stereo setup has one fundamental advantage: it does not suffer from any kind of crosstalk, as the eye channels are kept completely separate from each other using two mirror pathways towards two separate displays. These two non-stereoscopic displays can then be built using familiar HDR display setups, such as the one presented in [54] using dimmable LEDs. At the same time, the fundamental construction of displays is changing as well: organic LEDs are very promising in delivering even higher static contrast ratios. At least for vision research on the limits of contrast, the author suspects that building a high dynamic range, stereoscopic display based on two standard OLED displays and a haploscope is feasible.
Since the contrast level beyond which increased contrast no longer improves stereo acuity is surprisingly low, a similar user study could be performed by building a haploscope from two high-quality, calibrated displays. Many issues described in this thesis could be avoided, and more accurate calibration would yield more confidence. Contrast and crosstalk in particular could be sampled more accurately, because a haploscope setup has, by design, no crosstalk even at high contrast ratios. On the other hand, since haploscope displays are not a technology designed for the consumer market, all other, possibly unknown, effects involved with time and space multiplexing would be ignored, rendering such research less representative. Crosstalk is a relevant factor especially with autostereoscopic displays, where multiple viewing zones are presented to the user solely by the display itself, by intelligently directing luminance. Since most autostereoscopic displays do not adjust the light direction depending on the user's viewing position, the viewer is not always in the perfect viewing position and will therefore suffer from crosstalk and ghosting from neighboring viewing zones.

7.2 A Better Time-Multiplexed Double Modulation Display

What would the perfect double modulation display look like if all components were under the full control of the author? The main cause of the lower than expected contrast results is the two redundant color filters, which produce the disturbing light variance effect described in section 4.5. The perfect back panel would therefore not feature a color filter. The back panel would then become achromatic but of higher resolution, which in turn could be used to control light on a sub-pixel basis. The second major improvement would be to glue the layers together in the same way the polarization foil is applied to the spatial light modulator in any panel.
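The appeal of double modulation lies in how the two modulators combine: with ideal optical coupling, the transmittances multiply, so the combined contrast ratio is (ideally) the product of the two panels' native ratios. The panel figures below are assumed round numbers for illustration; as this thesis documents, real stacks fall far short of the ideal because of scattering, parallax and the color-filter effects described above.

```python
def double_modulation_contrast(front_contrast, back_contrast):
    """Idealized combined contrast of two stacked light modulators.

    Transmittances multiply, so the best achievable contrast ratio is
    the product of the panels' native contrast ratios. Real stacks
    fall short of this due to inter-panel scattering and parallax.
    """
    return front_contrast * back_contrast

# Assumed figures: two 1000:1 TN panels would ideally give 1,000,000:1.
print(double_modulation_contrast(1000, 1000))
```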
All of this would, of course, happen in a clean-room environment, just as is the case for any display production. The LED backlight could also be improved by using more LEDs of a lower power level. The brightness level would be kept the same or even improved, but cooling would become easier and the brightness would be spread among more LED sources. This in turn would allow for a more transparent diffuser, again improving peak luminance. The TN panels used in the prototype employ Frame Rate Control (FRC) to increase tonal resolution. Since the tonal resolution is already increased by the double modulation, FRC could be removed as well, causing less interference with calibration and fewer artifacts. Would the display still be based on two spatial light modulators using polarization? Probably yes, as the resolution aspect is of major interest. Using dimmable LEDs, even hundreds of them, still causes many pixels to be illuminated by the same backlight LED. This reduces the possible contrast resolution down to the resolution of the LED grid. Synchronization would again become more of an issue, as a higher-resolution LED grid would probably require some kind of addressing scheme, involving pulse width modulation and possibly other sources of temporal delay. Finally, it should be noted that display technology has improved over the last three years, and especially OLED technology seems to be a promising candidate for the future of high dynamic range stereoscopic displays. It is quite possible that such technology will soon make the double modulation techniques used for high dynamic range displays today appear similar to the CRT technology of more than two decades ago.

References

[1] Barton L. Anderson. Stereovision: beyond disparity computations. Trends in Cognitive Sciences, 2(6):214–222, 1998.

[2] Martin S. Banks, Kurt Akeley, David M. Hoffman, and Ahna R. Girshick. Consequences of incorrect focus cues in stereo displays.
Information Displays, 7:10–14, 2008.

[3] Peter G. J. Barten. Physical model for the contrast sensitivity of the human eye. Proceedings of SPIE, 1666(1):57–72, 1992.

[4] Peter G. J. Barten. Spatiotemporal model for the contrast sensitivity of the human eye and its temporal aspects. Proceedings of SPIE, 1913(1):2–14, 1993.

[5] Oliver Bimber and Daisuke Iwai. Superimposing dynamic range. In ACM SIGGRAPH Asia 2008 Papers, SIGGRAPH Asia '08, pages 150:1–150:8, New York, NY, USA, 2008. ACM.

[6] Buckthought. A matched comparison of binocular rivalry and depth perception with fMRI. Journal of Vision, 11:1–15, 2011.

[7] COGSCI. Online Gabor Patch Generator, 2012 (last accessed May 18, 2012).

[8] Scott Daly. The visible differences predictor: an algorithm for the assessment of image fidelity. Human Vision, Visual Processing, and Digital Display, pages 179–206, 1993.

[9] Scott Daly and Xiaofan Feng. Bit-depth extension: overcoming LCD-driver limitations by using models of the equivalent input noise of the visual system. Journal of the Society for Information Display, 13(1):51–66, 2005.

[10] G. C. DeAngelis, I. Ohzawa, and R. D. Freeman. Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. I. General characteristics and postnatal development. Journal of Neurophysiology, 69(4):1091–1117, 1993.

[11] Gregory C. DeAngelis. Seeing in three dimensions: the neurophysiology of stereopsis. Trends in Cognitive Sciences, 4(3), March 2000.

[12] Paul E. Debevec and Jitendra Malik. Recovering high dynamic range radiance maps from photographs. In ACM SIGGRAPH 2008 Classes, SIGGRAPH '08, pages 31:1–31:10, New York, NY, USA, 2008. ACM.

[13] Neil A. Dodgson. Variation and extrema of human interpupillary distance. Proceedings of SPIE Stereoscopic Displays and Virtual Reality Systems XI, 5291:36–46, 2004.

[14] Dolby. Dolby shows latest HDR display prototype developed in collaboration with SIM2.
[15] F. Drago, K. Myszkowski, T. Annen, and N. Chiba. Adaptive logarithmic mapping for displaying high contrast scenes. Computer Graphics Forum, 22:419–426, 2003.

[16] Frédo Durand and Julie Dorsey. Fast bilateral filtering for the display of high-dynamic-range images. In Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '02, pages 257–266, New York, NY, USA, 2002. ACM.

[17] Raanan Fattal, Dani Lischinski, and Michael Werman. Gradient domain high dynamic range compression. In Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '02, pages 249–256, New York, NY, USA, 2002. ACM.

[18] James A. Ferwerda, Sumanta N. Pattanaik, Peter Shirley, and Donald P. Greenberg. A model of visual adaptation for realistic image synthesis. In SIGGRAPH '96: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pages 249–258, New York, NY, USA, 1996. ACM.

[19] D. J. Field and D. J. Tolhurst. The structure and symmetry of simple-cell receptive-field profiles in the cat's visual cortex. Royal Society of London Proceedings Series B, 228:379–400, September 1986.

[20] David Finlay, Peter C. Dodwell, and Terry Caelli. The waggon-wheel effect. Perception, 13(3):237–248, 1984.

[21] Graeme Gill. Argyll color management system. http://www.argyllcms.com/.

[22] Gabriele Guarnieri, Luigi Albani, and Giovanni Ramponi. Image-splitting techniques for a dual-layer high dynamic range LCD display. Journal of Electronic Imaging, 17(4):043009, 2008.

[23] D. Lynn Halpern and Randolph R. Blake. How contrast affects stereoacuity. Perception, 17:483–495, 1988.

[24] Selig Hecht and Simon Schlaer. Intermittent stimulation by light V. The relation between intensity and critical frequency for different parts of the spectrum. The Journal of General Physiology, 19(6):965–977, 1936.

[25] David M. Hoffman, Ahna R.
Girshick, Kurt Akeley, and Martin S. Banks. Vergence-accommodation conflicts hinder visual performance and cause visual fatigue. Journal of Vision, pages 1–30, 2008.

[26] Industrial Light and Magic. OpenEXR is a high dynamic-range (HDR) image file format developed by Industrial Light & Magic for use in computer imaging applications.

[27] Garrett M. Johnson and Mark D. Fairchild. Rendering HDR images. In IS&T/SID 11th Color Imaging Conference, pages 36–41, 2003.

[28] J. P. Jones and L. A. Palmer. The two-dimensional spatial structure of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58(6):1187–1211, December 1987.

[29] Béla Julesz. Foundations of Cyclopean Perception. The University of Chicago Press, 1971.

[30] Florian Kainz and Rod Bogart. Technical introduction to OpenEXR. Technical report, Industrial Light & Magic, 2009.

[31] Janusz Konrad, Bertrand Lacotte, and Eric Dubois. Cancellation of image crosstalk in time-sequential displays of stereoscopic video. In IEEE Transactions on Image Processing, pages 897–908, 2000.

[32] Jiangtao Kuang, Hiroshi Yamaguchi, Changmeng Liu, Garrett M. Johnson, and Mark D. Fairchild. Evaluating HDR rendering algorithms. ACM Transactions on Applied Perception, 4, July 2007.

[33] Manuel Lang, Alexander Hornung, Oliver Wang, Steven Poulakos, Aljoscha Smolic, and Markus Gross. Nonlinear disparity mapping for stereoscopic 3D. ACM Transactions on Graphics, 29(3):10, 2010.

[34] G. W. Larson, H. Rushmeier, and C. Piatko. A visibility matching tone reproduction operator for high dynamic range scenes. Technical Report LBNL-39882, Lawrence Berkeley National Laboratory, Berkeley, CA, January 1997.

[35] Patrick Ledda, Alan Chalmers, Tom Troscianko, and Helge Seetzen. Evaluation of tone mapping operators using a high dynamic range display. In ACM SIGGRAPH 2005 Papers, SIGGRAPH '05, pages 640–648, New York, NY, USA, 2005. ACM.

[36] Patrick Ledda, Luis Paulo Santos, and Alan Chalmers.
A local model of eye adaptation for high dynamic range images. In Proceedings of the 3rd International Conference on Computer Graphics, Virtual Reality, Visualisation and Interaction in Africa, AFRIGRAPH '04, pages 151–160, New York, NY, USA, 2004. ACM.

[37] Patrick Ledda, Greg Ward, and Alan Chalmers. A wide field, high dynamic range, stereographic viewer. In GRAPHITE '03: Proceedings of the 1st International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, pages 237–244, New York, NY, USA, 2003. ACM.

[38] James S. Lipscomb and Wayne L. Wooten. Reducing crosstalk between stereoscopic views. Proceedings of SPIE, 2177:92, 1994.

[39] Rafał Mantiuk, Karol Myszkowski, and Hans-Peter Seidel. Visible difference predicator for high dynamic range images. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pages 2763–2769, 2004.

[40] L. M. J. Meesters, W. A. IJsselsteijn, and P. J. H. Seuntiens. A survey of perceptual evaluations and requirements of three-dimensional TV. IEEE Transactions on Circuits and Systems for Video Technology, 14(3):381–391, March 2004.

[41] Jens Månsson. Stereovision: a model of human stereopsis. 1997.

[42] M. Ortiz-Gutiérrez, A. Olivares-Pérez, and V. Sánchez-Villicaña. Cellophane film as half wave retarder of wide spectrum. Optical Materials, 17(3):395–400, 2001.

[43] Peter Ludwig Panum. Physiologische Untersuchungen über das Sehen mit zwei Augen. Schwersche Buchhandlung, Kiel, 1858.

[44] Siegmund Pastoor. Human factors of 3D imaging: results of recent research at Heinrich-Hertz-Institut Berlin. In Proceedings ASIA Display Conference, 1995.

[45] Sumanta N. Pattanaik, James A. Ferwerda, Mark D. Fairchild, and Donald P. Greenberg. A multiscale model of adaptation and spatial vision for realistic image display.
In SIGGRAPH '98: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, pages 287–298, New York, NY, USA, 1998. ACM.

[46] Sumanta N. Pattanaik, James A. Ferwerda, Donald P. Greenberg, and Mark D. Fairchild. Multiscale model of adaptation, spatial vision and color appearance. ITE Technical Report, 23(23):2, 1999.

[47] Sumanta N. Pattanaik, Jack Tumblin, Hector Yee, and Donald P. Greenberg. Time-dependent visual adaptation for fast realistic image display. In SIGGRAPH '00: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pages 47–54, New York, NY, USA, 2000. ACM Press/Addison-Wesley Publishing Co.

[48] Yury Petrov. Higher-contrast is preferred to equal-contrast in stereo-matching. Vision Research, 44:775–784, 2004.

[49] Fabio Policarpo. Real-Time Stereograms, chapter 41. Addison-Wesley, 2007.

[50] D. Purves, J. A. Paydarfar, and T. J. Andrews. The wagon wheel illusion in movies and reality. Proceedings of the National Academy of Sciences, 93:3693–3697, April 1996.

[51] Erik Reinhard, Michael Stark, Peter Shirley, and James Ferwerda. Photographic tone reproduction for digital images. In Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '02, pages 267–276, New York, NY, USA, 2002. ACM.

[52] Mark A. Robertson, Sean Borman, and Robert L. Stevenson. Dynamic range improvement through multiple exposures. In Proceedings of the International Conference on Image Processing (ICIP '99), pages 159–163. IEEE, 1999.

[53] J. F. Schouten. Subjective stroboscopy and a model of visual movement detectors. Cambridge, MA: MIT Press, 1967.

[54] Helge Seetzen, Wolfgang Heidrich, Wolfgang Stuerzlinger, Greg Ward, Lorne Whitehead, Matthew Trentacoste, Abhijeet Ghosh, and Andrejs Vorozcovs. High dynamic range display systems. ACM Transactions on Graphics, 23(3):760–768, 2004.

[55] Mel Siegel.
Perceptions of crosstalk and the possibility of a zoneless autostereoscopic display. In A. J. Woods, M. T. Bolas, J. O. Merritt, and S. A. Benton, editors, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, volume 4297, pages 34–41, June 2001.

[56] F. Smit, R. van Liere, and B. Fröhlich. Non-uniform crosstalk reduction for dynamic scenes. In IEEE Virtual Reality 2007, 2007.

[57] F. Smit, R. van Liere, and B. Fröhlich. Three extensions to subtractive crosstalk reduction. In Eurographics EGVE, 2007.

[58] Jack Tumblin and Holly Rushmeier. Tone reproduction for realistic images. IEEE Computer Graphics and Applications, pages 42–48, November 1993.

[59] Greg Ward and Maryann Simmons. JPEG-HDR: a backwards-compatible, high dynamic range extension to JPEG. In SIGGRAPH '06: ACM SIGGRAPH 2006 Courses, page 3, New York, NY, USA, 2006. ACM.

Curriculum Vitae

Personal Information
Name: Philipp Rylands Aumayr
Gender: Male
Date of birth: 12 March 1985
Place of birth: Linz
Address: Alte Hauptstraße 25, A-4072 Alkoven (Austria)
Nationality: Austria, United States of America

Education
1991 - 1995 Primary school: VS Alkoven
1995 - 1999 Lower secondary school: Stiftsgymnasium Wilhering
1999 - 2003 Upper secondary school: Stiftsgymnasium Wilhering
2003 - 2007 Bachelor's programme: Johannes Kepler Universität Linz - Computer Science
since October 2007 Master's programme: Johannes Kepler Universität Linz - Computer Science

Work Experience
March 2004 - August 2005 Software development at SWA, Traun
September 2005 - December 2007 Project staff member, Institute for Pervasive Computing, Johannes Kepler Universität Linz
February 2008 - May 2008 Project staff member (Spectacles), Research Studios Austria in collaboration with the Institute for
Pervasive Computing
January 2009 - March 2009 Project staff member, Research Studios Austria
July 2009 - September 2009 Project staff member, Research Studios Austria in cooperation with software architects og
since October 2009 Employee, software architects gmbh

Scientific Work
July 2007 Bachelor's thesis: Implementierung der Visualisierung eines Ambient Awareness Displays (Prof. Ferscha)
June 2009 Christoph Anthes et al., Space Trash - Development of a Networked Immersive Virtual Reality Installation, Technical Report at the Institute of Graphics and Parallel Processing, Johannes Kepler University Linz, June 2009

16 July 2012

Statutory Declaration

I declare under oath that I have written this master's thesis independently and without outside help, that I have used no sources or aids other than those indicated, and that all passages taken literally or in substance from other sources are marked as such. This master's thesis is identical to the electronically submitted text document.

Philipp Aumayr, Linz, 16 July 2012