Stereopsis in the Context of High Dynamic Range Stereo Displays

JOHANNES KEPLER
UNIVERSITÄT LINZ
JKU
Technisch-Naturwissenschaftliche
Fakultät
Stereopsis in the Context of High Dynamic
Range Stereo Displays
MASTERARBEIT
zur Erlangung des akademischen Grades
Diplomingenieur
im Masterstudium
Informatik
Eingereicht von:
Philipp R. Aumayr
Angefertigt am:
Institut für Computergrafik
Beurteilung:
Univ. Prof. Dr.-Ing. habil. Oliver Bimber
Linz, Juli, 2012
To Granddad, the engineer.
ABSTRACT

There are two major trends in the display industry: increasing contrast and (auto-)stereoscopic content presentation. While it is obvious that both trends have an impact on perception, the relation between high dynamic range contrast and stereoscopic viewing is not well established in the literature. The goal of this thesis was to construct a high dynamic range display capable of presenting stereoscopic content, in order to perform a user study testing the response to such a viewing experience, especially when considering multiplexing side effects such as crosstalk. The construction process and the many setbacks encountered while building such a display, such as polarization and thermal design issues, are described in this thesis.
Even though the display prototype did not exhibit the highly anticipated contrast range, the user study did provide valuable feedback on the extent to which stereopsis benefits from a higher dynamic range. The user study also included an attempt to uncover the role of crosstalk, and of its perceived counterpart ghosting, in the process of stereopsis.
ZUSAMMENFASSUNG

The display industry is driven by two major trends: increasing contrast and the ability to present (auto-)stereoscopic content. Although it is obvious that both developments have a large influence on the perception of visual content, the connection between contrast and stereo vision is only sparsely documented. The goal of this thesis was to develop a high-contrast display that also offers the possibility of presenting stereoscopic content, in order to conduct a user study intended to uncover these relationships. Also of interest was the influence of crosstalk between the image channels, and of its optical shadow effect, on perception. The development process as well as the many mistakes made in the course of building the prototype are documented herein.
Although the prototype fell far short of the hoped-for contrast values, the results of the user study are nevertheless meaningful. The tests used were also aimed at uncovering the role of crosstalk between the image channels, and of its optical counterpart, in the context of stereopsis.
ACKNOWLEDGMENTS
Prof. Bimber endured my slow progress on this thesis for almost two and a half years. It would be a stretch to say that our correspondence was always fun and motivating, but instead of giving up on me he encouraged me to go on when I lost confidence that I would ever finish this thesis. So thank you, Prof. Bimber, for all the valuable time, feedback and support!
A great, big, heartfelt "thank you" to my parents and family for the continued support, especially for dragging me all the way through gymnasium and accepting my nerd behaviour of preferring CRT rays to real sunshine. I especially want to thank dad for encouraging me to go my own way instead of following the obvious path to medicine.
I also need to thank my friends at Rarebyte. I definitely would not have studied computer science if George hadn't had the patience to teach me basic programming skills (and OpenGL for the fancy stuff).
Thank you to Rainer, Karin and Alex from timecockpit.com for their continued support and motivation. It is a real pleasure to work with you and I am looking forward to the adventures down the road.
Thanks to Simon for his continuous stream of wisdom and unfiltered, sometimes radical criticism.
A great, big thank you also goes out to all of the guinea pigs that participated in the user study for enduring the heat in the "HDR cabin" and providing valuable feedback! Thank you to the ZID at the Johannes Kepler University for providing hardware and access to the VRC.
Finally, thank you to Elisabeth for accepting all of the geek talk and being there.
CONTENTS

1 Introduction . . . 1
  1.1 Defining Contrast . . . 1
  1.2 Tonal Resolution . . . 3
  1.3 Refresh Rates . . . 3
  1.4 Motivation and Preview of Contribution and Results . . . 4
  1.5 Outline of the Thesis . . . 4
2 Related Work . . . 6
  2.1 High Dynamic Range Displays . . . 6
  2.2 High Dynamic Range Perception . . . 8
  2.3 HDR Capturing and Processing . . . 8
    2.3.1 Capturing . . . 8
    2.3.2 Tonemapping . . . 9
    2.3.3 Storage . . . 9
  2.4 Stereoscopic Displays and Ghosting . . . 10
    2.4.1 Crosstalk Compensation . . . 11
  2.5 Crosstalk Perception . . . 12
  2.6 Stereopsis and the Correspondence Problem . . . 13
3 Depth Perception . . . 15
  3.1 Perspective . . . 15
  3.2 Occlusion . . . 15
  3.3 Vergence . . . 16
  3.4 Accommodation . . . 17
  3.5 Stereopsis . . . 17
    3.5.1 Horopter, Vieth-Müller Circle and Panum's Area . . . 17
    3.5.2 Neurophysical Basis . . . 18
  3.6 Perception of Crosstalk . . . 20
4 Construction of a HDR Stereo Display . . . 22
  4.1 LCD-Projector Approach . . . 22
  4.2 Dual Layer LCD Approach . . . 23
  4.3 Polarization . . . 23
    4.3.1 Rotated around Y-Axis . . . 25
    4.3.2 Rotated around Z-Axis . . . 26
    4.3.3 Using a Wave Retarder . . . 26
    4.3.4 Inverted Mode . . . 27
    4.3.5 Center Analyzer removed, Back Panel Polarizer rotated . . . 28
  4.4 Backlight Construction . . . 28
    4.4.1 Light Sources . . . 29
    4.4.2 Layout . . . 30
  4.5 Visual Artefacts . . . 31
5 Calibration . . . 33
  5.1 Using HDR Image Recovery . . . 34
  5.2 Using an Intensity Measurement Device . . . 36
  5.3 Crosstalk Calibration . . . 36
  5.4 Just-Noticeable-Difference Mapping . . . 38
  5.5 Results . . . 39
6 User Study . . . 42
  6.1 Constructing Random Dot Stereograms . . . 42
  6.2 Test Patterns and Parameter Discussion . . . 43
  6.3 User Study Environment and Participants . . . 45
  6.4 Results . . . 46
7 Summary and Future Work . . . 49
  7.1 Future Work . . . 50
  7.2 A Better Time-Multiplexed Double Modulation Display . . . 50
References . . . 52
LIST OF FIGURES

1.1 Min-Max Contrast. Imax refers to the maximum luminance, whereas Imin denotes the minimum luminance possible. . . . 1
1.2 The definition of Weber Contrast. I refers to the luminance of a feature, Ib to the luminance of the background. . . . 2
1.3 The definition of Michelson Contrast. Imax and Imin refer to the maximum and minimum luminance, respectively. . . . 2
1.4 Root-Mean-Square Contrast . . . 2
2.1 Projector-LCD approach as presented by Seetzen et al. [54] . . . 7
2.2 A stereoscopic pair of images of a scene with a sphere. From left to right: the image destined for the left eye, the image destined for the right eye, and the fused image with shadows of the original image visible due to ghosting. . . . 10
2.3 The intensity reaching the eye is the sum of the intended signal and the unintended signal. . . . 11
2.4 The amount of unintended signal is usually dependent on the pixel position (x, y) and the viewing angle (α, β) of the observer, as well as the image content of the opponent image (I) at the given position. . . . 11
2.5 The setup of the display presented in Hoffman et al. [25]. Two semi-transparent mirrors and one front-side mirror allow the eye to focus and converge at multiple image planes at different depths. . . . 13
3.1 Occlusion can be ambiguous in the monocular case, but is usually very well resolved by binocular vision. The first image lets us conclude that the red rectangle is behind the blue rectangle, as its side aligns perfectly, which would be an uncommon situation when viewing. The second and third image show the images as perceived by the left and right eye, which reveal that the red rectangle is actually in front of the blue rectangle and that the monocular image, by accident, aligns borders with the blue rectangle. . . . 16
3.2 Vergence controls the orientation of the eyes. At the same time it acts as a depth cue by providing the angle of orientation to the vision system. . . . 17
3.3 The Vieth-Müller Circle, also known as the Theoretical Horopter, and the Empirical Horopter. Points on the horopter represent points that result in single, fused images. Points too far from the (empirical) horopter are seen as double images (diplopia). . . . 18
3.4 The Gabor function . . . 19
3.5 A Gabor patch oriented at 45° with a Gaussian envelope (50 pixels standard deviation) and a frequency of 0.05 cycles per pixel. The original image had a width and height of 500 pixels. The patch was generated using the online Gabor patch generator found at [7] . . . 19
3.6 A stereoscopic contrast curve. The top images are the intended signals for the left and right eye, the bottom images the actually perceived images. . . . 20
3.7 First derivative of the contrast curve presented in Figure 3.6. Even though the amount of crosstalk is the same, the absolute brightness levels differ for the left eye and the right eye. This difference causes the bright contrast edge to be easier to detect than the dark contrast edge. . . . 21
4.1 The final (second) prototype built from two stereo LCD panels. . . . 22
4.2 Schematic of the dual layer LCD approach. . . . 24
4.3 Malus' Law. I0 corresponds to the initial intensity and θ to the angle between the polarization directions of the polarizers. . . . 24
4.4 If the polarization of the back panel analyzer and front panel polarizer have to match and the polarization is aligned or perpendicular to the y axis of the display, one of the panels can be rotated. This causes the polarizer and analyzer to swap roles and invert the flow of light. . . . 25
4.5 In practice, polarization foil is not aligned with the display axis but oriented at 45°. This causes the polarization to remain rotation invariant. . . . 25
4.6 Rotating one of the panels by 90° would reduce the usable HDR area to the square of the height of the display. . . . 26
4.7 Each individual pixel in a screen consists of 3 sub pixels for red, green and blue, which are aligned as stripes. Rotating one of the panels would cause the overlapping sub pixels to mismatch and therefore reduce brightness. . . . 27
4.8 Final working approach: removing the back panel analyzer and replacing the back panel polarizer with polarization foil rotated by 90°. . . . 29
4.9 The final backlight consisting of 24 LEDs aligned in a six by four grid. . . . 29
4.10 Closeup of the LED light sources. . . . 30
4.11 Visual result of the backlight simulation. The white hotspots represent the LEDs, which are aligned in a 6 by 4 grid in this iteration. The simulation takes the layout, the type and the light intensity distribution of the individual LEDs into account. . . . 31
5.1 General transfer function mapping a pixel drive value to a presented luminance, as well as the inverted function, mapping from a luminance to a pixel value. . . . 33
5.2 In order to find the lens distortion parameters as well as a homography transform, a checkerboard pattern is presented, captured using a standard, single JPEG image, and compared to the presented checkerboard pattern using OpenCV. These parameters are then used to undistort the captured HDR images. . . . 34
5.3 The captured images using multiple exposure times are recombined into an HDR image, then undistorted and homography-warped in order to align the captured pixels with the calibration image. . . . 35
5.4 Image for the left eye, image for the right eye and the reference map: a) denotes the area with an intended absolute black and without ghosting, b) denotes the area with absolute white (no ghosting), c) denotes the area of an intended black ghosted with pure white and d) the area with intended black ghosted with pure black. The ghosted areas of course swap roles if the calibration image is taken through the opposite eye. . . . 37
5.5 Using the calibration chart presented in Figure 5.4, the amount of crosstalk can be calculated independently for an intended white as well as an intended black image. Ia,b,c,d denote the intensities measured at areas a, b, c and d in the pattern; CTBI defines the crosstalk for the black intended intensity, whereas CTWI denotes the amount of crosstalk for the white intended intensity. . . . 38
5.6 Weber's Law. ΔI/I denotes the Weber (also known as Fechner) fraction. The law states that the incremental threshold step over a background intensity relates in a linear way to the background intensity. . . . 38
5.7 The front/back panel combinations chosen by the JND calibration algorithm presented in [5]. The value on the x-axis is associated with the display drive level for the front panel, the value on the y-axis corresponds to the value of the back panel. The color-coded value corresponds to the intensity of the front panel in cd/m². The blue, red and green paths correspond to hull values of 0.025, 0.05 and 0.5. . . . 40
5.8 The influence of space between the color filters of the back and the front panel. The left image indicates the ideal situation, in which separation is minimal, causing few light rays to be terminated due to passing through color filters of different wavelengths. The right image explains the current situation, causing a sinusoidal grating to be visible, depending on the viewing position, if not compensated by a front diffuser. . . . 41
6.1 Process of constructing a Random Dot Stereogram. First, both images are filled with the same pattern. Then, the region(s) that should have disparity are displaced and the resulting holes are finally filled by new random dot patterns. . . . 43
6.2 The actual crosstalk measured depends on the contrast. The plot describes the ranges of crosstalk that were measured for various contrast settings at a given crosstalk label. . . . 45
6.3 Plots containing the results with increasing crosstalk along the images. The two rows represent results separated into stimuli with crossed (top row) and uncrossed (bottom row) disparity. The axes of the images denote the measured contrast as well as the disparity used. The color-coded value designates the rate at which the participants answered incorrectly, normalized from 0 to 50% (red being equal to 50% incorrect answers). . . . 46
6.4 Boxplot showing the increasing amount of correct answers when contrast is increased. The values were evaluated over all crosstalk, disparity and contrast settings. While the plots in Figure 6.3 show that there isn't any real improvement beyond a contrast level of 110:1, this contrast ratio may be even lower considering the aggregated results shown in this plot. . . . 48
Chapter 1 - Introduction
1 INTRODUCTION
There have been a few major trends in the display technology industry: increasing screen sizes, resulting in bigger screens with a greater image plane; increasing pixel densities, crowding more pixels into the same physical space; and increasing contrast, leading to a brighter peak luminance than ever before and black levels dropping ever closer to that of starlight. The pixel refresh time of a screen is constantly being lowered, manifesting itself in higher frame rates and enabling channel multiplexing for image separation technologies based on time multiplexing. At the same time, tonal resolution increases only slowly, as the benefits for the end user are not as apparent and additional support is required on the software side to make good use of such an improvement.
Screen size and pixel density have a very close relationship: Increasing the pixel density of a display
reduces the size of the screen if the total number of pixels is kept the same. Increasing the screen size
on the other hand, while keeping the total pixel count the same, reduces pixel density. Tonal resolution
and contrast behave in a somewhat similar way if the step size between two intensities is considered:
Increasing the maximal contrast while at the same time not increasing tonal resolution causes the step size
to increase. As with images appearing pixelated, this causes gradients to have detectable steps instead of
a smooth transition from one intensity level to the next.
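The effect of widening the contrast range without adding tonal resolution can be illustrated with a small sketch (the luminance figures and the linear encoding are my own simplifying assumptions; real displays use gamma curves):

```python
def linear_step(peak, black, bits):
    # Luminance difference between two adjacent drive levels, assuming a
    # linear encoding over 2**bits levels.
    return (peak - black) / (2 ** bits - 1)

# Same 8-bit tonal resolution, increasingly wide contrast range:
print(linear_step(200.0, 0.5, 8))    # ~0.78 cd/m^2 per step
print(linear_step(4000.0, 0.05, 8))  # ~15.7 cd/m^2 per step
```

With the wider range, each step is roughly twenty times larger, which is why gradients start to show visible banding unless the tonal resolution grows along with the contrast.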
While there are many different types of visual displays, this thesis is concerned with the design and the perception of rectangular displays capable of presenting a two-dimensional matrix of picture elements, referred to as an image. A picture element, or pixel for short, can be either achromatic or consist of chromatic components. The display prototypes presented in this work support chromaticity, but the perception research part deals primarily with the achromatic case, in which the RGB channels are set to equal values so that the pixel appears colorless.
1.1 Defining Contrast
Contrast can be defined in multiple ways, all of which describe to some extent how much brighter the whitest white is compared to the blackest black. For an image, the minimum and peak luminance are very primitive descriptors of the image content.
The most basic formula for contrast uses the ratio between the maximum and the minimum brightness, as shown in formula 1.1. This translates to dividing the intensity measured for white by the intensity measured for black and is often given as an N:1 number, indicating that white is N times brighter than black.
$$C = \frac{I_{max}}{I_{min}} \tag{1.1}$$

Figure 1.1: Min-Max Contrast. Imax refers to the maximum luminance, whereas Imin denotes the minimum luminance possible.
Another notion of contrast is the Weber contrast, given in formula 1.2. In this formula I represents the
luminance of the features, whereas Ib represents the luminance of the background.
$$C_W = \frac{I - I_b}{I_b} \tag{1.2}$$

Figure 1.2: The definition of Weber Contrast. I refers to the luminance of a feature, Ib to the luminance of the background.
The results of both formulas, the basic ratio and the Weber contrast, become very large when the black level becomes very low. On the other hand, they quickly become very small when a constant is added to both terms, such as ambient light. As an example, consider a display with a peak luminance of 200 cd/m² and a minimum luminance of 0.5 cd/m². Using the basic ratio, this display has a contrast of 400:1 and a Weber contrast of 399. Adding an ambient light source that increases the measured levels by 1 cd/m² decreases the contrast ratio to 134:1 and the Weber contrast to 133.
Another definition of contrast is the Michelson contrast. The Michelson contrast is less sensitive to an additive constant, as in the example described above. The definition is given in formula 1.3 and always leads to a number between 0 and 1, which is usually interpreted as a percentage.
$$C_M = \frac{I_{max} - I_{min}}{I_{max} + I_{min}} \tag{1.3}$$

Figure 1.3: The definition of Michelson Contrast. Imax and Imin refer to the maximum and minimum luminance, respectively.
Using the same hypothetical display from the example before, with a peak luminance of 200 cd/m² and a minimum luminance of 0.5 cd/m², the Michelson contrast gives a value of 0.995, corresponding to 99.5%. With ambient light increasing both levels by 1 cd/m², the resulting percentage is still about 99%.
One other popular measure of the contrast of an image is the Root Mean Square (RMS) contrast, which is defined as the standard deviation of the pixel intensities (figure 1.4).

$$C_{RMS} = \sqrt{\frac{1}{MN}\sum_{i=0}^{N-1}\sum_{j=0}^{M-1}\left(I_{ij} - \bar{I}\right)^{2}} \tag{1.4}$$

Figure 1.4: Root-Mean-Square Contrast
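The four contrast measures above can be sketched in a few lines (a minimal illustration; the function names and the two-pixel test image are my own):

```python
import numpy as np

def min_max_contrast(img):
    # Ratio of maximum to minimum luminance, usually quoted as N:1.
    return img.max() / img.min()

def weber_contrast(feature, background):
    # (I - I_b) / I_b for a feature against a uniform background.
    return (feature - background) / background

def michelson_contrast(img):
    # (I_max - I_min) / (I_max + I_min), always between 0 and 1.
    return (img.max() - img.min()) / (img.max() + img.min())

def rms_contrast(img):
    # Standard deviation of the pixel intensities.
    return np.sqrt(np.mean((img - img.mean()) ** 2))

# The display example from the text: peak 200 cd/m^2, black 0.5 cd/m^2.
img = np.array([[200.0, 0.5]])
print(min_max_contrast(img))       # 400.0 -> "400:1"
print(weber_contrast(200.0, 0.5))  # 399.0
print(michelson_contrast(img))     # ~0.995
```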
Describing the contrast a display can produce is different from measuring the contrast of an image: in order for a display system to reproduce an image accurately, it has to have at least the contrast that the image content requires. The contrast of the image, on the other hand, can be lower: it is usually not a problem to represent a lower-contrast image on a display system capable of reproducing a higher dynamic range, as long as the tonal resolution of the display system is fine enough to reproduce the small differences between the tonal steps of the image.
1.2 Tonal Resolution
The higher the difference between the brightest white and darkest black is, the more unique shades of
grey can be represented by the display. The number of steps that can be uniquely addressed by a display
system represents the tonal resolution. A standard, commercially available display system today supports
between six and ten bits of tonal resolution in every color channel.
The tonal resolution of a chromatic display system directly relates to the number of colors that can be displayed by the system. LCD panels with a lower bit depth usually employ some kind of dithering mechanism, such as Frame Rate Control (FRC), in order to artificially increase the number of possible colors [9]. Most Twisted-Nematic (TN) liquid crystal panels, for instance, have a physical bit depth of 6 bits per color channel, for a total of 18 bits of color depth. This translates to 262,144 different physical colors. Using temporal dithering, 18-bit display systems are usually marketed as being able to display 16.2 million colors instead of the full 16.7 million colors that a true 24-bit color display would be able to reproduce.
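A hypothetical sketch of how such temporal dithering can work (the exact FRC scheme is vendor-specific; this model, including the four-frame cycle, is an illustrative assumption, not a documented implementation):

```python
def frc_frames(target_8bit, n_frames=4):
    # Approximate an 8-bit target on a 6-bit panel: alternate between the
    # two nearest 6-bit levels so the temporal average matches the target.
    lo, frac = divmod(target_8bit, 4)  # 4 input steps per panel step
    hi = min(lo + 1, 63)
    return [hi if i < frac else lo for i in range(n_frames)]

def perceived_level(frames):
    # Temporal average the eye integrates, mapped back to the 8-bit scale.
    return sum(frames) * 4 / len(frames)

print(frc_frames(130))                   # [33, 33, 32, 32]
print(perceived_level(frc_frames(130)))  # 130.0
```

Targets that fall between 6-bit levels are recovered in the temporal average, which is how a 6-bit panel can be marketed with a near-24-bit color count.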
Tonal resolution is considered especially important in medical applications where traditional X-ray film
is still dominant, as a trained radiologist is capable of detecting features well beyond the tonal resolution
of a classic 8 bit screen. X-ray film offers a dynamic range of about 10000 : 1 as well as a near infinite
number of tonal resolution steps due to its analog nature. Still, there is a demand for a digital replacement
due to the benefits of a digital display device, such as easier image transport, archiving as well as dynamic
content.
1.3 Refresh Rates
Ever-rising refresh rates are at the other end of the spectrum. Liquid crystal displays supporting a refresh rate of 120 Hz and higher have appeared recently. Since activated LCD screens do not flash pixels to black in between frames, there is no flicker that can be perceived by the human eye. One could therefore question why a higher refresh rate is important for a liquid crystal display (LCD) panel. The main answer is that, while a display may be flicker-free at 30 frames per second, this only refers to the human eye being unable to differentiate between a constant grey image and an image alternating between black and white. The point at which this becomes indistinguishable is known as the flicker fusion frequency [24].
The flicker fusion frequency is only one part of the story regarding required display refresh rates: with screen sizes becoming larger, the distance an object can physically move on screen from one frame to the next grows as well and can cause a disorienting effect. Another effect is the wagon-wheel effect [20]. This effect causes a rotating wheel – like the wheel of a wagon in a western movie – to appear to rotate slower, faster, or even in the opposite direction of its true rotation. The real cause of this effect is still to be determined, but two theories have been established: one states that the human vision system partitions the continuous visual stream into frames, causing the frequency of the rotation to interfere with the frequency of the vision system [50], resulting in temporal aliasing. The other, currently favored theory,
argues that, through temporal aliasing, visual detectors sensitive to the true motion are activated alongside detectors sensitive to the opposite motion [53]. Higher refresh rates will not resolve these visual effects, but will allow a reproduction closer to what the real scene would look like.
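The temporal-aliasing account can be illustrated with a small sketch (the numbers are my own illustrative choices, not taken from [50] or [53]):

```python
import math

# A wheel spoke rotating at `hz` revolutions per second, sampled at `fps`
# frames per second. The per-frame rotation wraps into (-180°, 180°], so
# fast wheels can appear to rotate slowly or in the opposite direction.
def apparent_step_deg(hz, fps):
    true_step = 360.0 * hz / fps       # true rotation between two frames
    wrapped = math.fmod(true_step, 360.0)
    if wrapped > 180.0:
        wrapped -= 360.0               # the eye picks the smaller motion
    return wrapped

print(apparent_step_deg(1.0, 24))   # 15.0 -> normal forward rotation
print(apparent_step_deg(23.0, 24))  # -15.0 -> appears to rotate backwards
```

Raising the sampling rate (the display refresh rate) pushes the speeds at which this reversal occurs higher, which is why faster displays reproduce motion more faithfully without eliminating aliasing altogether.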
Another use case for higher refresh rates is supporting time-multiplexed stereoscopic content. The goal in such a setup is to present a different picture to the left and the right eye using shutter glasses that are synchronized with the display's refresh rate. Every pixel therefore has to change its projected intensity from frame to frame, which can cause visible flicker if the refresh rate is not high enough. The de-facto standard refresh rate for time-multiplexed stereoscopic displays is 60 Hz per eye.
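The frame scheduling for such a setup can be sketched as follows (a simplified model of my own; real drivers also account for panel response time):

```python
def frame_schedule(n_frames):
    # At 120 Hz total (60 Hz per eye), even frames carry the left image and
    # odd frames the right image; the shutter glasses open the matching eye
    # in sync with each display refresh.
    return ["left" if i % 2 == 0 else "right" for i in range(n_frames)]

print(frame_schedule(4))  # ['left', 'right', 'left', 'right']
```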
1.4 Motivation and Preview of Contribution and Results
Even though it has been shown that stereo acuity improves with higher contrast (see [23]), it is unclear whether this improvement continues to hold for higher dynamic range displays, as the tests performed previously used cathode ray tube (CRT) screens with a lower dynamic range. The hypothesis of this work is that stereo acuity will decrease for higher dynamic range scenes due to simultaneous contrast effects. The main idea stems from the fact that X-ray radiologists cover parts of the picture in order to allow the eye to adapt to the specific part of the image that is of interest. The results of this thesis could be used to further improve disparity mapping algorithms by incorporating more knowledge of the behavior of stereopsis in scenes with higher dynamic range. The display prototype constructed to perform the user study did not exhibit the anticipated contrast ratios due to technical constraints. Nevertheless, the outcome of the user study shows that stereo acuity does not necessarily improve any further with increasing contrast.
1.5 Outline of the Thesis
The thesis starts with a discussion of related work in chapter 2, covering high dynamic range as well as stereoscopic display systems. The related work chapter also covers publications from the field of perception, especially those dealing with the perception of high dynamic range images and stereo perception. Selected items from the related field of high dynamic range capturing and processing are discussed as well.
The related work chapter is followed by an introductory chapter on depth perception, discussing depth cues such as perspective, occlusion, vergence, accommodation and especially stereopsis, as well as some aspects of their physiological basis in the human vision system.
Chapter 4 continues by describing the prototypical construction of a stereoscopic high dynamic range
display including issues concerning polarization, the construction of a suitable backlight and potential
explanations for the visual artifacts apparent in the second prototype that was used for the user study.
Chapter 5 discusses the calibration of the display prototype. Both approaches that were attempted, one using HDR image recovery with a standard camera and one using a commercial color calibration system for standard LDR displays, are discussed in detail, along with their benefits and downsides. The characteristics of the display prototype are also presented in this chapter.
Chapter 6 explains the performed user study and its results. It discusses the motivation behind the study, the design of the stimuli and test patterns, as well as the participants and the environment in which it took place. The chapter finishes with a discussion of the results obtained from the user study.
Finally, the thesis closes with a summary of the presented work and an outlook on future work.
Chapter 2 - Related Work
2 RELATED WORK
The topic of this thesis reaches into various areas, not only of computer graphics and display technology, but also of psychology and especially perception. In this chapter, related work concerning high dynamic range displays, stereoscopic displays (particularly those dealing with crosstalk), and the perception of stereoscopic content and contrast is discussed. Related work concerning the calibration of displays in general is discussed herein as well.
2.1 High Dynamic Range Displays
While consumer displays, particularly television sets, advertise contrast ratios of up to and beyond a few million to one, the promoted value defines the dynamic range of a display when viewing either a completely white or a completely black screen (on/off contrast). In comparison, the static contrast defines the contrast ratio within a single image. More recent displays include a light emitting diode (LED) backlight system composed of multiple independently controllable LEDs. This allows a display to adapt the intensity of each backlight LED to the average luminosity of the display area it influences. Of course, the more controllable backlight LEDs there are, the more accurately the backlight intensity for each individual pixel can be controlled.
There have been multiple approaches to placing the LEDs in quad and hexagonal grids (see Seetzen [54]).
Even though the term resolution is usually associated with rectangular grids, it is applicable to all kinds
of alignments. In order to differentiate between image resolution (the actual pixels on the screen) and
the resolution of the backlight, the terms image resolution and contrast resolution are introduced. Image
resolution defines the resolution of pixels on the screen, whereas contrast resolution defines the resolution
of the backlight. The maximal contrast resolution is limited by the image resolution.
Whether considering static or dynamic contrast, all of today's display prototypes regarded as capable of displaying high dynamic range content have a double modulation approach in common. This means
that the amount of light passing through one pixel is controlled at two places by two separate modulators
in the light path. The combination of modulators used differentiates the approaches from each other:
The first display prototype considered to be of higher dynamic range was presented by Ledda in 2003 [37]
and featured a stereoscopic wide-field viewer presenting a HDR stereo image pair to the viewer without
any crosstalk due to the physical separation of the image channels. This prototype used transparencies
and was able to produce a contrast of up to 10000 to 1. Due to the nature of printed images the display
could only present static images.
The first interactive HDR displays were presented by Seetzen [54] with two prototypes: one was constructed
using a projector and a LCD panel, with the projector acting as an intelligent light source. The projector
directed only as much light onto the individual pixels as required to support the intended intensity instead
of a classic uniform backlight where the same amount of light reaches every pixel. An illustration of the
projector-LCD panel setup can be found in figure 2.1.
Figure 2.1: Projector-LCD approach as presented by Seetzen et al. [54]. (Diagram: projector behind an LCD panel with Fresnel lens and diffuser, driven by a dual-VGA graphics card and an LCD controller.)
The contrast resolution therefore was close to the image resolution itself. The setup is in general straightforward to construct, but as described in Chapter 4, the approach is infeasible for stereo
setups due to synchronization issues. The second prototype consisted of a LCD panel with a backlight
composed of 760 individually controllable LEDs aligned in a hexagonal grid. The prototype has since
evolved into a product using 1838 LEDs, producing a peak brightness greater than 4,000 cd/m² and a 16-bit tonal resolution [14].
Another approach was presented by Bimber et al. in 2008 [5], combining a projector-camera system with
a paper printout of the image to view. The paper was annotated with a marker that was visually tracked by
a camera. By knowing all parameters of the projector-camera system, the projector could super-impose
an image on top of the tracked printout and augment the image. While the printout was limited to a
static image, experiments using electronic paper allowed more rapid changes of the image, even though
interactive rates were not possible with the electronic ink technology at that time. The approach reached
a contrast ratio of up to 61,000 to 1 when using an LED projector with special photo paper.
In work done by Guarnieri [22] a display prototype that uses two monochromatic LCD panels is presented.
Even though the panels are placed immediately on top of each other, the high resolution causes the back
panel image to appear as a shadow at contours, making the image appear blurry. The author presents two
methods for compensating the view dependence by blurring the back panel and sharpening the front panel
image as well as using a constrained low-pass filtering approach which assures that the front panel pixel
value does not clip after compensating for the blur of the back panel image.
While the first prototype in this thesis is based on the same principle as Seetzen's [54] display,
using a projector backlight and a LCD front panel, the prototypes presented here always aimed at being
stereo capable. As discussed in chapter 4, the projector-LCD approach bears many pitfalls resulting in the
approach of using two identical LCD panels stacked behind each other, which is closer to the prototype
that Guarnieri [22] worked on. Yet, Guarnieri's main contribution is on reducing the shadowing
due to the parallax effect. The LCD panels used in the prototypes presented here are of a lower resolution
and did not exhibit the same shadowing issues as Guarnieri discussed.
2.2 High Dynamic Range Perception
Visual perception is an intensively studied field of research. Within this field, stereo perception is especially relevant to this thesis. Various attempts have been made to construct a model of
the visual cortex from the human vision system (HVS) focusing on the adjustments the HVS performs
in order to be able to perceive HDR content [8], [18], [45], [46] and [40]. One of the main use cases for
such models is to compare images for visual equality and therefore be able to compare the performance
of lossy image and video compression algorithms against each other.
Daly [8] was the first to present such a visual difference predictor (VDP). The difference predictor presented
takes two images and information about the presentation environment (such as the distance of the viewer
and the pixel density) and uses a model of the human vision system (HVS) to mark the pixels that are
different enough within the two images to be detected by the HVS. The model takes into account the
contrast sensitivity function, the amplitude nonlinearity of the intensity detectors and contrast masking
effects. The algorithm was extended to support High Dynamic Range images [39].
In medical display devices it is often important to assure that all shades of gray can actually be perceived by the observer. In 1992, Barten [3] introduced the notion of a just noticeable difference (JND) to the field of visual perception. In order for all shades to be distinguishable, each increase in display brightness has to be just large enough to be perceivable by the viewer. Later work by Barten [4] extends this with a temporal component, taking the adaptation luminance of the HVS into account.
The amount of additional brightness needed to reach the next step is not constant but grows exponentially. As a result, the theoretical maximum number of JND steps that can be achieved is quite low: with the first JND step starting at low starlight and the highest one at direct sunlight, about 2000 steps can be achieved according to Barten's model [3]. For comparison, the display prototypes presented
in [54] cover 962 JND steps (for the projector-LCD prototype) and 1139 JND steps (for the LED based
prototype), yet not all of the JND steps are actually reachable by the prototypes due to a lack of accuracy
of both modulators.
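The step count follows from the multiplicative nature of JNDs: if each step is a constant relative increment w, then L_min(1 + w)^n = L_max gives n = ln(L_max/L_min)/ln(1 + w). The following sketch uses a constant 1% Weber fraction and illustrative luminance limits as simplifying assumptions; Barten's actual model uses a luminance-dependent threshold and arrives at the roughly 2000 steps cited above:

```python
import math

def jnd_steps(l_min, l_max, weber=0.01):
    """Number of just-noticeable-difference steps between two luminances,
    assuming every step is a constant relative (Weber) increment:
    l_min * (1 + weber)**n = l_max  =>  n = ln(l_max/l_min) / ln(1 + weber).
    Barten's model uses a luminance-dependent threshold instead; the
    constant 1% fraction is a rough simplification."""
    return math.log(l_max / l_min) / math.log(1.0 + weber)

# From starlight (~1e-4 cd/m^2) to direct sunlight (~1e8 cd/m^2):
steps = jnd_steps(1e-4, 1e8)
print(round(steps))  # 2777 under the constant 1% assumption
```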
The work presented here differs from related work trying to present an accurate model of the human
vision system, in that the goal is not to find a complete model, but rather an answer to whether higher
contrast is beneficial to stereopsis as well as questioning the role that crosstalk plays in this context. JND
calibration is an important characteristic of the accuracy of a display and was performed on the display
prototypes presented here. The results of the user study could be used to verify the reliability of the
visible differences predictor.
2.3 HDR Capturing and Processing
2.3.1 Capturing
In order to validate the output and calibrate the display it is required to capture the actual luminosities
presented by the display. One way to achieve this is to use a colorimeter and measure the display values
directly. Another approach is to capture an HDR image using a camera. Since cameras capable of capturing HDR scenes are not readily available, algorithms that recover HDR images from multiple
low dynamic range pictures have been developed. This works by first recovering the response function
(up to scale) of the camera by combining the response of multiple pictures of the same scene taken with
different exposure times. Knowing this response function, the value of a pixel in a picture of a known
exposure time can be quickly converted to the relative irradiance value originally captured by the camera.
For robustness, pixel values from all taken images are combined using weights according to how reliable
the taken image at the given location is believed to be. The weighting function is usually a Gaussian
function with increasing weights towards the center of the output value range of 0 to 255 ([52], [12]).
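The weighted combination described above can be sketched as follows. This is a hypothetical minimal version: the input images are assumed to be already linearized through the recovered inverse response, and the Gaussian-style hat weighting follows the general shape used in [52] and [12] rather than either paper's exact function:

```python
import numpy as np

def merge_exposures(images, times, sigma=64.0):
    """Merge linearized LDR exposures into a relative irradiance map.

    images : list of uint8 arrays, pixel values 0..255, assumed already
             mapped through the inverse camera response (linear light)
    times  : exposure time of each image in seconds
    """
    num = np.zeros(images[0].shape, dtype=np.float64)
    den = np.zeros_like(num)
    for img, t in zip(images, times):
        v = img.astype(np.float64)
        # Gaussian-style weight: trust mid-range values, distrust pixels
        # near under- or over-exposure
        w = np.exp(-((v - 127.5) ** 2) / (2.0 * sigma ** 2))
        num += w * v / t            # irradiance estimate from this frame
        den += w
    return num / np.maximum(den, 1e-9)

# Two exposures of a uniform patch: a pixel reading 200 at 1/100 s and
# one reading 25 at 1/800 s both imply a relative irradiance of 20000
imgs = [np.full((2, 2), 200, np.uint8), np.full((2, 2), 25, np.uint8)]
radiance = merge_exposures(imgs, [1 / 100, 1 / 800])
```

Since both frames agree on the underlying irradiance, the weighted merge recovers exactly 20,000 for every pixel; with noisy real images, the weighting suppresses the unreliable near-saturated and near-black readings.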
2.3.2 Tonemapping
Since most displays currently available are not able to correctly represent HDR content, algorithms have
been developed to simulate visual effects (bloom for instance) that would be apparent to the eye if the
original scene was observed. One of the first tone reproduction / mapping algorithms was presented by
Tumblin [58]. It models an observer as well as the display device and adjusts the display drive values by
a delta calculated from the inverse of those models.
This means that if a given luminance L is to be observed by an observer, specifics about the display system (such as the transfer function) and the observer (such as the adaptation luminance of the human eye) need to be known. These models then need to be inverted and applied to the intended luminance L in order to compute the level that has to be set in the pixel buffer. This has to be done in such a way that the display device produces an intensity value that the human vision system observes as the intended luminance L (at a given adaptation level).
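For a simple gamma-style display model, the inversion step reduces to the following sketch. The observer model is omitted, and the peak luminance, gamma value, and 8-bit range are illustrative assumptions:

```python
def drive_value(target_luminance, peak_luminance=300.0, gamma=2.2, levels=255):
    """Invert a simple display model L = peak * (v/levels)**gamma to find
    the 8-bit drive value v that reproduces a target luminance. A full
    tone reproduction operator in the spirit of Tumblin and Rushmeier
    would also invert an observer model (adaptation state) before this
    step."""
    ratio = max(0.0, min(1.0, target_luminance / peak_luminance))
    return round(levels * ratio ** (1.0 / gamma))

print(drive_value(300.0))  # 255: full white reproduces the peak
print(drive_value(0.0))    # 0
```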
All following tone mapping operators improve the models used in the work done by Tumblin and Rushmeier ([34], [47], [16], [17], [51], [27], [15] and [36]).
For a thorough overview of state-of-the-art tone mapping algorithms, refer to the work of Kuang [32], where comparisons have been performed, and to [35], where some of the operators are benchmarked against real high dynamic range displays.
2.3.3 Storage
High dynamic range imaging requires more than just an improvement of the output and the recording
devices. Image storage formats for standard dynamic range images currently do not support high dynamic
range content. HDR images of course require more space to store as every channel of every pixel of an
image needs to represent a greater number of values. [59] presents an extension to the JPEG image
standard by storing information required to restore the high dynamic range components of the image in
an application specific marker of the standard JPEG format. This way high dynamic range information
can be stored along with the JPEG image, keeping it compatible with existing software.
-9-
Chapter 2 - Related Work
One of the standard high dynamic range image formats used in the industry is OpenEXR. It is described
in detail in [30], and a lot of material, including an SDK, can be found at [26]. EXR was the main image
format used throughout this work when handling high dynamic range content.
The HDR recovery algorithm presented by Robertson [52] was used for capturing HDR images of the patterns used in the first approach to calibrating the display prototype. The tone mapping operators remain relevant: even though the prototype is capable of producing a higher contrast, the provided dynamic range is not enough to display arbitrary HDR images. In particular, the peak luminance of the display is not improved over conventional displays, and tone mapping mechanisms are therefore still required for the reproduction of high dynamic range images.
2.4 Stereoscopic Displays and Ghosting
One of the main causes of headaches and stereo sickness during prolonged consumption of stereoscopic content is ghosting, which in turn is triggered by crosstalk between the image channels. Ghosting manifests itself as a semitransparent shadow image of the content destined for the other eye. The distance between the contours depends on the disparity of the image, which for a projective camera transform in turn depends on the depth of the scene at the given contour border. An exaggerated illustration of how ghosting appears can be found in figure 2.2. Crosstalk itself is caused by the inability of the multiplexing system to either correctly generate or decompose the multiplexed signal.
Figure 2.2: A stereoscopic pair of images of a scene with a sphere. From left to right: the image destined for the left eye, the image destined for the right eye, and the fused image with shadows of the original images visible due to ghosting.
The amount of crosstalk certainly depends on the display technology used. Two image channels are
usually multiplexed either in time (active) or polarization (passive). While both technologies de-multiplex
the image channels using glasses worn by the observer, the technologies are inherently different: active, time-multiplexed stereo setups use shutter glasses whose left and right shutters are synchronized to the left/right refresh rate of the display. Passive, polarization-multiplexed setups use polarization filters
and therefore no synchronization is required. The amount of crosstalk therefore depends on the quality
of the polarization filters for passive systems as well as the accuracy of synchronization for active setups.
Perfect image channel separation is possible by using a haploscope, which does not multiplex the image channels at all but keeps them physically separated.
When talking about crosstalk, literature usually refers to the image channels as the intended and the
unintended stimuli. The intended stimulus represents the signal that is supposed to reach the targeted
eye, whereas the unintended stimulus is the signal that leaks from one channel to the other. The resulting
signal can be modeled as a simple addition of the intended and unintended stimuli as described in formula
2.3.
I = I_intended + I_unintended

Figure 2.3: The intensity reaching the eye is the sum of the intended signal and the unintended signal.
Further refining the model for crosstalk reveals that the crosstalk is usually dependent on factors like
the pixel position or the viewing angle. The unintended stimuli can therefore be defined as a function
dependent on these parameters (see formula 2.4).
I_unintended = f(x, y, α, β) · I(x, y)

Figure 2.4: The amount of unintended signal is usually dependent on the pixel position (x, y) and the viewing angle (α, β) of the observer, as well as the image content I of the opposite channel at the given position.
2.4.1 Crosstalk Compensation
If the amount of unintended signal is known, it can be compensated for by subtracting it from the intended
signal. There are two challenges in doing this: first, knowing the accurate amount of crosstalk usually
requires calibration which can become cumbersome to handle if all parameters are to be considered.
Second, due to the physical properties of light, crosstalk cannot be compensated for if the intended signal
does not contribute, since the intended signal cannot be set to less than zero. The approaches here differ
in the way they perform the subtraction as well as how calibration data is looked up and the parameters
that are taken into account when calibrating.
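Under the additive crosstalk model of figure 2.3, subtractive compensation can be sketched as follows. A single global crosstalk factor c is assumed here for simplicity; the methods discussed below differ precisely in how that factor is obtained and parameterized:

```python
import numpy as np

def compensate(intended_left, intended_right, c=0.05):
    """Subtract the predicted leakage of the opposite channel from each
    eye's image. Values that would become negative are clamped to zero;
    there the crosstalk physically cannot be compensated."""
    left = np.clip(intended_left - c * intended_right, 0.0, 1.0)
    right = np.clip(intended_right - c * intended_left, 0.0, 1.0)
    return left, right

# Two pixels: one compensable (intended 0.8), one not (intended 0.0)
L = np.array([0.8, 0.0])
R = np.array([0.2, 1.0])
l, r = compensate(L, R)
perceived_left = l + 0.05 * r   # what the left eye receives after leakage
```

For the first pixel the perceived value lands close to the intended 0.8. For the second, the intended signal is zero, so nothing can be subtracted and the full leakage of c times the opposite channel survives as a residual ghost, exactly the physical limit described above.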
Lipscomb and Wooten [38] described a method to decrease crosstalk by first increasing the background intensity and then decreasing pixel intensities according to a specifically constructed function. The amount
of crosstalk correction applied is not uniform across the screen; instead, the screen is separated into 16 horizontal bands and the function is adjusted for each band individually.
Konrad et al [31] presented a model that takes into account the intended and unintended signal. The
calibration phase consists of a psycho-visual experiment for estimating the crosstalk factors which are
later used for compensation. The experiment works by presenting the user with two rectangles in the middle
of the screen. One of the rectangles contains no crosstalk by presenting the same stimulus on both eye
channels, while the other rectangle contains crosstalk by presenting an unintended stimulus on the other
eye channel. The user then has to adjust the color patches by changing the crosstalk correction factor.
The stimulus is repeated using multiple combinations of intended and unintended stimuli and stored in a
two dimensional lookup table. Upon rendering the image, the table is inverted in a preprocessing stage
and the necessary amount of crosstalk correction is retrieved from the lookup table for every pixel in the
stereo pair. The major downside of this algorithm is that it does not take the position of the corrected
pixel into account.
Smit et al [56] extend the crosstalk model presented by Konrad [31] by taking the vertical position of
the pixels into account. As the calibration table would become infeasible to handle due to having three
dimensions instead of two, the values encoded in the table are parameters to a function that additionally takes the vertical position of the pixel pair to be corrected into account. The calibration process presented in [31] is also
extended to include the vertical position.
Later that year, Smit et al presented three extensions to subtractive crosstalk reduction [57]. Firstly, Smit
proposes to use the CIELAB instead of the RGB color space to adjust the lightness component when
the resulting corrected RGB value would be clamped. The second proposal uses a geometric approach
by adjusting the intensity of the other pixel constituting the fused pixel instead of the pixel at the same
position in the other eye channel. Doing so requires finding the corresponding fused pixel by retrieving
the depth value of the pixel and calculating the disparity at the given depth.
Another approach to dealing with crosstalk is presented by Siegel [55]. In his work it is argued that if the disparity of a stereo pair is small enough, the ghosted contours can sometimes not be easily detected but are instead perceived as a blur on contour borders. This blur can be unobjectionable if it is similar enough to depth-of-focus blur, which would then allow for a relaxation of the strict zero-or-little-crosstalk requirement for virtual reality applications.
The work presented in this thesis differs from the related work concerning stereoscopic crosstalk in that the main goal is not to improve crosstalk compensation mechanisms but to understand the influence of crosstalk on stereoscopic perception. The mechanisms presented above would benefit from the finding that if higher contrast does not improve image perception, contrast can be reduced in favor of less inflicted crosstalk and therefore less visible ghosting.
2.5 Crosstalk Perception
Models of the Human Vision System (HVS) help to understand how our vision system basically works, even though side effects such as headaches and stereo sickness are usually not yet incorporated into these models, because not enough is known about what exactly causes those symptoms. [1], [25] and [44] have performed user studies trying to uncover these causes.
Pastoor [44] describes factors involved in 3D imaging, especially visibility thresholds for ghosted contours at various disparities.
Hoffman et al. [25] focus on the vergence-accommodation mismatch: in a stereoscopic setup, the depth we perceive is derived from various depth indicators: stereopsis and vergence, parallax effects as well as shadows. Yet one of these indicators, accommodation, cannot be tricked into registering the correct depth using a standard flat display surface.
The reason for this is that both eyes still individually focus on the display surface, giving the HVS a
hint on how far away an object in the scene really is. Therefore, if an object is supposed to appear in
front of the scene and disparity, parallax and stereopsis indicate the expected depth, the depth of focus
that monocular vision indicates is still at the depth of the display surface. Hoffman et al [25] therefore
constructed a display prototype that uses three layers of semitransparent mirrors at three different depths.
The display interpolates the intensity of the projected image pixels, depending on the requested depth
of a pixel and therefore forcing monocular vision to focus on a depth layer closer to the expected depth,
minimizing the vergence-accommodation mismatch. The setup is also stereoscopic and therefore both
eyes can have individual images where a pixel may lie on planes of different distances to the eye. A
schematic of the display can be found in figure 2.5.
Figure 2.5: The setup of the display presented by Hoffman et al. [25]. Two semi-transparent mirrors and one front-surface mirror allow the eye to focus and converge at multiple image planes (near, mid, and far) at different depths; the image source is an IBM T221 TFT display (3840 × 2400 pixels).
Hoffman's paper describes three different user studies in which the display was used. All of them use
sinusoidal gratings whose orientation has to be detected by the participant.
2.6 Stereopsis and the Correspondence Problem
Stereo matching is the first and most important part of the depth reconstruction process performed by
the HVS. Because two images are available, pixels corresponding to the same physical location in space have to be matched. This task is referred to as the correspondence problem.
In [48] it is shown that, given the choice between a global match to a monocular image of the same
contrast or to an image of a higher contrast, the higher contrast match is preferred. A global match is a
match where all possible candidates in the neighborhood have to be considered collectively and only the
matches that fall within preferably smooth surface(s) are selected.
A computational model for disparity processing in the human vision system is presented by Mansson [41].
The model differs from previous models in that it does not rely on a set of predefined
higher level features such as edges (zero crossings in the second derivative of the image) or bars. Instead it
uses a hierarchical model of sub regions and compares the overall configuration of contrast within limited
regions. The information gained from the coarser levels is used to restrict the set of potential matches for
the finer levels.
It is well known that the processing of disparity is based on the first and second derivative of the image and
not on the actual absolute intensities. The receptive fields encoding the intensities are best described by
the Gabor function (see figure 3.4). It is still unclear how the human vision system encodes binocular disparity. Two main theories, phase and positional encoding, exist: "In the position difference model, the receptive field profiles are assumed to have identical shape in the two eyes but are centered at non-corresponding points in the two retinas ... In the phase difference model, the Gaussian envelopes of the receptive fields are constrained to be at corresponding retinal points, but the receptive fields are allowed to have different shapes or phases." [10]
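The two encoding theories can be made concrete with a one-dimensional Gabor receptive field profile; all parameter values below are illustrative:

```python
import numpy as np

def gabor(x, center=0.0, sigma=1.0, freq=1.0, phase=0.0):
    """1-D Gabor profile: a Gaussian envelope multiplying a sinusoidal
    carrier, the standard description of binocular receptive fields."""
    envelope = np.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * freq * (x - center) + phase)
    return envelope * carrier

x = np.linspace(-3.0, 3.0, 601)
# Position difference model: identical profiles, envelopes centered at
# non-corresponding retinal points
left_pos, right_pos = gabor(x, center=0.0), gabor(x, center=0.25)
# Phase difference model: envelopes aligned, carrier phases differ
left_phase, right_phase = gabor(x, phase=0.0), gabor(x, phase=np.pi / 4)
```

In the position model the whole receptive field is shifted between the eyes, while in the phase model only the oscillation inside the fixed envelope shifts; both produce a measurable interocular difference from which disparity could be decoded.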
3 Depth Perception
Depth perception is an important part of the vision system, not only for humans but for all predators, as it supports estimating the distance to an object and therefore massively aids movement in a three-dimensional environment. Even though the image captured by an individual eye is two-dimensional, it contains enough information to recover a qualitative and relative depth image, even allowing the estimation of absolute depth values. A single piece of such information is called a depth cue.
The types of depth cues can be split into two distinct groups: monocular and binocular cues. Monocular depth cues are available to a single eye, whereas binocular depth cues require the relationship between the two distinct images projected onto the individual retinas. Binocular cues essentially rely on the fact that the eyes of a human are separated horizontally; the resulting difference between the two retinal images is generally referred to as binocular disparity. Depth cues have been studied for a long time and have been employed by artists in order to achieve an effect of depth in two-dimensional paintings.
3.1 Perspective
One of the depth cues that form the basis for several other cues is perspective. Perspective stems from the fact that the lens of the human eye produces a perspective rather than an orthographic projection of the scene onto the retina. This causes objects further away from the eye to have a smaller projection area on the retina than an object of the same size closer to the eye.
This gives rise to derived depth cues such as motion parallax. Motion parallax occurs if the relative
position of the eye towards two objects at different distances changes. Consider two objects in space
moving at the same absolute speed. Due to perspective, the projected images move at different speeds
with the object being at a further distance moving slower than the object closer to the image plane.
Motion parallax is very similar to depth from motion, stating that the perspective projection of an object
becomes smaller the further the object moves away from the eye even though the size of the object does
not change. The same effect is apparent if two objects of known size are positioned at different depths.
Depending on the distance, the relative size between the objects varies.
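The size relations described above follow directly from the pinhole projection model; the focal length used here, roughly that of the human eye, is an assumed illustrative value:

```python
def projected_size(object_size, depth, focal_length=0.017):
    """Image size of an object under an ideal pinhole projection.
    The focal length (~17 mm, roughly that of the human eye) is an
    assumed illustrative value."""
    return focal_length * object_size / depth

# The same 1 m wide object viewed at 2 m and at 10 m:
near = projected_size(1.0, 2.0)
far = projected_size(1.0, 10.0)
# The projection shrinks in proportion to distance: near/far is about 5
```

The same inverse proportionality explains motion parallax: per unit of lateral movement, the projected image of the distant object travels a fifth as far across the retina as the near one.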
3.2 Occlusion
The visual cue that one object covers another object, completely or only in part, does not reveal any absolute depth information. Yet it is critical to depth perception, as it allows for an ordering of objects by depth and thereby helps estimating depth values. Occlusion may at first consideration be a very simple
computation for the brain as it only requires information about which object covers another object. At
second thought though it becomes clear that the notion of what defines the boundary of an object requires
higher knowledge of the objects in the scene.
The monocular occlusion information is rather ambiguous, as it is not apparent from a single image if the
object actually continues behind the edge of the occluding element. It could, although this is very unlikely in a real-world scenario, be the case that the object ends exactly at the edge and is in fact closer to the observer than the occlusion information suggests. An artificial example of such a situation is shown in figure 3.1.
There are two distinct aids in human vision that help make occlusion more robust: temporal consistency and binocular vision. Temporal consistency allows the brain to correlate temporally consecutive images. If, as in the example given above, either one of the objects or the observer changes position, the optically ambiguous situation is resolved. In most situations binocular vision resolves this problem as well, as it provides a different perspective onto the same scene.
Figure 3.1: Occlusion can be ambiguous in the monocular case, but is usually resolved very well by binocular vision. The first image lets us conclude that the red rectangle is behind the blue rectangle, as its side aligns perfectly with it, which would be an uncommon situation in the real world. The second and third images show the views perceived by the left and right eye, which reveal that the red rectangle is actually in front of the blue rectangle and that its border only aligns with the blue rectangle by accident in the monocular image.
3.3 Vergence
Human beings, like most other predators, have binocular vision with horizontally separated eyes. As both eyes focus on the same object, the view axes of the eyes have to point towards the object in question. The angle under which each eye views the object is offset due to the horizontal displacement. The difference between these angles directly relates to the distance of the object and therefore serves as a source of depth information.
With stereoscopic image content, vergence is directly related to the disparity of fused image features,
as the disparity controls how far apart fused pixels in the image plane are. The distance between the
pixels can have a positive as well as negative sign, resulting in a fused image appearing either in front
of or behind the focused image plane. When relating this to the vergence of the eyes, a stereoscopic stimulus with a positive disparity is usually referred to as uncrossed, whereas one with a negative disparity is referred to as crossed. Figure 3.2 visually describes the change of the vergence angle according to the distance of the object.
It is important to note that uncrossed disparity on an image plane has a maximum separation equal to the distance between the eyes. This is due to the fact that the view directions of the two eyes become closer
Figure 3.2: Vergence controls the orientation of the eyes. At the same time it acts as a depth cue by providing the angle of orientation to the vision system.
to being parallel the further the object moves away from the observer.
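The relation between fixation distance and convergence angle can be sketched as follows; the 65 mm inter-pupillary distance is a typical, assumed value:

```python
import math

def vergence_angle_deg(distance_m, ipd_m=0.065):
    """Convergence angle of the two eyes when fixating a point at the
    given distance, assuming a typical inter-pupillary distance of
    65 mm. The angle falls toward zero as the view directions approach
    parallel for distant objects."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))

for d in (0.3, 1.0, 10.0, 100.0):
    print(f"{d:6.1f} m -> {vergence_angle_deg(d):6.3f} deg")
```

The rapid falloff illustrates why vergence is an effective depth cue only at near range: beyond a few meters, the angular change per meter of depth becomes vanishingly small.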
3.4 Accommodation
Since the eye is not a simple pinhole camera but rather consists of a lens and a retina, it is required to focus the image at a specific depth. In the human eye, the curvature of the lens is adjusted by the ciliary muscles, an act that is called accommodation. Accommodation is used to focus on an object at a specific depth, and the information about how strongly the lens is contracted acts as a depth cue to the vision system.
Accommodation is believed to be one of the main problems in stereoscopic displays employing a single surface. The eye always accommodates to the display surface while both eyes converge to an object believed to be at a different depth. This problem is referred to in the literature as the accommodation-vergence mismatch. A solution to the problem would be a true light field display, causing the eye to focus at the same depth as that towards which the eyes converge.
3.5 Stereopsis
Even though a human typically has two eyes, the brain is usually confronted with a single image. This image is considered to be the fused image of the two eyes, and the process of combining them is referred to as binocular fusion.
3.5.1 Horopter, Vieth-Müller Circle and Panum's Area
The horopter describes a surface containing all points that are fused into a single image when the eyes
fixate on a specific point in depth. Points belonging to the horopter all have an equal distance to the center of both
eyes. Initially it was believed that the area of single fusion is formed by the circle defined by the fixation
point and the two centers of the lenses.
Figure 3.3: The Vieth-Müller Circle, also known as the Theoretical Horopter and the Empirical
Horopter. Points on the horopter represent points that result in single, fused images. Points too far from
the (empirical) horopter are seen as double images (diplopia).
By observing images in his newly invented haploscope, Wheatstone empirically found that the actual
area of single vision is larger than the theoretical horopter, which led to the distinction between the
empirical and the theoretical (or geometrical) horopter. The geometrical horopter is often referred to as the
Vieth-Müller Circle (see figure 3.3), named after Gerhard Vieth and Johannes Müller.
In 1858 Peter L. Panum [43] found that points within an area around the horopter can also be fused into
single vision. This area is known as Panum’s area of single vision. Points in front of or behind Panum’s
area appear as two separate objects, an effect generally referred to as diplopia.
3.5.2 Neurophysical Basis
Since Sir Charles Wheatstone presented the first mirror stereoscope in 1833, many theories have been
formulated as to how the human vision system combines the two separate streams of information arriving
from both eyes. Wheatstone himself believed that the images were analyzed independently of each other.
This analysis included finding contours and higher-order shapes, which at a later stage in the human visual
cortex are combined into a single fused view. It was considered that stereopsis used many of the available
depth cues in order to fuse an image.
It was not until Béla Julesz presented random dot stereograms [29] in 1959 that the theory
of late binocular fusion was displaced. The reason for this change in belief was that random dot stereograms do not
contain any depth cue other than pixel patterns displaced by some disparity, and yet they allow the perception
of depth. A random dot stereogram is created by taking an image of random black and white dots and
displacing parts of it by some pixels for the opposing eye. The amount by which the random pixel pattern is shifted
directly relates to the fused depth.
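This construction can be sketched in a few lines of NumPy (an illustrative re-implementation, not Julesz’s original procedure; image size, square size and disparity are arbitrary choices):

```python
import numpy as np

def random_dot_stereogram(size=128, square=48, disparity=4, seed=0):
    """Create a left/right image pair: a central square region of the
    right image is shifted horizontally by `disparity` pixels."""
    rng = np.random.default_rng(seed)
    left = rng.integers(0, 2, size=(size, size))  # random black/white dots
    right = left.copy()
    top = (size - square) // 2
    # Shift the central square for the opposing eye; the exposed strip is
    # refilled with fresh random dots so no monocular cue remains.
    region = left[top:top + square, top:top + square]
    right[top:top + square, top + disparity:top + disparity + square] = region
    right[top:top + square, top:top + disparity] = rng.integers(
        0, 2, size=(square, disparity))
    return left, right

left, right = random_dot_stereogram()
```

Viewed monocularly, each image is pure noise; only the disparity of the hidden square carries depth.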
An important aspect of perception is the distance between the two eyes, called the interpupillary distance
(IPD). The IPD has a major influence on the perception of stereoscopic content. The vast majority of
adults have an IPD between 50 and 75 mm, with the average being 63 mm [13]. An important aspect when
designing stereoscopic systems is that the IPD for children aged 5 to 15 years is in the range from 40 mm
to 65 mm. If this is not considered, the perceived depth may be well beyond the limits of stereo fusion for
children.
Neurophysical research has been performed on cats and macaque monkeys in order to identify the individual cells involved in decoding the depth information available in the stereoscopic image streams. In
the human vision system the neural signals from the ganglion cells first pass through the lateral geniculate
nucleus (LGN) of the thalamus before arriving at the visual cortex. There seems to be no direct evidence
for disparity-selective cells in the LGN. Instead, disparity selectivity emerges first in the primary visual
cortex (V1), where signals from the two eyes converge upon single neurons [11].
Most of the input to V1 arrives at so-called simple cells, which are orientation-selective and consist of
receptive fields whose profiles are found to be well described by a Gabor function [28], [19], [10].
A Gabor function is the product of a Gaussian bell curve and a sinusoid, as defined in figure 3.4, where x₀
and ω correspond to the center position and width of the Gaussian envelope, f and Φ denote the spatial
frequency and phase of the sinusoid, and k is an arbitrary scaling factor [11]. The resulting visual stimulus
of a Gabor function with specific assignments to the center position, width, frequency, phase and scaling
factor is called a Gabor patch (such as the one shown in figure 3.5).

G(x) = k · exp(−(2(x − x₀)/ω)²) · cos(2πf(x − x₀) + Φ)
Figure 3.4: The Gabor function
Figure 3.5: A Gabor patch oriented at 45° with a Gaussian envelope (50 pixels standard deviation) and a
frequency of 0.05 cycles per pixel. The original image had a width and height of 500 pixels. The patch
was generated using the online Gabor patch generator found at [7].
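A two-dimensional patch of this kind can also be generated directly from the definition above (a NumPy sketch, not the generator from [7]; the 2-D form adds an orientation and an isotropic Gaussian envelope):

```python
import numpy as np

def gabor_patch(size=500, sigma=50.0, freq=0.05, theta_deg=45.0, phase=0.0):
    """2-D Gabor patch: Gaussian envelope times an oriented sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half].astype(float)
    theta = np.deg2rad(theta_deg)
    xr = x * np.cos(theta) + y * np.sin(theta)  # coordinate along the grating
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * freq * xr + phase)
    return envelope * carrier  # values in [-1, 1]

patch = gabor_patch()
```

Mapping the [-1, 1] values to grey levels reproduces a stimulus like figure 3.5.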
It can be argued that V1 simple cells are responsible for the first stage of disparity processing in the brain.
One of the remaining questions is how the simple cells encode binocular disparity. According to [11]
there exist two main theories: the position difference theory and the phase difference theory. In the position
difference model, the layout of the receptive field is the same for both eyes but its center is located at non-corresponding points on the two retinas. In the phase difference model, the receptive fields are centered
at corresponding retinal points, but the two receptive fields are allowed to have different shapes or phases
[11].
Studies related to depth perception also incorporate, to some extent, findings on binocular
rivalry. Binocular rivalry (BR) occurs if the images in the two eyes are too different to be fused into a single image.
In that case, the human vision system switches between the two images. The study of binocular rivalry
aids the understanding of binocular depth perception as it represents a state of failed binocular fusion.
With functional magnetic resonance imaging (fMRI) becoming more widely available (see [6]), perception studies relying on fMRI brain scans are becoming more and more common, as the technique
is non-invasive and allows interactivity to some extent. Yet, most of those studies try to resolve the innermost
workings of perception, such as the perception of motion, well beyond the scope of this thesis.
3.6 Perception of Crosstalk
Since most available stereo systems use some kind of multiplexing to deliver individual images to the two
eyes, the perception of crosstalk is an important topic to discuss. Consider a simple scene with one
high-contrast edge at some disparity, as depicted in figure 3.6.
Figure 3.6: A stereoscopic contrast curve. The top images are the intended signals for the left and right
eye, the bottom images the actually perceived images.
It is usually assumed that the amount of crosstalk is symmetric, such that white becomes darker by a
certain amount and black becomes lighter by that same amount. While the amount of crosstalk is equal,
the amount of ghosting that is perceivable is not. Due to Weber’s Law (see figure 5.6), the ghosting
in figure 3.6 is less visible for the right eye than for the left eye. This is due to the fact that the brightness
difference required for a contrast edge to be detectable by the eye is greater at higher intensities. The
actual perceived image naturally depends on the medium presenting the image, due to varying transfer
Figure 3.7: First derivative of the contrast curve presented in Figure 3.6. Even though the amount of
crosstalk is the same, the absolute brightness levels differ for the left eye and the right eye. This
difference causes the bright contrast edge to be easier to detect than the dark contrast edge.
functions, but the effect is still apparent even if linearity is given. Figure 3.7 visualizes the first derivative
of the contrast edges in figure 3.6, indicating that the amount of crosstalk is the same for both cases. Please
note that the effect requires a linear transfer function of the output medium to be objectively judged.
This fact can easily be reproduced on any stereoscopic display by displaying a contrast curve with some
disparity and allowing the observer to adjust the intensity of the black level. If the observer first adjusts
the black level with only one eye observing the contrast curve, such that the ghosted area is not visible,
the ghosted area will still be apparent in the other eye. Most ghosting compensation algorithms exploit
the fact that lowering the contrast ratio also lowers the amount of perceivable ghosting, but lack an
explanation of this asymmetry.
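The asymmetry can be illustrated numerically with the Weber fraction ΔL/L; the crosstalk level and luminances below are illustrative assumptions, not measured values:

```python
def weber_contrast(delta_l, background_l):
    """Weber fraction: detectability of a luminance step on a background."""
    return delta_l / background_l

# Illustrative numbers: 2% crosstalk leaking between eyes on a display
# with a white level of 100 cd/m^2 and a black level of 1 cd/m^2.
leak = 0.02 * 100.0  # absolute leaked luminance: 2 cd/m^2

# Ghost edge on the dark side: 2 cd/m^2 added on a 1 cd/m^2 background.
dark_side = weber_contrast(leak, 1.0)      # 2.0 -> clearly visible
# The same leaked luminance on the bright 100 cd/m^2 background.
bright_side = weber_contrast(leak, 100.0)  # 0.02 -> near threshold
```

Although the absolute crosstalk is identical, the Weber fraction differs by two orders of magnitude, which matches the asymmetric visibility of ghosting described above.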
4 Construction of a HDR Stereo Display
Since the goal of this thesis is to evaluate the correlation between higher contrast ratios and crosstalk
concerning stereopsis, a display capable of producing a higher dynamic range while still being able to
display stereoscopic content became a necessity. It should be noted that at the time this project started
(late 2009), the first LCD panels capable of refreshing the screen fast enough to support stereoscopic
viewing at 120Hz had just become commercially available.
The construction of such a HDR Stereo display manifested itself in two prototypes: The first one is based
on the approach presented in [54], using a projector as the back panel modulator and a stereo LCD panel
as a front modulator. The second prototype uses two identical stereo LCD panels and a bright LED
backlight and is shown in figure 4.1.
Figure 4.1: The final (second) prototype built from two stereo LCD panels.
4.1 LCD - Projector Approach
The first prototype borrows its approach from Seetzen’s prototype [54], using an Infocus DepthQ projector
as the back modulator and the LCD panel of a Samsung 2233RZ as the front modulator. As the projector
is able to direct the amount of light for every individual pixel, it acts as an intelligent light source, whereas
a standard backlight would direct the same amount of light towards all pixels.
Since the resolutions as well as the aspect ratios of both devices differ, a homographic transform was
used to stretch the projector image accordingly. The setup worked for monoscopic viewing, whereas for
stereo viewing, synchronization issues appeared: the stereo setup used is based on NVidia’s consumer
line stereo system, 3DVision. The problem with this system is that the graphics card is able to detect the
model and brand of the display device and the driver adjusts the shutter delay of the glasses accordingly.
Since we were using two different devices, both devices had different delay timings and it was therefore
not possible to get a single synchronized stereo delay for both display systems.
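The homographic alignment step mentioned above can be sketched with a direct linear transform in NumPy; the corner coordinates below are purely illustrative, not the measured correspondences of the prototype:

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography mapping src -> dst from four point
    pairs via the direct linear transform (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null-space vector of the 8x9 system.
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def apply_homography(h, point):
    """Map a 2-D point through h (with homogeneous divide)."""
    x, y, w = h @ np.array([point[0], point[1], 1.0])
    return x / w, y / w

# Illustrative corners: projector output mapped onto the panel's pixel grid.
projector = [(0, 0), (1280, 0), (1280, 720), (0, 720)]
panel = [(10, 5), (1670, 8), (1675, 1045), (8, 1050)]
H = homography_from_points(projector, panel)
```

With four exact correspondences the estimated homography reproduces the corner mapping exactly; warping the full image then amounts to resampling through H.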
While this problem theoretically could have been solved by adding a delay to the LCD panel, it was
not the only issue this first approach suffered from: the Infocus DepthQ projector is a DLP device,
whereas the Samsung 2233RZ uses an LCD panel. Not only do the technologies used here inherently
differ, the resulting refresh cycles also differ: the single-chip DLP projector sends the red, green and blue
components of the image sequentially towards the front modulator. The LCD panel, on the other
hand, refreshes all of its pixels from top to bottom, where red, green and blue components are refreshed at
the same time. When comparing the two timelines next to each other, it becomes obvious that, depending
on the order in which the DLP projects images, parts of the LCD do not get any image in that color. The solution
would have been to remove the color wheel altogether and only use a monochromatic back modulator (as
in [54]), or to use a three-chip DLP projector, solving the display technology synchronization issue; but
the synchronization delay issue on the software / driver side would still have been apparent.
The approach of using a projector as the back modulator for a HDR stereo display was therefore abandoned and another approach, using layered LCD panels, was pursued.
4.2 Dual Layer LCD Approach
Due to the multiplicative nature of light modulators, all known approaches to producing a HDR display
use two (or more) modulators. In the first prototype the DLP represented the first modulator and the LCD
panel the second. Dissecting a DLP projector more closely reveals that a DLP projector in fact consists
of a bright light source and a Digital Mirror Device (DMD). Instead of the DMD and a single LCD
panel, the second prototype uses two Samsung 2233RZ LCD panels mounted directly in front of each
other. This moves the back modulator closer to the front modulator and therefore does not
require distance in depth to achieve a decently sized image. A schematic of the dual layer LCD approach
is presented in figure 4.2. An array of 6 by 4 high power LEDs acts as a very strong, uniform backlight,
which is described in more detail in section 4.4.
A major advantage of using two LCD panels is that the same model can be used as both the front and
the back panel. The pixel alignment also becomes very simple, since the resolutions match up and the panels
can be aligned along a common edge. Using the same panel twice also allows stereoscopic content to be
viewed with fewer synchronization issues, as the internal image delay of both panels is very likely to
be nearly identical.
4.3 Polarization
Due to the stacking of multiple LCD layers, it is important to understand the polarization effects that
occur as the layers of polarization foil, from which liquid crystal displays are built, interact optically.
Figure 4.2: Schematic of the dual layer LCD approach.
In order to fully comprehend the further steps taken, it is important to understand the basic principle of
how LCD panels work. A chromatic LCD panel typically consists of four layers:
• A polarizer (polarization foil)
• A color filter depending on the color of the pixel (red, green or blue)
• A Spatial Light Modulator (SLM), in the case of this prototype based on Twisted Nematic (TN) cells
• An analyzer (polarization foil rotated 90° to the polarizer)
Light passing through the LCD panel is first linearly polarized by the polarizer, and wavelengths
of unwanted colors are filtered out by the color filters of the individual pixels. The spatial light modulator then rotates the polarization of the light depending
on the state of the twisted nematic cell. The more the polarization of the light is
rotated towards the direction allowed to pass the following analyzer, the more light passes the analyzer
and the brighter the final pixel will appear.
As an example, if the twisted nematic layer rotates the polarization by 45°, 50% of the light is absorbed and transformed to heat, whereas the other 50% passes the analyzer and hopefully reaches the
observer’s eyes. Due to this, light coming from the display is linearly polarized in a way which matches
the polarizer alignment of the shutter glasses. Even though the example states that a rotation of
45° manifests itself as a reduction of half of the light, the relation is not linear but instead follows Malus’
Law (figure 4.3):

I = I₀ · cos²(θ)
Figure 4.3: Malus’ Law. I0 corresponds to the initial intensity and θ to the angle between the
polarization directions of the polarizers.
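Malus’ Law is straightforward to evaluate; the following sketch reproduces the 45° example above:

```python
import math

def transmitted_intensity(i0, theta_deg):
    """Malus' law: intensity passing an analyzer rotated theta degrees
    relative to the incoming linear polarization."""
    return i0 * math.cos(math.radians(theta_deg)) ** 2

# A 45-degree rotation passes half the light, as in the example above;
# a 90-degree (crossed) analyzer blocks it entirely.
half = transmitted_intensity(1.0, 45.0)     # 0.5
blocked = transmitted_intensity(1.0, 90.0)  # ~0.0
```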
4.3.1 Rotated around Y-Axis
When stacking two LCD panels on top of each other, it becomes clear that the analyzer of the back
panel and the polarizer of the front panel are aligned perpendicularly, preventing any light from passing
through. The obvious solution would be to rotate the front panel by 180° around its vertical axis, causing
the analyzers to be next to each other (see figure 4.4). This will only work if the linear polarization filters
are aligned with the rotation axis. In that case, the analyzer of the front panel and the analyzer of the back
panel would align in polarization direction and would, when using a perfect polarizer, let the entire light
pass through. Of course there is always some loss of light, even if polarizers are aligned perfectly.
Figure 4.4: If the polarization of the back panel analyzer and front panel polarizer have to match and
the polarization is aligned or perpendicular to the y axis of the display, one of the panels can be rotated.
This causes the polarizer and analyzer to swap roles and invert the flow of light.
In general though, polarizers and analyzers of any modern LCD panel are not oriented along the vertical
axis. Therefore, the trick of placing both analyzers next to each other by rotating the front (or back) panel
will not work (as explained in figure 4.5).
Figure 4.5: In practice, polarization foil is not aligned with the display axes but oriented at 45°. This
makes the polarization conflict invariant under rotation about the vertical axis.
4.3.2 Rotated around Z-Axis
Another quite viable solution is to rotate one of the displays 90° around the z-axis, as shown in figure 4.6.
This has three negative aspects: firstly, the visible area is reduced to a square with a lateral
length equal to the height of the original panel. The remaining areas of the panels are then on either side, or on
top and bottom, of the display and cannot be used for double modulation, as the light path would only pass
through one display.
Secondly, the fact that the LCD panels used are chromatic and not greyscale adds another problem
to this approach, visually explained in figure 4.7: with chromatic displays, pixels are not square but
consist of rectangular stripes of red, green and blue sub pixels, or are set up in a Bayer grid pattern. Aligning the displays
perpendicularly causes the sub pixels to cover each other, further reducing light throughput, as the light
path for a majority of the pixels would pass through two different color filters, essentially blocking most
of the light.
Finally the alignment of the panels becomes far more difficult as they would not share a common edge to
align with.
Figure 4.6: Rotating one of the panels by 90° would reduce the usable HDR area to the square of the
height of the display.
4.3.3 Using a Wave Retarder
Since disassembling the LCD panels and removing the polarization foil seemed too invasive and too
detrimental to image quality itself, it was first decided to try a wave retarder that would re-orient the
polarization in between the analyzer of the back panel and the polarizer of the front panel. Since optical
wave retarders of the size of a 22 inch LCD panel are rather expensive, cellophane film was tried as a
wave retarder. Ortiz-Gutiérrez et al [42] describes the optical aspects of cellophane film in the context of
Figure 4.7: Each individual pixel in a screen consists of 3 sub pixels for red, green and blue which are
aligned as stripes. Rotating one of the panels would cause the overlapping sub pixels to mismatch and
therefore reduce brightness.
polarization and praises the wide band of wavelengths that cellophane film rotates by a half-wave. While the
effect could be reproduced, the cellophane film’s spectrum of operation was not as wide as required,
leaving longer wavelengths in their original state and therefore not rotating the full spectrum of white
light. This manifested itself as a yellow tint, as the bluish components of white light were absorbed by the
front panel polarizer.
4.3.4 Inverted Mode
The next idea involved removing the analyzer of the back panel and operating the back panel in an
inverted mode. This also inverted the task of the twisted nematic cells: instead of rotating the polarization
of the light that is supposed to pass the analyzer towards the orientation of the analyzer, the cells were
now responsible for manipulating the polarization of the light that was supposed to be absorbed by the
analyzer. Therefore, following the light path from the light source through the display to the eye, the
following is supposed to happen:
• Light passes through the polarizer, removing light not aligned vertically.
• The remaining light passes through the spatial light modulator (SLM), which rotates light away from
the vertical polarization direction when activated, and lets the light pass through with its vertical
polarization when deactivated.
• The polarizer of the front panel acts as the analyzer for the back panel. As it is also aligned
vertically, it absorbs light that was rotated by the SLM.
• The front panel spatial light modulator rotates light towards the horizontal polarization direction of
the front panel analyzer.
• Finally, the front panel analyzer absorbs light that is not aligned horizontally. This is light that was
not affected by the SLM.
In this setup the two LCD panels operate in two different modes: the front panel works in the classic
mode, where the front panel and back panel polarizer orientations are perpendicular to each other. The
back panel acts in an inverted mode, where the analyzer and polarizer are aligned with each other.
Therefore, in order to produce a white pixel on the back panel, the intensity value has to be set to zero,
whereas on the front panel a white pixel becomes visible if the pixel is set to 255. This can be achieved transparently for all applications by modifying the color lookup table in the graphics driver. This table is used by
the graphics driver to map 8 bit red-green-blue tuples to values that are then sent to the display device.
In order to invert the colors back to normal, it was therefore possible to set a color lookup table that
decreased rather than increased with increasing index value, mapping higher index values to lower intensity
levels, which in inverted mode caused the twisted nematic cells to rotate less light away from the analyzer
polarization orientation.
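The descending lookup table can be sketched as follows (a minimal illustration of the ramp itself; the actual driver interface for installing it is not shown):

```python
def inverted_lut(bits=8):
    """Descending ramp: drive level i is mapped to (max - i), so that a
    requested white (255) drives the inverted back panel with 0."""
    max_level = (1 << bits) - 1
    return [max_level - i for i in range(max_level + 1)]

lut = inverted_lut()
```

Applied per color channel, this makes the inversion transparent to every application drawing to the back panel.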
The main problem with this approach was that a lot of light passes the twisted nematic cells without
being influenced by their structure, maintaining the polarization as mandated by the polarizer. In the
normal case this is not a problem, as the unaffected light would not pass the analyzer but would instead be
absorbed. Since the analyzer and the polarizer of the back panel were now aligned, this was no longer the case,
and contrast was reduced by two orders of magnitude.
4.3.5 Center Analyzer removed, Back Panel Polarizer rotated
The final approach was to rotate the polarization of the back panel by 90°. This involved removing the
analyzer as well as the polarizer of the back panel. Since the polarizer of the front panel has the same
alignment as the analyzer of the back panel, it was not necessary to add another (rotated) analyzer. This
allowed the polarization foil in between the two spatial light modulators to act as both polarizer and
analyzer at the same time. Completely removing one analyzer improved light throughput (even if the
incident light is perfectly aligned with the polarization foil, there is still some amount of light that is
absorbed) and reduced the chance of the back panel being misaligned with the front panel, which would
have further reduced transmissivity (see figure 4.8).
4.4 Backlight Construction
A typical color LCD panel has a transmissivity of about four percent. Double modulation using two LCD
panels therefore requires twenty-five times the amount of light in order to achieve the same brightness.
In the demonstrated setup this was improved slightly by removing a redundant analyzer and changing the
polarization of the back panel. Nonetheless, a strong backlight is an essential part of building a high
dynamic range display with two stacked layers of chromatic LCD panels.
Additionally, double modulation allows finer control of pixel intensities, since the transmissivity
level of every pixel is controlled by two, instead of only one, 8 bit values per color channel. This additional
bit depth can be used either to increase the accuracy of the displayed intensity or to support a brighter
peak luminance while keeping the accuracy the same. Due to the light loss of the two LCD panels it was
Figure 4.8: Final working approach by removing the back panel analyzer and replacing the back panel
polarizer with polarization foil rotated by 90°.
decided to build the brightest possible backlight within the cooling and budgetary constraints. A picture
of the final backlight unit can be found in figure 4.9.
Figure 4.9: The final backlight consisting of 24 LEDs aligned in a six by four grid.
4.4.1 Light Sources
In recent years Light Emitting Diodes (LEDs) have drastically improved their luminous efficacy
and were an obvious choice for such a backlight. The final LED array used was the Bridgelux BXRA-C2002 (as shown in figure 4.10), because two of those LEDs can be powered by one Mean Well LPC
60-1570 constant current power supply. In this configuration every LED produces about 2270 lumen of
white light while using 28.875 watts of power. In order to fit the aspect ratio of the screen and keep a
smooth lighting across the surface, twenty-four LEDs were aligned in a six by four regular grid.
Figure 4.10: Closeup of the LED light sources.
Even though LEDs are highly efficient at producing bright white light, a vast amount of energy is still
converted to heat, which has to be accounted for and therefore cooled. Custom-building a heat spreader
that would fit the screen perfectly would have been the optimal solution but was out of the scope of the
project. So twenty-four standard CPU socket coolers, one per LED, were mounted together in a rectangular shape. Since the luminous output of a LED decreases with the temperature the LED is operated at,
excess cooling would only manifest itself as a brighter final image. The coolers used are designed for
processors with a thermal design power (TDP) of 65 watts and provided good enough cooling, even in
a vertical alignment of multiple heat spreaders.
4.4.2 Layout
In order to determine the number of LEDs and their layout, an application was written to simulate the
light pattern reaching the display if it were to be placed at a specific depth. The application considered
parameters including the number of LEDs used, the layout and size of the grid, as well as the type of
LED used. The simulator went as far as considering the light distribution pattern of the LED, as the
light intensity is not evenly distributed across the hemisphere in front of the LED. The output of such a
simulation included the light intensity in cd/m², the variance of the light across the surface, information
on the required power supply and the cost of the parts. The resulting light pattern was also presented, as
shown in figure 4.11.
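The core of such a simulator might look like the following sketch, where a Lambertian emission pattern stands in for the measured LED light distribution and the grid pitch is an illustrative assumption:

```python
import math

def irradiance_at(px, py, leds, depth, power=1.0):
    """Summed LED contributions at panel point (px, py), with the panel
    plane `depth` metres in front of the LED plane. Each LED is modelled
    as a Lambertian point source: cos(angle) emission over squared distance."""
    total = 0.0
    for lx, ly in leds:
        d2 = (px - lx) ** 2 + (py - ly) ** 2 + depth ** 2
        cos_theta = depth / math.sqrt(d2)
        total += power * cos_theta / d2
    return total

# A 6 x 4 grid with an assumed 8 cm pitch, evaluated 7 cm in front of it.
leds = [(0.08 * i, 0.08 * j) for i in range(6) for j in range(4)]
center = irradiance_at(0.20, 0.12, leds, depth=0.07)
corner = irradiance_at(0.0, 0.0, leds, depth=0.07)
```

Sampling this function over the panel surface yields a hotspot map comparable to figure 4.11 and makes the brightness variance across the surface easy to quantify.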
The final placement of the backlight was determined by practical constraints: the array was placed at about
seven centimeters from the panel to spread the light contribution further across the backlight diffuser,
thereby avoiding luminous hotspots. It also aided cooling, as it provided enough space for air to flow
freely between the LEDs and the panels.
Figure 4.11: Visual result of the backlight simulation. The white hotspots represent the LEDs that are
aligned in a 6 by 4 grid in this iteration. The simulation takes the layout, the type and the light intensity
distribution of the individual LEDs into account.
4.5 Visual Artefacts
Distributing the task of producing enough luminous power among twenty-four LED arrays made the
backlight quite diffuse, but it still produced visible hotspots on the final image. An additional three
millimeter thick diffuser with a transparency of 42% was therefore added, which completely removed
the hotspots and made them undetectable to human vision.
Another disturbing artifact was a brightness variation depending on the viewing angle, which was not
uniform across the screen. It resembled a sinusoidal grating repeating itself multiple times, with a frequency
that depended on the distance of the viewer to the screen. It is important to note that the effect
was only visible horizontally, not vertically. A closer observation of an LCD panel provided a plausible
explanation for this effect, although no proof has been found to verify it:
Viewing a single LCD panel under a microscope reveals the red, green and blue sub pixels, which
are aligned in vertical stripes. Stacking two panels perfectly on top of each other will align the vertical
stripes as well. Because physical pixels do not only have a width and a height, but also a depth that light
has to pass through, a light ray from the backlight may traverse into neighboring sub pixels instead of
the corresponding front panel sub pixel.
Therefore, a light ray perpendicular to the screen will probably pass through the corresponding sub pixels
in both panels, but a light ray at a different angle will – depending on the entry position of the ray at the
back panel – traverse a sub pixel neighbor of the corresponding front panel pixel. As the retina of the
human eye is smaller than the visible area of the display all but the central vertical pixel stripe will have
a non-perpendicular angle to the viewer.
In a monochromatic display this would not be a big problem and could be accounted for digitally (see
Guarnieri’s work [22]) if the position of the eye is known, because the correspondences between the back
panel and the front panel pixels can be computed. The LCD panels used, however, have built-in color filters instead
of a separate layer, and their removal was not possible. Therefore the only solution to this problem
was to diffuse the resulting image using a diffuser. Of course this reduced contrast by a factor of about
three, but resulted in a much more appealing image.
5 Calibration
The motivation for calibrating any output device, such as a display, a printer or a projector, is to know
the exact luminance presented for a given display drive level. The goal of calibration therefore is to
find a function f that transforms the drive level to the output luminance L. The form of such a function is
given in figure 5.1. Such a function is called the transfer function of a device. A feasible implementation of
a transfer function is a lookup table that maps drive levels to output luminances.
L = f(PDL),    PDL = f⁻¹(L)
Figure 5.1: General transfer function mapping a pixel drive value to a presented luminance, as well as the
inverted function, mapping from a luminance to a pixel value.
An application intending to display a luminance of a given level L will, on the other hand, use the inverse transfer function to look up the display drive level that needs to be set in order to produce that luminance level. Inverting a transfer function generally incurs a loss of accuracy, as the resolution of the pixel drive level is usually in the range of eight to ten bits, whereas the requested luminance is of an analog nature. Therefore, luminances are usually mapped to their closest possible representation in the inverted transfer function.
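As an illustration, the lookup-table form of the transfer function and its nearest-match inversion might look as follows; the gamma-style response and the 300 cd/m² peak are assumed placeholder values, not measurements of the display described here:

```python
import numpy as np

# Forward transfer function as a lookup table: pixel drive level -> luminance.
# The gamma-like response and the 300 cd/m^2 peak are assumed placeholders,
# not measurements of the actual display.
lut = (np.arange(256) / 255.0) ** 2.2 * 300.0

def inverse_transfer(luminance):
    """Map a requested luminance to the drive level whose lookup entry is
    closest, accepting the quantization loss described above."""
    return int(np.argmin(np.abs(lut - luminance)))
```

Requested luminances outside the displayable range simply clamp to the nearest achievable drive level.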
Depending on the accuracy of the calibration algorithm, other factors such as the location of the pixel being calibrated, the position of the viewer, the temperature and possibly the luminances of neighboring pixels can be additional parameters of the transfer function. A tradeoff has to be made between the size and complexity of the lookup table and the generality of the transfer function. Environmental influences such as ambient light are implicitly measured and therefore included in the calibration without requiring special care. Other factors, such as the viewer's position, are usually ignored altogether.
Calibrating a display therefore consists of sampling the color space by displaying a set of samples and measuring the corresponding responses. Depending on the display device it might be infeasible to sample the full color space completely. In such a case linear interpolation or more advanced curve fitting can be used to fill missing samples with estimated measurements.
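For a one dimensional transfer function, such gap filling can be as simple as linear interpolation between the measured drive levels; the response curve below is an assumed stand-in for real measurements:

```python
import numpy as np

# Hypothetical situation: only every 15th drive level was actually measured.
measured_dl = np.arange(0, 256, 15)                  # 0, 15, ..., 255
measured_lum = (measured_dl / 255.0) ** 2.2 * 300.0  # assumed response curve

# Linear interpolation fills in the full 256-entry transfer-function table.
full_lut = np.interp(np.arange(256), measured_dl, measured_lum)
```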
The transfer function of a double modulator display, such as the one presented in this thesis, is different in that the pixel drive level is not a single value but a tuple of two values representing the front and the back panel. Full calibration of a double modulation display thereby requires iterating over both 8 bit drive levels, resulting in 2^16 = 65,536 required measurements per color component. Even though LCD panels consist of separate red, green and blue sub pixels, the color channels are assumed to be independent.
Two approaches to calibrating the display were attempted, differing in the way the presented luminosity is measured. The first attempt used a consumer grade DSLR camera and is described in the section below; the second used a colorimeter with a modified software stack.
5.1 Using HDR Image Recovery
A digital camera is at its core a light intensity measurement device capable of recording multiple samples at the same time. An image of a display presenting a calibration pattern contains the relative luminance values of the perceived calibration pattern. In order to recover those luminance values from the captured image, the locations of the individual pixels have to be associated with the pixels in the calibration pattern.
This association is created by determining the intrinsic camera parameters and a homography transform
representing the linear transformation from the display to the camera. Both can be measured by capturing
the image of a checkerboard pattern. First, corners in the reference checkerboard image and the captured
image are determined using OpenCV's findChessboardCorners. For further accuracy, the corners are refined using OpenCV's cornerSubPix, which improves the corner positions by inspecting the surrounding gradients.
After the exact corners have been located in the reference as well as the captured image, a 2D camera matrix is created using initCameraMatrix2D and calibrateCamera, providing the corners of the reference image as object points (extended to three dimensions) and the corners found in the captured image as the image points. calibrateCamera estimates the intrinsic and extrinsic camera parameters from multiple views, but only a single view is used here. The resulting intrinsic parameters are used to undistort the captured image; the extrinsic parameters are ignored. Finally, a homography is calculated using findHomography with the corners found by a call to findChessboardCorners on the undistorted captured image. The parameters are then stored as calibration data which can be reused for any calibration image captured, as long as the relative position between the camera and the display is not changed. The process of measuring the intrinsic and extrinsic camera parameters is explained in figure 5.2.
Figure 5.2: In order to find the lens distortion parameters as well as a homography transform, a checkerboard pattern is presented, captured as a single standard JPEG image, and compared to the presented checkerboard pattern using OpenCV. These parameters are then used to undistort the captured HDR images.
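The homography estimation that findHomography performs can be illustrated with a pure-numpy direct linear transform over point correspondences; this is a sketch of the underlying math, not the OpenCV implementation, and the coordinates used are arbitrary:

```python
import numpy as np

def homography_dlt(src, dst):
    """Direct linear transform: estimate the 3x3 homography H with
    dst ~ H * src from four (or more) point correspondences, the same
    problem cv2.findHomography solves."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # H (flattened) spans the null space of this matrix: take the
    # singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def warp_point(H, point):
    """Apply a homography to a single 2D point."""
    q = H @ np.array([point[0], point[1], 1.0])
    return q[:2] / q[2]
```

With the resulting 3x3 matrix, every pixel of the captured image can be mapped into the coordinate frame of the displayed pattern, which is what the warp-and-crop step does.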
Every calibration image captured is undistorted using the camera distortion parameters, homography
warped and cropped, aligning the pixels in the captured image with the pixels in the source calibration
pattern (explained in figure 5.3).
Figure 5.3: The images captured at multiple exposure times are recombined into an HDR image, then undistorted and homography warped in order to align the captured pixels with the calibration image.
Since a single exposure of a consumer DSLR cannot capture high dynamic range content, multiple exposures are used to recover it. Every HDR image was captured using 16 exposures ranging from 1/500 s to 30 s, the maximum range possible with the Canon EOS 400D used in this project. The HDR recombination uses the algorithm presented by Robertson et al. [52], which is implemented in the PFStools package. The implementation claims to be able to recover actual absolute candela values, provided by the Y component of the XYZ color space stored in the resulting OpenEXR files.
The calibration pattern used consisted of multiple patches of size 256 by 256 pixels. The red component
was increased by a step size of 4 units in the x dimension of the patch, whereas the green component was
incremented by the same step size in the y direction of the patch. From patch to patch the blue component
was also increased by a step size of 4 units. The larger step size allowed for a slight misalignment of the registration, because a single pixel measurement corresponded to four pixels in the captured image. In this configuration, two calibration images consisting of 24 patches and one consisting of 16 patches were required, allowing a full calibration to be performed by capturing three high dynamic range pictures using 16 exposures each.
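A patch of the described calibration pattern can be re-created along the following lines; the layout follows the text above, but the thesis' exact pattern may differ in detail:

```python
import numpy as np

def calibration_patch(blue, patch=256, step=4):
    """One calibration patch as described above: the red drive level grows
    along x, green along y, both in steps of 4 units covering a 4x4 pixel
    block each; blue is constant per patch. (A re-creation from the text;
    the thesis' exact pattern may differ in detail.)"""
    ramp = np.repeat(np.arange(0, patch, step), step)
    img = np.zeros((patch, patch, 3), dtype=np.uint8)
    img[..., 0] = ramp[np.newaxis, :]   # red varies with x
    img[..., 1] = ramp[:, np.newaxis]   # green varies with y
    img[..., 2] = blue
    return img
```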
Unfortunately the result of this calibration approach was unsatisfactory, as blooming effects in the longer exposures caused inaccurate measurements of the black levels. Also, the absolute luminosity values reported by the HDR recovery algorithm were questionable. It was therefore decided to try another approach using a device better suited to performing accurate measurements: a colorimeter.
5.2 Using an Intensity Measurement Device
A colorimeter is a device for sensing luminosities that allows color to be measured. At the hardware level a colorimeter is similar to a digital camera, as both are built using CMOS sensors, but the way the measurements are made differs drastically: a digital camera counts the number of photons reaching the individual CMOS cells in a given time frame called the exposure time. A colorimeter, on the other hand, measures the time taken until a specific number of photons reaches the CMOS sensor. The time required for a measurement using a colorimeter therefore depends on the measured brightness, with dark luminances (fewer photons) requiring a longer time to measure than bright luminances (more photons). A colorimeter reports a single measurement at a time.
Colorimeters are usually bundled with software for calibrating standard display devices. The calibration
algorithms sample the color space at various locations and fill the missing measurements by estimation.
Unfortunately the software is not suited for use with double modulation displays and custom software
was required for calibration.
Fortunately the open source color management suite Argyll [21] includes support for the Datacolor Spyder 3 device that was chosen. Parts of the calibration software were modified to allow triggering a single measurement with the device and retrieving the resulting color. Due to the time required per measurement it was decided to calibrate for colorless luminosities only, in order to keep JND calibration feasible.
Using this interface, a Java application was developed that sampled the 256 by 256 (8-bit front panel, 8-bit back panel) calibration space using recursive refinement. The algorithm first measures the most extreme points of the calibration space, namely (0, 0), (255, 0), (0, 255) and (255, 255). After this first iteration the sampling space is subdivided on both axes and measurements are taken at (0, 127), (127, 0), (127, 127), (127, 255) and (255, 127). Further iterations continue the subdivision in the same way.
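The measurement schedule can be sketched as a generator; the ordering within one iteration is illustrative:

```python
def sample_order(lo=0, hi=255):
    """Yield (front, back) drive-level pairs in the recursive-refinement
    order sketched above: extremes first, then midpoints of ever finer
    grids (the ordering within one iteration is illustrative)."""
    seen = set()
    levels = [lo, hi]
    while True:
        for front in levels:
            for back in levels:
                if (front, back) not in seen:
                    seen.add((front, back))
                    yield front, back
        # subdivide: insert the midpoint between every adjacent pair
        mids = {(a + b) // 2 for a, b in zip(levels, levels[1:])}
        refined = sorted(set(levels) | mids)
        if refined == levels:
            return  # no new levels: the full grid has been emitted
        levels = refined
```

Run to completion, the generator eventually emits all 256 x 256 combinations, but the coarse early iterations already cover the whole space evenly, so the process can be stopped at any point.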
In order to compensate for the brightness drop caused by the backlight heating up over time, each measurement was preceded by a measurement of the current peak white luminance. This allowed the measured intensities to be stored relative to the current maximum instead of as absolute values, and therefore reduced the effect of the changing backlight intensity.
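This drift compensation can be sketched as follows, where measure stands in for a hypothetical device read-out function taking front and back drive levels:

```python
def normalized_measurement(measure, pattern, white=(255, 255)):
    """Take one sample relative to the current peak white. `measure` is a
    hypothetical read-out function taking (front, back) drive levels and
    returning a luminance; reading peak white immediately before each
    sample cancels slow backlight drift."""
    peak = measure(*white)
    return measure(*pattern) / peak
```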
5.3 Crosstalk Calibration
Multiplexed stereoscopic display setups have an additional parameter that can be measured and for which compensation techniques can be calibrated: crosstalk.
Crosstalk is caused by the fact that multiplexed stereo setups have a component interleaving the signal (e.g.
the display) and a component separating the interleaved channel into two distinct channels (the glasses).
Ghosting is the visual result when this multiplexing or de-multiplexing step does not perform accurately. The term stems from the semitransparent copies of an object that appear to its left and right, offset according to the distance and therefore the disparity of the pixels at a given depth. An illustration of what ghosting looks like can be found in figure 2.2.
If the amount of crosstalk for a given pixel is known, a compensation technique can be employed to
reduce this effect. A few common crosstalk compensation techniques are discussed in the related work.
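One such technique, subtractive compensation with a single global leakage factor, can be sketched as follows; this is a generic textbook scheme for illustration, not the specific method used in this thesis:

```python
import numpy as np

def compensate(left, right, c=0.05):
    """Subtractive crosstalk compensation with a single global leakage
    factor c (a generic illustration, not this thesis' method):
    pre-subtract the expected leakage so that, after a fraction c of the
    other channel leaks in, each eye sees the intended intensity."""
    comp_left = np.clip((left - c * right) / (1.0 - c * c), 0.0, 1.0)
    comp_right = np.clip((right - c * left) / (1.0 - c * c), 0.0, 1.0)
    return comp_left, comp_right
```

After display, each eye receives its compensated channel plus a fraction c of the other, which reproduces the intended values whenever the clipping did not engage; dark pixels paired with bright ones on the other channel cannot be fully compensated, which is the known limitation of subtractive schemes.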
There are two categories of crosstalk: system crosstalk and perceived crosstalk. System crosstalk is the amount of physical light that unintentionally leaks from one channel to the other. It can be measured and depends on the intensity values of the pixels at the same physical location, but is usually independent of the image content in general. Perceived crosstalk, on the other hand, is the amount of crosstalk that a user actually perceives and therefore depends heavily on the image content. In a high frequency image, for instance, a ghosted contour might not be as visible as the ghosted contour in an otherwise low-frequency image. Due to this dependency on image content, calibration for perceived crosstalk is usually not attempted.
The physical crosstalk of a stereo display system can be measured by capturing an image through the
shutter glasses. For every combination of the left and the right intensity value, the image for calibration requires two samples: one containing the intended intensity on both eye channels, representing the state without ghosting, and one containing the intended intensity on one channel and the unintended one on the other channel. An example of what such a calibration pattern could look like is given in figure 5.4.
Figure 5.4: Image for the left eye, image for the right eye and the reference map: a) denotes the area
with an intended absolute black and without ghosting, b) denotes the area with absolute white (no
ghosting), c) denotes the area of an intended black ghosted with pure white and d) the area with
intended black ghosted with pure black. The ghosted areas of course swap roles if the calibration image
is taken through the opposite eye.
As an example, if the crosstalk of the display at the maximum inter-ocular contrast (one eye white, the
other one black) is supposed to be measured, four color samples need to be presented on the display,
captured by the camera and retrieved from the captured image: the white value without any crosstalk, the
white value with crosstalk, the black value without any crosstalk and the black value with crosstalk. The
intensity ratio between the sample with ghosting and the one without ghosting then denotes the amount of
system crosstalk produced by the pair of intensities. The formulas for calculating the amount of crosstalk in percent for the white and the black intended intensities are given in figure 5.5.
Due to the large number of samples required to fully calibrate for crosstalk, only the crosstalk values for the contrast combinations used in the user study were measured using this approach.
CTBI = Ic / Id ,   CTWI = 1 − (Ic − Ia) / (Ib − Ia)
Figure 5.5: Using the calibration chart presented in Figure 5.4, the amount of crosstalk can be calculated independently for an intended white as well as an intended black image. Ia, Ib, Ic and Id denote the intensities measured at areas a, b, c and d of the pattern; CTBI defines the crosstalk for the black intended intensity, whereas CTWI denotes the amount of crosstalk for the white intended intensity.
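In code, a normalized system crosstalk value can be computed along these lines, using the common leakage-over-range definition, which may differ in detail from the exact formulas of figure 5.5:

```python
def system_crosstalk(i_ghosted, i_clean, i_white, i_black):
    """Fraction of the full intensity range that leaks in from the
    unintended channel: the difference between the ghosted and the clean
    sample, normalized by the white-to-black range (a common definition,
    not necessarily the thesis' exact formula)."""
    return (i_ghosted - i_clean) / (i_white - i_black)
```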
5.4 Just-Noticeable-Difference Mapping
Even though a display might be able to present a large number of grey levels, not all of them may be distinguishable. In medical display scenarios it is important for the observer to be able to distinguish between two grey levels even if they are close to each other in the tonal space. The rendering algorithm should therefore map every single grey level to a level that is distinguishable from all other grey levels in the same image.
The minimal brightness step required to distinguish one brightness level from the next is called the Just Noticeable Difference (JND) and is described by Weber's Law (figure 5.6). The size of the just noticeable difference step therefore grows with increasing absolute intensity.
∆I / I = k

Figure 5.6: Weber's Law. ∆I/I denotes the Weber (also known as Fechner) fraction. The law states that the incremental threshold step over a background intensity relates in a linear way to the background intensity.
In order to present a JND mapped image, every distinct grey level in the image is assigned to a JND step from the calibration table. If there are more distinct grey levels in the source image than the display is capable of presenting, some levels have to be assigned to the same JND step, and those grey levels become indistinguishable from each other.
Producing a just noticeable differences (JND) table from the calibration data of a display is a matter of finding a sequence of intensities where every subsequent intensity produces a luminance that is at least one JND step greater than the luminance produced by the previous intensity.
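A greedy construction of such a sequence can be sketched as follows; the constant Weber fraction k is an assumption made for illustration:

```python
def jnd_table(luminances, k=0.02):
    """Greedy JND sequence: from the achievable luminances, keep each
    value that exceeds the previously kept one by at least the Weber
    fraction k (an assumed constant; real displays would use a
    luminance-dependent threshold model)."""
    steps = []
    for lum in sorted(luminances):
        if not steps or lum >= steps[-1] * (1.0 + k):
            steps.append(lum)
    return steps
```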
In the case of a double modulator display, there might be multiple combinations producing the same or similar luminosity values. In such cases, the JND calibration involves choosing among multiple combinations of front and back panel drive levels. The criterion that has to be met is that the step from one luminance to the next must be at least the size of the JND at the given intensity. Choosing a value at a given step changes the possible combinations at the next step. Therefore, there are multiple paths through the two dimensional array of luminances which obey the rules of JND calibration, but the number of possible JND steps can vary greatly.
Bimber et al. [5] present an algorithm that optimizes the path through the two dimensional array. It
works by sampling multiple curves of the form y = x^σ – referred to as the basis function – through the two dimensional array. The basis function is chosen such that all possible values can be reached by optimizing for a single parameter (σ). "For each theoretically possible JND step (j) with luminance Lj we choose a set (Cj) of gray scale candidates (c ∈ Cj) that leads to reproducible luminance levels (Lc) larger than or equal to Lj, and whose shortest (x, y)-distance (∆c) to our basis function is not larger than a predefined maximum (∆). From each Cj, we select the candidate sj ∈ Cj that is closest to Lj." [5]
The algorithm used for this thesis is the same, but the constraint of having front and back panel values close to the basis function is relaxed, as it is only beneficial if the image should remain comprehensible without the second modulator. In Bimber et al.'s work one of the modulators is a printout and the second modulation is produced by a projector tracking the printout and aligning the projected image with it. In that case, the proximity of the front and back values compensates for a slight misalignment of the tracking algorithm. For this prototype, as the LCD panels are tightly stacked on top of each other, the amount of misalignment is minimal and constant. Visual banding effects, such as those described in [5], are not apparent.
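The candidate search described by Bimber et al. [5] can be sketched along the following lines; the vertical distance to the basis curve stands in for the shortest distance, and the luminance data used for testing is synthetic, so this is a simplified illustration rather than the thesis' implementation:

```python
import numpy as np

def jnd_path(lum, targets, sigma=1.0, delta=0.1):
    """Sketch of the candidate search in [5]: lum[x, y] holds the measured
    luminance for front drive level x and back drive level y. For each
    target luminance Lj, candidates are drive pairs whose (vertical)
    normalized distance to the basis curve y = x**sigma is at most delta
    and whose luminance reaches Lj; the candidate closest to Lj wins."""
    n = lum.shape[0]
    xs = np.arange(n) / (n - 1)
    basis = xs ** sigma                                 # basis curve, normalized
    ys = np.tile(np.arange(n) / (n - 1), (n, 1))        # y coordinate per cell
    near = np.abs(ys - basis[:, None]) <= delta         # candidate mask
    path = []
    for target in targets:
        ok = near & (lum >= target)
        if not ok.any():
            break                                       # no reachable candidate
        err = np.where(ok, lum - target, np.inf)
        path.append(np.unravel_index(np.argmin(err), err.shape))
    return path
```

Increasing delta, as done for this prototype, simply widens the candidate mask around the basis curve.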
5.5 Results
The display calibration yielded a maximum contrast of 2400 to 1. The peak luminance is at 300 cd/m², but quickly declines to about 240 cd/m² once the backlight heats up. The contrast then decreases to about 1900 to 1, which still corresponds to about three times the contrast of a single original panel. Using the Just Noticeable Difference algorithm presented above, the display can present 446 JND steps, about one third more than the original display.
Figure 5.7 shows the resulting paths in the two dimensional array of display drive levels. The luminance is color coded: red is associated with the maximum luminance of 296.82 cd/m² and dark blue with the lowest luminance of 0.148 cd/m². The different paths correspond to the curves determined by the algorithm presented in [5] using curve hulls of 0.025, 0.05 and 0.5. The curve hull (∆) value limits the maximum Cartesian distance between the front and back panel drive values. The greater this limit is, the more candidates can be chosen from at each JND step, allowing for more potential combinations of front and back panel drive values. Accordingly, the path with the largest delta can recreate 446 of 463 possible JND steps within that range and was accepted, since no side effects from the higher delta could be observed.
There are multiple reasons why the contrast of the display is not as high as originally anticipated. First, the manually reapplied polarization filters are neither as well aligned nor of the same quality as the original polarization filters on the LCD panel. If this approach were applied using the same process that was employed to construct the original panel, the achieved contrast would be substantially higher.
Secondly, the display was not constructed in a cleanroom environment. There is always some amount of dust destroying the polarization of the light between the layers. This could also be avoided if the model was built in a production environment similar to that of the original panel.
Figure 5.7: The front / back panel combinations chosen by the JND calibration algorithm presented in [5]. The value on the x-axis is associated with the display drive level for the front panel, the value on the y-axis corresponds to the value of the back panel. The color-coded value corresponds to the presented intensity in cd/m². The blue, red and green paths correspond to hull values of 0.025, 0.05 and 0.5.
Another reason stems from the alignment of the panels. Rows of pixels within an LCD panel have some amount of spacing between them. This area completely blocks the light, regardless of the values set for the pixels, so the corresponding area of the front panel does not receive any light from the back panel. If the pixels were all perfectly aligned and the modulators absolutely flat, this would not be a problem, as the light would pass through the pixel completely and never cross from a pixel area to a blocked area and vice versa.
In the prototype the two layers are placed as closely together as possible, but the physical thickness of the panels, and especially of the polarization foil, gives a pixel a depth of some millimeters. If the observer is positioned perpendicular to the display surface, the light passes through the pixel perfectly. The problem is that the observer's eye is much smaller than the display surface, so every pixel except the one the observer is directly in front of will be viewed at an angle. The effect is even worse in the horizontal case.
Color in LCD panels is achieved by having separate sub pixels filtered by either a red, green or blue color
filter. The sub pixel elements are aligned next to each other and are small enough to be indistinguishable by the human vision system. The three separate image channels are merged in an additive color mixing fashion, allowing colors other than red, green or blue to be presented.
The sub pixels are rectangles aligned next to each other, three times as high as they are wide, so that three sub pixels placed side by side form a square. When stacking two LCD panels on top of each other as described before, the sub pixel alignments only match in the perpendicular case. At all other angles, light passes from one sub pixel element to a sub pixel element filtered by a different color filter. As a result, the initially white light is filtered by two different color filters, which absorb most of the spectrum. The problem is explained visually in figure 5.8.
Figure 5.8: The influence of the space between the color filters of the back and the front panel. The left image indicates the ideal situation, in which separation is minimal, causing few light rays to be terminated by passing through color filters of different wavelengths. The right image shows the current situation, which causes a sinusoidal grating to become visible, depending on the viewing position, if not compensated by a front diffuser.
Chapter 6 - User Study
6 User Study
Related studies have shown that crosstalk leads to discomfort, headaches and stereo sickness ([25], [44]).
It has also been shown that stereo pairs with a higher disparity are more difficult to fuse into a single image, because the accommodation-vergence mismatch grows with increasing disparity. If the disparity lies beyond Percival's zone of comfort, and the coupling of vergence and accommodation therefore provides inconsistent signals, the viewer will not fuse the images without discomfort [2].
Crosstalk on the other hand is known to have a high impact on stereo perception, and ghosting becomes
visible and disturbing even for very low amounts of leakage between the two image channels. What –
to our knowledge – is missing is the correlation between all of those factors. How does contrast relate
to crosstalk? Is a higher contrast even really beneficial or is the technical effort required better invested
into reducing the amount of crosstalk? Does artificially added crosstalk influence crossed and uncrossed
stereoscopic viewing in the same way?
Most user studies concerning stereoscopic displays involve questionnaires trying to estimate the discomfort a user feels, e.g. while watching a movie over a longer period of time. One goal for this user study was to obtain objective measurements from the user instead of subjective ones. This is important because subjective measurements involve factors that objective ones do not, such as the motivation of the participant. Furthermore, an effort was made to keep the parameter space as compact as possible in order not to complicate the matter further and to leave less room for interpretation.
As the contrast levels produced by the high dynamic range stereo display are not as high as initially anticipated, the user study cannot give definitive answers on how contrast, crosstalk, disparity and the kind of stereo image (crossed / uncrossed) relate to each other, but it should at least give some indication of the impact these various factors have on each other.
One of the key aspects of stereoscopic viewing is stereopsis. Stereopsis matches features between the two monocular images and works even if all depth cues other than features offset by disparity are missing. It was assumed that – as detecting features in both images is required – contrast plays a major role in stereopsis, as previous related work [23] has shown that – albeit at very low contrast ratios – stereo acuity depends on contrast. A well-known approach to presenting depth images that contain no depth cues other than stereopsis are Random Dot Stereograms (RDS).
6.1 Constructing Random Dot Stereograms
First presented by Béla Julesz in 1960, Random Dot Stereograms have been employed in many user studies and have also found their way into books such as the "Magic Eye" series. Variations including color and even animated Random Dot Stereograms are possible.
A RDS is usually created in two steps: First, the left image is filled with random black and white dots.
Then, a copy of that image is created for the right image. In this right image, the area to be offset in depth
is shifted by the desired amount of disparity. Finally, the space between the original area and the shifted area is filled with new random dots in order to avoid having the same random dot pattern twice. The process is explained visually in figure 6.1.
Figure 6.1: Process of constructing a Random Dot Stereogram. First, both images are filled with the
same pattern. Then, the region(s) that should have disparity are displaced and the resulting holes are
finally filled by new random dot patterns.
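The construction steps above can be sketched directly in numpy; the image size, box location and disparity are arbitrary illustration values:

```python
import numpy as np

def random_dot_stereogram(height=256, width=256, box=(96, 96, 64),
                          disparity=8, seed=None):
    """Minimal RDS following the steps above: fill the left image with
    random dots, copy it, shift a square region horizontally by
    `disparity`, and refill the uncovered strip with fresh dots.
    Sizes and the box location are arbitrary illustration values."""
    rng = np.random.default_rng(seed)
    left = rng.integers(0, 2, size=(height, width), dtype=np.uint8) * 255
    right = left.copy()
    y, x, s = box
    # displace the square region to create disparity
    right[y:y + s, x + disparity:x + s + disparity] = left[y:y + s, x:x + s]
    # fill the hole left behind with new random dots
    right[y:y + s, x:x + disparity] = \
        rng.integers(0, 2, size=(s, disparity), dtype=np.uint8) * 255
    return left, right
```

Viewed stereoscopically, the shifted square appears at a different depth than the surrounding plane, even though neither image alone reveals its outline.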
A single RDS can have multiple areas placed at different depths, but it is important to note that the depth
of the surface has to be constant across the surface. This means that only flat surfaces, perpendicular to
the viewing axis, can be represented effectively within RDSs.
The approach to constructing a RDS presented here is the most basic one; more advanced algorithms exist that accept a depth image as input and deal with texture patterns to construct more visually appealing stereograms in real-time on a GPU [49].
6.2 Test Patterns and Parameter Discussion
Initially, it was planned to confront the user with a task that involved detecting multiple square surfaces and determining whether they are in front of or behind a reference surface. While the output of this test is a value with a high resolution (the number of squares detected correctly), it also has many parameters influencing the result: the size of the individual squares, their position on the screen as well as possible relations to each other. This kind of test pattern also allows comparing multiple squares to each other, so a user could detect a square in front of the screen simply because it differs from the other squares, rather than by correctly fusing the image.
The task would also involve other cognitive subtasks such as counting. It was therefore decided to reduce the recognition task to detecting a single quadratic surface and to generate the required precision by repeating the task multiple times. This narrowed the parameter space down to the settings of interest alone, as the other parameters, such as the size and position of the square, are fixed and remain the same for all tasks that the user is confronted with. After the presentation of each single stimulus image, the participant decides whether the square was in front of or behind the reference surface by pressing
the up key if the surface appeared further away and the down key if it appeared closer than the reference surface, which was always at a single pixel disparity. This single pixel disparity for the reference surface is necessary to force the display to switch pixels; without it, the contrast ratios of the quadratic surface and the reference plane would differ.
The stimulus was presented for one second, and in between the stimuli a cues-consistent cross with no disparity was presented as a fixation stimulus, allowing the participant's eyes to converge at the image plane. The fixation stimulus was presented until the user answered the previous task with a key press and pressed the space bar to trigger the next stimulus. This allowed users to take a break if needed, but only one of the participants requested a short break of about 5 minutes.
A session for one participant consisted of 750 repetitions of the same task with the display parameters
varying in three dimensions: contrast, crosstalk and disparity.
All pixel values for the front and back panel used for presentation were manually selected in order to achieve uniform display brightness across the different contrast, crosstalk and disparity levels. The actual amounts of crosstalk and contrast were measured afterwards (using the same methods as described in chapter 5), which leads to the graphs having a non-orthogonal area containing the actual sample values. The highest contrast level was measured at 600 to 1, with peak luminance reaching an estimated 70 cd/m² when observed through the shutter glasses.
The contrast level ranged from 0.1 percent of the maximum contrast (about 2:1) up to the maximum contrast in five discrete steps at 0.1, 1.0, 10.0, 50.0 and 100.0 percent. The 50.0 percent step was introduced to provide finer resolution at the higher contrast levels, which were of particular interest. The actual contrast levels varied due to the influence of the crosstalk parameter. Measurements of the chosen contrast levels revealed contrast values of 5:1, 18:1, 61:1, 130:1 and 540:1 when averaged over all crosstalk levels.
The amount of crosstalk is a rather unreliable parameter, as the actual crosstalk caused by the display system is not known exactly and varies with the contrast that the display is required to present. The crosstalk dimension was sampled at five different levels and included crosstalk levels far above what any user would accept when viewing a stereoscopic image. Figure 6.2 depicts how the crosstalk steps (labeled 0.0, 0.1, through to 0.4) correspond to the actual measured crosstalk using the approach described in the calibration chapter (chapter 5).
The percentage in the graph indicates the relative amount of luminance that leaks from one eye channel to the other. A value of 100 % indicates that the unintended pixel luminance is the same as the intended pixel luminance, so no image separation is taking place. A value of 0 %, on the other hand, corresponds to perfect image channel separation, where only the intended luminance is visible to the destined eye.
The disparity was varied from 5 pixels to 45 pixels in 10 pixel increments, which corresponded to a disparity angle ranging from 0.1 to 0.9 degrees. The participants' heads were not fixated, but the viewing distance was approximately 70 to 90 centimeters, depending on the viewer's preferred pose while sitting in front of the image plane. The disparity was actually varied in two directions, once for crossed and once for uncrossed stimuli. The disparity steps and ranges were kept the same for both kinds
Figure 6.2: The actual crosstalk measured depends on the contrast. The plot describes the ranges of
crosstalk that where measured for various contrast settings at a given crosstalk label.
of stimuli.
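The conversion from pixel disparity to disparity angle used above can be sketched as follows; the pixel pitch and the 80 cm viewing distance are illustrative assumptions, not measured values from the setup:

```python
import math

def disparity_angle_deg(disparity_px, pixel_pitch_mm=0.276, distance_mm=800.0):
    """Visual angle subtended by a given on-screen pixel disparity.

    pixel_pitch_mm and distance_mm are assumed values for illustration.
    """
    disparity_mm = disparity_px * pixel_pitch_mm
    return math.degrees(2.0 * math.atan(disparity_mm / (2.0 * distance_mm)))
```

With these assumptions, 5 pixels correspond to roughly 0.1 degrees and 45 pixels to roughly 0.9 degrees, matching the range stated above.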
The dot pattern of the random dot stereograms corresponded to a spatial frequency of about 24 CPD and was therefore well resolvable by the human visual system.
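A random dot stereogram of the kind used as stimulus can be generated by shifting a patch of dots horizontally between the two eye images; this is a minimal sketch (binary dots, crossed disparity only), not the actual stimulus generator used in the study:

```python
import random

def random_dot_pair(width, height, disparity, box):
    """Left/right random-dot pair with a square patch at crossed disparity.

    box -- (x0, y0, x1, y1) region that should appear in front of the plane
    """
    left = [[random.randint(0, 1) for _ in range(width)] for _ in range(height)]
    right = [row[:] for row in left]
    x0, y0, x1, y1 = box
    for y in range(y0, y1):
        for x in range(x0, x1):              # shift the patch to the left
            right[y][x - disparity] = left[y][x]
        for x in range(x1 - disparity, x1):  # refill the uncovered strip
            right[y][x] = random.randint(0, 1)
    return left, right
```

Viewed with proper channel separation, the correlated patch fuses and appears to float in front of the surrounding dots; crosstalk degrades exactly this correlation.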
6.3 User Study Environment and Participants
The user study itself was set up at the Virtual Reality Center of the Johannes Kepler University Linz, as the room has no windows and therefore guaranteed consistent lighting conditions independent of the time of day. The participant and the screen were enclosed in a cabin covered by light absorbing blankets, further reducing the amount of ambient light that could possibly interfere with the participants' perception. The lighting in the room was reduced to a bare minimum, but an emergency exit light – which was not directly visible from the test setup – had to remain on.
The user study was performed by 44 participants aged between 24 and 59; 19 of them were female and 25 were male. All of the participants had normal or corrected-to-normal vision and, if required, wore either contact lenses or glasses along with the shutter glasses.
On request, the varied parameters were explained to participants only after the experiment had been conducted, in order not to influence the results.
6.4 Results
Due to limiting the number of varying variables to four (contrast, crosstalk, disparity and crossed/uncrossed), the output of the user study is a five-dimensional data structure with one dependent and four independent variables. The number of dimensions prevents a visualization of all dimensions within one image. Because three-dimensional perspective images are prone to misinterpretation and offer no navigation possibility in static display environments, two-dimensional color maps are used here. The color-coded intensity value represents the error rate.
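The dependent variable, the error rate per parameter combination, can be aggregated from the raw trial records roughly as follows; the record layout shown here is a hypothetical illustration, not the study's actual data format:

```python
from collections import defaultdict

def error_rate_map(trials):
    """Map (contrast, crosstalk, disparity, kind) -> error rate in [0, 1].

    trials -- iterable of (contrast, crosstalk, disparity, kind, correct)
              tuples, one per recorded answer (layout assumed for this sketch)
    """
    counts = defaultdict(lambda: [0, 0])   # key -> [errors, total]
    for contrast, crosstalk, disparity, kind, correct in trials:
        cell = counts[(contrast, crosstalk, disparity, kind)]
        cell[0] += 0 if correct else 1
        cell[1] += 1
    return {key: errors / total for key, (errors, total) in counts.items()}
```

Each value of this map corresponds to one color-coded cell in the plots of figure 6.3.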
While examining the data, all graphs were arranged in a grid structure, such that two dimensions were represented along the sides of the grid and two dimensions were mapped within each graph. The observer was then able to step along an outer dimension by switching to another graph, either on the horizontal or on the vertical axis, depending on which parameter should be increased. Additionally, animations of the graphs were used to exploit temporal vision to find trends within the data.
Every combination of contrast, crosstalk, disparity and kind of stimulus (crossed/uncrossed) was observed three times by every user. Therefore, at the lowest level of detail, where every parameter is set to a specific value, the percentual step size is 1/132 (three observations by each of the 44 participants), giving a resolution of about 0.75 percent.

Figure 6.3: Plots containing the results with increasing crosstalk along the images. The two rows represent results separated into stimuli with crossed (top row) and uncrossed (bottom row) disparity. The axes of the images denote the measured contrast as well as the disparity used. The color-coded value designates the rate at which the participants answered incorrectly, normalized from 0 to 50 % (red being equal to 50 % incorrect answers).
Quite a few insights can be derived by visually examining the gathered data shown in figure 6.3: Firstly, as expected and well known in the literature (such as in [2]), the number of errors increases with increasing disparity. This is due to the fact that the vergence-accommodation mismatch becomes greater with increased disparity.
Secondly, it is clear that increasing crosstalk also increases the error rate due to disturbing artifacts. It is known from the literature that stereopsis works even with a lot of crosstalk disturbing image perception. Yet, it is astonishing how much crosstalk is actually tolerated before binocular fusion is given up and diplopia is accepted. Of course, the amount of added crosstalk is well beyond what any display engineer would accept, and the amount of visible ghosting at the higher sampling steps in the crosstalk dimension would not be tolerable when watching a movie. Rather, the argument for including this dimension in the user study is to find out what kind of effect such disturbance has on stereopsis and possibly what causes stereopsis to be so stable.
Another very interesting conclusion from the gathered data is that the error rate does not seem to improve with increasing contrast once it has reached a contrast level of about 110 : 1. This value was determined by visual examination of the graphs displayed. The boxplot in Figure 6.4, with data aggregated over all test settings, reveals that this level might be even closer to 60 : 1 for low-crosstalk settings. It is also interesting that this holds for different levels of crosstalk as well. While the exact neurophysiological reasons causing this are unknown to the author, a reasonable explanation would be that the binocular cells detecting borders in the human visual system saturate at this contrast level.
Applying the t-test to the values grouped by contrast reveals a significant improvement, at a confidence level of 95 %, only between the lowest (5 : 1) and the second lowest (18 : 1) contrast level; here the null hypothesis can be rejected because the p-value is well below 0.05. For all higher contrast levels, the t-test fails to reject the null hypothesis, with p-values greater than 0.05.
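The grouped comparison can be reproduced in outline with Welch's two-sample t-test; the per-participant error rates below are deterministic synthetic stand-ins shaped like the reported effect (a clear difference between 5 : 1 and 18 : 1, none above that), not the study's actual data:

```python
import math
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se2 = va / len(a) + vb / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / len(a)) ** 2 / (len(a) - 1)
                     + (vb / len(b)) ** 2 / (len(b) - 1))
    return t, df

# Synthetic error rates for 44 participants: the lowest contrast level
# produces clearly more errors, the levels above it do not differ.
errors_5_1  = [0.35 + 0.05 * math.sin(i) for i in range(44)]
errors_18_1 = [0.25 + 0.05 * math.sin(i + 1.0) for i in range(44)]
errors_61_1 = [0.25 + 0.05 * math.cos(i) for i in range(44)]

t_low, df_low = welch_t(errors_5_1, errors_18_1)     # large |t|: reject H0
t_high, df_high = welch_t(errors_18_1, errors_61_1)  # small |t|: cannot reject
```

With |t| above roughly 2 at these degrees of freedom, p falls below 0.05, matching the rejection of the null hypothesis only at the lowest contrast step.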
Up to which value stereo acuity actually improves cannot be determined by this user study due to the limited accuracy and high variance of the measurements. Possible improvements on how this could be measured with more confidence are described in chapter 7.
As described before, an improvement in stereo acuity with rising contrast can be derived only from the lowest contrast level to the second lowest level. This improvement does not hold statistically for the higher levels. These results allow two possible interpretations: Either stereo acuity does not improve at contrast levels higher than about 110 : 1, or the stimuli used in the user study are too easy to resolve and the false answers at higher contrast levels actually stem from user input errors. It should be noted that the user study was performed in the hope of finding decreased stereo acuity. While visual inspection might reveal indications of decreasing stereo acuity, the statistical tests do not reveal any signs of such a trend.
When separating the results into crossed and uncrossed stimuli, another interesting effect becomes apparent: It seems as if crosstalk has a higher influence on uncrossed stimuli than on crossed stimuli. Without artificial crosstalk added, the graphs reveal that fewer errors were made for uncrossed stimuli than for crossed stimuli. As explained previously, this makes sense, as perspective causes a greater perceived depth for the same amount of disparity. When adding artificial crosstalk to both scenarios, the graphs reveal that this has a higher impact on the uncrossed scenario, whereas the crossed results remain nearly the same.
Figure 6.4: Boxplot showing the increasing amount of correct answers as contrast is increased. The values were evaluated over all crosstalk and disparity settings. While the plots in Figure 6.3 show that there is no real improvement beyond a contrast level of 110:1, the aggregated results shown in this plot suggest that this saturation ratio may be even lower.
The only difference between the crossed and uncrossed scenarios is the sign of the disparity, and therefore any influence of contrast or crosstalk is the same for both tests. A possible explanation is that the difficulty of the test pattern becomes too low at higher contrast ratios, such that the crossed patterns do not suffer any further. This argument, though, applies to all tests performed in the user study.
7 Summary and Future Work
In this thesis, work on high dynamic range stereo perception was presented. The goal of this thesis was to find an indication of whether higher contrast in stereoscopic image pairs could have a negative impact on stereo acuity, rather than to build the perfect display. It should be considered an attempt to build a display in order to answer one essential question: Does a contrast ratio beyond 110 : 1 really help when viewing stereoscopic content?
Unfortunately, a definitive answer cannot be derived from this work, but there are indicators that display development may very well have reached the limit of what stereopsis in the human visual system can benefit from within a single image.
The user study in this thesis has shown that stereo acuity does not improve beyond a certain contrast, but not that stereo acuity actually decreases with further increased contrast. A third, simple prototype was developed using transparencies with higher contrast values. This prototype used the backlight from the second prototype and provided a higher peak luminance. Even with the higher contrast values it was still possible to fuse the stereo pair, so this prototype also did not reveal the intended effect. Nonetheless, light leaking around the transparency seemed to make fusion more difficult, and covering the light leak made the task seemingly easier.
Even if this is an indication, the simultaneous contrast required is still orders of magnitude away from the current state of the art, and it is debatable whether any real impact would occur in real-world scenarios. In such cases, the effect of decreasing stereo acuity may even be intended, as the ultimate goal of display designers is to reproduce scenes from reality as accurately as possible.
Does that mean display research is done? It is safe to say that this is definitely not the case, since this work only covers stereopsis, a small part of depth perception. Understanding the full effects of a higher dynamic range on depth perception, including the influence of gradients and texture, is well beyond this work but definitely of interest, as it is still not fully understood in what way the individual parts of the human visual system work together to perceive depth.
The maximum contrast that stereopsis can benefit from could be used to further improve disparity mapping algorithms. Disparity mapping either tries to extract depth information from a single image and create a stereo pair exhibiting disparity, or tries to remap a stereoscopic pair into the visual comfort zone of the observer. While even some of the latest disparity mapping algorithms (such as the one presented in [33]) do not directly incorporate contrast information, it could prove beneficial in future disparity mapping algorithms, since more detailed knowledge of how the human vision system responds to stereoscopic content at a given contrast could influence the way maximum disparity values are chosen in such algorithms. Of course, there is still a lot more to accomplish in understanding the contrast limits of stereopsis before such work can be incorporated, and this thesis can only represent a small step towards such an understanding.
7.1 Future Work
For scientific studies, there exists a better approach to constructing a display capable of producing an image that is both of high dynamic range quality and stereoscopic: the classic haploscope approach. This type of stereo setup has one fundamental advantage: It does not suffer from any kind of crosstalk, as the eye channels are kept completely separate from each other using two mirror pathways towards separate displays.
These two non-stereoscopic displays can then be built using familiar HDR display setups, such as the one presented in [54] using dimmable LEDs. At the same time, the fundamental design of displays is changing as well: Organic LEDs are very promising in delivering even higher static contrast ratios. At least for vision research on the limits of contrast, the author suspects that building a high dynamic range, stereoscopic display based on two standard OLED displays and a haploscope is feasible.
Since the contrast level beyond which increased contrast no longer improves stereo acuity is surprisingly low, a similar user study could be performed by building a haploscope using two high quality, calibrated displays. Many issues described in this thesis could be avoided, and more accurate calibration would yield more confidence.
In particular, contrast and crosstalk could be sampled more accurately because a haploscope setup has, by design, no crosstalk even at high contrast ratios. On the other hand, since haploscope displays are not a technology designed for the consumer market, all other – possibly unknown – effects involved in time and space multiplexing would be ignored, rendering any such research less representative.
Crosstalk is a relevant factor especially for autostereoscopic displays, where multiple viewing zones are presented to the user solely by the display itself by intelligently directing luminance. Since most autostereoscopic displays do not adjust the light direction depending on the user's viewing position, the viewer is not always in the perfect viewing position and will therefore suffer from crosstalk and ghosting from neighboring viewing zones.
7.2 A Better Time-Multiplexed Double Modulation Display
What would the perfect double modulation display look like if all components were under the full control of the author? The main problem causing the lower than expected contrast results is the pair of redundant color filters, which causes the disturbing light variance effect described in section 4.5. The perfect back panel would therefore not feature a color filter. This would make the back panel achromatic but of higher effective resolution, which in turn could be used to control light on a sub-pixel basis.
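The double modulation idea behind this design can be sketched as a luminance split between the two stacked modulators; the square-root split shown here is one simple scheme from the dual-layer literature (cf. [22]), not necessarily what a production display would use:

```python
import math

def split_target(target):
    """Split a normalized target luminance (0..1) across two stacked
    light modulators whose transmittances multiply. The square-root
    split assigns each layer the same value, so each layer only has to
    cover half of the total dynamic range on its own."""
    back = math.sqrt(target)
    front = target / back if back > 0.0 else 0.0
    return back, front
```

The achromatic high-resolution back panel would show `back`, the color front panel `front`; their product reconstructs the target, and a target of 0.25 yields 0.5 on both layers.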
The second major improvement would be to glue the layers together in the same way the polarization foil is applied to the spatial light modulator in any panel. All of this would of course happen in a clean-room environment, just as is the case for any display production.
The LED backlight could also be improved by using more LEDs of a lower power level. The brightness level would be kept the same or even improved, but cooling would become easier and the brightness would be spread among more LED sources. This in turn would allow for a more transparent diffuser, again improving peak luminance.
The TN panels used in the prototype employ Frame Rate Control to increase tonal resolution. Since the
tonal resolution is increased by the double modulation already, FRC could be removed as well, causing
less interference with calibration and fewer artifacts.
Would the display still be based on two spatial light modulators using polarization? Probably yes, as the resolution aspect is of major interest. Using dimmable LEDs, even hundreds of them, still causes many pixels to be illuminated by the same backlight LED. This reduces the possible contrast resolution down to the resolution of the LED grid. Synchronization would again become more of an issue, as a higher resolution LED grid would probably require some kind of addressing scheme involving pulse width modulation and possibly other sources of temporal delay.
Finally, it should be noted that display technology has improved over the last three years, and OLED technology in particular seems to be a promising candidate for the future of high dynamic range stereoscopic displays. It is quite possible that such technology will soon make the double modulation techniques used for high dynamic range displays today appear similar to CRT technology of more than two decades ago.
References
[1] Barton L. Anderson. Stereovision: beyond disparity computations. Trends in Cognitive Sciences,
2(6):214 – 222, 1998. 12
[2] Martin S. Banks, Kurt Akeley, David M. Hoffman, and Ahna R. Girshick. Consequences of incorrect
focus cues in stereo displays. Information Displays, 7:10–14, 2008. 42, 46
[3] Peter G. J. Barten. Physical model for the contrast sensitivity of the human eye. Proceedings of
SPIE, 1666(1):57–72, 1992. 8
[4] Peter G. J. Barten. Spatiotemporal model for the contrast sensitivity of the human eye and its
temporal aspects. Proceedings of SPIE, 1913(1):2–14, 1993. 8
[5] Oliver Bimber and Daisuke Iwai. Superimposing dynamic range. In ACM SIGGRAPH Asia 2008
papers, SIGGRAPH Asia ’08, pages 150:1–150:8, New York, NY, USA, 2008. ACM. v, 7, 38, 39,
40
[6] Buckthought. A matched comparison of binocular rivalry and depth perception with fmri. Journal
of Vision, 11:1–15, 2011. 20
[7] COGSCI. Online Gabor Patch generator, 2012 (Last accessed May 18, 2012). iii, 19
[8] Scott Daly. The visible differences predictor: an algorithm for the assessment of image fidelity.
Human Vision, Visual Processing, and Digital Display, pages 179–206, 1993. 8
[9] Scott Daly and Xiaofan Feng. Bit-depth extension: Overcoming lcd-driver limitations by using
models of the equivalent input noise of the visual system. Journal of the Society for Information
Display, 13(1):51–66, 2005. 3
[10] G. C. DeAngelis, I. Ohzawa, and R. D. Freeman. Spatiotemporal organization of simple-cell receptive fields in the cat’s striate cortex. i. general characteristics and postnatal development. Journal of
Neurophysiology, 69(4):1091–1117, 1993. 14, 19
[11] Gregory C. DeAngelis. Seeing in three dimensions: The neurophysiology of stereopsis. Trends in
Cognitive Sciences, 4(3), March 2000. 19, 20
[12] Paul E. Debevec and Jitendra Malik. Recovering high dynamic range radiance maps from photographs. In ACM SIGGRAPH 2008 classes, SIGGRAPH ’08, pages 31:1–31:10, New York, NY,
USA, 2008. ACM. 9
[13] Neil A. Dodgson. Variation and extrema of human interpupillary distance. Proceedings of SPIE
Sterescopic Displays and Virtual Reality Systems XI, 5291:36–46, 2004. 19
[14] Dolby. Dolby shows latest HDR display prototype developed in collaboration with SIM2. 7
[15] F. Drago, K. Myszkowski, T. Annen, and N. Chiba. Adaptive logarithmic mapping for displaying
high contrast scenes. Computer Graphics Forum, 22:419–426, 2003. 9
[16] Frédo Durand and Julie Dorsey. Fast bilateral filtering for the display of high-dynamic-range images.
In Proceedings of the 29th annual conference on Computer graphics and interactive techniques,
SIGGRAPH ’02, pages 257–266, New York, NY, USA, 2002. ACM. 9
[17] Raanan Fattal, Dani Lischinski, and Michael Werman. Gradient domain high dynamic range compression. In Proceedings of the 29th annual conference on Computer graphics and interactive
techniques, SIGGRAPH ’02, pages 249–256, New York, NY, USA, 2002. ACM. 9
[18] James A. Ferwerda, Sumanta N. Pattanaik, Peter Shirley, and Donald P. Greenberg. A model of
visual adaptation for realistic image synthesis. In SIGGRAPH ’96: Proceedings of the 23rd annual
conference on Computer graphics and interactive techniques, pages 249–258, New York, NY, USA,
1996. ACM. 8
[19] D. J. Field and D. J. Tolhurst. The Structure and Symmetry of Simple-Cell Receptive-Field Profiles
in the Cat’s Visual Cortex. Royal Society of London Proceedings Series B, 228:379–400, September
1986. 19
[20] David Finlay, Peter C. Dodwell, and Terry Caelli. The waggon-wheel effect. Perception, 13(3):237–
248, 1984. 3
[21] Graeme Gill. Argyll color management system. http://www.argyllcms.com/. 36
[22] Gabriele Guarnieri, Luigi Albani, and Giovanni Ramponi. Image-splitting techniques for a duallayer high dynamic range lcd display. Journal of Electronic Imaging, 17(4):043009, 2008. 7, 31
[23] D. Lynn Halpern and Randolph R. Blake. How contrast affects stereoacuity. Perception, 17:483–495, 1988. 4, 42
[24] Selig Hecht and Simon Schlaer. Intermittent stimulation by light v. the relation between intensity and
critical frequency for different parts of the spectrum. The Journal of General Physiology, 19(6):965–
977, 1936. 3
[25] David M. Hoffman, Ahna R. Girshick, Kurt Akeley, and Martin S. Banks. Vergence-accommodation
conflicts hinder visual performance and causevisual fatigue. In Journal of Vision, pages 1–30, 2008.
iii, 12, 13, 42
[26] Industrial Light and Magic. Openexr is a high dynamic-range (hdr) image file format developed by
industrial light & magic for use in computer imaging applications. 10
[27] Garrett M. Johnson and Mark D. Fairchild. Rendering hdr images. In in IS&T/SID 11th Color
Imaging Conference, pages 36–41, 2003. 9
[28] J. P. Jones and L. A. Palmer. The two-dimensional spatial structure of simple receptive fields in cat
striate cortex. J Neurophysiol, 58(6):1187–1211, December 1987. 19
[29] Béla Julesz. Foundations of Cyclopean Perception. The University of Chicago Press, 1971. 18
[30] Florian Kainz and Rod Bogart. Technical introduction to openexr. Technical report, Industrial Light
& Magic, 2009. 10
[31] Janusz Konrad, Bertrand Lacotte, Senior Member, and Eric Dubois. Cancellation of image crosstalk
in time-sequential displays of stereoscopic video. In in IEEE Transactions on Image Processing,
pages 897–908, 2000. 11, 12
[32] Jiangtao Kuang, Hiroshi Yamaguchi, Changmeng Liu, Garrett M. Johnson, and Mark D. Fairchild.
Evaluating hdr rendering algorithms. ACM Trans. Appl. Percept., 4, July 2007. 9
[33] Manuel Lang, Alexander Hornung, Oliver Wang, Steven Poulakos, Aljoscha Smolic, and Markus
Gross. Nonlinear disparity mapping for stereoscopic 3d. ACM Trans. Graph., 29(3):10, 2010. 49
[34] G W Larson, H Rushmeier, and C Piatko. A visibility matching tone reproduction operator for high
dynamic range scenes. Technical Report LBNL-39882, Lawrence Berkeley Nat. Lab., Berkeley, CA,
Jan 1997. 9
[35] Patrick Ledda, Alan Chalmers, Tom Troscianko, and Helge Seetzen. Evaluation of tone mapping
operators using a high dynamic range display. In ACM SIGGRAPH 2005 Papers, SIGGRAPH ’05,
pages 640–648, New York, NY, USA, 2005. ACM. 9
[36] Patrick Ledda, Luis Paulo Santos, and Alan Chalmers. A local model of eye adaptation for high
dynamic range images. In Proceedings of the 3rd international conference on Computer graphics,
virtual reality, visualisation and interaction in Africa, AFRIGRAPH ’04, pages 151–160, New York,
NY, USA, 2004. ACM. 9
[37] Patrick Ledda, Greg Ward, and Alan Chalmers. A wide field, high dynamic range, stereographic
viewer. In GRAPHITE ’03: Proceedings of the 1st international conference on Computer graphics
and interactive techniques in Australasia and South East Asia, pages 237–244, New York, NY, USA,
2003. ACM. 6
[38] James S. Lipscomb and Wayne L. Wooten. Reducing crosstalk between stereoscopic views. Proceedings of SPIE, 2177:92, 1994. 11
[39] Rafał Mantiuk, Karol Myszkowski, and Hans-Peter Seidel. Visible difference predicator for high
dynamic range images. In Proceedings of IEEE International Conference on Systems, Man and
Cybernetics, pages 2763–2769, 2004. 8
[40] L.M.J. Meesters, W.A. IJsselsteijn, and P.J.H. Seuntiens. A survey of perceptual evaluations and requirements of three-dimensional tv. Circuits and Systems for Video Technology, IEEE Transactions
on, 14(3):381 – 391, march 2004. 8
[41] Jens Månsson. Stereovision: a model of human stereopsis. 1997. 13
[42] M. Ortiz-Gutiérrez, A. Olivares-Pérez, and V. Sánchez-Villicaña. Cellophane film as half wave
retarder of wide spectrum. Optical Materials, 17(3):395 – 400, 2001. 26
[43] Peter Ludwig Panum. Physiologische Untersuchungen Über Das Sehen Mit Zwei Augen. Schwersche Buchhandlung, Kiel, 1858. 18
[44] Siegmund Pastoor. Human factors of 3d imaging: Results of recent research at heinrich-hertzinstitute berlin. In Proceedings ASIA Display Conference, 1995. 12, 42
[45] Sumanta N. Pattanaik, James A. Ferwerda, Mark D. Fairchild, and Donald P. Greenberg. A multiscale model of adaptation and spatial vision for realistic image display. In SIGGRAPH ’98: Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pages
287–298, New York, NY, USA, 1998. ACM. 8
[46] Sumanta N. Pattanaik, James A. Ferwerda, Donald P. Greenberg, and Mark D. Fairchild. Multiscale
model of adaptation, spatial vision and color appearance. ITE Technical Report, 23(23):2, 1999. 8
[47] Sumanta N. Pattanaik, Jack Tumblin, Hector Yee, and Donald P. Greenberg. Time-dependent visual
adaptation for fast realistic image display. In SIGGRAPH ’00: Proceedings of the 27th annual
conference on Computer graphics and interactive techniques, pages 47–54, New York, NY, USA,
2000. ACM Press/Addison-Wesley Publishing Co. 9
[48] Yury Petrov. Higher-contrast is preferred to equal-contrast in stereo-matching. In Vision Research,
volume 44, pages 775–784, 2004. 13
[49] Fabio Policarpo. Real-Time Stereograms, chapter 41. Addison-Wesley, 2007. 43
[50] D. Purves, J. A. Paydarfar, and T. J. Andrews. The Wagon Wheel Illusion in Movies and Reality.
Proceedings of the National Academy of Science, 93:3693–3697, April 1996. 3
[51] Erik Reinhard, Michael Stark, Peter Shirley, and James Ferwerda. Photographic tone reproduction
for digital images. In Proceedings of the 29th annual conference on Computer graphics and interactive techniques, SIGGRAPH ’02, pages 267–276, New York, NY, USA, 2002. ACM. 9
[52] Mark A. Robertson, Sean Borman, and Robert L. Stevenson. Dynamic range improvement through
multiple exposures. In In Proc. of the Int. Conf. on Image Processing (ICIP’99, pages 159–163.
IEEE, 1999. 9, 10, 35
[53] J. F. Schouten. Subjective stroboscopy and a model of visual movement detectors. Cambridge MA:
MIT Press, 1967. 4
[54] Helge Seetzen, Wolfgang Heidrich, Wolfgang Stuerzlinger, Greg Ward, Lorne Whitehead, Matthew
Trentacoste, Abhijeet Ghosh, and Andrejs Vorozcovs. HighDynamic Range Display Systems. ACM
Trans. Graph., 23(3):760–768, 2004. iii, 6, 7, 8, 22, 23, 50
[55] Mel Siegel. Perceptions of crosstalk and the possibility of a zoneless autostereoscopic display.
In A. J. Woods, M. T. Bolas, J. O. Merritt, and S. A. Benton, editors, Society of Photo-Optical
Instrumentation Engineers (SPIE) Conference Series, volume 4297 of Presented at the Society of
Photo-Optical Instrumentation Engineers (SPIE) Conference, pages 34–41, June 2001. 12
[56] F. Smit, v. R. Liere, and B. Fröhlich. Non-uniform crosstalk reduction for dynamic scenes. In IEEE
Virtual Reality 2007, 2007. 12
[57] F. Smit, van R. Liere, and B. Fröhlich. Three extensions to subtractive crosstalk reduction. In
Eurographics EGVE, 2007. 12
[58] Jack Tumblin and Holly Rushmeier. Tone reproduction for realistic images. In Computer Graphics
and Applications, volume Nov., pages 42–48, 1993. 9
[59] Greg Ward and Maryann Simmons. Jpeg-hdr: a backwards-compatible, high dynamic range extension to jpeg. In SIGGRAPH ’06: ACM SIGGRAPH 2006 Courses, page 3, New York, NY, USA,
2006. ACM. 9
Curriculum Vitae

Personal Information
Name: Philipp Rylands Aumayr
Gender: Male
Date of birth: 12 March 1985
Place of birth: Linz
Address: Alte Hauptstraße 25, A-4072 Alkoven (Austria)
Nationality: Austria, United States of America
Education
1991 - 1995: Primary school: VS Alkoven
1995 - 1999: Lower secondary school: Stiftsgymnasium Wilhering
1999 - 2003: Upper secondary school: Stiftsgymnasium Wilhering
2003 - 2007: Bachelor's programme: Johannes Kepler University Linz - Computer Science
since October 2007: Master's programme: Johannes Kepler University Linz - Computer Science
Work Experience
March 2004 - August 2005: Software development at SWA, Traun
September 2005 - December 2007: Project staff member, Institute for Pervasive Computing, Johannes Kepler University Linz
February 2008 - May 2008: Project staff member (Spectacles), Research Studios Austria in collaboration with the Institute for Pervasive Computing
January 2009 - March 2009: Project staff member, Research Studios Austria
July 2009 - September 2009: Project staff member, Research Studios Austria in cooperation with software architects og
since October 2009: Employee, software architects gmbh
Scientific Work
July 2007: Bachelor's thesis: Implementation of the Visualization of an Ambient Awareness Display (Prof. Ferscha)
June 2009: Christoph Anthes, et al., Space Trash - Development of a Networked Immersive Virtual Reality Installation, Technical Report at the Institute of Graphics and Parallel Processing, Johannes Kepler University Linz, June 2009
16th July 2012
Statutory Declaration
I hereby declare under oath that I have written this master's thesis independently and without outside help, that I have used no sources or aids other than those indicated, and that I have marked all passages taken literally or in substance from other sources as such. This master's thesis is identical to the electronically submitted text document.
Philipp Aumayr, Linz, 16 July 2012