Permission

I give permission for public access to my thesis and for copying to be done at the discretion of the archives librarian and/or the College librarian.

Margarita Bratkova, May 19, 2001

Visual Quality of Computer Graphics and its Effect on Distance Judgements

Thesis by Margarita Bratkova

Primary Advisors: William Thompson, University of Utah; Robert Weaver, Mount Holyoke College
Committee Members: Claude Fennema, Mount Holyoke College

Submitted to the Department of Computer Science, Mount Holyoke College, in partial fulfillment of the requirements for the degree of Bachelor of Arts with Honors, May 19, 2001.

Acknowledgements

I would like to thank my advisors, Prof. Bill Thompson (University of Utah) and Prof. Bob Weaver, for their continuous personal guidance, encouragement, and support throughout this difficult year. I would like to thank Prof. Claude Fennema for being an excellent opponent, for putting me on the spot, and for making me clarify my ideas; Sarah Creem (University of Utah) for her help with methodology; Joe Cohen for his suggestion to use a perceptual metric device; M. J. Wraga (Smith College) for her methodology advice; and Amy Gooch (University of Utah) for her Hallway Model. I would like to thank my fiancé, Branislav L. Slantchev, for his help with statistical analysis and LaTeX, and for his continuous support and understanding. I would like to thank Bob Weaver, Sean Broughton, Amber Sheker, Yasmin Benlamlih, Huma Khan, Maya Peeva, Petya Radoeva, Taylor Erickson, John Zawislak, Kristofer J. Clawson, Lauren Sanderson, Stephanie Stacey, Reed Baldwin, Seth Hoffman, Nora Gorcheva, Valia Petkova, and Todd McGarvey for taking time to participate in my experiment. I would like to thank Prof. Lester Senechal for lending me his laptop for my experiment. I also want to thank my friends for taking care of me and for making my life brighter.

This project was made possible in part by a software grant from Alias|Wavefront. Professor Thompson's participation in this project was supported in part by NSF grants CDA-96-23614 and IIS-00-80999.

Abstract

Current state-of-the-art computer-generated images, despite impressive gains in visual realism, are still unable to convey an accurate sense of scale and distance. This is much less of a problem in the real world, where human judgements of scale and distance are fairly good, at least up to distances of 15-20 meters. The disparity is particularly noticeable in judgements involving actual distances to locations in the world, rather than judgements involving relative comparisons between distances to different objects or locations. Because of that, we are interested in exploring the spatial cues which provide us with absolute depth information. Our hypothesis states that the way the visual properties of computer-generated objects and environments carry and preserve spatial information is the key to improving depth perception in computer graphics. To our knowledge, there are no controlled experiments that even verify whether the quality of the visual properties of computer-generated objects aids in depth judgements involving actual distances to virtual locations, and nothing at all is known about which aspects of visual quality are most significant in aiding such judgements. The purpose of this study is to explore the possible effects of improved geometric complexity and quality of illumination (two aspects of visual quality) on such depth judgements.
Results were obtained by performing limited human subject tests, utilizing already developed evaluation methods. The results are not fully conclusive at this time, and there are complex issues that have to be addressed before we can have convincing evidence for our hypothesis.

Contents

List of Figures
List of Tables
1 Introduction
  1.1 Research Problem and Importance
  1.2 Hypothesis
  1.3 Interdisciplinary Approach
  1.4 Thesis Layout
2 Computer Graphics
  2.1 Two-Dimensional Computer Graphics
  2.2 Three-dimensional Computer Graphics
    2.2.1 Geometry
    2.2.2 Materials and Lights
    2.2.3 Illumination (Rendering)
3 Human Vision
  3.1 General Overview
  3.2 Human Depth Perception
    3.2.1 Properties of Depth Information
    3.2.2 Visual Cues and Absolute Distance Judgements
    3.2.3 Visual Cues Relevant in Computer Graphics
4 Theoretical Issues Related to Methodology
  4.1 Quantifying the Perception of Distance
  4.2 Field of View, Visual Angle, Picture Frame, Immersion
  4.3 Immersion and Monocular Viewing
  4.4 Eye-Height
5 Experiments
  5.1 Hypothesis, Motivation, Goal
  5.2 Methodology
    5.2.1 Participants
    5.2.2 Apparatus
    5.2.3 Design and Stimuli
    5.2.4 Experimental Protocol
    5.2.5 Statistical Methods
  5.3 Results
    5.3.1 Collected data
    5.3.2 Discussion
6 Conclusion and Future Work
  6.1 Overview
  6.2 Future Work
    6.2.1 Problems with the Physical Setup
    6.2.2 Problems with the Virtual Model
    6.2.3 Problems with Methodology
Appendix A
Appendix B
Bibliography

List of Figures

1  Scale establishes the proper factoring between the image size of an object and the object's actual real world size. Distance establishes how far away objects are in relationship to where the viewer is.
2  In this image, relative distance judgements can clearly be made. But there is no way in which we can make an absolute distance judgement to any point in the figure. (image courtesy of Bill Thompson, University of Utah)
3  This image demonstrates a difference in geometric quality. The chair to the left has a simpler geometry compared to the chair to the right.
4  This image demonstrates a difference in materials. The chair to the left has less-detailed surface properties compared to the chair to the right.
5  This image demonstrates a difference in illumination. The chair to the right has better quality of illumination than the chair to the left.
6  Illustration of the complexity of human vision. A single line segment on the retina can be a projection of numerous lines in the real environment, while numerous line segments in the real environment can project into only one line segment. (after "Vision Science" by Stephen Palmer, p.23)
7  Egocentric Distance Judgements.
8  Exocentric Distance Judgements.
9  Accommodation is the change occurring in the shape of the lens of the eye. (after "Vision Science" by Stephen Palmer, p.204)
10 Convergence is the angle between the eyes' direction of gaze. (after "Vision Science" by Stephen Palmer, p.205)
11 Familiar Size Depth Cue.
12 Effect of different visual angles.
13 The effect of increasing picture frame on the field of view.
14 The occluding box attached to the display monitor.
15 A side view of the experimental setup.
16 A top view of the experimental setup.
17 Data collected per image per subject for eye-height of 60 inches.
18 Data collected per image per subject for eye-height of 62 inches.
19 Data collected per image per subject for eye-height of 64 inches.
20 Data collected per image per subject for eye-height of 66 inches.
21 Distance Estimation Target one – Chair one.
22 Distance Estimation Target two – Chair two.
23 Distance Estimation Target three – Chair three.
24 Distance Estimation Target four – Chair four.
25 Distance Estimation Target five – Chair five.
26 Distance Estimation Target six – Chair six.
27 A snapshot of Maya.
28 A sphere and a box generated in Maya by the scripting language, MEL, listed below.
29 An example of the modeled room. This example has bad geometry and bad illumination.
30 An example of the modeled room. This example has bad geometry and good illumination.
31 An example of the modeled room. This example has good geometry and bad illumination.
32 An example of the modeled room. This example has good geometry and good illumination.

List of Tables

1 Properties of depth information and their relationship with visual cues. (after "Vision Science" by Stephen Palmer, p.204)
2 Definitions of visual cues commonly cited to relate to depth perception.
3 Chair number one, 58 observations, R^2 = 0.17
4 Chair number two, 76 observations, R^2 = 0.27
5 Chair number three, 69 observations, R^2 = 0.14
6 Chair number four, 52 observations, R^2 = 0.08
7 Chair number five, 44 observations, R^2 = 0.04
8 Chair number six, 85 observations, R^2 = 0.33

1 Introduction

The beginning of computer graphics was marked in 1963 by Ivan Sutherland's debut of Sketchpad [38]. Since then, the development of computer graphics has been truly phenomenal, especially over the last decade. Computer graphics has a permanent presence in many aspects of our daily life – in motion pictures, advertising, the internet, entertainment.

1.1 Research Problem and Importance

Despite impressive gains in visual realism, current state-of-the-art photo-realistic computer graphics are unable to create images of objects and environments that look large and at the right scale. To make objects and environments look large, one has to make them appear to be far away. An accurate judgement of scale in an image would establish the proper factoring between the image size of an object and the object's actual real world size. An accurate judgement of distance in an image would position the viewer within that image and would establish how far away objects are in relationship to where the viewer is. It is the inability of computer graphics to make objects appear at the right distance that we address in this research. Figure 1 provides an example of the concept of scale and the concept of distance.

Figure 1: Scale establishes the proper factoring between the image size of an object and the object's actual real world size. Distance establishes how far away objects are in relationship to where the viewer is.

Human vision is an interpretive process. It transforms the two-dimensional projection of patterns of light at the back of our eyes into a stable perception of three-dimensional objects in three-dimensional space. The objects we "see" are perceptual interpretations based on the visual information available in the perceived images. They are not a direct registration of physical reality [26].

In natural settings, for stationary observers, scale and distance estimation is fairly good, at least up to distances of 15-20 meters. Cook's data shows an accuracy mean close to 95% [7]. With several different methods, Da Silva reports means around 94% [34]. Loomis reports mean errors of less than 30 cm for target distances ranging from 4 m to 15 m [22]. For similar distances, people usually underestimate those judgements when viewing computer graphics using visually immersive displays; average results range from 50% to 70% of the actual real distance. There is only one study contradicting this finding. The Witmer and Sadowski experiments [43] resulted in real-world distance judgements of 92% of the true distance, while their computer graphics visually immersive judgements were 85% of the rendered distance. There are a number of potential problems with the Witmer-Sadowski study, the most significant of which is that they used actual walking as an indicator of the real-world distance judgements, while they used treadmill walking as an indicator of the virtual world distance judgements.

This reported underestimation when viewing computer graphics using visually immersive displays is particularly noticeable for absolute distance judgements (judgements involving actual distances from the viewer to locations in the world), as opposed to relative distance judgements (judgements involving relative comparisons between distances to different objects or locations) [8, 21]. Figure 2 shows an image where relative distance judgements can clearly be made.
Using the presence of linear perspective, it is easy to estimate the ratio between the real distance to the first object on the left and the real distance to the second object on the right. But there is no way in which we can make an absolute distance judgement to any point in the figure and quantify the real distance between where we are standing as viewers and where these objects are positioned. In particular, if the world (including the viewpoint height) gets doubled in size, the image would still look the same.

Figure 2: In this image, relative distance judgements can clearly be made. But there is no way in which we can make an absolute distance judgement to any point in the figure. (image courtesy of Bill Thompson, University of Utah)

In both situations, the natural setting and the computer graphics setting, the human visual system is the same. It is the image projecting through the lens of the eye to the back of the retina that has changed. Therefore, the properties of the photo-realistic computer generated images we create and the conditions under which we view them must limit or lack some necessary information otherwise available in the natural world. If we want to find a solution to the computer graphics scale and distance problem, we must find out what we can do to improve the computer generated images and what we must control for when we actually display those images.

This failure to achieve an accurate sense of scale and distance has not been considered a major problem for the entertainment industry. Yet, there are many computer graphics applications outside of the entertainment industry in which a precise judgement of scale and distance is really necessary. These include training and education, visualization, mechanical design, virtual environments, and architecture. For example, we can think about non-endoscopic tele-surgery training for brain surgeons. If the surgeon has an opportunity to spend time looking at a virtual model of his patient's brain and figure out where major vessels are positioned and how the operation should take place, he will most probably perform a better and more successful procedure when he does it for real. We can also think about flight simulators for airplane pilots. When new models of airplanes are produced, airplane pilots need to get training and experience on the new, complicated machines. It is very expensive and dangerous to let a pilot fly a plane without having some sort of "real" experience. Running successful flight simulators decreases the chance of pilot error and increases human safety. Another example is architectural simulation: before the client commits to a particular solution for a building project, she can actually explore what the building will be like.

1.2 Hypothesis

Many people agree that by making our computer generated images closely approximate real world images, we would achieve "perfect" realism. And it really makes sense that if we could, through a synthetic image, exactly mimic the patterns of light at the back of a person's retina that would have otherwise been generated when that same person looked at a real environment, we would have a visually indistinguishable synthetic environment [23]. Unfortunately, the real world and the laws of physics are very complex. Thus near-perfect image approximations of the real world using computer systems with finite resources are impossible at this time.
Therefore, paying attention to the ways spatial information is carried in real or synthetic images, and preserving that spatial information in the generation of our computer graphics images, might prove crucial to improving the sense of depth in computer graphics.

A synthetic, computer generated environment is made of computer generated objects. The appearance of this synthetic environment and the objects that are in it, just as the appearance of the real environment and the real objects belonging to it, is a function of different visual properties. To our knowledge, there is no research exploring whether the quality of these different visual properties in computer graphics images relates to absolute distance judgements. Our hypothesis states that the way these visual properties carry and preserve spatial information is the key to improving human depth perception when viewing computer generated images.

We motivate our hypothesis by considering the differences between photographs of real environments and computer graphics renderings of similar environments. While controlled experiments have yet to be done, we believe that the former often convey a good sense of scale and distance, while the latter seldom do. This research is a first attempt at making steps towards testing our hypothesis. It tries to address a complex graphics question using already established evaluation methods.

1.3 Interdisciplinary Approach

A satisfactory solution to this complicated problem involves finding answers to many difficult questions. In order to do a better job with the computer graphics images, we need to understand what is possible to do with graphics, and also what it is that people perceive when they view graphics (or else how can we say what is better?). We have to understand the limitations of human depth perception, as well as the amount of 3D information possibly available in a two-dimensional image. Only an interdisciplinary approach utilizing knowledge and methods from Computer Graphics, Experimental Perceptual Psychology, and Vision Science can help us answer these questions in an informed manner. Experimental Perceptual Psychology provides many theories and data concerning the processes through which the human vision system acquires knowledge about the surrounding environment. It also provides information about human perception limits, and methodology for human subject testing. Vision Science is an interdisciplinary field, a branch of cognitive science interested in image understanding. It studies the nature of optical information and the ways in which the visual system extracts information about the environment [39, 26].

1.4 Thesis Layout

Part 2 is an overview of necessary background theories and definitions in computer graphics. Part 3 is an overview of necessary background theories and definitions in experimental perceptual psychology and vision science. Part 4 is an overview of issues related to methodology for human perceptual judgement; it also provides an overview of evaluation methods. Part 5 describes the pilot experiment, issues, results, and discussion of findings. Part 6 concludes this study. It provides an overview of challenges and difficulties, and talks about possibilities for future research.

2 Computer Graphics

The field of computer graphics was born soon after the introduction of computers. It has since grown much beyond its initial specialized boundaries to include the creation, storage, and manipulation of visual models and images of objects [13].
It is no longer a rarity but rather an integral part of our daily life. Areas as diverse as science, education, medicine, engineering, commerce, advertisement, and entertainment all rely on computer graphics. Computer graphics is usually divided into two subcategories, based on the type of objects it tries to reproduce – two-dimensional or three-dimensional. Both are usually displayed on a two-dimensional computer display.

2.1 Two-Dimensional Computer Graphics

Two-dimensional graphics refers to images which do not attempt to introduce the third dimension and whose nature is primarily two-dimensional. An analogy in the visual arts would be a painting or a print. In fact, two-dimensional graphics are created in a similar manner – by applying virtual color to a virtual two-dimensional canvas. Two-dimensional graphics can be either raster-based or vector-based. If they are raster-based, the image information consists of pixels stored in a rectangular array and then translated to the screen in a series of horizontal rows called rasters [37]. If the graphics are vector-based, the image information is stored as geometric descriptions of lines, curves, and shapes. Both types of two-dimensional graphics are used, and choosing one over the other is a matter of convenience.

2.2 Three-dimensional Computer Graphics

Three-dimensional graphics refers to images considered primarily three-dimensional in nature and built as such. For example, consider the difference between Figure 6, which is a two-dimensional diagram, and Figure 2, which is an illustration intended to convey a three-dimensional effect. Three-dimensional work involves creating a complete virtual model of an object within the computer. This model can be viewed through an imaginary camera, can be positioned anywhere in the virtual space, and can be manipulated in any way necessary. The model can be textured, lit, and given the ability to move and change over time. The model is itself a mathematical simulation of a "real" object [37, 5]. A particular virtual camera view can be selected, and the model can be rendered, thus producing an image, a "photograph" of the three-dimensional model. The degree of visual realism of three-dimensional computer graphics is a result of the quality of these different visual object properties.

2.2.1 Geometry

Geometry is the data that describes and defines an object. It is a collection of geometric descriptions of the lines, curves, and shapes defining an object in three-dimensional space. These include any geometric imperfections like scratches, dents, and cracks. Geometry includes both geometric accuracy (how faithful the geometry of the virtual object is to the real-world object, if such a real object exists) and geometric complexity (how much detail there is). An example showing different stages of geometric visual quality, simple vs. complex, is shown in Figure 3.

Figure 3: This image demonstrates a difference in geometric quality. The chair to the left has a simpler geometry compared to the chair to the right.

2.2.2 Materials and Lights

Materials in computer generated three-dimensional objects are determined by a combination of surface properties and textures. Surface properties determine how the surfaces of our computer generated models absorb, reflect, or transmit light rays [37, 17]. Surface textures give the surfaces texture effects. Textures are two-dimensional images applied to the surfaces of objects. They are very flexible and there are different ways to apply them.
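As a concrete (and deliberately simplified) illustration of how a two-dimensional texture image can be applied to a surface, the following sketch performs a nearest-neighbour texture lookup: each surface point carries (u, v) coordinates in [0, 1], and the lookup returns the colour of the corresponding texture pixel. This is only an illustrative toy in Python, not the mechanism Maya or any particular renderer uses.

```python
def sample_texture(texture, u, v):
    """Nearest-neighbour lookup of a colour in a 2D texture image.

    texture -- a 2D list of (r, g, b) tuples (rows of pixels)
    u, v    -- surface coordinates in [0, 1], mapping the image onto the surface
    """
    height = len(texture)
    width = len(texture[0])
    # Clamp to [0, 1] and convert to pixel indices.
    col = min(int(max(0.0, min(1.0, u)) * (width - 1) + 0.5), width - 1)
    row = min(int(max(0.0, min(1.0, v)) * (height - 1) + 0.5), height - 1)
    return texture[row][col]

# A 2x2 checker texture sampled at surface coordinates (0.9, 0.1).
checker = [[(255, 255, 255), (0, 0, 0)],
           [(0, 0, 0), (255, 255, 255)]]
print(sample_texture(checker, 0.9, 0.1))   # -> (0, 0, 0)
```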
Figure 4 gives an example of the difference between a real-world material and a computer-rendered one.

Figure 4: This image demonstrates a difference in materials. The chair to the left has less-detailed surface properties compared to the chair to the right.

If we want to see the materials of the objects we are creating, we have to include some sort of light source in the scene. Otherwise, we would see only a black screen. Lighting is a term used to designate the interaction of an object's material and geometry with the light sources available in the scene where the object is positioned [24]. There are several types of lights used in three-dimensional modeling, but in this research only two types are used: point lights and area lights.

Point lights. These are light sources which simulate the effect of a bare light bulb hanging from a wire. They emit light from a single point equally in all directions [37, 25].

Area lights. These are light sources which simulate the effect of a window, a fluorescent panel, etc. They emit light from an entire rectangular surface area rather than a single point, and emanate in a rectangular beam [37, 25].

2.2.3 Illumination (Rendering)

The creation of photo-realistic images has been a major focus in the study of computer graphics for much of its history. This effort has led to mathematical models and algorithms that can compute physically realistic images from known camera positions and scene descriptions that include the geometry of objects, the reflectance of surfaces, and the lighting used to illuminate the scene [37, 5, 42, 39]. These rendered images attempt to accurately describe the physical quantities that would be measured from a real-world scene. Results are fairly good, even though in most cases the rendering algorithms used to create them are not based on much physical theory. An exception are global illumination methods like Radiosity, which are physically based but can take hours of computation per single image. Improving the realistic quality of the images has been dependent on better rendering algorithms, more memory, improvements in computational speed, as well as other factors [39]. Figure 5 provides an example of the difference in illumination: the chair to the right has better quality of illumination than the chair to the left.

Figure 5: This image demonstrates a difference in illumination. The chair to the right has better quality of illumination than the chair to the left.
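The experimental scenes described later in this thesis use Phong and Blinn illumination models (Section 5.2.3). As a rough illustration of what a local illumination model of that family computes, here is a minimal sketch of diffuse-plus-specular shading under a single point light. It is a simplified stand-in written in Python, not the renderer used in this work, and all values in the example call are made up.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def phong_shade(normal, to_light, to_eye, kd, ks, shininess, light_intensity):
    """Tiny Phong-style local illumination: a diffuse term plus a specular term.

    normal, to_light, to_eye -- direction vectors at the surface point
    kd, ks                   -- diffuse and specular reflectances in [0, 1]
    shininess                -- specular exponent (higher = tighter highlight)
    """
    n, l, e = normalize(normal), normalize(to_light), normalize(to_eye)
    diffuse = kd * max(0.0, dot(n, l))
    # Mirror the light direction about the normal to get the reflection vector.
    r = tuple(2.0 * dot(n, l) * nc - lc for nc, lc in zip(n, l))
    specular = ks * (max(0.0, dot(r, e)) ** shininess)
    return light_intensity * (diffuse + specular)

# A surface facing straight up, lit from 45 degrees on one side and viewed
# from 45 degrees on the other (illustrative values only).
print(phong_shade((0, 0, 1), (1, 0, 1), (-1, 0, 1),
                  kd=0.7, ks=0.3, shininess=32, light_intensity=1.0))
```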
3 Human Vision

3.1 General Overview

Most of us take vision for granted. We do not think much about it – we just open our eyes and look! It doesn't seem like there's much mystery in it. Yet, when we think about what is actually going on, we begin to understand the complexity of the process. The act of seeing is far from an open window onto reality. What our eyes physically see is just a two-dimensional pattern of light that gets projected to the back of our eyes. Suddenly we realize what a big miracle it is that humans can see at all!

There are two domains that contribute to vision [26]: visual perception and the nature of optical information. Each is very important, but because a full treatment is not pertinent to this research, just a brief overview is included.

Visual perception. This is the process of acquiring knowledge about the world using the light that objects and environments reflect or emit. That knowledge is obtained by extracting and interpreting the light going through the lens of our eyes. Human vision somehow manages to transform those patterns of emitted and reflected light into a constant, stable perception of three-dimensionality. The objects we perceive are actually interpretations based on the structure of the projected images rather than a direct registration of reality.

Optical information. This concerns the ways the structure of light interacts with surfaces to produce the optical information which gets reflected into our eyes from the surfaces in the environment.

Vision is a very difficult process, and there are situations in which it doesn't work well – for instance, in situations when the information provided is systematically misperceived (those cases are called illusions). But these illusions are a very minute part of vision in our everyday life. In all other cases vision provides spatially accurate information. It gives the observer highly reliable information about the location and properties of objects in the surrounding environment. Vision is useful only if it is reasonably accurate, and in fact, vision is useful precisely because it is accurate [26].

3.2 Human Depth Perception

The difficulty humans face in perceiving the spatial arrangement of the three-dimensional world comes from the inherently ambiguous problem of recovering three-dimensional information from two-dimensional images. The mathematical relation between the two-dimensional projection at the back of the human retina and its real-environment source is not a well-defined function. Each three-dimensional point in the real environment maps to a well-defined two-dimensional point on the two-dimensional curved surface of the retina, but each two-dimensional point on that surface could map to an infinite number of three-dimensional points in the real environment. Therefore, there are infinitely many three-dimensional worlds that could have produced the two-dimensional image at the back of our eyes. Figure 6 provides an example of the complexity of human vision.

Figure 6: Illustration of the complexity of human vision. A single line segment on the retina can be a projection of numerous lines in the real environment, while numerous line segments in the real environment can project into only one line segment. (after "Vision Science" by Stephen Palmer, p.23)

3.2.1 Properties of Depth Information

The complexity of human vision leads us to the conclusion that correct depth perception is almost impossible. Yet, people achieve accurate depth perception almost every minute of the day. How is that possible? Everyday depth perception is possible because the visual system has assumptions about the real world and the conditions in which vision takes place. The human visual system computes distances to objects in the surrounding environment using information from the posture of the eyes and also from the light getting projected onto the retina from the environment [40]. The sources of such depth information are called visual cues.

There are five important properties that characterize these sources of depth information (visual cues) [26]. Each property has one of two possible states:

1. ocular – information concerning the state of the eyes themselves;
   optical – information concerning the light entering the eyes.
2. monocular – information available from just one eye;
   binocular – information available from both eyes.
3. static – information available from a non-moving image;
   dynamic – information available from a moving image.
4. absolute – information determining the actual distance to objects;
   relative – information determining the relation of objects to each other.
5. quantitative – information specifying a numerical distance relation;
   qualitative – information specifying an ordinal distance relation.

There are many visual cues, and these five properties are pertinent to all of them. Table 1 describes their relationship.

Table 1: Properties of depth information and their relationship with visual cues. (after "Vision Science" by Stephen Palmer, p.204)

  Visual cue                        Ocular/Optical  Monocular/Binocular  Static/Dynamic  Relative/Absolute  Qualitative/Quantitative
  accommodation                     ocular          monocular            static          absolute           quantitative
  convergence                       ocular          binocular            static          absolute           quantitative
  binocular disparity               optical         binocular            static          relative           quantitative
  motion parallax                   optical         monocular            dynamic         relative           quantitative
  aerial perspective                optical         monocular            static          relative           qualitative
  linear perspective                optical         monocular            static          relative           quantitative
  relative size                     optical         monocular            static          relative           quantitative
  familiar size                     optical         monocular            static          absolute           quantitative
  position relative to the horizon  optical         monocular            static          relative           quantitative
  shading and shadows               optical         monocular            static          relative           qualitative
  texture gradients                 optical         monocular            static          relative           quantitative
  occlusion                         optical         monocular            static          relative           qualitative

3.2.2 Visual Cues and Absolute Distance Judgements

Absolute distance judgements are judgements involving computation of actual distances from the viewer to locations in the world. When we talk about absolute distance judgements we often come across the terms egocentric and exocentric distance. The distance from a viewer to objects in a particular environment is called egocentric distance. Figure 7 shows an example of egocentric distance.

Figure 7: Egocentric Distance Judgements.

The distance between two or more objects in a particular environment is called exocentric distance. Figure 8 shows an example of exocentric distance.

Figure 8: Exocentric Distance Judgements.

The only way an observer can perform an accurate egocentric distance judgement is if she can uniquely recover an absolute three-dimensional position. Absolute depth is the only property of visual cues which provides for the recovery of such an absolute distance. Once absolute depth is determined for a portion of the scene, one can use it to scale the available relative depth information for other parts of the scene [39]. Table 2 gives definitions of the well-known visual cues and lists which visual cues could possibly provide absolute depth information in addition to providing relative depth information. It should become clear that only a few depth cues provide absolute depth information and hence contribute to absolute distance judgements.

3.2.3 Visual Cues Relevant in Computer Graphics

We already mentioned that humans have a fairly accurate sense of distance in real environments, at least up to 15-20 meters. We also talked about the fact that in computer graphics people usually underestimate those judgements, and average results range from 50% to 70% of the actual distance. Obviously, in real environments, humans have the ability to retrieve absolute depth information, which allows them to locate their position in the three-dimensional world and also to judge their distance from existing objects. In synthetic environments, on the other hand, there are many problems.
As a result of these problems, many of these absolute depth cues are not available or are not directly relevant.

Table 2: Definitions of visual cues commonly cited to relate to depth perception.

  accommodation (absolute) – The change occurring in the shape of the lens of the eye when one focuses on objects at different distances. An object's distance can be determined from the amount of accommodation necessary to focus one's eye on the object.
  convergence (absolute) – The angle between the eyes' direction of gaze. An object's distance can be determined by the angle of convergence necessary to focus one's eyes on the object.
  binocular disparity (relative) – The difference in the relative position of the projections of an object on the retinas of our two eyes. The amount of depth perceived depends on the disparity between the two projections.
  motion parallax (absolute, relative) – The distinction in the motion of pairs of points due to their different positions relative to the eye's point of focus.
  aerial perspective (?absolute, relative) – The systematic differences in the color and contrast of objects occurring as a result of the moisture and impurities in the atmosphere.
  linear perspective (relative) – Parallel lines in 3D space converge to a vanishing point on the horizon in 2D space.
  relative size (relative) – The difference between the projected retinal size of objects that are physically similar in size but situated at different spatial positions.
  familiar size (absolute) – When an object's size is known to the observer, the absolute depth information for the distance of the object is recoverable from size-distance constancy (relative size becomes familiar size).
  position relative to the horizon (?absolute, relative) – If the horizon is visible, one can determine the distance from an observer to any point on the ground plane, using eye-height and the horizon angle.
  shading and shadows (relative) – Variations in the amount of light reflected from a surface can visually specify the orientation and shape of the surface.
  texture gradients (?absolute, relative) – Visual gradients produced by textured surfaces determine the spatial orientation of flat and curved surfaces.
  occlusion (relative) – An object in front of another occludes parts of the second object, providing information about depth order.

Accommodation is the change occurring in the shape of the lens of the eye when one focuses on objects at different distances. Figure 9 provides an example of the way accommodation works.

Figure 9: Accommodation is the change occurring in the shape of the lens of the eye. (after "Vision Science" by Stephen Palmer, p.204)

Accommodation is an absolute depth cue, but it is effective only up to 2 meters, and its accuracy strongly declines above 0.5 meter. There is no known way in which we can control for accommodation in conventional computer graphics displays. VisualLABS (http://www.visualabs.com/techexp.html) provides special 3D display technology which allows each and every pixel within an image to appear at its own visual distance from the viewer, thus controlling for accommodation.

Convergence is the angle between the eyes' direction of gaze. An object's distance can be determined by the angle of convergence necessary to focus one's eyes on the object. Figure 10 provides an example of the way convergence works.

Figure 10: Convergence is the angle between the eyes' direction of gaze. (after "Vision Science" by Stephen Palmer, p.205)
Convergence is an absolute depth cue which is effective only up to about 2 meters, strongly declining beyond the first 0.5 meter. An additional problem is that making convergence viable in computer graphics displays is hindered by intrinsic stereo display problems [41].

Binocular disparity (also called binocular stereo) is often considered a powerful absolute depth cue. Actual distance judgements using stereo, though, seem to be rather uncertain and effective only up to a range of a few meters [39]. Problems intrinsic to existing methods for displaying stereo graphics make those displays even less effective than they would otherwise be when viewing real environments [41].

Aerial perspective might actually be a weak source of absolute distance in large scale environments [27], but so far there is no evidence collected showing that people really use it to do so.

Motion parallax cannot be controlled for unless a head-tracked immersive display is used. Even in the cases where such a display is used, it proves to be a weak cue for absolute distance [2].

Position relative to the horizon has the potential for giving us absolute distance, provided that the height of the observer above the ground plane is known. Unfortunately, its effectiveness attenuates with distance, and it is also more effective for judgements of objects around eye-height than ones much taller than eye-height [3, 4]. Also, people use eye-height scaling to judge object size only as long as they perceive themselves to be immersed within the scene [9].

Familiar size is the most general of the agreed-upon absolute depth cues, since it does not depend on information about viewing conditions other than that visual angles be presented correctly. Familiar size relies on the recognition of an object within a scene of interest. If the viewer can not only recognize the object, but also recall its size, h, then she can determine within a reasonable range the egocentric distance from her to the object, d, based on the size-distance relation d = h / tan(a), where a is the visual angle subtended from the top to the bottom of the upright object. The effect of familiar size was shown by Ittelson [18] and later by Epstein [11].

Figure 11: Familiar Size Depth Cue.
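The size-distance relation above can be made concrete with a small worked example. The sketch below (Python, purely illustrative values) recovers an egocentric distance from a remembered object height and a measured visual angle.

```python
import math

def distance_from_familiar_size(known_height, visual_angle_deg):
    """Estimate egocentric distance d from the size-distance relation d = h / tan(a).

    known_height     -- the object's remembered physical height h (any length unit)
    visual_angle_deg -- the visual angle a subtended from the top to the bottom
                        of the upright object, in degrees
    """
    return known_height / math.tan(math.radians(visual_angle_deg))

# Hypothetical numbers: an object remembered as 36 inches tall that subtends
# about 12 degrees of visual angle is judged to be roughly 170 inches away.
print(distance_from_familiar_size(36.0, 12.0))
```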
4 Theoretical Issues Related to Methodology

In order to conduct any experiments, we have to address methodological issues. Ideally, we would like to construct experiments which test the right variables without introducing bias into the data we collect. This section aims to address theoretical issues related to methodology. In particular, it tries to provide background relevant to egocentric distance estimation tasks using computer displays.

4.1 Quantifying the Perception of Distance

Perceived egocentric distance cannot be measured directly; it must be inferred from an observer's response to variables of interest. Knowing how to quantify the observer's perception of distance has proven to be difficult because there might be biases in the reporting. There are different indicators used in measuring perceived egocentric distance; Loomis gives a summary in [21]. The simplest indicators are those based on a direct judgement of perceived distance. One such common direct procedure is numerical estimation reports: the observers estimate egocentric distance in some sort of familiar distance units. Obviously, this type of reporting requires that the observers have an internalized unit of measure. Another common direct procedure is based on some open-loop motor behavior. An example of such a procedure is visually directed walking, in which the observer views a target and then, with eyes closed, attempts to walk to its location (the target is removed). These motor-behavior responses assume that the observer's response is calibrated to perception and not driven by prior knowledge. In fact, results of experiments using some sort of open-loop motor behavior indicate an absence of systematic error, at least when walking to targets up to 20 meters away [20, 29]. A different type of indicator uses indirect estimates obtained from judgements of other perceptual variables. These were developed by Gogel in 1976 out of concern for the contamination possible with numerical estimation procedures [14]. Indirect estimates involve tasks in which subjects perform actions that are scaled by their perception of three-dimensional space [30, 39, 20].

4.2 Field of View, Visual Angle, Picture Frame, Immersion

Another important issue relevant to computer displays is field of view. Field of view is defined as the whole area that you can see without turning your head, or that you can see from a particular position [6]. The human field of view for each eye can be thought of as a cone of vision, though in computer graphics this cone of vision becomes a pyramid of vision because of the rectangularity of the computer monitor screen [25]. In everyday viewing with two eyes, we have a field of view close to 208 degrees. Obviously, there is a lot of overlap between what our left eye and our right eye see; in fact that overlap accounts for 130 degrees of the human field of view [12]. Therefore an eye alone sees about 170 degrees. When looking at a monitor under full field of view conditions, the visual angle subtended by the monitor at the back of our eye is just a portion of what the eye sees. The closer we get to the monitor, the bigger the subtended visual angle gets. Figure 12 provides an example.

Figure 12: Effect of different visual angles.

Thus, a bigger portion of the field of view is filled by the image displayed on the monitor (see Figure 13). Therefore, the field of view will affect the observer's awareness of the picture frame on which we display our rendered images.

Figure 13: The effect of increasing picture frame on the field of view.
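The visual angle subtended by a displayed image follows directly from the image's physical extent and the viewing distance. The sketch below shows the computation; the numbers are purely illustrative and are not the measurements reported later for the experimental apparatus.

```python
import math

def display_visual_angle(image_extent, viewing_distance):
    """Visual angle (in degrees) subtended by an image of a given physical extent
    seen from a given viewing distance (both in the same units)."""
    return math.degrees(2.0 * math.atan((image_extent / 2.0) / viewing_distance))

# Illustrative numbers only: a 14-inch-wide image viewed from 14 inches subtends
# about 53 degrees; viewed from 28 inches it subtends about 28 degrees.
print(display_visual_angle(14.0, 14.0))
print(display_visual_angle(14.0, 28.0))
```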
There are experiments providing evidence that awareness of the surrounding picture frame reduces the amount of perceived depth within the viewed scene [10]. A study by Yang et al. shows that perceptual and cognitive awareness of the conventional computer display is likely to conflict with the intended awareness of the represented scene [47], because the observer can position the computer screen accurately within the real environment. Pirenne [28] provides evidence that awareness of the two-dimensional picture surface interferes with the perceived three-dimensionality of the represented scene.

Immersion is defined as the objective viewing context provided by the display: whether or not the displayed environment surrounds the observer, and whether or not the ground plane is below the observer's feet [9]. Dixon et al. [9] claim that non-immersive pictorial displays are inherently ambiguous and there is no guarantee that observers will perceive them as full-scale representations of reality. On the other hand, studies by Yang et al. demonstrate that perception in virtual environments under conditions of minimal subsidiary awareness is subject to exactly the same analysis as the perception of real environments [47]. Therefore, if we want to get correct egocentric distance estimates using computer displays, we have to find ways to reduce or eliminate awareness of the picture frame. Smith et al. propose a setup using a restrictive aperture in conjunction with monocular viewing, or one using stereoscopic display techniques [36]. Both would greatly reduce the perceptual conflict and thus reduce picture frame awareness.

Another way of achieving immersion is the use of immersive virtual displays like the CAVE or HMDs. The CAVE technique encloses the observer within a cube of multiple screens projecting stereoscopic images. In a CAVE, one has unrestricted, wide field-of-view observation, but limited mobility. A more common technique, the HMD (head-mounted display), is usually used in conjunction with head-tracking. In an HMD, one has a narrower field of view, but no distinct visual awareness of the occluder that limits the field of view. In addition, the scene can be scanned using head motion, and there is usually greater mobility. Studies by Barfield, Heter, Loomis, and others show that the sense of immersion, especially when the environment is coupled with realistic imagery and a high degree of interactivity between the observer and the virtual environment, promotes the experience of really being present in the virtual world [1, 15, 16, 19, 35] (presence is the sensation of actually being in a virtual environment).

4.3 Immersion and Monocular Viewing

We are using a visually semi-immersive setup which incorporates monocular viewing coupled with a restrictive viewing aperture (narrower field of view) to control for viewing conditions. We know from Servos [33] that when subjects make a purely perceptual distance estimation in which no time constraints are present, the monocular system is not at a disadvantage relative to the binocular system. His experiments also find that the monocular condition is just as accurate as the binocular condition when subjects perform those distance judgements using verbal estimation reporting.

4.4 Eye-Height

Another issue relevant to distance estimation is eye-height – how far above the ground our eyes are positioned. There is evidence that observers use absolute eye-height scaling for the judgement of object size. Wraga et al. [44, 45] show that the visual system can recalibrate its use of eye-height information. Another paper by Wraga [46] shows that the effectiveness of eye-height scaling is mediated by the object height. Previous studies of distance judgements using non-immersive displays show that observers can make relative-size judgements, but not absolute-size judgements [3, 4, 32]; non-immersive displays do not readily elicit the use of eye-height scaling. Furthermore, regardless of display type, unless observers perceive that they are standing on the same ground as the object, they are not able to use absolute-height scaling effectively [9]. On the other hand, observers used the geometry of eye-height to scale virtual object heights regardless of distance [9]. Therefore, effective eye-height scaling can occur in virtual environments, and this result was confirmed even for HMDs with a reduced field of view.

5 Experiments

5.1 Hypothesis, Motivation, Goal

A synthetic, computer generated environment is made of computer generated objects.
The appearance of this synthetic environment and the objects that are in it, just as the appearance of the real environment and the real objects that are in it, is a function of different visual properties. In computer graphics those visual properties are controlled by three variables – geometry, materials, and illumination. A qualitative variation in any one of these variables leads to a distinctly different visual result. Our hypothesis states that the way these visual properties carry and preserve spatial information could be the key to improving human egocentric depth perception when viewing computer generated images. We believe that an improvement in the overall visual quality of the computer generated objects should lead to better egocentric depth perception.

The motivation for this research arose from considering the differences between photographs of real world environments and computer graphics renderings of similar environments. While controlled experiments have yet to be done, we believe that the former often convey a good sense of scale and distance, while the latter seldom do. To our knowledge, there is no research exploring whether the quality of the different visual properties in computer graphics images relates to absolute distance judgements. There are several papers addressing the perception of spatial relationships (relative distance judgements) as a function of rendering quality [40, 31].

This research is an attempt at making steps towards testing our hypothesis. The goal of this work is to establish possible positive correlation effects between improved visual quality of the computer graphics models and the accuracy of egocentric depth perception. In this pilot study we vary two of the visual properties – the quality of the objects' geometry and the quality of illumination – and keep the third, materials, constant throughout the trials.

5.2 Methodology

5.2.1 Participants

16 subjects (6 men, 10 women) from Mount Holyoke College and the University of Massachusetts, Amherst, participated in this experiment. All had normal or corrected-to-normal vision.

5.2.2 Apparatus

Images were generated on a Dell Workstation PWS420 running Windows 2000, with an ELSA Gloria II Pro graphics card. They were displayed on a Dell P1110 21-inch color monitor. The viewing region was approximately 16 inches wide and 12 inches high, with a resolution of 1600 x 1200 pixels. The image size was 1404 pixels wide and 1200 pixels high. Subjects viewed images on the monitor monocularly through a 2.5-inch-wide by 2-inch-high by 1.5-inch-long tube attached to the center of an occluding box. Their eye was positioned approximately 11.5 inches from the display screen. The dimensions of the box were approximately 16 inches wide by 12 inches tall by 10 inches long. The box occluded the edges of the display, fixed the field of view, and also fixed the viewing position. Figure 14 provides an example of the box, and Figure 15 provides an example of the setup. The display subtended 54.3 degrees of visual angle horizontally and 46.2 degrees of visual angle vertically. Monitor brightness and contrast were held constant for all subjects.

Figure 14: The occluding box attached to the display monitor.

The monitor with the attached viewing box was positioned at a height of 60 inches. Thus the center of the monitor display was at an eye-height of 68 inches.

Figure 15: A side view of the experimental setup.
5.2.3 Design and Stimuli

Maya v3.0, a three-dimensional modeling and rendering environment developed by Alias|Wavefront, was used to create and render the virtual scenes. The Maya model was built after an existing room, Clapp 416. The virtual room includes objects like chairs, tables, shelves, and windows. The objects' surface properties were determined by Phong and Blinn illumination models. The objects' surface textures were created by taking photographs of the real objects present in the room, manipulating them in Photoshop v5.5 by Adobe, and then applying them to the surface properties of the geometry.

Subjects looked at 24 image renderings of the virtual scene. The renderings had a 2 (illumination) x 2 (geometry) x 3 (distance) variation, and the resulting 12 unique images were each shown twice in a randomized fashion, for a total of 24 images. There was also a randomization of the estimation target: there were 6 chair estimation targets, and each of the 12 unique images had an estimation target randomly chosen from the pool of 6 possible (a sketch of one way such a randomized sequence could be generated is given at the end of this section). Subjects were standing, and images were rendered with a viewpoint appropriate to the subject's eye height and with a view frustum appropriate to the actual viewing conditions. The 24 images were displayed in a sequence and in a fixed position on the screen by IrfanView v3.35, an image viewer by Irfan Skiljan.

Eye height was measured for each subject, and a stepping box was adjusted so that, by stepping on the box, the subject looked into the occluding box at the right eye-height. For instance, if the subject's eye-height was 62 inches, then the stepping box was approximately 6 inches tall (the center of the monitor display was positioned at 68 inches). This setup was visually semi-immersive. It compensated for eye height, so the actual and intended eye-height were the same. Accommodation was fixed at the wrong depth (because of the distance of the computer display) and is a possible confound. Binocular stereo was missing. Familiar size was utilized by using chairs as target objects.
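The following sketch shows one way the randomized sequence described above could be generated: 12 unique condition combinations, each assigned a target chair at random from the pool of 6, duplicated, and shuffled into 24 trials. It is an illustration of the design in Python, not the procedure actually used to prepare the image sequences.

```python
import itertools
import random

def build_trial_list(seed=None):
    """Build a randomized 24-trial list for the 2 x 2 x 3 design:
    12 unique condition combinations, each with a randomly assigned target chair,
    with every unique image shown twice in shuffled order."""
    rng = random.Random(seed)
    conditions = list(itertools.product(
        ["simple", "complex"],           # geometry
        ["simple", "complex"],           # illumination
        ["short", "middle", "long"]))    # target distance
    # Assign each of the 12 unique images a target chair drawn from the pool of 6.
    unique_images = [cond + (rng.randrange(1, 7),) for cond in conditions]
    trials = unique_images * 2           # every unique image is shown twice
    rng.shuffle(trials)
    return trials

for geometry, illumination, distance, chair in build_trial_list(seed=1):
    print(geometry, illumination, distance, "chair", chair)
```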
5.2.4 Experimental Protocol

Each subject was asked to step on the adjusted stepping box and look at an image through the occluding box for as much time as she felt was needed. She was given directions to look at a particular chair in the virtual room, and was asked to try to get a sense of how far away the closest edge of the target chair was from where she was standing. Then she was asked to step down from the stepping box and line up with a line on the floor. The experimenter held a pole, and the subject was instructed to tell her to move the pole towards or away from her until she believed the pole was positioned at the same distance as the edge of the target chair in the image she had just looked at (Figure 16 gives an example of the setup). The distance was measured and recorded. She was then instructed to look at the next image and perform another perceptual estimate, until she had looked at all 24 images.

Figure 16: A top view of the experimental setup.

5.2.5 Statistical Methods

The statistical method used for the data analysis is OLS (Ordinary Least Squares) regression with robust standard errors and clustering by individual.

5.3 Results

5.3.1 Collected data

The following listing of figures visualizes all collected data per subject per specific image. Each figure is a data report within a specific eye-height. The figures are grouped by eye-height for simplicity of visualization, and also because, within a specific eye-height, subjects looked at the same images in the same order. The figures show the range of the subjects' responses when viewing exactly the same image. There were four eye-heights within the subject pool: 60 inches, 62 inches, 64 inches, and 66 inches.

The perceptual egocentric distance estimates are displayed as the ratio reportedDistance / realDistance, which yields the percentage correctness of the estimate. The closer the estimates are to 1 (a perfect hit, 100% correctness), the more accurate the distance estimates are. Each data report is organized in three groups. The order of the data plot by independent variable for each of the three groups is: simple geometry/simple illumination, simple geometry/complex illumination, complex geometry/simple illumination, complex geometry/complex illumination. Group one is the data collected for the longest tested distance – approximately 168 inches. Group two is the data collected for the middle distance – approximately 85 inches. Group three is the data collected for the shortest tested distance – approximately 42 inches. Since subjects perform estimations over the same set of 12 images twice, a Large dataset and a Small dataset are introduced to visualize the different reported distances to the same presented image: Large dataset refers to the larger of the two reported distances for a specific image, and Small dataset to the smaller one (because two observations per image were recorded).

There were 3 subjects within eye-height 60. Figure 17 is the data collected for subjects of eye-height of 60 inches, corresponding to a body-height of approximately 64 inches. There were 3 subjects within eye-height 62. Figure 18 is the data collected for subjects of eye-height of 62 inches, corresponding to a body-height of approximately 66 inches. There were 7 subjects within eye-height 64. Figure 19 is the data collected for subjects of eye-height of 64 inches, corresponding to a body-height of approximately 68 inches. There were 3 subjects within eye-height 66. Figure 20 is the data collected for subjects of eye-height of 66 inches, corresponding to a body-height of approximately 70 inches.

A statistical analysis of the data with an OLS regression clustered by person, with dependent variable |real distance – reported distance| and explanatory variables geometry, illumination, and height, showed a statistically significant effect for geometry (p-value = 0.032, standard error = 1.63 inches, coefficient = 3.85 inches) and a statistically significant result for illumination (p-value = 0.005, standard error = 1.16 inches, coefficient = -3.8 inches). Height was not significant (p-value = 0.845, standard error = 0.91 inches, coefficient = -0.18 inches). (A p-value of 0.05 means that there is a 1 in 20 chance that the statistical result was due to random variation in the data; p < 0.05 is required for a factor to be considered significant. The standard error measures the potential for a random error in the estimate.)
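For readers who want to reproduce this kind of analysis, the following is a minimal sketch of an OLS regression with standard errors clustered by subject, using Python's statsmodels rather than the software used for the original analysis. The file and column names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical layout of the collected data: one row per trial, with columns
# subject, geometry, illumination, height, real, reported.
df = pd.read_csv("distance_estimates.csv")
df["abs_error"] = (df["real"] - df["reported"]).abs()

# OLS of |real - reported| on geometry, illumination, and height,
# with standard errors clustered by subject.
model = smf.ols("abs_error ~ geometry + illumination + height", data=df)
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["subject"]})
print(result.summary())
```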
variable         coefficient (inches)   standard error (inches)   p-value
height           2.39                   1.38                      0.103
geometry         18.07                  10.97                     0.12
illumination     9.73                   6.95                      0.182
constant (α)     129.61                 86.90                     0.157
Table 3: Chair number one, 58 observations, R2 = 0.17

variable         coefficient (inches)   standard error (inches)   p-value
height           -3.9                   1.35                      0.011
geometry         -31.08                 8.69                      0.003
illumination     -18.74                 8.68                      0.0047
constant (α)     304.6                  84.17                     0.003
Table 4: Chair number two, 76 observations, R2 = 0.27

variable         coefficient (inches)   standard error (inches)   p-value
height           -2.86                  1.26                      0.038
geometry         15.79                  7.09                      0.042
illumination     -7.87                  6.85                      0.269
constant (α)     211.05                 80.6                      0.019
Table 5: Chair number three, 69 observations, R2 = 0.14

variable         coefficient (inches)   standard error (inches)   p-value
height           -4.17                  3.31                      0.227
geometry         18.43                  9.19                      0.063
illumination     -14.28                 14.28                     0.333
constant (α)     293.68                 214.06                    0.19
Table 6: Chair number four, 52 observations, R2 = 0.08

variable         coefficient (inches)   standard error (inches)   p-value
height           0.59                   1.78                      0.744
geometry         -4.15                  10.17                     0.689
illumination     4.75                   15.56                     0.764
constant (α)     -0.58                  105.26                    0.996
Table 7: Chair number five, 44 observations, R2 = 0.04

variable         coefficient (inches)   standard error (inches)   p-value
height           3.11                   1.36                      0.037
geometry         13.29                  4.31                      0.007
illumination     -22.84                 4.61                      0.00
constant (α)     -153.44                86.44                     0.096
Table 8: Chair number six, 85 observations, R2 = 0.33

5.3.2 Discussion

Results from OLS regression clustered by person

The strongest result apparent from these data is that we observed the same egocentric distance compression (approximately 50% to 70%) found in other studies.
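As an illustration, such a compression figure can be read directly off the reportedDistance / realDistance ratios. The short sketch below (file and column names hypothetical, as before) computes the mean ratio overall and broken down by the independent variables; a mean between 0.5 and 0.7 corresponds to 50% to 70% compression.

import pandas as pd

df = pd.read_csv("distance_estimates.csv")   # hypothetical file name
df["ratio"] = df["reported_distance"] / df["real_distance"]

# Overall compression of the egocentric distance estimates.
print(df["ratio"].mean())

# The same summary broken down by the independent variables.
print(df.groupby(["geometry", "illumination", "real_distance"])["ratio"].mean())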
While all of the results were compressed, a statistical analysis suggested that a small amount of that compression might depend on changes in the independent variables. Clearly, much more research is necessary before any robust conclusions can be made.

We found that good geometry leads to a small increase (3.85 inches) in the egocentric distance error compared with bad geometry. That is, better geometry seems to make people perceive the target objects as even closer than they actually are. Our hypothesis predicted the exact opposite result: that better geometry, by improving the visual quality of the objects in the scene, would make people perceive the target objects as farther away and closer to the real distance. A possible explanation is that when people look at the better geometry in the target objects, their visual system suddenly observes a lot of sharp, focused detail. As a result, the visual system may assume that the object must be even closer; otherwise, why would it be seen so clearly, sharply, and in focus? This could happen because of the closeness of the computer display and the fact that the rendering process introduced no blurring of distant objects. Much additional exploration is necessary to confirm this explanation.

We also found that good illumination leads to a small decrease (3.8 inches) in the egocentric distance error compared with bad illumination. That is, better illumination seems to make people perceive the target objects as farther away than when the same objects are viewed under bad illumination. This is exactly what our hypothesis predicted.

Results from OLS regression clustered by target chair

Spurred by our suspicion that the type of chair used as a target object might have introduced a bias in the distance estimation, we ran a statistical analysis clustered by chair type. This analysis also yields an R-squared value, a measure of the proportion of variability in the data explained by the model; it lies between zero and one, and a value close to zero suggests a poor model. The huge range of R2 values we obtained (from 4% to 33%) supports our suspicion that the particular chair type might have affected the egocentric distance estimation tasks. Some chairs show a statistically significant effect of geometry on the distance estimation task (for instance chairs two, three, and six), and some show a statistically significant effect of illumination (for instance chairs two and six). Interestingly, there is also a statistically significant effect of the subject's eye height, again for chairs two, three, and six. Figures 21 through 26 provide image renderings of the chairs used in the experiment.

Once again, we do not want to jump to generalized conclusions. The data collected are limited and were not created for the purpose of exploring this particular hypothesis. Clearly, much more research is necessary to make any robust conclusions.

Figure 17: Data collected per image per subject for eye-height of 60 inches.
Figure 18: Data collected per image per subject for eye-height of 62 inches.
Figure 19: Data collected per image per subject for eye-height of 64 inches.
Figure 20: Data collected per image per subject for eye-height of 66 inches.
Figure 21: Distance Estimation Target one – Chair one
Figure 22: Distance Estimation Target two – Chair two
Figure 23: Distance Estimation Target three – Chair three
Figure 24: Distance Estimation Target four – Chair four
Figure 25: Distance Estimation Target five – Chair five
Figure 26: Distance Estimation Target six – Chair six

6 Conclusion and Future Work

6.1 Overview

This research addresses a complicated computer graphics problem: how to improve computer graphics so that synthetically generated objects and environments appear to be at the right distance from the viewer. In other words, we would like to know how to improve people's egocentric depth perception in computer graphics. Our hypothesis states that the way the visual properties of computer generated environments carry and preserve spatial information is the key to improving human egocentric depth perception in computer graphics. We believe that an improvement in the overall visual quality of computer generated objects and environments should lead to better egocentric depth perception.

This research approached the problem by setting up an experiment that attempted to establish possible positive correlations between improved visual quality of the computer graphics models and the accuracy of egocentric depth perception. The experiment varied two of the known visual properties, the quality of the objects' geometry and the quality of illumination, and kept the third, materials, constant throughout the trials. A Maya model was built after an existing room; the model, a virtual room, included objects such as chairs, tables, shelves, and windows.
Human subject tests were set up in which 16 subjects looked at 24 rendered images from the virtual environment. The images varied in quality of geometry, quality of illumination, and egocentric distance (3 distances). The strongest result found in the experimental data is that we observed the same egocentric distance compression already found in other studies (50% to 70%). A statistical analysis of the data also suggested that a small amount of that compression might depend on changes in the independent variables. In the case of better illumination, the results were as predicted: better illumination led to better egocentric distance estimation. In the case of better geometry, the results were not as predicted: better geometry led to worse egocentric distance estimation. This surprising result was tentatively explained by the physical setup we used and by the uniformly sharp rendering of the virtual scene.

6.2 Future Work

The complicated problem we are posing is far from being answered. Despite having an overall statistically significant result for geometry and illumination from the collected data, we are hesitant to draw any generalized conclusions. Clearly, much additional work is needed before we can claim to have solved this problem. It is our hope that the issues brought up by this first effort will help us improve our physical setup, our virtual model, and our methodology.

Several methodological problems with the experimental design became apparent over the course of this research. Future work should try to avoid these biases and aim for a simpler, more robust experimental dataset with a higher number of test subjects and repeated trials. A possible improvement in the physical setup would be to use a high-quality head-mounted display (HMD) instead of a visually semi-immersive occluding box with a computer display. Introducing binocular vision and interactivity might also prove helpful. Also, if possible, materials should be incorporated as one of the varied properties.

6.2.1 Problems with the Physical Setup

Accommodation. Accommodation is effective up to 80 inches, but its accuracy strongly declines above 20 inches. In this particular setup, the distance between the eye and the monitor display was 11.5 inches, because we tried to have a reasonable field of view. It is very probable that an accommodation bias was introduced. Future research should try to avoid that bias as much as possible by using a larger monitor, by increasing the eye-to-monitor distance at the cost of a narrower field of view, or by using display technology that controls for accommodation.

Display Resolution. The display resolution, even at 1200 x 1600 pixels, is still far less than the foveal resolution of the eye when observing real, physical objects in natural environments. This problem is not well researched, and it is unclear what the possible effects on the visual system are. There is not much we can do about it but hope for better and bigger monitors in the future.

6.2.2 Problems with the Virtual Model

Angled-down rotation. For some of the rendered images, the viewing direction was angled down slightly from the horizontal. In retrospect, that was a mistake. The problem arose because we wanted to test distances ranging from 40 inches to approximately 180 inches, but because the field of view was not wide or long enough, we ended up angling the rendered view down so that the target objects could be in full view. Time constraints did not allow for a re-rendering of the dataset scenes without the introduced angled-down rotation bias. Future research should avoid similar problems by using taller objects, using a wider field of view, or testing longer distances. Another possibility is the use of an HMD, which would allow head rotation so that shorter distances could be explored more fully.
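The field-of-view constraint behind these compromises can be made concrete with a rough visual-angle calculation. The sketch below assumes a hypothetical visible display area of roughly 12 by 16 inches (the physical display size is not recorded here); the 11.5-inch viewing distance is the one described above, and the numbers are illustrative only.

import math

def fov_degrees(extent_inches, distance_inches):
    # Visual angle subtended by a display extent viewed from a given distance.
    return math.degrees(2 * math.atan((extent_inches / 2) / distance_inches))

eye_to_screen = 11.5            # inches, as in the setup described above
width, height = 12.0, 16.0      # assumed visible display size (hypothetical)

print(fov_degrees(width, eye_to_screen))    # about 55 degrees horizontally
print(fov_degrees(height, eye_to_screen))   # about 70 degrees vertically
print(fov_degrees(width, 23.0))             # doubling the distance: about 29 degrees

Under these assumptions, doubling the eye-to-monitor distance roughly halves the field of view, which is the tradeoff noted in the accommodation discussion above.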
Geometry and Volume Consistency. Difficulties arose from the practical execution of building bad geometry. We know how to build good geometry, because we have examples of it in the real world. But what constitutes "bad" geometry is a difficult thing to decide, particularly when there is no prior research and no data to compare our models against. The simplification has to stop at least at a level where objects remain recognizable, so that the familiar-size absolute depth cue can still work. Another problem was keeping the overall volume of the simple and the complex models similar. The current models have such a potential problem, because the possible dependence of the models on overall volume was not noticed until the images were rendered. We do not know how important this overall volume consistency is, but we presume it matters for the familiar-size absolute depth cue to work. Future work should explore this potential problem further, and models should be built and rendered accordingly.

Target Objects Variation. It appears that the different chairs we used, by the mere fact that they differed in size, color, and shape, might have affected the egocentric distance estimations. Familiar size was one of the absolute depth cues we relied on, and it appears that it might be affected by those variations. Much more exploration is needed to confirm that hypothesis.

6.2.3 Problems with Methodology

Future research should set stricter limits on the independent variables. This research was difficult to analyze because, in the end, there were many potential variables (for example, 6 target chairs). There was not enough data, and there was not enough intra-variable consistency. Future research should try to have less variation in the tested independent variables, and also less intra-variable variation.

Appendix A

Maya is a 2D/3D animation, modeling, and rendering environment developed by Alias|Wavefront. It was used to create and render the virtual scenes for this study. A snapshot of Maya is shown in Figure 27.

Figure 27: A snapshot of Maya.

An example of Maya's scripting language (MEL) follows. The listing is the Maya ASCII scene file (written in MEL syntax) that generated the simple scene of a sphere and a box shown in the snapshot in Figure 27 and in Figure 28.
Figure 28: A sphere and a box generated in Maya by the scripting language, MEL, listed below.

//Maya ASCII 3.0 scene
requires maya "3.0";
currentUnit -l centimeter -a degree -t ntsc;
createNode transform -s -n "persp";
setAttr ".v" no;
setAttr ".t" -type "double3" 12.726063667126587 10.366793139266784 25.721265893462594;
setAttr ".r" -type "double3" -21.338352729602757 18.200000000000237 4.1850634926034226;
createNode camera -s -n "perspShape" -p "persp";
setAttr -k off ".v" no;
setAttr ".fl" 34.999999999999993;
setAttr ".coi" 28.144424748929119;
setAttr ".imn" -type "string" "persp";
setAttr ".den" -type "string" "persp_depth";
setAttr ".man" -type "string" "persp_mask";
setAttr ".tp" -type "double3" 8 0 0 ;
setAttr ".hc" -type "string" "viewSet -p %camera";
createNode transform -s -n "top";
setAttr ".v" no;
setAttr ".t" -type "double3" 0 100 0 ;
setAttr ".r" -type "double3" -89.999999999999986 0 0 ;
createNode camera -s -n "topShape" -p "top";
setAttr -k off ".v" no;
setAttr ".rnd" no;
setAttr ".coi" 100;
setAttr ".ow" 30;
setAttr ".imn" -type "string" "top";
setAttr ".den" -type "string" "top_depth";
setAttr ".man" -type "string" "top_mask";
setAttr ".hc" -type "string" "viewSet -t %camera";
setAttr ".o" yes;
createNode transform -s -n "front";
setAttr ".v" no;
setAttr ".t" -type "double3" 0 0 100 ;
createNode camera -s -n "frontShape" -p "front";
setAttr -k off ".v" no;
setAttr ".rnd" no;
setAttr ".coi" 100;
setAttr ".ow" 30;
setAttr ".imn" -type "string" "front";
setAttr ".den" -type "string" "front_depth";
setAttr ".man" -type "string" "front_mask";
setAttr ".hc" -type "string" "viewSet -f %camera";
setAttr ".o" yes;
createNode transform -s -n "side";
setAttr ".v" no;
setAttr ".t" -type "double3" 100 0 0 ;
setAttr ".r" -type "double3" 0 89.999999999999986 0 ;
createNode camera -s -n "sideShape" -p "side";
setAttr -k off ".v" no;
setAttr ".rnd" no;
setAttr ".coi" 100;
setAttr ".ow" 30;
setAttr ".imn" -type "string" "side";
setAttr ".den" -type "string" "side_depth";
setAttr ".man" -type "string" "side_mask";
setAttr ".hc" -type "string" "viewSet -s %camera";
setAttr ".o" yes;
createNode transform -n "sphere";
createNode nurbsSurface -n "sphereShape" -p "sphere";
setAttr -k off ".v";
setAttr ".vir" yes;
setAttr ".vif" yes;
setAttr ".tw" yes;
setAttr ".dvu" 3;
setAttr ".dvv" 3;
setAttr ".cpr" 15;
setAttr ".cps" 4;
createNode transform -n "cube";
setAttr ".t" -type "double3" 8 0 0 ;
setAttr ".s" -type "double3" 10 5 10 ;
createNode transform -n "topnurbsCube1" -p "cube";
createNode nurbsSurface -n "topnurbsCubeShape1" -p "topnurbsCube1";
setAttr -k off ".v";
setAttr ".vir" yes;
setAttr ".vif" yes;
setAttr ".tw" yes;
setAttr ".dvu" 3;
setAttr ".dvv" 3;
setAttr ".cpr" 15;
setAttr ".cps" 4;
createNode transform -n "bottomnurbsCube1" -p "cube";
createNode nurbsSurface -n "bottomnurbsCubeShape1" -p "bottomnurbsCube1";
setAttr -k off ".v";
setAttr ".vir" yes;
setAttr ".vif" yes;
setAttr ".tw" yes;
setAttr ".dvu" 3;
setAttr ".dvv" 3;
setAttr ".cpr" 15;
setAttr ".cps" 4;
createNode transform -n "leftnurbsCube1" -p "cube";
createNode nurbsSurface -n "leftnurbsCubeShape1" -p "leftnurbsCube1";
setAttr -k off ".v";
setAttr ".vir" yes;
setAttr ".vif" yes;
setAttr ".tw" yes;
setAttr ".dvu" 3;
setAttr ".dvv" 3;
setAttr ".cpr" 15;
setAttr ".cps" 4;
createNode transform -n "rightnurbsCube1" -p "cube";
createNode nurbsSurface -n "rightnurbsCubeShape1" -p "rightnurbsCube1";
setAttr -k off ".v";
setAttr ".vir" yes;
setAttr ".vif" yes;
setAttr ".tw" yes;
setAttr ".dvu" 3;
setAttr ".dvv" 3;
setAttr ".cpr" 15;
setAttr ".cps" 4;
createNode transform -n "frontnurbsCube1" -p "cube";
createNode nurbsSurface -n "frontnurbsCubeShape1" -p "frontnurbsCube1";
setAttr -k off ".v";
setAttr ".vir" yes;
setAttr ".vif" yes;
setAttr ".tw" yes;
setAttr ".dvu" 3;
setAttr ".dvv" 3;
setAttr ".cpr" 15;
setAttr ".cps" 4;
createNode transform -n "backnurbsCube1" -p "cube";
createNode nurbsSurface -n "backnurbsCubeShape1" -p "backnurbsCube1";
setAttr -k off ".v";
setAttr ".vir" yes;
setAttr ".vif" yes;
setAttr ".tw" yes;
setAttr ".dvu" 3;
setAttr ".dvv" 3;
setAttr ".cpr" 15;
setAttr ".cps" 4;
createNode lightLinker -n "lightLinker1";
createNode displayLayerManager -n "layerManager";
createNode displayLayer -n "defaultLayer";
createNode renderLayerManager -n "renderLayerManager";
createNode renderLayer -n "defaultRenderLayer";
createNode renderLayer -s -n "globalRender";
createNode makeNurbSphere -n "makeNurbSphere1";
setAttr ".r" 8;
setAttr ".s" 20;
setAttr ".nsp" 20;
createNode makeNurbCube -n "makeNurbCube1";
setAttr ".ax" -type "double3" 0 1 0 ;
setAttr ".u" 5;
setAttr ".v" 5;
createNode script -n "uiConfigurationScriptNode";
setAttr ".b" -type "string" (
setAttr ".st" 3;
createNode script -n "animationScriptNode";
setAttr ".a" -type "string" ( "playbackOptions -min 1 -max 150 -animationStartTime 1 -animationEndTime 160;");
select -ne :time1;
setAttr ".o" 1;
select -ne :renderPartition;
setAttr -s 2 ".st";
select -ne :renderGlobalsList1;
select -ne :defaultShaderList1;
setAttr -s 2 ".s";
select -ne :postProcessList1;
setAttr -s 2 ".p";
select -ne :lightList1;
select -ne :initialShadingGroup;
setAttr -s 7 ".dsm";
setAttr ".ro" yes;
select -ne :initialParticleSE;
setAttr ".ro" yes;
select -ne :defaultRenderGlobals;
setAttr ".fs" 1;
setAttr ".ef" 10;
select -ne :hyperGraphLayout;
setAttr ".cch" no;
setAttr ".ihi" 2;
setAttr ".nds" 0;
setAttr ".img" -type "string" "";
setAttr ".ims" 1;
connectAttr "makeNurbSphere1.os" "sphereShape.cr";
connectAttr "makeNurbCube1.os" "topnurbsCubeShape1.cr";
connectAttr "makeNurbCube1.os1" "bottomnurbsCubeShape1.cr";
connectAttr "makeNurbCube1.os2" "leftnurbsCubeShape1.cr";
connectAttr "makeNurbCube1.os3" "rightnurbsCubeShape1.cr";
connectAttr "makeNurbCube1.os4" "frontnurbsCubeShape1.cr";
connectAttr "makeNurbCube1.os5" "backnurbsCubeShape1.cr";
connectAttr ":defaultLightSet.msg" "lightLinker1.lnk[0].llnk";
connectAttr ":initialShadingGroup.msg" "lightLinker1.lnk[0].olnk";
connectAttr ":defaultLightSet.msg" "lightLinker1.lnk[1].llnk";
connectAttr ":initialParticleSE.msg" "lightLinker1.lnk[1].olnk";
connectAttr "layerManager.dli[0]" "defaultLayer.id";
connectAttr "renderLayerManager.rlmi[0]" "defaultRenderLayer.rlid";
connectAttr "lightLinker1.msg" ":lightList1.ln" -na;
connectAttr "sphereShape.iog" ":initialShadingGroup.dsm" -na;
connectAttr "topnurbsCubeShape1.iog" ":initialShadingGroup.dsm" -na;
connectAttr "bottomnurbsCubeShape1.iog" ":initialShadingGroup.dsm" -na;
connectAttr "leftnurbsCubeShape1.iog" ":initialShadingGroup.dsm" -na;
connectAttr "rightnurbsCubeShape1.iog" ":initialShadingGroup.dsm" -na;
connectAttr "frontnurbsCubeShape1.iog" ":initialShadingGroup.dsm" -na;
connectAttr "backnurbsCubeShape1.iog" ":initialShadingGroup.dsm" -na;

Appendix B

Rendered images from the modeled room used in this study follow. Figure 29 is an example of the scene rendered with bad geometry and bad illumination. Figure 30 is an example of the scene rendered with bad geometry and good illumination.
Figure 31 is an example of the scene rendered with good geometry and bad illumination. Figure 32 is an example of the scene rendered with good geometry and good illumination.

Figure 29: An example of the modeled room. This example has bad geometry and bad illumination.
Figure 30: An example of the modeled room. This example has bad geometry and good illumination.
Figure 31: An example of the modeled room. This example has good geometry and bad illumination.
Figure 32: An example of the modeled room. This example has good geometry and good illumination.