Interaction Without Gesture or Speech – A Gaze Controlled AR System

Susanna Nilsson
Linköping University, SE-581 83 Linköping, Sweden
susni@ida.liu.se

Abstract
Even though gaze control is relatively well known as a human-computer interaction method, it is not a widely used technique, partly due to several usability issues. However, there are applications that can be improved by the use of eye gaze control. Combining concepts from the Augmented Reality (AR) research domain with gaze recognition and gaze input makes it possible to create quick and easy interaction. In this poster, one such application is described: an AR system for instructional use, with a gaze attentive user interface.

Key words: Augmented Reality, Mixed Reality, Gaze Control, Gaze Interaction, Gaze Awareness

1. Introduction
In Augmented Reality (AR) systems, real and virtual objects are merged, aligned relative to a real environment, and presented in the field of view of a user. The user can either be a passive recipient of information or an active part of an interaction process. AR applications that give hierarchical instructions to users often require some feedback from the user in order to move to the next step in the instructions. It should be possible to give this feedback quickly and without interrupting the ongoing task. AR systems that use gestures and speech for interaction have been developed [1, 2, 3]. However, there are situations where speech and gestures may not be appropriate. This poster presents an AR system with an integrated gaze tracker, allowing quick feedback from the user to the system, as well as analysis of the user's gaze behaviour.

2. Description of the system
The AR system described here uses hybrid marker tracking technology based on ARToolKit, ARToolKit Plus and ARTag, with the addition of a 3DOF inertial tracker (InterSense and Xsens) [4]. The software has been developed to permit an application developer to define applications and scenario files in XML syntax. To allow gaze recognition, two black-and-white cameras have been integrated into the helmet-mounted display (see figure 1). The displays have a resolution of 800 x 600 pixels and a field of view of 37 x 28 degrees.

The gaze tracking is based on the dark pupil principle, which in this system is implemented by placing a Near Infra Red (NIR) illumination source next to the optical axis of the gaze recognition camera. The camera detects the pupil and its reflections by filtering and thresholding the image information. The position of the pupil and the positions of the reflections on the cornea caused by the NIR illumination are calculated.

Figure 1. To the left, a head-mounted gaze controlled AR system; top right, the gaze pattern of a user working with the gaze interaction dialogue seen in figure 2. The bottom right image shows the gaze tracker's view of the user's eye and the NIR reflection in the pupil [6].

Four NIR light sources are used, as can be seen in the bottom right image in figure 1. However, only two reflections are needed for the calculation of the gaze. The system can choose between these four reflections, which increases the robustness of the system.
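The dark pupil computation described above can be illustrated with a short sketch. The code below is not the implementation used in the described system; the use of OpenCV, the threshold values and the blob-size limits are assumptions made for illustration only. It finds the pupil as the largest dark blob in the eye camera image and the NIR corneal reflections (glints) as small bright blobs:

```python
# Illustrative sketch of dark-pupil detection by thresholding (not the
# implementation used in the described system). Assumes OpenCV 4.x;
# threshold values and blob-size limits are arbitrary example figures.
import cv2
import numpy as np

def detect_pupil_and_glints(eye_gray: np.ndarray):
    """Return (pupil_center, glint_centers) from a grayscale eye-camera image."""
    blurred = cv2.GaussianBlur(eye_gray, (7, 7), 0)

    # Pupil: with the NIR source placed next to (off) the camera's optical
    # axis, the pupil appears as the darkest region of the image.
    _, dark = cv2.threshold(blurred, 40, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(dark, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None, []
    pupil = max(contours, key=cv2.contourArea)            # largest dark blob
    m = cv2.moments(pupil)
    if m["m00"] == 0:
        return None, []
    pupil_center = (m["m10"] / m["m00"], m["m01"] / m["m00"])

    # Glints: the corneal reflections of the NIR sources show up as small,
    # very bright spots near the pupil.
    _, bright = cv2.threshold(blurred, 220, 255, cv2.THRESH_BINARY)
    glint_contours, _ = cv2.findContours(bright, cv2.RETR_EXTERNAL,
                                         cv2.CHAIN_APPROX_SIMPLE)
    glints = []
    for c in glint_contours:
        if 2 < cv2.contourArea(c) < 100:                  # reject noise and large highlights
            (gx, gy), _ = cv2.minEnclosingCircle(c)
            glints.append((gx, gy))

    return pupil_center, glints
```

A calibrated mapping from the resulting pupil-glint vectors to display coordinates then gives the point of gaze; having four reflections to choose from, as described above, keeps the estimate usable when individual reflections are occluded or distorted.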
Interaction dialogue
In the developed system, eye gaze interaction can be restricted both temporally and spatially: only certain parts of the display have the interaction function, and only when there is a need for gaze interaction. The application designer can define the layout of the gaze control dialogue areas, as well as gaze action specifications and dwell times, directly in the XML file, without any changes to the general AR system. The positions of the gaze dialogue areas can either be fixed in certain regions of the display, or dynamic relative to the detected marker position, which allows for flexible application design. The interaction area can be invisible, transparent or opaque.

Placing the interaction dialogue in the lower part of the display is more gaze friendly than using the upper part, but the problem of accidentally activating the gaze interaction dialogue (the Midas touch problem) is more prominent than with interaction areas at the top of the display (see figure 2). When the user looks at one of the areas, this is indicated by a change in appearance (color and image), so that the user receives feedback acknowledging that the system 'knows' that s/he is looking at the area.

Figure 2a. To the left, a static gaze interaction dialogue in the lower area of the user's field of view. The dialogue is only visible when the user looks in the nearby region of the no/yes boxes. Figure 2b. To the right, the "yes" button changes appearance to inform the user that it has been activated.

If the user does not move the gaze away from the area within a set amount of time, the AR system interprets the response as a "yes". In the application described, the point of gaze fixation is not visible when it is positioned outside the active interaction areas. It is possible to show the gaze fixation point at all times if desired. Interaction feedback is given to the user in terms of a changed image and color of the "button" (see figure 2b).
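The dwell-time behaviour of the dialogue can be summarised in a short sketch. The code below is not the system's actual code: the region geometry, the dwell time and the visibility margin are illustrative assumptions, whereas in the described system such parameters are defined in the XML scenario file.

```python
# Minimal sketch of dwell-time activation of a gaze dialogue area (not the
# described system's code). Geometry, dwell time and visibility margin are
# illustrative assumptions standing in for values from the XML scenario file.
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple

@dataclass
class GazeButton:
    label: str                        # e.g. "yes" or "no"
    rect: Tuple[int, int, int, int]   # (x, y, width, height) in display pixels
    dwell_time: float = 1.0           # seconds of continuous fixation to activate
    _entered: Optional[float] = field(default=None, init=False)

    def _inside(self, gaze, margin=0):
        x, y, w, h = self.rect
        gx, gy = gaze
        return (x - margin) <= gx <= (x + w + margin) and \
               (y - margin) <= gy <= (y + h + margin)

    def is_visible(self, gaze, margin=80):
        # The dialogue is only drawn when the gaze is in the nearby region of
        # the boxes (cf. figure 2a), which limits the Midas touch problem.
        return self._inside(gaze, margin)

    def update(self, gaze, now=None):
        """Feed one gaze sample; returns True on the sample that activates the button."""
        now = time.monotonic() if now is None else now
        if self._inside(gaze):
            if self._entered is None:
                self._entered = now       # gaze entered: change color/image as feedback
            elif now - self._entered >= self.dwell_time:
                self._entered = None
                return True               # dwell completed: report e.g. a "yes" response
        else:
            self._entered = None          # gaze left before the dwell time elapsed
        return False

# Example: a static "yes" box in the lower part of an 800 x 600 display.
yes_button = GazeButton("yes", rect=(520, 520, 120, 60))
```

For a dynamic dialogue, rect would instead be recomputed every frame relative to the detected marker position.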
3. Preliminary pilot user study of the system
Three different gaze dialogues have been informally tested in a laboratory setting. Only three participants took part, and the main goal of the pilot study was to test the robustness and functionality of the system. Two AR applications previously used in other user studies were implemented and adapted to gaze control. First the users tested the static, upper interaction dialogue design. A few days later the users tested the dynamic design alternative, and on the third occasion they tested the design with a static dialogue in the lower region of the display (figure 2).

Results and discussion of the preliminary tests
During the first trial it was found that the users tended to turn and tilt their heads so that the focus of attention was always in the centre of the field of view. Looking at things in the upper section of the display was therefore a conscious effort and not part of casual gazing. Interacting at this gaze angle was reported to be strenuous, and the users preferred the lower static interaction dialogue. There were, however, some problematic issues with the lower interaction dialogue as well: when the users tried to activate the dialogue they tended to tilt their heads, often losing camera contact with the marker, which in turn led to the loss of the visual instructions. The static dialogues are hence not ideal, since they are not adapted to normal human behavior: when humans want to focus their gaze on an object, virtual or real, they tend to place it in the central field of view. The dynamic dialogue does not have the problem of the users tilting and moving their heads. However, one important and expected problem was found. Although the interaction dialogue was only visible when the system required input from the user, it still sometimes covered too much of the user's field of view. This caused the participants to experience it as cluttered. This could be addressed with a redesign of the interaction dialogue in future development of the system.

In general, the participants were positive towards the concept of gaze control in these types of applications, but the system was experienced as clumsy and not entirely stable, since the users sometimes lost the virtual information when the marker was not detected by the camera. These are problems that can be addressed by further refining the AR system. The clumsiness of the system is harder to address in the technical solution presented here: video see-through AR with gaze control requires cameras (two at a minimum), and these are too heavy to be comfortably placed on an HMD. For gaze-controlled video see-through AR, a helmet-mounted solution is currently the best option.

4. Conclusions
In this poster we have described an AR system with a gaze attentive user interface. The preliminary pilot study indicates that although there are limitations to the proposed AR system, it is functional and can be used for applications such as the ones described in this paper. There is, however, much need for further improvements and user studies.

Acknowledgements
The gaze-controlled AR system was built in close cooperation with the Swedish Defence Research Agency, and the project was funded by the Swedish Defence Materiel Administration.

References
1. Billinghurst, M., Kato, H. and Poupyrev, I. "The MagicBook: A Transitional AR Interface." Computers & Graphics, vol. 25, no. 5, pp. 745-753, October 2001.
2. Gandy, M., MacIntyre, B., Presti, P., Dow, S., Bolter, J., Yarbrough, B. and O'Rear, N. "AR Karaoke: Acting in Your Favorite Scenes." In Proceedings of the Fourth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR), October 5-8, Vienna, Austria, 2005.
3. Henrysson, A., Ollila, M. and Billinghurst, M. "Mobile Phone Based Augmented Reality." In Haller, M., Billinghurst, M. and Thomas, B. (eds.), Emerging Technologies of Augmented Reality: Interfaces and Design. London: Idea Group Publishing, 2007.
4. Nilsson, S. and Johansson, B. "A Cognitive Systems Engineering Perspective on the Design of Mixed Reality Systems." In Proceedings of the 13th European Conference on Cognitive Ergonomics (ECCE), September 20-22, Zürich, 2006.