LOOKING AT YOU: COMPUTER-GENERATED GRAPHICAL – HUMAN INTERACTION

Giselle Isner, isner@cis.fordham.edu
Computer and Information Science Department
Fordham University, NY, NY 10023
Advisor: Dr. Damian Lyons

INTRODUCTION

Currently, computers do not interact with their users in the same manner that people interact with each other (speaking, gesturing, body language, etc.), but it has been argued that this would facilitate better human-computer interaction [3]. We have already heard of robots that behave like living dogs (Sony's AIBO) or that help clean a home (iRobot's Roomba). We already interact with our computers by clicking a mouse or entering commands from the keyboard, but how much easier it would be if we could simply dictate to our computer, or even point at the screen, and get a response as quickly as if we were speaking to a person. There is already a limited technological basis for this kind of interface: Vivid's Mandala system, found at museums and amusement parks, allows users and computer-generated graphics to interact with one another using a video camera for input and green-screen technology.

Our objective was to construct a pair of graphical eyes and have them interact with people by following them, with the tracking information generated by a computer vision system [2]. Investigating the feasibility of this construction is a first step in determining whether human-computer interaction is facilitated by this kind of interface.

APPROACH

Using Inventor [1], an object-oriented 3D graphical toolkit, and C++, we created a pair of 3D graphical eyes displayed on a computer screen. Initially, the eyes were given their own behaviors: autonomous vertical and horizontal movements. This created the illusion that the eyes were alive. It was accomplished through a timer routine called every 1/10th of a second, which caused the eyes to rotate to a certain value on either the x or y axis and then rotate to the corresponding negative value.

Our next step was to make the pair of eyes converge and diverge at a certain angle at a certain distance, so that the eyes could "focus" on a moving object at any distance and at any location within their field of view. This would give a user the illusion that the eyes were looking at him.

To complete the illusion, the graphical eyes needed real-time information about the location of the person closest to the computer display [2]. A computer vision system connected to an overhead camera extracted the largest region of motion in the camera view and sent the image coordinates (u, v) of the region's centroid to the graphical-eyes module. In this camera configuration, the distance of the user from the display is proportional to the vertical (v) image coordinate. This relies on the assumptions that the floor is a plane, that the user remains on the floor, and that the y axis of the image is parallel to the floor.

To provide a common frame of reference for the graphical and vision systems, we created a C++ class, Linearmap, to linearly relate the field of view of the camera to the field of view of the eyes. Linearmap mapped (u, v) image coordinates to (x, y) Inventor coordinates. The values of the slope and offset parameters in the linear mapping routines were chosen empirically to calibrate the motion of the eyes with actual target movements.

As a final touch, we wanted the eyes to return to their independent behaviors. When the same (x, y) values were generated by the program ten times in a row, signaling a halt in the motion of the object, the eyes switched back to their own independent movements.
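To illustrate the timer-driven idle behavior described above, a minimal sketch follows, assuming Open Inventor's SoTimerSensor and an SoRotationXYZ node driving one eye; the rotation limit, step size, and variable names are illustrative assumptions, not the original code.

    #include <Inventor/nodes/SoRotationXYZ.h>
    #include <Inventor/sensors/SoTimerSensor.h>

    // Illustrative idle behavior: each tick moves the eye rotation toward a
    // limit on one axis, then reverses toward the corresponding negative value.
    static float limit = 0.3f;   // assumed rotation limit (radians)
    static float step  = 0.05f;  // assumed increment per tick

    static void idleTick(void *data, SoSensor *)
    {
        SoRotationXYZ *eyeRot = static_cast<SoRotationXYZ *>(data);
        float a = eyeRot->angle.getValue() + step;
        if (a >  limit) { a =  limit; step = -step; }  // reverse at the positive extreme
        if (a < -limit) { a = -limit; step = -step; }  // reverse at the negative extreme
        eyeRot->angle = a;
    }

    // Registration, e.g. once the scene graph holding the eyes is built:
    //   SoTimerSensor *timer = new SoTimerSensor(idleTick, eyeRotation);
    //   timer->setInterval(SbTime(0.1));   // fire every 1/10th of a second
    //   timer->schedule();

Reversing the step at each extreme reproduces the back-and-forth motion described in the text.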
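Similarly, a rough sketch of the Linearmap idea together with the ten-repeat stillness test is given below; the interface, member names, and the decision to fold the counter into the class are a reconstruction from the description above, not the original code.

    // Linear mapping from vision-system image coordinates (u, v) to Inventor
    // scene coordinates (x, y), with a simple test for a stationary target.
    class Linearmap {
    public:
        Linearmap(float slopeX, float offsetX, float slopeY, float offsetY)
            : sx(slopeX), ox(offsetX), sy(slopeY), oy(offsetY),
              lastX(0), lastY(0), run(0) {}

        // Map a centroid (u, v) from the vision system to Inventor (x, y).
        void map(float u, float v, float &x, float &y) {
            x = sx * u + ox;   // horizontal position of the target
            y = sy * v + oy;   // v also encodes distance under the flat-floor assumption
            // An exact repeat corresponds to the vision system reporting an unchanged centroid.
            if (x == lastX && y == lastY) ++run; else run = 1;
            lastX = x; lastY = y;
        }

        // True once the same (x, y) has been produced ten times in a row,
        // i.e. the target has stopped and the eyes may resume autonomous movement.
        bool targetStill() const { return run >= 10; }

    private:
        float sx, ox, sy, oy;   // slope and offset parameters, chosen empirically
        float lastX, lastY;
        int run;
    };

In use, each centroid received from the vision module would be passed through map() before driving the eye rotations, and targetStill() would trigger the switch back to the autonomous behavior.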
CONCLUSION

Inventor simplified the creation of the 3D graphical eyes. A single eye was modeled in an Inventor text file, and the file was read twice (once for the left eye and once for the right) into the main program, written in C++. However, acquiring sufficient information about Inventor routines and variables was time-consuming. Furthermore, certain routines in Inventor (the timer routines) were not very compatible with aspects of object-oriented design. These discrepancies were fixed fairly easily, if not elegantly, with the addition of a few extra functions.

The eyes worked very well in their ability to follow motion. They immediately followed whatever motion they sensed and remained stationary when the motion stopped. There were some problems, however. When the user walked up to the screen where the eyes were displayed and stood above it, the user appeared at the top of the camera's field of view, and the eyes therefore looked up at the user. But if the user then bent down right in front of the screen, his position in the camera's field of view remained the same, so the eyes continued looking up. In addition, the object's motion was tracked through the center of the object; thus the eyes looked primarily at the center of the user instead of at his face. To remedy this, we had to choose between the eyes accurately following the user when he was close to the screen and following him when he was farther away. We ultimately decided that it was best for the eyes to follow the user with more precision when the user was closer to the screen.

One interesting observation was that, although the eyes were able to follow motion quite accurately, they did not appear to be looking directly at the user at all times. From the perspective of onlookers watching the interaction, however, the eyes did appear to be looking directly at the user. Perhaps this is why the magic-mirror effect in augmented reality (the user stands in front of a green screen and appears on a computer screen with graphical entities around him) is more appealing to users (e.g., Mandala or MIT's ALIVE). In that arrangement the user, in effect, becomes the onlooker watching the graphics interact with another.

There were some features we developed but chose not to employ. We gave the eyes the ability to blink while retaining their horizontal and vertical movement; we felt, however, that the blinking would distract the audience from the eyes' other behaviors.

FUTURE

Determining when someone is looking at us appears to be a basic and precise human skill, and only when we make direct eye contact with another person do we feel a connection. Thus, for a user to feel the same type of interaction with these graphical eyes, it is important for them to look directly at the eyes of an individual, even if the user is at a distance from the screen. Our future work will include experimenting with the eyes converging and diverging at varying distances, and determining the correct angle needed for them to appear to look directly at the user. This could be done using multiple cameras, so that the input for the eyes would not depend on just one field of view but could switch to another camera when specific types of motion or changes in image coordinates are detected, such as someone bending down in front of the eyes.
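To make the geometry of this vergence experiment concrete, the sketch below computes a per-eye rotation for a target at horizontal offset x and depth d from the midpoint between the eyes; the symbols, units, the separation parameter, and the use of atan2 are assumptions for illustration rather than details taken from the paper.

    #include <cmath>

    // Rotation of each eye about its own vertical axis toward the target.
    struct EyeAngles { float left; float right; };

    EyeAngles vergenceAngles(float x, float d, float separation)
    {
        EyeAngles a;
        a.left  = std::atan2(x + separation / 2.0f, d);  // left eye turns toward the target
        a.right = std::atan2(x - separation / 2.0f, d);  // right eye likewise
        return a;
    }

As d grows the two angles approach each other and the gaze lines become nearly parallel; as d shrinks they separate, producing the visible convergence the eyes would need when the user stands close to the screen.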
REFERENCES

[1] Wernecke, J. The Inventor Mentor. Open Inventor Architecture Group. Addison-Wesley, 1994.
[2] Lyons, D., Pelletier, D., and Knapp, D. Multi-modal Interactive Advertising. 1998 Perceptual User Interface Workshop, San Francisco, California, Nov. 1998.
[3] Rosson, M. and Carroll, J. Emerging Paradigms for User Interaction. In: Usability Engineering. Morgan Kaufmann, 2002.