The GripSee Project: A Gesture-Controlled Robot for Learning and Using Object Representations

Mark Becker, Efthimia Kefalea, Eric Maël, Christoph von der Malsburg, Mike Pagel, Jochen Triesch, Jan C. Vorbrüggen, Rolf P. Würtz, and Stefan Zadel

Institut für Neuroinformatik, Ruhr-Universität Bochum, D-44780 Bochum, Germany
http://www.neuroinformatik.ruhr-uni-bochum.de/ini/

This talk introduces GripSee, our robot platform for gesture and object recognition, object manipulation, and learning. As modeling visual perception invariably means modeling human visual perception, we have chosen to let the robot components, specifically the arrangement and kinematics of camera head and robot manipulator, closely resemble the human arrangement of eyes, head, and arm. This includes kinematic redundancy, which enables obstacle avoidance and leaves some choice in grasping and manipulation, with the additional advantage that the arm can avoid occluding its own workspace. The camera system imitates human eyes in supplying a stereo system with a high-resolution, color-sensitive fovea and a wide-angle, low-resolution, grey-value periphery camera for motion detection and saccade planning.

Our approach to robot behavior might be called semi-autonomy: the robot must have at its disposal a repertoire of skills that are carried out autonomously, but the actual control of overall behavior must be left to a human operator. Although classical techniques are readily available for this control, we have chosen to implement a gesture interface that is suited for use by technically untrained people, a decision dictated by the goal of providing a prototype for a service robot.

All individual robot components are controlled by autonomous agents that communicate with each other and with other agents that implement specific skills: basic operations that can be composed in various ways to implement specific tasks. Central control is maintained by a separate agent that defines task timelines and contains a user interface. On the implementation level, each agent is basically one executable under Unix or QNX, respectively (see Figure 1 for an overview of all agents).

A central skill is fixation, i.e., directing the axes of both eyes towards a point of interest. It is calibrated by an autonomous procedure [3] and serves as the principal transformation for the extraction of 3D information, because the three-dimensional position of the fixated region can be estimated by simple triangulation from the pan, tilt, and vergence angles of the camera head.
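To make the geometry concrete, the sketch below illustrates such a triangulation for an idealized head with symmetric vergence. It is a minimal illustration only, not the calibrated procedure of [3], which uses neural networks to absorb the head's nonlinearities; the function name, coordinate conventions, and baseline value are our own assumptions, not taken from the paper.

```python
import numpy as np

def fixation_point(pan, tilt, vergence, baseline=0.3):
    """Head-centered 3D position of the fixated point, estimated from the
    camera head's pan, tilt, and total vergence angle (all in radians).

    Idealized model: both cameras verge symmetrically about a virtual
    cyclopean eye between them. The baseline (inter-camera distance in
    metres) is an assumed value, not a figure from the paper.
    """
    # With each camera turned inward by vergence/2, the optical axes
    # intersect at distance (baseline/2) / tan(vergence/2) from the head.
    distance = (baseline / 2.0) / np.tan(vergence / 2.0)
    # Unit gaze direction after applying tilt (elevation) and pan (azimuth).
    gaze = np.array([
        np.cos(tilt) * np.sin(pan),   # x: to the right
        np.sin(tilt),                 # y: up
        np.cos(tilt) * np.cos(pan),   # z: straight ahead
    ])
    return distance * gaze

# Example: cameras 30 cm apart, verging on a point about 1 m ahead.
p = fixation_point(pan=0.1, tilt=-0.2, vergence=2 * np.arctan(0.15 / 1.0))
print(p)  # approximately [0.098, -0.199, 0.975] metres
```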
For the sake of brevity, the skills are only listed as part of a standard example behavior, but they can be recombined in a variety of ways. That behavior is as follows.

Initial situation: GripSee stands before a table with various known objects and waits for a human hand to transmit instructions.

Hand tracking and fixation: The hand is tracked to follow the pointing movement. After the hand movement stops, the hand is fixated. Hand posture analysis is executed on the distortion-reduced small central region of interest, and the type of posture is determined.

Object fixation: It is expected that the hand points at an object, thus the fixation point is lowered by about 10 cm. Now a simple localization algorithm detects the object, which is then fixated. After three or four iterations, the object is fixated well, and a new region of interest is defined. Object recognition determines the type of object, its exact image positions and sizes, and its orientation.

Fixation: The recognition module yields a more precise object position than the preceding fixation. That precision is crucial for reliable grasping, thus the object is fixated once more. Neural networks are used for fixation and triangulation in order to cope with the nonlinearities of the camera head.

Grip planning: A grip suitable for the object and the hand posture is selected from a database and transformed into the object's position.

Fig. 1. Current structure of server and client processes, each of which is an implementation of an autonomous agent: the Acquisition Server (cameras/framegrabber), the Hand Tracking, Gesture Recognition, Object Recognition (left and right image), and Grip Finder Clients, the NEUROS Control Server, the Head Server (camera head), and the Robot Control Server (robot arm). Another server for tactile information is to be included soon.

Trajectory planning and grip execution: A trajectory is planned, arm and gripper are moved to the object, and finally the object is grasped. Another trajectory transfers it to a default position conveniently located in front of the cameras for possible further inspection.

Hand tracking, hand fixation, and position analysis: GripSee again watches the field of view for the operator's hand to reappear, this time pointing with a single defined gesture to the three-dimensional position of the place to put the object down.

Trajectory planning, execution, and release: A trajectory is planned, the arm moves the object to the desired location, and the gripper releases it. In the absence of further clues, the 3D precision of that release is an immediate measure of the quality of hand localization.

Additional functionality, which is currently being worked on, includes the construction and incorporation of tactile sensors and the semi-autonomous learning of new objects. More detailed information can be found on our website and in the articles [1, 2].

This work is funded by grants from the DFG (Graduiertenkolleg KOGNET) and the German Federal Minister for Education and Research (01 IN 504 E9). We thank Michael Pötzsch, Michael Rinne, Bernd Fritzke, Herbert Janßen, Percy Dahm, Rainer Menzner, Thomas Bergener, and Carsten Bruckhoff for invaluable help in getting various aspects to the point of actual functioning, and Carsten Brauer for filming the video.

References

1. M. Becker, E. Kefalea, E. Maël, C. von der Malsburg, M. Pagel, J. Triesch, J. Vorbrüggen, and S. Zadel. GripSee: a robot for visually-guided grasping. In Proceedings of ICANN, Skövde, Sweden, 2-4 September 1998. In press.
2. M. Becker, E. Kefalea, E. Maël, C. von der Malsburg, M. Pagel, J. Triesch, J. C. Vorbrüggen, R. P. Würtz, and S. Zadel. GripSee: A gesture-controlled robot for object perception and manipulation. Autonomous Robots, 1998. Submitted.
3. M. Pagel, E. Maël, and C. von der Malsburg. Self-calibration of the fixation movement of a stereo camera head. Autonomous Robots: Special Issue on Learning in Autonomous Robots, 1998. In press.