On-line Estimation of Visual-Motor Models using Active Vision

Martin Jägersand, Randal Nelson
Department of Computer Science, University of Rochester, Rochester, NY 14627
{jag,nelson}@cs.rochester.edu
http://www.cs.rochester.edu/u/{jag,nelson}
Supported by ARPA subcontract Z8440902 through University of Maryland.

Abstract

We present a novel approach for combined visual model acquisition and agent control. The approach differs from previous work in that a full coupled Jacobian is estimated on-line, without any prior models or the introduction of special calibration movements. We show how the estimated models can be used for visual space robot task specification, planning and control. In the other direction, the same type of models can be used for view synthesis.

1 Introduction

In an active or behavioral vision system the acquisition of visual information is not an independent open loop process, but instead depends on the active agent's interaction with the world. Also, the information need not represent an abstract, high level model of the world; instead it is highly task specific, and represented in a form which facilitates particular operations. Visual servoing, when supplemented with on-line visual model estimation, fits into the active vision paradigm. Results with visual servoing and varying degrees of model adaptation have been presented for robot arms [1, 3, 2, 5, 6, 9, 13, 15, 16, 18]^1. Visual models suitable for specifying visual alignments have also been studied [19, 8, 7]. However, the focus of this work has been the movement (servoing) of the robot, not the on-line estimation of high DOF visual-motor models.

^1 For a review of this work we direct the reader to [10] or [15].

In this paper we focus on the model exploration aspect and present an active vision technique, having interacting action (control), visual sensing and model building modules, which allows the simultaneous visual-motor model estimation and control (visual servoing) of a variety of robotic active agents. We place the following requirements on the active visual model acquisition: It should be general enough to estimate an arbitrary, but smooth, visual-motor model, without assuming any particular viewing geometry, camera configuration or manipulator kinematics. It should be efficient, in that it should be usable for control after observing only a few agent movements, and after that it should be able to adapt to a changing visual-motor transfer function. It should not interfere with the visual-motor task: in normal operation, no extra "calibration movements" should be needed for the model estimation.

A combined model acquisition and control approach has many advantages. In addition to permitting uncalibrated visual servo control, the on-line estimated models are useful for (1) prediction and constraining search in visual tracking [13, 2], (2) performing local coordinate transformations between manipulator (joint), world, and visual frames [9, 13], and (3) synthesizing views from a basis of agent poses [11]. We have found such an adaptive approach to be particularly helpful in robot arm manipulation when carrying out difficult tasks, such as manipulation of flexible material [9, 13], or performing large rotations for exploring object shape [12].
For a dextrous multifinger robot hand, such as the Utah/MIT hand, the fully adaptive approach is appealing because dextrous manipulation of a grasped object is much harder to model accurately than a typical robot arm system, where the object is rigidly attached to the end effector.

In this paper we present four main contributions:
1. A Broyden-type Jacobian estimator allows the on-line estimation of a full coupled visual-motor Jacobian, allowing control both without initial models or estimates, and over significantly non-linear portions of the transfer function.
2. Visual space trajectory planning and control is used to ensure convergence to a global, rather than a local, minimum.
3. A trust region method gives convergence for difficult visual-motor transfer functions and makes the system more general, without manual tuning of step discretization parameters.
4. We present an experimental evaluation of visual-motor model acquisition and visual feedback control of the Utah/MIT hand and PUMA robot arms.

2 Viewing model

An active vision agent has control over its actions, and can observe the results of an action via the changes in visual appearance. We study a robot agent, fig. 1, in an unstructured environment. The robot action reference frame is joint space, described as desired joint angles^2, $\mathbf{x} = (x_1 \ldots x_n)^T$, and their time derivatives $\dot{\mathbf{x}}, \ddot{\mathbf{x}}$. The changes in visual appearance are recorded in a perception or feature vector $\mathbf{y} = (y_1 \ldots y_m)^T$. Visual features can be drawn from a large class of visual measurements [1, 9], but we have found that the ones which can be represented as point positions or point vectors in camera space are suitable [10]. We track features such as boundary discontinuities (lines, corners) and surface markings. Redundant visual perceptions ($m > n$) are desirable, as they are used to constrain the raw visual sensory information.

^2 Vectors are written bold, scalars plain and matrices capitalized.

Figure 1: Visual control setup using two cameras.

The visual features and the agent's actions are related by a visual-motor transfer function $f$, satisfying $\mathbf{y} = f(\mathbf{x})$. The control problem is, given the current state $\mathbf{x}_0$ and $\mathbf{y}_0$ and the desired final state in visual space $\mathbf{y}^*$, to find a motor command, or sequence thereof, $\Delta\mathbf{x}_k$ such that $f(\mathbf{x}_0 + \sum_k \Delta\mathbf{x}_k) = \mathbf{y}^*$. Alternatively, one can view this problem as the minimization of the functional $\frac{1}{2}(f - \mathbf{y}^*)^T(f - \mathbf{y}^*)$. In a traditional, calibrated setting we have to know two functions, the camera calibration $h$ and the robot kinematics $g$, either a priori or through some calibration process. The accuracy at which we can represent objects and have the active agent manipulate them depends on the accuracy of both of these functions ($f^{-1} = g(h(\cdot))$), since typically feedback is only over the agent's internal joint values.

In our uncalibrated visual servoing the visual-motor transfer function is unknown, and at any time $k$ we estimate a first order model of it, $f(\mathbf{x}) \approx f(\mathbf{x}_k) + J(\mathbf{x}_k)(\mathbf{x} - \mathbf{x}_k)$. The model is valid around the current system configuration $\mathbf{x}_k$, and is described by the "image" [15] or visual-motor Jacobian defined as

$$J_{j,i}(\mathbf{x}_k) = \frac{\partial f_j}{\partial x_i}(\mathbf{x}_k) \qquad (1)$$

The image Jacobian not only relates visual changes to motor changes, as is exploited in visual feedback control, but also highly constrains the possible visual changes: $\mathbf{y}_{k+1} = J\Delta\mathbf{x} + \mathbf{y}_k$ lies in an $n$-dimensional subspace of $\Re^m$ (remember $m > n$).
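The following is a minimal numerical sketch of this local linear model, assuming hypothetical dimensions and random placeholder values (Python/NumPy; none of this code comes from the original system):

```python
import numpy as np

# Hypothetical sizes: m = 8 visual features (e.g. 4 tracked image points in 2 cameras),
# n = 3 controlled joints.  m > n gives a redundant perception vector.
m, n = 8, 3

rng = np.random.default_rng(0)
J = rng.standard_normal((m, n))     # current image Jacobian estimate J(x_k)
y_k = rng.standard_normal(m)        # current perception vector y_k
dx = np.array([0.02, -0.01, 0.03])  # small joint-space command  Delta x

# First order (local linear) model:  y_{k+1} ~= y_k + J Delta x
y_pred = y_k + J @ dx

# The predicted change is constrained to an n-dimensional subspace of R^m:
# any reachable Delta y must lie in the column space of J.
dy = y_pred - y_k
residual = dy - J @ np.linalg.lstsq(J, dy, rcond=None)[0]
print(np.allclose(residual, 0.0))   # True: Delta y is in range(J)
```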
Thus the Jacobian $J$ is also a visual model, parameterized in exactly the degrees of freedom our system can change in, and useful in a variety of active vision tasks, as we will explore in this paper. During the execution of a manipulation task the agent successively learns more and more about the environment, in the form of a piecewise linear model on an adaptive size mesh.

3 Visual Measures

The purpose of the visual measures is to transform the intensity images into a more compact and descriptive space, while still capturing the pose of the object.

3.1 Feature Based Measures

We use real time visual feature trackers of three different kinds to obtain visual information. The Oxford snakes [4] are used to track surface discontinuities. A locally developed template matching tracker tracks multiple local features from surface markings or corners. For reliability in repeated experiments, or to deal with smooth featureless surfaces such as the light bulb in fig. 6, we use special purpose trackers, tracking attached targets or small lights.

To improve tracking, viewing geometry models are widely used. For instance, the Oxford snake package uses an affine model to constrain the motion of the spline control points to rigid 3-D deformations, and a strain energy model for non-rigid image plane deformations. The convolution trackers use point velocity for prediction. The image Jacobian provides a new model for tracking. As noted in section 2, the subspace of possible solutions $\mathbf{y}_{k+1}$ to $\mathbf{y}_{k+1} = J\Delta\mathbf{x} + \mathbf{y}_k$ is of dimension $n$ rather than $m$ (and $m > n$). In our active framework the agent also knows along which direction $\Delta\mathbf{x}$ the system changes. This leaves only a one dimensional search space along $\mathbf{y}_{\mathrm{predicted}} = \alpha J\Delta\mathbf{x} + \mathbf{y}_k,\ \alpha \in [0, 1]$, in feature space. Note however that we cannot simply constrain the tracker output to this space. That would take away the innovation term in our model updating, and the system would no longer adapt its model to a changing environment. Instead we use $\mathbf{y}_{\mathrm{predicted}}$ to detect outliers (e.g. stemming from occluded features or the tracker tracking the wrong thing) and to constrain the tracking search window to a small "cylinder" around $\mathbf{y}_{\mathrm{predicted}}$. In a future development we intend to use the predictor in a more general Kalman filter.
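As an illustration of how such a predictor can gate tracker output, here is a small sketch (Python/NumPy). The function name, the pixel threshold and the feature layout are assumptions for the example, not the authors' actual tracker code:

```python
import numpy as np

def gate_tracked_features(y_prev, y_meas, J, dx, radius=3.0):
    """Flag per-feature outliers by distance to the predicted segment
    y_pred(alpha) = y_prev + alpha * J dx,  alpha in [0, 1].

    y_prev, y_meas: (m,) previous and newly measured feature vectors,
                    assumed laid out as [u1, v1, u2, v2, ...] image points.
    J:  (m, n) current image Jacobian estimate.
    dx: (n,)  commanded joint-space move.
    radius: search "cylinder" radius in pixels (assumed value).
    Returns a boolean mask, True where a 2-D feature point is accepted.
    """
    dy = J @ dx
    # Scalar alpha bringing the prediction closest to the measurement.
    alpha = np.clip(np.dot(y_meas - y_prev, dy) / (np.dot(dy, dy) + 1e-12), 0.0, 1.0)
    y_pred = y_prev + alpha * dy
    err = (y_meas - y_pred).reshape(-1, 2)          # per-point (du, dv) error
    return np.linalg.norm(err, axis=1) < radius     # accept points inside the cylinder
```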
3.2 Image Intensity Based Filters

The idea in the subspace eigen image method is to project the raw intensity values onto a basis of $m$ eigen images. Representations based on this idea have been used for the recognition problem ("what") and for indexing locations ("where") [20, 22, 21]. There are several ways to choose the eigen images. In our case we will be looking at the same agent, in different poses, and all the images we want to represent are fairly similar. In this case it is advantageous to use a basis specifically designed for the agent. In summary (see also [20, 22]), this can be done by acquiring a (large) number $p$ of size $q \times q$ images $I_k,\ k = 1 \ldots p,\ I_k \in \Re^{q^2}$, of the agent in different poses. Let the mean image $\bar{I} = \frac{1}{p}\sum_{k=1}^{p} I_k$, and for each image in the data set form the difference image $\Delta I_k = I_k - \bar{I}$. Form a measurement matrix $A = (\Delta I_1, \Delta I_2, \ldots, \Delta I_p)$, and calculate the covariance matrix $C = AA^T$. The principal components of this data are the eigenvectors of the matrix $C$. The eigenvectors form an orthogonal basis for the original image set, accounting for the variation in the data in decreasing order, according to the corresponding eigenvalues.

A dimensionality reduction is achieved by using, instead of all $q^2$ eigenvectors, only a subspace of, say, the first $m < q^2$ eigenvectors. For practical reasons usually $m \le p \ll q^2$, and the covariance matrix $C$ will be rank deficient. We can then save computational effort by instead computing $L = A^TA$ and using the $p$ eigenvectors $V = (\mathbf{v}_l),\ l = 1 \ldots p$, of $L$ to form the first $m$ eigenvectors $U = (\mathbf{u}_l)$ of $A$ by $U = AV_{1 \ldots m}$, where $V_{1 \ldots m} = (\mathbf{v}_1, \ldots, \mathbf{v}_m)$. After a basis has been acquired (which for a particular agent typically only needs to be done once), any new image $I$ can be represented in this basis as a perception vector $\mathbf{y} = U^T(I - \bar{I})$, and a given $\mathbf{y}$ can be transformed (with some quality loss) into a corresponding image by the inverse formula $I = \bar{I} + U\mathbf{y}$.

4 Model Estimation

An estimate of the image Jacobian can be obtained by physically executing a set of calibration movements along the basis directions $\mathbf{e}_i$ of motor space ([3, 8, 19]) and approximating the Jacobian with finite differences:

$$\hat{J}(\mathbf{x}, \mathbf{d}) = \bigl(f(\mathbf{x}+d_1\mathbf{e}_1) - f(\mathbf{x}),\ \ldots,\ f(\mathbf{x}+d_n\mathbf{e}_n) - f(\mathbf{x})\bigr)\,D^{-1} \qquad (2)$$

where $D = \mathrm{diag}(\mathbf{d}),\ \mathbf{d} \in \Re^n$. However, to keep the estimate current for the typically nonlinearly changing environment, the agent would need to perform these movements rather frequently. This is not appropriate in most manipulation settings, where the calibration movements would interfere with the task. Partial modeling of the viewing geometry using an ARMAX model and estimating only one or a few parameters (e.g. depth) has also been tried [2, 6]. This however restricts the camera-robot configurations and environments to structured, easy to model settings.

We seek instead an online method, which estimates the Jacobian by simply observing the process, without introducing any extra "calibration" movements. In observing the process we obtain the change in visual appearance $\Delta\mathbf{y}_{\mathrm{measured}}$ corresponding to a particular controller command $\Delta\mathbf{x}$. This is essentially a secant approximation of the derivative of $f$ along the direction $\Delta\mathbf{x}$. We want to update the Jacobian in such a way as to satisfy our most recent observation (secant condition): $\Delta\mathbf{y}_{\mathrm{measured}} = \hat{J}_{k+1}\Delta\mathbf{x}$. The condition above is under-determined; thus a family of updating formulas, called the Broyden hierarchy, is defined as follows:

$$\hat{J}_{k+1} = \hat{J}_k + \sum_i a_i\,\Xi_i \qquad (3)$$

where the $\Xi_i$ are different rank 1 matrices, so the rank of the correction term is equal to the number of non-zero $a_i$.^3

^3 Updating formulas of rank 1 and 2 are most common. The motivation for popular rank 2 formulas, such as the Davidon-Fletcher-Powell (DFP) update [24], is that they preserve symmetry ($\Xi_i = \xi_i\xi_i^T$) and positive definiteness of $\hat{J}$, and thus guarantee that the search direction $\Delta\mathbf{x}$ of a quasi-Newton type controller is a descent direction for the objective. The downside is that we know of no strong convergence results for the DFP method in combination with inexact line searches (which are used in our trust region controller).

We choose an unsymmetric correction term:

$$\hat{J}_{k+1} = \hat{J}_k + \frac{(\Delta\mathbf{y}_{\mathrm{measured}} - \hat{J}_k\Delta\mathbf{x})\,\Delta\mathbf{x}^T}{\Delta\mathbf{x}^T\Delta\mathbf{x}} \qquad (4)$$

This is a first order updating formula in the above hierarchy, and it converges to the Jacobian after $n$ orthogonal moves $\{\Delta\mathbf{x}_k,\ k = 1 \ldots n\}$. It trivially satisfies the secant condition, but to see how it works, consider a change of basis transformation $P$ to a coordinate system $O'$ aligned with the last movement, so $\Delta\mathbf{x}' = (\Delta x_1, 0, \ldots, 0)$. In this basis the correction term in eq. 4 is zero except for the first column. Thus our updating scheme does the minimum change necessary to fulfill the secant condition (minimum change criterion). Let $\Delta\mathbf{x}_1 \ldots \Delta\mathbf{x}_n$ be $n$ orthogonal moves, and $P_1 \ldots P_n$ the transformation matrices as above. Then $P_2 \ldots P_n$ are just circular permutations of $P_1$, and the updating scheme is identical to the finite differences in eq. 2, in the $O'_1$ frame. Note that the estimation in eq. 4 accepts movements along arbitrary directions $\Delta\mathbf{x}$, and thus needs no additional data other than what is available as part of the manipulation task we want to solve.
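The following sketch illustrates eqs. (2) and (4) numerically (Python/NumPy). The toy transfer function, step sizes and iteration count are made up for the example and are not taken from the paper:

```python
import numpy as np

def finite_difference_jacobian(f, x, d):
    """Eq. (2): estimate J by executing calibration moves along each motor axis."""
    n = len(x)
    fx = f(x)
    cols = [(f(x + d[i] * np.eye(n)[i]) - fx) / d[i] for i in range(n)]
    return np.column_stack(cols)

def broyden_update(J, dx, dy_measured):
    """Eq. (4): rank-1 secant update from one observed move (dx, dy_measured)."""
    return J + np.outer(dy_measured - J @ dx, dx) / (dx @ dx)

# --- toy example (assumed transfer function, 2 joints -> 4 features) ---
def f(x):
    return np.array([np.sin(x[0]), np.cos(x[1]), x[0] * x[1], x[0] + 2 * x[1]])

rng = np.random.default_rng(0)
x = np.array([0.3, -0.2])
J = np.zeros((4, 2))                  # start with no prior model at all
# Optionally seed with calibration moves (eq. 2) instead of starting from zero:
# J = finite_difference_jacobian(f, x, 0.01 * np.ones(2))
for _ in range(20):                   # observe ordinary task movements
    dx = 0.05 * rng.standard_normal(2)
    dy = f(x + dx) - f(x)             # measured visual change
    J = broyden_update(J, dx, dy)     # secant update, no calibration moves
    x = x + dx

print(np.round(J, 2))                 # approaches the true local Jacobian
```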
In the more general case of a set of nonorthogonal moves $\{\Delta\mathbf{x}\}$, the Jacobian gets updated along the dimensions spanned by $\{\Delta\mathbf{x}\}$.

5 Use of Estimated Visual-Motor Models

Estimated visual-motor manipulation models are useful in a variety of settings. We show how to use them for visual servo control, without the need for prior models, on both robot arms and hands. We also show how to use the inverse visual-motor model to generate images, thus simulating the actions of an articulated active agent. Internally in the system the visual-motor models are useful in a variety of ways. They can serve as visual representations for recognition, and as models for filtering, tracking and search reduction. On a higher level, vision based control can lead to user friendly visual robot "programming" or "teaching" interfaces, suitable for use in unstructured, hard to model environments. In previous work [13] we have shown how a visually guided robot arm can be instructed to solve a variety of hand-eye tasks by: (1) a sequence of images describing the task at hand; (2) a sketch, drawn by a human, describing the visual alignments in the task; (3) a video image of the work area in which a human operator interactively points out (with the mouse) to the robot what to do (vision based telemanipulation).

5.1 Control

The active agent specifies its actions in terms of desired perceptions $\mathbf{y}^*$. We need a control system capable of turning these goal perceptions into motor actions $\mathbf{x}$. A simple control law, occurring in some form in most visual servoing research (e.g. [3, 19, 16]), is

$$\mathbf{y}^* - \mathbf{y} = K J \dot{\mathbf{x}} \qquad (5)$$

where $K$ is a gain matrix. In a discrete time system running at a fixed cycle frequency (at or below the 60 Hz video frequency), the gain $K$ turns into a step length $\lambda$: $\mathbf{x}_{k+1} = \mathbf{x}_k + \lambda\Delta\mathbf{x}$, where $\Delta\mathbf{x}$ is the (least squares) solution to the (overdetermined) system $\mathbf{y}^* - \mathbf{y}_k = J(\mathbf{x}_k)\Delta\mathbf{x}$. Dynamic stability of the robot at this low sampling frequency is achieved by a secondary set of high bandwidth joint feedback controllers.

This popular controller however has two major deficiencies. First, even for a convex problem ($f^Tf$ convex) it is not guaranteed to be convergent [26], and second, in the case of a non-convex problem it often does not converge at all [26]. Previous work has overcome this problem by making only a single, small distance move within a relatively smooth and well scaled region of $f$. To solve whole, real tasks this is not a viable solution. We adopt a trust region method [26], similar to the well known Marquardt step length adaptation scheme, to solve the first problem. In the trust region method, the current trust region radius indicates the distance over which the estimated model is valid. For the second problem we use a technique known in numerical analysis as "inbäddning" (embedding) [25] or homotopy methods [23], which involves the generation of intermediate goals or "way points" along the way to the main goal $\mathbf{y}^*$, transforming a globally non-convex problem into a set of locally convex subproblems.
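A generic illustration of these two ideas, a trust-region-limited least-squares step toward a visual-space way point, is sketched below (Python/NumPy). The helper names and the simple norm-clipping rule are assumptions for the example; they are not the controller of [14]:

```python
import numpy as np

def servo_step(J, y, y_goal, radius):
    """One damped visual servoing step (generic illustration only).

    J:       (m, n) current image Jacobian estimate
    y:       (m,)   current perception vector
    y_goal:  (m,)   goal (or way point) perception vector
    radius:  trust region radius in joint space (model validity range)
    Returns the joint-space command dx, limited to the trust region.
    """
    dx, *_ = np.linalg.lstsq(J, y_goal - y, rcond=None)   # Gauss-Newton direction
    step = np.linalg.norm(dx)
    if step > radius:                                     # stay where the local
        dx *= radius / step                               # linear model is trusted
    return dx

def way_points(y0, y_goal, n_points):
    """Homotopy-style intermediate goals: a straight path in visual space."""
    return [y0 + t * (y_goal - y0) for t in np.linspace(0.0, 1.0, n_points + 1)[1:]]
```

A full controller would typically also grow or shrink the radius depending on how well predicted and measured feature changes agree, and interleave the Broyden update of section 4 after every executed step.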
Intuitively, both these techniques help to synchronize actions with model acquisition, so that the actions never run too far ahead before the local model has been adapted to the new environment. For details and theoretical properties of these two methods see our control theory paper [14].

5.2 Visual space task representation and planning

To date, work in image/feature space visual control has demonstrated low level servoing behaviors, achieving a single visual alignment, e.g. [16, 19]. A remaining principal challenge is how to specify complex tasks in visual space, divide them up into subtasks, plan trajectories in visual space, and select different primitive visual servoing behaviors and visual goals. We suggest a semi-automated way of solving these high level problems, providing an image based programming interface, as shown in fig. 2. The user specifies the changes he wishes to bring about in the world by clicking on the objects, and pointing out their desired locations and alignment features. If this is done interactively, we have a very low bandwidth telemanipulation system, isolating the user from the difficult low level control problems. When it is done off line, we have a user friendly programming interface.

Figure 2: Vision based programming interface.

Traditional task specification and planning is done in a calibrated global Euclidean world coordinate frame. Our uncalibrated system does not have this frame, so task description is fundamentally different. Instead, the central frame is composed of the perception vectors $\mathbf{y}$. Goals, as well as relevant aspects of the current system state, are specified in these. There should exist a direct correspondence between the perception vectors and the image appearance, so we can think of coding our task in terms of desired or goal images. As time progresses the system description changes on each of the different representation levels, namely raw image, feature image, perception and motor control, see fig. 3. This describes a dynamic system, involving the real world as a part of it.

Our system uses three visual teaching modes. The first is the "point in image" one, as shown in fig. 2. In the second, the operator shows a sequence of real images depicting the task. The feature trackers are used to extract goal and subgoal perception vectors from the image sequence. In the third, the operator symbolically describes the task, i.e. "put the square puzzle piece in the square slot". The first two require no image interpretation, and we have tried them successfully in several tasks. The third we have tried only in very simple environments.

We now describe how to construct a mid level primitive from low level servoing behaviors. Many tasks contain subtasks involving a long range transportation move followed by a short range fine manipulation. Our results from evaluating the controller suggest that for the most robust model estimation and control we should control as few DOFs as possible.
To transport an object described by one point we only need 3 DOF. To manipulate a rigid object we need 6. As noted earlier, when controlling 3 DOF our algorithm needs no prior models. To bootstrap the 6 DOF control we use the model estimated during the 3 DOF stage. Fig. 4 shows the visual part of an insertion sequence. For the 3 DOF long range transportation, one of the features (here white dots) is extracted and tracked in two cameras. For the fine manipulation, 14 features are used, by tracking 5 points in one camera and 2 in the other. When switching between 3 and 6 DOF mode, the first three columns of the $14 \times 6$ 6 DOF Jacobian are filled from the $4 \times 3$ 3 DOF Jacobian, and the last three with random numbers:

$$J^6 = \begin{pmatrix} J^3_{1,1} & J^3_{1,2} & J^3_{1,3} & * & * & * \\ J^3_{2,1} & J^3_{2,2} & J^3_{2,3} & * & * & * \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \end{pmatrix}$$

where the entries marked $*$ are the randomly initialized ones. The 6 DOF alignment serves two purposes. It aligns the piece in 6 DOF, obtaining the correct initial pose for the 6 DOF fine manipulation. Also, during this phase the bootstrapped 6 DOF Jacobian is updated to an accurate estimate, allowing high precision moves in the later fine manipulation stage.

Figure 3: The representation levels in a vision based control system.

Figure 4: Left: Planning the different phases of an insert type movement, consisting of reaching and fine manipulation movements. Right: Performing the planned insertion. (Video 1)
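A sketch of this bootstrapping step follows (Python/NumPy). The row-wise repetition of the 3 DOF Jacobian to cover all 14 feature coordinates and the noise scale are assumptions made for illustration, not details taken from the paper:

```python
import numpy as np

def bootstrap_6dof_jacobian(J3, n_feat=14, noise=1e-2, rng=None):
    """Build an initial 14 x 6 Jacobian for 6 DOF control from an estimated
    4 x 3 Jacobian of the 3 DOF transportation stage.

    The first three columns are filled from J3 (here simply repeated row-wise,
    an assumption); the last three columns get small random numbers and are
    refined on-line by the Broyden update during the alignment phase.
    """
    rng = rng if rng is not None else np.random.default_rng()
    reps = int(np.ceil(n_feat / J3.shape[0]))
    first_cols = np.tile(J3, (reps, 1))[:n_feat, :]        # columns 1-3
    last_cols = noise * rng.standard_normal((n_feat, 3))   # columns 4-6
    return np.hstack([first_cols, last_cols])

# Example: J3 estimated during the 3 DOF stage (4 image coordinates x 3 joints).
J3 = np.random.default_rng(1).standard_normal((4, 3))
J6 = bootstrap_6dof_jacobian(J3)
print(J6.shape)   # (14, 6)
```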
Not all tasks can be defined entirely in terms of visual alignments. For instance, during an insertion an object may become totally occluded. Some manipulations are inherently more suited to description in a world frame (e.g., move the light bulb to straight above the socket) or the joint frame (highly stereotypical motions such as the rotations to screw in an object). We use local, object centered world frames, which can be Cartesian or affine, depending on how much structure is available in the image. For instance, identifying the three lines forming a corner on a rectangular box in two cameras, or two poses, gives a Euclidean base $P$. Using more views improves the accuracy of the base [12]. Often an incomplete base is enough (e.g. to move up we only need to identify a vertical line near the robot in each of the cameras). A manipulation $\Delta\mathbf{z}$ described in base $P$ is transformed to vision space by $\Delta\mathbf{y} = P\Delta\mathbf{z}$, and to motor space by solving $\Delta\mathbf{y} = J(\mathbf{x}_k)\Delta\mathbf{x}$, using the (locally valid) Jacobian estimate obtained during manipulation.

5.3 View synthesis

View synthesis can be done offline by generating a movie sequence of an agent performing a task, given a corresponding control command sequence $\{\mathbf{x}_l\}$ and an a priori identified visual-motor transfer function $f$. We can do this by inter- and extrapolating the learned visual-motor transfer function. We tried piecewise first and third order spline models for this. More interestingly, the online case is to generate arbitrary simulated views, representing (reasonably small) deviations $\Delta\mathbf{x}$ from the current state of the real physical agent, while at the same time learning and refining the model used to generate those views. We describe a telemanipulation application, where the teleoperator controls the agent, but where, for instance, long delays or limited bandwidth between the teleoperation site and the agent prevent immediate and/or full frame rate visual feedback to the operator. Instead we use the view synthesis method to generate the immediate visual feedback, and use the slower real visual feedback to calibrate the model used for the view synthesis. Through observation of the process, by the method in section 4, we have an estimate of the current visual-motor Jacobian $J$.

Consider one step in an online algorithm. At time $i$ we have the current image $I_i$, perception vector $\mathbf{y}_i$, visual-motor Jacobian $J_i$, and current agent state $\mathbf{x}_i$ in motor space. The teleoperator makes a motor command $\Delta\mathbf{x}$, so $\mathbf{x}_{i+1} = \mathbf{x}_i + \Delta\mathbf{x}$. Our model estimate predicts the changes in the perception vector $\mathbf{y}$:

$$\mathbf{y}_{i+1} = \mathbf{y}_i + J_i\,\Delta\mathbf{x} \qquad (6)$$

The simulated image $\hat{I}_{i+1}$ resulting from the command is generated from $\mathbf{y}_{i+1}$, as described in section 3.2, and shown to the operator. After some delay $d$, and possibly at a lower rate than full frame rate, the real image arrives. From it the real measured feature vector $\mathbf{y}_{i,\mathrm{measured}}$ is extracted, and the innovation term $\mathbf{e}_i = \mathbf{y}_{i,\mathrm{measured}} - \mathbf{y}_{i,\mathrm{simulated}}$ is incorporated (added) into the current ($\mathbf{y}_{i+d}$) perception vector estimate. Now we have $\Delta\mathbf{y}_{i,\mathrm{measured}}$ and $\Delta\mathbf{x}$, and can update the Jacobian with the model estimation method shown in section 4. The online method thus estimates, and uses, successive linear models of the visual-motor transfer function, each model valid around a particular state $\mathbf{x}_i$. How long a delay $d$ we can tolerate depends on the validity range of our linear model (which can be found on line, see section 5.1), which in turn depends on the visual-motor transfer function of our system, and on the visual measures we choose.
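A compact sketch of this predict-display-correct loop, reusing the eigen-image basis of section 3.2 and the Broyden update of section 4, is given below (Python/NumPy). The function names, the simplified delay handling and the basis normalization are stand-ins chosen for the example, not the original implementation:

```python
import numpy as np

def build_eigenbasis(images, m):
    """Section 3.2: eigen-image basis via the small p x p matrix L = A^T A."""
    I_mean = images.mean(axis=0)                 # images: (p, q*q) array
    A = (images - I_mean).T                      # q^2 x p difference images
    L = A.T @ A                                  # p x p instead of q^2 x q^2
    w, V = np.linalg.eigh(L)
    V = V[:, np.argsort(w)[::-1][:m]]            # top-m eigenvectors of L
    U = A @ V                                    # lift to image space
    U /= np.linalg.norm(U, axis=0)               # orthonormal eigen-image columns
    return U, I_mean

def synthesize(U, I_mean, y):
    """Inverse formula I = I_mean + U y (with some quality loss)."""
    return I_mean + U @ y

def teleop_step(y, J, dx, y_measured=None):
    """Predict the perception vector for immediate visual feedback; when the
    delayed real measurement arrives, fold in the innovation and update J (eq. 4)."""
    y_pred = y + J @ dx                          # eq. (6)
    if y_measured is not None:
        J = J + np.outer(y_measured - y_pred, dx) / (dx @ dx)
        y_pred = y_measured
    return y_pred, J
```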
6 Experiments^4

^4 Electronic MPEG videos of the demonstrations in this section are accessible through the WWW. Use the menu at http://www.cs.rochester.edu/u/jag/PercAct/videos.html to view the videos corresponding to figures marked Video #.

We have evaluated our visual servoing controller by: (1) Testing the repeatability and convergence of positioning. (2) Using it as a component in solving several complex manipulation tasks. These experiments are described in more detail in our technical report [9]. On a PUMA 761 we found that repeatability is 35% better under visual servo control than under standard joint control. On a worn PUMA 762, with significant backlash, we got a repeatability improvement of 5 times with the visual control. The Utah/MIT dextrous hand has 16 controllable DOFs. The four fingers form a parallel kinematic chain when grasping an object. Fine manipulation of an object in the hand is much more difficult than with a robot arm [17]. Manipulating a rigid object in 6 DOF using the visual servo control, we note a 73% improvement in repeatability compared to Cartesian space joint feedback control.

We have evaluated the model estimation in 3, 6 and 12 controlled DOF. In 3 DOF we can successfully estimate the Jacobian without any prior models while carrying out a manipulation task. In 6 and 12 DOF a good initial estimate is beneficial. The estimate can be bootstrapped as we described in section 5.2. Redundant visual measures are beneficial, as they reduce errors due to tracking and visual goal specification. In a 3 DOF positioning task we tried using between $m = 4$ and $m = 16$ measures. Positioning accuracy increased 4 times with $m = 16$ compared to $m = 4$.

We have tried using the visual servoing in solving several complex, real world tasks, such as playing checkers, setting a table, solving a kid's puzzle and changing a light bulb [10]. Visual space programming is different from conventional robot programming in that commands are given in image space rather than world space. This makes user friendly programmer interfaces easy to implement. We have tried having the robot operator: (1) Draw visual sketches of the desired movements. (2) Point out objects and alignments in video images. (3) Show an image sequence depicting the task (see [13]).

In fig. 5 the PUMA robot solves a kid's puzzle under visual control. The operator points in an image, using the computer mouse, directing which piece goes where. The program decomposes this into transportation, alignment and insertion movements, and plans trajectories in visual space (white lines in the figure). For coarse transportation movements the centroids of the pieces are tracked using two stationary cameras. While aligning and fine manipulating, accurate pose information is given by tracking the corners of the pieces. Learned visual-motor and visual-world models are used for open loop manipulation when visual feedback is unavailable (e.g. due to occlusion during insertion).

Figure 5: Solving a kid's puzzle. (Video 2 and 3)

In fig. 6 the Utah/MIT hand is used to grasp and screw in the light bulb. The hand and cameras are mounted on a PUMA robot, which does the transportation movements.

Figure 6: Exchanging a light bulb. (Video 6)

View synthesis. In fig. 7 and 8, on- and off-line view synthesis can be compared. The blurriness in the on-line case is a result of efficiency tradeoffs. To get real-time execution on a Sun SPARCstation we use a piecewise linear visual-motor model, and a visual representation with only 24 eigen-images. In the off-line case we can allow time for preprocessing before playing the movie. In fig. 8 we use a third order spline model and 300 eigen-images. Another reason for using a first order model in the on-line case is that it has fewer parameters to estimate, and thus can be learned after only a few movements (as described in section 4).

Figure 7: Using the on-line linear model to synthesize a few small deviations ("twiddles") from the real physical state in the bottom center image.

Figure 8: Off-line simulation of an articulated PUMA robot, here controlled in 3 DOF world space.

7 Discussion

Successful application of machine vision and robotics in unstructured environments, without using any a priori camera or kinematic models, has proven hard, yet there are many such environments where robots would be useful. We do transfer function estimation or "learning" on-line by estimating piecewise linear models. The robot controller uses the learned models to predict how to move to achieve new goals. We have shown how to improve a standard, Newton-type visual servoing algorithm. We use a trust region method to achieve convergence for difficult transfer functions, and "inbäddning" (embedding) [25] or homotopy methods to transform a positioning task on a non-convex domain of the transfer function into a series of smaller tasks, each on a smaller convex domain. Intuitively, both these techniques serve to synchronize actions with model acquisition, so that the actions never run ahead too far before the local model has been adapted to the new environment.
We have carried out extensive experiments and found that for typical robot arms (PUMA 761 and 762) and hands (Utah/MIT), repeatability is up to five times better under visual servo control than under traditional joint control. We also found that the adaptive visual servoing controller is very robust. The algorithm can successfully estimate the image Jacobian without any prior information, while carrying out a 3 DOF manipulation task. We showed how to bootstrap higher DOF tasks from the 3 DOF Jacobian estimate. We were able to verify that redundant visual information is valuable: both errors due to imprecise tracking and errors due to goal specification were reduced. Furthermore, highly redundant systems allow us to detect outliers, and deal with partial occlusion. We have shown how the estimated models can be used for model free, image based view synthesis of an articulated agent. In that application we traded viewing quality for simplicity of use and speed of model acquisition. The system is currently limited by the performance of the visual front end, in which the raw intensity image is turned into a parameterized visual representation.

References

[1] Weiss L. E., Sanderson A. C., Neumann C. P. "Dynamic Sensor-Based Control of Robots with Visual Feedback." IEEE J. of Robotics and Automation, vol. RA-3, 1987.
[2] Feddema J. T., Lee G. C. S. "Adaptive Image Feature Prediction and Control for Visual Tracking with a Hand-Eye Coordinated Camera." IEEE Trans. on Systems, Man and Cybernetics, vol. 20, no. 5, 1990.
[3] Conkie A., Chongstitvatana P. "An Uncalibrated Stereo Visual Servo System." DAI TR #475, Edinburgh, 1990.
[4] Curwen R., Blake A. "Dynamic Contours: Real time active splines." In Active Vision, eds. Blake and Yuille, MIT Press, 1992.
[5] Wijesoma S. W., Wolfe D. F. H., Richards R. J. "Eye-to-Hand Coordination for Vision Guided Robot Control Applications." Int. J. of Robotics Research, vol. 12, no. 1, 1993.
[6] Papanikolopoulos N. P., Khosla P. K. "Adaptive Robotic Visual Tracking: Theory and Experiments." IEEE Trans. on Automatic Control, vol. 38, no. 3, 1993.
[7] Harris M. "Vision Guided Part Alignment with Degraded Data." DAI TR #615, Edinburgh, 1993.
[8] Hollinghurst N., Cipolla R. "Uncalibrated Stereo Hand-Eye Coordination." British Machine Vision Conference, 1993.
[9] Jägersand M., Nelson R. "Adaptive Differential Visual Feedback for Uncalibrated Hand-Eye Coordination and Motor Control." TR #579, University of Rochester, 1994.
[10] Jägersand M. "Perception Level Control for Uncalibrated Hand-Eye Coordination and Motor Actions." Thesis proposal, University of Rochester, May 1995.
[11] Jägersand M. "Model Free View Synthesis of an Articulated Agent." Technical Report 595, Computer Science Department, University of Rochester, Rochester, New York, 1995.
[12] Kutulakos K., Jägersand M. "Exploring Objects by Purposive Viewpoint Control and Invariant-Based Hand-Eye Coordination." Workshop on Vision for Robots, in conjunction with IROS, 1995.
[13] Jägersand M., Nelson R. "Visual Space Task Specification, Planning and Control." In Proc. IEEE Symposium on Computer Vision, 1995.
[14] Jägersand M. "Visual Servoing using Trust Region Methods and Estimation of the Full Coupled Visual-Motor Jacobian." IASTED Applications of Robotics and Control, 1996.
[15] Corke P. I. High-Performance Visual Closed-Loop Robot Control. PhD thesis, University of Melbourne, 1994.
[16] Hosoda K., Asada M. "Versatile Visual Servoing without Knowledge of True Jacobian." Proc. IROS, 1994.
[17] Fuentes O., Nelson R. "Morphing Hands and Virtual Tools." TR #551, Dept. of Computer Science, University of Rochester, 1994.
[18] Yoshimi B. H., Allen P. K. "Active, Uncalibrated Visual Servoing." ARPA IUW, 1993.
[19] Hager G. "Calibration-Free Visual Control Using Projective Invariance." In Proc. of the 5th ICCV, 1995.
[20] Nayar S., Nene S., Murase H. "Subspace Methods for Robot Vision." TR CUCS-06-95, Computer Science, Columbia University, 1995.
[21] Rao R., Ballard D. "An Active Vision Architecture based on Iconic Representations." TR 548, Computer Science, University of Rochester, 1995.
[22] Turk M., Pentland A. "Eigenfaces for Recognition." J. of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[23] Garcia, Zangwill. Pathways to Solutions, Fixed Points, and Equilibria. Prentice-Hall, 1981.
[24] Fletcher R. Practical Methods of Optimization. Chichester, second ed., 1987.
[25] Gustafsson I. Tillämpad Optimeringslära (Applied Optimization). Compendium, Inst. för Inf. Beh., Chalmers, 1991.
[26] Dahlquist G., Björck Å. Numerical Methods. Second ed., Prentice Hall, 199x, preprint.