On-line Estimation of Visual-Motor Models using Active Vision

Martin Jägersand, Randal Nelson
Department of Computer Science, University of Rochester, Rochester, NY 14627
{jag,nelson}@cs.rochester.edu
http://www.cs.rochester.edu/u/{jag,nelson}
Abstract
We present a novel approach for combined visual model acquisition and agent control. The approach differs from previous work in that a full coupled Jacobian is estimated on-line, without any prior models or the introduction of special calibration movements. We show how the estimated models can be used for visual space robot task specification, planning and control. In the other direction, the same type of models can be used for view synthesis.
1 Introduction
In an active or behavioral vision system the acquisition of
visual information is not an independent open loop process,
but instead depends on the active agent’s interaction with
the world. Also the information need not represent an abstract, high level model of the world, but instead is highly
task specific, and represented in a form which facilitates
particular operations. Visual servoing, when supplemented
with on-line visual model estimation, fits into the active vision paradigm. Results with visual servoing and varying degrees of model adaptation have been presented for robot arms [1, 3, 2, 5, 6, 9, 13, 15, 16, 18] (Footnote 1). Visual models suitable for specifying visual alignments have also been studied [19, 8, 7]. However, the focus of that work has been on moving (servoing) the robot, not on the on-line estimation of high-DOF visual-motor models. In this paper we focus on the model exploration aspect and present an active vision technique, having
interacting action (control), visual sensing and model building
modules, which allows the simultaneous visual-motor model
estimation and control (visual servoing) of a variety of robotic
active agents.
We place the following requirements on the active visual model acquisition:
- It should be general enough to estimate an arbitrary, but smooth, visual motor model, without assuming any particular viewing geometry, camera configuration or manipulator kinematics.
- It should be efficient in that it should be usable for control after just having observed a few agent movements, and after that it should be able to adapt to a changing visual motor transfer function.
- It should not interfere with the visual motor task. In normal operation, no extra “calibration movements” should be needed for the model estimation.
Supported by ARPA subcontract Z8440902 through the University of Maryland.
Footnote 1: For a review of this work we direct the reader to [10] or [15].
A combined model acquisition and control approach has
many advantages. In addition to permitting uncalibrated visual servo control, the on-line estimated models are useful
for (1) prediction and constraining search in visual tracking
[13, 2], (2) performing local coordinate transformations between manipulator (joint), world, and visual frames [9, 13],
and (3) synthesizing views from a basis of agent poses [11].
We have found such an adaptive approach to be particularly
helpful in robot arm manipulation when carrying out difficult tasks, such as manipulation of flexible material [9, 13],
or performing large rotations for exploring object shape [12].
For a dextrous multifinger robot hand, such as the Utah/MIT
hand, the fully adaptive approach is appealing because dextrous manipulation of a grasped object is much harder to model
accurately than a typical robot arm system, where the object
is rigidly attached to the end effector.
In this paper we present four main contributions:
1. A Broyden-type Jacobian estimator allows the online estimation of a full coupled visual-motor Jacobian, enabling control both without initial models or estimates,
and over significantly non-linear portions of the transfer
function.
2. Visual space trajectory planning and control is used to
ensure convergence to a global, rather than a local minimum.
3. A trust region method is used to give convergence for
difficult visual-motor transfer functions, and makes the
system more general, without the manual tuning of step
discretization parameters.
4. We present an experimental evaluation of visual-motor model acquisition and visual feedback control of the Utah/MIT hand and PUMA robot arms.
2 Viewing model
An active vision agent has control over its actions, and can
observe the results of an action via the changes in visual appearance. We study a robot agent, fig. 1, in an unstructured
environment. The robot action reference frame is joint space,
described as desired joint angles (Footnote 2), x = (x_1, ..., x_n)^T, and their time derivatives ẋ, ẍ. The changes in visual appearance are recorded in a perception or feature vector y = (y_1, ..., y_m)^T. Visual features can be drawn from a large class of visual measurements [1, 9], but we have found that the ones which can be represented as point positions or point vectors in camera space are suitable [10]. We track features such as boundary discontinuities (lines, corners) and surface markings. Redundant visual perceptions (m > n) are desirable as they are used to constrain the raw visual sensory information.
Footnote 2: Vectors are written bold, scalars plain, and matrices capitalized.
[Figure 1: Visual control setup using two cameras. Images are reduced to features forming the perception vector y; robot control commands are given in x.]
The visual features and the agent’s actions are related by a
visual-motor transfer function f, satisfying y = f(x). The goal for the control problem is, given the current state x_0 and y_0 and the desired final state in visual space y*, to find a motor command, or sequence thereof, Δx_k s.t. f(x_0 + Σ_k Δx_k) = y*. Alternatively, one can view this problem as the minimization of the functional E = ½ (f − y*)^T (f − y*).
In a traditional, calibrated setting we have to know two
functions, camera calibration h and robot kinematics g, either
a priori, or through some calibration process. The accuracy
at which we can represent objects and have the active agent
manipulate them depends on the accuracy of both of these
functions (f^{-1} = g(h(·))), since typically feedback is only over the agent's internal joint values.
In our uncalibrated visual servoing the visual-motor transfer
function is unknown and at any time k we estimate a first
order model of it, f(x) ≈ f(x_k) + J(x_k)(x − x_k). The model is valid around the current system configuration x_k, and is described by the “image” [15] or visual-motor Jacobian, defined as

    J_{j,i}(x_k) = ∂f_j(x_k) / ∂x_i    (1)
The image Jacobian not only relates visual changes to motor changes, as is exploited in visual feedback control, but also highly constrains the possible visual changes y_{k+1} ∈ R^m to the n-dimensional subspace given by y_{k+1} = J Δx + y_k (recall m > n). Thus the Jacobian J is also a visual model, parameterized in exactly the degrees of freedom our system can change in, and useful in a variety of active vision tasks, as we will explore in this paper. During the execution of a manipulation task the agent successively learns more and more about the environment, in the form of a piecewise linear model on an adaptive size mesh.
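To make the role of the local model concrete, here is a minimal sketch (ours, not part of the paper's implementation; it assumes numpy and made-up values for m = 8 visual measures and n = 3 joints) of how the first-order model is used both to predict the visual effect of a joint move and to recover a joint move for a desired visual change:

    import numpy as np

    def predict_features(y_k, J, dx):
        """First-order prediction of the feature vector after a joint move dx:
        y_{k+1} ~= y_k + J dx."""
        return y_k + J @ dx

    def solve_joint_move(J, dy_desired):
        """Least-squares joint move dx for a desired visual change dy_desired.
        With m > n the system J dx = dy is over-determined; lstsq returns the
        minimum-residual solution."""
        dx, *_ = np.linalg.lstsq(J, dy_desired, rcond=None)
        return dx

    # Toy example with m = 8 visual measures and n = 3 joints (values made up).
    rng = np.random.default_rng(0)
    J = rng.normal(size=(8, 3))          # locally valid image Jacobian estimate
    y_k = rng.normal(size=8)             # current perception vector
    dx = np.array([0.01, -0.02, 0.005])  # candidate joint move (radians)
    y_pred = predict_features(y_k, J, dx)
    dx_rec = solve_joint_move(J, y_pred - y_k)   # recovers dx in the least-squares sense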
3 Visual Measures
The purpose of the visual measures is to transform the intensity
images into a more compact and descriptive space, while still
capturing the pose of the object.
3.1 Feature Based Measures
We use real time visual feature trackers of three different kinds
to obtain visual information. The Oxford snakes [4] are used
to track surface discontinuities. A locally developed template
matching tracker tracks multiple local features from surface
markings or corners, and for reliability in repeated experiments, or to deal with smooth featureless surfaces, such as the
lightbulb in fig. 6, we use special purpose trackers, tracking
attached targets or small lights. To improve tracking, viewing
geometry models are widely used. For instance the Oxford
snake package uses an affine model to constrain the motion
of the spline control points to rigid 3-D deformations, and a strain energy model for nonrigid image plane deformations. The convolution trackers use point velocity for prediction.
The image Jacobian provides a new model for tracking.
As noted in section 2, the subspace of possible solutions y_{k+1} to y_{k+1} = J Δx + y_k is of dimension n rather than m (and m > n). In our active framework the agent also knows along which direction Δx the system changes. This leaves only a one-dimensional search space along y_predicted = y_k + λ J Δx, λ ∈ [0, 1], in feature space. Note however that we cannot
simply constrain the tracker output to this space. That would
take away the innovation term in our model updating, and
the system would no longer adapt its model to a changing
environment. Instead we use ypredicted to detect outliers (e.g.
stemming from occluded features or the tracker tracking the
wrong thing) and to constrain the tracking search window to
a small “cylinder” around ypredicted . In a future development
we intend to use the predictor in a more general Kalman filter.
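A minimal sketch of this gating idea (ours; the cylinder radius and the stacked 2-D point layout of the feature vector are assumptions, not the paper's tracker interface):

    import numpy as np

    def gate_features(y_k, J, dx, y_measured, radius=5.0):
        """Flag measured features that fall outside a 'cylinder' around the
        predicted segment y_k + lambda * (J @ dx), lambda in [0, 1].
        Features are assumed to be stacked 2-D image points: y = (u1, v1, u2, v2, ...).
        Returns a boolean inlier mask, one entry per tracked point."""
        seg = (J @ dx).reshape(-1, 2)          # predicted displacement per point
        start = y_k.reshape(-1, 2)
        meas = y_measured.reshape(-1, 2)
        inlier = np.zeros(len(meas), dtype=bool)
        for i, (p0, d, q) in enumerate(zip(start, seg, meas)):
            denom = float(d @ d)
            lam = 0.0 if denom < 1e-12 else np.clip((q - p0) @ d / denom, 0.0, 1.0)
            closest = p0 + lam * d             # nearest point on the predicted segment
            inlier[i] = np.linalg.norm(q - closest) <= radius
        return inlier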
3.2 Image Intensity Based Filters
The idea in the subspace eigen image method is to project the
raw intensity values onto a basis of m eigen images. Representations based on this idea have been used for the recognition problem (“what”), and for indexing locations (“where”) [20, 22, 21].
There are several ways to choose the eigen images. In our
case we will be looking at the same agent, in different poses,
and all the images we want to represent are fairly similar.
In this case it is advantageous to use a basis specifically designed for the agent. In summary (see also [20, 22]) this can
be done by acquiring a (large) number p of size q × q images I_k, k = 1 ... p, I_k ∈ R^(q²), of the agent in different poses. Let the mean image Ī = Σ_{k=1...p} I_k / p, and for each image in the data set form the difference image ΔI_k = I_k − Ī. Form a measurement matrix A = (ΔI_1, ΔI_2, ..., ΔI_p), and calculate the covariance matrix C = AA^T. The principal components of this data are the eigenvectors of the matrix C. The eigenvectors form an orthogonal basis for the original image set, accounting for the variation in the data in decreasing order, according to the corresponding eigenvalues. A dimensionality reduction is achieved by using, instead of all q² eigenvectors, only a subspace of, say, the first m < q² eigenvectors. For practical reasons usually m ≤ p ≪ q², and the covariance matrix C will be rank deficient. We can then save computational effort by instead computing L = A^T A and using the p eigenvectors V = (v_l), l = 1 ... p, of L to form the m first eigenvectors U = (u_l) of A by U = A V_{1...m}, where V_{1...m} = (v_1, ..., v_m).
After a basis has been acquired (which for a particular agent typically only needs to be done once), any new image I can be represented in this basis as a perception vector y = U^T (I − Ī), and a given y can be transformed (with some quality loss) into a corresponding image by the inverse formula I = Ī + U y.
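For illustration, a sketch of this basis construction and of the projection/reconstruction formulas (ours; it assumes the p images are supplied as rows of a flattened numpy array):

    import numpy as np

    def build_eigenimage_basis(images, m):
        """images: array of shape (p, q*q), one flattened image per row.
        Returns (mean_image, U) where U has m orthonormal eigen-image columns,
        computed via the small p x p matrix L = A^T A instead of the
        q^2 x q^2 covariance C = A A^T."""
        I_mean = images.mean(axis=0)
        A = (images - I_mean).T                  # (q*q, p), columns = difference images
        L = A.T @ A                              # p x p
        evals, V = np.linalg.eigh(L)             # eigenvalues in ascending order
        order = np.argsort(evals)[::-1][:m]      # keep the m largest
        U = A @ V[:, order]                      # lift to image space: u_l = A v_l
        U /= np.linalg.norm(U, axis=0)           # normalize so U^T U = I_m
        return I_mean, U

    def project(image, I_mean, U):
        """Perception vector y = U^T (I - I_mean)."""
        return U.T @ (image - I_mean)

    def reconstruct(y, I_mean, U):
        """Approximate image from a perception vector: I ~= I_mean + U y."""
        return I_mean + U @ y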
4 Model Estimation
An estimate of the image Jacobian can be obtained by physically executing a set of calibration movements along the basis directions e_i of motor space ([3, 8, 19]) and approximating the Jacobian with finite differences:

    Ĵ(x, d) = ( f(x + d_1 e_1) − f(x), ..., f(x + d_n e_n) − f(x) ) D^{-1}    (2)

where D = diag(d), d ∈ R^n. However, to keep the estimate current for the typically nonlinearly changing environment
the agent would need to perform these movements rather frequently. This is not appropriate in most manipulation settings, where the calibration movements would interfere with
the task. Partial modeling of the viewing geometry using an
ARMAX model and estimating only one or a few parameters (e.g. depth) has also been tried [2, 6]. This however
restricts the camera-robot configurations and environments to
structured, easy to model settings.
We seek instead, an online method, which estimates the Jacobian by simply observing the process, without introducing
any extra “calibration” movements. In observing the process we obtain the changes in visual appearance ∆ymeasured
corresponding to a particular controller command ∆x. This
is essentially a secant approximation of the derivative of f
along the direction ∆x. We want to update the Jacobian in
such a way as to satisfy our most recent observation (secant
condition): Δy_measured = Ĵ_{k+1} Δx.
The condition above is under-determined; thus a family of
updating formulas, called the Broyden hierarchy, is defined as
follows:
    Ĵ_{k+1} = Ĵ_k + Σ_i a_i Ξ_i    (3)

where the Ξ_i are different rank 1 matrices, so that the rank of the correction term is equal to the number of non-zero a_i (Footnote 3). We choose an unsymmetric correction term:

    Ĵ_{k+1} = Ĵ_k + (Δy_measured − Ĵ_k Δx) Δx^T / (Δx^T Δx)    (4)
This is a first order updating formula in the above hierarchy,
and it converges to the Jacobian after n orthogonal moves
{Δx_k, k = 1 ... n}. It trivially satisfies the secant condition, but to see how it works, let us consider a change-of-basis transformation P to a coordinate system O′ aligned with the last movement, so that Δx′ = (Δx′_1, 0, ..., 0). In this basis the correction term in eq. 4 is zero except for the first column. Thus our updating scheme does the minimum change necessary to fulfill the secant condition (minimum change criterion). Let Δx_1 ... Δx_n be n orthogonal moves, and P_1 ... P_n the transformation matrices as above. Then P_2 ... P_n are just circular permutations of P_1, and the updating scheme is identical to the finite differences in eq. 2, in the O′_1 frame.
Note that the estimation in eq. 4 accepts movements along
arbitrary directions ∆x and thus needs no additional data other
than what is available as a part of the manipulation task we
want to solve. In the more general case of a set of nonorthogonal moves f∆xg the Jacobian gets updated along the
dimensions spanned by f∆xg.
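A sketch of the rank-1 secant update of eq. (4) (ours; the safeguard against near-zero moves is an added assumption):

    import numpy as np

    def broyden_update(J_hat, dx, dy_measured, eps=1e-9):
        """Rank-1 (Broyden) secant update of the estimated image Jacobian, eq. (4):
        J_{k+1} = J_k + (dy_measured - J_k dx) dx^T / (dx^T dx).
        The update changes J only along the direction of the observed move dx,
        so no extra calibration movements are needed."""
        denom = float(dx @ dx)
        if denom < eps:                      # ignore (near-)zero moves
            return J_hat
        residual = dy_measured - J_hat @ dx  # violation of the secant condition
        return J_hat + np.outer(residual, dx) / denom

After the update, Ĵ_{k+1} Δx reproduces Δy_measured exactly, while directions orthogonal to Δx are left untouched.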
5 Use of Estimated Visual-Motor Models
Estimated visual-motor manipulation models are useful in a
variety of settings. We show how to use them for visual
servo control without the need of prior models on both robot
arms and hands. We also show how to use the inverse visual-motor model to generate images, thus simulating the actions
of an articulated active agent. Internally in the system the
visual-motor models are useful in a variety of ways. They
can serve as visual representations for recognition, as models
for filtering, tracking and search reduction. On a higher level,
vision based control can lead to user friendly visual robot
“programming” or “teaching” interfaces, suitable for use in
unstructured, hard to model environments. In previous work [13] we have shown how a visually guided robot arm can be instructed to solve a variety of hand-eye tasks by: (1) a sequence of images describing the task at hand; (2) having a human draw a sketch describing the visual alignments in the task; (3) using a video image of the work area and having a human operator interactively point out (with the mouse) to the robot what to do (vision based telemanipulation).
Footnote 3: Updating formulas of rank 1 and 2 are most common. The motivation for popular rank 2 formulas, such as the Davidon-Fletcher-Powell (DFP) formula [24], is that they preserve symmetric (Ξ_i = ξ_i ξ_i^T) positive definiteness of Ĵ, and thus guarantee that the search direction Δx of a quasi-Newton type controller is a descent direction for E. The downside is that we know of no strong convergence results for the DFP method in combination with inexact line searches (which are used in our trust region controller).
5.1 Control
The active agent specifies its actions in terms of desired perceptions y*. We need a control system capable of turning these goal perceptions into motor actions x. A simple control law, occurring in some form in most visual servoing research (e.g. [3, 19, 16]), is

    y* − y = K J ẋ    (5)

where K is a gain matrix. In a discrete time system running at a fixed cycle frequency (at or below the 60 Hz video frequency), the gain K turns into a step length λ: x_{k+1} = x_k + λΔx, where Δx is the (least squares) solution to the (over-determined) system y* − y_k = J(x_k)Δx. Dynamic stability of the robot at this low sampling frequency is achieved by a secondary set of high bandwidth joint feedback controllers.
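One cycle of this discrete controller might be sketched as follows (ours; the fixed step length 0.3 is a placeholder, not a value from the paper):

    import numpy as np

    def visual_servo_step(x_k, y_k, y_goal, J_hat, step=0.3):
        """One control cycle: solve the over-determined system
        y_goal - y_k = J dx in the least-squares sense and take a partial step.
        `step` plays the role of the step length derived from the gain K."""
        dy = y_goal - y_k
        dx, *_ = np.linalg.lstsq(J_hat, dy, rcond=None)
        return x_k + step * dx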
This popular controller, however, has two major deficiencies. First, even for a convex problem (f^T f convex) it is not guaranteed to be convergent [26], and second, in the case of a non-convex problem it often does not converge at all [26]. Previous work has overcome this problem by making only a single, small-distance move within a relatively smooth and well-scaled region of f. To solve whole, real tasks this is not a viable solution. To solve the first problem we adopt a trust region method [26], similar to the well-known Marquardt step length adaptation scheme; in the trust region method, the current trust radius indicates the distance for which the estimated model is valid. For the second we use a technique known in numerical analysis as “inbäddning” (embedding) [25] or homotopy methods [23], which involves the generation of intermediate goals or “way points” along the way to the main goal y*, transforming a globally non-convex problem into a set of locally convex subproblems. Intuitively, both these techniques help to synchronize actions with model acquisition, so that the actions never run too far ahead before the local model has been adapted to the new environment. For details and theoretical properties of these two methods see our control theory paper [14].
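A schematic sketch of these two ingredients (ours; the linear interpolation of way points and the textbook-style 0.25/0.75 trust-region thresholds are assumptions, not the paper's actual parameters):

    import numpy as np

    def waypoints(y_start, y_goal, n_points):
        """Generate intermediate visual-space goals ('way points') along a straight
        line from y_start to y_goal, turning one large, possibly non-convex
        positioning problem into a sequence of small local ones."""
        ts = np.linspace(0.0, 1.0, n_points + 1)[1:]
        return [(1.0 - t) * y_start + t * y_goal for t in ts]

    def adapt_trust_radius(delta, predicted_red, actual_red, shrink=0.5, grow=2.0):
        """Classical trust-region radius update: compare the objective reduction
        predicted by the local linear model with the reduction actually observed,
        and shrink or grow the region in which the model is trusted."""
        if predicted_red <= 0.0 or actual_red < 0.25 * predicted_red:
            return shrink * delta          # model was not trustworthy: shrink
        if actual_red > 0.75 * predicted_red:
            return grow * delta            # model was very good: allow longer moves
        return delta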
5.2 Visual space task representation and planning
To date, work in image/feature space visual control has demonstrated low level servoing behaviors, achieving a single visual alignment, e.g. [16, 19]. A remaining principal challenge is
how to specify complex tasks in visual space, divide them
up into subtasks, plan trajectories in visual space, and select
different primitive visual servoing behaviors and visual goals.
We suggest a semi-automated way of solving these high level
problems, providing an image based programming interface,
as shown in fig. 2. The user specifies the changes he wishes
to bring about in the world by clicking on the objects, and
pointing out their desired locations, and alignment features.
If this is done interactively we have a very low bandwidth
telemanipulation system, isolating the user from the difficult
low level control problems. When it is done off line, we have
a user friendly programming interface.
Traditional task specification and planning is done in a
calibrated global Euclidean world coordinate frame. Our uncalibrated system does not have this frame, so task description
is fundamentally different. Instead the central frame is composed of the perception vectors y.
[Figure 2: Vision based programming interface. A video image of the work area (plate, fork, cup) is annotated with the command “moving cup”.]
Goals as well as relevant
aspects of current system state are specified in these. There
should exist a direct correspondence between the perception
vectors and the image appearance, so we can think of coding
our task in terms of desired or goal images. As time progresses
the system description changes on each of the different representation levels, namely raw image, feature image, perception
and motor control, see fig. 3. This describes a dynamic system, involving the real world as a part of it.
Our system uses three visual teaching modes: the first is
the “point in image” one as shown in fig. 2. In the second
the operator shows a sequence of real images, depicting the
task. The feature trackers are used to extract goal and subgoal
perception vectors from the image sequence. In the third the
operator symbolically describes the task, i.e. “put the square
puzzle piece in the square slot”. The first two require no
image interpretation, and we have tried them successfully in
several tasks. The third we have tried only in very simple
environments.
Not all tasks can be defined entirely in terms of visual alignments. For instance, during an insertion an object may become totally occluded. Some manipulations are inherently more suited to description in a world frame (i.e., move the light bulb to straight above the socket) or the joint frame (highly stereotypical motions such as the rotations to screw in an object). We use local, object centered world frames, which can be Cartesian or affine, depending on how much structure is available in the image. For instance, identifying the three lines forming a corner on a rectangular box in two cameras, or two poses, gives a Euclidean base P. Using more views improves the accuracy of the base [12]. Often an incomplete base is enough (i.e. to move up we only need to identify a vertical line near the robot in each of the cameras). A manipulation Δz described in base P is transformed to vision space by Δy = PΔz, and to motor space by solving Δy = J(x_k)Δx, using the (locally valid) Jacobian estimate obtained during manipulation.
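A small sketch of this two-step transformation (ours; the dimensions of P and J are assumed compatible with the bases described above):

    import numpy as np

    def world_move_to_joint_move(P, J_hat, dz):
        """Transform a manipulation dz expressed in the local base P into visual
        space (dy = P dz) and then into motor space by solving dy = J dx in the
        least-squares sense, using the Jacobian estimate from the manipulation."""
        dy = P @ dz
        dx, *_ = np.linalg.lstsq(J_hat, dy, rcond=None)
        return dx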
We now describe how to construct a mid level primitive
from low level servoing behaviors. Many tasks contain subtasks involving a long range transportation move followed by
a short range fine manipulation. Our results from evaluating the controller suggest that for the most robust model estimation and control we should control as few DOFs as possible.
To transport an object described by one point we only need
3 DOF. To manipulate a rigid object we need 6. As noted
earlier when controlling 3 DOF our algorithm needs no prior
models.
To bootstrap the 6 DOF control we use the model estimated
during the 3 DOF stage. Fig. 4 shows the visual part of an
insertion sequence. For the 3 DOF long range transportation
one of the features (here white dots) is extracted and tracked
in two cameras. For the fine manipulation 14 features are used
by tracking 5 points in one camera and 2 in the other. When
switching between 3 and 6 DOF mode, the first three columns of the 14 × 6 DOF Jacobian are filled from the 4 × 3 DOF Jacobian, and the last three with random numbers:
    J_6 = ( J3_{11}  J3_{12}  J3_{13}  *  *  *
            J3_{21}  J3_{22}  J3_{23}  *  *  *
            ...                                )

where * denotes a random initial entry.
[Figure 3: The representation levels in a vision based control system: raw images, feature images (feature trackers operate in ROIs), perception vectors (numerical values extracted from features), and control commands (adaptive DVFB controller over the joint level robot controller), evolving over time.]
[Figure 4: Left: Planning the different phases of an insert type movement, consisting of a 3 DOF long reach followed by a 6 DOF alignment and a final 6 DOF move in (visual space plan; end of visually controlled part). Right: Performing the planned insertion (actual movement sequence). Video 1]
The 6 DOF alignment serves two purposes. It aligns the piece
in 6 DOF, obtaining the correct initial pose for the 6 DOF fine manipulation. Also, during this phase the bootstrapped 6 DOF Jacobian is updated to an accurate estimate, allowing high precision moves in the later fine manipulation stage.
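A sketch of the bootstrapping step (ours; how the rows of the 4 × 3 Jacobian are placed among the 14 rows of the 6 DOF Jacobian, and the scale of the random entries, are assumptions the paper does not specify):

    import numpy as np

    def bootstrap_jacobian(J3, m6=14, n6=6, scale=0.1, rng=None):
        """Seed a 14 x 6 Jacobian for the fine-manipulation stage from the 4 x 3
        Jacobian estimated during the 3 DOF reach.  The copied rows keep their
        first three columns from J3; everything else starts as small random
        numbers and is refined on-line by the Broyden update during the
        6 DOF alignment."""
        rng = np.random.default_rng() if rng is None else rng
        J6 = scale * rng.standard_normal((m6, n6))
        r, c = J3.shape                      # 4 rows, 3 columns in the paper's setup
        J6[:r, :c] = J3
        return J6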
5.3 View synthesis
View synthesis can be done offline by generating a movie sequence of an agent performing a task, given a corresponding control command sequence {x_l} and an a-priori identified visual-motor transfer function f. We can do this by inter- and extrapolating the learned visual-motor transfer function. We tried piecewise first and third order spline models for this.
More interestingly, the online case is to generate arbitrary
simulated views, representing (reasonably small) deviations
∆x from the current state of the real physical agent, while at
the same time learning and refining the model used to generate those views. We describe a telemanipulation application,
where the teleoperator controls the agent, but where, for instance, long delays or limited bandwidth between the teleoperation site and the agent prevent immediate and/or full frame rate
visual feedback to the operator. Instead we use the view synthesis method to generate the immediate visual feedback, and
use the slower real visual feedback to calibrate the model used
for the view synthesis.
Through observation of the process by the method in section
4 we have an estimate of the current visual motor Jacobian J .
Consider one step in an online algorithm. At time i we have
current image Ii, perception vector yi, visual motor Jacobian
Ji , and current agent state xi in motor space. The teleoperator
makes a motor command Δx, so x_{i+1} = x_i + Δx. Our model estimate predicts the change in the perception vector y:

    y_{i+1} = y_i + J_i Δx    (6)

The simulated image Î_{i+1} resulting from the command is generated from y_{i+1}, as described in section 3.2, and shown to the operator.
After some delay d, and possibly at a lower rate than full frame rate, the real image arrives. From it the real measured feature vector y_{i,measured} is extracted, and the innovation term e_i = y_{i,measured} − y_{i,simulated} is incorporated (added) into the current (y_{i+d}) perception vector estimate.
Now we have Δy_{i,measured} and Δx, and can update the Jacobian with the model estimation method shown in section 4.
The online method thus estimates, and uses successive linear
models of the visual motor transfer function, each model valid
around a particular state xi . How long a delay d we can tolerate depends on the validity range for our linear model (which
can be found on line, see section 5.1), which in turn depends
on the visual motor transfer function of our system, and on
the visual measures we choose.
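A sketch of one cycle of this online loop (ours; it reuses the eigen-image basis (Ī, U) of section 3.2, inlines the rank-1 update of section 4, and assumes Δx is a non-zero move):

    import numpy as np

    def synthesize_step(y_i, J_i, dx, I_mean, U):
        """Predict the perception vector after the operator's command dx (eq. 6)
        and render the simulated image from the eigen-image basis of section 3.2."""
        y_pred = y_i + J_i @ dx
        I_sim = I_mean + U @ y_pred
        return y_pred, I_sim

    def incorporate_real_feedback(y_est, y_measured, y_simulated, J_hat, dx, dy_measured):
        """When the (delayed, possibly low-rate) real image arrives: add the
        innovation y_measured - y_simulated to the current perception estimate,
        then refresh the Jacobian with the rank-1 secant update of section 4,
        using the measured visual change dy_measured for the command dx."""
        y_corrected = y_est + (y_measured - y_simulated)
        J_new = J_hat + np.outer(dy_measured - J_hat @ dx, dx) / float(dx @ dx)
        return y_corrected, J_new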
6 Experiments
We have evaluated our visual servoing controller by: (1) Testing the repeatability and convergence of positioning. (2) Using it as a component in solving several complex manipulation
tasks. These experiments are described in more detail in our
technical report [9]. On a PUMA 761 we found that repeatability is 35 % better under visual servo control than under
standard joint control. On a worn PUMA 762, with significant backlash, we got a repeatability improvement of 5 times
with the visual control. The Utah/MIT dextrous hand has
16 controllable DOF’s. The four fingers form a parallel kinematic chain when grasping an object. Fine manipulation of an
object in the hand is much more difficult than with a robot arm
[17]. Manipulating a rigid object in 6 DOF using the visual
servo control we note a 73 % improvement in repeatability
compared to Cartesian space joint feedback control.
We have evaluated the model estimation in 3, 6 and 12
controlled DOF. In 3 DOF we can successfully estimate the
Jacobian without any prior models while carrying out a manipulation task. In 6 and 12 DOF a good initial estimate is
beneficial. The estimate can be bootstrapped as we described
in section 5.2. Redundant visual measures are beneficial, as
they reduce errors due to tracking and visual goal specification. In a 3 DOF positioning task we tried using between
m = 4 and m = 16 measures. Positioning accuracy increased 4 times with m = 16 compared to m = 4.
Footnote 4: Electronic MPEG videos of the demonstrations in this section are accessible through the Internet WWW. Use the menu at http://www.cs.rochester.edu/u/jag/PercAct/videos.html to view the videos corresponding to figures marked Video #.
We have tried using the visual servoing in solving several
complex, real world tasks, such as playing checkers, setting a
table, solving a kids puzzle and changing a light bulb [10]. Visual space programming is different from conventional robot
programming in that commands are given in image space
rather than world space. This makes user friendly programming interfaces easy to implement. We have tried having the
robot operator: (1) Draw visual sketches of the desired movements. (2) Point out objects and alignments in video images.
(3) Show an image sequence depicting the task (see [13]).
[Figure 5: Solving a kid's puzzle: 3 DOF transportation to the insertion point, angular alignment, final insertion, open loop movements, and guarded pick up. Video 2 and 3]
In fig.5 the PUMA robot solves a kid’s puzzle under visual
control. The operator points in an image, using the computer mouse, directing which piece goes where. The program
decomposes this into transportation, alignment and insertion
movements, and plans trajectories in visual space (white lines in fig. 5). For coarse transportation movements the centroids of the pieces are tracked using two stationary cameras. While aligning and fine manipulating, accurate pose information is given by tracking the corners of the pieces. Learned visual-motor and visual-world models are used for open loop manipulation when visual feedback is unavailable (e.g. due to
occlusion during insertion). In fig. 6 the Utah/MIT hand
is used to grasp and screw in the light bulb. The hand and
cameras are mounted on a PUMA robot, which does the transportation movements.
[Figure 6: Exchanging a light bulb. Video 6]
View synthesis
In fig. 7 and 8 on-line and off-line view synthesis can be compared. The blurriness in the on-line case is a result of efficiency tradeoffs. To get real-time execution on a SUN SPARCstation we use a piecewise linear visual-motor model, and a
visual representation with only 24 eigen-images. In the offline case we can allow time for preprocessing before playing
the movie. In fig. 8 we use a third order spline model and 300
eigen-images. Another reason for using a first order model in
the on-line case is that it has fewer parameters to estimate, and
thus can be learned after only a few movements (as described
in section 4.).
[Figure 7: Using the on-line linear model to synthesize a few small deviations (“twiddles”) from the real physical state in the bottom center image.]
[Figure 8: Off-line simulation of an articulated PUMA robot, here controlled in 3 DOF world space.]
7 Discussion
Successful application of machine vision and robotics in unstructured environments, without using any a-priori camera or kinematic models, has proven hard, yet there are many
such environments where robots would be useful. We do
transfer function estimation or “learning” on-line by estimating piecewise linear models. The robot controller uses the
learned models to predict how to move to achieve new goals.
We have shown how to improve a standard, Newton-type visual servoing algorithm. We use a trust region method to achieve convergence for difficult transfer functions, and “inbäddning” (embedding) [25] or homotopy methods to transform a positioning task on a non-convex domain of the transfer function into a series of smaller tasks, each on a smaller convex domain.
Intuitively both these techniques serve to synchronize actions
with model acquisition, so that the actions never run ahead
too far before the local model has been adapted to the new
environment.
We have carried out extensive experiments and found that
for typical robot arms (PUMA 761 and 762), and hands
(Utah/MIT), repeatability is up to five times better under visual servo control than under traditional joint control. We
also found that the adaptive visual servoing controller is very
robust. The algorithm can successfully estimate the image Jacobian without any prior information, while carrying out a 3
DOF manipulation task. We showed how to bootstrap higher
DOF tasks from the 3 DOF Jacobian estimate. We were able
to verify that redundant visual information is valuable. Both
errors due to imprecise tracking and goal specification were reduced. Furthermore, highly redundant systems allow us to detect outliers, and deal with partial occlusion.
We have shown how the estimated models can be used for model free, image based view synthesis of an articulated agent. In that application we traded viewing quality for simplicity of use and speed of model acquisition. The system is currently limited by the performance of the visual front end, in which the raw intensity image is turned into a parameterized visual representation.
References
[1] Weiss L. E. Sanderson A. C. Neumann C. P. “Dynamic Sensor-Based Control of Robots with Visual Feedback” J. of Robotics and Aut. v. RA-3 1987
[2] Feddema J. T. Lee G. C. S. “Adaptive Image Feature Prediction
and Control for Visual Tracking with a Hand-Eye Coordinated
Camera” IEEE Tr. on Systems, Man and Cyber., v 20, no 5 1990
[3] Conkie A. Chongstitvatana P. “An Uncalibrated Stereo Visual
Servo System” DAITR#475, Edinburgh 1990
[4] Curwen R. Blake A. “Dynamic Contours: Real time active
splines” In Active Vision Ed Blake, Yuille MIT Press 1992.
[5] Wijesoma S. W. Wolfe D. F. H. Richards R. J. “Eye-to-Hand
Coordination for vision guided Robot Control Applications”
Int. J. of Robotics Research, v 12 No 1 1993
[6] Papanikolopoulos N. P. Khosla P. K. “Adaptive Robotic Visual
Tracking: Theory and Experiments” IEEE Tr. on Aut. Control
Vol 38 no 3 1993
[7] Harris M. “Vision Guided Part Alignment with Degraded Data”
DAI TR #615, Edinburgh 1993
[8] Hollinghurst N. Cipolla R. “Uncalibrated Stereo Hand-Eye Coordination” Brit. Machine Vision Conf 1993
[9] Jägersand M. Nelson R. Adaptive Differential Visual Feedback
for uncalibrated hand-eye coordination and motor control TR#
579, U. of Rochester 1994.
[10] Jägersand M. “Perception level control for uncalibrated hand-eye coordination and motor actions” Thesis proposal, University of Rochester, May 1995.
[11] M. Jägersand. Model Free View Synthesis of an Articulated
Agent, Technical Report 595, Computer Science Department,
University of Rochester, Rochester, New York, 1995.
[12] Kutulakos K. Jägersand M. “Exploring objects by purposive
viewpoint control and invariant-based hand-eye coordination”
Workshop on vision for robots In conjunction with IROS 1995.
[13] Jägersand M. Nelson R. “Visual Space Task Specification,
Planning and Control” In Proc on IEEE Symp. on Computer
Vision, 1995.
[14] Jägersand “Visual Servoing using Trust Region Methods and
Estimation of the Full Coupled Visual-Motor Jacobian”IASTED
Applications of Robotics and Control, 1996
[15] Corke P. I. High-Performance Visual Closed-Loop Robot Control PhD thesis U of Melbourne 1994.
[16] Hosoda K. Asada M. “Versatile Visual Servoing without
Knowledge of True Jacobian” Proc. IROS 1994.
[17] Fuentes O. Nelson R. Morphing hands and virtual tools TR#
551, Dept of CS, U. of Rochester 1994.
[18] B. H. Yoshimi P. K. Allen “Active, Uncalibrated Visual Servoing” ARPA IUW, 1993.
[19] Hager G. “Calibration-Free Visual Control Using Projective
Invariance” In Proc. of 5:th ICCV 1995.
[20] Nayar S. Nene S. Murase H. Subspace Methods for Robot
Vision TR CUCS-06-95 CS, Columbia, 1995.
[21] Rao R. Ballard D. An Active Vision Architecture based on Iconic
Representations TR 548, CS, University of Rochester, 1995
[22] Turk M. Pentland A. “Eigenfaces for recognition” In J of Cognitive Neuroscience v3 nr1, p71-86, 1991.
[23] Garcia, Zangwill Pathways to solutions, fixed points, and equilibria, Prentice-Hall, 1981.
[24] Fletcher R. Practical Methods of Optimization Chichester, second ed. 1987
[25] Gustafsson I. Tillämpad Optimeringslära Komp., Inst. för Inf.
Beh., Chalmers 1991.
[26] Dahlquist G. Björck Å. Numerical Methods Second Ed, Prentice Hall, 199x, preprint.