Jackie Libby
Mechanics of Manipulation
Spring 2009
Paper review
This review discusses the paper, “On the Representation and Estimation of Spatial
Uncertainty”, by Randall C. Smith and Peter Cheeseman (1). I will attempt to summarize the
main concepts discussed in the paper, as well as clarify and flesh out some of the details that they
leave out. This is a pivotal paper in Robotics, because it laid the foundations for using error
approximations in estimating the relative position and orientation between objects. The paper
treats such an object in a very general sense, where an object is anything with its own coordinate
frame. In so doing, the results can be applied to many problems. One such problem would be
estimating the pose of an end effector on a robotic arm relative to its base. This is the main
problem that the paper looks at, and it is the most relevant example to the Manipulation topics
covered in this class. Another illustrative problem would be in the field of mobile robots, where
the objects in question are the different poses of a moving robot as it traverses through space over time.
This would be the main example relevant to my individual research in SLAM (Simultaneous
Localization and Mapping). The first part of the SLAM problem is Localization, where I am
using an Extended Kalman Filter to figure out a new pose of a robot relative to a previous pose,
based on internal and external sensor readings. The underlying theory behind Kalman Filters
relies heavily on the ideas discussed in this paper, where error approximations are compounded
and merged over time.
If we examine the relative position and orientation between two objects, then we are
looking at the transformation between the coordinate frames of the two objects. If we are in two-dimensional Cartesian space, then the axes of these coordinate frames are X, Y, and ϴ. If we're
dealing with three dimensions, then the axes are X, Y, Z, roll, pitch, and yaw. The example used
in this paper looks at an X, Y, ϴ coordinate system. The authors mention that the same theory
can easily be extended to other dimensions and other domains. It is important for the reader to
make this generalization. Not only can it be extended to a three-dimensional Cartesian space, but
even further to any space whose axes are simply the parameters of interest. I want to stress this
point here before moving on, because it is important that the reader visualize the transformation
from coordinate frame A to coordinate frame B as simply a vector that starts at the origin of A,
and ends at the origin of B. In appreciating such a general vector, it not only helps the reader to
abstract the ideas presented in this paper to other domains, but even more so, it helps in
understanding the very details of the example presented here. I will continue to harp on this
general vector throughout the rest of this review, so be prepared.
Figure 1 shows a relational map of relative
coordinate frames. One can think of each
coordinate frame as a node in a graph, and
the edges linking the nodes as the
transformations between one frame and the
next. These edges are the vectors I speak of
above. For example, edge A in the figure is
the transformation from frame W to frame
L1. Edge B is the transformation from L1 to
L2. Edge E is the transformation from W to
L2. Edge E, along with many other edges in this network, is drawn with a curved line to indicate that it was derived from a combination of other edges. I think it would
be much clearer to keep all edge lines
straight, to emphasize the fact that these are
simply Euclidean vectors, no matter what
space you are talking about. The tip of the
vector is the coordinates of the new frame.
These coordinates can represent any
transformation you want, but they are still
simply the values along the axes of the original frame. The tail of the vector is at the original frame that it is coming from, and so by drawing the coordinate point as a vector anchored at that frame, it is clear that it exists only in relation to some frame. The vector is Euclidean, because it preserves length. This is
unclear when the vector is drawn as a curved, general edge of a graph.
The ellipses around each coordinate frame node in figure 1 represent the uncertainty of
those coordinates. One glaring mistake that the paper makes is that it switches
the dashed and solid lines. Each ellipse is centered around a node, but the reason that there is
more than one ellipse at each node is because each ellipse corresponds with an edge leading into
that node. The dashed ellipses belong to the solid edges, and the solid ellipses belong to the
dashed edges – this is the mistake the paper makes. The paper says in the beginning of section
2.1: “The solid error ellipses express the relative uncertainty of the robot with respect to its last
position, while the dashed ellipses express the uncertainty with respect to W.” It’s actually the
opposite.
With all of this being said, let’s try to appreciate the figure for what it’s worth and keep
going. Treating A, B, and E as vectors, we can see that A + B = E. One can think of a mobile
robot moving from frame W to frame L1 to frame L2. This ends with the same net result as if
the robot moved directly from frame W to L2. It is a little confusing, though, because if the robot is moving along a two-dimensional plane, with X, Y, and ϴ as its degrees of freedom, then the space of these vectors A, B, and E is really three-dimensional, with axes X, Y, and ϴ. So the nodes L1, L2, etc., should not be mistaken for the X, Y positions of the robot.
The paper denotes the transformation A with the coordinates (X1, Y1, ϴ1), B as coordinates (X2, Y2, ϴ2), and E as coordinates (X3, Y3, ϴ3). Since A and E are both coming from frame W, then (X1, Y1, ϴ1) and (X3, Y3, ϴ3) are in world coordinates, whereas (X2, Y2,
ϴ2) is with respect to frame L1. Below is a plot of what’s happening here on an X, Y coordinate
system, with the origin at W.
Notice in this figure where I wrote “not A” and “not B”. This is to denote that edges in
figure 1 are in a three dimensional X, Y, ϴ space, and not a two dimensional (X,Y) plane. (X1,
Y1) and (X3, Y3) can be plotted in my drawing, because they are in world coordinates, but (X2,
Y2) are drawn in as lengths, since they are relative to a different frame. Now that I’ve drawn out
the geometry, it is clear to see how the paper gives the functions f, g, and h in equation (1):

$$X_3 = X_1 + X_2\cos\theta_1 - Y_2\sin\theta_1 = f(X_1, Y_1, \theta_1, X_2, Y_2, \theta_2)$$
$$Y_3 = Y_1 + X_2\sin\theta_1 + Y_2\cos\theta_1 = g(X_1, Y_1, \theta_1, X_2, Y_2, \theta_2)$$
$$\theta_3 = \theta_1 + \theta_2 = h(X_1, Y_1, \theta_1, X_2, Y_2, \theta_2)$$

It is a little counterintuitive to think of the inputs and outputs as being in different coordinate frames, but that is indeed what is happening here. (X1, Y1, ϴ1) is wrt W, (X2, Y2, ϴ2) is wrt L1, and (X3, Y3, ϴ3) is again wrt W.
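To make the compounding concrete, here is a minimal Python sketch of equation (1). The function name compound and the argument layout are my own, not the paper's:

```python
import numpy as np

def compound(x1, x2):
    """Equation (1): compound two relative transformations.

    x1 = (X1, Y1, theta1) is the pose of frame L1 with respect to W;
    x2 = (X2, Y2, theta2) is the pose of frame L2 with respect to L1.
    Returns (X3, Y3, theta3), the pose of L2 with respect to W.
    """
    X1, Y1, t1 = x1
    X2, Y2, t2 = x2
    return np.array([X1 + X2 * np.cos(t1) - Y2 * np.sin(t1),   # f
                     Y1 + X2 * np.sin(t1) + Y2 * np.cos(t1),   # g
                     t1 + t2])                                 # h
```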
Now let’s get back to the ellipses. The ellipses around each node, as mentioned
previously, represent the uncertainty associated with that node. It is easier to think of these as
the uncertainty of an edge, with the ellipse being centered around the end of the edge. The
ellipse around L1 corresponds to the uncertainty from moving along edge A. If this is a mobile
robot, this might be the uncertainty related to the odometry controls, whereas if this is a robot
arm, this might be the uncertainty related to the internal torque sensors at the joints. Similarly,
the small ellipse around L2 is the uncertainty corresponding to the movement along edge B. The
two uncertainties are compounded together to get the larger (solid) ellipse around L2. This
corresponds to the (dashed) edge E. Equations (2), (3), and (4) go through the process of
deriving this larger ellipse, and I’ll flesh it out here in more detail. We can ignore the details of
the geometry above and just take the functions f, g and h as general functions from equation 1.
The paper proceeds in a very hand-wavy way to use the words “Taylor series” and “Jacobian”,
but let me try here to explain what they are really talking about. (Please excuse my bitterness.
In general, I understand that papers aren’t meant to derive every detail, but it bothers me that this
paper is so often used as an example of how to derive these basic concepts, if that’s not what it’s
doing.)
The Taylor series is a way of decomposing a function, f(x) into an infinite sum, centered
around some point, x0. The figure below depicts this:
The equation for the Taylor series is:
$$f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \dots$$
In our case, the variable, x, is really the vector pose [X, Y, ϴ]^T. We have some estimate of our
pose, and we assume that the uncertainty of our estimate follows a Gaussian distribution, with
the estimate as the mean of the distribution. So we can replace x0 with 𝑥̂ in the Taylor series:
$$f(x) = f(\hat{x}) + \frac{f'(\hat{x})}{1!}(x - \hat{x}) + \frac{f''(\hat{x})}{2!}(x - \hat{x})^2 + \dots$$
In this case, the input x is really two poses, (X1, Y1, ϴ1, X2, Y2, ϴ2). The output f(x) is really
the vector function [f, g, h]. But to keep things general, we can just talk about the input as x, and
the output as f(x). Now we take a first order approximation, which means we chop everything
off the Taylor series after the first derivative:
$$f(x) \approx f(\hat{x}) + f'(\hat{x})(x - \hat{x})$$
Subtracting 𝑓(𝑥̂) from both sides, we get:
$$f(x) - f(\hat{x}) \approx f'(\hat{x})(x - \hat{x})$$
The paper makes another hand-wavy statement: “The mean values of the functions (to first
order) are the functions applied to variables means”. I could go ahead and derive this too, but I
won’t. What it means is that:
$$\widehat{f(x)} = f(\hat{x})$$
Substituting this into the equation above it, we get:
$$f(x) - \widehat{f(x)} \approx f'(\hat{x})(x - \hat{x})$$
The paper again, in a rather hand-wavy way, uses the word “deviate”. What they mean to say is
that they are using the delta sign, ∆, to signify how much a variable deviates from its mean value;
or rather, the difference between some random value for the variable and its mean value.
Substituting this into the equation above, we get:
$$\Delta f(x) \approx f'(\hat{x})\,\Delta x$$
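As a quick sanity check on this first-order story, here is a hypothetical numerical experiment, reusing the compound sketch from above. The helper numerical_jacobian and all of the test values are made up for illustration; the point is simply that Δf(x) is well approximated by f'(x̂)Δx for small deviations:

```python
import numpy as np

def numerical_jacobian(f, x_hat, eps=1e-6):
    """Finite-difference Jacobian of f at x_hat, built one column at a time."""
    y0 = f(x_hat)
    J = np.zeros((len(y0), len(x_hat)))
    for i in range(len(x_hat)):
        x = x_hat.copy()
        x[i] += eps
        J[:, i] = (f(x) - y0) / eps
    return J

# Stack the two poses (X1, Y1, theta1, X2, Y2, theta2) into one input vector.
f = lambda x: compound(x[:3], x[3:])
x_hat = np.array([1.0, 2.0, 0.3, 0.5, -0.1, 0.1])  # a made-up mean
dx = 1e-3 * np.random.randn(6)                     # a small deviation

J = numerical_jacobian(f, x_hat)
print(f(x_hat + dx) - f(x_hat))  # Delta f(x)
print(J @ dx)                    # f'(x_hat) Delta x -- nearly identical
```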
Now it makes sense to go back to the more specific vector form for this example. Since f(x) is now a vector function, f'(x) is the Jacobian, J. Now we have equations (2) and (3) below:

$$\begin{bmatrix} \Delta X_3 \\ \Delta Y_3 \\ \Delta\theta_3 \end{bmatrix} \approx J \begin{bmatrix} \Delta X_1 \\ \Delta Y_1 \\ \Delta\theta_1 \\ \Delta X_2 \\ \Delta Y_2 \\ \Delta\theta_2 \end{bmatrix}, \qquad J = \left[ \frac{\partial(f,g,h)}{\partial(X_1,Y_1,\theta_1)} \;\middle|\; \frac{\partial(f,g,h)}{\partial(X_2,Y_2,\theta_2)} \right] = [\,H \mid K\,]$$
They split up J into submatrices H and K to distinguish between the (X1,Y1,ϴ1) and (X2,Y2,ϴ2)
inputs. It will become clear soon why this is useful. We now multiply each side of eq (2) by its own transpose to get:
$$\begin{bmatrix} \Delta X_3 \\ \Delta Y_3 \\ \Delta\theta_3 \end{bmatrix} \begin{bmatrix} \Delta X_3 \\ \Delta Y_3 \\ \Delta\theta_3 \end{bmatrix}^T \approx \left( J \begin{bmatrix} \Delta X_1 \\ \Delta Y_1 \\ \Delta\theta_1 \\ \Delta X_2 \\ \Delta Y_2 \\ \Delta\theta_2 \end{bmatrix} \right) \left( J \begin{bmatrix} \Delta X_1 \\ \Delta Y_1 \\ \Delta\theta_1 \\ \Delta X_2 \\ \Delta Y_2 \\ \Delta\theta_2 \end{bmatrix} \right)^T = J \begin{bmatrix} \Delta X_1 \\ \Delta Y_1 \\ \Delta\theta_1 \\ \Delta X_2 \\ \Delta Y_2 \\ \Delta\theta_2 \end{bmatrix} \begin{bmatrix} \Delta X_1 \\ \Delta Y_1 \\ \Delta\theta_1 \\ \Delta X_2 \\ \Delta Y_2 \\ \Delta\theta_2 \end{bmatrix}^T J^T$$

Written out element by element, this is:

$$\begin{bmatrix} \Delta X_3 \Delta X_3 & \Delta X_3 \Delta Y_3 & \Delta X_3 \Delta\theta_3 \\ \Delta Y_3 \Delta X_3 & \Delta Y_3 \Delta Y_3 & \Delta Y_3 \Delta\theta_3 \\ \Delta\theta_3 \Delta X_3 & \Delta\theta_3 \Delta Y_3 & \Delta\theta_3 \Delta\theta_3 \end{bmatrix} \approx J \begin{bmatrix} \Delta X_1 \Delta X_1 & \Delta X_1 \Delta Y_1 & \Delta X_1 \Delta\theta_1 & \Delta X_1 \Delta X_2 & \Delta X_1 \Delta Y_2 & \Delta X_1 \Delta\theta_2 \\ \Delta Y_1 \Delta X_1 & \Delta Y_1 \Delta Y_1 & \Delta Y_1 \Delta\theta_1 & \Delta Y_1 \Delta X_2 & \Delta Y_1 \Delta Y_2 & \Delta Y_1 \Delta\theta_2 \\ \Delta\theta_1 \Delta X_1 & \Delta\theta_1 \Delta Y_1 & \Delta\theta_1 \Delta\theta_1 & \Delta\theta_1 \Delta X_2 & \Delta\theta_1 \Delta Y_2 & \Delta\theta_1 \Delta\theta_2 \\ \Delta X_2 \Delta X_1 & \Delta X_2 \Delta Y_1 & \Delta X_2 \Delta\theta_1 & \Delta X_2 \Delta X_2 & \Delta X_2 \Delta Y_2 & \Delta X_2 \Delta\theta_2 \\ \Delta Y_2 \Delta X_1 & \Delta Y_2 \Delta Y_1 & \Delta Y_2 \Delta\theta_1 & \Delta Y_2 \Delta X_2 & \Delta Y_2 \Delta Y_2 & \Delta Y_2 \Delta\theta_2 \\ \Delta\theta_2 \Delta X_1 & \Delta\theta_2 \Delta Y_1 & \Delta\theta_2 \Delta\theta_1 & \Delta\theta_2 \Delta X_2 & \Delta\theta_2 \Delta Y_2 & \Delta\theta_2 \Delta\theta_2 \end{bmatrix} J^T$$
Now we take the expectation of both sides. The expectation function, E[x], of a random
variable, x, is the mean of the distribution of that variable. So the expectation of ∆𝑥, E[∆𝑥],
equals 0 because ∆𝑥 is defined as the deviation from the mean:
$$E[\Delta x] = E[x - \hat{x}] = E[x] - E[\hat{x}] = \hat{x} - \hat{x} = 0$$
But if we’re taking the expectation of the product of two delta variables, then it does not reduce
to 0.
$$E[(\Delta x)^2] \neq 0$$
We now make the assumption that the variables X1 and X2 are independent. This
means that the uncertainties associated with transformations A and B are uncorrelated. Then we
can use the fact:
$$E[\Delta X_1 \Delta X_2] = E[\Delta X_1]\,E[\Delta X_2] = 0 \times 0 = 0$$
So when we take the expectation of both sides of the large equation above, we get 0 elements for
the top right and bottom left of the large 6x6 matrix. The non-zero 3x3 submatrices left behind
are all individually in terms of (X1, Y1, ϴ1), (X2, Y2, ϴ2), and (X3, Y3, ϴ3). We call these C1,
C2, and C3, respectively. The C stands for covariance. Now we have equation (4):

$$C_3 \approx J \begin{bmatrix} C_1 & 0 \\ 0 & C_2 \end{bmatrix} J^T = H C_1 H^T + K C_2 K^T$$
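Putting the pieces together in code, here is a minimal sketch of equation (4). The analytic H and K below come from differentiating equation (1), and the function name compound_covariance is my own:

```python
import numpy as np

def compound_covariance(x1, C1, x2, C2):
    """Equation (4): C3 = H C1 H^T + K C2 K^T, assuming the two
    transformations are independent so that the cross terms vanish."""
    t1 = x1[2]
    X2, Y2 = x2[0], x2[1]
    # H: Jacobian of (f, g, h) with respect to (X1, Y1, theta1)
    H = np.array([[1.0, 0.0, -X2 * np.sin(t1) - Y2 * np.cos(t1)],
                  [0.0, 1.0,  X2 * np.cos(t1) - Y2 * np.sin(t1)],
                  [0.0, 0.0,  1.0]])
    # K: Jacobian of (f, g, h) with respect to (X2, Y2, theta2)
    K = np.array([[np.cos(t1), -np.sin(t1), 0.0],
                  [np.sin(t1),  np.cos(t1), 0.0],
                  [0.0,         0.0,        1.0]])
    return H @ C1 @ H.T + K @ C2 @ K.T
```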
This is now the full derivation for a compounding procedure. Going back to figure 1, the
chain of edges can be followed along from A to B to C to D, getting to L4. All of these can be
compounded recursively into G. Compounding is the first of two steps used in a Kalman filter.
The merging step is the second step. If the paper wasn’t hand-wavy enough, it gets even more
hand-wavy now, because it doesn’t even attempt to derive the merging step. It just lists the
equations (8), (9), and (10), and then it refers to Nahi 1976 (2) for the derivation. A good point
the paper makes, though, is the analogy to electric circuit theory, looking at resistances in series
or in parallel. Compounding can be thought of as adding resistances in series, while resistances in parallel would be combined with equation (11), which is the one-dimensional form of the merging equation:

$$c_3 = \frac{c_1 c_2}{c_1 + c_2}$$
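In matrix form, the merging step is the familiar Kalman update. Here is a hypothetical sketch, taking equations (8), (9), and (10) as the paper lists them; the function name merge is mine:

```python
import numpy as np

def merge(x1, C1, x2, C2):
    """Equations (8)-(10): merge two estimates of the same transformation.

    K = C1 (C1 + C2)^-1 acts like a Kalman gain; in one dimension the
    covariance update collapses to the parallel-resistance form of eq. (11).
    """
    K = C1 @ np.linalg.inv(C1 + C2)
    x = x1 + K @ (x2 - x1)   # merged estimate
    C = C1 - K @ C1          # merged (smaller) covariance
    return x, C
```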
Looking back at the network in figure 1, a merging operation would combine two parallel edges, such as G and S. S is formed from a sensing operation, while G is formed from a recursive
compounding of motions. First, the direction of the arrow for S must be reversed, so that it is
pointing in the same direction as G. This inverse operation is explained with figure 2 and
equations (5), (6), and (7).
Here, the B arrow is being reversed. Equation (5) is the geometrical interpretation of reversing
the arrow, assuming we're talking about X, Y, ϴ space:

$$X' = -X\cos\theta - Y\sin\theta, \qquad Y' = X\sin\theta - Y\cos\theta, \qquad \theta' = -\theta$$
This is assuming B is the vector (X,Y,ϴ), and the reverse of B is the vector (X’,Y’,ϴ’). This is
another example where it is absolutely crucial to think of these transformations as vectors. The
first time I looked at equation (5), it was difficult for me to figure out what they meant by the
variables X, Y, ϴ, X’,Y’, and ϴ’. I wasn’t sure which of these were in world or relative
coordinates, and if relative, relative to what? Once the edges in figure 2 are viewed as Euclidean
vectors, it is clear that these variables are just relative to the start of the vector, wherever that
may be.
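Here is a minimal sketch of equation (5) in the same Python vein (the name reverse is mine). Composing a transformation with its own reversal should give the zero vector, which makes for an easy sanity check with the compound function above:

```python
import numpy as np

def reverse(x):
    """Equation (5): reverse a transformation (X, Y, theta) so that the
    vector points from the tip frame back to the tail frame."""
    X, Y, t = x
    return np.array([-X * np.cos(t) - Y * np.sin(t),
                      X * np.sin(t) - Y * np.cos(t),
                     -t])

# Sanity check: going out and coming straight back leaves you at the origin.
b = np.array([1.0, 2.0, 0.5])
print(compound(b, reverse(b)))   # ~ [0, 0, 0]
```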
Towards the end of the paper, they give a mobile robot example. They discuss mainly
the sensing procedure. They bring up the point that uncertainty estimations can be used to
decide ahead of time if sensing is even feasible. Appendix A goes through a very nice
formulation of the parameter k, which is known as the Mahalanobis distance:

$$k = (x - \hat{X})^T C_x^{-1} (x - \hat{X})$$
The threshold for this Mahalanobis distance can be set according to the specifications of the
problem at hand, in order to accept or reject potential sensing steps. For example, in a
localization problem, you have some estimate of the robot's location, X̂, and its covariance, Cx. You also know
the location of landmarks in your environment. Let’s say there’s a landmark close by at location
x. The covariance is visualized as some ellipse around the robot location estimate, and the equation above tells you how far out, in units of that ellipse, the landmark location is from the estimate.
If it’s not too far out, then you would decide to take the sensor reading to better correct the
estimate.
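Here is a hypothetical gating sketch of that test. The threshold, the two-dimensional position-only state, and all of the numbers are made up for illustration; the function name mahalanobis_sq is mine:

```python
import numpy as np

def mahalanobis_sq(x, x_hat, Cx):
    """Squared Mahalanobis distance of a point x from the estimate
    (x_hat, Cx); compare against a threshold to accept or reject a
    potential sensing step (or, later, an actual sensor reading)."""
    d = x - x_hat
    return d @ np.linalg.inv(Cx) @ d

x_hat = np.array([0.0, 0.0])             # estimated robot position
Cx = np.array([[0.5, 0.1], [0.1, 0.3]])  # its covariance (made up)
landmark = np.array([1.0, 0.5])
if mahalanobis_sq(landmark, x_hat, Cx) < 9.0:   # "3-sigma" style gate
    print("close enough: take the sensor reading")
```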
Once the sensor reading has been taken, you would then use the same equation to decide
if this reading is reasonable. It’s possible that there was a lot of noise in the reading, and it gave
you something false. But now the variables in the equation would be in sensor space,
representing the actual and estimated sensor readings.
I learned a lot from this paper, even though I bitterly whined about some of its hand-waviness. I don’t know – maybe it would have been better to learn some of the fundamentals
from a textbook, but I haven’t been able to find a textbook so far that covers this material in any
sort of depth. Everyone keeps on telling me to read the Probabilistic Robotics textbook, but I
feel that the explanations in this text start out with even more assumptions. Alas, the life of a
grad student living on the cutting edge of research. People like to criticize me and tell me I’m
too detail oriented, that I should be leaving the details to the textbooks, but they rarely have any
textbooks to back up their claims. If you’ve gotten to the end of this review, and you think I’m
full of it, then please, give me some reading to do.
References:
(1) Smith, R. C., and Cheeseman, P. 1986. On the Representation and Estimation of Spatial Uncertainty. International Journal of Robotics Research 5(4): 56–68. (Originally an SRI Robotics Lab technical paper, 1985.)
(2) Nahi, N. E. 1976. Estimation Theory and Applications. New York: R. E. Krieger.