Pose Recognition using Distributed Sensing:
Radio Frequency Identification, Machine Vision, and
Embedded Visual Patterns
by
Brendon W. Lewis
Bachelor of Science, Mechanical Engineering
Tufts University, 2000
Submitted to the Department of Mechanical Engineering
in Partial Fulfillment of the Requirements for the Degree of
MASTER OF SCIENCE IN MECHANICAL ENGINEERING
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 2002
© 2002 Massachusetts Institute of Technology
All rights reserved
Signature of Author:
Department of Mechanical Engineering
August 9, 2002

Certified by:
David L. Brock
Principal Research Scientist
Thesis Supervisor

Accepted by:
Ain A. Sonin
Chairman, Department Committee on Graduate Students
Pose Recognition using Distributed Sensing:
Radio Frequency Identification, Machine Vision, and
Embedded Visual Patterns
by
Brendon W. Lewis
Submitted to the Department of Mechanical Engineering
on August 9, 2002 in partial fulfillment of the
requirements for the degree of Master of Science in
Mechanical Engineering
Abstract
Automation systems require a certain amount of information about an object in order to
manipulate it. An important piece of information is the pose, or position and orientation,
of the object. In this thesis, I discuss the design and implementation of a system capable
of determining the pose of rectangular solids. The system combines radio frequency
identification (RFID), machine vision, and visual patterns in order to determine the pose
of an object. Traditionally, pose recognition requires either an intrusive or sophisticated
sensing system. The design relies on distributed sensing and embedded information in
order to reduce the complexity of the sensing system. The RFID component is used to
determine the presence of an object and then communicate physical information about
that object to the system. The vision system, which relies on visual patterns on the
surfaces of the object, gathers additional information that, in conjunction with the
dimensions of the object, is used to determine the pose of the object.
Thesis Supervisor: David L. Brock
Title: Principal Research Scientist
Acknowledgements
I would first like to thank my advisor David Brock for all his help. He made numerous
suggestions during both my research and writing that proved to be vital to the completion
of this thesis.
I would like to thank Dan Engels for all of his suggestions and help with the RFID
system. He helped me get the hardware, told me who to talk to about the software, and
explained to me much about the technology.
Many thanks go to the guys from Oatsystems for their help with the RFID system.
Prasad, Anup, Sumanth, Sridhar, and Gabriel all did a lot of work to help me get the
Intermec reader up and running.
I would also like to thank my officemates at the Auto-ID Center. Stephen, Tim, Robin,
and Yun provided a great work environment and many hours of good conversation. I
would additionally like to thank Stephen for helping me get acclimated to life at MIT.
Thanks to my roommate Tom who proofread and edited many of my drafts.
Lastly, I would like to thank my family and Kelly for being so patient and supportive
over the past two years.
Table of Contents
Chapter 1: Introduction
  1.1 Introduction
  1.2 Motivation
  1.3 Organization of Chapters

Chapter 2: Pose
  2.1 Introduction
  2.2 Coordinate Frames
  2.3 Measuring Position and Orientation
    2.3.1 Position
    2.3.2 Orientation
  2.4 Degrees of Freedom

Chapter 3: Radio Frequency Identification
  3.1 Introduction
  3.2 Core Components
  3.3 Reader Fields
  3.4 Benefits of RFID

Chapter 4: Position Sensors
  4.1 Introduction
  4.2 Machine Vision
  4.3 Tactile Arrays
  4.4 Radio Frequency

Chapter 5: Objects and Environment
  5.1 Introduction
  5.2 Shapes of Objects
  5.3 Surface of the Environment
  5.4 Object Coordinate Frame

Chapter 6: Object Presence and Identification
  6.1 Introduction
  6.2 Auto-ID Center
  6.3 Object Identification
  6.4 Object Presence

Chapter 7: Vision System
  7.1 Introduction
  7.2 Visual Pattern
    7.2.1 Requirements of Pattern
    7.2.2 Color Scheme
    7.2.3 Design of Pattern
    7.2.4 Alignment of Pattern on Object
  7.3 Information from Image

Chapter 8: Distributed Sensing
  8.1 Introduction
  8.2 Sensor Range
  8.3 Sensor Data

Chapter 9: Image Processing
  9.1 Introduction
  9.2 Overview of Algorithms
  9.3 Correct Lens Distortion
  9.4 Color Image to Binary Image
  9.5 Isolate Pattern
  9.6 Locate Center
  9.7 Calculate Angle
  9.8 Read Pattern

Chapter 10: Implementation
  10.1 Introduction
  10.2 Hardware
  10.3 Overview of Software
  10.4 Query Reader
  10.5 Access Physical Information
  10.6 Image Capture
  10.7 Image Processing
  10.8 Combine Information

Chapter 11: Analysis
  11.1 Introduction
  11.2 Accuracy
  11.3 Repeatability
  11.4 Scalability

Chapter 12: Conclusion

References
List of Figures
Figure 2.1: Environment, Object, and Coordinate Frames
Figure 2.2: Position Vector P
Figure 2.3: Components of unit vector u_x in the directions of the axes of OXYZ
Figure 2.4: Lower-order Pair Joints
Figure 3.1: RFID System Components
Figure 4.1: Object in Environment and Machine Vision Representation
Figure 4.2: Base of Field of View Proportional to Distance from Camera Lens
Figure 4.3: Error in Orientation Measurement due to Discretization of Image
Figure 4.4: Object in Environment and Tactile Sensor Array Representation
Figure 4.5: Error in Measurements due to Size and Spacing of Tactile Sensors
Figure 5.1: Object Dimensions and Possible Coordinate Frames
Figure 5.2: Front and Back Sides of Object and Possible Coordinate Frames
Figure 5.3: Object with All Sides Labeled and Coordinate Frame at Center
Figure 6.1: Antennas Monitoring Shelf Sections
Figure 7.1: Patterns for each of the Six Sides of an Object
Figure 7.2: Positioning of Pattern on Object Side
Figure 8.1: RFID System Antenna and Reader Field
Figure 8.2: Machine Vision Field of View
Figure 8.3: Required Overlap of RFID and Machine Vision System Ranges
Figure 9.1: Grayscale Image
Figure 9.2: Histogram of Grayscale Values
Figure 9.3: Binary Image
Figure 9.4: Explanation of Collections, Connectivity, Paths, and Neighbors
Figure 9.5: Additional White Pixels from Pattern on Side of Object
Figure 9.6: Rectangle used to find Center of Pattern
Figure 9.7: u-axis describing Orientation of Pattern
Figure 9.8: Pattern with 180-degree Offset
Figure 10.1: Pose Recognition System Implementation Setup
Figure 10.2: Generating a URL that points to a PML file
Figure 10.3: PML DTD
Figure 10.4: Example PML file
Figure 10.5: Top View of Environment and Camera's Field of View
Figure 10.6: Side View of Environment and Camera's Field of View
Figure 10.7: Front View of Environment and Camera's Field of View
List of Tables
Table 9.1: Locations of Centers of Squares in Pattern
Table 10.1: Calculation of the Elements of the Rotation Matrix
Chapter 1: Introduction
1.1 Introduction
In this thesis, I present a system that combines machine vision and radio
frequency identification in order to determine the pose of an object. There are many
different types of sensors that are capable of sensing various types and amounts of data.
Different sensors are appropriate for different applications depending on the information
that is required. However, there are some applications in which it is difficult to gather the
appropriate information using a single type of sensor. Pose recognition, which is the
identification of the position and orientation of an object, is a task that is more easily and
accurately performed by multiple sensors.
Many types of sensors are capable of determining some information about an
object's position and orientation. Machine vision is a type of sensor that is capable of
providing much pose information. However, if the vision system does not have
information about the identity and geometry of the object that it is sensing, the task is
more difficult. Radio frequency identification systems provide a means for an object to
identify itself to the sensing system. The pose recognition system that I developed uses
both types of sensing systems to determine the pose of an object.
1.2 Motivation
At the Auto-ID Center at MIT, researchers are developing an infrastructure for
describing physical objects, storing these descriptions, and accessing the information.
They are trying to "create a universal environment in which computers understand the
world without help from human beings" [1]. In most applications, a radio frequency
identification system is used in order for an object to communicate its identity, which is
used to access information about that object.
There are many applications, particularly in the supply chain, that can take
advantage of this infrastructure. These applications primarily involve inventory
management. Other applications, such as warehouse automation, require location
information. The Auto-ID system provides a description of the world to computers, but
the coarseness of the information limits the range of applications. It can only determine
the location of an object to within a reader field. Warehouse automation systems may
require more accurate location information. In order to manipulate an object, its pose
must be known. The pose recognition system described in this thesis incorporates the
Auto-ID infrastructure, together with embedded structure and sensors, in order to
determine the pose of objects.
1.3 Organization of Chapters
This thesis is divided into two major sections. The first section provides
background information related to pose recognition. In Chapter 2, I will discuss
coordinate frames, degrees of freedom, and pose definition. In Chapter 3, I will discuss
radio frequency identification, which is a key component of my pose recognition system.
In Chapter 4, I will discuss the different position sensing technologies that are currently
available. Machine vision, which is one of these technologies, is another key component
of my pose recognition system.
The second section of this thesis introduces a new approach to pose recognition.
In Chapter 5, I will establish the assumptions related to the objects and the environment.
In Chapter 6, I will explain how the system uses the Auto-ID Center infrastructure to
become aware of the presence and identification of an object. In Chapter 7, I will explain
the requirements and design of the visual patterns used by the vision system. I will also
discuss the information that the vision system will gather from an image of the
environment. In Chapter 8, I will discuss how the system combines data from the radio
frequency identification infrastructure and vision system in order to complete the task. I
will also discuss the ranges of each system and the required overlap of these ranges. In
Chapter 9, I will explain the image processing algorithms used to extract pose from an
image. In Chapter 10, I will explain my implementation of the pose recognition system.
In Chapter 11, I will analyze the design and implementation of the system. Finally, in
Chapter 12, I will discuss my conclusions and areas of possible future work.
Chapter 2: Pose
2.1 Introduction
To determine the pose of an object, we must first define pose. An object's pose
consists of its position and orientation with respect to a given coordinate frame. In the
case of the pose recognition system described in this thesis, the pose of the object will be
described with respect to a fixed coordinate frame of the environment.
2.2 Coordinate Frames
To describe the pose of an object, we must specify two coordinate frames: one for
the object and one for the environment. Figure 2.1 shows the environment, an object, and
their corresponding coordinate frames. The coordinate frame of the object is a body-coordinate frame, meaning the coordinate frame is fixed to the object and moves with it. The pose of the object can be described simply as the object's coordinate frame. Here we name the object's coordinate frame as oxyz, with its origin o and principal axes x, y, and z. The coordinate frame of the environment, OXYZ, is a fixed coordinate frame with origin O and its principal axes X, Y, and Z [2].

Figure 2.1: Environment, Object, and Coordinate Frames
2.3 Measuring Position and Orientation
Once the two coordinate frames have been specified, we can determine the pose
of the object by measuring the position and orientation of oxyz with respect to OXYZ.
2.3.1 Position

The position of the object is the position of the origin o with respect to point O. Figure 2.2 shows frame OXYZ, point o, and the position vector P that measures the position of point o. The vector P, which has components p_x, p_y, and p_z, can be written as

    P = p_x u_X + p_y u_Y + p_z u_Z,    (2.1)

where u_X, u_Y, and u_Z are unit vectors along the X, Y, and Z-axes, respectively.

Figure 2.2: Position Vector P
2.3.2 Orientation
The orientation of frame oxyz is measured about the axes of the frame OXYZ.
For an arbitrarily oriented frame oxyz, unit vectors u_x, u_y, and u_z along each of the axes can be written in terms of their components along the axes of frame OXYZ so that

    u_x = r_{11} u_X + r_{12} u_Y + r_{13} u_Z,    (2.2)
    u_y = r_{21} u_X + r_{22} u_Y + r_{23} u_Z, and    (2.3)
    u_z = r_{31} u_X + r_{32} u_Y + r_{33} u_Z.    (2.4)

Figure 2.3 shows such a frame oxyz along with the components r_{11}, r_{12}, and r_{13} of the unit vector u_x. Using all of these components, one can develop a rotation matrix R that describes the orientation of the body-fixed frame oxyz with respect to the environment frame OXYZ. The rotation matrix is written as

    R = [ r_{11}  r_{12}  r_{13} ]
        [ r_{21}  r_{22}  r_{23} ]    (2.5)
        [ r_{31}  r_{32}  r_{33} ].

Figure 2.3: Components of unit vector u_x in the directions of the axes of OXYZ
The rotation matrix can then be used to calculate angles that describe the orientation. There are many different representations such as X-Y-Z fixed angles or Z-Y-Z Euler angles [3].

For the X-Y-Z fixed angle representation, the first rotation is about the X-axis by an angle γ. The second rotation is about the Y-axis by an angle β, and the third rotation is about the Z-axis by an angle α.

The X-Y-Z fixed angles are calculated using the following equations:

For cos(β) ≠ 0,

    β = tan^{-1}( -r_{31} / sqrt(r_{11}^2 + r_{21}^2) ),    (2.6)
    α = tan^{-1}( (r_{21}/cos β) / (r_{11}/cos β) ), and    (2.7)
    γ = tan^{-1}( (r_{32}/cos β) / (r_{33}/cos β) ).    (2.8)

For cos(β) = 0,

    β = ±90°,    (2.9)
    α = 0°, and    (2.10)
    γ = ±tan^{-1}( r_{12} / r_{22} ).    (2.11)

For the Z-Y-Z Euler angle representation, the first rotation is about the Z-axis by an angle α, creating frame X'Y'Z'. The second rotation, creating X"Y"Z", is about Y' by an angle β, and the third rotation is about the Z"-axis by an angle γ.

The Z-Y-Z Euler angles are calculated using the following equations:

For sin(β) ≠ 0,

    β = tan^{-1}( sqrt(r_{31}^2 + r_{32}^2) / r_{33} ),    (2.12)
    α = tan^{-1}( (r_{23}/sin β) / (r_{13}/sin β) ), and    (2.13)
    γ = tan^{-1}( (r_{32}/sin β) / (-r_{31}/sin β) ).    (2.14)

For sin(β) = 0,

    β = 0° or 180°,    (2.15)
    α = 0°, and    (2.16)
    γ = tan^{-1}( -r_{12} / r_{11} ) or tan^{-1}( r_{12} / -r_{11} ).    (2.17)
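To make these formulas concrete, the sketch below evaluates both representations for a rotation matrix supplied as three rows. It is an illustrative sketch only, not the implementation described later in this thesis: the function names, the use of atan2 to keep the quadrants consistent, and the tolerance used to detect the singular cases are assumptions of this example, and the angles are returned in radians.

    import math

    def xyz_fixed_angles(R):
        # X-Y-Z fixed angles (alpha, beta, gamma) from a 3x3 rotation matrix,
        # following equations (2.6) through (2.11); atan2 picks the proper quadrant.
        (r11, r12, r13), (r21, r22, r23), (r31, r32, r33) = R
        cb = math.sqrt(r11 ** 2 + r21 ** 2)      # magnitude of cos(beta)
        beta = math.atan2(-r31, cb)
        if cb > 1e-9:                            # non-singular case
            alpha = math.atan2(r21, r11)
            gamma = math.atan2(r32, r33)
        else:                                    # cos(beta) = 0, beta is +/-90 degrees
            alpha = 0.0
            gamma = math.atan2(r12, r22)
            if beta < 0:
                gamma = -gamma
        return alpha, beta, gamma

    def zyz_euler_angles(R):
        # Z-Y-Z Euler angles (alpha, beta, gamma), following equations (2.12)-(2.17).
        (r11, r12, r13), (r21, r22, r23), (r31, r32, r33) = R
        sb = math.sqrt(r31 ** 2 + r32 ** 2)      # magnitude of sin(beta)
        beta = math.atan2(sb, r33)
        if sb > 1e-9:                            # non-singular case
            alpha = math.atan2(r23, r13)
            gamma = math.atan2(r32, -r31)
        else:                                    # sin(beta) = 0, beta is 0 or 180 degrees
            alpha = 0.0
            gamma = math.atan2(-r12, r11) if r33 > 0 else math.atan2(r12, -r11)
        return alpha, beta, gamma

    # A 90-degree rotation about Z: the total rotation about Z is pi/2 in both cases.
    R = [(0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
    print(xyz_fixed_angles(R))
    print(zyz_euler_angles(R))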
2.4 Degrees of Freedom
The number of degrees of freedom of an object is the number of independent
variables necessary to locate it. The degrees of freedom also express the amount of
information that is unknown in the pose of an object. For an object that has no
constraints on its position, six variables are necessary to determine its pose: three position
and three orientation.
The degrees of freedom are reduced if the object's movement is constrained by
another object. This constraint may be described as a joint. If objects touch each other at
a point or along a line, the connection is called a higher-order pair joint. If the two
objects contact each other on a surface, the connection is termed a lower-order pair joint.
As a reference, Figure 2.4 shows each of the six types of lower-order pair joints and the
number of degrees of freedom associated with each [4].
Figure 2.4: Lower-order Pair Joints (spherical pair (S-pair), revolute pair (R-pair), planar pair (E-pair), prismatic pair (P-pair), screw pair (H-pair), and cylindrical pair (C-pair)). Courtesy: David L. Brock
Chapter 3: Radio Frequency Identification
3.1 Introduction
I used Radio Frequency Identification (RFID) as an integral part of my pose
recognition system. RFID is not a new technology, but recently has been used in a
number of supply chain applications. In an RFID system, data is transmitted using radio
frequency waves and does not require human interaction. The pose recognition system
used RFID to identify objects and their presence in the environment. A typical RFID
system consists of tags and readers, together with a host computer that controls and
communicates with the reader. Figure 3.1 shows the key components of an RFID system
and how they interact. A reader constantly sends out signals to create a reader field. Any tags within the range of the field use the signal as a source of power, and send their own signal back to the reader, where it is decoded to determine the identity of the tag.

Figure 3.1: RFID System Components (tag, antenna, reader with transceiver, and host computer)
3.2 Core Components
The tags contain data that can be transmitted through radio frequency (RF)
signals. Tags can either be passive or active. Passive tags consist of a small memory
chip and an antenna. They are powered by the RF signal from a reader's antenna. Once
powered, the tag modulates the reader's signal in order to transmit its data [5]. An active
tag, on the other hand, actively transmits its data to a reader. In addition to a chip and an
antenna, active tags have batteries and do not require the presence of a reader's RF signal
for power. Instead of modulating the reader's signal, the active tag transmits its own
signal. Because of the lack of a battery, passive tags are often smaller and less expensive
than active tags [6]. However, the active tags' batteries allow them to be read over a
much greater distance.
There are three different kinds of tag memory: read-only, write once read many
(WORM), and read/write. Read-only tags are programmed by the manufacturer, and the
data stored cannot be changed. WORM tags can be programmed by the user, but the data
cannot be changed once it has been programmed. Read/write tags can be programmed
and changed by the user [5,7,8].
The readers are responsible for transmitting and receiving RF signals from the
tags. The reader's transceiver has two major functions. First, it generates the RF energy
that is used to send a signal to power passive tags. For a system that uses active tags, the
reader is not required to send a signal to power them. The transceiver also filters and
amplifies the signal that is returned by a tag [9]. The reader controls both of these
functions. In order for the reader to transmit an RF signal, the transceiver supplies the
power and the signal is sent through the reader's antenna. The antenna is the part of the
system that transmits and receives the RF signal [5]. When the antenna receives a signal,
the signal is sent to the transceiver, where it is decoded in order to interpret the data the
tag has sent. The reader then sends this data to the host computer, where it is used to run
an application.
3.3 Reader Fields
When the reader transmits an RF signal, an electromagnetic field is created in the
vicinity of the reader. The RF energy of this field supplies power to the tags. The size of
the reader field is dependent on the power available at the reader. For a circular loop
antenna, which is a common type of antenna, the strength H(x) of the electromagnetic
field is described by the equation
    H(x) = I N_R R^2 / ( 2 (R^2 + x^2)^{3/2} ),    (3.1)
where I is the current in the antenna coil, NR is the number of turns in the antenna coil,
R is the radius of the antenna, and x is the perpendicular distance from the antenna.
From equation (3.1), the strength of the field is inversely proportional to the cube of the
distance from the antenna, and the power to the tags decreases accordingly.
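As a quick illustration of this falloff, the sketch below evaluates equation (3.1) at a few distances. It is illustrative only; the current, number of turns, and antenna radius are arbitrary example values, not the parameters of the reader used in this work.

    def field_strength(I, N_R, R, x):
        # Equation (3.1): field strength of a circular loop antenna at a
        # perpendicular distance x from the antenna plane (units: A/m when the
        # current is in amperes and the lengths are in meters).
        return (I * N_R * R ** 2) / (2.0 * (R ** 2 + x ** 2) ** 1.5)

    # Arbitrary example values: 1 A of current, 5 turns, 0.15 m antenna radius.
    for x in (0.0, 0.1, 0.2, 0.4, 0.8):
        print("x = %.1f m  ->  H = %.3f A/m" % (x, field_strength(1.0, 5, 0.15, x)))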
The amount of power that the tag can generate is a factor that affects the
performance of an RFID system. Even if the signal sent by the receiver is strong enough
to power a tag at a given distance, the signal that the tag sends must be strong enough to
be detected by the reader. Although the tag's power does not affect the actual reader
field, it does affect the read range, which is the distance across which a reader can read
tags. For this reason, active tags can be read across much greater distances than passive
tags.
The frequency of the reader's signal and the environment in which an RFID
system is implemented affect the performance and read range of the system. The
frequencies are commonly divided into the following categories:
• Low Frequency (0 - 300 kHz),
• High Frequency (3 - 30 MHz),
• Ultra High Frequency (300 MHz - 3 GHz), and
• Microwave (> 3 GHz) [10].
Some mediums within the environment can act to reflect, absorb, or block certain RF
signals [11]. Therefore, the frequency of the signal for an RFID system should be chosen
based on the environment in which the system is operating. For a given medium, signals
of certain frequencies can pass through without being affected, while signals of a
different frequency might be weakened or even blocked. For example, "[t]he absorption
rate for water at 100 kHz is lower by a factor of 100,000 than it is at 1 GHz" [10]. In
general, the lower the frequency of the RF signal, the more it is able to penetrate through
materials [12]. The frequency of the signal must be chosen so that the signal is not
blocked by anything in the read range of the reader. Otherwise, tags that should be
located within the read range of the reader would not be read and the system would not
operate effectively.
If the power and frequency requirements are met, the system should be able to
effectively read a tag located within the read range of the reader. However, many of the
applications require that the system be able to simultaneously read multiple tags located
within the same reader field. This characteristic of an RFID system is called anti-collision [13]. A discussion of the anti-collision problem and an example of an anti-collision algorithm are presented in [14].
3.4 Benefits of RFID
There are many aspects of RFID systems that make them attractive in industrial
applications. One of the advantages of RFID is that the interaction between the reader
and tags is not line of sight. This is very important for many applications in which it is
difficult or impossible to use alternate technologies such as barcodes. For example, if a
tag is within a reader field, but there is another object between the tag and the reader, the
tag will still be read as long as the system uses a signal with an appropriate frequency.
One current application of RFID is for animal identification [15]. A pet can be
implanted with an RFID tag so that if the pet becomes lost, its tag can be read and its
owner identified. The tag is inserted under the animal's skin, so the reader and the tag
must be able to communicate through its skin. This would be impossible for systems
requiring line of sight.
Another benefit of RFID systems is autonomy. Software can control the
operation of the reader and store information received from tags. This ability of the
systems to operate unattended by humans makes them very useful.
In most warehouses, barcodes and scanners are used to manage inventory. When
a product is being shipped or received, a worker has to physically scan the barcode so
that the inventory in the warehouse is accurately recorded in the system. Using RF tags
and readers, the operation of most warehouses could be radically improved by
eliminating human labor through automatic inventory.
Rapid identification is another benefit of RFID systems. Automatic toll collection
is now a common use of RFID technology. Cars with appropriate tags can pass through
the reader field without stopping and are identified within milliseconds [16].
Chapter 4: Position Sensing
4.1 Introduction
I used machine vision, in conjunction with RFID, to determine the pose of an
object. There are many different kinds of sensors that can be used to determine the
position and orientation of an object. Usually, a single type of sensor is used for pose
recognition, but in this thesis I explore combining multiple sensor types (specifically
vision and RFID) to perform the task. I will briefly discuss some of the different kinds of
position sensors, their limitations, and some of the pose recognition research.
4.2 Machine Vision
Machine vision is one of the most common types of sensing in robotics [17].
These systems try to mimic or take advantage of one of the most powerful human senses
- sight. Vision systems take a picture of the environment and then process that image to
make decisions about the state of the environment. The two key steps in machine vision
are image capture and image processing.
A vision system must first capture an image of the environment. Any camera or
image capture device can be used for this purpose. The picture that the camera takes is
composed of many small colored squares. Each square, or pixel, is entirely one color.
However, since the pixels are so small, the camera is able to generate an accurate picture
of the environment by using thousands, or even millions, of pixels. The image is encoded
into an array by storing the color of each pixel in an element of a two-dimensional array
[18]. A binary image is represented by an array of 1-bit values, a grayscale image is often represented by an array of 8-bit grayscale values, and a color image is typically represented by an array of 24-bit numbers, each of which contains the red, green, and blue color values. In Figure 4.1, there is a schematic and binary image of a rectangular
solid object lying on a flat surface.
Figure 4.1: Object in Environment and Machine Vision Representation
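The array representations described above can be sketched in a few lines. The sketch below is purely illustrative: the 4 x 4 image size, the random pixel values, the channel-averaging rule, and the threshold of 128 are arbitrary choices and not part of the system described in later chapters.

    import numpy as np

    # A small color image: each pixel stores 8-bit red, green, and blue values.
    rgb = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

    # Grayscale image: one 8-bit value per pixel (here, the mean of the three channels).
    gray = rgb.mean(axis=2).astype(np.uint8)

    # Binary image: a single bit of information per pixel, produced by thresholding.
    binary = (gray > 128).astype(np.uint8)

    print(gray)
    print(binary)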
Once the machine vision system has a data structure that represents the image of
the environment, it extracts information from the image using various image processing
techniques. The type of information being captured depends on the specific application,
but machine vision systems are frequently used for locating objects. Because vision
systems are calibrated, the system is able to determine the location of an object in the
environment from the position of the pixels representing that object in the image. The
area A -, represented by a single pixel is equal to the area Afov of the base of the field of
view of the camera divided by the number of pixels in the image. The area Afov is
proportional to the distance from the camera lens to the base of the field of view, as
shown in Figure 4.2.
30
Figure 4.2: Base of Field of View Proportional to Distance from Camera Lens
The error in position due to the discretization of the image is bounded by l/2, where l is the length of the side of a pixel and is equal to sqrt(A_pixel). The system can also determine the orientation of the object in the environment from the pixels representing the object in the image. The error in orientation can be calculated based on the error in position. For example, in Figure 4.3 the black line represents the longest edge of an object, which has length L. Taking the maximum amount of error in the vertical position of the pixels at the ends of the edge, we can calculate the maximum error in the orientation of the object. The error in orientation due to the discretization of the image is bounded by tan^{-1}(l/L). For a given field of view of a camera, as the number of pixels increases, the amount of error decreases.

Figure 4.3: Error in Orientation Measurement due to Discretization of Image
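These bounds are straightforward to evaluate. The sketch below assumes the half-pixel position bound and the tan^{-1}(l/L) orientation bound given above; the function name and the example field of view, resolution, and edge length are hypothetical.

    import math

    def vision_error_bounds(area_fov, num_pixels, edge_length):
        # area_fov:    area of the base of the camera's field of view
        # num_pixels:  total number of pixels in the image
        # edge_length: length L of the longest visible edge of the object
        a_pixel = area_fov / num_pixels            # area represented by one pixel
        l = math.sqrt(a_pixel)                     # side length of one pixel
        position_error = l / 2.0                   # half a pixel in each direction
        orientation_error = math.atan(l / edge_length)
        return position_error, orientation_error

    # Example: a 1 m x 1 m field of view, a 640 x 480 image, and a 0.3 m edge.
    print(vision_error_bounds(1.0, 640 * 480, 0.3))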
While it provides a great deal of pose information, there are limitations of
machine vision. One disadvantage of machine vision is that it generates a two-dimensional representation of the environment. Machine vision systems can use multiple
cameras, motion, or other techniques to attempt to generate a three-dimensional
representation of the environment, but the image processing algorithms become much
more complicated [19]. Another problem with machine vision is that it is difficult to deal
with symmetry and differentiate between object faces with similar dimensions. Using a
common image processing technique called edge-detection, a machine vision system can
isolate the edges or boundaries of an object. A system would likely use edge-detection to
determine the position of an object. Since the edges could convey some information
about the dimensions of the object, it would also be possible to gather some information
about the orientation of the object. However, because the system would not be able to
distinguish between two sides of like dimensions, it would not be able to completely
determine the pose of the object. Lighting can be another problem when using machine
vision systems. If light reflects off of a black surface, the camera could take a picture that
depicts an inaccurate representation of its surroundings.
There has been a great deal of research in using machine vision systems for pose
recognition. Some of the systems determine the pose of an object through the use of
multiple cameras. One such system uses six image capture devices to take pictures of
each side of an object [20]. Such a system would be very intrusive in many industrial
settings. Other systems use two cameras to determine depth through stereo vision.
However, these systems are very difficult to calibrate and can require significant
computational power [19]. Three-dimensional information may also be obtained from a
two-dimensional image through a-priori information. The vision systems use feature
vectors or geometric models to identify the object [21,22,23]. Once the object has been
identified, the system uses its knowledge of the particular object to estimate its pose from
the two-dimensional image. These systems store a significant amount of data about each
object. Because of the use of models and feature vectors, these systems are appropriate
for determining the pose of oddly shaped or unique objects. However, the systems may
have difficulty identifying and locating objects with symmetric faces or multiple objects
of the same shape.
4.3 Tactile Arrays
While machine vision systems are able to determine the position of an object
without touching it, there are other sensing techniques that rely on physical contact in
order to determine its position. Tactile array sensors measure a pressure profile that
results from contact with an object [24].
In [25], the author describes a mobile tactile array that a robot can use to explore
its environment. The array of sensors is attached to a tool that the robot can grasp and
use to gather information about the objects located in its workspace. Other research has
focused on embedding tactile sensors in the environment. A stationary array fixed to the
surface of the environment can be used to determine the presence of an object in the
environment [26]. The array can determine the "location, orientation, and shape of
objects in the workspace of a robot" [27]. An object located on the surface will compress
the tactile sensor, creating a profile similar to the pixels of an image. The sensor profile
is stored in a data structure, which can then be used to determine the position and
orientation of the object.
If the position and orientation of the object is the only necessary information,
force-sensing tactile sensors are not always necessary. A binary contact sensor is a form
of tactile sensor that determines the presence of a reaction force, but does not measure the
force itself. The sensor simply measures a profile of where forces are being applied, but
gives no information about the magnitude of the forces.
Figure 4.4a shows the same object as in Figure 4.1a, but with a binary touch
sensor array introduced onto the surface of the environment. The object rests on the
environment, and all of the touch sensors beneath the object are being pressed. The rest
of the sensors feel no force. Figure 4.4b depicts the sensor array representation of the
object's location.
Figure 4.4: Object in Environment and Tactile Sensor Array Representation

Without any knowledge of its shape, it may be difficult to accurately determine the position and orientation of the object. The degree of accuracy to which the object's pose can be determined is dependent upon the size and spacing of the sensors compared
to the size of the object. For example, in Figure 4.5 the black line represents the longest edge of an object, which has length L. The error in position is bounded by (s - d/2), where d is the diameter of each of the tactile sensors and s is the perpendicular distance between each sensor. Taking the maximum amount of error in the vertical position of each end of the edge, we can calculate the maximum error in the orientation of the object, which is equal to tan^{-1}((2s - d)/L).

Figure 4.5: Error in Measurements due to Size and Spacing of Tactile Sensors
In addition to requiring contact, tactile arrays have limitations similar to those of
vision systems. The arrays provide a two-dimensional representation of the
environment, which is inadequate for determining three-dimensional pose. They also
have the problem of differentiating between sides of like dimensions, and do not have the
ability to use markings to overcome this problem.
4.4 Radio Frequency
Another method of locating objects is through the use of radio frequency
triangulation. The Global Positioning System (GPS) relies on a network of satellites that
orbit the earth and transmit signals to receivers that determine position to within a few
meters.
In order to determine location, a receiver first calculates its distance from a number
of satellites. Because the satellites reside outside of the earth's atmosphere, free from wind resistance, their locations are relatively easy to calculate [28]. In order
to ensure the accuracy of GPS position measurement, the satellites are constantly tracked
by the Department of Defense [29]. The receiver calculates its distance from a satellite
by multiplying a known velocity of the signal by the time required for the signal to travel
from the satellite to the receiver. The travel time is measured by comparing a Pseudo
Random Code, which is different for each satellite, with the version of the code sent by
the satellite [28]. The signal received from the satellite is delayed by an amount equal to
the travel time of the signal.
The GPS receiver accepts signals from at least four different satellites when
calculating its position. Position is computed in the following way. Given the distance
from one satellite, the position of the receiver is limited to the surface of a sphere
centered at the satellite with a radius equal to that distance. With two satellites, the
position is limited to the circle that spans the intersection of the two spheres surrounding
the two satellites. By adding a third satellite, the position of the receiver is further
narrowed to the two points that lie on the intersection of the three spheres. The fourth
satellite is used to account for errors in the calculation of travel time [28].
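The sphere-intersection reasoning can be written compactly: subtracting the first range equation from the others yields a linear system in the receiver coordinates. The sketch below solves that system by least squares; the satellite coordinates and ranges are made up, and the receiver clock error that the fourth satellite resolves in a real GPS solution is ignored.

    import numpy as np

    def trilaterate(sat_positions, distances):
        # Subtracting the first sphere equation |x - p_i|^2 = d_i^2 from the others
        # gives 2 (p_i - p_0) . x = (|p_i|^2 - |p_0|^2) - (d_i^2 - d_0^2), a linear
        # system solved here by least squares. (Receiver clock error is ignored.)
        P = np.asarray(sat_positions, dtype=float)
        d = np.asarray(distances, dtype=float)
        A = 2.0 * (P[1:] - P[0])
        b = (np.sum(P[1:] ** 2, axis=1) - np.sum(P[0] ** 2)) - (d[1:] ** 2 - d[0] ** 2)
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        return x

    # Made-up satellite positions (km) and exact ranges to a receiver near the origin.
    sats = [(20200, 0, 0), (0, 20200, 0), (0, 0, 20200), (12000, 12000, 12000)]
    receiver = np.array([1.0, 2.0, 3.0])
    ranges = [np.linalg.norm(np.array(s) - receiver) for s in sats]
    print(trilaterate(sats, ranges))        # recovers approximately (1, 2, 3)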
There are some obvious limitations of GPS. First, a GPS receiver can only
determine its location to within a few meters. This is not accurate enough for automated
systems to locate and acquire an object. Second, the system cannot determine
orientation. Third, GPS signals cannot be received inside structures and are occluded by
terrain and foliage [30].
The Differential Global Positioning System (DGPS) provides the same information
as GPS, but with greater accuracy. By using a stationary receiver with a known position,
DGPS is able to eliminate much of the error in the calculation of travel time. Other
receivers, which are in contact with the stationary receiver, can apply correction factors
to their position measurements in order to reduce errors [31]. DGPS is able to determine
position to within less than a meter [32]. However, it shares other limitations with GPS
such as its inability to determine orientation or determine position within buildings.
Another use of radio frequency is Real Time Locating Systems (RTLS). These
systems are capable of locating objects within buildings, and are often used to track
assets within warehouses and other buildings [33]. RTLS works by using locating
devices that are installed within a building and tags that are attached to the objects within
that building. RTLS systems can track objects using fewer locating devices than the
number of readers required by an RFID system [34]. While RTLS solves the problem
that GPS has with locating objects within buildings, it does not meet the needs for pose
recognition. RTLS can only determine an object's location to within a few feet [35], and
cannot determine the orientation.
Chapter 5: Objects and Environment
5.1 Introduction
In this chapter, I will introduce assumptions and constraints on the objects and
environment. These constraints aid pose recognition while providing utility in common
commercial and industrial situations. I will also propose definitions for the object
coordinate frame that describes the pose of an object.
5.2 Shapes of Objects
I limited the class of objects that could be located by the pose recognition system
to rectangular solids. This limitation was justified based on current warehouse standards.
For Consumer Packaged Goods (CPG), which accounts for a large number of warehouses
in the U.S., the majority of objects that are manipulated in various stages of the supply
chain are rectangular solids. When products are shipped from the factory to the
warehouse, or from the warehouse to the retailer, they are usually shipped in a box, which
can be modeled as a rectangular solid. Also, products are often packed into containers or
stacked on pallets, both of which may also be modeled as rectangular solids. While they
reside in the warehouse, most products remain in boxes. When items are unloaded from
the shipping boxes, many are packaged in boxes. Cereal, crackers, microwaves, and
coffee makers are examples of products that are packaged in boxes.
5.3 Surface of the Environment
The pose recognition system was designed to operate in an environment that has a
flat, level surface. This type of surface was chosen for reasons similar to the choice of
rectangular solid objects. The shelves on which products are placed in most warehouses
have a similar flat and level surface. Also, the environment provides a stable surface on
which the objects can reside.
5.4 Object Coordinate Frame
To describe pose, I used a coordinate frame together with named faces for the
surfaces of the solids. The coordinate frame was fixed at the geometric center of the
solid with coordinate axes parallel to the faces of the solid. The names for the faces of
the object were based on its dimensions. For the purposes of this thesis, the length,
width, and height of the object will correspond to the longest, middle, and shortest
dimensions, respectively.
I defined the x-direction parallel to the height, the y-direction parallel to the width, and the z-direction parallel to the length. Figure 5.1 shows a rectangular solid, with its dimensions
labeled, along with the four possible coordinate frames that obey the right-hand rule.
The front and back of the object are parallel to the y-z plane and have dimensions
of length and width. The right and left are parallel to the x-z plane and have dimensions
of length and height. The top and bottom are parallel to the x-y plane and have
dimensions of width and height. For each pair of sides of like dimensions, an object
often has a primary side. For example, the front of a box of cereal is easily
distinguishable from the back. The same is true of the top and bottom of the box. For
each object, the front, right, and top will be considered the primary sides. The back, left,
and bottom are the secondary sides.
Figure 5.1: Object Dimensions and Possible Coordinate Frames
Each of the primary sides is located in the positive direction of the coordinate axis
to which it is perpendicular. The front is in the positive x-direction, the right is in the
positive y-direction, and the top is in the positive z-direction. I further constrained the
possible coordinate frames by choosing a primary side from one of the three pairs of
sides. For example, I chose one of the sides with dimensions of length and width as the
front, and used this side to determine the positive x-direction. Figure 5.2 shows the
object with its dimensions and front and back sides labeled, along with the two possible
coordinate frames that obey the right-hand rule.
Figure 5.2: Front and Back Sides of Object and Possible Coordinate Frames
Finally, I chose another primary side from one of the pairs of sides. I chose one
of the sides with dimensions of length and height as the right, and used this side to
determine the positive y-direction. This reduced the number of permissible coordinate
frames to one, and also constrained which side must be the top.
In addition to its name, each side was labeled with a number. This number was
embedded in the visual pattern as discussed in Chapter 7. The sides were numbered in
order of decreasing surface area. For each pair of faces, the primary side was numbered
first, and the secondary side was numbered second. The front, back, right, left, top, and
bottom were numbered sides 1, 2, 3, 4, 5, and 6, respectively. Figure 5.3 shows the
object with each of its sides labeled and its coordinate frame fixed at its center.
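The convention above can be summarized as a small lookup keyed by face name. The sketch below merely restates the naming and numbering scheme of this section; the function name, the dictionary layout, and the example dimensions are illustrative choices.

    def face_table(dims):
        # Encode the convention of this section: length is the longest dimension,
        # width the middle, and height the shortest; the x-, y-, and z-axes are
        # parallel to the height, width, and length, respectively.
        length, width, height = sorted(dims, reverse=True)
        return {
            "front":  {"number": 1, "normal": "+x", "dims": (length, width)},
            "back":   {"number": 2, "normal": "-x", "dims": (length, width)},
            "right":  {"number": 3, "normal": "+y", "dims": (length, height)},
            "left":   {"number": 4, "normal": "-y", "dims": (length, height)},
            "top":    {"number": 5, "normal": "+z", "dims": (width, height)},
            "bottom": {"number": 6, "normal": "-z", "dims": (width, height)},
        }

    # Example: a 0.30 m x 0.20 m x 0.10 m box.
    print(face_table((0.30, 0.10, 0.20))["right"])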
Figure 5.3: Object with All Sides Labeled and Coordinate Frame at Center
Chapter 6: Object Presence and Identification
6.1 Introduction
The pose recognition system relied on the Auto-ID Center infrastructure, which
makes use of RFID, in order to detect the presence of an object, and identify and describe
it. In order to use the infrastructure in this manner, each of the objects was fitted with an
RF tag, and the environment was monitored by a reader.
6.2 Auto-ID Center
Researchers at the Auto-ID Center at MIT have developed a framework for
describing physical objects and linking objects to their descriptions. This framework is
composed of three main components: the Electronic Product Code (EPC), Physical
Markup Language (PML), and Object Naming Service (ONS).
The Universal Product Code, or UPC, which is the numbering scheme used for
barcodes, uniquely identifies products. All products have different UPC codes, but all
instances of a particular product have the same UPC. The EPC is a numbering scheme
that is used to uniquely identify physical objects. Each instance of a particular product
has a different EPC. Because the EPC is designed as a 96-bit number, the number of objects that can be uniquely identified with EPCs should be more than sufficient for the foreseeable future.
More description of the EPC, as well as its design philosophy, is presented in [36].
PML is a language designed for describing physical objects. Based on the
eXtensible Markup Language (XML), PML is used to contain such information as the
geometry, location, and temperature of a physical object. The language was designed to
allow computers to easily access information about objects that could be used to perform
such tasks as business transactions or automated processes. More information about the
design approach and some of the core components is presented in [37,38].
In the Auto-ID infrastructure, the EPC is the method for uniquely identifying
objects, PML is the way of describing objects, and the ONS is the way of linking the
object to its information. Given an EPC, the ONS determines the network location of the
information about the object identified by that EPC [39]. This information, which is
received in the PML format, can then be used in further applications.
In addition to these components, the Auto-ID Center is investigating possible
applications of the infrastructure. While not restricted to RFID, the EPC has primarily
been stored in an RF tag that is affixed to the object. The EPC is read using a reader,
which is connected to a host running ONS. Once the network location of the information
is determined, the system obtains a PML file describing the object.
6.3 Object Identification
The pose recognition system relied on an RFID reader embedded within the
environment. When an object entered the environment of the system, the RFID system
read its EPC and used it to determine the identity of the object. In order for this to be
possible, the system had some awareness of the various objects that were tagged. For
each of the objects, there was a certain amount of information that was known. The EPC
of each object was the first important piece of information. For pose recognition, it was
helpful to know the geometry of the object. This information was stored for each object
that interacted with the pose recognition system. The EPC was used to determine the
location of the information, and the object descriptions were returned in a PML file that
was easily readable by the system. The descriptions were a key element of the pose
recognition system, and were necessary to determine the pose of the object.
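The flow just described (read an EPC, resolve it to a network location, retrieve the PML description, and extract the object's dimensions) can be sketched as follows. The URL pattern stands in for an ONS lookup, and the <length>, <width>, and <height> element names are hypothetical placeholders rather than the actual PML schema.

    import urllib.request
    import xml.etree.ElementTree as ET

    def fetch_object_dimensions(epc):
        # Resolve an EPC to a (hypothetical) PML document and return its
        # length, width, and height. A real system would query ONS instead of
        # hard-coding the URL pattern used here.
        url = "http://pml.example.com/%s.xml" % epc
        with urllib.request.urlopen(url) as response:
            root = ET.fromstring(response.read())
        return tuple(float(root.findtext(tag)) for tag in ("length", "width", "height"))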
6.4 Object Presence
In addition to gathering information about the object that entered the reader field,
the system also determined the presence of the object within the range of the system.
Once an object entered the reader field, the position of that object was limited to the
space contained within the reader field. The object's presence was what triggered the
system to use additional sensing capabilities to determine the object's pose.
Figure 6.1 shows a shelf that is monitored by a number of reader antennas. If an
object is present on the shelf, then its tag will be powered and read by the antenna within whose field the object is located. Before entering this field, the object could have resided
in another reader field, or could have been in a location out of the reach of any readers.
In the first case, the general location of the object would change from one reader field to
another. In the second case, the object's location is narrowed from somewhere in the
environment to somewhere within the reader field that is currently reading the tag. The
system could then determine where it should focus its additional sensing to determine the
pose of the object.

Figure 6.1: Antennas Monitoring Shelf Sections
Chapter 7: Vision System
7.1 Introduction
After the RFID system determined the presence of an object, the vision system
captured an image of the environment. By this time, the pose recognition system already
had a good deal of information about the object. The RFID system had determined that
an object was within the range of the reader's antenna, and had identified the object and
determined its dimensions. The vision system was the component of the pose recognition
system that provided the last few pieces of information necessary to determine the pose
of an object.
7.2 Visual Pattern
The vision system relied on a visual pattern printed on the center of each side of
the object. Instead of trying to capture an image of the entire object, the system used the
pattern to determine the object's position and orientation.
7.2.1 Requirements of Pattern
The pattern had a number of functions. Since it was located at the center of the
side of the object, the system was able to determine the center of the object by calculating
the center of the pattern. The same was true for the orientation of the object. By
calculating the orientation of the pattern, the system was able to determine the orientation
of the object. The last, and most important, function of the pattern was determining the
side of the object that was facing the camera. Without using the pattern, it would have
been very difficult to distinguish between two sides of like dimensions. Each of the six
sides of the object was labeled with a unique pattern. By capturing an image of the
object, the vision system determined which side was facing the camera and calculated the
position and orientation of that side.
7.2.2 Color Scheme
Typically, the lighting of the environment can greatly affect the performance of a
vision system. To minimize its effects, I chose to limit the colors of the environment,
objects, and patterns to black and white. The surface of the environment was entirely
black. The objects themselves, with the exception of the patterns on each side, were also
black. The pattern, which is described in Section 7.2.3, was black and white. Ambient
light made the white portion of the pattern shine, while the object and environment
remained dark. The color scheme made it easier for the system to locate and read the
pattern.
While the system that I designed and implemented used this black-and-white
scheme, it is possible to incorporate other color schemes. For example, more
complicated image processing algorithms than the ones that I implemented could extract
the pattern from a color image. Also, the patterns could be printed on the sides of the
object in invisible fluorescent ink. The system could then use ultraviolet lighting to
illuminate the pattern.
7.2.3 Design of Pattern
There were two major issues that I considered when designing the pattern. First, I
wanted to make sure that the pattern could be modified to provide a unique identifier for
each of the six sides. Second, I made sure that the system would be able to determine the
orientation of the pattern. I wanted to eliminate the possibility of having errors in
orientation of 180 degrees.
I chose to identify each side by using a pattern of five squares, each of which was
either black or white. The five squares acted as bits in a five-digit binary number. Black
squares represented a zero, or an unset bit. White squares indicated a one, or a set bit.
The first and last bits, bit 0 and bit 4 respectively, were used to determine orientation.
The first bit was always set, and the last bit was always unset. Therefore, the system was
able to read the bits correctly, even if the pattern was upside down. The middle three
squares were used to calculate the number that represented a certain side of the object
according to the numbering scheme described in Section 5.4. The second, third, and fourth bits were used to calculate the side number using the equation

    SideNumber = (4 · Bit1) + (2 · Bit2) + (Bit3).    (7.1)
Figure 7.1 shows the patterns for each of the six different sides. Each of the white and
black patterns is shown against a black background.

Figure 7.1: Patterns for each of the Six Sides of an Object
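A decoding step consistent with this design is sketched below. It assumes the five squares have already been sampled from the image from left to right as 0 (black) or 1 (white); the function name and the error handling are illustrative additions.

    def decode_pattern(bits):
        # Returns (side_number, flipped): flipped is True when the pattern was
        # read upside down, in which case a 180-degree offset must be added to
        # the measured orientation.
        if bits[0] == 1 and bits[4] == 0:
            flipped = False
        elif bits[0] == 0 and bits[4] == 1:
            flipped = True
            bits = bits[::-1]                # re-read the squares right side up
        else:
            raise ValueError("not a valid pattern")
        side_number = (4 * bits[1]) + (2 * bits[2]) + bits[3]    # equation (7.1)
        return side_number, flipped

    print(decode_pattern([1, 0, 0, 1, 0]))   # side 1, read right side up
    print(decode_pattern([0, 1, 0, 0, 1]))   # the same pattern read upside down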
7.2.4 Alignment of Pattern on Object
In addition to locating and reading the pattern, the system needed to know how it
was positioned and oriented on the side of the object. This was necessary so that the
system could determine the object's position and orientation from the pattern's position
and orientation. On each face of the object, the pattern was centered about the center of
the face, with the short side of the pattern parallel to the shorter dimension of the face,
and the long side of the pattern parallel to the longer dimension of the face. The pattern
was oriented so that it was read, from bit zero to bit four, in the positive direction of the
axis parallel to the long side of the pattern. Figure 7.2 shows an example of the
appropriate positioning of the pattern on the side of an object. The figure shows the front
of an object, with dimensions of length by width. The pattern, which represents side
number one, is oriented so that it is read in the positive z-direction.

Figure 7.2: Positioning of Pattern on Object Side
7.3 Information from Image
In order for the system to gather information about the pose of the object, it had to
capture an image of the object in the environment. Once the image was captured, the
system processed it in order to extract data. Many of the processing techniques relied on
knowledge of the visual pattern in order to gather the necessary information. The desired
information was the position and orientation of the pattern within the image, and the
number of the side facing the camera.
The position and orientation were calculated using the pixels of the image. The
position was described by the column and row of the pixel located at the center of the
pattern. The orientation of the pattern was calculated as an angle from the horizontal.
However, before reading the pattern, the orientation may have been incorrect by 180
degrees. Once the position and angle had been calculated, the system read each of the
squares in the pattern, and determined their value. Using the values of the squares at
either end of the pattern, the system determined whether or not a 180-degree offset had to
be included in the orientation. The system also used the offset information in order to
assign the values of the middle three squares to the appropriate bit values. The system
then used equation (7.1) to calculate the number of the side facing the camera.
Chapter 8: Distributed Sensing
8.1 Introduction
The pose recognition system developed in this thesis used RFID and machine
vision to determine the position and orientation of an object. Using either one of these
sensors, the task of pose recognition would have been much more difficult. In this
section, I will discuss the range of each sensor and the required overlap of these ranges.
Also, I will discuss the data that each of the sensors is able to capture and show that a
single sensor could not complete the task of pose recognition. I will then show how the
data from each of the sensors was combined to determine the pose of an object.
8.2 Sensor Range
The read range of an RFID system is largely based on the power available at the
reader and the shape of the antenna. As discussed in Chapter 3, the size of the reader
field is dependent on the available power. The shape of the reader field, on the other
hand, is primarily dependent on the design of the antenna. An example of a reader field
is shown in Figure 8.1. For the most part, the field extends above the antenna. When
tagged objects pass through this field, the tags are able to communicate with the reader.
Figure 8.1: RFID System Antenna and Reader Field
The field of view, or range, of a machine vision system is dependent on the image
capture device. Figure 8.2 shows the range of a vision system, which is a frustum
defined by the field of view of the camera and the maximum and minimum visual planes.
All objects within this range can be viewed by the system. The field of view of the
camera is a pyramid. The maximum and minimum planes are defined by the limits on
how close to or far from the camera an object can be in order to provide adequate
information. If the object is too far away from the camera, then there may not be enough
detail in the image to gather data through image processing. The maximum plane, which
forms the base of the frustum, is located at the furthest distance from the camera at which
the required amount of detail can be obtained. If the object is too close to the camera, the
entire pattern may not be captured in the image. The minimum plane, which forms the
top of the frustum, is located at the shortest distance from the camera at which the entire
pattern may be viewed.
Figure 8.2: Machine Vision Field of View
In order for the pose recognition system to determine an object's pose, the object
had to be located within the intersection of the read range of the RFID system and the
field of view of the machine vision system. Therefore, when the RFID system discovered
that a tagged object was within its read range, the machine vision system could capture an
image of that object. For the pose recognition system to operate effectively, the field of
view of the camera had to completely encompass the read range of the reader, as shown
in Figure 8.3. Therefore, the camera could view any object that was sensed by the RFID
system.
Figure 8.3: Required Overlap of RFID
and Machine Vision System Ranges
8.3 Sensor Data
While the RFID system provided information that was very important for pose
recognition, it could not determine pose by itself. As discussed in Chapter 6, the RFID
system provided two main functions: identification and presence. The identity of the
object was used to determine its dimensions. The position of the object was limited to
the space within the range of the reader, and the RFID system could provide no
information about the orientation of the object.
The machine vision system, on the other hand, provided a great deal of
information about the pose of the object. The system captured an image that was used to
gather information about the position and orientation of the object. Through the use of
the visual patterns described in Chapter 7, the vision system completely determined the
orientation of the object. It also determined some information about the position of the
object. In Chapter 9, I will discuss the image processing techniques used to extract this
information from the image. However, without some critical information about the
object's geometry, the vision system could not accurately determine the position of the
object.
By combining the two sensing systems, my system completely determined the
pose of an object within the constraints described in Chapters 5 and 7. The RFID system
determined the identity of an object, and alerted the vision system that an object was
present within its range. The vision system then captured an image of the object, and
processed the image to gather the appropriate information. By combining the object
information with the vision system's measurements, the system completely determined
the pose of the object. In Chapter 10, when I discuss my implementation of the pose
recognition system, I will discuss how the object information is combined with the vision
system information in order to calculate the pose.
Chapter 9: Image Processing
9.1 Introduction
The pose recognition system used image processing to gather critical information
about the pose of the object. The system analyzed the image of the object to calculate the
position and orientation of the pattern and determine the side that faces the camera. In
this chapter, I describe in detail the image processing algorithms that are used to extract
the pose.
9.2 Overview of Algorithms
The image processing portion of my pose recognition system was composed of
the following steps:
- Correct Lens Distortion - The system corrects the image to remove distortion due to
  the curvature of the camera lens.
- Color Image to Binary Image - The color image is converted to a grayscale image
  using an equation that calculates the luminance of each pixel. Using a histogram of
  the grayscale image, the system chooses a threshold for converting the grayscale
  image to a binary image.
- Isolate Pattern - The system divides all of the white pixels in the binary image into
  collections. Each collection is tested in order to isolate the pattern.
- Locate Center - The system fills in the holes in the pattern and calculates the center
  of area of the resulting rectangle.
- Calculate Angle - The system calculates the second moments of the pattern to
  determine the angle of the pattern.
- Read Pattern - The system checks the value of each of the squares in the pattern, and
  determines the side number.
The following sections describe each of these algorithms in detail.
9.3 Correct Lens Distortion
The first stage of the image processing is to undo distortion of the image due to
the curvature of the lens of the image capture device. While the distortion can be rather
minor, its effects are seen most at the corners of the image. The method used to correct
this distortion was taken from [40]. The coordinate (i_c, j_c) is the location of a pixel in
the desired corrected image, and is measured from the center of the image. The
coordinate (i_d, j_d) is the location of the same pixel in the distorted image, and is also
measured from the center of the image. For each of the three color bands, and for every
pixel in the corrected image, the location of the corresponding pixel in the distorted
image is found using the equations:

r² = i_c² + j_c²,  (9.1)

i_d = i_c · (1 + k·r²), and  (9.2)

j_d = j_c · (1 + k·r²),  (9.3)

where k is a very small negative number that represents the amount of curvature in the
lens. The value of a pixel at location (i_c, j_c) in the corrected image is equal to the value
of the corresponding pixel at location (i_d, j_d) in the distorted image. However, the
values of i_d and j_d may not be integers. Therefore, the system must calculate the value
of the pixel (i_c, j_c) as a weighted average of the four pixels surrounding the location
(i_d, j_d) in the distorted image. The weights used to calculate the value of the pixel are
inversely proportional to the distance from each of the surrounding pixels to the location
(i_d, j_d). Once the value of each pixel has been calculated, the resulting corrected image
is free of lens distortion.
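A rough Java sketch of this correction for a single color band is shown below. The array layout, the placeholder distortion constant k, and the small epsilon guarding against division by zero are assumptions for illustration; this is not the code of the actual implementation.

// Sketch of the lens-distortion correction of Section 9.3 for one color band.
// Coordinates are measured from the image center, and each corrected pixel is an
// inverse-distance weighted average of the four distorted-image pixels surrounding
// the computed location (i_d, j_d).
public class LensCorrection {
    public static double[][] correct(double[][] distorted, double k) {
        int rows = distorted.length, cols = distorted[0].length;
        double[][] corrected = new double[rows][cols];
        double cy = (rows - 1) / 2.0, cx = (cols - 1) / 2.0;
        for (int row = 0; row < rows; row++) {
            for (int col = 0; col < cols; col++) {
                double ic = col - cx, jc = row - cy;   // corrected coordinates from center
                double r2 = ic * ic + jc * jc;         // Equation (9.1)
                double id = ic * (1 + k * r2);         // Equation (9.2)
                double jd = jc * (1 + k * r2);         // Equation (9.3)
                double x = id + cx, y = jd + cy;       // back to array indices
                int x0 = (int) Math.floor(x), y0 = (int) Math.floor(y);
                double sum = 0, weightSum = 0;
                for (int dy = 0; dy <= 1; dy++) {
                    for (int dx = 0; dx <= 1; dx++) {
                        int xi = x0 + dx, yi = y0 + dy;
                        if (xi < 0 || xi >= cols || yi < 0 || yi >= rows) continue;
                        double w = 1.0 / (Math.hypot(x - xi, y - yi) + 1e-9);  // inverse distance
                        sum += w * distorted[yi][xi];
                        weightSum += w;
                    }
                }
                corrected[row][col] = weightSum > 0 ? sum / weightSum : 0;
            }
        }
        return corrected;
    }
}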
9.4 Color Image to Binary Image
Although the environment and the objects are restricted to black-and-white, the
image is in color. The color picture actually looks like a grayscale image, but there are
still red, blue, and green values for each pixel. The color image is represented by 3
arrays, each having L columns and W rows. The red, green, and blue arrays each store
an 8-bit color value for each pixel. While many image processing techniques could be
performed on color images, the algorithms are greatly simplified for binary images. The
system first converts the color image to a grayscale image according to the NTSC
encoding scheme used in American broadcast television [41],
Y(i, j) = 0.299 · R(i, j) + 0.587 · G(i, j) + 0.114 · B(i, j),  (9.4)

where

i is the horizontal position of the pixel (0 ≤ i < L),
j is the vertical position of the pixel (0 ≤ j < W),
Y(i, j) is the luminance of pixel (i, j),
R(i, j) is the red value of the pixel (i, j),
G(i, j) is the green value of the pixel (i, j), and
B(i, j) is the blue value of the pixel (i, j).
The luminance of a pixel is the same as its grayscale value. The grayscale image is
represented by a single array of L columns and W rows. This array stores the grayscale
value for each of the pixels in the image. Figure 9.1 shows the grayscale
representation of a picture taken by the webcam that shows an object and the
environment.
Figure 9.1: Grayscale Image
Once the system has developed the grayscale image, its next task is to convert it
to a binary image. Theoretically, this is an easy task. Given some threshold value
between 0 and 255, all pixels with a grayscale value less than the threshold are given a
value of zero, making the pixel black. For all pixels with a grayscale value greater than
or equal to the threshold, the pixel value is set to 255, making the pixel white. The
difficulty, however, lies in determining an appropriate threshold value.
In order to determine an appropriate threshold, the system generates a histogram
of the pixel intensity values in the grayscale image. Figure 9.2 shows the histogram of
the image in Figure 9.1. The pixel intensity values are listed on the horizontal axis while
the number of pixels that have a given intensity value are listed on the vertical axis.

Figure 9.2: Histogram of Grayscale Values

If
there are two peaks around which most of the pixel intensities are clustered, with a large
gap separating the peaks, then the histogram is said to be bi-modal. This type of
histogram is ideal for determining a threshold because an appropriate value can be chosen
in between the peaks [42]. All pixels to the left of the threshold will become black and
all pixels to the right will become white. Since the objects and the environment are
primarily black and white, the images taken by the system should usually produce a bimodal histogram. The system uses the histogram of the image to choose a threshold pixel
intensity that is located somewhere between the two peaks. Once the threshold has been
selected, the system uses the grayscale image, along with the threshold, to develop a
binary image of the object and environment. Figure 9.3 shows the binary image created
from the grayscale image of Figure 9.1 with a threshold chosen using the histogram in
Figure 9.2.
Figure 9.3: Binary Image
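As a compact Java sketch of the conversion just described, the fragment below computes the luminance image, builds the histogram, and picks a threshold. The rule used to choose a value between the two peaks (the midpoint of the two dominant bins, with an assumed minimum separation of 50 levels) is a stand-in for illustration, since the exact selection rule is not spelled out above; the class name is also an assumption.

// Sketch of the color-to-binary conversion of Section 9.4. The luminance formula
// follows Equation (9.4); the threshold rule is a simplified stand-in.
public class Binarizer {
    public static int[][] toBinary(int[][] red, int[][] green, int[][] blue) {
        int rows = red.length, cols = red[0].length;
        int[][] gray = new int[rows][cols];
        int[] histogram = new int[256];
        for (int j = 0; j < rows; j++) {
            for (int i = 0; i < cols; i++) {
                int y = (int) Math.round(0.299 * red[j][i] + 0.587 * green[j][i] + 0.114 * blue[j][i]);
                gray[j][i] = Math.min(255, y);
                histogram[gray[j][i]]++;
            }
        }
        // Locate the most populated bin, then the largest bin far enough away from it.
        int peak1 = 0, peak2 = -1;
        for (int v = 1; v < 256; v++) if (histogram[v] > histogram[peak1]) peak1 = v;
        for (int v = 0; v < 256; v++) {
            if (Math.abs(v - peak1) < 50) continue;            // assumed minimum separation
            if (peak2 < 0 || histogram[v] > histogram[peak2]) peak2 = v;
        }
        int threshold = (peak2 < 0) ? 128 : (peak1 + peak2) / 2;
        int[][] binary = new int[rows][cols];
        for (int j = 0; j < rows; j++) {
            for (int i = 0; i < cols; i++) {
                binary[j][i] = gray[j][i] >= threshold ? 255 : 0;   // 255 = white, 0 = black
            }
        }
        return binary;
    }
}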
9.5 Isolate Pattern
Once the binary image has been created, the system is almost ready to begin
extracting data from the image. As mentioned before, all of the information that the
system will gather is based on the pattern in the image, and its position and orientation.
Theoretically, the pattern should contain the only white pixels in the image. Since the
entire environment is black, and the rest of the side facing the camera is black, the system
should be able to perform calculations on the image as a whole without having to single
out the pattern. However, there are times when there are white pixels in the image that
are not part of the pattern and can affect the calculations and introduce error into the
measurements. In order to avoid this, the system scans the binary image and determines
the number of collections of pixels. A collection is a set of all white pixels that are
connected to each other. Figure 9.4a shows two collections of white pixels. Two pixels
are connected if there exists a path between the two pixels. In Figure 9.4b, the pixels
labeled A and B are connected. A path consists of a number of pixels that form a
continuous route of neighboring white pixels. In Figure 9.4c, the four pixels that are
outlined and shaded form a path between pixels A and B. A pixel's neighbors consist of
the eight pixels that are directly adjacent to the pixel at its four edges and four corners. In
Figure 9.4d, the pixels labeled 2, 4, 5, and 7 are neighbors of pixel A along its four edges.
The pixels labeled 1, 3, 6, and 8 are its neighbors at its four corners.
Figure 9.4: Explanation of Collections, Connectivity, Paths, and Neighbors
The system scans the image, looking for white pixels. When it finds one, it
checks to see if any of its neighbors belong to a collection. If a neighboring pixel is in a
collection, then that same collection number is assigned to the current pixel. The system
continues checking the rest of the pixels to make sure that the pixel does not have two
neighbors in different collections. If this occurs, the system merges the two collections
located on the side of the object. Figure 9.5a shows the grayscale version of an image
that shows how some of the pattern on the side of the object can be captured. Figure 9.5b
shows the binary version of the image. The presence of the additional white pixels
affected the calculations of the position and orientation of the pattern within the image.
Therefore, in order to remove this source of error, I implemented the portion of the
system that groups pixels into collections and tests each collection to see if it is a pattern.
Figure 9.5: Additional White Pixels from Pattern on Side of Object
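The grouping of white pixels into collections can be sketched in Java as below. For brevity this version uses a flood fill over eight-connected neighbors rather than the scan-and-merge pass described in this section, but it produces the same collections; the class name and the use of 255 for white pixels are assumptions for this example.

import java.util.ArrayDeque;

// Sketch of the pixel-grouping step of Section 9.5: every white pixel (value 255)
// receives a collection number shared with all white pixels it is connected to
// through eight-connected neighbors.
public class Collections8 {
    public static int[][] label(int[][] binary) {
        int rows = binary.length, cols = binary[0].length;
        int[][] labels = new int[rows][cols];          // 0 = unlabeled or black
        int next = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (binary[r][c] != 255 || labels[r][c] != 0) continue;
                next++;
                ArrayDeque<int[]> stack = new ArrayDeque<>();
                stack.push(new int[] { r, c });
                labels[r][c] = next;
                while (!stack.isEmpty()) {
                    int[] p = stack.pop();
                    for (int dr = -1; dr <= 1; dr++) {
                        for (int dc = -1; dc <= 1; dc++) {
                            int nr = p[0] + dr, nc = p[1] + dc;
                            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                            if (binary[nr][nc] == 255 && labels[nr][nc] == 0) {
                                labels[nr][nc] = next;
                                stack.push(new int[] { nr, nc });
                            }
                        }
                    }
                }
            }
        }
        return labels;                                  // collection number per pixel
    }
}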
9.6 Locate Center
Once the pattern has been isolated, and any other collections of pixels have been
eliminated, the system prepares to extract data from the image. The first information that
the system attempts to gather is the center of the pattern, which is used to describe the
position of the pattern within the image. In order to make the calculation easy, the
system first makes a copy of the image, and then attempts to fill in the holes in the
pattern. The system scans the entire image, and for every black pixel, the system checks
to see if it is bounded by white pixels. If the pixel is bounded, then it is set to white.
After scanning the entire image, the pattern is replaced by a solid rectangle of the same
dimensions as the pattern. By filling in the holes, the center of the pattern can be found
by calculating the center of area of the rectangle. Figure 9.6 shows the rectangle
produced by filling in the holes in the pattern in Figure 9.3.
Figure 9.6: Rectangle used to
find Center of Pattern
The system scans the image with the rectangle, and for every black pixel in the
image, the function p(x, y) is set to zero. For every white pixel in the image, that is,
those that are part of the rectangle, p(x, y) is set to one. Then, the system calculates the
area of the rectangle using the equation:
A = Σ_{x=0}^{L} Σ_{y=0}^{W} p(x, y).  (9.5)

The first moments about the x- and y-axes are calculated using the equations:

M_x = Σ_{x=0}^{L} Σ_{y=0}^{W} x · p(x, y), and  (9.6)

M_y = Σ_{x=0}^{L} Σ_{y=0}^{W} y · p(x, y),  (9.7)

where M_x is the first moment about the x-axis and M_y is the first moment about the
y-axis.

The center of area (x̄, ȳ) of the rectangle is calculated using the equations:

x̄ = M_x / A, and  (9.8)

ȳ = M_y / A,  (9.9)

where x̄ is the position of the center of area of the rectangle along the x-axis, and ȳ is
the position of the center of area of the rectangle along the y-axis. The coordinate (x̄, ȳ),
which is both the center of area of the rectangle and the center of the pattern, is used by
the system to describe the position of the pattern within the image.
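Equations (9.5) through (9.9) translate almost directly into code. The short Java sketch below assumes the filled rectangle is stored as a binary array in which 255 marks a white pixel; the class name and return convention are illustrative choices, not the original code.

// Sketch of the center-of-area calculation of Section 9.6 (Equations 9.5-9.9).
// p(x, y) is 1 for white (filled-rectangle) pixels and 0 otherwise.
public class CenterOfArea {
    // Returns {xBar, yBar}, the centroid of the white pixels, or null if none exist.
    public static double[] centroid(int[][] binary) {
        long area = 0, mx = 0, my = 0;
        for (int y = 0; y < binary.length; y++) {
            for (int x = 0; x < binary[0].length; x++) {
                if (binary[y][x] == 255) {
                    area++;          // Equation (9.5)
                    mx += x;         // Equation (9.6)
                    my += y;         // Equation (9.7)
                }
            }
        }
        if (area == 0) return null;  // no pattern found
        return new double[] { (double) mx / area, (double) my / area };  // (9.8), (9.9)
    }
}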
9.7 Calculate Angle
After calculating the center of the pattern, the system attempts to determine the
orientation of the pattern with respect to the image, using a method derived and described
in [43]. The angle θ, used to represent orientation, is measured counterclockwise from
the horizontal axis of the image to the axis of least second moment of the pattern. This
axis, u, is a line parallel to the longer side of the pattern that passes through the center of
the pattern. Figure 9.7 shows a line representing the u-axis of the pattern in Figure 9.3.

Figure 9.7: u-axis describing Orientation of Pattern
The system calculates the second moments of the pattern, M_xx, M_xy, and M_yy,
about the u-axis using the equations:

M_xx = Σ_{x=0}^{L} Σ_{y=0}^{W} (x − x̄)² · p(x, y),  (9.10)

M_xy = Σ_{x=0}^{L} Σ_{y=0}^{W} (x − x̄) · (y − ȳ) · p(x, y), and  (9.11)

M_yy = Σ_{x=0}^{L} Σ_{y=0}^{W} (y − ȳ)² · p(x, y).  (9.12)

Once the second moments have been calculated, the system calculates the angle θ
using the equation:

θ = (1/2) · tan⁻¹[ 2·M_xy / (M_xx − M_yy) ], unless M_xy = 0 and M_xx = M_yy, or  (9.13)

θ = (1/2) · sin⁻¹[ 2·M_xy / √(4·M_xy² + (M_xx − M_yy)²) ], if M_xy = 0 and M_xx = M_yy.  (9.14)
At this point, the system has calculated the angle at which the u-axis is oriented
from the horizontal axis. However, the system does not know whether the pattern is
oriented in the positive or negative direction of the u-axis. The method for determining
whether the pattern is oriented along the positive u-axis or the negative u-axis is
explained in the next section.
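A Java sketch of the angle calculation is shown below. It uses Math.atan2, which folds the two-branch form of equations (9.13) and (9.14) into a single expression; that substitution, together with the class name and array conventions, is an assumption made for this example rather than a copy of the implementation.

// Sketch of the orientation calculation of Section 9.7. The second moments are
// taken about the pattern's centroid (xBar, yBar); theta is half the angle of the
// vector (Mxx - Myy, 2*Mxy), and atan2 handles the degenerate cases implicitly.
public class Orientation {
    public static double angle(int[][] binary, double xBar, double yBar) {
        double mxx = 0, mxy = 0, myy = 0;
        for (int y = 0; y < binary.length; y++) {
            for (int x = 0; x < binary[0].length; x++) {
                if (binary[y][x] != 255) continue;
                double dx = x - xBar, dy = y - yBar;
                mxx += dx * dx;       // Equation (9.10)
                mxy += dx * dy;       // Equation (9.11)
                myy += dy * dy;       // Equation (9.12)
            }
        }
        return 0.5 * Math.atan2(2 * mxy, mxx - myy);   // radians from the horizontal axis
    }
}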
9.8 Read Pattern
Once it has determined the center of the pattern and the angle that the u-axis
makes with the horizontal axis, the system is able to read the pattern. The first step is to
determine the length of the pattern. Starting at the center of the pattern, the system
checks each pixel that lies on the u-axis. In each direction, it identifies the white pixel
that is furthest away from the center of the pattern. The coordinate of the white pixel that
is the furthest away along the positive u-axis is labeled (x_pos, y_pos). The coordinate of
the white pixel that is the furthest away along the negative u-axis is labeled (x_neg, y_neg).
The system then measures the horizontal and vertical distances, Δx and Δy respectively,
between these two pixels using the equations:

Δx = x_pos − x_neg, and  (9.15)

Δy = y_pos − y_neg.  (9.16)

Using the coordinate of the center of the pattern, and the values of Δx and Δy, the
system is able to calculate the location of the center of each of the five squares in the
pattern. These locations are displayed in Table 9.1. Notice that the five squares are not
labeled as bits zero through four. This is because it is not yet known whether the pattern
is oriented along the positive or negative u-axis. The first square may correspond to bit
zero or bit four, depending on the orientation of the pattern. Therefore, the locations of
the squares, not the bits, are given in the table.
Square    Location of Center in      Location of Center in
          Horizontal Direction       Vertical Direction
1         x̄ − 2·Δx/5                 ȳ − 2·Δy/5
2         x̄ − Δx/5                   ȳ − Δy/5
3         x̄                          ȳ
4         x̄ + Δx/5                   ȳ + Δy/5
5         x̄ + 2·Δx/5                 ȳ + 2·Δy/5

Table 9.1: Locations of Centers of Squares in Pattern
At each of the five locations, the system checks to see if the pixel is white or
black. It also checks to see if all of the pixel's neighbors are the same color as the pixel.
For each square, if the pixel and all of its neighbors are white, then the value of the
square is one. If the pixel and all of its neighbors are black, then the value of the square
is zero. If the pixel is not the same color as all of its neighbors, then there is an error
either in the pattern or in the location of the center of the square. Regardless of the
reason, if the pixel is not the same as its neighbors, then the value of the square is
negative one.
After each of the squares has been given a value, the system performs the first test
of the validity of the pattern. If any of the squares has a value of negative one, then the
pattern is invalid. The second validity test makes sure that the pattern meets the
requirements of a valid pattern according to the standards described in Chapter 7. Bit
zero must be equal to one and bit four must be equal to zero. Therefore, if the first and
fifth squares both have a value of zero, or both have a value of one, then the pattern is
invalid.
If the pattern passes these first two validity tests, then the system continues to
gather information from the pattern. The next step is to determine how the pattern is
oriented along the u-axis, and to determine the value of the offset angle δ, which can be
either zero or 180 degrees. If the pattern can be read from bit zero to bit four in the
positive direction of the u-axis, then δ is equal to zero. In Figure 9.3, for which the
u-axis of the pattern is shown in Figure 9.7, there is no offset. In this case, the first
square corresponds to bit zero and the fifth square corresponds to bit four. If the pattern
is read backwards in the positive direction of the u-axis, then δ is equal to 180 degrees.
This offset must be included in the orientation of the pattern. In Figure 9.8, for which the
u-axis of the pattern is also shown in Figure 9.7, there is a 180-degree offset. In this case,
the first square corresponds to bit four and the fifth square corresponds to bit zero.
Figure 9.8: Pattern with
180-degree Offset
Once the pattern has been used to determine its own orientation, the system uses it
to determine the number of the side that is facing the camera. First, the bits are assigned
the values of the appropriate squares. If there is no offset between the pattern and the
u-axis, then bits one, two, and three are assigned the values of the second, third, and fourth
squares, respectively. If there is a 180-degree offset, then bits one, two, and three are
assigned the values of the fourth, third, and second squares, respectively.
Once the bits have been assigned their correct values, the system calculates the
side number using equation (7.1). If the side number is less than one or greater than six,
then the pattern is invalid. If the side number is within the bounds, then the pattern
passes the final validity test. At this point, the image processing is complete and all of
the data is gathered from the image.
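The reading and validity logic of this section can be sketched in Java as follows. The method assumes the five square values are supplied in the order they are encountered along the positive u-axis (1 for white, 0 for black, -1 for an inconsistent neighborhood) and returns null for an invalid pattern; these conventions, like the class name, are assumptions for illustration.

// Sketch of the pattern-reading logic of Section 9.8. Returns {sideNumber,
// offsetDegrees}, or null if the pattern fails any of the validity tests.
public class PatternReader {
    public static int[] decode(int[] squares) {
        if (squares == null || squares.length != 5) return null;
        for (int v : squares) if (v == -1) return null;        // first validity test
        if (squares[0] == squares[4]) return null;             // bit 0 must differ from bit 4
        int offset;                                            // 0 or 180 degrees
        int bit1, bit2, bit3;
        if (squares[0] == 1) {                                 // read forward: no offset
            offset = 0;
            bit1 = squares[1]; bit2 = squares[2]; bit3 = squares[3];
        } else {                                               // read backwards: 180-degree offset
            offset = 180;
            bit1 = squares[3]; bit2 = squares[2]; bit3 = squares[1];
        }
        int side = 4 * bit1 + 2 * bit2 + bit3;                 // Equation (7.1)
        if (side < 1 || side > 6) return null;                 // final validity test
        return new int[] { side, offset };
    }
}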
Chapter 10: Implementation
10.1 Introduction
I implemented a pose recognition system that incorporates all of the components
discussed in the previous chapters. The objects and environment involved with the
system had all of the characteristics discussed in Chapter 5. The environment was a
black flat surface, and the objects were black-and-white boxes. Each side of the objects
was labeled with a pattern according to the standards that I developed and described in
Section 7.2. The radio frequency system determined the presence and identity of an
object as discussed in Chapter 6. The vision system gathered the appropriate information
mentioned in Chapter 7 using the image processing algorithms described in Chapter 9.
This chapter discusses my implementation of the system, and how all of the information
was combined to determine the pose of an object.
10.2 Hardware
To implement the pose recognition system, I needed hardware to perform the
necessary functions. The radio frequency identification system was composed of a
reader, an antenna, and tags, all of which were manufactured by Intermec [44,45]. The
antenna's dimensions were 9 inches by 9 inches, and the read range extended above the
antenna up to 7 feet. The tags were passive, and when powered by the reader's signal
responded with a 96-bit EPC. The vision system used a Veo Stingray webcam [46],
which produced color images of various sizes. The system used the camera to capture
color images composed of 320 columns and 240 rows of pixels. A standard personal
computer running Windows 2000 controlled all of the hardware using the software that
will be discussed in the following sections.
Figure 10.1 shows a schematic of the pose recognition system. Both the reader
and the camera were connected to the host computer that ran all of the software. The
camera was mounted on a shelf so that it was facing downward, and the reader was
located to the side of the system. The reader's antenna was located below the camera so
that its reader field was within the field of view of the camera. Because the antenna's
surface was not flat, I constructed a small table that covered the antenna. The top of the
table was covered with a piece of black paper and was used as the surface of the
environment.
Figure 10.1: Pose Recognition System Implementation Setup
10.3 Overview of Software
The software was divided into the following parts:
" Query Reader - The system queries the reader to see if there are any tags within
its read range. If there are tags present, it returns the EPCs of the tags.
" Access Physical Information - The system uses the EPC of the tag to determine
the location of the object's physical information. It then parses the object's PML
file and determines the dimensions of the object.
" Image Capture - The system captures an image of the environment and the
object that is present.
*
Image Processing - The system processes the image in order to gather important
information that will be used to determine the pose of the object.
" Combine Information - All of the information that has been gathered by the
system is combined to determine the pose of the object with respect to the
environment.
The software for the pose recognition system was written in Java. The following sections
will describe each of the parts of the software, including descriptions of the algorithms
implemented and the Java Application Programming Interfaces (APIs) used.
10.4 Query Reader
The system first queries the reader to see how many objects are present. Reader
communication software was developed by OATSystems, Inc. [47]. When queried, the
reader returns a list of the EPCs of the tags within its reader field. If the list is empty or
there is more than one EPC, the system waits for a period of time, and then queries the
reader again. It will continue to do this until the reader responds with a list that contains
only one EPC. When this occurs, the system acknowledges the identifying number in the
list as the EPC of an object that is residing within the environment of the system.
The system will only attempt to determine the pose of an object if there is only
one object within its environment. There are a number of reasons for this restriction.
First, the EPC is the only piece of information that the system uses to uniquely identify
objects. The system is capable of reading multiple patterns in the same image. However,
if the system were to attempt to determine the pose of two objects within the
environment, it would not be able to match each of the patterns with the appropriate EPC.
Therefore, the system would not be able to determine all of the information necessary for
pose recognition. Second, the system avoids variables that it cannot take into account.
For example, if two objects were stacked on top of each other within the environment, the
pose of the object on top would not be accurately determined. While the system can
determine some information about its position and all of the information about its
orientation, it would not be able to determine the vertical position of the object. The
system assumes that the object is lying on the surface of the environment. If this is not
the case, the system cannot completely determine the pose of the object. Methods of
dealing with multiple objects will be discussed in Chapter 11.
10.5 Access Physical Information
Once the system has determined an object's presence and identity, it retrieves its
geometry information. The system uses the EPC to link the object to its PML file that
contains the length, width, and height values.
The Auto-ID Center uses the ONS to find the location of the PML file. However,
since I am using a very small sample of objects for my system, I used a much simpler
method of locating the information. Instead of using ONS, the filename of each PML
file is the same as the EPC associated with that PML file. The system adds a suffix and
a prefix to the EPC in order to generate the Uniform Resource Locator (URL) that points
to the appropriate PML file. Figure 10.2 shows an example of using a suffix and prefix,
along with an EPC, to generate a URL.
For a PML file located at:
C:/Pose Recognition/Object Descriptions/354A57182C246FFFFFFFFFFFF.xml
EPC = "354A57182C246FFFFFFFFFFFF";
Prefix = "file://C://Pose Recognition//Object Descriptions//";
Suffix = ".xmI";
URL = Prefix + EPC + Suffix;
Figure 10.2: Generating a URL that points to a PML file
Once the system has the URL of the PML file, it parses the file using both the
Simple API for XML (SAX) [48] and the Xerces Apache parser [49]. Figure 10.3 shows
the Document Type Definition (DTD) used to constrain the form of all of the PML files.
Figure 10.4 shows an example of a valid PML file that meets the constraints of the DTD.
As the file is being parsed, the system looks for particular element names and attributes in
order to extract the desired information. If the element name is VAL, the attribute name
is LABEL, and the attribute value is LENGTH, then the length of the object is set equal
to the value of that element. A similar search is done to determine the values of the width
and height of the object. When the parsing is done, the system has values for each of the
dimensions of the object.
<!DOCTYPE PML [
<!ELEMENT PML (EPC+,VAL*)>
<!ELEMENT EPC (#PCDATA)>
<!ELEMENT VAL (#PCDATA)>
<!ATTLIST VAL
LABEL CDATA #IMPLIED
M CDATA #IMPLIED
ACC CDATA #IMPLIED>
]>
Figure 10.3: PML DTD
<PML>
<EPC>354A57182C26FFFFFFFFFFFF</EPC>
<VAL LABEL = "LENGTH" M = "1">1O</VAL>
<VAL LABEL = "WIDTH" M = "1 ">5</VAL>
<VAL LABEL = "HEIGHT"M = "1 ">2</VAL>
</PML>
Figure 10.4: Example PML file
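A stripped-down version of this parsing step, using SAX through the standard javax.xml.parsers API, might look like the sketch below. The element and attribute names follow the DTD and example above; the class name, the use of SAXParserFactory rather than Xerces-specific entry points, and the way dimensions are returned are illustrative choices, not the code of the actual implementation.

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Sketch of the PML parsing of Section 10.5: pull LENGTH, WIDTH, and HEIGHT out of
// VAL elements whose LABEL attribute names the dimension, as in Figure 10.4.
public class PmlDimensions extends DefaultHandler {
    private String currentLabel;
    private final StringBuilder text = new StringBuilder();
    public double length, width, height;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("VAL".equals(qName)) {
            currentLabel = attrs.getValue("LABEL");
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int len) {
        text.append(ch, start, len);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!"VAL".equals(qName) || currentLabel == null) return;
        double value = Double.parseDouble(text.toString().trim());
        if ("LENGTH".equals(currentLabel)) length = value;
        else if ("WIDTH".equals(currentLabel)) width = value;
        else if ("HEIGHT".equals(currentLabel)) height = value;
        currentLabel = null;
    }

    // Parses the PML file at the given URL and returns the extracted dimensions.
    public static PmlDimensions parse(String url) throws Exception {
        PmlDimensions handler = new PmlDimensions();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(url, handler);
        return handler;
    }
}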
10.6 Image Capture
The camera was connected to the host computer via the USB port. I used the Java
Media Framework (JMF) to connect to the camera and control it in order to capture an
image of the object and the environment [50]. The JMF was the only Java API available
for connecting to devices through the USB port. Fortunately, it is tailored for use with
video and audio devices, but was not intended for capturing still images.
In order to use the camera, the system first has to set the format of the device from
which it will capture data. The JMF provides an application that was used to determine
the name and format of the camera that was used. The system then determines the
location of the media associated with that device. Once the system has completed this, it
attempts to connect to the camera. If successful, the system captures video. While
capturing video, the system isolates a single frame from the video, which it uses as a still
image of the environment. This image is then processed to gather the majority of the
information used to determine the pose of the object.
10.7 Image Processing
The system uses the image processing techniques described in Chapter 9 to
abstract information from the image. After completing the image processing, the system
is able to determine the position (x̄, ȳ) and orientation (θ + δ) of the pattern within the
image as defined in Chapter 9, and the number of the side facing the camera. The system
is then ready to combine all of the information that it has gathered in order to determine
the pose of the object.
10.8 Combine Information
Up to this point, the system has gathered a large amount of information about the
object including:
- Object Dimensions
- Position of Pattern in Image
- Orientation of Pattern within Image
- Side facing the Camera
The system combines the preceding data to determine pose in the following manner.
The first step in combining all of the information is to determine the position of
the object along the environment's Z-axis. Given the number of the side that is facing the
camera, the system is able to determine the dimensions of that side. The remaining
dimension is the vertical distance h from the surface of the environment to the side that
is facing the camera. The vertical position pz of the object is half of this distance,
because the object's coordinate frame is located at its geometric center. At this point, one
of the degrees of freedom in the pose of the object has been determined. There are five
remaining degrees of freedom.
The next step is to determine the position of the object along the environment's
X- and Y-axes. These two coordinates are both determined based on the position of the
pattern within the image. It is also necessary to know how much of the environment is
within the camera's field of view, and the position of the field of view within the
environment. Figure 10.5 shows a top-view of the environment, along with the size
(l_fov, w_fov) of the field of view and its position (x_fov, y_fov) with respect to the
environment. If the vertical position p_z of the object was equal to zero, meaning that the
pattern was on the surface of the environment, then p_x and p_y could be calculated using
the equations:

p_x = x_fov + l_fov · (x̄ / L), and  (10.1)

p_y = y_fov + w_fov · (ȳ / W),  (10.2)

where L and W are the number of columns and rows, respectively, in the image.

Figure 10.5: Top View of Environment and Camera's Field of View
However, the varying height of the pattern affects the calculation of px and py. Figure
10.6 shows a side view of the environment and the camera's field of view. There are a
number of values that are labeled in the figure that are used in the calculation of py.
Figure 10.6: Side View of Environment and Camera's Field of View
In addition to h, the height of the pattern above the surface of the environment, the
values labeled include:
- α₁: the angle of elevation from the front boundary of the field of view to the camera lens.
- α₂: the angle of elevation from the back boundary of the field of view to the camera lens.
- w_fov: the width of the field of view at the surface of the environment, as labeled in Figure 10.5.
- w_h: the width of the field of view at the height of the pattern.
- y_p: the perceived position, along the Y-axis, of the object, if no correction is made due to the height of the pattern.
- y_a: the actual position, along the Y-axis, of the object, after the correction for the height of the pattern has been made.
Since the camera is not positioned directly above the midpoint of the width of the field of
view, the angles α₁ and α₂ are different.
Given all of these values, the position p_y, along the Y-axis, of the object can be
calculated using the equations:

w_h = w_fov − h / tan(α₁) − h / tan(α₂),  (10.3)

y_a = h / tan(α₁) + w_h · (ȳ / W), and  (10.4)

p_y = y_fov + y_a.  (10.5)
Figure 10.7 shows a front view of the environment and the camera's field of view.
There are values in this figure that are used in the calculation of p_x. In addition to h,
these include:
- β₁: the angle of elevation from the left side boundary of the field of view to the camera lens.
- β₂: the angle of elevation from the right side boundary of the field of view to the camera lens.
- l_fov: the length of the field of view at the surface of the environment, as labeled in Figure 10.5.
- l_h: the length of the field of view at the height of the pattern.
- x_p: the perceived position, along the X-axis, of the object, if no correction is made due to the height of the pattern.
- x_a: the actual position, along the X-axis, of the object, after the correction for the height of the pattern has been made.
Since the camera is not positioned directly above the midpoint of the length of the field of
view, the angles β₁ and β₂ are different.

Figure 10.7: Front View of Environment and Camera's Field of View
The position p_x along the X-axis can be calculated using the equations:

l_h = l_fov − h / tan(β₁) − h / tan(β₂),  (10.6)

x_a = h / tan(β₁) + l_h · (x̄ / L), and  (10.7)

p_x = x_fov + x_a.  (10.8)
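The height correction of equations (10.3) through (10.8) can be sketched in Java as below. The parameter names mirror the symbols above, and the form of the equations follows the reconstruction given here, so the fragment should be read as illustrative rather than as the exact code of the implementation.

// Sketch of the position calculation of Section 10.8 (Equations 10.3-10.8).
// Angles are in radians; (xBar, yBar) is the pattern center in image pixels,
// L and W are the image's column and row counts, and h is the pattern height.
public class PositionFromImage {
    public static double[] objectXY(double xBar, double yBar, int L, int W,
                                    double h, double lFov, double wFov,
                                    double xFov, double yFov,
                                    double alpha1, double alpha2,
                                    double beta1, double beta2) {
        // Field-of-view size at the height of the pattern (Equations 10.3 and 10.6).
        double wH = wFov - h / Math.tan(alpha1) - h / Math.tan(alpha2);
        double lH = lFov - h / Math.tan(beta1) - h / Math.tan(beta2);
        // Corrected positions within the field of view (Equations 10.4 and 10.7).
        double yA = h / Math.tan(alpha1) + wH * (yBar / W);
        double xA = h / Math.tan(beta1) + lH * (xBar / L);
        // Positions in the environment frame (Equations 10.5 and 10.8).
        return new double[] { xFov + xA, yFov + yA };
    }
}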
At this point, two more of the degrees of freedom in the pose of the object have
been determined. The system has established the position vector P, as defined in
Equation (2.1), that measures the position of the object with respect to the origin of the
coordinate frame of the environment. There are three remaining degrees of freedom.
The system must now establish the rotation matrix R that represents the orientation of
the object.
The next step is to calculate each of the elements of the rotation matrix using the
angle θ, the offset δ, and the number of the side that is facing the camera. Using these
three values, the system is able to calculate all nine elements of R using Table 10.1.
(Note that in the table, s(θ+δ) is equal to sin(θ+δ) and c(θ+δ) is equal to cos(θ+δ).)
          Side 1     Side 2     Side 3     Side 4     Side 5     Side 6
r11       0          0          -s(θ+δ)    s(θ+δ)     s(θ+δ)     -s(θ+δ)
r12       0          0          c(θ+δ)     -c(θ+δ)    -c(θ+δ)    c(θ+δ)
r13       1          -1         0          0          0          0
r21       s(θ+δ)     -s(θ+δ)    0          0          c(θ+δ)     c(θ+δ)
r22       -c(θ+δ)    c(θ+δ)     0          0          s(θ+δ)     s(θ+δ)
r23       0          0          1          -1         0          0
r31       c(θ+δ)     c(θ+δ)     c(θ+δ)     c(θ+δ)     0          0
r32       s(θ+δ)     s(θ+δ)     s(θ+δ)     s(θ+δ)     0          0
r33       0          0          0          0          1          -1

Table 10.1: Calculation of the Elements of the Rotation Matrix
Now, the system has determined the last three degrees of freedom. The side
number is used to limit two of the degrees of freedom. After that, θ and δ are used to
limit the last degree of freedom. The system has completely determined the pose of the
object, which can be represented by the position vector P and the rotation matrix R.
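Table 10.1, as reconstructed above, maps directly onto a lookup such as the Java sketch below. Since the table itself is recovered from a degraded layout, the entries here should be read as an illustration of the idea rather than a verified copy of the original matrix elements; the class and method names are also assumptions.

// Sketch of building the rotation matrix R from the side number and the angle
// theta + delta, following the reconstructed Table 10.1. Angles are in radians.
public class RotationFromSide {
    public static double[][] rotation(int side, double thetaPlusDelta) {
        double s = Math.sin(thetaPlusDelta);
        double c = Math.cos(thetaPlusDelta);
        switch (side) {
            case 1: return new double[][] { { 0, 0, 1 }, { s, -c, 0 }, { c, s, 0 } };
            case 2: return new double[][] { { 0, 0, -1 }, { -s, c, 0 }, { c, s, 0 } };
            case 3: return new double[][] { { -s, c, 0 }, { 0, 0, 1 }, { c, s, 0 } };
            case 4: return new double[][] { { s, -c, 0 }, { 0, 0, -1 }, { c, s, 0 } };
            case 5: return new double[][] { { s, -c, 0 }, { c, s, 0 }, { 0, 0, 1 } };
            case 6: return new double[][] { { -s, c, 0 }, { c, s, 0 }, { 0, 0, -1 } };
            default: throw new IllegalArgumentException("Side number must be 1 through 6");
        }
    }
}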
Chapter 11: Analysis
11.1 Introduction
When measuring the performance of my pose recognition system, I considered
accuracy, repeatability, and scalability. For the accuracy, I measured the difference
between the actual position and orientation of an object and the values calculated by the
system. For the repeatability, I compared outputs over multiple runs. For the scalability,
I considered how certain aspects of the system could be changed to accommodate its use
in a real-world application.
11.2 Accuracy
When the system determines values for p_x, p_y, and θ, there is some error
introduced in the measurements. The system is able to determine the positions along the
X and Y-axes of the environment within about 0.2 inches of the actual positions, and can
determine the object's orientation about the Z-axis of the environment within about 1
degree of the actual orientation.
The inaccuracy of the system is primarily due to the mounting of the camera, the
equations used to combine sensor information to determine pose, and the calibration of
the system. In my implementation, the camera was not mounted so that it was centered
above the field of view. As is shown in Figure 10.6 and 10.7, the camera is mounted so
that it views the environment at an angle. By mounting the camera in this manner, the
field of view resembles a trapezoid instead of the rectangular shape that was assumed and
used to develop the equations for combining information. Therefore, there is some error
introduced into the equations because of the incorrect assumption of the shape of the field
of view. This could be corrected by introducing image processing techniques to correct
the image, or by changing the equations to better represent the appropriate shape of the
field of view. A simpler correction would be to mount the camera so that it is centered
above and perpendicular to the surface of the environment, resulting in a rectangular field
of view. Another source of error was the imprecise calibration of the system. It was
difficult to determine correct values for the size (l_fov, w_fov), position (x_fov, y_fov),
and angles of elevation (α₁, α₂, β₁, β₂) of the field of view. The equations containing these
values added to the inaccuracy of the information computed by the system.
11.3 Repeatability
The ability to operate consistently is another important aspect of the system.
Every time that the system determines the pose of an object, it should perform in the
same manner. The system should be able to give similar results when repeatedly
determining the pose of an object that remains in the same location. The only source of
inconsistency in the operation of the pose recognition system is the image capture
component. The webcam, which is used as the image capture device, has an auto white
balance feature that may be desirable for its intended use, but can cause problems when
the camera is used with a vision system. If the camera takes two pictures of the
environment, with the same object present and the same lighting conditions, the two
images may be different. The object will appear in the same location, but one of the
images may be lighter or darker than the other. For the most part, the variations in the
images do not affect the performance of the system. Occasionally, the auto white balance
can create an image from which it is difficult to extract data. The use of a camera that is
intended for machine vision systems would improve the repeatability of the pose
recognition system.
The rest of the components of the system perform their functions consistently.
The RFID system provides the EPC of the tag that is located within the reader field.
Given an EPC, the system is able to access the appropriate file and determine the object's
dimensions. While the image processing and information combination algorithms are
reliable, their output is a result of the inputted image from the webcam. Therefore, the
result of the variations in the image is seen in the output of these algorithms, but these
portions of the system operate consistently.
11.4 Scalability
There are a number of aspects of the current implementation, including size and
color constraints, which would make implementation in a warehouse impossible. The
ability to make changes to the system, without wholly changing the design, is a measure
of the scalability of the system.
The size of the environment is limited by the range of the RFID reader and the
field of view of the camera. For my implementation, the dimensions of the environment
were approximately 12 inches by 16 inches, and the field of view of the camera was
slightly smaller than the environment. Because of the limitations on the size of the
environment, the size of the objects also was limited. For each of the objects tested with
the system, the longest dimension was less than 6 inches. It would be difficult to
consistently and accurately determine the pose of any objects larger than the ones tested.
Because of this size constraint on both the environment and the objects, the system could
not realistically be implemented in any current warehouses. The size of the environment
and the objects could be increased drastically by using a reader with a larger read range
and mounting the camera so that the field of view is larger. As long as the image capture
device's field of view can completely encompass the field of view of a single reader, one
camera could be used in conjunction with multiple readers to greatly increase the range of
the system. The camera could be mounted on a moving platform so that it could move
horizontally above all of the read ranges. When an object entered one of the read ranges,
the platform could move to position the camera directly above the center of that read
range. Therefore, the size of the implementation is scalable to meet the needs of real-world applications.
The color constraint is another limitation of the system, and for a real-world
implementation, the system would need to operate under reasonable color constraints. By
using better pattern extraction algorithms and a better camera, the system could be made
to find the pattern on the side of a box that includes many colors. In the current
implementation, the pattern extraction algorithms are rather weak. The pattern is not
very complicated, so it is difficult to test the different collections to see if they are a
valid pattern. If the pattern were more complicated, it would be easier to isolate the
pattern. However, in order to capture the details of the pattern, it would be necessary to
have a camera that has a higher resolution. However, by increasing the resolution of the
camera, the number of pixels in the image is increased. Using algorithms similar to the
ones in the current implementation, it could take much longer for the system to operate.
The algorithm that separates all of the white pixels into collections is the most time
consuming, with a running time of O(n²). It might be necessary to implement a faster
algorithm to perform this function.
Another way to determine the pose of color objects would be to print the patterns
in fluorescent ink. The vision system would then make use of ultraviolet light in order to
view the pattern. Additionally, the location of the pattern might be moved from the
center of the side to one of the corners of the side, so that the pattern would not get in the
way of the writing on the packaging. The color constraints on the objects can be
removed by embedding more detail in the pattern, using a stronger image capture device,
and implementing faster image processing algorithms.
For a real-world implementation, the system would need to be able to determine
the pose of multiple objects at the same time. To do this, the system would need to be
able to match an EPC with a pattern so that it could perform the appropriate calculations.
This could be accomplished by introducing an edge-detection algorithm that can
determine two of the dimensions of each object within the range of the system. By
comparing these dimensions with the information in each PML file, the system would be
able to match each pattern with its corresponding EPC.
The scalability of the system is the most important measurement of the pose
recognition system. By making the changes mentioned above, a system incorporating the
overall design of my pose recognition system should be capable of performing the
desired task.
Chapter 12: Conclusion
The primary application of the Auto-ID Center's infrastructure is in the supply
chain. By using radio frequency tags to uniquely identify objects, warehouse inventory
control becomes much easier. However, in order to make further use of the technology,
additional information must be supplied. For example, in order to automate the
movement or manipulation of objects in a warehouse, an automation system requires
information about the position and orientation of the objects. The pose recognition
system that I have designed will be capable of completing this task.
As mentioned in Section 11.4, the current implementation of the system would be
incapable of performing such a task. However, by making the changes in the hardware
and software discussed in the previous chapter, a system could be created that would
possess the necessary functionality. While incorporating different components, the
system would be based on the same design as the one developed and described in this
thesis.
The design is based on combining the Auto-ID infrastructure with a vision
system. The system determines the pose of the object by gathering information in order
to gradually limit the possible location of the object. Before encountering the system, the
location of the object is completely unknown. Once it passes into the range of the RFID
system, the location of the object is limited to the space contained within the read range
of the RF reader. The design of the pose recognition system requires that the field of
view of the machine vision system completely encompass the range of the reader.
Therefore, every object that is read by the RFID system can also be viewed by the
machine vision system. The machine vision system then gathers information that is used
to narrow the location of the object even further. The system can accurately determine
the pose of the object with respect to its environment. Once such information is known,
it can be used by an automation system to manipulate the object.
Further research on this topic would show the usefulness of such a system in
warehouse automation. The first step would be to develop a full-scale implementation of
the pose recognition system by incorporating some of the changes mentioned in Section
11.4. This would show the scalability of the design and the ability of the system to
operate under real-world conditions. The next step would be to incorporate an
automation system into the implementation. The pose recognition system would be used
to provide pose information about the objects, which would subsequently be manipulated
by the automation system. This would show the ability of the system to provide the
functionality for which it was designed.
References
[1] Auto-ID Center - Home, "Vision", <http://www.autoidcenter.org/homevision.asp>.
[2] James H. Williams, Jr., Fundamentals of applied dynamics, John Wiley & Sons, Inc.,
New York, NY, 1996.
[3] John J. Craig, Introduction to robotics, Addison-Wesley Publishing Company,
Reading, MA, 1989.
[4] Kenneth J. Waldron and Gary L. Kinzel, Kinematics, Dynamics, and Design of
Machinery, John Wiley & Sons, Inc., New York, NY, 1999.
[5] Intermec Technologies Corporation, "RFID overview: introduction to radio
frequency identification", Amtech Systems Corporation, 1999,
<http://epsfiles.intermec.com/epsfiles/epswp/radiofrequencywp.pdf>.
[6] Frontline Solutions Website: RFID Online Source Book, "Understanding radio
frequency identification (RFID) - FAQ's, applications, glossary", Advanstar
Communications Inc., 2000, <http://www.frontlinemagazine.com/rfidonline/wp/101 7.htm>.
[7] The Association of the Automatic Identification and Data Capture Industry, "Radio
frequency identification (RFID): a basic primer", AIM Inc., 2001,
<http://www.aimglobal.org/technologies/rfid/resources/RFIDPrimer.pdf>.
[8] Anthony Sabetti, Texas Instruments, "Applications of radio frequency identification
(RFID)", AIM Inc.,
<http://www.aimglobal.org/technologies/rfid/resources/papers/applicationsofrfid.htm>.
[9] Intermec Technologies Corporation, "RFID overview: introduction to radio
frequency identification", Amtech Systems Corporation, 1999, p. 4,
<http://epsfiles.intermec.com/epsfiles/epswp/radiofrequency_ wp.pdf>.
[10] Daniel W. Engels, Tom A. Scharfeld, and Sanjay E. Sarma, "Review of RFID
Technologies", MIT Auto-ID Center, 2001.
[11] Clark Richter, "RFID: an educational primer", Intermec Technologies Corporation,
1999, <http://epsfiles.intermec.com/eps _files/epswp/rfidwp.pdf>.
[12] Susy d'Hont, "The Cutting Edge of RFID Technology and Applications for
Manufacturing and Distribution", Texas Instruments TIRIS,
<http://www.rfidusa.com/pdf/manuf dist.pdf>.
[13] Klaus Finkenzeller, RFID Handbook: Radio-frequency identification fundamentals
and applications, John Wiley & Sons, Ltd, New York, NY, 1999.
[14] Ching Law, Kayi Lee, and Professor Kai-Yeung Siu, Efficient memoryless protocol
for tag identification, MIT Auto-ID Center, 2000,
<http://www.autoidcenter.org/research/MIT-AUTOID-TR-003.pdf>.
[15] Destron Fearing, Electronic ID, <http://www.destron-fearing.com/elect/elect.html>.
[16] Massachusetts Turnpike Authority, FAST LANE, "Overview",
<http://www.mtafastlane.com/>.
[17] Peter K. Allen, Robotic object recognition using vision and touch, Kluwer
Academic Publishers, Boston, MA, 1987.
[18] Michael C. Fairhurst, Computer vision for robotic systems, Prentice Hall, New
York, NY, 1988.
[19] Robin R. Murphy, Introduction to AI robotics, The MIT Press, Cambridge, MA,
2000.
[20] C.J. Page and H. Hassan, "The orientation of difficult components for automatic
assembly", Robot Sensors, Volume 1 - Vision, IFS (Publications) Ltd, UK, 1986.
[21] Martin Berger, Gernot Bachler, Stefan Scherer, and Axel Pinz, "A vision driven
automatic assembly unit: pose determination from a single image", Institute for Computer
Graphics and Vision, Graz University of Technology, Graz, Austria, 1999,
<http://www.icg.tu-graz.ac.at/bachler99a/caip99_gb.pdf>.
[22] Marcus A. Magnor, "Geometry-based automatic object localization and 3-d pose
detection", Computer Graphics Lab, Stanford University, Stanford, CA, 2002,
<http://www.mpi-sb.mpg.de/~magnor/publications/ssiai02.pdf>.
[23] Dongming Zhao, "Object pose estimation for robotic control and material
handling", Report Brief, Center for Engineering Education and Practice, University of
Michigan-Dearborn, 2000,
<http://www.engin.umd.umich.edu/ceep/techday/2000/reports/ECEreport6/ECEreport6.
htm>.
[24] S. R. Ruocco, Robot sensors and transducers, Halsted Press, New York, NY, 1987.
[25] N. Sato, "A method for three-dimensional part identification by tactile transducer",
Robot Sensors, Volume 2 - Tactile & Non- Vision, IFS (Publications) Ltd, UK, 1986.
[26] Philippe Coiffet, Robot technology, volume 2: Interaction with the environment,
Prentice-Hall, Inc., Englewood Cliffs, NJ, 1983.
[27] Ren-Chyuan Luo, Fuling Wang, and You-xing Liu, "An imaging tactile sensor with
magnetostrictive transduction", Robot Sensors, Volume 2 - Tactile & Non- Vision, IFS
(Publications) Ltd, UK, 1986.
[28] Trimble, All About GPS, "How GPS Works", Trimble Navigation Limited, 2002,
<http://www.trimble.com/gps/how.html>.
[29] B. Hofmann-Wellenhof, H. Lichtenegger, and J. Collins, GPS: theory and practice,
Springer-Verlag, New York, NY, 2001.
[30] Garmin, About GPS, "What is GPS?", Garmin Ltd., 2002,
<http://www.garmin.com/aboutGPS/>.
[31] Trimble, All About GPS, "Differential GPS", Trimble Navigation Limited, 2002,
<http://www.trimble.com/gps/dgps.html>.
[32] Starlink Incorporated, DGPS Info, "DGPS Explained", Starlink Incorporated, 1999,
<http://www.starlinkdgps.com/dgpsexp.htm>.
[33] AIM, "Real Time Locating Systems (RTLS)", AIM Inc., 2000,
<http://www.aimglobal.org/technologies/rtls/default.htm>.
[34] AIM, Real Time Locating Systems (RTLS), "Frequently asked questions", AIM
Inc., 2000, <http://www.aimglobal.org/technologies/rtls/rtlsfaqs.htm>.
[35] Jim Geier and Roberta Bell, "RTLS: An eye on the future", Supply Chain Systems
Magazine, Peterborough, NH, 2001,
<http://www.idsystems.com/reader/2001/2001_03/rtlsO3O1/>.
[36] David L. Brock, The electronic product code (EPC): a naming scheme for physical
objects, MIT Auto-ID Center, 2001, <http://www.autoidcenter.org/research/MIT-AUTOID-WH-002.pdf>.
[37] David L. Brock, The physical markup language, MIT Auto-ID Center, 2001,
<http://www.autoidcenter.org/research/MIT-AUTOID-WH-003.pdf>.
[38] David L. Brock, Timothy P. Milne, Yun Y. Kang, and Brendon Lewis, The physical
markup language, core components: time and place, MIT Auto-ID Center, 2001,
<http://www.autoidcenter.org/research/MIT-AUTOID-WH-005.pdf>.
[39] Joseph Timothy Foley, "An infrastructure for electromechanical appliances on the
internet", M.Eng. Thesis, MIT, Cambridge, MA, 1999.
[40] Hany Farid and Alin C. Popescu, "Blind Removal of Lens Distortion", Journal of
the Optical Society of America, 2001,
<http://www.cs.dartmouth.edu/~farid/publications/josa01.pdf>.
[41] Paul F. Whelan and Derek Molloy, Machine Vision Algorithms in Java, Springer-Verlag, London, UK, 2001.
[42] Robert Fisher, Simon Perkins, Ashley Walker, and Erik Wolfart, Hypermedia Image
Processing Reference, "Intensity Histogram", 2000,
<http://www.dai.ed.ac.uk/HIPR2/histgram.htm>.
[43] Berthold Klaus Paul Horn, Robot Vision, The MIT Press, Cambridge, MA, 1986.
[44] Intermec - Products, "915 MHz Tag for RPC", Intermec Technologies Corporation,
2002,
<http://home.intermec.com/eprise/main/Intermec/Content/Products/ProductsShowDetail
?section=Products&Product=RFID1_03&Category-RFID&Family-RFID1>.
[45] Intermec - Products, "UHF OEM Reader", Intermec Technologies Corporation,
2002,
<http://home.intermec.com/eprise/main/Intermec/Content/Products/ProductsShowDetail
?section=Products&Product=RFID2_02&Category=RFID&Family-RFID2>.
[46] Veo, "Products: Stingray", Xirlink Inc., 2002,
<http://www.veoproducts.com/Stingray/stingray.asp>.
[47] Oatsystems Inc., <http://www.oatsystems.com/>.
[48] David Megginson, "About SAX", <http://www.saxproject.org/>.
[49] The Apache XML Project, "Xerces Java Parser Readme", The Apache Software
Foundation, 2000, <http://xml.apache.org/xerces-j/index.html>.
[50] Java, "Java Media Framework API", Sun Microsystems Inc., 2002,
<http://java.sun.com/products/java-media/jmf/index.html>.