Pose Recognition using Distributed Sensing: Radio Frequency Identification, Machine Vision, and Embedded Visual Patterns

by

Brendon W. Lewis

Bachelor of Science, Mechanical Engineering
Tufts University, 2000

Submitted to the Department of Mechanical Engineering in Partial Fulfillment of the Requirements for the Degree of

MASTER OF SCIENCE IN MECHANICAL ENGINEERING

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2002

© 2002 Massachusetts Institute of Technology. All rights reserved.

Signature of Author: Department of Mechanical Engineering, August 9, 2002

Certified by: David L. Brock, Principal Research Scientist, Thesis Supervisor

Accepted by: Ain A. Sonin, Chairman, Department Committee on Graduate Students

Pose Recognition using Distributed Sensing: Radio Frequency Identification, Machine Vision, and Embedded Visual Patterns

by

Brendon W. Lewis

Submitted to the Department of Mechanical Engineering on August 9, 2002 in partial fulfillment of the requirements for the degree of Master of Science in Mechanical Engineering

Abstract

Automation systems require a certain amount of information about an object in order to manipulate it. An important piece of information is the pose, or position and orientation, of the object. In this thesis, I discuss the design and implementation of a system capable of determining the pose of rectangular solids. The system combines radio frequency identification (RFID), machine vision, and visual patterns in order to determine the pose of an object. Traditionally, pose recognition requires either an intrusive or sophisticated sensing system. The design relies on distributed sensing and embedded information in order to reduce the complexity of the sensing system. The RFID component is used to determine the presence of an object and then communicate physical information about that object to the system. The vision system, which relies on visual patterns on the surfaces of the object, gathers additional information that, in conjunction with the dimensions of the object, is used to determine the pose of the object.

Thesis Supervisor: David L. Brock
Title: Principal Research Scientist

Acknowledgements

I would first like to thank my advisor David Brock for all his help. He made numerous suggestions during both my research and writing that proved to be vital to the completion of this thesis.

I would like to thank Dan Engels for all of his suggestions and help with the RFID system. He helped me get the hardware, told me who to talk to about the software, and explained to me much about the technology.

Many thanks go to the guys from Oatsystems for their help with the RFID system. Prasad, Anup, Sumanth, Sridhar, and Gabriel all did a lot of work to help me get the Intermec reader up and running.

I would also like to thank my officemates at the Auto-ID Center. Stephen, Tim, Robin, and Yun provided a great work environment and many hours of good conversation. I would additionally like to thank Stephen for helping me get acclimated to life at MIT.

Thanks to my roommate Tom who proofread and edited many of my drafts.

Lastly, I would like to thank my family and Kelly for being so patient and supportive over the past two years.
Table of Contents

Chapter 1: Introduction
  1.1 Introduction
  1.2 Motivation
  1.3 Organization of Chapters

Chapter 2: Pose
  2.1 Introduction
  2.2 Coordinate Frames
  2.3 Measuring Position and Orientation
    2.3.1 Position
    2.3.2 Orientation
  2.4 Degrees of Freedom

Chapter 3: Radio Frequency Identification
  3.1 Introduction
  3.2 Core Components
  3.3 Reader Fields
  3.4 Benefits of RFID

Chapter 4: Position Sensing
  4.1 Introduction
  4.2 Machine Vision
  4.3 Tactile Arrays
  4.4 Radio Frequency

Chapter 5: Objects and Environment
  5.1 Introduction
  5.2 Shapes of Objects
  5.3 Surface of the Environment
  5.4 Object Coordinate Frame

Chapter 6: Object Presence and Identification
  6.1 Introduction
  6.2 Auto-ID Center
  6.3 Object Identification
  6.4 Object Presence

Chapter 7: Vision System
  7.1 Introduction
  7.2 Visual Pattern
    7.2.1 Requirements of Pattern
    7.2.2 Color Scheme
    7.2.3 Design of Pattern
    7.2.4 Alignment of Pattern on Object
  7.3 Information from Image

Chapter 8: Distributed Sensing
  8.1 Introduction
  8.2 Sensor Range
  8.3 Sensor Data

Chapter 9: Image Processing
  9.1 Introduction
  9.2 Overview of Algorithms
  9.3 Correct Lens Distortion
  9.4 Color Image to Binary Image
  9.5 Isolate Pattern
  9.6 Locate Center
  9.7 Calculate Angle
  9.8 Read Pattern
Chapter 10: Implementation
  10.1 Introduction
  10.2 Hardware
  10.3 Overview of Software
  10.4 Query Reader
  10.5 Access Physical Information
  10.6 Image Capture
  10.7 Image Processing
  10.8 Combine Information

Chapter 11: Analysis
  11.1 Introduction
  11.2 Accuracy
  11.3 Repeatability
  11.4 Scalability

Chapter 12: Conclusion

References

List of Figures

Figure 2.1: Environment, Object, and Coordinate Frames
Figure 2.2: Position Vector P
Figure 2.3: Components of unit vector u_x in the directions of the axes of OXYZ
Figure 2.4: Lower-order Pair Joints
Figure 3.1: RFID System Components
Figure 4.1: Object in Environment and Machine Vision Representation
Figure 4.2: Base of Field of View Proportional to Distance from Camera Lens
Figure 4.3: Error in Orientation Measurement due to Discretization of Image
Figure 4.4: Object in Environment and Tactile Sensor Array Representation
Figure 4.5: Error in Measurements due to Size and Spacing of Tactile Sensors
Figure 5.1: Object Dimensions and Possible Coordinate Frames
Figure 5.2: Front and Back Sides of Object and Possible Coordinate Frames
Figure 5.3: Object with All Sides Labeled and Coordinate Frame at Center
Figure 6.1: Antennas Monitoring Shelf Sections
Figure 7.1: Patterns for each of the Six Sides of an Object
Figure 7.2: Positioning of Pattern on Object Side
Figure 8.1: RFID System Antenna and Reader Field
Figure 8.2: Machine Vision Field of View
Figure 8.3: Required Overlap of RFID and Machine Vision System Ranges
Figure 9.1: Grayscale Image
Figure 9.2: Histogram of Grayscale Values
Figure 9.3: Binary Image
Figure 9.4: Explanation of Collections, Connectivity, Paths, and Neighbors
Figure 9.5: Additional White Pixels from Pattern on Side of Object
Figure 9.6: Rectangle used to find Center of Pattern
Figure 9.7: u-axis describing Orientation of Pattern
Figure 9.8: Pattern with 180-degree Offset
Figure 10.1: Pose Recognition System Implementation Setup
Figure 10.2: Generating a URL that points to a PML file
Figure 10.3: PML DTD
Figure 10.4: Example PML file
Figure 10.5: Top View of Environment and Camera's Field of View
Figure 10.6: Side View of Environment and Camera's Field of View
Figure 10.7: Front View of Environment and Camera's Field of View

List of Tables

Table 9.1: Locations of Centers of Squares in Pattern
Table 10.1: Calculation of the Elements of the Rotation Matrix

Chapter 1: Introduction

1.1 Introduction

In this thesis, I present a system that combines machine vision and radio frequency identification in order to determine the pose of an object. There are many different types of sensors that are capable of sensing various types and amounts of data. Different sensors are appropriate for different applications depending on the information that is required. However, there are some applications in which it is difficult to gather the appropriate information using a single type of sensor. Pose recognition, which is the identification of the position and orientation of an object, is a task that is more easily and accurately performed by multiple sensors.

Many types of sensors are capable of determining some information about an object's position and orientation. Machine vision is a type of sensor that is capable of providing much pose information. However, if the vision system does not have information about the identity and geometry of the object that it is sensing, the task is more difficult. Radio frequency identification systems provide a means for an object to identify itself to the sensing system. The pose recognition system that I developed uses both types of sensing systems to determine the pose of an object.

1.2 Motivation

At the Auto-ID Center at MIT, researchers are developing an infrastructure for describing physical objects, storing these descriptions, and accessing the information. They are trying to "create a universal environment in which computers understand the world without help from human beings" [1]. In most applications, a radio frequency identification system is used in order for an object to communicate its identity, which is used to access information about that object.

There are many applications, particularly in the supply chain, that can take advantage of this infrastructure. These applications primarily involve inventory management. Other applications, such as warehouse automation, require location information. The Auto-ID system provides a description of the world to computers, but the coarseness of the information limits the range of applications. It can only determine the location of an object to within a reader field. Warehouse automation systems may require more accurate location information. In order to manipulate an object, its pose must be known. The pose recognition system described in this thesis incorporates the Auto-ID infrastructure, together with embedded structure and sensors, in order to determine the pose of objects.

1.3 Organization of Chapters

This thesis is divided into two major sections. The first section provides background information related to pose recognition. In Chapter 2, I will discuss coordinate frames, degrees of freedom, and pose definition. In Chapter 3, I will discuss radio frequency identification, which is a key component of my pose recognition system. In Chapter 4, I will discuss the different position sensing technologies that are currently available.
Machine vision, which is one of these technologies, is another key component of my pose recognition system.

The second section of this thesis introduces a new approach to pose recognition. In Chapter 5, I will establish the assumptions related to the objects and the environment. In Chapter 6, I will explain how the system uses the Auto-ID Center infrastructure to become aware of the presence and identification of an object. In Chapter 7, I will explain the requirements and design of the visual patterns used by the vision system. I will also discuss the information that the vision system will gather from an image of the environment. In Chapter 8, I will discuss how the system combines data from the radio frequency identification infrastructure and vision system in order to complete the task. I will also discuss the ranges of each system and the required overlap of these ranges. In Chapter 9, I will explain the image processing algorithms used to extract pose from an image. In Chapter 10, I will explain my implementation of the pose recognition system. In Chapter 11, I will analyze the design and implementation of the system. Finally, in Chapter 12, I will discuss my conclusions and areas of possible future work.

Chapter 2: Pose

2.1 Introduction

To determine the pose of an object, we must first define pose. An object's pose consists of its position and orientation with respect to a given coordinate frame. In the case of the pose recognition system described in this thesis, the pose of the object will be described with respect to a fixed coordinate frame of the environment.

2.2 Coordinate Frames

To describe the pose of an object, we must specify two coordinate frames: one for the object and one for the environment. Figure 2.1 shows the environment, an object, and their corresponding coordinate frames.

Figure 2.1: Environment, Object, and Coordinate Frames

The coordinate frame of the object is a body-coordinate frame, meaning the coordinate frame is fixed to the object and moves with it. The pose of the object can be described simply as the object's coordinate frame. Here we name the object's coordinate frame oxyz, with its origin o and principal axes x, y, and z. The coordinate frame of the environment, OXYZ, is a fixed coordinate frame with origin O and principal axes X, Y, and Z [2].

2.3 Measuring Position and Orientation

Once the two coordinate frames have been specified, we can determine the pose of the object by measuring the position and orientation of oxyz with respect to OXYZ.

2.3.1 Position

The position of the object is the position of the origin o with respect to point O. Figure 2.2 shows frame OXYZ, point o, and the position vector P that measures the position of point o. The vector P, which has components $p_x$, $p_y$, and $p_z$, can be written as

$P = p_x u_X + p_y u_Y + p_z u_Z$,   (2.1)

where $u_X$, $u_Y$, and $u_Z$ are unit vectors along the X, Y, and Z axes, respectively.

Figure 2.2: Position Vector P

2.3.2 Orientation

The orientation of frame oxyz is measured about the axes of the frame OXYZ. For an arbitrarily oriented frame oxyz, unit vectors $u_x$, $u_y$, and $u_z$ along each of the axes can be written in terms of their components along the axes of frame OXYZ so that

$u_x = r_{11} u_X + r_{12} u_Y + r_{13} u_Z$,   (2.2)
$u_y = r_{21} u_X + r_{22} u_Y + r_{23} u_Z$, and   (2.3)
$u_z = r_{31} u_X + r_{32} u_Y + r_{33} u_Z$.   (2.4)

Figure 2.3 shows such a frame oxyz along with the components $r_{11}$, $r_{12}$, and $r_{13}$ of the unit vector $u_x$.
Using all of these components, one can develop a rotation matrix R that describes the orientation of the body-fixed frame oxyz with respect to the environment frame OXYZ. The rotation matrix is written as

$R = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}$.   (2.5)

Figure 2.3: Components of unit vector u_x in the directions of the axes of OXYZ

The rotation matrix can then be used to calculate angles that describe the orientation. There are many different representations such as X-Y-Z fixed angles or Z-Y-Z Euler angles [3]. For the X-Y-Z fixed angle representation, the first rotation is about the X-axis by an angle $\gamma$. The second rotation is about the Y-axis by an angle $\beta$, and the third rotation is about the Z-axis by an angle $\alpha$. The X-Y-Z fixed angles are calculated using the following equations.

For $\cos\beta \neq 0$,

$\beta = \tan^{-1}\!\left(\frac{-r_{31}}{\sqrt{r_{11}^2 + r_{21}^2}}\right)$,   (2.6)

$\alpha = \tan^{-1}\!\left(\frac{r_{21}/\cos\beta}{r_{11}/\cos\beta}\right)$, and   (2.7)

$\gamma = \tan^{-1}\!\left(\frac{r_{32}/\cos\beta}{r_{33}/\cos\beta}\right)$.   (2.8)

For $\cos\beta = 0$,

$\beta = \pm 90^\circ$,   (2.9)

$\alpha = 0^\circ$, and   (2.10)

$\gamma = \pm\tan^{-1}\!\left(\frac{r_{12}}{r_{22}}\right)$.   (2.11)

For the Z-Y-Z Euler angle representation, the first rotation is about the Z-axis by an angle $\alpha$, creating frame X'Y'Z'. The second rotation, creating X"Y"Z", is about Y' by an angle $\beta$, and the third rotation is about the Z"-axis by an angle $\gamma$. The Z-Y-Z Euler angles are calculated using the following equations.

For $\sin\beta \neq 0$,

$\beta = \tan^{-1}\!\left(\frac{\sqrt{r_{31}^2 + r_{32}^2}}{r_{33}}\right)$,   (2.12)

$\alpha = \tan^{-1}\!\left(\frac{r_{23}/\sin\beta}{r_{13}/\sin\beta}\right)$, and   (2.13)

$\gamma = \tan^{-1}\!\left(\frac{r_{32}/\sin\beta}{-r_{31}/\sin\beta}\right)$.   (2.14)

For $\sin\beta = 0$,

$\beta = 0^\circ$ or $180^\circ$,   (2.15)

$\alpha = 0^\circ$, and   (2.16)

$\gamma = \tan^{-1}\!\left(\frac{-r_{12}}{r_{11}}\right)$ or $\tan^{-1}\!\left(\frac{r_{12}}{-r_{11}}\right)$, respectively.   (2.17)

2.4 Degrees of Freedom

The number of degrees of freedom of an object is the number of independent variables necessary to locate it. The degrees of freedom also express the amount of information that is unknown in the pose of an object. For an object that has no constraints on its position, six variables are necessary to determine its pose: three position and three orientation.

The degrees of freedom are reduced if the object's movement is constrained by another object. This constraint may be described as a joint. If objects touch each other at a point or along a line, the connection is called a higher-order pair joint. If the two objects contact each other on a surface, the connection is termed a lower-order pair joint. As a reference, Figure 2.4 shows each of the six types of lower-order pair joints and the number of degrees of freedom associated with each [4].

Figure 2.4: Lower-order Pair Joints (spherical S-pair, revolute R-pair, planar E-pair, prismatic P-pair, screw H-pair, and cylindrical C-pair). Courtesy: David L. Brock
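To make the angle extraction in equations (2.6) through (2.11) concrete, the following sketch recovers the X-Y-Z fixed angles from a numerical rotation matrix. It is an illustrative example only (it uses the two-argument arctangent to resolve quadrants), not code from the thesis implementation.

```python
import numpy as np

def xyz_fixed_angles(R):
    """Recover X-Y-Z fixed angles (gamma about X, beta about Y, alpha about Z)
    from a 3x3 rotation matrix, following equations (2.6)-(2.11)."""
    beta = np.arctan2(-R[2, 0], np.sqrt(R[0, 0]**2 + R[1, 0]**2))
    if abs(np.cos(beta)) > 1e-9:
        alpha = np.arctan2(R[1, 0] / np.cos(beta), R[0, 0] / np.cos(beta))
        gamma = np.arctan2(R[2, 1] / np.cos(beta), R[2, 2] / np.cos(beta))
    else:
        # Degenerate case, beta = +/-90 degrees: alpha is taken as zero.
        alpha = 0.0
        gamma = np.sign(beta) * np.arctan2(R[0, 1], R[1, 1])
    return np.degrees([gamma, beta, alpha])

# Example: a pure 30-degree rotation about the Z-axis.
c, s = np.cos(np.radians(30)), np.sin(np.radians(30))
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
print(xyz_fixed_angles(Rz))  # approximately [0, 0, 30]
```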
Any Tag Host Computer Reader (including Transceiver) Figure 3.1: RFID System Components 23 Antenna tags within the range of the field use the signal as a source of power, and send their own signal back to the reader, where it is decoded to determine the identity of the tag. 3.2 Core Components The tags contain data that can be transmitted through radio frequency (RF) signals. Tags can either be passive or active. Passive tags consist of a small memory chip and an antenna. They are powered by the RF signal from a reader's antenna. Once powered, the tag modulates the reader's signal in order to transmit its data [5]. An active tag, on the other hand, actively transmits its data to a reader. In addition to a chip and an antenna, active tags have batteries and do not require the presence of a reader's RF signal for power. Instead of modulating the reader's signal, the active tag transmits its own signal. Because of the lack of a battery, passive tags are often smaller and less expensive than active tags [6]. However, the active tags' batteries allow them to be read over a much greater distance. There are three different kinds of tag memory: read-only, write once read many (WORM), and read/write. Read-only tags are programmed by the manufacturer, and the data stored cannot be changed. WORM tags can be programmed by the user, but the data cannot be changed once it has been programmed. Read/write tags can be programmed and changed by the user [5,7,8]. The readers are responsible for transmitting and receiving RF signals from the tags. The reader's transceiver has two major functions. First, it generates the RF energy that is used to send a signal to power passive tags. For a system that uses active tags, the reader is not required to send a signal to power them. The transceiver also filters and 24 amplifies the signal that is returned by a tag [9]. The reader controls both of these functions. In order for the reader to transmit an RF signal, the transceiver supplies the power and the signal is sent through the reader's antenna. The antenna is the part of the system that transmits and receives the RF signal [5]. When the antenna receives a signal, the signal is sent to the transceiver, where it is decoded in order to interpret the data the tag has sent. The reader then sends this data to the host computer, where it is used to run an application. 3.3 Reader Fields When the reader transmits an RF signal, an electromagnetic field is created in the vicinity of the reader. The RF energy of this field supplies power to the tags. The size of the reader field is dependent on the power available at the reader. For a circular loop antenna, which is a common type of antenna, the strength H(x) of the electromagnetic field is described by the equation INRR 2 2(R 2 +x (3.1) 2 3 ) /2 where I is the current in the antenna coil, NR is the number of turns in the antenna coil, R is the radius of the antenna, and x is the perpendicular distance from the antenna. From equation (3.1), the strength of the field is inversely proportional to the cube of the distance from the antenna, and the power to the tags decreases accordingly. The amount of power that the tag can generate is a factor that affects the performance of an RFID system. Even if the signal sent by the receiver is strong enough to power a tag at a given distance, the signal that the tag sends must be strong enough to 25 be detected by the reader. 
Although the tag's power does not affect the actual reader field, it does affect the read range, which is the distance across which a reader can read tags. For this reason, active tags can be read across much greater distances than passive tags. The frequency of the reader's signal and the environment in which an RFID system is implemented affect the performance and read range of the system. The frequencies are commonly divided into the following categories: . Low Frequency (0 - 300 kHz), . High Frequency (3 - 30 MHz), . Ultra High Frequency (300 MHz - 3 GHz), and . Microwave (> 3 GHz) [10]. Some mediums within the environment can act to reflect, absorb, or block certain RF signals [11]. Therefore, the frequency of the signal for an RFID system should be chosen based on the environment in which the system is operating. For a given medium, signals of certain frequencies can pass through without being affected, while signals of a different frequency might be weakened or even blocked. For example, "[t]he absorption rate for water at 100 kHz is lower by a factor of 100,000 than it is at 1 GHz" [10]. In general, the lower the frequency of the RF signal, the more it is able to penetrate through materials [12]. The frequency of the signal must be chosen so that the signal is not blocked by anything in the read range of the reader. Otherwise, tags that should be located within the read range of the reader would not be read and the system would not operate effectively. 26 If the power and frequency requirements are met, the system should be able to effectively read a tag located within the read range of the reader. However, many of the applications require that the system be able to simultaneously read multiple tags located within the same reader field. This characteristic of an RFID system is called anticollision [13]. A discussion of the anti-collision problem and an example of an anticollision algorithm are presented in [14]. 3.4 Benefits of RFID There are many aspects of RFID systems that make them attractive in industrial applications. One of the advantages of RFID is that the interaction between the reader and tags is not line of sight. This is very important for many applications in which it is difficult or impossible to use alternate technologies such as barcodes. For example, if a tag is within a reader field, but there is another object between the tag and the reader, the tag will still be read as long as the system uses a signal with an appropriate frequency. One current application of RFID is for animal identification [15]. A pet can be implanted with an RFID tag so that if the pet becomes lost, its tag can be read and its owner identified. The tag is inserted under the animal's skin, so the reader and the tag must be able to communicate through its skin. This would be impossible for systems requiring line of sight. Another benefit of RFID systems is autonomy. Software can control the operation of the reader and store information received from tags. This ability of the systems to operate unattended by humans makes them very useful. 27 In most warehouses, barcodes and scanners are used to manage inventory. When a product is being shipped or received, a worker has to physically scan the barcode so that the inventory in the warehouse is accurately recorded in the system. Using RF tags and readers, the operation of most warehouses could be radically improved by eliminating human labor through automatic inventory. Rapid identification is another benefit of RFID systems. 
Automatic toll collection is now a common use of RFID technology. Cars with appropriate tags can pass through the reader field without stopping and are identified within milliseconds [16].

Chapter 4: Position Sensing

4.1 Introduction

I used machine vision, in conjunction with RFID, to determine the pose of an object. There are many different kinds of sensors that can be used to determine the position and orientation of an object. Usually, a single type of sensor is used for pose recognition, but in this thesis I explore combining multiple sensor types (specifically vision and RFID) to perform the task. I will briefly discuss some of the different kinds of position sensors, their limitations, and some of the pose recognition research.

4.2 Machine Vision

Machine vision is one of the most common types of sensing in robotics [17]. These systems try to mimic or take advantage of one of the most powerful human senses - sight. Vision systems take a picture of the environment and then process that image to make decisions about the state of the environment. The two key steps in machine vision are image capture and image processing.

A vision system must first capture an image of the environment. Any camera or image capture device can be used for this purpose. The picture that the camera takes is composed of many small colored squares. Each square, or pixel, is entirely one color. However, since the pixels are so small, the camera is able to generate an accurate picture of the environment by using thousands, or even millions, of pixels. The image is encoded into an array by storing the color of each pixel in an element of a two-dimensional array [18]. A binary image is represented by an array of 1-bit values, a grayscale image is often represented by an array of 8-bit grayscale values, and a color image is typically represented by an array of 24-bit numbers, each of which contains the red, green, and blue color values. In Figure 4.1, there is a schematic and binary image of a rectangular solid object lying on a flat surface.

Figure 4.1: Object in Environment and Machine Vision Representation

Once the machine vision system has a data structure that represents the image of the environment, it extracts information from the image using various image processing techniques. The type of information being captured depends on the specific application, but machine vision systems are frequently used for locating objects. Because vision systems are calibrated, the system is able to determine the location of an object in the environment from the position of the pixels representing that object in the image. The area $A_{pixel}$ represented by a single pixel is equal to the area $A_{fov}$ of the base of the field of view of the camera divided by the number of pixels in the image. The area $A_{fov}$ grows with the distance from the camera lens to the base of the field of view (the side lengths of the base are proportional to that distance), as shown in Figure 4.2.

Figure 4.2: Base of Field of View Proportional to Distance from Camera Lens

The error in position due to the discretization of the image is bounded by $l_{pixel}/2$, where $l_{pixel}$ is the length of the side of a pixel and is equal to $\sqrt{A_{pixel}}$.

The system can also determine the orientation of the object in the environment from the pixels representing the object in the image. The error in orientation can be calculated based on the error in position. For example, in Figure 4.3 the black line represents the longest edge of an object, which has length L. Taking the maximum amount of error in the vertical position of the pixels at the ends of the edge, we can calculate the maximum error in the orientation of the object. The error in orientation due to the discretization of the image is bounded by $\tan^{-1}(l_{pixel}/L)$. For a given field of view of a camera, as the number of pixels increases, the amount of error decreases.

Figure 4.3: Error in Orientation Measurement due to Discretization of Image
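The discretization bounds above can be summarized in a few lines: given the area of the base of the field of view, the pixel count, and the length of the object's longest edge, the sketch computes the pixel size and the resulting position and orientation error bounds. The half-pixel position bound and the example numbers are my assumptions for illustration, not measurements from the implemented system.

```python
import math

def discretization_error(fov_area_m2, pixel_count, edge_length_m):
    """Position and orientation error bounds due to image discretization.

    Assumes the position error is at most half a pixel side length and that the
    orientation error follows from that bound at both ends of the longest edge.
    """
    pixel_area = fov_area_m2 / pixel_count
    pixel_side = math.sqrt(pixel_area)                          # l_pixel
    position_error = pixel_side / 2.0                           # bound on position error
    orientation_error = math.atan(pixel_side / edge_length_m)   # bound on angle error
    return pixel_side, position_error, math.degrees(orientation_error)

# Example: 0.5 m x 0.5 m field of view, 640 x 480 image, 0.3 m longest edge.
side, pos_err, ang_err = discretization_error(0.25, 640 * 480, 0.3)
print(f"pixel side = {side*1000:.2f} mm, position error <= {pos_err*1000:.2f} mm, "
      f"orientation error <= {ang_err:.3f} deg")
```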
While it provides a great deal of pose information, there are limitations to machine vision. One disadvantage of machine vision is that it generates a two-dimensional representation of the environment. Machine vision systems can use multiple cameras, motion, or other techniques to attempt to generate a three-dimensional representation of the environment, but the image processing algorithms become much more complicated [19]. Another problem with machine vision is that it is difficult to deal with symmetry and differentiate between object faces with similar dimensions. Using a common image processing technique called edge detection, a machine vision system can isolate the edges or boundaries of an object. A system would likely use edge detection to determine the position of an object. Since the edges could convey some information about the dimensions of the object, it would also be possible to gather some information about the orientation of the object. However, because the system would not be able to distinguish between two sides of like dimensions, it would not be able to completely determine the pose of the object. Lighting can be another problem when using machine vision systems. If light reflects off of a black surface, the camera could take a picture that depicts an inaccurate representation of its surroundings.

There has been a great deal of research in using machine vision systems for pose recognition. Some of the systems determine the pose of an object through the use of multiple cameras. One such system uses six image capture devices to take pictures of each side of an object [20]. Such a system would be very intrusive in many industrial settings. Other systems use two cameras to determine depth through stereo vision. However, these systems are very difficult to calibrate and can require significant computational power [19].

Three-dimensional information may also be obtained from a two-dimensional image through a priori information. These vision systems use feature vectors or geometric models to identify the object [21,22,23]. Once the object has been identified, the system uses its knowledge of the particular object to estimate its pose from the two-dimensional image. These systems store a significant amount of data about each object. Because of the use of models and feature vectors, these systems are appropriate for determining the pose of oddly shaped or unique objects. However, the systems may have difficulty identifying and locating objects with symmetric faces or multiple objects of the same shape.

4.3 Tactile Arrays

While machine vision systems are able to determine the position of an object without touching it, there are other sensing techniques that rely on physical contact in order to determine its position. Tactile array sensors measure a pressure profile that results from contact with an object [24]. In [25], the author describes a mobile tactile array that a robot can use to explore its environment.
The array of sensors is attached to a tool that the robot can grasp and use to gather information about the objects located in its workspace. Other research has focused on embedding tactile sensors in the environment. A stationary array fixed to the surface of the environment can be used to determine the presence of an object in the environment [26]. The array can determine the "location, orientation, and shape of objects in the workspace of a robot" [27]. An object located on the surface will compress the tactile sensor, creating a profile similar to the pixels of an image. The sensor profile is stored in a data structure, which can then be used to determine the position and orientation of the object.

If the position and orientation of the object is the only necessary information, force-sensing tactile sensors are not always necessary. A binary contact sensor is a form of tactile sensor that determines the presence of a reaction force, but does not measure the force itself. The sensor simply measures a profile of where forces are being applied, but gives no information about the magnitude of the forces. Figure 4.4a shows the same object as in Figure 4.1a, but with a binary touch sensor array introduced onto the surface of the environment. The object rests on the environment, and all of the touch sensors beneath the object are being pressed. The rest of the sensors feel no force. Figure 4.4b depicts the sensor array representation of the object's location.

Figure 4.4: Object in Environment and Tactile Sensor Array Representation

Without any knowledge of its shape, it may be difficult to accurately determine the position and orientation of the object. The degree of accuracy to which the object's pose can be determined is dependent upon the size and spacing of the sensors compared to the size of the object. For example, in Figure 4.5 the black line represents the longest edge of an object, which has length L. The error in position is bounded by $(s - d/2)$, where d is the diameter of each of the tactile sensors and s is the perpendicular distance between adjacent sensors. Taking the maximum amount of error in the vertical position of each end of the edge, we can calculate the maximum error in the orientation of the object, which is equal to $\tan^{-1}((2s - d)/L)$.

Figure 4.5: Error in Measurements due to Size and Spacing of Tactile Sensors

In addition to requiring contact, tactile arrays have limitations similar to those of vision systems. The arrays provide a two-dimensional representation of the environment, which is inadequate for determining three-dimensional pose. They also have the problem of differentiating between sides of like dimensions, and do not have the ability to use markings to overcome this problem.
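To illustrate how a binary contact profile such as the one in Figure 4.4b can yield position and orientation, the sketch below computes the centroid and principal-axis angle of the pressed sensors from first and second moments. This is a generic moment calculation shown for illustration, not the method of any of the systems cited above.

```python
import numpy as np

def contact_array_pose(pressed):
    """Centroid (row, col) and principal-axis angle of a binary contact profile."""
    rows, cols = np.nonzero(pressed)
    r0, c0 = rows.mean(), cols.mean()          # centroid of the pressed sensors
    dr, dc = rows - r0, cols - c0
    # Second moments of the pressed region.
    mrr, mcc, mrc = (dr * dr).mean(), (dc * dc).mean(), (dr * dc).mean()
    angle = 0.5 * np.arctan2(2.0 * mrc, mcc - mrr)   # orientation of the major axis
    return (r0, c0), np.degrees(angle)

# Example: a 3 x 7 block of pressed sensors inside a 12 x 12 array.
grid = np.zeros((12, 12), dtype=bool)
grid[4:7, 2:9] = True
center, angle = contact_array_pose(grid)
print(center, angle)   # centroid at (5.0, 5.0), angle of 0 degrees
```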
Because the satellites reside outside of the earth's atmosphere, and its gravity and wind resistance, their locations are relatively easy to calculate [28]. In order to ensure the accuracy of GPS position measurement, the satellites are constantly tracked by the Department of Defense [29]. The receiver calculates its distance from a satellite by multiplying a known velocity of the signal by the time required for the signal to travel from the satellite to the receiver. The travel time is measured by comparing a Pseudo Random Code, which is different for each satellite, with the version of the code sent by 36 the satellite [28]. The signal received from the satellite is delayed by an amount equal to the travel time of the signal. The GPS receiver accepts signals from at least four different satellites when calculating its position. Position is computed in the following way. Given the distance from one satellite, the position of the receiver is limited to the surface of a sphere centered at the satellite with a radius equal to that distance. With two satellites, the position is limited to the circle that spans the intersection of the two spheres surrounding the two satellites. By adding a third satellite, the position of the receiver is further narrowed to the two points that lie on the intersection of the three spheres. The fourth satellite is used to account for errors in the calculation of travel time [28]. There are some obvious limitations of GPS. First, a GPS receiver can only determine its location to within a few meters. This is not accuracy enough for automated systems to locate and acquire an object. Second, the system cannot determine orientation. Third, GPS signals cannot be received inside structures and are occluded by terrain and foliage [30]. The Differential Global Positioning System (DGPS) provides the same information as GPS, but with greater accuracy. By using a stationary receiver with a known position, DGPS is able to eliminate much of the error in the calculation of travel time. Other receivers, which are in contact with the stationary receiver, can apply correction factors to their position measurements in order to reduce errors [31]. DGPS is able to determine position to within less than a meter [32]. However, it shares other limitations with GPS such as its inability to determine orientation or determine position within buildings. 37 Another use of radio frequency is Real Time Locating Systems (RTLS). These systems are capable of locating objects within buildings, and are often used to track assets within warehouses and other buildings [33]. RTLS works by using locating devices that are installed within a building and tags that are attached to the objects within that building. RTLS systems can track objects using fewer locating devices than the number of readers required by an RFID system [34]. While RTLS solves the problem that GPS has with locating objects within buildings, is does not meet the needs for pose recognition. RTLS can only determine an object's location to within a few feet [35], and cannot determine the orientation. 38 Chapter 5: Objects and Environment 5.1 Introduction In this chapter, I will introduce assumptions and constraints on the objects and environment. These constraints aid pose recognition while providing utility in common commercial and industrial situations. I will also propose definitions for the object coordinate frame that describes the pose of an object. 
5.2 Shapes of Objects

I limited the class of objects that could be located by the pose recognition system to rectangular solids. This limitation was justified based on current warehouse standards. For Consumer Product Goods (CPG), which accounts for a large number of warehouses in the U.S., the majority of objects that are manipulated in various stages of the supply chain are rectangular solids. When products are shipped from the factory to the warehouse, or from the warehouse to the retailer, they are usually shipped in a box, which can be modeled as a rectangular solid. Also, products are often packed into containers or stacked on pallets, both of which may also be modeled as rectangular solids. While they reside in the warehouse, most products remain in boxes. When items are unloaded from the shipping boxes, many are packaged in boxes. Cereal, crackers, microwaves, and coffee makers are examples of products that are packaged in boxes.

5.3 Surface of the Environment

The pose recognition system was designed to operate in an environment that has a flat, level surface. This type of surface was chosen for reasons similar to the choice of rectangular solid objects. The shelves on which products are placed in most warehouses have a similar flat and level surface. Also, the environment provides a stable surface on which the objects can reside.

5.4 Object Coordinate Frame

To describe pose, I used a coordinate frame together with named faces for the surfaces of the solids. The coordinate frame was fixed at the geometric center of the solid with coordinate axes parallel to the faces of the solid. The names for the faces of the object were based on its dimensions. For the purposes of this thesis, the length, width, and height of the object will correspond to the longest, middle, and shortest dimensions, respectively. I defined the x-direction parallel to the height, the y-direction parallel to the width, and the z-direction parallel to the length. Figure 5.1 shows a rectangular solid, with its dimensions labeled, along with the four possible coordinate frames that obey the right-hand rule.

The front and back of the object are parallel to the y-z plane and have dimensions of length and width. The right and left are parallel to the x-z plane and have dimensions of length and height. The top and bottom are parallel to the x-y plane and have dimensions of width and height. For each pair of sides of like dimensions, an object often has a primary side. For example, the front of a box of cereal is easily distinguishable from the back. The same is true of the top and bottom of the box. For each object, the front, right, and top will be considered the primary sides. The back, left, and bottom are the secondary sides.

Figure 5.1: Object Dimensions and Possible Coordinate Frames

Each of the primary sides is located in the positive direction of the coordinate axis to which it is perpendicular. The front is in the positive x-direction, the right is in the positive y-direction, and the top is in the positive z-direction. I further constrained the possible coordinate frames by choosing a primary side from one of the three pairs of sides. For example, I chose one of the sides with dimensions of length and width as the front, and used this side to determine the positive x-direction. Figure 5.2 shows the object with its dimensions and front and back sides labeled, along with the two possible coordinate frames that obey the right-hand rule.

Figure 5.2: Front and Back Sides of Object and Possible Coordinate Frames
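The naming convention just described can be restated in a few lines of code: given the three outer dimensions of a box, sort them into length, width, and height and report which pair of faces carries which dimensions. The function below is only a restatement of the convention for clarity; its name and structure are mine, not part of the implemented system.

```python
def face_dimensions(dims):
    """Map a box's three outer dimensions onto the naming convention of Section 5.4.

    Length is the longest dimension (parallel to z), width the middle one
    (parallel to y), and height the shortest (parallel to x).
    """
    height, width, length = sorted(dims)
    return {
        "front/back (perpendicular to x)": (length, width),
        "right/left (perpendicular to y)": (length, height),
        "top/bottom (perpendicular to z)": (width, height),
    }

# Example: a cereal-box-like object, dimensions in meters.
for face, size in face_dimensions((0.30, 0.07, 0.20)).items():
    print(f"{face}: {size[0]:.2f} m x {size[1]:.2f} m")
```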
Finally, I chose another primary side from one of the remaining pairs of sides. I chose one of the sides with dimensions of length and height as the right, and used this side to determine the positive y-direction. This reduced the number of permissible coordinate frames to one, and also constrained which side must be the top.

In addition to its name, each side was labeled with a number. This number was embedded in the visual pattern as discussed in Chapter 7. The sides were numbered in order of decreasing surface area. For each pair of faces, the primary side was numbered first, and the secondary side was numbered second. The front, back, right, left, top, and bottom were numbered sides 1, 2, 3, 4, 5, and 6, respectively. Figure 5.3 shows the object with each of its sides labeled and its coordinate frame fixed at its center.

Figure 5.3: Object with All Sides Labeled and Coordinate Frame at Center

Chapter 6: Object Presence and Identification

6.1 Introduction

The pose recognition system relied on the Auto-ID Center infrastructure, which makes use of RFID, in order to detect the presence of an object, and identify and describe it. In order to use the infrastructure in this manner, each of the objects was fitted with an RF tag, and the environment was monitored by a reader.

6.2 Auto-ID Center

Researchers at the Auto-ID Center at MIT have developed a framework for describing physical objects and linking objects to their descriptions. This framework is composed of three main components: the Electronic Product Code (EPC), Physical Markup Language (PML), and Object Naming Service (ONS).

The Universal Product Code, or UPC, which is the numbering scheme used for barcodes, uniquely identifies products. All products have different UPC codes, but all instances of a particular product have the same UPC. The EPC is a numbering scheme that is used to uniquely identify physical objects. Each instance of a particular product has a different EPC. Because the EPC is designed as a 96-bit number, the number of objects that can be uniquely identified with EPCs should be more than sufficient for the foreseeable future. More description of the EPC, as well as its design philosophy, is presented in [36].

PML is a language designed for describing physical objects. Based on the eXtensible Markup Language (XML), PML is used to contain such information as the geometry, location, and temperature of a physical object. The language was designed to allow computers to easily access information about objects that could be used to perform such tasks as business transactions or automated processes. More information about the design approach and some of the core components is presented in [37,38].
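Because the pose recognition system needs only a few fields from an object's description (notably its dimensions), a small parsing sketch is shown below. The element names are hypothetical placeholders rather than the actual PML schema; the real PML DTD and an example file appear later in Figures 10.3 and 10.4.

```python
import xml.etree.ElementTree as ET

# Hypothetical PML-like description; the real schema differs (see Figure 10.3).
EXAMPLE_PML = """
<object>
  <epc>example-epc-0001</epc>
  <geometry unit="m">
    <length>0.30</length>
    <width>0.20</width>
    <height>0.07</height>
  </geometry>
</object>
"""

def read_dimensions(pml_text):
    """Pull the object's dimensions out of a PML-like XML description."""
    root = ET.fromstring(pml_text)
    geom = root.find("geometry")
    return tuple(float(geom.find(tag).text) for tag in ("length", "width", "height"))

print(read_dimensions(EXAMPLE_PML))   # (0.3, 0.2, 0.07)
```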
The EPC is read using a reader, which is connected to a host running ONS. Once the network location of the information is determined, the system obtains a PML file describing the object. 6.3 Object Identification The pose recognition system relied on an RFID reader embedded within the environment. When an object entered the environment of the system, the RFID system read its EPC and used it to determine the identity of the object. In order for this to be possible, the system had some awareness of the various objects that were tagged. For each of the objects, there was a certain amount of information that was known. The EPC of each object was the first important piece of information. For pose recognition, it was helpful to know the geometry of the object. This information was stored for each object that interacted with the pose recognition system. The EPC was used to determine the 46 location of the information, and the object descriptions were returned in a PML file that was easily readable by the system. The descriptions were a key element of the pose recognition system, and were necessary to determine the pose of the object. 6.4 Object Presence In addition to gathering information about the object that entered the reader field, the system also determined the presence of the object within the range of the system. Once an object entered the reader field, the position of that object was limited to the space contained within the reader field. The object's presence was what triggered the system to use additional sensing capabilities to determine the object's pose. Figure 6.1 shows a shelf that is monitored by a number of reader antennas. If an object is present on the shelf, then its tag will be powered and read by the antenna within Figure 6.1: Antennas Monitoring Shelf Sections whose field the object is located. Before entering this field, the object could have resided in another reader field, or could have been in a location out of the reach of any readers. In the first case, the general location of the object would change from one reader field to another. In the second case, the object's location is narrowed from somewhere in the 47 environment to somewhere within the reader field that is currently reading the tag. The system could then determine where it should focus its additional sensing to determine the pose of the object. 48 Chapter 7: Vision System 7.1 Introduction After the RFID system determined the presence of an object, the vision system captured an image of the environment. By this time, the pose recognition system already had a good deal of information about the object. The RFID system had determined that an object was within the range of the reader's antenna, and had identified the object and determined its dimensions. The vision system was the component of the pose recognition system that provided the last few pieces of information necessary to determine the pose of an object. 7.2 Visual Pattern The vision system relied on a visual pattern printed on the center of each side of the object. Instead of trying to capture an image of the entire object, the system used the pattern to determine the object's position and orientation. 7.2.1 Requirements of Pattern The pattern had a number of functions. Since it was located at the center of the side of the object, the system was able to determine the center of the object by calculating the center of the pattern. The same was true for the orientation of the object. 
By calculating the orientation of the pattern, the system was able to determine the orientation of the object. The last, and most important, function of the pattern was determining the side of the object that was facing the camera. Without using the pattern, it would have 49 been very difficult to distinguish between two sides of like dimensions. Each of the six sides of the object was labeled with a unique pattern. By capturing an image of the object, the vision system determined which side was facing the camera and calculated the position and orientation of that side. 7.2.2 Color Scheme Typically, the lighting of the environment can greatly affect the performance of a vision system. To minimize its effects, I chose to limit the colors of the environment, objects, and patterns to black and white. The surface of the environment was entirely black. The objects themselves, with the exception of the patterns on each side, were also black. The pattern, which is described in Section 7.2.3, was black and white. Ambient light made the white portion of the pattern shine, while the object and environment remained dark. The color scheme made it easier for the system to locate and read the pattern. While the system that I designed and implemented used this black-and-white scheme, it is possible to incorporate other color schemes. For example, more complicated image processing algorithms than the ones that I implemented could extract the pattern from a color image. Also, the patterns could be printed on the sides of the object in invisible fluorescent ink. The system could then use ultraviolet lighting to illuminate the pattern. 50 7.2.3 Design of Pattern There were two major issues that I considered when designing the pattern. First, I wanted to make sure that the pattern could be modified to provide a unique identifier for each of the six sides. Second, I made sure that the system would be able to determine the orientation of the pattern. I wanted to eliminate the possibility of having errors in orientation of 180 degrees. I chose to identify each side by using a pattern of five squares, each of which was either black or white. The five squares acted as bits in a five-digit binary number. Black squares represented a zero, or an unset bit. White squares indicated a one, or a set bit. The first and last bits, bit 0 and bit 4 respectively, were used to determine orientation. The first bit was always set, and the last bit was always unset. Therefore, the system was able to read the bits correctly, even if the pattern was upside down. The middle three squares were used to calculate the number that represented a certain side of the object according to the numbering scheme described in Section 5.2. The second, third, and fourth bits were used to calculate the side number using the equation SideNumber = (4 -Bitl) + (2 -Bit2) + (Bit3). (7.1) Figure 7.1 shows the patterns for each of the six different sides. Each of the white and black patterns is shown against a black background. 7.2.4 Alignment of Pattern on Object In addition to locating and reading the pattern, the system needed to know how it was positioned and oriented on the side of the object. This was necessary so that the system could determine the object's position and orientation from the pattern's position 51 Side I Side 4 Side 2 Side 5 Side 3 Side 6 Figure 7.1: Patterns for each of the Six Sides of an Object and orientation. 
7.2.4 Alignment of Pattern on Object

In addition to locating and reading the pattern, the system needed to know how it was positioned and oriented on the side of the object. This was necessary so that the system could determine the object's position and orientation from the pattern's position and orientation. On each face of the object, the pattern was centered about the center of the face, with the short side of the pattern parallel to the shorter dimension of the face, and the long side of the pattern parallel to the longer dimension of the face. The pattern was oriented so that it was read, from bit zero to bit four, in the positive direction of the axis parallel to the long side of the pattern. Figure 7.2 shows an example of the appropriate positioning of the pattern on the side of an object. The figure shows the front of an object, with dimensions of length by width. The pattern, which represents side number one, is oriented so that it is read in the positive z-direction.

Figure 7.2: Positioning of Pattern on Object Side

7.3 Information from Image

In order for the system to gather information about the pose of the object, it had to capture an image of the object in the environment. Once the image was captured, the system processed it in order to extract data. Many of the processing techniques relied on knowledge of the visual pattern in order to gather the necessary information. The desired information was the position and orientation of the pattern within the image, and the number of the side facing the camera. The position and orientation were calculated using the pixels of the image. The position was described by the column and row of the pixel located at the center of the pattern. The orientation of the pattern was calculated as an angle from the horizontal. However, before reading the pattern, the orientation may have been incorrect by 180 degrees. Once the position and angle had been calculated, the system read each of the squares in the pattern, and determined their values. Using the values of the squares at either end of the pattern, the system determined whether or not a 180-degree offset had to be included in the orientation. The system also used the offset information in order to assign the values of the middle three squares to the appropriate bit values. The system then used equation (7.1) to calculate the number of the side facing the camera.

Chapter 8: Distributed Sensing

8.1 Introduction

The pose recognition system developed in this thesis used RFID and machine vision to determine the position and orientation of an object. Using either one of these sensors alone, the task of pose recognition would have been much more difficult. In this chapter, I will discuss the range of each sensor and the required overlap of these ranges. Also, I will discuss the data that each of the sensors is able to capture and show that a single sensor could not complete the task of pose recognition. I will then show how the data from each of the sensors was combined to determine the pose of an object.

8.2 Sensor Range

The read range of an RFID system is largely based on the power available at the reader and the shape of the antenna. As discussed in Chapter 3, the size of the reader field is dependent on the available power. The shape of the reader field, on the other hand, is primarily dependent on the design of the antenna. An example of a reader field is shown in Figure 8.1. For the most part, the field extends above the antenna. When tagged objects pass through this field, the tags are able to communicate with the reader.

Figure 8.1: RFID System Antenna and Reader Field

The field of view, or range, of a machine vision system is dependent on the image capture device.
Figure 8.2 shows the range of a vision system, which is a frustum defined by the field of view of the camera and the maximum and minimum visual planes. All objects within this range can be viewed by the system. The field of view of the camera is a pyramid. The maximum and minimum planes are defined by the limits on how close to or far from the camera an object can be in order to provide adequate information. If the object is too far away from the camera, then there may not be enough detail in the image to gather data through image processing. The maximum plane, which forms the base of the frustum, is located at the furthest distance from the camera at which the required amount of detail can be obtained. If the object is too close to the camera, the entire pattern may not be captured in the image. The minimum plane, which forms the 56 top of the frustum, is located at the shortest distance from the camera at which the entire pattern may be viewed. Minimum plane Field of View Maximum plane Figure 8.2: Machine Vision Field of View In order for the pose recognition system to determine an object's pose, the object had to be located within the intersection of the read range of the RFID system and the field of view of the machine vision system. Therefore, when the RFID system discovered that a tagged object was within its read range, the machine vision system could capture an image of that object. For the pose recognition system to operate effectively, the field of view of the camera had to completely encompass the read range of the reader, as shown in Figure 8.3. Therefore, the camera could view any object that was sensed by the RFID system. 57 Figure 8.3: Required Overlap of RFID and Machine Vision System Ranges 8.3 Sensor Data While the RFID system provided information that was very important for pose recognition, it could not determine pose by itself. As discussed in Chapter 6, the RFID system provided two main functions: identification and presence. The identity of the object was used to determine its dimensions. The position of the object was limited to the space within the range of the reader, and the RFID system could provide no information about the orientation of the object. The machine vision system, on the other hand, provided a great deal of information about the pose of the object. The system captured an image that was used to gather information about the position and orientation of the object. Through the use of 58 the visual patterns described in Chapter 7, the vision system completely determined the orientation of the object. It also determined some information about the position of the object. In Chapter 9, I will discuss the image processing techniques used to extract this information from the image. However, without some critical information about the object's geometry, the vision system could not accurately determine the position of the object. By combining the two sensing systems, my system completely determined the pose of an object within the constraints described in Chapters 5 and 7. The RFID system determined the identity of an object, and alerted the vision system that an object was present within its range. The vision system then captured an image of the object, and processed the image to gather the appropriate information. By combining the object information with the vision system's measurements, the system completely determined the pose of the object. 
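The data flow just described can be summarized in a short sketch. The Java fragment below is illustrative only: the interface names RfidReader, VisionSystem, and ObjectInfo are placeholders for the actual reader, camera, and object-description components, and the final combination step is deferred to Chapter 10.

// A minimal sketch of how the two sensors are combined. RFID contributes
// presence and identity (hence dimensions); vision contributes the pattern
// measurement; neither alone fixes the pose.
import java.util.List;

public class DistributedSensing {

    interface RfidReader { List<String> readEpcs(); }          // EPCs currently in the reader field
    interface VisionSystem { double[] measurePattern(); }      // {centerX, centerY, angle, sideNumber}
    interface ObjectInfo { double[] dimensions(String epc); }  // {length, width, height} for an EPC

    public static double[] recognize(RfidReader reader, VisionSystem vision, ObjectInfo info) {
        List<String> epcs = reader.readEpcs();
        if (epcs == null || epcs.size() != 1) {
            return null;                                 // no object (or more than one) in range
        }
        double[] dims = info.dimensions(epcs.get(0));    // RFID: identity -> geometry
        double[] pattern = vision.measurePattern();      // vision: pattern position, angle, side number
        return combine(dims, pattern);                   // the geometry developed in Chapter 10
    }

    static double[] combine(double[] dims, double[] pattern) {
        // Placeholder: Section 10.8 turns these two measurements into the pose.
        return null;
    }
}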
In Chapter 10, when I discuss my implementation of the pose recognition system, I will discuss how the object information is combined with the vision system information in order to calculate the pose.

Chapter 9: Image Processing

9.1 Introduction

The pose recognition system used image processing to gather critical information about the pose of the object. The system analyzed the image of the object to calculate the position and orientation of the pattern and to determine the side that faces the camera. In this chapter, I describe in detail the image processing algorithms that are used to extract this information.

9.2 Overview of Algorithms

The image processing portion of my pose recognition system was composed of the following steps:

- Correct Lens Distortion - The system corrects the image to remove distortion due to the curvature of the camera lens.
- Color Image to Binary Image - The color image is converted to a grayscale image using an equation that calculates the luminance of each pixel. Using a histogram of the grayscale image, the system chooses a threshold for converting the grayscale image to a binary image.
- Isolate Pattern - The system divides all of the white pixels in the binary image into collections. Each collection is tested in order to isolate the pattern.
- Locate Center - The system fills in the holes in the pattern and calculates the center of area of the resulting rectangle.
- Calculate Angle - The system calculates the second moments of the pattern to determine the angle of the pattern.
- Read Pattern - The system checks the value of each of the squares in the pattern, and determines the side number.

The following sections describe each of these algorithms in detail.

9.3 Correct Lens Distortion

The first stage of the image processing is to undo distortion of the image due to the curvature of the lens of the image capture device. While the distortion can be rather minor, its effects are seen most at the corners of the image. The method used to correct this distortion was taken from [40]. The coordinate (i_c, j_c) is the location of a pixel in the desired corrected image, and is measured from the center of the image. The coordinate (i_d, j_d) is the location of the same pixel in the distorted image, and is also measured from the center of the image. For each of the three color bands, and for every pixel in the corrected image, the location of the corresponding pixel in the distorted image is found using the equations:

r^2 = i_c^2 + j_c^2, (9.1)
i_d = i_c (1 + k r^2), and (9.2)
j_d = j_c (1 + k r^2), (9.3)

where k is a very small negative number that represents the amount of curvature in the lens. The value of a pixel at location (i_c, j_c) in the corrected image is equal to the value of the corresponding pixel at location (i_d, j_d) in the distorted image. However, the values of i_d and j_d may not be integers. Therefore, the system must calculate the value of the pixel (i_c, j_c) as a weighted average of the four pixels surrounding the location (i_d, j_d) in the distorted image. The weights used to calculate the value of the pixel are inversely proportional to the distance from each of the surrounding pixels to the location (i_d, j_d). Once the value of each pixel has been calculated, the resulting corrected image is free of lens distortion.

9.4 Color Image to Binary Image

Although the environment and the objects are restricted to black-and-white, the image is in color. The color picture actually looks like a grayscale image, but there are still red, green, and blue values for each pixel.
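Returning to the correction step of Section 9.3, the following Java fragment is a minimal sketch of the inverse mapping, assuming each color band is held as a two-dimensional array of intensity values and that the constant k has been estimated for the lens. It uses bilinear interpolation over the four surrounding pixels, a common alternative to the inverse-distance weighting described above.

// Inverse-mapping lens-distortion correction (Section 9.3): for every pixel in
// the corrected image, look up the corresponding location in the distorted
// image and interpolate among its four surrounding pixels.
public class LensCorrection {

    // band: one color band of the distorted image, indexed [row][column]
    // k: small negative distortion constant for the lens
    public static int[][] correct(int[][] band, double k) {
        int rows = band.length, cols = band[0].length;
        int[][] out = new int[rows][cols];
        double cx = cols / 2.0, cy = rows / 2.0;            // measure coordinates from the image center

        for (int row = 0; row < rows; row++) {
            for (int col = 0; col < cols; col++) {
                double ic = col - cx, jc = row - cy;        // corrected-image coordinates (i_c, j_c)
                double r2 = ic * ic + jc * jc;              // equation (9.1)
                double id = ic * (1 + k * r2);              // equation (9.2)
                double jd = jc * (1 + k * r2);              // equation (9.3)
                out[row][col] = bilinear(band, id + cx, jd + cy);
            }
        }
        return out;
    }

    // Weighted average of the four pixels surrounding (x, y) in the distorted image.
    private static int bilinear(int[][] band, double x, double y) {
        int rows = band.length, cols = band[0].length;
        int x0 = (int) Math.floor(x), y0 = (int) Math.floor(y);
        double fx = x - x0, fy = y - y0;
        int x0c = clamp(x0, 0, cols - 1), x1c = clamp(x0 + 1, 0, cols - 1);
        int y0c = clamp(y0, 0, rows - 1), y1c = clamp(y0 + 1, 0, rows - 1);
        double value = band[y0c][x0c] * (1 - fx) * (1 - fy)
                     + band[y0c][x1c] * fx * (1 - fy)
                     + band[y1c][x0c] * (1 - fx) * fy
                     + band[y1c][x1c] * fx * fy;
        return (int) Math.round(value);
    }

    private static int clamp(int v, int lo, int hi) {
        return Math.max(lo, Math.min(v, hi));
    }
}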
The color image is represented by 3 arrays, each having L columns and W rows. The red, green, and blue arrays each store an 8-bit color value for each pixel. While many image processing techniques could be performed on color images, the algorithms are greatly simplified for binary images. The system first converts the color image to a grayscale image according to the NTSC encoding scheme used in American broadcast television [41], Y(i, j) = 0.299 -R(i, j) + 0.587 -G(i, j) + 0.114 -B(i, j), where i is the horizontal position of the pixel (0 i < L), j is the vertical position of the pixel (0 j < W), Y(i, j) is the luminance of pixel (i, j), R(i, j) is the red value of the pixel (i, j), 63 (9.4) G(i, j) is the green value of the pixel (i, j), and B(i, j) is the blue value of the pixel (i, j) . The luminance of a pixel is the same as its grayscale value. The grayscale image is represented by a single array of L columns and W rows. This array stores the grayscale value for the each of the pixels in the image. Figure 9.1 shows the grayscale representation of a picture taken by the webcam that shows an object and the environment. Figure 9.1: Grayscale Image Once the system has developed the grayscale image, its next task is to convert it to a binary image. Theoretically, this is an easy task. Given some threshold value between 0 and 255, all pixels with a grayscale value less than the threshold are given a value of zero, making the pixel black. For all pixels with a grayscale value greater than or equal to the threshold, the pixel value is set to 255, making the pixel white. The difficulty, however, lies in determining an appropriate threshold value. 64 In order to determine an appropriate threshold, the system generates a histogram of the pixel intensity values in the grayscale image. Figure 9.2 shows the histogram of the image in Figure 9.1. The pixel intensity values are listed on the horizontal axis while L) 0 4.0 E Grays cale Value Figure 9.2: Histogram of Grayscale Values the number of pixels that have a given intensity value are listed on the vertical axis. If there are two peaks around which most of the pixel intensities are clustered, with a large gap separating the peaks, then the histogram is said to be bi-modal. This type of histogram is ideal for determining a threshold because an appropriate value can be chosen in between the peaks [42]. All pixels to the left of the threshold will become black and all pixels to the right will become white. Since the objects and the environment are primarily black and white, the images taken by the system should usually produce a bimodal histogram. The system uses the histogram of the image to choose a threshold pixel intensity that is located somewhere between the two peaks. Once the threshold has been selected, the system uses the grayscale image, along with the threshold, to develop a binary image of the object and environment. Figure 9.3 shows the binary image created 65 from the grayscale image of Figure 9.1 with a threshold chosen using the histogram in Figure 9.2. Figure 9.3: Binary Image 9.5 Isolate Pattern Once the binary image has been created, the system is almost ready to begin extracting data from the image. As mentioned before, all of the information that the system will gather is based on the pattern in the image, and its position and orientation. Theoretically, the pattern should contain the only white pixels in the image. 
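A minimal sketch of the conversion and thresholding of Section 9.4 is shown below. The packed-pixel layout and the threshold rule, the emptiest bin between the two largest histogram peaks, are illustrative assumptions; any threshold that falls between the two peaks of a bi-modal histogram would serve.

// Grayscale conversion, histogram, and thresholding (Section 9.4).
public class Binarizer {

    // rgb: packed 0xRRGGBB pixels, indexed [row][column]; returns 0/255 values.
    public static int[][] toBinary(int[][] rgb) {
        int rows = rgb.length, cols = rgb[0].length;
        int[][] gray = new int[rows][cols];
        int[] histogram = new int[256];

        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int red = (rgb[r][c] >> 16) & 0xFF;
                int green = (rgb[r][c] >> 8) & 0xFF;
                int blue = rgb[r][c] & 0xFF;
                // Equation (9.4): NTSC luminance
                int y = (int) Math.round(0.299 * red + 0.587 * green + 0.114 * blue);
                gray[r][c] = y;
                histogram[y]++;
            }
        }

        int threshold = chooseThreshold(histogram);
        int[][] binary = new int[rows][cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                binary[r][c] = (gray[r][c] >= threshold) ? 255 : 0;
        return binary;
    }

    // Find the two highest peaks (a simplified bi-modality assumption), then
    // return the intensity of the lowest valley between them.
    private static int chooseThreshold(int[] histogram) {
        int peak1 = 0;
        for (int i = 1; i < 256; i++) if (histogram[i] > histogram[peak1]) peak1 = i;
        int peak2 = (peak1 == 0) ? 1 : 0;
        for (int i = 0; i < 256; i++)
            if (i != peak1 && histogram[i] > histogram[peak2]) peak2 = i;
        int lo = Math.min(peak1, peak2), hi = Math.max(peak1, peak2);
        int valley = lo;
        for (int i = lo; i <= hi; i++) if (histogram[i] < histogram[valley]) valley = i;
        return valley;
    }
}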
Since the entire environment is black, and the rest of the side facing the camera is black, the system should be able to perform calculations on the image as a whole without having to single out the pattern. However, there are times when there are white pixels in the image that are not part of the pattern, and these pixels can affect the calculations and introduce error into the measurements. In order to avoid this, the system scans the binary image and determines the number of collections of pixels. A collection is a set of all white pixels that are connected to each other. Figure 9.4a shows two collections of white pixels. Two pixels are connected if there exists a path between the two pixels. In Figure 9.4b, the pixels labeled A and B are connected. A path consists of a number of pixels that form a continuous route of neighboring white pixels. In Figure 9.4c, the four pixels that are outlined and shaded form a path between pixels A and B. A pixel's neighbors consist of the eight pixels that are directly adjacent to the pixel at its four edges and four corners. In Figure 9.4d, the pixels labeled 2, 4, 5, and 7 are neighbors of pixel A along its four edges. The pixels labeled 1, 3, 6, and 8 are its neighbors at its four corners.

Figure 9.4: Explanation of Collections, Connectivity, Paths, and Neighbors

The system scans the image, looking for white pixels. When it finds one, it checks to see if any of its neighbors belong to a collection. If a neighboring pixel is in a collection, then that same collection number is assigned to the current pixel. The system continues checking the rest of the neighbors to make sure that the pixel does not have two neighbors in different collections. If this occurs, the system merges the two collections into a single collection. Once the entire image has been scanned, each collection is tested to determine whether it is the pattern. One source of extraneous white pixels was a portion of a second pattern located on the side of the object. Figure 9.5a shows the grayscale version of an image that shows how some of the pattern on the side of the object can be captured. Figure 9.5b shows the binary version of the image. The presence of the additional white pixels affected the calculations of the position and orientation of the pattern within the image. Therefore, in order to remove this source of error, I implemented the portion of the system that groups pixels into collections and tests each collection to see if it is a pattern.

Figure 9.5: Additional White Pixels from Pattern on Side of Object
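A minimal Java sketch of this grouping step is shown below. It assumes the binary image is stored as an array with 0 for black and 255 for white, and it uses a flood fill over the eight-connected neighbors, which yields the same collections as the scan-and-merge procedure just described. Each resulting collection can then be tested, for example by its size and proportions, to decide whether it is the pattern.

// Grouping white pixels into collections (Section 9.5).
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public class Collections8 {

    // binary: 0 = black, 255 = white, indexed [row][column].
    // Returns one list of {row, col} pairs per collection of connected white pixels.
    public static List<List<int[]>> findCollections(int[][] binary) {
        int rows = binary.length, cols = binary[0].length;
        boolean[][] visited = new boolean[rows][cols];
        List<List<int[]>> collections = new ArrayList<>();

        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (binary[r][c] == 255 && !visited[r][c]) {
                    collections.add(flood(binary, visited, r, c));
                }
            }
        }
        return collections;
    }

    private static List<int[]> flood(int[][] binary, boolean[][] visited, int r0, int c0) {
        List<int[]> collection = new ArrayList<>();
        ArrayDeque<int[]> queue = new ArrayDeque<>();
        queue.add(new int[] { r0, c0 });
        visited[r0][c0] = true;
        while (!queue.isEmpty()) {
            int[] p = queue.poll();
            collection.add(p);
            for (int dr = -1; dr <= 1; dr++) {               // the eight edge and corner neighbors
                for (int dc = -1; dc <= 1; dc++) {
                    int r = p[0] + dr, c = p[1] + dc;
                    if ((dr != 0 || dc != 0)
                            && r >= 0 && r < binary.length && c >= 0 && c < binary[0].length
                            && binary[r][c] == 255 && !visited[r][c]) {
                        visited[r][c] = true;
                        queue.add(new int[] { r, c });
                    }
                }
            }
        }
        return collection;
    }
}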
9.6 Locate Center

Once the pattern has been isolated, and any other collections of pixels have been eliminated, the system prepares to extract data from the image. The first piece of information that the system attempts to gather is the center of the pattern, which is used to describe the position of the pattern within the image. In order to make the calculation easy, the system first makes a copy of the image, and then attempts to fill in the holes in the pattern. The system scans the entire image, and for every black pixel, the system checks to see if it is bounded by white pixels. If the pixel is bounded, then it is set to white. After scanning the entire image, the pattern is replaced by a solid rectangle of the same dimensions as the pattern. By filling in the holes, the center of the pattern can be found by calculating the center of area of the rectangle. Figure 9.6 shows the rectangle produced by filling in the holes in the pattern in Figure 9.3.

Figure 9.6: Rectangle used to find Center of Pattern

The system scans the image with the rectangle, and for every black pixel in the image, the function p(x, y) is set to zero. For every white pixel in the image, that is, those that are part of the rectangle, p(x, y) is set to one. Then, the system calculates the area of the rectangle using the equation:

A = \sum_{x=0}^{L} \sum_{y=0}^{W} p(x, y). (9.5)

The first moments about the x- and y-axes are calculated using the equations:

M_x = \sum_{x=0}^{L} \sum_{y=0}^{W} x p(x, y), and (9.6)
M_y = \sum_{x=0}^{L} \sum_{y=0}^{W} y p(x, y), (9.7)

where M_x is the first moment about the x-axis and M_y is the first moment about the y-axis. The center of area (x̄, ȳ) of the rectangle is calculated using the equations:

x̄ = M_x / A, and (9.8)
ȳ = M_y / A, (9.9)

where x̄ is the position of the center of area of the rectangle along the x-axis, and ȳ is the position of the center of area of the rectangle along the y-axis. The coordinate (x̄, ȳ), which is both the center of area of the rectangle and the center of the pattern, is used by the system to describe the position of the pattern within the image.

9.7 Calculate Angle

After calculating the center of the pattern, the system attempts to determine the orientation of the pattern with respect to the image, using a method derived and described in [43]. The angle θ, used to represent orientation, is measured counterclockwise from the horizontal axis of the image to the axis of least second moment of the pattern. This axis, u, is a line parallel to the longer side of the pattern that passes through the center of the pattern. Figure 9.7 shows a line representing the u-axis of the pattern in Figure 9.3.

Figure 9.7: u-axis describing Orientation of Pattern

The system calculates the second moments of the pattern, M_xx, M_xy, and M_yy, using the equations:

M_xx = \sum_{x=0}^{L} \sum_{y=0}^{W} (x − x̄)^2 p(x, y), (9.10)
M_xy = \sum_{x=0}^{L} \sum_{y=0}^{W} (x − x̄)(y − ȳ) p(x, y), and (9.11)
M_yy = \sum_{x=0}^{L} \sum_{y=0}^{W} (y − ȳ)^2 p(x, y). (9.12)

Once the second moments have been calculated, the system calculates the angle θ using the equation:

θ = (1/2) tan^{-1}( 2 M_xy / (M_xx − M_yy) ), unless M_xy = 0 and M_xx = M_yy, or (9.13)
θ = (1/2) sin^{-1}( 2 M_xy / \sqrt{ 4 M_xy^2 + (M_xx − M_yy)^2 } ), if M_xx = M_yy and M_xy ≠ 0. (9.14)

At this point, the system has calculated the angle at which the u-axis is oriented from the horizontal axis.
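The center and angle calculations of Sections 9.6 and 9.7 can be condensed into the following sketch, which operates on the filled-in binary image. It uses the two-argument arctangent, which covers the special case of equation (9.14) automatically.

// Center of area and orientation (Sections 9.6 and 9.7), computed directly
// from the filled-in pattern pixels.
public class PatternGeometry {

    // filled: 0 = black, 255 = white; returns {xBar, yBar, theta} with theta in radians.
    public static double[] centerAndAngle(int[][] filled) {
        int rows = filled.length, cols = filled[0].length;
        double area = 0, mx = 0, my = 0;

        // Equations (9.5)-(9.7): area and first moments of the filled rectangle.
        for (int y = 0; y < rows; y++) {
            for (int x = 0; x < cols; x++) {
                if (filled[y][x] == 255) {
                    area += 1;
                    mx += x;
                    my += y;
                }
            }
        }
        double xBar = mx / area;                        // equation (9.8)
        double yBar = my / area;                        // equation (9.9)

        // Equations (9.10)-(9.12): central second moments.
        double mxx = 0, mxy = 0, myy = 0;
        for (int y = 0; y < rows; y++) {
            for (int x = 0; x < cols; x++) {
                if (filled[y][x] == 255) {
                    mxx += (x - xBar) * (x - xBar);
                    mxy += (x - xBar) * (y - yBar);
                    myy += (y - yBar) * (y - yBar);
                }
            }
        }

        // Equations (9.13)/(9.14) combined: atan2 handles M_xx == M_yy directly.
        double theta = 0.5 * Math.atan2(2 * mxy, mxx - myy);
        return new double[] { xBar, yBar, theta };
    }
}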
However, the system does not know whether the pattern is oriented in the positive or negative direction of the u-axis. The method for determining whether the pattern is oriented along the positive u-axis or the negative u-axis is explained in the next section.

9.8 Read Pattern

Once it has determined the center of the pattern and the angle that the u-axis makes with the horizontal axis, the system is able to read the pattern. The first step is to determine the length of the pattern. Starting at the center of the pattern, the system checks each pixel that lies on the u-axis. In each direction, it identifies the white pixel that is furthest away from the center of the pattern. The coordinate of the white pixel that is the furthest away along the positive u-axis is labeled (x_pos, y_pos). The coordinate of the white pixel that is the furthest away along the negative u-axis is labeled (x_neg, y_neg). The system then measures the horizontal and vertical distances, Δx and Δy respectively, between these two pixels using the equations:

Δx = x_pos − x_neg, and (9.15)
Δy = y_pos − y_neg. (9.16)

Using the coordinate of the center of the pattern, and the values of Δx and Δy, the system is able to calculate the location of the center of each of the five squares in the pattern. These locations are displayed in Table 9.1. Notice that the five squares are not labeled as bits zero through four. This is because it is not yet known whether the pattern is oriented along the positive or negative u-axis. The first square may correspond to bit zero or bit four, depending on the orientation of the pattern. Therefore, the locations of the squares, not the bits, are given in the table.

Square | Location of Center in Horizontal Direction | Location of Center in Vertical Direction
1 | x̄ − 2Δx/5 | ȳ − 2Δy/5
2 | x̄ − Δx/5 | ȳ − Δy/5
3 | x̄ | ȳ
4 | x̄ + Δx/5 | ȳ + Δy/5
5 | x̄ + 2Δx/5 | ȳ + 2Δy/5

Table 9.1: Locations of Centers of Squares in Pattern

At each of the five locations, the system checks to see if the pixel is white or black. It also checks to see if all of the pixel's neighbors are the same color as the pixel. For each square, if the pixel and all of its neighbors are white, then the value of the square is one. If the pixel and all of its neighbors are black, then the value of the square is zero. If the pixel is not the same color as all of its neighbors, then there is an error either in the pattern or in the location of the center of the square. Regardless of the reason, if the pixel is not the same as its neighbors, then the value of the square is negative one. After each of the squares has been given a value, the system performs the first test of the validity of the pattern. If any of the squares has a value of negative one, then the pattern is invalid. The second validity test makes sure that the pattern meets the requirements of a valid pattern according to the standards described in Chapter 7. Bit zero must be equal to one and bit four must be equal to zero. Therefore, if the first and fifth squares both have a value of zero, or both have a value of one, then the pattern is invalid. If the pattern passes these first two validity tests, then the system continues to gather information from the pattern. The next step is to determine how the pattern is oriented along the u-axis, and to determine the value of the offset angle δ, which can be either zero or 180 degrees. If the pattern can be read from bit zero to bit four in the positive direction of the u-axis, then δ is equal to zero.
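The square sampling just outlined, together with the validity tests and the offset determination that the following text completes, can be sketched as follows. The helper squareValue is a stand-in for the pixel-and-neighbor check described above, and bounds checks near the image edges are omitted.

// Reading the pattern (Section 9.8): sample the centers of the five squares
// along the u-axis, validate, resolve the 180-degree offset, and compute the
// side number with equation (7.1).
public class PatternReader {

    // Returns {sideNumber, deltaDegrees}, or null if any validity test fails.
    public static int[] read(int[][] binary, double xBar, double yBar,
                             double dxTotal, double dyTotal) {
        int[] squares = new int[5];
        for (int s = 0; s < 5; s++) {
            double f = (s - 2) / 5.0;                       // offsets -2/5 .. +2/5 (Table 9.1)
            squares[s] = squareValue(binary, xBar + f * dxTotal, yBar + f * dyTotal);
            if (squares[s] < 0) return null;                // first validity test
        }

        // Second validity test: bit 0 must be set and bit 4 unset, so the two
        // end squares must differ.
        if (squares[0] == squares[4]) return null;

        // If the first sampled square is bit 4 rather than bit 0, the pattern
        // reads backwards along the positive u-axis and delta is 180 degrees.
        boolean flipped = (squares[0] == 0);
        int deltaDegrees = flipped ? 180 : 0;
        int bit1 = flipped ? squares[3] : squares[1];
        int bit2 = squares[2];
        int bit3 = flipped ? squares[1] : squares[3];

        int sideNumber = 4 * bit1 + 2 * bit2 + bit3;        // equation (7.1)
        if (sideNumber < 1 || sideNumber > 6) return null;  // final validity test
        return new int[] { sideNumber, deltaDegrees };
    }

    // Returns 1 if the pixel and its eight neighbors are all white, 0 if all
    // black, and -1 if they disagree (bounds checks omitted for brevity).
    private static int squareValue(int[][] binary, double x, double y) {
        int cx = (int) Math.round(x), cy = (int) Math.round(y);
        int white = 0, black = 0;
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                if (binary[cy + dy][cx + dx] == 255) white++; else black++;
            }
        if (white == 9) return 1;
        if (black == 9) return 0;
        return -1;
    }
}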
In Figure 9.3, for which the u-axis of the pattern is shown in Figure 9.7, there is no offset. In this case, the first square corresponds to bit zero and the fifth square corresponds to bit four. If the pattern is read backwards in the positive direction of the u-axis, then 3 is equal to 180 degrees. This offset must be included in the orientation of the pattern. In Figure 9.8, for which the u-axis of the pattern is also shown in Figure 9.7, there is a 180-degree offset. In this case, the first square corresponds to bit four and the fifth square corresponds to bit zero. Figure 9.8: Pattern with 180-degree Offset Once the pattern has been used to determine its own orientation, the system uses it to determine the number of the side that is facing the camera. First, the bits are assigned 75 the values of the appropriate squares. If there is no offset between the pattern and the uaxis, then bits one, two, and three are assigned the values of the second, third, and fourth squares, respectively. If there is a 180-degree offset, then bits one, two, and three are assigned the values of the fourth, third, and second squares, respectively. Once the bits have been assigned their correct values, the system calculates the side number using equation (7.1). If the side number is less than one or greater than six, then the pattern is invalid. If the side number is within the bounds, then the pattern passes the final validity test. At this point, the image processing is complete and all of the data is gathered from the image. 76 Chapter 10: Implementation 10.1 Introduction I implemented a pose recognition system that incorporates all of the components discussed in the previous chapters. The objects and environment involved with the system had all of the characteristics discussed in Chapter 5. The environment was a black flat surface, and the objects were black-and-white boxes. Each side of the objects was labeled with a pattern according to the standards that I developed and described in Section 7.2. The radio frequency system determined the presence and identity of an object as discussed in Chapter 6. The vision system gathered the appropriate information mentioned in Chapter 7 using the image processing algorithms described in Chapter 9. This chapter discusses my implementation of the system, and how all of the information was combined to determine the pose of an object. 10.2 Hardware To implement the pose recognition system, I needed hardware to perform the necessary functions. The radio frequency identification system was composed of a reader, an antenna, and tags, all of which were manufactured by Intermec [44,45]. The antenna's dimensions were 9 inches by 9 inches, and the read range extended above the antenna up to 7 feet. The tags were passive, and when powered by the reader's signal responded with a 96-bit EPC. The vision system used a Veo Stingray webcam [46], which produced color images of various sizes. The system used the camera to capture color images composed of 320 columns and 240 rows of pixels. A standard personal 77 computer running Windows 2000 controlled all of the hardware using the software that will be discussed in the following sections. Figure 10.1 shows a schematic of the pose recognition system. Both the reader and the camera were connected to the host computer that ran all of the software. The camera was mounted on a shelf so that it was facing downward, and the reader was located to the side of the system. 
The reader's antenna was located below the camera so that its reader field was within the field of view of the camera. Because the antenna's surface was not flat, I constructed a small table that covered the antenna. The top of the table was covered with a piece of black paper and was used as the surface of the environment. Camera Reader Host Computer Antenna 0 0 Table Figure 10.1: Pose Recognition System Implementation Setup 78 10.3 Overview of Software The software was divided into the following parts: " Query Reader - The system queries the reader to see if there are any tags within its read range. If there are tags present, it returns the EPCs of the tags. " Access Physical Information - The system uses the EPC of the tag to determine the location of the object's physical information. It then parses the object's PML file and determines the dimensions of the object. " Image Capture - The system captures an image of the environment and the object that is present. * Image Processing - The system processes the image in order to gather important information that will be used to determine the pose of the object. " Combine Information - All of the information that has been gathered by the system is combined to determine the pose of the object with respect to the environment. The software for the pose recognition system was written in Java. The following sections will describe each of the parts of the software, including descriptions of the algorithms implemented and the Java Application Programming Interfaces (APIs) used. 10.4 Query Reader The system first queries the reader to see how many objects are present. Reader communication software was developed by OATSystems, Inc. [47]. When queried, the reader returns a list of the EPCs of the tags within its reader field. If the list is empty or there is more than one EPC, the system waits for a period of time, and then queries the 79 reader again. It will continue to do this until the reader responds with a list that contains only one EPC. When this occurs, the system acknowledges the identifying number in the list as the EPC of an object that is residing within the environment of the system. The system will only attempt to determine the pose of an object if there is only one object within its environment. There are a number of reasons for this restriction. First, the EPC is the only piece of information that the system uses to uniquely identify objects. The system is capable of reading multiple patterns in the same image. However, if the system were to attempt to determine the pose of two objects within the environment, it would not be able to match each of the patterns with the appropriate EPC. Therefore, the system would not be able to determine all of the information necessary for pose recognition. Second, the system avoids variables that it cannot take into account. For example, if two objects were stacked on top of each other within the environment, the pose of the object on top would not be accurately determined. While the system can determine some information about its position and all of the information about its orientation, it would not be able to determine the vertical position of the object. The system assumes that the object is lying on the surface of the environment. If this is not the case, the system cannot completely determine the pose of the object. Methods of dealing with multiple objects will be discussed in Chapter 11. 
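The polling behavior of this section reduces to a small loop. The RfidReader interface below is a placeholder for the OATSystems reader communication software, whose actual interface is not reproduced here.

// Polling loop of Section 10.4: wait until the reader reports exactly one EPC.
import java.util.List;

public class ReaderPoller {

    interface RfidReader { List<String> readEpcs(); }

    // Blocks until exactly one tag is in the reader field, then returns its EPC.
    public static String waitForSingleTag(RfidReader reader, long pollMillis)
            throws InterruptedException {
        while (true) {
            List<String> epcs = reader.readEpcs();
            if (epcs != null && epcs.size() == 1) {
                return epcs.get(0);
            }
            Thread.sleep(pollMillis);   // zero or multiple tags: wait and query again
        }
    }
}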
10.5 Access Physical Information Once the system has determined an object's presence and identity, it retrieves its geometry information. The system uses the EPC to link the object to its PML file that contains the length, width, and height values. 80 The Auto-ID Center uses the ONS to find the location of the PML file. However, since I am using a very small sample of objects for my system, I used a much simpler method of locating the information. Instead of using ONS, the filename of each PML files is the same as the EPC associated with that PML file. The system adds a suffix and a prefix to the EPC in order to generate the Uniform Resource Locator (URL) that points to the appropriate PML file. Figure 10.2 shows example of using a suffix and prefix, along with an EPC, to generate a URL. For a PML file located at: C:/Pose Recognition/Object Descriptions/ 354A57182C246FFFFFFFFFFFF.xml EPC = "354A57182C246FFFFFFFFFFFF"; Prefix = "file://C://Pose Recognition//Object Descriptions//"; Suffix = ".xmI"; URL = Prefix + EPC + Suffix; Figure 10.2: Generating a URL that points to a PML file Once the system has the URL of the PML file, it parses the file using both the Simple API for XML (SAX) [48] and the Xerces Apache parser [49]. Figure 10.3 shows the Document Type Definition (DTD) used to constrain the form of all of the PML files. Figure 10.4 shows an example of a valid PML file that meets the constraints of the DTD. As the file is being parsed, the system looks for particular element names and attributes in order to extract the desired information. If the element name is VAL, the attribute name is LABEL, and the attribute value is LENGTH, then the length of the object is set equal 81 to the value of that element. A similar search is done to determine the values of the width and height of the object. When the parsing is done, the system has values for each of the dimensions of the object. <!DOCTYPE PML [ <!ELEMENT PML (EPC+,VAL?)> <!ELEMENT EPC (#PCDATA)> <!ELEMENT VAL (#PCDATA)> <!ATTLIST VAL LABEL CDATA #IMPLIED M CDATA #IMPLIED ACC CDATA #IMPLIED> ]> Figure 10.3: PML DTD <PML> <EPC>354A57182C26FFFFFFFFFFFF</EPC> <VAL LABEL = "LENGTH" M = "1">1O</VAL> <VAL LABEL = "WIDTH" M = "1 ">5</VAL> <VAL LABEL = "HEIGHT"M = "1 ">2</VAL> </PML> Figure 10.4: Example PML file 82 10.6 Image Capture The camera was connected to the host computer via the USB port. I used the Java Media Framework (JMF) to connect to the camera and control it in order to capture an image of the object and the environment [50]. The JMF was the only Java API available for connecting to devices through the USB port. Fortunately, it is tailored for use with video and audio devices, but was not intended for capturing still images. In order to use the camera, the system first has to set the format of the device from which it will capture data. The JMF provides an application that was used to determine the name and format of the camera that was used. The system then determines the location of the media associated with that device. Once the system has completed this, it attempts to connect to the camera. If successful, the system captures video. While capturing video, the system isolates a single frame from the video, which it uses as a still image of the environment. This image is then processed to gather the majority of the information used to determine the pose of the object. 10.7 Image Processing The system uses the image processing techniques described in Chapter 9 to abstract information from the image. 
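Returning to the parsing of Section 10.5, the SAX portion reduces to a small handler. The sketch below assumes a PML file of the form shown in Figure 10.4; the handler records the text of each VAL element according to its LABEL attribute.

// Minimal SAX handler for the PML files of Section 10.5.
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class PmlDimensionsHandler extends DefaultHandler {

    private String currentLabel;
    private final StringBuilder text = new StringBuilder();
    public double length, width, height;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("VAL".equals(qName)) {
            currentLabel = attributes.getValue("LABEL");   // LENGTH, WIDTH, or HEIGHT
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int len) {
        text.append(ch, start, len);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("VAL".equals(qName) && currentLabel != null) {
            double value = Double.parseDouble(text.toString().trim());
            if ("LENGTH".equals(currentLabel)) length = value;
            else if ("WIDTH".equals(currentLabel)) width = value;
            else if ("HEIGHT".equals(currentLabel)) height = value;
            currentLabel = null;
        }
    }

    // Parse a PML file at the URL or path built from the EPC (Figure 10.2).
    public static PmlDimensionsHandler parse(String pmlPath) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        PmlDimensionsHandler handler = new PmlDimensionsHandler();
        parser.parse(new File(pmlPath), handler);
        return handler;
    }
}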
After completing the image processing, the system is able to determine the position (x̄, ȳ) and orientation (θ + δ) of the pattern within the image as defined in Chapter 9, and the number of the side facing the camera. The system is then ready to combine all of the information that it has gathered in order to determine the pose of the object.

10.8 Combine Information

Up to this point, the system has gathered a large amount of information about the object including:

- Object Dimensions
- Position of Pattern in Image
- Orientation of Pattern within Image
- Side facing the Camera

The system combines the preceding data to determine pose in the following manner. The first step in combining all of the information is to determine the position of the object along the environment's Z-axis. Given the number of the side that is facing the camera, the system is able to determine the dimensions of that side. The remaining dimension is the vertical distance h from the surface of the environment to the side that is facing the camera. The vertical position p_z of the object is half of this distance, because the object's coordinate frame is located at its geometric center. At this point, one of the degrees of freedom in the pose of the object has been determined. There are five remaining degrees of freedom.

The next step is to determine the position of the object along the environment's X- and Y-axes. These two coordinates are both determined based on the position of the pattern within the image. It is also necessary to know how much of the environment is within the camera's field of view, and the position of the field of view within the environment. Figure 10.5 shows a top view of the environment, along with the size (l_fov, w_fov) of the field of view and its position (x_fov, y_fov) with respect to the environment. If the vertical position p_z of the object was equal to zero, meaning that the pattern was on the surface of the environment, then p_x and p_y could be calculated using the equations:

p_x = x_fov + l_fov (x̄ / L), and (10.1)
p_y = y_fov + w_fov (ȳ / W), (10.2)

where L and W are the number of columns and rows, respectively, in the image.

Figure 10.5: Top View of Environment and Camera's Field of View

However, the varying height of the pattern affects the calculation of p_x and p_y. Figure 10.6 shows a side view of the environment and the camera's field of view. There are a number of values labeled in the figure that are used in the calculation of p_y.

Figure 10.6: Side View of Environment and Camera's Field of View

In addition to h, the height of the pattern above the surface of the environment, the values labeled include:

- α1: the angle of elevation from the front boundary of the field of view to the camera lens.
- α2: the angle of elevation from the back boundary of the field of view to the camera lens.
- w_fov: the width of the field of view at the surface of the environment as labeled in Figure 10.5.
- w_h: the width of the field of view at the height of the pattern.
- y_p: the perceived position, along the Y-axis, of the object, if no correction is made due to the height of the pattern.
- y_a: the actual position, along the Y-axis, of the object, after the correction for the height of the pattern has been made.

Since the camera is not positioned directly above the midpoint of the width of the field of view, the angles α1 and α2 are different.
Given all of these values, the position p_y of the object along the Y-axis can be calculated using the equations:

w_h = w_fov − h/tan(α1) − h/tan(α2), (10.3)
y_a = h/tan(α1) + w_h (ȳ / W), and (10.4)
p_y = y_fov + y_a. (10.5)

Figure 10.7 shows a front view of the environment and the camera's field of view. There are values in this figure that are used in the calculation of p_x. In addition to h, these include:

- β1: the angle of elevation from the left side boundary of the field of view to the camera lens.
- β2: the angle of elevation from the right side boundary of the field of view to the camera lens.
- l_fov: the length of the field of view at the surface of the environment as labeled in Figure 10.5.
- l_h: the length of the field of view at the height of the pattern.
- x_p: the perceived position, along the X-axis, of the object, if no correction is made due to the height of the pattern.
- x_a: the actual position, along the X-axis, of the object, after the correction for the height of the pattern has been made.

Since the camera is not positioned directly above the midpoint of the length of the field of view, the angles β1 and β2 are different.

Figure 10.7: Front View of Environment and Camera's Field of View

The position p_x along the X-axis can be calculated using the equations:

l_h = l_fov − h/tan(β1) − h/tan(β2), (10.6)
x_a = h/tan(β1) + l_h (x̄ / L), and (10.7)
p_x = x_fov + x_a. (10.8)

At this point, two more of the degrees of freedom in the pose of the object have been determined. The system has established the position vector P, as defined in Equation (2.1), that measures the position of the object with respect to the origin of the coordinate frame of the environment. There are three remaining degrees of freedom.

The system must now establish the rotation matrix R that represents the orientation of the object. The next step is to calculate each of the elements of the rotation matrix using the angle θ, the offset δ, and the number of the side that is facing the camera. Using these three values, the system is able to calculate all nine elements of R using Table 10.1. (Note that in the table, s(θ+δ) is equal to sin(θ+δ) and c(θ+δ) is equal to cos(θ+δ).)

Side Number | r11 r12 r13 | r21 r22 r23 | r31 r32 r33
1 | 0 0 1 | s(θ+δ) −c(θ+δ) 0 | c(θ+δ) s(θ+δ) 0
2 | 0 0 −1 | −s(θ+δ) c(θ+δ) 0 | c(θ+δ) s(θ+δ) 0
3 | −s(θ+δ) c(θ+δ) 0 | 0 0 1 | c(θ+δ) s(θ+δ) 0
4 | s(θ+δ) −c(θ+δ) 0 | 0 0 −1 | c(θ+δ) s(θ+δ) 0
5 | s(θ+δ) −c(θ+δ) 0 | c(θ+δ) s(θ+δ) 0 | 0 0 1
6 | −s(θ+δ) c(θ+δ) 0 | c(θ+δ) s(θ+δ) 0 | 0 0 −1

Table 10.1: Calculation of the Elements of the Rotation Matrix

Now, the system has determined the last three degrees of freedom. The side number is used to limit two of the degrees of freedom. After that, θ and δ are used to limit the last degree of freedom. The system has completely determined the pose of the object, which can be represented by the position vector P and the rotation matrix R.

Chapter 11: Analysis

11.1 Introduction

When measuring the performance of my pose recognition system, I considered accuracy, repeatability, and scalability. For the accuracy, I measured the difference between the actual position and orientation of an object and the values calculated by the system. For the repeatability, I compared outputs over multiple runs. For the scalability, I considered how certain aspects of the system could be changed to accommodate its use in a real-world application.
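Before turning to the results, the combination step of Section 10.8 can be condensed into a short sketch. The calibration values (x_fov, y_fov, l_fov, w_fov and the four elevation angles) are assumed to have been measured for the setup, angles are in radians, and the rotation matrices follow Table 10.1.

// Combining the measurements of Section 10.8 into a position and a rotation.
public class PoseCombiner {

    // Equations (10.3)-(10.8): height-corrected position of the object center.
    public static double[] position(double xBar, double yBar, int imageCols, int imageRows,
                                    double h,                      // height of the pattern above the surface
                                    double xFov, double yFov, double lFov, double wFov,
                                    double alpha1, double alpha2, double beta1, double beta2) {
        double wh = wFov - h / Math.tan(alpha1) - h / Math.tan(alpha2);   // (10.3)
        double ya = h / Math.tan(alpha1) + wh * (yBar / imageRows);       // (10.4)
        double py = yFov + ya;                                            // (10.5)

        double lh = lFov - h / Math.tan(beta1) - h / Math.tan(beta2);     // (10.6)
        double xa = h / Math.tan(beta1) + lh * (xBar / imageCols);        // (10.7)
        double px = xFov + xa;                                            // (10.8)

        double pz = h / 2.0;            // object frame at the geometric center
        return new double[] { px, py, pz };
    }

    // Rotation matrix of Table 10.1 for sides 1-6, with s = sin(theta + delta)
    // and c = cos(theta + delta).
    public static double[][] rotation(int side, double theta, double delta) {
        double s = Math.sin(theta + delta), c = Math.cos(theta + delta);
        switch (side) {
            case 1: return new double[][] { { 0, 0, 1 }, { s, -c, 0 }, { c, s, 0 } };
            case 2: return new double[][] { { 0, 0, -1 }, { -s, c, 0 }, { c, s, 0 } };
            case 3: return new double[][] { { -s, c, 0 }, { 0, 0, 1 }, { c, s, 0 } };
            case 4: return new double[][] { { s, -c, 0 }, { 0, 0, -1 }, { c, s, 0 } };
            case 5: return new double[][] { { s, -c, 0 }, { c, s, 0 }, { 0, 0, 1 } };
            case 6: return new double[][] { { -s, c, 0 }, { c, s, 0 }, { 0, 0, -1 } };
            default: throw new IllegalArgumentException("side must be 1-6");
        }
    }
}

Each of the six matrices has unit determinant, so the side number and the single angle θ + δ together account for the three rotational degrees of freedom.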
11.2 Accuracy When the system determines values for px, p,, and 0, there is some error introduced in the measurements. The system is able to determine the positions along the X and Y-axes of the environment within about 0.2 inches of the actual positions, and can determine the object's orientation about the Z-axis of the environment within about 1 degree of the actual orientation. The inaccuracy of the system is primarily due to the mounting of the camera, the equations used to combine sensor information to determine pose, and the calibration of the system. In my implementation, the camera was not mounted so that it was centered above the field of view. As is shown in Figure 10.6 and 10.7, the camera is mounted so that it views the environment at an angle. By mounting the camera in this manner, the field of view resembles a trapezoid instead of the rectangular shape that was assumed and used to develop the equations for combining information. Therefore, there is some error 91 introduced into the equations because of the incorrect assumption of the shape of the field of view. This could be corrected by introducing image processing techniques to correct the image, or by changing the equations to better represent the appropriate shape of the field of view. A simpler correction would be to mount the camera so that it is centered above and perpendicular to the surface of the environment, resulting in a rectangular field of view. Another source of error was the imprecise calibration of the system. It was difficult to determine correct values for the size (1fov, angles of elevation (a1 , a 2 , ,1,62) Wf,) , position (xf 0y), V, and of the field of view. The equations containing these values added to the inaccuracy of the information computed by the system. 11.3 Repeatability The ability to operate consistently is another important aspect of the system. Every time that the system determines the pose of an object, it should perform in the same manner. The system should be able to give similar results when repeatedly determining the pose of an object that remains in the same location. The only source of inconsistency in the operation of the pose recognition system is the image capture component. The webcam, which is used as the image capture device, has an auto white balance feature that may be desirable for its intended use, but can cause problems when the camera is used with a vision system. If the camera takes two pictures of the environment, with the same object present and the same lighting conditions, the two images may be different. The object will appear in the same location, but one of the images may be lighter or darker than the other. For the most part, the variations in the images do not affect the performance of the system. Occasionally, the auto white balance 92 can create an image from which it is difficult to extract data. The use of a camera that is intended for machine vision systems would improve the repeatability of the pose recognition system. The rest of the components of the system perform their functions consistently. The RFID system provides the EPC of the tag that is located within the reader field. Given an EPC, the system is able to access the appropriate file and determine the object's dimensions. While the image processing and information combination algorithms are reliable, their output is a result of the inputted image from the webcam. 
Therefore, the result of the variations in the image is seen in the output of these algorithms, but these portions of the system operate consistently. 11.4 Scalability There are a number of aspects of the current implementation, including size and color constraints, which would make implementation in a warehouse impossible. The ability to make changes to the system, without wholly changing the design, is a measure of the scalability of the system. The size of the environment is limited by the range of the RFID reader and the field of view of the camera. For my implementation, the dimensions of the environment were approximately 12 inches by 16 inches, and the field of view of the camera was slightly smaller than the environment. Because of the limitations on the size of the environment, the size of the objects also was limited. For each of the objects tested with the system, the longest dimension was less than 6 inches. It would be difficult to consistently and accurately determine the pose of any objects larger than the ones tested. 93 Because of this size constraint on both the environment and the objects, the system could not realistically be implemented in any current warehouses. The size of the environment and the objects could be increased drastically by using a reader with a larger read range and mounting the camera so that the field of view is larger. As long as the image capture device's field of view can completely encompass the field of view of a single reader, one camera could be used in conjunction with multiple readers to greatly increase the range of the system. The camera could be mounted on a moving platform so that it could move horizontally above all of the read ranges. When an object entered one of the read ranges, the platform could move to position the camera directly above the center of that read range. Therefore, the size of the implementation is scalable to meet the needs of realworld applications. The color constraint is another limitation of the system, and for a real-world implementation, the system would need to operate under reasonable color constraints. By using better pattern extraction algorithms and a better camera, the system could be made to find the pattern on the side of a box that includes many colors. In the current implementation, the pattern extraction algorithms are rather weak. The pattern is not very complicated, so it is difficult to test the different connections to see if they are a valid pattern. If the pattern were more complicated, it would be easier to isolate the pattern. However, in order to capture the details of the pattern, it would be necessary to have a camera that has a higher resolution. However, by increasing the resolution of the camera, the number of pixels in the image is increased. Using algorithms similar to the ones in the current implementation, it could take much longer for the system to operate. The algorithm that separates all of the white pixels into collections is the most time 94 consuming, with a running time of O(n 2 ). It might be necessary to implement a faster algorithm to perform this function. Another way to determine the pose of color objects would be to print the patterns in fluorescent ink. The vision system would then make use of ultraviolet light in order to view the pattern. Additionally, the location of the pattern might be moved from the center of the side to one of the corners of the side, so that the pattern would not get in the way of the writing on the packaging. 
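One standard candidate for such a faster algorithm is a single raster scan that merges provisional labels with a union-find (disjoint-set) structure, which runs in effectively linear time in the number of pixels. The following sketch illustrates the idea.

// Faster collection labeling: raster scan with union-find label merging.
public class FastLabeling {

    private final int[] parent;

    FastLabeling(int maxLabels) {
        parent = new int[maxLabels];
        for (int i = 0; i < maxLabels; i++) parent[i] = i;
    }

    int find(int a) {                                    // with path compression
        while (parent[a] != a) { parent[a] = parent[parent[a]]; a = parent[a]; }
        return a;
    }

    void union(int a, int b) { parent[find(a)] = find(b); }

    // Returns a label per pixel (0 = background); pixels sharing a root label
    // after the second pass form one collection.
    public static int[] label(int[][] binary) {
        int rows = binary.length, cols = binary[0].length;
        int[] labels = new int[rows * cols];
        FastLabeling sets = new FastLabeling(rows * cols + 1);
        int next = 1;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (binary[r][c] != 255) continue;
                int[] prior = {                                              // already-labeled 8-neighbors
                    (c > 0) ? labels[r * cols + c - 1] : 0,                  // left
                    (r > 0) ? labels[(r - 1) * cols + c] : 0,                // up
                    (r > 0 && c > 0) ? labels[(r - 1) * cols + c - 1] : 0,   // up-left
                    (r > 0 && c < cols - 1) ? labels[(r - 1) * cols + c + 1] : 0  // up-right
                };
                int label = 0;
                for (int p : prior) if (p != 0) { label = p; break; }
                if (label == 0) label = next++;
                labels[r * cols + c] = label;
                for (int p : prior) if (p != 0 && p != label) sets.union(p, label);  // merge collections
            }
        }
        for (int i = 0; i < labels.length; i++)
            if (labels[i] != 0) labels[i] = sets.find(labels[i]);  // second pass resolves merged labels
        return labels;
    }
}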
The color constraints on the objects can be removed by embedding more detail in the pattern, using a stronger image capture device, and implementing faster image processing algorithms. For a real-world implementation, the system would need to be able to determine the pose of multiple objects at the same time. To do this, the system would need to be able to match an EPC with a pattern so that it could perform the appropriate calculations. This could be accomplished by introducing an edge-detection algorithm that can determine two of the dimensions of each object within the range of the system. By comparing these dimensions with the information in each PML file, the system would be able to match each pattern with its corresponding EPC. The scalability of the system is the most important measurement of the pose recognition system. By making the changes mentioned above, a system incorporating the overall design of my pose recognition system should be capable of performing the desired task. 95 96 Chapter 12: Conclusion The primary application of the Auto-ID Center's infrastructure is in the supply chain. By using radio frequency tags to uniquely identify objects, warehouse inventory control becomes much easier. However, in order to make further use of the technology, additional information must be supplied. For example, in order to automate the movement or manipulation of objects in a warehouse, an automation system requires information about the position and orientation of the objects. The pose recognition system that I have designed will be capable of completing this task. As mentioned in Section 11.4, the current implementation of the system would be incapable of performing such a task. However, by making the changes in the hardware and software discussed in the previous chapter, a system could be created that would possess the necessary functionality. While incorporating different components, the system would be based on the same design as the one developed and described in this thesis. The design is based on combining the Auto-ID infrastructure with a vision system. The system determines the pose of the object by gathering information in order to gradually limit the possible location of the object. Before encountering the system, the location of the object is completely unknown. Once it passes into the range of the RFID system, the location of the object is limited to the space contained within the read range of the RF reader. The design of the pose recognition system requires that the field of view of the machine vision system completely encompass the range of the reader. Therefore, every object that is read by the RFID system can also be viewed by the 97 machine vision system. The machine vision system then gathers information that is used to narrow the location of the object even further. The system can accurately determine the pose of the object with respect to its environment. Once such information is known, it can be used by an automation system to manipulate the object. Further research on this topic would show the usefulness of such a system in warehouse automation. The first step would be to develop a full-scale implementation of the pose recognition system by incorporating some of the changes mentioned in Section 11.4. This would show the scalability of the design and the ability of the system to operate under real-world conditions. The next step would be to incorporate an automation system into the implementation. 
The pose recognition system would be used to provide pose information about the objects, which would subsequently be manipulated by the automation system. This would show the ability of the system to provide the functionality for which it was designed. 98 References [1] Auto-ID Center - Home, "Vision", <http://www.autoidcenter.org/homevision.asp>. [2] James H. Williams, Jr., Fundamentalsof applied dynamics, John Wiley & Sons, Inc., New York, NY, 1996. [3] John J. Craig, Introduction to robotics, Addison-Wesley Publishing Company, Reading, MA, 1989. [4] Kenneth J. Waldron and Gary L. Kinzel, Kinematics, Dynamics, andDesign of Machinery, John Wiley & Sons, Inc., New York, NY, 1999. [5] Intermec Technologies Corporation, "RFID overview: introduction to radio frequency identification", Amtech Systems Corporation, 1999, <http://epsfiles.intermec.com/epsfiles/epswp/radiofrequencywp.pdf>. [6] Frontline Solutions Website: RFID Online Source Book, "Understanding radio frequency identification (RFID) - FAQ's, applications, glossary", Advanstar Communications Inc., 2000, <http://www.frontlinemagazine.com/rfidonline/wp/101 7.htm>. [7] The Association of the Automatic Identification and Data Capture Industry, "Radio frequency identification (RFID): a basic primer", AIM Inc., 2001, <http://www.aimglobal.org/technologies/rfid/resources/RFIDPrimer.pdf>. [8] Anthony Sabetti, Texas Instruments, "Applications of radio frequency identification (RFID)", AIM Inc., <http://www.aimglobal.org/technologies/rfid/resources/papers/applicationsofrfid.htm>. [9] Intermec Technologies Corporation, "RFID overview: introduction to radio frequency identification", Amtech Systems Corporation, 1999, p. 4, <http://epsfiles.intermec.com/epsfiles/epswp/radiofrequency_ wp.pdf>. [10] Daniel W. Engels, Tom A. Scharfeld, and Sanjay E. Sarma, "Review of RFID Technolgies", MIT Auto-ID Center, 2001. [11] Clark Richter, "RFID: an educational primer", Intermec Technologies Corporation, 1999, <http://epsfiles.intermec.com/eps _files/epswp/rfidwp.pdf>. [12] Susy d'Hont, "The Cutting Edge of RFID Technology and Applications for Manufacturing and Distribution", Texas Instruments TIRIS, <http://www.rfidusa.com/pdf/manuf dist.pdf>. 99 [13] Klaus Finkenzeller, RFID Handbook: Radio-frequency identificationfundamentals and applications,John Wiley & Son, Ltd, New York, NY, 1999. [14] Ching Law, Kayi Lee, and Professor Kai-Yeung Siu, Efficient memoryless protocol for tag identification, MIT Auto-ID Center, 2000, <http://www.autoidcenter.org/research/MIT-AUTOID-TR-003.pdf>. [15] Destron Fearing, Electronic ID, <http://www.destron-fearing.com/elect/elect.html>. [16] Massachusetts Turnpike Authority, FAST LANE, "Overview", <http://www.mtafastlane.com/>. [17] Peter K. Allen, Robotic object recognition using vision and touch, Kluwer Academic Publishers, Boston, MA, 1987. [18] Michael C. Fairhurst, Computer visionfor robotic systems, Prentice Hall, New York, NY, 1988. [19] Robin R. Murphy, Introductionto AI robotics, The MIT Press, Cambridge, MA, 2000. [20] C.J. Page and H. Hassan, "The orienation of difficult components for automatic assembly", Robot Sensors, Volume 1 - Vision, IFS (Publications) Ltd, UK, 1986. [21] Martin Berger, Gernot Bachler, Stefan Scherer, and Axel Pinz, "A vision driven automatic assembly unit: pose determination from a single image", Institute for Computer Graphics and Vision, Graz University of Technology, Graz, Austria, 1999, <http://www.icg.tu-graz.ac.at/bachler99a/caip99_gb.pdf>. [22] Marcus A. 
Magnor, "Geometry-based automatic object localization and 3-d pose detection", Computer Graphics Lab, Stanford University, Stanford, CA, 2002, <http://www.mpi-sb.mpg.de/~magnor/publications/ssiai02.pdf>. [23] Dongming Zhao, "Object pose estimation for robotic control and material handling", Report Brief, Center for Engineering Education and Practice, University of Michigan-Dearborn, 2000, <http://www.engin.umd.umich.edu/ceep/techday/2000/reports/ECEreport6/ECEreport6. htm>. [24] S. R. Ruocco, Robot sensors and transducers,Halsted Press, New York, NY, 1987. [25] N. Sato, "A method for three-dimensional part identification by tactile transducer", Robot Sensors, Volume 2 - Tactile & Non- Vision, IFS (Publications) Ltd, UK, 1986. [26] Philippe Coiffet, Robot technology, volume 2: Interaction with the environment, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1983. 100 [27] Ren-Chyuan Luo, Fuling Wang, and You-xing Liu, "An imaging tactile sensor with magnetostrictive transduction", Robot Sensors, Volume 2 - Tactile & Non- Vision, IFS (Publications) Ltd, UK, 1986. [28] Trimble, All About GPS, "How GPS Works", Trimble Navigation Limited, 2002, <http://www.trimble.com/gps/how.html>. [29] B. Hofimann-Wellenhof, H. Lichtenegger, and J. Collins, GPS: theory andpractice, Springer-Verlag, New York, NY, 2001. [30] Garmin, About GPS, "What is GPS?", Garmin Ltd., 2002, <http://www.garmin.com/aboutGPS/>. [31] Trimble, All About GPS, "Differential GPS", Trimble Navigation Limited, 2002, <http://www.trimble.com/gps/dgps.html>. [32] Starlink Incorporated, DGPS Info, "DGPS Explained", Starlink Incorporated, 1999, <http://www.starlinkdgps.com/dgpsexp.htm>. [33] AIM, "Real Time Locating Systems (RTLS)", AIM Inc., 2000, <http://www.aimglobal.org/technologies/rtls/default.htm>. [34] AIM, Real Time Locating Systems (RTLS), "Frequently asked questions", AIM Inc., 2000, <http://www.aimglobal.org/technologies/rtls/rtlsfaqs.htm>. [35] Jim Geier and Roberta Bell, "RTLS: An eye on the future", Supply Chain Systems Magazine, Peterborough, NH, 2001, <http://www.idsystems.com/reader/2001/2001_03/rtlsO3O1/>. [36] David L. Brock, The electronicproduct code (EPC): a naming scheme for physical objects, MIT Auto-ID Center, 2001, <http://www.autoidcenter.org/research/MITAUTOID-WH-002.pdf>. [37] David L. Brock, The physical markup language,MIT Auto-ID Center, 2001, <http://www.autoidcenter.org/research/MIT-AUTOID-WH-003.pdf>. [38] David L. Brock, Timothy P. Milne, Yun Y. Kang, and Brendon Lewis, The physical markup language, core components: time andplace,MIT Auto-ID Center, 2001, <http://www.autoidcenter.org/research/MIT-AUTOID-WH-005.pdf>. [39] Joseph Timothy Foley, "An infrastructure for electromechanical appliances on the internet", M.Eng. Thesis, MIT, Cambridge, MA, 1999. [40] Hany Farid and Alin C. Popescu, "Blind Removal of Lens Distortion", Journal of the Optical Society of America, 2001, <http://www.cs.dartmouth.edu/-farid/publications/josa01.pdf>. 101 [41] Paul F. Whelan and Derek Molloy, Machine Vision Algorithms in Java, SpringerVerlag, London, UK, 2001. [42] Robert Fisher, Simon Perkins, Ashley Walker, and Erik Wolfart, Hypermedia Image ProcessingReference, "Intensity Histogram", 2000, <http://www.dai.ed.ac.uk/HIPR2/histgram.htm>. [43] Berthold Klaus Paul Horn, Robot Vision, The MIT Press, Cambridge, MA, 1986. 
[44] Intermec - Products, "915 MHz Tag for RPC", Intermec Technologies Corporation, 2002, <http://home.intermec.com/eprise/main/Internec/Content/Products/ProductsShowDetail ?section=Products&Product=RFID1_03&Category-RFID&Family-RFID1>. [45] Intermec - Products, "UHF OEM Reader", Intermec Technologies Corporation, 2002, <http://home.intermec.com/eprise/main/Intermec/Content/Products/ProductsShowDetail ?section=Products&Product=RFID2_02&Category=RFID&Family-RFID2>. [46] Veo, "Products: Stingray", Xirlink Inc., 2002, <http://www.veoproducts.com/Stingray/stingray.asp>. [47] Oatsystems Inc., <http://www.oatsystems.com/>. [48] David Megginson, "About SAX", <http://www.saxproject.org/>. [49] The Apache XML Project, "Xerces Java Parser Readme", The Apache Software Foundation, 2000, <http://xml.apache.org/xerces-j/index.html>. [50] Java, "Java Media Framework API", Sun Microsystems Inc., 2002, <http://java.sun.com/products/j ava-media/jmf/index.html>. 102