ACE Project Report
October 31, 2008
Reid Simmons, Sanjiv Singh, Bradley Hamner, Seth Koterba
Robotics Institute, Carnegie Mellon University

1. Introduction
This report covers the period from September 1, 2008 through October 31, 2008. During this period, we began looking at the problem of tracking a moving target, in preparation for next period's work on doing insertions into a moving target. The foci this period were (1) developing a mechanism for moving the task board, (2) speeding up the vision system, (3) developing a method to track the task board using the laser, and (4) developing algorithms to track the moving task board, using either vision or laser, and doing experiments to demonstrate how well the tracking works. Based on the results of this period, we are in good shape to complete our deliverable for next period – demonstrating plug insertion into a task board moving at (roughly) constant speed.

2. Moving Task Board
To simulate a moving assembly line, we bolted the task board onto the back of Clifford, an ATRV-Jr. robot on loan to us from NASA (Figure 1). The robot's wheels are actuated, and the task board trails behind on caster wheels. Control programs had been written for Clifford, but none were directly compatible with our software. To bridge the gap, we wrote a program that listens for velocity commands over IPC and relays them to the robot hardware. This allows us to start and stop the "assembly line" robot as well as the docking robot from one computer. With a series of velocity commands to Clifford, we can simulate any time-velocity profile of our choosing. In our work so far, though, we have commanded Clifford to drive at a constant velocity of 3 cm/s. The velocity does vary, however, due to imperfections in the floor, the casters on the task board, and the connection between the task board and Clifford. In one test, for instance, we observed that the task board took 97 seconds to move three meters, for an average of 3.09 cm/s. We believe that this variation is actually a good thing for our purposes, since it will force the mobile manipulator to rely on its sensors to estimate velocity in real time, rather than depending on some predefined velocity profile.

Figure 1. Clifford, an ATRV-Jr. robot, pulls the task board.

3. Speeding up Vision System
In order to accurately and responsively track the moving task board, we anticipated a need to speed up our current fiducial pose estimation algorithm. The typical frequency throughout much of the last year of development was approximately 3-5 Hz. Our goal was to increase this to at least 10 Hz. The primary bottleneck in the pose estimation process is the fiducial detection step. Currently we use ARTag, a closed-source, third-party software package. Because it is closed source, there is no easy way to rewrite and optimize the code; therefore, we chose to improve speed by passing it smaller images to process. Prior to our speed-up attempts, ARTag was processing two full 1024 x 768 pixel images (left and right) during each iteration. However, each image contained large portions of uninteresting data that did not contribute to the solution. The goal of the speed-up was to chop out the uninteresting portions of the image and pass less data into the ARTag detection function. To do this, we need some estimate of where the fiducials are in the image. Therefore, the first iteration performs a normal full-image search.
At the end of this iteration, the four corners of each found fiducial are stored for each image. On subsequent iterations, we use the corner data from the previous iteration as a hint for where to find the fiducials in the current image pair. Due to another limitation of the ARTag library, we were not able to process a dynamically changing window size. Therefore, we chose to initialize four separate instances of the ARTag library: one instance accepts images at the full 1024 x 768 size; the other three instances accept images that are integer divisions of the full image size – 1/3, 1/5, and 1/7 in each dimension. Thus, the number of pixels to process is 1/9, 1/25, and 1/49 of that for the full image (Figure 2-A). These sizes were chosen empirically to capture the approximate size of fiducials at various stages of the current scenario. At 2.0 m, a typical 10 cm fiducial occupies less than 1/50 of the image, so the 1/7-size sub-image can be used. At typical docking ranges, a fiducial will fill less than 1/9 of the image, and the 1/3-size sub-image can be used.
With the corner hints from the previous iteration, we can then compute the size of sub-image to use. We compute the width and height of the fiducial in image coordinates, take the larger of the two, and multiply by 1.5 (empirically determined) to ensure enough buffer to account for motion between iterations. If the result is smaller than 1/7 of the full image, the smallest sub-image is used. If not, we check progressively larger sub-images. Ultimately, a sub-image size is selected and positioned within the full image. The positioning is performed by computing the centroid of the corner hints and centering the sub-image at that centroid. This window is then stored as an image and passed along to the appropriate ARTag instance for processing.
If the ARTag library does not detect a fiducial in the sub-image, it is possible that the fiducial is occluded, or that the fiducial has moved enough from the previous iteration that it no longer falls inside the sub-image. To improve detection rates, we then search the next-largest sub-image size, continuing up to the full image size if detection still does not occur. This process is repeated for all fiducials that were detected in the previous iteration, whether that is just one fiducial or many (Figure 2-B). Since many sub-windows will be processed when many fiducials are observed, it is necessary to check that a fiducial is not detected more than once. Given the additional buffer that is added when determining the sub-image size, it is conceivable that a fiducial might fall within two different sub-images (note the example on the right in Figure 2-B). We handle this simply by keeping track of which fiducials have already been found.
Finally, since we are processing only sub-images centered on where we saw a fiducial in the previous image, it is unlikely that new fiducials coming into view will be detected. To enable discovery of new fiducials, we perform a full-image search every so often. Currently, we do a full search once every 40 iterations, which means that a full-image search is performed approximately every 2 seconds. This approach resulted in a significant speed-up over the original full-image search. Currently we can achieve fiducial detection at up to 20 Hz, which is the frame rate of the Bumblebee cameras that we are using.
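For concreteness, the sketch below shows one way the sub-image selection rule described above could be implemented, using the sizes and the 1.5 buffer factor from the text. The function name, the clamping of the window to the image bounds, and the tie-breaking details are our own illustrative choices, not the project's actual code.

```python
def choose_subimage(corners, full_w=1024, full_h=768, scales=(7, 5, 3, 1), margin=1.5):
    """Pick a sub-image size and placement from the previous iteration's
    fiducial corners (a list of (x, y) pixel coordinates).

    Takes the larger of the fiducial's width and height, multiplies by a 1.5
    buffer, and selects the smallest of the fixed 1/7, 1/5, 1/3, or full-size
    windows that contains it, centered on the corner centroid.
    """
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    extent = margin * max(max(xs) - min(xs), max(ys) - min(ys))
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    for s in scales:                      # try 1/7, 1/5, 1/3, then the full image
        w, h = full_w // s, full_h // s
        if extent <= min(w, h) or s == 1:
            # Center the window on the centroid, clamped to the image bounds.
            x0 = min(max(int(cx - w / 2), 0), full_w - w)
            y0 = min(max(int(cy - h / 2), 0), full_h - h)
            return s, (x0, y0, w, h)
```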
Figure 2. A) Selected sub-image sizes: 1/7, 1/5, 1/3, and the full 1024 x 768 image. B) Fiducials are shown in grey, with a sub-image centered on each fiducial.

4. Laser Tracking
In addition to tracking the task board visually, we began developing an algorithm that uses the laser to track the task board. In order to detect the task board reliably, we added pieces of sheet metal to create a "laser fence" at the height of the laser (Figure 3). The fence is approximately linear, with a few salient corners. A model consisting of points spaced every 5 mm was created to replicate this shape.
Tracking with the laser requires a way to convert raw laser hits into a pose representing the transformation from the laser to the task board. The raw laser data is initially collected as a series of range measurements. Using the known minimum and maximum angles of the scan range and the angular offset between individual range measurements, we first convert the laser data into Cartesian coordinates. Initially, we decided to use the Iterative Closest Point (ICP) algorithm, a technique for matching two point clouds; often the matching is performed between observed data and a model [Besl & McKay, 1992]. ICP is a common algorithm, and we found a lightweight implementation of it freely available online. Initial tests revealed that it worked well, but only when the offset between data and model was small, in particular less than 5-10 cm and less than 10 degrees. Additionally, if the data included any laser hits that did not correspond to the task board, the ICP match was poor. Thus, we concluded that we needed to perform some initial preprocessing of the data prior to using ICP. The following describes the stages of preprocessing that we are currently using to obtain clean data.

Figure 3. Laser fence on the task board.

The first preprocessing step is to mask out any laser returns in regions that we know the task board will not occupy. The laser has a 270 degree field of view and can detect hits out to 30 meters; since we are only interested in data within a few meters of the laser, and nothing directly to the sides of or behind it, much of this space can be trimmed away.
After the relevant data has been pared down, we then identify the points that correspond to the task board. This is done in a two-step process and takes advantage of the linearity of the laser fence on the task board. First, we use a technique based on the well-known RANSAC algorithm [Fischler & Bolles, 1981]. We search through all remaining points sequentially, starting with the point corresponding to the counterclockwise-most scan. Taking this point and a second point, generated from the range measurement N scans ahead in the clockwise direction (N was chosen to be 40), we generate a line. Next, we check the perpendicular distance to this line for all points. If this distance is within the prescribed threshold (currently 15 cm), we count that point as a hit on that line. By sequentially stepping through the laser data in increments of M scans (M = 5), we generate a series of lines, and the one that receives the most hits is considered to be roughly aligned with the task board. This assumes that the task board is the longest linear feature in the data, an assumption that should always hold after the data has been pared down as described above. Figure 4 shows some example line fits.

Figure 4. A) Raw laser data of the task board. B) and C) Two examples of RANSAC-style fitting. The line in C has more hits than the line in B, and so is more likely associated with the task board's laser fence.
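The following sketch illustrates the RANSAC-style line search described above, using the parameter values from the text (N = 40, M = 5, 15 cm inlier threshold). The function and variable names are ours, and the real implementation may organize the loop differently.

```python
import math

def find_board_line(points, n_ahead=40, step=5, inlier_dist=0.15):
    """RANSAC-style search for the dominant line in a masked laser scan.

    points: Cartesian (x, y) laser hits, ordered by scan angle
    (counterclockwise-most first).  Returns the (start index, hit count)
    of the candidate line with the most inliers.
    """
    best = (None, -1)
    for i in range(0, len(points) - n_ahead, step):
        (x1, y1), (x2, y2) = points[i], points[i + n_ahead]
        length = math.hypot(x2 - x1, y2 - y1)
        if length == 0:
            continue
        hits = 0
        for (px, py) in points:
            # Perpendicular distance from the point to the candidate line.
            d = abs((x2 - x1) * (y1 - py) - (x1 - px) * (y2 - y1)) / length
            if d < inlier_dist:
                hits += 1
        if hits > best[1]:
            best = (i, hits)
    return best
```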
Figure 5. Clustering result with additional clutter and people near the task board.

Once the algorithm finds a rough line through the task board, it needs to cluster the points corresponding to the task board. We start with the two points that we used to generate the line that had the most hits, and find a point halfway between them. Starting from there, we step sequentially in each direction from point to point, checking whether each subsequent point is within a distance threshold of the previous point (currently 10 cm). It is possible that a bad reading might cause a single point to be incorrect, so we check at least two additional points past the first bad point to ensure that all task board points are found. After this stage of processing, the algorithm has found all points that correspond to the task board. In practice, this clustering is quite robust and works well even in difficult conditions, such as when people stand near the task board (see Figure 5).
Once the task board cluster is identified, and prior to running ICP, we perform a rough transformation to align the cluster with the model. Using all points in the cluster, we fit a line to the data. Using this line and the centroid of the data, we can translate and rotate the cluster for a rough fit. Finally, ICP is performed to match the roughly aligned cluster to the model. Figure 6 shows an example of the data and model alignment before and after ICP.
Even though the data is defined to be planar in the x-y plane, ICP occasionally introduces a 180 degree roll about the x-axis. When combined with the other transforms used to compute tracking motion commands, this flip can cause problems, so the current algorithm checks for the flip and corrects it. In addition, several other checks are in place to reduce the possibility of sending a bad pose to the subsequent control blocks. First, if the number of data points in the cluster is too small (fewer than 50), we do not return a pose; the assumption is that the clustering algorithm selected the wrong group of data. Second, we perform an error check by averaging the distance between each point in the final cluster fit and the closest point in the model. If this average distance is greater than a threshold, we assume a poor fit and do not return a pose. The current threshold of 2 cm was selected experimentally.

Figure 6. Top) Rough alignment before ICP. Bottom) Alignment following ICP.

Although the laser modeling algorithm has proven successful when the data is well behaved, we have discovered at least two problems that can cause poor results. The first problem is one of alignment. The laser fence is approximately 10 cm high. If the laser is farther than 1-1.5 meters from the task board, the alignment of the laser, the levelness of the floor, and the weight distribution of the robot can easily cause the laser to miss the fence and strike the base of the task board instead (Figure 7, top). Although we have seen detections at ranges approaching 3 m, we could not reliably repeat detections at this distance due to variability in alignment. A simple solution is to build a taller fence, and this will be done during the next work period.

Figure 7. Top) Example of the laser alignment problem, resulting in laser hits on the task board itself rather than on the laser fence. Bottom) Example of "bowed" laser data, due to noise and bias.
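Returning to the clustering and validity checks described earlier in this section, the sketch below shows one way those steps could be coded, using the thresholds from the text (10 cm neighbor spacing, a minimum of 50 cluster points, and a 2 cm average residual). The structure, helper names, and bad-reading tolerance are illustrative, not the actual implementation.

```python
import math

def extract_cluster(points, seed_idx, gap=0.10, max_skips=2):
    """Grow the task-board cluster outward from a seed point near the
    midpoint of the best line, stepping point to point along the scan and
    tolerating a couple of bad readings before giving up."""
    def grow(direction):
        kept, last, skips = [], points[seed_idx], 0
        i = seed_idx + direction
        while 0 <= i < len(points) and skips <= max_skips:
            p = points[i]
            if math.dist(p, last) <= gap:
                kept.append(p)
                last, skips = p, 0
            else:
                skips += 1
            i += direction
        return kept
    return list(reversed(grow(-1))) + [points[seed_idx]] + grow(+1)

def pose_is_valid(cluster, model, min_points=50, max_mean_err=0.02):
    """Reject fits with too few cluster points or a poor average match to the model."""
    if len(cluster) < min_points:
        return False
    mean_err = sum(min(math.dist(c, m) for m in model) for c in cluster) / len(cluster)
    return mean_err <= max_mean_err
```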
The second problem is one of noise and bias in the laser data. At distances closer than about 1 m, the laser range measurements are very noisy and the range values are biased towards the laser. This causes an observable "bowing" in the data (Figure 7, bottom). This effect makes the model fitting noisier, and it is often enough to cause the final fit check to fail because the average distance between cluster and model points is too large. Currently, we do not have a good candidate solution, although it may be possible to cover the fence with less reflective material, as the problem seems to be related to the intensity of the laser returns.

5. Following the Moving Task Board
To reuse as much of our original software architecture as possible, we planned to use our existing visual servo block to have the base follow the task board. "Visual servo" is actually a misnomer: this block takes in pose estimates of the target object relative to the robot, regardless of the tracking source, and outputs pose commands to the robot, with the goal of moving it to a waypoint relative to the target. The visual servo block was used in the plug-insertion demonstration to align the arm's end-effector with the target hole. Here, we have a desired position for the base to maintain relative to the task board, with sensor data coming from either the vision system or the laser, and we use the visual servo block to command the base to achieve that desired waypoint.
In previous demonstrations, the coordinated controller program converted pose commands to the end-effector into arm and base velocities. Here, only the base is used to follow the task board, so we needed a different method to track the goal point with the base. We decided to use the traditional pure pursuit algorithm [Coulter 1992] due to its simplicity and robustness. This algorithm computes the base's angular velocity from two terms that attract the vehicle to the goal position and heading: one is proportional to the difference between the vehicle's angle and the goal angle; the other is a sine term based on disty, the perpendicular distance to the goal point, scaled by vtrans, the translational velocity being commanded; k and ky are proportional gain terms (see Figure 8 for an illustration). The second term attracts the vehicle to a line extending from the goal along the goal angle: as the vehicle gets closer to the line, the sine term goes to zero, and the goal-angle attraction term dominates. The commanded translational velocity is proportional to the x-distance to the goal. The algorithm is designed for a moving-goal scenario; notice that there is no attractor for the goal position itself, so in a static-goal case the vehicle may not precisely reach the goal point. For our scenario, however, pure pursuit is ideal.

Figure 8. Illustration of the terms (distx, disty) used in the pure pursuit algorithm.
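For concreteness, the following sketch shows a pure-pursuit-style control step consistent with the description above: a goal-heading term plus a line-attraction sine term scaled by the commanded translational speed. The specific gains, the lookahead parameter, the sign conventions, and the exact form of the sine term are illustrative assumptions rather than the controller's actual equations.

```python
import math

def pure_pursuit_step(dtheta, distx, disty, kv=0.5, k=1.0, ky=1.0, lookahead=1.0):
    """One control step of a pure-pursuit-style follower.

    dtheta: heading error (goal heading minus vehicle heading), radians
    distx:  distance to the goal along the goal heading, meters
    disty:  signed perpendicular distance to the goal line, meters
    Returns (translational velocity, angular velocity) commands.
    """
    # Translational speed proportional to the along-track distance to the goal.
    vtrans = kv * distx
    # Angular velocity: goal-heading attraction plus a line-attraction term
    # that vanishes as the vehicle reaches the goal line (disty -> 0).
    omega = k * dtheta + ky * vtrans * math.sin(math.atan2(disty, lookahead))
    return vtrans, omega
```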
To handle the moving task board, we added three new blocks to the mobile manipulator agent's behavioral layer, as shown in Figure 9. The "laser control" block initializes the connection to the laser, continually reads laser data, and outputs it to any blocks to which it is connected. The "laser tracking" block receives the laser data, looks for the task board (as described in Section 4), and outputs the task board's position in the format needed by the (pre-existing) visual tracking block. Finally, the "base pure pursuit" block receives a goal position and outputs base velocity commands to steer the base to the goal.
As can be seen in Figure 9, much of the infrastructure to accomplish the task-board-following scenario was already in place. We were able to reuse all of the visual tracking and robot command blocks. We also had a visual servo block type, used in previous demonstrations, that receives pose tracking information and sends out pose commands to move the robot to a target position relative to the tracked object. Although this code was all reusable, we did need to make a few minor modifications to adapt it to work in a wider range of scenarios.
One modification was to generalize the visual servo task, which enables and connects the blocks used in the visual servo process. Previously, the visual servo task had assumed that the visual servo block would always take input from the visual tracking system (hence the term "visual" in its name). However, as mentioned above, we are now using the laser to provide tracking information (Figure 9a). Also, the visual servo task had assumed that there is only one visual servo block. However, in the mobile docking scenario planned for the next work period, we will use two visual servo blocks simultaneously – one to move the base to follow the task board and the other to move the arm to servo to the target hole (Figure 9c). To handle both problems, we generalized the visual servo task by adding parameters that indicate the names of the blocks to be used for the visual servo. Whereas previously the task had assumed that the visual tracking system would be connected to the visual servo block, one can now specify which tracking block to use and to which instance of the visual servo block its output will be directed.

Figure 9. Diagram of the behavioral layer during different phases of the mobile dock. In (a), as the robot approaches the task board from far away, the base's visual servo block uses the tracking information from the laser. In (b), as the robot gets closer, the executive layer (not shown) switches the servo block to use visual tracking. In (c), as the dock begins, visual tracking sends information to two blocks: one controls the base to follow the task board while the other servos the arm to the target hole.
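The sketch below illustrates the kind of parameterization this generalization enables: each servo task now names its tracking block, its servo block, and its downstream controller explicitly. The block and parameter names here are hypothetical stand-ins, not the identifiers used in our system.

```python
# Hypothetical task descriptions; the actual task/block identifiers differ.
BASE_FOLLOW_TASK = dict(
    tracking_block="laser_tracking",        # or "visual_tracking" at close range
    servo_block="base_visual_servo",
    output_block="base_pure_pursuit",       # instead of the coordinated controller
)

ARM_SERVO_TASK = dict(
    tracking_block="visual_tracking",
    servo_block="arm_visual_servo",
    output_block="mobile_manipulator_control",
)

def start_servo_task(task, connect):
    """Wire up a servo chain from a parameterized task description.

    `connect(src, dst)` stands in for the skill manager's block-connection
    call; the point is that the tracker, servo block, and downstream
    controller are all named by parameters rather than hard-coded.
    """
    connect(task["tracking_block"], task["servo_block"])
    connect(task["servo_block"], task["output_block"])
```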
Additional infrastructure was needed to handle the dynamic connections made by tasks. Previously, some block connections, such as the one passing commands from the visual servo block to the mobile manipulator control block, were made statically when the agent was started. In our new scenario, though, a visual servo block could be connected to either a pure pursuit block or the mobile manipulator control block. When we started making these connections with tasks, we discovered that making dynamic connections between blocks on a single agent was unimplemented. A task could make dynamic connections between agents, such as connecting visual tracking from the eye agent to the visual servo block on the mobile manipulator agent, but not within the same agent, such as connecting laser tracking to the visual servo block. We extended the skill manager, which is used to implement the behavioral layer, by adding this case to the underlying block-connection functionality.

Tracking Results
In this section, we present results from following the task board using visual tracking and the new laser tracking system. The first experiment compared the pose estimates from the two systems with the base and the task board both fixed. Figure 11 shows the results with the base 1 meter away from the task board, Figure 12 shows results with the base at 2 meters, and Figure 13 at 3 meters. At shorter distances, the visual tracking system tends to produce tight clusters of estimates, but several distinct ones. Figure 14 shows the visual tracking data with lines connecting consecutive data points, to emphasize that the system does not jump randomly between clusters. The laser shows better tracking accuracy from farther away; as mentioned in Section 4, the laser data is less accurate at shorter distances, so this result is not surprising. The visual tracking system has a standard deviation of [.003, .001, .002] in (x, y, z), respectively; the standard deviation of the laser tracking system is [.001, .004, 0]. As described in Section 3, the sped-up visual tracking system operates at up to 20 Hz, while the laser system operates at approximately 5 Hz.
Looking ahead, in order to reliably do plug insertion on a moving task board, we will need the base to follow the task board as smoothly as possible. Any variation in the base's distance to the task board can cause the plug to lose contact with the target surface, and a variation in the base's angle will cause the plug to slide along the surface, moving it away from the target hole. The variance in tracking shown in Figures 11-14 (and these variances will be worse when both bodies are moving) will cause exactly these variations in the base commands. To reduce the effect of noisy pose estimation, we apply low-pass filters. A low-pass filter dampens high-frequency signals by averaging over past values of the signal. For a new reading r_t and a smoothing factor k, the filtered value r'_t is a weighted average of the new reading and the previous filtered value: r'_t = k * r_t + (1 - k) * r'_(t-1). We apply a filter both to the task board's position in the global frame and to the command sent to the base.
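A minimal sketch of such a filter is shown below. The exponential form matches the definition above; the default smoothing factor is an arbitrary illustrative value, since the report does not specify the value of k used.

```python
class LowPass:
    """First-order low-pass filter: r'_t = k*r_t + (1 - k)*r'_(t-1).

    One instance is kept per filtered quantity, e.g. one for the task board's
    global position and one for the base command.
    """
    def __init__(self, k=0.2):
        self.k = k
        self.state = None

    def update(self, reading):
        # Seed the filter with the first reading, then blend new readings in.
        if self.state is None:
            self.state = reading
        else:
            self.state = self.k * reading + (1.0 - self.k) * self.state
        return self.state
```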
To test the capability of the entire system, we commanded Clifford to pull the task board at 3 cm/s. We ran a visual servo task on the mobile manipulator to follow the task board using the pure pursuit algorithm on the base. The base's target position was 1 meter away from the task board. To test whether the tracking was smooth and accurate enough for later phases of the project, we also commanded the arm to hold contact with the task board via a constrained move task (as described in the August 2008 report). The commanded move was null, and the constraint was to maintain a constant force on the end-effector in the Z-direction. We tested the base-following system using both visual and laser tracking.
The commands sent to the base are shown in Figure 15. The speed commands were mostly the same when using the laser or visual systems, but the visual system did produce a few larger spikes. This is consistent with the data shown in the previous figures, in which the visual system shows a larger variance in the x-direction, forward from the base; this direction is largely controlled by the speed commands. The angular velocity commands, on the other hand, showed much larger spikes when using laser tracking. Based on the stationary data, we did not expect the laser tracking to be so bad. This suggests there was a problem with the y-coordinate estimation in the laser, since a large jump in the left-right direction results in a large angular velocity command. Comparing the y-coordinate of the pose estimate with the angular velocity command in Figure 16, we do see a correlation. However, since the system is giving commands based on the pose estimate, which will in turn cause changes in the pose estimate, we cannot tell from this data alone whether the pose estimate jumps are causing the command jumps, or the other way around.
Figure 17 gives a better clue as to the cause of the poor following with laser tracking. Control iterations are triggered whenever a new pose estimate is received. The laser provides data at over 10 Hz, but the tracking system outputs a value only when it is sure its pose estimate is good. In the figure, we see gaps of up to nearly two seconds between pose estimates, and each spike in the time between pose readings corresponds to a spike in the angular velocity command. We conclude that the laser tracking system periodically loses the task board, during which time the base drifts away from its target pose; when another pose is received, the system must send a large command to get the base back on track. The poor laser tracking in the moving case is probably caused by our small laser fence. Imperfections in the floor, as well as pitching of the base during motion, are enough to cause the laser scan to miss the fence and thus produce a bad pose. To get better tracking from the laser, the easiest solution is a larger fence.
Base control was very smooth with the visual tracking system. In our tests, we found the arm was able to keep constant contact with the task board (Figure 10). The plug slid across the surface by a few centimeters, but it never lost contact with the board. We believe that we are now well-poised to successfully do autonomous plug insertion into a moving task board during the next work period.

Figure 10. The arm keeps contact with the moving task board while the base uses pure pursuit to follow.

Figure 11. Comparison of visual tracking (a) and laser tracking (b) with the base fixed 1 meter away from the motionless task board. Each case shows 30 seconds worth of readings. The laser tracking system does not detect the height of the task board, so only X (forward from the vehicle) and Y (left from the vehicle) are shown. The visual tracking system has tighter clusters, but multiple distinct clusters. The laser system precisely locates the board in the X direction, but varies by more than 2 cm in the Y direction.

Figure 12. Comparison of visual tracking (a) and laser tracking (b) with the base fixed roughly 2 meters away from the motionless task board. Each case shows 30 seconds worth of readings. The visual system again has multiple clusters, but spread farther apart than in the 1-meter case. The laser system tracks more precisely than it did at 1 meter, due to increased laser range accuracy at this distance.

Figure 13. Visual pose estimates with the base fixed roughly 3 meters away from the motionless task board, 30 seconds worth.
The data shows the same overall spread between readings as at the shorter distances, but with none of the same clustering. Also, there are 1/3 as many data points in this set as in the other two, indicating that from this distance the visual tracking system finds the fiducial only about 1/3 of the time. Due to imperfections in the floor and in the laser leveling, the laser does not hit the fence on the task board from this distance.

Figure 14. The same visual pose estimate data as in the previous figures, but with lines connecting consecutive data points. At 1 meter (a) and 2 meters (b), the estimate jumps only once from one data cluster to the other. At 3 meters (c), there is no order to the pose estimates.

Figure 15. Speed (a) and angular velocity (b) commands sent to the base during two task-board-following missions. One test was run with the laser pose estimation system, shown in blue; the other was run with the visual pose estimation system, shown in red.

Figure 16. A comparison of the angular velocity command to the base with the y-coordinate of the target's position from the laser tracking system. The data are correlated, but it is unclear which is the cause and which is the effect.

Figure 17. Comparison of the time between pose readings and the angular velocity commands using the laser tracking system. Large gaps between readings lead to the base drifting away from its target pose, after which the system must send large angular velocity commands to compensate.

6. References
[Coulter 1992] R. C. Coulter. "Implementation of the Pure Pursuit Path Tracking Algorithm." Technical Report CMU-RI-TR-92-01, Robotics Institute, Carnegie Mellon University, January 1992.
[Besl & McKay, 1992] P. J. Besl and N. D. McKay. "A Method for Registration of 3-D Shapes." IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):239-256, February 1992.
[Fischler & Bolles, 1981] M. A. Fischler and R. C. Bolles. "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography." Communications of the ACM, 24(6):381-395, June 1981.