SLAM Accelerated
Using Hardware to Improve SLAM Algorithm Performance

Project Overview [RH]
- Team members
  - Roy Lycke
  - Ji Li
  - Ryan Hamor
- Take an existing SLAM algorithm and implement it on a computer
- Analyze the performance of the algorithm to determine which kernels to accelerate in hardware (HW)
- Implement the SLAM algorithm on a PowerPC with the previously identified kernels in HW

What is SLAM? [RH]
- SLAM stands for Simultaneous Localization and Mapping
- Predict pose using previous and current data
  - Types of pose sensors
    - Wheel encoders
    - GPS
- Detect landmarks and correlate them to the robot using the predicted pose
  - Types of observation sensors
    - Sonar
    - Infrared
    - Laser scanners
    - Video

Current State of SLAM Algorithms [RH]
- SLAM algorithms fall into two main categories
- Extended Kalman Filter
  - Large covariance matrix to process
- Particle Filter
  - Each particle contains a pose estimate and a map

Particle Filter Algorithm [RH]

What We Have Decided to Do [RH]
- Started with an existing SLAM implementation
  - ratbot-slam, developed by Kris Beevers
- ratbot-slam
  - Uses a particle filter algorithm and multiple observation scans, using just wheel encoders and 5 IR sensors
- We modified ratbot-slam to use log files taken from radish.sourceforge.net

Ratbot-slam Modifications [RH]
- Create a new observation function that uses laser scans instead of the original IR sensors
- Modify the motion model to use dead-reckoned odometry

Demo of Modified ratbot-slam [RH]

Profile of Modified Code [RL]

Areas that can be Accelerated [RL]
- Decided to accelerate the predict step, which includes:
  - motion_model_deadreck
  - gaussian_pose
- Estimated maximum speed-up: 39% of run time removed, or 1.64x (1 / (1 - 0.39))
- Why not squared_distance_point_segment?
  - Least understood of the algorithms we could accelerate
  - With more time we would have developed this

Function Acceleration [RL]
- Design decisions
  - Fixed or floating point?
    - Fixed point
      - Implementation is done in fixed point
      - Floating point would have required significantly more resources
  - Heavily pipeline, or create a predict stage for each particle?
    - Heavily pipelined
      - Data is loaded serially into the coprocessor through load and save functions
      - Implementing predict stages in parallel for each particle would take too many resources

Top Level Design [RL]

Motion Model C-Code [RH]

MotionModel Data Flow [RH]

MotionModel Data Flow [RH]

MotionModel HDL Stats [RH]

Gaussian Pose [JL]

    /* Draw a pose sample: each component is an independent Gaussian with
       the given mean and the square root of its variance as stddev. */
    void gaussian_pose(const pose_t *mean, const cov3_t *cov, pose_t *sample)
    {
        sample->x = gaussian(mean->x, fp_sqrt(cov->xx));
        sample->y = gaussian(mean->y, fp_sqrt(cov->yy));
        sample->t = gaussian(mean->t, fp_sqrt(cov->tt));
    }

Gaussian Pose [JL]

    fixed_t gaussian(fixed_t mean, fixed_t stddev)
    {
        static int cached = 0;
        static fixed_t extra;
        static fixed_t a, b, c, t;

        /* Each pass below yields two unit samples; return the one saved
           by the previous call before generating a new pair. */
        if(cached) {
            cached = 0;
            return fp_mul(extra, stddev) + mean;
        }

        // pick random point in unit circle
        do {
            a = fp_mul(fp_2, fp_rand_0_1()) - fp_1;
            b = fp_mul(fp_2, fp_rand_0_1()) - fp_1;
            c = fp_mul(a,a) + fp_mul(b,b);
        } while(c > fp_1 || c == 0);

        /* Look up the scale factor for this c in a precomputed table. */
        t = pgm_read_fixed(&unit_gaussian_table[c >> unit_gaussian_shift]);

        extra = fp_mul(t, a);   /* save the second sample for the next call */
        cached = 1;
        return fp_mul(fp_mul(t, b), stddev) + mean;
    }

Parallelism & Acceleration Techniques [JL]
- Parallelism
  - The gaussian_pose function consists of three gaussian calls
  - Each gaussian function can be separated into two parts
- Acceleration techniques
  - Pipeline
  - Multi-thread

Top Level Diagram of gaussian_pose [JL]

Random Number Generator [JL]
- Xorshift random number generators were developed. They generate the next number in their sequence by repeatedly taking the exclusive or (XOR) of a number with a bit-shifted version of itself.
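
To make the xorshift description concrete, here is a minimal software sketch in C. It assumes Marsaglia's 32-bit generator with the 13/17/5 shift triple and a Q16.16 fixed-point format. The slides do not state the word width, shift constants, or fixed-point format used in the FPGA design, and the names xorshift32 and xorshift_rand_0_1_q16 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One step of a 32-bit xorshift generator (assumed shift triple 13/17/5).
   The state must be nonzero: a zero state stays zero forever. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    assert(x != 0);
    x ^= x << 13;   /* XOR with a left-shifted copy  */
    x ^= x >> 17;   /* XOR with a right-shifted copy */
    x ^= x << 5;    /* XOR with a left-shifted copy  */
    *state = x;
    return x;
}

/* Hypothetical stand-in for fp_rand_0_1(): keep the top 16 bits as the
   fractional part of an assumed Q16.16 value, giving a result in [0, 1). */
static uint32_t xorshift_rand_0_1_q16(uint32_t *state)
{
    return xorshift32(state) >> 16;
}
```

Seeded with a state of 1, the first call to xorshift32 returns 270369. In hardware the constant shifts are only wiring and each XOR is a row of gates, which is presumably what makes this kind of generator attractive for a pipelined design.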
Random_Number_Manager [JL]

Gaussian Entity [JL]

Demo of FPGA System [RL]

Timing Analysis of Original System [RL]
- Timing analysis was performed via run-time clock counts and print statements to minicom
- Sections of code timed: Predict Step, Multiscan Feature Extraction and Data Association Step, and Filter Health Evaluation and Re-sample Step
- The Predict Step was implemented on the FPGA for acceleration
- Initial timing analysis:

    Operation                   Average runtime (microseconds)   Present in (percentage of runs)
    Predict Step - Original                            107,502   100%
    Multiscan Step - Original                        2,487,969   2.17%
    Filter Step - Original                               3,394   2.17%

Timing Analysis of Accelerated System [RL]
- Timing analysis for the accelerated implementation was performed in the same manner as for the original implementation
- Results are shown alongside the original timing analysis
- From the data collected, the Predict Step was accelerated by 88%

    Operation                      Average runtime (microseconds)   Present in (percentage of runs)
    Predict Step - Original                               107,502   100%
    Multiscan Step - Original                           2,487,969   2.17%
    Filter Step - Original                                  3,394   2.17%
    Predict Step - Accelerated                             12,784   100%
    Multiscan Step - Accelerated                        1,982,950   1.94%
    Filter Step - Accelerated                              13,291   1.94%

Result Analysis [RL]
- With the Predict Step accelerated by 88.108%, the overall system is accelerated by:
  - 34% = 39% x 88%
- The result is a reliable and sizable reduction in system execution time
- Analysis of other components
  - Multiscan Step accelerated by 20.29%
  - Filter Step slowed by 74.46% (measured against the accelerated runtime: 3,394 to 13,291 microseconds, about 3.9x longer)
- Differences may be due to different values generated by the FPGA implementation vs. the original implementation
  - Both implementations use random values
  - More accurate values may lead to longer calculation in other components

Difficulties with Project Implementation [RL]
- Networking issues
  - Data transfer: differences between PowerPC and Linux
- Limitations of FPGA
  - Unpredictable execution halting
  - Lack of resource libraries
    - Timing performed with specialized Xilinx library
      - Code needed to be modified to run
- PC vs. FPGA environment
  - Output file format is different
  - Issue figuring out how to add multiple files to custom IP

Conclusions [RH]
- Based on the run-time analysis of our accelerated SLAM implementation, an appreciable speed-up was achieved.
- Our implementation achieved a speed-up of approximately 34% or 1.51x, out of an ideal 39% or 1.64x
- This result shows that if more of the SLAM algorithm were implemented on an FPGA, there could be a greater acceleration.
- A top issue in SLAM implementations is getting algorithms implemented on embedded real-time systems

Future Directions [RL]
- Add more regions of the algorithm to the FPGA acceleration
  - Current implementation only accelerates 39% of the system
- Run the SLAM system on a different FPGA
  - FPGAs with more robust processors may overcome some of the limitations our implementation faced
- Run a different SLAM algorithm
  - Current implementation is a particle filter algorithm; a Kalman filter algorithm would be next
- Load data onto the board rather than using PC interaction
  - Load data via memory card
  - Perform a single data load and perform memory management on the FPGA

References [RL]
1. Durrant-Whyte, Bailey, "Simultaneous Localization and Mapping: Part 1", IEEE Robotics and Automation Magazine, June 2006, pp. 99-108.
2. Durrant-Whyte, Bailey, "Simultaneous Localization and Mapping: Part 2", IEEE Robotics and Automation Magazine, September 2006, pp. 108-117.
3. Bonato, Peron, Wolf, Holanda, Marques, Cardoso, "An FPGA Implementation for a Kalman Filter with Application to Mobile Robotics", Industrial Embedded Systems, 2007, pp. 148-155.
4. Bonato, Marques, Constantinides, "A Floating-point Extended Kalman Filter Implementation for Autonomous Mobile Robots", Field Programmable Logic and Applications, 2007, pp. 576-579.
5. Beevers, K.R., Huang, W.H., "SLAM with Sparse Sensing", Robotics and Automation 2006, pp. 2285-2290.

Questions? [RL]
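
The percentages quoted under Result Analysis and Conclusions can be reproduced from the measured runtimes. The sketch below is only a check of that arithmetic, assuming the Amdahl-style model implied by the slides (the predict step accounts for 39% of run time and nothing else changes). The function names are hypothetical.

```c
#include <assert.h>

/* Fraction of run time removed when a step drops from t_orig to t_accel. */
static double fraction_saved(double t_orig, double t_accel)
{
    assert(t_orig > 0.0);
    return (t_orig - t_accel) / t_orig;
}

/* Overall fraction of run time removed when a step that accounts for
   `share` of total run time has `step_saving` of its own time removed. */
static double overall_saving(double share, double step_saving)
{
    return share * step_saving;
}

/* Convert a fraction of run time removed into a speed-up factor. */
static double speedup_factor(double saving)
{
    assert(saving < 1.0);
    return 1.0 / (1.0 - saving);
}
```

With the measured predict-step runtimes, fraction_saved(107502, 12784) is about 0.881 and overall_saving(0.39, 0.88) is about 0.34. speedup_factor(0.34) is then about 1.51, and speedup_factor(0.39), the ideal case where the predict step costs nothing, is about 1.64.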
