Down to Earth Computer Vision
Accurate identification of the floor relative to a moving spherical object (the Sphero)
Sean Ong
EGGN 512 – Computer Vision (Professor William Hoff)
May 1, 2013

BACKGROUND
• The Sphero
  • A robotic ball that can be controlled from a smartphone or tablet.
• Augmented reality and the Sphero
  • Existing apps can overlay virtual characters on the Sphero.
  • http://www.youtube.com/watch?v=UPn3jVGQw68
  • Floor detection could be improved.
• Computer vision for floor detection
  • This project demonstrates a novel method to accurately identify the floor relative to the Sphero using only camera images (no information received from the Sphero).

TRADITIONAL POSE ESTIMATION
• Hough lines? May work if lines are present, but not a robust technique for this application.
• Given known points on an object, linear or least-squares pose estimation techniques can be used.
• However, this is very challenging for spherical objects: a sphere looks the same from every viewpoint, so it offers no distinguishable model points.

MY POSE ESTIMATION TECHNIQUE
• Takes advantage of a "calibration" step.
• The camera pose can be determined from two images:
  • An image of the Sphero near the camera
  • An image of the Sphero far from the camera
• Feature tracking (e.g., SIFT with RANSAC) is used to correct for camera motion between the two images.
• This technique essentially turns the sphere into a "rod," enabling conventional pose estimation techniques to work.

HOW IT WORKS
1. Identify the Sphero in the "near" image and the "far" image.
2. Correct for camera motion with feature tracking.
3. Determine the model ("rod") dimensions and the corresponding image points.
4. Use conventional methods to determine the camera pose and the ground plane.
5. Keep track of the pose and ground plane using feature tracking.

THE DEMO
• http://www.youtube.com/watch?v=DfbFuuU3rf8&feature=youtu.be
• Note: the floor is a little "jumpy."

1. TRACKING THE SPHERO
• Use the Circular Hough Transform (CHT).
• Fairly common in computer vision packages:
  • OpenCV: HoughCircles()
  • Matlab: imfindcircles()

Resources:
T.J. Atherton and D.J. Kerbyson, "Size invariant circle detection," Image and Vision Computing, vol. 17, no. 11, 1999, pp. 795-803.
H.K. Yuen, J. Princen, J. Illingworth, and J. Kittler, "Comparative study of Hough transform methods for circle finding," Image and Vision Computing, vol. 8, no. 1, 1990, pp. 71-77.
E.R. Davies, Machine Vision: Theory, Algorithms, Practicalities, ch. 10, 3rd ed., Morgan Kaufmann Publishers, 2005.
http://www.mathworks.com/help/images/ref/imfindcircles.html
http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/hough_circle/hough_circle.html

2. CORRECTING FOR CAMERA MOTION
• Use SIFT and RANSAC.
• Scale-Invariant Feature Transform (SIFT)
  • Identifies distinctive features in an image.
  • VLFeat: vl_sift()
• Random Sample Consensus (RANSAC)
  • Iteratively fits a fundamental matrix to randomly selected feature matches until a set of inliers is found.

Resources:
D.G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," Int'l Journal of Computer Vision, vol. 60, no. 2, 2004, pp. 91-110.
http://www.vlfeat.org/overview/sift.html
M.A. Fischler and R.C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24, no. 6, 1981, pp. 381-395.
http://inside.mines.edu/~whoff/courses/EGGN512/lectures/27-Ransac.pdf
http://inside.mines.edu/~whoff/courses/EGGN512/lectures/15-SIFTBasedObjectRecog.pdf

3. MAKING THE ROD
• For each image (near and far), find the left-most, right-most, and top-most points of the Sphero.
• The distance the Sphero traveled can be measured from its pixel diameter: with focal length f and known ball diameter D, an image diameter of d pixels puts the ball at depth Z = f*D/d.
• We now know the physical (model) dimensions and coordinates of the "rod," as well as the corresponding image points.
• Conventional pose estimation techniques are now possible.

4. ESTIMATING CAMERA POSE AND FLOOR
• Use pose estimation (this project uses least-squares pose estimation) to find the extrinsic matrix that best fits the predicted image points to the observed image points.
• Note: the intrinsic camera parameters (mainly the focal length) are needed.
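The least-squares fit depends on a projection function that maps the six pose parameters to predicted image points; the MATLAB code later in the deck calls it fProject(x, P_M, K), but its body is not shown. Below is a minimal Python/NumPy sketch of what such a helper could look like. The axis-angle rotation parameterization and the ordering of the pose vector are my assumptions, not taken from the project:

```python
import numpy as np

def f_project(x, P_M, K):
    """Project 3-D model points into the image for pose vector x.

    x   : 6-vector [wx, wy, wz, tx, ty, tz] -- rotation (axis-angle,
          radians) and translation of the model in the camera frame
          (an assumed parameterization; the original fProject differs
          in unknown ways).
    P_M : 4xN homogeneous model points (the "rod" endpoints).
    K   : 3x3 intrinsic camera matrix.
    Returns a flat vector of stacked (u, v) image coordinates.
    """
    wx, wy, wz, tx, ty, tz = x
    # Rodrigues' formula: axis-angle -> rotation matrix
    w = np.array([wx, wy, wz])
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        R = np.eye(3)
    else:
        k = w / theta
        k_hat = np.array([[0.0, -k[2], k[1]],
                          [k[2], 0.0, -k[0]],
                          [-k[1], k[0], 0.0]])
        R = np.eye(3) + np.sin(theta) * k_hat \
            + (1.0 - np.cos(theta)) * (k_hat @ k_hat)
    # Extrinsic matrix [R | t], then full projection K [R | t]
    Rt = np.hstack([R, np.array([[tx], [ty], [tz]])])
    p = K @ Rt @ P_M            # 3xN homogeneous image points
    p = p[:2] / p[2]            # perspective divide
    return p.T.reshape(-1)      # stacked (u, v) pairs, as the solver expects
```

A model point at the origin with identity rotation and translation (0, 0, Z) projects to the principal point, which gives a quick sanity check on the sign and ordering conventions.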
• Once the extrinsic matrix is calculated, the floor (the Z = 0 plane) can be accurately superimposed on the image. A coordinate axis can also be drawn, in addition to any other augmented reality object.
• The demo shows the virtual floor and coordinate axis.

Resources:
http://inside.mines.edu/~whoff/courses/EGGN512/lectures/16-PoseEstimation.pdf
Richard Szeliski, Computer Vision: Algorithms and Applications, Springer, 2010.
http://inside.mines.edu/~whoff/courses/EGGN512/lectures/19-LinearPoseEstimation.pdf
http://inside.mines.edu/~whoff/courses/EGGN512/lectures/24-StructureFromMotion.pdf

THE CODE
• Least-squares pose estimation
• The code iteratively solves for the pose parameters (Gauss-Newton iteration with a numerically estimated Jacobian).

    for i = 1:15
        % Get predicted image points
        y = fProject(x, P_M, K);

        % Estimate the Jacobian numerically
        e = 0.00001;  % a tiny number
        J(:,1) = ( fProject(x+[e;0;0;0;0;0],P_M,K) - y )/e;
        J(:,2) = ( fProject(x+[0;e;0;0;0;0],P_M,K) - y )/e;
        J(:,3) = ( fProject(x+[0;0;e;0;0;0],P_M,K) - y )/e;
        J(:,4) = ( fProject(x+[0;0;0;e;0;0],P_M,K) - y )/e;
        J(:,5) = ( fProject(x+[0;0;0;0;e;0],P_M,K) - y )/e;
        J(:,6) = ( fProject(x+[0;0;0;0;0;e],P_M,K) - y )/e;

        % Error is observed image points - predicted image points
        dy = y0 - y;

        % We now have a system of linear equations dy = J dx.
        % Solve for dx using the pseudo-inverse.
        dx = pinv(J) * dy;

        % Stop if the parameters are no longer changing
        if abs( norm(dx)/norm(x) ) < 1e-6
            break;
        end

        x = x + dx;  % Update pose estimate
    end

5. CONTINUE TO KEEP TRACK OF CAMERA MOTION
• Again, SIFT and RANSAC are used to keep track of camera motion.
• [Demo showing results without camera tracking]
• I attempted several methods:
1. Extracting the extrinsic matrix from the fundamental matrix (via RANSAC)
2. Extracting the extrinsic matrix from a homography
3. Transforming the augmented elements directly using matching correspondences / a homography
4. Back-calculating the 3D position of each feature point

SUMMARY
• Goal: improve floor detection for the Sphero.
• A novel technique was demonstrated:
  • Creation of a virtual "rod" using two images
  • Enables the use of conventional pose estimation techniques on the Sphero
  • Uses feature tracking methods to keep track of camera motion
• The new technique was demonstrated to work, although there is room for further improvement and optimization.

POTENTIAL IMPROVEMENTS
• Data from the Sphero and the phone/tablet could be used to improve the accuracy of floor detection:
  • Distance traveled
  • Location
  • Speed
  • Phone sensors: gyroscope, accelerometer, etc.
• Improvements to the algorithm:
  • Combine feature matching against the first image with frame-to-frame matching to reduce false matches and floor detection errors.
  • An algorithm that uses Sphero motion exclusively would avoid the problem of tracking camera motion across featureless surfaces.
  • General algorithm optimization to improve speed and accuracy.

QUESTIONS?
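As a closing sketch: method 3 in the camera-tracking list above (transforming the augmented elements using matching correspondences / a homography) can be realized by fitting a homography to the RANSAC-filtered feature matches and pushing the drawn floor-grid points through it. The Python/NumPy code below uses a plain least-squares direct linear transform (DLT); the function names, and the absence of the robust RANSAC wrapper, are my simplifications, not the project's actual implementation:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src via the DLT.

    src, dst : Nx2 arrays of matched image points (N >= 4). In the full
    pipeline these would be RANSAC-filtered SIFT matches between the
    previous and current frame; here they are passed in directly.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        A.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # The smallest right singular vector of A gives h (up to scale)
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def transfer_points(H, pts):
    """Map Nx2 points through H (re-draws the floor grid in the new frame)."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]
```

Because the homography is fitted to the tracked features rather than derived from a fresh pose estimate, small match errors accumulate over frames, which is consistent with the "jumpy" floor noted in the demo.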