Killer Bunny Rabbit Detector
An Application of Object Detection
James Maher
Image from: http://thereelcult.com/wp-content/uploads/2012/04/Monty-Python-rabbit_400.jpg

Outline
• Introduction
• Previous Work
• Methodology
• Results
• Discussion and Conclusion

Introduction
• My Rabbit Problem
• PyCon 2012 Talk
– “Militarizing Your Backyard with Python…,” by Kurt Grandis [1]
• Project Goal
– Apply a more sophisticated computer vision algorithm to detect objects
Images from [1].

Previous Work
• Background Segmentation
– Issues:
• Ghosts [2]
• The multi-step process is computationally intensive
• Feature Extraction
– SIFT [3]
– Histograms of Oriented Gradients (HOG) [4]
Images from [2,5]. Image from [3].

Methodology
• Followed P. Felzenszwalb et al.’s discriminatively trained deformable parts model [6],[7].
• Deformable parts are parts that can change shape and position, but only within a limited range.
Image from [8]: an early attempt at object detection through deformable parts.

Methodology
• Histograms of Oriented Gradients (HOG) are used for feature extraction.
• The magnitude and direction of the gradient are calculated for each pixel and combined into one histogram per cell.
– A cell is defined as 8x8 pixels.
Figure: gradient direction, gradient magnitude, and the gradient masks [-1, 0, 1] (horizontal) and [-1, 0, 1]^T (vertical).

Methodology
• Each pixel’s gradient magnitude and direction vote into a histogram for its cell.
– 9 orientation bins and 4 magnitude bins were used.
• These histograms are normalized over blocks of 2x2 cells.
– Each cell is normalized with respect to its 8 neighbors.
Image from [6].

Methodology
• A feature pyramid of HOG features was used to match the root filter and the deformable parts.
Image from [7].
• In the implementation, the deformable parts used twice the resolution of the root filter.

Methodology
• Two factors determined the score for potential matches:
1. How well the image’s HOG features matched the trained model
2.
How much the parts were deformed from the trained model
• This results in the score

f_β(x) = max_{z ∈ Z(x)} β · Φ(x, z)

where β = (F_0, …, F_n, d_1, …, d_n, b) concatenates the root filter, the part filters, the deformation parameters, and a bias term, and z ranges over the possible part placements.

Methodology
• A visual representation of the filters and penalty terms for a trained model.
Image from [6].

Methodology
• A Latent Support Vector Machine (LSVM) is used to train the β term in:

f_β(x) = max_{z ∈ Z(x)} β · Φ(x, z)

• The latent variables are in the z term.

Methodology
• OpenCV contains a library for latent SVMs.
– A trained model must be provided; the only trained models available are for the objects in the PASCAL competition.
• MATLAB source code is available at: http://people.cs.uchicago.edu/~rbg/latent/
– The code compiles only on Mac and Linux.
– Modifications for training a new model are non-trivial, but details are included in my paper.

Methodology
• Used 126 training images, taken from videos of rabbits near my house.
• Wrote MATLAB scripts to extract the training images and create the files for training the model.
• Held back 14 testing photos of rabbits near Golden and 10 photos from the original videos that were not used in training the model.

Results

Results
• None of the testing images from Golden were detected.
• Detection or non-detection in the training videos depended on how close the camera was to the rabbit.
– 50% of the video clips produced detections.
• No false positives: the bounding box surrounded the rabbit each time.
• Increasing the number and variety of training examples increases detection.

Discussion and Conclusion
• Good
– Images similar to the training images were detected
– No false positives
• Bad
– The algorithm is not fast enough for real-time, 30 fps video
– The camera must be close to the target
• Notes
– Closer images worked better than distant images

References
[1] K. Grandis, “Militarizing Your Backyard with Python: Computer Vision and the Squirrel Hordes,” PyCon USA, 2012.
[2] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati, “Detecting moving objects, ghosts, and shadows in video streams,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 10, pp. 1337–1342, 2003.
[3] D. G.
Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.
[4] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005, vol. 1, pp. 886–893.
[5] W. Hoff, “Motion-Based Segmentation,” lecture notes, EGGN 512: Computer Vision, 2013.
[6] P. Felzenszwalb, D. McAllester, and D. Ramanan, “A discriminatively trained, multiscale, deformable part model,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
[7] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part based models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, 2010.
[8] M. A. Fischler and R. A. Elschlager, “The representation and matching of pictorial structures,” IEEE Trans. Comput., vol. C-22, no. 1, pp. 67–92, 1973.
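
Appendix: HOG cell histograms
The per-cell histogram step from the Methodology slides (gradients from the [-1, 0, 1] masks, 8x8-pixel cells, 9 orientation bins) can be sketched in Python with NumPy. This is a minimal illustration, not the project’s MATLAB code: the function name `hog_cells` is hypothetical, votes are simply weighted by gradient magnitude (the slide’s 4 magnitude bins are omitted), and block normalization and the feature pyramid are left out.

```python
import numpy as np

def hog_cells(image, cell_size=8, n_bins=9):
    """One orientation histogram per cell, as on the HOG slides (sketch)."""
    img = image.astype(np.float64)
    # Gradient masks [-1, 0, 1] (horizontal) and [-1, 0, 1]^T (vertical).
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]

    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees.
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0

    h, w = img.shape
    rows, cols = h // cell_size, w // cell_size
    hist = np.zeros((rows, cols, n_bins))
    bin_width = 180.0 / n_bins

    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell_size, (r + 1) * cell_size),
                  slice(c * cell_size, (c + 1) * cell_size))
            bins = (orientation[sl] / bin_width).astype(int) % n_bins
            # Each pixel votes into its cell's histogram, weighted by
            # gradient magnitude.
            np.add.at(hist[r, c], bins.ravel(), magnitude[sl].ravel())
    return hist
```

On a 16x16 horizontal intensity ramp the gradient points along x everywhere, so every cell’s histogram concentrates in the first orientation bin.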
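
Appendix: part placement score
The two scoring factors on the Methodology slide (how well the HOG features match, minus how much a part deformed) can be sketched for a single part. This is an illustrative sketch under assumptions, not the paper’s implementation: `part_score`, `response_map`, and `anchor` are hypothetical names, and the quadratic deformation features (dx, dy, dx², dy²) follow the model of [7].

```python
import numpy as np

def part_score(response_map, anchor, d):
    """Best placement score for one deformable part (sketch).

    For every candidate placement, take the part filter's response at
    that location (match quality) minus a quadratic penalty on how far
    the part moved from its anchor (deformation cost). `d` holds the
    learned deformation weights for (dx, dy, dx^2, dy^2) as in [7].
    """
    best, best_pos = -np.inf, None
    h, w = response_map.shape
    ax, ay = anchor
    for y in range(h):
        for x in range(w):
            dx, dy = x - ax, y - ay
            penalty = d @ np.array([dx, dy, dx * dx, dy * dy])
            score = response_map[y, x] - penalty
            if score > best:
                best, best_pos = score, (x, y)
    return best, best_pos
```

A strong filter response far from the anchor can lose to a weaker response near it, which is exactly the "limited range" behavior of deformable parts described on the slides. (The full model in [7] computes this maximization efficiently with a distance transform rather than a loop.)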