Technischen Universität München Winter Semester 2009/2010 TRACKING and DETECTION in COMPUTER VISION Slobodan Ilić A Picture’s Worth a Thousand Words! “One of the Family”, Frederick Cotman, 1880 Intro: Tracking and Detection in Computer Vision Ilic Slobodan A Picture’s Worth a Thousand Words! “One of the Family”, Frederick Cotman, 1880 Intro: Tracking and Detection in Computer Vision Ilic Slobodan What is Computer Vision? Human Vision (eyes and the visual cortex in the brain) discovers from images what object are present in the scene, where they are, how they move and what is their shape. Computer Vision (using cameras attached to the computers) automatically interprets images trying to understand their content similar to the human vision. Intro: Tracking and Detection in Computer Vision Ilic Slobodan What is not Computer Vision? Image Processing ‐ Takes an image and process is to produce new, more desirable image. Image enhancement, image compression, image restoration. Pattern Recognition ‐ Takes a pattern and classifies it into one of predefined, finite set of classes. Computer Graphics ‐ Synthesize images using powerful algorithms so that they correspond as close as possible to the real images. Intro: Tracking and Detection in Computer Vision Ilic Slobodan How did everything start? Computer Vision started as a semester project at MIT in 1965. The assumptions were very strong (block world) and the data were perfect, so it seemed to researchers to be an easy task. However the wold is not perfect ! Intro: Tracking and Detection in Computer Vision Ilic Slobodan Why to study Computer Vision? • Intellectual curiosity -- try to mimic the most powerful human sense • A number of industrial applications: • automation of industrial processes • visualization and augmented reality • • medicine, diagnostics • • communication • security and surveillance • military and space research entertainment: film and video games human computer interactions (HCI) Intro: Tracking and Detection in Computer Vision Ilic Slobodan The eye Fovea • Retina measures about 5 × 5 cm and contains 10^8 sampling elements (rods: sense brightness, low intensity, e.g night vision and cones: sense color with higher intensity light). •The eye’s spatial resolution is about 0.01◦ over a 150◦ field of view (not evenly spaced, there is a fovea and a peripheral region). •Intensity resolution is about 11 bits/element, spectral resolution is about 2 bits/ element (400–700nm). •Temporal resolution is about 100 ms (10 Hz). •Two eyes give a data rate of about 3 GBytes/s! •A large chunk of our brain is dedicated to processing the signals from our eyes. Intro: Tracking and Detection in Computer Vision Ilic Slobodan The camera •A typical digital SLR CCD measures about 24×16mm and contains about 6×10^6 sampling elements(pixels). •Intensity resolution is about 8bits/pixel for each colour chanel (RGB). •Most computer vision applications work with monochrome images. •Temporal resolution is about 40 ms (25 Hz), SNR is about 50dB(Pulnix camera spec.). •One camera gives a raw data rate of about 450MBytes/s (color), i.e 150MBytes/s (mono) Intro: Tracking and Detection in Computer Vision Ilic Slobodan Should we copy biology? • No! Human vision is a product of millions of years of the evolution created under different constraints. • • It consists of 60 billion neurons heavily interconnected. Computes we have today cannot perform like a human brain. We really do not understand how the brain works! We need to try understand underlying principles rather then the particular implementation. Intro: Tracking and Detection in Computer Vision Ilic Slobodan Image ambiguities GiveImage forma Image formation is many to one mapping. It is simple projection of the 3D object representation and does not say anything about the depth. Intro: Tracking and Detection in Computer Vision Ilic Slobodan What should we do? “One cannot understand what seeing is and how it works unless one understands the underlying information processing task being solved” David Marr The imaging process is ambiguous and we should try to resolve the ambiguities by introducing constraints to our problem like: • • • use more then one image of the scene make assumptions about the world in the scene introduce knowledge about the observed problem Intro: Tracking and Detection in Computer Vision Ilic Slobodan What is tracking? • Tracking means following one or multiple objects or of interest in the scene providing continuously their position • estimate parameters of the dynamic system, e.g. feature point positions, object position, human joint angles etc. • The applications of tracking are various and represent one of the primal Computer Vision tasks • The source of information is video from one or multiple cameras Intro: Tracking and Detection in Computer Vision Ilic Slobodan Different Approaches to Vision-based 3D Tracking Consider natural features Consider natural features Make use of visual markers No a priori 3D knowledge Use some 3D knowledge Use some 3D knowledge Intro: Tracking and Detection in Computer Vision Ilic Slobodan Tracking examples Tracking 3D objects, CVLAB, EPFL Intro: Tracking and Detection in Computer Vision Tracking in 2D CONDENSATION alg. M. Isard, A. Blake Ilic Slobodan Face tracking 2D face tracking using AAM courtesy of Robotics Institute CMU 3D face tracking courtesy of CVLAB, EPFL Intro: Tracking and Detection in Computer Vision Ilic Slobodan People tracking - monocular A. Ess and B. Leibe and K. Schindler and and L. van Gool , A Mobile Vision System for Robust Multi‐Person Tracking, CVPR 2008 Intro: Tracking and Detection in Computer Vision Ilic Slobodan People tracking multi camera F. Fleuret, J. Berclaz, R. Lengagne and P. Fua, Multi‐Camera People Tracking with a Probabilistic Occupancy Map, IEEE Transactions on PAMI, Vol. 30, Nr. 2, pp. 267 ‐ 282, February 2008 Intro: Tracking and Detection in Computer Vision Ilic Slobodan Articulated people tracking R. Urtasun, D. Fleet and P. Fua, Temporal Motion Models for Monocular and Multiview 3‐‐D Human Body Tracking, Computer Vision and Image Understanding, Vol. 104, Nr. 2, pp. 157 ‐ 177, December 2006. Intro: Tracking and Detection in Computer Vision Ilic Slobodan Deformable surface tracking S. Ilic, M. Salzmann, P. Fua, Implicit Meshes for Effective Silhouette Handling, International Journal of Computer Vision, Vol. 72, Nr. 2, pp. 159 ‐ 178, 2007 Intro: Tracking and Detection in Computer Vision Ilic Slobodan Augmented reality J. Pilet, A. Geiger, P. Lagger, V. Lepetit and P. Fua, An all‐in‐one solution to geometric and photometric calibration, International Symposium on Mixed and Augmented Reality, October 2006 M. Salzmann, J.Pilet, S.Ilic, P.Fua, Surface Deformation Models for Non‐Rigid 3‐‐D Shape Recovery, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, Nr. 8, pp. 1481 ‐ 1487, August 2007 Intro: Tracking and Detection in Computer Vision Ilic Slobodan Sport applications Video is courtesy of Dartfish N. Gehrig, V. Lepetit, and P. Fua. "Golf Club Visual Tracking for Enhanced Swing Analysis Tools", In proceedings of British Machine Vision Conference, September 2003. Intro: Tracking and Detection in Computer Vision Ilic Slobodan Robotic and industrial AR applications Video is courtesy of University of Cambridge Video is courtesy of METAIO Intro: Tracking and Detection in Computer Vision Ilic Slobodan Entertainment applications Videos above are courtesy of Prof. A. Hilton, Univ. of Surrey C. Cagniart, E. Boyer, S. Ilic, “Iterative Mesh Deformation for Dense Surface Tracking”, 3DIM workshop on ICCV09, Kyoto, Japan, 2009 Intro: Tracking and Detection in Computer Vision Ilic Slobodan Entertainment applications Videos above are courtesy of Prof. A. Hilton, Univ. of Surrey C. Cagniart, E. Boyer, S. Ilic, “Iterative Mesh Deformation for Dense Surface Tracking”, 3DIM workshop on ICCV09, Kyoto, Japan, 2009 Intro: Tracking and Detection in Computer Vision Ilic Slobodan Medical applications Intro: Tracking and Detection in Computer Vision Ilic Slobodan Medical applications Intro: Tracking and Detection in Computer Vision Ilic Slobodan More applications VIdeos are courtesy of EPFL, Alinghi and Hydroptere companies Intro: Tracking and Detection in Computer Vision Ilic Slobodan What is detection? • • • Detection means finding the object of interest and providing its position in the image. • • no assumption of the system dynamics the response is not based on temporal consistency The applications of detection are various: machine vision and quality control, surveillance, robotics etc. The source of information is a single image Intro: Tracking and Detection in Computer Vision Ilic Slobodan Toy example M. Ozuysal, M. Calonder, V. Lepetit and P. Fua, Fast Keypoint Recognition using Random Ferns, accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009. Intro: Tracking and Detection in Computer Vision Ilic Slobodan 3D object detection S. Hinterstoisser, S. Benhimane, N. Navab, “N3M: Natural 3D Markers for Real‐Time Object Detection and Pose Estimation”, IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, October 14‐20, 2007 Intro: Tracking and Detection in Computer Vision Ilic Slobodan Face detection Intro: Tracking and Detection in Computer Vision Ilic Slobodan Deformable object detection J. Pilet, V. Lepetit, and P. Fua, Fast Non‐Rigid Surface Detection, Registration and Realistic Augmentation, International Journal of Computer Vision, Vol. 76, Nr. 2, February 2008. Intro: Tracking and Detection in Computer Vision Ilic Slobodan People detection M. Dimitrijevic, V. Lepetit and P. Fua, Human Body Pose Detection Using Bayesian Spatio‐Temporal Templates, Computer Vision and Image Understanding, Vol. 104, Nr. 2, pp. 127 ‐ 139, December 2006 A. Fossati, M. Dimitrijevic, V. Lepetit and P. Fua, Bridging the Gap between Detection and Tracking for 3D Monocular Video‐Based Motion Capture, Conference on Computer Vision and Pattern Recognition, Minneapolis, MI, June 2007 Intro: Tracking and Detection in Computer Vision Ilic Slobodan Detection in Machine Vision A. Hocauser, Carsten Steger, N. Navab, Harmonic deformation model for edge based template matching, International Conference on Computer Vision Theory and Applications, Funchal, Portugal, January 2008. Intro: Tracking and Detection in Computer Vision Ilic Slobodan Detecting objects in range data Ajmal S. Mian, M. Bennamoun and R. Owens, "3D Model‐based Object Recognition and Segmentation in Cluttered Scenes", to appear in IEEE Transactions in Pattern Analysis and Machine Intelligence (PAMI), 2006 Intro: Tracking and Detection in Computer Vision Ilic Slobodan Patch detection S. Hinterstoisser, O. Kutter, N. Navab, P. Fua, V. Lepetit, Real‐Time Learning of Accurate Patch Rectification (Oral presentation), IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Miami, Florida (USA), June 2009 Intro: Tracking and Detection in Computer Vision Ilic Slobodan Contour Detection S. Holzer , S. Hinterstoisser, S. Ilic, N. Navab, Distance Transform Templates for Object Detection and Pose Estimation, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Miami, Florida (USA), June 2009. Intro: Tracking and Detection in Computer Vision Ilic Slobodan Overview of the course • Introduction • Filtering and edge detection: Convolution; Gaussians; Image derivatives; Edge detection; Canny edge detector • Local invariant feature detectors: • Corner detection: Harris corner detector; Scale space; HarrisLaplace; Harris-Affine; EBR(Edge based regions) • Region detectors: MSER(Maxumal Stable Extremal Regions); IBR(Image Based Regions) • • Blob detectors: Hessian; Hessian Laplace/Affine; Feature descriptors: SIFT, SURF, HoG • Feature point recognition: Randomized Trees, FERNS,Keypoint Signatures • Advanced methods: Panter, Lepard, Gepard and DTT Intro: Tracking and Detection in Computer Vision Ilic Slobodan Overview of the course • Camera models and projections; Model based tracking; Pose estimation form 2D-3D correspondences(DLT, PnP,POSIT); Rotation parameterization; Non-linear optimization; • • Robust estimators; RANSAC. Active appearance models • • • Kalman filtering, Particle filtering, CONDENSATION Alg. Template matching: Similarity measures; KLT tracker; Juri-Dhome algorithm; ESM Visual SLAM, PTAM Haar features; Integral images; Ada-Boost;Viola-Jones face detection. Intro: Tracking and Detection in Computer Vision Ilic Slobodan Exam, exercises and homeworks • • • • • • Final exam 100pts (50pts to pass) Mid-term exam (20pts) + Home works (20pts) Total: 140pts (100pts = 1.0!!!) Lectures: Mondays from 10am-11:30am at MI 03.13.010 Exercises: Wednesdays 10am-11:30am at MI 03.13.008 • mainly practical on the computer and will serve to explain you given homework tasks from theoretical and practical point of views. • • check your previous home work you can start doing your home work on the exercises class and ask questions Home works: will be check individually during the exercises! Intro: Tracking and Detection in Computer Vision Ilic Slobodan