Introduction to Computer Vision
Olac Fuentes
Computer Science Department
University of Texas at El Paso
El Paso, TX, U.S.A.

What is Computer Vision?
Computer Vision is the process of extracting knowledge about the world from one or more digital images.

Digital Images
Digital images are 2D arrays (matrices) of numbers. Color images are formed with three 2D arrays, representing the Red, Green and Blue components of the image.

Computer Vision – Main Tasks
• Model generation
• Object recognition
• Object detection
• Tracking

Computer Vision – Object Detection
• Detecting faces
• Detecting pedestrians
• Detecting cars

Computer Vision – Object Detection: How to do it?
Idea: use machine learning.
Training:
• Training set:
    • Positive examples are images of objects that belong to the class of interest
    • Negative examples are images of objects that don't belong to that class
• Train a classifier using the training set
Detection:
• Given an image to analyze, apply the classifier to every subimage (there are lots of them, so a low false positive rate is important!)
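The detection step above can be sketched as a sliding window. This is a minimal sketch assuming a grayscale image stored as a list of rows (a 2D array of intensities); `subimages`, `detect` and the brightness test are hypothetical names, and the brightness test only stands in for a trained classifier.

```python
from typing import Callable, Iterator, List, Tuple

# A grayscale image is a 2D array (list of rows) of pixel intensities.
Image = List[List[int]]


def subimages(img: Image, win_h: int, win_w: int, step: int = 1) -> Iterator[Tuple[int, int, Image]]:
    """Yield (row, col, window) for every win_h x win_w subimage, moving `step` pixels at a time."""
    rows, cols = len(img), len(img[0])
    for r in range(0, rows - win_h + 1, step):
        for c in range(0, cols - win_w + 1, step):
            window = [row[c:c + win_w] for row in img[r:r + win_h]]
            yield r, c, window


def detect(img: Image, classifier: Callable[[Image], bool],
           win_h: int, win_w: int, step: int = 1) -> List[Tuple[int, int]]:
    """Return the top-left corners of all windows that the classifier labels positive."""
    return [(r, c) for r, c, win in subimages(img, win_h, win_w, step) if classifier(win)]


if __name__ == "__main__":
    # Toy 6x8 image containing a bright 2x2 "object" at rows 2-3, columns 5-6.
    img = [[0] * 8 for _ in range(6)]
    for r in (2, 3):
        for c in (5, 6):
            img[r][c] = 255

    # Hypothetical stand-in for a trained classifier: "positive" if the window is bright on average.
    def is_bright(win: Image) -> bool:
        return sum(map(sum, win)) / (len(win) * len(win[0])) > 200

    print(detect(img, is_bright, 2, 2))          # [(2, 5)]
    print(sum(1 for _ in subimages(img, 2, 2)))  # 35 windows = (6-2+1) * (8-2+1)
```

Even at a single scale the number of windows grows quickly: a 640×480 image scanned with a 24×24 window and a step of one pixel yields 617 × 457 = 281,969 windows, so even a small false positive rate produces many false detections.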
Face Detection – Training Images

Efficient Object Detection (Viola & Jones, 2005)
Idea #1: Classifier structure
• Build a cascade of classifiers, where stage i is simpler (and faster) than stage i+1
Idea #2: Features
• Use a large number of very simple features
Idea #3: Feature computation
• Compute the features very efficiently using the integral image
Idea #4: Dealing with multiple scales
• Obvious solution: build a detector for each possible scale
• Better idea: build a detector for a single scale and, during detection, scale the image

Efficient Object Detection – The Modified Census Transform (Fröba and Ernst, 2004)
• Used local intensity descriptors as features
• Used simple voting classifiers and AdaBoost to build a cascade of classifiers

Efficient Object Detection – Histograms of Oriented Gradients (Dalal, 2005)
• Used histograms of oriented gradients as features
• Used a Support Vector Machine as the classifier
• Best results at the time

Object Recognition
[Figure: training images labeled Owl, Duck, Toucan and Egret; test images whose labels (??) must be predicted]

Object Recognition – Face Recognition
Eigenfaces are a set of "standardized face ingredients", derived from statistical analysis of many pictures of faces.
[Figure: the first four eigenfaces from the AT&T database]

Eigenfaces
• One person's face might be made up of 10% of eigenface 1, 24% of eigenface 2, and so on.
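Idea #3 of the Viola & Jones detector (the integral image) can be sketched as follows. In this minimal sketch, each entry of the integral image holds the sum of all pixels above and to the left, so the sum over any rectangle costs four lookups regardless of its size. The single two-rectangle "left half minus right half" feature is only one illustrative Haar-like feature; the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> List[List[int]]:
    """Return ii with an extra zero row and column, where ii[r][c] = sum of img[y][x] for y < r, x < c."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0  # running sum of img[r][0..c]
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], top: int, left: int, height: int, width: int) -> int:
    """Sum of the pixels in a rectangle from four lookups, whatever its size."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii: List[List[int]], top: int, left: int, height: int, width: int) -> int:
    """Haar-like feature: sum of the left half minus sum of the right half (width must be even)."""
    half = width // 2
    return rect_sum(ii, top, left, height, half) - rect_sum(ii, top, left + half, height, half)
```

For example, with `img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, `rect_sum(integral_image(img), 1, 1, 2, 2)` returns 5 + 6 + 8 + 9 = 28. The integral image is built once per image, after which every feature at every position costs a constant number of operations.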
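Idea #1 (the cascade) is what makes scanning every subimage affordable: most windows are rejected by the first, cheapest stages. The sketch below assumes each stage is a (score function, threshold) pair; the two toy stages are hypothetical placeholders, whereas a real cascade trains a boosted classifier over features for each stage.

```python
from typing import Callable, List, Sequence, Tuple

Window = List[List[int]]
# Each stage is (scoring function, threshold); a window passes the stage if score >= threshold.
Stage = Tuple[Callable[[Window], float], float]


def cascade_classify(window: Window, stages: Sequence[Stage]) -> bool:
    """Run the stages in order (cheapest first) and reject as soon as any stage fails."""
    for score, threshold in stages:
        if score(window) < threshold:
            return False  # early rejection: the remaining, costlier stages never run
    return True  # only windows that pass every stage are reported as detections


# Hypothetical toy stages, for illustration only.
def mean_intensity(w: Window) -> float:
    return sum(map(sum, w)) / (len(w) * len(w[0]))


def contrast(w: Window) -> float:
    flat = [p for row in w for p in row]
    return max(flat) - min(flat)


TOY_STAGES: List[Stage] = [(mean_intensity, 50.0), (contrast, 100.0)]
```

With these toy stages, `cascade_classify([[200, 60], [60, 60]], TOY_STAGES)` passes both stages (mean 95, contrast 140), while an all-zero window is rejected by the first stage without evaluating the second.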
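The "better idea" in Idea #4 (one detector at a single scale, applied to rescaled copies of the image) amounts to an image pyramid. This is a minimal sketch using nearest-neighbour resizing; the shrink factor of 1.25 and the function names are assumptions chosen for illustration.

```python
from typing import Iterator, List, Tuple

Image = List[List[int]]


def downscale(img: Image, factor: float) -> Image:
    """Nearest-neighbour resize to about 1/factor of the original size (factor >= 1)."""
    h, w = len(img), len(img[0])
    new_h, new_w = int(h / factor), int(w / factor)
    return [[img[int(r * factor)][int(c * factor)] for c in range(new_w)] for r in range(new_h)]


def pyramid(img: Image, win_h: int, win_w: int, factor: float = 1.25) -> Iterator[Tuple[float, Image]]:
    """Yield (scale, resized image) pairs, shrinking until the detector window no longer fits."""
    scale, cur = 1.0, img
    while len(cur) >= win_h and len(cur[0]) >= win_w:
        yield scale, cur
        scale *= factor
        cur = downscale(img, scale)  # always resample the original to avoid compounding error
```

Each level would be scanned with the same fixed-size window; a detection at (r, c) in the level with scale s corresponds to a window of roughly (s·win_h) × (s·win_w) pixels with its top-left corner near (s·r, s·c) in the original image.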
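The local intensity descriptor of the Modified Census Transform compares every pixel of a 3×3 neighbourhood, including the centre, with the neighbourhood's mean intensity, giving a 9-bit code per pixel. The sketch below assumes row-major bit order (top-left pixel as the most significant bit) and skips border pixels; both choices are illustrative assumptions.

```python
from typing import List

Image = List[List[int]]


def mct(img: Image) -> List[List[int]]:
    """Return the 9-bit modified census code of every interior pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # 3x3 neighbourhood in row-major order, centre pixel included.
            patch = [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            mean = sum(patch) / 9.0
            code = 0
            for p in patch:
                code = (code << 1) | (1 if p > mean else 0)  # bit is 1 if pixel is above the local mean
            out[r - 1][c - 1] = code
    return out
```

Because each bit only records whether a pixel is above the local mean, the code is unchanged when the patch brightness is scaled by a positive factor or shifted by a constant: `mct([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` and `mct([[10, 20, 30], [40, 50, 60], [70, 80, 90]])` both give `[[15]]`.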
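A histogram of oriented gradients can be sketched as follows: each pixel votes for its gradient orientation bin, weighted by gradient magnitude, and the per-cell histograms are concatenated into a feature vector for the SVM (not shown). This is a simplified sketch under stated assumptions: 8×8 cells, 9 unsigned orientation bins, hard bin assignment, and a single global L2 normalisation instead of the overlapping block normalisation used in full HOG.

```python
import math
from typing import List

Image = List[List[float]]


def cell_histogram(img: Image, top: int, left: int, size: int = 8, bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees) in one cell."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(top, top + size):
        for c in range(left, left + size):
            # Centred [-1, 0, 1] differences, clamped at the image border.
            gx = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]
            gy = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % bins] += magnitude  # hard assignment to one bin
    return hist


def hog_descriptor(img: Image, cell: int = 8, bins: int = 9) -> List[float]:
    """Concatenate the histograms of all non-overlapping cells, then L2-normalise once."""
    h, w = len(img), len(img[0])
    features: List[float] = []
    for top in range(0, h - cell + 1, cell):
        for left in range(0, w - cell + 1, cell):
            features.extend(cell_histogram(img, top, left, cell, bins))
    norm = math.sqrt(sum(v * v for v in features)) or 1.0  # avoid dividing by zero on flat images
    return [v / norm for v in features]
```

For an 8×8 image whose left half is 0 and right half is 100, all gradient energy falls into the 0° bin; a horizontal edge instead lands in the 80°-100° bin (index 4).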
• Very few eigenvector terms are needed to give a fair likeness of most people's faces.
• Eigenfaces provide a means of applying data compression to faces for identification purposes.

Eigenfaces
• Let E1, ..., En be the eigenfaces obtained from a face database.
• Let F1, ..., Fm be the images in our training/testing sets. (For the training images we also know the person's identity.)
• The attributes of Fi are the sums of the pixel-by-pixel products of Fi with each of E1, ..., En; that is, Fi is represented by n numbers: [Fi·E1, Fi·E2, ..., Fi·En]
• Using the attribute vectors and the class information, we can now construct a classifier.

Tracking
Continuous detection of objects of interest in video streams.

Reconstruction
Build a 3D model of the world from 2D images.
Most common approach: stereo vision
• Inspired by human 3D perception
• Use two cameras of known geometry
• Take images
• Find correspondences
• Reconstruct using correspondences and known geometry

Problems with stereo vision:
• Finding matches reliably is difficult
• Calibration is difficult
• It is hard to deal with featureless areas
• It is computationally expensive

Reconstruction
Microsoft to the rescue! Seriously!

Reconstruction – Microsoft Kinect
Reconstruction using active illumination:
• Project a known pattern of light at an invisible wavelength
• Learn the appearance of that pattern at different distances
• Fast and easy
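Returning to the eigenface recognizer: the attribute vector [Fi·E1, ..., Fi·En] and a classifier built on it can be sketched as below. The sketch assumes the eigenfaces E1, ..., En are already given (computing them by PCA is not shown) and uses a 1-nearest-neighbour classifier as one simple choice; the class and function names are hypothetical. Many eigenface implementations also subtract the average face before projecting; this sketch follows the formula as stated above and does not.

```python
import math
from typing import List, Sequence

Image = List[List[float]]
Vector = List[float]


def flatten(img: Image) -> Vector:
    return [float(p) for row in img for p in row]


def attributes(face: Image, eigenfaces: Sequence[Image]) -> Vector:
    """[F·E1, ..., F·En]: sum of pixel-by-pixel products of the face with each eigenface."""
    f = flatten(face)
    return [sum(a * b for a, b in zip(f, flatten(e))) for e in eigenfaces]


class NearestNeighborFaceClassifier:
    """Label a face with the identity of the closest training face in attribute space."""

    def __init__(self, eigenfaces: Sequence[Image]):
        self.eigenfaces = list(eigenfaces)
        self.train_vectors: List[Vector] = []
        self.labels: List[str] = []

    def fit(self, faces: Sequence[Image], labels: Sequence[str]) -> "NearestNeighborFaceClassifier":
        self.train_vectors = [attributes(f, self.eigenfaces) for f in faces]
        self.labels = list(labels)
        return self

    def predict(self, face: Image) -> str:
        v = attributes(face, self.eigenfaces)
        dists = [math.dist(v, t) for t in self.train_vectors]  # Euclidean distance (Python 3.8+)
        return self.labels[dists.index(min(dists))]
```

Any classifier that accepts fixed-length numeric vectors could replace the nearest-neighbour rule; the key point is that each face is reduced from its full pixel count to just n numbers.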
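The stereo pipeline described above ("find correspondences, then reconstruct using correspondences and known geometry") can be sketched for a single scanline. This sketch assumes a rectified stereo pair, so matching points lie on the same image row, and finds correspondences by minimising the sum of absolute differences (SAD) over a small 1D window, which is one simple choice among many. With focal length f (in pixels) and baseline B between the cameras, a point with disparity d = x_left - x_right lies at depth Z = f·B/d. Function names and default parameters are hypothetical.

```python
import math
from typing import Sequence


def match_disparity(left_row: Sequence[float], right_row: Sequence[float],
                    x: int, half: int = 2, max_disp: int = 16) -> int:
    """Disparity of pixel x in the left scanline, found by minimising SAD over a 1D window.

    Assumes x + half < len(left_row), i.e. the window fits inside the scanline.
    """
    best_d, best_cost = 0, math.inf
    for d in range(max_disp + 1):
        if x - d - half < 0:
            break  # the shifted window would leave the right image
        cost = sum(abs(left_row[x + k] - right_row[x - d + k]) for k in range(-half, half + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d: float, focal_px: float, baseline_m: float) -> float:
    """Triangulated depth Z = f * B / d; zero disparity means the point is at infinity."""
    if d <= 0:
        return math.inf
    return focal_px * baseline_m / d
```

The sketch also shows the problems listed above: on a featureless (constant) scanline every disparity has the same cost, so the match is arbitrary, and the search costs O(max_disp × window) per pixel.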