
Computer Vision, Lecture 7: Classifiers
Oleh Tretiak © 2005

This Lecture
• Bayesian decision theory (22.1, 22.2)
  – General theory
  – Gaussian distributions
  – Nearest neighbor classifier
  – Histogram method
• Feature selection (22.3)
  – Principal component analysis
  – Canonical variables
• Neural networks (22.4)
  – Structure
  – Error criterion
  – Gradient descent
• Support vector machines (SVM) (22.5)

Motivation
• Task: find faces in an image
• Method:
  – While (image not explored)
    • Obtain the pixels from a rectangular ROI in the image (call this set of pixels x)
    • Classify according to the set of values

Example of Result
• Figure 22-5 from the text. The classifier has found the faces in the image and marked their locations with polygonal figures.

Conceptual Framework
• Assume that there are two classes, 1 and 2
• We know p(1|x) and p(2|x)
• Intuitive classification: see the figure
• [Figure: "Components of a mixture" — the class-conditional densities p1 and p2 plotted against x, with the decision threshold at their crossing]

Decision Theory
• What are the consequences of an incorrect decision?
• Bayes method
  – Assume a loss (cost) L(1→2) if we assign an object of class 1 to category 2, and L(2→1) if we assign an object of class 2 to category 1.
• To minimize the average loss we decide as follows:
  – Choose 1 if L(2→1)P(2|x) < L(1→2)P(1|x), i.e., when deciding 1 has the smaller expected loss
  – Choose 2 otherwise

Experimental Approaches
• We are given a training (learning) sample (xi, yi) of data vectors xi and their classes yi, and we need to construct the classifier
• Estimate p(1|x) and p(2|x) and build the classifier
  – Parametric method
  – Nearest neighbor method
  – Histogram method
• Or use a classifier of specified structure and adjust its parameters from the training sample.

Example: Gaussian Classifier
• Assume we have data vectors $x_{k,i}$ for $i = 1, 2$. The prior probabilities $\Pr(i)$ are known. The Bayes loss is equal for both categories.
• Estimate the means and covariances:
  $\mu_i = \frac{1}{N_i}\sum_{k=1}^{N_i} x_{k,i}$,  $\Sigma_i = \frac{1}{N_i - 1}\sum_{k=1}^{N_i} (x_{k,i} - \mu_i)(x_{k,i} - \mu_i)^T$
• Classifier: given an unknown $x$, compute
  $g(i \mid x) = (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - 2\ln \Pr(i) + \ln |\Sigma_i|$
  – Choose the class with the lower value of $g(i \mid x)$ (a code sketch of this rule follows after this group of slides)

Example: Nearest Neighbor
• Assume we have data vectors $x_{k,i}$ for $i = 1, 2$.
• Classifier: given an unknown $x$,
  – Find $d_i = \min_k \|x - x_{k,i}\|$ for $i = 1, 2$
  – Choose the $i$ with the smaller value of $d_i$

Example
(figure)

Histogram Method
• Assume we have data vectors $x_{k,i}$ for $i = 1, 2$. The Bayes loss is equal for both categories.
• Divide the input space into $J$ ‘bins’, $J < N_1, N_2$
• Find $h_{i,j}$, the number of training vectors of category $i$ in bin $j$.
• Given an unknown $x$: find its bin $j$ and decide according to which $h_{i,j}$ is higher (the nearest-neighbor and histogram rules are also sketched in code below).

Curse of Dimension
• If the number of measurements in x is very large, we are in trouble:
  – The covariance matrix $\Sigma$ will be singular (it is not possible to find $\Sigma^{-1}$, and $\ln|\Sigma| = -\infty$). The Gaussian method does not work.
  – It is hard to divide the space of x into bins. The histogram method does not work.
• Feature extraction?
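To make the Gaussian (quadratic discriminant) classifier slide concrete, here is a minimal numpy sketch of the discriminant g(i|x) given above; the function names, the toy data arrays X1 and X2, and the equal priors are illustrative assumptions, not material from the lecture or the textbook.

```python
# Minimal sketch of the two-class Gaussian (quadratic discriminant) classifier.
# Toy data and equal priors are made up for illustration only.
import numpy as np

def fit_gaussian(X):
    """Estimate mean and covariance from the rows of X (one training vector per row)."""
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)          # uses the 1/(N-1) normalization from the slide
    return mu, Sigma

def g(x, mu, Sigma, prior):
    """Discriminant g(i|x) = (x-mu)^T Sigma^{-1} (x-mu) - 2 ln Pr(i) + ln |Sigma|."""
    d = x - mu
    return d @ np.linalg.solve(Sigma, d) - 2.0 * np.log(prior) + np.log(np.linalg.det(Sigma))

def classify(x, params, priors):
    """Return the class (1 or 2) with the lower discriminant value."""
    scores = [g(x, mu, Sigma, p) for (mu, Sigma), p in zip(params, priors)]
    return 1 + int(np.argmin(scores))

# Toy usage with made-up 2-D data:
rng = np.random.default_rng(0)
X1 = rng.normal(loc=0.0, scale=1.0, size=(50, 2))   # class 1 samples
X2 = rng.normal(loc=2.0, scale=1.0, size=(50, 2))   # class 2 samples
params = [fit_gaussian(X1), fit_gaussian(X2)]
print(classify(np.array([1.8, 2.1]), params, priors=[0.5, 0.5]))   # likely prints 2
```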
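The nearest-neighbor and histogram rules from the slides above can be sketched the same way; the bin grid, the helper names, and the toy data below are illustrative choices, not the lecture's own code.

```python
# Minimal sketch of the nearest-neighbor rule and the histogram method.
# The bin edges and toy data are assumptions made for illustration.
import numpy as np

def nearest_neighbor_class(x, X1, X2):
    """d_i = min_k ||x - x_{k,i}||; choose the class with the smaller distance."""
    d1 = np.min(np.linalg.norm(X1 - x, axis=1))
    d2 = np.min(np.linalg.norm(X2 - x, axis=1))
    return 1 if d1 < d2 else 2

def histogram_class(x, X1, X2, edges):
    """Assign x to the class with more training vectors in x's bin.
    `edges` is one array of bin edges per coordinate."""
    def bin_index(v):
        return tuple(int(np.searchsorted(e, vi)) for e, vi in zip(edges, v))
    counts = {}
    for i, X in ((1, X1), (2, X2)):
        for row in X:
            counts.setdefault(bin_index(row), [0, 0])[i - 1] += 1
    h1, h2 = counts.get(bin_index(x), [0, 0])
    return 1 if h1 >= h2 else 2

# Toy usage with made-up 2-D data:
rng = np.random.default_rng(2)
X1 = rng.normal(0.0, 1.0, size=(30, 2))
X2 = rng.normal(2.0, 1.0, size=(30, 2))
edges = [np.linspace(-3, 5, 9)] * 2          # an 8x8 grid of bins (J much smaller than N1, N2)
x = np.array([1.9, 2.2])
print(nearest_neighbor_class(x, X1, X2), histogram_class(x, X1, X2, edges))
```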
Principal Component Analysis
(figure)

Principal Component Analysis
• Formulas:
  $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$,  $\Sigma = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T$
• Find $\lambda_i, v_i$, the eigenvalues and eigenvectors of $\Sigma$
• Choose $x \cdot v_i$ for the large eigenvalues as features (see the code sketch at the end of these notes)

Issues
• Data may not be linearly separable
• Principal components may not be appropriate

Canonical Transformation
• For two categories: the Fisher linear discriminant function
  $\mu_i = \frac{1}{n_i}\sum_{k=1}^{n_i} x_{i,k}$,  $\Sigma = \frac{1}{n-1}\sum_{i=1}^{2}\sum_{k=1}^{n_i} (x_{i,k} - \mu_i)(x_{i,k} - \mu_i)^T$
  $v = \Sigma^{-1}(\mu_2 - \mu_1)$
• Use $v^T x$ as the only feature.
• See the textbook for the general formula

Neural Networks
• Nonlinear classifiers
• Parameters are found by an iterative (slow) training method
• Can be very effective

Two-Layer Neural Network
(figure)

Curse of Dimension
• A high-order neural network has limited ability to generalize (and can be very slow to train)
• Problem structure can improve performance

Useful Preprocessing
• Brightness/contrast equalization
• Position and size adjustment
• Angle adjustment

Architecture for Face Search
(figure)

Architecture for Character Recognition
(figure)

Support Vector Machines (SVM)
• General nonlinear (and linear) classifier
• Structure similar to the perceptron
• Training based on sophisticated optimization theory
• Currently the ‘Mercedes-Benz’ of classifiers.

Summary
• Goal: find an object in an image
• Method: look at windows and classify (template search with classifiers)
• Issues:
  – Classifier design
  – Training
  – Curse of dimension
  – Generalization

Template Connections
(figure)
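As a companion to the Principal Component Analysis and Canonical Transformation slides above, here is a minimal numpy sketch of PCA feature extraction and the two-class Fisher direction. The data, the number of retained features, and the exact normalization of the pooled covariance are assumptions (the slide's 1/(n−1) is used; 1/(n−2) is also common, and the choice only rescales v, not the discriminant direction).

```python
# Minimal sketch of PCA feature extraction and the two-class Fisher direction.
# Toy data and dimensions are made up for illustration only.
import numpy as np

def pca_features(X, num_features):
    """Project rows of X onto the eigenvectors of the sample covariance
    with the largest eigenvalues (the principal components)."""
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)                   # 1/(n-1) normalization
    eigvals, eigvecs = np.linalg.eigh(Sigma)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_features]  # keep the largest ones
    V = eigvecs[:, order]                             # one eigenvector per column
    return (X - mu) @ V                               # x . v_i as features

def fisher_direction(X1, X2):
    """v = Sigma^{-1}(mu2 - mu1), with the within-class scatter pooled over both classes."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - mu1).T @ (X1 - mu1)
    S2 = (X2 - mu2).T @ (X2 - mu2)
    Sigma = (S1 + S2) / (len(X1) + len(X2) - 1)       # n-1 as on the slide; n-2 is also common
    return np.linalg.solve(Sigma, mu2 - mu1)

# Toy usage with made-up 5-D data:
rng = np.random.default_rng(1)
X1 = rng.normal(0.0, 1.0, size=(40, 5))
X2 = rng.normal(1.0, 1.0, size=(40, 5))
X = np.vstack([X1, X2])
features = pca_features(X, num_features=2)    # reduced-dimension features
v = fisher_direction(X1, X2)
projected = X @ v                             # v^T x as the single feature
print(features.shape, projected.shape)        # (80, 2) (80,)
```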