Methods of Eye and Face Detection
Devin Karns
EGGN 512
5/1/13

Eye and Face Detection
- Widely used in many applications: biometrics, visual surveillance, human-robot interaction
- Eyes carry the most essential physical information of the face
- Closely connected to the positions of the other facial features
- Other facial characteristics can be determined from the relative eye orientations

Methods
- Segmentation texture
  - Divide the image into NxN blocks; get block textures from the FFT
  - Derive facial features from projections of the texture major axes
- Dynamic time warping
  - Feature vector is a waveform composed of horizontal and vertical projections of an image
  - Template feature vector compared against image subsections; scores accumulated and thresholded
- Local gradient patterns
  - Pixel neighborhoods of the image gradient converted to 8-bit codes
  - Lookup table of AdaBoost pixel classifiers developed from a massive database of face and non-face images
  - Image pixel codes weighted via the lookup table as either face or non-face

Segmentation Texture – FFT
- Get the facial symmetry axis from the inertia matrix of the edge image
- Divide the image into blocks, with partial overlap in sampling
- Take the FFT of each block and threshold it

Segmentation Texture – Face Region
- Calculate Emajor from the block eigenvalues and eigenvectors
- Perform an elliptical Hough transform on the binarized Emajor distribution
- The projected ellipse encloses the face region

Segmentation Texture – Eye Position
- Calculate the binarized FFT block white-to-total pixel ratio (tau)
- The maximum of the projection of tau along rows (within the face region) dictates the eye row
- Horizontal positions are determined from local maxima in the column projection of tau about the symmetry axis

Segmentation Texture – Performance
- Overall detection rate: ~50%
- Proper detection: ~28%
- Partial detection: ~42%
- False detection: ~30%

Segmentation – Conclusions
Pros
- No database or template required; fast (~1-3 sec/image)
- Can account for some head tilt fairly well
- Usually finds at least one eye
Cons
- If the face symmetry axis is not correctly found,
eye orientations will be skewed
- Emajor does not always sufficiently outline the head, which can lead to face-region mismatching
- Fooled by glasses, dimples, moustaches

Dynamic Time Warping – Feature Vector
- Measures the similarity between two sequences
- Image-region row and column projections are weighted by a triangle function to emphasize the nose bridge and eye regions
- The projections are concatenated to form a feature vector

Dynamic Time Warping – Eye Location
- The warping path determines the minimum distance between vectors
- Minimum distances are accumulated over the image at each pixel
- The minimum region of accumulated distances determines potential eye locations

Dynamic Time Warping – Process
- Obtain an edge image using a Sobel filter
- At every pixel, sample an NxM region (where NxM is the template size) and convert it to a feature vector
- Compare the template feature vectors to the image feature vectors
- Accumulate minimum distances and threshold to determine eye locations

Dynamic Time Warping – Performance
- Overall detection rate: ~21.37%
- Proper detection: ~10%
- Partial detection: ~28.5%
- False detection: ~61.5%

Dynamic Time Warping – Conclusions
Pros
- Only requires one or more templates
- Less sensitive to different head poses if enough templates are used
- With proper thresholding and weighting, will usually find eye rows properly
Cons
- If the eyes are shaded, they will most likely be missed
- Increasing the number and sizes of templates drastically decreases performance
- Frequently mistakes chins, necks, eyebrows, facial hair, cheeks, noses, teeth, foreheads, scalps, glasses, shadows, and the background for eyes
- Generalized thresholding is difficult

Local Gradient Pattern – Pixel Codes
- Uses a small kernel to summarize the local structure of an image
- Samples surrounding pixel intensities with bilinear interpolation
- Surrounding-to-center intensity deltas are computed for the local-structure pixels
- Deltas are thresholded and read clockwise to form a binary code

Local Gradient Pattern – AdaBoost Learning
- Defines weighting lookup tables based on collections of face and
non-face images
- The lookup table is a 3D matrix (NxNx256) of classifiers that defines a weight for every pixel of an NxN region at every 8-bit value (0-255)
- Weights are learned through iterative searches for feature-point intensities that are common across known face and non-face images
- Image regions are scored by scanning the lookup table over the LGP-coded image to obtain strong-classifier values
- Strong-classifier maxima determine face regions

Local Gradient Pattern – Procedure

Local Gradient Pattern – Conclusions
- Could not get this to work
  - Lack of database images? (~4,400 here vs. >1,000,000 in the paper)
  - Lack of variety of faces? (only 40 different people)
  - Maybe I didn't do it correctly?
Pros
- Could theoretically find faces efficiently at multiple scales, invariant to intensity
Cons
- With larger databases, AdaBoost learning time can take days on fast systems
- Significant processing time on normal machines
- Most likely was not coded optimally

Questions?
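The eye-row search from the Segmentation Texture slides can be sketched as below. This is a minimal illustration, not the full method: `block_tau` and `eye_row` are hypothetical names, the blocks are non-overlapping rather than partially overlapped, and the symmetry-axis and face-region steps are omitted.

```python
import numpy as np

def block_tau(image, block=8, thresh=0.5):
    """Per-block white-to-total pixel ratio (tau) of the thresholded FFT magnitude.

    Blocks are taken without overlap here for simplicity; the method described
    in the slides samples blocks with partial overlap.
    """
    h, w = image.shape
    rows, cols = h // block, w // block
    tau = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            patch = image[r * block:(r + 1) * block, c * block:(c + 1) * block]
            mag = np.abs(np.fft.fft2(patch))
            mag /= mag.max() + 1e-9            # normalize so the threshold is relative
            tau[r, c] = np.mean(mag > thresh)  # fraction of "white" pixels after thresholding
    return tau

def eye_row(tau):
    """Block row with the strongest texture response: the max of the row projection."""
    return int(np.argmax(tau.sum(axis=1)))
```

Textured block rows (high-frequency content such as eyes and eyebrows) produce more above-threshold FFT pixels, so the row projection of tau peaks there.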
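The warping-path distance from the Dynamic Time Warping slides can be sketched with the standard DTW recurrence; the triangle weighting uses NumPy's Bartlett window. The function names are placeholders, and the accumulation/thresholding over the whole image is left out.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D sequences.

    The warping path lets locally stretched or shifted profiles (e.g. row
    projections of slightly different eye regions) still match closely.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def feature_vector(region):
    """Concatenated row and column projections, weighted by a triangle window
    to emphasize the centre of the region (nose bridge / eye band)."""
    rows = region.sum(axis=1) * np.bartlett(region.shape[0])
    cols = region.sum(axis=0) * np.bartlett(region.shape[1])
    return np.concatenate([rows, cols])
```

Because the warping path may repeat elements, a locally stretched copy of a sequence (e.g. `[1, 2, 2, 3]` against `[1, 2, 3]`) still yields a distance of zero.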
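The 8-bit code construction from the Local Gradient Pattern – Pixel Codes slide can be sketched as follows. This is an assumption-laden simplification: the 8 neighbours are read directly from the 3x3 grid (the bilinear interpolation of sampling points mentioned in the slides is omitted), and the threshold used here is the mean of the deltas.

```python
import numpy as np

# Clockwise offsets of the 8 neighbours, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lgp_code(image, y, x):
    """8-bit local-gradient-pattern code for the pixel at (y, x).

    Each neighbour's absolute intensity difference from the centre is compared
    against the mean difference; the resulting ones and zeros are read
    clockwise into a single byte.
    """
    c = float(image[y, x])  # cast avoids uint8 wraparound in the subtraction
    deltas = [abs(float(image[y + dy, x + dx]) - c) for dy, dx in OFFSETS]
    mean = sum(deltas) / 8.0
    code = 0
    for d in deltas:
        code = (code << 1) | (1 if d > mean else 0)
    return code
```

Because the threshold adapts to the local mean delta, the code responds to the pattern of gradients rather than their absolute magnitude, which is what makes the descriptor largely intensity-invariant.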
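The scoring step from the AdaBoost Learning slides — scanning the (NxNx256) weight lookup table over the LGP-coded image — can be sketched as below. The learning of the table itself is not shown, and `score_regions` is an illustrative name, not code from the talk.

```python
import numpy as np

def score_regions(codes, lut):
    """Slide an (N, N, 256) weight lookup table over an LGP-coded image.

    codes: 2-D array of 8-bit LGP codes; lut[i, j, v] is the learned weight
    for code value v at position (i, j) of the window. A window's
    strong-classifier score is the sum of its per-pixel weights; maxima
    indicate likely face regions.
    """
    n = lut.shape[0]
    h, w = codes.shape
    scores = np.zeros((h - n + 1, w - n + 1))
    ii, jj = np.indices((n, n))  # per-position indices into the lookup table
    for y in range(h - n + 1):
        for x in range(w - n + 1):
            window = codes[y:y + n, x:x + n]
            scores[y, x] = lut[ii, jj, window].sum()
    return scores
```

The advanced-indexing expression `lut[ii, jj, window]` fetches, in one step, the weight of each window pixel's code at its own position, so each window is scored with a single vectorized lookup.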