Face Detection
(a MATLAB implementation of a face detection algorithm based on Viola-Jones’ approach)

MATTEO VENANZI bvenanzi@libero.it
LEONARDO BENUCCI dawhiterabbit@hotmail.com

Abstract. This paper describes the implementation, in a MATLAB framework, of an efficient face detection algorithm based on the Viola-Jones method, carried out as a project for the Computer Vision course. We implemented, step by step, the cascaded face detector based on the AdaBoost technique described in the article by Viola and Jones [1]. We then defined appropriate data structures and a suitable usage strategy, trying to strike a trade-off between computing speed and detection performance. Finally, we tested the detector on many instances while varying its configuration parameters, and the results show good performance in terms of detected faces and committed errors. In this discussion we show that the behaviour of our system is similar to the one described in Viola and Jones’ paper.

[1] This article is available at http://www.dis.uniroma1.it/~visiope/Progetto2008/Repository/integralImages.pdf

1. Introduction
This article describes a face detector implemented in MATLAB and based on Viola-Jones’ approach. Our system classifies images based on the values of particular kinds of Haar-like features, so the first step is the implementation of a function that can randomly generate sets of features. The second step is the implementation of the boosting algorithm: given a set of simple weak classifiers (each with low accuracy) evaluated on a training set of positive (face) and negative (non-face) examples, it builds, round by round, a strong classifier with high accuracy by combining the T best weak classifiers. The third step is the implementation of the cascade, an optimization designed to reduce the number of features to evaluate and thus to improve overall performance. The last step consists of testing the system on a set of examples and observing its behaviour in terms of detection rate and accuracy.

2. Features
Viola-Jones’ method is based on the notion of feature: a feature is a set of two or more adjacent rectangles. To classify an image we use three different kinds of features:
- 2-rectangle feature (direct and rotated by 90°)
- 3-rectangle feature (direct and rotated by 90°)
- 4-rectangle feature

The value of a feature is computed by subtracting the sum of the pixels within the white rectangle(s) from the sum of the pixels within the black one(s). We wrote a MATLAB function that generates the entire set of features on a generic image through the following procedure:
1. Build a feature of size 1x1 located at the top-left corner of the image.
2. Slide the feature over the image along all rows and columns.
3. Increase the width by one, place the enlarged feature at the top-left corner and repeat step 2, as long as the feature is contained in the image.
4. Increase the height by one and repeat from step 1 with the new height, as long as the feature is contained in the image.
5. Repeat this algorithm for all kinds of features.

Running this function, we obtained a set of about 160,000 features.

3. AdaBoost
AdaBoost is the boosting procedure proposed in Viola-Jones’ paper for the training step: it returns a strong classifier based on a weighted linear combination of T weak classifiers. In our implementation we work on a training set [2] composed of images of 59 faces and 49 non-faces; at each round we evaluate the weak classifiers [3], compute their weighted error and choose as best weak classifier the one with the minimum error. The best weak classifier is then evaluated on a validation set of 10 faces and 10 non-faces, to collect data for estimating its quality.

3.1 Accuracy
Figure 2 shows the accuracy of the best classifier as a function of the number of iterations: after a few rounds (10-15) the optimal accuracy is reached.
This behaviour is almost identical to the one described in the AdaBoost literature.

[2] Taken from http://www.mathworks.com/matlabcentral/fileexchange/
[3] The weak classifier learning module is available in the package ABdemo.rar at http://www.pudn.com/upload_log.asp?e=ncepu_ly*126.com&n=ncepu_ly#

Figure 2: Accuracy curve for the 5000-feature classifier over 25 rounds, showing the convergence of the algorithm. The best accuracy is already achieved after 15 iterations.

3.2 ROC curve
The ROC curve of our AdaBoost implementation (Figure 3) shows that the accuracy improves as the false positive rate increases, in agreement with the theoretical results described in the paper.

Figure 3: Receiver operating characteristic (ROC) for the 5000-feature classifier over 50 rounds.

The main difference between our chart and the one presented in the paper is that our detector reaches an accuracy of 95% at a false positive rate of 8 x 10^-3, whereas the detector in the paper already reaches a good accuracy at a false positive rate of 3 x 10^-3.

4. Cascade
The main innovation introduced by Viola and Jones in the face detection field is the cascade of classifiers. Learning a single classifier with AdaBoost on the entire set of features (160,000 on a 24x24 sub window) would require weeks; instead, each sub window is processed by a series of classifiers. The first levels contain simpler classifiers, used only to reject as many negative sub windows as possible; the subsequent levels contain more complex classifiers that detect the real faces among the candidate "faces" not filtered out by the previous levels. The training algorithm keeps adding levels to the cascade until the overall false positive rate reaches a target value F_target; each new level must satisfy a maximum false positive rate f and a minimum detection rate d, which are given as input (Figure 4).
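The path of a single sub window through the cascade can be sketched as follows. This is a minimal Python sketch, not the authors' MATLAB code, written under several assumptions: rectangle sums are computed with the integral image from Viola and Jones' paper, only a horizontal 2-rectangle feature is shown, weak classifiers use the threshold-and-polarity form from that paper, and the weak classifiers, their AdaBoost weights and the level thresholds are assumed to be already trained. All names (`integral_image`, `rect_sum`, `TwoRectFeature`, `WeakClassifier`, `StrongClassifier`, `cascade_predict`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical sketch (not the authors' MATLAB code) of evaluating one sub window.
Image = List[List[int]]


def integral_image(img: Image) -> List[List[int]]:
    """ii[y][x] = sum of img[r][c] for r < y and c < x (extra zero row and column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the pixels in the w x h rectangle with top-left corner (x, y): 4 lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


@dataclass
class TwoRectFeature:
    """Horizontal 2-rectangle feature: white w x h block on the left, black on the right."""
    x: int
    y: int
    w: int
    h: int

    def value(self, ii: List[List[int]]) -> int:
        white = rect_sum(ii, self.x, self.y, self.w, self.h)
        black = rect_sum(ii, self.x + self.w, self.y, self.w, self.h)
        return black - white  # sum under black minus sum under white (Section 2)


@dataclass
class WeakClassifier:
    """h(x) = 1 if polarity * f(x) < polarity * threshold, else 0."""
    feature: TwoRectFeature
    threshold: float
    polarity: int  # +1 or -1

    def predict(self, ii: List[List[int]]) -> int:
        f = self.feature.value(ii)
        return 1 if self.polarity * f < self.polarity * self.threshold else 0


@dataclass
class StrongClassifier:
    """One cascade level: AdaBoost-weighted vote of weak classifiers against a threshold."""
    weak: List[WeakClassifier]
    alphas: List[float]
    threshold: float  # assumed already tuned so that the level reaches detection rate d

    def predict(self, ii: List[List[int]]) -> bool:
        score = sum(a * h.predict(ii) for a, h in zip(self.alphas, self.weak))
        return score >= self.threshold


def cascade_predict(levels: Sequence[StrongClassifier], window: Image) -> bool:
    """Accept the sub window as a face only if every level returns True."""
    ii = integral_image(window)  # computed once, shared by all levels
    for level in levels:
        if not level.predict(ii):
            return False  # rejected: later (more complex) levels are never evaluated
    return True


if __name__ == "__main__":
    level = StrongClassifier(
        weak=[WeakClassifier(TwoRectFeature(0, 0, 2, 2), threshold=10, polarity=-1)],
        alphas=[1.0],
        threshold=0.5,
    )
    print(cascade_predict([level], [[0, 0, 9, 9], [0, 0, 9, 9]]))  # True
    print(cascade_predict([level], [[5, 5, 5, 5], [5, 5, 5, 5]]))  # False
```

Because a sub window is rejected at the first level that returns False, most negative sub windows are discarded after evaluating only the few features of the early levels, which is what makes the cascade fast.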
Figure 4: Detection cascade schema. All sub windows enter Level 1; Level i is a strong classifier with false positive rate F_i ≤ f x F_(i-1) (F_1 ≤ f) and detection rate D_i ≥ d. A True output passes the sub window to the next level and a False output rejects it; a sub window that passes Level n is accepted as a face.

5. Results
We launched the cascade builder with the following parameters:
- target false positive rate F_target = 10^-2
- f = 0.8
- d = 0.95

It reached a false positive rate of 8 x 10^-3 after about 1 hour of execution (CPU: 1.5 GHz; RAM: 512 MB), building a cascade of 8 levels that uses 40 features. We then ran the classifier, which can detect only faces of size 18x27, on a small set of test images; the results are shown in Figure 5.

5.1 Comments
Our classifier does not detect all the faces in the images, but most of its detections do correspond to a face. In this project we did not implement a mechanism for filtering out isolated incorrect detections, so part of these errors could easily be removed; note that most of the false detections are located in sub windows containing areas of skin or bright, light colours similar to human skin. The most important advantage highlighted by the tests is that our classifier behaves rather well with rotated or tilted faces (rotation and changes in lighting are two typical problems for common face detectors). The main cause of this suboptimal performance is that the classifier works at a false positive rate close to 10^-2, whereas a practical face detector for real applications needs a false positive rate of 10^-6 or lower.

5.2 What else can we do
Below we suggest two main improvements that are not included in this project (essentially for lack of time) but that could easily improve and complete our work:

Multiscale detection: our face detector was trained on a training set of 18x27 images, so it cannot detect faces larger than this size.
A technique for solving this problem is to scale the features by a constant factor and apply them to larger sub windows (this will also increase the number of false detections).

Post-processing of the detected sub windows to combine overlapping detections into a single detection: since several overlapping detections inevitably appear on the same face, a mechanism is needed to combine them into a single bounding box that represents all of them. A possible solution is to compute the centre of mass of each group of nearby detections and use it to represent the whole group.

Figure 5: Output of the face detector at a false positive rate of 10^-2 on a set of test images.
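The centre-of-mass post-processing suggested above can be sketched as follows. This is a minimal Python sketch under stated assumptions: each detection is an (x, y, width, height) tuple, two detections count as "nearby" when their rectangles overlap, and each group is replaced by a box of the group's average size centred on its centre of mass (using the box enclosing all detections would be an alternative). The optional `min_size` parameter, which drops isolated detections, is an assumed extension in the spirit of the filtering mentioned in Section 5.1. All function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical sketch: a detection is (x, y, width, height) of an accepted sub window.
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share a region of positive area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def group_detections(boxes: List[Box]) -> List[List[Box]]:
    """Split the detections into groups connected by overlaps (flood fill)."""
    groups: List[List[Box]] = []
    unvisited = list(boxes)
    while unvisited:
        stack = [unvisited.pop()]
        group: List[Box] = []
        while stack:
            box = stack.pop()
            group.append(box)
            neighbours = [b for b in unvisited if overlaps(box, b)]
            for b in neighbours:
                unvisited.remove(b)
            stack.extend(neighbours)
        groups.append(group)
    return groups


def merge_group(group: List[Box]) -> Box:
    """Box of the group's average size, centred on the centre of mass of the group."""
    n = len(group)
    cx = sum(x + w / 2 for x, _, w, _ in group) / n
    cy = sum(y + h / 2 for _, y, _, h in group) / n
    mean_w = sum(w for _, _, w, _ in group) / n
    mean_h = sum(h for _, _, _, h in group) / n
    return (cx - mean_w / 2, cy - mean_h / 2, mean_w, mean_h)


def merge_detections(boxes: List[Box], min_size: int = 1) -> List[Box]:
    """One box per group; groups smaller than min_size (isolated detections) are dropped."""
    return [merge_group(g) for g in group_detections(boxes) if len(g) >= min_size]


if __name__ == "__main__":
    detections = [(0, 0, 18, 27), (2, 2, 18, 27), (100, 100, 18, 27)]
    print(merge_detections(detections))
    print(merge_detections(detections, min_size=2))
```

With the three detections above, the two overlapping 18x27 boxes collapse into a single box at (1.0, 1.0), while the isolated one is kept unchanged by default and discarded when `min_size=2`.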