Robust Real-Time Face Detection
Advisor: Prof. 萬書言
Presenter: 何炳杰
Date: 2010/11/26
Source
 Title: Robust Real-Time Face Detection
 Authors: Paul Viola, Michael J. Jones
 Publication: International Journal of Computer Vision
 Publisher: Springer
 Date: May 1, 2004
http://www.springerlink.com/content/q70v4h6715v5p152/
Outline
 Background
 Key words
 Abstract
 1. Introduction
 2. Features
 3. Learning Classification Functions
 4. The Attentional Cascade
 5. Results
 6. Conclusions
 References
Background
Chellappa, R., Sinha, P., and Phillips, P.J. 2010. Face Recognition by Computers
and Humans. IEEE Computer Society.
http://www.opencv.org.cn/index.php/Cv%E6%A8%A1%E5%BC%8F%E8%AF%86%E5%88%AB
Background
 The object detection method was first proposed by Paul Viola and later
improved by Rainer Lienhart. First, a classifier is trained on the Haar-like
features of a set of samples (roughly a few hundred sample images), producing a
cascaded boosted classifier. The training samples are divided into positive and
negative examples: positive samples contain the target to be detected (e.g.
faces or cars), while negative samples are arbitrary images that do not contain
the target. All sample images are normalised to the same size (for example,
20 x 20 pixels).
Background
 After the classifier has been trained, it can be applied to regions of
interest (of the same size as the training samples) in an input image. The
classifier outputs 1 if the region is likely to contain the target (a face or a
car) and 0 otherwise. To search a whole image, the search window is moved
across the image and every position is tested for a possible target. To find
targets of different sizes, the classifier itself is designed to be resized,
which is more efficient than resizing the input image. So, to detect objects of
unknown size, the scanning procedure usually runs over the image several times
with search windows of different scales.
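To make this scanning procedure concrete, here is a minimal usage sketch with OpenCV's Python bindings. It is not from the paper; the cascade file name haarcascade_frontalface_default.xml, the input/output file names, and the parameter values are assumptions chosen only for illustration.

import cv2

# Load a pre-trained cascade (hypothetical file name; any cascade trained in
# the way described above would do).
cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

# The rectangle features are computed on intensity values, so work in grayscale.
image = cv2.imread("input.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detectMultiScale slides a search window over the image at several scales,
# as described above; scaleFactor controls the step between scales.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                 minSize=(24, 24))

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("output.jpg", image)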
Background
 The "cascade" in the classifier means that the final classifier is built by
chaining several simple classifiers in series. During detection, each candidate
window is passed through the stages one by one; most candidate regions are
rejected by the first few stages, and only the regions that pass every stage
are reported as targets.
 "Cascade" (diagram):
Key words
 Integral image
 AdaBoost
 Cascade classifier
Abstract
Three main concepts: the integral image, AdaBoost (Adaptive Boosting), and the
cascade classifier.
Abstract
 This paper describes a face detection framework that is capable of
processing images extremely rapidly while achieving high
detection rates.
- The first is the introduction of a new image representation called the “Integral
Image” which allows the features used by our detector to be computed very
quickly.
- The second is a simple and efficient classifier which is built using the
AdaBoost learning algorithm to select a small number of critical visual features
from a very large set of potential features.
- The third contribution is a method for combining classifiers in a “cascade”
which allows background regions of the image to be quickly discarded while
spending more computation on promising face-like regions.
1. Introduction
 Preface:
- In this paper, Paul Viola et al. use three techniques to find faces quickly:
the integral image, AdaBoost, and the cascade classifier. The introduction
restates the three main contributions listed in the abstract.
1.1 Overview
 Section 2:
- It will detail the form of the features as well as a new scheme for
computing them rapidly.
 Section 3:
- It will discuss the method in which these features are combined to form
a classifier (AdaBoost).
 Section 4:
- It will describe a method for constructing a cascade of classifiers.
 Section 5:
- It will describe a number of experimental results.
 Section 6:
- It contains a discussion of this system and its relationship to related
systems.
Features
 The authors use three kinds of features. The value of a two-
rectangle feature is the difference between the sum of the
pixels within two rectangular regions.
Features
 The two regions have the same size and shape and are horizontally or
vertically adjacent.
 A three-rectangle feature computes the sum within two outside rectangles
subtracted from the sum in a center rectangle.
 Finally, a four-rectangle feature computes the difference between diagonal
pairs of rectangles.
 Given that the base resolution of the detector is 24 × 24, the exhaustive set
of rectangle features is quite large: 160,000.
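As a concrete illustration of the feature definition (not code from the paper), the sketch below evaluates one horizontally adjacent two-rectangle feature directly from pixel sums; the window size and rectangle placement are arbitrary choices.

import numpy as np

def two_rect_feature(window, top, left, height, width):
    # Sum of the left rectangle minus the sum of the horizontally adjacent
    # right rectangle of the same size and shape.
    left_rect = window[top:top + height, left:left + width]
    right_rect = window[top:top + height, left + width:left + 2 * width]
    return int(left_rect.sum()) - int(right_rect.sum())

# Example on a random 24 x 24 detection window.
rng = np.random.default_rng(0)
window = rng.integers(0, 256, size=(24, 24))
print(two_rect_feature(window, top=8, left=4, height=8, width=6))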
2.1 Integral Image
 Concept:
A simple analogy from calculus: if we often need to evaluate the definite
integral ∫_a^b f(x) dx, we first compute the antiderivative F(x) = ∫_0^x f(t) dt;
then ∫_a^b f(x) dx = F(b) − F(a). The integral image plays the same role for
sums of pixels.
(Analogy between the integral image and integration.)
2.1 Integral Image
 The integral image at location x, y contains the sum of the
pixels above and to the left of x, y, inclusive:
ii(x, y) = Σ_{x′ ≤ x, y′ ≤ y} i(x′, y′)
2.1 Integral Image
- ii(x, y) is the integral image.
- i(x, y) is the original image.
2.1 Integral Image
- s(x, y) is the cumulative row sum: s(x, y) = s(x, y − 1) + i(x, y)
- ii(x, y) = ii(x − 1, y) + s(x, y)
- Boundary conditions: s(x, −1) = 0, ii(−1, y) = 0
Here s(x, y) is the sum of the original pixel value at (x, y) and of all pixels
above it in the y direction (the "cumulative row sum").
(Figure: the integral image at a point A(x, y) is defined as the sum of all
pixels in the rectangle above and to its left, shown shaded; s(x, y) is the sum
of A(x, y) and all pixels above it in the y direction.)
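The following sketch (an illustration written from the recurrences above, not the authors' code) builds the integral image in a single pass; the final assert checks it against a direct double cumulative sum.

import numpy as np

def integral_image(i):
    # ii(x, y): sum of i over all pixels above and to the left, inclusive.
    # Built with s(x, y) = s(x, y - 1) + i(x, y) and
    # ii(x, y) = ii(x - 1, y) + s(x, y), with s(x, -1) = 0 and ii(-1, y) = 0.
    i = np.asarray(i, dtype=np.int64)
    s = np.zeros_like(i)    # cumulative row sums
    ii = np.zeros_like(i)
    for x in range(i.shape[0]):
        for y in range(i.shape[1]):
            s[x, y] = (s[x, y - 1] if y > 0 else 0) + i[x, y]
            ii[x, y] = (ii[x - 1, y] if x > 0 else 0) + s[x, y]
    return ii

img = np.arange(16).reshape(4, 4)
assert np.array_equal(integral_image(img), img.cumsum(axis=0).cumsum(axis=1))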
2.1 Integral Image
 Using the integral image, any rectangular sum can be computed in four array
references (see Fig. 3): with corner points 1, 2, 3, 4 of rectangle D as in
Fig. 3, the sum within D is ii(4) + ii(1) − ii(2) − ii(3).
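A small illustrative sketch of the four-reference lookup (not the authors' code): with a zero-padded integral image, the sum over any rectangle costs four array accesses regardless of its size.

import numpy as np

def rect_sum(ii, top, left, height, width):
    # Sum of the original image over rows [top, top + height) and columns
    # [left, left + width), given a zero-padded integral image ii.
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

img = np.arange(36).reshape(6, 6)
# Zero-padded integral image: ii[y, x] = sum of img[:y, :x].
ii = np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
assert rect_sum(ii, top=1, left=2, height=3, width=2) == img[1:4, 2:4].sum()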
2.1 Integral Image
- The authors point out that in the case of linear operations (e.g.
f · g), any invertible linear operation can be applied to f or g if
its inverse is applied to the result.
Example: in the case of convolution, if the derivative operator is applied to
both the image and the kernel, the result must then be double integrated:
f ∗ g = ∫∫ (f′ ∗ g′).
2.1 Integral Image
- The authors go on to show that convolution can be
significantly accelerated if the derivatives of f and g are sparse
(or can be made so). A similar insight is that an invertible
linear operation can be applied to f if its inverse is applied to
g.
2.1 Integral Image
- Viewed in this framework, computation of the rectangle sum can be expressed
as a dot product i · r, where i is the image and r is the box car image (with
value 1 within the rectangle of interest and 0 outside). This operation can be
rewritten as i · r = (∫∫ i) · r″, where the integral image is the double
integral of the image and r″, the second derivative of the box car, is non-zero
only at the four corners of the rectangle.
2.2 Feature Discussion
 Rectangle features are sensitive to the presence of edges, bars, and other
simple image structure, but they are quite coarse.
 The only orientations available are vertical, horizontal and
diagonal.
Learning Classification Function
 AdaBoost concept:
- AdaBoost stands for Adaptive Boosting. It is an iterative algorithm whose
core idea is to train different (weak) classifiers on the same training set and
then combine these weak classifiers into a stronger final classifier (a strong
classifier).
- In our system a variant of AdaBoost is used both to select the features and
to train the classifier. In its original form, the AdaBoost learning algorithm
is used to boost the classification performance of a simple learning algorithm.
Learning Classification Function
 AdaBoost concept:
- It does this by combining a collection of weak classification
functions to form a stronger classifier. In the language of
boosting the simple learning algorithm is called a weak
learner.
- "Weak learner": randomly guessing the answer to a yes/no question gives about
50% accuracy. If a hypothesis can raise the accuracy of the guess even
slightly, it is a weak learning algorithm, and the process of obtaining it is
called weak learning; conversely, if a hypothesis can raise the accuracy
significantly, it is called strong learning.
Learning Classification Function
 The learner is called weak because we do not expect even
the best classification function to classify the training data
well. (i.e. for a given problem the best perceptron may only
classify the training data correctly 51% of the time).
 A weak classifier h(x, f, p, θ) thus consists of a feature f, a threshold θ
and a polarity p indicating the direction of the inequality:
h(x, f, p, θ) = 1 if p f(x) < p θ, and 0 otherwise.
Here x is a 24 × 24 pixel sub-window of an image.
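The definition above is just a thresholded feature (a decision stump); a few illustrative lines, with a hypothetical feature function f standing in for one of the rectangle features:

def weak_classifier(x, f, p, theta):
    # h(x, f, p, theta) = 1 if p * f(x) < p * theta, else 0.
    # x: a 24 x 24 sub-window; f: maps the sub-window to a feature value;
    # theta: threshold; p in {+1, -1} flips the direction of the inequality.
    return 1 if p * f(x) < p * theta else 0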
Learning Classification Function
 Table 1 (see paper, p. 142):
* Given example images (x1, y1), ..., (xn, yn) where yi = 0, 1 for negative and
positive examples respectively.
* Initialize weights w_{1,i} = 1/(2m) for yi = 0 and w_{1,i} = 1/(2l) for
yi = 1, where m and l are the number of negatives and positives respectively.
* For t = 1, ..., T:
1. Normalize the weights: w_{t,i} ← w_{t,i} / Σ_{j=1}^{n} w_{t,j}
2. Select the best weak classifier with respect to the weighted error:
ε_t = min_{f, p, θ} Σ_i w_i |h(x_i, f, p, θ) − y_i|
Learning Classification Function
 Table 1 (continued):
3. Define h_t(x) = h(x, f_t, p_t, θ_t), where f_t, p_t and θ_t are the
minimizers of ε_t.
(Select the best weak classifier h_t(x), i.e. the one with the minimum
weighted error.)
4. Update the weights (according to this best weak classifier):
w_{t+1, i} = w_{t, i} β_t^{1 − e_i}
where e_i = 0 if example x_i is classified correctly, e_i = 1 if it is
classified incorrectly, and β_t = ε_t / (1 − ε_t).
Learning Classification Function
 Table 1 (continued):
* The final strong classifier is:
C(x) = 1 if Σ_{t=1}^{T} α_t h_t(x) ≥ (1/2) Σ_{t=1}^{T} α_t, and 0 otherwise,
where α_t = log(1/β_t).
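The sketch below is written from Table 1 as summarised above; it is not the authors' code. It trains decision stumps on a matrix of precomputed feature values, and the brute-force search in best_stump stands in for the faster sorted-list selection described in Section 3.1.

import numpy as np

def best_stump(features, y, w):
    # Exhaustively pick the (feature index, threshold, polarity) with the
    # lowest weighted error. features: (n_examples, n_features) array of
    # precomputed feature values; y: 0/1 label array; w: example weights.
    best = (0, 0.0, 1, np.inf)
    for j in range(features.shape[1]):
        for theta in np.unique(features[:, j]):
            for p in (+1, -1):
                h = (p * features[:, j] < p * theta).astype(int)
                err = np.sum(w * np.abs(h - y))
                if err < best[3]:
                    best = (j, theta, p, err)
    return best

def adaboost(features, y, T):
    # Boosting loop of Table 1: returns T weak classifiers with their alphas.
    eps = 1e-12
    m, l = np.sum(y == 0), np.sum(y == 1)
    w = np.where(y == 0, 1.0 / (2 * m), 1.0 / (2 * l))        # initial weights
    classifiers = []
    for _ in range(T):
        w = w / w.sum()                                        # 1. normalize
        j, theta, p, err = best_stump(features, y, w)          # 2. best weak clf
        beta = max(err, eps) / max(1.0 - err, eps)
        h = (p * features[:, j] < p * theta).astype(int)       # 3. h_t on the data
        e = np.abs(h - y)                                      # 0 if correct, 1 if not
        w = w * beta ** (1 - e)                                # 4. down-weight correct examples
        classifiers.append((j, theta, p, np.log(1.0 / beta)))  # store alpha_t
    return classifiers

def strong_classify(classifiers, feature_row):
    # Final strong classifier: 1 iff sum_t alpha_t h_t(x) >= 0.5 * sum_t alpha_t.
    score = sum(a * (p * feature_row[j] < p * theta)
                for j, theta, p, a in classifiers)
    return int(score >= 0.5 * sum(a for _, _, _, a in classifiers))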
3.1 Learning Discussion
 The algorithm described in Table 1 is used to select key
weak classifiers from the set of possible weak classifiers.
 Since there is one weak classifier for each distinct
feature/threshold combination, there are effectively KN
weak classifiers, where K is the number of features and N
is the number of examples.
 Therefore the total number of distinct thresholds is N.
Given a task with N = 20000 and K = 160000 there are 3.2
billion distinct binary weak classifiers.
3.1 Learning Discussion
 The weak classifier selection algorithm proceeds as follows (training and
selecting the weak classifiers):
1. For each feature, the examples are sorted based on feature
value.
2. The AdaBoost optimal threshold for that feature can then be
computed in a single pass over this sorted list.
3. For each element in the sorted list, four sums are maintained
and evaluated:
- T+: the total sum of positive example weights.
- T−: the total sum of negative example weights.
- S+: the sum of positive weights below the current example.
- S−: the sum of negative weights below the current example.
3.1 Learning Discussion
 In face detection terms, for each element in the sorted list the four sums
are:
- T+: the total weight of all face samples.
- T−: the total weight of all non-face samples.
- S+: the total weight of the face samples before the current element.
- S−: the total weight of the non-face samples before the current element.
3.1 Learning Discussion
 The weak classifier selection algorithm (continued):
- For a threshold placed between the current and previous example in the sorted
list, the classification error is e = min( S+ + (T− − S−), S− + (T+ − S+) ).
- Therefore, a single scan through the sorted list from beginning to end is
enough to select, for this feature, the threshold with the minimum
classification error (the optimal threshold), i.e. to select an optimal weak
classifier.
(Figure: algorithm for training and selecting the best classifier.)
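An illustrative single-pass implementation of this step for one feature (a sketch written from the description above, not the authors' code), maintaining the four running sums T+, T−, S+, S− over the examples sorted by feature value:

import numpy as np

def best_threshold(values, y, w):
    # values: feature values for each example; y: 0/1 label array; w: weights.
    # Returns (threshold, polarity, weighted error) of the best stump for this
    # feature, found in one pass over the sorted examples.
    order = np.argsort(values)
    t_pos = np.sum(w[y == 1])      # T+: total positive weight
    t_neg = np.sum(w[y == 0])      # T-: total negative weight
    s_pos = 0.0                    # S+: positive weight below the current example
    s_neg = 0.0                    # S-: negative weight below the current example
    best_err, best_theta, best_p = np.inf, None, None
    for idx in order:
        # e = min(S+ + (T- - S-), S- + (T+ - S+)) for a threshold placed
        # just below the current example.
        err_below_neg = s_pos + (t_neg - s_neg)   # label everything below as negative
        err_below_pos = s_neg + (t_pos - s_pos)   # label everything below as positive
        err = min(err_below_neg, err_below_pos)
        if err < best_err:
            best_err, best_theta = err, values[idx]
            best_p = +1 if err_below_pos < err_below_neg else -1
        if y[idx] == 1:
            s_pos += w[idx]
        else:
            s_neg += w[idx]
    return best_theta, best_p, best_err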
3.2 Learning Results
 Initial experiments demonstrated that a classifier constructed
from 200 features would yield reasonable results (see Fig. 4).
3.2 Learning Results
 The first feature selected seems to focus on the property that
the region of the eyes is often darker than the region of the
nose and cheeks (see Fig. 5).
 The second feature selected relies on the property that the
eyes are darker than the bridge of the nose.
3.2 Learning Results
 In summary the 200-feature classifier provides initial evidence
that a boosted classifier constructed from rectangle features is
an effective technique for face detection.
The Attentional Cascade
 Cascade concept:
- The idea of the cascade classifier is illustrated in Figure 6. The features
are divided among several classifiers. The first classifier is the least
discriminative, but it can quickly filter out a large fraction of the windows
that are not faces; the next classifier handles somewhat harder cases and
rejects fewer windows than the first; and so on, until the last classifier.
The windows that survive every stage are the face detections we keep.
The Attentional Cascade
 This section describes an algorithm for constructing a cascade of
classifiers which achieves increased detection performance while
radically reducing computation time.
 The key insight is that smaller, and therefore more efficient,
boosted classifiers can be constructed which reject many of the
negative sub-windows while detecting almost all positive instances.
The Attentional Cascade
 A positive result from the first classifier triggers the evaluation of a
second classifier which has also been adjusted to achieve very high
detection rates. A positive result from the second classifier triggers
a third classifier, and so on. A negative outcome at any point leads
to the immediate rejection of the sub-window.
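In code form this rejection logic is just an early-exit loop; the sketch below is structural only, with stages represented by hypothetical callables that return True for face-like windows.

def cascade_detect(sub_window, stages):
    # stages: list of classifier callables, cheapest and most permissive first.
    for stage in stages:
        if not stage(sub_window):
            return False      # negative at any stage: reject immediately
    return True               # positive at every stage: report a detection

# Toy usage with hypothetical stages (thresholds on mean intensity):
# stages = [lambda w: w.mean() > 10, lambda w: w.mean() > 40]
# cascade_detect(window, stages)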
4.1 Training a Cascade of Classifiers
 In order to achieve good detection rates (between 85 and 95 percent) and
extremely low false positive rates, the number of cascade stages and the size
of each stage must be sufficient to achieve similar detection performance while
minimizing computation.
- The false positive rate of the cascade is F = ∏_{i=1}^{K} f_i, where K is the
number of classifiers (stages) and f_i is the false positive rate of the ith
classifier.
- The detection rate of the cascade is D = ∏_{i=1}^{K} d_i, where d_i is the
detection rate of the ith classifier.
4.1 Training a Cascade of Classifiers
 Purpose:
- Given concrete goals for overall false positive and detection
rates, target rates can be determined for each stage in the
cascade process.
- For example, a detection rate of about 0.9 can be achieved by a 10-stage
cascade if each stage has a detection rate of about 0.99 (since 0.99^10 ≈ 0.9).
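As a quick check of this arithmetic (plain Python; the per-stage false positive rate of 0.30 is an illustrative value), the overall rates of a K-stage cascade are simply products of the per-stage rates:

K = 10
d_per_stage = 0.99   # per-stage detection rate
f_per_stage = 0.30   # per-stage false positive rate (illustrative)

print(d_per_stage ** K)   # overall detection rate, about 0.904
print(f_per_stage ** K)   # overall false positive rate, about 5.9e-06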
4.1 Training a Cascade of Classifiers
 The key measure of each classifier is its “positive rate”, the
proportion of windows which are labelled as potentially
containing a face.
 The expected number of features which are evaluated is
N = Σ_{i=1}^{K} ( n_i ∏_{j<i} p_j ),
i.e. each stage's feature count weighted by the probability of reaching that
stage (the product of the positive rates of all earlier stages).
- N: the expected number of features evaluated.
- K: the number of classifiers.
- p_i: the positive rate of the ith classifier.
- n_i: the number of features in the ith classifier.
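A worked illustration (the stage sizes and positive rates below are invented): each stage's feature count is weighted by the probability of reaching that stage.

import math

def expected_features(n, p):
    # n[i]: number of features in stage i; p[i]: positive rate of stage i.
    # Stage 0 is always evaluated; stage i > 0 is reached with probability
    # prod(p[:i]).
    return sum(n_i * math.prod(p[:i]) for i, n_i in enumerate(n))

n = [2, 10, 25, 50]             # cheap first stage, larger later stages
p = [0.5, 0.2, 0.1, 0.05]
print(expected_features(n, p))  # 2 + 5 + 2.5 + 0.5 = 10.0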
4.1 Training a Cascade of Classifiers
 Table 2 (see paper, p. 146): the detector cascade training algorithm.
* The user selects values for f, the maximum acceptable false positive rate per
layer, and d, the minimum acceptable detection rate per layer.
* The user selects a target overall false positive rate, F_target.
* P = set of positive examples; N = set of negative examples.
* F_0 = 1.0; D_0 = 1.0; i = 0
* While F_i > F_target:
1. i ← i + 1; n_i = 0; F_i = F_{i−1}
2. While F_i > f × F_{i−1}: increase n_i, use P and N to train a classifier
with n_i features using AdaBoost, evaluate the current cascaded classifier on a
validation set to determine F_i and D_i, and decrease the threshold for the ith
classifier until the current cascaded classifier has a detection rate of at
least d × D_{i−1} (this also affects F_i).
3. N ← ∅
4. If F_i > F_target, then evaluate the current cascaded detector on
the set of non-face images and put any false
detections into the set N.
4.2 Simple Experiment
 In order to explore the feasibility of the cascade approach, two simple
detectors were trained:
- A monolithic 200-feature classifier (a single, non-cascaded classifier).
- A cascade of ten 20-feature classifiers.
* The first stage:
- The first classifier in the cascade was trained using 5000 faces and 10000
non-face sub-windows randomly chosen from non-face images.
* The second stage:
- The second-stage classifier was trained on the same 5000 faces plus 5000
false positives of the first classifier.
4.2 Simple Experiment
(Diagram: in the monolithic setup, all sub-windows pass through a single
200-feature classifier, which outputs true (T) or false (F). In the cascaded
setup, all sub-windows pass through classifiers 1, 2, 3, ..., 10 in turn; a
false (F) result at any stage rejects the sub-window immediately, and only
windows passing all ten stages reach the final outcome.)
4.2 Simple Experiment
 Outcome of the two experiments:
- Type 1:
The monolithic 200-feature classifier was trained on the union of
all examples used to train all the stages of the cascaded classifier.
Note that without reference it might be difficult to select a set of
non-face training examples to train the monolithic classifier.
- Type 2: The sequential way in which the cascaded classifier is
trained effectively reduces the non-face training set by throwing
out easy examples and focusing on the “hard” ones.
4.2 Simple Experiment
 Fig 7.
Figure 7. ROC curves comparing a 200-feature classifier with a cascaded classifier containing
ten 20-feature classifiers. Accuracy is not significantly different, but the speed of the cascaded
classifier is almost 10 times faster.
Results
 Preface:
- This section describes the final face detection system. The
discussion includes details on the structure and training of the
cascaded detector as well as results on a large real-world testing
set.
Training Dataset
 The face training set consisted of 4916 hand labeled faces
scaled and aligned to a base resolution of 24 by 24 pixels.
 The face bounding box was then enlarged by 50%, cropped, and scaled to
24 by 24 pixels.
Structure of the Detector Cascade
 The final detector is a 38 layer cascade of classifiers which included
a total of 6060 features.
 The first classifier in the cascade is constructed using two features
and rejects about 50% of non-faces while correctly detecting close
to 100% of faces. The next classifier has ten features and rejects
80% of non-faces while detecting almost 100% of faces.
Experiments on a Real-World Test Set
 Authors test their system on the MIT + CMU frontal face test
set.
 This set consists of 130 images with 507 labeled frontal faces.
Experiments on a Real-World Test Set
 Images from the MIT + CMU test set.
Conclusions
 Authors have presented an approach for face detection which minimizes
computation time while achieving high detection accuracy.
 The approach was used to construct a face detection system which is
approximately 15 times faster than any previous approach.
 Main contributions:
- The first contribution is a new a technique for computing a rich set of image features
using the integral image.
- The second contribution of this paper is a simple and efficient classifier built from
computationally efficient features using AdaBoost for feature selection.
- The third contribution of this paper is a technique for constructing a cascade of
classifiers which radically reduces computation time
while improving detection accuracy.
(Key techniques: integral image → AdaBoost → cascade classifier.)
References
 Amit, Y. and Geman, D. 1999. A computational model for visual selection. Neural
Computation, 11:1691–1715.
 Crow, F. 1984. Summed-area tables for texture mapping. In Proceedings of SIGGRAPH,
18(3):207–212.
 Fleuret, F. and Geman, D. 2001. Coarse-to-fine face detection. Int. J. Computer Vision,
41:85–107.
 Freeman,W.T. and Adelson, E.H. 1991. The design and use of steerable filters. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 13(9):891–906.
 Freund, Y. and Schapire, R.E. 1995. A decision-theoretic generalization of on-line
learning and an application to boosting. In Computational Learning Theory: Eurocolt 95,
Springer-Verlag, pp. 23–37.
 Greenspan, H., Belongie, S., Goodman, R., Perona, P., Rakshit, S., and Anderson, C.
1994. Overcomplete steerable pyramid filters and rotation invariance. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition.
 Itti, L., Koch, C., and Niebur, E. 1998. A model of saliency-based visual attention for
rapid scene analysis. IEEE Patt. Anal. Mach. Intell., 20(11):1254–1259.
References
 John, G., Kohavi, R., and Pfleger, K. 1994. Irrelevant features and the subset selection
problem. In Machine Learning Conference Proceedings.
 Osuna, E., Freund, R., and Girosi, F. 1997a. Training support vector machines: An
application to face detection. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition.
 Osuna, E., Freund, R., and Girosi, F. 1997b. Training support vector machines: an
application to face detection. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition.
 Papageorgiou, C., Oren, M., and Poggio, T. 1998. A general framework for object
detection. In International Conference on Computer Vision.
 Quinlan, J. 1986. Induction of decision trees. Machine Learning, 1:81–106.
 Roth, D., Yang, M., and Ahuja, N. 2000. A snow based face detector. In Neural
Information Processing 12.
 Rowley, H., Baluja, S., and Kanade, T. 1998. Neural network-based face detection.
IEEE Patt. Anal. Mach. Intell., 20:22–38.
References
 Schapire, R.E., Freund, Y., Bartlett, P., and Lee, W.S. 1997. Boosting the margin: A new explanation
for the effectiveness of voting methods. In Proceedings of the Fourteenth International Conference
on Machine Learning.
 Schapire, R.E., Freund, Y., Bartlett, P., and Lee, W.S. 1998. Boosting the margin: A new explanation
for the effectiveness of voting methods. Ann. Stat., 26(5):1651–1686.
 Schneiderman, H. and Kanade, T. 2000. A statistical method for 3D object detection applied to
faces and cars. In International Conference on Computer Vision.
 Simard, P.Y., Bottou, L., Haffner, P., and LeCun, Y. 1999. Boxlets: A fast convolution algorithm
for signal processing and neural networks. In M. Kearns, S. Solla, and D. Cohn (Eds.), Advances in
Neural Information Processing Systems, vol. 11, pp. 571–577.
 Sung, K. and Poggio, T. 1998. Example-based learning for view based face detection. IEEE Patt.
Anal. Mach. Intell., 20:39–51.
 Tieu, K. and Viola, P. 2000. Boosting image retrieval. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition.
 Tsotsos, J., Culhane, S., Wai, W., Lai, Y., Davis, N., and Nuflo, F. 1995. Modeling visual attention
via selective tuning. Artificial Intelligence Journal, 78(1/2):507–545.
 Webb, A. 1999. Statistical Pattern Recognition. Oxford University Press: New York.
Thank You!