Hierarchical Method for Foreground Detection Using Codebook Model

Jing-Ming Guo, Member, IEEE and Chih-Sheng Hsu
Department of Electrical Engineering
National Taiwan University of Science and Technology
Taipei, Taiwan
E-mail: jmguo@seed.net.tw, seraph1220@gmail.com

ABSTRACT

This paper presents a hierarchical scheme with block-based and pixel-based codebooks for foreground detection. The codebook is mainly used to compress information to achieve a highly efficient processing speed. In the block-based stage, 12 intensity values are employed to represent a block. The algorithm extends the concept of Block Truncation Coding (BTC), and thus it can further improve the processing efficiency by enjoying its low-complexity advantage. Specifically, the block-based stage can remove most of the noise without reducing the True Positive (TP) rate, yet it has low precision. To overcome this problem, the pixel-based stage is adopted to enhance the precision, which also reduces the False Positive (FP) rate. In addition to the basic algorithm, short-term information is combined to improve background updating so that the model adapts to the current environment. As documented in the experimental results, the proposed algorithm provides superior performance to that of the former approaches.

Experimental Results

For measuring the accuracy of the results, the criteria FP rate, TP rate, Precision, and Similarity [12] are employed as defined below:

FP rate = fp / (fp + tn),
TP rate = tp / (tp + fn),
Precision = tp / (tp + fp),
Similarity = tp / (tp + fp + fn),

where tp denotes the total number of true positives; tn denotes the number of true negatives; fp denotes the number of false positives; fn denotes the number of false negatives; (tp + fn) indicates the total number of pixels in the foreground, and (fp + tn) indicates the total number of pixels in the background. The proposed method was implemented in the C programming language on an Intel Core 2 2.4 GHz CPU with 2 GB RAM under the Windows XP SP2 operating system.

This section presents the foreground detection results of the proposed method on several different sequences, compared with the former schemes MOG [7], Rita's method [4], CB [11], Chen's method [9], and Chiu's method [22]. No post-processing and no short-term information are used when measuring the accuracy of the results. All results for the different sequences can be downloaded from ftp://HMFD@140.118.7.72:222/

1. Sequences IR, Campus, Highway_I and Laboratory
Size: 320*240
Source: [19], file name: IR (row 1), Campus (row 2), Highway_I (row 3) and Laboratory (row 4)

To provide a better understanding of the detected results, three colors, red, green, and blue, are employed to represent shadows, highlights, and foreground, respectively.

Fig. 1. Classified results of sequence [19] for IR (row 1), Campus (row 2), Highway_I (row 3) and Laboratory (row 4) with shadow (red), highlight (green), and foreground (blue). (a) Original image, (b) block-based stage only with block of size 10x10, and (c) proposed method.
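For clarity, the per-frame accuracy curves reported below (Figs. 3, 6, and 8) follow the four criteria defined above. The following minimal C sketch shows how they can be computed from the pixel counts of one frame; the struct, function names, and example counts are illustrative and not part of the original implementation.

#include <stdio.h>

/* Per-frame pixel counts obtained by comparing a detection
 * mask against the ground-truth mask (illustrative names). */
typedef struct {
    long tp;  /* foreground pixels correctly detected */
    long tn;  /* background pixels correctly rejected */
    long fp;  /* background pixels wrongly detected   */
    long fn;  /* foreground pixels missed             */
} Counts;

/* FP rate = fp / (fp + tn): fraction of background pixels
 * wrongly classified as foreground. */
double fp_rate(const Counts *c)    { return (double)c->fp / (c->fp + c->tn); }

/* TP rate = tp / (tp + fn): fraction of foreground pixels detected. */
double tp_rate(const Counts *c)    { return (double)c->tp / (c->tp + c->fn); }

/* Precision = tp / (tp + fp). */
double precision(const Counts *c)  { return (double)c->tp / (c->tp + c->fp); }

/* Similarity = tp / (tp + fp + fn), i.e., the Jaccard index
 * between detected and ground-truth foreground. */
double similarity(const Counts *c) { return (double)c->tp / (c->tp + c->fp + c->fn); }

int main(void) {
    Counts c = { 900, 18000, 120, 80 };  /* example counts for one frame */
    printf("FP rate %.4f, TP rate %.4f, Precision %.4f, Similarity %.4f\n",
           fp_rate(&c), tp_rate(&c), precision(&c), similarity(&c));
    return 0;
}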
2. Sequence Waving Trees (WT)
Size: 160*120
Source: [21], file name: Waving Trees

Fig. 2. Foreground (white) classified results with sequence WT [21]. (a) Original image (frame 247), (b) ground truth, (c) MOG [7], (d) Rita's method [4], (e) CB, (f) Chen's method [9], (g) Chiu's method [22], (h)-(k) block-based only with block of size (h) 5x5, (i) 8x8, (j) 10x10, (k) 12x12, (l)-(o) proposed cascaded method with block of size (l) 5x5, (m) 8x8, (n) 10x10, and (o) 12x12.

Fig. 3. The accuracy values in each frame for sequence WT [21]. (a) FP rate, (b) TP rate, (c) Precision and (d) Similarity.

TABLE 1. THE AVERAGE ACCURACY VALUES FOR SEQUENCE WT
                          FP rate  TP rate  Precision  Similarity  fps
MOG [7]                   0.0913   0.9307   0.6955     0.6729      40.13
Rita's method [4]         0.3041   0.9010   0.4628     0.4422      30.56
CB [11]                   0.0075   0.9434   0.9290     0.8913      102.43
Chen's method [9]         0.1165   0.8562   0.6450     0.5962      64.35
Chiu's method [22]        0.0603   0.5641   0.7037     0.4599      320.05
block-based stage 5x5     0.0208   0.9755   0.8413     0.8276      269.36
block-based stage 8x8     0.0164   0.9674   0.8511     0.8294      320.88
block-based stage 10x10   0.0158   0.9749   0.8379     0.8199      365.29
block-based stage 12x12   0.0167   0.9300   0.8077     0.7691      394.08
proposed method (5x5)     0.0027   0.9517   0.9700     0.9266      165.28
proposed method (8x8)     0.0020   0.9408   0.9767     0.9204      186.56
proposed method (10x10)   0.0018   0.9474   0.9795     0.9285      197.04
proposed method (12x12)   0.0018   0.9059   0.9718     0.8853      205.61

3. Sequence WATERSURFACE
Size: 160*128
Source: [20], file name: WATERSURFACE

Fig. 4. Foreground (white) classified results with WATERSURFACE [20]. (a) Original image (frame 529), (b) ground truth, (c) MOG [7], (d) Rita's method [4], (e) CB, (f) Chen's method [9], (g) Chiu's method [22], (h)-(k) block-based only with block of size (h) 5x5, (i) 8x8, (j) 10x10, (k) 12x12, (l)-(o) proposed cascaded method with block of size (l) 5x5, (m) 8x8, (n) 10x10, and (o) 12x12.

Fig. 6. The accuracy values in each frame for sequence WATERSURFACE [20]. (a) FP rate, (b) TP rate, (c) Precision and (d) Similarity.

TABLE 2. THE AVERAGE ACCURACY VALUES FOR SEQUENCE WATERSURFACE
                          FP rate  TP rate  Precision  Similarity  fps
MOG [7]                   0.0431   0.8969   0.5515     0.5183      46.26
Rita's method [4]         0.0265   0.8122   0.6370     0.5595      30.23
CB [11]                   0.0038   0.8118   0.9247     0.7639      101.01
Chen's method [9]         0.0228   0.8215   0.6680     0.5835      62.48
Chiu's method [22]        0.0012   0.7153   0.9539     0.6965      284.36
block-based stage 5x5     0.0399   0.9588   0.5835     0.5722      213.52
block-based stage 8x8     0.0549   0.9568   0.5144     0.5052      273.97
block-based stage 10x10   0.0580   0.9291   0.4893     0.4754      320.05
block-based stage 12x12   0.0723   0.9355   0.4417     0.4340      348.83
proposed method (5x5)     0.0049   0.9087   0.8983     0.8283      147.65
proposed method (8x8)     0.0043   0.9030   0.9098     0.8331      182.92
proposed method (10x10)   0.0051   0.8800   0.8947     0.8026      192.01
proposed method (12x12)   0.0051   0.8812   0.8923     0.8080      202.02
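The block-based rows in the tables above vary the block size of the BTC-like block representation mentioned in the abstract. As background only, the following C sketch implements classic two-level BTC [17] for a single grayscale block: the block mean and variance are preserved by a bitmap plus two reconstruction levels. The paper's 12-value block representation is not detailed in this excerpt, so this sketch shows only the underlying concept, and all names are illustrative.

#include <math.h>

#define B 10                     /* block side; the paper tests 5..12 */

/* Two-level BTC statistics of one BxB grayscale block [17]. */
typedef struct {
    unsigned char bitmap[B][B];  /* 1 = pixel at or above block mean */
    double low, high;            /* the two reconstruction levels    */
} BtcBlock;

void btc_encode(const unsigned char px[B][B], BtcBlock *out) {
    const int n = B * B;
    double sum = 0.0, sum2 = 0.0;
    for (int i = 0; i < B; i++)
        for (int j = 0; j < B; j++) {
            sum  += px[i][j];
            sum2 += (double)px[i][j] * px[i][j];
        }
    double mean  = sum / n;
    double var   = sum2 / n - mean * mean;
    double sigma = sqrt(var > 0.0 ? var : 0.0);

    int q = 0;                   /* number of pixels coded as 'high' */
    for (int i = 0; i < B; i++)
        for (int j = 0; j < B; j++) {
            out->bitmap[i][j] = px[i][j] >= mean;
            q += out->bitmap[i][j];
        }
    if (q == 0 || q == n) {      /* flat block: one level suffices */
        out->low = out->high = mean;
        return;
    }
    /* Moment-preserving levels of classic BTC: the reconstructed
     * block keeps the original mean and variance. */
    out->low  = mean - sigma * sqrt((double)q / (n - q));
    out->high = mean + sigma * sqrt((double)(n - q) / q);
}

Representing each block by a few such statistics instead of all its pixels is what lets the block-based stage scan a frame far faster than per-pixel matching, at the cost of the coarse (low-precision) masks seen in the block-based rows above.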
4. Sequence CAMPUS
Size: 160*128
Source: [20], file name: CAMPUS

Fig. 7. Foreground (white) classified results with CAMPUS [20]. (a) Original image (frame 695), (b) ground truth, (c) MOG [7], (d) Rita's method [4], (e) CB, (f) Chen's method [9], (g) Chiu's method [22], (h)-(k) block-based only with block of size (h) 5x5, (i) 8x8, (j) 10x10, (k) 12x12, (l)-(o) proposed cascaded method with block of size (l) 5x5, (m) 8x8, (n) 10x10, and (o) 12x12.

Fig. 8. The accuracy values in each frame for sequence CAMPUS [20]. (a) FP rate, (b) TP rate, (c) Precision and (d) Similarity.

TABLE 3. THE AVERAGE ACCURACY VALUES FOR SEQUENCE CAMPUS
                          FP rate  TP rate  Precision  Similarity  fps
MOG [7]                   0.1478   0.8811   0.2862     0.2725      53.26
Rita's method [4]         0.1781   0.7225   0.2310     0.2030      23.16
CB [11]                   0.0342   0.9219   0.5567     0.5280      85.87
Chen's method [9]         0.1614   0.7517   0.2562     0.2295      51.81
Chiu's method [22]        0.0604   0.4926   0.3533     0.2406      278.16
block-based stage 5x5     0.0447   0.9243   0.4971     0.4796      174.14
block-based stage 8x8     0.0383   0.9256   0.5042     0.4884      278.16
block-based stage 10x10   0.0358   0.9272   0.5176     0.5023      304.64
block-based stage 12x12   0.0433   0.8564   0.4455     0.4260      336.71
proposed method (5x5)     0.0125   0.9061   0.7195     0.6712      110.81
proposed method (8x8)     0.0095   0.9025   0.7672     0.7125      141.44
proposed method (10x10)   0.0093   0.9037   0.7708     0.7169      156.26
proposed method (12x12)   0.0084   0.8349   0.7820     0.6965      161.03

5. Sequence MO
Size: 160*120
Source: [21], file name: moving object

Figure 9 shows the sequence MO [21], which contains a moving object and consists of 1745 frames of size 160x120. The sequence MO is employed to test the adaptability of the background model. When the chair is moved at frame 888 in Fig. 9, after a period of time the chair becomes part of the background in the background model. This is achieved by applying short-term information in the background model to improve its adaptation, with T_add set to 100. In Fig. 9, frame 986 shows a good result without any noise or false foreground regions.

Fig. 9. Foreground (blue) classified results with MO [21] (frames 600, 650, 700, 750, 800, 850, 888, 950, 980, 982, 984, and 986), processed with the proposed method using short-term information.
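How the short-term information is maintained is not specified in this excerpt. Purely as an illustration, the sketch below shows one plausible mechanism in the spirit of the layered codebook model [11]: a short-term codeword that keeps matching the incoming pixel is promoted to the background model once it has been stable for T_add (= 100) consecutive frames, which is how the moved chair in MO could be absorbed into the background. The structure, field names, and learning rate are hypothetical.

#include <stdbool.h>

#define T_ADD 100   /* promotion threshold used for sequence MO */

/* Hypothetical per-pixel short-term codeword: tracks how long a
 * new intensity pattern has persisted at this pixel. */
typedef struct {
    double mean;        /* running mean of matched intensities       */
    int    stable_run;  /* consecutive frames this codeword matched  */
    bool   in_bg;       /* already promoted to the background model? */
} ShortTermCW;

/* Called once per frame for a short-term codeword that matched the
 * current pixel value. Returns true when the codeword should be
 * promoted to the background model, i.e., the pattern has been
 * stable for T_ADD frames (illustrative policy, not the paper's). */
bool update_short_term(ShortTermCW *cw, unsigned char pixel) {
    cw->mean = 0.95 * cw->mean + 0.05 * pixel;  /* illustrative learning rate */
    if (++cw->stable_run >= T_ADD && !cw->in_bg) {
        cw->in_bg = true;   /* absorb into the background model */
        return true;
    }
    return false;
}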
Conclusions

Table 4 summarizes the average accuracy results of Tables 1-3 over the three test sequences. It is clear that the proposed algorithm provides the highest accuracy among the compared methods. Moreover, the fps of the proposed method is also superior to those of the five former approaches. In general, a larger block achieves a higher processing speed yet a lower TP rate, and vice versa, as indicated in Table 4. We therefore recommend a larger block for processing-speed-oriented applications, while a smaller block is a promising choice for TP-rate-oriented applications.

A hierarchical method for foreground detection has been proposed using block-based and pixel-based codebooks. The block-based stage enjoys a high processing speed and detects most of the foreground without reducing the TP rate, while the pixel-based stage further improves the precision of the detected foreground objects and reduces the FP rate. Moreover, a color model and a match function have also been introduced in this study, which can classify a pixel into shadow, highlight, background, or foreground. As documented in the experimental results, the hierarchical method provides highly efficient background subtraction, making it a good candidate for vision-based applications such as human motion analysis or surveillance systems.

TABLE 4. THE AVERAGE ACCURACY VALUES
                          FP rate  TP rate  Precision  Similarity  fps
MOG [7]                   0.0941   0.9029   0.5111     0.4879      64.22
Rita's method [4]         0.1696   0.8119   0.4436     0.4016      27.98
CB [11]                   0.0152   0.8924   0.8035     0.7278      96.44
Chen's method [9]         0.1002   0.8098   0.5231     0.4698      59.55
Chiu's method [22]        0.0406   0.5907   0.6703     0.4657      294.19
block-based stage 5x5     0.0351   0.9529   0.6407     0.6265      219.01
block-based stage 8x8     0.0365   0.9499   0.6233     0.6077      291.00
block-based stage 10x10   0.0366   0.9438   0.6149     0.5992      329.99
block-based stage 12x12   0.0441   0.9073   0.5650     0.5431      359.87
proposed method (5x5)     0.0067   0.9222   0.8626     0.8087      141.25
proposed method (8x8)     0.0053   0.9154   0.8846     0.8220      170.31
proposed method (10x10)   0.0054   0.9104   0.8817     0.8160      181.77
proposed method (12x12)   0.0051   0.8740   0.8821     0.7966      189.55
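The color model and match function mentioned above are defined in the full paper, not in this excerpt. As a hedged illustration of the general idea, the C sketch below classifies a pixel by brightness and color distortion against its background codeword, in the spirit of Horprasert et al. [2] and the codebook model [11]: a darker pixel with the same chromaticity is labeled shadow, a brighter one highlight, and a chromatically distant one foreground. All thresholds and names are illustrative, not the paper's.

#include <math.h>

typedef enum { BACKGROUND, SHADOW, HIGHLIGHT, FOREGROUND } PixelClass;

/* Illustrative thresholds (not taken from the paper). */
#define CHROMA_TH   18.0   /* max color distortion to stay on the BG color line */
#define BG_BAND     0.08   /* brightness band treated as unchanged background   */
#define ALPHA_LOW   0.60   /* darkest brightness ratio still accepted as shadow */
#define ALPHA_HIGH  1.30   /* brightest ratio still accepted as highlight       */

/* Classify an RGB pixel against its background codeword (r0,g0,b0):
 * alpha is the brightness distortion (projection of the pixel onto
 * the background color vector) and cd the color distortion (distance
 * from that line), as in Horprasert et al. [2]. */
PixelClass classify(double r, double g, double b,
                    double r0, double g0, double b0) {
    double denom = r0*r0 + g0*g0 + b0*b0;
    if (denom <= 0.0) return FOREGROUND;           /* degenerate codeword */
    double alpha = (r*r0 + g*g0 + b*b0) / denom;   /* brightness distortion */
    double dr = r - alpha*r0, dg = g - alpha*g0, db = b - alpha*b0;
    double cd = sqrt(dr*dr + dg*dg + db*db);       /* color distortion */

    if (cd > CHROMA_TH)                 return FOREGROUND; /* chromaticity changed */
    if (alpha >= 1.0 - BG_BAND &&
        alpha <= 1.0 + BG_BAND)         return BACKGROUND; /* unchanged brightness */
    if (alpha >= ALPHA_LOW &&
        alpha <  1.0 - BG_BAND)         return SHADOW;     /* darker, same color   */
    if (alpha >  1.0 + BG_BAND &&
        alpha <= ALPHA_HIGH)            return HIGHLIGHT;  /* brighter, same color */
    return FOREGROUND;  /* too dark or too bright for an illumination change */
}

This four-way labeling corresponds to the shadow (red), highlight (green), and foreground (blue) visualizations of Figs. 1 and 9.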
References

[1] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, "Wallflower: principles and practice of background maintenance," in Proc. IEEE Conf. Computer Vision, vol. 1, pp. 255-261, Sept. 1999.
[2] T. Horprasert, D. Harwood, and L. S. Davis, "A statistical approach for real-time robust background subtraction and shadow detection," IEEE ICCV Frame-Rate Applications Workshop, Kerkyra, Greece, Sept. 1999.
[3] R. Cucchiara, C. Grana, M. Piccardi, A. Prati, and S. Sirotti, "Improving shadow suppression in moving object detection with HSV color information," IEEE Conf. Intelligent Transportation Systems, pp. 334-339, Aug. 2001.
[4] R. Cucchiara, M. Piccardi, and A. Prati, "Detecting moving objects, ghosts, and shadows in video streams," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 10, Oct. 2003.
[5] M. Izadi and P. Saeedi, "Robust region-based background subtraction and shadow removing using color and gradient information," in Proc. 19th International Conference on Pattern Recognition, art. no. 4761133, Dec. 2008.
[6] M. Shoaib, R. Dragon, and J. Ostermann, "Shadow detection for moving humans using gradient-based background subtraction," IEEE Conf. Acoustics, Speech and Signal Processing, art. no. 4959698, pp. 773-776, Apr. 2009.
[7] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," IEEE International Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 246-252, June 1999.
[8] C. Stauffer and W. E. L. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 747-757, Aug. 2000.
[9] Y. T. Chen, C. S. Chen, C. R. Huang, and Y. P. Hung, "Efficient hierarchical method for background subtraction," Pattern Recognition, vol. 40, pp. 2706-2715, Oct. 2007.
[10] N. Martel-Brisson and A. Zaccarin, "Learning and removing cast shadows through a multidistribution approach," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 7, pp. 1133-1146, July 2007.
[11] K. Kim, T. H. Chalidabhongse, D. Harwood, and L. Davis, "Real-time foreground-background segmentation using codebook model," Real-Time Imaging, vol. 11, no. 3, pp. 172-185, June 2005.
[12] L. Maddalena and A. Petrosino, "A self-organizing approach to background subtraction for visual surveillance applications," IEEE Trans. Image Processing, vol. 17, no. 7, pp. 1168-1177, July 2008.
[13] L. Maddalena and A. Petrosino, "Multivalued background/foreground separation for moving object detection," Lecture Notes in Computer Science, vol. 5571, pp. 263-270, 2009.
[14] T. Kohonen, Self-Organization and Associative Memory, 2nd ed. Berlin, Germany: Springer-Verlag, 1988.
[15] K. A. Patwardhan, G. Sapiro, and V. Morellas, "Robust foreground detection in video using pixel layers," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 746-751, April 2008.
[16] M. Heikkila and M. Pietikainen, "A texture-based method for modeling the background and detecting moving objects," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 657-662, April 2006.
[17] E. J. Delp and O. R. Mitchell, "Image compression using block truncation coding," IEEE Trans. Communications, vol. COM-27, no. 9, pp. 1335-1342, Sept. 1979.
[18] E. J. Carmona, J. Martinez-Cantos, and J. Mira, "A new video segmentation method of moving objects based on blob-level knowledge," Pattern Recognition Letters, vol. 29, no. 3, pp. 272-285, Feb. 2008.
[19] http://cvrr.ucsd.edu/aton/shadow/index.html
[20] http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html
[21] http://research.microsoft.com/en-us/um/people/jckrumm/WallFlower/TestImages.htm
[22] C. C. Chiu, M. Y. Ku, and L. W. Liang, "A robust object segmentation system using a probability-based background extraction algorithm," IEEE Trans. Circuits and Systems for Video Technology, vol. 20, no. 4, April 2010.
[23] C. Benedek and T. Sziranyi, "Bayesian foreground and shadow detection in uncertain frame rate surveillance videos," IEEE Trans. Image Processing, vol. 17, no. 4, April 2008.
[24] W. Zhang, X. Z. Fang, X. K. Yang, and Q. M. J. Wu, "Moving cast shadows detection using ratio edge," IEEE Trans. Multimedia, vol. 9, no. 6, Oct. 2007.