Hierarchical Method for Foreground Detection
Using Codebook Model
Jing-Ming Guo, Member, IEEE and Chih-Sheng Hsu
Department of Electrical Engineering
National Taiwan University of Science and Technology
Taipei, Taiwan
E-mail: jmguo@seed.net.tw, seraph1220@gmail.com
ABSTRACT
This paper presents a hierarchical scheme with block-based and pixel-based codebooks for
foreground detection. The codebook is mainly used to compress information to achieve a highly
efficient processing speed. In the block-based stage, 12 intensity values are employed to represent a
block. The algorithm extends the concept of Block Truncation Coding (BTC), and thus further
improves processing efficiency by exploiting its low-complexity advantage. Specifically, the
block-based stage removes most of the noise without reducing the True Positive (TP) rate, yet it has
low precision. To overcome this problem, the pixel-based stage is adopted to enhance the precision,
which also reduces the False Positive (FP) rate. In addition to the basic algorithm, short-term
information is combined to improve background updating so that the model adapts to the current
environment. As documented in the experimental results, the proposed algorithm provides
performance superior to that of the former approaches.
Experimental Results
For measuring the accuracy of the results, the criteria FP rate, TP rate, Precision, and Similarity
[12] are employed, defined as:

FP rate = fp / (fp + tn),
TP rate = tp / (tp + fn),
Precision = tp / (tp + fp),
Similarity = tp / (tp + fp + fn),
where tp denotes the total number of true positives; tn denotes the number of true negatives; fp
denotes the number of false positives; fn denotes the number of false negatives; (tp + fn) indicates the
total number of pixels in the foreground, and (fp + tn) indicates the total number of pixels in the
background. The methods were implemented in the C programming language and run on an Intel
Core 2 2.4 GHz CPU with 2 GB RAM under the Windows XP SP2 operating system.
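For concreteness, the following minimal C sketch computes the four criteria above from accumulated pixel counts; the function and variable names are illustrative, not taken from our released implementation, and nonzero denominators are assumed.

```c
#include <stdio.h>

/* Accuracy criteria of [12], computed from per-sequence pixel counts.
   Assumes nonzero denominators; names follow the definitions above. */
typedef struct {
    double fp_rate, tp_rate, precision, similarity;
} Accuracy;

Accuracy evaluate(long tp, long tn, long fp, long fn)
{
    Accuracy a;
    a.fp_rate    = (double)fp / (double)(fp + tn);      /* background pixels misclassified  */
    a.tp_rate    = (double)tp / (double)(tp + fn);      /* foreground pixels recovered      */
    a.precision  = (double)tp / (double)(tp + fp);      /* detected pixels truly foreground */
    a.similarity = (double)tp / (double)(tp + fp + fn); /* overlap of detection and truth   */
    return a;
}

int main(void)
{
    /* Illustrative counts only, not taken from the experiments reported here. */
    Accuracy a = evaluate(9500L, 90000L, 300L, 500L);
    printf("FP %.4f  TP %.4f  Precision %.4f  Similarity %.4f\n",
           a.fp_rate, a.tp_rate, a.precision, a.similarity);
    return 0;
}
```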
Experimental results for foreground detection using the proposed method are described below for
several different sequences and compared with the former schemes MOG [7], Rita's method [4], CB
[11], Chen's method [9], and Chiu's method [22]. The reported results are obtained without any
post-processing or short-term information, so that the accuracy of the basic algorithm can be
measured. All results for the different sequences can be downloaded from
ftp://HMFD@140.118.7.72:222/
1. Sequence IR, Campus, Highway_I and Laboratory
Size: 320*240
Source: [19], file names: IR (row 1), Campus (row 2), Highway_I (row 3), and Laboratory (row 4)
To provide a better understanding of the detected results, three colors, red, green, and blue, are
employed to represent shadows, highlights, and foreground, respectively.
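As a rough illustration of how such a three-way label can arise, the C sketch below uses a brightness-ratio test in the spirit of the codebook model [11]; the thresholds alpha and beta, the chroma-match flag, and the function name are our assumptions, not the exact color model of this paper.

```c
/* Hypothetical three-way label for a pixel that has already failed the
   exact background match: ratio = pixel brightness / codeword brightness,
   with assumed thresholds alpha < 1 < beta as in codebook models [11]. */
typedef enum { SHADOW, HIGHLIGHT, FOREGROUND } Label;

Label classify(double ratio, int chroma_match, double alpha, double beta)
{
    if (!chroma_match)                  return FOREGROUND; /* color differs: true object */
    if (ratio >= alpha && ratio <= 1.0) return SHADOW;     /* darker, same chroma        */
    if (ratio > 1.0 && ratio <= beta)   return HIGHLIGHT;  /* brighter, same chroma      */
    return FOREGROUND;                  /* too dark or too bright to explain */
}
```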
Fig. 1. Classified results of the sequences from [19]: IR (row 1), Campus (row 2), Highway_I (row 3),
and Laboratory (row 4), with shadow (red), highlight (green), and foreground (blue). (a) Original
image, (b) block-based stage only with block size 10x10, and (c) proposed method.
2. Sequence Waving Trees (WT)
Size: 160*120
Source: [21], file name: Waving Trees
Fig. 2. Foreground (white) classified results for sequence WT [21]. (a) Original image (frame 247),
(b) ground truth, (c) MOG [7], (d) Rita's method [4], (e) CB [11], (f) Chen's method [9], (g) Chiu's
method [22], (h)-(k) block-based stage only with block sizes (h) 5x5, (i) 8x8, (j) 10x10, (k) 12x12,
and (l)-(o) proposed cascaded method with block sizes (l) 5x5, (m) 8x8, (n) 10x10, and (o) 12x12.
Fig. 3. The accuracy values in each frame for sequence WT [21]. (a) FP rate, (b) TP rate, (c)
Precision and (d) Similarity.
TABLE 1. THE AVERAGE ACCURACY VALUES FOR SEQUENCE WT

Method                      FP       TP       Precision  Similarity     fps
MOG [7]                     0.0913   0.9307   0.6955     0.6729       40.13
Rita's method [4]           0.3041   0.9010   0.4628     0.4422       30.56
CB [11]                     0.0075   0.9434   0.9290     0.8913      102.43
Chen's method [9]           0.1165   0.8562   0.6450     0.5962       64.35
Chiu's method [22]          0.0603   0.5641   0.7037     0.4599      320.05
block-based stage (5x5)     0.0208   0.9755   0.8413     0.8276      269.36
block-based stage (8x8)     0.0164   0.9674   0.8511     0.8294      320.88
block-based stage (10x10)   0.0158   0.9749   0.8379     0.8199      365.29
block-based stage (12x12)   0.0167   0.9300   0.8077     0.7691      394.08
proposed method (5x5)       0.0027   0.9517   0.9700     0.9266      165.28
proposed method (8x8)       0.0020   0.9408   0.9767     0.9204      186.56
proposed method (10x10)     0.0018   0.9474   0.9795     0.9285      197.04
proposed method (12x12)     0.0018   0.9059   0.9718     0.8853      205.61
3. Sequence WATERSURFACE [20]
Size: 160*128
Source: [20], file name: WATERSURFACE
Fig. 4. Foreground (white) classified results for sequence WATERSURFACE [20]. (a) Original image
(frame 529), (b) ground truth, (c) MOG [7], (d) Rita's method [4], (e) CB [11], (f) Chen's method [9],
(g) Chiu's method [22], (h)-(k) block-based stage only with block sizes (h) 5x5, (i) 8x8, (j) 10x10, (k)
12x12, and (l)-(o) proposed cascaded method with block sizes (l) 5x5, (m) 8x8, (n) 10x10, and (o) 12x12.
Fig. 6. The accuracy values in each frame for sequence WATERSURFACE [20]. (a) FP rate, (b) TP
rate, (c) Precision and (d) Similarity.
TABLE 2. THE AVERAGE ACCURACY VALUES FOR SEQUENCE WATERSURFACE

Method                      FP       TP       Precision  Similarity     fps
MOG [7]                     0.0431   0.8969   0.5515     0.5183       46.26
Rita's method [4]           0.0265   0.8122   0.6370     0.5595       30.23
CB [11]                     0.0038   0.8118   0.9247     0.7639      101.01
Chen's method [9]           0.0228   0.8215   0.6680     0.5835       62.48
Chiu's method [22]          0.0012   0.7153   0.9539     0.6965      284.36
block-based stage (5x5)     0.0399   0.9588   0.5835     0.5722      213.52
block-based stage (8x8)     0.0549   0.9568   0.5144     0.5052      273.97
block-based stage (10x10)   0.0580   0.9291   0.4893     0.4754      320.05
block-based stage (12x12)   0.0723   0.9355   0.4417     0.4340      348.83
proposed method (5x5)       0.0049   0.9087   0.8983     0.8283      147.65
proposed method (8x8)       0.0043   0.9030   0.9098     0.8331      182.92
proposed method (10x10)     0.0051   0.8800   0.8947     0.8026      192.01
proposed method (12x12)     0.0051   0.8812   0.8923     0.8080      202.02
4. Sequence CAMPUS [20]
Size: 160*128
Source: [20], file name: CAMPUS
Fig. 7. Foreground (white) classified results for sequence CAMPUS [20]. (a) Original image (frame
695), (b) ground truth, (c) MOG [7], (d) Rita's method [4], (e) CB [11], (f) Chen's method [9], (g)
Chiu's method [22], (h)-(k) block-based stage only with block sizes (h) 5x5, (i) 8x8, (j) 10x10, (k)
12x12, and (l)-(o) proposed cascaded method with block sizes (l) 5x5, (m) 8x8, (n) 10x10, and (o) 12x12.
Fig. 8. The accuracy values in each frame for sequence CAMPUS [20]. (a) FP rate, (b) TP rate, (c)
Precision and (d) Similarity.
TABLE 3. THE AVERAGE ACCURACY VALUES FOR SEQUENCE CAMPUS

Method                      FP       TP       Precision  Similarity     fps
MOG [7]                     0.1478   0.8811   0.2862     0.2725       53.26
Rita's method [4]           0.1781   0.7225   0.2310     0.2030       23.16
CB [11]                     0.0342   0.9219   0.5567     0.5280       85.87
Chen's method [9]           0.1614   0.7517   0.2562     0.2295       51.81
Chiu's method [22]          0.0604   0.4926   0.3533     0.2406      278.16
block-based stage (5x5)     0.0447   0.9243   0.4971     0.4796      174.14
block-based stage (8x8)     0.0383   0.9256   0.5042     0.4884      278.16
block-based stage (10x10)   0.0358   0.9272   0.5176     0.5023      304.64
block-based stage (12x12)   0.0433   0.8564   0.4455     0.4260      336.71
proposed method (5x5)       0.0125   0.9061   0.7195     0.6712      110.81
proposed method (8x8)       0.0095   0.9025   0.7672     0.7125      141.44
proposed method (10x10)     0.0093   0.9037   0.7708     0.7169      156.26
proposed method (12x12)     0.0084   0.8349   0.7820     0.6965      161.03
5. Sequence MO [21]
Size: 160*120
Source: [21], file name: moving object
Figure 9 shows the sequence MO [21] with a moving object, containing 1745 frames of size 160x120.
The sequence MO is employed to test the adaptability of the background model. When the chair is
moved at frame 888 in Fig. 9, after a period of time the chair becomes a part of the background in the
background model. This is achieved by applying short-term information in the background model to
improve its adaptation, with T_add set to 100. In Fig. 9, frame 986 shows a good result without any
noise or residual foreground regions.
[Frames shown: 600, 650, 700, 750, 800, 850, 888, 950, 980, 982, 984, 986]
Fig. 9. Foreground (blue) classified results for sequence MO [21], processed with the proposed
method using short-term information.
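A minimal sketch of this updating rule is given below, under our reading that a short-term codeword matched for T_add consecutive frames is promoted into the background model; the structure and field names are assumptions, not the released data layout.

```c
#define T_ADD 100  /* promotion threshold used in this experiment */

/* Assumed short-term codeword bookkeeping: a region that keeps matching
   the short-term model long enough (e.g. the moved chair after frame 888)
   is promoted into the background codebook. */
typedef struct {
    double mean[3]; /* RGB mean carried by the codeword */
    int    stay;    /* consecutive frames the codeword has matched */
} ShortTermCodeword;

/* Call once per frame; returns 1 when the codeword should be moved
   from the short-term model into the background model. */
int promote_if_stable(ShortTermCodeword *cw, int matched_this_frame)
{
    cw->stay = matched_this_frame ? cw->stay + 1 : 0;
    return cw->stay >= T_ADD;
}
```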
Conclusions
Table 4 summarizes the average accuracy results from Tables 1-3 over the three test sequences. It is
clear that the proposed algorithm provides the highest accuracy among the compared methods.
Moreover, the fps of the proposed method is also superior to that of the five former approaches. In
general, a larger block achieves a higher processing speed yet a lower TP rate, and vice versa, as
indicated in Table 4. We therefore recommend a larger block for processing-speed-oriented
applications, while a smaller block is a promising choice for TP-rate-oriented applications.
A hierarchical method for foreground detection has been proposed using block-based and
pixel-based codebooks. The block-based stage enjoys a high processing speed and detects most of the
foreground without reducing the TP rate, while the pixel-based stage further improves the precision
of the detected foreground objects by reducing the FP rate. Moreover, a color model and match
function have also been introduced in this study, which can classify a pixel into shadow, highlight,
background, and foreground. As documented in the experimental results, the hierarchical method
provides highly efficient background subtraction and can be a good candidate for vision-based
applications such as human motion analysis and surveillance systems.
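To make the cascade concrete, the C sketch below reflects our reading of the two-stage design: the cheap block-based test rejects most background blocks, and only candidate blocks are refined per pixel. Both matcher functions are placeholders for the block-based and pixel-based codebook tests, not the released implementation.

```c
/* Assumed two-stage cascade: block-based codebook test first, then a
   pixel-based refinement only inside candidate foreground blocks.
   The two matchers are placeholders for the codebook tests. */
extern int block_is_background(const unsigned char *frame,
                               int bx, int by, int block, int width);
extern int pixel_is_foreground(const unsigned char *frame,
                               int x, int y, int width);

/* mask is assumed zero-initialized (all background) by the caller. */
void detect_foreground(const unsigned char *frame, unsigned char *mask,
                       int width, int height, int block)
{
    for (int by = 0; by < height; by += block)
        for (int bx = 0; bx < width; bx += block) {
            if (block_is_background(frame, bx, by, block, width))
                continue;                 /* whole block removed cheaply */
            for (int y = by; y < by + block && y < height; y++)
                for (int x = bx; x < bx + block && x < width; x++)
                    if (pixel_is_foreground(frame, x, y, width))
                        mask[y * width + x] = 255;
        }
}
```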
TABLE 4. THE AVERAGE ACCURACY VALUES.

Method                      FP       TP       Precision  Similarity     fps
MOG [7]                     0.0941   0.9029   0.5111     0.4879       64.22
Rita's method [4]           0.1696   0.8119   0.4436     0.4016       27.98
CB [11]                     0.0152   0.8924   0.8035     0.7278       96.44
Chen's method [9]           0.1002   0.8098   0.5231     0.4698       59.55
Chiu's method [22]          0.0406   0.5907   0.6703     0.4657      294.19
block-based stage (5x5)     0.0351   0.9529   0.6407     0.6265      219.01
block-based stage (8x8)     0.0365   0.9499   0.6233     0.6077      291.00
block-based stage (10x10)   0.0366   0.9438   0.6149     0.5992      329.99
block-based stage (12x12)   0.0441   0.9073   0.5650     0.5431      359.87
proposed method (5x5)       0.0067   0.9222   0.8626     0.8087      141.25
proposed method (8x8)       0.0053   0.9154   0.8846     0.8220      170.31
proposed method (10x10)     0.0054   0.9104   0.8817     0.8160      181.77
proposed method (12x12)     0.0051   0.8740   0.8821     0.7966      189.55
References
[1] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, “Wallflower: principles and practice of
background maintenance,” In Proc. IEEE Conf. Computer Vision, vol. 1, pp. 255–261, Sept.
1999.
[2] T. Horprasert, D. Harwood, and L. S. Davis, “A statistical approach for real-time robust
background subtraction and shadow detection,” IEEE ICCV Frame-Rate Applications
Workshop, Kerkyra, Greece, Sept. 1999.
[3] R. Cucchiara, C. Grana, M. Piccardi, A. Prati, and S. Sirotti, “Improving shadow suppression
in moving object detection with HSV color information,” IEEE Conf. Intelligent
Transportation Systems, pp. 334-339, Aug. 2001.
[4] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati, “Detecting moving objects, ghosts, and
shadows in video streams,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 10,
Oct. 2003.
[5] M. Izadi and P. Saeedi, “Robust region-based background subtraction and shadow removing
using color and gradient information,” In Proc. 19th International Conference on Pattern
Recognition, art. no. 4761133, Dec. 2008.
[6] M. Shoaib, R. Dragon, and J. Ostermann, “Shadow detection for moving humans using
gradient-based background subtraction,” IEEE Conf. Acoustics, Speech and Signal Processing,
art. no. 4959698, pp. 773-776, Apr. 2009.
[7] C. Stauffer and W. E. L. Grimson, “Adaptive background mixture models for real-time tracking,”
IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 246-252, June 1999.
[8] C. Stauffer and W. E. L. Grimson, “Learning patterns of activity using real-time tracking,” IEEE
Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 747-757, Aug. 2000.
[9] Y. T. Chen, C. S. Chen, C. R. Huang, and Y. P. Hung, “Efficient hierarchical method for
background subtraction,” Pattern Recognition, vol. 40, pp. 2706-2715, Oct. 2007.
[10] N. Martel-Brisson, and A. Zaccarin, “Learning and removing cast shadows through a
multidistribution approach,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29,
no.7, pp. 1133-1146, July, 2007.
[11] K. Kim, T.H. Chalidabhongse, D. Harwood, and L. Davis, “Real-time foreground-background
segmentation using codebook model,” Real-Time Imaging, vol. 11, no. 3, pp. 172-185, June.
2005.
[12] L. Maddalena, and A. Petrosino, “A self-organizing approach to background subtraction for
visual surveillance applications,” IEEE Trans. Image Processing, vol. 17, no. 7, pp. 1168-1177,
July, 2008.
[13] L. Maddalena and A. Petrosino, “Multivalued background/foreground separation for moving
object detection,” Lecture Notes in Computer Science, vol. 5571, pp. 263-270, 2009.
[14] T. Kohonen, Self-Organization and Associative Memory, 2nd ed. Berlin, Germany:
Springer-Verlag, 1988.
[15] K. A. Patwardhan, G. Sapiro, and V. Morellas, “Robust foreground detection in video using
pixel layers,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp.
746-751, April, 2008.
[16] M. Heikkila and M. Pietikainen, “A texture-based method for modeling the background and
detecting moving objects,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 4,
pp. 657-662, April 2006.
[17] E. J. Delp and O. R. Mitchell, “Image compression using block truncation coding,” IEEE Trans.
Communications, vol. COM-27, no. 9, pp. 1335-1342, Sept. 1979.
[18] E. J. Carmona, J. Martinez-Cantos, and J. Mira, “A new video segmentation method of moving
objects based on blob-level knowledge,” Pattern Recognition Letters, vol. 29, no. 3, pp. 272-285,
Feb. 2008.
[19] http://cvrr.ucsd.edu/aton/shadow/index.html
[20] http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html
[21] http://research.microsoft.com/en-us/um/people/jckrumm/WallFlower/TestImages.htm
[22] C. C. Chiu, M. Y. Ku, and L. W. Liang, “A robust object segmentation system using a
probability-based background extraction algorithm,” IEEE Trans. Circuits and Systems for Video
Technology, vol. 20, no. 4, April 2010.
[23] C. Benedek and T. Sziranyi, “Bayesian foreground and shadow detection in uncertain frame
rate surveillance videos,” IEEE Trans. Image Processing, vol. 17, no. 4, April 2008.
[24] W. Zhang, X. Z. Fang, X. K. Yang and Q. M. J. Wu, “Moving cast shadows detection using
ratio edge,” IEEE Trans. Multimedia, vol. 9, no. 6, Oct. 2007.