DETECTION AND CLASSIFICATION FOR GROUP MOVING HUMANS
WALID SULIMAN ELGENAIDI
A dissertation submitted in partial fulfillment of the
requirements for the award of the degree of
Master of Engineering
(Electrical-Electronics & Telecommunication)
Faculty of Electrical Engineering
Universiti Teknologi Malaysia
May, 2007
DEDICATION
“To My Beloved Father, Mother, Brothers and Sisters”
ACKNOWLEDGEMENTS
First and foremost, I thank God Almighty for giving me the strength to
complete my research. I would also like to express my gratitude and respect to my
research supervisor PM DR. SYED ABD. RAHMAN AL-ATTAS for his constant
support and guidance during my graduate studies at Universiti Teknologi Malaysia.
Thanks to all of my colleagues and friends with whom I had the opportunity
to learn, share and enjoy. It has been a pleasure. Finally, special and infinite thanks
to the most important people in my life, my parents, for their love, prayers, sacrifice
and support.
ABSTRACT
Recognition algorithms often misclassify a moving group of humans as a vehicle or some other large moving object. The aim of this project is therefore to detect a moving object and classify it as either a group of humans or something else. The background subtraction technique has been employed in this work as it is able to provide complete features of the moving object. However, it is extremely sensitive to dynamic changes such as changes in illumination. The detected foreground pixels usually contain noise and small movements such as tree leaves. These isolated pixels are filtered by preprocessing operations: a median filter followed by a sequence of the morphological operations dilation and erosion. The object is then extracted using a border extraction technique. The classification makes use of the shape of the object. The proposed technique achieved 75% accuracy based on 18 test samples. This result shows that it is possible to distinctly classify a group of humans moving in a video sequence from other large moving objects such as vehicles.
ABSTRAK
In most cases of recognizing humans moving in a group, recognition algorithms misclassify the group of humans as a vehicle or as one large moving object. The main objective of this project is therefore to detect and classify such a moving object as a moving group of humans or otherwise. The background subtraction technique is used so that complete features of the moving object can be obtained. However, this technique is very sensitive to dynamic changes such as changes in illumination. The extracted pixels usually contain considerable noise and small movements, such as the movement of tree leaves. These pixels are processed with several filters, such as the median filter, followed by the morphological operations of dilation and erosion. The border of the object is then extracted using a border extraction technique. The classification process uses the shape information of the object. Based on experiments with 18 test samples, the proposed technique achieved an accuracy of up to 75%. From these results, the algorithm shows potential for accurately classifying a moving group of humans as distinct from other moving objects such as vehicles.
TABLE OF CONTENTS

        DECLARATION
        DEDICATION
        ACKNOWLEDGEMENTS
        ABSTRACT
        ABSTRAK
        TABLE OF CONTENTS
        LIST OF TABLES
        LIST OF FIGURES
        LIST OF SYMBOLS

1       INTRODUCTION
        1.1  Overview
        1.2  Overview of System Stages
        1.3  Objective of this Study
        1.4  Scope of this Study
        1.5  Project's Outline

2       LITERATURE REVIEW
        2.1  Introduction
        2.2  Reference Based Approach
        2.3  Experimental Results
        2.4  Background Subtraction
        2.5  Motion Detection and Shape-Based Detection
        2.6  Summary

3       PROJECT METHODOLOGY
        3.1  Introduction
        3.2  System Overview
        3.3  Detection Stage
             3.3.1  Image Capture
             3.3.2  Background Model
             3.3.3  Foreground Model
        3.4  Object Preprocessing
             3.4.1  Median Filter
             3.4.2  Morphological Operations
             3.4.3  Dilation
             3.4.4  Erosion
             3.4.5  Border Extraction
        3.5  Classification Stage
             3.5.1  Extracting Object Features
                    3.5.1.1  Centre of Mass
                    3.5.1.2  Distance between the Center Point and the Border of the Object
             3.5.2  Classification Metric

4       EXPERIMENTAL RESULTS
        4.1  Introduction
        4.2  Moving Object Position
        4.3  Image Capture Results
        4.4  Background Subtraction Results
        4.5  Median Filter
        4.6  Dilation
        4.7  Erosion
        4.8  Region Filling
        4.9  Border Extraction
        4.10 Feature Extraction Result
        4.11 Classification Results
        4.12 Recognition Accuracy
        4.13 Factors that Contribute to Low Accuracy

5       CONCLUSION AND SUMMARY
        5.1  Summary
        5.2  Conclusions
        5.3  Recommendations for Future Work

        REFERENCES
        APPENDIX A: MATLAB DECLARATIONS
        APPENDIX B: IMAGES USED IN THE DATABASE
LIST OF TABLES

TABLE NO   TITLE

4.1        The results of the classification metric for 10 samples of the group of humans, where THB = 0.04
4.2        The results of the classification metric for 8 samples of others, where THB = 0.04
4.3        Performance accuracy
LIST OF FIGURES

FIGURE NO  TITLE

2.1        Flow chart of the proposed approach
2.2        Multiple human detection from indoor sequences
2.3        Multiple human detection from outdoor sequences
3.1        A generic framework for the algorithm
3.2        The system block diagram
3.3        Converting the frames to grayscale
3.4        (a) The original (unfiltered) image (b) The center value (previously 97) replaced by the median of all nine values (4)
3.5        (a) Structure 'B' (b) a simple binary image 'A' (c) Result of the dilation process
3.6        (a) Structure 'B' (b) a simple binary image 'A' (c) Result of the erosion process
3.7        The output of the border after subtracting the eroded image from the original one
3.8        (a) A simple binary image (b) Result of using border extraction
3.9        Sample objects and their silhouettes
3.10       Extracting the border from the foreground
3.11       (a) and (c) Object borders with distances and center points (b) and (d) Sample distance signal calculation and normalized distance signals
3.12       (a), (b) The border and distance graphs of the vehicle (c) The border and distance graph of the group of humans
3.13       (a) Graph of the AVG (b) The AVG graph between 60 and 120 degrees
4.1        Different camera positions
4.2        Result of the image capture and grayscale image
4.3        Result of the background subtraction
4.4        Results of the median filter
4.5        Results of the dilation process
4.6        Result of the erosion process
4.7        The result of the region filling process
4.8        The result of the border extraction process
4.9        Results of the shape feature
4.10       Results for classifying the group of humans
4.11       Results for classifying the others
4.12       Incorrect results of classification
LIST OF SYMBOLS

B(j,i)  -  Background image
C(j,i)  -  Current image
g(j,i)  -  The output after the threshold process
TH      -  The threshold value
Cm      -  Center point
XCm     -  Center point x coordinate
YCm     -  Center point y coordinate
Dist    -  The Euclidean distance
DS      -  The normalized distance signal
CHAPTER 1
INTRODUCTION
Video surveillance systems have long been in use to monitor security-sensitive areas. The history of video surveillance consists of three generations of surveillance systems, called 1GSS, 2GSS and 3GSS [11].
The first generation surveillance systems (1GSS, 1960-1980) were based on analog subsystems for image acquisition, transmission and processing. They extended the human eye in the spatial sense by transmitting the outputs of several cameras monitoring a set of sites to displays in a central control room. They had major drawbacks: they required high bandwidth, archiving and retrieval of events were difficult due to the large number of video tapes required, and online event detection depended entirely on human operators with limited attention spans.

The next generation surveillance systems (2GSS, 1980-2000) were hybrids in the sense that they used both analog and digital subsystems to resolve some drawbacks of their predecessors. They made use of early advances in digital video processing methods that assist human operators by filtering out spurious events. Most of the work during the 2GSS era focused on real-time event detection.
Third generation surveillance systems (3GSS, 2000- ) provide end-to-end
digital systems. Image acquisition and processing at the sensor level, communication
through mobile and fixed heterogeneous broadband networks and image storage at
the central servers benefit from low cost digital infrastructure.
Unlike previous generations, in 3GSS some parts of the image processing are
distributed towards the sensor level by the use of intelligent cameras that are able to
digitize and compress acquired analog image signals and perform image analysis
algorithms like motion and face detection with the help of their attached digital
computing components.
Moving object detection is the basic step for further analysis of video. It handles the segmentation of moving objects from stationary background objects. This not only creates a focus of attention for higher-level processing but also decreases computation time considerably. Commonly used techniques for object detection are background subtraction, statistical models, temporal differencing and optical flow. Due to dynamic environmental conditions such as illumination changes, shadows and tree branches waving in the wind, object segmentation is a difficult and significant problem that needs to be handled well for a robust visual surveillance system.

The object classification step categorizes detected objects into predefined classes such as human, vehicle, animal, clutter, etc. It is necessary to distinguish objects from each other in order to track and analyze their actions reliably. Currently, there are two major approaches towards moving object classification: shape-based and motion-based methods [15].
1.1 Overview
This project is to design a group-of-humans recognition system that can be integrated into an ordinary visual surveillance system with moving object detection and classification. The present system operates on grayscale video imagery from a video camera and uses an adaptive background subtraction scheme [3] which works reliably in outdoor environments. After segmenting moving pixels from the static background, connected regions are classified into predetermined object categories: group of humans, vehicle or something else.
1.2 Overview of System Stages

The proposed system is capable of detecting moving objects. The system extracts features of these moving objects and then classifies them into two categories, "group of humans" or "something else". The methods used can be summarized as follows:

1. Detection step:
   - Background model.
   - Foreground detection.
2. Object preprocessing.
3. Feature extraction.
4. Classification.

1.3 Objective of this Study
The main objective of this project is to design a system that can detect a moving group of humans and differentiate it from other moving objects. The object is processed before classification using image processing techniques to accommodate environmental changes during the acquisition process. This work can be an important part of intelligent security surveillance.
1.4 Scope of this Study

To accomplish this objective, the scope of this study is bounded as follows:

1. The scene does not include night vision.
2. The method developed is meant only for outdoor environments.
3. The method makes use of the object's silhouette contour, length and area to classify the detected objects.
4. The camera faces the front of the object.
5. The system classifies groups of 3 humans and above.
6. The system is programmed using MATLAB.
7. The processing is done offline.
1.5 Project's Outline

The project is organized into five chapters. The outline is as follows:

Chapter 1 - Introduction
This chapter discusses the objective and scope of the project and gives a general introduction to the history of video surveillance and the classification of the moving objects to be detected.

Chapter 2 - Literature Review
This chapter reviews a previous approach for the detection of multiple moving objects from binocular video sequences.

Chapter 3 - Project Methodology
This chapter presents the overall system methodology and discusses in detail each step that has to be taken into consideration for classification purposes.

Chapter 4 - Experimental Results
This chapter shows the results of each process applied to the image in this system, and the final results of the system.

Chapter 5 - Conclusion
This chapter consists of conclusions and recommendations for future improvement.
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction

Many researchers have worked on the detection and classification of moving objects. One representative paper is reviewed in the following sections.

Yang Ran et al. [1] developed the detection of multiple moving people from binocular sequences. A novel approach for the detection of multiple moving objects from binocular video sequences is reported. First, an efficient motion estimation method is applied to the sequences acquired from each camera. The motion estimation is then used to obtain cross-camera correspondence between the stereo pair. Next, background subtraction is achieved by fusing temporal difference and depth estimation. Finally, moving foregrounds are further segmented into moving objects according to a distance measure defined in a 2.5D feature space, following a hierarchical strategy. The proposed approach has been tested on several indoor and outdoor sequences. Preliminary experiments have shown that the new approach can robustly detect multiple partially occluded moving persons against a noisy background. Representative human detection results are presented.
2.2 Reference Based Approach

In this paper, Yang Ran proposed a novel approach for detecting moving humans from binocular videos. It uses a fast and accurate sub-pixel-accuracy motion estimation technique to extract object motion information, which significantly reduces ambiguity and computation cost in establishing dense stereo correspondence. In this approach, both the motion consistency between the two cameras and the stereo disparity map are used for background subtraction and moving object segmentation/grouping. The motion correspondence significantly improves the background subtraction process, while the stereo correspondence trims down the search computation. Fig. 2.1 shows a flow chart of this approach.
Figure 2.1 Flow chart of the proposed approach
2.3 Experimental Results

Yang Ran applied the algorithm to a number of stereo sequences acquired by a stationary stereo camera. Two representative results are presented here. The videos were captured at 320x240 resolution, 25 frames per second. Figure 2.2 shows an example of detection results in an indoor scene. The background of the indoor scene was constant during capture. Shown in the left column are two input frames (#16 and #25) taken from the left camera. Shown in the central column are the motion (foreground) detection results. Shown in the right column are the person segmentation/grouping results, where different individuals are assigned different gray levels. The first row is a case where no occlusion occurs and the people are at different distances. The second row is a case where occlusion happens and the persons are at different distances.
Figure 2.2 Multiple human detection from indoor sequences
Figure 2.3 shows an example of detecting two people in an outdoor scene. The test demonstrates that the approach remains effective even against a cluttered background (due to background vegetation motion) and shadows.
Figure 2.3 Multiple human detection from outdoor sequences
2.4 Background Subtraction

A fundamental and critical task in identifying moving objects from a video sequence is background subtraction, which identifies moving objects as the portion of a video frame that differs significantly from a background model. There are many challenges in developing a good background subtraction algorithm. First, it must be robust against changes in illumination. Second, it should avoid detecting non-stationary background objects such as moving leaves, rain, snow, and shadows cast by moving objects. Finally, its internal background model should react quickly to changes in the background, such as the starting and stopping of vehicles.
2.5 Motion Detection and Shape-Based Detection

The object classification step categorizes detected objects into predefined classes such as human, vehicle, animal, clutter, etc. It is necessary to distinguish objects from each other in order to track and analyze their actions reliably. Currently, there are two major approaches towards moving object classification: shape-based and motion-based methods. Shape-based methods make use of the objects' 2D spatial information, whereas motion-based methods use temporally tracked features of objects for the classification solution.
2.6 Summary

A novel approach for the detection of multiple occluded moving persons from binocular video sequences was presented. By integrating the motion estimation result into every step of the detection process, monocular and binocular correspondences are fused to generate robust detections, which is the main contribution of the work. First, an efficient motion estimation method is applied to the sequences from each camera. The motion estimation is then used to obtain cross-camera correspondence between the stereo pair. Next, background subtraction is achieved by fusing temporal difference and depth estimation. Finally, foregrounds are further segmented into moving objects according to a distance measure defined in a 2.5D feature space. The proposed approach has been tested on several indoor and outdoor sequences.
CHAPTER 3
PROJECT METHODOLOGY
3.1 Introduction

The system extracts features from the moving objects and classifies them as a group of humans, a vehicle or something else. This chapter presents the methodology of the proposed system in detail.
3.2 System Overview

The flowchart of the system architecture is shown in Fig. 3.1. This chart gives an overview of the main stages of the methodology; the system is divided into two main stages: a detection stage and a classification stage.
3.3 Detection Stage

The system operates on grayscale video imagery from the video frames. Detection is handled by a background subtraction scheme which works reliably in outdoor environments.
Figure 3.1 A generic framework for the algorithm: object detection, object classification, decision.

Figure 3.2 The system block diagram: image capture, background model, foreground model (current image compared with background), object preprocessing of the foreground image, feature extraction, classification.
3.3.1 Image Capture

The system captures images offline from the video (25 frames per second). The system initializes the background using the first frame, and the captured frames are converted to grayscale images.

A grayscale (or graylevel) image is simply one in which the only colors are shades of gray. The reason for differentiating such images from other color images is that less information needs to be provided for each pixel. In fact, a 'gray' color is one in which the red, green and blue components all have equal intensity in RGB space, so it is only necessary to specify a single intensity value for each pixel, as opposed to the three intensities needed to specify each pixel in a full color image. The grayscale intensity is stored as an 8-bit integer, giving 256 possible shades of gray from black to white.

Grayscale images are very common, in part because much of today's display and image capture hardware is well suited to them. In addition, grayscale images are entirely sufficient for many tasks, so there is no need to use more complicated and harder-to-process color images.
Figure 3.3 Converting the frames to grayscale
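As a rough sketch of this step, frames can be read from an AVI file and converted to grayscale in MATLAB as follows (the file name and frame indices are hypothetical placeholders; aviread is the reader used in Appendix A):

% Sketch: capture two frames from a video and convert them to grayscale.
% 'video_sample.avi' and the frame indices are placeholders.
X  = aviread('video_sample.avi');   % read the frames of the AVI file
I1 = frame2im(X(1));                % first frame, used as the background
I2 = frame2im(X(25));               % a later frame, the current image
G1 = rgb2gray(I1);                  % 8-bit grayscale, 256 levels
G2 = rgb2gray(I2);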
3.3.2 Background Model
Each application that benefits from smart video processing has different needs and thus requires different treatment. However, they have one thing in common: moving objects. Detecting regions that correspond to moving objects, such as a group of humans or something else, is therefore the first basic step of almost every vision system, since it provides a focus of attention and simplifies the processing in subsequent analysis steps. Due to dynamic changes in natural scenes, such as sudden illumination and weather changes and repetitive motions that cause clutter (tree leaves moving in the wind), motion detection is a difficult problem to solve reliably. A frequently used technique for moving object detection is background subtraction, whose description is given below.

Background subtraction is a particularly common technique for motion segmentation in static scenes. It attempts to detect moving regions by subtracting the current image pixel-by-pixel from a reference background image, which may be created by averaging images over time in an initialization period. In the background subtraction method used here, the reference background is initialized at the start of the system with the first frame of the video.
3.3.3 Foreground Model

At each new frame, foreground pixels are detected by subtracting the intensity values from the background and thresholding the absolute value of the differences pixel by pixel. The pixels where the difference is above the threshold are classified as foreground.

Let B(j,i) represent the gray-level background image, with values in the range [0, 255], and let C(j,i) be the current image [8]. As the generic background subtraction scheme suggests, a pixel at position (j,i) in the current video image belongs to the foreground if it satisfies

Foreground(j,i) = |B(j,i) - C(j,i)| >= TH        (3.1)

where TH is the threshold value. The above equation is used to generate the foreground pixel map, which represents the foreground regions as a binary array where 1 corresponds to a foreground pixel and 0 stands for a background pixel. The reference background B(j,i) is initialized with the first video image, and the threshold is obtained from empirical experiments.
3.4 Object Preprocessing

The outputs of the foreground region detection algorithm explained in the previous sections generally contain noise and are therefore not appropriate for further processing without noise filtering.

In this system, simple intensity thresholding is applied first. The threshold value was fixed at TH = 32, following the rule below, where g(j,i) is the output of the thresholding process:

g(j,i) = 0   if Foreground(j,i) < TH
g(j,i) = 1   otherwise                           (3.2)
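A minimal MATLAB sketch of the subtraction and thresholding described by equations (3.1) and (3.2), assuming the grayscale background and current frames G1 and G2 from the capture step (this vectorized form stands in for the per-pixel loops of Appendix A):

% Sketch: background subtraction with a fixed empirical threshold.
TH = 32;                 % threshold obtained from empirical experiments
B  = double(G1);         % reference background image B(j,i)
C  = double(G2);         % current image C(j,i)
g  = abs(B - C) >= TH;   % binary foreground map: 1 = foreground pixel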
3.4.1 Median Filter
The median filter is normally used to reduce noise in an image and is a simple and very effective noise removal filter. Its performance is particularly good for removing shot noise, which consists of strong spike-like isolated values. The median filter is a sliding-window spatial filter that replaces the center value in the window with the median of all the pixel values in the window. An example of median filtering of a single 3x3 window of values is shown below [16].
unfiltered values:

     6    2    0
     3   97    4
    19    3   10

In order: 0, 2, 3, 3, 4, 6, 10, 19, 97

median filtered:

     *    *    *
     *    4    *
     *    *    *

Figure 3.4 (a) The original (unfiltered) window. (b) The center value (previously 97) is replaced by the median of all nine values (4).
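In MATLAB this step reduces to a pair of calls; a sketch, assuming the binary foreground map g from the thresholding step (two passes are used, as in Appendix A):

% Sketch: remove shot noise from the foreground map with a 3x3 median
% filter, applied twice for stronger smoothing.
m1 = medfilt2(double(g));   % default 3x3 window
m2 = medfilt2(m1);          % second pass, as in Appendix A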
3.4.2 Morphological Operations
The field of mathematical morphology contributes a wide range of operators
to image processing, all based around a few simple mathematical concepts from set
theory. The operators are particularly useful for the analysis of binary images and
common usages include edge detection, noise removal, image enhancement and
image segmentation.
The two most basic operations in mathematical morphology are erosion and
dilation. Both of these operators take two pieces of data as input: an image to be
eroded or dilated, and a structuring element (also known as a kernel). The two pieces
of input data are each treated as representing sets of coordinates in a way that is
slightly different for binary and grayscale images.
Morphological operations, erosion and dilation, are applied to remove noisy
foreground pixels that do not correspond to actual foreground regions and to remove
the noisy background pixels near and inside object regions that are actually
foreground pixels.
The basic operation of a morphology-based approach is the translation of a structuring element over the image and the erosion and/or dilation of the image content based on the shape of the structuring element. A morphological operation analyses and manipulates the structure of an image by marking the locations where the structuring element fits. In mathematical morphology, neighborhoods are therefore defined by the structuring element, i.e., by its shape.
There are many types of morphological operations that can be used, but in this project only three of them are used as preprocessing: erosion, dilation, and connected component labeling.
3.4.3 Dilation

Dilation is one of the two basic operators in the area of mathematical morphology, the other being erosion. It is typically applied to binary images, but there are versions that work on grayscale images. The basic effect of the operator on a binary image is to gradually enlarge the boundaries of regions of foreground pixels (i.e. white pixels, typically). Thus, areas of foreground pixels grow in size while holes within those regions become smaller, as shown in Figure 3.5.

The dilation operator takes two pieces of data as input. The first is the image which is to be dilated. The second is a set of coordinate points known as a structuring element (also known as a kernel). It is this structuring element that determines the precise effect of the dilation on the input image. To compute the dilation of a binary input image by the structuring element, each of the background pixels in the input image is considered in turn. For each background pixel (the input pixel), the structuring element is superimposed on top of the input image so that the origin of the structuring element coincides with the input pixel position. If at least one pixel in the structuring element coincides with a foreground pixel in the image underneath, then the input pixel is set to the foreground value. If all the corresponding pixels in the image are background, however, the input pixel is left at the background value [5].
D[A, B] = A ⊕ B = ∪_{β∈B} (A + β)        (3.3)

Figure 3.5 (a) Structure 'B' (b) a simple binary image 'A' (c) Result of the dilation process.
The dilation process has many good properties: it can repair broken edges, help produce a smoother border, and so on. Its drawback appears when it is applied to a small object. The following steps have therefore been applied in order to obtain better results:

(a) Calculate the entire area of the object.
(b) If the area of the object is greater than 500 pixels, apply the dilation process; otherwise, perform no dilation.
3.4.4 Erosion

Erosion is the other basic operator in the area of mathematical morphology. Its basic effect is to erode the boundaries of regions of foreground pixels (i.e. white pixels, typically). Thus, areas of foreground pixels shrink in size, and holes within those areas become larger [18], as shown in Figure 3.6. The erosion operator takes two pieces of data as input. The first is the image which is to be eroded, and the second is a (usually small) set of coordinate points known as a structuring element (also known as a kernel). It is this structuring element that determines the precise effect of the erosion on the input image.

To compute the erosion of a binary input image by this structuring element, each of the foreground pixels in the input image is considered in turn. For each foreground pixel (the input pixel), the structuring element is superimposed on top of the input image so that the origin of the structuring element coincides with the input pixel coordinates. If, for every pixel in the structuring element, the corresponding pixel in the image underneath is a foreground pixel, then the input pixel is left as it is. If any of the corresponding pixels in the image are background, however, the input pixel is set to the background value [18].
E[A, B] = A Θ (−B) = ∩_{β∈B} (A − β)        (3.4)

Figure 3.6 (a) Structure 'B' (b) a simple binary image 'A' (c) Result of the erosion process.
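A sketch of the conditional dilation followed by erosion in MATLAB, using a 3x3 square structuring element as in Appendix A (the 500-pixel area threshold comes from the steps listed in the dilation section above):

% Sketch: morphological cleanup of the filtered foreground map m2.
bw = m2 > 0;                 % ensure a logical (binary) mask
se = strel('square', 3);     % 3x3 square structuring element
if bwarea(bw) > 500          % dilate only sufficiently large objects
    d = imdilate(bw, se);    % grow regions, closing small gaps and holes
else
    d = bw;                  % skip dilation for small objects
end
e = imerode(d, se);          % shrink back, removing residual noise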
3.4.5 Border Extraction

This method extracts the outline of the object by eroding the image once and then subtracting the eroded image from the input image, using the formula below [19]:
B(A) = A − (A Θ B)

Figure 3.7 The output of the border after subtracting the eroded image from the original one.
Figure 3.8 (a) A simple binary image (b) Result of using border extraction.
Figure 3.9 Sample objects and their silhouettes.
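This subtraction is essentially one line in MATLAB; a sketch, assuming the filled binary object f produced by the preprocessing stage:

% Sketch: border extraction as the difference between the object and
% its erosion, B(A) = A - (A eroded by B).
se     = strel('square', 3);
border = f & ~imerode(f, se);   % one-pixel-wide outline of the object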
3.5 Classification Stage
Categorizing the type of a detected video object is a crucial step in achieving this goal. With the help of object type information, more specific and accurate methods can be developed to recognize objects. Hence, this project develops a novel video object classification method based on object shape.

Typical video scenes may contain a variety of objects such as groups of humans, vehicles, animals, natural phenomena (e.g. rain, snow), plants and clutter. However, the main targets of interest in surveillance applications are generally groups of humans.
3.5.1 Extracting Object Features
After detecting foreground regions and applying post-processing operations to remove noise and shadow regions, individual blobs that correspond to objects are found, and spatial features such as the bounding box, size, center of mass and silhouette of each region are calculated.
Figure 3.10 Extracting the border from the foreground: labeled foreground regions (blobs), dilation, erosion, filtered foreground regions (blobs), center of region, object with border only.
3.5.1.1 Centre of Mass
After extracting the border of the foreground region as shown above in Figure 3.8, the center of the object is calculated by simply finding the average of all x coordinates and all y coordinates.
In order to calculate the center of mass point, Cm = (XCm, YCm), of an object [4], we use the following equations:

XCm = (1/n) Σ_{i=1}^{n} x_i        (3.5)

YCm = (1/n) Σ_{i=1}^{n} y_i        (3.6)

where n is the number of pixels in the object.
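A sketch of this calculation in MATLAB, assuming the filled binary object f (Appendix A computes the same sums with explicit loops; find-based indexing is used here for brevity):

% Sketch: center of mass of a binary object as the mean of its
% foreground pixel coordinates, equations (3.5) and (3.6).
[rows, cols] = find(f);      % coordinates of all foreground pixels
XCm = round(mean(cols));     % x coordinate of the center of mass
YCm = round(mean(rows));     % y coordinate of the center of mass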
3.5.1.2 Distance between the center point and the border of the object
After calculating the center of mass of the object and extracting its border, the distance between the border and the center is calculated. The algorithm used to calculate the distance is shown in Figure 3.11.

Let S = {P1, P2, ..., Pn} be the silhouette of an object O, consisting of 180 points ordered counterclockwise from 0 degrees to 180 degrees around the center point of the detected region. The distance signal DS = {d1, d2, ..., dn} is generated by calculating the distance between Cm and each Pi as follows:

d_i = Dist(Cm, P_i),   for all i in [1, 180]        (3.7)

where Dist is the Euclidean distance.

Different objects have different shapes in video and therefore have silhouettes of varying sizes. Even the same object has a changing contour size from frame to frame.
In the next step, the distance signal d(i) is scaled and normalized to have unit integral area. The normalized distance signal DS is calculated using the following equation:

DS[i] = d(i) / Σ_{j=1}^{180} d(j)        (3.8)
Figure 3.11 (a) and (c) Object borders with distances and center points. (b) and (d) Sample distance signal calculation and normalized distance signals (DS[i] plotted against border points).
The main idea of the shape feature is to look at the locations of heads, which appear as peaks in the distance signal in the case of a group of humans. Over the 0-180 degree range, most heads are located between 60 and 120 degrees. The feature is illustrated in Figure 3.12.
Figure 3.12 (a), (b) The border and distance graphs of the vehicle. (c) The border and distance graph of the group of humans.
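A simplified sketch of the distance signal computation, assuming the one-pixel border image border and the centroid (XCm, YCm) from the previous steps (Appendix A implements this by casting rays from the center; here border pixels are simply binned by angle for brevity):

% Sketch: distance signal over 0..180 degrees, equations (3.7) and (3.8).
[by, bx] = find(border);              % border pixel coordinates
dx = bx - XCm;                        % horizontal offsets from the center
dy = YCm - by;                        % vertical offsets (image y grows down)
keep  = dy >= 0;                      % upper half-plane: 0..180 degrees
theta = round(atan2(dy(keep), dx(keep)) * 180/pi);   % angles in degrees
r     = hypot(dx(keep), dy(keep));    % Euclidean distances to the center
d = zeros(181, 1);
for k = 0:180
    hit = (theta == k);
    if any(hit)
        d(k+1) = max(r(hit));         % outermost border pixel at angle k
    end
end
DS = d / sum(d);                      % normalize to unit area, eq. (3.8)
plot(0:180, DS);                      % distance signal, as in Figure 3.12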
3.5.2 Classification Metric

Numerous methods have been used to classify objects based on shape [14, 3, 13, 2, 10]. Our object classification metric is based on the similarity of object border shapes. After obtaining the distance signals, as in Figure 3.12 (b) and (c), the next step is the comparison between the input object's border distance and a stored border distance, which is calculated offline, as follows:

Result = group of humans,  if Σ |DstAB − AVG| >= THB
         others,           otherwise                        (3.9)

where

DstAB is the distance signal extracted from the object,
AVG is the reference value measured offline, and
THB = 0.04 was obtained from empirical experiments.

The AVG is illustrated in Figure 3.13. By applying the above rule, the classification results reported in Chapter 4 were obtained.
Figure 3.13 (a) Graph of the AVG. (b) The AVG graph between 60 and 120 degrees.
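A sketch of the classification rule of equation (3.9), assuming DS is the normalized distance signal of the test object and AVG is the stored reference signal, both 181 samples long (indices 1..181 covering 0..180 degrees) and restricted to the 60-120 degree range as in Appendix A:

% Sketch: classification metric of equation (3.9), with THB = 0.04.
THB    = 0.04;
sample = DS;
ref    = AVG;                    % reference signal, measured offline
sample([1:60, 122:181]) = 0;     % keep only the 60..120 degree range
ref([1:60, 122:181])    = 0;
if sum(abs(sample - ref)) >= THB
    disp('Classified as group of humans');
else
    disp('Others');
end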
CHAPTER 4
EXPERIMENTAL RESULTS
4.1 Introduction

This chapter presents the experimental results of this project. The results of each process (including the preprocessing stage) are also presented. In addition, this chapter discusses the performance of the technique and the factors that affect the accuracy of the system.
4.2 Moving Object Position

The movement of the object at different distances, the object location and the camera positioning are the most important issues during feature extraction. In this system, the camera faces the front of the object. Figure 4.1 below shows the camera positions used.
Figure 4.1 (a)-(c) Different camera positions.
4.3 Image Capture Results

The system captured frames from the video and converted them to grayscale images. Figure 4.2 shows the results of the image capture.

Figure 4.2 (a)-(d) Result of the image capture and grayscale conversion.
4.4 Background Subtraction Results

The results of the background subtraction algorithm, after comparison with the threshold value as explained in Chapter 3, are shown in Figure 4.3.

Figure 4.3 (a)-(d) Result of the background subtraction.
4.5 Median Filter
The median filter is applied to remove the noise that appears in the image after the background subtraction operation. Figure 4.4 below shows the results of the median filter.

Figure 4.4 (a)-(d) Results of the median filter.
4.6 Dilation

The dilation process can link broken borders of the object so that it keeps its shape. Figure 4.5 below shows the results of the dilation operation.

Figure 4.5 (a)-(d) Results of the dilation process.
4.7 Erosion
The erosion process removes small details without losing the overall shape of the object, which helps to improve its border. Figure 4.6 below shows the results of the erosion process.

Figure 4.6 (a)-(d) Result of the erosion process.
4.8 Region Filling

This process fills the object with white pixels in order to improve the border extraction technique. Figure 4.7 shows the result of this process.

Figure 4.7 (a)-(d) The result of the region filling process.
4.9 Border Extraction
The border extraction algorithm extracts the outline of the object. Figure 4.8 shows the result of the algorithm.

Figure 4.8 (a)-(d) The result of the border extraction process.
4.10 Feature Extraction Result
In this section, the extracted shape features are shown in the graphs below; the border of each object is represented as a distance graph. Figure 4.9 shows these results.

Figure 4.9 (a)-(d) Results of the shape feature.
Table 4.1: The results of the classification metric for 10 samples of the group of humans, where THB = 0.04.

No          Result = Σ|DstAB − AVG|    Comment
Sample_1    0.0598                     True (result >= THB)
Sample_2    0.156                      -
Sample_3    0.0600                     -
Sample_4    0.207                      -
Sample_5    0.2299                     -
Sample_6    0.0821                     -
Sample_7    0.0325                     Failed (result < THB)
Sample_8    0.0771                     True (result >= THB)
Sample_9    0.0056                     Failed (result < THB)
Sample_10   0.0574                     True (result >= THB)
Table 4.2: The results of the classification metric for 8 samples of others, where THB = 0.04.

No          Result = Σ|DstAB − AVG|    Comment
Sample_1    0.0108                     True (result < THB)
Sample_2    0.0381                     -
Sample_3    0.1462                     Failed (result >= THB)
Sample_4    0.0665                     Failed (result >= THB)
Sample_5    0.0158                     True (result < THB)
Sample_6    0.0074                     -
Sample_7    0.0056                     -
Sample_8    0.0108                     -
4.11 Classification Results

The result of the classification process is the crucial point of this system: the objects are classified into two classes (group of humans and others), as shown in Figure 4.10 and Figure 4.11 below.

Figure 4.10 Results for classifying the group of humans

Figure 4.11 Results for classifying the others

4.12 Recognition Accuracy
A system with high accuracy and a low error rate is required. The main target of the classification process is to distinguish groups of humans from other objects. The video samples were selected to test the accuracy in a way that covers all possible conditions, such as different object positions, different camera positioning, etc. The recognition accuracy of the system is shown in Table 4.3.
Table 4.3: Performance accuracy

Object             Samples    Success    Fail    Classification accuracy
Group of humans    10         8          2       80%
Others             8          6          2       75%
Average Success Rate                             77%
4.13 Factors that Contribute to Low Accuracy

Several factors affect the accuracy of the classification. The first is poor video quality. The second is the change in the shape of the object caused by the preprocessing stages (dilation, erosion, etc.). Other factors include the distance between the camera and the objects, and natural-scene effects such as sudden illumination and weather changes. Figure 4.12 shows an example of a failed classification.

Figure 4.12 Incorrect results of classification
CHAPTER 5
CONCLUSION AND SUMMARY
5.1 Summary

The program for the detection and classification of group moving humans has been developed in this project using the shape of object silhouettes. For detecting the moving objects, background subtraction has been used because of its high performance in handling moving objects. The results show that the presented method is promising. The shape feature extraction method has been used to classify the moving objects. Finally, the classification of group moving humans has been achieved, with some misclassification errors contributed by the poor quality of the video.
5.2 Conclusions

In general, the objective of detecting and classifying group moving humans has been achieved. The program developed is currently fit for offline application. The images have been captured from the same camera position. To ensure the results obtained are reliable, the system must capture good-quality images.
The input image of this system passes through many preprocessing stages before it is viable for the classification process. Extracting the feature vector is the most important part of this project for achieving its main aim. By extracting the most descriptive features from the moving objects, a classification system with high accuracy can be produced.
5.3 Recommendations for Future Work

All the objectives were accomplished within the scope and limitations of the project. A few recommendations that might be helpful in future work are given below:

- Use different color ranges of the image and the large color coverage in the case of the vehicle class.
- Use 3D images, which can help in detecting and classifying objects.
- Consider object motion in different situations.
- Increase the feature vector so that higher classification accuracy can be achieved.
- Convert this system from an offline application to an online application so that the actual performance of the algorithm can be verified.
REFERENCES
[1] Ran, Y. and Zheng, Q. Multi moving people detection from binocular sequences. Center for Automation Research, Institute for Advanced Computer Studies, University of Maryland, USA.

[2] Arkin, E. M., Chew, L. P., Huttenlocher, D. P., Kedem, K., and Mitchell, J. S. B. (1991). An efficiently computable metric for comparing polygonal shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13:209-216.

[3] Collins, R. T., Gross, R., and Shi, J. (2002). Silhouette-based human identification from body shape and gait. In Proc. of Fifth IEEE Conf. on Automatic Face and Gesture Recognition, pages 366-371.

[4] Collins, R. T. (2000). A system for video surveillance and monitoring: VSAM final report. Technical Report CMU-RI-TR-00-12, Robotics Institute, Carnegie Mellon University.

[5] Brodsky, T. (2002). Visual Surveillance in Retail Stores and in the Home, chapter 4, pages 51-61. In Video-Based Surveillance Systems. Kluwer Academic Publishers, Boston.

[6] Fujiyoshi, H. and Lipton, A. J. (1998). Real time human motion analysis by image skeletonization. In Proc. of Workshop on Applications of Computer Vision, pages 15-21.

[7] Healey, G., Slater, D., Lin, T., Drda, B., and Goedeke, D. (1993). A system for real-time fire detection. In Computer Vision and Pattern Recognition, pages 605-606.

[8] Heijden, F. (1996). Image Based Measurement Systems: Object Recognition and Parameter Estimation. Wiley.

[9] Heikkila, J. and Silven, O. (1999). A real-time system for monitoring of cyclists and pedestrians. In Proc. of Second IEEE Workshop on Visual Surveillance, pages 74-81, Fort Collins, Colorado.

[10] Ramoser, H., Schlögl, T., Winter, M., and Bischof, H. (2003). Shape-based detection of humans for video surveillance. In Proc. of IEEE Int. Conf. on Image Processing, Barcelona, Spain.

[11] Loncaric, S. (1998). A survey of shape analysis techniques. Pattern Recognition, 31(8):983-1001.

[12] Oberti, F., Ferrari, G., and Regazzoni, C. S. (2002). A Comparison between Continuous and Burst, Recognition Driven Transmission Policies in Distributed 3GSS, chapter 22, pages 267-278. In Video-Based Surveillance Systems. Kluwer Academic Publishers, Boston.

[13] Saykol, E., Gudukbay, U., and Ulusoy, O. (2002). A histogram-based approach for object-based query-by-shape-and-color in multimedia databases. Technical Report BUCE-0201, Bilkent University.

[14] Saykol, E., Gulesir, G., Gudukbay, U., and Ulusoy, O. (2002). KiMPA: A kinematics-based method for polygon approximation. In International Conference on Advances in Information Systems (ADVIS'02), pages 186-194, Izmir, Turkey.

[15] Veltkamp, R. C. and Hagedoorn, M. (2001). State-of-the-art in shape matching. In Principles of Visual Information Retrieval, pages 87-119. Springer.

[16] Wang, L., Hu, W., and Tan, T. (2003). Recent developments in human motion analysis. Pattern Recognition, 36(3):585-601.

[17] http://www.cee.hw.ac.uk/hipr/html/median.html

[18] Hypermedia Image Processing Reference (1995). Department of Artificial Intelligence, University of Edinburgh, UK, Version 1.

[19] http://www.developertutorials.com/tutorials/photoshop/extract-dialog-panel-050619/page1.html
APPENDIX A
MATLAB DECLARATIONS
% Capture the images from the video
X  = aviread('video_sample.avi');   % placeholder file name
I1 = frame2im(X(1));                % background frame (first frame)
I2 = frame2im(X(k));                % current frame; k = frame of interest

% Change to gray levels
I1 = rgb2gray(I1);
I2 = rgb2gray(I2);
I1_d = double(I1);
I2_d = double(I2);
% Background subtraction
vwidth  = 384;
vheight = 288;
for j = 1:vheight
    for i = 1:vwidth
        backsubtract(j,i) = I1_d(j,i) - I2_d(j,i);
        backabs(j,i) = abs(backsubtract(j,i));
    end
end
% Thresholding (TH = 32); foreground pixels are set to 255
for j = 1:vheight
    for i = 1:vwidth
        if backabs(j,i) < 32
            backabs(j,i) = 0;
        else
            backabs(j,i) = 255;
        end
    end
end
% Median filter (applied twice)
MedImage  = medfilt2(backabs);
MedImage1 = medfilt2(MedImage);

% Morphological operations
BW = bwareaopen(MedImage1, 50);      % remove blobs smaller than 50 pixels
areaObj = bwarea(BW);
if areaObj > 500
    se = strel('square', 3);
    dilate1 = imdilate(BW, se);      % dilation process
    dilate2 = imdilate(dilate1, se);
    erode1  = imerode(dilate2, se);  % erosion process
    Imgefilled = imfill(erode1, 'holes');   % region filling process
else
    Imgefilled = imfill(BW, 'holes');       % region filling process
end
se = strel('square', 3);
erode = imerode(Imgefilled, se);
f = erode;
% Calculate the center of mass
count = 0;   % number of foreground pixels
sumc  = 0;   % sum of row (vertical) coordinates
sumr  = 0;   % sum of column (horizontal) coordinates
for j = 1:vheight
    for i = 1:vwidth
        if f(j,i) == 1
            count = count + 1;
            sumc  = sumc + j;
            sumr  = sumr + i;
        end
    end
end
xc = round(sumr/count);
yc = round(sumc/count);
% Border extraction
for i = yc:-1:1
    if f(i,xc) == 0
        y_top = i + 1;    % topmost foreground pixel above the center
        break;
    end
end
for i = xc:vwidth
    if f(yc,i) == 0
        x_right = i - 1;  % rightmost foreground pixel beside the center
        break;
    end
end
f_erode = imerode(f, se);
f_diff  = uint8(f - f_erode);    % one-pixel border of the object
imwrite(f_diff, 'border.bmp', 'bmp');
% Calculate the distances between the center point and the border
% over 0..180 degrees
white = 1;
step  = 1;
loop  = 0;
wdist = zeros(181, 2);   % column 1: angle, column 2: distance
for theta = 0:step:180
    theta_r = pi * theta / 180;
    loop = loop + 1;
    if theta ~= 90
        for x = 1:(vwidth - xc)
            if theta > 90
                y  = -x * tan(theta_r);
                cx = round(xc - x);
            else
                y  = x * tan(theta_r);
                cx = round(x + xc);
            end
            cy = round(yc - y);
            if cy <= 0, cy = 1; end
            if cx <= 0, cx = 1; end
            if f_diff(cy,cx) == white
                % border pixel hit: record angle and Euclidean distance
                wdist(loop,1) = theta;
                wdist(loop,2) = sqrt((cy - yc)^2 + (cx - xc)^2);
                break;
            end
            if loop > 1
                % no hit yet: carry the previous distance forward
                if wdist(loop,1) == 0, wdist(loop,1) = theta; end
                if wdist(loop,2) == 0, wdist(loop,2) = wdist(loop-1,2); end
            end
        end
    else
        % theta == 90: search straight up from the center
        wdist(loop,1) = theta;
        for y = 1:(yc - 1)
            if f_diff(yc-y, xc) == white
                wdist(loop,2) = y;
                break;
            end
        end
    end
end
u = wdist(:,2);
total  = sum(u);
u_norm = u / total;   % normalize the distance signal to unit area
% Keep only the 60..120 degree range
u_norm_new = u_norm;
range = 60;
for i = 1:range
    u_norm_new(i) = 0.0;
    u_norm_new(182-i) = 0.0;
end

% Classification step
ref    = AVG;          % AVG: reference signal measured offline
sample = u_norm_new;
r = 60;
for i = 1:r
    sample(i) = 0.0;
    sample(182-i) = 0.0;
    ref(i) = 0.0;
    ref(182-i) = 0.0;
end
resultis = sum(abs(sample - ref))
if resultis > 0.04     % THB
    disp('Classified as group of humans');
else
    disp('Others!');
end
APPENDIX B
IMAGES USED IN THE DATABASE

Five images, (a) to (e), used to calculate the graph of the AVG.