Human Detection with Action Recognition in Images via Piecewise Linear Support Vector Machine

Reshma Manohar C (1), Safiya K.M. (2)
(1) M.Tech Student, Ilahia College of Engineering, Kerala, India, reshma_manohar88@yahoo.com
(2) Assistant Professor, CSE Dept, Ilahia College of Engineering, Kerala, India

Abstract-- The main problem in human detection from images is view and posture variation. To tackle this problem, a piecewise linear support vector machine (PL-SVM) method is proposed in this paper. We use a piecewise discriminative function to construct a nonlinear classification boundary that can discriminate human bodies performing different actions from the background. PL-SVM combines the procedure of linear SVM training with feature space division. For detection, a cascaded detector is used with two types of features: (1) Block Orientation (BO) features and (2) Histogram of Oriented Gradients (HOG) features. Compared with other recent SVM methods, our method shows higher detection accuracy and better computational efficiency.

Index Terms-- Human detection; Piecewise linear support vector machine; Histogram of oriented gradient; Block orientation

I. INTRODUCTION
Human detection from images and video frames is one of the main problems in the image sensing field. Applications include robotics, pedestrian warning while driving, surveillance, entertainment etc. Human detection against static video backgrounds has advanced considerably in recent years, but detecting humans against complex backgrounds and with large view and posture variation is still a challenging problem.

The two main problems in existing human detection methods are feature representation and classifier design. Several feature descriptors, such as HOG, V-HOG, local binary pattern and multiscale orientation, have been proposed for human detection.

To detect humans in images, the Histogram of Oriented Gradients and Block Orientation features are first extracted. The extracted features are then fed into a classifier for training. Linear SVM is the most popular classifier, but it performs poorly when multi-posture and multi-view humans must be detected in images. Experiments show that multi-view and multi-posture humans form a manifold, which is difficult to separate linearly from the negatives. An algorithm that requires multi-view and multi-posture humans to be correctly classified by a linear SVM during training increases the computational overhead. Kernel SVMs can be used to tackle this problem, but they are much more computationally expensive than linear methods.

To deal with the multi-view and multi-posture problem, some approaches use a divide and conquer strategy. In divide and conquer methods, the training positives are divided into subclasses and multiple models are then trained for detection. The divide and conquer method has the advantage of reduced empirical error in training and improved detection performance, but its main disadvantage is high structural risk and more false positives.

Piecewise and localized SVMs have superior performance compared with global kernel SVMs. To tackle the multi-view and multi-posture problem, some approaches segment the human body into parts, assuming that each part has smaller deformation, lower dimensionality and lower non-linearity, so that the parts can be better detected with a linear classifier [2], [3],
[4]. The maximum structural risk of piecewise SVMs can be derived, but the problem of how to construct the piecewise decision boundary is not addressed [5]. In some cases a cross distance minimization algorithm is designed to compute the margin of non-kernel SVMs. An extension of binary SVM called multicategory SVM has been proposed to cover the multicategory case, but multicategory SVM differs from our proposed PL-SVM in both its training procedure and its theoretical basis.

The PL-SVM model proposed here exploits a piecewise discriminative function. PL-SVM constructs a nonlinear classification boundary that separates multiple positive subclasses from the negative subclasses. Nearest point analysis on convex hulls, combined with iterative linear SVM training, is used to train the PL-SVM. This guarantees the maximum margin of the final output.

II. RELATED WORKS
Linear SVM is the most popular classifier in the field of human detection, but its performance drops significantly when multi-view and multi-posture humans must be detected simultaneously [6]. Experiments show that human samples with continuous view and posture variations are difficult to classify with linear SVMs.

Kernel SVMs are another option for handling multi-view and multi-posture variations, but they are much more expensive than linear methods. In a very high dimensional space, kernel methods also aggravate the curse of dimensionality.

Some approaches use a divide and conquer method to deal with the posture problem. In the divide and conquer method, the training positives are first divided into subclasses and multiple models are then trained for detection [7]. Tree structure and pyramid boosting methods are used for detection. The advantage of the divide and conquer strategy is that it reduces the empirical error and also improves the detection efficiency. Its main disadvantage is that it brings high structural risk and more false positives.

Another way to deal with the multi-view and multi-posture problem is to segment human bodies into parts, assuming that each part has smaller deformation and non-linearity, so that it can be detected more accurately by a linear classifier [4].

A deformable part based model (DPM) is also used for human detection. In this method, the human parts are modeled by a structural SVM with latent variables. A local search is performed during training and detection to optimize the location of each part model. In this way, the effect of view and posture variations can be reduced. An extension of DPM has been proposed that allows sharing of object parts, which results in more compact models [4].

Piecewise SVMs are gaining popularity because of their superior performance over linear SVMs. An upper bound on the structural risk of piecewise SVMs can be derived, but that work did not address how to construct a piecewise decision boundary in a high dimensional feature space. To compute the margin of non-kernel SVMs, a cross distance minimization algorithm has been designed [8]. Multicategory SVMs have been proposed as an extension of binary SVM to cover the multicategory case [9].

III. PROPOSED SYSTEM
The proposed method starts with the training phase. First, a set of training images is given as input. Then the Block Orientation and Histogram of Oriented Gradients features are extracted from the training images. The training samples are then divided into clusters using the K-means clustering algorithm, and a classifier is trained for each cluster. This procedure is repeated in a recursive manner. Next is the detection phase. In this phase a test image is given as input. Using the sliding window method, the block orientation and the histogram of oriented
gradient features are extracted. The human in the image is detected by using a cascaded detector. The cascaded detector consists of a two-stage classification process. When a test image is given as input, the BO features are extracted and tested first. If this gives a positive result, that is, if the window is classified as human, the HOG features are then extracted and tested. If the first stage gives a negative result, the second classification is not performed.

Figure 1: PL-SVM training

Figure 2: Human Detection

The training of PL-SVM is an iterative procedure that consists of iterative division of the training samples and the feature space. Regarding the convergence of the iteration, it is shown that the convergence graph is monotonically increasing. Thus we can say that PL-SVM is a maximal margin classifier.

Here a new kind of feature called the Block Orientation (BO) feature is used for human detection. The BO and HOG features are incorporated into the cascaded PL-SVMs, resulting in improved accuracy and efficiency.

IV. SOLUTION METHODOLOGY
A. PL-SVM MODEL
A PL-SVM is a combination of several linear SVMs. It can be described as

$f(x) = \arg\max_{k} \{ C_k(x) \}$    (1)

where $C_k(x)$ is the membership degree of $x$ for the $k$-th linear SVM, so that $f(x)$ selects the maximum membership degree. A probability function can be used to define the membership degree. From the viewpoint of probability,

$C_k(x) = P_k(y = 1 \mid x)$    (2)

where $P_k(y = 1 \mid x)$ is the probability of $x$ being positive. Under the membership maximization criterion, each linear SVM in the PL-SVM is responsible for one subspace in the classification. Equation (1) can be converted into a discriminative function

$F(x) = \operatorname{sign}(f(x))$    (3)
In this paper, pedestrian detection is formulated as a nonlinear classification problem in a high dimensional feature space. To tackle multi-view and multi-posture human detection, the PL-SVM model is used. The main difference between PL-SVM and other piecewise SVMs lies in the feature space division and the model training strategy. The PL-SVM is trained with a membership maximization criterion. During training, the whole feature space is divided into subspaces, and within each subspace a linear SVM can better discriminate the samples. With PL-SVM, the empirical risk is lower than with a single linear SVM.
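
The decision rule in equations (1) to (3) can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the text: the membership degree is modelled as a logistic function of the linear SVM score, the function in equation (3) is read as a sign function applied to the score of the selected linear SVM, and the names `membership` and `pl_svm_predict` as well as the toy weights are hypothetical.

```python
import math


def membership(w, b, x):
    """Assumed membership degree C_k(x): logistic squashing of the linear score w.x + b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-score))


def pl_svm_predict(models, x):
    """Pick the linear SVM with the maximum membership degree, then take the sign of its score.

    models is a list of (w, b) pairs, one per linear SVM (illustrative layout).
    Returns (label, k) with label in {+1, -1} and k the index of the selected SVM.
    """
    degrees = [membership(w, b, x) for (w, b) in models]
    k = max(range(len(models)), key=lambda i: degrees[i])
    w, b = models[k]
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    label = 1 if score > 0 else -1
    return label, k


# Two toy hyperplanes: one fires for x1 > 1, the other for x1 < -1.
models = [((1.0, 0.0), -1.0), ((-1.0, 0.0), -1.0)]
print(pl_svm_predict(models, (2.0, 0.0)))
print(pl_svm_predict(models, (-2.0, 0.0)))
print(pl_svm_predict(models, (0.0, 0.0)))
```

In this toy case the positive region lies on both sides of the origin, which no single hyperplane can produce; combining two linear pieces through the maximum is what makes the boundary nonlinear.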
B. PL-SVM TRAINING
For training, we first have to divide the human samples into subsets. The division is done using the K-means clustering algorithm. K-means clustering is a vector quantization method that partitions n observations into k clusters so that each observation belongs to the cluster with the nearest mean. After clustering, human samples with small differences are assigned to the same subset, which leads to a better sample division.
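
The K-means division described above can be sketched in a few lines. This is a plain Lloyd-style illustration under stated assumptions, not the authors' implementation: the samples are toy 2-D points standing in for feature vectors, and the function name `kmeans`, the iteration count and the seed are hypothetical choices.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain K-means on a list of equal-length tuples; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial means drawn from the samples
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each sample joins the cluster with the nearest mean.
        labels = [
            min(range(k), key=lambda j: sum((a - c) ** 2 for a, c in zip(p, centers[j])))
            for p in points
        ]
        # Update step: recompute each mean; an empty cluster keeps its old center.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Two well separated groups standing in for two posture subsets.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(samples, k=2)
print(labels)
```

Each resulting label identifies the initial subset of a sample, and one linear SVM would then be trained per subset.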
To construct the human manifolds, the Locally Linear Embedding (LLE) algorithm is used as a dimensionality reduction algorithm. LLE computes a low dimensional, neighbourhood preserving embedding by mapping each point to a low dimensional space. Given a set of human samples in a high dimensional space, LLE begins by finding the nearest neighbours of each point using Euclidean distance. LLE then finds the optimal local convex combinations of the nearest neighbours to represent each sample. The final embedded space is obtained by solving an eigenvector problem. The main reason for converting the high dimensional feature space to a low dimensional one is to make the computation easier.

C. ALGORITHM: PL-SVM TRAINING
Initialization:
Given a training sample set of human objects $X = \{(x_n, y_n)\}$, $n = 1, \ldots, N$, and $K$ initial subsets, $k = 1, \ldots, K$, train linear SVMs as the initial PL-SVM model.
Iteration:
a) Calculate the membership degree $C_k(x_n)$, where $k = 1, \ldots, K$.
b) Select a random positive sample $(x_n, y_n)$. Then select the value of $k$ that maximizes the membership degree of $x_n$.
c) Check whether the assignment is acceptable, that is, whether the distance between the positive and negative convex hulls is reduced. If it is reduced, select another random positive sample.
d) Train the linear SVMs using the current subsets.
e) If the ratio of reassigned samples is larger than the threshold value, calculate the membership degree again with an incremented k value.
Output:
The output of training is K sample subsets and a trained PL-SVM that consists of K linear SVMs.

D. HUMAN DETECTION
In the proposed PL-SVM, two kinds of features are used for human detection. For detection, a cascaded detector is used to increase the performance.

E. FEATURE REPRESENTATION

Figure 3: Feature Representation. (a) Human example. (b) HOG cells. (c) HOG feature extraction in a block. (d) BO feature extraction in a cell. (e) Stroke pattern in a cell (enlarged) with noise and its HOG and BO features. (f) Region pattern in a cell with noise and its HOG and BO features. (g) Visualization of the HOG features multiplied by the SVM normal vector. (h) Visualization of the BO features multiplied by the SVM normal vector.

The two features used for human detection are Histogram of Oriented Gradients (HOG) and Block Orientation (BO) features. Initially, a sample of 64 × 128 pixels is divided into cells of 8 × 8 pixels. Each group of 2 × 2 cells forms one block in a sliding fashion, so that the blocks overlap with each other. Two kinds of features, HOG and BO, are extracted. To extract the HOG features, the gradient orientation of each pixel in a cell is first calculated. Then, for each cell, a 9-dimensional histogram of gradient orientations is calculated as its features. Each block is represented by a 36-dimensional feature vector. Each sample is described by 105 overlapping blocks, that is 420 cell histograms, which corresponds to a 3780-dimensional HOG vector.

The second feature used here is the Block Orientation feature. To extract the BO features, each cell is divided into left-right subcells and then into up-down subcells. After division, the gradients are calculated by
$B_h = \max_{c} \left\{ \sum_{X \in \text{left subcell}} I_c(X) - \sum_{X \in \text{right subcell}} I_c(X) \right\}$    (4)

$B_v = \max_{c} \left\{ \sum_{X \in \text{up subcell}} I_c(X) - \sum_{X \in \text{down subcell}} I_c(X) \right\}$    (5)

where $I_c(X)$ is one of the R, G, B component values at pixel $X$.

The BO features are obtained by normalizing $B_h$ and $B_v$. If only HOG features are used for human detection, some false positives may result. We therefore combine BO features with HOG features, which reduces false positives.

F. CASCADED DETECTOR
Two PL-SVM models are trained on the given set of training samples: one with BO features and the other with HOG features. For detection, histogram equalization and median filtering are first applied to the test image. The test image is then repeatedly reduced in size by a factor of 1.1 to build an image pyramid. Sliding windows are extracted from each layer of the pyramid.

The BO features are extracted from each window and tested with the first-stage PL-SVM. If the window is classified as human in the first stage, it is tested again in the second stage using the HOG features. After these two stages, we can decide whether the window contains a human or not.

If the window is classified as non-human in the first stage, the second stage is not used. With this scheme, most of the windows are rejected in the first stage, which leads to high detection efficiency.

V. CONCLUSION AND FUTURE WORK
Robustness to variation in view and posture is important for human detection in practical applications, and it remains an open problem. A solution to this problem is proposed here by developing a new classification method called PL-SVM. PL-SVM is a combination of multiple linear SVMs and is able to perform nonlinear classification. When PL-SVM is applied to human detection, each linear SVM in the PL-SVM is responsible for a particular cluster of humans in a specific view or posture. BO features are also presented as a complement to the HOG features. Future work includes extending this method to detect humans in videos, where not only static visual cues but also motion and context information are available.

VI. ACKNOWLEDGMENT
The authors wish to thank the Management, the Principal and the Head of the Department (CSE) of Ilahia College of Engineering and Technology for their support and help in completing this work.

VII. REFERENCES
[1] Q. Ye, Z. Han, J. Jiao, and J. Liu, “Human detection in images via piecewise linear support vector machines,” IEEE Trans. Image Process., vol. 22, no. 2, Feb. 2013.
[2] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part based models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1627–1645, Sep. 2010.
[3] C. H. Lampert, “An efficient divide-and-conquer cascade for nonlinear object detection,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., Jun. 2010.
[4] P. Ott and M. Everingham, “Shared parts for deformable part-based models,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., Jun. 2011, pp. 1513–1520.
[5] S. Q. Ren, D. Yang, X. Li, and Z. W. Zhuang, “Piecewise support vector machines,” Chin. J. Comput., vol. 32, no. 1, pp. 77–85, 2009.
[6] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., Jun. 2005, pp. 886–893.
[7] B. Wu and R. Nevatia, “Cluster boosted tree classifier for multi-view, multi-pose object detection,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2007, pp. 1–8.
[8] Y. Li, B. Liu, X. Yang, Y. Fu, and H. Li, “Multiconlitron: A general piecewise linear classifier,” IEEE Trans. Neural Netw., vol. 22, no. 2, pp. 276–289, Feb. 2011.
[9] Y. Lee, Y. Lin, and G. Wahba, “Multicategory support vector machines,” Dept. Stat., Univ. Wisconsin-Madison, Madison, Tech. Rep. 1063, 2001.