International Journal of Civil Engineering and Technology (IJCIET)
Volume 10, Issue 04, April 2019, pp. 333–343, Article ID: IJCIET_10_04_034
Available online at http://www.iaeme.com/ijmet/issues.asp?JType=IJCIET&VType=10&IType=4
ISSN Print: 0976-6308 and ISSN Online: 0976-6316
© IAEME Publication
Scopus Indexed
HUMAN ACTION RECOGNITION USING INTEREST POINT DETECTOR WITH KTH DATASET
Zahraa Salim David, Amel Hussain Abbas
Department of Computer Science
Mustansiriyah University, Baghdad, Iraq
ABSTRACT
Human action recognition and detection is important in many applications, especially in security for monitoring and surveillance systems, and in interactive fields such as games. This paper focuses on the challenges of captured video, such as lighting, noise, and scaling, that exist in the KTH dataset. The proposed method extracts corner, blob, and ridge interest points. Because all the challenges of KTH are tested, the amount of input data is large, so the chosen classification method is K-Nearest-Neighbor (KNN), which works well with big data. The proposed algorithm achieves an accuracy of 90%.
Keywords: Human Action Recognition, KTH, Corner, Blob, Ridge, KNN.
Cite this Article: Zahraa Salim David, Amel Hussain Abbas, Human Action Recognition Using Interest Point Detector with KTH Dataset. International Journal of Civil Engineering and Technology 10(4), 2019, pp. 333–343.
http://www.iaeme.com/IJCIET/issues.asp?JType=IJCIET&VType=10&IType=4
1. INTRODUCTION
Nowadays millions of videos are captured and uploaded to many applications. Different movements and actions are performed by the same person, and different persons can perform the same action in different ways, which makes recognizing human actions a huge challenge. Human action recognition is the process of correctly identifying the action performed by a user [1]. This subject has attracted a lot of interest in recent years because it enters into many fields. In security, with cameras now everywhere, human action information can be used to recognize unusual movement on roads and in elderly care, shopping, and healthcare systems [2, 3], and essentially in any surveillance setting. It also enters into human-computer interaction, for example in games [4].
Human action analysis is not a new field; it has been studied for decades. In 1983, Haralick [5] presented what is considered one of the first computer vision implementations for analyzing human action, Figure (1).
In recent research in this scope, Forsyth et al. [2] were interested in recovering human poses and motion in images; Ronald Poppe et al. [4] focused on assigning action labels to image sequences; Turaga et al. [6] were interested in recognizing human activities, which are considered a higher level than actions; some research focuses on the human skeleton to recognize motion [7]; other methods focus on tracking the person [8, 9].
As mentioned earlier, human action recognition enters into a lot of applications, so the aim is to recognize actions robustly, as a human would. However, many problems arise when studying this subject. One is intra-class variation [3]: some actions, such as walking and running, are similar, which makes them very difficult for the proposed system to distinguish. Besides that, the input is video, so many challenges appear, such as lighting and environment. Another important problem is that the same action can be done in different ways; for example, each person has his own way of walking, different from everyone else's, which makes it very difficult for a classifier to classify as efficiently as a human does. On top of all of that, the chosen dataset is considered among the most challenging because it contains all the problems mentioned above, including some low-resolution videos and scaled videos, which makes recognizing these videos a big challenge for the classifier.
This work tries to recognize human actions using corner, blob, and ridge features that are extracted from each frame of each video and stored in a form suitable for the classifiers, supervised machine learning algorithms (K-Nearest-Neighbor and Support Vector Machine). The input videos are taken from the KTH dataset and split into training and testing sets to evaluate our classifier.
2. MATERIALS AND METHODS
This section describes the dataset that has been used, the methods relied on to collect enough data to recognize the action, and the classification method best suited to this work. Figure (1) shows the proposed method; the training and testing stages follow essentially the same steps, so both are explained in the same figure.
Dataset → Preprocessing → Descriptor → Classifier → Result
Figure 1 The basic steps of training or testing in the proposed algorithm for human action recognition
2.1. Dataset
The dataset used here for training and testing is the KTH dataset, considered among the most famous and reliable human action datasets. It contains almost 2391 videos: six actions (running, boxing, walking, waving, jogging, clapping) performed by 25 persons (male and female), with each action of each person done in four scenarios (indoor, outdoor, scaling, different clothes). Figure (2) shows the four scenarios for a male person performing the boxing action.
Figure 2 The boxing action in the KTH dataset in four scenarios: outdoor, indoor, different clothes, and scaling.
This project takes a subset of KTH: the waving and boxing actions in all cases and with all the challenges in this dataset. The total number of videos is 200, split into a training group of 160 videos and a testing group of 40 videos.
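As an illustration, the split might be implemented as follows; the directory layout and file pattern are assumptions based on KTH's public naming scheme, not something the paper specifies.

```python
import glob
import random

# Illustrative sketch: gather the handwaving and boxing videos from a local
# copy of KTH and split them into 160 training and 40 testing videos, as in
# the paper. Directory names follow KTH's public layout (an assumption).
videos = glob.glob("KTH/handwaving/*.avi") + glob.glob("KTH/boxing/*.avi")
random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(videos)
train_videos, test_videos = videos[:160], videos[160:200]
```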
2.2. Experimental Framework
We explained earlier what human action recognition is and how important it is in many applications, but also that it faces a lot of challenges. This paper explains how to face these challenges using a combination of corner, blob, and ridge features to recognize human motion.
2.2.1. Preprocessing
As mentioned in earlier sections, the proposed dataset is a subset of the famous KTH dataset. The first phase of preprocessing converts each video to grayscale; the second phase divides the video into frames so each frame can be worked on separately, as shown in Figure (3). Fifty frames are kept for each video, which gives the most suitable amount of information per video.
Figure 3 Some frames from a boxing video
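A minimal sketch of this preprocessing step, assuming OpenCV is available; how the 50 frames are chosen (here, evenly spaced) is our assumption, since the paper only states the count.

```python
import cv2
import numpy as np

def extract_gray_frames(video_path, n_frames=50):
    """Convert a video to grayscale and keep n_frames frames from it."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Phase one: convert each frame to grayscale.
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    cap.release()
    # Phase two: keep n_frames frames per video; even sampling is an
    # assumption, the paper only fixes the number of frames.
    idx = np.linspace(0, len(frames) - 1, n_frames).astype(int)
    return [frames[i] for i in idx]
```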
2.2.2. Extract Information
The aim of this project is to recognize motion, and to do that, features must be extracted from each frame. This work relies on a combination of corner, blob, and ridge detectors. After the preprocessing explained earlier, the second phase prepares each frame for feature extraction by performing a Gaussian transformation in the x direction and the y direction individually.
First, the corner detector: a corner is defined as the crossing of two edges. It can be extracted using the Harris detector, whose space-time extension was introduced by Laptev and Lindeberg [10]. A Gaussian transformation is performed on the image in both the x and y directions, Equ. (1); corner extraction then relies on the second-moment matrix, Equ. (2); H is the corner response obtained by applying Equ. (3).
𝑓(π‘₯, 𝑦) =
2
2
2
1
𝑒 −[(π‘₯−πœ‡π‘¦ ) +(𝑦−πœ‡π‘¦ ) ]/(2𝜎 )
2πœ‹πœŽ 2
(1)
where $\sigma$ is the standard deviation and $(\mu_x, \mu_y)$ is the center of the peak of the Gaussian.
𝐼 π‘₯2(π‘₯, 𝑦)
𝑀(π‘₯, 𝑦) = ∑𝑒,𝑣 𝑀(𝑒, 𝑣) ∗ [
𝐼π‘₯ 𝐼𝑦 (π‘₯, 𝑦
𝐼π‘₯ 𝐼𝑦 (π‘₯, 𝑦)
]
𝐼 𝑦2(π‘₯, 𝑦)
(2)
Where 𝐼π‘₯ deviation in x dimension is 𝐼𝑦 is deviation in y dimension, 𝑀(𝑒, 𝑣) is weighted window
of Gaussian.
$$H = \det(M) - k\,\mathrm{trace}^2(M), \quad H > 0 \tag{3}$$
where $\det(M) = I_x^2 I_y^2 - (I_x I_y)^2$ and $\mathrm{trace}(M) = I_x^2 + I_y^2$.
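The Harris response of Equ. (1)–(3) can be sketched as below using SciPy's Gaussian derivative filters; the values of $\sigma$ and $k$ are illustrative, as the paper does not report them.

```python
import numpy as np
from scipy import ndimage

def harris_response(frame, sigma=1.5, k=0.04):
    """Corner response H = det(M) - k * trace(M)**2 of Equ. (3)."""
    f = frame.astype(float)
    # First-order Gaussian derivatives of the frame (Equ. 1 smoothing).
    Ix = ndimage.gaussian_filter(f, sigma, order=(0, 1))  # d/dx (columns)
    Iy = ndimage.gaussian_filter(f, sigma, order=(1, 0))  # d/dy (rows)
    # Windowed entries of the second-moment matrix M of Equ. (2): the
    # Gaussian window w(u, v) is applied by smoothing the products.
    A = ndimage.gaussian_filter(Ix * Ix, sigma)
    B = ndimage.gaussian_filter(Ix * Iy, sigma)
    C = ndimage.gaussian_filter(Iy * Iy, sigma)
    return (A * C - B ** 2) - k * (A + C) ** 2  # corners where H > 0
```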
Blob detector: the second type of feature extracted is the blob. In computer vision, a blob is a region whose properties, for example color or brightness, differ from its surroundings. Blobs can be extracted using the Hessian detector, introduced by [11], which extracts spatio-temporal Gaussian blobs based on a second-order matrix of derivatives whose determinant is given in Equ. (4).
$$\det(H) = I_{xx} I_{yy} - I_{xy}^2 \tag{4}$$
Where 𝐼π‘₯π‘₯ , 𝐼π‘₯𝑦 , and 𝐼𝑦𝑦 are second-order image derivatives computed using Gaussian function
of standard deviation σ Equ. (1).
Ridge detector: ridges are often confused with corners, but they are not the same thing; a ridge is a set of curve points, not a corner. The ridge detector was first introduced by [5].
$$R = (I_{xx} - I_{yy})^2 + 4 I_{xy}^2 \tag{5}$$
where $I_{xx}$, $I_{xy}$, and $I_{yy}$ are second-order image derivatives computed using a Gaussian function of standard deviation $\sigma$, Equ. (1).
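Both Equ. (4) and Equ. (5) come from the same second-order Gaussian derivatives, so one sketch can cover the blob and ridge responses; $\sigma$ is again an illustrative choice.

```python
from scipy import ndimage

def blob_and_ridge_response(frame, sigma=1.5):
    """Blob response det(H) of Equ. (4) and ridge response R of Equ. (5)."""
    f = frame.astype(float)
    # Second-order Gaussian derivatives of the frame (Equ. 1).
    Ixx = ndimage.gaussian_filter(f, sigma, order=(0, 2))
    Iyy = ndimage.gaussian_filter(f, sigma, order=(2, 0))
    Ixy = ndimage.gaussian_filter(f, sigma, order=(1, 1))
    blob = Ixx * Iyy - Ixy ** 2               # Equ. (4)
    ridge = (Ixx - Iyy) ** 2 + 4 * Ixy ** 2   # Equ. (5)
    return blob, ridge
```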
The last stage of feature detection, for all the feature types mentioned above, is a local-maxima step applied to the extracted features. It keeps the strongest points from each detector's response, up to a manually chosen number (20 points per detector), and thereby determines the final locations of the interest points from the blob, corner, and ridge detectors. This data is represented as a vector of N−1 dimensions.
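One way to realize this local-maxima selection, keeping the 20 strongest points per detector, is sketched below; the neighborhood size is an assumption, as the paper does not specify it.

```python
import numpy as np
from scipy import ndimage

def strongest_points(response, n_points=20, window=5):
    """Keep the n_points strongest local maxima of a detector response."""
    # A pixel is a local maximum when it equals the maximum of its
    # window x window neighborhood.
    peaks = response == ndimage.maximum_filter(response, size=window)
    ys, xs = np.nonzero(peaks)
    # Rank the local maxima by response strength and keep the top n_points.
    order = np.argsort(response[ys, xs])[::-1][:n_points]
    return np.stack([ys[order], xs[order]], axis=1)  # (row, col) pairs
```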
2.2.3. Classification method
After the features have been prepared in a specific form, they enter a new stage: classifying which class each video belongs to. This process relies on the proposed classifier, the K-Nearest-Neighbor (KNN) classifier.
KNN is a simple algorithm that relies on similarity for classification. It stores all the cases of the training set and classifies a new case (a testing feature) based on how close the point is to the identified classes in feature space, computing the distance between points with the Euclidean distance function, Equ. (6).
𝑑(π‘₯, 𝑦) = β€–π‘₯ − 𝑦‖ = √(π‘₯ − 𝑦). (π‘₯ − 𝑦)
2
1/2
= (∑π‘š
𝑖=1((π‘₯𝑖 − 𝑦𝑖 ) )))
(6)
m
Where x and y are histogram in X=R
The advantages of KNN, and the reason it has been used here, are that it works on large data and on multiple model classes, and its classification relies on small-neighborhood similarity, which leads to good accuracy.
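A sketch of this classification stage with scikit-learn's KNN using the Euclidean metric of Equ. (6); the feature file names and the choice k = 5 are assumptions, since the paper does not report them.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical feature matrices: one row per video, built by concatenating
# the interest-point data extracted above (file names are illustrative).
X_train = np.load("train_features.npy")   # shape (160, feature_dim)
y_train = np.load("train_labels.npy")     # 0 = boxing, 1 = waving
X_test = np.load("test_features.npy")     # shape (40, feature_dim)

# KNN with the Euclidean distance of Equ. (6); k = 5 is an assumed value.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
predictions = knn.predict(X_test)
```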
3. EXPERIMENTAL RESULTS
This section explains the results after applying the preprocessing, detector, and classification methods to the proposed KTH subset. The main challenge addressed in this paper is the difficulty of the KTH dataset, which covers almost all cases of each action. In our view, the creators of KTH [12] tried to capture all the situations that an ordinary camera in shops and on roads faces when recording an action, including challenges such as low-resolution or scaled video, Figure (4). The actions are also captured under different lighting and from different viewpoints for each of the 25 persons, and each person performs actions in his own way, so the dataset covers almost all behaviors of an action; for example, boxing is done by 25 persons, which means 25 different ways of performing the same action. This is a real challenge, because a huge amount of data enters the classifier (KNN): above 8300 features in the training stage and 200 features per video in the testing stage. All of these challenges have been taken into consideration in the results below.
Figure 4 (a) A person performing the boxing action in one video of the KTH dataset; (b) the same person in the same video after scaling; (c) a low-resolution KTH video of a person performing the boxing action.
First, we detect only corners, using the Harris detector on each frame; the results are shown in Figure (5) and Table (1).
Figure 5 Confusion matrix for classifying two classes (boxing and waving) using the KNN classifier and the Harris detector.
Next, we detect blobs using the Hessian detector and combine them with the corners; the result is shown in Figure (6).
Figure 6 Confusion matrix for classifying two classes (boxing and waving) with the KNN classifier using the Harris and Hessian detectors.
Finally, to improve the accuracy further, a third detector, the ridge detector, is added; the result is shown in Figure (7).
Figure 7 Confusion matrix for classifying two classes (boxing and waving) with the KNN classifier using the Harris, Hessian, and ridge detectors.
Table 1 Summary of the methods and results of human action recognition using the KNN classifier

| Method | Detector type | Accuracy | Negative predictions, KNN waving class | Negative predictions, KNN boxing class |
|---|---|---|---|---|
| Harris | Corner | 67% | 32% | 34% |
| Harris and Hessian | Corner and Blob | 85% | 21% | 10% |
| Harris, Hessian, and ridge | Corner, Blob, and Ridge | 90% | 17% | 6% |
4. CONCLUSION
This work tries to recognize the human actions boxing and waving, which are considered important in security. To do that, it relies on extracting interest points (corner, blob, and ridge). The learning algorithm used here is the K-Nearest-Neighbor (KNN) algorithm, which works well with huge amounts of data. The KTH dataset contains a lot of challenges, and our methodology was tested on all of them. Detecting corners alone gives 67% recognition accuracy (Table 1); combining corner with blob features raises the accuracy to 85%; and the last combination tested in this paper, corner, blob, and ridge features together, reaches 90% accuracy.
It is clear that classification accuracy depends on the entered data and its characteristics, so there is still no perfect way to classify actions in all situations; as future work, a different classification method could be proposed to obtain better accuracy.
REFERENCES
[1] O. Popoola and K. Wang, "Video-based abnormal human behavior recognition," IEEE, 2012.
[2] D. A. Forsyth and M. M. Fleck, "Body plans," in IEEE Conference on Computer Vision and Pattern Recognition, pp. 678–683, Puerto Rico, 1997.
[3] R. Gao, "Dynamic Feature Description in Human Action Recognition," Leiden Institute of Advanced Computer Science, Leiden University, 2009.
[4] R. Poppe and M. Poel, "Discriminative human action recognition using pairwise CSP classifiers," in Proceedings of the International Conference on Automatic Face and Gesture Recognition (FGR'08), September 2008.
[5] R. Haralick, "Ridges and Valleys on Digital Images," Computer Vision, Graphics, and Image Processing, 1983.
[6] P. K. Turaga, R. Chellappa, V. S. Subrahmanian, and O. Udrea, "Machine recognition of human activities: a survey," IEEE Transactions on Circuits and Systems for Video Technology, 2008.
[7] F. Destelle, A. Ahmadi, N. E. O'Connor, K. Moran, A. Chatzitofis, D. Zarpalas, and P. Daras, "Low-cost accurate skeleton tracking based on fusion of Kinect and wearable inertial sensors," in Proceedings of the 22nd European Signal Processing Conference (EUSIPCO), pp. 371–375, 2014.
[8] T. Helten, M. Muller, H. P. Seidel, and C. Theobalt, "Real-time body tracking with one depth camera and inertial sensors," in IEEE International Conference on Computer Vision (ICCV), pp. 1105–1112, 2013.
[9] Y. Tian, X. Meng, D. Tao, D. Liu, and C. Feng, "Upper limb motion tracking with the integration of IMU and Kinect," Neurocomputing 159:207–218, 2015.
[10] I. Laptev and T. Lindeberg, "Space-time interest points," in ICCV, 2003.
[11] G. Willems, T. Tuytelaars, and L. Van Gool, "An efficient dense and scale-invariant spatio-temporal interest point detector," in ECCV, 2008.
[12] C. Schuldt, I. Laptev, and B. Caputo, "Recognizing human actions: a local SVM approach," in Proc. ICPR'04, Cambridge, UK, 2004.
[13] J. B. Bækdahl, "Human Action Recognition using Bag of Features," IEEE, 2016.
[14] C. Chen, R. Jafari, and N. Kehtarnavaz, "A survey of depth and inertial sensor fusion for human action recognition," Springer, 2015.
[15] C. Chen, N. Kehtarnavaz, and R. Jafari, "A medication adherence monitoring system for pill bottles based on a wearable inertial sensor," in 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 4983–4986, 2014.
[16] J. Shotton, T. Sharp, A. Kipman, A. Fitzgibbon, M. Finocchio, A. Blake, and R. Moore, "Real-time human pose recognition in parts from single depth images," Communications of the ACM 56(1):116–124, 2013.
[17] R. Vemulapalli, F. Arrate, and R. Chellappa, "Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group," Computer Vision Foundation, IEEE, 2014.
[18] C. Chen, R. Jafari, and N. Kehtarnavaz, "UTD-MHAD: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor," in Proceedings of the IEEE International Conference on Image Processing, Canada, 2015.
[19] J. Wang, Z. Liu, Y. Wu, and J. Yuan, "Mining actionlet ensemble for action recognition with depth cameras," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1290–1297, 2012.