International Journal of Civil Engineering and Technology (IJCIET)
Volume 10, Issue 04, April 2019, pp. 333–343, Article ID: IJCIET_10_04_034
Available online at http://www.iaeme.com/ijmet/issues.asp?JType=IJCIET&VType=10&IType=4
ISSN Print: 0976-6308 and ISSN Online: 0976-6316
© IAEME Publication, Scopus Indexed

HUMAN ACTION RECOGNITION USING INTEREST POINT DETECTOR WITH KTH DATASET

Zahraa Salim David, Amel Hussain Abbas
Department of Computer Science, Mustansiriyah University, Baghdad, Iraq

ABSTRACT
Human action recognition and detection is important in many applications, especially in security for monitoring and surveillance systems, and in interactive applications such as games. This paper focuses on the challenges of captured video, such as lighting, noise, and scaling, that exist in the KTH dataset. The proposed method extracts corner, blob, and ridge interest points. Since all of the KTH challenges are tested, the amount of input data is large, which is why K-Nearest-Neighbor (KNN), which works well with big data, was chosen as the classification method. The proposed algorithm achieves 90% accuracy.

Keywords: Human Action Recognition, KTH, Corner, Blob, Ridge, KNN.

Cite this Article: Zahraa Salim David, Amel Hussain Abbas, Human Action Recognition Using Interest Point Detector with KTH Dataset. International Journal of Civil Engineering and Technology 10(4), 2019, pp. 333–343.
http://www.iaeme.com/IJCIET/issues.asp?JType=IJCIET&VType=10&IType=4

1. INTRODUCTION
Nowadays millions of videos are captured and uploaded to many applications. Different movements and actions are performed by the same person, and different people can perform the same action in different ways, which makes recognizing human actions a major challenge. Human action recognition is the process of correctly identifying the action performed by a user [1].
This subject has attracted a lot of interest in recent years because it enters into many areas. In the security field, cameras are now everywhere, so human action information is used to recognize unusual movement on roads and in elderly care, shopping, and healthcare systems [2, 3], and indeed in nearly every surveillance setting; it also plays a role in human-computer interaction, for example in games [4]. Human action analysis is not a new field; it has been studied for decades. In 1983, Haralick [5] presented what is considered the first computer vision work analyzing human action, Figure (1).

In recent research in this scope, Forsyth et al. [2] were interested in recovering human poses and motion in images; Poppe et al. [4] focused on assigning action labels to image sequences; Turaga et al. [6] addressed recognizing human activities, which are considered a higher level than actions. Some research focuses on the human skeleton to recognize motion [7]; other methods focus on tracking the person [8, 9].
As mentioned earlier, human action recognition enters into many applications, so the aim is to recognize actions robustly, as a human would. However, many problems arise when studying this subject. Intra-class variation [3] means that some actions, such as walking and running, are similar and therefore difficult for the proposed system to distinguish. Because the input is video, further challenges appear: lighting and the environment vary, and, importantly, the same action can be done in different ways; each person, for example, has his own way of walking, which makes it difficult for a classifier to classify as efficiently as a human would. On top of that, the chosen dataset is considered among the most challenging because it contains all of the problems mentioned above, including some low-resolution and scaled videos, which makes recognition harder still for the classifier.

This work tries to recognize human actions using corner, blob, and ridge features extracted from each frame of each video and stored in a form suitable for the classifier, using supervised machine learning algorithms (K-Nearest-Neighbor and Support Vector Machine). The input videos are taken from the KTH dataset, which is split into training and testing sets to evaluate the classifier.

2. MATERIAL AND METHODS
This section describes the dataset that was used, the methods used to collect enough data to recognize the action, and the classification method best suited to this work. Figure (1) shows the proposed method; the basic steps are the same for training and testing, so both are explained in the same figure.

Dataset → Preprocessing → Descriptor → Classifier → Result

Figure 1 The basic steps of training or testing in the proposed human action recognition algorithm

2.1.
Dataset
The dataset used here for training and testing is the KTH dataset, considered the most famous reliable human action dataset. It contains almost 2391 videos covering six actions (running, boxing, walking, hand waving, jogging, hand clapping) performed by 25 persons (male and female); each action of each person is performed in four scenarios: indoor, outdoor, scaling, and different clothes. Figure (2) shows the four situations (indoor, outdoor, scaling, different clothes) of a male person doing the boxing action.

Figure 2 The boxing action of the KTH dataset in four scenarios: outdoor, indoor, different clothes, and scaling.

This project takes a subset of KTH, the hand waving and boxing actions, in all cases and with all the challenges in this dataset. The total number of videos is 200, split into a training group of 160 videos and a testing group of 40 videos.

2.2. Experimental Framework
We explained earlier how important human action recognition is in many applications, and also how many challenges it faces. This paper explains how to face these challenges using a combination of corner, blob, and ridge features to recognize human motion.

2.2.1. Preprocessing
As mentioned in earlier sections, the proposed dataset is a subset of the famous KTH dataset. The first phase of preprocessing converts each video to grayscale; the second phase divides the video into frames so that each frame can be worked on separately, as shown in Figure (3). Fifty frames are taken per video, which gives the most suitable information for each video.

Figure 3 Some frames of a boxing video

2.2.2.
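The paper gives no implementation for this stage; the two preprocessing phases can be sketched minimally in numpy. The helper names and the BT.601 grayscale weights are our assumptions (the paper only says the video is converted to grayscale), and frames are taken as raw arrays rather than decoded from a video file:

```python
import numpy as np

def to_grayscale(frame_rgb):
    """Hypothetical helper: convert an H x W x 3 RGB frame to grayscale
    using the common ITU-R BT.601 luma weights (an assumption; the paper
    does not specify the conversion)."""
    weights = np.array([0.299, 0.587, 0.114])
    return frame_rgb @ weights

def sample_frame_indices(total_frames, n_samples=50):
    """Pick n_samples evenly spaced frame indices, matching the paper's
    choice of 50 frames per video."""
    n = min(n_samples, total_frames)
    return np.linspace(0, total_frames - 1, n).astype(int)
```

A video decoded into, say, 120 raw frames would then be reduced to 50 evenly spaced grayscale frames before any detector runs.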
Extract Information
The aim of this project is to recognize motion, and to do that features must be extracted from each frame. This work depends on a combination of corner, blob, and ridge detectors. After the preprocessing described earlier, each frame is prepared for feature extraction by performing a Gaussian transformation in the x direction and the y direction individually.

Corner detector: the first type of feature is the corner, defined as the crossing of two edges. It can be extracted using the Harris detector; Laptev and Lindeberg [10] first extended this detector to space-time interest points. A Gaussian transformation is applied to the image in both the x and y directions, Equ. (1); the corner is then extracted from the second-moment matrix, Equ. (2), and H is the corner response after applying Equ. (3).

g(x, y) = (1 / (2πσ²)) exp(−[(x − μ_x)² + (y − μ_y)²] / (2σ²))   (1)

where σ is the standard deviation and (μ_x, μ_y) is the center of the peak of the Gaussian.

M(x, y) = Σ_{u,v} w(u, v) [ I_x²(x, y)    I_x I_y(x, y) ;  I_x I_y(x, y)    I_y²(x, y) ]   (2)

where I_x is the derivative in the x dimension, I_y is the derivative in the y dimension, and w(u, v) is a weighted Gaussian window.

H = det(M) − k · trace(M)² ,   H > 0   (3)

where det(M) = I_x² I_y² − (I_x I_y)² and trace(M) = I_x² + I_y².

Blob detector: the second type of feature extracted is the blob. In computer vision, a blob is a region whose properties, for example color or brightness, differ from its surroundings. It can be extracted using the Hessian detector introduced by [11], which extracts spatio-temporal Gaussian blobs based on a second-order matrix:

det(H) = I_xx I_yy − I_xy²   (4)

where I_xx, I_xy, and I_yy are second-order image derivatives computed using a Gaussian function of standard deviation σ, Equ. (1).
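Equations (2)–(4) can be sketched in numpy as follows. This is a minimal illustration under our own assumptions, not the authors' code: derivatives are finite differences (np.gradient) rather than true Gaussian derivatives, and the window w(u, v) is approximated by a 3×3 mean filter.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _smooth3(a):
    """Average over a 3x3 window: a crude stand-in for the Gaussian window w(u, v)."""
    padded = np.pad(a, 1, mode="edge")
    return sliding_window_view(padded, (3, 3)).mean(axis=(-1, -2))

def harris_response(img, k=0.04):
    """Corner response H = det(M) - k * trace(M)^2 of Equ. (3)."""
    gy, gx = np.gradient(img.astype(float))   # I_y, I_x
    sxx = _smooth3(gx * gx)                   # windowed I_x^2
    syy = _smooth3(gy * gy)                   # windowed I_y^2
    sxy = _smooth3(gx * gy)                   # windowed I_x I_y
    return (sxx * syy - sxy ** 2) - k * (sxx + syy) ** 2

def hessian_blob_response(img):
    """Blob response det(H) = I_xx I_yy - I_xy^2 of Equ. (4)."""
    gy, gx = np.gradient(img.astype(float))
    gxy, gxx = np.gradient(gx)                # I_xy, I_xx
    gyy, _ = np.gradient(gy)                  # I_yy
    return gxx * gyy - gxy ** 2
```

On a white square against a black background, harris_response is positive at the square's corners and negative along its edges, which is exactly the sign pattern Equ. (3) is designed to produce; hessian_blob_response peaks at the center of a bright Gaussian blob.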
Ridge detector: many people confuse the ridge with the corner and think they are the same thing, but ridges are sets of curves, not corners. The ridge detector was first introduced by [5].

R = (I_xx − I_yy)² + 4 I_xy²   (5)

where I_xx, I_xy, and I_yy are second-order image derivatives computed using a Gaussian function of standard deviation σ, Equ. (1).

The last stage of feature detection, for all the feature types mentioned above, is local-maxima selection applied to the extracted features. It keeps the strongest points from each detector result, with the count set manually (20 points per detector), and determines the final locations of the interest points from the blob, corner, and ridge detectors. These data are represented as a vector of N−1 dimensions.

2.2.3. Classification Method
After the features have been prepared in a specific form, they enter a new stage: classifying which class each video belongs to. This process depends on the proposed classifier, the K-Nearest-Neighbor (KNN) classifier.

KNN is a simple algorithm based on similarity. It stores all the cases of the training set and classifies a new case (the testing feature) based on how close its point is to the identified classes in feature space, calculating the distance between points with the Euclidean distance function, Equ. (6):

d(x, y) = ‖x − y‖ = ((x − y) · (x − y))^{1/2} = ( Σ_{i=1}^{m} (x_i − y_i)² )^{1/2}   (6)

where x and y are histograms in X = R^m.

The advantage of KNN, and the reason it was used here, is that it works on large data and on multiple model classes, and its classification relies on similarity in a small neighborhood, which leads to good accuracy.

3. EXPERIMENTAL RESULTS
This section explains the results after applying the preprocessing, detection, and classification methods to the proposed KTH subset.
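The local-maxima selection and the KNN vote described above can be sketched as follows; this is a minimal numpy illustration under our own naming, reflecting only what the paper states (keep the 20 strongest points per detector, classify by the Euclidean distance of Equ. (6)):

```python
import numpy as np
from collections import Counter

def strongest_points(response, k=20):
    """Keep the k strongest responses of a detector map (the paper fixes
    k = 20 per detector); returns (row, col) coordinates, strongest first."""
    order = np.argsort(response, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(order, response.shape)
    return np.stack([rows, cols], axis=1)

def knn_classify(train_X, train_y, query, k=3):
    """Majority vote among the k training vectors nearest to `query`
    under the Euclidean distance of Equ. (6)."""
    dists = np.linalg.norm(np.asarray(train_X) - np.asarray(query), axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

With feature vectors clustered per class, a query close to the "boxing" cluster is voted "boxing" even if one "waving" vector falls inside its neighborhood, which is what makes the small-neighborhood vote robust.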
The main challenge addressed in this paper is the difficulty of the KTH dataset, which covers almost all cases of each action. In our view, the creators of KTH [12] tried to capture all the conditions under which an ordinary camera in shops and on roads records an action, together with the accompanying challenges, such as low-resolution or scaled video, Figure (4). The videos are also captured under different lighting and from different viewpoints for each of the 25 persons. Another important point is that each person performs the actions in his own way, so the dataset covers almost all behaviors of each action; for example, boxing performed by 25 persons means 25 different ways of doing the same action. This is a real challenge because a huge amount of data enters the classifier (KNN): above 8300 features in the training stage, and 200 features per video in the testing stage. All of these challenges were taken into consideration in the results below.

Figure (4.a) a person doing the boxing action in one KTH video, (4.b) the same person in the same video after scaling, (4.c) a low-resolution KTH video of a person doing the boxing action.

First we detect only corners, using the Harris detector, from each frame; the result is shown in Figure (5) and Table (1).

Figure 5 Confusion matrix for classifying two classes (boxing and waving) using the KNN classifier and the Harris detector.

Next we also detect blobs using the Hessian detector, combined with the corners, and get the result in Figure (6).

Figure 6 Confusion matrix for classifying two classes (boxing and waving) with the KNN classifier using the Harris and Hessian detectors.

Then, to improve the accuracy, a third detector is computed, the ridge detector, and we get the result in Figure (7).

Figure 7 Confusion matrix for classifying two classes (boxing and waving) with the KNN classifier using the Harris, Hessian, and ridge detectors.
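Each reported accuracy is simply the trace of the corresponding confusion matrix divided by its total. A one-line sketch (the per-cell counts below are illustrative, not the paper's exact figures):

```python
import numpy as np

def accuracy_from_confusion(cm):
    """Accuracy = correctly classified (diagonal) / total, for a square
    confusion matrix with rows = true class and columns = predicted class."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

# Illustrative 2-class (boxing, waving) matrix over the 40 test videos:
# 36 of 40 correct would reproduce the reported 90%.
example = [[18, 2],
           [2, 18]]
```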
Table 1 Summary of the methods and results of human action recognition using the KNN classifier

Method                      Detector type            Accuracy   Negative predictions   Negative predictions
                                                                (waving class)         (boxing class)
Harris                      Corner                   67%        32%                    34%
Harris + Hessian            Corner + Blob            85%        21%                    10%
Harris + Hessian + Ridge    Corner + Blob + Ridge    90%        17%                    6%

4. CONCLUSION
This work tries to recognize the human actions boxing and waving, which are important in security. It depends on extracting interest points (corner, blob, and ridge); the learning algorithm used is K-Nearest-Neighbor (KNN), which works well with large amounts of data. The KTH dataset contains many challenges, and our methodology was tested on all of them. Detecting the corner only gives 67% recognition accuracy (Table 1); combining corner and blob features gives 85%; and the combination of corner, blob, and ridge features, the last configuration tested in this paper, gives 90% accuracy. The classification accuracy obviously depends on the input data and its characteristics, so there is still no perfect way to classify actions in all situations; in future work, a different classification method could achieve better accuracy.

REFERENCES
[1] Oluwatoyin Popoola and Kejun Wang, "Video based abnormal human behavior recognition", IEEE, 2012.
[2] D. A. Forsyth and M. M. Fleck, "Body plans", in: IEEE Conference on Computer Vision and Pattern Recognition, Puerto Rico, 1997, pp. 678–683.
[3] Ruoyun Gao, "Dynamic Feature Description in Human Action Recognition", Leiden Institute of Advanced Computer Science, Leiden University, 2009.
[4] Ronald Poppe and Mannes Poel, "Discriminative human action recognition using pairwise CSP classifiers", in: Proceedings of the International Conference on Automatic Face and Gesture Recognition (FGR'08), September 2008.
[5] R. Haralick, "Ridges and Valleys on Digital Images", Computer Vision, Graphics, and Image Processing, 1983.
[6] P. K. Turaga, R. Chellappa, V. S. Subrahmanian, and O. Udrea, "Machine recognition of human activities: a survey", IEEE Trans. Circuits Syst. Video Technol., 2008.
[7] F. Destelle, A. Ahmadi, N. E. O'Connor, K. Moran, A. Chatzitofis, D. Zarpalas, and P. Daras, "Low-cost accurate skeleton tracking based on fusion of Kinect and wearable inertial sensors", in: Signal Processing Conference (EUSIPCO), Proceedings of the 22nd European, 2014, pp. 371–375.
[8] T. Helten, M. Muller, H. P. Seidel, and C. Theobalt, "Real-time body tracking with one depth camera and inertial sensors", in: Computer Vision (ICCV), IEEE International Conference on, 2013, pp. 1105–1112.
[9] Y. Tian, X. Meng, D. Tao, D. Liu, and C. Feng, "Upper limb motion tracking with the integration of IMU and Kinect", Neurocomputing 159:207–218, 2015.
[10] I. Laptev and T. Lindeberg, "Space-time interest points", in: ICCV, 2003.
[11] G. Willems, T. Tuytelaars, and L. Van Gool, "An efficient dense and scale-invariant spatio-temporal interest point detector", in: ECCV, 2008.
[12] C. Schuldt, I. Laptev, and B. Caputo, "Recognizing human actions: a local SVM approach", in: Proc. ICPR'04, Cambridge, UK, 2004.
[13] Jesper Birksø Bækdahl, "Human Action Recognition using Bag of Features", IEEE, 2016.
[14] C. Chen, R. Jafari, and N. Kehtarnavaz, "A survey of depth and inertial sensor fusion for human action recognition", Springer, 2015.
[15] C. Chen, N. Kehtarnavaz, and R. Jafari, "A medication adherence monitoring system for pill bottles based on a wearable inertial sensor".
In: Engineering in Medicine and Biology Society (EMBC), 36th Annual International Conference of the IEEE, 2014, pp. 4983–4986.
[16] J. Shotton, T. Sharp, A. Kipman, A. Fitzgibbon, M. Finocchio, A. Blake, and R. Moore, "Real-time human pose recognition in parts from single depth images", Commun. ACM 56(1):116–124, 2013.
[17] Raviteja Vemulapalli, Felipe Arrate, and Rama Chellappa, "Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group", Computer Vision Foundation, IEEE, 2014.
[18] C. Chen, R. Jafari, and N. Kehtarnavaz, "UTD-MHAD: a multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor", in: Proceedings of the IEEE International Conference on Image Processing, Canada, 2015.
[19] Jiang Wang, Zicheng Liu, Ying Wu, and Junsong Yuan, "Mining actionlet ensemble for action recognition with depth cameras", in: Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, 2012, pp. 1290–1297.