International Journal of Electrical & Computer Sciences IJECS-IJENS Vol:12 No:04

Development of Gesture Database for an Adaptive Gesture Recognition System

Azri A. AZIZ, Khairunizam WAN, SK Zaaba, Shahriman A.B, Nazrul H. ADNAN, Rudzuan M. Nor, M. Nasir Ayob, A.H. Ismail and M. Fadhil Ramly

Abstract— The use of human gestures for interaction between humans and computers is becoming an attractive alternative. In particular, hand gestures serve as a form of non-verbal communication between humans and machines. Most recent studies on gesture recognition deal with the shape and movement of the hands, and some also discuss how individual factors affect arm motions. This paper concentrates on the development of a gesture database that eliminates individual factors, which degrade the efficiency of the recognition system. An adaptive gesture recognition system is proposed, in which the system adaptively selects the corresponding database for comparison with the input gesture. A classification algorithm is introduced to investigate whether individual factors are the primary cause of reduced recognition efficiency. In this study, motion features are selected by examining the characteristics of hand trajectories and are classified using a statistical approach. The results show that individual factors do affect the efficiency of the recognition system, and that the body structure of the performer therefore needs to be considered in the development of the gesture database.

Index Terms— arm motion, human computer interaction, hand trajectories, individual factors, statistical approach, adaptive gesture recognition system, gesture database

The authors are with the School of Mechatronic Engineering, Universiti Malaysia Perlis, Main Campus Pauh Putra, 02600 Perlis, Malaysia; all authors except M. Fadhil Ramly are members of the Advanced Intelligent Computing and Sustainability Research Group (e-mail: azriaziz@unimap.edu.my).
I. INTRODUCTION

Humans have five senses: sight, hearing, touch, taste and smell. One of the most important is sight, since it can indirectly affect the other senses. Giving machines the ability to see and recognize has always been a goal of researchers, as it enables machines to perform new tasks such as receiving commands with little explicit information. Nowadays, there is a growing need for more flexible and simple ways to communicate with electronic devices, from computers to mobile phones. Hand gesture recognition has several applications, such as computer games and gaming machines, where it serves as a mouse replacement, and machinery control, for example cranes and surgical machines. Moreover, controlling computers via hand gestures can make many applications more intuitive than using a mouse, keyboard or other input devices [1, 2].

There are several different human-machine interfaces, from the typical keyboard and mouse and touch-screens to voice activation. There is also motion-sensor-based communication, which detects the movement of a device using accelerometers or gyroscopes. However, such systems are usually bulky and expensive, which limits their application in daily consumer products. Gesture-based communication, by contrast, requires little or no use of peripherals. The aim of this research is to develop a new form of human-machine interaction (HMI) using visual hand gesture recognition [2-4]. Gesture recognition can be seen as a way for computers to begin to understand human body language, thus building a richer bridge between machines and humans than typical text user interfaces or even graphical user interfaces (GUIs), which still limit the majority of input to the keyboard and mouse [7]. Gesture interaction can also be implemented in robotics through artificial intelligence approaches. For example, a robot can recognize human behaviors and gestures without any other information and react accordingly. This can be used for nursing and emergency evacuation robots, whose users would probably be unable to interact normally. It can also enable machines to recognize human emotions from certain gestures such as body language or facial reactions. One day, robots may therefore be able to interact with humans fully and independently, with a close resemblance to another human.

In this paper, the development of an adaptive recognition system is introduced by designing the gesture database based on human body size. A new algorithm, a resampling algorithm, is proposed to classify various motion patterns. An optical motion capture system (MoCap) is used to extract motion features from hand trajectories, which represent movements of the arm [8]. A statistical technique, based on the distribution of the resampled motion data, is used for the classification of the motion features. In this research, several gesture databases will be developed, based on the body structure of humans. The preliminary research focuses on the classification of humans based on the motion trajectories they perform. It is expected that, based on motion trajectories, humans can be classified into four groups: "Fat-Tall", "Fat-Short", "Thin-Tall" and "Thin-Short".

This paper is structured as follows: Section II addresses research related to the approaches, applications and problems of recognizing human gestures.
Section III describes the configuration of the system. Section IV describes the proposed algorithm for the classification of motion patterns. Section V presents the results of the classification, and the article is concluded with a summary in Section VI.

II. RELATED RESEARCH

There are many possible applications for the proposed research field. Visual gesture recognition can be used as a new form of sensor that takes movements as its input, and as a new form of interface between human and machine. Visual gestures are not limited to the movement of the hands; they also include body language, facial reactions and movements of other parts of the body [3-5]. There is also research that uses brain signals to emulate the movement of limbs for paralyzed people [6]. Other forms of gesture recognition share the same principle: to detect, analyze and recognize the gesture.

The "Sixth Sense" is a mobile, wearable gestural interface that implements tracking and recognition of the hand to operate a cursor. This is achieved by using color-based recognition and tracking via a simple webcam [7]. The color markers act similarly to a computer mouse's optical sensor or a touchpad sensor, providing information on the position and motion of the user's finger. This negates the need for common physical input devices, making the system compact and light.

In applications of robotic arms, humans control the arm by analyzing the movements of their hands and arms based on a video stream, after which the robot mimics the movements in almost real time. This setup is unique in that the two cameras it uses are capable of measuring the movement, orientation, position and shape of a human hand over 100 times a second. The robot is capable of doing some simple yet impressive things, such as detecting a human clenching a fist or grabbing an object.

Duke University Medical Center researchers and their colleagues have tested a neural system on monkeys that enabled the animals to use their brain signals, detected by implanted electrodes, to control a robot arm reaching for a piece of food. The scientists even transmitted the brain signals over the internet, remotely controlling a robot arm 600 miles away. This could form the basis for a brain-machine interface that would allow paralyzed patients to control the movement of prosthetic limbs.

The EyeToy is a color digital camera device, similar to a webcam, for the PlayStation 2. The technology uses computer vision and gesture recognition to process images taken by the camera. This allows players to interact with games using motion, color detection and sound, through its built-in microphone.

All the applications discussed above deal with body movements, which are represented by human gestures. Some of these studies also discuss the effect of individual body characteristics on gesture movements; in those studies, particular features are extracted from gesture motions in order to identify a particular person among several people by observing the characteristics of their body motions [8]. A motion classification technique is required to recognize them. In this study, a resampling algorithm is introduced to classify various motion patterns.

III. SYSTEM CONFIGURATION
An optical motion capture system, which tracks a marker in real time, is used for the motion measurement. Figure 1 shows the space used in the experiments. The system is equipped with five high-speed cameras with an image resolution of 640 x 480 pixels and the ability to capture 200 frames per second. The analysis of human gestures deals mainly with the movements of hand motions, excluding hand postures, due to the complexity of the finger configurations and occlusion problems. The marker is attached at a feature point of the performer's body, namely the finger of the hand. From the captured 3D position data, the system estimates the characteristics of the body motions, i.e. the movements of the arm. The output of the optical motion capture system is 3D position data, and a resampling algorithm is applied to each position sample. The gesture database initially contains a substantial amount of data, and an adaptive gesture database is designed for the purpose of the experiment. Figure 2 shows the process flow for obtaining data and storing it in the database.

Fig. 1. An optical motion capture system

Fig. 2. Process of obtaining data and storing it in the database

IV. METHODOLOGIES

A. Resampling algorithm

An optical motion capture system is used to acquire motion data in three-dimensional coordinates (X, Y, Z). The output of the system is the 3D position data shown in Fig. 3. Different performers, and different repetitions of the same gesture, produce different results, caused by differences in the speed, angle and range of the hand movements. A resampling algorithm is introduced to reduce the differences between two trajectories of performed gestures; without resampling, it is difficult to compare the data, as shown in Fig. 4. For example, motion data are resampled from 600 to 30 points, where each point is defined as a resampling point. The resampling method reduces the size of the motion data so that comparisons can be done in a simpler manner.

Fig. 3. The flow of the proposed system

Fig. 4. Example 3D motion data for the gesture "circle"

The captured raw data consist of more than 500 frames of time-based data, so comparing the data frame by frame would be very complicated. For simplification, a distance-based calculation is introduced. The value of each reference point is then normalized to reduce the range of the data to between 0 and 1. Through the resampling process, 30 reference points are created.

The first step is to find the range of movement between two frames. The current frame is labeled $G_{xyz}(n)$ and the previous frame $G_{xyz}(n-1)$. The range of movement between the two frames can be represented as follows:

\Delta G_{xyz}(p) = G_{xyz}(n) - G_{xyz}(n-1) \quad (1)

The ranges of movement calculated in Eq. (1) are summed to give a single value. Since the calculated values may be negative, the absolute value is used to eliminate the negative sign:

D = \sum_{p} \lvert \Delta G_{xyz}(p) \rvert \quad (2)

The total amount in Eq. (2) is then divided by the total number of reference points, defined as $M$:

d = \frac{1}{M} \sum_{p} \lvert \Delta G_{xyz}(p) \rvert \quad (3)

Since it was decided that there will be 30 reference points, the sum of all movement is divided by 30:

d = \frac{1}{30} \sum_{p} \lvert \Delta G_{xyz}(p) \rvert \quad (4)
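To make the resampling procedure concrete, the following is a minimal Python sketch rather than the authors' implementation. It assumes the MoCap output is an N x 3 array of (X, Y, Z) positions; it uses the Euclidean norm of the per-frame displacement as the travelled distance, whereas Eq. (2) sums absolute component differences, which `np.abs(deltas).sum(axis=1)` would reproduce; and the function names are illustrative.

```python
import numpy as np

def resample_trajectory(traj, n_points=30):
    """Resample an (N, 3) trajectory to n_points reference points
    spaced at equal travelled distance, following Eqs. (1)-(4)."""
    # Eq. (1): displacement between consecutive frames
    deltas = np.diff(traj, axis=0)
    # Travelled distance per frame (Euclidean norm; Eq. (2) uses
    # summed absolute component differences instead)
    step = np.linalg.norm(deltas, axis=1)
    cumulative = np.concatenate(([0.0], np.cumsum(step)))
    # Eqs. (3)-(4): split the total path length into equal intervals,
    # one per reference point
    targets = np.linspace(0.0, cumulative[-1], n_points)
    # Interpolate each coordinate at the target path distances
    return np.column_stack(
        [np.interp(targets, cumulative, traj[:, k]) for k in range(3)]
    )

def normalize_unit_range(points):
    """Scale each axis into [0, 1], as done before comparison."""
    lo = points.min(axis=0)
    span = points.max(axis=0) - lo
    return (points - lo) / np.where(span == 0.0, 1.0, span)

# Example: a stand-in 600-frame capture reduced to 30 reference points
raw = np.random.rand(600, 3)
ref = normalize_unit_range(resample_trajectory(raw))
print(ref.shape)   # (30, 3)
```

Spacing the reference points by travelled distance rather than by time is what makes two repetitions of the same gesture comparable even when they are performed at different speeds.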
B. Adaptive Gesture Recognition System

In this research, an adaptive gesture recognition system is proposed. Adaptive here means that the system is able to choose a suitable gesture database, where the databases are developed based on the body structure of the performer. The database initially stores a substantial amount of gesture data performed by various groups of people, and the classification of the motion data is designed by referring to the body sizes of the performers. Previous research indicates that hand motions are influenced by the body size and emotional factors of the performer; hence, classifying gestures according to the body size of the performer could increase the efficiency of the recognition system. An adaptive system is needed to recognize unknown gestures and, at the same time, to decrease the failure rate of the recognition system.

In the preliminary experiments, the classification of gestural motions will be done based on the body size of the performer by using Neural Networks (NNs), which can be used to classify the motion data. The human body size is defined by the length from the center of the body to the head and the length between the two shoulders. In the measurement, reflective markers will be attached to the body of the performer to measure the distance from the center of the body to the top of the head and the distance between the two shoulders. In the experiments, a group of male and female subjects will be chosen, and each subject is assigned to one of four groups, Fat-Tall, Fat-Short, Thin-Tall and Thin-Short, as shown in Table 1. In the recognition phase, the body structure of the performer is first scanned, followed by the selection of the gesture database that suits his or her body structure. Figure 5 shows the configuration of the recognition system.

Fig. 5. Flow of the adaptive gesture recognition process
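The database-selection step of Fig. 5 could be sketched as follows. This is an illustrative stand-in, not the system described above: the paper classifies body structure with Neural Networks, whereas the sketch uses a simple two-threshold rule, and all names, thresholds and units are hypothetical.

```python
# Hypothetical database handles for the four groups of Table 1
GESTURE_DATABASES = {
    "Fat-Tall": "db_fat_tall",
    "Fat-Short": "db_fat_short",
    "Thin-Tall": "db_thin_tall",
    "Thin-Short": "db_thin_short",
}

def classify_body_structure(torso_to_head, shoulder_width,
                            height_threshold=0.45, width_threshold=0.40):
    """Map the two marker-derived body measurements (in metres) to one
    of the four groups. Thresholds are illustrative placeholders for
    the NN classifier used in the paper."""
    build = "Fat" if shoulder_width >= width_threshold else "Thin"
    stature = "Tall" if torso_to_head >= height_threshold else "Short"
    return f"{build}-{stature}"

def select_database(torso_to_head, shoulder_width):
    """Adaptive step of Fig. 5: scan the performer's body structure,
    then pick the matching gesture database for recognition."""
    group = classify_body_structure(torso_to_head, shoulder_width)
    return GESTURE_DATABASES[group]

print(select_database(0.50, 0.38))   # -> "db_thin_tall"
```

The point of the indirection is that the recognizer only ever compares an input gesture against trajectories produced by performers of a similar body structure, which is what is meant above by reducing the failure rate.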
V. EXPERIMENT

A. Experimental setup

To acquire input data for the experiment, a Qualisys™ motion capture system was used, as shown in Fig. 6. The system uses five high-speed cameras arranged around a subject to create a 3D coordinate space, and reflective markers were used to highlight the desired points of interest. The cameras used in the system are Oqus FX ProReflex high-speed cameras, the backbone of the Qualisys motion capture system; they offer high precision and real-time capabilities and are capable of capturing up to 200 frames per second. Figure 7 shows the 10 geometrical forms. Twenty subjects with different appearances were chosen to present the 10 geometrical gestures, and each gesture was repeated 10 times in the experiments.

Fig. 6. An optical motion capture system

TABLE I. Four groups of people performing the gestures

Fig. 7. The geometrical gestures used in the experiment

B. Experimental results

The captured raw data consist of 600 frames of time-based data, as shown in Fig. 8(a). There are many points along the trajectories of the x, y and z graphs, which makes comparing points of the data difficult. A simplification process is therefore applied, in which the raw data are resampled from 600 points to 30 points; the resampling calculation follows equations (1), (2) and (4). Figure 8(b) shows the resampled x, y and z axes of the 3D motion data for the gesture "circle" given by subject #1 in group Ω. In each graph of the resampled data there are 30 reference points, normalized to the range between 0 and 1, so comparison can be done in a simpler manner between the reference points. Each reference point corresponds to one resampling point.

Figure 9 shows the result of the average calculation between the x, y and z coordinates for each reference point, representing the distance between two points of the averaged resampled three-dimensional data. This yields 29 distance values from the averaged resampled data, which can be used to represent the differences in the gesture database across the variety of data collected. Differences in human physical features contribute to differences in gesture motions. The average distance for the gesture "circle" for group Ω given by five subjects is shown in Fig. 10; the three-dimensional x, y, z distance data produced by the five different subjects in group Ω were averaged.

In Fig. 11, the data from the four different groups of human physical characteristics are placed together to measure their similarity. The comparison is summarized in Table 2, which shows how well the data can be classified into the four different types of subjects at each of resampling points #1 to #29. There were 17 resampling points at 100%, followed by 9 resampling points at 75% and 3 resampling points at 50%. Hence, the results show that the classification of gesture motions into four groups of different people, namely Ω (Fat-Short), α (Fat-Tall), ∆ (Thin-Short) and β (Thin-Tall), can be done. The average percentage for classifying humans into the 4 groups was 87.1%. Therefore, a gesture database for the 4 groups of humans can be designed for gesture recognition, and the comparison can be done in a simpler manner based on each resampling point along the trajectories in Fig. 11 for all the gesture databases. Table 3 shows the possibility of humans being grouped into 1 group, 2 groups, 3 groups or 4 groups; the results show that humans can be grouped into 4 groups with a high possibility of 58.6%.

Fig. 8. Comparison of the resampling process for the 3D motion data X, Y, Z of the gesture "circle" given by subject #1 in group Ω

Fig. 9. The distance between 2 points of the averaged resampled x, y, z data for the gesture "circle" for subject #1 in group Ω

Fig. 10. Average distance for the gesture "circle" for group Ω given by five subjects

Fig. 11. Comparison result between the four groups of the gesture database for the gesture "circle"

TABLE II. Classification of humans based on motion trajectories, from the graphs in Fig. 11

Resampling point   Possible no. of groups (max 4)   Percentage to classify into 4 groups (%)
 1                 3                                 75
 2                 2                                 50
 3                 4                                 100
 4                 4                                 100
 5                 4                                 100
 6                 4                                 100
 7                 3                                 75
 8                 4                                 100
 9                 4                                 100
10                 2                                 50
11                 4                                 100
12                 4                                 100
13                 4                                 100
14                 4                                 100
15                 3                                 75
16                 3                                 75
17                 3                                 75
18                 4                                 100
19                 4                                 100
20                 4                                 100
21                 3                                 75
22                 4                                 100
23                 4                                 100
24                 3                                 75
25                 4                                 100
26                 4                                 100
27                 3                                 75
28                 3                                 75
29                 2                                 50
Average                                              87.1

TABLE III. The possibility of humans being grouped into 1 group, 2 groups, 3 groups or 4 groups

No.   No. of groups   Percentage classification (%)
1     1 group         0
2     2 groups        10.3
3     3 groups        24.1
4     4 groups        58.6
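As a companion to the distance profiles of Figs. 9 to 11, the following is a minimal sketch of the per-reference-point distance computation, assuming trajectories already resampled and normalized as in Section IV; the helper names and the stand-in data are hypothetical.

```python
import numpy as np

def reference_point_distances(resampled):
    """29 distances between the 30 consecutive reference points of one
    resampled (30, 3) trajectory, as plotted in Fig. 9."""
    return np.linalg.norm(np.diff(resampled, axis=0), axis=1)

def group_profile(trajectories):
    """Average distance profile over the subjects of one body-structure
    group (e.g. the five subjects of group Omega in Fig. 10)."""
    return np.mean([reference_point_distances(t) for t in trajectories],
                   axis=0)

# Example with stand-in data: four groups, five subjects each
rng = np.random.default_rng(0)
groups = {g: [rng.random((30, 3)) for _ in range(5)]
          for g in ("Fat-Short", "Fat-Tall", "Thin-Short", "Thin-Tall")}
profiles = {g: group_profile(ts) for g, ts in groups.items()}
print({g: p.shape for g, p in profiles.items()})   # 29 points per profile
```

Comparing the four 29-point profiles position by position is what produces the separability counts of Table II: a resampling point at which all four profiles are distinguishable contributes a "4 groups / 100%" row.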
VI. CONCLUSIONS

The study proposes a new feature extraction method for the classification of motion data. In the experiment, a reflective marker was attached to the finger of the performer's hand, and the performer gave the geometrical gestures "circle", "wave", "triangle", "rectangle", "semi circle", "vertical", "star", "love sign", "zigzag" and "diamond". A high-speed motion capture system was used to obtain the motion data. The proposed resampling algorithm proved to work well in classifying various motion patterns and could be used in the development of a gesture recognition system. The results show that humans can be classified into four groups based on their body structure. By using the proposed adaptive gesture database, the system can robustly choose the database that suits the body structure of the performer.

ACKNOWLEDGMENT

This work is supported by the Fundamental Research Grant Scheme (FRGS) awarded by the Ministry of Higher Education to Universiti Malaysia Perlis (FRGS 9003-00313) and a Short Term Research Grant Scheme (STG 9001-00363) from Universiti Malaysia Perlis.

REFERENCES

[1] Ho-Sub Yoon, Jung Soh, Younglae J. Bae et al., "Hand gesture recognition using combined features of location, angle and velocity," Image Processing Division, Computer & Software Technology Lab., pp. 305-350, 1999.
[2] J.S. Kim, C.S. Lee, K.J. Song et al., "Real-time hand gesture recognition for avatar motion control," in Proceedings of HCI'97, 1997.
[3] S. Seki, K. Takahashi and R. Oka, "Gesture recognition from motion images by spotting algorithm," ACCV'93, 1993.
[4] Heung-Il Suk, Bong-Kee Sin and Seong-Whan Lee, "Hand gesture recognition based on dynamic Bayesian network framework."
[5] V. Pavlovic, R. Sharma and T. Huang, "Visual interpretation of hand gestures for human–computer interaction," University of Illinois at Urbana-Champaign, 1999.
[6] H. Avilés-Arriaga, L. Sucar and C. Mendoza, "Visual recognition of similar gestures," Hong Kong, 2006.
[7] Mohd Azri ABD AZIZ, Khairunizam WAN, Shahriman AB, Siti Khadijah ZA'ABA, Abdul Halim ISMAIL, M.K. Ali HASSAN and M. Nasir AYOB, "A real time hand tracking for virtual object manipulation," Malaysian Technical Universities International Conference on Engineering & Technology (MUiCET 2011).
[8] Khairunizam Wan and H. Sawada, "Dynamic gesture recognition based on the probabilistic distribution of arm trajectories," Mechatronics and Automation, ICMA 2008, IEEE, pp. 426-431.
[9] Nazrul H. Adnan, Khairunizam Wan and Shahriman AB, "Accurate measurement of the force sensor for intermediate and proximal phalanges of index finger," International Journal of Computer Applications 45(15):59-65, 2012.
[10] Kye Kyung Kim, Keun Chang Kwak and Su Young Chi, "Gesture analysis for human-robot interaction," ICACT 2006.
[11] Khairunizam Wan, Nazrul Hamizi Bin Adnan, Shahriman AB, Siti Khadijah Za'aba, Mohd Azri Abd Aziz and Zulkifli Md. Yusof, "Gesture recognition based on hand postures and trajectories by using dataglove: a fuzzy probability approach – a review," ICoMMS 2012.
[12] Simon Haykin, Neural Networks and Learning Machines, third ed., Pearson, New Jersey, 2009.