Development of Gesture Database for an Adaptive
Gesture Recognition System
Azri A. AZIZ , Khairunizam WAN , SK Zaaba , Shahriman A.B ,
Nazrul H. ADNAN , Rudzuan M. Nor , M. Nasir Ayob ,
A.H. Ismail and M. Fadhil Ramly
Abstract— The application of human gestures for interaction between humans and computers is becoming an impressive alternative. In particular, hand gestures are used as a form of non-verbal communication between humans and machines. Most recent studies of gesture recognition deal with the shape and movement of the hands, and some also discuss how individual factors affect arm motions. This paper concentrates on the development of a gesture database that eliminates the individual factors which affect the efficiency of the recognition system. An adaptive gesture recognition system is proposed, in which the system adaptively selects the corresponding database for comparison with the input gesture. A classification algorithm is introduced to investigate whether the individual factor is the primary cause affecting the efficiency of the recognition system. In this study, by examining the characteristics of hand trajectories, motion features are selected and classified using a statistical approach. The results show that the individual factor affects the efficiency of the recognition system, and that the body structure of the performer needs to be considered in the development of the gesture database.
Index Terms— arm motion, human computer interaction, hand trajectories, individual factors, statistical approach, adaptive gesture recognition system, gesture database

Azri A. AZIZ, Khairunizam WAN, SK Zaaba, Shahriman A.B, Nazrul H. ADNAN, Rudzuan M. Nor, M. Nasir Ayob and A.H. Ismail are with the Advanced Intelligent Computing and Sustainability Research Group, School of Mechatronic Engineering, Universiti Malaysia Perlis, Main Campus Pauh Putra, 02600 Perlis, Malaysia (e-mail: azriaziz@unimap.edu.my). M. Fadhil Ramly is with the School of Mechatronic Engineering, Universiti Malaysia Perlis, Main Campus Pauh Putra, 02600 Perlis, Malaysia.

I. INTRODUCTION
Humans have five senses: sight, hearing, touch, taste and smell. One of the most important is sight, which can indirectly affect the other senses. Giving machines the ability to see and recognize has long been a goal of researchers. It would enable machines to perform new tasks, such as receiving commands with little explicit information.
Nowadays, there is a growing need for more flexible and simpler ways to communicate with electronic devices, from computers to mobile phones. Hand gesture recognition has several applications, such as computer games and gaming machines, where it serves as a mouse replacement, and machinery control, for example cranes and surgical machines. Moreover, controlling computers via hand gestures can make many applications more intuitive than using a mouse, keyboard or other input devices [1, 2]. There are a number of different human-machine interfaces, from the typical keyboard, mouse and touch screens to voice activation. There is also motion-sensor-based communication, which detects the movement of a device by using accelerometers or gyroscopes. However, such systems are usually bulky and expensive, which limits their application in daily consumer products. Gesture-based communication requires little or no use of peripherals.
The aim of this research is to develop a new form of human-machine interaction (HMI) using visual hand gesture recognition [2-4]. Gesture recognition can be seen as a way for computers to begin to understand human body language, thus building a richer bridge between machines
and humans than typical text-based user interfaces or even graphical user interfaces (GUIs), which still limit the majority of input to the keyboard and mouse [7]. Gesture interaction can also be implemented in robotics through an artificial intelligence approach. For example, a robot can recognize human behaviors and gestures without any other information and react accordingly. This can be used for nursing and emergency evacuation robots, whose users would probably be unable to interact normally. It can also enable machines to recognize human emotion by recognizing certain gestures, such as body language or facial expressions. This means that one day robots may be able to interact fully and independently with humans, with a close resemblance to interaction with another human.
In this paper, the development of an adaptive recognition system is introduced by designing gesture databases based on human body size. A new algorithm, a resampling algorithm, is proposed to classify various motion patterns. An optical motion capture system (MoCap) is used to extract motion features from hand trajectories, which represent movements of the arm [8]. A statistical technique based on the distribution of the resampled motion data is used for the classification of motion features. In this research, several gesture databases will be developed based on the body structure of humans. The preliminary research focuses on the classification of humans based on the motion trajectories they perform. It is expected that, based on motion trajectories, humans can be classified into four groups: "Fat-Tall", "Fat-Short", "Thin-Tall" and "Thin-Short".
This paper is structured as follows: Section II addresses research related to the approaches, applications and problems of recognizing human gestures. Section III describes the configuration of the system. Section IV describes the proposed algorithm for the classification of motion patterns. Section V presents the results of the classification, and the article is concluded with a summary in Section VI.
II. RELATED RESEARCHES

There are many possible applications for the proposed research field. Visual gesture recognition can be used as a new form of sensor, which detects movements as its input, and as a new form of interface between humans and machines. Visual gestures are not limited to the movement of the hands; they also include body language, facial expressions and movements of other parts of the body [3-5]. There is also research that uses brain signals to emulate the movement of limbs for paralyzed people [6]. The other forms of gesture recognition share the same principle: to detect, analyze and recognize the gesture.
The "Sixth Sense" is a mobile, wearable gestural interface that implements tracking and recognition of the hand to operate a cursor. This is achieved by using color-based recognition and tracking via a simple webcam [7]. The color markers act similarly to a computer mouse's optical sensor or a touchpad sensor, providing information on the position and motion of the user's finger. This removes the need for common physical input devices, making the system compact and light.
In one application of robotic arms, the movements of a human's hands and arms are analyzed from a video stream, after which the robot mimics the movements in near real time. This setup is unique in that its two cameras are capable of measuring the movements, orientation, position and shape of a human's hand over 100 times a second. The robot is capable of doing some simple yet impressive things, such as detecting a human clenching a fist or grabbing an object.
Duke University Medical Center researchers and their colleagues have tested a neural system on monkeys that enabled the animals to use their brain signals, as detected by implanted electrodes, to control a robot arm to reach for a piece of food. The scientists even transmitted the brain signals over the internet, remotely controlling a robot arm 600 miles away. This could form the basis for a brain-machine interface that would allow paralyzed patients to control the movement of prosthetic limbs.
The EyeToy is a color digital camera device, similar to a webcam, for the PlayStation 2. The technology uses computer vision and gesture recognition to process images taken by the camera. This allows players to interact with games using motion, color detection and sound, through its built-in microphone.
All the applications discussed above deal with body movements, which are represented by human gestures. Some of these studies discuss the effect of individual body characteristics on gesture movements. In those studies, particular features of gesture motions are extracted to identify a particular person among several people by observing the characteristics of body motions [8]. A motion classification technique is required to recognize them. In this study, a resampling algorithm is introduced to classify various motion patterns.
III. SYSTEM CONFIGURATION
Fig. 1. An Optical Motion Capture System
Fig. 2. Process of obtaining data and storing it in the database
An optical motion capture system, which tracks one marker continuously in real time, is used for the motion measurement. Figure 1 shows the space used in the experiments. The system is equipped with five high-speed cameras with an image resolution of 640 x 480 pixels and the ability to capture 200 frames per second. The analysis of human gestures deals mainly with the movements of the hand and excludes finger poses, due to the complexity of the finger configurations and occlusion problems.
The marker is attached to a feature point on the performer's body, namely a finger of the hand. From the captured 3D position data, the system estimates the characteristics of body motions, i.e., the movements of the arm. The output of the optical motion capture system is 3D position data, and a resampling algorithm is applied to these data. The gesture database initially contains a substantial amount of data. An adaptive gesture database is designed for the purpose of the experiment. Figure 2 shows the process flow for obtaining and processing the database.
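As an illustration of the data this stage produces, the trajectory of a single marker can be represented as an N x 3 array of X, Y, Z positions, one row per frame. The short Python sketch below uses synthetic circle data in place of real capture output; the array layout is an assumption for illustration, not the system's actual export format.

    import numpy as np

    # Illustrative stand-in for a captured gesture: 600 frames of X, Y, Z
    # marker positions at 200 frames per second (synthetic circle, not a
    # real MoCap recording).
    num_frames = 600
    t = np.linspace(0.0, 2.0 * np.pi, num_frames)
    trajectory = np.stack([np.cos(t), np.sin(t), np.zeros(num_frames)], axis=1)

    print(trajectory.shape)         # (600, 3): one (X, Y, Z) row per frame
    print(num_frames / 200.0, "s")  # 3.0 s of motion at 200 fps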
IV. METHODOLOGIES
A. Resampling Algorithm
Fig. 3. The flow of the proposed system
An optical motion capture system is used to acquire motion data in three-dimensional coordinates (X, Y, Z). The output of the optical motion capture system is the 3D position data shown in Fig. 3. Different performers and repeated gestures produce different results, caused by differences in the speed, angle and range of the hand movements.
A resampling algorithm is introduced to reduce the differences between two trajectories of performed gestures. Without resampling, it is difficult to compare data such as those shown in Fig. 4. For example, motion data are resampled from 600 to 30 points, and each point is defined as a resampling point. The resampling method reduces the size of the motion data so that comparison can be done in a simpler manner. The captured raw data consist of more than 500 frames of time-based data, which means that comparing the data directly would be very complicated. For simplification, a distance-based calculation is introduced to simplify the raw data.
Fig. 4. Example 3D motion data for gesture “circle”
The values at the reference points are then normalized to the range between 0 and 1. Through the resampling process, 30 reference points are created. The first step is to find the range of movement between two frames. The current frame is labeled $G_{xyz}(n)$ and the previous frame $G_{xyz}(n-1)$. The range of movement between two frames can be represented as follows:

$\Delta G_{xyz}(p) = G_{xyz}(n) - G_{xyz}(n-1)$   (1)
Each range-of-movement value calculated in eq. (1) is summed to obtain a single value. The calculated values may be negative, so the absolute value is used to eliminate the negative sign:

$\sum_{p} \lvert \Delta G_{xyz}(p) \rvert$   (2)

Then, the total amount in eq. (2) is divided by the total number of reference points, defined as $M$:

$\frac{1}{M} \sum_{p} \lvert \Delta G_{xyz}(p) \rvert$   (3)

Since it is decided that there will be 30 reference points, the sum of all movement is divided by 30:

$\frac{1}{30} \sum_{p} \lvert \Delta G_{xyz}(p) \rvert$   (4)
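To make the resampling step concrete, the sketch below implements one plausible reading of eqs. (1)-(4) in Python: frame-to-frame displacements are accumulated into a travelled distance, which is divided into 30 equal segments to place the reference points. The function name, the NumPy representation and the exact placement rule are assumptions, not the authors' implementation.

    import numpy as np

    def resample_trajectory(traj: np.ndarray, num_points: int = 30) -> np.ndarray:
        """Resample a time-based (N, 3) trajectory into num_points reference
        points spaced evenly by travelled distance (cf. eqs. (1)-(4))."""
        # Eq. (1): frame-to-frame displacement; eq. (2): absolute step lengths.
        steps = np.linalg.norm(np.diff(traj, axis=0), axis=1)
        # Cumulative distance travelled along the trajectory.
        cumdist = np.concatenate(([0.0], np.cumsum(steps)))
        # Eqs. (3)-(4): total path length divided into num_points segments.
        targets = np.linspace(0.0, cumdist[-1], num_points)
        # For each target distance, take the frame closest to that distance.
        idx = np.searchsorted(cumdist, targets).clip(0, len(traj) - 1)
        resampled = traj[idx]
        # Normalize each axis into the range [0, 1], as described above.
        span = resampled.max(axis=0) - resampled.min(axis=0)
        return (resampled - resampled.min(axis=0)) / np.where(span == 0.0, 1.0, span)

Applied to a 600-frame capture with num_points = 30, this yields the 30 normalized reference points used in the comparisons that follow.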
B. Adaptive Gesture Recognition System
In this research, an adaptive gesture recognition system is proposed. "Adaptive" here means that the system is able to choose a suitable gesture database, where the databases are developed based on the body structure of the performer. The database initially stores a substantial amount of gesture data performed by various groups of people. The classification of motion data is designed by referring to the body size of the performer. According to previous research, hand motions are influenced by the body size and emotional state of the performer; hence, classifying gestures according to the body size of the performer could increase the efficiency of the recognition system. An adaptive system is needed to recognize unknown gestures and, at the same time, to decrease the failure rate of the recognition system.
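The selection step itself is simple; a minimal sketch is given below, where the dictionary structure and the function name are illustrative assumptions.

    # Hypothetical sketch: one gesture database per body-structure group,
    # each mapping a gesture name to its stored 30-point template.
    gesture_databases = {
        "Fat-Tall": {},
        "Fat-Short": {},
        "Thin-Tall": {},
        "Thin-Short": {},
    }

    def select_database(body_group: str) -> dict:
        """Adaptively pick the gesture database matching the performer's group."""
        return gesture_databases[body_group]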
In the preliminary experiments, the classification of gestural motions is based on the body size of the performer and uses Neural Networks (NNs), which can be applied to the classification of motion data. Human body size is defined by the length from the center of the body to the head and the length between the two shoulders. In the measurements, reflective markers are attached to the body of the performer to measure the distance from the center of the body to the top of the head and the distance between the two shoulders. In the experiments, a group of male and female subjects is chosen. The subjects are divided into four groups: Fat-Tall, Fat-Short, Thin-Tall and Thin-Short, as shown in Table I. In the recognition phase, the body structure of the performer is first scanned, followed by the selection of the gesture database that suits his or her body structure. Figure 5 shows the configuration of the recognition system.
Fig. 5. Flow of adaptive gesture recognition process.
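A small sketch of how the NN-based grouping might look, assuming the two body measurements (centre of body to top of head, shoulder to shoulder) as input features and the four groups as labels. scikit-learn's MLPClassifier and the sample values are assumptions for illustration, not the authors' setup.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    # Hypothetical training data: [centre-of-body-to-head length, shoulder
    # width] in metres, labelled with one of the four body-structure groups.
    X = np.array([[0.90, 0.48], [0.72, 0.47], [0.91, 0.38], [0.70, 0.36]])
    y = ["Fat-Tall", "Fat-Short", "Thin-Tall", "Thin-Short"]

    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
    clf.fit(X, y)

    # Classify a newly scanned performer; the matching gesture database can
    # then be selected as in the earlier sketch.
    print(clf.predict([[0.88, 0.46]])[0])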
V. EXPERIMENT
A. Experimental setup
To acquire input data for the experiment, a Qualisys™
Motion Capture System was used as shown in Fig. 6. The
system uses five high speed cameras arranged around a
subject to create a 3-D coordinate. Reflective markers were
used to highlight desired points of interest. The cameras used
in the system are Oqus FX ProReflex high-speed camera.
They were the backbone of the Qualisys motion capture
system. They offer extreme precision and real-time
capabilities. It was capable of capturing up to 200 frames per
second. Figure 7 shows 10 geometrical forms and 20
subjects with different appearances will choose to present 10
geometrical gestures. Each gesture was repeated 10 times in
the experiments.
Fig. 6. An optical motion capture system
TABLE I
The four groups of people performing the gestures
Fig. 7. The geometrical gestures used in the experiment
B. Experimental results
The captured raw data consist of 600 frames of time-based data, as shown in Fig. 8(a). There are many points along the x, y and z trajectories, which makes comparing points of the data difficult. A simplification process is therefore introduced, in which the raw data are resampled from 600 points to 30 points; the resampling calculation follows equations (1), (2) and (4). Figure 8(b) shows the resampled x, y and z data of the 3D motion data for the gesture "circle" given by subject #1 in group Ω. In each graph of the resampled data, 30 reference points are resampled and normalized to the range between 0 and 1, so that comparison between reference points can be done in a simpler manner. Each reference point corresponds to one resampling point.
Figure 9 shows the result of the average calculation between the x, y and z coordinates for each reference point, represented as the distance between two consecutive points of the averaged resampled three-dimensional data. Twenty-nine distance values were obtained from the averaged resampled data; these can be used to represent the differences in the gesture database for the variety of data collected. Differences in human physical features contribute to differences in gesture motions. The average distances for the gesture "circle" for group Ω given by five subjects are shown in Fig. 10. From the data produced by the five different subjects in group Ω, the three-dimensional x, y, z distance data were calculated to find the average.
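This distance-and-average step can be sketched as follows, reusing the resampled (30, 3) trajectories from the earlier sketch; the function names and the choice of Euclidean distance are illustrative assumptions.

    import numpy as np

    def reference_distances(resampled: np.ndarray) -> np.ndarray:
        """Distances between the 30 consecutive reference points (29 values)."""
        return np.linalg.norm(np.diff(resampled, axis=0), axis=1)

    def group_average(trajectories: list) -> np.ndarray:
        """Average the 29 distance values over all subjects in one group."""
        return np.mean([reference_distances(t) for t in trajectories], axis=0)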
Referring to Fig. 11, all data from the four groups of human physical characteristics were placed together to measure their similarity. The comparison is summarized in Table II, which shows the number of groups that could be distinguished at each resampling point, from resampling point #1 to resampling point #29. There were 17 resampling points with 100% separability, followed by 9 resampling points with 75% and 3 resampling points with 50%. Hence, the results show that gesture motions can be classified into the four groups of different people, which were Ω (Fat-Short), α (Fat-Tall), ∆ (Thin-Short) and β (Thin-Tall). The average percentage for classifying humans into the four groups was 87%.
Therefore, gesture databases for the four groups of humans could be designed for gesture recognition. For all of these gesture databases, the comparison can be done in a simpler manner based on each resampling point along the trajectories in Fig. 11. Table III shows the possibility of humans being grouped into 1, 2, 3 or 4 groups. The results show that humans could be grouped into 4 groups with a high possibility of 58.6%.
Fig. 8. Comparison of the resampling process for 3D motion data X, Y, Z for the gesture "circle" given by subject #1 in group Ω
Fig. 9. The distance between two consecutive points of the average resampled x, y, z data for the gesture "circle" for subject #1 in group Ω
Fig. 10. Average distance for the gesture "circle" for group Ω given by five subjects
Fig. 11. Comparison result between the four groups of gesture database for the gesture "circle"
TABLE II
Classification of humans based on motion trajectories, based on the appearance of the graphs in Fig. 11

Resampling point | Possible no. of groups (max = 4) | Percentage to classify humans into 4 groups (%)
 1 | 3 |  75
 2 | 2 |  50
 3 | 4 | 100
 4 | 4 | 100
 5 | 4 | 100
 6 | 4 | 100
 7 | 3 |  75
 8 | 4 | 100
 9 | 4 | 100
10 | 2 |  50
11 | 4 | 100
12 | 4 | 100
13 | 4 | 100
14 | 4 | 100
15 | 3 |  75
16 | 3 |  75
17 | 3 |  75
18 | 4 | 100
19 | 4 | 100
20 | 4 | 100
21 | 3 |  75
22 | 4 | 100
23 | 4 | 100
24 | 3 |  75
25 | 4 | 100
26 | 4 | 100
27 | 3 |  75
28 | 3 |  75
29 | 2 |  50
Average (%) |   | 87.1
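As a quick check, the 87.1% average in Table II follows directly from the per-point counts (17 resampling points at 100%, 9 at 75% and 3 at 50%):

    # 17 points at 100%, 9 at 75%, 3 at 50% over the 29 resampling points.
    average = (17 * 100 + 9 * 75 + 3 * 50) / 29
    print(round(average, 1))  # 87.1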
TABLE III
The possibility of humans being grouped into 1, 2, 3 or 4 groups

No. | No. of groups | Percentage classification (%)
1 | 1 Group  |  0
2 | 2 Groups | 10.3
3 | 3 Groups | 24.1
4 | 4 Groups | 58.6

VI. CONCLUSIONS
This study proposes a new feature extraction algorithm for the classification of motion data. In the experiment, a reflective marker was attached to a finger of the performer's hand. The performers gave the geometrical gestures "circle", "wave", "triangle", "rectangle", "semicircle", "vertical", "star", "love sign", "zigzag" and "diamond". A high-speed motion capture system was used to obtain the motion data. The proposed resampling algorithm was proven to work well for classifying various motion patterns, and it could be used in the development of gesture recognition systems. The results show that humans can be classified into four groups based on their body structure. By using the proposed adaptive gesture database, the system can robustly choose the database that suits the body structure of the performer.
ACKNOWLEDGMENT
This work is supported by the Fundamental Research Grant Scheme (FRGS) awarded by the Ministry of Higher Education to Universiti Malaysia Perlis (FRGS 9003-00313) and by a Short Term Research Grant Scheme (STG 9001-00363) from Universiti Malaysia Perlis.
REFERENCES
[1] Ho-Sub Yoon, Jung Soh, Younglae J. Bae et al., "Hand gesture recognition using combined features of location, angle and velocity," Image Processing Division, Computer & Software Technology Lab., pp. 305-350, 1999.
[2] J. S. Kim, C. S. Lee, K. J. Song et al., "Real-time hand gesture recognition for avatar motion control," in Proceedings of HCI'97, 1997.
[3] S. Seki, K. Takahashi and R. Oka, "Gesture recognition from motion images by spotting algorithm," ACCV'93, 1993.
[4] Heung-Il Suk, Bong-Kee Sin and Seong-Whan Lee, "Hand gesture recognition based on dynamic Bayesian network framework."
[5] V. Pavlovic, R. Sharma and T. Huang, "Visual interpretation of hand gestures for human-computer interaction," University of Illinois at Urbana-Champaign, 1999.
[6] H. Avilés-Arriaga, L. Sucar and C. Mendoza, "Visual recognition of similar gestures," Hong Kong, 2006.
[7] Mohd Azri ABD AZIZ, Khairunizam WAN, Shahriman AB, Siti Khadijah ZA'ABA, Abdul Halim ISMAIL, M. K. Ali HASSAN and M. Nasir AYOB, "A real time hand tracking for virtual object manipulation," Malaysian Technical Universities International Conference on Engineering & Technology (MUiCET 2011).
[8] Khairunizam Wan and H. Sawada, "Dynamic gesture recognition based on the probabilistic distribution of arm trajectories," Mechatronics and Automation, ICMA 2008, IEEE, pp. 426-431.
[9] Nazrul H. ADNAN, Khairunizam WAN and Shahriman AB, "Accurate measurement of the force sensor for intermediate and proximal phalanges of index finger," International Journal of Computer Applications, 45(15):59-65, 2012.
[10] Kye Kyung Kim, Keun Chang Kwak and Su Young Chi, "Gesture analysis for human-robot interaction," ICACT 2006.
[11] Khairunizam Wan, Nazrul Hamizi Bin Adnan, Shahriman AB, Siti Khadijah Za'aba, Mohd Azri ABD AZIZ and Zulkifli Md. Yusof, "Gesture recognition based on hand postures and trajectories by using dataglove: a fuzzy probability approach – a review," ICoMMS 2012.
[12] Simon Haykin, Neural Networks and Learning Machines, 3rd ed., Pearson, New Jersey, 2009.