MOOD DETECTION OF A PERSON THROUGH FACE
EXPRESSION AND LIP MOVEMENTS
Nandita Tiwari and Dr. Dinesh Chandra Jain
Research Scholar, Professor
Computer Science and Engineering, SIRTS Bhopal
nandita.0119@gmail.com
dineshwebsys@gmail.com
Abstract – Mood detection of a person has become a new field of interest within Artificial Intelligence. A variety of applications exist today that require mood detection of a person. Mood detection can be done on the basis of facial expression, speech, lip movement, etc. In this paper, we focus on detecting the mood of a person from facial expressions and speech. For this purpose we concentrate on the motion of the lips of that person in various conditions. The main part of this paper focuses on the speed of speech and the pitch of voice in order to judge the mood. This paper proposes the idea of detecting the mood of a person on the basis of his voice and facial expression; future robots may also contain this idea as an in-built function. We have focused on six basic moods of a person in this paper: happy, sad, surprise, disgust, angry and fear. For facial expression recognition the SpE2DPCA algorithm is used, whereas for pitch detection AMDF is used. The database of this research contains video recordings of persons in different emotional conditions. The conclusions of this research can be used in making smart robots that are capable of understanding the mood of a person with the help of their speech and lip movement.
Keywords – Mood detection, SpE2DPCA, AMDF, pitch detection.
I. INTRODUCTION
Mood detection of a person is one of the most rapidly emerging areas of research in the fields of Artificial Intelligence, Robotics and medical applications. It is not an easy task to detect the mood of a person using machines; certain limitations exist, such as age and similar facial features.
Voice is a verbal means of communication, whereas facial expression is a non-verbal means of communication. With the mood, the lip movement and pitch of voice of a person are also affected. For instance, if a person is angry or hyper, his lip movement may become very fast and his pitch of voice may become very high while speaking; if he is sad, he may speak with a comparatively low pitch of voice and slower lip movement. When a person is fearful or surprised, he may not speak at all, or he may speak very little in a very low voice.

A speech signal is introduced into a medium by a vibrating object, such as the vocal folds in the throat. This is the source of the disturbance that moves through the medium. Each spoken word is created using the phonetic combination of a set of vowel, semivowel and consonant speech sound units. Different stress is applied by the vocal cords of a person for a particular emotion [6].
Facial expressions have a considerable effect on a listening interlocutor: the facial expression of a speaker accounts for about 55 percent of the effect, while 38 percent is conveyed by voice intonation and 7 percent by the spoken words. As a consequence of the information that they carry, facial expressions can play an important role wherever humans interact with machines [1].
Facial expressions are generated by contractions of facial muscles, which result in temporarily deformed facial features such as the eyelids, eyebrows, nose, lips and skin texture, often revealed by wrinkles and bulges. Facial expressions give us information about the emotional state of the person. Moreover, these expressions help in understanding the overall mood of the person in a better way [7].
The basic moods of a person are HAPPY, SAD, SURPRISE, DISGUST, ANGRY and FEAR. In the present scenario, robots do not work on voice pitch and speech together. We propose a new concept in robotics: a robot that can recognize speech using the lip movement of its owner. The robot may sense the mood of its owner from the lip movement and can then act accordingly.
Robots of the future will be smart enough to understand the mood of their owner with the help of lip movement. The voice of the person will also be used in this context. Robots will observe the pitch and speed of speech in order to understand the mood of their owner.
II. ALGORITHM
We have performed experiments on a real-time database. The database contains video recordings of a person in the six basic moods; the different recordings of the person represent different moods based on the created atmosphere.
For feature extraction the SpE2DPCA algorithm is used, whereas for detecting the speed of lip movement and the pitch of voice we have used the AMDF (Average Magnitude Difference Function). The combined results of both algorithms give the required result.
The methodology used to detect the different moods is as follows:
Step 1: Create the environment for each mood so that the person being captured speaks according to that environment.
Step 2: Capture videos of the person in all the environments.
Step 3: Study the speed of lip movement and the pitch of voice.
Step 4: Compare the results of lip movement and voice across all the environments.
Step 5: Conclude the results.
III. DATA FLOW DIAGRAM
Create environment for different moods → Capture the videos in all the environments → Study the speed of lip movement and calculate the pitch and speed of voice → Apply techniques like SpE2DPCA and AMDF for the study of expressions and lip movement → Compare the results in all the environments → Compare the results of all the moods and conclude them
Fig.2: Data Flow Diagram showing the steps to be
followed
IV. METHOD
For performing our experiment, we first need to create different environments so that different moods can be established. For this purpose we will, for instance, display a humorous act in front of the persons with whom we are conducting our experiment. After the completion of the act we will ask them to start a conversation. While the conversation continues, we will capture their video recordings as the input for our experiment.
In the next step, the study of lip movement and voice is performed in order to calculate the results. The study of facial expression (lip movement) will be done using SpE2DPCA (Sub-pattern Extended 2-Dimensional Principal Component Analysis). For studying the recorded voice, we will apply AMDF in order to calculate the speed and pitch of speech. The results of SpE2DPCA and
AMDF collectively will give the final results.
Now perform the above steps for all the
different moods, so that the results for all the
moods can be compared at the end. Each
time, the created environment will give a
different result. The results for each mood
type vary greatly. Comparison between all
types of moods will be done in order to
conclude the results.
Fig.1: Six basic human facial expressions (happy, disgust, surprise, sad, fear, angry)
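The paper states only that the results of SpE2DPCA and AMDF are combined; it does not specify the fusion rule. As a purely hypothetical illustration (the function names and the nearest-neighbour rule below are assumptions, not the authors' method), the two feature sources could be concatenated and a new recording labelled by its closest labelled reference recording:

```python
import numpy as np

# The six basic moods considered in the paper.
MOODS = ["happy", "sad", "surprise", "disgust", "angry", "fear"]

def combine_features(face_features, pitch_hz, lip_speed):
    """Concatenate SpE2DPCA facial features with the AMDF pitch estimate and
    a lip-movement speed measure into one feature vector (hypothetical)."""
    return np.concatenate([np.asarray(face_features, dtype=float),
                           [pitch_hz, lip_speed]])

def classify_mood(sample, reference_samples, reference_labels):
    """Label a recording with the mood of its nearest reference recording."""
    distances = [np.linalg.norm(sample - ref) for ref in reference_samples]
    return reference_labels[int(np.argmin(distances))]
```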
V. TECHNIQUES
The human facial expression recognition problem is composed of three sub-problems [8]:
(1) finding the face in the view,
(2) extracting the facial features and/or analysing how the facial features change over time, and assigning this information to facial expression interpretation categories (for example, emotions or facial muscle actions found in the facial area), and
(3) the face-finding problem itself, which can be seen as a segmentation problem (machine vision) or a localization problem (pattern recognition): identifying all regions in the view that contain a human face, in spite of head pose, occlusions and variations.
For achieving the goal of our research, two major techniques are used: Sub-pattern Extended 2-Dimensional Principal Component Analysis (SpE2DPCA) and the Average Magnitude Difference Function (AMDF). SpE2DPCA will be used for the study of facial expression, i.e., the movement of lips.
SpE2DPCA [4] is introduced for colour space. The recognition rate of SpE2DPCA is higher than that of PCA, 2DPCA and E2DPCA. Multilinear Image Analysis uses the tensor concept and is introduced to work with different lighting conditions and other distractions. The recognition rate of PCA is low and PCA suffers from the small sample size problem. For gray facial expression recognition, 2DPCA is extended to Extended 2DPCA (E2DPCA), but E2DPCA is not applicable to colour images [5]. Therefore Sub-pattern Extended 2-Dimensional PCA (SpE2DPCA) is introduced for colour face recognition [5]. Its recognition rate is higher than that of PCA, 2DPCA and E2DPCA, and the small sample size problem of PCA is also eliminated [5].
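As a rough illustration of the sub-pattern idea, the sketch below (hypothetical Python/NumPy code, not taken from [4] or [5]) splits each grayscale face image into blocks and applies plain 2DPCA to every block position, concatenating the projected coefficients into a single feature vector. The block grid, the number of retained eigenvectors and the function names are illustrative assumptions; the colour (SpE2DPCA) variant could be approximated by repeating the procedure per colour channel and concatenating the results.

```python
import numpy as np

def two_dpca_axes(images, k):
    """Plain 2DPCA: build the image covariance (scatter) matrix over a stack
    of 2-D images and return the k leading projection axes (columns)."""
    mean = images.mean(axis=0)                      # mean image, shape (h, w)
    g = np.zeros((images.shape[2], images.shape[2]))
    for img in images:
        d = img - mean
        g += d.T @ d                                # accumulate (w, w) scatter
    g /= len(images)
    eigvals, eigvecs = np.linalg.eigh(g)            # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]                  # k leading eigenvectors

def sub_pattern_2dpca_features(images, grid=(2, 2), k=5):
    """Sub-pattern variant (sketch): run 2DPCA separately on each image block
    and concatenate the projected features, one row per image."""
    n, h, w = images.shape
    bh, bw = h // grid[0], w // grid[1]
    features = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            sub = images[:, i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            axes = two_dpca_axes(sub, k)            # (bw, k) projection axes
            proj = sub @ axes                       # (n, bh, k) projected blocks
            features.append(proj.reshape(n, -1))
    return np.concatenate(features, axis=1)
```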
The Average Magnitude Difference Function (AMDF) method [L1] is a type of autocorrelation analysis. Instead of correlating the input speech at various delays (where multiplications and summations are formed at each delay value), a difference signal is formed between the delayed speech and the original, and at each delay value the absolute magnitude is taken. For a frame of N samples, the short-term difference function AMDF is defined as

D(m) = (1/N) Σ_n |x(n) - x(n-m)|

where x(n) are the samples of the analysed speech frame, x(n-m) are the samples time-shifted by m samples and N is the frame length [L1]. The difference function is expected to have a strong local minimum if the lag m is equal to or very close to the fundamental period. Figure 3 depicts values of the AMDF function for a voiced frame.

Fig.3: AMDF function of a voiced frame of speech
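A direct NumPy transcription of this definition might look as follows; whether one divides by N or by the number of overlapping samples at each lag varies between formulations, so the averaging below is an implementation choice, not a detail fixed by the paper.

```python
import numpy as np

def amdf(frame, max_lag):
    """Average magnitude difference function of one speech frame:
    d[m] is the mean of |x(n) - x(n-m)| over the overlapping samples."""
    n = len(frame)
    d = np.zeros(max_lag + 1)
    for m in range(1, max_lag + 1):
        # mean absolute difference between the frame and its m-sample shift
        d[m] = np.mean(np.abs(frame[m:] - frame[:n - m]))
    return d
```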
A PDA (pitch detection algorithm) based on the average magnitude difference function has the advantage of relatively low computational cost and simple implementation. Unlike the autocorrelation function, the AMDF calculation requires no multiplications [L1]. This is a desirable property for real-time applications.
The procedure of processing operations for an AMDF-based pitch detector is quite similar to the NCCF algorithm. After segmentation, the signal is pre-processed by low-pass filtering to remove the effects of intensity variations and background noise [L1]. Then the average magnitude difference function is computed on the speech segment at lags running from 16 to 160 samples. The pitch period is identified as the value of the lag at which the minimum AMDF occurs. In addition to the pitch estimate, the ratio between the maximum and minimum values of the AMDF (MAX/MIN) is obtained. This measurement, together with the frame energy, is used to make a voiced/unvoiced decision [L1]. In transition segments between voiced, unvoiced or silence regions some determination errors may occur; F0 doubling or halving errors are the most frequent. Therefore median filtering is used in the AMDF-based PDA [L1].
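Putting these steps together, a minimal AMDF-based pitch detector could be sketched as below. The low-pass cutoff, frame length, hop size and the MAX/MIN voicing threshold are assumptions made for illustration; the paper and [L1] fix only the lag range of 16 to 160 samples and the use of the MAX/MIN ratio, frame energy and median filtering, and for simplicity the sketch bases the voiced/unvoiced decision on the ratio alone.

```python
import numpy as np
from scipy.signal import butter, lfilter, medfilt

def amdf_pitch_track(signal, fs, frame_len=400, hop=200,
                     min_lag=16, max_lag=160, ratio_threshold=2.0):
    """Sketch of an AMDF-based pitch detector (illustrative parameters)."""
    # Pre-processing: low-pass filter to reduce intensity variations and noise.
    b, a = butter(4, 900.0 / (fs / 2.0), btype="low")
    x = lfilter(b, a, signal)

    lags = np.arange(min_lag, max_lag + 1)
    pitch = []
    for start in range(0, len(x) - frame_len, hop):
        frame = x[start:start + frame_len]
        # AMDF over the lag range 16..160 samples.
        d = np.array([np.mean(np.abs(frame[m:] - frame[:frame_len - m]))
                      for m in lags])
        period = lags[np.argmin(d)]                    # lag of the AMDF minimum
        voiced = d.max() / (d.min() + 1e-12) > ratio_threshold
        pitch.append(fs / period if voiced else 0.0)   # 0.0 marks unvoiced frames

    # Median filtering suppresses F0 doubling/halving errors.
    return medfilt(np.array(pitch), kernel_size=5)
```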
VI. SOFTWARE REQUIREMENTS
1. MATLAB – version 8.1, 8.2, 8.3 or 8.4.
2. Database – video recordings of persons in MP4 format.
3. Operating System – Microsoft Windows family, Linux, or Mac OS X.
VII. HARDWARE REQUIREMENTS
Following are the system requirements for Windows when using the 32-bit and 64-bit MATLAB and Simulink product families:
1. Processor
Any Intel or AMD x86 processor
supporting SSE2 instruction set.
2. Disk Space
1 GB for MATLAB only,
3 – 4 GB for a typical installation.
3. Graphics
No specific graphics card is required.
Hardware accelerated graphics card
supporting OpenGL 3.3 with 1GB
GPU memory recommended.
4. RAM
1024 MB (At least 2048 MB
recommended)
VIII. ADVANTAGES
1. The results of this experiment are useful in detecting the mood of a person in different emotional conditions.
2. This technique can be used to make smart robots which are capable of sensing the mood of a person and acting accordingly [2].
3. This may also help doctors in treating speechless patients. The emotional state of such patients can be understood in order to provide them with better treatment and counselling.
IX. APPLICATIONS
1. The results of this experiment may prove useful for doctors or psychiatrists in understanding the behaviour and emotions of their patients.
2. Smart robots can be made which possess a human-like understanding of emotions and facial expressions, so that they can act accordingly.
3. Many speechless people will benefit from the success of this research.
X. FUTURE SCOPE
In this paper, we have discussed detecting the mood of a person with the help of his facial expressions and voice. This technique may prove helpful in designing machines which can understand verbal commands. For instance, this technique can be used in robots which are capable of understanding the mood of a person on the basis of his facial expressions and voice.
XI. RESULTS AND CONCLUSIONS
The results of this experiment vary greatly from mood to mood. The AMDF method has the great advantage of very low computational complexity, which makes it possible to implement it in real-time applications [L1]. Due to its low computational complexity and fine estimation performance, AMDF can be realized without much difficulty and is often applied in real-time environments such as speech coders [3].
Because of the nonstationarity of the speech signal, AMDF is generally implemented in a short-time form. This can easily cause pitch period detection errors, since the number of overlapping samples decreases as the shift of the speech signal increases, which causes a descent of the peak values along with the function values of the AMDF [3].
The pitch of voice, lip movement and facial expressions differ from one another while speaking in each of the above emotional atmospheres. When the person is happy or surprised the pitch of voice is high; the lips move faster when the person is happy, but the lip movement is slower when the person is surprised.
When a person is in disgust the lip movement is faster and the pitch of voice is also high. When a person is angry the speed of speech is the fastest, i.e., the lip movement has its maximum speed, and the pitch is also the highest. When the person is sad, the lip movement is the slowest and the pitch of voice is the lowest of all the cases. In case of fear, the pitch of voice is lower whereas the lip movement is faster.
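The qualitative observations above amount to a simple lookup; the paper gives no numeric thresholds, so the labels in the following sketch are only a restatement of the text, not measured values.

```python
# Qualitative mood profiles as stated in the text (no quantified thresholds).
MOOD_PROFILE = {
    "happy":    {"pitch": "high",    "lip_speed": "fast"},
    "surprise": {"pitch": "high",    "lip_speed": "slower"},
    "disgust":  {"pitch": "high",    "lip_speed": "fast"},
    "angry":    {"pitch": "highest", "lip_speed": "fastest"},
    "sad":      {"pitch": "lowest",  "lip_speed": "slowest"},
    "fear":     {"pitch": "low",     "lip_speed": "fast"},
}

def moods_matching(pitch_level, lip_speed_level):
    """Return the moods whose stated profile matches the given labels."""
    return [mood for mood, profile in MOOD_PROFILE.items()
            if profile["pitch"] == pitch_level
            and profile["lip_speed"] == lip_speed_level]
```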
Based on the above research, smart technologies for the detection of mood (or emotional state) can be implemented for making smart robots.
This research may also help in the field of medical research. Mood detection based on lip movement can help doctors treat speechless patients.
ACKNOWLEDGEMENT
I would like to thank my guide Dr. Dinesh
Chandra Jain who helped me in preparing
this paper. His guidance and experienced
suggestions made this paper a success.
REFERENCES
[1] Rajneesh Singla, “A new approach for mood detection via using Principal Component Analysis and Fisherface Algorithm”, Journal of Global Research in Computer Science, Volume 2, No. 7, July 2011.
[2] Vaibhavkumar J. Mistry, Mahesh M. Goyani, “A literature survey on Facial Expression Recognition using Global Features”, International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958, Volume 2, Issue 4, April 2013.
[3] Huan Zhao, Wenjie Gan, “A New Pitch Estimation Method Based on AMDF”, Journal of Multimedia, Volume 8, No. 5, October 2013.
[4] Jyoti Rani, Kanwal Garg, “Emotion Detection Using Facial Expressions – A Review”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 4, April 2014.
[5] Aswathy R., “A Literature review on Facial Expression Recognition Techniques”, IOSR Journal of Computer Engineering (IOSR-JCE), Volume 11, Issue 1, May–June 2013.
[6] A. A. Khulage, B. V. Pathak, “Analysis of speech under Stress using Linear Techniques and Non-Linear Techniques for Emotion Recognition System”.
[7] Surbhi, Vishal Arora, “The Facial expression detection from Human Facial Image by using neural network”, International Journal of Application or Innovation in Engineering & Management (IJAIEM), Volume 2, Issue 6, June 2013.
[8] Shyna Dutta, V. B. Baru, “Review of Facial Expression Recognition System and Used Datasets”, IJRET: International Journal of Research in Engineering and Technology, Volume 2, Issue 12, December 2013, available at http://www.ijret.org.

LINKS
[L1] “Performance Evaluation of Pitch Detection Algorithms”, http://access.feld.cvut.cz/view.php?cisloclanku=2009060001