MOOD DETECTION OF A PERSON THROUGH FACE EXPRESSION AND LIP MOVEMENTS

Nandita Tiwari and Dr. Dinesh Chandra Jain
Research Scholar, Professor
Computer Science and Engineering, SIRTS Bhopal
nandita.0119@gmail.com, dineshwebsys@gmail.com

Abstract – Mood detection of a person has become a new field of interest within Artificial Intelligence. A variety of applications exist today that require the mood of a person to be detected. Mood detection can be performed on the basis of facial expression, speech, lip movement, etc. In this paper, we focus on detecting the mood of a person from facial expressions and speech. For this purpose we concentrate on the motion of the person's lips under various conditions. The main part of this paper focuses on the speed of speech and the pitch of voice in order to judge the mood. This paper proposes the idea of detecting the mood of a person on the basis of his voice and facial expression; future robots may also contain this idea as a built-in function. We focus on six basic moods of a person in this paper: happy, sad, surprise, disgust, angry and fear. For facial expression recognition the SpE2DPCA algorithm is used, whereas for pitch detection AMDF is used. The database of this research contains video recordings of persons in different emotional conditions. The conclusions of this research can be used in making smart robots that are capable of understanding the mood of a person with the help of speech and lip movement.

Keywords – Mood detection, SpE2DPCA, AMDF, pitch detection.

I. INTRODUCTION

Mood detection of a person is one of the emerging areas of research in the fields of Artificial Intelligence, Robotics and medical applications. It is not an easy task to detect the mood of a person using machines, and certain limitations exist, such as age and similar facial features. Voice is a verbal means of communication, whereas facial expression is a non-verbal means of communication. With the mood, the lip movement and the pitch of voice of a person are also affected. For instance, if a person is angry or agitated, his lip movement may become very fast and the pitch of his voice may become very high while speaking. If he is sad, he may speak with a comparatively low pitch of voice and slower lip movement. When a person is fearful or surprised, he may not speak at all, or he may speak very little and in a very low voice.

A speech signal is introduced into a medium by a vibrating object, such as the vocal folds in the throat. This is the source of the disturbance that moves through the medium. Each spoken word is created using the phonetic combination of a set of vowel, semivowel and consonant speech sound units. Different stress is applied by the vocal cords of a person for a particular emotion [6].

The facial expressions have a considerable effect on a listening interlocutor; the facial expression of a speaker accounts for about 55 percent of the effect, 38 percent is conveyed by voice intonation and 7 percent by the spoken words. As a consequence of the information they carry, facial expressions can play an important role wherever humans interact with machines [1]. Facial expressions are generated by contractions of facial muscles, which result in temporarily deformed facial features such as the eyelids, eyebrows, nose, lips and skin texture, often revealed by wrinkles and bulges. Facial expressions give us information about the emotional state of a person, and these expressions help in understanding the overall mood of the person in a better way [7]. The basic moods of a person are HAPPY, SAD, SURPRISE, DISGUST, ANGRY and FEAR.

In the present scenario, robots do not work on voice pitch and speech together. We propose a new concept in robotics: a robot that can recognize speech using the lip movement of its owner. The robot may sense the mood of its owner from the lip movement and can then act accordingly.
Robots of the future will be smart enough to understand the mood of their owner with the help of lip movement. The voice of the person will also be used in this context: robots will observe the pitch and speed of speech in order to understand the mood of their owner.

II. ALGORITHM

We have performed experiments on a real-time database. The database contains video recordings of the six basic moods of a person; the different recordings of the person represent different moods based on the created atmosphere. For feature extraction the SpE2DPCA algorithm is used, whereas for detecting the speed of lip movement and the pitch of voice we have used AMDF (Average Magnitude Difference Function). The combined results of both algorithms give the required result. The methodology used to detect different moods is as follows:

Step 1: Create the environment for each mood so that the person being captured speaks according to that environment.
Step 2: Capture the videos of all the environments.
Step 3: Study the speed of lip movement and the pitch of voice.
Step 4: Compare the results of lip movement and voice in all the environments.
Step 5: Conclude the results.

III. DATA FLOW DIAGRAM

Create environment for different moods → Capture the videos in all the environments → Study the speed of lip movement and calculate the pitch and speed of voice → Apply techniques like SpE2DPCA and AMDF for the study of expressions and lip movement → Compare the results in all the environments → Compare the results of all the moods and conclude them.

Fig.2: Data Flow Diagram showing the steps to be followed

IV. METHOD

For performing our experiment, we first need to create different environments so that different moods can be established. For instance, we display a humorous act in front of the persons with whom we are conducting our experiment. After the completion of the act we ask all of them to start a conversation.
While the conversation continues, we capture their video recordings as the input for our experiment. In the next step, the lip movement and voice are studied in order to calculate the results. The study of facial expression (lip movement) is done using SpE2DPCA (Sub-pattern Extended 2-Dimensional Principal Component Analysis). For studying the recorded voice, we apply AMDF in order to calculate the speed and pitch of speech. The results of SpE2DPCA and AMDF collectively give the final result. The above steps are then performed for all the different moods, so that the results for all the moods can be compared at the end. Each time, the created environment gives a different result, and the results for each mood type vary greatly. A comparison between all types of moods is done in order to conclude the results.

Fig.1: Six Basic Human Facial Expressions (happy, disgust, surprise, sad, fear, angry)

V. TECHNIQUES

The human facial expression recognition problem is composed of three sub-problems [8]: (1) finding faces in the scene, (2) extracting the facial features and/or analysing the changes of those features, and (3) classifying this information into facial expression interpretation categories (for example, emotions or facial muscle actions). Finding faces can be seen as a segmentation problem (in machine vision) or a localization problem (in pattern recognition): it refers to identifying all regions in the scene that contain a human face, irrespective of head pose, occlusions and variations.

For achieving the goal of our research two major techniques are used: Sub-pattern Extended 2-Dimensional Principal Component Analysis (SpE2DPCA) and the Average Magnitude Difference Function (AMDF). SpE2DPCA is used for the study of facial expression, i.e., the movement of the lips. SpE2DPCA [4] is introduced for colour space; the recognition rate of SpE2DPCA is higher than that of PCA, 2DPCA and E2DPCA.
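To make the 2DPCA family concrete, the following is a minimal sketch of plain 2DPCA feature extraction, the building block that SpE2DPCA extends with sub-pattern division and colour channels. This is an illustrative Python/NumPy sketch under our own naming (`fit_2dpca`, `project_2dpca`), not the paper's MATLAB implementation:

```python
import numpy as np

def fit_2dpca(images, d):
    """Plain 2DPCA: eigenvectors of the image scatter matrix as a basis.

    images: array of shape (M, m, n) holding M face images of size m x n.
    d: number of projection axes to keep.
    Returns the mean image and the n x d projection matrix X.
    """
    mean = images.mean(axis=0)
    centered = images - mean
    # Image scatter matrix G = (1/M) * sum_i (A_i - mean)^T (A_i - mean)
    G = np.einsum('ima,imb->ab', centered, centered) / len(images)
    vals, vecs = np.linalg.eigh(G)              # ascending eigenvalues
    X = vecs[:, np.argsort(vals)[::-1][:d]]     # keep the top-d eigenvectors
    return mean, X

def project_2dpca(image, mean, X):
    """Feature matrix Y = (A - mean) X of shape m x d (one row per image row)."""
    return (image - mean) @ X

# Toy usage on random "images" just to show the shapes involved
rng = np.random.default_rng(0)
imgs = rng.normal(size=(10, 8, 6))
mean, X = fit_2dpca(imgs, 2)
Y = project_2dpca(imgs[0], mean, X)
print(X.shape, Y.shape)
```

Classification on top of these `Y` feature matrices is typically done with a nearest-neighbour distance between feature matrices; SpE2DPCA additionally splits each image into sub-patterns and applies this procedure per sub-pattern and per colour channel.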
Multilinear Image Analysis uses the tensor concept and is introduced to work with different lighting conditions and other distractions. The recognition rate of PCA is low, and PCA suffers from the small sample size problem. For grey-scale facial expression recognition, 2DPCA is extended to Extended 2DPCA (E2DPCA), but E2DPCA is not applicable to colour images [5]. Therefore Sub-pattern Extended 2-Dimensional PCA (SpE2DPCA) is introduced for colour face recognition [5]. Its recognition rate is higher than that of PCA, 2DPCA and E2DPCA, and the small sample size problem of PCA is also eliminated [5].

The Average Magnitude Difference Function (AMDF) [L1] method is a type of autocorrelation analysis. Instead of correlating the input speech at various delays (where multiplications and summations are formed at each value), a difference signal is formed between the delayed speech and the original, and at each delay value the absolute magnitude is taken. For a frame of N samples, the short-term difference function AMDF is defined as

    AMDF(m) = (1/N) * Σ |x(n) − x(n−m)|

where x(n) are the samples of the analysed speech frame, x(n−m) are the samples time-shifted by m samples, and N is the frame length [L1]. The difference function is expected to have a strong local minimum when the lag m is equal to, or very close to, the fundamental period. Figure 3 depicts the values of the AMDF for a voiced frame. A pitch detection algorithm (PDA) based on the average magnitude difference function has the advantage of relatively low computational cost and simple implementation: unlike the autocorrelation function, the AMDF calculation requires no multiplications [L1]. This is a desirable property for real-time applications. The procedure of processing operations for an AMDF-based pitch detector is quite similar to the NCCF algorithm. After segmentation, the signal is pre-processed by low-pass filtering to remove the effects of intensity variations and background noise [L1]. Then the average magnitude difference function is computed on the speech segment at lags running from 16 to 160 samples.
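The AMDF definition and the lag-minimum rule can be sketched in a few lines. This is an illustrative Python sketch under our own naming (`amdf`, `amdf_pitch`); the lag range 16–160 follows the text, and in the demo the upper lag is restricted to avoid the F0-halving ambiguity that an exactly periodic test signal exhibits:

```python
import numpy as np

def amdf(frame, lags):
    """AMDF(m) = mean over n of |x(n) - x(n-m)|, one value per lag m.

    Computed over the overlapping part of the frame and its m-sample shift;
    note that only subtractions and absolute values are needed, no multiplies.
    """
    n = len(frame)
    return np.array([np.mean(np.abs(frame[m:] - frame[:n - m])) for m in lags])

def amdf_pitch(frame, fs, min_lag=16, max_lag=160):
    """Estimate the pitch period as the lag of the AMDF minimum."""
    lags = np.arange(min_lag, max_lag + 1)
    d = amdf(frame, lags)
    period = lags[np.argmin(d)]      # samples per pitch period
    return period, fs / period       # (lag, fundamental frequency in Hz)

# Synthetic voiced frame: 100 Hz periodic signal sampled at 8 kHz,
# so the true pitch period is 8000/100 = 80 samples.
fs = 8000
t = np.arange(400) / fs
frame = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
# max_lag=120 keeps 2*period (=160) out of the search range (halving error)
period, f0 = amdf_pitch(frame, fs, max_lag=120)
print(period, f0)
```

On real speech, the low-pass pre-filtering, the MAX/MIN voicing measure and the median post-filter described in the text would wrap around this core computation.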
The pitch period is identified as the value of the lag at which the minimum of the AMDF occurs. In addition to the pitch estimate, the ratio between the maximum and minimum values of the AMDF (MAX/MIN) is obtained; this measurement, together with the frame energy, is used to make a voiced/unvoiced decision [L1]. In transition segments between voiced, unvoiced or silence regions some determination errors may occur, F0 doubling or halving errors being the most frequent. Therefore median filtering is used in the AMDF-based PDA [L1].

Fig.3: AMDF function of a voiced frame of speech

VI. SOFTWARE REQUIREMENTS
1. MATLAB – version 8.1, 8.2, 8.3 or 8.4.
2. Database – video recordings of persons in MP4 format.
3. Operating System – Microsoft Windows family, Linux, or Mac OS X.

VII. HARDWARE REQUIREMENTS
Following are the system requirements for Windows while using the 32-bit and 64-bit MATLAB and Simulink product families:
1. Processor: any Intel or AMD x86 processor supporting the SSE2 instruction set.
2. Disk space: 1 GB for MATLAB only, 3–4 GB for a typical installation.
3. Graphics: no specific graphics card is required; a hardware-accelerated graphics card supporting OpenGL 3.3 with 1 GB of GPU memory is recommended.
4. RAM: 1024 MB (at least 2048 MB recommended).

VIII. ADVANTAGES
1. The results of this experiment are useful in detecting the mood of a person in different emotional conditions.
2. This technique can be used to make smart robots capable of sensing the mood of a person and acting accordingly [2].
3. This may also help doctors in treating speechless patients: the emotional state of patients can be understood in order to provide them better treatment and counselling.

IX. APPLICATIONS
1. The results of this experiment may prove useful for doctors or psychiatrists in understanding the behaviour and emotions of their patients.
2. Smart robots can be made which possess human-like understanding of emotions and facial expressions, so that they can act accordingly.
3.
Many speechless people will benefit from the success of this research.

X. FUTURE SCOPE

In this paper, we have discussed detecting the mood of a person with the help of his facial expression and voice. This technique may prove helpful in designing machines which can understand verbal commands. For instance, this technique can be used in robots capable of understanding the mood of a person on the basis of his facial expressions and voice.

XI. RESULTS AND CONCLUSIONS

The results of this experiment vary greatly from each other. The AMDF method has the great advantage of very low computational complexity, which makes it possible to implement in real-time applications [L1]. Due to its low computational complexity and fine estimation performance, AMDF can be realized without much difficulty and is often applied in real-time environments such as speech coders [3]. Because of the non-stationarity of the speech signal, AMDF is generally implemented in short-time form. This can cause pitch period detection errors, since the number of overlapping samples decreases as the offset of the speech signal increases, which lowers the peak values along with the function values of the AMDF [3].

The pitch of voice, the lip movement and the facial expressions differ from each other while speaking in each of the above emotional atmospheres. When the person is happy or surprised the pitch of voice is high; the lips move faster when the person is happy, but lip movement is slower when the person is surprised. When a person is in disgust, lip movement is faster and the pitch of voice is also high. When a person is angry, the speed of speech is the fastest, i.e., the lip movement has its maximum speed, and the pitch is also the highest. When the person is sad, lip movement is the slowest and the pitch of voice is the lowest of all the cases. In the case of fear, the pitch of voice is lower whereas the lip movement is faster.
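The qualitative trends above amount to a lookup from an observed (lip speed, pitch) pair to the moods consistent with it. The following illustrative Python sketch encodes those reported trends; the labels and orderings come from the results above, but this crude categorical encoding is ours, not the paper's classifier:

```python
# Qualitative mood profiles reported in the results section:
# each mood maps to a (lip speed, voice pitch) pair of categorical labels.
MOOD_PROFILE = {
    "happy":    ("fast",    "high"),
    "surprise": ("slow",    "high"),
    "disgust":  ("fast",    "high"),
    "angry":    ("fastest", "highest"),
    "sad":      ("slowest", "lowest"),
    "fear":     ("fast",    "low"),
}

def plausible_moods(lip_speed, pitch):
    """Return the moods consistent with an observed (lip speed, pitch) pair."""
    return [m for m, prof in MOOD_PROFILE.items() if prof == (lip_speed, pitch)]

print(plausible_moods("fast", "high"))   # happy and disgust share this profile
```

Note that happy and disgust share the same coarse profile, which is exactly why the paper combines the voice cues with the SpE2DPCA facial expression features rather than using pitch and lip speed alone.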
Based on the above research, smart technologies for the detection of mood (or emotional state) can be implemented for making smart robots. This research may also help the field of medical research: mood detection based on lip movement can help doctors treat speechless patients.

ACKNOWLEDGEMENT

I would like to thank my guide Dr. Dinesh Chandra Jain, who helped me in preparing this paper. His guidance and experienced suggestions made this paper a success.

REFERENCES

[1] Rajneesh Singla, “A New Approach for Mood Detection via Using Principal Component Analysis and Fisherface Algorithm”, Journal of Global Research in Computer Science, Volume 2, No. 7, July 2011.
[2] Vaibhavkumar J. Mistry and Mahesh M. Goyani, “A Literature Survey on Facial Expression Recognition Using Global Features”, International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958, Volume 2, Issue 4, April 2013.
[3] Huan Zhao and Wenjie Gan, “A New Pitch Estimation Method Based on AMDF”, Journal of Multimedia, Volume 8, No. 5, October 2013.
[4] Jyoti Rani and Kanwal Garg, “Emotion Detection Using Facial Expressions – A Review”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 4, April 2014.
[5] Ms. Aswathy R., “A Literature Review on Facial Expression Recognition Techniques”, IOSR Journal of Computer Engineering (IOSR-JCE), Volume 11, Issue 1, May–June 2013.
[6] A. A. Khulage and Prof. B. V. Pathak, “Analysis of Speech under Stress Using Linear Techniques and Non-Linear Techniques for Emotion Recognition System”.
[7] Surbhi and Mr. Vishal Arora, “The Facial Expression Detection from Human Facial Image by Using Neural Network”, International Journal of Application or Innovation in Engineering & Management (IJAIEM), Volume 2, Issue 6, June 2013.
[8] Shyna Dutta and V.B.
Baru, “Review of Facial Expression Recognition System and Used Datasets”, IJRET: International Journal of Research in Engineering and Technology, Volume 2, Issue 12, December 2013. Available at http://www.ijret.org

LINKS

[L1] “Performance Evaluation of Pitch Detection Algorithms”, http://access.feld.cvut.cz/view.php?cisloclanku=2009060001