DEPARTMENT OF COMPUTER SCIENCE

Lie Indication System Using Micro-Expressions

By Nathan de la Cruz

A mini-thesis submitted in partial fulfillment of the requirements for the degree of BSc. Honours

Supervisor: Mehrdad Ghaziasgar
Date: 2011-11-06

Contents

Project Proposal
User Requirements
  User’s view of the problem:
  Description of the problem:
  Expectations from the software solution:
  Not expected from the software solution:
Requirements Analysis
  Designer’s interpretation and Breakdown of the Problem:
  Complete Analysis of the Problem:
    Recording the subject in real-time and detecting the face:
    Pre-processing the frames of the video:
    Processing the frames of the video:
    Categorizing the frames:
    Displaying information to the user:
  Current Solutions:
  Suggested Solution:
User Interface specification
  The complete user interface:
  The input video frame:
  How the user interface behaves:
High Level design (HLD)
  Description of System Objects:
  Relationships between objects:
  Complete Subsystem:
Low Level design (LLD)
  Low Level Description of Concepts:
  Video feed:
  Image Processing (ROI):
  Resizing cropped image:
  RGB 2 Grayscale:
  Grayscale 2 LBP image:
  LBP image put into a histogram:
  Histogram sent to SVM:
  Output:
Training System:
Testing and Results:
User Manual
  Starting the System:
  Clicking frame:
Conclusion
References

Project Proposal

People are told lies every second of every day, and although it is human nature to lie, it is quite difficult to discern whom to trust. Research has found that:

- More than 80% of women admit to occasionally telling “harmless half-truths”.
- 31% of people admit to lying on their CVs.
- 60% of people lie at least once during a 10-minute conversation and, on average, tell 2 to 3 lies.
- In a study undertaken to uncover the percentage of patients who lie to their doctors, 13% lied outright to their doctors, 32% admitted to stretching the truth when providing doctors with personal information, and nearly 40% lied about following their doctor’s treatment plan.

Given this, it is quite evident that you will not go through life without either being lied to or lying to someone. In my project I hope to create software which uses an ordinary webcam to indicate, from a person’s facial expressions, whether or not that person is lying. This project will focus only on micro-expressions (involuntary movements of the facial muscles) that people subconsciously make when lying. These will be the only source for indicating either truths or lies.

User Requirements

User’s view of the problem:

The user requires the system to detect any sort of micro-expression and display it.
The user will ask the subject a series of questions, and right after the subject responds the system should start monitoring the subject’s facial expressions, looking for any micro-expression that indicates deception. The system should allow the user to gain access to the web camera and initialize the process (by clicking a button), after which the detection algorithm will execute.

Description of the problem:

The main aim of this project is to develop an interactive system that provides the user with clear and unambiguous information so that he/she can make an informed decision on whether or not a subject is lying. The user should ask the subject some predefined questions. After the subject responds, the system should be able to detect a micro-expression, and once 30 consecutive frames have been categorized the system should display only the micro-expressions to help the user make an informed decision.

Expectations from the software solution:

The software should allow the user to stop the program at any time. The software should also detect any micro-expressions and output them clearly and unambiguously.

Not expected from the software solution:

The software should not monitor the subject’s eye movements, body language, wording or gestures. The software is not expected to read in pre-recorded videos. The software is also not expected to monitor multiple subjects at the same time. The software is not expected to monitor the subject if the subject is not facing the camera directly.

Requirements Analysis

Designer’s interpretation and Breakdown of the Problem:

The user will need a subject in order to use the software. The subject will need to face the camera directly and not turn his/her head or move out of the camera’s view while answering questions chosen by the user.
The system will accept a live video stream as input. A live stream delivers fewer frames per second than a pre-recorded video would, but the frame rate obtained from a live stream will suffice. The camera will be used to capture the subject’s expressions and will be initialized once the user clicks a button. To identify the “deceptive expressions” accurately, some pre-processing will have to be done using the Open Source Computer Vision (OpenCV) library to reduce the chance of something negatively affecting the results. After the pre-processing, an LBP (local binary patterns) algorithm will be applied to each frame and the result put into a histogram. The SVM (support vector machine) will then categorize each frame based on its histogram pattern. Once 30 frames have been categorized the system will display all micro-expressions to the user.

Complete Analysis of the Problem:

Recording the subject in real-time and detecting the face:

The camera connected to the computer will be used to capture the subject’s facial expressions in real time. The software will accept this input from the camera frame by frame, as two-dimensional images. The video feed will also be displayed on the screen for the user to view. Face detection (provided by the OpenCV library) will then be used to locate the face in each frame. Eye pair detection will be used in combination with face detection to determine the region of interest: the eye pair gives its width and the face gives its length.

Pre-processing the frames of the video:

Pre-processing is needed in order to get an accurate result. It can be done using the OpenCV library functions. There are over 500 functions in this library; however, the pre-processing will involve only two methods. Firstly, the frames should be converted to grayscale.
Once this is done, the face will be cropped using the region of interest; everything outside it is inconsequential and can be dropped. The region of interest will then be resized to a resolution of 40x60, as this resolution was found to optimize accuracy when using LBP images.

Processing the frames of the video:

Once the pre-processing is complete, the LBP (local binary pattern) algorithm can be applied to each frame. The local binary pattern operator is an image operator which transforms an image into an array or image of integer labels describing the small-scale appearance of the image. These labels, or statistics derived from them (most commonly the histogram), are then used for further image analysis. The most widely used versions of the operator are designed for monochrome still images, but it has also been extended to colour images and videos. The operator labels the pixels of an image by thresholding the 3x3 neighbourhood of each pixel with the centre value and considering the result as a binary number. I will use the histogram to store this information in a text file.

Categorizing the frames:

The SVM (support vector machine) will be used to categorize the frames as either micro-expressions or neutral expressions. An SVM analyses data and recognizes patterns, and is used for classification and regression analysis. The basic SVM takes a set of input data and predicts, for each given input, which of two possible classes it belongs to, making it a non-probabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible.
New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.

Displaying information to the user:

When the system detects a micro-expression, the user will not be presented with detailed information about which micro-expressions were detected and at what time. Instead, the system will simply display all micro-expressions that occurred within the 30 frames, letting the user know that the subject might be lying. This avoids confusion about the results and thereby improves the user-friendliness of the software.

Current Solutions:

Lie detectors are sometimes used in legal disputes, but these tests do not take facial expressions into account. Instead, they establish a baseline for the subject’s pulse and then ask a series of questions, comparing the pulse during each answer to the baseline. Other than that, interrogators use techniques such as listening to verbal content and context, studying body language, noting emotional gestures and contradictions, observing interactions and reactions, looking closely at micro-expressions and, lastly, studying the eyes to tell lies from truths. In addition, Caifeng Shan, Shaogang Gong and Peter W. McOwan from the University of London developed a robust facial expression recognition system that also uses local binary patterns; however, in their system it is used in combination with template matching to locate the regions of interest, and their system only works with pre-recorded videos.

Suggested Solution:

The system is intended to detect deceptive micro-expressions efficiently. The suggested solution will focus solely on micro-expressions, which will reduce the accuracy of detecting lies; however, the solution will be easy to modify so that additional functionality can be added at any time. The solution is also cost-effective, as all that is needed is a web camera and a computer with the OpenCV and SVM libraries installed.
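The LBP step described under “Processing the frames of the video” can be illustrated with a minimal C++ sketch that uses only the standard library. It is not the thesis implementation: the `GrayImage` struct and the function names `lbpCode` and `lbpHistogram` are hypothetical, and the bit ordering (neighbours read clockwise from the top-left corner, a neighbour contributing 1 when it is greater than or equal to the centre) is an assumption, since the text does not fix either convention.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for a grayscale image stored row by row.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // expected size: width * height

    std::uint8_t at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// LBP label of the interior pixel (x, y): each of the 8 neighbours in the
// 3x3 neighbourhood contributes a 1 bit when it is >= the centre value.
// Assumed ordering: clockwise from the top-left neighbour, first neighbour
// in the most significant bit.
std::uint8_t lbpCode(const GrayImage& img, int x, int y) {
    assert(x > 0 && y > 0 && x < img.width - 1 && y < img.height - 1);
    static const int dx[8] = {-1, 0, 1, 1, 1, 0, -1, -1};
    static const int dy[8] = {-1, -1, -1, 0, 1, 1, 1, 0};
    const std::uint8_t centre = img.at(x, y);
    std::uint8_t code = 0;
    for (int i = 0; i < 8; ++i) {
        code = static_cast<std::uint8_t>(code << 1);
        if (img.at(x + dx[i], y + dy[i]) >= centre) {
            code |= 1;
        }
    }
    return code;
}

// 256-bin histogram of LBP labels over all interior pixels (border pixels
// are skipped because they lack a full 3x3 neighbourhood).
std::array<int, 256> lbpHistogram(const GrayImage& img) {
    assert(img.pixels.size() ==
           static_cast<std::size_t>(img.width) * img.height);
    std::array<int, 256> hist{};
    for (int y = 1; y < img.height - 1; ++y) {
        for (int x = 1; x < img.width - 1; ++x) {
            ++hist[lbpCode(img, x, y)];
        }
    }
    return hist;
}
```

For a 40x60 region of interest this yields 38 x 58 = 2204 labels, one per interior pixel. The sketch builds a single whole-image histogram; splitting the image into cells or restricting the histogram to a subset of labels would be layered on top of it.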
User Interface specification

The following section describes exactly what the user interface does, what it looks like and how the user interacts with the program.

The complete user interface:

The user interface is entirely graphical (GUI based). The user is the interrogator and the subject displayed in the video feed is the one being interrogated. The user does not use text commands; instead, the user interacts with the system via a mouse click. Figure 1 shows the user interface as it appears to the user.

The input video frame:

Once the system starts running, the video feed is displayed to the user within a window on the screen, as shown in Figure 1. The user then clicks on the frame to begin the categorization of each frame.

Figure 1: the user interface when system starts up

How the user interface behaves:

The system continues to display the video feed once the user clicks on the frame. 30 consecutive frames are then captured and categorized, and the system displays all micro-expressions to the user to help him/her make an informed decision. This output can be seen in Figure 2 below.

Figure 2: output to a user once 30 frames are categorized

High Level design (HLD)

In this section a high-level view of the problem is given. Since the programming language of choice is C/C++, Object-Oriented Analysis is not applied. A very high level of abstraction of the system is constructed as we analyze the methodology behind the construction of the system.

Description of System Objects:

| OBJECT | DESCRIPTION |
|---|---|
| OpenCV | OpenCV is a library of programming functions mainly aimed at real-time computer vision and focused mainly on real-time image processing. |
| RGB2GRAY | Converts from colour to greyscale. |
| Region of Interest (ROI) | The ROI is obtained by using face and eye pair detection. The eye pair is used to get the width of the ROI and the face is used to get the length of the ROI. |
| Local Binary Patterns (LBP) | The operator labels the pixels of an image by thresholding the 3x3 neighbourhood of each pixel with the centre value and considering the result as a binary number. |
| Histogram | The histogram represents the distribution of colour within an image. |
| Support Vector Machine (SVM) | An SVM is used to recognize patterns in the intensity of the pixels and to classify which class a frame belongs to, making its decision based on the data analysis. The ROI as well as the histogram values are sent to the SVM for training and testing the system. |

Table 1: System objects and their descriptions

Relationships between objects:

Figure 3: Object relations

Complete Subsystem:

The figure below shows the high level design and its sub-components, which include more detail about the subsystem.

Input: Video feed; Capture Images; Capture Event Button
Image Processing: Crop face; Resize the cropped image; Convert to greyscale; Convert to Local Binary Pattern Image
Classification: Support Vector Machine (SVM)
Output: Display Micro-Expressions in Window

Figure 4: Complete Subsystem

Low Level design (LLD)

In this section explicit detail of all data types and functions is provided. Pseudo code is also provided, covering all aspects of the programming effort without resorting to actual code.

Low Level Description of Concepts:

| CLASS | ATTRIBUTES |
|---|---|
| Video Feed | cvCaptureFromCAM() |
| Event Capture | cvSetMouseCallback() |
| Get Consecutive Frames | cvQueryFrame() |
| Region of Interest (ROI) (Crop Face) | face_cascade.detectMultiScale(); eyes_cascade.detectMultiScale() |
| Resize cropped Image | cvResize() |
| Convert to grayscale | cvCvtColor(CV_RGB2GRAY) |
| Local Binary Patterns | |

Table 2: Low Level View

Video feed:

The figure below depicts how the video feed is captured. The consecutive frames make up the video which is displayed on the user’s monitor.
Figure 5: Web cam capturing frames

Image Processing (ROI):

The frame is then cropped to include only the face. It is cropped using the length of the face and the width of the eye pair, both obtained from face and eye pair detection. The output is shown below.

Figure 6: Cropped image

Resizing cropped image:

Choosing the optimal resolution before performing the LBP algorithm is an important yet tedious task. The resolution chosen was 40x60.

Figure 7: resized image

RGB 2 Grayscale:

The cropped image is then converted to grayscale so that the LBP algorithm can be applied to it. The result is shown below.

Figure 8: Image converted to grayscale

Grayscale 2 LBP image:

The grayscale image is then converted to an LBP image by labeling each pixel: the 3x3 neighbourhood of the pixel is thresholded with the centre value and the result is read as a binary number. The result is shown below.

Figure 9: Image converted to an LBP image

LBP image put into a histogram:

The LBP image is then segmented into cells. The frequencies of the pixel values are put into an array, and only uniform binary numbers are captured as part of the histogram. Uniform binary numbers are numbers that have at most two bitwise transitions; for example, 00011110 has two transitions and is uniform, whereas 01010000 has four and is not.

Figure 10: LBP Image converted into a histogram

Histogram sent to SVM:

The SVM is then sent the histogram of the frame, and this histogram is categorized as either a micro-expression or a neutral expression.

Figure 11: histogram sent to the SVM for categorization

Output:

Once 30 frames have been categorized, the system stops, combines all the micro-expression frames into one picture and outputs it to the user, as in Figure 12.

Figure 12: Output shown to the user

Training System:

Training the system involved training the SVM (Support Vector Machine), as the SVM is responsible for the categorization. 10 subjects were recorded doing 6 expressions each.
These expressions are:

- Anger
- Contempt
- Disgust
- Fear
- Surprise
- Happy

These expressions were chosen because they have been known to be indicative of lying. The images were then evaluated by two people to make sure that the expressions were performed correctly. The 10 subjects were then narrowed down to 6 subjects. 5 frames of each expression made by each subject were used to train the SVM to represent the micro-expressions (5 frames x 6 expressions = 30 frames per subject), and 30 neutral frames of each subject were used to represent the neutral expressions. Hence an equal number of neutral and micro-expression frames was used in the training process.

Testing and Results:

Testing the system proved difficult, given that subjects need to be under the right conditions to show these micro-expressions, and these conditions are hard to replicate while remaining ethical. A card game was chosen to aid in testing. In this card game the subjects are given 6 cards and asked to lift each one while keeping eye contact with the camera. They are then asked to acknowledge the card and decide whether to lie or tell the truth about its identity, with their main objective being to try to fool me. They are told that they can lie exactly 3 times out of the 6 cards. I then guessed whether or not they were lying, and if I guessed incorrectly they were given a sweet. The videos were then labeled by subject and card and run through the system to find out whether there was any correlation between micro-expressions and lying.

The system did detect expressions, but there was not enough of a correlation between the system detecting an expression and the subject lying. This could be because the subjects were not exposed to the correct conditions, and because they were asked to fool me and therefore fooled me into thinking they were lying when in fact they were telling the truth. In a real-life scenario, however, a subject would never try to fool an investigator into thinking they were lying.
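The frame-level evaluation in the remainder of this section is read off a two-class confusion matrix. As a minimal sketch of that arithmetic (not code from the thesis; the `ConfusionMatrix` struct and the function names are hypothetical), the per-class and overall accuracies can be computed as follows:

```cpp
#include <cassert>

// Hypothetical 2x2 confusion matrix for the neutral / micro-expression task.
struct ConfusionMatrix {
    int neutralAsNeutral;  // actually neutral, classified as neutral
    int neutralAsMicro;    // actually neutral, classified as micro-expression
    int microAsNeutral;    // actually micro-expression, classified as neutral
    int microAsMicro;      // actually micro-expression, classified as micro
};

// Percentage of actually-neutral frames that were classified as neutral.
double neutralAccuracy(const ConfusionMatrix& m) {
    const int total = m.neutralAsNeutral + m.neutralAsMicro;
    assert(total > 0);
    return 100.0 * m.neutralAsNeutral / total;
}

// Percentage of actual micro-expression frames classified as micro.
double microAccuracy(const ConfusionMatrix& m) {
    const int total = m.microAsNeutral + m.microAsMicro;
    assert(total > 0);
    return 100.0 * m.microAsMicro / total;
}

// Percentage of all frames that were classified correctly.
double overallAccuracy(const ConfusionMatrix& m) {
    const int total = m.neutralAsNeutral + m.neutralAsMicro +
                      m.microAsNeutral + m.microAsMicro;
    assert(total > 0);
    return 100.0 * (m.neutralAsNeutral + m.microAsMicro) / total;
}
```

With the counts shown in Figure 13 (832 and 175 for the neutral frames, 39 and 326 for the micro-expression frames), `neutralAccuracy` gives roughly 82.62 and `microAccuracy` roughly 89.32, which matches the percentages reported in Figure 14. `overallAccuracy` gives roughly 84.40 for the same counts; that combined figure is a derived value and is not stated in the report.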
The frames were then analyzed by means of a confusion matrix to find out how accurately the system detected neutral and micro-expressions. Out of 1007 neutral frames the system correctly detected 832, and out of 365 micro-expression frames it correctly detected 326. This confusion matrix can be viewed in Figure 13 below.

| | Actually Neutral | Actually Micro |
|---|---|---|
| System Neutral | 832 | 39 |
| System Micro | 175 | 326 |
| Total | 1007 | 365 |

Figure 13: Confusion matrix

From the confusion matrix, the detection accuracy for each class could be computed as a percentage. For neutral expressions the system had 82.62 percent accuracy in detection (832 of 1007 frames), while for micro-expressions it had 89.32 percent accuracy in detection (326 of 365 frames), as seen in Figure 14.

| Neutral | Micro |
|---|---|
| 82.62% | 89.32% |

Figure 14: Overall system accuracy in detecting neutral and micro-expressions

User Manual

The demonstration mode of the GUI is illustrated in the figures that follow.

Starting the System:

This was a research project, and because of that not much emphasis was put on the user interface. For this reason the system is run from the terminal. Once executed, a camera feed is shown on the display, as seen in Figures 15 and 16.

Figure 15: Running system from the terminal

Figure 16: live feed displayed on monitor

Clicking frame:

Once the subject answers a question, the user clicks inside the window to start the categorization, as in Figure 17.

Figure 17: left clicking on frame

Conclusion

In this report a detailed description of the problem was given, as well as the software solution to the problem. The users required an interactive system that provides them with information that helps them make an informed decision on whether or not someone could be lying. This report described the methods used to implement the system, from capturing the frames and performing the processing to categorizing the frames. This report also touched on the user interface specification of the system.
This report explained how data was acquired to train the system and how the testing was done. When evaluating the results from the testing, only a weak correlation was found between the system detecting a micro-expression and the subject lying. This weak correlation could have been brought about by the fact that subjects were not exposed to the correct conditions, and that during testing subjects were asked to try to fool the investigator, which led to subjects fooling the investigator into thinking they were lying when in fact they were telling the truth. However, the system’s accuracy in detecting neutral and micro-expressions proved to be adequate, as it achieved 82.62% in detecting neutral expressions and 89.32% in detecting micro-expressions.

References

1. Paul Ekman Group, LLC. 2013. Paul Ekman Group, LLC. [ONLINE] Available at: http://www.paulekman.com. [Accessed 27 May 2013].
2. Blifaloo.com. 2013. Micro Expressions - Research, Theory & Lying | Human Behaviour, Forensic Psychology. [ONLINE] Available at: http://www.blifaloo.com/info/microexpressions.php. [Accessed 27 May 2013].
3. Paul Ekman, 2007. Emotions Revealed, Second Edition: Recognizing Faces and Feelings to Improve Communication and Emotional Life. 2nd Edition. Holt Paperbacks.
4. Kanade, T., Cohn, J. F., & Tian, Y. (2000). Comprehensive database for facial expression analysis. Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), Grenoble, France, 46-53.
5. Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The Extended Cohn-Kanade Dataset (CK+): A complete expression dataset for action unit and emotion-specified expression. Proceedings of the Third International Workshop on CVPR for Human Communicative Behavior Analysis (CVPR4HB 2010), San Francisco, USA, 94-101.