Term 4 Documentation - UWC Computer Science

DEPARTMENT OF COMPUTER SCIENCE
Lie Indication System Using Micro-Expressions
By Nathan de la Cruz
A mini-thesis submitted in partial fulfillment of the requirements for the
degree of
BSc. Honours
Supervisor: Mehrdad Ghaziasgar
Date: 2011-11-06
Contents
Project Proposal
User Requirements
  User’s view of the problem:
  Description of the problem:
  Expectations from the software solution:
  Not expected from the software solution:
Requirements Analysis
  Designer’s interpretation and Breakdown of the Problem:
  Complete Analysis of the Problem:
    Recording the subject in real-time and detecting the face:
    Pre-processing the frames of the video:
    Processing the frames of the video:
    Categorizing the frames:
    Displaying information to the user:
  Current Solutions:
  Suggested Solution:
User Interface specification
  The complete user interface:
  The input video frame:
  How the user interface behaves:
High Level design (HLD)
  Description of System Objects:
  Relationships between objects:
  Complete Subsystem:
Low Level design (LLD)
  Low Level Description of Concepts:
  Video feed:
  Image Processing (ROI):
  Resizing cropped image:
  RGB 2 Grayscale:
  Grayscale 2 LBP image:
  LBP image put into a histogram:
  Histogram sent to SVM:
  Output:
Training System:
Testing and Results:
User Manual
  Starting the System:
  Clicking frame:
Conclusion
References
Project Proposal
People are told lies every second of every day, and although it is human nature to
lie, it is quite difficult to discern whom to trust. Research has found that:
• More than 80% of women admit to occasionally telling “harmless half-truths”.
• 31% of people admit to lying on their CVs.
• 60% of people lie at least once during a 10-minute conversation and, on
average, tell 2 to 3 lies.
In a study undertaken to uncover the percentage of patients who lie to their
doctors, it was found that 13% of them lie outright, 32% of them
admit to stretching the truth when providing doctors with personal information
and nearly 40% lied about following their doctor’s treatment plan.
Given this, it is quite evident that you will not go through life without either being
lied to or lying to someone. In my project I hope to create software which uses
an ordinary webcam to detect whether or not someone is in fact lying by their
facial expressions. This project, however, will focus only on micro-expressions
(involuntary movements in facial muscles) that people subconsciously make when
lying. These will be the only source for indicating either truths or lies.
User Requirements
User’s view of the problem:
The user requires the system to detect any sort of micro-expression and display it.
The user will ask the subject a series of questions and, right after the subject
responds, the system should start monitoring the subject’s facial expressions,
looking for any sort of micro-expression that indicates deception. The system
should allow the user to gain access to the web camera and initialize the process
(by clicking a button), whereby the detection algorithm will execute.
Description of the problem:
The main aim of this project is to develop an interactive system that provides the
user with clear and unambiguous information so that he/she can make an informed
decision on whether or not a subject is lying. The user should ask the subject
some predefined questions and, after the subject responds, the system should be
able to detect a micro-expression. Once 30 consecutive frames have been
categorized, the system will display only the micro-expressions to help the
user make an informed decision.
Expectations from the software solution:
The software should allow the user to stop the program at any time.
The software should also detect any micro-expressions and output them clearly
and unambiguously.
Not expected from the software solution:
The software should not monitor the subject’s eye movements, body language,
wording or gestures. The software is not expected to read in pre-recorded videos.
The software is also not expected to monitor multiple subjects at the same time.
The software is not expected to monitor the subject if the subject is not
facing the camera directly.
Requirements Analysis
Designer’s interpretation and Breakdown of the Problem:
The user will need a subject in order to use the software. The subject will need to
face the camera directly and not turn his/her head or move out of the camera’s
field of view while answering the questions that the user decides on. The system
will accept a live video stream as input. A live stream delivers fewer frames
per second than a pre-recorded video would; however, the frame rate of a live
stream will suffice.
The camera will be used to capture the subject’s expressions and will be
initialized once the user clicks a button. To be able to accurately identify the
“deceptive expressions”, some pre-processing has to be done using the OpenCV
(Open Source Computer Vision) library to decrease the possibility of something
negatively affecting the results. After the pre-processing has been done, an LBP
(local binary patterns) algorithm will be performed on each frame and the result put
into a histogram. The SVM (support vector machine) will then be used to
categorize the frames based on their histogram patterns. Once 30 frames
have been categorized, the system will display all micro-expressions to the user.
Complete Analysis of the Problem:
Recording the subject in real-time and detecting the face:
The camera connected to the computer will be used to capture the subject’s facial
expressions in real time. The software will then accept this input from the
camera, frame by frame, in 2 dimensions. The video feed will also be displayed on
the screen for the user to view. Face detection (which is found in the OpenCV
library) will then be used to locate the face in each frame. Eye pair detection
will be used in combination with face detection: the eye pair gives the width of
the region of interest and the face gives its length.
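The way the two detections combine into one region of interest can be sketched as follows. This is a minimal illustration and not the project's code: the `Rect` type and the `regionOfInterest` function are hypothetical, and it assumes that the horizontal position of the region is also taken from the eye pair and the vertical position from the face.

```cpp
// Axis-aligned rectangle in pixel coordinates (hypothetical helper type).
struct Rect {
    int x;       // left edge
    int y;       // top edge
    int width;
    int height;
};

// Combine a detected face and a detected eye pair into one region of interest.
// Assumption: the eye pair supplies the horizontal extent (x, width) and the
// face supplies the vertical extent (y, height).
Rect regionOfInterest(const Rect& face, const Rect& eyePair) {
    return Rect{eyePair.x, face.y, eyePair.width, face.height};
}
```

For example, a face at (100, 50) of size 200x260 and an eye pair at (130, 120) of size 140x40 would give a region at (130, 50) of size 140x260.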
Pre-processing the frames of the video:
In order to get an accurate result, pre-processing is needed. The pre-processing
can be done using the OpenCV library functions. There are over 500 functions in
this library; however, the pre-processing will involve only three steps.
• Firstly the frames should be converted to grayscale.
• Once this is done the face will be cropped using the region of interest;
everything else is inconsequential and can therefore be dropped.
• The region of interest will then be resized to a resolution of 40x60, as this
was found to give the best accuracy when using LBP images.
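The three steps above can be sketched without OpenCV as shown below. This is an illustration under stated assumptions, not the project's implementation: the `GrayImage` type and the function names are hypothetical, the grey value uses the common 0.299/0.587/0.114 luma weights, the resize uses nearest-neighbour sampling (the interpolation actually used is not stated), and 40x60 is read as 40 pixels wide by 60 pixels high.

```cpp
#include <cstdint>
#include <vector>

// Row-major, 8-bit, single-channel image (hypothetical helper type).
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // width * height values

    std::uint8_t at(int x, int y) const { return pixels[y * width + x]; }
};

// Grey value of one RGB pixel using the 0.299/0.587/0.114 luma weights,
// computed in integer arithmetic with rounding.
std::uint8_t toGray(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint8_t>((299 * r + 587 * g + 114 * b + 500) / 1000);
}

// Copy the w x h rectangle whose top-left corner is (x0, y0).
// The caller must ensure that the rectangle lies inside src.
GrayImage crop(const GrayImage& src, int x0, int y0, int w, int h) {
    GrayImage dst{w, h, std::vector<std::uint8_t>(w * h)};
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            dst.pixels[y * w + x] = src.at(x0 + x, y0 + y);
    return dst;
}

// Nearest-neighbour resize to outW x outH (for example 40 x 60).
GrayImage resizeNearest(const GrayImage& src, int outW, int outH) {
    GrayImage dst{outW, outH, std::vector<std::uint8_t>(outW * outH)};
    for (int y = 0; y < outH; ++y) {
        const int sy = y * src.height / outH;  // source row for this output row
        for (int x = 0; x < outW; ++x) {
            const int sx = x * src.width / outW;  // source column
            dst.pixels[y * outW + x] = src.at(sx, sy);
        }
    }
    return dst;
}
```

A frame would pass through `toGray` pixel by pixel, then `crop` with the region of interest, then `resizeNearest(cropped, 40, 60)`.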
Processing the frames of the video:
Once the pre-processing is complete, the LBP (local binary pattern) algorithm can
be applied to each frame. The local binary pattern operator is an image
operator which transforms an image into an array or image of integer labels
describing the small-scale appearance of the image.
These labels or their statistics, most commonly the histogram, are then used for
further image analysis. The most widely used versions of the operator are designed for
monochrome still images, but it has also been extended to colour images as well
as videos. The operator labels the pixels of an image by thresholding the 3x3
neighbourhood of each pixel with the centre value and considering the result as a
binary number. The histogram of each frame will be stored in a text file.
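The thresholding step described above can be sketched as follows. This is a minimal illustration and not the project's code: the function names are hypothetical, and two conventions are assumptions because the text does not fix them, namely that a neighbour greater than or equal to the centre contributes a 1 bit, and that neighbours are read clockwise starting at the top-left corner.

```cpp
#include <cstdint>
#include <vector>

// Basic 3x3 LBP code of the pixel at (x, y) in a row-major greyscale image.
// The caller must ensure 1 <= x < width - 1 and 1 <= y < height - 1.
std::uint8_t lbpCode(const std::vector<std::uint8_t>& img, int width, int x, int y) {
    // Neighbour offsets, clockwise from the top-left corner (assumed ordering).
    static const int dx[8] = {-1, 0, 1, 1, 1, 0, -1, -1};
    static const int dy[8] = {-1, -1, -1, 0, 1, 1, 1, 0};
    const std::uint8_t centre = img[y * width + x];
    std::uint8_t code = 0;
    for (int i = 0; i < 8; ++i) {
        const std::uint8_t neighbour = img[(y + dy[i]) * width + (x + dx[i])];
        // Threshold the neighbour against the centre; set bit i if it passes.
        if (neighbour >= centre) code |= static_cast<std::uint8_t>(1u << i);
    }
    return code;
}

// LBP image of the same size as the input. Border pixels, which lack a full
// 3x3 neighbourhood, are left at 0 in this sketch.
std::vector<std::uint8_t> lbpImage(const std::vector<std::uint8_t>& img,
                                   int width, int height) {
    std::vector<std::uint8_t> out(width * height, 0);
    for (int y = 1; y < height - 1; ++y)
        for (int x = 1; x < width - 1; ++x)
            out[y * width + x] = lbpCode(img, width, x, y);
    return out;
}
```

With a centre value of 5, a top row of 9s and all other neighbours equal to 1, only the first three bits are set, so the label is binary 00000111, that is 7.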
Categorizing the frames:
The SVM (support vector machine) will be used to categorize the frames into
either micro-expressions or neutral expressions. An SVM analyses data and
recognizes patterns; it is used for classification and
regression analysis. The basic SVM takes a set of input data and predicts, for each
given input, which of two possible classes forms the output, making it a
non-probabilistic binary linear classifier. Given a set of training examples, each marked
as belonging to one of two categories, an SVM training algorithm builds a model
that assigns new examples to one category or the other. An SVM model is a
representation of the examples as points in space, mapped so that the examples
of the separate categories are divided by a clear gap that is as wide as possible.
New examples are then mapped into that same space and predicted to belong to
a category based on which side of the gap they fall on.
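The "which side of the gap" decision of a trained linear SVM can be sketched as below. This is an illustration only: the `LinearSvm` type and `classify` function are hypothetical, the weights and bias are assumed to come from a training step that is not shown, and the SVM library and kernel actually used in the project are not stated in this document.

```cpp
#include <numeric>
#include <vector>

// Parameters of an already trained linear SVM (hypothetical helper type).
struct LinearSvm {
    std::vector<double> weights;  // one weight per feature (histogram bin)
    double bias;
};

// Returns +1 if the feature vector falls on the positive side of the
// separating hyperplane w.x + b = 0, and -1 otherwise.
// Assumes features.size() == model.weights.size().
int classify(const LinearSvm& model, const std::vector<double>& features) {
    const double score = std::inner_product(model.weights.begin(),
                                            model.weights.end(),
                                            features.begin(), model.bias);
    return score >= 0.0 ? +1 : -1;
}
```

In this system the two return values would correspond to the two categories, micro-expression and neutral expression; which sign maps to which category depends on how the training labels were assigned.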
Displaying information to the user:
When the system detects a micro-expression, the user will not be faced with
detailed information about which micro-expressions were detected and at what
time. Instead, the system will simply display all micro-expressions that occurred
within the 30 frames, letting the user know that the subject might be lying. This
avoids confusion about the results and makes the software more user-friendly.
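Selecting what to display from a window of categorized frames can be sketched as follows. The `Label` type, the `kWindowSize` constant and the function name are hypothetical; the sketch only assumes that each of the 30 frames has already been given one of the two labels.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// The two categories a frame can be assigned (hypothetical helper type).
enum class Label { Neutral, Micro };

// Number of consecutive frames categorized before output is shown.
const std::size_t kWindowSize = 30;

// Returns the indices of the frames labelled as micro-expressions among the
// first kWindowSize labels; these are the frames that would be displayed.
std::vector<std::size_t> microExpressionFrames(const std::vector<Label>& labels) {
    std::vector<std::size_t> hits;
    const std::size_t n = std::min(labels.size(), kWindowSize);
    for (std::size_t i = 0; i < n; ++i) {
        if (labels[i] == Label::Micro) hits.push_back(i);
    }
    return hits;
}
```

An empty result means that no micro-expression was found in the window, so nothing would be shown to the user.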
Current Solutions:
There are instances where lie detectors are used in legal disputes, but these lie
detector tests do not take facial expressions into account. Instead they establish a
baseline for the subject’s pulse and then ask a series of questions, comparing the
pulse during each answer to the baseline.
Other than that, techniques such as listening to verbal content and context,
studying body language, noting emotional gestures and contradictions,
interactions and reactions, looking closely at micro-expressions and, lastly,
studying the eyes are all used by interrogators to distinguish lies from truths.
Caifeng Shan, Shaogang Gong and Peter W. McOwan from the University of
London also developed a robust facial expression recognition system using local binary
patterns. However, in their system LBP is used in combination with template
matching to locate the regions of interest, and their system only works with
pre-recorded videos.
Suggested Solution:
The system will work efficiently at detecting deceptive micro-expressions. The
suggested solution will focus solely on micro-expressions, which limits
the accuracy of detecting lies; however, the solution will be easily modifiable so
that additional functionality can be added at any time. This solution is also cost
effective, as all that is needed is a web camera and a computer with the OpenCV and
SVM libraries installed.
User Interface specification
The following section describes exactly what the user interface does,
what it looks like and how the user interacts with the program.
The complete user interface:
The complete user interface is entirely Graphical User Interface (GUI) based. The
user is the interrogator and the subject displayed in the video feed is the
one being interrogated. Text commands are not used to interact with
the system; instead the user interacts with the system via a mouse click.
Figure 1 shows the user interface as it appears to the user.
The input video frame:
Once the system starts running, the video feed will be displayed to the user within
a window on the screen, as shown in figure 1. The user then clicks on the frame to
begin the categorization of each frame.
Figure 1: the user interface when system starts up
How the user interface behaves:
The system will continue to display the video feed once the user clicks on the
frame. 30 consecutive frames will then be captured and categorized, and the
system will display all micro-expressions to the user to help him/her make an
informed decision. This output can be seen in figure 2 below.
Figure 2: output to a user once 30 frames are categorized
High Level design (HLD)
In this section an HLD view of the problem is presented. Since the programming
language of choice is C/C++, Object-Oriented Analysis is not being applied. A
very high level of abstraction of the system is constructed as we take a look at and
analyze the methodology behind the construction of the system.
Description of System Objects:
OBJECT | DESCRIPTION
OpenCV | OpenCV is a library of programming functions mainly aimed at real time computer vision and is focused mainly on real-time image processing.
RGB2GRAY | Convert from colour to grey scale.
Region of Interest (ROI) | The ROI is achieved by using face and eye pair detection. The eye pair is used to get the width of the ROI and the face is used to get the length of the ROI.
Local Binary Patterns (LBP) | The operator labels the pixels of an image by thresholding the 3x3 neighbourhood of each pixel with the centre value and considering the result as a binary number.
Histogram | The Histogram represents the distribution of colour within an image.
Support Vector Machine (SVM) | A SVM is used to recognize patterns regarding the intensity of the pixels and is used to classify which class a certain pixel belongs to and makes its decision based on the data analysis. The ROI as well as the Histogram values are sent to the SVM for training and testing the system.
Table 1: System objects and their descriptions
Relationships between objects:
Figure 3: Object relations
Complete Subsystem:
The figure below shows the high level design and its sub-components which
include more detail about the subsystem.
Input: Video feed; Capture Images; Capture Event Button
Image Processing: Crop face; Resize the cropped image; Convert to greyscale; Convert to Local Binary Pattern image
Classification: Support Vector Machine (SVM)
Output: Display micro-expressions in window
Figure 4: Complete Subsystem
Low Level design (LLD)
In this section explicit detail of all data types and functions is provided.
Pseudo code is also provided, covering all aspects of the programming
effort without resorting to actual code.
Low Level Description of Concepts:
CLASS | ATTRIBUTES
Video Feed | cvCaptureFromCAM()
Event Capture | cvSetMouseCallback()
Get Consecutive Frames | cvQueryFrame()
Region of Interest (ROI) (Crop Face) | face_cascade.detectMultiScale(); eyes_cascade.detectMultiScale()
Resize cropped Image | cvResize()
Convert to grayscale | cvCvtColor(CV_RGB2GRAY)
Local Binary Patterns |
Table 2: Low Level View
Video feed:
The figure below depicts how the video feed is captured. The consecutive frames
make up the video which is displayed on the user’s monitor.
Figure 5: Web cam capturing frames
Image Processing (ROI):
The frame is then cropped to include only the face. It is cropped using the
length of the face and the width of the eye pair, which are obtained from face and
eye pair detection. The output is as seen below.
Figure 6: Cropped image
Resizing cropped image:
Choosing the optimal resolution before performing the LBP algorithm is an
important yet tedious task. The optimal resolution chosen was 40x60.
Figure 7: resized image
RGB 2 Grayscale:
The cropped image is then converted to grayscale so that the LBP algorithm can
then be done on the image. The product is shown below.
Figure 8: Image converted to grayscale
Grayscale 2 LBP image:
The Grayscale image is then converted to an LBP image by labeling the pixels of
an image by thresholding the 3x3 neighborhood of each pixel with the center
value and considering the result as a binary number. The result is as seen below.
Figure 9: Image converted to an LBP image
LBP image put into a histogram:
The LBP image is then segmented into cells. The frequencies of the pixel values are
then put into an array, and only uniform binary numbers are captured as part
of the histogram. Uniform binary numbers are numbers that have at most two
bitwise transitions (changes from 0 to 1 or from 1 to 0).
Figure 10: LBP Image converted into a histogram
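The uniformity test and the histogram of one cell can be sketched as follows. This is an illustration, not the project's code: the function names are hypothetical, the 8-bit pattern is treated as circular when counting transitions (the usual convention for LBP, assumed here), and the histogram keeps 256 bins with non-uniform codes simply left uncounted.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Number of 0-to-1 and 1-to-0 transitions in an 8-bit pattern, treating the
// pattern as circular (the last bit is followed by the first bit).
int transitions(std::uint8_t code) {
    // Rotate left by one bit so each bit lines up with its circular neighbour.
    const std::uint8_t rotated =
        static_cast<std::uint8_t>((code << 1) | (code >> 7));
    std::uint8_t diff = code ^ rotated;  // a 1 bit marks a transition
    int count = 0;
    while (diff != 0) {
        count += diff & 1;
        diff >>= 1;
    }
    return count;
}

// A pattern is uniform if it has at most two bitwise transitions.
bool isUniform(std::uint8_t code) {
    return transitions(code) <= 2;
}

// Histogram of the LBP codes of one cell, counting only uniform codes.
std::array<int, 256> uniformHistogram(const std::vector<std::uint8_t>& lbpCodes) {
    std::array<int, 256> bins{};  // all bins start at zero
    for (std::uint8_t code : lbpCodes) {
        if (isUniform(code)) ++bins[code];
    }
    return bins;
}
```

For example, 00001111 has two transitions and is uniform, while 01010101 has eight and is not. Under this circular definition 58 of the 256 possible 8-bit codes are uniform.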
Histogram sent to SVM:
The SVM will then be sent a histogram of the frame and this histogram will then
be categorized into either a micro-expression or a neutral expression.
Figure 11: histogram sent to the SVM for categorization
Output:
Once 30 frames are categorized, the system stops and combines all the
micro-expression frames into one picture, which is output to the user like the one in
figure 12.
Figure 12: Output shown to the user
Training System:
Training the system involved training the SVM (Support Vector Machine), as the
SVM is in control of the categorization. 10 subjects were recorded doing 6
expressions each. These expressions are:
• Anger
• Contempt
• Disgust
• Fear
• Surprise
• Happy
These expressions were chosen because they are known to be indicative of
lying. The images were then evaluated by two people to make sure that the
expressions were done correctly. The 10 subjects were then narrowed down to 6
subjects.
5 frames of each expression that the subjects made were then trained on the
SVM to represent the micro-expressions, and 30 neutral frames of each subject
were trained on the SVM to represent the neutral expressions. Hence an equal
number of neutral and micro-expression frames was used in the training process
(6 expressions x 5 frames = 30 micro-expression frames per subject, matching the
30 neutral frames per subject).
Testing and Results:
Testing the system proved difficult, given that subjects need to be under the
correct conditions to show these micro-expressions, and these conditions are hard
to replicate while remaining ethical.
A card game was chosen to aid in testing. In this card game the subjects are given
6 cards and asked to lift each one while keeping eye contact with the camera. They
are then asked to acknowledge the card and decide whether to lie or tell the truth
about the identity of the card, with their main objective being to fool me. They
are told that they can lie exactly 3 times out of the 6 cards. I then guessed
whether or not they were lying, and if I guessed incorrectly they were given a
sweet.
The recorded videos were then labeled by subject and card and run through the system
to find out if there was any correlation between micro-expressions and lying.
Expressions were detected by the system, but there was not enough of a
correlation between the system detecting an expression and the subject lying.
This could be because the subjects were not exposed to the correct conditions,
and because they were asked to fool me and therefore fooled me into thinking
they were lying when in actual fact they were telling the truth. In a real-life
scenario, however, a subject would never try to fool an investigator into thinking
they were lying.
The frames were then analyzed by means of a confusion matrix to find out how
accurate the system was at detecting neutral expressions and micro-expressions. It was
found that out of 1007 neutral frames the system detected 832, and out of 365
micro-expression frames the system detected 326. This confusion matrix can be
viewed in figure 13 below.
 | Actually Neutral | Actually Micro
System Neutral | 832 | 39
System Micro | 175 | 326
Total | 1007 | 365
Figure 13: Confusion matrix
From the confusion matrix, the accuracy of the system for each class can be
calculated as a percentage. It was found that for neutral expressions the system had
82.62 percent accuracy in detection (832 of 1007 frames), while for micro-expressions
the system had 89.32 percent accuracy in detection (326 of 365 frames), as seen in
figure 14.
Neutral | Micro
82.62% | 89.32%
Figure 14: Overall system accuracy in detecting neutral and micro-expressions
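The percentages follow directly from the counts in the confusion matrix of figure 13. A minimal sketch of the calculation is given below; the `ConfusionMatrix` type and the function names are hypothetical.

```cpp
// Counts from a two-class confusion matrix (hypothetical helper type).
struct ConfusionMatrix {
    int neutralAsNeutral;  // actually neutral, system said neutral
    int neutralAsMicro;    // actually neutral, system said micro
    int microAsNeutral;    // actually micro, system said neutral
    int microAsMicro;      // actually micro, system said micro
};

// Percentage of actually neutral frames that the system labelled neutral.
double neutralAccuracy(const ConfusionMatrix& m) {
    return 100.0 * m.neutralAsNeutral / (m.neutralAsNeutral + m.neutralAsMicro);
}

// Percentage of actual micro-expression frames that the system labelled micro.
double microAccuracy(const ConfusionMatrix& m) {
    return 100.0 * m.microAsMicro / (m.microAsNeutral + m.microAsMicro);
}
```

With the counts `ConfusionMatrix{832, 175, 39, 326}`, `neutralAccuracy` gives 832 / 1007, about 82.62 percent, and `microAccuracy` gives 326 / 365, about 89.32 percent.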
User Manual
The demonstration mode of the GUI is illustrated in the figures that follow.
Starting the System:
This was a research project and because of that not a lot of emphasis was put on
the user interface. For this reason the system is run from the terminal. Once
executed, a camera feed is shown on the display, as seen in figures 15 and
16.
Figure 15: Running system from the terminal
Figure 16: live feed displayed on monitor
Clicking frame:
Once the subject answers a question, the user clicks inside the window to start the
categorization, as shown in figure 17.
Figure 17: left clicking on frame
Conclusion
In this report a detailed description of the problem was given, as well as the
software solution to the problem. The users required an interactive system that
provides them with information that helps them make an informed decision on
whether or not someone could be lying.
This report described the methods used for carrying out the implementation of
the system, from capturing the frames and performing the processing to the
categorization of the frames. This report also touched on the user interface
specification of the system, and explained how data was acquired
to train the system and how the testing was done.
When evaluating the results from the testing, it was noticed that there was a weak
correlation between the system detecting a micro-expression and the subject
lying. This weak correlation could have been brought about by the
fact that subjects were not exposed to the correct conditions, and that during the
testing subjects were asked to try to fool the investigator, which led to
subjects fooling the investigator into thinking they were lying when in fact they
were telling the truth.
However, the system’s accuracy in detecting neutral and micro-expressions
proved to be adequate, as it achieved 82.62% in detecting neutral expressions
and 89.32% in detecting micro-expressions.
References
1. Paul Ekman Group, LLC. 2013. Paul Ekman Group, LLC. [ONLINE] Available at:
http://www.paulekman.com. [Accessed 27 May 2013].
2. Micro Expressions - Research, Theory & Lying | Human Behaviour, Forensic
Psychology | Blifaloo.com. 2013. Micro Expressions - Research, Theory &
Lying | Human Behaviour, Forensic Psychology | Blifaloo.com. [ONLINE]
Available at: http://www.blifaloo.com/info/microexpressions.php. [Accessed
27 May 2013].
3. Paul Ekman, 2007. Emotions Revealed, Second Edition: Recognizing Faces
and Feelings to Improve Communication and Emotional Life. 2nd Edition.
Holt Paperbacks.
4. Kanade, T., Cohn, J. F., & Tian, Y. (2000). Comprehensive database for facial
expression analysis. Proceedings of the Fourth IEEE International Conference
on Automatic Face and Gesture Recognition (FG'00), Grenoble, France, 46-53.
5. Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I.
(2010). The Extended Cohn-Kanade Dataset (CK+): A complete expression
dataset for action unit and emotion-specified expression. Proceedings of the
Third International Workshop on CVPR for Human Communicative Behavior
Analysis (CVPR4HB 2010), San Francisco, USA, 94-101.