CIS520 fall 2006
Final Project Report
Jiwoong Sim
1. Introduction
Human face recognition is an appealing topic because its results are easy for a person to
judge by eye. It also has many practical applications, which motivates intense study in
this area. In this project, given a photograph of a real person as input, I recognize and
classify the image by the person photographed. I also attempt to classify each image
into one of several emotion classes. In addition, the learned structure is shown in the
form of images.
2. Data Set
There are several public human face data sets, such as the Yale Face Database [i] and the
CMU Cohn-Kanade Facial Expression Database [ii]. For the purpose of this project, the
data set must satisfy two constraints:
1. The data should contain images of multiple people
2. The data should contain images of various emotional expressions
The Yale Face Database consists of pictures taken in various poses. Pose variation would
certainly make the learning problem harder, so this data set was excluded. The
Cohn-Kanade data set has a single, directly frontal pose and contains sequences of
various emotional expressions, so it matches the purpose of the project. However, its
facial images are not well aligned. Because the variance of the face position within the
image is large, careful pre-processing would be needed to obtain good learning results.
Aligning facial images reliably is itself a challenging problem that cannot be solved
easily, so this data set was not used in this project either.
The data set finally chosen is the ‘Japanese Female Facial Expression (JAFFE)
Database’ [iii], which satisfies both constraints above. In addition, every image in the
database is accompanied by emotion scores rated by a group of people. These scores can
be used directly in the learning process.
3. Methods
Each image in the JAFFE data set is gray-scale with a resolution of 256*256. If we use
each pixel as an input feature, there are 256*256 = 65536 features per input. This is a
large feature size, which would make every learning method slow to run. Also, there are
only around 200 input images, so the number of features is larger than the number of
inputs, which easily leads to overfitting. This is a common problem for applications that
take images as input, and Principal Component Analysis (PCA) is a useful method for
shrinking the feature space.
However, calculating the principal components directly requires an SVD (or
eigendecomposition) over the full feature space, which cannot be done in practical
running time. So instead of calculating all eigenvectors of X*X’ (where X has one column
per image), we can use a simple trick. Suppose that v is an eigenvector of X’*X with
eigenvalue λ, i.e. X’*X*v = λ*v. Then X*v is an eigenvector of X*X’ with the same
eigenvalue, because
X*X’*(X*v) = X*(X’*X*v) = X*(λ*v) = λ*(X*v)
Therefore, we can speed up the calculation as long as we do not need every eigenvector
of X*X’. Calculating v is much faster since X’*X only has dimension (number of input
images) by (number of input images), which is around 200 by 200 for this data set. Once
we have an eigenvector v of X’*X, we can use X*v as an eigenvector of X*X’.
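The trick can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function name and the synthetic data are hypothetical, and X is taken to hold one mean-subtracted image per column. Since the squared norm of X*v equals the eigenvalue when v has unit length, each X*v is rescaled to unit length.

```python
import numpy as np

def top_principal_components(X, k):
    """Return k leading unit eigenvectors of X @ X.T and their eigenvalues.

    X has shape (n_pixels, n_images) with the mean image already subtracted.
    k should not exceed the rank of X, otherwise a zero-norm column appears.
    """
    gram = X.T @ X                          # small (n_images, n_images) matrix
    eigvals, V = np.linalg.eigh(gram)       # symmetric solver, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest eigenvalues
    eigvals, V = eigvals[order], V[:, order]
    U = X @ V                               # each column X v is an eigenvector of X X'
    U /= np.linalg.norm(U, axis=0)          # rescale columns to unit length
    return U, eigvals

# Synthetic stand-in for the image data: 400 "pixels", 12 "images".
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 12))
X -= X.mean(axis=1, keepdims=True)          # subtract the mean image
U, eigvals = top_principal_components(X, k=5)
```

With real 256*256 images the Gram matrix stays around 200 by 200, while X @ X.T would be 65536 by 65536.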
After calculating the principal components, we subtract the mean from each image, project
it onto the PC space, and use the resulting coefficients as inputs to linear regression.
This learning method is called Principal Component Regression (PCR).
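A minimal sketch of PCR along these lines is given here. It is not the code used for the project. The function names are hypothetical, rows are images (the transpose of X in the text), an intercept column is added as an assumption, and an economy-size SVD stands in for the eigenvector trick for brevity.

```python
import numpy as np

def fit_pcr(X_train, Y_train, k):
    """X_train: (n_images, n_pixels); Y_train: (n_images, n_targets)."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean                          # subtract the mean image
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k].T                        # (n_pixels, k)
    coeffs = Xc @ components                     # PC coefficients, (n_images, k)
    design = np.hstack([np.ones((len(coeffs), 1)), coeffs])  # assumed intercept
    weights, *_ = np.linalg.lstsq(design, Y_train, rcond=None)
    return mean, components, weights

def predict_pcr(model, X):
    mean, components, weights = model
    coeffs = (X - mean) @ components             # project new images onto the PCs
    design = np.hstack([np.ones((len(coeffs), 1)), coeffs])
    return design @ weights

# Synthetic usage: 30 training "images" of 50 pixels, 2 regression targets.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 50))
Y_demo = rng.normal(size=(30, 2))
model_demo = fit_pcr(X_demo, Y_demo, k=5)
pred_demo = predict_pcr(model_demo, X_demo)      # shape (30, 2)
```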
Since we have 5 emotion categories, each emotion score is used as the intensity with
which the image belongs to that category. Also, to classify the 10 subjects in the
images, I added 10 indicator flags (1/0) that indicate the subject of the given input image.
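One way to assemble such a target matrix is sketched here, assuming subjects are numbered 0 to 9 and the 5 emotion scores come first. The function name and the column order are illustrative choices, not taken from the project.

```python
import numpy as np

def build_targets(emotion_scores, subject_ids, n_subjects=10):
    """Stack 5 emotion scores and 10 subject indicator flags per image."""
    emotion_scores = np.asarray(emotion_scores, dtype=float)    # (n_images, 5)
    subject_ids = np.asarray(subject_ids)                       # (n_images,)
    indicators = np.zeros((len(subject_ids), n_subjects))
    indicators[np.arange(len(subject_ids)), subject_ids] = 1.0  # 1/0 flags
    return np.hstack([emotion_scores, indicators])              # (n_images, 15)

# Two made-up images: one of subject 0, one of subject 9.
Y = build_targets([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]], [0, 9])
```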
4. Results
<Plot: Actor(subject) classification error>
<Plot: Emotion classification error>
In both plots the red line is the training error, the blue line is the test error, and the
x-axis is the number of PCs used for learning.
The data set was randomly divided into a training set and a test set of equal size, each
containing 90 input images. There are two types of rating for the emotion scores, which
differ in whether a fear score is used. In this project the rating without a fear score is used.
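A random equal-size split of this kind can be sketched as below. The function name and the fixed seed are assumptions added for reproducibility.

```python
import numpy as np

def split_half(n_images, seed=0):
    """Return disjoint train/test index arrays of equal size."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_images)      # random order of image indices
    half = n_images // 2                  # an odd leftover image is dropped
    return perm[:half], perm[half:2 * half]

train_idx, test_idx = split_half(180)     # 90 images in each set
```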
Actor(subject) classification
For the actor classification, the indicator with the maximum predicted value was picked
and the image was assigned to the corresponding class. As the result above shows, using
more than 20 PCs for the prediction gave 100% accuracy on both the training set and the
test set. From this result, we can infer that the distinguishing features of an individual
face can be represented well by a relatively small number of principal components. We can
check this by reconstructing an input image from the PCs, as below.
<Input Image>
<Reconstructed using 20 PCs>
<Reconstructed using 10 PCs>
<Reconstructed using 80 PCs>
The figure above compares the input image with its reconstructions. PCA was computed on
the training set, and the input image is taken from the test set. Because the PCs were
learned from the training set only, the input face is not reconstructed exactly even with
a large number of PCs (overlaying a reconstruction on the original image reveals that the
appearance has changed in its details). However, with more than 20 PCs the reconstructed
image looks fairly similar to the original input. Considering that there are 10 different
subjects in the data set, the fact that only 20 PCs are enough to reconstruct the overall
facial expression is rather impressive.
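The reconstruction step can be sketched as follows, assuming the columns of `components` are orthonormal PCs sorted by decreasing eigenvalue and images are flattened to vectors. The function name is hypothetical.

```python
import numpy as np

def reconstruct(image, mean, components, k):
    """Approximate a flattened image from its first k PC coefficients."""
    basis = components[:, :k]              # (n_pixels, k)
    coeffs = basis.T @ (image - mean)      # project the centered image
    return mean + basis @ coeffs           # map back to pixel space
```

When the image comes from outside the set used to compute the PCs, as in the figure above, the residual does not in general reach zero even when all available PCs are used.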
Emotion classification
For the emotion classification, the class with the maximum predicted score was picked
(the input ratings range over 1~6, with the 5 categories being HAP, SAD, SUR, ANG, DIS).
As the number of PCs used for the prediction grows, the accuracy keeps increasing. Because
we used the trick explained above to calculate the eigenvectors, the number of PCs is
limited by the number of input images. We can only speculate about what happens beyond
that limit: since the training error has almost converged to 0 and the test error is
decreasing ever more slowly, using more PCs might result in overfitting.
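For both tasks the predicted class is the one with the largest regression output. A minimal sketch of that decision rule and the resulting error rate is given here. The function name is hypothetical, and taking the highest-rated emotion as the true class is an assumption.

```python
import numpy as np

def classification_error(pred_scores, true_scores):
    """Fraction of images whose arg-max class differs from the true one.

    Both arrays have shape (n_images, n_classes); the true class of an
    image is taken to be the column with the largest true score.
    """
    pred = np.argmax(pred_scores, axis=1)
    true = np.argmax(true_scores, axis=1)
    return float(np.mean(pred != true))

# Three made-up images, two classes: only the third is misclassified.
err = classification_error(
    np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]),
    np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]),
)
```

Evaluating this error on the training and test sets for each number of PCs would give curves of the kind described in the plots.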
Though the accuracy of emotion classification is not as good as that of actor
classification, the result is reasonable considering that emotion categories are not as
clear-cut as subject identity. In fact, the scores rated by humans themselves show
considerable ambiguity. Some input images are rated almost the same in more than one
category, and some input images are intended to be ‘neutral’, which does not fall into
any of the emotion categories. Excluding those neutral images was expected to give a
better result, but it did not make a noticeable difference when tried.
To validate the PCA with respect to the emotion scores, an additional reconstruction was
done. After learning the PCA from the input images, PC coefficients associated with each
emotion category can be calculated by weighting the coefficient vector of each input
image by its emotion score and averaging.
<Average face>
<Average Happy face>
<Average Sad face>
<Average Surprised face>
<Average Angry face>
<Average disappointed face>
The result above shows mean faces reconstructed using the coefficients calculated with
emotion-score weights. To show the effect clearly, the coefficients are multiplied by a
factor of 3 before being added to the mean face. Each category clearly shows a result
that a human observer can confirm to be correct.
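The score-weighted averaging and the factor of 3 can be sketched as below. Normalizing the weights by the sum of the scores is an assumption (the report does not say how the average is normalized), and the function and parameter names are hypothetical.

```python
import numpy as np

def emotion_face(coeffs, scores, mean, components, gain=3.0):
    """Reconstruct an exaggerated mean face for one emotion category.

    coeffs:     (n_images, k) PC coefficients of the centered images
    scores:     (n_images,)   ratings for a single emotion
    mean:       (n_pixels,)   mean face
    components: (n_pixels, k) principal components
    """
    scores = np.asarray(scores, dtype=float)
    weights = scores / scores.sum()        # assumed normalization of the average
    avg_coeffs = weights @ coeffs          # score-weighted mean coefficient vector
    return mean + gain * (components @ avg_coeffs)

# Tiny made-up example: 2 images, 2 PCs, 3 pixels.
face = emotion_face(
    coeffs=np.array([[1.0, 0.0], [0.0, 1.0]]),
    scores=[1, 3],
    mean=np.zeros(3),
    components=np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]),
)
```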
5. Discussion
Under stable input conditions, facial recognition using PCR seems to work excellently.
However, such stability is hard to obtain in the real world because of varying conditions
such as lighting, pose, and occlusion. Developing a robust learning model that can handle
all those variations would be another challenge.
Classifying emotional expression from a face image also appears to be a solvable problem,
because human faces share some distinctive features for the same kind of emotion. In
this project, we could not determine the limits of the PCR learning method because of the
limited number of input images. Also, one could improve the prediction accuracy by
applying a more sophisticated classification method instead of simply picking the best
score among the emotion scores.