CIS520 Fall 2006 Final Project Report
Jiwoong Sim

1. Introduction

Human facial recognition is an interesting topic because its results are easily judged by human perception. It also has many practical uses in real-life applications, which motivates the intense study in this area. In this project, given a real human photograph as input, I recognize and classify the image by the person photographed. Furthermore, classifying an image into several emotional classes is attempted. In addition, the results of the learned structure are shown in the form of images.

2. Data Set

There are several public human facial data sets, such as the Yale Face Database (i) and the CMU Cohn-Kanade Facial Expression Database (ii). To satisfy the purpose of this project, the data set must meet two constraints:
1. The data should contain images of multiple people.
2. The data should contain images of various emotional expressions.

The Yale Face Database is composed of pictures taken from various poses. Using images with varying poses would certainly make the learning problem harder, so this data set was excluded. The Cohn-Kanade data set has a single pose, taken from a direct frontal view, and it contains various sequences of emotional expressions, so it matches the project purpose. However, the problem with the Cohn-Kanade data set is that the facial images are not well aligned. Because the variance of the face position within an image is large, careful pre-processing would be needed to obtain a good learning result. Since aligning a facial image to a reliable reference frame is itself a challenging problem that cannot be accomplished easily, this data set was not used in the experiment.

The data used in this project is the Japanese Female Facial Expression (JAFFE) Database (iii), which satisfies both constraints above. In addition, every image in the database is accompanied by emotion scores rated by a group of people. This score information can be applied directly in the learning process.

3. Methods

Each image in the JAFFE data set is gray-scale with 256*256 resolution. If we use each pixel as an input feature, there are 256*256 = 65536 features per input. This feature size is large enough to make the execution of every learning method quite slow. Also, there are only around 200 input images, so the number of features is far larger than the number of inputs, which easily leads to overfitting during learning. This is a common problem for applications that use images as input, and Principal Component Analysis (PCA) is a useful method for shrinking the feature space. However, computing the principal components directly requires an SVD over the full feature space, which cannot be done in practical running time. So instead of calculating all eigenvectors of X*X', we can use a simple trick here. Suppose v is an eigenvector of X'*X, satisfying

    X'*X*v = l*v

where l is the eigenvalue. Then X*v is an eigenvector of X*X', because

    X*X'*(X*v) = X*(X'*X*v) = X*(l*v) = l*(X*v).

Therefore, we can speed up the eigenvector computation for X*X' whenever we do not need every eigenvector of X*X'. Calculating v is much faster, since X'*X only has dimension (number of input images) by (number of input images), which is around 200 by 200 for this data set. Once we have an eigenvector v of X'*X, we can use X*v as an eigenvector of X*X'. After calculating the principal components, we subtract the mean from each image, project the images into PC space, and use the resulting coefficients in a linear regression. This learning method is called Principal Component Regression (PCR).
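As a concrete illustration, the following is a minimal sketch of the X'X eigen-trick and the PCR pipeline described above, written in Python with NumPy. The function and variable names (pca_small_trick, pcr_fit, etc.) are illustrative, not from the report, and the label matrix is assumed to stack the emotion scores and subject indicators row-wise.

import numpy as np

def pca_small_trick(X):
    """PCA via the X'X trick: X is (d, n) with d >> n, columns = mean-centered images."""
    # Eigen-decompose the small n-by-n matrix X'X instead of the huge d-by-d X*X'.
    gram = X.T @ X                        # (n, n), around 200x200 for this data set
    evals, V = np.linalg.eigh(gram)       # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]       # reorder to descending
    evals, V = evals[order], V[:, order]
    # Map each small eigenvector v to an eigenvector X*v of X*X', then normalize.
    U = X @ V                             # (d, n) columns are the principal components
    U /= np.linalg.norm(U, axis=0, keepdims=True)
    return U, evals

def pcr_fit(images, labels, k):
    """Principal Component Regression: project onto the top-k PCs, then least squares.

    images : (d, n) one image per column; labels : (c, n) scores/indicators per image.
    """
    mean = images.mean(axis=1, keepdims=True)
    X = images - mean                     # subtract the mean face from each image
    U, _ = pca_small_trick(X)
    Z = U[:, :k].T @ X                    # (k, n) PC coefficients of each image
    # Regression weights W mapping PC coefficients to the label scores.
    W, *_ = np.linalg.lstsq(Z.T, labels.T, rcond=None)
    return mean, U[:, :k], W

def pcr_predict(image, mean, Uk, W):
    z = Uk.T @ (image - mean.ravel())     # project a new image into PC space
    return z @ W                          # predicted emotion scores / subject indicators

This is only a sketch under the assumptions above; in particular, normalizing X*v works because ||X*v||^2 = v'*X'*X*v = l, so it is valid for the components with nonzero eigenvalues.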
Since we have 5 emotional categories, each emotion score is used as the intensity with which the image belongs to the given emotion category. Also, to classify the 10 subjects in the images, I added 10 indicator flags (1/0) that indicate the subject of a given input image.

4. Results

[Figure: actor (subject) classification error and emotion classification error. Red line: training error; blue line: test error; x-axis: number of PCs used for learning.]

The data set was randomly divided into a train set and a test set of equal size; each set contains 90 input images. There are two types of emotion-score ratings, which differ in whether a fear score is included. In this project, the rating without a fear score is used.

Actor (subject) classification

For actor classification, the indicator with the maximum value was picked, and the image was classified into the corresponding class. As the results above show, using more than 20 PCs for prediction gave 100% accuracy on both the train set and the test set. From this result, we can conclude that the distinguishing features of an individual face can be well represented by a relatively small number of principal components. We can confirm this by reconstructing an input image using the PCs, as below.

[Figure: an input image compared with reconstructions using 10, 20, and 80 PCs.]

The figure above compares an input image with its reconstructions. The PCA was computed on the train set, and the input image is from the test set. Because the PCs were trained on the train set, the input face is not precisely reconstructed even with a large number of PCs (when we examine the details of the reconstructed images carefully by overlapping them with the original, we can see that the actual appearance has changed). However, with more than 20 PCs the reconstructed image looks fairly similar to the original input. Considering that there are 10 different subjects in the data set, the fact that only 20 PCs are enough to reconstruct the overall facial appearance is rather impressive.

Emotion classification

For emotion classification, the class with the maximum score was picked (the input ratings range over 1~6, with the 5 categories representing HAP, SAD, SUR, ANG, and DIS). As the number of PCs used for prediction grows, accuracy keeps increasing. Because we used the trick explained above to calculate the eigenvectors, the number of PCs is limited by the number of input images. We can only speculate from the curves: the train-set error has almost converged to 0 and the slope of the test-set error is flattening, so using even more PCs might lead to overfitting. Though the accuracy of emotion classification is not as good as that of actor classification, the result is reasonable considering that emotion labels are inherently less clear-cut than subject labels. In fact, the human-rated scores themselves show considerable ambiguity: some input images are rated almost the same in more than one category, and some images are intended to be 'neutral', which does not fall into any emotion category. Excluding these neutral images was expected to improve the result, but it did not make a meaningful difference when tried.

To validate the PCA with respect to the emotional scores, an additional reconstruction was done. After learning the PCA from the input images, the PC coefficients associated with each emotional category were calculated by weighting each input image's coefficient vector by its emotional scores and averaging.
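To make the weighting step concrete, here is a small sketch of how the per-emotion coefficient vectors could be computed and turned back into faces. It reuses Z, Uk, and mean from the PCR sketch above; the function name emotion_faces, the (5, n) shape assumed for the score array, and the amplify parameter are all illustrative assumptions, with the factor of 3 taken from the report.

import numpy as np

def emotion_faces(Z, scores, Uk, mean, amplify=3.0):
    """Reconstruct an 'average face' per emotion from score-weighted PC coefficients.

    Z      : (k, n) PC coefficients of the n training images
    scores : (5, n) emotion ratings (HAP, SAD, SUR, ANG, DIS) for each image
    """
    faces = []
    for s in scores:                          # one emotion category at a time
        w = s / s.sum()                       # normalize the ratings into weights
        coeff = Z @ w                         # score-weighted average coefficient vector
        # Amplify the coefficients (the report uses a factor of 3) and add the mean face.
        face = mean.ravel() + Uk @ (amplify * coeff)
        faces.append(face.reshape(256, 256))  # back to image shape
    return faces

Because Z holds coefficients of mean-centered images, each weighted average is a deviation from the mean face, which is why amplifying it and adding it back to the mean exaggerates the expression.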
[Figure: average face, followed by the average Happy, Sad, Surprised, Angry, and Disgusted faces.]

The figure above shows the mean faces reconstructed from the coefficients computed with the emotional-score weighting. To show the effect clearly, each coefficient is multiplied by a factor of 3 before being added to the mean face. As we can clearly recognize, each category shows a result that can be confirmed to be correct by human perception.

5. Discussion

Under stable input conditions, facial recognition works excellently with PCR. However, such stability is hard to obtain in the real world, where the environment varies in lighting, pose, and occlusion. Developing a robust learning model that covers all of this variation would be another challenge. Classifying an emotional expression from a face image should be a solvable problem too, because human faces share some distinctive features across the same kind of emotion. In this project, we could not probe the limits of the PCR learning method due to the limited number of input images. One could also improve prediction accuracy by applying a more sophisticated categorization method instead of simply picking the best score among the emotion scores.

i http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html
ii http://www.ri.cmu.edu/projects/project_421.html
iii http://www.kasrl.org/jaffe.html