FACE RECOGNITION
IN HYPER FACIAL FEATURE SPACE
A Project
Presented to the faculty of the Department of Computer Science
California State University, Sacramento
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
Computer Science
by
Amer Harb
FALL
2013
© 2013
Amer Harb
ALL RIGHTS RESERVED
FACE RECOGNITION
IN HYPER FACIAL FEATURE SPACE
A Project
by
Amer Harb
Approved by:
__________________________________, Committee Chair
Scott Gordon
__________________________________, Second Reader
Behnam Arad
____________________________
Date
Student: Amer Harb
I certify that this student has met the requirements for format contained in the
University format manual, and that this project is suitable for shelving in the Library
and credit is to be awarded for the project.
__________________________, Graduate Coordinator
___________________
Nikrouz Faroughi
Date
Department of Computer Science
Abstract
of
FACE RECOGNITION
IN HYPER FACIAL FEATURE SPACE
by
Amer Harb
This project achieved a proof of concept in which the dimensions of a hyperspace, denoted
"face hyperspace", were built out of facial features. Each dimension in that space
represents a facial feature or sub-feature. A vector (point) in that space can represent a
face. The vector is, to some degree, tolerant of lighting and contrast variations.
For every feature, classifiers were built that process feature images to produce a number
representing a feature class. To recognize a face image, it is passed through a facial
feature extraction routine that identifies and extracts each facial feature in the image
separately. Next, each feature image is passed into a classifier, which in turn generates
group numbers representing the facial features. Finally, a vector representing the image is
built by concatenating the classifier outputs of all features. The vector is compared,
using Euclidean distance, to the vectors in an image database to find a matching vector.
_______________________, Committee Chair
Scott Gordon
_______________________
Date
TABLE OF CONTENTS
Chapter
Page
1 INTRODUCTION ........................................................................................................... 1
1.1 Motivation ........................................................................................................ 2
1.2 Limitations ........................................................................................................ 2
2 BACKGROUND ............................................................................................................. 3
2.1 Active Shape Model (ASM) ............................................................................. 3
2.2 Artificial Neural Network (ANN) .................................................................... 4
3 ACTIVE SHAPE MODEL (ASM) ................................................................................. 7
3.1 Shape Model ..................................................................................................... 7
3.2 Shape ................................................................................................................ 8
3.3 Aligning Shapes................................................................................................ 9
3.4 Procrustes Analysis .......................................................................................... 9
3.5 Align Shapes ................................................................................................... 10
3.6 Scaling shapes: ............................................................................................... 12
3.7 Rotate Shapes ................................................................................................. 14
3.8 Principal Components Analysis ..................................................................... 16
3.9 Profile Model .................................................................................................. 18
3.10 New Image Profile ........................................................................................ 21
4 EXTRACT FEATURE IMAGES ................................................................................. 23
4.1 Feature Attributes ........................................................................................... 25
4.2 Build Neural Networks Training Data ........................................................... 27
4.2.1 Scale Images ................................................................................... 27
4.2.2 Extract Input and Output Training Data ......................................... 27
4.3 Neural Networks ............................................................................................. 29
4.4 Neural Networks Topology ............................................................................ 29
4.5 Classifiers Collection...................................................................................... 30
4.6 Recognizing a face ......................................................................................... 31
4.7 Face Detection ................................................................................................ 33
4.8 Initialize Face Landmarks .............................................................................. 33
4.9 Locate Facial Features .................................................................................... 34
5 EXTRACT FEATURE IMAGES ..................................................................... ………37
5.1 Build Face Vector ........................................................................................... 37
5.2 Face Database ................................................................................................. 37
5.3 Face Vector Search ......................................................................................... 38
6 RESULTS ...................................................................................................................... 40
6.1 Neural Network Training Results ................................................................... 40
6.2 Generalization ................................................................................................. 41
6.3 Lighting and contrast tolerance ...................................................................... 43
7 CONCLUSION ............................................................................................................. 46
8 FUTURE WORK .......................................................................................................... 47
References………………...………………………………………………………………48
LIST OF FIGURES
Figures
Page
1: Neural network example showing a possible structure of a neural network having a
fully connected input layer, one hidden layer and an output layer .................................... 5
2: Face landmarks, from the FEI database ................................................................. 8
3: Triangular shapes to demonstrate shape alignments .................................................... 10
4: Shapes shifted to the same location ..................................................................... 11
5: Shapes are scaled to the same size ............................................................................... 14
6: Shapes rotated to the same orientation ......................................................................... 16
7: PCA algorithm .............................................................................................................. 17
8: Norm from a point to the image boundary ................................................................... 20
9: Points on the norm ........................................................................................................ 20
10: Profile vectors of a point and 3 adjacent neighbors on each side ............................... 21
11: Results of applying the feature locator on a face image ............................................. 22
12: Extract facial features sequence ................................................................................. 23
13: Inclination of upper eye lid ........................................................................................ 24
14: Feature image file name format .................................................................................. 25
15: Extract input and desired output from feature images to train neural networks ........ 28
16: Face image input training data sample ....................................................................... 28
17: Methods of reading a face image in the developed system ........................................... 31
18: Loading the models in the developed system ............................................................. 32
19: Face look up ............................................................................................................... 32
20: Face detection to locate and measure the face ........................................................... 34
21: Profile vectors of a point and 3 adjacent neighbors on each side ............................... 35
22: Face recognition processes ......................................................................................... 39
23: Original ....................................................................................................................... 44
24: Darkened by adding 30 to all pixel intensities ........................................................... 44
25: Darkened by adding 60 to all pixel intensities ........................................................... 44
26: Darkened by adding 90 to all pixel intensities ........................................................... 44
27: Blurred ........................................................................................................................ 45
28: Sharpened by laplace filter ......................................................................................... 45
Chapter 1
INTRODUCTION
Humans easily perform face recognition to identify each other. As computers and
computer science have advanced, researchers have gradually developed an interest in
using computers for face recognition. Current technology lacks accurate performance,
especially if no constraints are imposed (e.g., pose, expression, lighting, occlusions, age).
Humans, on the other hand, perform badly at recognizing animal faces. Their good
performance recognizing human faces is likely due to extensive training on human faces.
This leads us to expect that computers, eventually, using the right technology and training
method, will learn how to recognize human faces.
Some industries showed early interest in face recognition, and many more developed
interest as the technology gradually advanced. Examples of these industries are law
enforcement, security, robotics, surveillance and auto identification.
1.1 Motivation
Existing systems impose constraints on the images, such as requiring a standard frontal pose
with no expressions. Even so, the performance of these systems is limited by variation in
lighting, shade and contrast. Most face recognition algorithms end up performing a kind of
hyperspace search. Some of these algorithms implicitly create the dimensions of these
spaces while others explicitly build them out of correlated or statistical data. Surprisingly,
there does not appear to be a system that uses the facial features themselves as the bases
(dimensions) of its hyper face space. It is expected that using the facial features as the bases
of the face space will improve performance and tolerate variation in lighting and
contrast in face images, because the dimensions of the space are built out of
information about the features rather than the intensity variation of the image pixels.
1.2 Limitations
This project is limited to gray level images ([0 – 255] level range). Moreover, images
should be in standard frontal pose with no expressions and no occlusions.
Chapter 2
BACKGROUND
Work on face recognition started as early as the 1960s, but the first fully automated
face recognition system was developed by Takeo Kanade in 1973 [1]. In the 1980s, there
were a number of prominent approaches that led to successive studies by many researchers.
Among these are the works by Sirovich and Kirby in 1986. Their methods were based on
Principal Component Analysis (PCA), which led to the work of Matthew Turk and
Alex Pentland in 1992 [2], where they used eigenfaces for recognition. Other relevant
methods used in many studies are deformable templates, Linear Discriminant
Analysis (LDA), Independent Component Analysis (ICA), statistical shape and
appearance models, local filters such as Gabor jets, and the design of the AdaBoost
cascade classifier architecture for face detection [1].
2.1 Active Shape Model (ASM)
ASM is a statistical shape model that is "iteratively deform[ed] to fit to an example of the
object in a new image"; it was developed by Tim Cootes and Chris Taylor in 1995 [Wikipedia].
ASM has two models that are applied successively in multiple iterations to best fit the
object shape. The first is a model of points, related to each other, that represents a shape.
The second is a separate intensity profile of the neighboring pixels around every point in the
first model. The models are constructed from training images of the object to be modeled.
Numbered points are placed on the landmarks and outline that represent the shape of the
object in all training images. The points are placed at the same corresponding location in
each image. For every marked point on every image, a profile of the neighboring pixels
around that point is gathered (this implementation uses the points along the tangent line to
the object shape). To search for the object in a new image, the points are first assigned
initial positions, and then the two models are applied iteratively as follows:
• For every point, search the neighboring pixel intensities and update the point
  position to the location that best matches the model point profile (applies to each
  point separately).
• Apply the shape model, the first model mentioned above, to the new point
  positions and modify them if required to conform to the whole shape (applies to
  all the points together).
For more information on ASM refer to [Cootes] and [Dotterer].
2.2 Artificial Neural Network (ANN)
Artificial Neural Networks attempt to mimic biological neurons. They are built out of
mathematical accumulator nodes arranged in columns of fully interconnected layers. Each
node processes multiple weighted inputs to produce one output. The middle layers are
stacked between two outer layers: the layer on the left is the input layer and the one on the
right is the output layer; the layers in between are called hidden layers. Most
ANNs have one or two hidden layers. ANNs are usually used as a nonlinear mapping
between input values and output values. They can be used to model complex
relationships between inputs and outputs or to find patterns in data.
Figure 1: Neural network example showing a possible structure of a neural network having a fully
connected input layer, one hidden layer and an output layer.
Each node in a layer is fully connected to the nodes in the next layer by weighted
connections. The output of a node is the summation of the previous layer's node values
multiplied by their corresponding weights.
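As an illustration of this weighted-sum computation, the following minimal Java sketch
(not taken from the project code; the layer sizes and weight values are made up) computes
the outputs of one layer from the previous layer's values:

// Minimal sketch of the weighted-sum output described above. Each output node
// sums the previous layer's node values multiplied by the corresponding
// connection weights. Real networks, including those built with Encog,
// typically pass this sum through an activation function as well.
public class LayerForward {

    // weights[j][i] is the weight of the connection from input node i to output node j
    static double[] forward(double[] inputs, double[][] weights) {
        double[] outputs = new double[weights.length];
        for (int j = 0; j < weights.length; j++) {
            double sum = 0.0;
            for (int i = 0; i < inputs.length; i++) {
                sum += inputs[i] * weights[j][i];
            }
            outputs[j] = sum; // an activation function would usually be applied here
        }
        return outputs;
    }

    public static void main(String[] args) {
        double[] inputs = {0.2, 0.7, 0.5};                        // example input values
        double[][] weights = {{0.1, 0.4, -0.3}, {0.8, -0.2, 0.5}}; // two output nodes
        double[] out = forward(inputs, weights);
        System.out.println(out[0] + ", " + out[1]);
    }
}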
For a neural network to learn a certain function or a pattern in data, it needs to be trained.
A number of training images and their corresponding desired outputs are presented to the
neural network to be trained. In machine learning, this type of learning is called
supervised learning. The number of training images is proportional to the number of
classes and the input size. It ranges, usually, from a few hundred to a hundred thousand.
The training is done by adjusting the weights of the node connections. This is
accomplished over multiple iterations by feeding the input to the network, evaluating the
gradient of the error between the network output and the desired output, and descending
along that gradient, in a process called backpropagation.
Chapter 3
ACTIVE SHAPE MODEL (ASM)
ASM is used to search for a certain object in images. The object is represented by a set of
points placed on the landmarks and edges of the object. The set of points portrays the
object and is referred to as a "shape". ASM uses two models iteratively to perform the
search. One model, called the Shape Model, governs the total shape of the points and their
relative positions to each other, while the other model, called the Profile Model, is a point-
specific model that captures the intensity profile of the neighboring pixels around every point
in the Shape Model. The models are constructed from training images of the object to be
modeled. For more information on ASM refer to [Cootes] and [Dotterer].
3.1 Shape Model
The Shape Model is composed of a group of points representing an object shape. The
objective of the model is to account for the normal and natural variations of shapes of an
object. Hence, the group of points should be located at landmarks or edges of the object
in such a way that they implicitly depict the object. To accommodate different shape
variations we use multiple shapes of the object to build the Model. In this case, our object
is the human face. Therefore, we use multiple images of different human faces to build
the model.
3.2 Shape
The group of points on one particular object (face) is called a shape. In general, a shape
can be two- or three-dimensional. In this project, two-dimensional shapes are used.
Therefore, each point in the shape has x and y coordinates. The shape can be
represented as a vector by concatenating all point coordinates. For instance, if we have
three points in a shape, A, B and C, the shape vector would be (xA, yA, xB, yB, xC, yC).
Other implementations of ASM concatenate all x values of all points followed by the y
values. This project uses the former.
The points are manually placed on the training face images. Each point has a number and
represents a particular landmark or information on a face; therefore, it should be placed
accurately on the same corresponding spot on each image.
Figure 2: Face landmarks, from the FEI database
3.3 Aligning Shapes
The locations of the shape points vary from image to image. To capture the range of
variation a point location may have on different faces, we align the shapes that were
manually placed on all training images, and then determine the range of variation for every
point in the shape.
Two shapes are considered aligned if a specific distance between them is less than a
small threshold. This distance is the Frobenius norm:

Frobenius Norm = √( Σ_{i=1..n} (shapeA_i − shapeB_i)² )

To align all shapes we need to perform a number of operations on them: translation,
scaling and rotation. One process that performs all three operations is called Procrustes
Analysis.
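A minimal Java sketch of this distance computation is shown below. The shape vectors are
stored as flat (x1, y1, x2, y2, ...) arrays, as in section 3.2; the class name and the values
used are illustrative and not part of the project code.

// Sketch of the Frobenius-norm distance between two shapes stored as flat
// vectors (x1, y1, x2, y2, ...).
public class ShapeDistance {

    static double frobenius(double[] shapeA, double[] shapeB) {
        double sum = 0.0;
        for (int i = 0; i < shapeA.length; i++) {
            double d = shapeA[i] - shapeB[i];
            sum += d * d;          // squared difference of corresponding elements
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] a = {-0.71, 0.53, 0.71, 0.27, 0.0, -0.8};   // illustrative aligned shapes
        double[] b = {-0.94, 0.34, 0.64, 0.30, 0.3, -0.64};
        System.out.println("Frobenius distance = " + frobenius(a, b));
    }
}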
3.4 Procrustes Analysis
The Procrustes process, in general, is a process of fitting things together by stretching or
cutting them. The name Procrustes refers to a rogue smith and bandit from Greek
mythology who had an iron bed: he invited travelers to sleep in this bed, and then made
them fit it by either stretching their limbs or cutting them off.
To simplify the explanation of the Procrustes process, simple shapes will be used to
demonstrate the process step by step. Consider the two triangles shown below in figure 3:
Figure 3: Triangular Shapes to demonstrate shape alignments
3.5 Align Shapes
The objective of the whole process is to align these two shapes so that they end up as
closely as possible with the same orientation, size and location. Thus, if exactly similar
shapes were aligned, they would end up on top of each other even if they started with
different sizes, orientations and locations. Let us start the process for the two triangles
shown above.
All objects are centered on their center of mass. To bring an object's center of mass to the
origin (0, 0), we subtract the coordinate means from each corresponding point coordinate.
This process is called translation. The coordinate means are computed by summing the
corresponding coordinates of the shape points and dividing the sum by the number of
points. The example below demonstrates the steps:
Compute the triangles' Xmean (µx) and Ymean (µy):

Green = [(12, 24), (28, 22), (20, 14)]
Red   = [(8, 6), (10, 10), (12, 8)]

Green: µx = (12 + 28 + 20) / 3 = 20,  µy = (24 + 22 + 14) / 3 = 20
Red:   µx = (8 + 10 + 12) / 3 = 10,   µy = (6 + 10 + 8) / 3 = 8

Shift each shape's center of mass to the origin:  x_i = x_i − µx  and  y_i = y_i − µy

Green: (12−20, 24−20) = (−8, 4),  (28−20, 22−20) = (8, 2),  (20−20, 14−20) = (0, −6)
Red:   (8−10, 6−8) = (−2, −2),    (10−10, 10−8) = (0, 2),    (12−10, 8−8) = (2, 0)
The two translated triangles look as shown in figure 4 below:
Figure 4: Shapes shifted to the same location
3.6 Scaling shapes:
Bringing all shapes to the same size can be done in multiple ways, the easiest being to
scale all shapes to unit size. This is achieved by dividing each point coordinate by the
norm of the corresponding coordinate vector of that shape. This process is called
normalization. It is computed as follows:
||X|| = √( Σ_{i=1..n} (x_i)² ),   x_i = x_i / ||X||
||Y|| = √( Σ_{i=1..n} (y_i)² ),   y_i = y_i / ||Y||

Translated green points = (−8, 4), (8, 2), (0, −6)
Translated red points   = (−2, −2), (0, 2), (2, 0)

||Xgreen|| = √((−8)² + 8² + 0²) = √128 = 11.314
||Ygreen|| = √(4² + 2² + (−6)²) = √56 = 7.5
||Xred||   = √((−2)² + 0² + 2²) = √8 = 2.83
||Yred||   = √((−2)² + 2² + 0²) = √8 = 2.83

Scaled Xgreen: −8/11.314 = −0.71,   8/11.314 = 0.71,   0/11.314 = 0
Scaled Ygreen:  4/7.5 = 0.53,       2/7.5 = 0.27,      −6/7.5 = −0.8
Scaled Xred:   −2/2.83 = −0.71,     0/2.83 = 0,         2/2.83 = 0.71
Scaled Yred:   −2/2.83 = −0.71,     2/2.83 = 0.71,      0/2.83 = 0

Scaled translated green points = (−0.71, 0.53), (0.71, 0.27), (0, −0.8)
Scaled translated red points   = (−0.71, −0.71), (0, 0.71), (0.71, 0)

The two scaled and translated triangles look as shown in figure 5 below:
Figure 5: shapes are scaled to the same size
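The translation (section 3.5) and scaling (section 3.6) steps can be sketched in a few lines
of Java. This is an illustrative sketch rather than the project's implementation; the method
operates in place on an array of {x, y} points, and the example uses the red triangle from
above.

// Sketch of the translation and scaling steps: shift a shape's center of mass
// to the origin, then normalize the x and y coordinate vectors to unit length.
public class AlignSteps {

    static void centerAndScale(double[][] points) {
        double meanX = 0, meanY = 0;
        for (double[] p : points) { meanX += p[0]; meanY += p[1]; }
        meanX /= points.length;
        meanY /= points.length;

        // translation: subtract the coordinate means
        for (double[] p : points) { p[0] -= meanX; p[1] -= meanY; }

        // scaling: divide each coordinate by the norm of its coordinate vector
        double normX = 0, normY = 0;
        for (double[] p : points) { normX += p[0] * p[0]; normY += p[1] * p[1]; }
        normX = Math.sqrt(normX);
        normY = Math.sqrt(normY);
        for (double[] p : points) { p[0] /= normX; p[1] /= normY; }
    }

    public static void main(String[] args) {
        double[][] red = {{8, 6}, {10, 10}, {12, 8}};   // red triangle from the example
        centerAndScale(red);
        for (double[] p : red) System.out.println(p[0] + ", " + p[1]);
    }
}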
3.7 Rotate Shapes
Next, choose one shape and rotate all other shapes toward it until each is within the
smallest possible Frobenius norm distance, or within an acceptable threshold. To perform
the rotation, we take the singular value decomposition (SVD) of the product of the target
shape and the transpose of the shape to be rotated:

Rot shape = shape1 · shapei^T
U Σ V^T = SVD(Rot shape)
Rot shape=(-0.71 0.53 0.71 0.27 0
first Rot shape =
0.5 -0.95
1
-0.19
0.8) * (−0.71 − 0.71 0 0.71 0.71 0)𝑇
15
15th Rot shape =
1.1 -0.46
-0.19
0.85
The rotation matrix rotates shapei around its center of mass (the origin) toward shape1is:
Rot matrix = V U t
The result of rotation is
Rot shapei = shapei * Rot matrix
Now we compute the Frobenius Norm distance to check if it is less than our
predetermined threshold; if yes then aligning is done otherwise iterate the process of
rotation using the last Rot shapei to rotate toward shape1 and continue this iteration till the
Frobenius Norm distance is less than threshold.
Frobenius = √∑n1(shape1 − Rot shapei )2
After 15 iterations using a threshold for Frobenius Norm = 0.5; its value was 0.314
Rotated Scaled Translated Red
As shown in figure 6 below:
points = (-0.94, 0.34) (0.64, 0.3) (0.3, -0.64)
Figure 6: Shapes rotated to the same orientation
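The rotation step can be sketched as follows using the Apache Commons Math library (an
assumed library choice; the project may use a different linear-algebra package). In this
sketch, shapes are stored with one point per row (an n x 2 matrix), so the SVD is taken of
moving^T * target; the point values come from the triangle example.

import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.SingularValueDecomposition;

// Sketch of the SVD-based rotation alignment described above. Shapes are
// n x 2 matrices (one row per point), already centered and scaled.
public class RotateShape {

    static RealMatrix rotateToward(RealMatrix moving, RealMatrix target) {
        RealMatrix m = moving.transpose().multiply(target);            // 2 x 2
        SingularValueDecomposition svd = new SingularValueDecomposition(m);
        // R = U * V^T; a strict rotation would additionally check that det(R) = +1
        RealMatrix r = svd.getU().multiply(svd.getVT());
        return moving.multiply(r);                                      // rotated shape
    }

    public static void main(String[] args) {
        RealMatrix green = MatrixUtils.createRealMatrix(new double[][] {
                {-0.71, 0.53}, {0.71, 0.27}, {0.0, -0.8}});
        RealMatrix red = MatrixUtils.createRealMatrix(new double[][] {
                {-0.71, -0.71}, {0.0, 0.71}, {0.71, 0.0}});
        System.out.println(rotateToward(red, green));
    }
}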
3.8 Principal Components Analysis
So far, we have aligned all the training shapes: they have almost the same orientation and
size, and their centers of mass are at the same location. The objective now is to capture all
the variation in their shapes. The variation can be seen in, and harvested from, the different
location values each point in a shape has relative to its corresponding points in the other
shapes. If we build a matrix of all shape vectors stacked on top of each other, each
column of the matrix holds the same point coordinate across all shapes.
Aligned green points = (−0.71, 0.53), (0.71, 0.27), (0, −0.8)
Aligned red points   = (−0.94, 0.34), (0.64, 0.3), (0.3, −0.64)

Shapes Matrix = [ −0.71   0.53   0.71   0.27   0.0   −0.8  ]
                [ −0.94   0.34   0.64   0.30   0.3   −0.64 ]
Notice that column 3 contains the x coordinates of the second point in each shape. To satisfy
our objective, we need to find the range of variation in each column and how much each
column affects the other columns. Applying Principal Component Analysis (PCA) to this
matrix gives us this information. To apply PCA to the matrix above, we do the following,
as shown in figure 7:
1. Take the mean of every column.
2. Subtract the corresponding mean from each value in the matrix:
   Sub Matrix = Matrix − Mean
3. Compute the covariance matrix:
   Cov Matrix = Sub Matrix · Sub Matrix transposed
4. Find and sort the eigenvalues of the covariance matrix.
5. Choose the eigenvectors corresponding to the significant eigenvalues.
6. Express the data in terms of the selected eigenvectors:
   Data in new dimensions = eigenvectors transposed · Sub Matrix
7. New shapes can now be generated using the selected eigenvectors:
   new Shape = Mean + eigenvectors · b
   b is a vector of coefficients, one per eigenvector; each scales the contribution of its
   corresponding eigenvector to the new shape:
   b = eigenvectors transposed · (new Shape − Mean)
Figure 7: PCA algorithm
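A compact sketch of these PCA steps is shown below, again using Apache Commons Math
as an assumed library choice. Here the covariance is computed over the coordinate columns
of the mean-subtracted shape matrix, and the two example rows come from the triangle
example; the coefficient value b0 is illustrative.

import org.apache.commons.math3.linear.EigenDecomposition;
import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.RealVector;

// Sketch of the PCA steps listed in figure 7. Rows of 'shapes' are aligned
// shape vectors (x1, y1, x2, y2, ...); columns are individual coordinates.
public class ShapePca {

    public static void main(String[] args) {
        double[][] shapes = {
            {-0.71, 0.53, 0.71, 0.27, 0.0, -0.8},
            {-0.94, 0.34, 0.64, 0.30, 0.3, -0.64}
        };
        RealMatrix data = MatrixUtils.createRealMatrix(shapes);
        int rows = data.getRowDimension(), cols = data.getColumnDimension();

        // steps 1-2: column means, then subtract them from the data
        double[] mean = new double[cols];
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) mean[c] += data.getEntry(r, c);
            mean[c] /= rows;
        }
        RealMatrix centered = data.copy();
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                centered.addToEntry(r, c, -mean[c]);

        // step 3: covariance matrix of the centered columns
        RealMatrix cov = centered.transpose().multiply(centered)
                                 .scalarMultiply(1.0 / (rows - 1));

        // steps 4-5: eigenvalues and eigenvectors; keep the most significant one here
        EigenDecomposition eig = new EigenDecomposition(cov);
        double[] eigenvalues = eig.getRealEigenvalues();
        int best = 0;                                   // index of the largest eigenvalue
        for (int i = 1; i < eigenvalues.length; i++)
            if (eigenvalues[i] > eigenvalues[best]) best = i;
        RealVector axis = eig.getEigenvector(best);

        // steps 6-7: a new shape = mean + eigenvector * b (single coefficient here)
        double b0 = 0.1;
        RealVector newShape = MatrixUtils.createRealVector(mean)
                                         .add(axis.mapMultiply(b0));
        System.out.println("largest eigenvalue = " + eigenvalues[best]);
        System.out.println("new shape = " + newShape);
    }
}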
To make sure the new shape is similar to the training shapes and deforms within
acceptable limits (in other words, that the points on the new shape vary within the range
of variation of their corresponding points in the training images), we enforce a limit on the
values of the b vector. Cootes et al. (2004; 1995) suggest that each deformation value b_i
should be bounded by the interval −3√λ_i to +3√λ_i, where λ_i is the eigenvalue that
corresponds to the selected eigenvector i.
PCA accomplishes the job we required. Subtracting the corresponding mean from each
column reveals the variation in that point coordinate. The covariance matrix captures the
effect of the variation of one point on the others. Choosing the significant eigenvectors
lets us create new bases (dimensions) for the shape space that have fewer dimensions and
reduce the noise that originally existed in the data along the insignificant eigenvectors.
The new bases constitute the Shape Model of the training images.
The exact criterion for choosing the optimum number of eigenvectors is application
dependent. After sorting the eigenvalues, if there is a clear cut between two values where
the difference is relatively big, that is an obvious place to truncate; otherwise, empirical
testing determines the right number of eigenvectors to keep.
3.9 Profile Model
To decide the initial locations of the points on a new face, we use a face detector to locate
the face in the image. The face detector used is the Haar classifier cascade available in
OpenCV. It returns the smallest square box that fits the face. Using the width of the
returned box, we calculate the ratio of the new face width to the shape model width. We
use this ratio to scale the mean points of the model to fit the new face measurements. The
resulting points are the initial locations of the points on the new face.
Due to differences in face shapes, the spatial ratios and locations of the landmarks relative
to each other vary from one face to another. Therefore, the calculated initial points are not
at the exact locations of the face landmarks we seek, but they are within the vicinity of
those landmarks. We need certain information for each point to help locate it on any face,
and we gather this information from the training data. For each point, we build a profile of
the intensity variation of the neighbors around that point. This can be implemented in
multiple ways; in this project, a simple implementation was used that builds a normalized
intensity gradient along the line through the point orthogonal to the shape boundary. The
gradient here expresses the difference in intensity from one pixel to the next: the value at
position n equals the intensity at n minus the intensity at (n−1). After obtaining all the
vector values, we normalize the vector by dividing each element by the sum of the
absolute values of all elements. Using the normalized intensity gradient compensates for
variation in image lighting and contrast.
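The construction of one normalized gradient profile can be sketched as follows (plain Java;
the sampled intensity values are illustrative and not taken from the project data):

// Sketch of building one normalized gradient profile: take the raw pixel
// intensities sampled along the line through a landmark, difference adjacent
// samples, and divide by the sum of absolute values.
public class GradientProfile {

    // samples[0] is the extra pixel before the profile; samples[1..n] are the
    // profile pixels, so the result has samples.length - 1 elements (7 here).
    static double[] normalizedGradient(double[] samples) {
        double[] grad = new double[samples.length - 1];
        double sumAbs = 0.0;
        for (int i = 1; i < samples.length; i++) {
            grad[i - 1] = samples[i] - samples[i - 1];   // intensity at n minus intensity at n-1
            sumAbs += Math.abs(grad[i - 1]);
        }
        if (sumAbs > 0) {
            for (int i = 0; i < grad.length; i++) grad[i] /= sumAbs;
        }
        return grad;
    }

    public static void main(String[] args) {
        double[] samples = {120, 118, 125, 140, 180, 190, 185, 182}; // 1 extra + 7 profile pixels
        for (double g : normalizedGradient(samples)) System.out.print(g + " ");
    }
}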
Around every point along the orthogonal line to the image boundary, we choose some
number of points. In this implementation, we choose three on each side, plus the point
itself, which makes the number of elements in the vector equal seven, as shown in the
figure below.
To find the orthogonal line to the image boundary (or the line the landmark represents) at
a certain point (C), we use two other points (A and B) around point C. We find the
perpendicular line from point C to the line connecting A and B as described in figure 8.
Figure 8: Norm from a point to the image boundary
So far, for each point in all training images, we have built a vector of seven elements as
shown in figure 9. Next, we compute the mean µ of each element of the vector across all
images. As always, when we have multiple means related to one object (a point), we also
compute the covariance matrix cm to capture the interdependencies of these elements.
The covariance matrix for each point is 7 × 7. Since we have 46 points in a shape, we
have 46 vectors and 46 × 7 elements, thus 46 × 7 means and 46 covariance matrices.
Figure 9: points on the norm
3.10 New Image Profile
When searching a new image to find the right locations for all landmarks, we start with the
initial locations discussed above. For every point, we build multiple profiles, in the same
way we did for the training images, along the line through the point orthogonal to the
image boundary. The difference here is that we build multiple profiles: one along the point
itself and others along the neighbors on each side of the point; this implementation builds
three on each side. Along every profile, we use the pixel immediately before the profile's
first pixel to calculate the intensity gradient at the first element of the vector, as shown in
figure 10.
Figure 10: profile vectors of a point and 3 adjacent neighbors on each side
We compare these profiles P_i to the mean profile (the model profile) built for the
corresponding point at training time. We calculate the Mahalanobis distance between
these profiles and the model profile, where:

Mahalanobis distance = (P_i − µ)^T · cm^-1 · (P_i − µ)
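A sketch of this distance computation is shown below, using Apache Commons Math for
the matrix inverse (an assumed library choice). The report defines the distance without a
final square root, and the sketch follows that definition; the 2-element vectors and
covariance matrix are toy values, whereas the project uses 7-element profiles and 7 × 7
covariance matrices per landmark.

import org.apache.commons.math3.linear.LUDecomposition;
import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.RealVector;

// Sketch of the Mahalanobis distance (P - mu)^T * cm^-1 * (P - mu).
public class MahalanobisDistance {

    static double mahalanobis(RealVector p, RealVector mean, RealMatrix cm) {
        RealMatrix cmInverse = new LUDecomposition(cm).getSolver().getInverse();
        RealVector diff = p.subtract(mean);
        return diff.dotProduct(cmInverse.operate(diff));   // no square root, as in the report
    }

    public static void main(String[] args) {
        RealVector p    = MatrixUtils.createRealVector(new double[] {0.3, -0.1});
        RealVector mean = MatrixUtils.createRealVector(new double[] {0.25, 0.0});
        RealMatrix cm   = MatrixUtils.createRealMatrix(new double[][] {{0.04, 0.01},
                                                                       {0.01, 0.09}});
        System.out.println("Mahalanobis distance = " + mahalanobis(p, mean, cm));
    }
}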
We choose the profile with the smallest distance and set the middle point of that profile as
the new suggested point for the landmark under consideration.
We calculate new suggested locations for all landmarks, and then apply the Shape Model
to constrain the whole shape to a valid shape consistent with the model, which may
modify some of the suggested landmark locations. We continue applying these two
models in iterations until the maximum iteration count is reached or the number of points
that moved by more than a shift threshold falls below a point-count threshold.
Figure 11: results of applying the feature locator on a face image
Chapter 4
EXTRACT FEATURE IMAGES
After building the ASM models, we use them to extract facial features from the training
face images. We read the training images one at a time and place the extracted features
into their dedicated folders. For each feature, we use the coordinates of specific landmarks,
those on or near the boundary of the feature, to find the minimum and maximum x and y
values of the subarea of the face image from which the feature image is to be extracted;
figure 12 depicts this process.
Figure 12: Extract facial features sequence (read training images one at a time; for each
image, extract its facial features; save each feature image in its corresponding folder)
Feature images need to be compared and classified into numbers that order the features
according to their similarities. Using a single number to represent a feature will not work,
because every feature has multiple attributes. For a given feature, different faces may
resemble each other in some attributes while differing in others. To accurately represent a
feature, we should represent nearly all of its attributes by numbers that characterize them
and express their information in some mathematical representation. Once we calculate all
the numbers symbolizing the attributes of a feature, we concatenate them together. In
other words, we need to represent each feature by a vector rather than a single number.
The attributes are feature characteristics such as height, width, inclination, protrusion, etc.
A digit, or a group of concatenated digits, represents each attribute. All digits range from
1 (low) to 5 (high). The inclination measures the average skew angle of a feature's shape
and is represented as follows: 1 represents 60 degrees, 2 represents 30 degrees,
3 represents zero degrees, 4 represents −30 degrees and 5 represents −60 degrees.
Inclination is coded from 1 (/) through 5 (\); width and thickness are coded on the same
1-to-5 scale.
Figure 13: inclination of upper eye lid
Comparing and classifying the features is a manual process. A person goes to the folder
containing one feature from all training images, opens each image separately, and decides
on the numbers that represent the attributes chosen for that feature. Finally, this attribute
vector is appended to the end of the image file name. An example is shown in
figure 14:
$178a-22344_53332_543423.jpg
The original image name is shown in red while the appended class string is shown in green
Figure 14: feature image file name format
4.1 Feature Attributes
All feature attributes considered, and their digit descriptions, are listed below. Where an
attribute is broken into adjacent segments, the number of segments is given in parentheses;
each segment contributes one digit.

Eyes (16 digits): upper lid inclination, lower lid inclination, width, height, eye inclination,
protrudes, eye bags, smoothness (9 segments).

Eyebrows (13 digits): inclination, width, density, length, smoothness (9 segments).

Left half lips (32 digits): upper lip inclination, upper lip width, lower lip inclination,
lower lip width, lips meet inclination, length, smoothness (26 segments).

Nose (12 digits): width, length, nose tip size, nose tip rise, septum width, septum height,
septum protrudes, nostril size, nostril base, smoothness (3 segments).

Forehead (8 digits): protrudes, middle protrudes, width, length (5 segments).

Left chin (9 digits): chin edge, lip-to-chin protrudes (8 segments).

Cheeks (5 digits): width, protrudes, smoothness (3 segments).

Left face edge (10 digits): upper side inclination, lower side inclination (9 segments).
4.2 Build Neural Networks Training Data
By now, all feature images have the vector classes representing their features appended to
the ends of their file names. The next step is to read each feature folder and build the
neural network training data.
4.2.1 Scale Images
The images belonging to one feature do not all have the same size. By evaluating the
images' heights and widths, we choose one height and width and scale all images for that
feature to that size. In general, we scale image dimensions down by a factor of 2 to 5 in
order to reduce the training and processing time of the neural networks.
4.2.2 Extract Input and Output Training Data
After scaling the images down, we scale the intensity values to the range 0-1 by dividing
every value by 255. Next, we extract the scaled pixel intensities into an array of type
double (double[][]). Together with this input array, we extract the feature attribute class
vector from the image file name to serve as the desired output for training the neural
networks. A summary of the process is shown in figure 15, and a sample set of training
inputs is shown in figure 16.
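The extraction of one training input row can be sketched as follows; the file path is a
placeholder, and the sketch assumes the image has already been scaled and converted to
grayscale.

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Sketch of turning one scaled grayscale feature image into a training input
// row: pixel intensities read row by row and divided by 255. The desired-output
// digits would be parsed separately from the image file name.
public class TrainingInput {

    static double[] toInputRow(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        double[] row = new double[w * h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int gray = img.getRaster().getSample(x, y, 0); // band 0 of a grayscale image
                row[y * w + x] = gray / 255.0;                 // scale to the range 0-1
            }
        }
        return row;
    }

    public static void main(String[] args) throws Exception {
        BufferedImage img = ImageIO.read(new File("left_eyes_Upper_Lid/sample.jpg")); // placeholder path
        double[] input = toInputRow(img);
        System.out.println("input length = " + input.length);
    }
}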
Figure 15: Extract input and desired output from feature images to train neural networks
(read each feature folder and scale all images to the same size; extract data from all scaled
images for one feature into a double[][]; extract the output classes from the image names
and pass each class, with the input, to a neural net)
Figure 16: face image input training data sample
4.3 Neural Networks
To implement the neural networks, the Encog neural network framework was used, a
library of classes provided for Java and other languages [Heaton]. Encog provides many
training techniques, including backpropagation. For this project, resilient propagation
training was used instead of backpropagation because, as described by [Heaton], it is more
efficient and requires fewer parameters that need calibration. Both training methods use
the gradient of the error between the network output and the desired output, but they use it
differently: backpropagation uses the value of the gradient to calculate the new weights,
while resilient propagation uses only the sign (direction) of the gradient, changing the
weights by magnitudes that adapt as training progresses.
4.4 Neural Networks Topology
The number of inputs and outputs of the neural net for every feature class is determined
by the size of the scaled feature images and the number of attribute class digits.
The number of hidden layers and their sizes, on the other hand, were determined by trial
and error, as is usual when using neural networks. Three hidden layers were used for all
neural network classifiers, with sizes related to the input size: the first hidden layer is one
thirtieth of the input size, the second is one fortieth, and the third is one fiftieth.
The code for implementing the neural networks is shown below:
// Required imports (at the top of the class file), assuming the Encog 2.x API used here:
import org.encog.neural.data.NeuralDataSet;
import org.encog.neural.data.basic.BasicNeuralDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.Train;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

// Build the network: input layer, three hidden layers sized relative to the
// input (1/30, 1/40 and 1/50 of the input size), and the output layer.
network = new BasicNetwork();
network.addLayer(new BasicLayer(input[0].length));
network.addLayer(new BasicLayer(input[0].length / 30));
network.addLayer(new BasicLayer(input[0].length / 40));
network.addLayer(new BasicLayer(input[0].length / 50));
network.addLayer(new BasicLayer(desiredOutput[0].length));
network.getStructure().finalizeStructure();
network.reset();

// Create the training data from the scaled pixel inputs and the desired
// attribute-class outputs, then train with resilient propagation until the
// error drops below 0.005 or 7000 epochs have been run.
NeuralDataSet trainingSet =
    new BasicNeuralDataSet(this.input, this.desiredOutput);
final Train train =
    new ResilientPropagation(network, trainingSet);

int epoch = 1;
do {
    train.iteration();
    epoch++;
} while (train.getError() > 0.005 && epoch < 7000);
4.5 Classifiers Collection
To compute the numbers that represent each attribute of a feature, we build a classifier for
each attribute; we pass the feature image to that classifier, which in turn produces a number
that characterizes the corresponding feature attribute. We loop through all the folders of
feature training images and use the image file names to determine the number of classes
each feature has. We build a neural net classifier for every attribute and store them in a
map of maps: Map<Integer, Map<Integer, EncogNeuralNet>>. The outer map maps the
feature number to its attributes, while the inner map maps the attribute to its neural net
classifier.
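The collection can be sketched as follows; BasicNetwork stands in for the project's own
neural-network wrapper class, whose exact type is not shown in this report.

import java.util.HashMap;
import java.util.Map;
import org.encog.neural.networks.BasicNetwork;

// Sketch of the classifier collection described above: the outer map is keyed
// by feature number, the inner map by attribute number.
public class ClassifierCollection {

    private final Map<Integer, Map<Integer, BasicNetwork>> classifiers = new HashMap<>();

    void put(int feature, int attribute, BasicNetwork net) {
        classifiers.computeIfAbsent(feature, k -> new HashMap<>()).put(attribute, net);
    }

    BasicNetwork get(int feature, int attribute) {
        Map<Integer, BasicNetwork> byAttribute = classifiers.get(feature);
        return byAttribute == null ? null : byAttribute.get(attribute);
    }
}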
4.6 Recognizing a face
The developed system supports two ways of entering a face image to be recognized. The
first is opening an image file stored on the file system, and the second is capturing an
image using the computer's camera; figure 17 shows the UI with the buttons that do this.
Figure 17: Methods of reading a face image in the developed system
The first time the system is run, the ASM models and the neural network classifier
objects have to be loaded by pressing the "load model from file" and "load classifier"
buttons. These models need to be loaded only once each time the system starts;
figure 18 shows the UI with the buttons that do this.
Figure 18: loading the models in the developed system
Once the image has been entered either way, the "lookup face" button is pressed, which
starts the recognition process. See figure 19 below.
Figure 19: face look up
4.7 Face Detection
The first step in the recognition process is to detect the face in the image at hand, using
the face detection functions provided by OpenCV. “OpenCV stands for Open Source
Computer Vision. It was originally started by Intel back in the late 90s and is currently
released under the open source BSD license. While it is mainly written in C, it has been
ported to Python, Java, and other languages. In Java, it is available through JavaCV,
which is a wrapper that calls the native functions" [TK Gospodinov]. OpenCV uses Paul
Viola's Haar-like feature face detector. The face detector returns the coordinates of a box
that fits the face; hence, the dimensions of the box match the detected face.
4.8 Initialize Face Landmarks
Using the dimensions attained by the face detector, we scale the ASM Shape Model mean
points to the size of the detected face and use the resultant points as the initial points on
the detected face. The points will not be on the exact landmarks where they should be, but
they will have the correct face shape size. Therefore, we iteratively apply the ASM
models to reach the correct point locations. Figure 20 shows the UI after pressing the
button “Detect Face” for a particular face image.
Figure 20: face detection to locate and measure the face
4.9 Locate Facial Features
For each initial point on the face image to be recognized, we build multiple profiles, in the
same way we did for the training images, along the line through the landmark orthogonal
to the image boundary. The difference here is that we build multiple profiles: one along
the point itself and others along the neighbors on each side of the point, as shown in
figure 21. In this implementation, three were built on each side. Along every profile, we
use the pixel immediately before the profile's first pixel to calculate the intensity gradient
at the first element of the vector.
Figure 21: Profile vectors of a point and 3 adjacent neighbors on each side (7 profile
vectors are created: one along the point itself and 3 on each side)
The distance between a search profile P_i and the model mean profile µ is calculated
using the Mahalanobis distance = (P_i − µ)^T · cm^-1 · (P_i − µ), where cm is the
covariance matrix of the corresponding landmark built from the training image profiles.
The profile that produces the smallest Mahalanobis distance is chosen, and consequently
the mid-point of that profile becomes the new suggested landmark point for the current
iteration. Once the new suggested points for all landmarks are calculated, we apply the
shape model to check whether the new shape is valid according to the training images'
shape model. This is accomplished by checking the values (coefficients) of the vector
representing the new suggested shape, computed using the Shape Model eigenvectors
and means as follows:

b = eigenvectors transposed · (new Shape − Mean)
Here b is a vector of coefficients, one per eigenvector; each scales the contribution of its
corresponding eigenvector to the new shape. Cootes et al. (2004; 1995) suggest that each
deformation value b_i should be bounded by the interval −3√λ_i to +3√λ_i, where λ_i is
the eigenvalue corresponding to the selected eigenvector i. To force the new suggested
shape to be consistent with the shape model, we hard-limit every value in the vector to
the range −3√λ_i to +3√λ_i. If a value in a column of vector b is greater than three times
the square root of the corresponding eigenvalue for that column, we set it to +3√λ_i.
Similarly, if a value is less than negative three times the square root of the corresponding
eigenvalue, we set it to −3√λ_i. Since vector b was modified by trimming the columns
that exceeded the specified range, we recompute the new suggested shape using the new
b vector as follows:

new Shape = Mean + eigenvectors · b

This may cause some landmarks to have newer suggested positions; hence, we reapply the
point profile model to search around the newly suggested points for a neighboring profile
that better matches the model profile for that point. We continue applying these two
models iteratively until the maximum iteration count is reached or the number of points
that moved by more than a shift threshold falls below a point-count threshold.
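The hard-limiting and shape reconstruction steps can be sketched as follows; plain Java
arrays are used here in place of the project's actual matrix representation.

// Sketch of hard-limiting the shape coefficients b to +/- 3 * sqrt(lambda_i),
// then rebuilding the suggested shape from the mean and eigenvectors
// (eigenvectors[i][j] is element j of eigenvector i).
public class ConstrainShape {

    static void clampB(double[] b, double[] eigenvalues) {
        for (int i = 0; i < b.length; i++) {
            double limit = 3.0 * Math.sqrt(eigenvalues[i]);
            if (b[i] > limit)  b[i] = limit;
            if (b[i] < -limit) b[i] = -limit;
        }
    }

    static double[] rebuildShape(double[] mean, double[][] eigenvectors, double[] b) {
        double[] shape = mean.clone();
        for (int i = 0; i < b.length; i++) {              // newShape = mean + sum_i b_i * eigenvector_i
            for (int j = 0; j < shape.length; j++) {
                shape[j] += b[i] * eigenvectors[i][j];
            }
        }
        return shape;
    }
}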
Chapter 5
EXTRACT FEATURE IMAGES
Once the loop applying the two ASM models finishes, we have the closest possible
positions for the landmarks on the face we are looking up. Next we extract the features
into images of their own. The objective is to find, for every feature, two points on the
face that mark the upper-left and lower-right corners of the smallest possible rectangle
that fits the feature. For every feature, we determine a group of landmarks that bound the
feature's location on a face; we always use these same landmarks for the corresponding
feature. We calculate four values from the coordinates of these points: the minimum x,
minimum y, maximum x and maximum y. The minimum x and y values are used as the
upper-left corner and the maximum x and y as the lower-right corner of the feature's
bounding rectangle.
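The bounding-rectangle computation can be sketched as follows; the landmark coordinates
are illustrative, and the resulting rectangle could then be used with
BufferedImage.getSubimage to crop the feature image.

import java.awt.Rectangle;

// Sketch of computing a feature's bounding rectangle from the landmark points
// assigned to it: the minimum x and y give the upper-left corner, the maximum
// x and y give the lower-right corner. Points are {x, y} pairs.
public class FeatureBox {

    static Rectangle boundingBox(int[][] landmarks) {
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = Integer.MIN_VALUE, maxY = Integer.MIN_VALUE;
        for (int[] p : landmarks) {
            minX = Math.min(minX, p[0]);
            minY = Math.min(minY, p[1]);
            maxX = Math.max(maxX, p[0]);
            maxY = Math.max(maxY, p[1]);
        }
        return new Rectangle(minX, minY, maxX - minX, maxY - minY);
    }

    public static void main(String[] args) {
        int[][] eyeLandmarks = {{142, 210}, {188, 204}, {165, 223}, {150, 198}}; // illustrative
        System.out.println(boundingBox(eyeLandmarks));
    }
}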
5.1 Build Face Vector
After extracting the feature images, we loop over them and over the classifier collection,
passing every feature image to all the attribute classifiers that belong to it. We collect the
output numbers the classifiers generate and concatenate them into a vector that represents
the face. We use the generated vector to search a database of face images.
5.2 Face Database
A database was built that contains, among other tables, a face table. Each record in the
face table is for a person's face. The record contains information about the person and
includes the face image and the face vector, which is built from the output of the feature
classifiers. Initially, the records in the table came from the training face images.
Thereafter, whenever a miss on a face lookup happens, the application prompts the user to
ask whether he or she wishes to add the person to the database. If the user responds yes,
he or she is directed to a screen where additional information about the person to be
added is gathered.
5.3 Face Vector Search
The vector we built is a point in the hyper facial feature space. Each column of the
vector is a coefficient along one dimension of that space. The vector can be viewed as a
line from the origin of the space to that point. Two vectors match, or represent the same
face, if they are close to each other (have a small distance between them); they need not
have exactly the same column values. The distance between two vectors (two points in a
multi-dimensional space) is the Euclidean distance, which equals the length of the line
connecting the two points and is computed using the Pythagorean theorem:

Euclidean Distance = √( Σ_{i=1..n} (A_i − B_i)² )

The distance between two points in one dimension is simply the difference between their
values; for example, the distance between point A = 3 and point B = 8 is 8 − 3 = 5. The
Euclidean distance for the same points is √((8 − 3)²) = 8 − 3 = 5.
The distance between two points in two dimensions, for example between points A(2, 3)
and B(6, 9), is √((6 − 2)² + (9 − 3)²) = √52.
To search for a face in the database, we produce the vector representation of that face and
then compute its Euclidean distance to every face vector in the database. We choose the
face with the closest distance, within a threshold, to the face we are looking up.
Figure 22 shows a diagram of the process flow of face recognition.
Figure 22: Face recognition processes (read the face image; detect the face; initialize the
face landmark points; find the neighbor intensity profile around each point; choose a
neighbor point if its profile better matches the corresponding model point profile; apply
the model to adjust the whole point shape; extract feature images; pass each feature image
to a neural net classifier; use the classifier outputs to build the image vector, then look up
the database for the nearest Euclidean distance)
Chapter 6
RESULTS
We are trying to accomplish two objectives in this project:
1. Use the facial features as bases for the hyper face space, such that a vector in that
space can uniquely represent a face.
2. Make the face vector, to a degree, impervious to lighting and contrast variation.
We will start by evaluating the neural network training results, which affect the overall
result of the project.
6.1 Neural Network Training Results
The training iterations continue until the error is reduced to 0.005 or the maximum of
7000 training iterations is reached. Classifiers that did not reach the error threshold most
likely did not learn the class they were trained to learn. However, even when a
classifier's training terminated by achieving the error threshold, it does not necessarily
generalize or perform well on unseen images. The final error and the number of
iterations resulting from training all the classifiers are listed below:
featureNumber 0 C:\left_eyes_Upper_Lid
Epoch #391 Error:0.004994191524030874
Epoch #104 Error:0.004984493529422337
Epoch #265 Error:0.004985108468826185
Epoch #484 Error:0.0049929719580387765
Epoch #460 Error:0.004984689654502988
featureNumber 1 C:\left_eyes_Lower_Lid
Epoch #827 Error:0.004986937733546353
Epoch #971 Error:0.00499552864373195
Epoch #5210 Error:0.0049999223714012235
Epoch #1504 Error:0.004999113098132049
Epoch #849 Error:0.004998021528461399
Epoch #7000 Error:0.0191719140465112
Epoch #798 Error:0.004992879184250633
Epoch #530 Error:0.004989525345879893
Epoch #1262 Error:0.004995976382873747
Epoch #7000 Error:0.00677015938958029
Epoch #614 Error:0.0049977379423736265
featureNumber 2 C:\left_eyebrows
Epoch #360 Error:0.004980656237549085
Epoch #7000 Error:0.007036987719454902
Epoch #376 Error:0.004981622225126858
Epoch #275 Error:0.004965901982859265
Epoch #287 Error:0.004995343525756889
Epoch #595 Error:0.004990825343522399
Epoch #7000 Error:0.17772509850215729
Epoch #611 Error:0.004984436210732808
Epoch #819 Error:0.004995913313891747
Epoch #686 Error:0.004993121414198667
Epoch #860 Error:0.004990753706857306
Epoch #384 Error:0.00499366749923531
Epoch #601 Error:0.004987558523469151
featureNumber 3 C:\upper_lips
Epoch #5991 Error:0.004998927140866537
Epoch #2795 Error:0.004999661918578198
Epoch #881 Error:0.00499867228063733
Epoch #7000 Error:0.005961182177266967
Epoch #7000 Error:0.019647908671887468
Epoch #1247 Error:0.004997730465916958
Epoch #7000 Error:0.019667583174116997
Epoch #4596 Error:0.004999268018850392
Epoch #7000 Error:0.12192198322262622
Epoch #7000 Error:0.2747058146029274
Epoch #7000 Error:0.15185164589227265
Epoch #7000 Error:0.4668459633858304
featureNumber 4 C:\lower_lips
Epoch #7000 Error:0.012662371017759608
Epoch #548 Error:0.004988885991709977
Epoch #526 Error:0.0049933770846231395
Epoch #941 Error:0.004991239325289801
Epoch #1603 Error:0.004998223257053886
Epoch #11 Error:0.0018117435133911375
Epoch #7000 Error:0.13907564763869243
Epoch #7000 Error:0.020912792428856374
Epoch #7000 Error:0.4245340767894438
Epoch #7000 Error:0.007688870684335883
Epoch #7000 Error:0.012568393085617127
Epoch #7000 Error:0.011440157043761789
Epoch #1162 Error:0.004998001437164182
Epoch #4252 Error:0.004999434530689529
Epoch #6012 Error:0.004999532729013137
Epoch #7000 Error:0.04069135814238408
Epoch #770 Error:0.004998132157863931
Epoch #130 Error:0.0049819863543390085
Epoch #7000 Error:0.06943209890660564
Epoch #7000 Error:0.03361610332221863
featureNumber 5 C:\noses
Epoch #7000 Error:0.07157927286693923
Epoch #7000 Error:0.06733283516745575
Epoch #7000 Error:0.05136952769052274
Epoch #7000 Error:0.09643124024979384
featureNumber 6 C:\lower_noses
Epoch #7000 Error:0.45334147618528975
Epoch #7000 Error:0.33231235914052365
Epoch #7000 Error:0.07814296249175354
Epoch #7000 Error:0.13780606729853667
Epoch #7000 Error:0.6131983652534196
Epoch #7000 Error:0.20341779451847417
Epoch #7000 Error:0.3372756020290337
Epoch #7000 Error:0.09376213622449585
featureNumber 7 C:\foreheads
Epoch #7000 Error:0.007711066513603634
Epoch #2777 Error:0.004999122403146206
Epoch #7000 Error:0.00675278630918557
Epoch #7000 Error:0.2073878013587105
Epoch #7000 Error:0.11383902379764574
Epoch #7000 Error:0.008530828794424915
Epoch #1205 Error:0.004995181644837944
Epoch #5252 Error:0.004999958294297756
featureNumber 8 C:\chins
Epoch #1487 Error:0.004999785912154509
Epoch #709 Error:0.0049748791968673165
Epoch #341 Error:0.004988587575850116
Epoch #7000 Error:0.019838039501158037
Epoch #353 Error:0.00498672604896285
Epoch #8 Error:0.004309270470980495
Epoch #760 Error:0.004995590228877071
Epoch #2357 Error:0.0049988521785205
Epoch #534 Error:0.004982904642091496
featureNumber 9 C:\cheeks
Epoch #7000 Error:0.03003274451185928
Epoch #7000 Error:0.01547748415994407
Epoch #7000 Error:0.2984376373571743
Epoch #7000 Error:0.17093035708830454
Epoch #4028 Error:0.0049998166749983565
featureNumber 10 C:\upper_left_faces
Epoch #611 Error:0.004988326771541779
Epoch #445 Error:0.004991463334444358
Epoch #480 Error:0.0049893915371665735
Epoch #7000 Error:0.005209573790832722
Epoch #311 Error:0.004986990788078784
featureNumber 11 C:\lower_left_faces
Epoch #7000 Error:0.005510204084803682
Epoch #124 Error:0.004998436710232547
Epoch #476 Error:0.0049723089464967975
Epoch #7 Error:0.002414108153429113
Epoch #68 Error:0.004937368026882365
Traning is done
6.2 Generalization
Not all classifiers were successfully trained to the required output-error threshold, as is
clear from the error listing above. More than 95 percent did train successfully, but many
later did not generalize well. The training images for a given piece of feature information
should be constrained to include that information as isolated as possible. This means we
have to precisely determine the coordinates on the face image from which the feature
sub-image is extracted, in order to reduce the clutter of information in that sub-image and
simplify the neural network training. The feature locator tool (ASM) we implemented
calculates the coordinates of the features. The tool's accuracy is not great due to its
simplistic implementation; hence, the resulting sub-images contain more or less
information than they should.
In addition, human error in the manual process of assigning the desired classes to the
training images contributes to some of the training problems.
To evaluate the performance on multiple images, I used the training images with a small shift added to the feature sub-image coordinates, so that each input image differs from the one used in training. I passed the feature images to the neural network classifiers, compared each classifier's output to the desired output encoded in the image file name, and aggregated the matching ones, displayed as follows:
Feature Num 0
cls Num 0 val 62
cls Num 1 val 117
cls Num 2 val 92
cls Num 3 val 98
cls Num 4 val 84
Feature Num 1
cls Num 0 val 4
cls Num 1 val 5
cls Num 2 val 50
cls Num 3 val 3
cls Num 4 val 1
cls Num 5 val 11
cls Num 6 val 16
cls Num 7 val 11
cls Num 8 val 35
cls Num 9 val 32
cls Num 10 val 24
Feature Num 2
cls Num 0 val 34
cls Num 1 val 65
cls Num 2 val 45
cls Num 3 val 59
cls Num 4 val 38
cls Num 5 val 35
cls Num 6 val 33
cls Num 7 val 4
cls Num 8 val 46
cls Num 9 val 9
cls Num 10 val 25
cls Num 11 val 43
cls Num 12 val 40
Feature Num 3
cls Num 0 val 103
cls Num 1 val 63
cls Num 2 val 85
cls Num 3 val 144
cls Num 4 val 35
cls Num 5 val 35
cls Num 6 val 4
cls Num 7 val 44
cls Num 8 val 36
cls Num 9 val 24
cls Num 10 val 74
cls Num 11 val 10
Feature Num 4
cls Num 0 val 62
cls Num 1 val 2
cls Num 2 val 8
cls Num 3 val 82
cls Num 4 val 28
cls Num 5 val 34
cls Num 6 val 42
cls Num 7 val 3
cls Num 8 val 84
cls Num 9 val 40
cls Num 10 val 41
cls Num 11 val 8
cls Num 12 val 12
cls Num 13 val 3
cls Num 14 val 26
cls Num 15 val 163
cls Num 16 val 140
cls Num 17 val 33
cls Num 18 val 8
Feature Num 5
cls Num 0 val 103
cls Num 1 val 105
cls Num 2 val 106
cls Num 3 val 32
Feature Num 6
cls Num 0 val 66
cls Num 1 val 93
cls Num 2 val 101
cls Num 3 val 9
cls Num 4 val 3
cls Num 5 val 95
cls Num 6 val 58
Feature Num 7
cls Num 0 val 25
cls Num 1 val 53
cls Num 2 val 69
cls Num 3 val 67
cls Num 4 val 29
cls Num 5 val 38
cls Num 6 val 97
cls Num 7 val 81
Feature Num 8
cls Num 0 val 103
cls Num 1 val 136
cls Num 2 val 89
cls Num 3 val 30
cls Num 4 val 136
cls Num 5 val 127
cls Num 6 val 74
cls Num 7 val 82
Feature Num 9
cls Num 0 val 37
cls Num 1 val 58
cls Num 2 val 79
cls Num 3 val 41
cls Num 4 val 50
Feature Num 10
cls Num 0 val 91
cls Num 1 val 124
cls Num 2 val 110
cls Num 3 val 134
cls Num 4 val 152
Feature Num 11
cls Num 0 val 4
cls Num 1 val 3
cls Num 2 val 4
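The counts above can be produced by a loop of roughly the following shape. This is only a sketch under assumptions: it reads "cls Num" as the index of a classifier within a feature and "val" as the number of shifted images whose classifier output matched the desired output, and the helpers classifierCount, extractFeature, classify, and desiredOutput are hypothetical stand-ins for the project's ASM-based extraction, trained classifiers, and file-name convention.

import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;
import javax.imageio.ImageIO;

public class GeneralizationCheck {

    static final int SHIFT = 2; // hypothetical pixel shift added to the sub-image coordinates

    // For each feature and each of its classifiers, count how many shifted
    // feature images are still mapped to their desired output.
    static void evaluate(List<File> images, int featureCount) throws Exception {
        for (int feature = 0; feature < featureCount; feature++) {
            int classifiers = classifierCount(feature);
            int[] matches = new int[classifiers];
            for (File file : images) {
                BufferedImage face = ImageIO.read(file);
                BufferedImage sub = extractFeature(face, feature, SHIFT);
                for (int cls = 0; cls < classifiers; cls++) {
                    int predicted = classify(feature, cls, sub);
                    int desired = desiredOutput(file.getName(), feature, cls);
                    if (predicted == desired) {
                        matches[cls]++;
                    }
                }
            }
            System.out.println("Feature Num " + feature);
            for (int cls = 0; cls < classifiers; cls++) {
                System.out.println("cls Num " + cls + " val " + matches[cls]);
            }
        }
    }

    // Hypothetical helpers standing in for the project's feature locator,
    // trained neural-network classifiers, and file-name parsing; they are
    // not part of any library.
    static int classifierCount(int feature) { throw new UnsupportedOperationException(); }
    static BufferedImage extractFeature(BufferedImage face, int feature, int shift) { throw new UnsupportedOperationException(); }
    static int classify(int feature, int cls, BufferedImage sub) { throw new UnsupportedOperationException(); }
    static int desiredOutput(String fileName, int feature, int cls) { throw new UnsupportedOperationException(); }
}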
6.3 Lighting and contrast tolerance
To evaluate the tolerance to lighting and contrast, we generate a vector representation for multiple images of one face under different lighting and contrast conditions. To simulate different lighting, we add a constant value to all pixel intensities of an image to darken it. We did this for three images, adding 30 to one image, 60 to the second, and 90 to the third. To simulate contrast variation, we blurred one image using a Gaussian smoothing filter and sharpened another using a Laplace filter, as shown in Figures 23-28 below.
Figure 23: original
Figure 24: darkened by adding 30 to all pixel intensities
Figure 25: darkened by adding 60 to all pixel intensities
Figure 26: darkened by adding 90 to all pixel intensities
Figure 27: blurred
Figure 28: sharpened by Laplace filter
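These lighting and contrast variants can be generated with standard Java image operations. The following is a minimal sketch under assumptions: it uses java.awt.image.RescaleOp and ConvolveOp rather than the project's own code, and the 3x3 Gaussian and Laplacian-sharpening kernels are illustrative choices.

import java.awt.image.BufferedImage;
import java.awt.image.ConvolveOp;
import java.awt.image.Kernel;
import java.awt.image.RescaleOp;

public class ImageVariants {

    // Add a constant offset to every pixel intensity (the report uses 30, 60 and 90).
    static BufferedImage offsetIntensity(BufferedImage src, float offset) {
        return new RescaleOp(1.0f, offset, null).filter(src, null);
    }

    // Blur with a 3x3 Gaussian-style smoothing kernel (illustrative weights).
    static BufferedImage blur(BufferedImage src) {
        float[] gaussian = {
            1f / 16, 2f / 16, 1f / 16,
            2f / 16, 4f / 16, 2f / 16,
            1f / 16, 2f / 16, 1f / 16
        };
        return new ConvolveOp(new Kernel(3, 3, gaussian), ConvolveOp.EDGE_NO_OP, null).filter(src, null);
    }

    // Sharpen by subtracting a Laplacian estimate from the image (illustrative 3x3 kernel).
    static BufferedImage sharpen(BufferedImage src) {
        float[] laplacianSharpen = {
             0f, -1f,  0f,
            -1f,  5f, -1f,
             0f, -1f,  0f
        };
        return new ConvolveOp(new Kernel(3, 3, laplacianSharpen), ConvolveOp.EDGE_NO_OP, null).filter(src, null);
    }
}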
The vectors generated for each of the images are listed below; each row represents one vector. Within each group of six rows, the vectors are listed in the following order: original, darkened 30, darkened 60, darkened 90, blurred, and sharpened.
22344
22344
23344
13324
22344
34325
44322312322
43432351423
44423321224
52423422323
43422412433
42432411224
1244311323333
1354421354341
1254311454331
1354315554312
1244321314344
2324431112311
133514333453
222314153453
132524343453
243524113453
133514333453
222324153453
34443224331513543323
34443224324513543333
54433224354543523323
34433224355543423323
34433224331413543323
44532224315243533333
2233
2233
2233
2233
2233
2233
45322223
45322223
45322223
45322223
45322223
45322223
32344531
21344554
32444335
42544345
31344534
13344533
543432121
544352112
543522112
434552312
542432121
525222312
43333
42333
42333
32334
43333
24331
11111
11111
11111
11111
11111
12211
55534
55534
55434
55434
55534
54534
Tolerance to the lighting and contrast variation is demonstrated when a given piece of feature information keeps the same class number across the image variants, which appears above as identical digits along a column of the vector representations.
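This column-wise stability can also be checked automatically by comparing each variant's vector to the original position by position. The sketch below is illustrative only; it assumes the vectors are available as strings of class digits, as printed above.

public class ToleranceCheck {

    // Fraction of positions in which a variant keeps the same class number
    // as the original vector (both given as strings of class digits).
    static double stability(String original, String variant) {
        int n = Math.min(original.length(), variant.length());
        int matches = 0;
        for (int i = 0; i < n; i++) {
            if (original.charAt(i) == variant.charAt(i)) {
                matches++;
            }
        }
        return n == 0 ? 0.0 : (double) matches / n;
    }

    public static void main(String[] args) {
        // Example taken from the first block of vectors above:
        // the original versus the darkened-by-60 variant.
        System.out.println(stability("22344", "23344")); // prints 0.8
    }
}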
Chapter 7
CONCLUSION
Some feature classifiers completed their neural network training successfully, reaching an error below 0.005 before the maximum number of iterations was reached. Most of these classifiers achieved both objectives: uniquely identifying the feature, and producing the same or a very similar representation for the same face under different lighting and contrast conditions.
Other feature classifiers failed to complete their neural network training successfully for one or more of the following reasons:
• Manual errors in assigning the correct desired class to a training image.
• Feature images contain too much information. This can be mitigated by extracting the smallest sub-image that contains the feature to be classified.
• The classes assigned to the feature information do not form distinct, accurate groupings.
• Some feature information cannot be learned successfully without additional image-processing techniques to emphasize that information.
• The feature locator tool implemented in this project is a simple one and sometimes produces inaccurate feature locations.
Chapter 8
FUTURE WORK
• Use a variable weight multiplier for each feature, depending on how much information the feature contributes to identifying the face and on how stable the feature's shape remains across different facial expressions.
• Use an Active Shape Model (ASM) with more points, so that it clearly circumscribes all facial features.
• Measure how feature shapes change with different expressions.
• Use color images.
• Allow occlusions (e.g., eyeglasses, hats).
REFERENCES
1. Anthony Dotterer. “Using Procrustes Analysis and Principal Component Analysis
to Detect Schizophrenic Brains”. 2006
http://www.personal.psu.edu/asd136/CSE586/Procrustes/report/Using%20PA%20
and%20PCA%20to%20Detect%20Schizos.pdf
2. FEI Face Database. http://fei.edu.br/~cet/facedatabase.html. Face images taken
between June 2005 and March 2006 at the Artificial Intelligence Laboratory of
FEI in São Bernardo do Campo, São Paulo, Brazil.
3. Jeff Heaton. Introduction to Encog 2.5 for Java. 2010.
http://www.heatonresearch.com/dload/ebook/IntroductionToEncogJava.pdf.
4. Julio César Pastrana Pérez. Active shape models with focus on overlapping
problems applied to plant detection and soil pore analysis. 2012.
http://www.bgthannover.de/homepagethesis/pastrana_phd_2012.pdf
5. Lindsay I Smith. A tutorial on Principal Components Analysis. 2002
http://www.google.com/url?sa=t&rct=j&q=a%20tutorial%20on%20principal%20
components%20analysis&source=web&cd=1&cad=rja&sqi=2&ved=0CDQQFjA
A&url=http%3A%2F%2Fwww.ce.yildiz.edu.tr%2Fpersonal%2Fsongul%2Ffile%
2F1097%2Fprincipal_components.pdf&ei=dORYUayNHuf9iwKi74GYAQ&usg
=AFQjCNFAAD718BgyS8tVYTRLpcLjXaRfsA&bvm=bv.44442042,d.cGE.
6. Stan Li and Anil Jain. “Handbook of Face Recognition”. 2nd edition.
Springer, 2011. Print.
7. Tim Cootes. http://personalpages.manchester.ac.uk/staff/timothy.f.cootes