Biometric Identification Infer-ability

MICHAEL GIBBONS, SUNG-HYUK CHA, CHARLES C. TAPPERT AND SUNG SOO YOON
Pace University School of Computer Science and Information Systems, Pleasantville, NY 10570
mgibbons@gmail.com, scha@pace.edu, ctappert@pace.edu and ssyoon@csai.yonsei.ac.kr

Abstract. This paper concerns the infer-ability of biometric identification in large-scale and open systems. Although many researchers have claimed identification accuracies of 94% or higher, their studies are based on closed systems consisting of a few hundred or thousand subjects. It is important, therefore, to consider what happens to these systems as more unknowns are added. Our hypothesis is that these systems do not generalize well as the population increases. To support this hypothesis we study two biometric databases, writer and iris data, using the Support Vector Machine pattern classifier.

Key words: Biometric identification, Writer identification, Iris identification, Support Vector Machines

Abbreviations: SVM, Support Vector Machines; RBF, Radial Basis Function

1 Introduction

Biometric applications are becoming more common and acceptable in today's society. Technology continues to improve, providing faster processors, smaller sensors and cheaper materials, all of which contribute to better, more affordable biometric applications. A common use of biometrics is security and access control, such as fingerprint verification for laptop protection or voter registration. Another use of biometrics is identification, which is the focus of this paper. Identification can be applied in a closed system such as employee identification, or in an open system such as a national ID system. Very promising results have been reported for closed identification systems: accuracies from 94% to 98% have been reported in writer and iris identification studies [4, 5, 6], which may lead to false hopes for applying the same methods to very large or open systems.
This paper examines the infer-ability of biometric identification. Our hypothesis is that the methods used for closed systems are sufficient only for those closed systems and do not generalize well to larger and open systems. Two biometric databases, one consisting of writer data and the other of iris data, are used to support this hypothesis. In section 2, identification models are explained. In section 3, the biometric databases used in this paper are described. In section 4, a high-level overview of the Support Vector Machines pattern classification technique is presented. In section 5, the statistical experiment supporting the hypothesis is described and observations are presented. In section 6, we conclude with a summary and considerations for future work.

2 Identification models

In this study the focus is on the identification model. There are two identification models: positive and negative identification [8]. Positive identification refers to determining that a given individual is in a member database. This is an n-class classification problem: given N subjects and an unknown sample drawn from one of them, can the sample be classified correctly? The negative identification model refers to determining that a subject is not in some negative database. For example, given a negative database of N subjects and an unknown subject u, is u one of the N or some other subject not in the database? An application of this model is a most-wanted list. As will be discussed in section 5, both the positive and negative models are used in our experiments.

3 Biometric databases

Two biometric databases are used to support our hypothesis in this study. The writer and iris databases are described in the next two subsections.

3.1 Writer database

In a previous study [3], Cha et al. considered the individuality of handwriting. In that study, a database was constructed containing handwriting samples of 841 subjects. The subjects are representative of the United States population.
Each subject copied a source document three times. This document, containing 156 words, was carefully constructed so that each letter of the alphabet appears at the start of a word (both upper and lower case), in the middle of a word, and at the end of a word (lower case). Each document was digitized and features were extracted at the document, word and character levels. For the purposes of this study we focus only on the document-level features, since the best results in [3] were obtained with them. The document-level features are: entropy, threshold, number of black pixels, number of exterior contours, number of interior contours, slant, height, horizontal slope, vertical slope, negative slope and positive slope. For a detailed explanation of these features, please see [3]. The database described above, and in more detail in [3], was used in this study.

3.2 Iris database

The iris database contains 10 left bare-eye samples from each of 52 subjects. Compared to the writer database, the iris database has fewer subjects but gains from the larger number of samples per subject, which allows more samples to be used for training. After the images were acquired, they were segmented as part of pre-processing; the segmentation provided a normalized rectangular sample of the iris. The feature extraction process used 2-D multi-level wavelet transforms. For this experiment, 3 levels were used, producing a total of 12 parts. The 12 parts yield 12 feature vectors consisting of the coefficients from the wavelet transform. The mean and variance of each vector were computed, producing a total of 24 features for each sample. For more information on the 2-D wavelet transforms used, see [9].

4 Support Vector Machines

In the field of pattern classification there are a number of classifiers to choose from: Artificial Neural Networks, Nearest Neighbor, and variations such as k-Nearest Neighbor and Weighted k-Nearest Neighbor.
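Before turning to the classifier, the iris feature extraction of section 3.2 can be illustrated. The sketch below shows only the final reduction step, using random arrays as stand-ins for the 12 wavelet parts (the actual images, transform, and array shapes are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the 12 parts produced by the 3-level 2-D wavelet
# transform of one normalized iris sample (hypothetical shapes/values).
parts = [rng.normal(size=(16, 16)) for _ in range(12)]

# Mean and variance of each part's coefficient vector -> 24 features.
features = np.array([stat for p in parts for stat in (p.mean(), p.var())])
print(features.shape)
```

Each sample is thus summarized by a fixed-length 24-dimensional feature vector, regardless of the size of the original image.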
However, the classifier that has gained popularity in recent years is the Support Vector Machine (SVM) classifier. The objective of the SVM classifier is to separate the data with a maximal margin. Obtaining the maximal margin results in better generalization, which helps with the common classification problem of over-fitting. This is one of the main reasons SVM has attracted so much attention in recent years. The points that lie on the planes separating the data are the support vectors. Finding the support vectors requires solving the following optimization problem:

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \ \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{i=1}^{l}\xi_i \quad \text{subject to} \quad y_i\left(\mathbf{w}^T\phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0 \qquad (1)$$

The methods for solving this problem are outside the scope of this paper; the curious reader may refer to [1, 2] for more information. The geometric representation of the SVM is easily visualized when the data fall into the linearly separable or linearly non-separable cases. However, real-world data tend to fall into the non-linearly separable case. To handle this, SVMs rely on mapping the patterns into a higher dimension than the original feature space. The functions that provide this mapping are the phi functions, whose inner products are computed by kernels. Common kernels include the Radial Basis Function (RBF), linear, polynomial and sigmoid kernels. The RBF kernel is used in this study; additional information on this kernel follows in section 5.

5 Statistical experiment

5.1 Experiment setup

Our hypothesis is that biometric identification on closed systems does not generalize well when applied to very large and open systems. To support this hypothesis, experiments were conducted with data from both the writer and iris databases. For each database, training sets were created. Training sets for the writer data consisted of 50, 100, 200, 400 and 700 subjects; training sets for the iris data consisted of 5, 15, 25, 35 and 45 subjects.
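The training step on a set of known subjects can be sketched as follows. This is an illustration only: the paper does not name its SVM implementation, so scikit-learn's SVC is assumed, and two synthetic "subjects" stand in for the real feature vectors:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic feature vectors for two "subjects" (stand-ins for the
# 11 document-level writer features or 24 iris features).
X = np.vstack([rng.normal(0.0, 0.3, size=(10, 4)),
               rng.normal(2.0, 0.3, size=(10, 4))])
y = np.array([0] * 10 + [1] * 10)

# RBF kernel as in equation (2); C is the penalty parameter of equation (1).
clf = SVC(kernel="rbf", C=1.0, gamma=0.5)
clf.fit(X, y)

print(clf.score(X, y))  # accuracy on the training subjects
```

With more subjects, the same n-class formulation applies; SVC handles the multi-class case by combining binary classifiers internally.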
These sets included all instances per subject, i.e., 3 per subject for writer and 10 per subject for iris. An SVM was trained on these known subjects. Parameter tuning, or SVM optimization, was performed prior to training. The first parameter is the penalty parameter C from equation (1); depending on the kernel used, there are additional parameters to tune. For this experiment we used the RBF kernel, which has the form:

$$K(\mathbf{x}_i, \mathbf{x}_j) = e^{-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2}, \quad \gamma > 0 \qquad (2)$$

The parameter $\gamma$ in equation (2) is the only kernel parameter requiring tuning. A grid-search method as defined in [7] was used to optimize these parameters. Tuning the parameters gives 100% accuracy on the known training subjects. The next step is to add unknowns to the evaluation set to determine the accuracy of the trained SVMs. Therefore, for each training set we created combined evaluation sets consisting of the trained subjects (which classify with 100% accuracy) plus an increasing number of unknowns. For example, the evaluation sets for the 50-writer trained SVM consisted of 50, 100, 200, 400, 700 and 841 subjects, and the evaluation sets for the 25-iris trained SVM consisted of 25, 35, 45 and 52 subjects. In terms of the identification models of section 2, this experiment combines the positive identification model with the negative identification model. The positive identification model is used to verify that the trained subjects are identified correctly. Because of the low number of samples per subject, the exact samples used in training are also used in testing. The samples from subjects not used in training, however, are all unknowns. Classifying these unknowns is an example of the negative identification model, where we expect all unknowns to be classified as not matching any of the trained subjects, i.e., not on the most-wanted list. If an unknown sample is classified as matching a trained subject, it is a false positive.
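The tuning and evaluation procedure just described can be sketched as follows (a sketch only, assuming scikit-learn; the subjects, grid values and set sizes are hypothetical stand-ins for those in the paper). Knowns are tuned and trained to 100% accuracy; samples from subjects outside the training set are then added, and any of them assigned to a trained subject counts as a false positive:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def subject_samples(center, n=3):
    """n feature vectors for one hypothetical, well-separated subject."""
    return rng.normal(center, 0.2, size=(n, 4))

# 5 known (trained) subjects and 3 unknown subjects.
known = [subject_samples(c) for c in (0.0, 2.0, 4.0, 6.0, 8.0)]
unknown = [subject_samples(c) for c in (10.0, 12.0, 14.0)]

X_train = np.vstack(known)
y_train = np.repeat(np.arange(5), 3)

# Grid search over C and gamma, in the spirit of [7].
grid = {"C": [2.0 ** k for k in range(-3, 6, 2)],
        "gamma": [2.0 ** k for k in range(-7, 2, 2)]}
svm = GridSearchCV(SVC(kernel="rbf"), grid, cv=3).fit(X_train, y_train)

# Positive identification: trained subjects must classify perfectly.
print(svm.score(X_train, y_train))

# Negative identification: a plain n-class SVM has no "none of the above"
# answer, so every unknown sample it labels is a false positive.
false_positives = len(svm.predict(np.vstack(unknown)))
accuracy = len(X_train) / (len(X_train) + false_positives)
print(accuracy)
```

This makes the source of the accuracy decline explicit: the trained subjects contribute no errors, so accuracy falls purely as the proportion of unknowns in the evaluation set grows.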
Since we guarantee 100% accuracy on the positive identification task, we have eliminated false positives and false negatives from the training samples.

5.2 Results and analysis

The results of the writer experiment are presented in figure 1.

Fig. 1. Experiment results for writer data

As expected, for each curve (training set), as the number of unknowns increases, the accuracy monotonically decreases (or, equivalently, the error increases). However, the rate at which the accuracy decreases appears to converge. To ensure that this is not due to the particular handwriting data used, we obtained experimental results on the iris data, presented in figure 2.

Fig. 2. Experiment results for iris data

Figure 2 also seems to support the convergence theory; clearly, with more subjects the evidence for iris would be stronger. If the convergence theory holds, we should be able to estimate the accuracy attainable for very large systems by finding an equation that fits these points. From figures 1 and 2 we recognize that the curves are of exponential form. After some fitting trials, we find the curve most similar to be:

$$y = a e^{-\left((x - b)/c\right)^2} + d \qquad (3)$$

In equation (3), the constant b is the number of known trained subjects; the constants a, c and d vary based on b; and the accuracy converges to d. Figure 3 displays the curve fitting for the 50- and 200-subject trained writer data.

Fig. 3. Curve fitting for 50 and 200 writer data

6 Conclusion

In this paper we showed that although high accuracies have been obtained for closed biometric identification problems, they do not appear to generalize well to the open-system problem. This hypothesis was supported by experiments on two biometric databases: writer and iris. Furthermore, the expected errors can be projected from the asymptote of an exponential curve.
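Such a projection can be sketched with a least-squares fit of equation (3). This assumes SciPy's `curve_fit`; the accuracy points below are synthetic, generated from hypothetical constants rather than read off the paper's figures:

```python
import numpy as np
from scipy.optimize import curve_fit

B = 50.0  # b in equation (3): the number of known trained subjects

def eq3(x, a, c, d):
    """Equation (3): y = a * exp(-((x - b)/c)^2) + d, with b fixed at B."""
    return a * np.exp(-(((x - B) / c) ** 2)) + d

# Evaluation-set sizes as in the 50-writer experiment; the accuracies
# are synthetic, generated from hypothetical constants a, c, d.
x = np.array([50.0, 100.0, 200.0, 400.0, 700.0, 841.0])
y = eq3(x, 0.30, 150.0, 0.65)

(a_fit, c_fit, d_fit), _ = curve_fit(eq3, x, y, p0=[0.2, 100.0, 0.5])
print(d_fit)  # the asymptote d: the projected large-system accuracy
```

The fitted asymptote d is the quantity of interest: it estimates the accuracy the trained system would retain as the open population grows without bound.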
We feel these error rates are fairly large and should be taken into account when designing biometric systems for screening processes, for example, screening passengers at an airport.

6.1 Future work

These experiments should be further validated by testing against larger biometric databases. Additional biometric modalities, for example fingerprint, face or hand geometry, should also be tested. In these experiments we used a state-of-the-art pattern classifier (Support Vector Machines with the RBF kernel). Future work might include comparison tests using different kernels, as well as comparing different pattern classifiers.

References

[1] Edgar E. Osuna, Robert Freund and Federico Girosi. Support Vector Machines: Training and Applications. MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences. A.I. Memo No. 1602, C.B.C.L. Paper No. 144, 1997.

[2] Christopher J.C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2:121-167, 1998.

[3] Sargur N. Srihari, Sung-Hyuk Cha, Hina Arora and Sangjik Lee. Individuality of Handwriting. Journal of Forensic Sciences, 47(4), July 2002, pp. 1-17.

[4] Sung-Hyuk Cha and Sargur N. Srihari. Writer Identification: Statistical Analysis and Dichotomizer. Proceedings of the International Workshop on Structural and Syntactical Pattern Recognition (SSPR 2000), Alicante, Spain, August/September 2000, pp. 123-132.

[5] Andreas Schlapbach and Horst Bunke. Off-line Handwriting Identification Using HMM Based Recognizers. Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), Volume 2, Cambridge, UK, August 23-26, 2004, pp. 654-658.

[6] Emine Krichen, M. Anouar Mellakh, Sonia Garcia-Salicetti and Bernadette Dorizzi. Iris Identification Using Wavelet Packets. Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), Volume 4, August 23-26, 2004, pp. 335-338.
[7] Chih-Wei Hsu, Chih-Chung Chang and Chih-Jen Lin. A Practical Guide to Support Vector Classification.

[8] Ruud M. Bolle, et al. Guide to Biometrics. Springer, 2004.

[9] Brendon J. Woodford, Da Deng and George L. Benwell. A Wavelet-based Neuro-fuzzy System for Data Mining Small Image Sets.