Biometric Identification Infer-ability
MICHAEL GIBBONS, SUNG-HYUK CHA, CHARLES C. TAPPERT AND SUNG SOO YOON
Pace University
School of Computer Science and Information Systems
Pleasantville, NY 10570
mgibbons@gmail.com, scha@pace.edu, ctappert@pace.edu and ssyoon@csai.yonsei.ac.kr
Abstract. This paper concerns the infer-ability of biometric identification in large-scale and open systems. Although many
researchers have claimed identification accuracies of 94% or higher, their studies are based on closed systems consisting of
a few hundred or a few thousand subjects. It is important, therefore, to consider what happens to these systems as more
unknowns are added. Our hypothesis is that these systems do not generalize well as the population increases. To support
this hypothesis we study two biometric databases, writer and iris data, using the Support Vector Machine pattern classifier.
Key words: Biometric identification, Writer identification, Iris identification, Support Vector Machines
Abbreviations: SVM, Support Vector Machines; RBF, Radial Basis Function
1 Introduction
Biometric applications are becoming more common and acceptable in today’s society. Technology
continues to improve, providing faster processors, smaller sensors and cheaper materials, all of which
are contributing to better, affordable biometric applications. A common use of biometrics is for
security and access control, such as fingerprint verification for laptop protection or voter registration.
Another use of biometrics is identification, which is the focus of this paper. Identification can be
applied in a closed system such as employee identification or in an open system such as a national ID
system.
There have been very promising results reported for closed identification systems. Accuracies
from 94% to 98% have been reported in writer and iris identification studies [4, 5, 6], which may lead
to false hopes for applying the same methods to very large or open systems. This paper examines
the infer-ability of biometric identification. Our hypothesis is that the methods used for closed systems
are sufficient only for the closed systems and do not generalize well to larger and open systems. Two
biometric databases, one consisting of writer data, the other of iris data, are used to support this
hypothesis.
In section 2, identification models will be explained. In section 3, the biometric databases used
in this paper will be described. In section 4, a high-level overview of the Support Vector Machines
pattern classification technique is presented. In section 5, the statistical experiment to support the
hypothesis is described and observations presented. In section 6, we conclude with a summary and
considerations for future work.
2 Identification models
In this study the focus is on the identification model. There are two identification models: positive and
negative identification [8]. Positive identification refers to determining that a given individual is in a
member database. This is an N-class classification problem: given N enrolled subjects and an unknown
sample known to belong to one of them, can the sample be assigned to the correct subject?
The negative identification model refers to determining that a subject is not in some negative
database. For example, given a negative database of N subjects and an unknown subject u, is u one of
the N or some other subject not part of the database? An application of this model is a most-wanted
list. As will be discussed in section 4, both the positive and negative models are used in our
experiments.
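To make the distinction concrete, the following sketch (a simplified Python illustration, not the protocol used in our experiments, which is described in section 5) contrasts the two decisions for a classifier that returns a confidence score for each enrolled subject: positive identification returns the best-matching enrolled subject, while negative identification additionally allows a "not in the database" outcome when no match is sufficiently confident. The threshold parameter here is purely hypothetical.

    import numpy as np

    def positive_identification(scores):
        # scores[i] is the classifier's confidence that the sample belongs to enrolled subject i.
        # Closed-set decision: always return one of the N enrolled subjects.
        return int(np.argmax(scores))

    def negative_identification(scores, threshold):
        # Open-set decision: accept the best match only if it is confident enough,
        # otherwise report that the subject is not in the (negative) database.
        best = int(np.argmax(scores))
        return best if scores[best] >= threshold else -1  # -1 means "not in the database"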
3 Biometric databases
Two biometric databases are used to support our hypothesis in this study. The writer and iris databases
are described in the next two subsections.
3.1 Writer database
In a previous study [3], Cha et al. considered the individuality of handwriting. For that study, a database was
constructed containing handwriting samples from 841 subjects, representative of the United States population.
Each subject copied a source document three times. This 156-word document was carefully constructed so that
each letter of the alphabet appears at the start of a word (both upper and lower case) as well as in the middle
and at the end of a word (lower case).
Each document was digitized and features were extracted at the document, word and character
levels. For the purposes of this study we focus only on the document-level features, which gave the
best results in [3].
The document level features extracted are: entropy, threshold, number of black pixels, number
of exterior contours, number of interior contours, slant, height, horizontal slope, vertical slope,
negative slope and positive slope. For a detailed explanation of these features, please see [3].
The database described above (and in more detail in [3]) was used in this study.
3.2 Iris database
The iris database contains 10 bare-eye samples of the left eye from each of 52 subjects. Compared to the writer
database, the iris database has fewer subjects but more samples per subject, which allows more samples
per subject to be used in training.
After acquisition, the images were segmented as part of pre-processing. The segmentation provided a
normalized rectangular sample of the iris. The feature extraction process used 2-D multi-level wavelet
transforms. For this experiment, 3 decomposition levels were used, producing a total of 12 subbands.
Each subband yields a feature vector consisting of its wavelet coefficients, and the mean and variance
of each vector were computed, producing a total of 24 features for each sample. For more information
on the 2-D wavelet transforms used, see [9].
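A minimal sketch of this style of feature extraction is given below, assuming the PyWavelets library, a Haar wavelet, and the reading that the 12 parts are the four subbands (approximation plus horizontal, vertical and diagonal detail) obtained at each of the 3 decomposition levels; the wavelet and decomposition actually used in [9] may differ.

    import numpy as np
    import pywt  # PyWavelets

    def iris_features(iris_strip, wavelet="haar", levels=3):
        # iris_strip: the normalized rectangular iris sample as a 2-D array.
        feats = []
        approx = np.asarray(iris_strip, dtype=float)
        for _ in range(levels):
            # Decompose the current approximation into 4 subbands (LL, LH, HL, HH).
            approx, (horiz, vert, diag) = pywt.dwt2(approx, wavelet)
            for band in (approx, horiz, vert, diag):   # 4 subbands x 3 levels = 12 parts
                feats.extend([band.mean(), band.var()])
        return np.array(feats)                         # 2 statistics x 12 parts = 24 features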
4 Support Vector Machines
In the field of pattern classification, there are a number of classifiers to choose from: Artificial Neural
Networks, Nearest Neighbor, and variations including k-Nearest Neighbor and Weighted k-Nearest
Neighbor. However, the classifier that has gained popularity in recent years is the Support Vector
Machines (SVM) classifier.
The objective of the SVM classifier is to separate the data with a maximal margin. Maximizing the
margin results in better generalization, which helps combat the common classification problem of
over-fitting. This is one of the main reasons SVM has gained so much attention in recent years.
The data points that lie on the margin boundaries are the support vectors. Finding the
support vectors requires solving the following optimization problem:
\min_{w,\,b,\,\xi} \ \tfrac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i
\text{subject to } y_i \big( w^T \phi(x_i) + b \big) \ge 1 - \xi_i, \quad \xi_i \ge 0        (1)
The methods for solving this problem are outside the scope of this paper. For the curious reader, refer
to the following publications for more information: [1, 2].
The geometric representation of the SVM is easily visualized when the data falls into the linear
separable and linear non-separable cases. However, real world data tends to fall into the non-linear
separable case. To solve this problem, SVMs rely on mapping the data into a higher-dimensional space
than the original feature space. The mapping function is conventionally denoted φ (phi), and the
corresponding inner-product function is known as the kernel. Common kernels include the Radial Basis
Function (RBF), linear, polynomial and sigmoid. The RBF kernel will be used for this study and
additional information on this kernel will follow in section 5.
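As a concrete illustration, training a multi-class SVM with an RBF kernel takes only a few lines with a library such as scikit-learn; the sketch below uses synthetic data in place of the biometric feature vectors, and the values of C and gamma shown are placeholders for the tuned values discussed in section 5.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(150, 11))       # e.g. 50 subjects x 3 samples, 11 document-level features
    y_train = np.repeat(np.arange(50), 3)      # subject label for each training sample

    clf = SVC(kernel="rbf", C=1.0, gamma=0.1)  # C and gamma are tuned by grid search (section 5)
    clf.fit(X_train, y_train)
    print(clf.predict(X_train[:3]))            # predicted subject labels for three samples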
5 Statistical experiment
5.1 Experiment setup
Our hypothesis is that biometric identification on closed systems does not generalize well when
applied to very large and open systems. In order to support this hypothesis experiments were
conducted with data from both the writer and iris databases.
For each of the databases, training sets were created. Training sets for the writer data consisted
of 50, 100, 200, 400 and 700 subjects. Training sets for the iris data consisted of 5, 15, 25, 35 and 45 subjects.
These sets included all instances per subject, i.e., 3 per subject for writer and 10 per subject for iris. An
SVM was trained on these known subjects.
Parameter tuning, or SVM optimization, was performed prior to training. The first parameter is
the penalty parameter C from equation (1), and depending on the kernel used, there are additional
parameters to tune. For this experiment we used the RBF kernel which has the form:
K(x_i, x_j) = e^{-\gamma \| x_i - x_j \|^2}, \quad \gamma > 0        (2)
The γ parameter in equation (2) is the only kernel parameter requiring tuning. A grid-search method
as defined in [7] was used to optimize these parameters.
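A minimal sketch of such a search, using scikit-learn's GridSearchCV rather than the scripts distributed with [7] but with the exponentially spaced grid suggested there, might look as follows (again with synthetic data standing in for the biometric features):

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(150, 11))             # synthetic stand-in for the training features
    y = np.repeat(np.arange(50), 3)

    # Exponentially spaced grid over the penalty C and the RBF width gamma, as suggested in [7].
    param_grid = {"C": 2.0 ** np.arange(-5, 16, 2),
                  "gamma": 2.0 ** np.arange(-15, 4, 2)}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)
    search.fit(X, y)
    print(search.best_params_)                 # the (C, gamma) pair used to train the final SVM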
Tuning the parameters gives 100% accuracies on the known training subjects. The next step is
to add testing unknowns to the evaluation set to determine the accuracy of the trained support vector
machines. Therefore, for each training set we created combined evaluation sets consisting of the
trained subjects (that classify with 100% accuracy) plus an increasing number of unknowns. For
example, the evaluation sets for the 50-writer trained SVM consisted of 50, 100, 200, 400, 700 and
841 subjects, and the evaluation sets for the 25-iris trained SVM consisted of 25, 35, 45 and 52
subjects.
In terms of the identification models of section 2, this experiment combines the positive
identification model with the negative identification model. The positive identification model is used
to verify that the trained subjects are identified correctly. Due to the low number of samples per subject,
the exact samples used in training are also used in testing. However, the samples from subjects not
used in training are all unknowns. Classifying these unknowns is an example of the negative
identification model, where we expect each unknown to be classified as not matching any of the trained
subjects, i.e., not on the most-wanted list. If an unknown sample is classified as matching a trained
sample, it will be a false positive. Since we guarantee 100% accuracy on the positive identification
task, we have eliminated false positives and false negatives from the training samples.
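The protocol can be summarized in the following sketch, which uses synthetic data and subject counts purely for illustration: the SVM is trained on the known subjects, the combined evaluation set adds samples from subjects outside the training set, and because the closed-set classifier has no rejection option, every unknown sample it labels as a trained subject is a false positive.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n_known, n_unknown, n_per, n_feat = 50, 100, 3, 11        # e.g. 50 trained writers, 100 unknown writers
    X_known = rng.normal(size=(n_known * n_per, n_feat))
    y_known = np.repeat(np.arange(n_known), n_per)
    X_unknown = rng.normal(size=(n_unknown * n_per, n_feat))  # subjects never seen in training

    clf = SVC(kernel="rbf", C=100.0, gamma=1.0).fit(X_known, y_known)

    # Positive identification: re-classify the trained samples (parameter tuning makes this 100%).
    n_correct = int((clf.predict(X_known) == y_known).sum())

    # Negative identification: the closed-set SVM assigns every unknown to some trained subject,
    # so every unknown sample is necessarily a false positive.
    n_false_positive = len(X_unknown)

    accuracy = n_correct / (len(X_known) + len(X_unknown))
    print(accuracy)   # decreases as more unknowns are added to the evaluation set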
5.2 Results and analysis
The results of the writer experiment are presented in figure 1.
Fig. 1. Experiment results for writer data
As expected, for each curve (training set), as the number of unknowns increases, the accuracy
monotonically decreases (or equivalently, the error increases). However, the accuracy appears to
converge toward an asymptote rather than falling without bound. To ensure that this is not due to the particular handwriting
data used, we obtained experiment results on the iris data as presented in figure 2.
Fig. 2. Experiment results for iris data
Figure 2 also appears to support the convergence theory, although with only 52 subjects the
evidence for iris is necessarily weaker. If the convergence theory holds true, we should be able to
estimate the accuracy attainable for very large systems by finding an equation that fits these points.
From figures 1 and 2 we recognize that the curves are of exponential form. After some fitting trials,
we find the curve most similar to be:
y = a\, e^{-\left((x - b)/c\right)^2} + d        (3)
In equation (3), the constant b is the number of known trained samples; the constants a, c, and d vary
based on b; and the accuracy converges to d. Figure 3 displays the curve fitting for the 50 and 200
trained writer data samples.
Fig. 3. Curve fitting for 50 and 200 writer data
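The asymptote d in equation (3) can be estimated with an ordinary nonlinear least-squares fit; a minimal sketch using scipy.optimize.curve_fit is shown below. The accuracy values are illustrative placeholders shaped like the curves in figure 1, not our measured results, and b is fixed to the 50 trained subjects of that example.

    import numpy as np
    from scipy.optimize import curve_fit

    def model(x, a, c, d, b=50.0):
        # Equation (3) with b fixed to the number of trained subjects (here 50).
        return a * np.exp(-(((x - b) / c) ** 2)) + d

    x = np.array([50, 100, 200, 400, 700, 841], dtype=float)   # evaluation set sizes
    y = np.array([1.00, 0.93, 0.59, 0.35, 0.35, 0.35])         # illustrative accuracies only

    (a, c, d), _ = curve_fit(model, x, y, p0=(0.6, 150.0, 0.3))
    print(f"estimated asymptotic accuracy d = {d:.3f}")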
6 Conclusion
In this paper, we showed that although high accuracies have been obtained for closed biometric
identification problems, they do not appear to generalize well to the open system problem. This
hypothesis was validated by experiments on two biometric databases: writer and iris. Furthermore, the
expected errors can be projected based on the asymptote of an exponential curve. We feel these error
rates are fairly large and should be taken into account when designing biometric systems for screening
processes – for example, screening passengers at an airport.
6.1 Future work
These experiments should be further validated by testing against larger biometric databases. Also,
additional forms of biometrics, for example fingerprint, face or hand geometry, should be tested. In
these experiments we used a state-of-the-art pattern classifier (Support Vector Machines with the
RBF kernel). Future work might include comparison tests using different kernels, as well as
comparing different pattern classifiers.
References
[1] Edgar E Osuna, Robert Freund and Federico Girosi. Support Vector Machines: Training and Applications. MIT Artificial
Intelligence Laboratory and Center for Biological and Computational Learning Department of Brain and Cognitive Sciences. A.I. Memo
No 1602, C.B.C.L. Paper No 144, 1997.
[2] Christopher J.C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery.
2:121-167, 1998.
[3] Sargur N. Srihari, Sung-Hyuk Cha, Hina Arora and Sangjik Lee. Individuality of Handwriting. Journal of Forensic Science. 47(4),
July 2002, pp. 1-17.
[4] Sung-Hyuk Cha and Sargur N. Srihari. Writer Identification: Statistical Analysis and Dichotomizer. Proceedings of the International
Workshop on Structural and Syntactical Pattern Recognition (SSPR 2000), Alicante, Spain, August/September 2000, pp. 123-132.
[5] Andreas Schlapbach and Horst Bunke. Off-line Handwriting Identification Using HMM Based Recognizers. Proceedings of the 17th
International Conference on Pattern Recognition (ICPR'04), Volume 2, Cambridge, UK, August 23-26, 2004, pp. 654-658.
[6] Emine Krichen, M. Anouar Mellakh, Sonia Garcia-Salicetti and Bernadette Dorizzi. Iris Identification Using Wavelet Packets.
Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), Volume 4, August 23-26, 2004, pp. 335-338.
[7] Chih-Wei Hsu, Chih-Chung Chang and Chih-Jen Lin. A Practical Guide to Support Vector Classification.
[8] Ruud M. Bolle, et al. Guide to Biometrics. Springer, 2004.
[9] Brendon J. Woodford, Da Deng and George L. Benwell. A Wavelet-based Neuro-fuzzy System for Data Mining Small Image Sets.