CIHSPS 2005 – IEEE International Conference on Computational Intelligence for Homeland Security and Personal Safety
Orlando, FL, USA, 31 March – 1 April 2005

Spoof-Proofing Fingerprint Systems Using Evolutionary Time-Delay Neural Networks

Reza Derakhshani
Department of Computer Science and Electrical Engineering
School of Computing and Engineering
University of Missouri, Kansas City
5100 Rockhill Road
Kansas City, Missouri 64110-2499
E-mail: reza@umkc.edu

Abstract – With the new wave of affordable, small, and easy to use scanners, fingerprint-based biometric systems have been receiving increasing attention. However, a major security concern is the possibility of intrusion by presenting a nonliving finger, be it a duplicate or a severed finger, to an electronic fingerprint scanner in order to gain access to a protected entity. It has been shown that one can spoof fingerprint scanners even with play-doh® or gummy fingers. In order to circumvent this problem, one can read signals from the presented finger to verify its liveness and thus eliminate the threat of synthesized or cadaver finger attacks. Earlier research has shown that the process of perspiration on live fingertip skin presents a specific time progression that cannot be seen in cadaver or synthetic fingerprint scans, and this phenomenon can be used as a measure of fingerprint liveness. However, the perspiration process demonstrates itself differently on different scanning technologies, and thus a scanner-specific approach is needed. In this paper, a new general evolutionary temporal neural network (GETnet) for perspiration-based liveness detection is proposed. It is shown that GETnet can arrive at a succinct solution that performs both feature extraction and classification on the raw fingerprint ridge signals.
Given the variety of fingerprint scanners and the diversity of their operating conditions, including climate and user demographics, it is more efficient to automatically breed customized solutions for perspiration-based fingerprint liveness detection through a general framework such as GETnet than to tailor feature extractors and classifiers to each and every scenario.

Keywords – Biometrics, Liveness Detection, Time Delay Neural Networks, Evolving Neural Networks, Evolutionary Algorithms, Sequence Analysis, Intelligent Signal Processing, Pattern Recognition.

I. INTRODUCTION

Biometrics is the science of identifying or verifying individuals’ identities based on their unique physiological traits [1]. There has been a growing interest in biometric applications in recent years. With the new wave of affordable, small, and easy to use scanners, fingerprints have been receiving more attention than other biometric modalities. However, the security of a fingerprint-based biometric system can be compromised by presenting a nonliving finger (a duplicate or a severed finger) that yields an image close enough to the one registered by the biometric system. This is also known as a spoof attack. As long as the spoof finger delivers the same pattern as the original, biometric systems will accept the spoof as a legitimate claimant.

Different scenarios for spoof attacks continue to be popularized by the mass media. Unfortunately, it has been shown that this threat is real [2] and that one can spoof a fingerprint scanner even with play-doh® [3] or gummy fingers [4], [5] with relative ease. For synthetic fingers, casts can be formed either by direct finger impression or from lifted latent fingerprints using photographic etching techniques. Needless to say, the worst-case scenario would be using the victim’s severed finger.
In order to circumvent this problem, one can read signals from the finger to verify its liveness and thus eliminate the threat of synthetic or cadaver finger attacks. However, reading the more obvious signs of life, such as those obtained from electrocardiograms [6] and pulse oximetry, requires extra hardware that can be bulky and expensive. Another approach is to read signs of life from the background data that is already captured by the biometric sensor. This method of anti-spoofing, though subtler, is more attractive since it can be applied to existing biometric hardware in the form of a mere software upgrade.

Our earlier research has shown that the process of perspiration on live fingertip skin can be seen in the consecutive captures of an electronic scanner within the first few seconds of finger contact [7]. The ongoing perspiration of a live finger presents a specific time progression that cannot be seen in cadaver or synthetic fingerprint scans (Fig. 1). This phenomenon has been used as the basis for a software-based fingerprint anti-spoofing algorithm [8], described below.

The perspiration detection algorithm uses two gray scale fingerprints, one captured upon contact of the finger with the scanner and another after 5 seconds. After preprocessing, the fingerprint ridges are extracted and concatenated. These concatenated ridges are traversed and their gray levels recorded to obtain a ridge signal that reflects the moisture levels of each fingerprint, since the moister ridge areas are darker (Fig. 2). Features are derived from these two scans’ ridge signals. These features contain information about the dispersion of perspiration from the pores towards the drier areas of the fingertip, and they are subsequently fed to a classifier for the final decision on the liveness of the presented finger.

Fig. 1. Temporal progression of perspiration as a sign of life. Left column, top: a live finger scanned upon contact; left column, bottom: the same fingerprint after 5 seconds. Middle column, top: play-doh® duplicate of the same finger scanned upon contact; middle column, bottom: the same fingerprint after 5 seconds. Right column, top: a cadaver finger scanned upon contact; right column, bottom: the same fingerprint after 5 seconds. Note the perspiration spots around the pores in the initial live fingerprint scan and the specific changes in the second live fingerprint scan compared to the relatively unchanging scans from the synthetic and cadaver fingers.

II. A NEW APPROACH IN CLASSIFICATION

In this section we justify the need for the suggested new approach to feature extraction and classification of the fingerprint ridge signals for the liveness detection task.

1. It has been shown that the perspiration-based liveness detection algorithm, which was originally developed to counter spoof finger attacks on capacitive CMOS scanners, is applicable to other fingerprint capturing technologies such as optical and electro-optical scanners with varying degrees of success [9]. However, the original feature set of the perspiration-based liveness detection algorithm yields different values for different scanning technologies, and thus a scanner-specific approach might be needed [10], [11]. That is, the perspiration detection algorithm should be customized for different capturing technologies. Given the variety of available scanners and operating conditions, such as climate and demographics, it is more efficient to solve this problem through a general framework.

2. Temporal neural networks can easily handle the input-output format of this problem. Earlier we showed how one can convert fingerprints to ridge signals. The target of the liveness detection task can also be considered as a bivalued liveness signal. This signal assumes its high value for live fingerprint ridge signals and its low value otherwise.
This representation is a two-channel to one-channel sequence mapping problem, which is an ideal form for temporal neural networks. This approach removes the need for feature extraction from the raw ridge signals, which is a challenging and more or less ad hoc task, since it is hard to find a near-optimal set of features for a rather vague physiological phenomenon such as perspiration [12].

Fig. 2. Extraction of ridge signals from a live fingerprint, demonstrated on a magnified subsection of a fingerprint. Right: the black trace on the gray ridges denotes the path traversed to record the gray levels, which are proportional to the ridge moisture. Left: the solid plot shows the gray level signal from the initial capture’s concatenated ridges. The dashed plot depicts the same after 5 seconds. The horizontal axis shows the traversed ridge pixels and the vertical axis shows their gray level.

Based upon the above, a novel general evolutionary temporal neural network (GETnet) [13], [14] is suggested for feature extraction and classification of the fingerprint ridge signals. GETnet is a comprehensive algorithm for evolving general nonlinear neural networks with distributed feedforward and recurrent delay paths. Such networks can represent arbitrary dynamic systems [15], [16] and are at least as powerful as Turing machines [17]. GETnet finds the topology, size, memory depth, and synaptic connection strengths of the sought neural network through a hybrid system of deterministic and stochastic searches in the weight, delay, and architecture spaces. GETnet also introduces a novel and pragmatic time-based regularization mechanism in order to achieve minimum description length (MDL) solutions, which addresses the bias-variance dilemma and achieves better generalization with smaller data sets.

III. EVOLVING THE NEURAL NET CLASSIFIER

The following summarizes GETnet’s algorithm.
This description intentionally avoids going through the details, since the focus of this paper is the biometric application of GETnet.

First, GETnet randomly generates a population of temporal neural networks. Each neuron in a network is connected either to itself or to other neurons with single or multiple branches. Each connection branch has a specific weight and delay. These connections can be either feedforward or feeding back into any node. A minimal set of simple heuristics is used to ensure the functionality of the networks: each network and its nodes should have their input(s) and output(s) connected somewhere, and zero-delay loops should be avoided.

Fig. 3. GETnet’s flowchart. The names of main modules are italicized, and the output of each stage appears after a colon.

Fig. 4. ROC curve for the 30-point test data.

Once the functionality is checked, each neural network is trained using the scaled conjugate gradient method [18] on the training dataset for a time that is limited by the actual execution time of the smaller networks in each generation. This partial, time-limited training favors faster, more compact networks, since they can train for more epochs during the given time limit. This race against time results in a minimum description length (MDL) that ensures the fastest performance on the hosting hardware (i.e. temporal MDL). Evolutionary pressure on the network size through temporal MDL results in compact evolved neural networks with minimum variance, entailing good generalization capabilities.

The fitness of each individual (i.e. temporal neural network) in a generation is calculated as the inverse of its mean squared error on the unseen validation data after partial training. The fitness score is evaluated multiple times and averaged to include the effect of multiple starting points within the given span of inherited weights (see the mutation description below for more explanation). The better networks are given a higher chance to parent the next generation based on their fitness values.

Fig. 5. Spoof, live, and cadaver concatenated training data. Top, dark grey: first capture ridge signal; top, light grey: last capture ridge signal. Bottom: high signal denotes the span of the ridge signal extracted from live fingers.

Parents are then mutated to form the offspring. Mutation acts upon general strategy variables, connecting branches (delay and weight perturbation), and the number of connections and nodes (network sparsity and size) in the form of Gaussian perturbations. Delay perturbations adjust the distributed memory of each network. In order to take advantage of the partial trainings in a lineage, weights are transferred from parent to offspring. However, to avoid local minima, noise is added to the acquired weights (non-exact knowledge transfer from parent to offspring). The magnitude of this noise is a strategy variable itself and subject to evolution. This adaptive process creates room for each new generation to explore the performance landscape around its inherited starting point in the weight space. It can be shown that the effect of adding noise to weights is similar to adding noise to the target values, which can further improve generalization and convergence [19].

Fig. 6. Size of evolving networks (number of connections).

Fig. 7. MSE of evolving networks.

Fig. 8. Training output, best evolved network.

Fig. 9. Sample live test data output, best evolved network.

Crossover is not used in GETnet since in practice it can destroy the distributed knowledge that is spread throughout neural networks [20]. During the deleting mutations, chained dependencies are taken into account in order to calculate the overall effects of deletions and, if possible, avoid disruptive ones such as removing a network’s output path. These smooth mutations reduce the noise in the evolutionary assessment of the free parameters.
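
The selection and mutation steps described above can be illustrated with a short Python sketch. This is not GETnet itself: it evolves only a fixed-size weight vector, omits delays, topology changes, and gradient training, and uses a toy linear model as a stand-in for a partially trained network. The names `evolve`, `mutate`, `select_parents`, and `validation_mse`, the multiplicative rule for perturbing the noise magnitude, and the retention of the best individual are all assumptions made for illustration.

```python
import random

def fitness(mse):
    """Fitness as the inverse of validation MSE (small epsilon avoids 1/0)."""
    return 1.0 / (mse + 1e-12)

def select_parents(population, fitnesses, n, rng):
    """Fitness-proportional selection: fitter networks parent more offspring."""
    return rng.choices(population, weights=fitnesses, k=n)

def mutate(parent, rng):
    """Offspring inherits the parent's weights plus Gaussian noise.
    The noise magnitude (sigma) is itself a strategy variable: it receives
    a Gaussian perturbation too (hypothetical rule, floored to stay positive)."""
    sigma = max(1e-6, parent["sigma"] * (1.0 + 0.2 * rng.gauss(0.0, 1.0)))
    weights = [w + rng.gauss(0.0, sigma) for w in parent["weights"]]
    return {"weights": weights, "sigma": sigma}

def validation_mse(individual, data):
    """Toy stand-in for 'partial training, then validate': MSE of a linear map."""
    total = 0.0
    for x, target in data:
        y = sum(w * xi for w, xi in zip(individual["weights"], x))
        total += (y - target) ** 2
    return total / len(data)

def evolve(data, pop_size=20, generations=15, seed=0):
    """Mutation-only evolutionary loop (no crossover)."""
    rng = random.Random(seed)
    n_inputs = len(data[0][0])
    population = [
        {"weights": [rng.gauss(0.0, 1.0) for _ in range(n_inputs)], "sigma": 0.5}
        for _ in range(pop_size)
    ]
    history = []  # best validation MSE per generation
    for _ in range(generations):
        mses = [validation_mse(ind, data) for ind in population]
        history.append(min(mses))
        best = population[mses.index(min(mses))]
        fits = [fitness(m) for m in mses]
        parents = select_parents(population, fits, pop_size - 1, rng)
        # Keeping the best individual unchanged is an assumption of this sketch.
        population = [best] + [mutate(p, rng) for p in parents]
    mses = [validation_mse(ind, data) for ind in population]
    history.append(min(mses))
    return population[mses.index(min(mses))], history
```

Because the best individual is carried over unchanged, the best validation error in `history` never increases from one generation to the next; without that assumption the curve could fluctuate.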
The evolution cycle continues until the required precision is reached. After the evolutionary loop finishes, the last generation of networks is fully trained, and both the output of the best network and the average output of all the survivors in the last generation are produced. The latter forms a committee of networks that might yield a lower error if the errors in the population are independent. A flowchart of GETnet is given in Fig. 3.

GETnet is more flexible and comprehensive than comparable temporal neural network paradigms such as TDNN [21], FIRnet [22], Elman [23], Jordan [24], PRNN [25], and NARMA [26]. In contrast to GETnet, all the mentioned networks need human experts to determine their memory and network structures as well as the other learning parameters. This is also known as the baby-sitting problem [27], which entails the lack of an automated mechanism to determine the minimum required network size, an essential issue in generalization. Furthermore, none of the above paradigms offers an arbitrary distributed memory structure comprised of recurrent nodes and sub-circuits as well as feedforward delay lines of variable degrees, as GETnet does.

Fig. 10. Sample cadaver test data output, best evolved network.

Fig. 11. Sample spoof test data output, best evolved network.

Fig. 12. Best evolved network for the fingerprint liveness detection. Note the novel structure with multiple feedback paths. The number beside each path indicates the number of its parallel connections, each with its own weight and delay.

IV. DATA AND SIMULATION SETTINGS

The following simulation demonstrates GETnet’s ability to evolve appropriate compact networks for fingerprint liveness detection. The following settings were used:

1- Training data: 4 spoof (play-doh®), 8 live, and 4 cadaver fingerprints. Each sample contains only 150 pixels of the corresponding fingerprint ridge signal.
2- Validation (early stopping) data: same composition as in the training set, but each passage is only 50 pixels long.
3- Test data: 10 samples from each category (live, cadaver, play-doh® spoof). Full-length ridge signals were used.

Given the nominal full length of 3000 to 5000 pixels for the fingerprint ridge signals, the total size of the training and validation datasets used for training and evolution (200 pixels) constitutes less than 10 percent of each sample. Bipolar signals (-1 for non-living and +1 for live) were used to tag the target classes.

V. RESULTS

Below are the results generated by GETnet for our spoof detection problem. All standard deviations refer to Gaussian perturbations. Please see Figs. 4 through 12 for further details.

1- Evolution start, original ancestor of the best-evolved network:
Total number of network branches = 95.
Standard deviation for perturbing the number of network nodes = 0.0460.
Standard deviation for perturbing the number of network interconnections = 0.0585.
Standard deviation for perturbing the connections’ delay line depths = 0.0816.
Training and validation time (mean of multiple starts) = 314.5720 sec.

2- Evolution end, best evolved network after 15 generations:
Total number of network branches = 59.
Standard deviation for perturbing the number of network nodes = 0.0041.
Standard deviation for perturbing the number of network interconnections = 0.0230.
Standard deviation for perturbing the connections’ delay line depths = 0.0028.
Training and validation time (mean of multiple starts) = 151.5210 sec.

Table 1. Test outputs for live subjects. Incorrect results are marked with an asterisk.

Subject     Best Net Output   Committee Output   Liveness
LivTst1          0.7596            0.7701            1
LivTst2          0.8251            0.8332            1
LivTst3          0.6708            0.6771            1
LivTst4          0.2159            0.3300            1
LivTst5          0.6020            0.6617            1
LivTst6          0.7852            0.7893            1
LivTst7         -0.6284           -0.5962           -1 *
LivTst8         -0.2923           -0.2121           -1 *
LivTst9         -0.4239           -0.3580           -1 *
LivTst10         0.5883            0.5769            1

Table 2. Test outputs for cadaver subjects. Incorrect results are marked with an asterisk.

Subject     Best Net Output   Committee Output   Liveness
CdvTst1         -0.6553           -0.6383           -1
CdvTst2          0.6934            0.7149            1 *
CdvTst3         -0.1882           -0.0134           -1
CdvTst4         -0.1677           -0.1583           -1
CdvTst5         -0.7048           -0.6803           -1
CdvTst6         -0.4266           -0.3572           -1
CdvTst7         -0.6014           -0.5773           -1
CdvTst8         -0.6168           -0.5845           -1
CdvTst9         -0.6919           -0.6822           -1
CdvTst10        -0.5695           -0.5621           -1

Table 3. Test outputs for spoof subjects. Incorrect results are marked with an asterisk.

Subject     Best Net Output   Committee Output   Liveness
SpfTst1         -0.5984           -0.5494           -1
SpfTst2         -0.6311           -0.6057           -1
SpfTst3         -0.6563           -0.6423           -1
SpfTst4         -0.6493           -0.6250           -1
SpfTst5         -0.3486           -0.1599           -1
SpfTst6         -0.0479           -0.1518           -1
SpfTst7          0.0229            0.0309            1 *
SpfTst8         -0.2592           -0.3232           -1
SpfTst9         -0.4689           -0.4283           -1
SpfTst10        -0.2772           -0.2025           -1

Table 4. Confusion matrix for the test set. Threshold for output is zero.

A summary of the test results is given in Tables 1 through 4. As can be seen, 3 live claimants were falsely recognized as nonliving, whereas one out of ten in each of the spoof and cadaver test datasets was falsely recognized as live. The overall classification accuracy is therefore (30-3-1-1)/30 = 83.3%. The scalar output values in the tables were calculated as the normalized area under each output signal y_i:

$$\mathrm{Output}_i = \frac{\int y_i(\tau)\, d\tau}{\mathrm{length}(y_i)} \tag{1}$$

The liveness results were determined as

$$\mathrm{Liveness}_i = \mathrm{hard\_threshold}(\mathrm{Output}_i) \tag{2}$$

The hard-limiting threshold function returns -1 for nonliving (Output_i ≤ 0) and 1 for living (Output_i > 0) as the final classification result. The best evolved network is depicted in Fig. 12.

VI. DISCUSSION AND CONCLUSION

It was shown that one can evolve a general temporal neural network and arrive at a succinct solution for the fingerprint vitality detection problem.
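
To make the scoring behind this result concrete, the following minimal Python sketch applies Eqs. (1) and (2) to the best-net outputs listed in Tables 1 through 3. It assumes a uniformly sampled output signal with unit step, so the normalized area in Eq. (1) reduces to the sample mean; the names `normalized_area`, `score`, `BEST_NET`, and `TARGET` are illustrative and not taken from the paper.

```python
def normalized_area(y):
    """Eq. (1) under the unit-step assumption: area under y divided by its length."""
    return sum(y) / len(y)

def hard_threshold(output):
    """Eq. (2): 1 for living (output > 0), -1 for nonliving (output <= 0)."""
    return 1 if output > 0 else -1

# Best-net scalar outputs copied from Tables 1, 2, and 3.
BEST_NET = {
    "live": [0.7596, 0.8251, 0.6708, 0.2159, 0.6020,
             0.7852, -0.6284, -0.2923, -0.4239, 0.5883],
    "cadaver": [-0.6553, 0.6934, -0.1882, -0.1677, -0.7048,
                -0.4266, -0.6014, -0.6168, -0.6919, -0.5695],
    "spoof": [-0.5984, -0.6311, -0.6563, -0.6493, -0.3486,
              -0.0479, 0.0229, -0.2592, -0.4689, -0.2772],
}

# Bipolar target for each class: +1 for live, -1 for nonliving.
TARGET = {"live": 1, "cadaver": -1, "spoof": -1}

def score(outputs_by_class):
    """Count misclassifications per class and the overall fraction correct."""
    errors = {}
    total = 0
    for name, outputs in outputs_by_class.items():
        decisions = [hard_threshold(o) for o in outputs]
        errors[name] = sum(d != TARGET[name] for d in decisions)
        total += len(outputs)
    accuracy = (total - sum(errors.values())) / total
    return errors, accuracy

errors, accuracy = score(BEST_NET)
print(errors)              # misclassified samples per class
print(round(accuracy, 3))  # fraction of the 30 test samples classified correctly
```

With the tabulated values, the error counts are 3 for live, 1 for cadaver, and 1 for spoof, which gives 25 correct decisions out of 30. The committee outputs in the same tables lead to the same decisions.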
The demonstrated solution not only classifies but also performs feature extraction, by feeding the raw fingerprint ridge signals to a recurrent time delay neural network with only four neurons. The assigned task is the detection of a valid ongoing perspiration pattern in order to separate live fingerprints from nonliving ones and thereby thwart spoof attacks on fingerprint-based biometric systems.

During our simulation, a solution was evolved using 16 sample fingerprints. The evolving networks were exposed to less than 10% of each fingerprint’s ridge signal. The resulting 4-neuron classifier and feature extractor, obtained from this scarce amount of training data, shows the ability of GETnet’s temporal MDL to push the evolution towards a compact and fast-acting solution that can learn from small datasets due to its lower complexity. The compactness of the solution can also be partially credited to the ability of GETnet to evolve recurrent structures.

As can be seen from Fig. 12, GETnet’s evolved solution is nonstandard and novel compared to the existing comparable architectures. Such novel solutions are especially important for problems similar to perspiration-based liveness detection, where there might be no standard starting point for either feature extraction or classification.

As mentioned earlier, GETnet’s novel temporal MDL creates fast-acting networks. Such agility is considered to be a sign of intelligence by some researchers [28]. From the training and validation times (314.5720 sec versus 151.5210 sec), we see that the evolved network is more than twice as fast as its original ancestor. This is what we desired by choosing an evolutionary pressure that reduces network complexity using the actual execution time on the hosting hardware for regularization.

Simulations also show that during the course of evolution the standard deviations of mutations usually decrease [13]. This effect can be compared to simulated annealing [29].
The interesting point is that this behavior emerges through evolution and is not coded into GETnet.

In the end, it is instructive to note that the perceived external world, i.e. the mental image of the existence as captured by the sensory inputs of an individual, is initially conveyed through a series of multidimensional time signals. As human beings, we have a natural bias towards detecting specific patterns in these streams and ignore much of the remaining background information. Intelligent systems on the other hand can be evolved towards a different type of sensory acuity, and in this case perspiration detection by GETnet.

REFERENCES

[1] O’Gorman L. (1999) “Fingerprint Verification,” Chapter 2 in Biometrics: Personal Identification in Networked Society, Jain A., Bolle R., and Pankanti S., Eds., Kluwer Academic Publishers, Dordrecht.
[2] Thalheim L. and Krissler J. (2002) “Body check: biometric access protection devices and their programs put to the test,” c’t magazine, November 2002.
[3] Schuckers S. A. C. (2002) “Spoofing and anti-spoofing measures,” Information Security Technical Report, Vol. 7, No. 4, pp. 56-62.
[4] Matsumoto T., Matsumoto H., Yamada K., and Hoshino S. (2002) “Impact of Artificial ‘gummy’ Fingers on Fingerprint Systems,” Proceedings of SPIE, Vol. 4677.
[5] van der Putte T. and Keuning J. (2000) “Biometrical Fingerprint Recognition: Don’t Get Your Fingers Burned,” in Proceedings of the Fourth Working Conference on Smart Card Research and Advanced Applications, Kluwer Academic Publishers, pp. 289-303.
[6] Osten D., Carim H. M., Arneson M. R., and Blan B. L. (1998) “Biometric, personal authentication system,” Minnesota Mining and Manufacturing Company, U.S. Patent #5,719,950, February 17, 1998.
[7] Derakhshani R., Schuckers S. A. C., Hornak L., and O’Gorman L. (2001) “Neural Network-Based Approach for Detection of Liveness in Fingerprint Scanners,” Proceedings of the International Conference on Artificial Intelligence, Las Vegas, NV, CSREA Press, pp. 1099-1105.
[8] Derakhshani R., Schuckers S. A. C., Hornak L., and O’Gorman L. (2003) “Determination of Vitality from a Non-Invasive Biomedical Measurement for Use in Fingerprint Scanners,” Pattern Recognition Journal, Vol. 36, No. 2, pp. 383-396.
[9] Parthasaradhi S., Derakhshani R., Hornak L., and Schuckers S. A. C. “Time-Series Detection of Perspiration as a Liveness Test in Fingerprint Devices,” to appear in the IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews.
[10] Schuckers S. A. C., Derakhshani R., Parthasaradhi S., and Hornak L. (2004) “Improvement of an Algorithm for Recognition of Liveness using Perspiration in Fingerprint Devices,” Proceedings of SPIE, Vol. 5404.
[11] Schuckers S. A. C., Parthasaradhi S., Derakhshani R., and Hornak L. (2004) “Comparison of Classification Methods for Time-Series Detection of Perspiration as a Liveness Test in Fingerprint Devices,” Proceedings of the International Conference on Biometric Authentication, Hong Kong (Springer-Verlag LNCS series).
[12] Quinton P. M. (1983) “Sweating and Its Disorders,” Ann. Rev. Med., 34, pp. 429-452.
[13] Derakhshani R. “GETnet: A General Framework for Evolutionary Temporal Neural Networks,” to appear in the Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Montreal, Canada.
[14] Derakhshani R. (2004) “Biologically Inspired Temporal Evolutionary Neural Circuits,” PhD dissertation, West Virginia University.
[15] Seidl D. R. and Lorenz D. (1991) “A Structure by Which a Recurrent Neural Network can Approximate a Nonlinear Dynamic System,” Proc. Int. Joint Conf. Neural Networks, Vol. 2, pp. 709-714.
[16] Siegelmann H. T. and Sontag E. D. (1995) “On the Computational Power of Neural Networks,” J. Comput. Syst. Sci., Vol. 50, No. 1, pp. 132-150.
[17] Siegelmann H. T., Horne B. G., and Giles C. L. (1997) “Computational Capabilities of Recurrent NARX Neural Networks,” IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 27, Issue 2, pp. 208-215.
[18] Moller M. (1993) “A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning,” Neural Networks, 6, pp. 525-533.
[19] Jim K., Giles C. L., and Horne B. G. (1996) “An Analysis of Noise in Recurrent Neural Networks: Convergence and Generalization,” IEEE Transactions on Neural Networks, Vol. 7, No. 6, pp. 1424-1439.
[20] Yao X. (1999) “Evolving Artificial Neural Networks,” Proceedings of IEEE, September, 87 (9), pp. 1423-1447.
[21] Waibel A., Hanazawa T., Hinton G., Shikano K., and Lang K. J. (1989) “Phoneme Recognition Using Time-Delay Neural Networks,” IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 37, Issue 3, pp. 328-339.
[22] Wan E. (1993) “Time Series Prediction Using a Neural Network with Embedded Tapped Delay-Lines,” in Predicting the Future and Understanding the Past, SFI Studies in the Science of Complexity, Weigend A. and Gershenfeld N., Eds., Addison-Wesley.
[23] Elman J. (1990) “Finding Structure in Time,” Cognitive Science, 14, pp. 179-211.
[24] Jordan M. (1986) “Serial order: A Parallel Distributed Processing Approach,” Institute for Cognitive Science Report 8604, University of California, San Diego.
[25] Haykin S. and Li L. (1995) “Nonlinear Adaptive Prediction of Nonstationary Signals,” IEEE Transactions on Signal Processing, Vol. 43, Issue 2, pp. 526-535.
[26] Narendra K. S. and Parthasarathy K. (1990) “Identification and Control of Dynamical Systems Using Neural Networks,” IEEE Transactions on Neural Networks, Vol. 1, No. 1, pp. 4-27.
[27] Roy A. (2003) “Neural Networks: How Do We Make a Widely Used Technology Out of It?” IEEE NNS Connections, Vol. 1, No. 2, pp. 8-12.
[28] Kurzweil R. (2000) The Age of Spiritual Machines: When Computers Exceed Human Intelligence, Penguin.
[29] Kirkpatrick S., Gelatt Jr. C. D., and Vecchi M. P. (1983) “Optimization by Simulated Annealing,” Science, Vol. 220, No. 4598, pp. 671-680.