METHODS OF DERIVING BIOMETRIC ROC CURVES FROM THE k-NN CLASSIFIER
Robert S. Zack
May 8, 2010

Agenda
• Introduction to ROC Curves
• Classification
• Multi-Class Issues and Solutions
• New Derivation Methods
• Weak and Strong System Training
• Use Cases
• Search for a Topic
• Publications
• Dissertation Status
• Questions

Introduction to ROC Curves
• Used for binary decisions:
  – Signal detection: signal / no signal
  – Medical diagnosis: disease / no disease
  – Biometric authentication: you are the person you claim to be / you are not
• In biometrics, the ROC curve runs from FAR = 1 and FRR = 0 at one end to FAR = 0 and FRR = 1 at the other.
  – FAR = False Accept Rate: the rate at which an impostor is falsely accepted
  – FRR = False Reject Rate: the rate at which the genuine user is falsely rejected
• ROC charts are expressed either as percentages (0-100%) or as probabilities (0-1); the two are used interchangeably.

Authentication Analogy
• Supreme Court: nine judges, and a majority is required to make a decision.
• This is like a 9-NN classifier needing a majority of its nine neighbor votes to authenticate a user.
• The usual ROC curve is built from many potential procedures, one per vote requirement:
  – Need 9 votes to make a decision (very conservative)
  – Need 8, 7, or 6 votes to make a decision (conservative)
  – Need 5 votes to make a decision (majority)
  – Need 4, 3, or 2 votes to make a decision (liberal)
  – Need 1 or even 0 votes to make a decision (very liberal)

Anatomy of a Biometric ROC Curve
• Conservative end: too restrictive; a positive classification requires strong evidence.
• Liberal end: too open; a positive classification requires only weak evidence.

Parametric Procedures
• Parametric techniques are well studied.
• The data is assumed to follow a normal (Gaussian) distribution.
• Varying a decision threshold gives the tradeoff between FAR and FRR.
• The probability density functions can be calculated directly from the assumed distribution instead of being estimated nonparametrically.

Parametric ROC Derivation

Classification
1. The k-NN classifier is well studied.
2. Biometric classification problems can have many classes.
3. It is easier to work with a large or unknown population if the data is converted from a multi-class to a two-class decision.
4. Cha Dichotomy Model.
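The judge-vote analogy and the parametric threshold sweep above build an ROC curve the same way: sweep a decision threshold and record one (FAR, FRR) pair per setting. The Python sketch below is illustrative only. The function names (vote_threshold_roc, normal_cdf, gaussian_far_frr), the sample vote counts, and the Gaussian parameters are hypothetical, and it assumes that a higher vote count or score means "more likely the claimed user".

```python
import math
from typing import List, Tuple


def vote_threshold_roc(genuine_votes: List[int], impostor_votes: List[int],
                       k: int) -> List[Tuple[int, float, float]]:
    """Hypothetical helper: one (m, FAR, FRR) point per vote threshold m.

    Each entry of genuine_votes / impostor_votes is the number of the k
    "judges" (nearest neighbors) voting "same person" for one test attempt.
    An attempt is accepted when it receives at least m votes.
    """
    points = []
    # m = 0 accepts everyone (FAR = 1, FRR = 0); m = k + 1 accepts no one
    # (FAR = 0, FRR = 1), so the sweep spans both ends of the ROC curve.
    for m in range(k + 2):
        far = sum(v >= m for v in impostor_votes) / len(impostor_votes)
        frr = sum(v < m for v in genuine_votes) / len(genuine_votes)
        points.append((m, far, frr))
    return points


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of the normal distribution N(mu, sigma^2), computed with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def gaussian_far_frr(t: float, mu_gen: float, sd_gen: float,
                     mu_imp: float, sd_imp: float) -> Tuple[float, float]:
    """Parametric case (assumption): genuine and impostor scores are Gaussian,
    and an attempt is accepted when its score is >= t.

    FAR = P(impostor score >= t), FRR = P(genuine score < t).
    """
    far = 1.0 - normal_cdf(t, mu_imp, sd_imp)
    frr = normal_cdf(t, mu_gen, sd_gen)
    return far, frr


if __name__ == "__main__":
    # Nine judges (k = 9); made-up vote counts for five genuine and five impostor attempts.
    genuine = [9, 8, 7, 5, 4]
    impostor = [0, 1, 2, 3, 6]
    for m, far, frr in vote_threshold_roc(genuine, impostor, k=9):
        print(f"need {m} votes: FAR={far:.2f} FRR={frr:.2f}")

    # Sweep a threshold over two hypothetical Gaussian score distributions.
    for t in (0.4, 0.5, 0.6, 0.7, 0.8):
        far, frr = gaussian_far_frr(t, mu_gen=0.8, sd_gen=0.1, mu_imp=0.4, sd_imp=0.1)
        print(f"t={t:.1f}: FAR={far:.4f} FRR={frr:.4f}")
```

With these made-up votes, the majority rule (5 of 9 votes) gives FAR = FRR = 0.20, while m = 0 and m = 10 reproduce the two endpoints FAR = 1, FRR = 0 and FAR = 0, FRR = 1. In the Gaussian case, a threshold of t = 0.6, midway between two equal-spread distributions, also gives equal FAR and FRR (about 0.023).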
k-NN Nonparametric Classifier
• k-NN is nonparametric.
• A vector-difference model is used to convert a many-class problem into a two-class (binary) problem.
• Uses Euclidean distance.
Figure: k-NN classification procedure for k = 5 (adapted from Pattern Classification, Duda et al.)

Cha Dichotomy Model
• Simplifies complexity.
• Transforms a feature space into a distance vector space.
• Uses distance measures.
Figure: Multi-class to two-class transformation process (adapted from Yoon et al., 2005)

m-kNN Method
• Pure rank method.
• Example: evaluate the top 7 nearest neighbors; the test sample Q is authenticated if the number of within-class (W) matches is >= the decision threshold, here 4 of the 7 NN.
• Unweighted: all W votes are equal in weight.

wm-kNN Method
• Rank method weighted by rank order: the nearest neighbor gets weight k, the next k-1, and so on down to 1 for the k-th neighbor.
• Authenticate Q if the weighted score of its W choices is at least the weighted match threshold m.
• The score varies from 0 to k(k+1)/2; for k = 5 the maximum is 5+4+3+2+1 = 15.
• Each m gives one FAR/FRR pair, i.e. one ROC point.
  – If m = 0: FAR = 1, FRR = 0; all users are accepted.
  – If m = 15: FAR is small, FRR is large; few Q's are accepted.
Figure: m-kNN and wm-kNN ROCs, LapFree data, weak training
Figure: m-kNN and wm-kNN ROCs, DeskFree data, weak training

t-kNN Method
• A distance threshold method: a positive vote is one that lies within a distance threshold t of the user's sample.
• Uses feature vector space distances only.
• At t = 0, no distance vectors are authenticated: FAR = 0%, FRR = 100%.
• At t = 100, all distance vectors are authenticated: FAR = 100%, FRR = 0%.
Figure: t-kNN method ROCs, DeskFree (left) and LapFree (right) data

ht-kNN Method
• Weighted vote based on distances to the kNN.
• Hybrid of the rank method and vector space distances.
• For each test sample, the within-class weight (WCW) is calculated from the distance vectors.
Figure: ht-kNN method ROCs, DeskFree (left) and LapFree (right) data

New Nonparametric ROC Methods
1. Need m votes out of k for a decision
   • Pure rank method
2. Need wm votes for a decision, but some judges get more than one vote (weighted method)
   • Rank method weighted by rank order
3. A positive vote is within a distance threshold from the user's sample
   • Uses feature vector space distances only
4. Weighted vote based on distances to the kNN
   • Hybrid of rank method and vector space distances

Weak & Strong Training
Weak Training
• People used in testing are not used in training.
• Independent sets of users for testing and training.
Strong Training
• People used in testing are also used in training.
  – Usually to augment the data from the different training people.
  – But new difference vectors are used to authenticate.
  – For example, users provide 8 samples: 5 for training and 3 to match against for authentication.
Figure: Weak & Strong Training kNN Performance. Percent accuracy (90% to 100%) vs. number of nearest neighbors (1 to 21) for DeskFree (WT), LapFree (WT), DeskFree (ST18), DeskFree (ST36), and DeskFree (ST54); WT = weak training, ST = strong training.

Use Cases
On-line test taking: authentication application
• Enroll students at the start of a class.
• Collect biometric samples.
• Verify, using off-line batch processing, that users are who they claim to be.
Corporate compliance training / test administration
• Enroll employees at some point prior to the training or test administration.
• Collect biometric samples and refresh them at designated intervals.
• Verify that users are who they claim to be.

Future Work
• Real-time authentication
• Accuracy improvements
• Error cost analysis
• Measurement error

Initial Search for a Topic
• Started the program in Fall 2008.
• Entered the DPS program with an idea to research a topic in mobile computing; quickly discarded the idea.
• Continued to search for ideas by participating as a customer for IT691/CS691 projects.
• Was exposed to face and keystroke biometrics.
• Continued working with keystroke biometrics and eventually found a topic with the help of Dr. Tappert.

Idea Vetting
• The first few presentations of the topic met with a lot of resistance.
• It took some time to develop the "so what".
• Recorded every research seminar so that I could go back and listen to the criticisms.
• Co-authored several papers on the subject; some were peer-reviewed and submitted for publication.

Publications
[1] J. Abbazio, S. Perez, D. Silva, R. Tesoriero, F. Penna, and R. S. Zack, "Face Biometric Systems," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2009, pp. C1.1-C1.8.
[2] A. Amatya, J. Aliperti, T. Mariutto, A. Shah, M. Warren, R. S. Zack, and C. C. Tappert, "Keystroke Biometric Authentication System Experimentation," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2009, pp. C4.1-C4.8.
[3] A. C. Caicedo, K. Chan, D. A. Germosen, S. Indukuri, M. N. Malik, D. Tulasi, M. C. Wagner, R. S. Zack, and C. C. Tappert, "Keystroke Biometric: Data/Feature Experiments," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2010.
[4] K. Doller, S. Chebiyam, S. Ranjan, E. Little-Tores, and R. S. Zack, "Keystroke Biometric System Test Taker Setup and Data Collection," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2010.
[5] S. Janapala, S. Roy, J. John, L. Columbu, J. Carrozza, R. S. Zack, and C. C. Tappert, "Refactoring a Keystroke Biometric System," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2010, pp. B1.1-B1.8.
[6] M. Lam, U. Patel, M. Schepp, T. Taylor, and R. S. Zack, "Keystroke Biometric: Data Capture Resolution Accuracy," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2010.
[7] C. C. Tappert, S.-H. Cha, M. Villani, and R. S. Zack, "A Keystroke Biometric System for Long-Text Input," International Journal of Information Security and Privacy, pending publication, 2010.
[8] R. S. Zack, C. C. Tappert, S.-H. Cha, J. Aliperti, A. Amatya, T. Mariutto, A. Shah, and M. Warren, "Obtaining Biometric ROC Curves from a Non-Parametric Classifier in a Long-Text-Input Keystroke Authentication Study," vol. 268, Pace University, 2009.

Questions