Biometric ROC Curves from kNN Classifier

METHODS OF DERIVING BIOMETRIC ROC CURVES FROM THE k-NN CLASSIFIER
Robert S. Zack
May 8, 2010
Agenda
• Introduction to ROC Curves
• Classification
• Multi-Class Issues and Solutions
• New Derivation Methods
• Weak and Strong System Training
• Use Cases
• Search for a Topic
• Publications
• Dissertation Status
• Questions
Introduction to ROC Curves
• Used for binary decisions
  – Signal detection: signal / no signal
  – Medical diagnosis: disease / no disease
  – Biometric authentication: you are the person you claim to be / you are not
• In biometrics, the ROC curve runs from FAR=1 and FRR=0 at one end to FAR=0 and FRR=1 at the other
  – FAR = False Accept Rate: the rate at which an impostor is falsely accepted
  – FRR = False Reject Rate: the rate at which the correct person is falsely rejected
• ROC charts are expressed in terms of percentages (0-100%) or probabilities (0-1). The two are used interchangeably.
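A minimal Python sketch of the two definitions above. The function name `far_frr` and its input format (a list of `(is_genuine, accepted)` pairs, one per authentication attempt) are illustrative assumptions, not part of the original system. Rates are returned as probabilities (0-1); multiply by 100 for percentages.

```python
def far_frr(attempts):
    """Return (FAR, FRR) as probabilities in [0, 1].

    attempts: list of (is_genuine, accepted) booleans, one pair per attempt
    (a hypothetical input format used only for illustration).
    """
    impostor_outcomes = [accepted for genuine, accepted in attempts if not genuine]
    genuine_outcomes = [accepted for genuine, accepted in attempts if genuine]
    # FAR: fraction of impostor attempts that were (falsely) accepted.
    far = sum(impostor_outcomes) / len(impostor_outcomes) if impostor_outcomes else 0.0
    # FRR: fraction of genuine attempts that were (falsely) rejected.
    frr = sum(not a for a in genuine_outcomes) / len(genuine_outcomes) if genuine_outcomes else 0.0
    return far, frr


labels = [True, True, False, False, False]  # two genuine attempts, three impostor attempts
print(far_frr([(g, True) for g in labels]))   # accept everyone: (1.0, 0.0)
print(far_frr([(g, False) for g in labels]))  # reject everyone: (0.0, 1.0)
```

Accepting every attempt reproduces the FAR=1, FRR=0 end of the curve; rejecting every attempt reproduces the FAR=0, FRR=1 end.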
Authentication Analogy
• Supreme Court: nine judges
• Usual procedure: a majority is required to make a decision
  – Like 9NN needing a majority to authenticate a user
• ROC curve: creates many potential procedures
  – Need 9 votes to make a decision (very conservative)
  – Need 8, 7, or 6 votes to make a decision (conservative)
  – Need 5 votes to make a decision (majority)
  – Need 4, 3, or 2 votes to make a decision (liberal)
  – Need 1 or even 0 votes to make a decision (very liberal)
Anatomy of a Biometric ROC Curve
• Conservative is too restrictive: positive classification requires strong evidence.
• Liberal is too open: positive classification requires weak evidence.
Parametric Procedures
• Parametric techniques are well studied.
• The data is assumed to follow a normal (Gaussian) distribution.
• A threshold is varied to obtain the tradeoff between FAR and FRR.
• Probability density functions can be calculated without estimation.
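To make the threshold sweep concrete, here is a sketch under the usual parametric assumption that genuine and impostor distances are each normally distributed, with a sample accepted when its distance is at or below the threshold t. The means and standard deviations are made-up illustrative values, not measurements from this study. Because the distributions are assumed known, FAR and FRR come straight from the normal cumulative distribution function.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution N(mean, std**2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def parametric_roc(genuine_mean, genuine_std, impostor_mean, impostor_std, thresholds):
    """Return (t, FAR, FRR) triples, accepting a sample when its distance <= t."""
    points = []
    for t in thresholds:
        # Impostors are falsely accepted when their distance falls at or below t.
        far = normal_cdf(t, impostor_mean, impostor_std)
        # Genuine users are falsely rejected when their distance exceeds t.
        frr = 1.0 - normal_cdf(t, genuine_mean, genuine_std)
        points.append((t, far, frr))
    return points


# Illustrative parameters: genuine distances ~ N(1.0, 0.5^2), impostor ~ N(3.0, 0.5^2).
for t, far, frr in parametric_roc(1.0, 0.5, 3.0, 0.5, [0.0, 1.0, 2.0, 3.0, 4.0]):
    print(f"t={t:.1f}  FAR={far:.4f}  FRR={frr:.4f}")
```

With these symmetric assumed parameters, FAR and FRR are equal at t=2.0, midway between the two means; lowering t moves toward the conservative end, raising it toward the liberal end.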
Parametric ROC Derivation
Classification
1. The k-NN classifier is well studied.
2. Biometric classification problems can have many classes.
3. It is easier to work with a large or unknown population if the data is converted from a multi-class to a two-class decision.
4. Cha Dichotomy Model.
k-NN Nonparametric Classifier
• k-NN is nonparametric.
• A vector-difference model is used to convert a many-class problem into a two-class (binary) problem.
• Uses Euclidean distance.
k-NN Classification Procedure for k=5, adapted from Pattern Classification, Duda et al.
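A minimal k-NN sketch with Euclidean distance and the k=5 majority vote shown in the figure. The training data format (a list of `(feature_vector, label)` pairs) and the toy points are assumptions for illustration. Ties in the vote fall back to `Counter.most_common` ordering, which is one reason an odd k is typical for two-class problems.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train, query, k=5):
    """Assign query the majority label among its k nearest training samples.

    train: list of (feature_vector, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
print(knn_classify(train, (1, 1)))  # 3 of the 5 nearest are "A"
print(knn_classify(train, (5, 4)))  # 3 of the 5 nearest are "B"
```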
Cha Dichotomy Model
• Simplifies complexity.
• Transforms a feature space into a distance vector space.
• Uses distance measures.
Multi-class to Two-Class Transformation Process, adapted from Yoon et al. (2005)
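The transformation can be sketched as follows: every pair of samples becomes one difference vector whose components are the absolute differences of the two feature vectors, labeled within-class when both samples belong to the same user and between-class otherwise. The `(user_id, feature_vector)` input format and the toy values are assumptions for illustration. However many users there are, the output is always a two-class problem.

```python
from itertools import combinations


def dichotomy_transform(samples):
    """Turn labeled feature vectors into labeled difference vectors.

    samples: list of (user_id, feature_vector) pairs.
    Returns (difference_vector, is_within_class) pairs, one per sample pair:
    each component is |x_i - y_i|, and is_within_class is True when both
    samples come from the same user.
    """
    pairs = []
    for (user_a, x), (user_b, y) in combinations(samples, 2):
        diff = [abs(xi - yi) for xi, yi in zip(x, y)]
        pairs.append((diff, user_a == user_b))
    return pairs


samples = [("u1", [1.0, 2.0]), ("u1", [1.5, 2.5]), ("u2", [4.0, 0.0])]
for diff, within in dichotomy_transform(samples):
    print(diff, "within-class" if within else "between-class")
```

Here the two "u1" samples yield the within-class vector [0.5, 0.5], while each "u1"/"u2" pair yields a larger between-class vector.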
m-kNN Method
• Pure rank method.
• Evaluate the top 7 NN.
• Q is authenticated if the number of within-class matches is >= the decision threshold of 4 NN.
• Unweighted: all W's are equal in weight.
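A sketch of the m-kNN decision and its ROC sweep, assuming the training set is a list of difference vectors labeled within-class (`True`) or between-class (`False`), as a dichotomy transformation produces, and that each query is the difference vector between a test sample and the claimed user's sample. Sweeping m from 0 to k yields one FAR/FRR pair per value; m=4 with k=7 matches the rule above. The helper names and toy data are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def within_class_votes(train, query, k=7):
    """Count within-class labels among the k nearest training difference vectors.

    train: list of (difference_vector, is_within_class) pairs.
    """
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    return sum(1 for _, within in neighbors if within)


def m_knn_roc(train, genuine_queries, impostor_queries, k=7):
    """Return (m, FAR, FRR) for every threshold m = 0..k; accept when votes >= m."""
    g_votes = [within_class_votes(train, q, k) for q in genuine_queries]
    i_votes = [within_class_votes(train, q, k) for q in impostor_queries]
    roc = []
    for m in range(k + 1):
        far = sum(v >= m for v in i_votes) / len(i_votes)
        frr = sum(v < m for v in g_votes) / len(g_votes)
        roc.append((m, far, frr))
    return roc


# Toy 1-D difference vectors: small differences are within-class, large ones between-class.
train = [([0.1 * i], True) for i in range(1, 8)] + [([5.0 + 0.1 * i], False) for i in range(1, 8)]
for m, far, frr in m_knn_roc(train, genuine_queries=[[0.2]], impostor_queries=[[5.3]]):
    print(f"m={m}  FAR={far:.2f}  FRR={frr:.2f}")
```

At m=0 every query is accepted (FAR=1, FRR=0), matching the "0 votes" end of the judge analogy.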
wm-kNN Method
• Rank method weighted by rank order.
• Authenticate if the weighted score of the within-class (W) choices is >= the weighted match threshold m.
• The score varies from 0 to k(k+1)/2, e.g., 5+4+3+2+1 = 15 for k=5.
• Each value of m gives one FAR/FRR pair, i.e., one ROC point.
• If m=0, FAR=1 and FRR=0: all users are accepted.
• If m=15, FAR is small and FRR is large: few Q's are accepted.
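A sketch of the wm-kNN score, assuming the weighting implied by the 5+4+3+2+1 example: the nearest neighbor's vote is worth k, the next k-1, and so on down to 1, and only within-class neighbors contribute. The data format follows the labeled difference vectors of the dichotomy model; all names and toy data are hypothetical. Sweeping m over 0..k(k+1)/2 yields the ROC points.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_within_class_score(train, query, k=5):
    """Sum the rank weights (k for the nearest, 1 for the k-th) of within-class neighbors."""
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    return sum(k - rank for rank, (_, within) in enumerate(neighbors) if within)


def wm_knn_roc(train, genuine_queries, impostor_queries, k=5):
    """Return (m, FAR, FRR) for m = 0 .. k(k+1)/2; accept when score >= m."""
    g_scores = [weighted_within_class_score(train, q, k) for q in genuine_queries]
    i_scores = [weighted_within_class_score(train, q, k) for q in impostor_queries]
    max_score = k * (k + 1) // 2
    roc = []
    for m in range(max_score + 1):
        far = sum(s >= m for s in i_scores) / len(i_scores)
        frr = sum(s < m for s in g_scores) / len(g_scores)
        roc.append((m, far, frr))
    return roc


# Alternating labels: ranks 1, 3, 5 are within-class, so the score is 5 + 3 + 1.
mixed = [([0.0], True), ([1.0], False), ([2.0], True), ([3.0], False), ([4.0], True)]
print(weighted_within_class_score(mixed, [0.0]))  # 9
```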
m-kNN and wm-kNN ROCs
LapFree – Weak Training
m-kNN and wm-kNN ROCs
DeskFree – Weak Training
t-kNN Method
• A distance threshold method.
• A positive vote is within a distance threshold from the user's sample.
• Uses feature vector space distances only.
• At t=0, no distance vectors are authenticated: FAR=0%, FRR=100%.
• At t=100, all distance vectors are authenticated: FAR=100%, FRR=0%.
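A sketch under an assumed reading of the threshold: a test sample gets a positive vote when its Euclidean distance to the claimed user's enrolled sample is within t percent of the largest distance observed, so t runs from 0 to 100 as above. The neighbor search is omitted, and the function name and toy data are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def t_knn_roc(enrolled, genuine_samples, impostor_samples, thresholds):
    """Return (t, FAR, FRR) triples for thresholds t given in percent (0-100).

    Assumed rule: a sample gets a positive vote (is accepted) when its distance
    to the enrolled sample is <= t percent of the largest observed distance.
    Only exact matches (distance 0) pass at t=0; everything passes at t=100.
    """
    g_dist = [euclidean(enrolled, s) for s in genuine_samples]
    i_dist = [euclidean(enrolled, s) for s in impostor_samples]
    d_max = max(g_dist + i_dist)
    roc = []
    for t in thresholds:
        limit = d_max * (t / 100.0)
        far = sum(d <= limit for d in i_dist) / len(i_dist)
        frr = sum(d > limit for d in g_dist) / len(g_dist)
        roc.append((t, far, frr))
    return roc


enrolled = [0.0, 0.0]
genuine = [[1.0, 0.0], [0.0, 2.0]]      # distances 1 and 2
impostors = [[3.0, 4.0], [6.0, 8.0]]    # distances 5 and 10
for point in t_knn_roc(enrolled, genuine, impostors, thresholds=[0, 50, 100]):
    print(point)
```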
t-kNN Method
DeskFree (left) and LapFree (right) Data
ht-kNN Method
• Weighted vote based on distances to the kNN.
• Hybrid of rank method and vector space distances.
• For each test sample, the within-class weight (WCW) is calculated based on the distance vectors.
DeskFree (left) and LapFree (right) Data
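The WCW formula is not given here, so this sketch assumes inverse-distance weighting: each of the k nearest training difference vectors contributes a weight of 1/distance, and WCW is the within-class share of the total weight (0 to 1). A query is accepted when its WCW reaches a threshold, and sweeping that threshold traces the ROC. The small constant `eps` (also an assumption) avoids division by zero for exact matches; all names and data are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def within_class_weight(train, query, k=5, eps=1e-9):
    """Within-class share of inverse-distance weights among the k nearest neighbors.

    train: list of (difference_vector, is_within_class) pairs.
    """
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    weighted = [(1.0 / (euclidean(vec, query) + eps), within) for vec, within in neighbors]
    total = sum(w for w, _ in weighted)
    return sum(w for w, within in weighted if within) / total


def ht_knn_roc(train, genuine_queries, impostor_queries, thresholds, k=5):
    """Return (threshold, FAR, FRR) triples; accept when WCW >= threshold."""
    g_wcw = [within_class_weight(train, q, k) for q in genuine_queries]
    i_wcw = [within_class_weight(train, q, k) for q in impostor_queries]
    return [
        (t,
         sum(w >= t for w in i_wcw) / len(i_wcw),
         sum(w < t for w in g_wcw) / len(g_wcw))
        for t in thresholds
    ]


train = [([0.1 * i], True) for i in range(1, 6)] + [([5.0 + 0.1 * i], False) for i in range(1, 6)]
for point in ht_knn_roc(train, [[0.2]], [[5.3]], thresholds=[0.0, 0.5, 1.01]):
    print(point)
```

Unlike the rank weights of wm-kNN, these weights depend on how close each neighbor actually is, which is what makes the method a hybrid of ranking and vector space distances.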
New Nonparametric ROC Methods
1. Need m votes out of k for a decision (m-kNN)
   • Pure rank method
2. Need wm votes for a decision, but some judges get more than one vote (weighted method, wm-kNN)
   • Rank method weighted by rank order
3. A positive vote is within a distance threshold from the user's sample (t-kNN)
   • Uses feature vector space distances only
4. Weighted vote based on distances to the kNN (ht-kNN)
   • Hybrid of rank method and vector space distances
Weak & Strong Training
Weak Training
• People used in testing are not used in training
  – Independent sets of users for testing and training
Strong Training
• People used in testing are also used in training
  – Usually to augment the different training people
  – But new difference vectors are used to authenticate
• For example, users provide 8 samples: 5 for training and 3 to match against for authentication
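The two training regimes differ only in how the data is split. A sketch, assuming samples are stored per user in a dict (a hypothetical layout): weak training partitions the users themselves, while strong training keeps every user in both sets but splits each user's samples (5 for training, 3 held out, as in the example), so the test difference vectors are still new.

```python
import random


def weak_split(samples_by_user, n_train_users, seed=0):
    """Weak training: disjoint sets of users for training and testing."""
    users = sorted(samples_by_user)
    random.Random(seed).shuffle(users)
    train_users, test_users = users[:n_train_users], users[n_train_users:]
    train = {u: samples_by_user[u] for u in train_users}
    test = {u: samples_by_user[u] for u in test_users}
    return train, test


def strong_split(samples_by_user, n_train_samples=5):
    """Strong training: every user is in both sets, with different samples in each."""
    train = {u: s[:n_train_samples] for u, s in samples_by_user.items()}
    test = {u: s[n_train_samples:] for u, s in samples_by_user.items()}
    return train, test


# Hypothetical data: 6 users with 8 samples each (sample ids stand in for feature vectors).
data = {f"user{i}": [f"u{i}s{j}" for j in range(8)] for i in range(6)}
weak_train, weak_test = weak_split(data, n_train_users=4)
strong_train, strong_test = strong_split(data)
print(sorted(weak_train), sorted(weak_test))
print(len(strong_train["user0"]), len(strong_test["user0"]))  # 5 3
```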
Weak & Strong Training
kNN Performance (chart: percent accuracy, 90% to 100%, versus nearest neighbor k = 1, 3, 5, ..., 21; series: DeskFree (WT), LapFree (WT), DeskFree (ST18), DeskFree (ST36), DeskFree (ST54))
Use Cases
• On-line test taking: authentication application
  – Enroll students at the start of a class and collect biometric samples.
  – Authenticate that users are who they should be, using off-line batch processing.
• Corporate compliance training/test administration
  – Enroll employees at some point prior to the training or test administration. Collect biometric samples and refresh them at designated intervals.
  – Authenticate that users are who they should be.
Future Work
• Real-time authentication.
• Accuracy improvements.
• Error cost analysis.
• Measurement error.
Initial Search for a Topic
• Started the program in Fall 2008.
• Entered the DPS with an idea to research a topic in the area of mobile computing. Quickly discarded the idea.
• Continued to search for ideas by participating as a customer for IT691/CS691 projects. Became exposed to facial and keystroke biometrics.
• Continued working with keystroke biometrics and eventually found a topic with the help of Dr. Tappert.
Idea Vetting
• The first few presentations of the topic met with a lot of resistance. It took some time to develop the “so what”.
• Every research seminar was recorded so that I could go back and listen to the criticisms.
• Participated as a co-author on several papers on the subject. Some papers were peer-reviewed and submitted for publication.
Publications
[1] J. Abbazio, S. Perez, D. Silva, R. Tesoriero, F. Penna, and R. S. Zack, "Face Biometric Systems," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2009, pp. C1.1-C1.8.
[2] A. Amatya, J. Aliperti, T. Mariutto, A. Shah, M. Warren, R. S. Zack, and C. C. Tappert, "Keystroke Biometric Authentication System Experimentation," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2009, pp. C4.1-C4.8.
[3] A. C. Caicedo, K. Chan, D. A. Germosen, S. Indukuri, M. N. Malik, D. Tulasi, M. C. Wagner, R. S. Zack, and C. C. Tappert, "Keystroke Biometric: Data/Feature Experiments," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2010.
[4] K. Doller, S. Chebiyam, S. Ranjan, E. Little-Tores, and R. S. Zack, "Keystroke Biometric System Test Taker Setup and Data Collection," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2010.
[5] S. Janapala, S. Roy, J. John, L. Columbu, J. Carrozza, R. S. Zack, and C. C. Tappert, "Refactoring a Keystroke Biometric System," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2010, pp. B1.1-B1.8.
[6] M. Lam, U. Patel, M. Schepp, T. Taylor, and R. S. Zack, "Keystroke Biometric: Data Capture Resolution Accuracy," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2010.
[7] C. C. Tappert, S.-H. Cha, M. Villani, and R. S. Zack, "A Keystroke Biometric System for Long-Text Input," International Journal of Information Security and Privacy, pending publication, 2010.
[8] R. S. Zack, C. C. Tappert, S.-H. Cha, J. Aliperti, A. Amatya, T. Mariutto, A. Shah, and M. Warren, "Obtaining Biometric ROC Curves from a Non-Parametric Classifier in a Long-Text-Input Keystroke Authentication Study," vol. 268, Pace University, 2009.
Questions