EE435: Biometric Signal Processing
Project 4: Pattern Recognition III (System Performance)
Assigned: Thurs 1/30/14
Due: Thurs 2/06/14
I. Introduction
The goal of this project is for you to investigate the performance of an iris recognition algorithm. Performance will be
evaluated using the algorithm’s Receiver Operating Characteristic (ROC) curve. You will use data that was compiled as
part of the design process for the USNA Ridge Energy Direction (RED) government-owned iris recognition algorithm.
You will be using two files: one contains only genuine match scores, and the other contains only imposter match scores. In
this course, we have a separate topic area devoted to methods of iris recognition, but here is a brief description of how these
scores came about…
In iris recognition, each iris image is processed into a compact digital representation (a template) of the important
information in the iris that makes it a very strong biometric in terms of uniqueness. When two templates are compared, a
match score is derived. Iris recognition uses a dissimilarity measure (vice a similarity measure) to describe how well two
templates compare. A smaller match score means that the two templates are very similar, which tends to mean that the two iris images are of the same eye. A higher match score means that they are more dissimilar, so the images are more likely of two different eyes. In iris recognition, the match score is 0.0 for a perfect match (that is, there is no dissimilarity) and 1.0 for two
images that are exactly opposite. Therefore, the range of match scores is 0.0 to 1.0. Genuine match scores (comparing
templates generated from the same eye) should be smaller, closer to 0.0, and imposter match scores (comparing templates
from different eyes) should be higher, closer to 1.0. In truth, it is almost impossible to get an iris match score greater than
0.7.
Recall from Chapter 1 of our text how an ROC curve is created…visually we plot the probability distribution of the genuine
scores and the distribution of the imposter scores on the same axis (match score is on the x-axis). These distributions will
probably overlap on their tails. When comparing two unknown iris templates, we choose a threshold of the match score to
make our decision as to whether the two templates are from the same eye or not. Wherever the threshold is set, because of
the overlapping probability distributions, we can expect to make errors: false accepts (FA: the two templates are from different eyes but we say they are from the same eye) or false rejects (FR: the two templates were from the same eye but we
decide they were not). As we let the threshold value range from its minimum value to its maximum value, we can compute
the false acceptance rate (FAR) and false rejection rate (FRR) at each threshold value. This produces pairs of values, a FAR
for each FRR. Plotting these coordinate pairs, with FAR on the x-axis and FRR on the y-axis, produces the ROC curve.
Using the ROC curve, we can choose what FAR (or FRR) we wish to operate at, and this fixes the FRR (or FAR) value,
since these values are paired.
Other measures of performance are the equal error rate (EER, the point on the ROC curve where FAR = FRR) and d’, which roughly evaluates how “spread out” (separated) the genuine and imposter distributions are; a higher value means the two distributions are better separated, so we can expect fewer errors.
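For reference, one common definition of d’ (the “decidability index” often used in the iris recognition literature; check it against the definition in Chapter 1 of the text) is

$$ d' = \frac{\left|\mu_{\text{genuine}} - \mu_{\text{imposter}}\right|}{\sqrt{\tfrac{1}{2}\left(\sigma_{\text{genuine}}^{2} + \sigma_{\text{imposter}}^{2}\right)}} $$

where the means and standard deviations are those of the genuine and imposter score distributions.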
The match scores were computed from a total of 1000 images from 25 subjects (50 irises, 20 images per iris) from the University of Bath iris database. An example image is shown to the right. These images were 960 rows x 1280 columns in size.
II. ROC Curve
1.
Download the iris_matches_genuines.txt and the iris_matches_imposters.txt files from the shared Google Drive. Load these into MATLAB as variables using the load command (a short sketch of this step appears after the questions below).
How many imposter scores are there? ____________
How many genuine scores? _______________
With the information I gave you about the images that were used to create the data, why are there so many more
imposter scores than genuine scores?
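A minimal MATLAB sketch of this step is below (the file names are those given above; the variable names are just illustrative choices):

    % Load the raw match scores (assumes plain-text files of numeric scores).
    genuines  = load('iris_matches_genuines.txt');
    imposters = load('iris_matches_imposters.txt');

    % Count how many scores of each type were loaded (no semicolon, so the
    % counts print to the command window).
    num_genuine  = numel(genuines)
    num_imposter = numel(imposters)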
2.
Create a plot of the genuine and imposter probability distributions as follows (a MATLAB sketch of these steps appears after this list):
a.
Use histc to create a histogram of the genuine match scores, with bin edges that run from 0.0 to 0.7 with a 0.01
bin width. Have the histc function return the histogram values in a variable.
b.
Use histc to create a histogram of the imposter match scores, with bin edges that run from 0.0 to 0.7 with a 0.01
bin width. Have the histc function return the histogram values in a different variable.
c.
Plot the two histograms together in a single plot command. You’ll notice that the imposter
distribution has a high peak, but the genuine distribution looks to be almost all zeros compared to the imposter
distribution. This is because there are many, many more imposter match scores. To turn this into a probability plot,
divide each value in the genuine histogram by the number of genuine match scores, and the imposter histogram
values by the number of imposter match scores. Note: in actuality, we call this type of distribution a probability
mass function rather than a probability distribution, because it is based on actual data vice theoretical expectation.
d.
Add appropriate labels, title, a grid and a legend. TURN THIS PLOT IN, along with the ROC curve (below) on a
single side of 1 sheet of paper (that is, 2 plots on one side of 1 page).
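A minimal sketch of steps a through d, assuming the score vectors loaded in step 1 (variable names are illustrative):

    edges = 0:0.01:0.7;                      % bin edges from 0.0 to 0.7, width 0.01
    h_gen = histc(genuines,  edges);         % raw genuine-score counts
    h_imp = histc(imposters, edges);         % raw imposter-score counts

    p_gen = h_gen / numel(genuines);         % normalize counts to probabilities
    p_imp = h_imp / numel(imposters);

    figure;
    plot(edges, p_gen, 'b', edges, p_imp, 'r');
    xlabel('Match score'); ylabel('Probability');
    title('Genuine and imposter match score distributions');
    grid on; legend('Genuine', 'Imposter');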
3.
Create the FAR and FRR data for the ROC curve as follows (a sketch using the find command appears after this list):
a.
You will vary the threshold for recognition from 0.0 up to 0.7, in increments of 0.01, and for each threshold value,
count the number of false accepts and the number of false rejects. This could be done using the histograms you
computed in step II.2.a and II.2.b above, or using the find command, or perhaps other ways. This means that you
will be creating 3 vectors: a vector of threshold values, and 2 new vectors that have the same number of elements
as the threshold vector. These last two vectors will hold the number of false accepts for each threshold value in one
vector, and the number of false rejects in the other.
b.
Turn the false accepts vector into FAR values by dividing each value in the vector by the number of imposter
matches. Turn the false rejects vector into FRR values by dividing each value in the vector by the number of
genuine matches.
c.
Plot FAR (on the x-axis) versus FRR. Label the axes, give a title and turn on the grid. You may notice that the
ROC curve seems to lie very close to the vertical axis…this is because iris recognition tends to produce very few
(if any) false accepts if done correctly. Zoom in to the area around the “knee” of the curve so that you can see that
it does have some shape to it. TURN THIS PLOT IN, along with the probability distribution curve (from above)
on a single side of 1 sheet of paper.
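One possible implementation is sketched below. It assumes the score vectors from step 1 and a decision rule of “declare a match when the score is less than or equal to the threshold” (adjust if you use a strict inequality):

    thresholds = 0:0.01:0.7;                 % threshold values to test
    FA = zeros(size(thresholds));            % false accept counts
    FR = zeros(size(thresholds));            % false reject counts

    for k = 1:numel(thresholds)
        t = thresholds(k);
        FA(k) = numel(find(imposters <= t)); % imposter pairs wrongly accepted
        FR(k) = numel(find(genuines  > t));  % genuine pairs wrongly rejected
    end

    FAR = FA / numel(imposters);             % false acceptance rate
    FRR = FR / numel(genuines);              % false rejection rate

    figure;
    plot(FAR, FRR);
    xlabel('FAR'); ylabel('FRR'); title('ROC curve'); grid on;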
III. Other Performance Measures
1.
By zooming in on your ROC curve as needed, determine the EER point and record it below. It may be easiest to determine by having MATLAB draw a line with a slope of 1 on your plot and then zooming in to see where this line intersects the ROC curve (see the sketch below).
EER = __________%
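One way to draw the slope-1 line and estimate the crossing numerically, assuming the FAR and FRR vectors from the Part II sketch are still in the workspace:

    hold on;
    plot([0 1], [0 1], 'k--');               % line with slope 1 through the origin
    hold off;

    % A numerical estimate of the EER: the point where |FAR - FRR| is smallest.
    [~, idx] = min(abs(FAR - FRR));
    eer_estimate = (FAR(idx) + FRR(idx)) / 2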
2.
Determine the value of d’ for this data set, and record it below. Show your computation in the space provided.
d’ = ______________
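If you use the decidability-index form of d’ given in the Introduction (verify it against the definition in the text first), one possible MATLAB computation is:

    mu_g = mean(genuines);   sd_g = std(genuines);    % genuine mean / std dev
    mu_i = mean(imposters);  sd_i = std(imposters);   % imposter mean / std dev
    d_prime = abs(mu_g - mu_i) / sqrt((sd_g^2 + sd_i^2) / 2)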
3.
What threshold value produces the smallest number of total errors (number of FA + number of FR)? Record the
threshold value and the minimum number of errors below.
Threshold value = _________________
Minimum # of errors = ______________
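Continuing the Part II sketch (where FA and FR hold the raw error counts at each threshold), one way to locate the minimum-total-error operating point is:

    total_errors = FA + FR;                  % total error count at each threshold
    [min_errors, idx] = min(total_errors);   % smallest total and its index
    best_threshold = thresholds(idx)
    min_errors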
For your writeup, fill in the blanks above and turn in the two plots and the code you wrote to make the plots. To conserve paper, put both figures on the same page!