EE435: Biometric Signal Processing
Project 4: Pattern Recognition III (System Performance)
Assigned: Thurs 1/30/14
Due: Thurs 2/06/14

I. Introduction

The goal of this project is for you to investigate the performance of an iris recognition algorithm. Performance will be evaluated using the algorithm’s Receiver Operating Characteristic (ROC) curve. You will use data that was compiled as part of the design process for the USNA Ridge Energy Direction (RED) government-owned iris recognition algorithm. You will be using two files: one contains only genuine match scores, and the other contains only imposter match scores.

In this course, we have a separate topic area devoted to methods of iris recognition, but here is a brief description of how these scores came about. In iris recognition, each iris image is processed into a compact digital representation (a template) of the important information in the iris that makes it a very strong biometric in terms of uniqueness. When two templates are compared, a match score is derived. Iris recognition uses a dissimilarity measure (rather than a similarity measure) to describe how well two templates compare. A smaller match score means that the templates are very similar, which tends to mean that the two iris images are of the same eye. A larger match score means that they are more dissimilar, so the images are more likely to be of two different eyes.

In iris recognition, the match score is 0.0 for a perfect match (that is, there is no dissimilarity) and 1.0 for two templates that are exactly opposite. Therefore, match scores range from 0.0 to 1.0. Genuine match scores (comparing templates generated from the same eye) should be smaller, closer to 0.0, and imposter match scores (comparing templates from different eyes) should be larger, closer to 1.0 than the genuine scores. In practice, it is almost impossible to get an iris match score greater than 0.7.
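To make the 0.0-to-1.0 dissimilarity scale concrete, here is a minimal sketch that assumes binary templates compared by fractional Hamming distance (the fraction of bits that disagree), a common choice in iris recognition. This is an assumption for illustration only. RED's actual template format and comparison (masking, rotation compensation, and so on) may differ, and the function name is hypothetical.

```python
def fractional_hamming_distance(template_a, template_b):
    """Return the fraction of positions where two equal-length bit lists differ.

    Hypothetical helper for illustration; it ignores the masking and
    rotation handling a real iris matcher would typically need.
    """
    if len(template_a) != len(template_b) or not template_a:
        raise ValueError("templates must be non-empty and the same length")
    differing = sum(a != b for a, b in zip(template_a, template_b))
    return differing / len(template_a)


if __name__ == "__main__":
    code = [1, 0, 1, 1, 0, 0, 1, 0]
    opposite = [1 - bit for bit in code]        # every bit flipped
    partial = [0, 0, 1, 1, 0, 0, 1, 1]          # first and last bits flipped
    print(fractional_hamming_distance(code, code))      # 0.0, perfect match
    print(fractional_hamming_distance(code, opposite))  # 1.0, exactly opposite
    print(fractional_hamming_distance(code, partial))   # 0.25
```

Under the same assumption, two unrelated templates whose bits behave like independent coin flips disagree on roughly half their bits. Imposter scores would therefore cluster near 0.5, which is consistent with scores above 0.7 being almost never seen.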
Recall from Chapter 1 of our text how an ROC curve is created. Visually, we plot the probability distribution of the genuine scores and the distribution of the imposter scores on the same axes (match score on the x-axis). These distributions will probably overlap on their tails. When comparing two unknown iris templates, we choose a threshold on the match score to decide whether or not the two templates are from the same eye. Wherever the threshold is set, we can expect to make errors because the distributions overlap. These errors are either false accepts (FA: the two templates are from different eyes but we say they are from the same eye) or false rejects (FR: the two templates are from the same eye but we decide they are not).

As we let the threshold range from its minimum value to its maximum value, we can compute the false acceptance rate (FAR) and false rejection rate (FRR) at each threshold value. This produces pairs of values, one FAR for each FRR. Plotting these pairs with FAR on the x-axis and FRR on the y-axis produces the ROC curve. Using the ROC curve, we can choose the FAR (or FRR) at which we wish to operate. Because the values are paired, this fixes the FRR (or FAR). Other measures of performance are the equal error rate (EER, the point on the ROC curve where FAR = FRR) and d'. The d' measure roughly evaluates how well separated the imposter and genuine distributions are relative to their spreads (a higher value means better separation, so we can expect fewer errors).

The images used were a total of 1000 images from 25 subjects (50 irises, 20 images per iris) from the University of Bath iris database. An example image is shown to the right. These images were 960 rows x 1280 columns in size.

II. ROC Curve

1. Download the iris_matches_genuines.txt and the iris_matches_imposters.txt files from the shared Google Drive. Load these into MATLAB as variables using the load command.
How many imposter scores are there? ____________
How many genuine scores?
_______________

With the information I gave you about the images that were used to create the data, why are there so many more imposter scores than genuine scores?

2. Create a plot of the genuine and imposter probability distributions as follows:
a. Use histc to create a histogram of the genuine match scores, with bin edges that run from 0.0 to 0.7 with a 0.01 bin width. Have the histc function return the histogram values in a variable.
b. Use histc to create a histogram of the imposter match scores, with the same bin edges (0.0 to 0.7, 0.01 bin width). Have the histc function return the histogram values in a different variable.
c. Plot both histograms with a single plot command. You'll notice that the imposter distribution has a high peak, but the genuine distribution looks like almost all zeros by comparison. This is because there are many, many more imposter match scores. To turn this into a probability plot, divide each value in the genuine histogram by the number of genuine match scores, and each value in the imposter histogram by the number of imposter match scores. Note: strictly speaking, this type of distribution is called a probability mass function rather than a probability distribution, because it is built from discrete bins of actual data rather than from a theoretical expectation.
d. Add appropriate labels, a title, a grid, and a legend. TURN THIS PLOT IN, along with the ROC curve (below), on a single side of 1 sheet of paper (that is, 2 plots on one side of 1 page).

3. Create the FAR and FRR data for the ROC curve as follows:
a. You will vary the threshold for recognition from 0.0 up to 0.7, in increments of 0.01. For each threshold value, count the number of false accepts and the number of false rejects. Because the score is a dissimilarity, a comparison is declared a match when its score falls below the threshold. An imposter score below the threshold is therefore a false accept, and a genuine score above it is a false reject. This could be done using the histograms you computed in steps II.2.a and II.2.b above, using the find command, or in other ways.
This means that you will be creating 3 vectors: a vector of threshold values, and 2 new vectors that have the same number of elements as the threshold vector. These last two vectors will hold the number of false accepts for each threshold value in one vector, and the number of false rejects in the other.
b. Turn the false accepts vector into FAR values by dividing each value in the vector by the number of imposter matches. Turn the false rejects vector into FRR values by dividing each value in the vector by the number of genuine matches.
c. Plot FRR versus FAR, with FAR on the x-axis. Label the axes, give a title, and turn on the grid. You may notice that the ROC curve seems to lie very close to the vertical axis. This is because iris recognition, if done correctly, tends to produce very few (if any) false accepts. Zoom in to the area around the "knee" of the curve so that you can see that it does have some shape to it. TURN THIS PLOT IN, along with the probability distribution plot (from above), on a single side of 1 sheet of paper.

III. Other Performance Measures

1. By zooming in on your ROC curve as needed, determine the EER and record it below. It may be easiest to have MATLAB draw a line through the origin with a slope of 1 (where FAR = FRR) on your plot, then zoom in to see where this line intersects the ROC curve.
EER = __________%

2. Determine the value of d' for this data set, and record it below. Show your computation in the space provided.
d' = ______________

3. What threshold value produces the smallest number of total errors (number of FA + number of FR)? Record the threshold value and the minimum number of errors below.
Threshold value = _________________
Minimum # of errors = ______________

For the writeup, fill in the blanks above, and turn in the two plots and the code you wrote to make them. To conserve paper, put both plots on the same side of one page (for example, using subplot)!
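If you want to cross-check your MATLAB results for Sections II and III, the following illustration computes the same quantities in Python (standard library only). It rests on these assumptions:

- Each score file is plain text containing whitespace-separated numbers.
- A comparison is accepted when its score is strictly below the threshold. Whether a tie counts as accept or reject is itself an assumption.
- The EER is approximated at the grid threshold where FAR and FRR are closest.
- d' uses the common decidability definition d' = |mean_imp - mean_gen| / sqrt((var_imp + var_gen) / 2).

The helper names (load_scores, error_counts, decidability, summarize) are hypothetical. In MATLAB, the same counts can come from expressions such as sum(imp < t) and sum(gen >= t), where imp and gen are hypothetical names for the loaded score vectors, or from cumulative sums of the histc histograms.

```python
import math
import statistics


def load_scores(path):
    """Read every whitespace-separated number in a text file as a float.

    Assumes the score files are plain text (for example, one score per line).
    """
    with open(path) as f:
        return [float(token) for token in f.read().split()]


def error_counts(genuine, imposter, thresholds):
    """Count false accepts and false rejects at each threshold.

    Scores are dissimilarities, so a comparison is accepted when score < t:
    an imposter score below t is a false accept, and a genuine score at or
    above t is a false reject.
    """
    false_accepts = [sum(s < t for s in imposter) for t in thresholds]
    false_rejects = [sum(s >= t for s in genuine) for t in thresholds]
    return false_accepts, false_rejects


def decidability(genuine, imposter):
    """d' = |mean_imp - mean_gen| / sqrt((var_imp + var_gen) / 2)."""
    spread = math.sqrt((statistics.pvariance(imposter) + statistics.pvariance(genuine)) / 2)
    return abs(statistics.fmean(imposter) - statistics.fmean(genuine)) / spread


def summarize(genuine, imposter):
    """Compute FAR/FRR curves, an approximate EER, the min-error threshold, and d'."""
    thresholds = [k / 100 for k in range(71)]  # 0.00, 0.01, ..., 0.70 without float drift
    fa, fr = error_counts(genuine, imposter, thresholds)
    far = [n / len(imposter) for n in fa]
    frr = [n / len(genuine) for n in fr]

    # Approximate EER: the grid threshold where FAR and FRR are closest.
    i_eer = min(range(len(thresholds)), key=lambda i: abs(far[i] - frr[i]))
    # Threshold with the fewest total errors (FA + FR); ties go to the lowest threshold.
    totals = [a + r for a, r in zip(fa, fr)]
    i_min = min(range(len(thresholds)), key=lambda i: totals[i])
    return {
        "thresholds": thresholds,
        "far": far,
        "frr": frr,
        "eer": (far[i_eer] + frr[i_eer]) / 2,
        "eer_threshold": thresholds[i_eer],
        "best_threshold": thresholds[i_min],
        "min_errors": totals[i_min],
        "d_prime": decidability(genuine, imposter),
    }


if __name__ == "__main__":
    # Made-up scores for illustration only. For the project, use instead:
    #   genuine = load_scores("iris_matches_genuines.txt")
    #   imposter = load_scores("iris_matches_imposters.txt")
    genuine = [0.10, 0.15, 0.20, 0.25, 0.42]
    imposter = [0.38, 0.45, 0.48, 0.50, 0.52]
    r = summarize(genuine, imposter)
    print(f"EER ~ {100 * r['eer']:.1f}% at threshold {r['eer_threshold']:.2f}")
    print(f"fewest errors: {r['min_errors']} at threshold {r['best_threshold']:.2f}")
    print(f"d' = {r['d_prime']:.3f}")
```

The demo scores are invented, so the printed numbers say nothing about RED's real performance. Counting with a direct comparison at every threshold is simple but scans all scores 71 times. The cumulative-histogram approach in step II.3.a gives the same counts in a single pass.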