Adding human expertise to the quantitative analysis of fingerprints
Busey and Chen

PROGRAM NARRATIVE

A. Research Question

Machine learning algorithms take a number of approaches to the quantitative analysis of fingerprints. These include identifying and matching minutiae (refs), matching patterns of local orientation based on dynamic masks (refs), and neural network approaches that attempt to learn the structure of fingerprints (refs). While these techniques provide good results in biometric applications and serve a screening role in forensic cases, they are less useful when applied to severely degraded fingerprints, which must be matched by human experts. Indeed, statistical approaches and human experts have different strengths. Despite the enormous computational power available to computer analysis systems today, the human visual system remains unequaled in its flexibility and pattern recognition abilities. Three possible reasons for this success are the expert's knowledge of where the most important regions are located on a particular set of prints, the ability to tune the visual system to specific features, and the integration of information across different features. In the present project, we propose to integrate the knowledge of experts into the quantitative analysis of fingerprints to a degree not achieved by other approaches. There is much that fingerprint examiners can add to machine learning algorithms and, as we describe below, many ways in which statistical learning algorithms can assist human experts. Thus the central research question of this proposal is: How can the integration of information derived from experts improve the quantitative analysis of fingerprints?

B. Research goals and objectives

The goal of the present proposal is to integrate data from human experts with statistical learning algorithms to improve the quantitative analysis of inked and latent prints.
We introduce a novel procedure developed by one investigator (Tom Busey) and use it to guide the input to statistical learning algorithms developed and extended by our other investigator (Chen Yu). The fundamental idea behind our approach is that the quantitative evaluation of the information contained in latent and inked prints can be vastly improved by using elements of human expertise to assist the statistical modeling, as well as to introduce a new dimension of time that is not contained in the static latent print analysis. The main benefit, as we discuss in sections C.x.x, is that the format of the data extracted from experts allows the application of novel quantitative models that are adapted from related areas. To apply this knowledge derived from experts, we will use our backgrounds in vision, perception, machine learning and behavioral testing to design experiments that extract relevant information from experts and use this to improve the quantitative analysis techniques applied to fingerprints by integrating the two sources of information. Our research interests differ somewhat from the existing approaches and reflect the adaptations that are necessary to incorporate human expert knowledge. Existing statistical algorithms developed to match fingerprints rely on several different classes of algorithms. Some extract minutiae and other robust sources of information such as the number of ridges between minutiae (refs). Others rely on the computation of local curvature of the ridges, and then partition these into different classes (MASK refs). Virtually all approaches make reasoned and reasonable guesses as to what the important sources of information might be, such as minutiae, local ridge orientation or local ridge width (dgs paper).
The present approach is more agnostic about what the important sources of information in fingerprints might be, and we will develop statistical models that take advantage of the data derived from experts. However, a major goal of the grant is to demonstrate how expert knowledge can be applied to any extant model, and to suggest how this might be accomplished. Thus we will spend substantial time documenting our application of expert knowledge for our statistical models. In addition, we will make all of our expert data available for other researchers and practitioners. It is likely that the data will have implications for training, although this is not the focus of the present proposal.

C. Research design and methods

At the heart of our approach is the idea that human expertise, properly represented, can improve the quantitative analyses of fingerprints. In a later section we describe how we apply human expert knowledge to various statistical analyses, but first we need to answer the question of whether human experts can add something to the quantitative analyses of prints. The answer to this question can be broken down into two parts. First, do human visual systems in general possess attributes not captured by current statistical approaches, and second, do human experts have additional capacities not shared by novices, capacities that could further inform statistical approaches? Below we briefly summarize what the visual science literature tells us about how humans recognize patterns, and then describe our own work that has addressed the differences between experts and novices. As we will show, human experts have much to add to quantitative approaches.
We should stress that while we will gather data from human experts to improve our quantitative analyses of fingerprints, the goal of this grant is not to study human experts in order to determine whether or how they differ from novices, nor are we interested in questions about the reliability or accuracy of human experts. Instead, we will generalize our previous results that demonstrate strong differences in the visual processing of fingerprints in experts, and apply this expertise to our own statistical analyses. As a result, we will only gather data from human experts (latent print examiners with at least 5 years of post-apprentice work in the field) under the assumption that this will provide maximum improvement to our statistical methods. We can demonstrate the effectiveness of this knowledge by simply re-running the statistical analyses without the benefit of knowledge from experts. There are various metrics attached to each analysis technique that demonstrate the superiority of expert-enhanced analyses, such as correct recognition/false recognition tradeoff graphs, or the dimensionality reduction/reconstruction successes of data reduction techniques. We will also apply novel approaches adapted from the related domain of language analyses. It might seem odd to apply techniques developed for linguistic analyses to a visual domain such as pattern recognition, but the principles that underlie both domains are very similar. Both involve large numbers of features that have complex statistical relations. In the case of language, the features are often words, phonemes or other acoustical signals. Fingerprints are defined by a complex but very regular dictionary of features that also share a complex and meaningful correlational structure.
One of us (Chen) is a highly-published expert in the field of machine learning algorithms as applied to multimodal data, and several papers included as appendices detail this expertise. His work on multimodal applications between visual and auditory domains makes him well-suited to address the relation between human data and machine learning algorithms. Both linguistic and visual information contain highly-structured data that consist of regularities that are extracted by perceivers, and this is not unlike the temporal sequence that experts go through when they perform a latent print examination, as we describe in a later section. First, however, we address how we might document the principles of human expertise.

Can we use elements of the human visual system to improve our statistical analyses?

The answer to this question is straightforward, in part because of the overwhelming evidence that human-based recognition systems contain processes that are not captured by current statistical approaches. One of us (Busey) has published many articles addressing different aspects of human sensation, perception and cognition, and thus is well-suited to manage the acquisition and application of human expertise to statistical approaches. Below we briefly summarize the properties of the human visual system, and in a later section we describe how we plan to extract fundamental principles from this design in order to improve our statistical analyses of fingerprints. An analysis of the human visual system by vision scientists demonstrates that the recognition process proceeds via a hierarchical series of stages, each with important non-linearities (nature ref), that produce areas that respond to objects of greater and greater complexity. This process also provides increasing spatial independence, allowing brain areas to integrate over larger and larger regions. This will become important for holistic or configural processing, as discussed in a later section.
(also talk about feature-based attention) A second benefit of this hierarchical approach is that objects achieve limited scale and contrast invariance. Statistical approaches often deal with this through local contrast or brightness normalization, but this is a separate process. Scale invariance is often achieved by explicitly measuring the width of ridges (grayscale ref), again a separate process. A third strength of the human visual system is that it appears to have the ability to form new feature templates through an analysis of the statistical information contained in the fingerprints. This process, called unitization, will tend to improve feature detection in noisy environments, as is often found with latent prints.

Do forensic scientists have visual capabilities not shared by novices?

The prior summary of the elements of the human visual system suggests that current statistical approaches can be improved by adapting some of the principles underlying the human visual system. There are, however, other processes that are specifically developed by latent print examiners that may also be profitably applied to statistical models. Below we summarize the results of two empirical studies that have recently been published in the highly respected journal Vision Research (Busey & Vanderkolk, 2005). The results demonstrate not only that experts are better than novices, but suggest the nature of the processes that produce this superior performance. Visual expertise takes many forms. It could be different for different parts of the identification process, and may not even be verbalizable by the expert, since many elements of perceptual expertise remain cognitively impenetrable (refs). A major focus of our research is to capture elements of this expertise and use this as a training signal for our statistical learning algorithms.
What is novel to our approach is our ability to capture the expertise at a very deep and rich level. In the next section we describe our prior work documenting the nature of the processes that enable experts to perform at levels much superior to novices, and then in Section C.2 we describe how we capture this expertise in a way that allows us to use it to improve our statistical learning algorithms.

C.1. Documenting expertise in human latent print examiners

Initially, experts tend to focus on the entire print, which leads to benefits that we have previously identified as configural processing (Busey & Vanderkolk, 2005). Configural processing takes several forms, but the basic idea behind this process is that instead of focusing on individual features or minutiae, the observer instead integrates information over a large region to identify important relations such as relative locations of features or curvature of ridge flow. Fingerprint examiners often talk about 'viewing the image in its totality', which is different language for the same process. While configural processing reveals the overall structure of an image and selects important regions for further inspection, the real work comes in comparing small regions in one print to regions in the other. These regions may be selected on the basis of minutiae identified in the print, or high-quality Level 3 detail. We know from related work on perceptual learning in the visual system that one of the processes by which expertise develops is the development of new feature detectors. Experts spend a great deal of time viewing prints, and this has the potential to result in profound changes in how their visual systems process fingerprints.
(config processing refs) One process by which experts could improve how they extract latent print information from noisy prints is termed unitization, in which novel feature detectors are created through experience (unitization refs). Fingerprints contain remarkable regularities that the human visual system can learn to exploit.

C.1.a. Do experts have information valuable to training networks or documenting the quantitative nature of fingerprints?

Fingerprint examiners have received almost no attention in the perceptual learning or expertise literatures, and thus the PI began a series of studies in consultation with John Vanderkolk, of the Indiana State Police Forensic Sciences Laboratory in Fort Wayne, Indiana. Our first study addressed the nature of the expertise effects in a behavioral experiment, and then we followed up evidence for configural processing with an electrophysiological study. The discussion below describes the experiments in some detail, in part because extensions of this work are proposed in Section D, and a complete description here illustrates the technical rigor and converging methods of our approach.

C.1.b. Behavioral evidence for configural processing

In our first experiment, we abstracted what we felt were the essential elements of the fingerprint examination process into an X-AB task that could be accomplished in relatively short order.

Figure 1. Sequence of events in a behavioral experiment with fingerprint experts and novices (study image for 1 second; mask for 200 or 5200 milliseconds; test images until response). Note that the study image has a different orientation and is slightly brighter to reduce reliance on low-level cues.
This work is described in Busey and Vanderkolk (2005), but we briefly describe the methods here since they illustrate how our approach seeks to find a paradigm that is less time-consuming than fully realistic forensic examinations (which can take hours to days to complete) yet still maintains enough ecological validity to tap the expertise of the examiners. Figure 1 shows the stimuli used in the experiment as well as a timeline of one trial. We cropped out fingerprint fragments from inked prints, grouped them into pairs, and briefly presented one of the two for 1 second. This was followed by a mask for either 200 or 5200 ms, and then the expert or novice subject made a forced-choice response indicating which of the two test prints they believed was shown at study. We introduced orientation and brightness jitter at study, and the construction of the pairs was done to reduce the reliance on idiosyncratic features such as lint or blotches. At test, we introduced two manipulations that we thought captured aspects of latent prints, as shown in Figure 2. First, latent prints are often embedded in visual noise from the texture of the surface, dust, and other sources. One expert, in describing how he approached latent prints, stated that his job was to 'see through the noise.' To simulate at least elements of this noise, we embedded half of our test prints in white visual noise.

Figure 2. Four types of test trials: clear fragments, partially-masked fragments, fragments presented in noise, and partially-masked fragments presented in noise.

While this noise may have a spatial distribution that differs from the noise typically encountered by experts, we hoped that it would tap whatever facilities experts may have developed to deal with noise. The second manipulation was motivated by the observation that latent prints are rarely complete copies of their inked counterparts.
They often appear patchy if made on an irregular surface, and sections may be partially masked out. To simulate this, we created partially-masked fingerprint fragments as shown in the upper-right panel of Figure 2. Note that the partially-masked print and its complement each contain exactly half of the information of the full print, and the full print can be recovered by summing the two partial prints pixel-by-pixel. We use this property to test for configural effects as described in a later section. All three manipulations (delay between study and test, added noise, and partial masking) were fully crossed to create 8 conditions. The data are shown in Figure 3, which shows main effects for all three factors for novices.

Figure 3. Behavioral experiment data. Error bars represent one standard error of the mean (SEM).

Somewhat surprising is the finding that while experts show effects of added noise and partial masking, they show no effect of delay, which suggests that they are able to re-code their visual information into a more durable store resistant to decay, or have better visual memories. Experts also show an interaction between added noise and partial masking, but novices do not. This interaction seen with the experts may result from very strong performance for full images embedded in noise, and may result from configural processes. To test this in a scale-invariant manner, we developed a multinomial model which makes a prediction for full-image performance given partial-image performance using principles similar to probability summation.
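To make the logic of this baseline concrete, a simple probability-summation prediction for a two-alternative forced-choice task can be sketched as follows. This is an illustration of the principle only, not our multinomial model itself; the function name and the guessing-rate correction are our assumptions.

```python
# Sketch of a probability-summation baseline for a 2AFC task: given
# accuracy on each half-image alone, predict full-image accuracy under
# the assumption that the two halves are processed independently.

def probability_summation_2afc(p_half_a, p_half_b):
    """Predict 2AFC full-image accuracy from two half-image accuracies.

    Each accuracy is corrected for the 0.5 guessing baseline, the
    corrected rates are combined under independence (either half
    suffices), and the result is mapped back to proportion correct.
    """
    g_a = (p_half_a - 0.5) / 0.5                 # true-detection rate, half A
    g_b = (p_half_b - 0.5) / 0.5                 # true-detection rate, half B
    g_full = 1.0 - (1.0 - g_a) * (1.0 - g_b)     # either half suffices
    return 0.5 + 0.5 * g_full

# With ~65% accuracy on each half, independence predicts ~75.5% for the
# full image, so observed expert performance near 90% exceeds the bound.
print(round(probability_summation_2afc(0.65, 0.65), 3))  # prints 0.755
```

Under this sketch, full-image performance near 90% when half-image performance is 65% is well above the independence prediction, which is the sense in which experts extract information from the two halves more efficiently than summation allows.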
The complete results are found in Busey & Vanderkolk (2005), but to summarize: when partial image performance is around 65%, the model predicts full image performance to be about 75%, whereas observed performance is almost 90%, significantly above the probability summation prediction. Thus it appears that when both halves of an image are present (as in the full image), experts are much more efficient at extracting information from each half. The results of this experiment lay the groundwork for a more complete investigation of perceptual expertise in fingerprint examiners. From this work we have evidence that:

1) Experts perform much better than novices overall, despite the fact that the testing conditions were time-limited and somewhat different than those found in a traditional latent print examination.

2) Experts appear immune to longer delays between study and test images, suggesting better information re-coding strategies and/or better visual memories.

3) Experts may have adopted configural processing abilities over the course of their training and practice. All observers have similar facilities for faces as a consequence of the ecological importance of faces and our quotidian exposure as a result of social interactions. Experts may have extended this ability to the domain of fingerprints, since configural processing is seen as one mechanism underlying expertise (e.g. Gauthier & Tarr, 1997).

C.1.c. Electrophysiological evidence for configural processing

To provide converging evidence that fingerprint experts process full fingerprints configurally, we turned to an electrophysiological paradigm based on work from the face recognition literature. This experiment is described more fully in Busey and Vanderkolk (2005), which is included as an appendix.
However, these results support the prior conclusions described above, and demonstrate that the configural processing observed with fingerprint examiners results from profound and qualitative changes that occur in the very earliest stages of their perceptual processing of fingerprints.

C.2. Elements of human expertise that could improve quantitative analyses

The two studies described above are important because they illustrate that configural processing is one process that could be adapted for use in the quantitative analyses of fingerprints. Existing quantitative models of fingerprints incorporate some elements of the expertise seen above, but many elements could be added that would improve the recognition accuracy of existing programs. The two major approaches to fingerprint matching rely on local features such as minutiae detection (refs), and more global approaches such as dynamic masks applied to orientation computed at many locations on a grid overlaying the print (refs). Of these two approaches, the dynamic mask approach comes closer to the idea of configural processing, although it does not compute minutiae directly. [strengthen this intro] Neither approach takes advantage of the temporal information that expresses elements of expertise in the human matching process. Quantitative information such as fingerprint data, when represented in pixel form, has a high-dimensional structure. The two techniques described above reduce this dimensionality by either extracting salient points such as minutiae, or computing orientation only at discrete locations. Both of these approaches throw out a great deal of information that could otherwise be used to train a statistical model on the elemental features that allow for matches.
Part of the reason this is necessary is that the high-dimensional space is difficult to work in: all prints are more or less equally similar without this dimensionality reduction, and by reducing the dimensionality, computations such as similarity become tractable. The key, then, is to reduce the dimensionality while preserving the essential features that allow for discrimination among prints. One technique that has been explored in language acquisition is the concept of "starting small" (Elman ref). In this procedure, machine learning approaches such as neural network analyses are given very coarse information at first, which helps the network find an appropriate starting point. Gradually, more and more detailed information is added, which allows the network to make finer and finer discriminations. We discuss these ideas more fully in section X.Xx, but we mention it here to motivate the empirical methods described next. Experts likely select which information they choose to initially examine based on the need to organize their search processes. Thus they likely acquire information that may not immediately lead to a definitive conclusion of confirmation or rejection, but guides the later acquisition process. In the scene perception literature, this process is known as 'gist acquisition' (refs), and it suggests that the order in which a system (machine or human) learns information matters. In the section below we describe how we acquire both spatial and temporal information from experts, and then describe how this knowledge can be incorporated into quantitative models.

C.3. Capturing the information acquisition process: The moving window paradigm

To identify the nature of the information used by experts, and the order in which it is gathered, we have begun to use a technique called a moving window procedure.
In the sections below we describe this procedure and how it can be extended to address the role of configural or gist information in human experts.

C.3.a. The moving window paradigm

The moving window paradigm is a software tool that simulates the relative acuity of the foveal and peripheral visual systems. As we look around the world, there is a region of high acuity at the location our eyes are currently pointing. Regions outside the foveal viewing cone are represented less well. In the moving window paradigm we represent this state by slightly blurring the image and reducing the contrast. http://cognitrn.psych.indiana.edu/busey/FingerprintExample/ Figure 4 shows several frames of the moving window program, captured at different points in time. The two images have been degraded by a blurring operation that somewhat mimics the reduced representation of peripheral vision. The exception is a clear circle that responds in real time to the movement of the mouse. This dynamic display forces the user to move the clear window to regions of the display that warrant special interest. The blurred portions provide some context for where to move the window. By recording the position of the mouse each time it is moved, we can reconstruct a complete record of the manner in which the user examined the prints.

Figure 4. The moving window paradigm allows the user to move the circle of interest around to different locations on the two prints. This circle provides high-quality information, and allows the expert the opportunity to demonstrate, in a procedure that is very similar to an actual latent print examination, which sections of the prints they believe are most informative. This procedure also records the order in which different sites are visited.

This method has some drawbacks in that the eyes move faster than the mouse.
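The core display logic just described can be sketched as follows. This is a minimal sketch, not the actual program: the function names, the aperture radius, the blur width, and the contrast factor are all our illustrative assumptions, and the image is assumed to be a grayscale array with values in [0, 1].

```python
# Sketch of the moving-window display: the whole print is blurred and
# contrast-reduced, and a clear circular aperture follows the mouse.
# Mouse positions are logged with timestamps so the inspection sequence
# can be reconstructed later.

import numpy as np
from scipy.ndimage import gaussian_filter

def render_window(image, cx, cy, radius=40, sigma=3.0, contrast=0.6):
    """Return one displayed frame: the blurred, contrast-reduced print
    with a clear circular aperture of `radius` pixels centred on (cx, cy)."""
    degraded = gaussian_filter(image.astype(float), sigma)   # peripheral blur
    degraded = 0.5 + contrast * (degraded - 0.5)             # pull toward mid-gray
    yy, xx = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    aperture = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
    frame = degraded
    frame[aperture] = image[aperture]                        # clear window
    return frame

# Each mouse movement is logged as (time_ms, x, y), giving a complete
# record of where the observer placed the window and when.
log = []

def on_mouse_move(t_ms, x, y):
    log.append((t_ms, x, y))
```

In the real program the frame would be redrawn on every mouse movement; the essential point is that the display itself enforces the acuity limit while the log preserves the full spatial and temporal inspection record.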
However, we find that with practice the experts report very few limitations with this procedure, and it has the benefit of precise spatial localization. A major benefit of this procedure is that it can be done over the web, reaching dozens of experts and producing a massive dataset. Many related information-theoretic approaches such as latent semantic analysis find that a large corpus of data is necessary in order to reveal the underlying structure of the representation of information, and a web-based approach provides sufficient data. The data produced by this paradigm are vast: x/y coordinates for the clear window at each millisecond. We have begun to analyze these data using several different techniques. The first analysis we designed creates a mask that is black for regions the observer never visited and clear for areas visited most often. Figure 5 shows an example of this kind of analysis. Areas visited less often are somewhat darkened. The left panels of Figure 5 show two masked images, which show not only where the experts visited, but how long they spent inspecting each location. Thus it represents a window into the regions the experts believed informative. The right panels give a slightly different view, where unvisited areas are represented in red. This illustrates that experts actually spend most of their time in relatively small regions of the prints. As a first pass, the images in Figure 5 reveal where the experts believe the task-relevant information resides. However, lost in such a representation is the order in which these sites were visited. In addition, this information is very specific to a particular set of prints. Ultimately we will produce a more general representation that characterizes both the fundamental set of features (often described as the basis set) that experts rely on, as well as how they process these features.
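As an illustration, the dwell-time mask could be computed along the following lines. This is a sketch under assumed data formats (a log of (time_ms, x, y) mouse samples and a circular aperture of known radius); the actual analysis code and parameter values are not given in this document.

```python
# Sketch of the dwell-time mask analysis: convert the mouse log into a
# per-pixel dwell map, then darken the print in proportion to how
# little time the expert spent inspecting each region.

import numpy as np

def dwell_mask(log, shape, radius=40):
    """Accumulate dwell time (ms) into a per-pixel map from a log of
    (time_ms, x, y) samples, crediting the clear aperture at each step."""
    dwell = np.zeros(shape, dtype=float)
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    for (t0, x, y), (t1, _, _) in zip(log, log[1:]):
        aperture = (xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2
        dwell[aperture] += t1 - t0          # time spent at this position
    return dwell

def apply_mask(image, dwell):
    """Darken the image in proportion to inspection time: black where
    the observer never visited, clear where they dwelt longest."""
    weight = dwell / dwell.max() if dwell.max() > 0 else dwell
    return image * weight
```

The resulting masked image corresponds to the left panels of Figure 5; the red-overlay variant in the right panels would simply substitute a colored fill for the zero-weight regions instead of multiplying them to black.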
We have begun to explore an information-theoretic approach to this problem that seeks to find a set of visual features that is common to a number of experts and fingerprint pairs. This approach is related to many of the dimensionality reduction techniques that have been applied to natural images (e.g. Olshausen & Field, 1996). Later projects will extend this approach to incorporate elements of configural processing or context-specific models. In the present proposal we discuss several different ways we plan to analyze what is a very rich dataset.

Figure 5. Examples of masked images revealing where experts choose to acquire information in order to make an identification. The black versions show only regions where the expert spent any time, and the mask is clearer for regions in which the expert spent more time. The right-hand images show the same information, but allow some of the uninspected information to show through. These images reveal that experts pay relatively little attention to much of the image and only focus on regions they deem relevant for the identification. We suggest that this element of expertise, learning to attend to relevant locations, is something that could benefit quantitative analyses of fingerprints.

Our experts report relatively little hindrance when using the mouse to move the window. The latent and inked prints have their own window (only one is visible at any one time), and users press a key to flip back and forth between the two prints. This flip is actually faster than an eye movement and automatically serves as a landmark pointer for each print, making this procedure almost as easy to use as free viewing of the two prints (which is often done under a loupe with its own movement complexities). In addition, we also give users brief views of the entire image to allow configural processes to establish the basic layout.

C.3.b.
Measuring the role of configural processing in latent print examinations

behavioral experiment- blurred vs. very low contrast- qualitative changes across experts? [complete this section]

C.3.c. Verification with eye movement recording

[complete this section]

C.4. Extracting the fundamental features used when matching prints

Because latent and inked prints are rarely direct copies of each other, an expert must extract invariants from each image that survive the degradations due to noise, smearing, and other transformations. Once these invariants are extracted, the possibility of a match can be assessed. This is similar in principle to the type of categorical perception observed in speech recognition, in which the invariants of parts of speech are extracted from the voices of different talkers. This suggests that there exists a set of fundamental building blocks, or basis functions, that experts use to represent and even clean up degraded prints. The nature and existence of these features are quite relevant for visual expertise, since in some sense these are the direct outcomes of any perceptual system that tunes itself to the visual diet it experiences. We propose to perform data reduction techniques on the output of the moving window paradigm. These techniques have successfully been applied to derive the statistics of natural images (Hyvarinen & Hoyer, 2000). The results provided individual features that are localized in space and resemble the response profiles of simple cells in primary visual cortex. Many of these studies are performed on random samplings of images and visual sequences, but the moving window application provides an opportunity to use these techniques to recover the dimensions of only the inspected regions, and to compare the recovered dimensions from experts against representations based on random window locations. The specifics of this technique are straightforward.
For each position of the moving window, we extract (say) a 12 x 12 patch of pixels. This is repeated at each location the subject inspected, with each patch weighted by the amount of time spent at that location. The moving window experiment yields tens of thousands of pixel patches, which are submitted to a data reduction technique, independent component analysis (ICA). ICA is similar to principal components analysis, with the exception that the components are statistically independent, not just uncorrelated. The linear decomposition generated by ICA has the property of sparseness, which has been shown to be important for representational systems (Field, 1994; Olshausen & Field, 1996) and implies that a random variable (the basis function) is active only very rarely. In practice, this sparse representation creates basis functions that are more localized in space than those captured by PCA and more representative of the receptive fields found in the early areas of the visual system.

Figure 6. ICA components from expert data.

Huge corpora of samples are required to extract invariants from noisy images, and at present we have only pilot data from several experts. The results of this preliminary analysis can be found in Figure 6, which shows features discovered using the ICA algorithm (Hurri & Hyvarinen, 2003; Hyvarinen, Hoyer & Hurri, 2003). Each image represents a basis function; linearly combined, these basis functions reproduce the windows examined by experts. Inspection of Figure 6 reveals that features such as ridge endings, Y-branchings and islands are beginning to be represented. This analysis takes on greater value when applied to the entire database we will gather, since it will combine across individual features to derive the invariant stimulus features that provide the basis for fingerprint examinations by human experts.
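The patch-extraction and decomposition pipeline described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the Laplacian random patches, the dwell times, and the choice of 36 components are all stand-in assumptions; real input would be the dwell-time-weighted 12 x 12 windows inspected by an expert.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Stand-in for moving-window data: sparse (Laplacian) 12x12 patches,
# flattened to 144-dimensional rows, one row per fixated window.
patches = rng.laplace(size=(2000, 144))
dwell = rng.uniform(0.2, 1.0, size=2000)        # assumed seconds per window

# Weight each patch by dwell time by repeating rows in proportion to it.
counts = np.maximum(1, np.round(dwell * 5).astype(int))
weighted = np.repeat(patches, counts, axis=0)
weighted = weighted - weighted.mean(axis=0)     # center before ICA

# Recover 36 independent components; each row of components_ reshapes to
# a 12x12 basis function of the kind shown in Figure 6.
ica = FastICA(n_components=36, random_state=0, max_iter=500)
sources = ica.fit_transform(weighted)           # per-patch activations
basis_functions = ica.components_.reshape(36, 12, 12)
print(basis_functions.shape)
```

On real expert data the recovered basis functions, rather than being noise-like, should resemble the ridge endings and branchings visible in Figure 6.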
The ICA analysis is very sensitive to spatial location, and while cells in V1 are likely also highly position sensitive, the measured basis functions are properties of the entire visual stream, not just the early stages. More recent advances in ICA techniques address this issue in much the same way that the visual system has solved the problem. In addition to performing data reduction to extract the fundamental basis sets, these extended ICA algorithms group the recovered components based on their energies (squared outputs). This grouping has been shown to produce classes of basis functions that are position invariant, by virtue of including many different positions for each fundamental feature type.

Figure 7. ICA components from expert data, grouped by energy. This analysis allows the basis functions partial spatial independence, at a slight cost in image quality. The latter issue is less relevant for larger corpora, in which many similar features are combined within individual basis function groups.

The examples shown in Figure 7 were generated by this technique, which reduces the reliance on spatial location: it groups the recovered features by class and accounts for the fact that a rectangular patch has properties similar to those of nearby patches. Note that the features in Figure 7 are less localized than those typically found with ICA decompositions, which may be due to the large correlational structure inherent in fingerprints, although this remains an open question addressed by this proposal. The development of ICA approaches is an ongoing field, and we anticipate that the results of the proposed research will help extend these models as we develop our own extensions for application to fingerprint experts.
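A simplified stand-in for the energy-based grouping can be sketched as follows. This is not the topographic ICA of Hyvarinen and colleagues; it is a cruder proxy under stated assumptions: run plain ICA, then cluster components whose energies (squared outputs) are correlated, treating each cluster as one position-tolerant feature group. The data, component count, and cluster count are all illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
data = rng.laplace(size=(3000, 64))              # stand-in for 8x8 patches

ica = FastICA(n_components=16, random_state=1, max_iter=500)
sources = ica.fit_transform(data)                # per-patch activations

# Grouping signal: correlations between the components' energies
# (squared outputs), as in the energy-based extensions described above.
energy = sources ** 2
corr = np.corrcoef(energy.T)

# Hierarchical clustering on 1 - |corr|; components landing in the same
# cluster are treated as one position-tolerant feature group.
dist = 1.0 - np.abs(corr)
np.fill_diagonal(dist, 0.0)
groups = fcluster(linkage(squareform(dist, checks=False), method="average"),
                  t=4, criterion="maxclust")
print(len(set(groups)))                          # number of feature groups
```

On fingerprint data, components representing the same feature type at different positions should show correlated energies and therefore fall into the same group.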
There are several ways in which the recovered components can be used to evaluate the choice of positions by experts (which, along with the image, ultimately determine the basis functions). First, one can visually inspect the sets of basis functions recovered from datasets produced by experts and compare them with a set generated from random window locations. A second technique can demonstrate that experts do indeed possess a feature set that differs from a random set. The data from random windows and from experts can be combined to produce a common set of components (basis functions). ICA is a linear technique, and thus the original data for both experts and random windows can be recovered through weighted sums of the components, with some error if only some of the components are retained. If experts share a common set of features that is estimated by ICA, then their data should be recovered with less error than that of the random windows. This would demonstrate that an important component of expertise is the ability to take a high-dimensional dataset (as produced by noisy images) and reduce it to fundamental features. From this perspective, visual expertise is data reduction. These kinds of data reduction techniques serve a separate purpose as well. Many of the experiments described in other sections of this proposal depend on specifying particular features. While initial estimates of the relevant features can be made on the basis of discussions with fingerprint experts, we anticipate that the results of the ICA analysis will help refine our view of what constitutes an important feature within the context of fingerprint matching. The moving window procedure has the disadvantage of being a very localized procedure, due to the small size of the window. There is a fundamental tradeoff between the size of the window and the spatial acuity of the procedure.
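The reconstruction-error comparison described above, expert windows versus random windows sharing one truncated basis, can be sketched numerically. Everything here is synthetic and assumed: "expert" patches are generated from a small set of shared underlying features, "random" patches are unstructured, and PCA stands in for the linear ICA basis (reconstruction from a truncated linear basis works the same way in both).

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

# "Expert" patches: built from 5 shared underlying features (low-rank).
features = rng.standard_normal((5, 64))
expert = (rng.standard_normal((1000, 5)) @ features
          + 0.1 * rng.standard_normal((1000, 64)))
# "Random-window" patches: unstructured noise.
random_w = rng.standard_normal((1000, 64))

# One common truncated basis fit on the pooled data.
pca = PCA(n_components=10).fit(np.vstack([expert, random_w]))

def recon_error(x):
    # Mean squared error after projecting onto the truncated basis and back.
    return np.mean((x - pca.inverse_transform(pca.transform(x))) ** 2)

print(recon_error(expert) < recon_error(random_w))   # True
```

Because the expert data live in a low-dimensional feature subspace, few components suffice to reconstruct them, which is exactly the "expertise as data reduction" signature the text predicts.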
If the window is made too large, we learn less about the regions from which the user is attempting to acquire information. To offset this, we have given the user the opportunity to view quick flashes of the full image: enough to provide an overview of the prints, but not enough to allow matching of specific regions. We will also conduct the studies using large and small windows to see whether the nature of the recovered components changes with window size.

C.4. Starting Small: Guiding feature extraction with expert knowledge (We need to ask whether this is compelling, and cut it if it is not.)

Feature extraction procedures attempt to take a high-dimensional space and use the redundancies in this space to derive a lower-dimensional representation that combines across the redundancies to provide a basis set. This basis set can be thought of as the fundamental feature set, and its development can be thought of as one mechanism underlying human expertise. The difficulty with these high-dimensional spaces is that algorithms that attempt to uncover the feature set through iterative procedures, such as Independent Component Analysis or neural networks, may fall into local minima and fail to converge on a global solution. One solution proposed in the human developmental literature is starting small (Elman, 1993). In this technique, modelers initially restrict the inputs to statistical models so that they provide general kinds of information rather than the specific information that would lead to learning of specific instances. As the network matures, more specific information is added, which allows the network to avoid falling into local minima that represent non-learned states.
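The starting-small regime just described can be sketched as a two-stage training schedule: the classifier first sees coarse (smoothed) versions of the inputs, then training continues on the full-resolution versions with the learned weights carried over. The data, network size, and blur level are all illustrative assumptions, not the proposed models.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
X = rng.standard_normal((400, 64))               # stand-in patch vectors
y = (X[:, :32].sum(axis=1) > 0).astype(int)      # toy category labels

# Stage-1 input: smoothed ("general") versions of the patches.
coarse = gaussian_filter1d(X, sigma=4, axis=1)

# warm_start=True makes each call to fit() continue from the current weights.
net = MLPClassifier(hidden_layer_sizes=(16,), warm_start=True,
                    max_iter=300, random_state=3)
net.fit(coarse, y)       # stage 1: learn coarse structure first
net.fit(X, y)            # stage 2: continue training on full detail
print(round(net.score(X, y), 2))
```

The same schedule applied to moving-window data would present dwell-ordered, coarse overview information before the fine, region-specific patches, mirroring how experts appear to proceed.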
While the exact nature of these effects is still being worked out (Rohde & Plaut, 1999), recent work has provided empirical support in the visual domain (Conway, Ellefson & Christiansen, ref). This suggests that we might use the temporal component of the data from experts in the moving window paradigm to help guide the training of our networks. As an expert views a print, they are initially likely to focus on broad, overall types of information that give (need to finish if necessary)

C.5. Automatic detection of regions of interest using expert knowledge

In both fingerprint classification (e.g. Dass & Jain, 2004; Jain, Prabhakar & Hong, 1999; Cappelli, Lumini, Maio & Maltoni, 1999) and fingerprint identification (e.g. Pankanti, Prabhakar & Jain, 2002; Jain, Prabhakar & Pankanti, 2002) applications, an automatic system has two main components: (1) feature extraction and (2) a matching algorithm that compares (or classifies) fingerprints based on the feature representation. Feature extraction is the first step, converting raw images into feature representations. The goal is to find robust, invariant features that cope with the varying conditions of real-world applications, such as illumination, orientation and occlusion. Given a whole fingerprint image, most fingerprint recognition systems use the locations and directions of minutiae as features for pattern matching. In our preliminary study of human expert behavior, we observe that experts focus on just parts of the images (regions of interest, or ROIs), as shown in Figure XX, suggesting that it is not necessary for a human expert to check every minutia in a fingerprint. A small subset of minutiae seems to be sufficient for the expert to make a judgment. Which regions, among all the minutiae in a fingerprint, are useful for matching? Is it possible to build an automatic ROI detection system that achieves performance similar to a human expert's?
We attempt to answer this question by building a classification system based on training data captured from human experts.

Figure X. Overview of the automatic detection of regions of interest. The red regions in the fingerprints indicate where human experts focus during the pattern matching task.

Given a new image, the detection system automatically detects and labels regions of interest for the matching task. We note that while we expect most regions selected by our system to be minutiae, we also expect the system to discover structural regularities in non-minutia regions that have been overlooked in previous studies. Unlike previous studies of minutiae detection (e.g. Maio & Maltoni, 1997), our automatic detection system will not simply detect all the minutiae in a fingerprint; it will focus on detecting both a small set of minutiae and other regions useful for the matching task. Considering the difficulties in fingerprint recognition, building this automatic detection system is challenging. However, we are confident that the proposed research will take the first steps toward success and make important contributions. This confidence rests on two factors that distinguish our work from other studies: (1) we will record detailed behaviors of human experts (e.g. where they look during a matching task) and recruit the knowledge extracted from these experts to build a pattern recognition system; and (2) we will apply state-of-the-art machine learning techniques to efficiently encode both expert knowledge and the regularities in fingerprint data. The combination of these two factors will allow us to achieve this research plan.
To build this kind of system, we need to develop a machine learning algorithm and estimate its parameters from training data. Using the moving window paradigm (described in C.3), we collect information about where a human expert examines from moment to moment while performing a matching task. Hence, the expert's visual attention and behaviors (moving the window) can be used as labels of regions of interest, providing the teaching signals for a machine learning algorithm. In the proposed research, we will build an automatic detection system that captures the expert's knowledge to guide the detection of regions in a fingerprint that are useful for pattern matching. We will use the data collected in C.X. Each circular area examined by the expert is filtered by a bank of Gabor filters. Specifically, Gabor filters at three scales and five orientations are applied to the segmented image. The local texture regions are assumed to be spatially homogeneous, and the mean and standard deviation of the magnitudes of the transform coefficients represent each region as a 30-dimensional feature vector. We reduce these feature vectors to dimensionality 10 by principal component analysis (PCA), which represents the data in a lower-dimensional subspace by pruning away the dimensions with the least variance. We also randomly sample areas to which the expert does not attend and code them with a Non-ROI label, paired with the feature vectors extracted from those areas. In total, the training data consist of two groups of labeled features: ROI and Non-ROI. Next, we will build a binary classifier based on Support Vector Machines (SVMs). SVMs have been successfully applied to many classification tasks (Vapnik, 1995; Burges, 1998).
An SVM trains a linear separating plane for classifying data by maximizing the margin between two parallel planes on either side of the separating plane. The central idea is to nonlinearly map the input vectors into a high-dimensional feature space and then construct an optimal hyperplane to separate the features. This decision hyperplane depends on only a subset of the training data, called the support vectors. For a set of n-dimensional training examples {x_i}_{i=1}^{m}, labeled by the expert's visual attention as {y_i}_{i=1}^{m}, and a mapping of the data into q-dimensional vectors Φ(X) = {Φ(x_i)}_{i=1}^{m} by a kernel function, where q >> n, an SVM can be built on the mapped training data by solving the following optimization problem.

Minimize over (w, b, ξ_1, ..., ξ_m) the cost function

    (1/2) w^T w + C Σ_{i=1}^{m} ξ_i

subject to

    y_i (w^T Φ(x_i) + b) ≥ 1 - ξ_i  and  ξ_i ≥ 0  for all i,

where C is a user-specified constant controlling the penalty on the violation terms ξ_i. The ξ_i are called slack variables; they measure the deviation of a data point from the ideal condition of pattern separability. After training, w and b constitute the classifier:

    y = sign(w^T Φ(x) + b).

Compared with other approaches used in fingerprint recognition, such as neural networks and k-nearest neighbors, SVMs have proven more effective on many classification tasks. In addition, we first transform the original features into a lower-dimensional space via PCA; the purpose of this first step is to cope with the curse of dimensionality. We then map the data points into another, higher-dimensional space in which they are linearly separable. In doing so, we convert the original pattern recognition problem into a simpler one. This idea is much in line with kernel-based nonlinear PCA (Scholkopf, Smola & Muller, 1998), which has been used successfully in several fields (e.g. Wu, Su & Carpuat, 2004).
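The full Gabor-to-SVM pipeline of this section can be sketched end to end. This is an illustrative sketch, not the proposed system: the patches are synthetic (ROI patches are given extra oriented, ridge-like structure), the Gabor kernel parameters are assumed, and sklearn's RBF-kernel SVC plays the role of the nonlinear mapping Φ, with C as in the objective above.

```python
import numpy as np
from scipy.signal import fftconvolve
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def gabor_kernel(freq, theta, size=15, sigma=3.0):
    # Complex Gabor: Gaussian envelope times an oriented complex sinusoid.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.exp(2j * np.pi * freq * xr)

# 3 scales x 5 orientations -> 15 filters; mean + std of each response
# magnitude -> a 30-dimensional texture vector per patch.
bank = [gabor_kernel(f, t)
        for f in (0.1, 0.2, 0.4)
        for t in np.linspace(0, np.pi, 5, endpoint=False)]

def gabor_features(patch):
    mags = [np.abs(fftconvolve(patch, k, mode="same")) for k in bank]
    return np.array([s for m in mags for s in (m.mean(), m.std())])

rng = np.random.default_rng(5)
non_roi = rng.standard_normal((60, 32, 32))           # randomly sampled areas
ripple = np.sin(np.arange(32) * 0.8)[None, :]         # ridge-like structure
roi = rng.standard_normal((60, 32, 32)) + 2.0 * ripple  # fixated areas

X = np.array([gabor_features(p) for p in np.vstack([roi, non_roi])])
y = np.array([1] * 60 + [0] * 60)                     # 1 = ROI, 0 = Non-ROI

feats10 = PCA(n_components=10).fit_transform(X)       # curse-of-dimensionality step
clf = SVC(kernel="rbf", C=1.0).fit(feats10, y)        # C as in the objective
print(clf.score(feats10, y))
```

On these toy patches the oriented energy cleanly separates the two classes; the real test is whether the same pipeline separates expert-fixated regions from randomly sampled ones.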
Given a new test fingerprint, we will shift a 40 x 40 window over the image and classify the patch at each location and scale. The system first extracts Gabor-based features from each local patch, which form the input to the detector. The detector labels every region as either ROI or Non-ROI. We expect that most ROIs will be minutiae. In contrast to methods based on minutiae matching, however, we also expect that only a small subset of the minutiae is used by human experts. Moreover, we expect the system to detect some areas that are not defined as minutiae but that human experts nevertheless attend to during the matching task. Thus, the ROI detector we develop will go beyond the standard approach in fingerprint recognition (minutiae extraction and matching). By efficiently encoding the knowledge of human experts, the proposed system will have the opportunity to discover statistical regularities in fingerprints that have been overlooked in previous studies.

C.6. Using expert-identified correspondences to extract environmental models

In our moving window paradigm, a human expert moves the window back and forth between inked and latent fingerprints to perform pattern matching. We propose that the dynamic behaviors of the expert provide additional signals indicating one-to-one correspondences between the two images. In light of this, our hypothesis is that an expert's decision is based on the comparison of these one-to-one patches. These expert-identified correspondences can therefore serve as additional information for finding the regularities in fingerprints and building the automatic detection system. We propose to use this knowledge as a prior over the training data. We observe that not all the fixated regions in the latent print have corresponding regions in the inked print.
Thus, it is likely that these one-to-one pairs play a more important role in pattern matching than other regions of interest. Based on this observation, we propose to maintain a set of weights over the training data. More specifically, for each ROI in the latent image, we find the most likely paired patch in the inked image. Two constraints guide the search for the matching pair. The temporal constraint is based on the expert's behavior: for instance, the patch in the inked print that the expert examines immediately after looking at an ROI in the latent image is more likely to be associated with that ROI. The spatial constraint seeks the highest similarity between the patch in the latent image and any patch in the inked image. In this way, each ROI in the latent image can be assigned a weight indicating the probability of mapping this region to a region in the other image. With a set of weighted training data, we will apply an SVM-based algorithm (described briefly in C.5) that focuses on the paired samples (those with high weights) in the training data. More specifically, we replace the constant C in the standard SVM with a set of variables c_i, each of which corresponds to the weight of a data point. Accordingly, the new objective function is

    (1/2) w^T w + Σ_{i=1}^{m} c_i ξ_i.

Thus, the matching regions receive larger penalties when they are non-separable points, while other regions receive less attention because they are more likely to be irrelevant to the expert's decision. In this way, the parameters of the SVM are tuned to favor the regions in which human experts are especially interested. By encoding this knowledge in a machine learning algorithm, we expect this method to achieve better performance by closely imitating the expert's decision.

C.7. Dependencies between global and local information: The role of gist information

Fingerprints are categorized into several classes, such as whorl, right loop, left loop, arch, and tented arch in the Henry classification system (Henry, 1900). In the literature, researchers use only 4-7 classes in automatic classification systems, because determining a fingerprint's class can be difficult: it is hard to find robust features from raw images that aid classification while exhibiting low variation within each class. In C.5 and C.6, we discuss how to use expert knowledge to find features useful for pattern matching. Taking a broader view of feature detection and fingerprint classification in this section, we find that we must deal with a chicken-and-egg problem: (1) useful local features can predict fingerprint classes; and (2) a specific fingerprint class can predict what kinds of local regions are likely to occur in that type of fingerprint. In contrast, stand-alone feature detection algorithms (e.g. those in C.5 and C.6) usually examine local pieces of the image in isolation when deciding whether a patch is a region of interest. In machine learning, Murphy, Torralba and Freeman (2003) proposed a conditional random field for jointly solving the tasks of object detection and scene classification. In light of this, we propose to use the whole-image context as an extra source of global information to guide the search for ROIs. In addition, a better set of ROIs will potentially make the classification of the whole fingerprint more accurate. Thus, the chicken-and-egg problem is tackled by a bootstrapping procedure in which local and global pattern recognition systems interact with and boost each other.
We propose a machine learning system based on graphical models (Jordan, 1999), as shown in Figure XX. We define the gist of an image as a feature vector extracted from the whole image by treating it as a single patch; the gist is denoted v_G. We then introduce a latent variable T describing the type of fingerprint. The central idea of our graphical model is that ROI presence is conditionally independent given the type, and that the type is determined by the gist of the image. Thus, our approach encodes contextual information on a per-image basis instead of extracting detailed correlations between different kinds of ROIs (e.g. a fixed prior such as "patch A always occurs to the left of patch B"), because of the complexity and variation of such detailed descriptions. Next we need to classify fingerprint types. We will train a one-vs-all binary SVM classifier to recognize each fingerprint type from the gist, and then normalize the results:

    p(T = t | v_G) = p(T_t = 1 | v_G) / Σ_{t'} p(T_{t'} = 1 | v_G),

where p(T_t = 1 | v_G) is the output of the t-th one-vs-all classifier. Once the fingerprint type is known, we can use this information to facilitate ROI detection. For the tree-structured graphical model shown in Figure XX, the conditional joint density can be expressed as

    p(T, R_1, ..., R_N | v) = (1/z) p(T | v_G) Π_i p(R_i | T, v_i),

where v_G and v_i are the global and local features, respectively, R_i is the class of a local patch, and z is a normalizing constant. In the proposed research we will investigate two types of R. One classification defines ROI and Non-ROI types, as in C.5 and C.6. The other defines several minutia types (plus Non-ROI), such as termination and bifurcation minutiae. Based on this graphical model, we will be able to use contextual knowledge to facilitate the classification of a local image patch.
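The arithmetic of this model can be sketched with made-up numbers. The one-vs-all scores, the four types, and the per-patch likelihoods below are all invented for illustration; the point is only how the gist term p(T | v_G) and the local terms p(R_i | T, v_i) combine.

```python
import numpy as np

# Raw one-vs-all classifier outputs for 4 hypothetical fingerprint types.
ova_scores = np.array([0.9, 0.4, 0.2, 0.1])
p_type = ova_scores / ova_scores.sum()          # normalized p(T = t | v_G)

# Assumed p(R_i = ROI | T = t, v_i) for 3 patches under each of the 4 types.
p_roi_given_type = np.array([
    [0.8, 0.6, 0.1],    # type 0 expects the first two patches to be ROIs
    [0.3, 0.3, 0.3],
    [0.2, 0.7, 0.4],
    [0.1, 0.1, 0.9],
])

# Joint score for the hypothesis "all three patches are ROIs" under each
# type: p(T, R | v) proportional to p(T | v_G) * prod_i p(R_i | T, v_i).
joint = p_type * p_roi_given_type.prod(axis=1)
joint /= joint.sum()                            # the 1/z normalization

print(joint.argmax())   # most probable type given gist AND local evidence
```

Note how the gist prior and the local evidence trade off: type 2 has the strongest product of local likelihoods here, yet the gist term keeps type 0 on top, which is exactly the top-down/bottom-up interaction the bootstrapping procedure exploits.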
We also plan to develop a more advanced model that uses local information to facilitate fingerprint type classification. We expect that this kind of approach will lead to a more effective automatic system that can perform both top-down inference (from fingerprint types to minutia types) and bottom-up inference (from minutia types to fingerprint types).

C.8. Summary of quantitative approaches (Tom writes)

General themes:
- Incorporate expert knowledge
- Links between global and local structure made possible by input from experts
- Specification of an elemental basis or feature set
- Classifying the informativeness of regions
- Defining an intermediate level between low-level feature extractors and high-level gist or configural information

D. Implications for knowledge and practice

The implications of the knowledge gained from these studies and analyses fall into four broad categories, each of which is discussed below.

D.1. Implications for a quantitative understanding of the information content of fingerprints

D.2. Implications for an understanding of the links between quantitative information content and the latent print examination process

D.3. Implications for the classification and filtering of poor-quality latent prints

D.4. Implications for the development of software-based tools to assist human-based latent print examinations and training

E. Management plan and organization

F. Dissemination plan for project deliverables

Scientific articles, presentations at machine learning conferences and fingerprint conferences, and proof-of-concept Java-based applets.

G. Description of estimated costs

Personnel

The project will be co-directed by Thomas Busey and Chen Yu.
We request 11 weeks of summer support, during which both will devote 100% of their effort to the project. Benefits are calculated at 19.81%. Salaries are incremented 3% per year. Many of the simulations will be conducted by a graduate student hired specifically for this project. This student, likely an advanced computer science student with a background in cognitive science, requires a stipend, a fee remission and health insurance. The health insurance is incremented at 5% per year. Subject coordination and database management will be handled by hourly students who will work 20 hours/wk on the project at $10/hr.

Consultant

John Vanderkolk, with whom Busey has worked for the past two years, has agreed to serve as an unpaid consultant on this grant. He does require modest travel costs when he visits Bloomington.

Travel

Money is requested to bring in four experts for testing using the eyemovement recording equipment. These costs will total approximately $1500/yr. Money is also requested for three conferences a year. These will enable the investigators to travel to conferences such as Neural Information Processing Systems (NIPS) and forensic science conferences such as the International Association for Identification (IAI) to interact with colleagues and share the results of our analyses. These trips serve an important role in communicating the efforts of this grant to a wider audience.

Other Costs

Equipment

This research is very computer-intensive, and thus we require a large UNIX-based server to run simulations in parallel. In addition, we require three PC-based workstations to run Matlab and other simulation programs. Finally, conferences such as IAI and local Society for Identification meetings provide an ideal place to gather data from experts, and thus we require a portable computer for such onsite data-gathering purposes.
We anticipate that up to half of our data can be collected using these on-site techniques, and this approach is preferable because we have control over the monitor and software. Thus the laptop computer represents a good investment in the success of the project.

Other costs

The graduate student line requires a fee remission each year; the fee remission is incremented at 5% per year. The results of our studies require resources to reach a wide audience, and thus we request dissemination costs to cover publication and web-based dissemination. This project is highly image-intensive, and we require funds to purchase image-processing software and upgrades, including packages such as Adobe Photoshop as well as new image-processing packages as they become available. We will test 80 subjects a year to obtain the data needed for our statistical applications. Each subject requires $20 for the approximately 90-minute testing session. The project will consume supplies of approximately $100/month for items such as backups, power supplies, etc.

Indirect Costs

The indirect rate negotiated between Indiana University and the federal government is set at 51.5%. This rate is assessed against all costs except the fee remission. The rate was negotiated with DHHS on 5.14.04.

H. Staffing plan and Resources

Both Busey and Chen maintain laboratories in the Department of Psychology at Indiana University, each containing approximately 700 sq. feet of space. These have subject running rooms, offices and space for servers. Chen's lab contains an eyemovement recording setup that is sufficient for the eyemovement portion of the experiments. Both investigators also have offices in the Psychology department. We will recruit a graduate student from the Computer Science or Psychology programs at Indiana University. This student must have experience with machine learning algorithms at a theoretical level and must also be an expert programmer. They will work 20 hrs/wk. We will also recruit two hourly undergraduate students to coordinate subject running, data analysis and server maintenance. They will also be responsible for managing the data repository site where our data will be accessible to other researchers who wish to integrate human expert knowledge into their networks. The bulk of the theoretical work will be handled by Chen and Busey, while the graduate student will work on implementation and model testing.

I. Timeline

This is a multi-year project designed to alternate between acquiring human data and using it to refine the quantitative analyses of latent and inked prints.

Year 1: Acquire the necessary fingerprint databases. Begin testing 80 experts on 72 different latent/inked print pairs. Program the Support Vector and global-local models. Test 2 experts on the eyemovement equipment using all 72 prints.

Year 2: Test an additional 80 experts on 72 new latent/inked prints. Begin model fitting and refinement. Test 2 experts on the eyemovement equipment using all 72 prints. Compare results from the eyemovement studies and the moving window studies.

Year 3: Test the final 80 experts on 72 new latent/inked prints. Develop new versions of the statistical models based on prior results. Put the entire database online for use by other researchers. Disseminate results to peer-reviewed journals.