Adding human expertise to the quantitative analysis of fingerprints
Busey and Chen
PROGRAM NARRATIVE
A. Research Question
Machine learning algorithms take a number of approaches to the quantitative analysis of
fingerprints. These include identifying and matching minutiae (refs), matching patterns of local
orientation based on dynamic masks (refs), and neural network approaches that attempt to learn
the structure of fingerprints (refs). While these techniques provide good results in biometric
applications and serve a screening role in forensic cases, they are less useful when applied to
severely degraded fingerprints, which must be matched by human experts. Indeed, statistical
approaches and human experts have different strengths. Despite the enormous computational
power available today for use by computer analysis systems, the human visual system remains
unequaled in its flexibility and pattern recognition abilities. Three possible reasons for this success are the expert's knowledge of where the most important regions are located on a particular set of prints, the ability to tune their visual systems to specific features, and the
integration of information across different features. In the present project, we propose to
integrate the knowledge of experts into the quantitative analysis of fingerprints to a degree not
achieved by other approaches. There is much that fingerprint examiners can add to machine
learning algorithms and, as we describe below, many ways in which statistical learning
algorithms can assist human experts. Thus the central research question of this proposal is: How
can the integration of information derived from experts improve the quantitative analysis of
fingerprints?
B. Research goals and objectives
The goal of the present proposal is to integrate data from human experts with statistical
learning algorithms to improve the quantitative analysis of inked and latent prints. We introduce
a novel procedure developed by one investigator (Tom Busey) and use it to guide the input to
statistical learning algorithms developed and extended by our other investigator (Chen Yu). The
fundamental idea behind our approach is that the quantitative evaluation of the information
contained in latent and inked prints can be vastly improved by using elements of human
expertise to assist the statistical modeling, as well as to introduce a new dimension of time that is
not contained in the static latent print analysis. The main benefit, as we discuss in sections C.x.x,
is that the format of the data extracted from experts allows the application of novel quantitative
models that are adapted from related areas. To apply this knowledge derived from experts, we
will use our backgrounds in vision, perception, machine learning and behavioral testing to design
experiments that extract relevant information from experts and use this to improve the
quantitative analysis techniques applied to fingerprints by integrating the two sources of
information.
Our research interests differ somewhat from existing approaches and reflect the adaptations that are necessary to incorporate human expert knowledge. Existing statistical algorithms developed to match fingerprints rely on several different classes of algorithms. Some
extract minutiae and other robust sources of information such as the number of ridges between
minutiae (refs). Others rely on the computation of local curvature of the ridges, and then partition
these into different classes (MASK refs). Virtually all approaches make reasoned and reasonable
guesses as to what the important sources of information might be, such as minutiae, local ridge
orientation or local ridge width (dgs paper). The present approach is more agnostic about what might be the important sources of information in fingerprints, and we will
develop statistical models that take advantage of the data derived from experts. However, a
major goal of the grant is to demonstrate how expert knowledge can be applied to any extant
model, and to suggest how this might be accomplished. Thus we will spend substantial time
documenting our application of expert knowledge for our statistical models. In addition, we will
make all of our expert data available for other researchers and practitioners. It is likely that the
data will have implications for training, although this is not the focus of the present proposal.
C. Research design and methods
At the heart of our approach is the idea that human expertise, properly represented, can improve
the quantitative analyses of fingerprints. In a later section we describe how we apply human
expert knowledge to various statistical analyses, but first we need to answer the question of
whether human experts can add something to the quantitative analyses of prints.
The answer to this question can be broken down into two parts. First, does the human visual system in general possess attributes not captured by current statistical approaches? Second, do human experts have additional capacities not shared by novices that could further inform statistical approaches? Below we briefly summarize what the visual science literature tells
us about how humans recognize patterns, and then describe our own work that has addressed the
differences between experts and novices. As we will show, human experts have much to add to
quantitative approaches.
We should stress that while we will gather data from human experts to improve our
quantitative analyses of fingerprints, the goal of this grant is not to study human experts in order
to determine whether or how they differ from novices, nor are we interested in questions about
the reliability or accuracy of human experts. Instead, we will generalize our previous results that
demonstrate strong differences in the visual processing of fingerprints in experts, and apply this
expertise to our own statistical analyses. As a result, we will only gather data from human
experts (latent print examiners with at least 5 years of post-apprentice work in the field) under
the assumption that this will provide maximum improvement to our statistical methods. We can
demonstrate the effectiveness of this knowledge by simply re-running the statistical analyses
without the benefit of knowledge from experts. There are various metrics attached to each
analysis technique that demonstrate the superiority of expert-enhanced analyses, such as correct recognition/false recognition tradeoff graphs, or the dimensionality reduction/reconstruction successes of data reduction techniques.
We will also apply novel approaches adapted from the related domain of language analyses.
It might seem odd to apply techniques developed for linguistic analyses to a visual domain such
as pattern recognition, but the principles that underlie both domains are very similar. Both
involve large numbers of features that have complex statistical relations. In the case of language,
the features are often words, phonemes or other acoustical signals. Fingerprints are defined by a
complex but very regular dictionary of features that also share a complex and meaningful
correlational structure. One of us (Chen) is a highly-published expert in the field of machine learning algorithms as applied to multimodal data, and several papers included as appendices detail this expertise. His work on multimodal applications between visual and auditory domains makes him well-suited to address the relation between human data and machine learning algorithms. Both linguistic and visual information contain highly-structured data that consist of
regularities that are extracted by perceivers, and this is not unlike the temporal sequence that
experts go through when they perform a latent print examination, as we describe in a later
section. First, however, we address how we might document the principles of human expertise.
Can we use elements of the human visual system to improve our statistical analyses?
The answer to this question is straightforward, in part because of the overwhelming evidence
that human-based recognition systems contain processes that are not captured by current
statistical approaches. One of us (Busey) has published many articles addressing different
aspects of human sensation, perception and cognition, and thus is well-suited to manage the
acquisition and application of human expertise to statistical approaches. Below we briefly
summarize the properties of the human visual system and in a later section we describe how we
plan to extract fundamental principles from this design in order to improve our statistical
analyses of fingerprints.
An analysis of the human visual system by vision scientists demonstrates that the recognition process proceeds via a hierarchical series of stages, each with important non-linearities (nature ref), that produce areas responding to objects of greater and greater complexity. This process
also provides increasing spatial independence, allowing brain areas to integrate over larger and
larger regions. This will become important for holistic or configural processing, as discussed in a
later section. (also talk about feature-based attention)
A second benefit of this hierarchical approach is that objects achieve limited scale and
contrast invariance. Statistical approaches often deal with this through local contrast or
brightness normalization, but this is a separate process. Scale invariance is often achieved by
explicitly measuring the width of ridges (grayscale ref), again a separate process.
A third strength of the human visual system is that it appears to have the ability to form new
feature templates through an analysis of the statistical information contained in fingerprints.
This process, called unitization, will tend to improve feature detection in noisy environments such as those often found with latent prints.
Do forensic scientists have visual capabilities not shared by novices?
The prior summary of the elements of the human visual system suggests that current
statistical approaches can be improved by adapting some of the principles underlying the human
visual system. There are, however, other processes that are specifically developed by latent print
examiners that may also be profitably applied to statistical models. Below we summarize the
results of two empirical studies that have recently been published in the highly respected journal
Vision Research (Busey & Vanderkolk, 2005). The results demonstrate not only that experts are
better than novices, but suggest the nature of the processes that produce this superior
performance.
Visual expertise takes many forms. It could be different for different parts of the
identification process, and may not even be verbalizable by the expert since many elements of
perceptual expertise remain cognitively impenetrable (refs). A major focus of our research is to
capture elements of this expertise and use this as a training signal for our statistical learning
algorithms. What is novel to our approach is our ability to capture the expertise at a very deep
and rich level. In the next section we describe our prior work documenting the nature of the
processes that enable experts to perform at levels much superior to novices, and then in Section
C.2 we describe how we capture this expertise in a way that we can use it to improve our
statistical learning algorithms.
C.1. Documenting expertise in human latent print examiners
Initially, experts tend to focus on the entire print, which leads to benefits that we have
previously identified as configural processing (Busey & Vanderkolk, 2005). Configural
processing takes several forms, but the basic idea behind this process is that instead of focusing
on individual features or minutiae, the observer instead integrates information over a large
region, to identify important relations such as relative locations of features or curvature of ridge
flow. Fingerprint examiners often talk about 'viewing the image in its totality', which is different
language for the same process.
While configural processing reveals the overall structure of an image and selects important
regions for further inspection, the real work comes in comparing small regions in one print to
regions in the other. These regions may be selected on the basis of minutiae identified in the
print, or high-quality Level 3 detail. We know from related work on perceptual learning in the
visual system that one of the processes by which expertise develops is through the development
of new feature detectors. Experts spend a great deal of time viewing prints, and this has the
potential to result in profound changes in how their visual systems process fingerprints. (config
processing refs)
One process by which experts could improve how they extract latent print information from
noisy prints is termed unitization, in which novel feature detectors are created through experience (unitization refs). Fingerprints contain remarkable regularities, and the human visual system appears well-suited to exploit them by unitizing recurring ridge configurations into new feature templates.
C.1.a. Do experts have information valuable to training networks or documenting the
quantitative nature of fingerprints?
Fingerprint examiners have received almost no attention in the perceptual learning or
expertise literatures, and thus the PI began a series of studies in consultation with John
Vanderkolk, of the Indiana State Police Forensic Sciences Laboratory in Fort Wayne, Indiana.
Our first study addressed the nature of the expertise effects in a behavioral experiment, and then
we followed up evidence for configural processing with an electrophysiological study. The
discussion below describes the experiments in some detail, in part because extensions of this work are proposed in Section D, and a complete description here illustrates the technical rigor and converging methods of our approach.

C.1.b. Behavioral evidence for configural processing

In our first experiment, we abstracted what we felt were the essential elements of the fingerprint examination process into an X-AB task that could be accomplished in relatively short order.

Figure 1. Sequence of events in a behavioral experiment with fingerprint experts and novices (study image: 1 second; mask: 200 or 5200 milliseconds; test images: until response). Note that the study image has a different orientation and is slightly brighter to reduce reliance on low-level cues.
This work is described in Busey and
Vanderkolk (2005), but we briefly describe the methods here since they illustrate how our
approach seeks to find a paradigm that is less time-consuming than fully realistic forensic
examinations (which can take hours to days to complete) yet still maintains enough ecological
validity to tap the expertise of the examiners. Figure 1 shows the stimuli used in the experiment
as well as a timeline of one trial. We cropped out fingerprint fragments from inked prints,
grouped them into pairs, and briefly presented one of the two for 1 second. This was followed by
a mask for either 200 or 5200 ms, and then the expert or novice subject made a forced-choice
response indicating which of the two test prints they believed was shown at study. We introduced
orientation and brightness jitter at study, and the construction of the pairs was done to reduce the
reliance on idiosyncratic features such as lint or blotches.
At test, we introduced two manipulations that we thought captured aspects of latent prints, as
shown in Figure 2. First, latent prints are often embedded in visual noise from the texture of the
surface, dust, and other sources. One expert, in describing how he approached latent prints, stated that his job was to 'see through the noise.' To simulate at least elements of this noise, we embedded half of our test prints in white visual noise. While this may have a spatial distribution that differs from the noise typically encountered by experts, we hoped that it would tap whatever facilities experts may have developed to deal with noise.

Figure 2. Four types of test trials: clear fragments, partially-masked fragments, fragments presented in noise, and partially-masked fragments presented in noise.
The second manipulation was motivated by the observation that latent prints are rarely
complete copies of their inked counterparts. They often appear patchy if made on an irregular
surface, and sections may be partially masked out. To simulate this, we created partially-masked
fingerprint fragments as shown in the upper-right panel of Figure 2. Note that the partially-masked print and its complement each contain exactly half of the information of the full print, and the full print can be recovered by summing the two partial prints pixel-by-pixel. We use this property to test for configural effects as described in a later section.
All three manipulations (delay
between study and test, added noise and
partial masking) were fully crossed to
create 8 conditions. The data are shown in Figure 3, which shows main effects for all three factors for novices.

Figure 3. Behavioral experiment data. Error bars represent one standard error of the mean (SEM).

Somewhat surprising is the finding that while
experts show effects of added noise and partial masking, they show no effect of delay, which
suggests that they are able to re-code their visual information into a more durable store resistant
to decay, or have better visual memories. Experts also show an interaction between added noise
and partial masking, but novices do not. This interaction seen with the experts may result from
very strong performance for full images embedded in noise, and may reflect configural
processes. To test this in a scale-invariant manner, we developed a multinomial model which
makes a prediction for full-image performance given partial-image performance using principles
similar to probability summation. The complete results are found in Busey & Vanderkolk (2005),
but to summarize, when partial image performance is around 65%, the model predicts full image
performance to be about 75%, whereas observed performance is almost 90%, significantly above the probability
summation prediction. Thus it appears that when both halves of an image are present (as in the
full image) experts are much more efficient at extracting information from each half.
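To make the benchmark concrete, here is a minimal sketch in Python of a simple probability-summation prediction, guessing-corrected for a two-alternative task; the actual multinomial model in Busey & Vanderkolk (2005) is more detailed, so treat this as an illustration of the logic rather than the model itself.

```python
import numpy as np

# In a 2AFC task, accuracy p maps onto a "detection" probability
# d = 2p - 1 after correcting for the 50% guessing baseline. If the two
# halves of an image were processed independently, the full image should
# help only as much as two independent chances to detect.
p_partial = 0.65                      # observed accuracy with half-images
d_partial = 2 * p_partial - 1         # guessing-corrected detection rate
d_full = 1 - (1 - d_partial) ** 2     # independent combination of two halves
p_full_pred = (1 + d_full) / 2        # convert back to 2AFC accuracy
print(f"predicted full-image accuracy: {p_full_pred:.3f}")  # ~0.755

# Observed expert accuracy near 0.90 exceeds this independence benchmark,
# consistent with configural (better-than-independent) use of both halves.
```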
The results of this experiment lay the groundwork for a more complete investigation of
perceptual expertise in fingerprint examiners. From this work we have evidence that:
1) Experts perform much better than novices overall, despite the fact that the testing
conditions were time-limited and somewhat different than those found in a traditional latent print
examination.
2) Experts appear immune to longer delays between study and test images, suggesting better
information re-coding strategies and/or better visual memories.
3) Experts may have adopted configural processing abilities over the course of their training
and practice. All observers have similar facilities for faces as a consequence of the ecological
importance of faces and our quotidian exposure as a result of social interactions. Experts may
have extended this ability to the domain of fingerprints, since configural processing is seen as
one mechanism underlying expertise (e.g. Gauthier & Tarr, 1997).
C.1.c. Electrophysiological evidence for configural processing
To provide converging evidence that fingerprint experts process full fingerprints
configurally, we turned to an electrophysiological paradigm based on work from the face
recognition literature. This experiment is described more fully in Busey and Vanderkolk (2005),
which is included as an appendix. However, these results support the prior conclusions described
above, and demonstrate that the configural processing observed with fingerprint examiners is a
result of profound and qualitative changes that occur in the very earliest stages of their
perceptual processing of fingerprints.
C.2. Elements of human expertise that could improve quantitative analyses
The two studies described above are important because they illustrate that configural
processing is one process that could be adapted for use in the quantitative analyses of
fingerprints. Existing quantitative models of fingerprints incorporate some elements of the
expertise seen above, but many elements could be added that would improve the recognition
accuracy of existing programs. The two major approaches to fingerprint matching rely on local
features such as minutiae detection (refs), and more global approaches such as dynamic masks
applied to orientation computed at many locations on a grid overlaying the print (refs). Of these
two approaches, the dynamic mask approach comes closer to the idea of configural processing,
although it does not compute minutiae directly. (strengthen this intro)
Neither approach takes advantage of the temporal information that expresses elements of
expertise in the human matching process. Quantitative information such as fingerprint data, when
represented in pixel form, has a high-dimensional structure. The two techniques described
above reduce this dimensionality by either extracting salient points such as minutiae, or
computing orientation only at discrete locations. Both of these approaches throw out a great deal
of information that could otherwise be used to train a statistical model on the elemental features
that allow for matches. Part of the reason this is necessary is that the high-dimensional space is
difficult to work in: all prints are more or less equally similar without this dimensionality
reduction, and by reducing the dimensionality, computations such as similarity become tractable.
The key, then, is to reduce the dimensionality while preserving the essential features that allow
for discrimination among prints. One technique that has been explored in language acquisition is
the concept of "starting small" (Elman ref). In this procedure, machine learning approaches such
as neural network analyses are given very coarse information at first, which helps the network
find an appropriate starting point. Gradually, more and more detailed information is added, which
allows the network to make finer and finer discriminations.
We discuss these ideas more fully in section X.Xx, but we mention them here to motivate the
empirical methods described next. Experts likely select which information they choose to
initially examine based on the need to organize their search processes. Thus they likely acquire
information that may not immediately lead to a definitive conclusion of confirmation or
rejection, but guides the later acquisition process. In the scene perception literature, this process
is known as 'gist acquisition' (refs), and suggests that the order in which a system (machine or
human) learns information matters. In the section below we describe how we acquire both spatial
and temporal information from experts, and then describe how this knowledge can be
incorporated into quantitative models.
C.3. Capturing the information acquisition process: The moving window paradigm
To identify the nature of the information used by experts, and the order in which it is
gathered, we have begun to use a technique called a moving window procedure. In the sections
below we describe this procedure and how it can be extended to address the role of configural or
gist information in human experts.
C.3.a. The moving window paradigm
The moving window paradigm is a software tool that simulates the relative acuity of the foveal and peripheral visual systems. As we look around the world, there is a region of high acuity at
the location our eyes are currently pointing. Regions outside the foveal viewing cone are
represented less well. In the moving window paradigm we represent this state by slightly
blurring the image and reducing the contrast.
http://cognitrn.psych.indiana.edu/busey/FingerprintExample/
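As an illustration, here is a minimal sketch of how one frame of such a display might be composited; the window radius, blur width, and contrast reduction are placeholder parameters rather than the values used in our software.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def moving_window_frame(image, mouse_xy, radius=60, blur_sigma=4.0, contrast=0.6):
    """Composite one frame of the moving-window display.

    image: 2-D grayscale array with values in [0, 1].
    mouse_xy: (x, y) center of the clear window in pixel coordinates.
    Outside a clear circle of `radius` pixels, the print is blurred and its
    contrast reduced, mimicking the degraded representation of the periphery.
    """
    blurred = gaussian_filter(image, blur_sigma)
    periphery = 0.5 + contrast * (blurred - 0.5)   # pull values toward mid-gray
    yy, xx = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    clear = (xx - mouse_xy[0]) ** 2 + (yy - mouse_xy[1]) ** 2 <= radius ** 2
    return np.where(clear, image, periphery)
```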
Figure 4. The moving window paradigm allows the user to move the circle of interest around to different locations on the two prints. This circle provides high-quality information, and allows the expert the opportunity to demonstrate, in a procedure that is very similar to an actual latent print examination, which sections of the prints they believe are most informative. This procedure also records the order in which different sites are visited.

Figure 4 shows several frames of the moving window program, captured at different points in time. The two images have been degraded by a blurring operation that somewhat mimics the
reduced representation of peripheral vision. The exception is a clear circle that responds in real
time to the movement of the mouse. This dynamic display forces the user to move the clear
window to regions of the display that warrant special interest. The blurred portions provide some
context for where to move the window. By recording the position of the mouse each time it is
moved, we can reconstruct a complete record of the manner in which the user examined the
prints. This method has some drawbacks in that the eyes move faster than the mouse. However,
we find that with practice the experts report very few limitations with this procedure, and it has
the benefit of precise spatial localization. A major benefit of this procedure is that it can be done
over the web, reaching dozens of experts and producing a massive dataset. Many related
information theoretic approaches such as latent semantic analysis find that a large corpus of data
is necessary in order to reveal the underlying structure of the representation of information, and a
web-based approach provides sufficient data.
The data produced by this paradigm are vast: x/y coordinates for the clear window at each
millisecond. We have begun to analyze this data using several different techniques. The first
analysis we designed creates a mask that is black for regions the observer never visited and clear
for areas visited most often. Figure 5 shows an example of this kind of analysis. Areas visited
less often are somewhat darkened. The left panels of Figure 5 show two masked images, which show not only where the experts visited, but how long they spent inspecting each location. Thus they represent a window into the regions the experts believed informative.
The right panels give a slightly different view, where unvisited areas are represented in red.
This illustrates that experts actually spend most of their time in relatively small regions of the
prints.
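A minimal sketch of this first analysis, assuming the mouse record has been aggregated into (x, y, dwell-time) triples; the window radius is a placeholder.

```python
import numpy as np

def dwell_mask(samples, shape, radius=60):
    """Accumulate per-pixel inspection time from the moving-window record.

    samples: iterable of (x, y, dwell_ms) window positions.
    Returns a map scaled to [0, 1]: 0 where the window never went,
    approaching 1 where the most time was spent.
    """
    dwell = np.zeros(shape)
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    for x, y, dt in samples:
        dwell[(xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2] += dt
    return dwell / dwell.max() if dwell.max() > 0 else dwell

# The masked image of Figure 5 is then simply `image * dwell_mask(...)`:
# black where the expert never looked, clearer where more time was spent.
```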
As a first pass, the images in Figure 5 reveal where the experts believe the task-relevant
information resides. However, lost in such a representation is the order in which these sites were
visited. In addition, this information is very specific to a particular set of prints. Ultimately we will produce a more general representation that characterizes both the fundamental set of features
(often described as the basis set) that experts rely on, as well as how they process these features.
We have begun to explore an information-theoretic approach to this problem that seeks to find a
set of visual features that is common to a number of experts and fingerprint pairs. This approach
is related to many of the dimensionality reduction techniques that have been applied to natural
images (e.g. Olshausen & Field, 1996). Later stages of the project extend this approach to incorporate
elements of configural processing or context-specific models. In the present proposal we discuss
several different ways we plan to analyze what is a very rich dataset.
Figure 5. Examples of masked images revealing where experts choose to acquire information in order to make an identification. The black versions show only regions where the expert spent any time, and the mask is clearer for regions in which the expert spent more time. The right-hand images show the same information, but allow some of the uninspected information to show through. These images reveal that experts pay relatively little attention to much of the image and only focus on regions they deem relevant for the identification. We suggest that this element of expertise, learning to attend to relevant locations, is something that could benefit quantitative analyses of fingerprints.
Our experts report relatively little hindrance when using the mouse to move the window. The
latent and inked prints have their own window (only one is visible at any one time) and users
press a key to flip back and forth between the two prints. This flip is actually faster than an
eyemovement and automatically serves as a landmark pointer for each print, making this
procedure almost as easy to use as free viewing of the two prints (which is often done under a
loupe with its own movement complexities). In addition, we also give users brief views of the
entire image to allow configural processes to work to establish the basic layout.
C.3.b. Measuring the role of configural processing in latent print examinations
(behavioral experiment: blurred vs. very low contrast; qualitative changes across experts? complete this section)
C.3.c. Verification with eyemovement recording
(complete this section)
C.4. Extracting the fundamental features used when matching prints
Because latent and inked prints are rarely direct copies of each other, an expert must extract
invariants from each image that survive the degradations due to noise, smearing, and other
transformations. Once these invariants are extracted, the possibility of a match can be assessed.
This is similar in principle to the type of categorical perception observed in speech recognition,
in which the invariants of parts of speech are extracted from the voices of different talkers. This
suggests that there exists a set of fundamental building blocks, or basis functions, that experts
use to represent and even clean up degraded prints. The nature and existence of these features are
quite relevant for visual expertise, since in some sense these are the direct outcomes of any
perceptual system that tunes itself to the visual diet it experiences.
We propose to apply data reduction techniques to the output of the moving window
paradigm. These techniques have successfully been applied to derive the statistics of natural
images (Hyvarinen & Hoyer, 2000). The results provided individual features that are localized in
space and resemble the response profiles of simple cells in primary visual cortex. Many of these
studies are performed on random samples of images and visual sequences, but the moving
window application provides an opportunity to use these techniques to recover the dimensions of
only the inspected regions, and to compare the recovered dimensions from experts and
representations based on random window locations.
The specifics of this technique are straightforward. For each position of the moving window,
we extract (say) a 12 x 12 patch of pixels. This is repeated at each location that was inspected by
the subject, with each patch weighted by the amount of time spent at each location. The moving
window experiment produces tens of thousands of patches of pixels, which are submitted to a data reduction technique (independent component analysis, or ICA), which is similar to principal
components analysis, with the exception that the components are independent, not just
uncorrelated. The linear decomposition generated by ICA has the property of sparseness, which
has been shown to be important for representational systems (Field, 1994; Olshausen & Field,
1996) and implies that a random variable (the basis function) is active only very rarely. In
practice, this sparse representation creates basis functions that are more localized in space than
those captured by PCA and are more representative of the receptive fields found in the early
areas of the visual system.
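As a concrete illustration, here is a minimal sketch of this patch-extraction-plus-ICA pipeline using scikit-learn's FastICA; the patch size, component count, and the dwell-time weighting scheme (one copy per 50 ms) are illustrative assumptions rather than the settings we will ultimately use.

```python
import numpy as np
from sklearn.decomposition import FastICA

def ica_basis(image, fixations, patch=12, n_components=36):
    """Estimate ICA basis functions from expert-inspected patches.

    fixations: list of (x, y, dwell_ms) window centers. Each patch is
    repeated in proportion to dwell time, a simple way to implement the
    time-weighting described above.
    """
    half = patch // 2
    rows = []
    for x, y, dt in fixations:
        p = image[y - half:y + half, x - half:x + half]
        if p.shape == (patch, patch):
            rows.extend([p.ravel()] * max(1, int(dt / 50)))
    X = np.asarray(rows, dtype=float)
    X -= X.mean(axis=0)                         # center the patches
    ica = FastICA(n_components=n_components, random_state=0)
    ica.fit(X)
    # Columns of mixing_ are the basis functions; reshape each to a patch.
    return ica.mixing_.T.reshape(n_components, patch, patch)
```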
Figure 6. ICA components from expert data.

Huge corpora of samples are required to extract invariants from noisy images, and at present we have only pilot data from several experts. However, the results of this preliminary analysis can be found in Figure 6. This figure shows features discovered using the ICA algorithm (Hurri & Hyvarinen, 2003; Hyvarinen, Hoyer & Hurri, 2003). Each image represents a basis function; linearly combined, these basis functions reproduce the windows examined by experts. Inspection of Figure 6 reveals that features such as ridge endings, y-branchings and islands are beginning to be represented. This analysis takes on greater value when applied to the entire database we
will gather, since it will combine across individual features to derive the invariant stimulus
features that provide the basis for fingerprint examinations done by human experts.
The ICA analysis is very sensitive to spatial location, and while cells in V1 are likely also
highly position sensitive, the measured basis functions are properties of the entire visual stream,
not just the early stages. More recent advances in ICA techniques have addressed this issue in a way similar to how the visual system has solved the problem. In addition to performing data reduction to extract the fundamental basis sets, these extended ICA algorithms group the recovered components based on their energy (squared outputs).

Figure 7. ICA components from expert data, grouped by energy. This analysis allows the basis functions to have partial spatial independence, at a slight cost to image quality. This latter issue is less relevant for larger corpora, when many similar features are combined by individual basis function groups.

This grouping has been shown to
produce classes of basis functions that are position invariant by virtue of the fact that they
include many different positions for each fundamental feature type. The examples shown in
Figure 7 were generated by this technique, which reduces the reliance on spatial location. This
groups the recovered features by class and accounts for the fact that rectangles have similar
properties to nearby rectangles. Note that the features in Figure 7 are less localized than those
typically found with ICA decompositions, which may be due to the large correlational structure
inherent in fingerprints, although this remains an open question addressed by this proposal.
The development of ICA approaches is an ongoing field, and we anticipate that the results of
the proposed research will help extend these models as we develop our own extensions based on
the applications to fingerprint experts. There are several ways in which the recovered
components can be used to evaluate the choice of positions by experts (which ultimately
determine, along with the image, the basis functions). First, one can visually inspect the sets of
basis functions recovered from datasets produced by experts, and compare them with a set generated from random window locations.
A second technique can be used to demonstrate that experts do indeed possess a feature set
that differs from a random set. The data from random windows and experts can be combined to
produce a common set of components (basis functions). ICA is a linear technique, and thus the
original data for both experts and random windows can be recovered through weighted sums of
the components, with some error if only some of the components are saved. If experts share a
common set of features that is estimated by ICA, then their data should be recovered with less
error than that of the random windows. This would demonstrate that an important component of
expertise is the ability to take a high-dimensional dataset (as produced by noisy images) and
reduce it down to fundamental features. From this perspective, visual expertise is data reduction.
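A sketch of this comparison, assuming X_expert and X_random are hypothetical matrices of flattened patches (one row per patch): a single common basis is fit on the pooled data, and reconstruction error through the truncated component set is compared per group.

```python
import numpy as np
from sklearn.decomposition import FastICA

def shared_basis_errors(X_expert, X_random, n_components=36):
    """Fit one common ICA basis on the pooled patches, then measure how
    well a truncated weighted sum of components reconstructs each group."""
    ica = FastICA(n_components=n_components, random_state=0)
    ica.fit(np.vstack([X_expert, X_random]))    # common component set
    def mse(X):
        recon = ica.inverse_transform(ica.transform(X))
        return float(np.mean((X - recon) ** 2))
    return mse(X_expert), mse(X_random)

# If experts share a compact feature set, the first error should be the
# smaller of the two.
```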
These kinds of data reduction techniques serve a separate purpose. Many of the experiments
described in other sections of this proposal depend on specifying particular features. While initial
estimates of the relevant features can be made on the basis of discussions with fingerprint
experts, we anticipate that the results of the ICA analysis will help refine our view of what
constitutes an important feature within the context of fingerprint matching.
The moving window procedure has the disadvantage of being a very localized procedure, due
to the nature of the small moving window. There is a fundamental tradeoff between the size of
the window and the spatial acuity of the procedure. If the window is made too large, we know
less about the regions from which the user is attempting to acquire information. To offset this,
we have provided the user the opportunity to view quick flashes of the full image, enough to
provide an overview of the prints, but not enough to allow matches of specific regions. We will
also conduct the studies using large and small windows to see whether the nature of the
recovered components changes with window size.
C.4. Starting Small: Guiding feature extraction with expert knowledge
We need to ask whether this is compelling, and cut it if it is not.
Feature extraction procedures attempt to take a high dimensional space and use the
redundancies in this space to derive a lower-dimensional representation that combines across the
redundancies to provide a basis set. This basis set can be thought of as the fundamental feature
set, and the development of this set can be thought of as one mechanism underlying human
expertise. The difficulty with these highly-dimensional spaces is that algorithms that attempt to
uncover the feature set through iterative procedures like Independent Component Analysis or
neural networks may fall into local minima and fail to converge upon a global solution. One
solution that has been proposed in the human developmental literature is one of starting small
(Elman, 1993). In this technique, programmers initially restrict the inputs to statistical models to
provide general kinds of information rather than specific information that would lead to learning
of specific instances. As a network matures, more specific information is added, which allows
the network to avoid falling into local minima that represent non-learned states. While the exact
nature of these effects is still being worked out (Rohde & Plaut, 1999), recent work has
provided empirical support in the visual domain (Conway, Ellefson & Christiansen, ref). This
suggests that we might use the temporal component of the data from experts in the moving
window paradigm to help guide the training of our networks.
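One way such a coarse-to-fine curriculum could be implemented is sketched below, assuming image patches as input; the blur schedule is a placeholder and the trainer call is hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def starting_small_schedule(patches, sigmas=(6.0, 3.0, 1.5, 0.0)):
    """Yield successively less-blurred copies of the training patches.

    Early stages expose only coarse structure (heavy blur); detail is added
    gradually, in the spirit of Elman's (1993) starting-small regime.
    """
    for sigma in sigmas:
        if sigma > 0:
            yield np.stack([gaussian_filter(p, sigma) for p in patches])
        else:
            yield np.stack(list(patches))       # final stage: full detail

# for stage, X in enumerate(starting_small_schedule(train_patches)):
#     model.partial_fit(X.reshape(len(X), -1), labels)   # hypothetical trainer
```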
As an expert views a print, they initially are likely to focus on broad, overall types of information that give... (need to finish this section if necessary)
C.5. Automatic detection of regions of interest using expert knowledge
In both fingerprint classification (e.g. Dass & Jain, 2004; Jain, Prabhakar & Hong, 1999;
Cappelli, Lumini, Maio & Maltoni, 1999) and fingerprint identification (e.g. Pankanti, Prabhakar
& Jain, 2002; Jain, Prabhakar & Pankanti, 2002) applications, there are two main components for
an automatic system: (1) feature extraction and (2) a matching algorithm to compare (or classify)
fingerprints based on feature representation. The feature extraction is the first step to convert
raw images into feature representations. The goal is to find robust and invariant features to deal
with various conditions in real-world applications, such as illumination, orientation and
occlusion. Given a whole fingerprint image, most fingerprint recognition systems utilize the
location and direction of minutiae as features for pattern matching. In our preliminary study of
human expert behaviors, we observe that human experts focus on just parts of images (regions of
interest – ROIs) as shown in Figure XX, suggesting that it is not necessary for a human expert to
check through all minutiae in a fingerprint. A small subset of minutiae seems to be sufficient for
the human expert to make a judgment. What regions are useful for matching among all the
minutiae in a fingerprint? Is it possible to build an automatic ROI detection system that can
achieve performance similar to that of a human expert? We attempt to answer these questions by
building a classification system based on the training data captured from human experts. Given a new image, the detection system is able to automatically detect and label regions of interest for the matching purpose.

Figure X. The overview of automatic detection of regions of interest. The red regions in the fingerprints indicate where human experts focus during the pattern-matching task.

We want to note that we expect most regions selected by our system will be minutiae, but we also expect that the system will potentially discover structural regularities in non-minutia regions that have been overlooked in previous studies. Unlike
previous studies of minutiae detection (e.g. Maio & Maltoni, 1997), our automatic detection
system will not simply detect minutiae in a fingerprint but focus on detecting both a small set of
minutiae and other useful regions for the matching task. Considering the difficulties in
fingerprint recognition, building this automatic detection system is challenging. However, we are confident that this proposed research will take the first steps toward success and make important contributions. This confidence lies in two important factors that make our work different from other studies: (1) we will record detailed behaviors of human experts (e.g. where they look in a matching task) and recruit the knowledge extracted from human experts to build a pattern recognition system; and (2) we will apply state-of-the-art machine learning techniques in this study
to efficiently encode both expert knowledge and regularities in fingerprint data. The combination
of these two factors will lead us to achieve this research plan.
To build this kind of system, we need to develop a machine learning algorithm and estimate
the parameters based on the training data. Using the moving window paradigm (described in
C.3), we collect information about where a human expert looks from moment to moment
when he performs a matching task. Hence, the expert’s visual attention and behaviors (moving
the windows) can be utilized as labels of regions of interest – providing the teaching signals for a
machine learning algorithm. In the proposed research, we will build an automatic detection
system that captures the expert’s knowledge to guide the detection of useful regions in a
fingerprint for pattern matching.
We will use the data collected from C.X. Each circular area examined by the expert is filtered
by a bank of Gabor filters. Specifically, the Gabor filters with three scales and five orientations
are applied to the segmented image. It is assumed that the local texture regions are spatially
homogeneous, and the mean and the standard deviation of the magnitude of the transform
coefficients are used to represent an object in a 48-dimensional feature vector. We reduce the high-dimensional feature vectors into vectors of dimensionality 10 by principal component
analysis (PCA), which represents the data in a lower dimensional subspace by pruning away
those dimensions with the least variance. We also randomly sample other areas that the expert
doesn’t pay attention to and code these areas with a Non-ROI label which is paired with feature
vectors extracted from these areas. In total, the training data consists of two groups of labeled
features – ROI and Non-ROI.
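A sketch of the descriptor computation follows; the filter frequencies below are placeholders, and the bank size (which, with two statistics per filter, determines the final dimensionality) should be set to match the design above.

```python
import numpy as np
from scipy.signal import fftconvolve
from skimage.filters import gabor_kernel

def gabor_descriptor(region, frequencies=(0.1, 0.2, 0.4), n_orient=5):
    """Mean and std of Gabor-magnitude responses for one examined region.

    Several frequencies (scales) x several orientations, two statistics
    each; the frequency values are placeholders, not the bank from the
    proposal.
    """
    feats = []
    for f in frequencies:
        for k in range(n_orient):
            kern = gabor_kernel(f, theta=np.pi * k / n_orient)
            mag = np.abs(fftconvolve(region, kern, mode='same'))
            feats.extend([mag.mean(), mag.std()])
    return np.asarray(feats)    # one fixed-length feature vector per region

# Stacking one descriptor per examined area gives the training matrix that
# PCA then reduces to 10 dimensions:
#   from sklearn.decomposition import PCA
#   low_dim = PCA(n_components=10).fit_transform(descriptors)
```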
Next, we will build a binary classifier based on support vector machines (SVMs). SVMs have been successfully applied to many classification tasks (Vapnik 1995; Burges 1998). An SVM trains a linear separating plane for classifying data by maximizing the margin between two parallel planes on either side of the separating one. The central idea is to nonlinearly map the input vector
into a high-dimensional feature space and then construct an optimal hyperplane for separating
the features. This decision hyperplane depends on only a subset of the training data called
support vectors.
For a set of n-dimensional training examples $X = \{x_i\}_{i=1}^{m}$ labeled by the expert's visual attention $\{y_i\}_{i=1}^{m}$, and a mapping of the data into q-dimensional vectors $\Phi(X) = \{\phi(x_i)\}_{i=1}^{m}$ by a kernel function, where $q \gg n$, an SVM can be built on the mapped training data by solving the following optimization problem.

Minimize over $(w, b, \xi_1, \ldots, \xi_m)$ the cost function

$$\frac{1}{2} w^T w + C \sum_{i=1}^{m} \xi_i$$

subject to $y_i (w^T \phi(x_i) + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$ for all $i = 1, \ldots, m$,

where $C$ is a user-specified constant controlling the penalty on the violation terms denoted by each $\xi_i$. The $\xi_i$ are called slack variables and measure the deviation of a data point from the ideal condition of pattern separability. After training, $w$ and $b$ constitute the classifier:

$$y = \mathrm{sign}(w^T \phi(x) + b)$$
Compared with other approaches used in fingerprint recognition, such as neural networks and
k-nearest neighbors, SVMs have proven more effective in many classification tasks. In
addition, we first transform original features into a lower-dimensional space based on PCA. The
purpose of this first step is to deal with the curse of dimensionality. We then map the data points
into another higher-dimensional space so that they are linearly separable. By doing so, we
convert the original pattern recognition problem into a simpler one. This idea is quite in line with
kernel-based nonlinear PCA (Scholkopf, Smola & Muller 1998), which has been successfully used
in several fields (e.g. Wu, Su & Carpuat 2004).
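A minimal sketch of the resulting PCA-plus-SVM detector using scikit-learn; the kernel and C are illustrative defaults, and X and y are hypothetical training arrays built from the moving-window data.

```python
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# X: one Gabor descriptor per labeled area; y: 1 for ROI, 0 for Non-ROI.
roi_detector = make_pipeline(
    PCA(n_components=10),        # linear reduction against the curse of dimensionality
    SVC(kernel='rbf', C=1.0),    # implicit nonlinear map + maximum-margin plane
)
# roi_detector.fit(X, y)
# roi_detector.predict(new_descriptors)   # 1 = region of interest
```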
Given a new testing fingerprint, we will shift a 40x40 window over the image and classify all
the patches at each location and scale. The system will first extract Gabor-based features from
local patches which will be the input to the detector. The detector will label all the regions as
either ROI or Non-ROI. We expect that most ROIs are minutiae. Unlike methods based on minutiae matching, however, we expect that only a small subset of minutiae are utilized by human
experts. Moreover, we expect the system to detect some areas that are not defined as minutiae
but human experts also pay attention to during the matching task. Thus, the ROI detector we
develop will go beyond the standard approach in fingerprint recognition (minutiae extraction and
matching). By efficiently encoding the knowledge of human experts, the proposed system will
have opportunities to discover the statistical regularities in fingerprints that have been
overlooked in previous studies.
C.6. Using expert-identified correspondences to extract environmental models
In our moving window paradigm, a human expert moves the window back and forth between
inked and latent fingerprints to perform pattern matching. We propose that the dynamic
behaviors of the expert provide additional signals indicating one-to-one correspondences
between two images. In light of this, our hypothesis is that an expert’s decision is based on the
comparison of these one-to-one patches. Therefore, we propose that these expert-identified
correspondences can serve as additional information to find the regularities in fingerprint and
build the automatic detection system.
We propose to use this knowledge as a prior for the training data. We observe that not all the
focused regions in the latent print have the corresponding regions in the inked print. Thus, it is
more likely that those one-to-one pairs play a more important role in pattern matching than other
regions of interest. Based on this observation, we propose to maintain a set of weights over the
training data. More specifically, for each ROI in the latent image, we find the most likely pairing
patch in the inked image. Two constraints guide the search for the matching pair. The temporal constraint is based on the expert's behaviors: for instance, the patch in the inked image that the expert examines immediately after looking at an ROI in the latent image is more likely to be associated with that ROI. The spatial constraint selects the patch in the inked image with the highest similarity to the patch in the latent image. In this way, each ROI in the
latent image can be assigned a weight indicating the probability of mapping this region to a
region in the other image. With a set of weighted training data, we will apply a SVM-based
algorithm (briefly described in C.5) which will focus on the paired samples (with high weights)
in the training data. More specifically, we replace the constant C in the standard SVM with a set
of variables ci , each of which corresponds to the weight of a data point. Accordingly, the new
objective function is

$$\frac{1}{2} w^T w + \sum_{i=1}^{m} c_i \xi_i.$$

Thus, the matching
regions receive more penalties if they are nonseparable
points while other regions receive less attention because it
is more likely that they are irrelevant to the expert’s
decision. Thus, the parameters of the SVM are tuned to favor the regions that human experts are especially interested in. By encoding this knowledge in a machine learning algorithm, we expect that this method will lead to better performance by closely imitating the expert's decision.
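In practice, the per-example penalties $c_i$ can be realized in standard SVM libraries through per-sample weights, which scale C for each training point; a sketch follows, with hypothetical correspondence weights from the temporal and spatial constraints above.

```python
from sklearn.svm import SVC

# X, y: labeled ROI patches; corr_weights: hypothetical correspondence
# probabilities, one per training example.
svm = SVC(kernel='rbf', C=1.0)
# svm.fit(X, y, sample_weight=corr_weights)   # effective penalty: C * weight_i
```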
C.7. Dependencies between global and local information: The role of gist information
Fingerprints are categorized into several classes, such as whorl, right loop, left loop, arch,
and tented arch in the Henry classification system (Henry 1900). In the literature, researchers
use only 4-7 classes in an automatic classification system. This is because the task of
determining a fingerprint class can be difficult. For example, it is hard to find robust features
from raw images that can aid classification as well as exhibit low variations within each class. In
C.5 and C.6, we discuss how to use expert knowledge to find useful features for pattern
matching. By taking a bigger picture of feature detection and fingerprint classification in this
section, we find that we need to deal with a chicken-and-egg problem: (1) useful local features
can predict fingerprint classes; and (2) a specific fingerprint class can predict what kinds of local
regions likely occur in this type of fingerprint. In contrast, standalone feature detection
algorithms (e.g. in C.5 and C.6) usually look at local pieces of the image in isolation when
deciding whether the patch is a region of interest. In machine learning, Murphy, Torralba and
Freeman (2003) proposed a conditional random field for jointly solving the tasks of object
detection and scene classification. In light of this, we propose to use the whole image context as
an extra source of global information to guide the search for ROIs. In addition, a better set of
ROIs will also potentially make the classification of the whole fingerprint more accurate. Thus,
the chicken-and-egg problem is tackled by a bootstrapping procedure in which local and global
pattern recognition systems interact with and boost each other.
We propose a machine learning system based on graphical models (Jordan 1999) as shown in
Figure XX. We define the gist of an image as a feature vector extracted from the whole image by treating it as a single patch. The gist is denoted by $v_G$. Then we introduce a latent variable $T$
describing the type of fingerprint. The central idea in our graphical model is that ROI presence is
conditionally independent given the type, and the type is determined by the gist of the image. Thus,
our approach encodes the contextual information on a per image basis instead of extracting
detailed correlations between different kinds of ROIs (e.g. a fixed prior such as patch A always occurs to the left of patch B) because of the complexity and variations of detailed
descriptions. Next we need to classify fingerprint types. We will simply train a one-vs-all binary
SVM classifier for recognizing each fingerprint type based on the gist. We will then normalize
the results:

$$p(T = t \mid v_G) = \frac{p(T^t = 1 \mid v_G)}{\sum_{t'} p(T^{t'} = 1 \mid v_G)}$$

where $p(T^t = 1 \mid v_G)$ is the output of the $t$-th one-vs-all classifier.
Once the fingerprint type is known, we can use this information to facilitate ROI
detection. As shown on the tree-structured graphical model in Figure XX, the following
conditional joint density can be expressed as follows:
$$p(T, R_1, \ldots, R_N \mid v) = \frac{1}{z}\, p(T \mid v_G) \prod_{i} p(R_i \mid T, v_i)$$

where $v_G$ and $v_i$ are the global and local features, respectively, and $R_i$ is the class of a local patch. In the
proposed research, we will investigate two types of $R$. One classification defines ROI and Non-ROI types, the same as in C.5 and C.6. The other classification defines several minutia
types (plus Non-ROI) such as termination minutia and bifurcation minutia. $z$ is a normalizing
constant. Based on this graphical model, we will be able to use contextual knowledge to facilitate
the classification of a local image. We also plan to develop a more advanced model which will
use local information to facilitate the fingerprint type classification. We expect that this kind of
approach will lead to a more effective automatic system that can perform both top-down
inference (fingerprint types to minutia types) and bottom-up inference (minutia types to
fingerprint types).
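A sketch of how the two levels combine at test time, assuming the one-vs-all gist scores and per-type local classifier outputs are available as arrays; this implements the normalization above and the marginalization over the latent type implied by the tree-structured model.

```python
import numpy as np

def roi_posterior(gist_scores, local_probs):
    """Combine gist-based type evidence with local ROI classifiers.

    gist_scores: length-T vector of one-vs-all outputs on the whole-image
        gist (nonnegative); normalized below into p(T = t | v_G).
    local_probs: (T, N) array of p(R_i = ROI | T = t, v_i) for each patch i.
    Returns the length-N vector p(R_i = ROI | v), marginalizing over T.
    """
    p_type = gist_scores / gist_scores.sum()     # p(T = t | v_G)
    return p_type @ local_probs                  # sum_t p(t|v_G) p(R_i|t, v_i)
```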
C.8. Summary of quantitative approaches
(Tom writes)
General themes:
Incorporate expert knowledge
Links between global and local structure made possible by input from experts
Specification of elemental basis or feature set
Classifying informativeness of regions
Defining an intermediate level between low-level feature extractors and high-level gist or
configural information
D. Implications for knowledge and practice
The implications of the knowledge gained from the results of these studies and analyses fall into four broad categories, each of which is discussed below.
D.1. Implications for quantitative understanding of the information content of fingerprints
D.2. Implications for an understanding of the links between quantitative information content
and the latent print examination process
D.3. Implications for the classification and filtering of poor-quality latent prints
D.4. Implications for the development of software-based tools to assist human-based latent
print examinations and training
E. Management plan and organization
F. Dissemination plan for project deliverables
scientific articles, presentations at machine learning conferences and fingerprint conferences,
proof-of-concept Java-based applets.
(end of 30 pages)
G. Description of estimated costs
Personnel
The project will be co-directed by Thomas Busey and Chen Yu. We request 11 weeks of
summer support, during which time both will devote 100% of their efforts to the project.
Benefits are calculated at 19.81%. The salaries are incremented 3% per year.
Many of the simulations will be conducted by a graduate student, who will be hired
specifically for the purposes of this project. This student, likely an advanced computer science
student with a background in cognitive science, requires a stipend, a fee remission and health
insurance. The health insurance is incremented at 5% per year.
Subject coordination and database management will be handled by hourly students who
will work 20 hours/wk on the project. We will pay them $10/hr.
Consultant
John Vanderkolk, with whom Busey has worked for the past two years, has agreed to
serve as an unpaid consultant on this grant. He does require modest travel costs when he visits
Bloomington.
Travel
Money is requested to bring in four experts for testing using the eyemovement recording
equipment. These costs will total approximately $1500/yr.
Money is requested for three conferences a year. These will enable the investigators to travel
to conferences such as Neural Information Processing Systems (NIPS) and forensic science conferences
such as the International Association for Identification (IAI) to interact with colleagues and share
the results of our analyses. These trips serve an important role in communicating the efforts of
this grant to a wider audience.
Other Costs
Equipment
This research is very computer-intensive, and thus we require a large UNIX-based server to
run simulations in parallel. In addition, we require three PC-based workstations to run Matlab and
other simulation programs. Finally, conferences such as IAI and local Society for Identification
meetings provide an ideal place to gather data from experts, and thus we require a portable
computer for such onsite data-gathering purposes. We anticipate that up to half of our data can
be collected using these on-site techniques, and this technique is preferable because we have
control over the monitor and software. Thus the laptop computer represents a good investment
in the success of the project.
Other costs
The graduate student line requires a fee remission each year. The fee remission is
incremented at 5% per year.
The results of our studies require resources to reach a wide audience, and thus we request dissemination funds to cover the costs of publication and web-based dissemination.
This project is highly image-intensive, and we require money to purchase image-processing
software and upgrades. These include software packages such as Adobe Photoshop, as well as
new image processing packages as they become available.
We will test 80 subjects a year to obtain the necessary data for use in our statistical
applications. Each subject is paid $20 for the approximately 90-minute testing period.
The project will consume supplies of approximately $100/month, for items such as backups,
power supplies, etc.
Indirect Costs
The indirect rate negotiated between Indiana University and the federal government is set at
51.5%. This rate is assessed against all costs except the fee remission. This was negotiated with
DHHS on 5.14.04.
H. Staffing plan and resources
Both Busey and Chen maintain laboratories in the Department of Psychology at Indiana
University that each contain approximately 700 sq. feet of space. These have subject running
rooms, offices and spaces for servers. Chen's lab contains an eyemovement recording setup that
is sufficient for the eyemovement portion of the experiments. Both investigators have offices in
the Psychology department as well.
We will recruit a graduate student from the Computer Science or Psychology programs at
Indiana University. This student must have experience with machine learning algorithms at a
theoretical level, and also be an expert programmer. They will work 20 hrs/wk. We will also
recruit two hourly undergraduate students to coordinate the subject running, data analysis and
server maintenance. They will also be responsible for managing the data repository site where our data will be accessible by other researchers who wish to integrate human expert knowledge
into their networks.
The bulk of the theoretical work will be handled by Chen and Busey, while the graduate
student will work on implementation and model testing.
I. Timeline
This is a multi-year project that is designed to alternate between acquiring human data and
using it to refine the quantitative analyses of latent and inked prints.
Year 1: Acquire necessary fingerprint databases. Begin testing 80 experts on 72 different
latent/inked print pairs. Program the Support Vector and global/local models. Test 2 experts on the
eyemovement equipment using all 72 prints.
Year 2: Test an additional 80 experts on 72 new latent/inked prints. Begin model fitting and
refinement. Test 2 experts on the eyemovement equipment using all 72 prints. Compare results
from eyemovement studies and moving window studies.
Year 3: Test the final 80 experts on 72 new latent/inked prints. Develop new versions of
statistical models based on prior results. Put entire database online for use by other researchers.
Disseminate results to peer-reviewed journals.