Summary
Given the possible number of genetic variations, the probability of having a naturally occurring
Doppelganger is low. This is why DNA evidence acquired at crime scenes is such conclusive evidence when presented
in criminal trials. Though the process of DNA fingerprinting is fallible, the probability that two unrelated people with
the same DNA exist is microscopic. Barring, then, that you have an identical evil twin, the probability that you will be
mistaken for a criminal based on such evidence is low. Fingerprints, however, being only a portion of this genetic
identity, seem far less restricting. It is then conceivably possible that one could be mistaken as the perpetrator of a
crime based on fingerprint evidence. It is our goal to determine exactly how probable this is.
One of the progenitors of the study of fingerprint identity was Sir Francis Galton, who identified
characteristic ridge patterns in the skin that vary widely among a population, but which are constant over time to an
individual. In addition to these minutiae, fingerprints also have an overall pattern that in nearly all cases falls into one
of three groups: loops, arches, and whorls. Using both the overall fingerprint patterns and a set of the most commonly
occurring Galton Characteristics (GCs), we created a model to test the individuality of fingerprints, based on a
probabilistic interpretation: highly probable fingerprints are less individual, and less probable fingerprints are more
individual.
In this model, we first divided an ideal rectangular thumbprint into squares of equal area, denoted as cells.
Knowing that any comparison between two fingerprints first matches the general pattern of a fingerprint and then a
certain number of GCs, we calculated the fingerprint patterns that have the maximum probability of occurrence. This
was done by using figures which determined the relative frequency of occurrence of each of the patterns and GCs.
To start, we assumed that from an ideal thumbprint containing N total cells, we chose to confirm the form and
placement of n GCs in those cells. Our model proceeds in stages, first choosing the overall pattern of the print, and then
proceeding to choose n locations of GCs from the N total placements possible. Once the pattern and placement have
been determined, it remains only to factor in the relative occurrence probabilities of each GC in order to determine a
measure of the individuality of the fingerprint.
The model is constructed based on a number of assumptions. To begin with, we first assume that the patterns
and GCs occur independently; neither has an influence on the other’s probability. In later stages of our analysis, then,
we account for the fact that dependencies may exist, and alter the selection of GCs accordingly. Another assumption
that our model makes is that the GCs occur independently; that is, in the n spaces in which we wish to confirm the
presence of GCs, placement has no effect on which characteristic is selected. Since there has been no conclusive
evidence that a particular fingerprint pattern has any influence on the minutiae present in the fingerprint, this seems to
be a valid assumption, and hence no unnecessary restrictions were placed on the form of the fingerprint. The
construction of the model allowed us to calculate the ability to confirm a fingerprint based on partial fingerprint
evidence. In addition, we used population figures of many countries and the entire world to find what the minimum
number of GCs in common between fingerprints should be before a match can be said to occur.
In testing this model, we did not calculate the probability of occurrence for every individual pattern and
placement of GCs. Rather, we calculated only the probability of the most likely occurrence. Also, the orientation of
GCs was not taken into consideration. This may at first seem to be a weakness, but is in fact a strength, as requiring a
fingerprint to occur with GCs oriented in a particular direction is stricter than not requiring any particular direction for
their placement. Thus, any fingerprint occurring in nature is hypothetically less likely to occur than our calculated
maximum. For a template fingerprint with 12 identified minutiae, a reasonable required number given new
advancements in laser recognition of fingerprints, the probability of finding a match was calculated to be on the order of
10^-13. This figure shows that even the most likely fingerprint is highly individual, and fingerprint identification is
as reliable on ideal grounds as DNA identification, which has reliability on the order of 10^-10.
Team 250
Page 2
AN INQUIRY INTO INDIVIDUALITY OF THUMBPRINTS
Asma Al-Rawi, Steve Gilberston, Jonathan Whitmer
Kansas State University
Mathematical Contest in Modeling 2004
I. Introduction
“How can you disbelieve in me when I have created each one of you down to the
prints on your fingers?”
--God (The Holy Qur’an 75:3-4) [4]
The above reference, depending on one’s religiousness or secularism, either
confirms that fingerprints are distinct to individuals, or at the very least, that knowledge
of variation of fingerprints between persons, and its inherent properties in identification,
has existed since the 7th century. In modern Western culture, the idea of using
fingerprints as a means of identification first appeared in an article written by Henry
Faulds in 1880 in the journal Nature [3]. His interest was aroused by his discovery of
ridged pattern imprints in handmade pottery. After performing a series of experiments to
determine the differences in fingerprints among individuals as well as their resilience, he
recommended that these ridged imprints be used as evidence of
criminal identity at the scene of a crime. At the root of this assertion is the assumption
of uniqueness in each human’s fingerprint patterns. There are several commonalities in
the patterns of ridged skin, however, which allow fingerprints to be systematically
classified.
For example, the ridged lines on fingers appear in a number of major pattern
types: loops, which comprise the largest portion of all fingerprints and occur in two
chiralities; whorls, which are characterized by the spiraling pattern of the ridges; and the
arches, which comprise the smallest major group [1]. Other possible manifestations exist;
however, their occurrence is very rare. In addition to these major groups, the ridges of
different fingerprints show certain defining characteristics. This idea was prevalent in one
of the first attempted quantifications of fingerprint individuality, which was performed by
Sir Francis Galton in 1892 [1]. The patterns of finger ridge divergences and
combinations, termed minutiae, are also identified as Galton Characteristics in his honor.
Later developments have incorporated his ideas along with other print-determining
factors to establish more exactly each print’s uniqueness [1,2,6].
Whether or not each fingerprint pattern is truly unique, fingerprints have found wide
use as a form of identification in forensic science. Recently, however, the validity of
fingerprint evidence has been called into question, as evidenced by the case United States
v. Mitchell, which presented the US with its first challenge as to the admissibility of
latent fingerprint evidence as a means of identification [7]. This necessitates a
reevaluation of the validity of fingerprint uniqueness in measurement. Thus, we are
faced with the problem of determining the probability that two people in the world might
share the same fingerprints to measurable accuracy. This is quite a complex problem if
one allows it to be, as there seem at first to be almost infinitely many variations within
ridge patterns whose appearance and interplay must be accounted for, and yet it has a
simple and elegant solution which we will show in this paper. In our study, we focus not
on each of the ten fingers, but on only the thumb, which effectively serves as an upper
bound for the multiple occurrence probability of all friction ridged skin. Our calculations
have found on the basis of a discrete probability model that it is extremely unlikely that
two people with the same thumbprints have ever existed, within the limitations of current
measurement practices.
II. Model
The first step in devising a model for thumbprint individuality is simply to
understand what types of fingerprints exist. As mentioned previously, fingerprints occur
in what seems to be an infinite number of variations, determined by both their overall
pattern and the distribution of Galton Characteristics (GCs). The patterns fall into three
main categories: loops, arches, and whorls. These can be further divided into over a
thousand subcategories [1]. Figure 1 shows the major types of prints.
FIGURE 1. The four most common fingerprint patterns: left and right loops, whorls, and arches.
From www.sfis.ca.gov/pattern_types.htm.
Prints which fall into these categories can, to the untrained eye, and oftentimes
even the trained eye, appear very similar. When the contribution of GCs is factored in, a
particular fingerprint’s unique character starts to become apparent. The major types of
GCs are illustrated in Figure 2. Whether the pattern on the finger is a loop, arch, or whorl,
GCs occur randomly throughout the entire print. These occurrences give distinct
attributes to the print that can be systematically classified.
FIGURE 2. A chart showing the 10 most common forms of Galton Characteristics. (Osterburg ??)
The central problem, given a known classification of a fingerprint by its pattern
and GCs, becomes to calculate the probability that an identical finger exists. Our model
focuses specifically on thumbprints, for a variety of reasons. For instance, a thumb is
easy to idealize. In practice, when fingerprints are taken, the finger is rolled over nearly
its entire surface above the first knuckle. This is similar to the unrolling of an uncapped
cylinder. The shape of this print on paper is approximately rectangular. The thumbprint
has the largest area, and also the largest number of defining qualities, due to the random
distribution of GCs.
For an ideal rectangular thumbprint, we partition the area into N equally sized
squares, with a minimum size on the order of one square millimeter, due to the minimum
extent to which a GC can be identified as occurring in one of the N squares. Since only a
finite number of visible GCs can occur on a single patterned finger, a discrete probability
method is useful for determining the possibility of Doppelganger thumbs. It is then
perfectly admissible to use a counting argument to find approximately the number of
possible arrangements of friction ridges on the thumb, and their relative occurrences
based on the features they contain.
It should be noted that ideal fingerprints as described above do not usually occur
in actual fieldwork. Usually only portions of fingerprints are left by oils or other
substances on the fingers of the criminal; these are called latent prints. After these latent
prints are developed and brought into visible form, they are described as partial prints.
These partial prints contain only a fraction of the total surface of the friction ridged skin
on the thumb. Using similar ideas to the ones above, we can model partial prints simply
by decreasing N; that is, limiting the number of cells on which the prints have to match
up. Since a partial print cannot possibly match the rest of the cells contained in an ideal
print, the characteristics of those cells are irrelevant. Decreasing N then gives an accurate
model, as we can say that the area we are sampling from is smaller. Accordingly, the
probability of matching the print among people of a given population grows, as we show
below.
III. Probability Algorithms
Our first step was to measure the dimensions of an idealized thumb. Averaging
over the three members in our group, we found the dimensions of a nearly rectangular
print, when measured as described above, to be approximately 3 cm by 4 cm, that is,
30 mm by 40 mm. Thus there are approximately 1200 square millimeters on one thumb. We took each square
millimeter to be a cell, so that in our ideal thumb model, a full print has a possibility of
1200 identification points.
In practice, a suspect’s thumbprint and the thumbprint found at the scene of the
crime are compared to each other on both the overall pattern and a certain number of
distinguishing characteristics. The distinguishing factors can correspond to either scars on
the suspect’s thumbprint or GCs. Since scars are the result of completely random events,
and thus are nearly impossible to quantify without exact personal histories, our model
considers only the cases in which GCs occupy these identifying points. In previous
models [1,2], the relation between GCs and the overall pattern was not considered; only
the occurrence of GCs was taken into account. In our model, various degrees of pattern
and GC independence were considered. This accounts for the possibility that a certain
percentage of the GCs are inherent in the overall pattern. In the case where pattern and
GC occurrences are completely independent, one can separate the probability of a
fingerprint’s occurrence into two factors:
\[
P_{fp} = P_p \, P_{GC} \tag{1}
\]
In the above equation, Pfp is the probability a particular fingerprint will occur, Pp is the
probability a particular pattern will occur, some approximate figures for which are given
in Table 1, and PGC is the probability of a particular combination of GCs.
Class of Print    Probability
Right Loop        0.325
Left Loop         0.325
Whorl             0.3
Arch              0.05
Total             1

TABLE 1: A list of approximate occurrence probabilities of the four most common thumbprints, from Osterburg et al.
The loop category is determined therein to have a 65% occurrence probability, which here is divided into the two
chiralities, which are easily distinguishable and occur at nearly the same rate overall.
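As a minimal sketch of equation (1), the product of pattern and GC probabilities can be written out directly. The pattern values come from Table 1; the function name and the sample GC probability are assumptions made only for illustration and do not come from the paper.

```python
# Approximate pattern probabilities, copied from Table 1.
PATTERN_PROB = {
    "right_loop": 0.325,
    "left_loop": 0.325,
    "whorl": 0.3,
    "arch": 0.05,
}


def fingerprint_probability(pattern: str, p_gc: float) -> float:
    """Equation (1): P_fp = P_p * P_GC, valid only if pattern and GCs are independent."""
    return PATTERN_PROB[pattern] * p_gc


# Hypothetical GC probability, chosen only to show the multiplication.
example_p_gc = 1e-6
for pattern in PATTERN_PROB:
    print(pattern, fingerprint_probability(pattern, example_p_gc))
```

The most probable print under this factorization is always a loop, since both loop chiralities carry the largest pattern probability in Table 1.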
Our model treats non-measured GCs and cells in which there are no GCs as
equivalent empty cells. Thus, in the case where GCs are dependent on which pattern a
fingerprint has, we can still use this independence model, by noting that since a particular
percentage of the GCs are determined by the pattern, we can treat those as empty space in
which no defining characteristic occurs.
Suppose then, that we wish to find the probability that a particular distribution of
measured GCs occurs. To do this, we note that of the N total cells in the fingerprint, only
n of these cells have any significance in terms of GC measurement. The number of ways
this can be distributed is easy to compute. Placing all measured cells on the same level,
we begin placing GCs and empty cells on the surface of the thumbprint. At first there are
n GCs to place within the total area of the print, and N total cells to place them in. If the
first cell is empty space, we are left with N-1 cells in which to place characteristics, and n
characteristics. If the first cell contains a characteristic, we have N-1 remaining cells in which
to place characteristics, and n-1 GCs. Iterating this choice process over all N cells, we
find that the number of ways we can place the GCs is
\[
\binom{N}{n} = \frac{N!}{n!\,(N-n)!} \tag{2}
\]
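The cell-by-cell choice described above (first cell empty, or first cell occupied) is the usual recursion for binomial coefficients. The sketch below checks, for small cases, that this recursion agrees with the closed form in equation (2). The function names are illustrative, not taken from the paper.

```python
import math
from functools import lru_cache


def placements(N: int, n: int) -> int:
    """Closed form of equation (2): ways to choose n GC cells out of N cells."""
    return math.factorial(N) // (math.factorial(n) * math.factorial(N - n))


@lru_cache(maxsize=None)
def placements_by_choice(N: int, n: int) -> int:
    """Same count, built from the choice made for the first cell."""
    if n < 0 or n > N:
        return 0
    if n == 0 or n == N:
        return 1
    # First cell empty: n GCs remain for N-1 cells.
    # First cell occupied: n-1 GCs remain for N-1 cells.
    return placements_by_choice(N - 1, n) + placements_by_choice(N - 1, n - 1)


# Small check that the two views agree.
for N in range(1, 12):
    for n in range(0, N + 1):
        assert placements(N, n) == placements_by_choice(N, n)

print(placements(1200, 12))  # number of ways to pick 12 GC cells on the ideal thumb
```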
This leaves us to calculate the probability that each GC cell contains a particular
GC. Osterburg et al. give relative frequencies of occurrence for each characteristic
averaged over 39 fingers. Table 2 gives these figures. In our model, since we disregard
empty spaces, we considered only the relative frequency of the eleven most common
elements. Double occurrences, or the event that two GCs occur in the same space, while
certainly possible, were ignored in this model calculation, due to their small frequency.
The multiple-occurrence figure in the table is misleading, as it accounts for all double
occurrences, not double occurrences of particular types.
Parameter  Cell configuration    Frequency  Probability of Parameter
0          Empty                 6,584      0.766
1          Island                152        0.018
2          Bridge                105        0.012
3          Spur                  64         0.007
4          Dot                   130        0.015
5          Ending ridge          715        0.083
6          Fork                  328        0.038
7          Lake                  55         0.006
8          Trifurcation          5          0.001
9          Double bifurcation    12         0.001
10         Delta                 17         0.002
11         Broken ridge          119        0.014
12         Multiple occurrences  305        0.036
           Total                 8,591      1.000

TABLE 2. Experimentally determined Galton Characteristic probability numbers. From Osterburg, et al. Our model
disregards multiple occurrences, hence for our purposes, the characteristics numbered 0 and 12 are empty cells. Only
the characteristics numbered 1-11 are relevant.
The relative probability is a necessary factor for determining which characteristic
is most likely to occur in the n GC cells. The probability of the ith occurrence is given by:
\[
r_i = \frac{P(i)}{\sum_{j=1}^{11} P(j)} \tag{3}
\]
where the elements P(i) are determined from Table 2. The i in this case ranges from 1 to
11, as our model considers only single GC occurrences, and treats the low probability and
multiple occurrence GCs as empty space. It should be noted that their inclusion would
decrease the relative probability of the ith term as defined above; hence, it would decrease
the upper bound which our calculation aims to set. Clearly, the sum of these relative
probability quantities is 1, hence they are validly defined as probabilities.
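The relative probabilities of equation (3) can be computed from the Table 2 frequencies for parameters 1 through 11. This sketch uses the raw frequencies rather than the rounded probabilities; the ratio is the same because every P(i) shares the denominator 8,591. The variable and function names are illustrative only.

```python
# Frequencies of the eleven single-GC cell types (parameters 1-11 in Table 2).
GC_FREQUENCY = {
    "island": 152,
    "bridge": 105,
    "spur": 64,
    "dot": 130,
    "ending ridge": 715,
    "fork": 328,
    "lake": 55,
    "trifurcation": 5,
    "double bifurcation": 12,
    "delta": 17,
    "broken ridge": 119,
}


def relative_probabilities(freq):
    """Equation (3): r_i = P(i) / sum_j P(j), taken over the eleven GC types."""
    total = sum(freq.values())
    return {name: count / total for name, count in freq.items()}


REL = relative_probabilities(GC_FREQUENCY)
R_MAX_NAME = max(REL, key=REL.get)
R_MAX = REL[R_MAX_NAME]

for name, r in sorted(REL.items(), key=lambda item: -item[1]):
    print(f"{name:20s} {r:.4f}")
```

The largest relative probability belongs to the ending ridge, which is the characteristic the maximum-probability calculation later relies on.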
For n GCs, the probability of each arrangement is given by the relative probability
of each GC raised to the power of the number of times that GC is selected, divided by the number
of ways to divide those n elements into groups categorized by the eleven GCs considered.
Though the idea is complex, the notation is mathematically rather simple: it
corresponds to the product of the selection probabilities divided by the multinomial
coefficient corresponding to n choosing n1 of GC number 1, n2 of GC number 2, etc. If
we divide this quantity by the number of ways the n GCs considered can be placed among
the N cells, we obtain the probability of each arrangement of n GCs, shown in equation (4a).
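\[
P_{GC}
= \frac{\prod_{i=1}^{11} r_i^{n_i}}{\binom{n}{n_1 \cdots n_{11}} \binom{N}{n}}
= \frac{\prod_{i=1}^{11} r_i^{n_i}}{\dfrac{n!}{n_1! \cdots n_{11}!} \cdot \dfrac{N!}{n!\,(N-n)!}}
= \frac{\prod_{i=1}^{11} (n_i!)\, r_i^{n_i}}{\dfrac{N!}{(N-n)!}} \tag{4a}
\]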
One should note that in the above,
\[
\sum_{i=1}^{11} n_i = n \tag{4b}
\]
hence there are only as many stages considered in the determination of GCs as there are
GCs that are measured and available to compare to.
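A minimal sketch of equation (4a) follows, written as a literal reading of the formula rather than as the authors' spreadsheet. The function name, the three-type example and its relative probabilities are hypothetical and chosen only so the arithmetic is easy to follow. The second function evaluates the simplified right-hand form of (4a) as a cross-check.

```python
import math


def p_gc(counts, rel_probs, N):
    """Equation (4a), first form: prod(r_i^n_i) / (multinomial * C(N, n))."""
    n = sum(counts)  # equation (4b): the n_i add up to n
    product = math.prod(r ** k for r, k in zip(rel_probs, counts))
    multinomial = math.factorial(n)
    for k in counts:
        multinomial //= math.factorial(k)
    return product / (multinomial * math.comb(N, n))


def p_gc_simplified(counts, rel_probs, N):
    """Equation (4a), last form: prod(n_i! * r_i^n_i) * (N-n)! / N!."""
    n = sum(counts)
    numerator = math.prod(math.factorial(k) * r ** k for r, k in zip(rel_probs, counts))
    falling = math.factorial(N) // math.factorial(N - n)  # N! / (N-n)!
    return numerator / falling


# Hypothetical example: three GC types, four measured GCs, twenty cells.
counts = [2, 1, 1]
rel_probs = [0.5, 0.3, 0.2]
print(p_gc(counts, rel_probs, N=20))
print(p_gc_simplified(counts, rel_probs, N=20))
```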
To reiterate, our algorithm for calculating Doppelganger thumb probabilities
considers separately the probabilities of both the general pattern and GC occurrence. The
probability of GC occurrence is determined by the number of places in which GCs are
observed, the relative probability of a GC occurring there, and the number of ways these
GCs can then be ordered. The quantification of this is then given by equation (4a).
Now, given equations (1) and (4a), we can calculate the probability of any
particular fingerprint matching on both the pattern and any n GCs by using the
information in Tables 1 and 2. Since we wish, then, to put a limit on the number of
people in the world who can match fingerprints, given these characteristics, we calculated
Pmax, the probability of any thumbprint matching a template with only the most likely
characteristics in each of the GC places. This simplifies equation (4a), by restricting
choice to only the GC with maximum probability. Thus we have
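\[
P_{GC}
= \frac{\prod_{i=1}^{11} r_i^{n_i}}{\binom{n}{n_1 \cdots n_{11}} \binom{N}{n}}
\le \frac{r_{\max}^{\,n}}{\binom{n}{n \; 0 \cdots 0} \binom{N}{n}}
= \frac{r_{\max}^{\,n}}{\binom{N}{n}}
= P_{\max} \tag{5}
\]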
Some plots of this are given in Appendix A. These plots use the value of rmax obtained by
computing the relative probability of ending ridges, and consider only the right and left
loop patterns (occurring in equal supply) to constitute the maximum pattern probability.
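The bound in equation (5) can be evaluated directly with exact integer binomials. The sketch below is a literal reading of the formula with r_max taken as the ending-ridge relative frequency from Table 2 (715 of the 1,702 single-GC cells). It leaves out the pattern factor and makes no claim to reproduce the spreadsheet values plotted in the paper; the function name and defaults are assumptions.

```python
import math

# Relative probability of an ending ridge among parameters 1-11 of Table 2.
R_MAX = 715 / 1702


def p_max(n: int, N: int = 1200, r_max: float = R_MAX) -> float:
    """Equation (5), read literally: r_max^n / C(N, n)."""
    return r_max ** n / math.comb(N, n)


def log10_p_max(n: int, N: int = 1200, r_max: float = R_MAX) -> float:
    """Base-10 logarithm of the same quantity, convenient for log-scale plots."""
    return n * math.log10(r_max) - math.log10(math.comb(N, n))


for n in (1, 5, 10, 12):
    print(n, p_max(n), log10_p_max(n))
```

Working in logarithms keeps the values representable even when n is large enough that the probability itself would underflow.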
To calculate the quantities determined in equation (5), it becomes necessary to
calculate factorials of very large numbers to determine values of N choose n. This can be
done approximately by using Stirling’s approximation, whose formula is given by
\[
\log(m!) \approx m \log(m) - m + \frac{1}{2} \log(2 \pi m) \tag{6}
\]
This, in turn, leads us to the approximation
\[
\log \binom{N}{n} \approx \log(N!) - \log(n!) - \log\big((N-n)!\big) \tag{7}
\]
in which each factorial is evaluated using equation (6), and which can be utilized to
approximate $\binom{N}{n}$.
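To see how close the Stirling route of equations (6) and (7) comes to the exact value, one can compare it with an exact integer binomial. This is an illustrative check only, and the function names are not from the paper.

```python
import math


def log_factorial_stirling(m: int) -> float:
    """Equation (6): log(m!) ~ m log m - m + 0.5 log(2 pi m), natural logarithm."""
    if m == 0:
        return 0.0  # 0! = 1; the formula itself is undefined at m = 0
    return m * math.log(m) - m + 0.5 * math.log(2 * math.pi * m)


def log_binomial_stirling(N: int, n: int) -> float:
    """Equation (7): log C(N, n) from three Stirling-approximated factorials."""
    return (
        log_factorial_stirling(N)
        - log_factorial_stirling(n)
        - log_factorial_stirling(N - n)
    )


exact = math.log(math.comb(1200, 12))
approx = log_binomial_stirling(1200, 12)
print(exact, approx, exact - approx)
```

Most of the discrepancy comes from the smallest factorial, n!, because the leading Stirling error term behaves like 1/(12m) and so shrinks as m grows.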
If we suppose that a percentage of GCs are dependent on the overlying pattern,
then our model changes very little. Assuming that l of the n total GCs are dependent on a
particular pattern, we can essentially disregard all pattern-dependent GCs as empty cells,
as they would be exactly what is expected in the print at that point in the pattern. Hence,
with a slight modification from n to n – l, where l denotes the number of GCs dependent
on the pattern, equations (4a) and (4b) can still be utilized. In the event that the GCs are
wholly determined by the overlying pattern, we can disregard the influence of the pattern
in our calculation of Pfp, as we have more precise information about GC form and
occurrence than we do about pattern and sub-pattern form and occurrence. Also, our
estimates for the likelihood of a GC occurring at a given point in the N-square array give
a more limiting maximum for the probability than do our figures on general pattern
characteristics. The omission of the pattern influence on the fingerprint probability is
completely valid, since total GC dependence on pattern is equivalent to total pattern
dependence on GC; they simply become two different types of taxonomy.
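The substitution of n - l for n can be sketched as below. How a percentage dependence is turned into a whole number l of pattern-determined GCs is not specified in the text, so rounding to the nearest integer is an assumption of this sketch, as are the function names. Total dependence is handled separately in the text and is not treated specially here.

```python
import math

R_MAX = 715 / 1702  # ending-ridge relative frequency from Table 2


def effective_gc_count(n: int, dependence: float) -> int:
    """Return n - l, where l GCs are assumed to be fixed by the pattern.

    Assumption: l is the dependence fraction of n, rounded to an integer.
    """
    l = int(round(dependence * n))
    return n - l


def p_max_with_dependence(n: int, dependence: float, N: int = 1200) -> float:
    """Equation (5) with n replaced by n - l, read literally."""
    m = effective_gc_count(n, dependence)
    return R_MAX ** m / math.comb(N, m)


for dep in (0.0, 0.25, 0.5):
    print(dep, effective_gc_count(12, dep), p_max_with_dependence(12, dep))
```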
IV. Data
Returning to the problem now, we are specifically asked to determine what the
probability is that a person can be misidentified by fingerprint evidence; that is, we are to
determine the probability that two people share the same fingerprint characteristics. For a
template with n GCs, we are to calculate the probability that two distinct people match
the template. This is limited by the square of Pmax for a given n, which as graphed in
Figure 3 below, is seen to be very low for all n ≥ 10. For the value of n = 12, taken in
Osterburg et al. to be a median value for what is required for verification by various
international law enforcement agencies, we can see that the probability of fingerprint
multiplicity is 4.64 × 10^-15. These calculations were simply performed using a Microsoft
Excel spreadsheet and the formulas in Section III.
[Plot: Maximum Probabilities at Various Pattern Dependencies. Vertical axis: P_max (logarithmic, 1.00E+04 down to 1.00E-31); horizontal axis: Number of GCs (0 to 35). Series: No Dependence, 25% Dependence, 50% Dependence, 75% Dependence, 100% Dependence.]
FIGURE 3: Plot of maximum probability as a function of the number n of GCs used in the verification process. Here n
is allowed to range from 1 to 30.
Another directly applicable and highly interesting problem is the following:
What is the minimum number of GCs that a particular country’s law enforcement
agencies must use in order to get a highly probable match with the lowest
number of GCs per identification? Using population figures in Table 3, we can determine
this. To do so, we multiply the population of a country by Pmax to find the number of
people in a country that are probable to match a given n GC template. The results are
plotted in Appendix A.
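The population scaling described above amounts to multiplying a population by P_max and looking for the smallest n at which the expected number of matching people drops below one. The sketch below takes the probability function as a parameter, because the threshold it returns depends entirely on how P_max is evaluated. With the literal equation (5) used as the default, it is not claimed to reproduce the values read off the plots. The population figures are those of Table 3; all function names and the cap of 30 GCs are assumptions.

```python
import math

# Population figures from Table 3.
POPULATION = {
    "US": 2.925e8,
    "World": 6.347e9,
    "China": 1.295e9,
    "Liechtenstein": 3.284e4,
    "People ever": 1.269e10,
}


def p_max(n, N=1200, r_max=715 / 1702):
    """Literal reading of equation (5)."""
    return r_max ** n / math.comb(N, n)


def expected_matches(population, n, prob=p_max):
    """Expected number of people whose thumbprint matches an n-GC template."""
    return population * prob(n)


def min_gcs(population, prob=p_max, limit=1.0, n_cap=30):
    """Smallest n (up to n_cap) with fewer than `limit` expected matches, else None."""
    for n in range(1, n_cap + 1):
        if population * prob(n) < limit:
            return n
    return None


for country, people in POPULATION.items():
    print(country, min_gcs(people))
```

Passing a different `prob` function, for example one that includes a pattern factor or a dependence adjustment, changes the threshold without touching the search itself.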
The plots in Appendix A all point to near certain identification for n ≥ 12. This is
true regardless of the country in which the identification is being made. In fact, using the
world population figure, on a thumb with 1200 cells a match is all
but certain, and indeed, only one person is likely to have ever existed with such a print.
Country          Number of people
US               2.925E+08
World            6.347E+09
China            1.295E+09
Liechtenstein    3.284E+04
# People Ever    1.269E+10

Table 3: Population figures for the world and some representative countries. The number of people ever
was a figure computed on the assumption that roughly twice as many people have existed in the history of
humanity as exist at this particular point in time.
As was noted before, however, it might be the case that a thumb with 1200 cells is
overly large, or that only partial prints can be obtained for identification purposes. In this
case, we restrict the number N to a number less than 1200. For the plots in Appendix B,
we changed the number 1200 in our calculation to values of N = 600 and N = 300.
This increases the probability of finding multiple matches, because fewer sites are
available in which to place the n GCs. However, if as few as 12 GCs are matched, the
fingerprint’s unique identity is all but assured.
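The effect of shrinking N for partial prints can be illustrated by evaluating the same bound at N = 1200, 600 and 300. As before, this is a literal reading of equation (5) with r_max taken from Table 2, offered as a sketch rather than a reproduction of the Appendix B plots; the function name is illustrative.

```python
import math


def p_max(n: int, N: int, r_max: float = 715 / 1702) -> float:
    """Equation (5), read literally: r_max^n / C(N, n)."""
    return r_max ** n / math.comb(N, n)


n = 12
full = p_max(n, 1200)
for N in (1200, 600, 300):
    value = p_max(n, N)
    # The ratio shows how much more probable a match becomes on a smaller print.
    print(N, value, value / full)
```

Because only the binomial coefficient depends on N, the ratio printed in the last column is simply C(1200, n) / C(N, n).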
V. Error Analysis
A previous investigation by Pankanti et al. included the orientation of each
minutia in the model for fingerprint individuality. We omit the
orientation of the characteristic for several reasons. Firstly, including the factor of GC
orientation can only decrease our estimate of the maximum possible thumb Doppelganger
probability. Since we are attempting only to find a maximum bound for this probability,
removal of a factor which can only decrease the probability of a particular print, and which
would at the same time unnecessarily complicate our solution, does no damage to our model.
Pankanti, who accounts for orientation in his model, arrived at a lower figure for
fingerprint individuality than we did. In accounting for this orientation, however,
Pankanti completely disregards the differences in minutiae, concentrating only on the
location and orientation of defining features in the fingerprint ridges. Some figures from
various model calculations that are included in Pankanti’s paper are listed in Table 4,
in Appendix C.
A second reason our model disregards orientation is that our model relies on the
assumption that minutiae occur either independently or semi-independently. In
accounting for orientation, we would have to take into account restrictions placed on the
orientation of the GC by the overall pattern. This is simple to see: persons with loop
patterns have a higher probability of upward- and downward-pointing GCs than do persons
with arches. Accounting for orientation would make the pattern and minutiae
probabilities inseparable, and again harm the simplicity of our model while offering little
improvement to our limiting maximum.
Another unavoidable problem with our model is the roughness of pattern and GC
frequencies. Unfortunately, there are no good assessments published on the percentages
of the population whose patterns fall into the arch, loop, and whorl categories. The
frequency of occurrence of GCs faces a similar problem. In fact, the only figures we
could find were rough estimations based on a small sample of people. Osterburg, whose
figures we used in this model, arrived at his probability parameters of GCs by sampling
from 39 fingerprints. He did break them into a total of 8,591 cells, but as we do not know
whether or not a single person is more likely to have a certain type of GC, these
probabilities cannot be taken at face value [1]. Surely more recent figures on these
parameters exist, but their absence again does not harm our model, only the figures which it
calculates.
As mentioned before, there is a possibility that there exists dependence between
GCs and the overall pattern of a fingerprint. In our model, we attempted to account for this
by decreasing the identifying traits of a particular minutia by 25%, 50%, and 100%. For
the 100%, we simply calculated the probability of a particular GC occurrence and
disregarded the pattern, as either can be seen to be the determining factor of the other.
This is not an exact model simply because this assumes semi-independence where
complete dependence may occur. Without proper relations that give the dependence of
minutiae on the overall pattern, however, we are unable to properly account for this.
Inasmuch as we were able to adjust for these parameters, our model still predicts that
identifying 12 or more minutiae on a print, which is well within current technology, all
but assures a positive match.
One who pays astute attention to our graph in Figure 3 notes that the graphs of
100% and 0% dependence are actually the closest in predicted probability. This is
because removal of the pattern parameter in the calculation of Pmax only increases the
overall maximum probability by an approximate factor of 10. The other figures suffer
from inexactness in relating the dependence between occurrence of pattern and minutiae.
In the figures for our model, we have more precise knowledge of GC occurrence than of
pattern occurrence. Hence, the plots in which we require a percent dependence on pattern
suffer unnecessarily from inexact data.
As we are creating a somewhat idealistic model of fingerprints, scars were not
taken into consideration. As can be seen in Figure 4, scars do have an effect on the
appearance of fingerprints. This may create inaccuracies; however, there is no good way
to model the formation of scars, as this is completely due to personal experiences.
FIGURE 4: The effect of scars on fingerprint analysis. From Cowger, p. 4.
Our model also differs on one account from most other models of fingerprints.
Previous articles [3] published on fingerprint analysis define fingerprints only as the
portion in the general vicinity of the central pattern. Our model actually takes the print
on the entire area above the upper joint of the thumb, which would be the type of
fingerprint on file. Accordingly, our probabilities are significantly lower than those
calculated by others. However, our model can, as mentioned before, be made to
approximate these in the limit where the number of cells N is at a value around 300 and n
is around 12. The values we calculated in this method match up to other models
accordingly, as seen in Table 4.
The major problem which our model suffers from is its inability to account for
human error in determining thumbprint probability. Epstein [7] notes that the major
problem with latent fingerprint evidence is the inability of the humans who examine the
prints to discern exact characteristics. We now have the ability to use optical scans to
determine the fingerprints of an individual exactly, as opposed to keeping ink prints on file. If
thumbprint matches could be tested by a computer, it would be highly unlikely,
given our model, that anyone would ever be misidentified.
Comparing the output of our model with the probabilities of error in DNA
analysis, we find that fingerprints are a much more accurate method of
identification. Though everyone except identical twins and clones has a unique
sequence of DNA, for criminology, the exact sequence is not actually used as
evidence. Instead, DNA is cut up with an enzyme into restriction fragment
length polymorphisms (RFLPs). These pieces of DNA are then run out on a gel,
which separates them by segment size [8]. Accordingly, if two or more
people simply have restriction sites in approximately the same area, or even have
the same amounts of DNA between restriction sites, they can be mistaken for one
another. The probability of this is much higher than if the exact sequence were taken
into account. Thus, though misidentification is rare, the probability of
misidentification in DNA analysis is on the order of one in ten billion, while
according to our data that of fingerprint analysis is much lower [5].
VI. Conclusion
Initially, this problem aroused in us many concerns. What if one of us really had
a thumb Doppelganger? We could be convicted for crimes we had never committed!
This situation would be most unfortunate. However, after running our model under a case
of maximum probability, we discovered that there is a better chance of misidentification
through DNA profiling if the fingerprint analysis is conducted with minimal human error.
This is plainly evident in the fact that DNA evidence,
regarded in legal and public opinion as nearly infallible, has a probability of
misidentification on the order of 10^-10, while the probability of fingerprint misidentification is
four orders of magnitude less, according to our model. Needless to say, it seems
unreasonable to deny fingerprint profiling as evidence in a criminal trial.
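The order-of-magnitude comparison in this conclusion can be made concrete with a little arithmetic. The following is a minimal sketch and not the model used in this paper: the population figure is a hypothetical round number, the function names are illustrative, and treating every person as an independent trial with the same match probability is a simplifying assumption.

```python
import math

P_DNA = 1e-10            # order of magnitude quoted in the text for DNA profiling
ORDERS_LOWER = 4         # the fingerprint figure is four orders of magnitude lower
P_FINGERPRINT = P_DNA / 10 ** ORDERS_LOWER


def expected_matches(population, p):
    """Expected number of coincidental matches, assuming independent trials."""
    return population * p


def prob_at_least_one(population, p):
    """P(at least one coincidental match) = 1 - (1 - p)^population.

    log1p and expm1 keep the result accurate when p is tiny.
    """
    return -math.expm1(population * math.log1p(-p))


POPULATION = 6e9  # hypothetical population size, chosen only for illustration

for label, p in (("DNA", P_DNA), ("fingerprint", P_FINGERPRINT)):
    print(label, expected_matches(POPULATION, p), prob_at_least_one(POPULATION, p))
```

Under these assumptions, the expected number of coincidental matches scales linearly with the match probability, so a gap of four orders of magnitude in probability carries over directly to the expected counts.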
Appendix A: Shared Characteristics of a Population
The following plots were used to determine the optimum figure for identification
of criminals based on fingerprint evidence that is given in section IV.
[Plot: Number of like thumbprints, 0% dependence, N=1200. Vertical axis: number of people with thumbprint (logarithmic, 1.E-25 to 1.E+10). Horizontal axis: number of GCs (0 to 35). Series: US Most, World Most, China Most, Liechtenstein Most, Ever Most.]
Figure 5: Plot of the number of probable like thumbprints in a given country using the model of zero
percent pattern dependence. This shows that if only 10 minutiae are required to match, then it is likely that
no one in the history of the world has had an exactly matching whole thumbprint.
[Plot: Number of like thumbprints, 25% dependence, N=1200. Vertical axis: number of people with thumbprint (logarithmic, 1.00E-24 to 1.00E+11). Horizontal axis: number of GCs (0 to 35). Series: US Most, World Most, China Most, Liechtenstein Most, Ever Most.]
Figure 6: Same as above, for 25% dependence model. Here, only 10 minutiae are required for positive
identification as well.
[Plot: Number of like thumbprints, 50% dependence, N=1200. Vertical axis: number of people with thumbprint (logarithmic, 1.00E-21 to 1.00E+09). Horizontal axis: number of GCs (0 to 35). Series: US Most, World Most, China Most, Liechtenstein Most, Ever Most.]
Figure 7: Same as above, for the 50% pattern dependence model. Here, around 12 characteristics are
required for a highly probable identification. The difference here is likely caused by error in our knowledge
of pattern frequencies.
[Plot: Number of like thumbprints, 100% dependence, N=1200. Vertical axis: number of people with thumbprint (logarithmic, 1.00E-26 to 1.00E+09). Horizontal axis: number of GCs (0 to 35). Series: US Most, World Most, China Most, Liechtenstein Most, Ever Most.]
Figure 8: Same as above, for the complete dependence model. Again, only about 10 characteristics are
required for a positive identification.
Appendix B: Shared Partial Print Characteristics of a Population
The following plots were used to determine the optimum number of GCs to match
up within a given population if only partial prints are available for comparison.
[Plot: Number of like thumbprints, 0% dependence, N=600. Vertical axis: number of people with thumbprint (logarithmic, 1.00E-22 to 1.00E+13). Horizontal axis: number of GCs (0 to 35). Series: US Most, World Most, China Most, Liechtenstein Most, Ever Most.]
Figure 9: A plot of the number of possible like half-thumbprints, given zero dependence on fingerprint
pattern.
[Plot: Number of like thumbprints, 100% dependence, N=600. Vertical axis: number of people with thumbprint (logarithmic, 1.00E-22 to 1.00E+13). Horizontal axis: number of GCs (0 to 35). Series: US Most, World Most, China Most, Liechtenstein Most, Ever Most.]
Figure 10: A plot of the number of possible like half-thumbprints, given one hundred percent dependence
on fingerprint pattern.
[Plot: Number of like thumbprints, 0% dependence, N=300. Vertical axis: number of people with thumbprint (logarithmic, 1.00E-18 to 1.00E+12). Horizontal axis: number of GCs (1 to 30). Series: US Most, World Most, China Most, Liechtenstein Most, Ever Most.]
Figure 11: A plot of the number of possible like quarter-thumbprints, given zero dependence on fingerprint
pattern.
[Plot: Number of like thumbprints, 100% dependence, N=300. Vertical axis: number of people with thumbprint (logarithmic, 1.00E-18 to 1.00E+12). Horizontal axis: number of GCs (1 to 30). Series: US Most, World Most, China Most, Liechtenstein Most, Ever Most.]
Figure 12: A plot of the number of possible like quarter-thumbprints, given one hundred
percent dependence on fingerprint pattern.
Appendix C: Table of Calculated Probabilities
Pankanti et al. [6] calculated these probabilities using various past models. As
noted earlier, our model, which predicts a value less than 4 x 10^-15 for the probability of
each individual fingerprint, is in good agreement with these calculations.
Author                      Pfp                                    N=36, R=24, M=72    N=12, R=8, M=72
Galton (1892)               (1/16) x (1/256) x (1/2)^R             1.45x10^-11         9.54x10^-7
Pearson (1930)              (1/16) x (1/256) x (1/36)^R            1.09x10^-41         8.65x10^-17
Henry (1900)                (1/4)^(N+2)                            1.32x10^-23         3.72x10^-9
Balthazard (1911)           (1/4)^N                                2.12x10^-22         5.96x10^-8
Bose (1917)                 (1/4)^N                                2.12x10^-22         5.96x10^-8
Wentworth & Wilder (1918)   (1/50)^N                               6.87x10^-62         4.10x10^-21
Cummins & Midlo (1943)      (1/31) x (1/50)^N                      2.22x10^-63         1.32x10^-22
Gupta (1968)                (1/10) x (1/10) x (1/10)^N             1.00x10^-38         1.00x10^-14
Roxburgh (1933)             (1/1000) x (1.5 / (10 x 2.412))^N      3.75x10^-47         3.35x10^-18
Trauring (1963)             (0.1944)^N                             2.47x10^-26         2.91x10^-9
Osterburg et al. (1980)     (0.766)^(M-N) x (0.234)^N              1.33x10^-27         3.05x10^-15
Stoney (1985)               (N/5) x 0.6 x (0.5 x 10^-3)^(N-1)      1.2x10^-80          3.5x10^-26
TABLE 4: Calculated probabilities for various models, obtained from Pankanti et al. [6]. Here, Pfp is the
probability of a given fingerprint configuration, N is the number of minutiae (GCs) considered, R is the
number of regions of a fingerprint as defined by Galton, and M is the number of regions as defined by
Osterburg.
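As a cross-check, most of the closed-form expressions in Table 4 can be evaluated directly. The sketch below is not part of the original analysis. The names MODELS and evaluate are hypothetical, and the decimal constants in the Osterburg entry (0.766 and 0.234) are taken here to be those given by Pankanti et al. [6], which should be treated as an assumption. Stoney's model is left out of this sketch.

```python
# Illustrative evaluation of several closed-form models from Table 4.
# N = number of minutiae, R = Galton regions, M = Osterburg regions.
# MODELS and evaluate are hypothetical helper names, not taken from the paper.

MODELS = {
    "Galton (1892)": lambda N, R, M: (1 / 16) * (1 / 256) * (1 / 2) ** R,
    "Pearson (1930)": lambda N, R, M: (1 / 16) * (1 / 256) * (1 / 36) ** R,
    "Henry (1900)": lambda N, R, M: (1 / 4) ** (N + 2),
    "Balthazard (1911)": lambda N, R, M: (1 / 4) ** N,
    "Bose (1917)": lambda N, R, M: (1 / 4) ** N,
    "Wentworth & Wilder (1918)": lambda N, R, M: (1 / 50) ** N,
    "Cummins & Midlo (1943)": lambda N, R, M: (1 / 31) * (1 / 50) ** N,
    "Gupta (1968)": lambda N, R, M: (1 / 10) * (1 / 10) * (1 / 10) ** N,
    "Roxburgh (1933)": lambda N, R, M: (1 / 1000) * (1.5 / (10 * 2.412)) ** N,
    "Trauring (1963)": lambda N, R, M: 0.1944 ** N,
    # Assumed constants: 0.766 and 0.234 (see the lead-in above).
    "Osterburg et al. (1980)": lambda N, R, M: 0.766 ** (M - N) * 0.234 ** N,
}


def evaluate(N, R, M):
    """Return {author: probability} for one choice of N, R and M."""
    return {name: model(N, R, M) for name, model in MODELS.items()}


if __name__ == "__main__":
    # The two parameter sets used as column headings in Table 4.
    for N, R, M in ((36, 24, 72), (12, 8, 72)):
        print(f"N={N}, R={R}, M={M}")
        for name, p in evaluate(N, R, M).items():
            print(f"  {name:<28s} {p:.3g}")
```

Calling evaluate(36, 24, 72) and evaluate(12, 8, 72) gives one value per model for each of the two columns, which can then be compared against the tabulated figures.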
References
[1] J. Osterburg, et al., “Development of a Mathematical Formula for the Calculation of
Fingerprint Probabilities Based on Individual Characteristics”, Journal of the
American Statistical Association, Vol. 72, No. 360, pp. 772-778, 1977.
[2] S. L. Sclove, “The Occurrence of Fingerprint Characteristics as a Two Dimensional
Process”, Journal of the American Statistical Association, Vol. 74, No. 367, pp. 588-595,
1979.
[3] James F. Cowger, Friction Ridge Skin: Comparison and Identification of Fingerprints,
Elsevier Science Publishing Co. Inc., New York, New York, 1983.
[4] The Noble Qur’an: In the English Language, Dr. Muhammad Taqi-un-Din Al-Hilali.
Riyadh, Houston, Lahore: Darussalam Publishers and Distributors, 1998.
[5] “DNA Fingerprinting.” The Columbia Encyclopedia, Sixth Edition. New York:
Columbia University Press, 2003
[6] Sharath Pankanti, et al., “On the Individuality of Fingerprints”
http://biometrics.cse.msu.edu/2cvpr230.pdf
[7] Robert Epstein, “Fingerprints Meet Daubert: The Myth of Fingerprint ‘Science’ is
Revealed”, Southern California Law Review, Vol. 75, pp. 605-658, 2002.
[8] Anthony J. F. Griffiths, Modern Genetic Analysis, W. H. Freeman and Company,
New York, New York, 2002.