An Evaluation of NIJ’s Evaluation Methodology for Geographic Profiling Software

D. Kim Rossmo
Research Professor and Director
Center for Geospatial Intelligence and Investigation
Department of Criminal Justice
Texas State University

March 9, 2005

Executive Summary

This is a response to the National Institute of Justice’s A Methodology for Evaluating Geographic Profiling Software: Final Report (Rich & Shively, 2004). The report contains certain errors, the most critical of which involves suggested performance measurements. Output accuracy is the single most important criterion for evaluating geographic profiling software. The report discusses various performance measures; unfortunately, only one of these (hit score percentage/search cost) accurately captures how police investigations actually use geographic profiling. This response addresses the various problems associated with the other measures.

Geographic profiling evaluation methodologies must respect the limitations and assumptions underlying geographic profiling, and accurately measure the actual function of a geographic profile. Geographic profiling assumes: (1) the case involves a series of at least five crimes; (2) the offender has a single stable anchor point; (3) the offender is using an appropriate hunting method; and (4) the target backcloth is reasonably uniform. Additionally, for various theoretical and methodological reasons, not all crime locations in a given series can be used in the analysis.

The most appropriate measure of geographic profiling performance is “hit score percentage/search cost.” It is the ratio of the area searched (following the geographic profiling prioritization) before the offender’s base is found, to the total hunting area; the smaller this ratio, the better the geoprofile’s focus. There are no intrinsic disadvantages to this measure.
The other evaluation measures discussed in the NIJ report are all linked to the problematic “error distance.” “Top profile area” is the ratio of the total area of the top profile region (which is not defined) to the total search area. It is not a measure by itself. “Profile error distance” is the distance from the offender’s base to the nearest point in the top profile region (undefined). “Profile accuracy” indicates whether the offender’s base is within the “top profile area” (undefined); it fundamentally misrepresents the prioritization nature of geographic profiling.

“Error distance” is the distance from the offender’s actual to predicted base of operations. While error distance is easily applied to centrographic measures, applying it to the complex probability surfaces produced by geographic profiling software requires reducing them to a single (usually the highest) point. Several researchers have unfortunately adopted this technique because of its simplicity. There are three major problems with error distance. First, it is linear, while the actual error is nonlinear. Area, rather than distance, is the relevant and required measure. Population (and therefore the number of suspects) increases with area, which is a function of the square of the radius (error distance). The second problem with error distance is that it is not a standardized measure, because of its sensitivity to scale. The third and most serious analytic problem with error distance is that it fails to capture how geographic profiling software actually works. Criminal hunting algorithms produce probability surfaces that outline an optimal search strategy. As an offender’s search is rarely uniformly concentric, simplifying a geoprofile to a single point from which to base an error distance is invalid. The use of error distance ignores most of the output from geographic profiling software and undermines the very mechanics of how the process functions.
A more comprehensive approach to evaluating geographic profiling as an investigative methodology needs to consider applicability and utility, as well as performance. Applicability refers to how often geographic profiling is an appropriate investigative methodology. Utility refers to how useful or helpful geographic profiling is in a police investigation. To evaluate geographic profiling properly requires analyzing only those cases and crimes appropriate for the technique, and measuring performance by mathematically sound methods. Hit score percentage/search cost is the only measure that meets NIJ’s standard of a “fair and rigorous methodology for evaluating geographic profiling software.”

Introduction

In January 2005, the National Institute of Justice (NIJ) released A Methodology for Evaluating Geographic Profiling Software: Final Report (Rich & Shively, 2004). While the intent of this document is laudable, it is necessary to respond to certain significant errors contained in the report. Some of these may be the result of the advisory expert panel not including professional geographic profilers (defined as police personnel whose full-time function involves geographic profiling), “customers” of geographic profiling (police investigators), or developers of geographic profiling software. A crime analyst (trained in geographic profiling analysis for property crime) was the sole law enforcement practitioner on the advisory panel.

The most critical error in the NIJ report involves suggested performance measurements. The expert panel correctly concluded that output accuracy – “the extent to which each software application accurately predicts the offender’s ‘base of operations’” (p. 14) – is the single most important criterion for evaluating geographic profiling software (p. 15). The report discusses various performance measures, providing short definitions, advantages, and disadvantages (p. 16).
Only one of these measures (hit score percentage/search cost), however, accurately captures how police investigations actually use geographic profiling. This response addresses the various problems associated with the other measures.

Background

Geographic profiling is a criminal investigative methodology that analyzes the locations of a connected crime series to determine the most probable area of offender residence. It is primarily used as a suspect prioritization and information management tool (Rossmo, 1992a, 2000). Geographic profiling was developed at Simon Fraser University’s School of Criminology, and first implemented in a law enforcement agency, the Vancouver Police Department, in 1995.1

Geographic profiling embraces a theory-based framework rooted in environmental criminology. Crime pattern (Brantingham & Brantingham, 1981, 1984, 1993), routine activity (Cohen & Felson, 1979; Felson, 2002), and rational choice (Clarke & Felson, 1993; Cornish & Clarke, 1986) theories provide the major foundations. While there are several techniques used by geographic profilers, the main tool is the Rigel software program, built in 1996 around the Criminal Geographic Targeting (CGT) algorithm developed at SFU in 1991.

After discussions in the mid-1990s with senior police executives and managers of the Vancouver Police Department (VPD) and the Royal Canadian Mounted Police (RCMP), it was concluded that several components would be necessary for the successful implementation of geographic profiling within the policing profession. These included: creating personnel selection, training, and testing standards; following mentoring and monitoring practices; developing usable and functional software; establishing case policies and procedures; identifying supporting investigative strategies; building awareness and knowledge in the customer (police investigator) community; and committing to evaluation, research, and improvement.

Over the course of the next few years these components were developed, first for major crime investigation, and then for property crime investigation. Personnel from various international police agencies were trained in geographic profiling. Their agencies signed memoranda of understanding agreeing to follow the established protocols, and to assist other police agencies needing investigative support. Training standards for geographic profilers were eventually adopted by the International Criminal Investigative Analysis Fellowship (ICIAF), an independent professional organization first started by the Federal Bureau of Investigation (FBI) in the 1980s.

For a geoprofile to be more than just a map, it must be integrated with specific strategies investigators can use. Examples of strategies identified for geographic profiling include: (1) suspect and tip prioritization; (2) database searches (e.g., police information systems, sex offender registries, motor vehicle registrations, etc.); (3) patrol saturation and surveillance; (4) neighborhood canvasses; (5) information mail outs; and (6) DNA dragnets. The level of resources required by these strategies is directly related to the size of the geographic area in which they are conducted.

While not used by professional geographic profilers, there are two derivative geographic profiling tools also mentioned in the NIJ report: NIJ’s own CrimeStat JTC (journey-to-crime) module; and The University of Liverpool’s Dragnet.

1. Examples of the use of spatial statistics for investigative support can be identified in the literature as far back as 1977. During the Hillside Stranglers investigation, the Los Angeles Police Department analyzed where the victims were abducted, their bodies dumped, and the distances between these locations, correctly identifying the area of offender residence (Gates & Shah, 1992). None of these isolated applications, however, led to sustained police use or organizational program implementation.
Both of these systems were developed in 1998. Neither is a commercial product, and training in their use, beyond software instruction manuals, is currently unavailable. Little is known about a fourth geographic profiling software program, Predator, first mentioned in 1998 on Maurice Godwin’s investigative psychology website.

Geographic Profiling Evaluation

There are three methods for testing the efficacy of geographic profiling software. The first uses Monte Carlo simulation techniques. These test the expected performance of the software on various point patterns representative of serial crime sites. The major advantage of this approach is the ability to generate large numbers of data cases (e.g., 10,000). The major disadvantage is the likelihood that the site generation algorithm’s underlying assumptions do not accurately reflect the geographic patterns of all serial crime cases. In addition, the supplementary information associated with an actual case that can help refine a geoprofile is not present.

The second and most common method of evaluating geographic profiling software performance involves examining solved cases. This technique has been used by Rossmo (1995a, 2000), Canter, Coffey, Huntley, and Missen (2000), Levine (2002), Snook, Taylor, and Bennell (2004), and Paulsen (2004). The major advantage of research using historical (cold) cases is that with sufficient effort a reasonably sized sample of cases can be collected. Disadvantages include sampling bias problems and the need for extensive data review.

The third method tracks geographic profiling performance in unsolved criminal investigations. This approach is the best of the three as it measures actual – not simulated – performance under field conditions (Rossmo, 2001). It also serves as a blind test, as the “answer” is not known at the time of the analysis. Monitoring actual case performance is slow, however, as it is necessary for a case to be solved before it can be included in the data sample.
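The Monte Carlo approach described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: a toy distance-decay scoring function on a square grid (not the proprietary CGT algorithm or any other published model), with synthetic crime sites scattered around a known anchor point; all names and parameter values are illustrative.

```python
import random
from math import hypot

def simulate_hit_score(n_crimes, grid=50, spread=5.0, seed=None):
    """One Monte Carlo trial: place an anchor point, scatter crime
    sites around it, score every grid cell with a toy distance-decay
    model, and return the hit score percentage (proportion of cells
    ranked ahead of the anchor cell, plus half the ties)."""
    rng = random.Random(seed)
    ax, ay = rng.uniform(12, 38), rng.uniform(12, 38)  # anchor point
    sites = [(rng.gauss(ax, spread), rng.gauss(ay, spread))
             for _ in range(n_crimes)]

    def score(x, y):  # toy decay: cells nearer the sites score higher
        return sum(1.0 / (1.0 + hypot(x - sx, y - sy)) for sx, sy in sites)

    cells = [score(col + 0.5, row + 0.5)
             for col in range(grid) for row in range(grid)]
    target = score(int(ax) + 0.5, int(ay) + 0.5)
    higher = sum(1 for v in cells if v > target)
    ties = sum(1 for v in cells if v == target)  # includes anchor cell
    return 100.0 * (higher + ties / 2) / len(cells)

# Averaged over many trials, the searched area shrinks as the number
# of crimes in the series grows, mirroring the pattern reported here.
mean_3 = sum(simulate_hit_score(3, seed=s) for s in range(120)) / 120
mean_10 = sum(simulate_hit_score(10, seed=s) for s in range(120)) / 120
```

The sketch also makes the stated disadvantage concrete: everything hinges on the site generation assumptions (here, a Gaussian scatter around the anchor), which may not reflect real serial crime patterns.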
Every trained geographic profiler is required to keep a case file that records the details of their work. The log includes fields for case number, sequential number, date, crime type, city, region, law enforcement agency, investigator, number of crimes, number of locations, type of analysis, report file name, case status, and result (when solved). This file has both administrative and research purposes. It was encouraging to see the NIJ report recommend the use of logs and journals by individuals involved in geographic profiling. However, considering how much there is to learn with any new police technology (especially in regards to investigator utility versus software performance), it seems more prudent for all users, and not just a sample, to keep detailed records.

Geographic Profiling Evaluation Methodology

Evaluation Premises

NIJ’s purpose was “to develop a fair and rigorous methodology for evaluating geographic profiling software” (p. 4), and their report identifies law enforcement officials as the key audience for the evaluation. With this in mind, the following premises are used as the basis for the discussion in this response. Any geographic profiling evaluation methodology should: follow the limitations and assumptions underlying geographic profiling; analyze exactly what the geographic profiling software produces, and not a simplification or generalization of its output; measure, as accurately as reasonably possible, the actual function of a geographic profile; use the highest level of measurement possible (i.e., ratio/interval/ordinal/nominal); and be based on validity and reliability concerns (and not on tangential factors such as “it is easier,” “it has been done that way before,” or “the software has limitations”).

It is tempting, in the effort to increase a study’s sample size, to collect cases from large databases derived from records management systems (RMS).
However, if the details of the crimes are overlooked, inappropriate series will be included: GIGO – garbage in, garbage out. Wilpen Gorr, Michael Maltz, and John Markovic have warned us of the importance of data integrity and specificity issues. “You really need to know the capacities and limitations of this less then [sic] perfect [crime] data before you dump it into a model” (John Markovic, International Association of Chiefs of Police, NIJ CrimeMap listserve, January 31, 2005).

To prepare a geographic profile properly involves first making sure the case does not violate any underlying assumptions. Furthermore, only those crime locations in the series that meet certain criteria can be used in the analysis. This is one of the reasons why a geoprofile requires anywhere from half a day for a property crime case to up to two weeks for a serial murder case. A significant portion of the geographic profiling training program is spent learning to understand these issues so the methodology is not improperly applied. These complexities are why testing, monitoring and mentoring, and review exist.

Geographic Profiling Assumptions

Any algorithm or mathematical function is only a model of the real world. The appropriateness and applicability of weather forecasting techniques, multiple linear regression, the spatial mean, or horserace odds are all premised on various assumptions. If those assumptions are violated, or if the processes of interest are not accurately replicated, the model has little value. Using atheoretical algorithms for police problems is tantamount to fast food crime analysis. There are four major theoretical and methodological assumptions required for geographic profiling (Rossmo, 2000):

1. The case involves a series of at least five crimes, committed by the same offender. The series should be relatively complete, and any missing crimes should not be spatially biased (such as might occur with a non-reporting police jurisdiction).

2. The offender has a single stable anchor point2 over the time period of the crimes.

3. The offender is using an appropriate hunting method.

4. The target backcloth is reasonably uniform.

Geographic profiling is fundamentally a probabilistic form of point pattern analysis. Every additional point (i.e., offense location) in a crime series adds information, and results in greater precision. A minimum of five crime locations is necessary for stable pattern detection and an acceptable level of investigative focus; the mean in operational cases has been 14 (Rossmo, 2000, 2001). Monte Carlo testing shows that with only three crimes the expected hit score percentage (defined below) is approximately 25%. By comparison, the expected search area drops to 5% with 10 crimes.3 The resolution of any method will be poor if tested on series of only a few crimes.

The NIJ evaluation methodology recommends analyzing cases with as few as three crimes in the series. While there may be some research interest in studying performance for small-number crime series, the report is supposed to lay out guidelines for evaluation methodologies. The document seems at times to be confused as to its role. Research and evaluation are separate processes. At a minimum, any research results should be reported separately, and it should be made clear they do not represent operational geographic profiling performance. For evaluation purposes, it is inappropriate to include cases that fall outside the recommended operational parameters. As small-number crime series are easier to obtain than large-number crime series, there is a risk they will inappropriately drive the findings. For example, the distribution for the number of crimes in Paulsen’s (2004) analysis was heavily skewed to small-number series. Of 150 cases, only 37 (25%) met the minimum specified requirement in geographic profiling of 5 crime locations – and 22 of those were on the borderline (6-7 crimes).
Only 15 cases (10%) involved more than 7 crimes.

If the offender is nomadic or transient, then there may not be a residence to locate. If the offender is constantly moving residence, then multiple anchor points could be involved in a single crime series, confusing the analysis, and possibly resulting in a violation of the first assumption. It should also be remembered that what constitutes a residence for a street criminal might vary from middle-class expectations. Two geoprofiled burglary cases illustrate this point. In the first, the “home” for a group of transient gypsies was a motel where they temporarily stayed while they committed their crimes. In the second, the homeless offender’s base was a bush in a vacant lot where he slept at night. It is important in geographic profiling to consider the details of the case, the timing of the crimes, and the nature of the area where the peak geoprofile is located. Like all investigative tools, it should be used intelligently. See the discussion below regarding applicability, performance, and utility, in the section General Methodological Comments.

2. The offender’s anchor point appeared to be their residence in the majority (about 85%) of the cases analyzed by professional geographic profilers. Other examples of anchor points include the offender’s workplace or (for students) school, past residence (if the move was recent), and surrogate residences (homes of family members and friends where the offender actually lives).

3. The benchmark is 50%, what you would expect from a uniform (i.e., non-prioritized) search.

Offender hunting method is defined as the search for, and attack on, a victim or target (Rossmo, 1997, 2000). Geographic profiling is inappropriate for certain search and attack methods.
The residence of a poacher (an offender commuting into an area to commit crimes) by definition will not be within the hunting area of the crimes (though he may be using some other anchor point, such as his workplace or a “fishing hole”). The NIJ methodology suggests that “commuters” be included in any evaluations. How that is to be done, however, is not made clear, as the report acknowledges CrimeStat and Dragnet are unable to handle this type of offender (nor can Rigel, as this is an assumption violation). Gorr (2004) presents some interesting and useful ideas for expanding geographic profiling systems to such cases, and these should be explored (perhaps with the addition of a directionality component). As discussed above, it is important to distinguish research from evaluation, and to report the results of each separately.

While most burglars identify targets during their routine activities in areas of familiarity, others watch for news of estate sales or use accomplices who read luggage nametags at airports. Stalkers (offenders who do not attack victims upon encounter) are also problematic. In one case example, the offenders in a series of armed robberies of elderly victims in Los Angeles went to hospitals and shopping malls, selected suitable victims, and then followed them home, where the robbery occurred. In this situation, the victims were choosing the crime sites – not the offenders. A geographic profile based on the robbery locations would therefore be wrong. Instead, the victim encounter sites (the shopping malls and hospitals) should be used, because these are the locations the robbers had control over.

All criminal profiling involves the inference of offender characteristics from offense characteristics. But this assumes freedom of offender choice; the more constrained the behavior, the less valid the inferences. A uniform (isotropic) target backcloth is thus a necessary assumption for geographic profiling.
Certain offenders hunt victims whose spatial opportunity structure is patchy. A predator attacking prostitutes is one such example; he has little freedom of spatial choice as he must hunt in red light districts where potential victims are located.

A final caution is necessary regarding what a geographic profile actually produces. It shows the most likely area for an offender’s anchor point (“base of operations”). While this is most often their residence, in other cases it may be their former residence, workplace, a freeway access point, or other significant activity site. This underlines why it is inappropriate to analyze cases without consideration of the underlying theory and environmental background. For example, the geoprofile in a series of bank robberies that occurred from 12:30 pm to 1:00 pm fell onto a commercially zoned section of the city. This led to the correct inference that the offender was committing the crimes during his lunch break; while the geoprofile said nothing about where he lived, it accurately pointed to where he worked. For testing purposes, the use of numerous anchor points as prediction sites can quickly produce tautological results. Absent significant a priori information (as in the above bank robbery case), it represents multiple “kicks at the can,” and performance results need to be statistically adjusted accordingly.

Crime Site Selection

The selection of which crimes to use in a geoprofile is an important process called scenario creation. For various theoretical and methodological reasons, not all crime locations4 should be used in the analysis. Independence of site selection is one issue. An arsonist walks down the street, sets fire to a dumpster, then walks around the corner and sets fire to another dumpster. We have two crimes, two addresses, two times, and potentially two victims. However, for geographic profiling purposes we would eliminate the second crime location, as it is not independent of the first location.
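The independence screen described above can be sketched as a simple filter. The thresholds and the (x, y, t) record layout here are illustrative assumptions, not operational geographic profiling rules; in practice this judgment is made by a trained profiler reviewing the case details.

```python
from math import hypot

def independent_sites(crimes, min_km=0.5, min_hours=6.0):
    """Screen a crime series for independence of site selection:
    drop a location that is too close, in both space and time, to
    the previously retained site. Thresholds are hypothetical."""
    kept = []
    for x, y, t in sorted(crimes, key=lambda c: c[2]):  # t in hours
        if kept:
            px, py, pt = kept[-1]
            if hypot(x - px, y - py) < min_km and (t - pt) < min_hours:
                continue  # e.g., the second dumpster around the corner
        kept.append((x, y, t))
    return kept
```

In the arson example, the second dumpster fire (a tenth of a kilometer and half an hour from the first) would be dropped, while a later fire across town would be retained.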
A second issue is spatial displacement. If the offender moves his or her area of operations because of police saturation patrolling, for instance, it would likely be inappropriate to combine post-displacement with pre-displacement crime locations.

NIJ Performance Measures

The NIJ report lists several geographic profiling performance measures in a table (p. 16), explaining “Individual panelists also presented what they felt was their ‘favorite’ output measure” (p. 9). Reviewing the transcript in the report’s appendix shows much debate amongst the members of the expert panel on this particular topic. Employing multiple performance definitions, unless they all reflect actual use conditions, is confusing and disadvantageous. Geographic profiling is an information management strategy, primarily used for suspect and area prioritization. The proper way to test its efficacy is to employ a method that simulates what happens in the real world. The hit score percentage (or search cost) is a measure that satisfactorily accomplishes this goal. Their “popularity” notwithstanding, the other measures discussed in the report do not. The problems with these measures are discussed below.

Hit Score/Search Cost

The NIJ report incorrectly states “there are no existing standard [sic] for measuring accuracy” (p. 16). The hit score percentage (also referred to as search cost) is such a standard, one that accurately and correctly captures geographic profiling performance. It was established in the early 1990s (Rossmo, 1993b), and is used by all professional geographic profilers to quantify geoprofile accuracy. The report lists two “disadvantages” associated with the hit score percentage, both of which are carefully scrutinized below. However, let us first closely examine what hit score percentage measures. The hit score percentage is a measure of geographic profiling search efficiency.
It is defined as the ratio of the area searched (following the geographic profiling prioritization) before the offender’s base is found, to the total hunting area; the smaller this ratio, the better the geoprofile’s focus. It is calculated by first adding the number of pixels with a hit score (likelihood value) higher than that of the pixel containing the offender’s residence to half the number of “ties” (pixels with the same hit score), and then dividing by the total number of pixels in the hunting area (40,000 for Rigel).

The hunting area is defined as the rectangular zone oriented along the street grid containing all crime locations. These locations may be victim encounter points, murder scenes, body dump sites, or some combination thereof. The term hunting area is therefore used broadly in the sense of the geographic region within which the offender chose – after some form of search or hunting process – a series of places for criminal action. Locations unknown to authorities, including those where the offender searched for victims or dump sites but was unsuccessful or chose not to act, are obviously not included…. A priori, we do not know where this hunting area is – we only know the locations of the reported, and connected, crimes. Technically, a geoprofile stretches to infinity; the hunting area is only a standardized method of displaying results so that important information is shown, and unimportant information is not. (Rossmo, 2000)

In the nomenclature of point pattern analysis, the crimes are events and the hunting area is the study region (Gatrell, Bailey, Diggle, & Rowlingson, 1996).

4. There are other issues for multiple location crimes, but these more typically occur in violent sexual offenses. A rape, for example, may involve separate encounter, attack, assault, and victim release sites (Rossmo, Davies, & Patrick, 2004).

Two factors influence the hit score.
First, the resolution of the pixel grid slightly influences precision, as more pixels result in more decimal places (e.g., 4.87% vs. 4.9%). Second, the size of the hunting area determines the denominator of this ratio. For example, a larger hunting area produces a larger denominator, thereby reducing the hit score percentage or search cost. Rigel uses a minimal bounding rectangle for its hunting area, with a small addition of a guard area for edge effects.5 Dragnet also employs a rectangular search area, defined by the offenses but magnified by 20%, displayed in a grid of 3,300 pixels (Canter et al., 2000).

While the NIJ report is correct in stating this measure is dependent upon how the search area is defined, this is not a disadvantage for the purposes of comparative evaluation. As long as the same method is used consistently for all analyses in the comparison, relative performance will be the same. When one ratio is divided by another with a common denominator, the denominator cancels out. Therefore, the hunting area can be defined as a rectangle, circle, convex hull polygon, or any other logical shape. As long as the same procedure is consistently used, numerator comparisons will be independent of its size.

The second disadvantage listed in the report for the hit score percentage is a claim that it is “Subject to severe changes in output display based on method of thematically mapping the output” (p. 16). NIJ Mapping and Public Safety (MAPS) personnel were queried to clarify the meaning of this. It turns out that this is not an inherent disadvantage of the hit score percentage/search cost measure, but rather a program-specific problem for NIJ’s CrimeStat software. Apparently, its output, when loaded into a geographic information system (GIS), can be dramatically altered depending on the thematic classification used. Nevertheless, as CrimeStat produces a “hit score map” (p.
5) this issue should be solvable (otherwise, the software could not prioritize a list of suspects or areas – the main function of geographic profiling).

The equivalent of hit score percentage or search cost can also be calculated for first-order centrographic statistics (e.g., spatial mean, spatial median, “eyeball” estimates, etc.). A search process from a single point estimate involves an outward spiral. The square of the error distance divided by the total area covered by the crimes provides the appropriate search performance measure. If this is being calculated on a grid, half the “ties” (the number of pixels located exactly the error distance away) must be added to the numerator.

5. Rigel calculates the boundaries of the hunting area by adding half the mean x and y interpoint distances to the most eastern and western, and northern and southern crime sites, respectively (Boots & Getis, 1988; Gatrell et al., 1996). The geoprofile is generated within the hunting area.

Previous Evaluations of Geographic Profiling Performance

Three reported studies have examined geographic profiling performance using hit score percentage or search cost. In research on 13 solved serial murder cases from the United States, Canada, and Great Britain, involving 15 serial murderers, 178 victims, and 347 crime locations, Rossmo (1995a) observed that the CGT algorithm located the offender’s residence in the top 6% of the hunting area. Canter et al. (2000) found similar results in their study of 70 U.S. serial killers using Dragnet. The mean search cost was 11%; 87% of the offenders’ home bases were found in the top 25% of the search area, 51% in the top 5%, and 15% in the top 1%. In 2001, a review was conducted of all now-solved (but initially unsolved) cases analyzed by full-time police geographic profilers (from the Vancouver Police Department, Royal Canadian Mounted Police, Ontario Provincial Police, and British National Crime Faculty).
The study combined the 39 hot cases with 31 cold cases, for a total of 70 serial crime cases, representing 1,426 offenses and 1,726 crime locations (Rossmo, 2001). These involved, in order of frequency, murder, rape, arson, robbery, and sexual assault. The hot cases came from police agencies in North America, Europe, and Africa, and occurred from 1991 to 2001. There was an average of 20 crimes (25 locations), and a median of 14 crimes (17 locations) per case. The mean hit score percentage was 4.7% (SD = 4.4%), and the median was 3.0%. The performance on hot cases was about 1% better than for the cold cases, likely the result of the extra information and analysis time involved.

Top Profile Area

The top profile area is defined as the ratio of the total area of the top profile region to the total search area. It is used with profile error distance. This is actually not a distinct performance measure (at least as explained in the report). It forms an integral part of “profile error distance” and “profile accuracy,” but does not stand alone as a performance measure. Apparently top profile area is similar to search cost, but the mathematical relationship is unclear. The “top profile region” is not defined in the NIJ report beyond the vague “the predicted most likely region containing the base of operations” (p. 15). Is this the top 10% of the area? The top five square miles? The top profile area is apparently a subjective and crude measure.

Profile Error Distance

This is defined as the distance from the offender’s base to the nearest point in the top profile region. The report claims it takes advantage of the whole profile, but this is incorrect. It only takes into account the “top profile area,” which is not defined. The profile error distance cannot distinguish between offender locations within the “top profile area” – they are either in or out of this region.
Moreover, for offender locations outside the “top profile area,” we are back to the use of a linear measure, with all the problems associated with error distance.

Profile Accuracy

This is defined as a dichotomous measure of whether the offender’s base is within the “top profile area,” and provides a simple indication of whether the geoprofile was “correct.” The very use of such terms as “correct” or “accurate” shows a fundamental misunderstanding of geographic profiling. This is like saying a golf drive from the tee was “wrong” because it only landed the ball on the green, and not in the hole.[6] Like grenades and slow dancing, close does matter here. “Top profile area” is also undefined.

Error Distance

Error distance is defined in the NIJ report as the distance from the offender’s actual to predicted base of operations. While it is easily applied to centrographic measures, the complex probability surfaces produced by geographic profiling software must be reduced to a single (usually the highest) point. Several researchers have unfortunately adopted this technique because of its simplicity (Paulsen, 2004; Levine, 2002; Snook, 2000, 2004; Snook, Canter, & Bennell, 2002; Snook et al., 2004). However, common usage is not synonymous with correct usage. There are three major problems with error distance in this context (Rossmo, in press). First, while error distance is linear, the actual error is nonlinear. Area, rather than distance, is the relevant and required measure. Population (and therefore suspects) increases with area size, which is a function of the square of the radius (error distance) – when distance doubles, all else being equal, area and population quadruple. Another way of looking at this is to imagine a database containing 1,000 suspects in a sexual homicide case with crime scene DNA evidence. How many of these suspects need to be tested before the offender is identified? This is the appropriate test of a geographic profile’s accuracy.
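The nonlinearity can be shown with a few lines of arithmetic (a toy illustration; the suspect-density figure is hypothetical and assumes suspects are spread uniformly over the area):

```python
import math

def suspects_within(d, density):
    """Expected suspect pool inside a circle of radius d (the error
    distance), given a uniform suspect density per unit area."""
    return density * math.pi * d ** 2

density = 50  # hypothetical suspects per square mile
for d in (1, 2, 4):
    print(f"error distance {d} mi: ~{suspects_within(d, density):.0f} suspects")

# Doubling the error distance quadruples the area -- and the suspect pool.
assert suspects_within(2, density) == 4 * suspects_within(1, density)
```

This is why a measure based on area, not distance, reflects the real investigative workload.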
Table 1 compares error distance (d) and search area (πd²) results for two hypothetical geographic profiling systems (A and B). Even though system B shows better performance in terms of error distance (5% less per case on average), it is significantly worse in terms of search area (20% more per case on average). Generally, the greater the relative standard deviation, the more problematic the use of error distance. This means the error distance underestimates the impact of poor performance, favoring geographic profiling systems with higher variability, even when they do not do as well as systems that are more consistent.

[6] If so desired, hit score percentage can be converted into binned results of desired area sizes or categories of performance (though this reduces a ratio-level measure to a nominal one).

Table 1. Error Distance versus Search Area.

                       System A                      System B
Case #      Error Distance  Search Area    Error Distance  Search Area
   1              2             12.57            2             12.57
   2              2             12.57            2             12.57
   3              6            113.10            7            153.94
   4              4             50.27            2             12.57
   5              4             50.27            3             28.27
   6              7            153.94            8            201.06
   7              3             28.27            1              3.14
   8              5             78.54            3             28.27
   9              5             78.54            1              3.14
  10              2             12.57            9            254.47
Mean           4.00             59.06         3.80             71.00
SD             1.76             47.46         3.01             94.61
Total         40.00            590.62        38.00            710.00

System B/A ratio (totals): error distance 0.95; search area 1.20.

The second problem with error distance is that it is not a standardized measure. While the algorithms used in Rigel and Dragnet are calibrated from point pattern statistics, and are therefore blind to the actual scale of the map, error distance is sensitive to scale. It has been reported in units ranging from millimeters to kilometers, precluding inter-study comparisons.[7] For example, in a burglary series covering 1.5 square miles in Carmel, California, an error distance of 2.1 miles is not very impressive; but it is in a robbery series covering 1,200 square miles in greater Los Angeles. Figures 1 and 2 illustrate this problem. Figure 1 displays 12 crime sites (red dots), and two offender residence scenarios (blue squares), all on a featureless background.
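The comparison in Table 1 can be verified directly (a short check script; the per-case error distances are taken from the table, and the search area follows the πd² reasoning above):

```python
import math
import statistics

# Error distances (d) per case for the two hypothetical systems in Table 1.
system_a = [2, 2, 6, 4, 4, 7, 3, 5, 5, 2]
system_b = [2, 2, 7, 2, 3, 8, 1, 3, 1, 9]

# A spiral search from a point estimate must sweep a circle of area pi * d^2.
areas_a = [math.pi * d ** 2 for d in system_a]
areas_b = [math.pi * d ** 2 for d in system_b]

print(f"Mean error distance  A: {statistics.mean(system_a):.2f}  B: {statistics.mean(system_b):.2f}")
print(f"Mean search area     A: {statistics.mean(areas_a):.2f}  B: {statistics.mean(areas_b):.2f}")
print(f"SD of search area    A: {statistics.stdev(areas_a):.2f}  B: {statistics.stdev(areas_b):.2f}")
print(f"B/A ratio, distance: {sum(system_b) / sum(system_a):.2f}")  # 0.95
print(f"B/A ratio, area:     {sum(areas_b) / sum(areas_a):.2f}")    # 1.20
```

System B "wins" on error distance yet loses badly on the area that actually has to be searched, reproducing the table's conclusion.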
The small black arrow in the upper left of the figure represents a short error distance of about 0.25 miles (good performance), while the green arrow represents a much less accurate error distance of about 2 miles (poor performance). Now compare these results to those from Figure 2. Again, there are 12 crime sites, but this time over a much larger scale. The error distance represented by the black arrow is 5 miles, while the error distance represented by the green arrow is 25 miles. Given the provided information (the point pattern), the geoprofile’s match to the offender’s residence is still very good, yet the error distance does not capture this. Furthermore, the results from the two cases are not comparable.[8] Finally, averaging results in this manner is particularly problematic because error distances from crime series occurring over large scales overwhelm error distances from crime series occurring on smaller scales.

[7] For example, both Paulsen (2004) and Snook et al. (2004) used the negative exponential function in CrimeStat for their research. Paulsen reports a mean error distance of 6.176 kilometers, while Snook et al. report 55.9 millimeters.

[8] Expressing the error distance as a proportion of the standard distance or the mean of the sum of all k-order nearest neighbor distances would be one way around this problem.

Figure 1. Crime Sites, GeoProfile, and Error Distance (1 Mile Scale).

Figure 2. Crime Sites, GeoProfile, and Error Distance (10 Mile Scale).

The third and most serious analytic problem with error distance is that it fails to capture how geographic profiling software actually works. Criminal hunting algorithms produce probability surfaces that look like colored topographic maps (see Figures 1, 2, 3, or Exhibit 1.1 in the NIJ report, p. 2). These geoprofiles outline the optimal search strategy. As an offender’s search is rarely uniformly concentric, simplifying a geoprofile to a single point from which to base an error distance is invalid.
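The normalization suggested in footnote 8 — expressing error distance as a proportion of the point pattern's standard distance — makes the scale dependence drop out, as a small sketch shows (the coordinates and error distances here are hypothetical):

```python
import math

def standard_distance(points):
    """Standard distance of a point pattern: the root-mean-square distance
    of the crime sites from their spatial mean (see Boots & Getis, 1988)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    return math.sqrt(sum((x - mx) ** 2 + (y - my) ** 2 for x, y in points) / n)

# The same crime point pattern at two map scales (the second is 10x the first).
pattern = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 2.0)]
scaled = [(10 * x, 10 * y) for x, y in pattern]

err_small, err_large = 0.4, 4.0  # hypothetical raw error distances
ratio_small = err_small / standard_distance(pattern)
ratio_large = err_large / standard_distance(scaled)
print(ratio_small, ratio_large)  # identical ratios: the map scale drops out
```

The raw error distances differ by a factor of ten, but the standardized ratios are identical, allowing inter-study comparison.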
The use of error distance ignores most of the output from geographic profiling software and in effect undermines the very mechanics of how the process functions. Error distance simply does not accurately measure how such systems operate in actual practice. Figure 3 shows an example of this problem. It shows a geoprofile for a series of 11 sexual assaults that occurred in Mississauga, Ontario. Nine locations were used in the analysis. It was prepared by the Vancouver Police Department’s Geographic Profiling Section at the request of the Peel Regional Police prior to the case being solved. The top 4% of the geoprofile is displayed. The crime sites are indicated by red circles, and the offender’s residence by a blue square. The geoprofile was used, amongst other criteria (e.g., uncorroborated alibi, similarity to a composite sketch, etc.), to help investigators prioritize an initial list of 312 suspects for DNA testing. The individual ultimately convicted of the crime series was ranked sixth out of 312 based on the geoprofile alone (he actually scored higher because of the other criteria).

Figure 3. Sexual Assault GeoProfile.

The offender resided in the top 2.2% of the hunting area of the original geoprofile, and was in the top 1.9% of the list of suspects.[9] But using error distance (0.50 miles, shown as an arrow in Figure 3) places the offender in the top 53.8% of the hunting area – a proportion almost 25 times as large.

[9] It is worth mentioning that the number of suspects generated in a criminal investigation is not strictly a function of the geographic area involved. Equally important factors include the type of crime, number of offenses, length of investigation, volume of police resources, number of jurisdictions, and levels of public fear and media interest. A rape series covering five square miles is likely to have more suspects than a burglary series covering 10 square miles.
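The "almost 25 times" figure follows directly from the reported percentages (simple verification; no assumptions beyond the numbers given in the text):

```python
# Mississauga series: proportion of the hunting area that must be searched.
hit_score_pct = 2.2        # geoprofile places the offender in the top 2.2%
error_distance_pct = 53.8  # area implied by the 0.50-mile error distance

print(f"{error_distance_pct / hit_score_pct:.1f}x larger")  # 24.5x larger

# The suspect-list result is consistent: the offender ranked 6th of 312.
print(f"top {6 / 312:.1%} of the suspect list")  # top 1.9% of the suspect list
```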
Error distance does not even closely measure the performance of this particular geoprofile. This same problem with error distance can be seen with the geographic profile displayed in Exhibit 1.1 (p. 2) of the NIJ report. A careful inspection of the jeopardy surface and the geoprofile in the inset shows two separate peak areas. The highest point of the geoprofile is in the upper peak area, but the offender’s residence was in the lower peak (his probation office was in the upper peak). The error distance in this case does not represent the geoprofile’s search efficiency. Generally, the extent to which the peak area of any geoprofile deviates in shape from a perfect circle is related to the degree that the spatial distribution of crime sites is distorted from the concentric. This is typically the result of an asymmetrical offender activity space and a less than uniform (anisotropic) target backcloth.

General Methodological Comments

The report contains discussion of both evaluation methodology and research initiatives for geographic profiling. There needs to be a clear distinction between the two – though having said that, there are several excellent research ideas in the report. For example, varying the values of different input parameters and conducting sensitivity analyses (p. 8) – provided good datasets are used – might fine-tune applications to environmental conditions. Deleting non-inhabitable areas from the analysis would produce more accurate search cost estimates. Randomly dropping crimes (p. 17) from series could help determine robustness, though crimes should be eliminated both randomly, to simulate non-reporting or linkage failures, and in reverse chronological order, to test performance over series progression (Brantingham & Brantingham, 1981). A caveat is required here. There are three desirable qualities in model building: (1) realism; (2) generality; and (3) precision (Levins, 1966; see also Winterhalder & Smith, 1992, p. 13).
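The crime-dropping robustness test described above might be outlined as follows (a minimal sketch, not the report's procedure; `profiler` stands in for any scoring function mapping a crime series to a search cost, and all names are hypothetical):

```python
import random

def robustness_trials(crimes, profiler, n_trials=100, drop=1, seed=0):
    """Rerun a profiler with crimes removed two ways: at random (simulating
    non-reporting or linkage failure) and by chronological truncation
    (testing performance as the series progresses).  Returns the
    search-cost scores from both schemes."""
    rng = random.Random(seed)
    random_scores = [
        profiler(rng.sample(crimes, len(crimes) - drop)) for _ in range(n_trials)
    ]
    # Geographic profiling assumes a series of at least five crimes,
    # so chronological truncation starts at five.
    chronological_scores = [profiler(crimes[:k]) for k in range(5, len(crimes) + 1)]
    return random_scores, chronological_scores

# Toy run with a placeholder profiler that just reports 1 / series length.
crimes = list(range(12))  # stand-in for 12 chronologically ordered crimes
placeholder = lambda series: 1.0 / len(series)
rand, chron = robustness_trials(crimes, placeholder, n_trials=10)
print(len(rand), len(chron))  # 10 8
```

The spread of `random_scores` indicates sensitivity to missing crimes, while the trend in `chronological_scores` shows how early in a series the profile becomes useful.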
Tradeoffs are usually required as improvements in one model quality generally result in the deterioration of one or both of the other qualities. Greater precision, for example, might well require a loss of robustness. Striking the optimal balance will be an issue for any efforts to improve geographic profiling models. CrimeStat allows the user to select from multiple distance-decay functions (five, plus any specifically calibrated functions). Only one of its functions should be chosen a priori for evaluation purposes. Communication with some of CrimeStat’s 1,500 law enforcement users (p. 11) should help NIJ narrow down what works best. Testing on more than one function is equivalent to allowing for repeated tries, and performance results would have to be statistically adjusted downwards. Any testing of CrimeStat’s jurisdiction- and crime-specific calibrated function obviously requires separate learning and testing datasets.[10] Evaluation of the costs associated with a geographic profiling system (e.g., CrimeStat) should include all federal funding grants and other public taxpayers’ dollars.

[10] See Canter and Gregory (1994) for an egregious example of this type of error. After calibrating the parameters for a multivariable geographic profiling system for stranger rape, the model was tested on its learning data. Not surprisingly, it achieved a 100% “success” rate.

Applicability, Performance, and Utility

The evaluation of geographic profiling as an investigative methodology is not addressed in the NIJ report, which focused strictly on associated software performance. Only a part of the process of geographic profiling (and not the most important) is therefore discussed. A more comprehensive approach needs to consider applicability, performance, and utility (APU). Applicability refers to how often geographic profiling is an appropriate investigative methodology.
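The requirement for separate learning and testing datasets amounts to a simple holdout split, so a calibrated function is never scored against the cases used to fit it (a minimal sketch; the function name, holdout fraction, and case labels are hypothetical):

```python
import random

def split_cases(cases, test_fraction=0.3, seed=42):
    """Partition solved cases into disjoint learning and testing sets;
    calibration uses only the first, evaluation only the second."""
    rng = random.Random(seed)
    shuffled = cases[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

cases = [f"case_{i:02d}" for i in range(20)]
learning, testing = split_cases(cases)
assert not set(learning) & set(testing)  # disjoint by construction
print(len(learning), len(testing))       # 14 6
```

Evaluating on the held-out set avoids the inflated "100% success" that results from testing a model on its own learning data.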
It would be beneficial to know the frequency of serial crime cases that meet the criteria and assumptions for geographic profiling, and the ability of law enforcement agencies to recognize such patterns. Utility refers to how useful or helpful geographic profiling is in a police investigation. Geographic profiling only plays a supporting role as it cannot directly solve a crime – only a witness, confession, or physical evidence can do that (Klockars & Mastrofski, 1991). How it plays that role, how often, how well, and how it compares to other available offender location techniques are important research questions. Let us consider an APU evaluation in the context of the anchor point/base/residence discussion. We will use the example of a homeless sex offender who has molested several women. This case does not fit the criteria for geographic profiling because there is no anchor point to geoprofile. But what if the offender lived in his car, which was always parked in the same location? Now we have applicability – a stable offender anchor point. Next, let us assume the geoprofile locates the offender’s car in the top 4% of the five square mile hunting area, so we have reasonably good performance. However, this does not mean we have utility. There is no relevant database for investigators to search in this situation. On the other hand, if the victims reported the molester was unkempt in appearance and accurately described his vehicle and contents, an astute detective may conclude the offender is living out of his car and direct patrol search efforts accordingly. Now we have utility. Alternatively, a geoprofile could accurately predict the residence of a rapist, but lose utility because the police jurisdiction did not keep its sex offender registry updated. An argument could be made in this situation for potential utility, as the problem area is the record keeping system, and other law enforcement agencies may have better maintenance. 
Finally, for an APU assessment to have meaning, it must be considered within a cost/benefit framework and compared to similar analyses for other investigative techniques (e.g., fingerprinting, canvassing, DNA, tip lines, behavioral profiling, etc.).

Conclusion

    We must be absolutely certain that the way in which we think we are empirically evaluating our research question is the way in which we actually are evaluating the question. That is, the statistical model must maximally correspond to the theoretical model, and the degree to which these diverge correspondingly threatens the validity of the empirically based inferences we can draw … (Curran & Willoughby, 2003, pp. 581-582)

Replication is an integral and critical part of the scientific process. However, it must be done properly. In the case of geographic profiling, this involves analyzing those cases and crimes, and only those cases and crimes, appropriate for the technique, and measuring performance by mathematically sound methods. The hit score percentage/search cost measure accurately captures how these systems work in the real world, while possessing no intrinsic disadvantages. Error distance, top profile area, and their other associated measures do not reflect actual performance, and do not meet NIJ’s standard of a “fair and rigorous methodology for evaluating geographic profiling software.”

References

Boots, B. N., & Getis, A. (1988). Point pattern analysis. Sage university paper series on scientific geography, 8. Beverly Hills: Sage.
Brantingham, P. J., & Brantingham, P. L. (Eds.). (1981). Environmental criminology. Beverly Hills: Sage.
Brantingham, P. J., & Brantingham, P. L. (1984). Patterns in crime. New York: Macmillan.
Brantingham, P. L., & Brantingham, P. J. (1981). Notes on the geometry of crime. In P. J. Brantingham & P. L. Brantingham (Eds.), Environmental criminology (pp. 27-54). Beverly Hills: Sage.
Brantingham, P. L., & Brantingham, P. J. (1993). Environment, routine and situation: Toward a pattern theory of crime. In R. V. Clarke & M. Felson (Eds.), Routine activity and rational choice (pp. 259-294). New Brunswick, NJ: Transaction.
Canter, D. V., Coffey, T., Huntley, M., & Missen, C. (2000). Predicting serial killers’ home base using a decision support system. Journal of Quantitative Criminology, 16, 457-478.
Canter, D. V., & Gregory, A. (1994). Identifying the residential location of rapists. Journal of the Forensic Science Society, 34, 169-175.
Canter, D. V., & Larkin, P. (1993). The environmental range of serial rapists. Journal of Environmental Psychology, 13, 63-69.
Canter, D. V., & Snook, B. (1999, December). Modelling the home location of serial offenders. Paper presented at the meeting of the Crime Mapping Research Center, Orlando, FL.
Clarke, R. V., & Felson, M. (Eds.). (1993). Routine activity and rational choice. New Brunswick, NJ: Transaction.
Cohen, L., & Felson, M. (1979). Social change and crime rate trends: A routine activity approach. American Sociological Review, 44, 588-608.
Cornish, D. B., & Clarke, R. V. (Eds.). (1986). The reasoning criminal: Rational choice perspectives on offending. New York: Springer-Verlag.
Curran, P. J., & Willoughby, M. T. (2003). Implications of latent trajectory models for the study of developmental psychopathology. Development and Psychopathology, 15, 581-612.
Felson, M. (2002). Crime and everyday life (3rd ed.). Thousand Oaks, CA: Sage.
Gates, D. F., & Shah, D. K. (1992). Chief. New York: Bantam Books.
Gatrell, A. C., Bailey, T. C., Diggle, P. J., & Rowlingson, B. S. (1996). Spatial point pattern analysis and its application in geographical epidemiology. Transactions of the Institute of British Geographers, 21, 256-274.
Gorr, W. L. (2004). Framework for validating geographic profiling using samples of solved serial crimes. Unpublished manuscript, Carnegie Mellon University, H. J. Heinz III School of Public Policy and Management, Pittsburgh, PA.
Harries, K. (1999). Mapping crime: Principle and practice (NCJ 178919). Washington, DC: National Institute of Justice.
Klockars, C. B., & Mastrofski, S. D. (Eds.). (1991). Thinking about police: Contemporary readings (2nd ed.). New York: McGraw-Hill.
Levine, N. (2002). CrimeStat: A spatial statistics program for the analysis of crime incident locations (v. 2.0). Washington, DC: National Institute of Justice.
Levins, R. (1966). The strategy of model-building in population biology. American Scientist, 54, 421-431.
MacKay, R. E. (1999, December). Geographic profiling: A new tool for law enforcement. The Police Chief, pp. 51-59.
Paulsen, D. J. (2004, March). Geographic profiling: Hype or hope? – Preliminary results into the accuracy of geographic profiling software. Paper presented at the UK Crime Mapping Conference, London, UK.
Paulsen, D. J., & Robinson, M. B. (2004). Spatial aspects of crime: Theory and practice. Boston: Allyn and Bacon.
Rich, T., & Shively, M. (2004). A methodology for evaluating geographic profiling software. Cambridge, MA: Abt Associates.
Rossmo, D. K. (1992a, April). Targeting victims: Serial killers and the urban environment. Paper presented at the First International Conference on Serial and Mass Murder: Theory, Research and Policy, Windsor, ON.
Rossmo, D. K. (1992b, November). Target patterns of serial murderers: A methodological model. Paper presented at the meeting of the American Society of Criminology, New Orleans, LA.
Rossmo, D. K. (1993a). Geographic profiling: Locating serial killers. In D. Zahm & P. F. Cromwell (Eds.), Proceedings of the International Seminar on Environmental Criminology and Crime Analysis (pp. 14-29). Coral Gables, FL: Florida Criminal Justice Executive Institute.
Rossmo, D. K. (1993b). Multivariate spatial profiles as a tool in crime investigation. In C. R. Block & M. Dabdoub (Eds.), Workshop on crime analysis through computer mapping; Proceedings: 1993 (pp. 89-126). Chicago: Illinois Criminal Justice Information Authority.
Rossmo, D. K. (1993c). Target patterns of serial murderers: A methodological model. American Journal of Criminal Justice, 17(2), 1-21.
Rossmo, D. K. (1995a). Geographic profiling: Target patterns of serial murderers. Unpublished doctoral dissertation, Simon Fraser University, Burnaby, BC.
Rossmo, D. K. (1995b). Place, space, and police investigations: Hunting serial violent criminals. In J. E. Eck & D. A. Weisburd (Eds.), Crime and place: Crime prevention studies, Vol. 4 (pp. 217-235). Monsey, NY: Criminal Justice Press.
Rossmo, D. K. (1997). Geographic profiling. In J. L. Jackson & D. A. Bekerian (Eds.), Offender profiling: Theory, research and practice (pp. 159-175). Chichester: John Wiley & Sons.
Rossmo, D. K. (2000). Geographic profiling. Boca Raton, FL: CRC Press.
Rossmo, D. K. (2001, November). Evaluation of geographic profiling search strategies. Paper presented at the meeting of the American Society of Criminology, Atlanta, GA.
Rossmo, D. K. (in press). Geographic heuristics or shortcuts to failure?: Response to Snook et al. Applied Cognitive Psychology.
Rossmo, D. K., Davies, A., & Patrick, M. (2004). Exploring the geo-demographic and distance relationships between stranger rapists and their offences (Special Interest Series: Paper 16). London: Research, Development and Statistics Directorate, Home Office.
Smith, E. A., & Winterhalder, B. (Eds.). (1992). Evolutionary ecology and human behavior. New York: Aldine de Gruyter.
Snook, B. (2000, December). Utility or futility? A provisional examination of the utility of a geographical decision support system. Paper presented at the meeting of the Crime Mapping Research Center, San Diego, CA.
Snook, B. (2004). Individual differences in distance travelled by serial burglars. Journal of Investigative Psychology and Offender Profiling, 1, 53-66.
Snook, B., Canter, D. V., & Bennell, C. (2002). Predicting the home location of serial offenders: A preliminary comparison of the accuracy of human judges with a geographic profiling system. Behavioral Sciences and the Law, 20, 109-118.
Snook, B., Taylor, P. J., & Bennell, C. (2004). Geographic profiling: The fast, frugal, and accurate way. Applied Cognitive Psychology, 18, 105-121.
Winterhalder, B., & Smith, E. A. (1992). Evolutionary ecology and the social sciences. In E. A. Smith & B. Winterhalder (Eds.), Evolutionary ecology and human behavior (pp. 3-23). New York: Aldine de Gruyter.