A GIS-Based Artificial Intelligence Clustering Algorithm to Detect Auto-Theft Recovery Patterns
Using Artificial Intelligence to Assist Law Enforcement

Kenneth Reynolds (1), Chief Ernest Scott (2), Ron Eaglin (3), Paul Pan (4), Olcay Kursun (3)
(1) Department of Criminal Justice, U. of Central Florida
(2) Orange County Sheriff’s Office
(3) School of Engineering and Computer Science, U. of Central Florida
(4) Project Dragon, Community Park, Cardiff

UCF

Research in Criminal Justice Dept at the University of Central Florida, Orlando, FL
• Collaboration with Orange County Sheriff’s Office and neighboring counties
• Data Collection, Format Standardization (separate parser for each agency)
• Distributed Data Querying and Mining
• Data Sharing among jurisdictions
• Eliminate duplicate efforts
• Create opportunities to suppress criminal activity occurring in multiple jurisdictions

Impact on the Media
• Channel 9 – News report on Auto-theft work
• Orlando Sentinel, “Sheriffs Discuss Statewide Database”, March 14, 2005
• Orlando Sentinel, “Database Connects Cops”, February 6, 2005
• Police Executive Research Forum, “A Grassroots Approach To Data Sharing”, February 2005
• Orlando Sentinel, “Feds Can Learn From Florida”, January 20, 2005
• Business Pipeline, “Florida Crime-fighters Teach Lesson in Business Intelligence”, January 27, 2005
• Information Week, “Florida Police Network Supports Post-9/11 Promises”, January 20, 2005
• Palm Beach Post, “Police, Sheriff To Test Data Net”, December 7, 2005
• Governing, “Scared To Share”, Fall 2004
• State Tech, “Improving The Odds”, Summer 2004
• Converge On-Line, “Data-Sharing System Fights Terrorism”, July 2004
• Daytona Beach News-Journal, “Data Sharing Gets Easier For Area Police Departments”, May 22, 2004
• Sarasota Herald Tribune, “Sheriffs’ Computer Links May Help Nail Multi-County Criminals”, April 15, 2004
• Capitol News Service, “Police Lobby Lawmakers for Better Communication”, April 15, 2004
• Government Technology, “Florida Data-Sharing System Helps Police Nab Suspects, Shorten Investigations”, April 6, 2004
• Associated Press, “Florida Legislature: House Proposes Additional $75 Million for Domestic Security”, March 23, 2004
• WESH-Channel 2, WFTV-Channel 9, and WKMG Channel 6 – stories in Spring 2004 and March 2005

The Need for the Technology
• Every 20 seconds a motor vehicle is stolen in the United States (FBI statistics)
• Insurance companies spend billions of dollars each year compensating owners of stolen vehicles (Insurance Information Institute statistics)
• Auto-theft investigators and the Auto-trap unit have identified the need in day-to-day operations for assistance in predicting trends and patterns of motor vehicle thefts
• A small percentage of criminals is responsible for a large percentage of the thefts
• Light punishment vs low chance of getting caught
• “Society-feel-good thing”

Key Statistics
2003 Theft Statistics:
• Every 25 seconds, a motor vehicle is stolen in the United States.
• The odds of a vehicle being stolen were 1 in 190 in 2003 (latest data available). The odds are highest in urban areas.
• U.S. motor vehicle thefts rose 1.1 percent in 2003 from 2002, according to the FBI's Uniform Crime Reports. In 2003, 1,260,471 motor vehicles were reported stolen.
• Nationwide, the 2003 motor vehicle theft rate per 100,000 people was 433.4, up 0.1 percent from 432.9 in 2002.
• Only 13.1 percent of thefts were cleared by arrests in 2003.
• The average comprehensive insurance premium in the U.S. rose 9.9 percent from 1998 to 2002, the most recent data available.

Key Statistics – Motor Vehicle Theft, Top Ten U.S. Metropolitan Areas, 2003

Rank  City               #Thefts
1     Modesto, CA          6,016
2     Phoenix-Mesa, AZ    40,769
3     Stockton-Lodi, CA    6,730
4     Las Vegas, NV-AZ    18,103
5     Sacramento, CA      17,054
6     Fresno, CA           9,102
7     Oakland, CA         23,199
8     Miami, FL           21,088
9     San Diego, CA       26,091
10    Detroit, MI         40,197

Goals of AI research in Auto-Theft
• Generate information needed to assist in prevention and suppression of criminal activity
• Detection of hot-spots of the thefts
• Detection of preferred drop locations
• Finding sets of related events
• Make full use of GIS by automation of the laborious tasks of the law enforcement officers

Tasks
• Visualize the incidents
• Cluster the events for summarization
• Predict drop locations for stolen vehicles
• Link analysis for suspect vehicles
• Make available for online use

Why Automation? Why Not Just Map?
[Figure: incident maps for Day 1, Day 2, Day 3, and Days 1, 2, 3 combined]

Simply Mapping Does NOT Help
• High number of spots, gets even worse over long periods of time
• Repeat-crime and one-time-crime confusion
• What is it that the user should look for on the map with so many dots?

Overview of our Expert System
• Similar to how experts do it
• Identify the commonalities among events
• Measure dissimilarity (distance) of events
• Much faster and more robust
• Some parameters must be determined

A simple demonstrative example: Data and Cluster analysis at a click

Zoom in for viewing individual clusters:

Zoom closer, measure distances, run queries, etc…

Cluster Analysis
• An effective method for determining areas with high concentrations of crime
• Suspiciously similar criminal activity
• For auto-theft, the number of potential targets is large
• Alternative: Capture the criminals at the place of the drop
• We need to define the concept of a cluster rather than how many clusters to find

How to Make Use of Clusters?
• Size and recency of events as a measure of cluster significance
• Assign police officers to patrol the most preferred drop locations
• Identify the most common features of auto-thefts for community warnings
• Further analyze these groups of related events by using additional non-numerical clues available in the narrative

How to Find Clusters?
• Choose & convert data to numerical format
• For example, addresses must be geocoded to produce planar or spherical coordinates
• Put close events (in feature space) in the same cluster
• Put distant events in different clusters
• Distance (dissimilarity) of two events is a weighted sum of differences of each data field

Finding Clusters: An Illustration
[Figure: sample map of recovery locations with points labeled N+, H*, N–, H*, N+]
For this sample map of recovery locations, suppose:
• N = Nissan
• H = Honda
• Three clusters denoted by the *, +, – symbols

Clustering Technique and the Parameters
• Distance measure, with weights w_i:

\[
\mathrm{Dist}_{X,Y} = \sum_i \left( w_i \cdot \frac{(X_i - Y_i)^2}{\sigma_i^2} \right)
\]

• Average Distance in the feature space
• Sensitivity: how much deviation from the average
• Upper-bound for data fields: embed extra knowledge into the analysis

Clustering Algorithm
Threshold = AverageDistance ⋅ Sensitivity
For each event N:
• For all clusters, find Di: the distance of N to the ith cluster
• Set D equal to Dm: the minimum of Di
• If D is not greater than Threshold then N belongs to the cluster m
• Otherwise, a new cluster C is created and N is placed into the cluster C

Simulation Dataset

Feature                                   Weight  Upper-bound
Make of the vehicle                       20.0    single make in a cluster
Year of the vehicle                       2.5     10 years
X, Y coordinate of the theft location     15.0    Not used
X, Y coordinate of the recovery location  15.0    Not used
Date of the theft                         10.0    60 days
Date of the recovery                      10.0    60 days

In our dataset, we have approximately 1000 auto-thefts in Orange County from 2002 to 2004.
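
The distance measure and clustering algorithm described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides do not define σ_i, how AverageDistance is computed, or how the distance from an event to a cluster is measured. Here σ_i is treated as a per-field scale, AverageDistance as the mean over all event pairs, and the distance to a cluster as the distance to its nearest member. Only numeric fields are handled; make and dates would first need a numeric encoding, and the upper-bounds are omitted. The `sigmas` and `sensitivity` values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def dist(x, y, weights, sigmas):
    """Dist(X, Y) = sum_i w_i * (X_i - Y_i)^2 / sigma_i^2 over numeric fields."""
    return sum(w * (a - b) ** 2 / s ** 2
               for a, b, w, s in zip(x, y, weights, sigmas))


def cluster(events, weights, sigmas, sensitivity):
    """Assign events to clusters one at a time; returns lists of event indices."""
    # AverageDistance: assumed here to be the mean over all event pairs.
    pairwise = [dist(a, b, weights, sigmas) for a, b in combinations(events, 2)]
    threshold = mean(pairwise) * sensitivity if pairwise else 0.0

    clusters = []
    for n, event in enumerate(events):
        # D_i: distance of event N to cluster i, assumed to be the distance
        # to that cluster's nearest member.
        d = [min(dist(event, events[m], weights, sigmas) for m in members)
             for members in clusters]
        if d and min(d) <= threshold:
            clusters[d.index(min(d))].append(n)   # N joins the closest cluster m
        else:
            clusters.append([n])                  # N starts a new cluster C
    return clusters


# Recovery X/Y of events 1, 13, 5, 6 and 7 from the sample data.
events = [(504071, 1540191), (503693, 1540456),
          (510511, 1541151), (511384, 1541098), (510519, 1541301)]
weights = (15.0, 15.0)      # recovery-location weight, applied to X and Y alike
sigmas = (1000.0, 1000.0)   # hypothetical scale, not from the slides
clusters = cluster(events, weights, sigmas, sensitivity=0.5)  # 0.5 is hypothetical
print(clusters)
```

Because events are processed in order and never reassigned, the result can depend on the order of the input; sorting by date first is one plausible choice for incident data.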
Event ID  Recovery X  Recovery Y  YEAR  MAKE   SDATE
1         504071      1540191     1994  HONDA  10/10/02
2         506029      1543959     1996  HONDA  10/11/02
3         501894      1544772     1997  HONDA  10/12/02
4         508456      1540932     1996  HONDA  10/15/02
5         510511      1541151     1994  HONDA  10/18/02
6         511384      1541098     1995  HONDA  10/19/02
7         510519      1541301     1997  HONDA  10/20/02
8         506034      1544827     1996  HONDA  10/23/02
9         511622      1542343     1994  HONDA  10/24/02
10        513009      1540875     1994  HONDA  10/25/02
11        510841      1543096     1997  HONDA  11/01/02
12        507674      1539989     1994  HONDA  11/04/02
13        503693      1540456     1994  HONDA  11/05/02
14        511984      1542603     1994  HONDA  11/08/02
15        514597      1541363     1994  HONDA  11/08/02
16        506053      1535721     1994  HONDA  11/08/02
17        511551      1540004     1995  HONDA  11/08/02
18        504396      1543033     1996  HONDA  11/08/02
19        514597      1541363     1994  HONDA  11/09/02
20        513492      1535572     1995  HONDA  11/15/02
21        513543      1540593     1997  HONDA  12/04/02

RECOVERED AT                 YEAR  MAKE   MODEL   SDATE     STOLEN FROM
6825 AMBASSADER DR           1994  HONDA  ACCORD  10/10/02  408 ORLANDO AV 2A
2812 N POWERS DR             1996  HONDA  ACCORD  10/11/02  913 CROWSNEST CI ATT AUTO THEFT ONLY
7367 BORDWINE AV             1997  HONDA  CIVIC   10/12/02  7367 BORDWINE AV
5818 HOLMES DR               1996  HONDA  ACCORD  10/15/02  6330 LK HORSE SHOE
VERANDA CI/INDIATLANTIC DR   1994  HONDA  ACCORD  10/18/02  3718 RUNDO DR
INDIATLANTIC@QUEENSWAY RD    1995  HONDA  ACCORD  10/19/02  6417 GAMBLE DR
OAKBRIDGE WY / ATRIUM CICLE  1997  HONDA  ACCORD  10/20/02  1331 N PINE HILLS RD
3024 N POWERS DR             1996  HONDA  ACCORD  10/23/02  ATT AUTO THEFT ONLY
2424 QUEENSWAY DR            1994  HONDA  ACCORD  10/24/02  2424 QUEENSWAY DR
4850 INDIATLANTIC DR         1994  HONDA  ACCORD  10/25/02  2904 SPRING HILL CT
5320 W SILVER STAR RD        1997  HONDA  ACCORD  11/01/02  5429 POINTE VISTA CI
2098 LEISURE DR              1994  HONDA  ACCORD  11/04/02  1511 PINELAKE RD
OWASSO CT / DERRICK DR       1994  HONDA  ACCORD  11/05/02  2225 PIPESTONE CT
5000 HOMESTEAD DR            1994  HONDA  ACCORD  11/08/02  4824 PAT ANN TERR
4515 CHARLEEN TERRACE        1994  HONDA  ACCORD  11/08/02  5416 PINTO WAY
POWERS DR / MOORE ST         1994  HONDA  ACCORD  11/08/02  6005 POWDER POST DR
5211 HERNANDES DR            1995  HONDA  ACCORD  11/08/02  N/A
6808 SILVER STAR RD          1996  HONDA  ACCORD  11/08/02  N/A
4515 CHARLEEN TERRACE        1994  HONDA  ACCORD  11/09/02  4325 CAROUSEL RD
4802 BURGANDY LANE           1995  HONDA  ACCORD  11/15/02  3980 VERSAILLES DR
4824 JUDY ANN CT             1997  HONDA  ACCORD  12/04/02  4557 FRISCO N/A

Data Mapping and Visualization
• Web Server (Microsoft IIS 4.0), ArcIMS and Oracle form a three-tier architecture that eases data visualization and publishing
• We use Oracle 8i to store and manage the data
• Each record is associated with a spatial column computed from X/Y coordinates
• Each X/Y coordinate pair is geocoded from the address field of each record with ArcIMS
• ArcIMS is an ESRI GIS product, which can be used for delivering dynamic maps and GIS data and services via the Web
• Another ESRI product, ArcSDE (a Spatial Data Engine), is used as a gateway to access data from ArcIMS

Data Issues
• Duplicate records
• Bad addresses (not geocodable: “Dean Rd.”)
• Auto-trap unit
• Unpopulated fields (location types)
• Preprocessing needs (for example in the make field: HON = HOND = HONDA)
• Natural Language Processing needs for the narrative section
• Tracking multi-jurisdictional incidents (thieves admit that they steal property in one jurisdiction and pawn it in another)

System Use
• Currently in use for evaluation and potential enhancements by the Criminal Investigation Division’s Auto Theft Unit of the Orange County Sheriff’s Office in Orlando, Florida.
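
The make-field preprocessing noted under Data Issues (HON = HOND = HONDA) could look roughly like the sketch below. It assumes a hand-maintained alias table; the name `MAKE_ALIASES` and the function `normalize_make` are hypothetical, and only the HONDA variants come from the slides.

```python
# Hypothetical alias table: maps abbreviations seen in agency records
# to one canonical make. Only the HONDA entries come from the slides.
MAKE_ALIASES = {
    "HON": "HONDA",
    "HOND": "HONDA",
    "HONDA": "HONDA",
}


def normalize_make(raw):
    """Return the canonical make for a raw field value.

    Unknown values are passed through trimmed and upper-cased so they
    can be reviewed and added to the alias table later.
    """
    key = raw.strip().upper()
    return MAKE_ALIASES.get(key, key)


print(normalize_make(" hond "))
```

Normalizing before clustering matters here because the Simulation Dataset restricts each cluster to a single make, so spelling variants would otherwise split one pattern into several clusters.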
Conclusions
• Developing GIS (geographic information systems) applications is slow and costly
• Worth having the geographical perspective of incidents within the community
• Eases the visualization of data, especially with animated maps
• Helps law enforcement officers discover the patterns of incidents and take necessary measures to prevent them

Future Directions
• Currently, the analysis can be done only offline (some manual preprocessing is needed)
• Need to develop the application in Java or .NET in order to make it usable online
• In the online versions, automatic warnings about the hot-spots can be given

Contact Us
UCF Partnership Building
12354 Research Parkway
Orlando, FL

K. Michael Reynolds
kreynold@mail.ucf.edu
407-823-2943

Ron Eaglin
reaglin@mail.ucf.edu
407-823-5937