A GIS Based Artificial Intelligence Clustering Algorithm to Detect Auto-Theft Recovery Patterns

Using Artificial Intelligence to Assist Law Enforcement
Kenneth Reynolds (1), Chief Ernest Scott (2), Ron Eaglin (3), Paul Pan (4), Olcay Kursun (3)
(1) Department of Criminal Justice, U. of Central Florida
(2) Orange County Sheriff’s Office
(3) School of Engineering and Computer Science, U. of Central Florida
(4) Project Dragon, Community Park, Cardiff
UCF
Research in Criminal Justice Dept at the University of Central Florida, Orlando, FL
• Collaboration with Orange County Sheriff’s Office and neighboring counties
• Data Collection, Format Standardization (separate parser for each agency)
• Distributed Data Querying and Mining
• Data Sharing among jurisdictions
  - Eliminate duplicate efforts
  - Create opportunities to suppress criminal activity occurring in multiple jurisdictions
Impact on the Media
Channel 9 – News report on Auto-theft work
Orlando Sentinel, “Sheriffs Discuss Statewide Database”, March 14, 2005
Orlando Sentinel, “Database Connects Cops”, February 6, 2005
Police Executive Research Forum, “A Grassroots Approach To Data Sharing”, February 2005
Orlando Sentinel, “Feds Can Learn From Florida”, January 20, 2005
Business Pipeline, “Florida Crime-fighters Teach Lesson in Business Intelligence”, January 27, 2005
Information Week, “Florida Police Network Supports Post-9/11 Promises”, January 20, 2005
Palm Beach Post, “Police, Sheriff To Test Data Net”, December 7, 2005
Governing, “Scared To Share”, Fall 2004
State Tech, “Improving The Odds”, Summer 2004
Converge On-Line, “Data-Sharing System Fights Terrorism”, July 2004
Daytona Beach News-Journal, “Data Sharing Gets Easier For Area Police Departments”, May 22, 2004
Sarasota Herald Tribune, “Sheriffs’ Computer Links May Help Nail Multi-County Criminals”, April 15, 2004
Capitol News Service, “Police Lobby Lawmakers for Better Communication”, April 15, 2004
Government Technology, “Florida Data-Sharing System Helps Police Nab Suspects, Shorten Investigations”, April 6, 2004
Associated Press, “Florida Legislature: House Proposes Additional $75 Million for Domestic Security”, March 23, 2004
WESH-Channel 2, WFTV-Channel 9, and WKMG Channel 6 – stories in Spring 2004 and March 2005
The Need for the Technology
• Every 20 seconds a motor vehicle is stolen in the United States (FBI statistics)
• Insurance companies spend billions of dollars each year compensating owners of stolen vehicles (Insurance Information Institute statistics)
• Auto-theft investigators and Auto-trap unit have identified the need in day-to-day operations for assistance in predicting trends and patterns of motor vehicle thefts
• Small percentage of criminals are responsible for a big percentage of the thefts
• Light punishment vs low chance of getting caught
• “Society-feel-good thing”
Key Statistics
• 2003 Theft Statistics: Every 25 seconds, a motor vehicle is stolen in the United States. The odds of a vehicle being stolen were 1 in 190 in 2003 (latest data available). The odds are highest in urban areas.
• U.S. motor vehicle thefts rose 1.1 percent in 2003 from 2002, according to the FBI's Uniform Crime Reports. In 2003, 1,260,471 motor vehicles were reported stolen.
• Nationwide, the 2003 motor vehicle theft rate per 100,000 people was 433.4, up 0.1 percent from 432.9 in 2002.
• Only 13.1 percent of thefts were cleared by arrests in 2003.
• The average comprehensive insurance premium in the U.S. rose 9.9 percent from 1998 to 2002, the most recent data available.
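
As a quick consistency check, the 2003 theft count quoted on this slide can be converted into an interval between thefts, and the change in the theft rate can be recomputed from the two quoted rates. This is only an arithmetic sketch: it assumes a 365-day year, and the variable names are illustrative.

```python
# Arithmetic check of the 2003 statistics quoted on this slide.
# Assumption: a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

thefts_2003 = 1_260_471               # vehicles reported stolen in 2003
rate_2003, rate_2002 = 433.4, 432.9   # thefts per 100,000 people

seconds_per_theft = SECONDS_PER_YEAR / thefts_2003
rate_change_pct = (rate_2003 - rate_2002) / rate_2002 * 100

print(f"one theft every {seconds_per_theft:.1f} seconds")   # 25.0
print(f"theft rate change: {rate_change_pct:.1f} percent")  # 0.1
```

Both results agree with the rounded values on the slide (about 25 seconds between thefts and a 0.1 percent rise in the rate).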
Key Statistics – Motor Vehicle Theft, Top Ten U.S. Metropolitan Areas, 2003
Rank  Metropolitan Area    Thefts
1     Modesto, CA           6,016
2     Phoenix-Mesa, AZ     40,769
3     Stockton-Lodi, CA     6,730
4     Las Vegas, NV-AZ     18,103
5     Sacramento, CA       17,054
6     Fresno, CA            9,102
7     Oakland, CA          23,199
8     Miami, FL            21,088
9     San Diego, CA        26,091
10    Detroit, MI          40,197
Goals of AI research in Auto-Theft
• Generate information needed to assist in prevention and suppression of criminal activity
• Detection of hot-spots of the thefts
• Detection of preferred drop locations
• Finding sets of related events
• Make full use of GIS by automation of the laborious tasks of the law enforcement officers
Tasks
• Visualize the incidents
• Cluster the events for summarization
• Predict drop locations for stolen vehicles
• Link analysis for suspect vehicles
• Make available for online use
Why Automation? Why Not Just Map?
[Figure: incident map panels labeled DAY 1, DAY 2, DAY 3, and DAYS 1, 2, 3]
Simply Mapping Does NOT Help
• High number of spots, gets even worse over long periods of time
• Repeat-crime and one-time-crime confusion
• What is it that the user should look for on the map with so many dots?
Overview of our Expert System
• Similar to how experts do it
• Identify the commonalities among events
• Measure dissimilarity (distance) of events
• Much faster and more robust
• Some parameters must be determined
A simple demonstrative example: Data and Cluster analysis at a click
Zoom in for viewing individual clusters:
Zoom closer, measure distances, run queries, etc.
Cluster Analysis
• An effective method for determining areas with high concentrations of crime
  - Suspiciously similar criminal activity
• For auto-theft, the number of potential targets is large
• Alternative: Capture the criminals at the place of the drop
• We need to define the concept of a cluster rather than how many clusters to find
How to Make Use of Clusters?
• Size and recency of events as a measure of cluster significance
• Assign police officers to patrol the most preferred drop locations
• Identify the most common features of auto-thefts for community warnings
• Further analyze these groups of related events by using additional non-numerical clues available in the narrative
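
One way to combine size and recency into a single significance value is sketched below. The slides do not give a formula, so the scoring rule here is purely an assumption for illustration: each event contributes a weight that halves every `half_life_days` (a hypothetical parameter), and the cluster score is the sum of those weights, so larger and more recent clusters score higher.

```python
from datetime import date

def cluster_significance(event_dates, today, half_life_days=30.0):
    """Hypothetical score: each event counts 1.0 when new and halves
    every `half_life_days`; summing rewards both size and recency."""
    return sum(0.5 ** ((today - d).days / half_life_days) for d in event_dates)

today = date(2002, 12, 4)
recent_small = [date(2002, 12, 4), date(2002, 11, 4)]          # ages: 0 and 30 days
old_large = [date(2002, 8, 6), date(2002, 8, 6), date(2002, 8, 6)]  # ages: 120 days

print(cluster_significance(recent_small, today))  # 1.5
print(cluster_significance(old_large, today))     # 0.1875
```

With this rule, two recent events outrank three events from four months earlier; a different half-life would shift that balance.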
How to Find Clusters?
• Choose & convert data to numerical format
• For example, addresses must be geocoded to produce planar or spherical coordinates
• Put close events (in feature space) in the same cluster
• Put distant events in different clusters
• Distance (dissimilarity) of two events is a weighted sum of differences of each data field
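
The conversion step can be sketched as follows. The field names match the column headers of the sample data table in this deck (Recovery X, Recovery Y, YEAR, MAKE, SDATE); the helper names `to_features` and `field_differences` are hypothetical, coordinates are taken as already geocoded, and comparing makes as same (0) or different (1) is an assumption rather than something the slides specify.

```python
from datetime import datetime

def to_features(record):
    """Turn one raw record (all strings) into comparable fields."""
    sdate = datetime.strptime(record["SDATE"], "%m/%d/%y").date()
    return {
        "x": float(record["Recovery X"]),
        "y": float(record["Recovery Y"]),
        "year": int(record["YEAR"]),
        "day": sdate.toordinal(),              # dates become day counts
        "make": record["MAKE"].strip().upper(),
    }

def field_differences(a, b):
    """Per-field differences between two events.
    Assumption: make is compared as same (0.0) or different (1.0)."""
    diffs = {k: abs(a[k] - b[k]) for k in ("x", "y", "year", "day")}
    diffs["make"] = 0.0 if a["make"] == b["make"] else 1.0
    return diffs

event_1 = to_features({"Recovery X": "504071", "Recovery Y": "1540191",
                       "YEAR": "1994", "MAKE": "HONDA", "SDATE": "10/10/02"})
event_2 = to_features({"Recovery X": "506029", "Recovery Y": "1543959",
                       "YEAR": "1996", "MAKE": "HONDA", "SDATE": "10/11/02"})
print(field_differences(event_1, event_2))
# {'x': 1958.0, 'y': 3768.0, 'year': 2, 'day': 1, 'make': 0.0}
```

These per-field differences are the inputs that a weighted distance would then combine.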
Finding Clusters: An Illustration
[Figure: sample map of recovery locations with points labeled N+, H*, N–, H*, N+]
For this sample map of recovery locations, suppose:
N = Nissan
H = Honda
Three clusters denoted by the *, +, – symbols
Clustering Technique and the Parameters
• Distance measure
  - Weights
$$\mathrm{Dist}_{X,Y} = \sum_i \left( w_i \cdot \frac{X_i - Y_i}{\sigma_i^2} \right)^2$$
• Average Distance in the feature space
• How much deviation from the average
  - Sensitivity
• Upper-bound for data fields
  - Embed extra knowledge into the analysis
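
A minimal sketch of the distance measure follows. It assumes the slide's formula reads as the sum over fields i of (w_i (X_i - Y_i) / sigma_i^2)^2; the placement of the exponents is a best-effort reading of the slide. The slide does not define sigma_i, so treating it as a per-field scale (for example a standard deviation) is also an assumption, and the function name is hypothetical.

```python
def weighted_distance(x, y, weights, sigmas):
    """Distance between two events given as equal-length numeric sequences.

    Assumed form: sum_i (w_i * (x_i - y_i) / sigma_i**2) ** 2,
    where sigma_i is an assumed per-field scale.
    """
    return sum(
        (w * (xi - yi) / s ** 2) ** 2
        for xi, yi, w, s in zip(x, y, weights, sigmas)
    )

# Two fields with unit scale: (1*2/1)^2 + (2*(-3)/1)^2 = 4 + 36
print(weighted_distance((2.0, 0.0), (0.0, 3.0),
                        weights=(1.0, 2.0), sigmas=(1.0, 1.0)))  # 40.0
```

Raising a field's weight makes differences in that field count for more, which is how the weights steer what counts as "similar".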
Clustering Algorithm
Threshold = AverageDistance ⋅ Sensitivity
For each new event N:
• For all clusters, find Di: the distance of N to the ith cluster
• Set D equal to Dm: the minimum of Di
• If D is not greater than Threshold, then N belongs to cluster m
• Otherwise, a new cluster C is created and N is placed into cluster C
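
The steps on this slide can be sketched in a few lines. Two details are not specified in the slides and are assumptions here: AverageDistance is taken as the mean distance over all pairs of events, and the distance from an event to a cluster is taken as the distance to that cluster's nearest member. The function name and the `dist` callback are illustrative.

```python
from itertools import combinations
from statistics import mean

def cluster_events(events, dist, sensitivity):
    """Assign each event to its nearest cluster, or start a new one."""
    if len(events) < 2:
        return [[e] for e in events]
    # Assumption: AverageDistance is the mean distance over all event pairs.
    average_distance = mean(dist(a, b) for a, b in combinations(events, 2))
    threshold = average_distance * sensitivity

    clusters = []
    for n in events:
        # D_i: distance of N to the i-th cluster.
        # Assumption: measured to the cluster's nearest member.
        d = [min(dist(n, member) for member in c) for c in clusters]
        if d and min(d) <= threshold:
            clusters[d.index(min(d))].append(n)   # N joins cluster m
        else:
            clusters.append([n])                  # N starts a new cluster
    return clusters

points = [0.0, 1.0, 10.0, 11.0]
print(cluster_events(points, lambda a, b: abs(a - b), sensitivity=0.5))
# [[0.0, 1.0], [10.0, 11.0]]
```

Because the threshold, not a target count, decides when a new cluster starts, the number of clusters falls out of the data; a lower sensitivity yields more, tighter clusters.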
Simulation Dataset
Feature                                   Weight  Upper-bound
Make of the vehicle                       20.0    single make in a cluster
Year of the vehicle                       2.5     10 years
X, Y coordinate of the theft location     15.0    Not used
X, Y coordinate of the recovery location  15.0    Not used
Date of the theft                         10.0    60 days
Date of the recovery                      10.0    60 days
• In our dataset, we have approximately 1000 auto-thefts in Orange County from 2002 to 2004.
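
The upper bounds on this slide can be read as hard limits that two events must satisfy before they may share a cluster. The sketch below applies them pairwise, which is an assumption (the slides do not say whether a bound applies to pairs of events or to the span of a whole cluster); the field names and the function name are hypothetical, and dates are assumed to be stored as day counts.

```python
# Upper bounds taken from the slide's table.
MAX_YEAR_GAP = 10   # years
MAX_DATE_GAP = 60   # days, for both theft and recovery dates

def within_upper_bounds(a, b):
    """True if events a and b may share a cluster under the bounds.
    Assumption: bounds are checked pairwise between events."""
    if a["make"] != b["make"]:          # single make in a cluster
        return False
    if abs(a["year"] - b["year"]) > MAX_YEAR_GAP:
        return False
    return all(abs(a[f] - b[f]) <= MAX_DATE_GAP
               for f in ("theft_day", "recovery_day"))

honda_a = {"make": "HONDA", "year": 1994, "theft_day": 0, "recovery_day": 5}
honda_b = {"make": "HONDA", "year": 1997, "theft_day": 30, "recovery_day": 40}
nissan = {"make": "NISSAN", "year": 1995, "theft_day": 10, "recovery_day": 12}

print(within_upper_bounds(honda_a, honda_b))  # True
print(within_upper_bounds(honda_a, nissan))   # False
```

A check like this would run before the distance comparison, so that events violating a bound never merge regardless of how close they are in the other fields.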
Event ID  Recovery X  Recovery Y  YEAR  MAKE   SDATE
1         504071      1540191     1994  HONDA  10/10/02
2         506029      1543959     1996  HONDA  10/11/02
3         501894      1544772     1997  HONDA  10/12/02
4         508456      1540932     1996  HONDA  10/15/02
5         510511      1541151     1994  HONDA  10/18/02
6         511384      1541098     1995  HONDA  10/19/02
7         510519      1541301     1997  HONDA  10/20/02
8         506034      1544827     1996  HONDA  10/23/02
9         511622      1542343     1994  HONDA  10/24/02
10        513009      1540875     1994  HONDA  10/25/02
11        510841      1543096     1997  HONDA  11/01/02
12        507674      1539989     1994  HONDA  11/04/02
13        503693      1540456     1994  HONDA  11/05/02
14        511984      1542603     1994  HONDA  11/08/02
15        514597      1541363     1994  HONDA  11/08/02
16        506053      1535721     1994  HONDA  11/08/02
17        511551      1540004     1995  HONDA  11/08/02
18        504396      1543033     1996  HONDA  11/08/02
19        514597      1541363     1994  HONDA  11/09/02
20        513492      1535572     1995  HONDA  11/15/02
21        513543      1540593     1997  HONDA  12/04/02
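
Even before clustering, rows with identical recovery coordinates can be picked out mechanically. The sketch below uses four rows copied from the table in this deck (events 14, 15, 16 and 19); the function name is hypothetical, and grouping on exact coordinate equality is an illustrative choice, not the clustering method itself.

```python
from collections import defaultdict

# (event id, recovery x, recovery y) for a few rows of the sample data.
rows = [
    (14, 511984, 1542603),
    (15, 514597, 1541363),
    (16, 506053, 1535721),
    (19, 514597, 1541363),
]

def repeated_recovery_points(rows):
    """Map each recovery point used more than once to its event ids."""
    by_point = defaultdict(list)
    for event_id, x, y in rows:
        by_point[(x, y)].append(event_id)
    return {pt: ids for pt, ids in by_point.items() if len(ids) > 1}

print(repeated_recovery_points(rows))
# {(514597, 1541363): [15, 19]}
```

Events 15 and 19 were recovered at the same coordinates one day apart, the kind of repeat that is easy to miss when both dots overlap on a map.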
RECOVERED AT                           YEAR  MAKE   MODEL   SDATE     STOLEN FROM
6825 AMBASSADER DR                     1994  HONDA  ACCORD  10/10/02  408 ORLANDO AV 2A
2812 N POWERS DR                       1996  HONDA  ACCORD  10/11/02  913 CROWSNEST CI
ATT AUTO THEFT ONLY 7367 BORDWINE AV   1997  HONDA  CIVIC   10/12/02  7367 BORDWINE AV
5818 HOLMES DR                         1996  HONDA  ACCORD  10/15/02  6330 LK HORSE SHOE
VERANDA CI/INDIATLANTIC DR             1994  HONDA  ACCORD  10/18/02  3718 RUNDO DR
INDIATLANTIC@QUEENSWAY RD              1995  HONDA  ACCORD  10/19/02  6417 GAMBLE DR
OAKBRIDGE WY / ATRIUM CICLE            1997  HONDA  ACCORD  10/20/02  1331 N PINE HILLS RD
3024 N POWERS DR                       1996  HONDA  ACCORD  10/23/02  N/A
ATT AUTO THEFT ONLY 2424 QUEENSWAY DR  1994  HONDA  ACCORD  10/24/02  2424 QUEENSWAY DR
4850 INDIATLANTIC DR                   1994  HONDA  ACCORD  10/25/02  2904 SPRING HILL CT
5320 W SILVER STAR RD                  1997  HONDA  ACCORD  11/01/02  5429 POINTE VISTA CI
2098 LEISURE DR                        1994  HONDA  ACCORD  11/04/02  1511 PINELAKE RD
OWASSO CT / DERRICK DR                 1994  HONDA  ACCORD  11/05/02  2225 PIPESTONE CT
5000 HOMESTEAD DR                      1994  HONDA  ACCORD  11/08/02  4824 PAT ANN TERR
4515 CHARLEEN TERRACE                  1994  HONDA  ACCORD  11/08/02  5416 PINTO WAY
POWERS DR / MOORE ST                   1994  HONDA  ACCORD  11/08/02  6005 POWDER POST DR
5211 HERNANDES DR                      1995  HONDA  ACCORD  11/08/02  N/A
6808 SILVER STAR RD                    1996  HONDA  ACCORD  11/08/02  N/A
4515 CHARLEEN TERRACE                  1994  HONDA  ACCORD  11/09/02  4325 CAROUSEL RD
4802 BURGANDY LANE                     1995  HONDA  ACCORD  11/15/02  3980 VERSAILLES DR
4824 JUDY ANN CT                       1997  HONDA  ACCORD  12/04/02  4557 FRISCO
Data Mapping and Visualization
• Web Server (Microsoft IIS 4.0), ArcIMS and Oracle form a three-tier architecture that eases data visualization and publishing
• We use Oracle 8i to store and manage the data
• Each record is associated with a spatial column computed from X/Y coordinates
• Each X/Y coordinate pair is geocoded from the address field of each record with ArcIMS
• ArcIMS is an ESRI GIS product, which can be used for delivering dynamic maps and GIS data and services via the Web
• Another ESRI product, ArcSDE, a Spatial Data Engine, is used as a gateway to access data from ArcIMS
Data Issues
• Duplicate records
• Bad addresses (not geocodable: “Dean Rd.”)
• Auto-trap unit
• Unpopulated fields (location types)
• Preprocessing needs (for example in the make field: HON = HOND = HONDA)
• Natural Language Processing needs for the narrative section
• Tracking multi-jurisdictional incidents (thieves admit that they steal property in one and pawn in another)
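
Two of these issues, inconsistent make codes and duplicate records, lend themselves to a small preprocessing pass. The sketch below is illustrative only: the alias table contains just the HON/HOND/HONDA example from this slide, and the function names and the choice of key fields for spotting duplicates are assumptions.

```python
# Alias table seeded with the slide's example; a real table would be larger.
MAKE_ALIASES = {"HON": "HONDA", "HOND": "HONDA", "HONDA": "HONDA"}

def normalize_make(raw):
    """Map known abbreviations to one spelling; pass unknown makes through."""
    key = raw.strip().upper()
    return MAKE_ALIASES.get(key, key)

def drop_duplicates(records, key_fields):
    """Keep the first record for each combination of key fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

records = [
    {"make": normalize_make("HON"), "year": 1994, "sdate": "10/10/02"},
    {"make": normalize_make("Honda "), "year": 1994, "sdate": "10/10/02"},
    {"make": normalize_make("HOND"), "year": 1996, "sdate": "10/11/02"},
]
print(len(drop_duplicates(records, ("make", "year", "sdate"))))  # 2
```

Normalizing the make first matters: without it, the first two records would not be recognized as duplicates.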
System Use
• Currently in use for evaluation and potential enhancements by the Criminal Investigation Division’s Auto Theft Unit of the Orange County Sheriff’s Office in Orlando, Florida.
Conclusions
• Developing GIS (geographic information systems) applications is slow and costly
• Worth having the geographical perspective of incidents within the community
• Eases the visualization of data, especially with animated maps
• Helps law enforcement officers discover the patterns of incidents and take necessary measures to prevent them
Future Directions
• Currently, the analysis can be done only offline (some manual preprocessing is needed)
• Need to develop the application in Java or .NET in order to make it usable online
• In the online versions, automatic warnings about the hot-spots can be given
Contact Us
Partnership Building
12354 Research Parkway
Orlando, FL
K. Michael Reynolds
kreynold@mail.ucf.edu
407-823-2943
Ron Eaglin
reaglin@mail.ucf.edu
407-823-5937