MID-TERM PROJECT
MATH 592: DATA MINING
Akram, Ameya, Liming, Priya, Vedbar
YouTube video clip: http://www.youtube.com/watch?v=rAjdM74d2Oc
INSTRUCTOR: Dr. Xijin Ge
03/03/2011








Background
Motivation
Objectives
Crime Classification in the United States
– All states in the United States
– Major cities in the United States
Data Collection
Results and Analysis
Conclusions
References

Crime
• Deviant behavior that violates prevailing norms or cultural values.
• Influential factors:
– Social
– Political
– Psychological
– Economic
• Impact and remedies

Crime has a long history.

Historian Henry Thomas Buckle: “Society prepares the crime, the criminal commits it.”

From the 19th century onward, the development of sociological thought prompted fresh views on crime and criminality.

Some religious communities regard sin as a crime.

Criminology emerged as a new discipline to study crime in society.

In 1959, Daniel Glaser and Kent Rice found a strong relationship between crime and the economic conditions associated with adult unemployment [1].

In 1976, Bruce H. Mayhew and Roger L. Levinger found that more populous areas support a wider variety of social interaction, which in turn increases violent crime [2].

In 2004, John R. Hipp et al. found that both violent and property crime rates are driven by pleasant weather [3].

In 2004, Lance Lochner and Enrico Moretti found that schooling significantly reduces the probability of incarceration and arrest [4].

In 2007, Sesha Kethineni and David N. Falcone found that unemployment influences crime both directly and indirectly through social pathologies such as drug and alcohol use [5].

We need to identify the factors that influence crime in order to help reduce crime in society.
Objectives
• Collect crime data and data on other factors that influence crime in the United States from available sources.
• Visualize and statistically analyze crime data for major cities, the states, and the entire USA.
• Determine the relationships among different kinds of crime in major cities and states of the USA.
• Identify influential factors that could be targeted to help reduce crime in the United States.
• The FBI has tabulated Uniform Crime Reports annually since 1930.
• The Crime Index includes
– Violent crime
– Property crime
• Violent crime includes
– Murder
– Forcible rape
– Aggravated assault
– Robbery
• Property crime includes
– Burglary
– Larceny-theft
– Motor-vehicle theft
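As a minimal sketch (Python, with hypothetical counts that are not FBI data), the Crime Index for a jurisdiction is the sum of the four violent and three property offenses listed above, and such counts are commonly reported as rates per 100,000 inhabitants. The function names and dictionary keys are illustrative assumptions, not part of any FBI data format.

```python
# Offense groups as listed in the Crime Index above.
VIOLENT = ("murder", "forcible_rape", "aggravated_assault", "robbery")
PROPERTY = ("burglary", "larceny_theft", "motor_vehicle_theft")


def crime_index(counts: dict[str, int]) -> dict[str, int]:
    """Sum offense counts into violent, property and total (Crime Index) figures."""
    violent = sum(counts[k] for k in VIOLENT)
    prop = sum(counts[k] for k in PROPERTY)
    return {"violent": violent, "property": prop, "total": violent + prop}


def rate_per_100k(count: int, population: int) -> float:
    """Convert a raw count into a rate per 100,000 inhabitants."""
    return count * 100_000 / population


# Hypothetical counts for one jurisdiction (illustrative, NOT FBI data).
counts = {
    "murder": 10, "forcible_rape": 40, "aggravated_assault": 300, "robbery": 150,
    "burglary": 900, "larceny_theft": 2500, "motor_vehicle_theft": 400,
}
index = crime_index(counts)
print(index)                                  # {'violent': 500, 'property': 3800, 'total': 4300}
print(rate_per_100k(index["total"], 250_000))  # 1720.0
```

Multiplying before dividing keeps the rate exact for these integer inputs.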

• Population
• Geography
• Size of state or city
• Illiteracy
• Unemployment
• Per capita
• Inflation rate



• Years covered: 1979 to 2007.
• Crime data were obtained from the FBI.
• Other variables used are the inflation rate, unemployment rate, population, and per capita figures for the USA (Sources*).




• Crime data are from 2003.
• All 50 states are covered.
• Other variables used were population, illiteracy, unemployment, and regions based on weather.
• States were divided into South, West, Midwest, and Northeast regions.
• Crime data for cities in the USA (2003).
• Data for five major cities in each state of the USA.
• Variables in the dataset include Population, Murder, Arson, etc.
• Cities were divided into three categories: Large, Medium, and Small.


In 1980 - Heightened Cold War tensions between the USA and USSR.
In 1984 - US presidential election (Ronald Reagan re-elected); Summer Olympics in Los Angeles, California, USA.
In 1991 - Gulf War in the Middle East.
In 2001 - Terrorist attacks on the World Trade Center and the Pentagon on September 11.
In 2002 - Department of Homeland Security formed on November 25.
In 2003 - Drop in the crime index.
In 2007 - Beginning of the recession in the United States.

Crime index is positively correlated with inflation rate.
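The correlation statements on this and later slides refer to Pearson's r. Below is a minimal stdlib-only sketch; the yearly values are made up for illustration and are not the project's data. In R, cor(x, y) computes the same coefficient by default.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Co-deviation and squared deviations around each series' mean.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical yearly series (illustrative only, not the project's data).
inflation_rate = [2.0, 3.0, 5.0, 4.0, 6.0]
crime_idx = [40.0, 45.0, 60.0, 50.0, 65.0]
print(f"r = {pearson_r(inflation_rate, crime_idx):.3f}")
```

A value near +1 indicates that the two series rise and fall together; correlation alone does not establish causation.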
• In the plot above, the unemployment rate peaked in 1980.
• The unemployment rate shows an overall decline from 1990 to 2007.
Figures: Property crime vs. Year; Violent crime vs. Year.
Property crime decreased roughly linearly over the years, whereas violent crime peaked in 1991.
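One simple way to check a "decreased linearly" reading is an ordinary least-squares trend line: a negative slope indicates decline, and a close straight-line fit supports linearity. The sketch below uses hypothetical yearly values, not the project's data; in R, lm(value ~ year) fits the same line.

```python
def linear_trend(years: list[float], values: list[float]) -> tuple[float, float]:
    """Least-squares line values ~ intercept + slope * year; returns (slope, intercept)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical property-crime rates (illustrative only, not the project's data).
years = [2000, 2001, 2002, 2003]
rates = [40.0, 38.5, 37.0, 35.5]
slope, intercept = linear_trend(years, rates)
print(f"slope per year: {slope:.2f}")
```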
There is no significant correlation between vehicle theft and the unemployment rate (r = 0.1123).
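Whether r = 0.1123 is significant can be checked with the standard t test for a correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom. The sketch assumes n = 29 (one observation per year, 1979 to 2007). That sample size is an assumption, because the slide does not state it. The critical value 2.052 is the two-sided 5% value of t with 27 degrees of freedom from standard tables.

```python
from math import sqrt


def t_statistic_for_r(r: float, n: int) -> float:
    """t statistic for H0: population correlation is zero (n - 2 degrees of freedom)."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


R_VALUE = 0.1123    # vehicle theft vs. unemployment rate, as reported on the slide
N_YEARS = 29        # ASSUMPTION: one observation per year, 1979-2007 inclusive
T_CRITICAL = 2.052  # two-sided 5% critical value of t with 27 df (standard t table)

t = t_statistic_for_r(R_VALUE, N_YEARS)
print(f"t = {t:.3f} on {N_YEARS - 2} df; significant at 5%: {abs(t) > T_CRITICAL}")
```

Under this assumption, t is about 0.59, well below 2.052, which is consistent with the "no significant correlation" conclusion.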

Map: the United States divided into 4 regions, West (blue), South (green), Northeast (red), and Midwest (yellow), and into 50 states.


The South region had higher unemployment and illiteracy than the other regions.
The Midwest region fared best among all regions.
Bubble chart of illiteracy and crime index by state (labeled: California, New York, Florida, Illinois, Texas). Bubble size indicates the population of a state.
Crime and illiteracy rates were high in populous states.
Higher illiteracy rates were associated with higher crime in all regions.
Crime and unemployment rates were high in populous states.
Higher unemployment rates were associated with higher crime in all regions.
Populations were larger in states with pleasant weather.
A pleasant environment leads to a larger population and hence a higher crime rate.
Bubble chart of unemployment rate and crime index by state (labeled: Texas).
p://www.txdps.state.tx.us/crimereports/09/citCh2


Population and crime index are strongly correlated.
The crime index rises with population across the states.
Figure: crime index vs. population by state.


Both illiteracy and unemployment are correlated with crime.
Illiteracy is more strongly correlated with crime than unemployment is.

States with high populations mostly have a high crime index, high illiteracy, and high unemployment.
Converting the city variable into a categorical variable (large, middle, and small cities by population):
• large: population > 150000
• middle: 45000 <= population <= 150000
• small: population < 45000
Converting the state variable into two variables, state.ew and state.sn:
• state.ew has four categories: eastern, middle, western, and other
• state.sn has three categories: southern, mid, and northern
Adding two variables:
• ratio.total: total number of crimes / population, per city
• ratio.car: number of car thefts / population, per city
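The city-size binning and the two ratio variables above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the record layout and the field names total and car_theft are hypothetical. The project's own output comes from R, where this kind of binning is typically done with cut().

```python
def city_size(population: int) -> str:
    """Bin a city's population using the thresholds listed above."""
    if population > 150_000:
        return "large"
    if population >= 45_000:
        return "middle"
    return "small"


def add_ratios(city: dict) -> dict:
    """Return a copy of a city record with the city category and both ratios added.

    ASSUMPTION: the record has hypothetical keys "population", "total" and "car_theft".
    """
    out = dict(city)
    out["city"] = city_size(city["population"])
    out["ratio.total"] = city["total"] / city["population"]
    out["ratio.car"] = city["car_theft"] / city["population"]
    return out


# Hypothetical record (illustrative only).
rec = add_ratios({"name": "Example City", "population": 200_000,
                  "total": 12_000, "car_theft": 1_500})
print(rec["city"], rec["ratio.total"], rec["ratio.car"])
```

Both boundaries (45000 and 150000) fall in the middle category, matching the inclusive inequality above.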
Figure: histograms of ratio.car, ratio.total, population, and total (y-axis: frequency).
a. Ratio of total crime across the city variable
● ANOVA: analysis of variance
● H0: There is no difference in the ratio of total crime across large, middle, and small cities.
● H0 is rejected based on the R output below: there is a significant difference.
> summary(aov(ratio.total ~ city))
             Df   Sum Sq   Mean Sq F value    Pr(>F)
city          2 0.036206 0.0181030  35.364 3.869e-14
Residuals   233 0.119276 0.0005119
Figure: boxplot of ratio.total by city category (Large, Middle, Small).
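The F value in the ANOVA table is the ratio of the between-group mean square to the within-group (residual) mean square. The stdlib-only sketch below computes a one-way ANOVA from raw groups, using toy numbers, and recomputes F from the reported sums of squares. The 2 and 233 degrees of freedom imply 236 cities in three groups. It omits the p-value, which requires the F distribution. In R, pf(35.364, 2, 233, lower.tail = FALSE) gives that upper-tail probability.

```python
def one_way_anova(groups: list[list[float]]) -> tuple[int, int, float, float, float]:
    """One-way ANOVA: return (df_between, df_within, SS_between, SS_within, F)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group SS: group size times squared distance of group mean from grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group SS: squared deviations of each value from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return df_between, df_within, ss_between, ss_within, f_value


def f_from_table(ss_between: float, df_between: int,
                 ss_within: float, df_within: int) -> float:
    """F = MS_between / MS_within, computed from a printed ANOVA table."""
    return (ss_between / df_between) / (ss_within / df_within)


toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical groups, not the city data
print(one_way_anova(toy))
# Reported values from the ratio.total ~ city table; small rounding differences are expected.
print(round(f_from_table(0.036206, 2, 0.119276, 233), 3))
```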
b. Ratio of car theft crime across the city variable
             Df   Sum Sq   Mean Sq F value    Pr(>F)
city          2 0.036206 0.0181030  35.364 3.869e-14
Residuals   233 0.119276 0.0005119
Figure: boxplot of ratio.car.theft by city category (Large, Middle, Small).
• The proportion of violent crime increases with total crime.
• States with high population seem to have a higher crime rate.
Bubble size: population; color: city category.


Property crime has an almost perfect correlation with total crime.
Its r value shows a stronger correlation than that of violent crime.
Bubble size: population; color: city category.
1. Total crime increases as population increases.
2. The correlation is strongest for large cities; small cities show a stronger correlation than middle cities.

Robbery and aggravated assault appear to be highly correlated.

Car theft and robbery have low correlation.


Murder and robbery are highly correlated.
Burglary and arson have low correlation.


Murder and robbery are highly correlated.
Burglary and arson have low correlation.
• Major events or crises in a country can influence its crime index.
• The inflation rate and crime index are positively correlated.
• Population was positively correlated with crime rate at both the state and city level.
– Larger cities have higher crime rates.
– Car theft is high in large cities.
– Murder and robbery appear to be highly correlated in small and medium cities.
• Regions or states with pleasant weather have larger populations and higher crime indices.
• The illiteracy rate is more strongly correlated with the crime index than the unemployment rate is.
• The government should focus on lowering the illiteracy rate before addressing unemployment in order to reduce the crime index.
1. Daniel Glaser and Kent Rice, “Crime, Age, and Employment,” American Sociological Review, Vol. 24, No. 5, pp. 679-686, Oct. 1959.
2. Bruce H. Mayhew and Roger L. Levinger, “Size and the Density of Interaction in Human Aggregates,” The American Journal of Sociology, Vol. 82, No. 1, pp. 86-110, Jul. 1976.
3. John R. Hipp, Daniel J. Bauer, Patrick J. Curran and Kenneth A. Bollen, “Crimes of Opportunity or Crimes of Emotion? Testing Two Explanations of Seasonal Change in Crime,” Social Forces, Vol. 82, No. 4, pp. 1333-1372, Jun. 2004.
4. Lance Lochner and Enrico Moretti, “The Effect of Education on Crime: Evidence from Prison Inmates, Arrests, and Self-Reports,” The American Economic Review, Vol. 94, No. 1, pp. 155-189, Mar. 2004.
5. Sesha Kethineni and David N. Falcone, “Employment and Ex-Offenders in the United States: Effects of Legal and Extra Legal Factors,” The Journal of Community and Criminal Justice, Vol. 54, No. 1, pp. 36-51, 2007.
* Sources:
– Crime in the United States, 2007, FBI, Uniform Crime Reports.
– http://www.disastercenter.com/crime/wicrime.htm
– http://www.infoplease.com/ipa/A0004902.html#ixzz1ESnn3Tf9
– http://www.miseryindex.us