Alfred Blumstein, Kiminori Nakamura
Heinz College - Carnegie Mellon Univ.
March 27, 2009
Old Fogy: “We shouldn’t computerize criminal-history records because computers don’t understand the Judeo-Christian concept of redemption”
Rejoinder: “Paper records certainly don’t understand that concept, but computers can certainly be taught”
This paper is developing information on what to teach the computers
Technology has made background checking easy - and so very ubiquitous
Most large companies now do background checks (~80%)
Statutes require background checks for many jobs
Criminal records are also ubiquitous
Lifetime probability of arrest > 0.5
14 million arrests a year
71 million criminal records in state repositories
90% of the records are computerized
Criminal records have long memories
Many people are handicapped because of an arrest or conviction that happened long ago, and so is “stale”
We know from much research that recidivism probability declines with time “clean”
At some point in time, a person with a criminal record who remained crime-free is of sufficiently low risk that the “stale” record no longer contains useful information
Need a basis for establishing when redemption from the prior mark of crime occurs
We still have no measures of redemption time
Also, we want to know how it varies with age and crime type at the prior arrest
Lack of empirical evidence leaves employers to set arbitrary cut-off points
5 or 10 years (nice round numbers)
7 years (Biblical origins?)
15 years (conservative)
Forever (usually unreasonable)
Employers vary in level of concern
In dealing with vulnerable populations (elderly, children)
Bank teller
National security
Construction worker
Recidivism studies (e.g., BJS, 1997, 2002)
Usually involve a short observation period
Most recidivism occurs in 3-5 years
Birth Cohort studies (e.g., Kurlychek, Brame, & Bushway, 2006, 2007)
Limited sample size and short follow-up
Rap sheets:
Criminal records from state-level repositories
Samples ~100,000
Permits rich disaggregation, long-term follow-up
But no information about the never-arrested
Arrest-history records from NY state repository
Population of individuals who were arrested for the first time as adults (≥ 16) in 1980 (≈ 88,000)
Follow-up time > 25 years
We will report on redemption estimates for:
Age at first arrest: A1 = 16, 18, 20
Crime type of first arrest: C1 = Robbery, Burglary, Aggravated Assault
Survival probability – S(t)
Survive without a subsequent arrest
Eventually saturates – only a few have more arrests after a sufficiently long time
Provides an estimate of fraction still clean at any t
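As a minimal sketch of the survival probability described above, assume each person's record has been reduced to the time (in years) from first arrest to the next arrest, or `None` if no further arrest appears during follow-up. The function name `survival_curve` and the ten-person cohort are hypothetical and only illustrate the calculation; they are not the data or the estimator used in the study.

```python
from typing import List, Optional, Sequence


def survival_curve(times_to_rearrest: Sequence[Optional[float]], horizon: int) -> List[float]:
    """Return [S(0), S(1), ..., S(horizon)]: the fraction with no new arrest by year t."""
    n = len(times_to_rearrest)
    if n == 0:
        raise ValueError("need at least one person in the cohort")
    curve = []
    for t in range(horizon + 1):
        # Count everyone whose next arrest happened at or before year t.
        rearrested = sum(1 for x in times_to_rearrest if x is not None and x <= t)
        curve.append((n - rearrested) / n)
    return curve


# Hypothetical cohort: None means no rearrest during the follow-up window.
cohort = [0.5, 1.2, 1.8, 2.5, 3.1, 4.7, 9.0, None, None, None]
S = survival_curve(cohort, horizon=10)
print(S)
```

In this toy cohort the curve flattens after year 5, which is the saturation behaviour noted above: few people have a new arrest after a sufficiently long clean period.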
[Figure: survival probability S(t) (0 to 1.00) vs. years since first arrest (0 to 26)]
[Figure: survival probability S(t) (0 to 1.00) vs. years since first arrest (0 to 26), with curves for Robbery, Burglary, and Aggravated Assault]
Conditional probability of a new arrest
Conditional on surviving to t
Hazard h(t) = Pr{arrest at t | survive to t}
New arrest (C2) here could be for any crime
Will later consider concern about specific subsequent crime types (C2s)
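The conditional probability above can be sketched from a survival curve sampled at yearly intervals. The yearly discretization and the name `hazard_from_survival` are assumptions made for illustration; the slides do not say which estimator or smoothing was used.

```python
from typing import List, Sequence


def hazard_from_survival(S: Sequence[float]) -> List[float]:
    """Discrete-time hazard: h[t-1] is Pr{new arrest during year t | still clean at start of year t}."""
    h = []
    for t in range(1, len(S)):
        at_risk = S[t - 1]  # fraction still clean entering year t
        newly_arrested = S[t - 1] - S[t]  # fraction first rearrested during year t
        h.append(newly_arrested / at_risk if at_risk > 0 else 0.0)
    return h


# Hypothetical survival values S(0)..S(5).
S_example = [1.0, 0.9, 0.7, 0.6, 0.5, 0.4]
h_example = hazard_from_survival(S_example)
print([round(x, 3) for x in h_example])
```

Dividing by the fraction still at risk is what makes this a conditional probability rather than a plain drop in S(t).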
Hazard h(t) = Cond. Prob. of a New Arrest (C1 = Burglary, 3 A1s)
[Figure: hazard h(t) (0 to .25) vs. years since first arrest (2 to 20), with curves for A1 = 16, 18, and 20]
Hazard h(t) (A1 = 18, 3 C1s)
[Figure: hazard h(t) (0 to .25) vs. years since first arrest (2 to 18), with curves for Robbery, Burglary, and Aggravated Assault]
General Population
The employer has a single preferred applicant
Turn to some general measure of how common arrest is for people of the same age
Redemption occurs when hazard crosses age-crime curve
We denote the time to redemption as T*
The Never-Arrested
The employer has a pool of job applicants
Comparison would be between the risk for those with a prior vs. those without
We don’t expect these two hazards to cross
Redemption occurs when hazard is “close enough” to that of those without
We denote the time to redemption as T**
Very commonly used in criminology
Probability of arrest as a function of age
For our population, arrested for the first time in NY in 1980, we created a “progressive” age-crime curve for each value of A1
For A1 = 18: arrests of 19-year-olds in 1981, 20-year-olds in 1982, etc.
[Figure: age-crime curve for the general population (age 18 in NY in 1980); arrest probability (0 to .12) vs. years since first arrest (2 to 18)]
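A “progressive” age-crime curve follows one birth cohort forward: the age and the calendar year advance together. The sketch below assumes arrest counts and population counts are available keyed by (age, year), and it assumes the arrest probability is arrests divided by population of that age in that year. The function name and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple

AgeYear = Tuple[int, int]


def progressive_age_crime_curve(
    a1: int,
    base_year: int,
    arrests: Dict[AgeYear, int],
    population: Dict[AgeYear, int],
    horizon: int,
) -> List[float]:
    """Arrest probability t years after the base year for people aged a1 in the base year."""
    curve = []
    for t in range(1, horizon + 1):
        key = (a1 + t, base_year + t)  # e.g. 19-year-olds in 1981 when a1=18, base_year=1980
        curve.append(arrests[key] / population[key])
    return curve


# Hypothetical counts, not actual New York figures.
arrests_by_age_year = {(19, 1981): 120, (20, 1982): 110, (21, 1983): 95}
population_by_age_year = {(19, 1981): 1000, (20, 1982): 1000, (21, 1983): 1000}
ac_curve = progressive_age_crime_curve(18, 1980, arrests_by_age_year, population_by_age_year, horizon=3)
print(ac_curve)
```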
T*: Comparison to General Pop’n of the Same Age by the Age-Crime Curve
Benchmark: The age-crime curve = risk of arrest for any crime in the general population of the same age
T* is at the intersection of h(t) and A-C curve
[Figure: hazard h(t) for Age 18 Robbery and the general-population age-crime curve (age 18 in NY in 1980) vs. years since first arrest (2 to 18); the curves intersect at T* = 7.7 years, where h(T*) = .096]
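Finding T* amounts to locating the first point where the hazard falls to the benchmark curve. The sketch below assumes both curves are sampled on the same grid of years and interpolates linearly between grid points. The function name `first_crossing` and the sample values are hypothetical, and the interpolation is an illustrative choice rather than the authors' procedure.

```python
from typing import List, Optional, Sequence


def first_crossing(times: Sequence[float], h: Sequence[float], benchmark: Sequence[float]) -> Optional[float]:
    """First time at which h drops to or below the benchmark; None if it never does."""
    for i in range(len(times)):
        gap = h[i] - benchmark[i]
        if gap <= 0:
            if i == 0:
                return times[0]
            prev_gap = h[i - 1] - benchmark[i - 1]  # strictly positive, or we would have returned earlier
            frac = prev_gap / (prev_gap - gap)  # share of the interval before the curves meet
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


# Hypothetical curves on a yearly grid.
years: List[float] = [1, 2, 3, 4, 5]
hazard = [0.20, 0.15, 0.11, 0.08, 0.06]
age_crime = [0.10, 0.10, 0.09, 0.09, 0.08]
t_star = first_crossing(years, hazard, age_crime)
print(round(t_star, 2))
```

A return value of `None` corresponds to the case where the two curves never cross within the follow-up window.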
Cell entries: T* in years (h(T*))

First Offense (C1)     Age at First Arrest (A1)
                       16            18            20
Robbery                8.5 (.103)    7.7 (.096)    4.4 (.086)
Aggravated Assault     4.9 (.105)    4.3 (.098)    3.3 (.086)
Burglary               4.9 (.105)    3.8 (.097)    3.2 (.086)
Age effect: Younger starters need to remain crime-free longer to achieve redemption
Crime type effect: Robbery > AA ~ Burg
Cell entries: T* in years / fraction

First Offense (C1)     Age at First Arrest (A1)
                       16            18            20
Robbery                8.5 / .092    7.7 / .202    4.4 / .513
Aggravated Assault     4.9 / .291    4.3 / .429    3.3 / .556
Burglary               4.9 / .257    3.8 / .414    3.2 / .550
Age effect: The fraction increases with age
Crime type effect: Lowest for young robbers
Benchmark: The risk of arrest for those who have never been arrested
The risk of arrest for those with a prior is likely to stay higher than that of those without
Estimate T** as the time when h(t) and h_na(t) are “close enough”
Data to directly estimate h_na(t) for the never-arrested are not available from repositories, so h_na(t) must be modeled
Population of the never-arrested at age A, N_na(A):
N_na(A) = Population of New York of age A in 1980 - Σ(First-time arrestees in 1980 for all A1 < A)
Hazard of the never-arrested at age A, h_na(A), is calculated as:
h_na(A) = (# of first-time arrestees for A1 = A) / N_na(A)
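The two formulas above translate directly into a short calculation. In this sketch the dictionaries `population_1980` and `first_arrests_1980` (both keyed by age) and their values are hypothetical stand-ins, not actual New York counts.

```python
from typing import Dict


def never_arrested_population(age: int, population: Dict[int, int], first_arrests: Dict[int, int]) -> int:
    """N_na(A): population of age A minus first-time arrestees at all younger ages A1 < A."""
    already_arrested = sum(count for a1, count in first_arrests.items() if a1 < age)
    return population[age] - already_arrested


def never_arrested_hazard(age: int, population: Dict[int, int], first_arrests: Dict[int, int]) -> float:
    """h_na(A): first-time arrestees with A1 = A divided by N_na(A)."""
    return first_arrests[age] / never_arrested_population(age, population, first_arrests)


# Hypothetical 1980 counts by age.
population_1980 = {16: 1000, 17: 1000, 18: 1000}
first_arrests_1980 = {16: 20, 17: 18, 18: 15}

n_na_18 = never_arrested_population(18, population_1980, first_arrests_1980)
h_na_18 = never_arrested_hazard(18, population_1980, first_arrests_1980)
print(n_na_18, round(h_na_18, 4))
```

With these numbers, N_na(18) = 1000 - (20 + 18) = 962 and h_na(18) = 15 / 962.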
[Figure: hazard of the never-arrested, h_na(t) (0 to .018), vs. age (20 to 44)]
[Figure: hazard h(t) for Age 18 Violent and Age 18 Property compared with h_na(t); hazard (0 to .16) vs. years since first arrest (2 to 22)]
Estimate T** as the time when h(t) becomes “close enough” to h_na(t)
The simple intersection method used for T* won’t work if h(t) > h_na(t) for all t
Introduce risk tolerance, δ
[Figure: hazard h(t) for Age 18 Violent compared with h_na(t), with risk tolerance δ = .02; hazard (0 to .18) vs. years since first arrest (2 to 22)]
Use confidence interval (CI)
We use bootstrap for the CI instead of ± z_{α/2} √(p·q/n)
We use the upper CI to be conservative: T** is the time when the upper CI of h(t) intersects (h_na(t) + δ)
[Figure: hazard h(t) for Age 18 Violent with lower and upper 95% bootstrap CIs, compared with h_na(t), δ = .02; hazard (0 to .18) vs. years since first arrest (2 to 22); T** = 18.3 years, h(T**) = .025]
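The upper-CI rule can be sketched as follows. This is a simple percentile bootstrap over yearly arrest counts, chosen for illustration; the slides do not describe the exact resampling scheme, so the function names, the default of 500 resamples, and all numbers here are assumptions.

```python
import random
from typing import Optional, Sequence


def bootstrap_upper_hazard(arrests: int, at_risk: int, n_boot: int = 500,
                           level: float = 0.95, seed: int = 0) -> float:
    """Upper percentile-bootstrap bound for the hazard arrests / at_risk."""
    if at_risk <= 0:
        raise ValueError("at_risk must be positive")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    outcomes = [1] * arrests + [0] * (at_risk - arrests)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(outcomes, k=at_risk)  # draw with replacement
        estimates.append(sum(resample) / at_risk)
    estimates.sort()
    idx = min(n_boot - 1, int((1 + level) / 2 * n_boot))
    return estimates[idx]


def redemption_time(years: Sequence[float], arrests: Sequence[int], at_risk: Sequence[int],
                    h_na: Sequence[float], delta: float) -> Optional[float]:
    """First year at which the upper bound of h(t) is at or below h_na(t) + delta."""
    for t, d, n, base in zip(years, arrests, at_risk, h_na):
        if bootstrap_upper_hazard(d, n) <= base + delta:
            return t
    return None


# Hypothetical yearly counts for those with a prior arrest, and a flat h_na(t).
t_star_star = redemption_time(
    years=[1, 2, 3, 4],
    arrests=[200, 100, 40, 8],
    at_risk=[1000, 800, 700, 660],
    h_na=[0.01, 0.01, 0.01, 0.01],
    delta=0.02,
)
print(t_star_star)
```

Using the upper bound rather than the point estimate delays T**, which is the conservative choice described above. Raising `delta` makes the threshold easier to meet and so shortens T**.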
Tradeoff of Risk Tolerance (δ) and T**
[Figure: T** (0 to 20 years) vs. risk-tolerance difference δ (.02 to .08), with curves for Age 18 Violent, Age 18 Property, and Age 20 Property]
Robustness test across states
Replicate with similar data from other states’ repositories
Robustness across sampling years
Add 1985, 1990
Concern over C2, the next crime
Convictions vs Arrests
Anticipate fewer in number
Anticipate higher hazards
Weeded out the innocent
Test for arrests outside New York
Need national data from FBI – in process
Users of Criminal Records:
Employers
Inform employers of the low relevance of records older than T* or T**
Enact statutes to protect employers from “due-diligence liability” claims if last arrest is older than T* or T**
Pardon Boards
Length of law-abiding period is an important factor in pardons
Information about T* and T** provides guidance on how long a law-abiding period is long enough
Distributors of Criminal Records:
Repositories
State repositories could choose not to disseminate records older than T* or T**
Could seal (or expunge) records older than T* or T**
Commercial Vendors
If states seal or expunge records older than T* or T** years, commercial vendors should do similarly
First use of official state repository records to produce redemption times
Strong estimates of redemption times, T* and T**
Provides a basis for responsiveness to user criteria in assessing redemption
⇒ T* or T** can be generated based on the specifications (A1, C1, δ, C2, etc.) set by the users