Thomas Hargrove

Mind the Gap
How holes in your data can lead to stories
Thomas Hargrove, Scripps News Washington Bureau
Jennifer LaFleur, Center for Investigative Reporting
NICAR Baltimore: 2 p.m. March 1, 2014 Salon DEF
• Never assume data are whole – check!
• Simple techniques like sorting
• Many of these are the same techniques we use for integrity checks
• Graphing over time
• Matching to other data sets
• Statistical tools
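A minimal sketch of the "check over time" idea: count records per agency per year and list the years with no records at all. The function name, the agency labels and the sample records are hypothetical, and a year with zero records is only a lead to check, not proof that anything is missing.

```python
from collections import Counter

def find_missing_years(records, first_year, last_year):
    """Return {agency: [years with zero records]} for the given year range.

    `records` is an iterable of (agency, year) pairs, one per incident.
    """
    records = list(records)
    counts = Counter(records)
    agencies = sorted({agency for agency, _ in records})
    gaps = {}
    for agency in agencies:
        missing = [year for year in range(first_year, last_year + 1)
                   if counts[(agency, year)] == 0]
        if missing:
            gaps[agency] = missing
    return gaps

# Hypothetical sample data, for illustration only.
sample = [
    ("Agency A", 2006), ("Agency A", 2007), ("Agency A", 2008),
    ("Agency B", 2006), ("Agency B", 2008),
]
gaps = find_missing_years(sample, 2006, 2008)
print(gaps)
```

An agency that never appears in the data will not show up here at all, so the same check is worth repeating against an outside list of agencies.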
• Look for research already done on the topic
• Find experts
• Talk to reporters who have done similar stories
• If possible, talk to the records personnel who assembled the data
• Follow data to their source – usually people
• Finding stories in the holes
– Agencies' failure to report
– Varying reporting rules across geography or agency
– Government computer system failures
– Find patterns among missing records
– Find the reasons behind missing records
How This Project Started
Dr. David Icove
Researcher, University of Tennessee
Retired member of FBI Behavioral Science Unit
For many years, NFIRS reported that only 5% of building fires in the U.S. were intentionally set.
The Impossible Variance of America’s Rate of Arson: 2006 to 2011
Department          State   Fires    Arson Rate
Indianapolis        IN       1,207    0%
San Diego           CA       1,022    0%
New York City       NY      18,988    1%
Gwinnett County     GA       1,678    2%
Houston             TX       7,740    2%
Arlington           TX       1,511    3%
Chicago             IL       5,075    4%
Los Angeles City    CA       7,975   10%
Phoenix             AZ       5,359   12%
Memphis             TN       5,331   16%
Tulsa               OK       3,076   22%
Gary                IN         424   28%
Cleveland           OH       5,742   28%
Toledo              OH       2,544   28%
Saginaw             MI       1,377   32%
Dayton              OH       1,930   33%
Buffalo             NY       1,606   33%
Youngstown          OH       2,125   36%
Highland Park       MI         748   45%
North Las Vegas     NV         435   49%
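The sorting check can be sketched on a few of the departments listed above. The rows are copied from the slide. The two thresholds (a rate of 2% or lower, at least 1,000 fires) are illustrative assumptions, not cutoffs used in the project.

```python
# (department, state, fires, reported arson rate in percent), from the table above
ROWS = [
    ("Memphis", "TN", 5331, 16),
    ("Indianapolis", "IN", 1207, 0),
    ("North Las Vegas", "NV", 435, 49),
    ("New York City", "NY", 18988, 1),
    ("Chicago", "IL", 5075, 4),
    ("Youngstown", "OH", 2125, 36),
]

def flag_low_rates(rows, max_rate_pct=2, min_fires=1000):
    """Sort by reported rate and keep large departments at or below the threshold.

    Both thresholds are assumptions chosen for illustration.
    """
    ordered = sorted(rows, key=lambda row: row[3])
    return [row[0] for row in ordered
            if row[3] <= max_rate_pct and row[2] >= min_fires]

flagged = flag_low_rates(ROWS)
print(flagged)
```

A department that lands on this list has not necessarily done anything wrong. The list only says where to start asking how fires get classified.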
How Rare is Arson?
But They Should Have Reported:
“Arson is grossly underreported. The true rate, I believe, is 40% to 50% -- in that range.”
--Bill Degnan, President, National Association of State Fire Marshals
“There isn’t a day that goes by that I don’t think: ‘Man, I was a monster.’ I’m just thankful no one was hurt.”
--Kenneth Allen, Muncie, Indiana
The Allen Conspiracy:
46 people set 73 home and vehicle fires
to collect $3.8 million from insurance
Lessons Learned from 1 million fires:
• 54,860 fires at ‘unlucky’ buildings that, like
Allen’s home, experienced multiple fires,
none of which was reported as arson.
• 42,434 fires at buildings that experienced
foreclosure, according to the national
mortgage monitoring firm RealtyTrac.
• 3,561 fires that had multiple points of ignition,
suggesting someone set several fires at once.
• 77,596 fires in unoccupied or vacant buildings.
What’s Next?
• Collect data on 4.8 million fires
• Calculate geographic rates by merging
aggregated fire counts to Census Bureau
tract data
• Correlate rates of suspicious fires to tracts
with unusually high occurrences of fire
• Contact local fire/police authorities to
determine if serial arson is suspected or
should be investigated
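The tract merge described above might look roughly like this. It is a sketch under assumptions: fire counts are already aggregated per tract, the tract IDs, counts and populations shown are made up, and the rate is expressed per 1,000 residents (housing units could serve as the denominator instead).

```python
def fires_per_1000(fire_counts, tract_population):
    """Join fire counts to tract population and compute fires per 1,000 residents.

    Returns (rates, unmatched): tracts with no usable population figure are
    listed separately rather than silently dropped, since they are holes too.
    """
    rates = {}
    unmatched = []
    for tract, fires in fire_counts.items():
        population = tract_population.get(tract)
        if not population:  # missing from the Census table, or zero residents
            unmatched.append(tract)
            continue
        rates[tract] = 1000 * fires / population
    return rates, unmatched

# Hypothetical tract IDs and figures, for illustration only.
fire_counts = {"T001": 12, "T002": 3, "T999": 5}
tract_population = {"T001": 4000, "T002": 3000}

rates, unmatched = fires_per_1000(fire_counts, tract_population)
print(rates)
print(unmatched)
```

Keeping the unmatched tracts visible matters here: a fire geocoded to a tract that does not exist in the Census file is usually a data problem worth tracing.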
Local gap-mining stories
Here’s FBI data you were never supposed to see
Truck accidents by year and agency
Sometimes you find piles
Statistical tools
• Time series correlation – are your ups and
downs real?
• Project/predict data and compare to actual
results. What causes differences?
• Population counts are pretty accurate. Use
them to determine reporting rates
• Regression with dummy variables
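One way to use population counts as a yardstick, sketched with made-up numbers: compute the overall per-capita rate across all agencies, project how many incidents each agency would report at that rate, and compare with what it actually reported. The agency names, the figures and the 0.5 cutoff are all assumptions.

```python
def reporting_ratios(reported, population):
    """Return {agency: actual / expected}, where expected is the count the
    agency would report at the overall per-capita rate of the whole group."""
    total_reported = sum(reported.values())
    total_population = sum(population[agency] for agency in reported)
    overall_rate = total_reported / total_population
    return {
        agency: reported[agency] / (population[agency] * overall_rate)
        for agency in reported
    }

# Hypothetical figures, for illustration only.
reported = {"A": 90, "B": 100, "C": 10}
population = {"A": 100_000, "B": 100_000, "C": 100_000}

ratios = reporting_ratios(reported, population)
# Assumed cutoff: flag agencies reporting under half of what was projected.
low_reporters = [agency for agency, ratio in ratios.items() if ratio < 0.5]
print(low_reporters)
```

Note that the low reporters also drag down the overall rate they are compared against, so this check is conservative. A low ratio can reflect a real local difference as well as a reporting gap, which is the "what causes differences?" question from the slide.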
Make sure the holes are real
EE000132 might actually be
the same as EE-000-132
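A small sketch of that check: normalize IDs before matching two lists, so that formatting differences are not mistaken for missing records. The helper names are hypothetical, and the rule assumes punctuation, spacing and letter case carry no meaning in the ID, which should be confirmed with whoever maintains the data.

```python
import re

def normalize_id(raw):
    """Uppercase the ID and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())

def truly_missing(ids_expected, ids_found):
    """Return expected IDs with no match in ids_found after normalization."""
    found = {normalize_id(value) for value in ids_found}
    return [value for value in ids_expected if normalize_id(value) not in found]

# EE000132 and EE-000-132 come from the slide; the other IDs are made up.
expected = ["EE000132", "EE000133"]
found = ["EE-000-132", "ee 000 140"]

missing = truly_missing(expected, found)
print(missing)
```

Leading zeros are kept on purpose. Stripping them would be a separate assumption about the ID scheme.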
A word of caution
• Do spot checks to make sure what you found
is real
• Run your findings by experts
• If possible, engage government sources of
data early. They may not be the enemy.
• Challenge your assumptions. Data are only a
clue, never an end result
Questions?
Jennifer LaFleur jlafleur@cironline.org @j_la28
Thomas Hargrove hargrovet@scripps.com
202-408-2703
Arson Project syntax files:
https://www.dropbox.com/l/LPB7l3kpz7wxvGsHSdTOy9
A copy of this presentation will be at www.jenster.com/2014