Sarah Franklin
October 30th, 2013
Overview of presentation
In 2013, the Canadian Centre for Justice Statistics (CCJS) placed two administrative crime surveys in the Research Data Centres (RDCs)
Methodologists and subject matter experts developed a scoring approach for tables of frequency counts to identify ‘safe’ tables
Each variable in a dataset is assigned a sensitivity score. A table’s overall score is the sum of the variable scores. If the score is below a given threshold, the table is safe.
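The scoring rule can be sketched in a few lines of Python. The variable names, their scores and the threshold passed in below are illustrative assumptions, not the values assigned by CCJS; the sketch treats a table as safe when its score does not exceed the threshold.

```python
# Hypothetical sensitivity scores; the real values are set by
# methodologists and subject matter experts.
VARIABLE_SCORES = {
    "sex_of_victim": 0,
    "province": 2,
    "age_group": 2,
    "relationship_detailed": 4,
}


def table_score(variables, scores=VARIABLE_SCORES):
    """Return the table's overall score: the sum of its variables' scores."""
    return sum(scores[name] for name in variables)


def is_safe(variables, threshold, scores=VARIABLE_SCORES):
    """Treat a table as safe when its score does not exceed the threshold."""
    return table_score(variables, scores) <= threshold


print(table_score(["sex_of_victim", "province", "age_group"]))  # 4
print(is_safe(["province", "age_group", "relationship_detailed"], threshold=7))  # False
```

Because the check only needs the list of variables in a table, it can be applied before any counts are produced.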
Uniform Crime Reporting Incident-based Survey (UCR) and the Homicide Survey
Administrative datasets
Mandatory reporting by all police services
Criminal incidents substantiated by the police
UCR is a sample of crime data
• not all crime comes to the attention of the police
• over 2 million incidents of crime annually
Homicide Survey data more sensitive than UCR
• All homicides; 543 homicides in 2012
Information on incident, victim(s), accused(s)
UCR, Homicide variables available to researchers
most serious violation for the incident of crime
(e.g., homicide, robbery, mischief)
geography (region, province, CMA)
location (e.g., residential home, convenience store)
weapon causing injury (e.g., handgun, knife)
relationship between victim and accused
age and sex of victim and/or accused
clearance status (accused charged vs cleared otherwise)
Publicly available STC police-reported crime data
UCR and homicide data available to all Canadians:
CANSIM tables (very aggregate)
Tables and graphs appearing in Juristat articles
Custom tabulations upon request
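Homicides by CMA
CMAs shown: Edmonton, Toronto, St. John’s, Montréal, Ottawa, Kingston, Saguenay, Trois-Rivières, Sherbrooke, Moncton, Québec, Brantford, plus the Canada total
Values by column, in extracted order (the order does not follow the CMA list above):
Victims, 2011: 2, 598, 0, 3, 1, 1, 54, 11, 0, 1, 50, 86, 4
Rate per 100,000, 2011: 0.7, 0.5, 0, 0.4, 1.4, 1.7, 4.2, 1.5, 2.1, 1.4, 1.2, 0, 0.7
Victims, 2012: 0, 543, 0, 6, 2, 1, 47, 7, 0, 4, 33, 80, 0
Rate per 100,000, 2012: 1.3, 0.5, 0, 0.8, 0, 1.6, 2.7, 1.4, 0, 1.2, 0.7, 0, 2.7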
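The rate columns express victims per 100,000 population. A minimal sketch of that arithmetic follows, using the 543 homicides reported for Canada in 2012. The population figure is an assumption chosen for illustration (roughly 35 million) and does not come from the slides.

```python
def rate_per_100k(victims, population):
    """Return the number of victims per 100,000 population."""
    return victims / population * 100_000


ASSUMED_POPULATION = 35_000_000  # illustrative assumption, not from the slides

print(round(rate_per_100k(543, ASSUMED_POPULATION), 1))  # 1.6
```

Small CMAs show why rates in such a table can swing sharply from year to year: one or two additional victims change the numerator by a large fraction.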
2009 RDC Pilot - Homicide Survey
Homicide Survey was available through RDCs
Results positive, 4 proposals submitted and 3 research reports completed
Researchers commented on the ease of use of data file, documentation and wealth of data/information
Researchers noted that vetting of data tables took too long
RDC analysts noted that data disclosure rules were difficult to implement and required additional work
Disclosure Issues: What are we concerned about?
Statistics Act, paragraph 17(1)(b):
No person […] shall disclose […] any information obtained under this Act in such a manner that it is possible from the disclosure to relate the particulars obtained from any individual return to any identifiable individual person, business or organization.
Main disclosure issues:
• Identity disclosure: can identify an individual
• Attribute disclosure: learn something new
Group attribute disclosure: learn something about a group
• Residual disclosure: disclosure by combining results
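Residual disclosure is easiest to see with differencing: two tables that each look harmless can be subtracted to expose small counts. The sketch below uses invented counts and hypothetical table names purely for illustration.

```python
# Hypothetical counts, invented for illustration only.
# Table A: incidents by victim age group for a whole province.
province_counts = {"0-17": 40, "18-24": 55, "25+": 120}
# Table B: the same breakdown for the province excluding one small town.
province_minus_town = {"0-17": 39, "18-24": 55, "25+": 118}


def difference(table_a, table_b):
    """Subtract table_b from table_a cell by cell."""
    return {key: table_a[key] - table_b[key] for key in table_a}


town_counts = difference(province_counts, province_minus_town)
print(town_counts)  # {'0-17': 1, '18-24': 0, '25+': 2}
```

Neither input table contains a small cell, yet the difference reveals counts of 1 and 2 for the town, which is why vetting one table at a time cannot rule out this kind of disclosure.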
RDC disclosure control rules for tabular administrative justice data
Scoring approach developed by the Institut de la Statistique du Québec and used for all STC administrative datasets in the RDCs
• assign a sensitivity score to each variable
• table’s score = sum of variables’ scores
• if table score greater than a threshold value, cannot release table
Go back and use more aggregated variables with lower scores
Or
• perform controlled rounding
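The "go back and use more aggregated variables" step can be sketched as a search over the available levels of detail. In the sketch below the threshold of 7 and the relationship scores (4 detailed, 3 aggregated) are taken from later slides; the geography and age scores, and all names, are hypothetical.

```python
from itertools import product

THRESHOLD = 7  # tables scoring above this cannot be released

# Each variable maps to its levels of detail and their scores.
# Only the relationship scores come from the slides; the rest are hypothetical.
VARIANTS = {
    "relationship": {"detailed": 4, "aggregated": 3},
    "geography": {"CMA": 3, "province": 2, "region": 1},
    "age": {"detailed": 3, "aggregated": 2},
}


def releasable_versions(variables, variants=VARIANTS, threshold=THRESHOLD):
    """List every choice of detail levels whose summed score passes,
    highest-scoring (most detailed) first."""
    options = [list(variants[name].items()) for name in variables]
    passing = []
    for combo in product(*options):
        score = sum(level_score for _, level_score in combo)
        if score <= threshold:
            levels = {name: level for name, (level, _) in zip(variables, combo)}
            passing.append((score, levels))
    passing.sort(key=lambda item: item[0], reverse=True)
    return passing


versions = releasable_versions(["relationship", "geography", "age"])
best_score, best_levels = versions[0]
print(best_score, best_levels)
```

Listing all passing versions, rather than only the first, leaves the choice of which variable to coarsen with the researcher.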
Reviewed all variables to appear on the RDC files
Identified variables to be excluded due to:
unique identifiers
• name of victim/accused, date of birth of victim/accused, fingerprint of accused, incident file identifier
data quality issues
• aboriginal variable, firearm information (registered, licensed)
too sensitive
• homicide victim was pregnant, blood alcohol level of homicide victim, person accused of homicide has suspected mental or developmental disorder
Aggregated sensitive codes of variables
Incident clearance status (UCR, Homicide Survey)
• suicide of accused → cleared otherwise
Most serious violation aggregations
Homicide Survey
• 1st degree murder, 2nd degree murder → murder
• manslaughter, infanticide → other homicides
UCR
• sexual violations against children → other sexual assaults
Scored all UCR variables to appear on the RDC files
0 = not sensitive
• region=national; sex of victim/accused; vehicle type; target vehicle; motor vehicle recovered; fraud type; property stolen; location of incident; attempted vs completed violation; most serious weapon status
1-7 = sensitive but can be used in a table
8 = sensitive, cannot appear in a table
• police service id, exact date of incident
Table threshold: ≤7 pass; ≥ 8 fail
Sensitive variables on the UCR and Homicide surveys
Variables deemed sensitive (score 1-7)
geography (region, province, CMA)
age of victim/accused (aggregated, detailed)
most serious violation (aggregated)
most serious weapon (aggregated, detailed)
clearance status (aggregated, detailed)
level of injury (aggregated, detailed)
relationship of victim and accused (detailed, aggregated)
Detailed relationship between victim and accused (score=4)
Homicide victim was killed by:
Husband/wife
Common-law husband/wife
Divorced husband/wife
Same-sex spouse
Father/mother
Separated husband/wife
Separated common-law h/w
Extra-marital lover
Ex same-sex spouse
Step-father/mother
Son/daughter
Sister/brother
Close friend
Authority figure
Criminal relationship
Stranger
Step-Son/step-daughter
Other family
Other intimate relationship
Neighbour
Business relationship
Casual acquaintance
Other
Aggregated relationship between victim and accused (score=3)
Homicide victim was killed by:
Family – spouse
Family – parent
Family – other
Other intimate relationship
Casual acquaintance
Criminal relationship
Stranger
Other
Unknown, n/a
Factors considered when scoring a variable
Scores, thresholds consistent across surveys
Maximum number of dimensions for RDC tables
• 8 dimensions for UCR; 3 for Homicide
Homicide data: single year vs 10 year data
Wanted scores to work for all CCJS tables:
• UCR scores: passed all CANSIM and Juristat tables
• Homicide scores: passed all CANSIM tables but not all Juristat tables
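One way to picture this calibration step is a check that runs candidate scores against tables that have already been published, alongside the dimension limits. The sketch below uses the threshold and dimension limits quoted on the slides; the candidate scores and table layouts are hypothetical.

```python
THRESHOLD = 7  # pass/fail threshold from the slides
MAX_DIMENSIONS = {"UCR": 8, "Homicide": 3}  # limits from the slides


def check_table(survey, variables, scores):
    """Return (passes, reason) for a proposed table of frequency counts."""
    if len(variables) > MAX_DIMENSIONS[survey]:
        return False, "too many dimensions"
    total = sum(scores[name] for name in variables)
    if total > THRESHOLD:
        return False, f"score {total} exceeds threshold {THRESHOLD}"
    return True, f"score {total}"


def rejected_published_tables(survey, published_tables, scores):
    """Return the already-published tables that the candidate scores would fail."""
    return [
        table for table in published_tables
        if not check_table(survey, table, scores)[0]
    ]


# Hypothetical candidate scores and published table layouts.
candidate_scores = {"province": 2, "sex": 0, "violation": 2, "relationship": 4}
published = [
    ["province", "sex"],
    ["province", "violation", "relationship"],
]

print(rejected_published_tables("Homicide", published, candidate_scores))
# [['province', 'violation', 'relationship']]
```

An empty result would mean the candidate scores are consistent with past releases; a non-empty one points to scores that may be set too high, or to published tables that deserve a second look.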
Factors considered when scoring a variable
Principle behind scoring approach:
• table is safe as long as sensitive characteristics cannot be attributed to a person or a group
Scrutinized tables with scores < 8 for sensitive characteristics revealed through:
Identity disclosure
• Examined cells with counts of 1 or 2
Attribute disclosure
• Examined full cells, zero cells
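The cell-level scrutiny described above can be sketched as a scan over a table of counts. The interpretation used here is an assumption: a "full cell" is taken to mean a cell holding an entire row's total, so that everyone in that row shares the attribute, and the example counts are invented.

```python
def flag_cells(table):
    """Flag cells worth a closer look in a table of frequency counts.

    `table` maps a row label to a dict of {column label: count}.
    """
    flags = []
    for row, cells in table.items():
        row_total = sum(cells.values())
        for column, count in cells.items():
            if count in (1, 2):
                flags.append((row, column, "small cell"))  # identity disclosure risk
            elif count == 0:
                flags.append((row, column, "zero cell"))  # attribute disclosure risk
            elif count == row_total:
                flags.append((row, column, "full cell"))  # attribute disclosure risk
    return flags


# Hypothetical counts, invented for illustration only.
example = {
    "Stranger": {"firearm": 0, "knife": 2, "physical force": 31},
    "Neighbour": {"firearm": 0, "knife": 0, "physical force": 7},
}

for flag in flag_cells(example):
    print(flag)
```

A flag is only a prompt for review; whether a flagged cell actually reveals a sensitive characteristic still depends on the variables involved.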
Extract of UCR table with score = 7
Sexual violation incidents, victim = female aged 25-34, accused = male, Canada, 2011
Rows: relationship between victim and accused (friend, business, criminal, casual, stranger, step-parent, step-child, other intimate, neighbour, total)
Columns: weapon causing injury (unknown, physical force, firearm, knife, other, n/a, total)
Status of UCR, Homicide RDC pilots
UCR:
Crime data for 2007-2011 available in RDCs
7 research proposals submitted and accepted
Disclosure control vetting committee for the pilot
• ensure disclosure control rules applied correctly
• evaluate/fine tune disclosure control approaches
Homicide:
Homicide data for 1961-2011 available in RDCs
Pros and cons of scoring
Pros
• easy for RDC researchers and CCJS to apply rules
• rules are consistently applied
• no distortion of data
Cons
• determining scores and thresholds is time-consuming
• difficult to determine scores if lots of variables or variables have lots of categories
• for Homicide, the pass/fail scoring approach for RDCs is very restrictive
• not immune to residual disclosure
Conclusion
The scoring approach for frequency counts works well:
• for crime-reported data and effectively mimics subject matter experts’ judgement when vetting
• for census administrative data with an extensive history of published tables that set the standard for releasing tables
• when there are a manageable number of variables and categories within variables
Once developed, the scoring approach is easy to apply
For more information, please contact:
Sarah Franklin
Sarah.Franklin@statcan.gc.ca