ECDF and Social Science Research Paul Norris Research Fellow (School of Law)

pnorris@staffmail.ed.ac.uk
Data Analysis in the Social Sciences
• Data analysis in the social sciences can appear small compared to other subjects
    • Datasets typically have only a few thousand cases
    • Models often include only a handful of variables
• Analytical methods are increasingly computer intensive
• Social scientists typically use GUI-based software
Potential Benefits of Using ECDF in the Social Sciences
• There are three major benefits to using ECDF:
    1) Ability to run models which won’t run on a standard PC
    2) Increased speed
    3) Reduced cost
Multiple Ways to Use Multiple Cores
• Some statistical routines within R are written to run across multiple cores
    • Propensity Score Matching using GenMatch
    • SPRINT project at EPCC
• Some statistical routines are “embarrassingly parallel”
    • Multiple Imputation
    • Latent Class based models
• Some models use only one core but multiple chunks of memory
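Multiple imputation is embarrassingly parallel because each imputed dataset is analysed independently and the results are only combined at the end. A minimal sketch of that pattern follows, written in Python purely for illustration (the slides themselves refer to R routines). The function names `analyse`, `pool` and `analyse_all`, and the use of a simple mean as the "model", are assumptions made for this example; the pooling step follows Rubin's rules.

```python
from concurrent.futures import ProcessPoolExecutor
from statistics import mean, variance


def analyse(dataset):
    """Estimate the mean of one completed dataset and its sampling variance."""
    return mean(dataset), variance(dataset) / len(dataset)


def pool(results):
    """Combine per-dataset (estimate, variance) pairs using Rubin's rules."""
    m = len(results)
    estimates = [est for est, _ in results]
    within = mean([var for _, var in results])   # average within-imputation variance
    between = variance(estimates)                # variance between imputations
    return mean(estimates), within + (1 + 1 / m) * between


def analyse_all(datasets, workers=1):
    """Analyse each imputed dataset independently, then pool the results.

    With workers > 1 the independent analyses are spread over processes.
    """
    if workers == 1:
        results = [analyse(d) for d in datasets]
    else:
        with ProcessPoolExecutor(max_workers=workers) as executor:
            results = list(executor.map(analyse, datasets))
    return pool(results)
```

With `workers=1` the datasets are analysed one after another; a larger value hands each dataset to a separate process, which is where extra cores would help. When using more than one worker from a script, the call would normally sit under an `if __name__ == "__main__":` guard.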
Reporting Crime to the Police
• Between 1992 and 2002 the proportion of crime reported to the police in Scotland fell
• Changes in an aggregate pattern can be attributed to two types of underlying shift:
    • Model Change Effects – the behaviour of individuals (with identical characteristics) changes over time
    • Distributional Effects – the makeup of the “population” changes over time
• Propensity Score Matching is one technique which can help to separate these two effects
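One way matching can separate the two kinds of shift is to pair each 2002 case with a 1992 case that has a similar propensity score, and then look at how the matched 1992 cases behaved. The sketch below is a simple nearest-neighbour match with replacement, not the genetic matching routine mentioned elsewhere in these slides. The function names and the toy scores and outcomes are hypothetical, and the propensity scores are assumed to have been estimated already.

```python
def nearest_neighbour_match(target_scores, pool_scores):
    """Return, for each target score, the index of the closest pool score.

    Matching is done with replacement, so a pool case can be reused.
    """
    return [
        min(range(len(pool_scores)), key=lambda j: abs(pool_scores[j] - t))
        for t in target_scores
    ]


def matched_rate(target_scores, pool_scores, pool_outcomes):
    """Share of matched pool cases whose outcome is 1 (e.g. crime reported)."""
    matches = nearest_neighbour_match(target_scores, pool_scores)
    return sum(pool_outcomes[j] for j in matches) / len(matches)


# Toy data (hypothetical): propensity scores for two 2002 cases and four
# 1992 cases, plus whether each 1992 case was reported (1) or not (0).
scores_2002 = [0.20, 0.75]
scores_1992 = [0.10, 0.25, 0.70, 0.90]
reported_1992 = [1, 0, 1, 0]

print(nearest_neighbour_match(scores_2002, scores_1992))
print(matched_rate(scores_2002, scores_1992, reported_1992))
```

The matched rate estimates how a population resembling the 2002 cases would have reported under 1992 behaviour, which is the kind of counterfactual the decomposition relies on.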
Reporting Crime to the Police
Crime reported to the police (%):

                            Reporting Behaviour
Population Distribution     1992      2002
1992                        55.7      55.0
2002                        50.1      49.3

Matching on crime type, gender, age, social class, ethnicity, household income, weapon used, threat used, doctor visited, insurance claimed, value of damage/theft, injury, took place at home, tenure and marital status
• Change in population of crimes and victims seems to have lowered reporting rates
• Reporting behaviour also slipped (but the change is not statistically significant)
• Change in reporting seems to be most related to distributional changes
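The arithmetic behind these conclusions can be sketched as follows. The example assumes the four figures on this slide are reporting rates in percent, indexed by the year of the population distribution and the year of the reporting behaviour (55.7 for 1992 population with 1992 behaviour, 55.0 for 1992 population with 2002 behaviour, 50.1 for 2002 population with 1992 behaviour, 49.3 for 2002 population with 2002 behaviour). The function name `decompose` is hypothetical.

```python
# Reporting rates (%), read as rates[population_year][behaviour_year].
# This layout is an assumption about how the slide's table is organised.
rates = {
    1992: {1992: 55.7, 2002: 55.0},
    2002: {1992: 50.1, 2002: 49.3},
}


def decompose(rates, start=1992, end=2002):
    """Split the overall change into distributional and behavioural parts."""
    total = rates[end][end] - rates[start][start]
    # Change the population first, holding behaviour at the start year.
    distributional = rates[end][start] - rates[start][start]
    # Then change behaviour, holding the population at the end year.
    behavioural = rates[end][end] - rates[end][start]
    return total, distributional, behavioural


total, distributional, behavioural = decompose(rates)
print(round(total, 1), round(distributional, 1), round(behavioural, 1))
```

Under this reading, most of the fall of about 6.4 percentage points comes from the change in population (about 5.6 points), with the change in behaviour contributing under a point. Measuring the behavioural change on the 1992 population instead (55.0 against 55.7) gives a similar small figure.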
Speed Gains From Multiple Cores
• Genetic matching is very computer intensive
• The R routine can be run on a computer cluster
[Figure: time in seconds (axis from 0 to 2500) against number of processors used for calculations (desktop single core, then 2 to 7 processors). Analysis based on example dataset from Sekhon (2007), which contains 185 treatment cases and matches on 10 variables.]
Reduced Costs From Using ECDF
• One alternative to using ECDF might be a separate desktop PC
• “Select PC” costs £700
• ECDF costs for Police Reporting analysis were tiny in comparison
• ECDF resources are also better suited to short bursts of intensive analysis
Conclusions
• ECDF may seem of limited relevance to the social sciences
• Likely to have a steep learning curve
• Benefits can include:
    • Chance to run models “too big” for normal PC
    • Increased speed of analysis
    • Reduced costs