Handling social science data: Challenges and responses
Paul Lambert, University of Stirling
DAMES research Node, www.dames.org.uk
DIR workshop: Handling Social Science Data, 17/MAR/2010
What is social science data?
Example: accessing surveys via the UK Data Archive, using Shibboleth authentication, then downloading them to analyse in Stata, SPSS, etc.
Principal forms of data…
• ‘Large and complex social surveys’
   • Longitudinal; cross-national; hierarchical
• Small-scale social surveys
• Administrative data (e.g. ADMIN node; ADLS; commercial data)
• Supplementary (digital) data
   • E.g. ‘GESDE’ services at DAMES
• Qualitative material – audio / video / textual
Large and complex social surveys
• several thousand variables
• tens of thousands of cases (micro-data)
• additional complex survey data features (e.g. household clustering)
Complex data example: British Household Panel Survey dataset [SN 5151]

. xtdes, i(pid) t(year)

     pid:  10002251, 10004491, ..., 1.794e+08                n =      31877
    year:  1991, 1992, ..., 2007                             T =         17
           Delta(year) = 1 unit
           Span(year)  = 17 periods
           (pid*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       1       2         6         9      17      17

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+-------------------
      4294     13.47   13.47 |  11111111111111111
      2726      8.55   22.02 |  ........111111111
      2032      6.37   28.40 |  ..........1111111
      1224      3.84   32.24 |  ......11111......
       964      3.02   35.26 |  1................
       840      2.64   37.90 |  ..........1......
       632      1.98   39.88 |  ........1........
       631      1.98   41.86 |  ................1
       593      1.86   43.72 |  11...............
     17941     56.28  100.00 | (other patterns)
 ---------------------------+-------------------
     31877    100.00         |  XXXXXXXXXXXXXXXXX
• This example shows the BHPS being analysed in Stata.
• The BHPS has re-contacted its subjects annually since 1991: 4,294 people were interviewed as adults in every one of the 17 years.
• Both the analysis methods and the measurement issues over time are challenging.
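As a rough illustration of what the xtdes summary reports, the sketch below counts the panel members (n), each member's number of observed waves (T_i) and the participation patterns, where '1' marks a year in which the member is observed and '.' a year in which they are not. It assumes plain standard-library Python with the panel held as long-format (pid, year) pairs; the function name and the toy records are hypothetical, and this is not how Stata itself implements xtdes.

```python
from collections import Counter


def describe_panel(records, years):
    """Summarise long-format (pid, year) records in the spirit of Stata's xtdes.

    Returns the number of panel members n, a dict of T_i (waves observed per
    member) and a Counter of participation patterns ('1' = observed, '.' = not).
    """
    years = sorted(years)
    waves = {}  # pid -> set of years in which that member is observed
    for pid, year in records:
        waves.setdefault(pid, set()).add(year)

    t_i = {pid: len(observed) for pid, observed in waves.items()}
    patterns = Counter(
        "".join("1" if y in observed else "." for y in years)
        for observed in waves.values()
    )
    return len(waves), t_i, patterns


# Toy data (invented for illustration): three members over 1991-1993
records = [(1, 1991), (1, 1992), (1, 1993), (2, 1992), (2, 1993), (3, 1991)]
n, t_i, patterns = describe_panel(records, range(1991, 1994))
for pattern, freq in patterns.most_common():
    print(freq, pattern)
```

Storing each member's years as a set means a repeated (pid, year) pair is counted once, matching the requirement that pid*year uniquely identifies each observation.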
. tab year

       year |      Freq.     Percent        Cum.
------------+-----------------------------------
       1991 |     10,264        4.57        4.57
       1992 |      9,845        4.38        8.95
       1993 |      9,600        4.27       13.23
       1994 |      9,481        4.22       17.45
       1995 |      9,249        4.12       21.56
       1996 |      9,438        4.20       25.77
       1997 |     11,193        4.98       30.75
       1998 |     10,906        4.86       35.60
       1999 |     15,623        6.96       42.56
       2000 |     15,603        6.95       49.51
       2001 |     18,867        8.40       57.91
       2002 |     16,597        7.39       65.29
       2003 |     16,238        7.23       72.52
       2004 |     15,791        7.03       79.55
       2005 |     15,627        6.96       86.51
       2006 |     15,392        6.85       93.36
       2007 |     14,910        6.64      100.00
------------+-----------------------------------
      Total |    224,624      100.00
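The Percent and Cum. columns of the tab year output are simple arithmetic on the frequencies: percent = 100 × freq / total, and the cumulative column applies the same formula to the running total. The sketch below reproduces them in standard-library Python from the counts shown on the slide; the function name is hypothetical and the code is not Stata's own implementation.

```python
def tabulate(counts):
    """Frequency table in the style of Stata's tab: (value, freq, percent, cum. percent).

    Cumulative percentages are computed from cumulative frequencies, so the last
    row reaches exactly 100 rather than accumulating rounding error.
    """
    total = sum(counts.values())
    rows, running = [], 0
    for value in sorted(counts):
        freq = counts[value]
        running += freq
        rows.append((value, freq,
                     round(100 * freq / total, 2),
                     round(100 * running / total, 2)))
    return rows


# Person-year observations per BHPS year, as reported by tab year above
bhps_year_counts = {
    1991: 10264, 1992: 9845, 1993: 9600, 1994: 9481, 1995: 9249, 1996: 9438,
    1997: 11193, 1998: 10906, 1999: 15623, 2000: 15603, 2001: 18867,
    2002: 16597, 2003: 16238, 2004: 15791, 2005: 15627, 2006: 15392,
    2007: 14910,
}
for row in tabulate(bhps_year_counts):
    print(*row)
```

Summing the rounded percentages instead would let the rounding errors build up, which is why the running frequency is kept as an integer.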
Supplementary (digital) data
• E.g. ‘Occupational information resources’ (OIRs): data files containing information on occupations, which can usefully be linked to micro-data that record individuals’ occupations
• E.g. GEODE acts as a library of OIRs, www.geode.stir.ac.uk
• Such resources are often not widely known about, but they can enhance analysis
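Linking an OIR to micro-data is, at its simplest, a keyed lookup from each respondent's occupation code to the occupation-level information in the resource. The standard-library Python sketch below assumes the OIR has already been loaded as a dict keyed by occupation code; the function name, the occupation codes and the `status_score` values are all hypothetical and are not taken from GEODE or any real OIR.

```python
def link_oir(microdata, oir, key, fields):
    """Attach occupation-level fields from an OIR lookup table to individual records.

    microdata: list of dicts, one per respondent, each holding an occupation code
    oir:       dict mapping occupation code -> dict of occupation-level information
    Codes missing from the OIR get None, so gaps in coverage stay visible.
    """
    linked = []
    for record in microdata:
        info = oir.get(record[key])  # None if this code is not in the OIR
        merged = dict(record)        # copy, so the input records are left untouched
        for name in fields:
            merged[name] = info.get(name) if info is not None else None
        linked.append(merged)
    return linked


# Hypothetical codes and scores, invented purely for illustration
oir = {"2311": {"status_score": 70}, "9121": {"status_score": 25}}
people = [{"pid": 1, "occ_code": "2311"}, {"pid": 2, "occ_code": "0000"}]
linked = link_oir(people, oir, key="occ_code", fields=["status_score"])
```

Keeping unmatched codes as None, rather than dropping those respondents, makes it easy to check how well a given OIR covers the coding scheme used in a survey.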
Example: Qualitative data used by ‘Digital Records for e-Social Science’ (DReSS)
• transcribed talk
• audio / video
• digital records
• system logs
• location
[Figure: panels labelled video, code tree, transcript, system log]
Three well-known challenges
• We’re data rich, but analyst poor
   • UK Data Forum (2007); Wiles et al. (2009)
   • Under-use of suitably complex statistical models
• Coordination and communication on data processing
   • Recodes / standardisation / harmonisation / documentation
   • Such data work is not rewarded or incentivised for researchers
   • Lack of generic, accessible representations of tasks
• Limited disciplinary/project/researcher cross-over when dealing with data
   • Specific software orientations
These are not generally problems of scale, but of organisation.
‘Managed’ responses?
• Data handling/analysis capacity-building
   • ESRC programmes (NCRM, RDI, RMP); training workshops/materials; P/G funds; strategic research grant investment
• Documentation/replication policies
   • Dale (2006)
• Software for data access and analysis
   • NESSTAR – UK Data Archive data/metadata browser
   • Long (2009) on the Stata software
   • Remote access to data (e.g. the Secure Data Service, SDS)
…train and/or constrain the analysts…
Train them →
…constrain the analysis…
Non-hierarchical responses?
Technological collaborative services might, in principle, support effective, unmanaged data access, coordination and exploitation.
UK e-Social Science investment in data-oriented social science research support: NeISS; E-Stat; DAMES; Obesity e-Lab; CQeSS
…some examples…
E-Stat
• Design a tool to specify complex statistical models in generic / visual terms
• Multilevel models
• Multiple data permutations and analytical alternatives
• Ready access to a suite of complex modelling tools
National e-Infrastructure for Social Simulation (NeISS)
• Expert-led simulation demonstrations
• Combining data resources
• Workflows for the simulation analysis
• Modify and re-specify existing simulation templates
DAMES – online services for data coordination/organisation
• Tools for handling variables in social science data
• Recoding measures; standardisation / harmonisation; linking; curating
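To make "recoding" and "standardisation / harmonisation" concrete, the sketch below maps two surveys' own education codes onto one shared scheme and converts a numeric measure to z-scores. It is a standard-library Python illustration only: the recode maps, category labels and function names are hypothetical, z-scoring is just one common reading of "standardisation", and none of this reflects how the DAMES tools themselves work.

```python
from statistics import mean, pstdev

# Hypothetical recode maps: each survey's own education codes -> one shared scheme
RECODES = {
    "survey_a": {1: "degree", 2: "secondary", 3: "none"},
    "survey_b": {"HE": "degree", "A-level": "secondary", "GCSE": "secondary", "None": "none"},
}


def harmonise(values, source):
    """Recode one survey's values into the shared scheme; unmapped codes become None."""
    mapping = RECODES[source]
    return [mapping.get(v) for v in values]


def standardise(values):
    """Convert a numeric measure to z-scores, (x - mean) / population SD."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(x - mu) / sd for x in values]


print(harmonise([1, 3, 7], "survey_a"))      # ['degree', 'none', None]
print(harmonise(["GCSE", "HE"], "survey_b"))  # ['secondary', 'degree']
```

Holding the recode maps as data rather than burying them in code means they can be documented, shared and reused across projects, which is the kind of coordination the slides argue for.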
GESDE – Search and browse supplementary data on occupations; educational qualifications; ethnicity
• Data curation tool (for collecting metadata)
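The slides do not say how the curation tool stores the metadata it collects. Purely as an assumption, a record for one derived variable might hold a name, a label, its source dataset and an ordered log of processing steps, which also supports the documentation-for-replication concern raised in the talk. The class, its fields and the example values below are hypothetical, sketched in standard-library Python.

```python
from dataclasses import dataclass, field


@dataclass
class VariableMetadata:
    """Hypothetical minimal metadata record for one derived variable."""
    name: str
    label: str
    source_dataset: str
    steps: list = field(default_factory=list)  # ordered processing steps

    def log(self, step):
        """Record a processing step so the variable can be re-derived later."""
        self.steps.append(step)
        return self

    def missing_fields(self):
        """Names of required descriptive fields that are still empty."""
        return [f for f in ("name", "label", "source_dataset") if not getattr(self, f)]


meta = VariableMetadata("educ_h", "Harmonised education (3 categories)", "BHPS [SN 5151]")
meta.log("recode survey-specific codes to degree / secondary / none")
```

A completeness check such as `missing_fields` is one simple way a collection tool could prompt users for the descriptive information that is most often left out.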
Handling data: analysis-oriented data management priorities
• {Data collection or creation}
• Data preservation or curation
• Data enhancement/modification
• Data analysis
• Multiple permutations of related analyses
• Documentation and replication
Ideas on the future of social science research data
• Enduring challenges of documentation for replication, and of coordination
• Ever more comparative analysis
• Harmonisation and standardisation
• Data linkage and data enhancement
• Models for complex multiprocess systems
• Fluency – increasing uptake by more users
References and Links
• ADLS: http://www.adls.ac.uk/
• ADMIN Node: http://www.ncrm.ac.uk/about/organisation/Nodes/ADMIN/
• DAMES Node: http://www.dames.org.uk/
• DReSS: http://web.mac.com/andy.crabtree/NCeSS_Digital_Records_Node/
• Secure Data Service: http://securedata.ukda.ac.uk/
• UK Data Archive: http://www.data-archive.ac.uk/
• Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.
• Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
• Wiles, R., Bardsley, N., & Powell, J. L. (2009). Consultation on research needs in research methods in the UK social sciences. Southampton: University of Southampton / ESRC National Centre for Research Methods. http://eprints.ncrm.ac.uk/810/