NILS Research Forum
Belfast
22 October 2010
Lorraine Dearden
Director ADMIN
Institute of Education
Email: l.dearden@ioe.ac.uk
Introduction
 In current economic climate, using and linking
administrative data very important for policy analysis
 Scope for well funded longitudinal surveys going to be put
under pressure
 Also, for countries like NI, sample sizes in survey data not
always satisfactory
 NILS is a very welcome addition for researchers
 Indeed colleagues at ADMIN are using it to look at issues to do
with health and migration
 But limited in scope as to what issues you can use it for and
could be significantly enhanced with other administrative
data
Why so important to make better
use of Administrative Data?
 Administrative data has already been collected for
administrative purposes so money spent
 But the potential it gives for those interested in
making sound policy advice immense if used correctly
 Allows one to potentially follow multiple cohorts over
time (longitudinal data) which is something survey
data can rarely do
 Sample size issues disappear in general which is very
important when doing within country analysis
So why hasn’t it happened?
 Fears over data protection...
 But this is always an issue with any individual level data,
and instances of researchers inappropriately using
data are virtually unheard of
 The individual level data is highly disclosive but researchers
never look at nor report anything that is disclosive
 But it is essential that this information is in their data at the
individual level
 Major issues around disclosure and data protection
have been centred around agencies holding the
administrative data
So how far have we got on this?
 Have various LS with Scotland the most advanced in
terms of linkage (including linkage to schools data)
 Serious discussions in government about whether
Censuses could be replaced by linking administrative
data
 So politicians and policy makers are talking about it
 Certain departments in Whitehall have started linking
administrative data sets for internal use (ONS)
whereas others have linked data for research projects
for them (e.g. DWP) and yet others for general
research purposes (DfE and BIS)
Another important development
 There is increasing linkage of survey data to Administrative
data where consent has been obtained from the individuals
in the survey
 Longitudinal Survey of Young People in England (linked to
NPD data)
 MCS (and ALSPAC) linked to hospital registration data, NPD
data and now have permissions to link to Hospital Episodes
Data, Economic Data held by DWP and HMRC (for both
parents) as well as NPD data for all siblings of CM
 ELSA has linked to health and economic data and NCDS and
MCS are about to do this as well
 Innovation Panel of Understanding Society will do this in a
few years with hope of rolling it out to full sample
Why is this important?
 New linked admin/longitudinal data has potential
to:
 Get a better understanding of the implications of
missing covariates in administrative data, which is crucial
if we are going to rely more on administrative data
linkage
 Get a better understanding of implications of
attrition and non-response in survey data
 Allow us to understand the implications and extent
of recall bias in surveys
 Reduce the costs of longitudinal survey data
So what administrative data is
there?
 Some, like data on school children, is country specific
 Others like HESA (Higher Education), DWP and
HMRC data covers all of Great Britain
 Now going to talk a bit about what is out there in terms
of administrative data...
New longitudinal HE admin data
 Linked individual-level administrative data
 School (NPD), FE (ILR/NISVQ) and HE (HESA) records
 Data on participants AND non-participants in HE
 Four cohorts:
 In Year 11 in 2001-02, 2002-03, 2003-04 and 2004-05
 Potential age 18/19 HE entry in 2004-05, 2005-06, 2006-07 and 2007-08 (or age 19/20 entry in 2005-06, 2006-07 and
2007-08)
 State and private school students
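The mapping from Year 11 cohort to potential HE entry year can be sketched as below. This is only an illustration: it assumes age 18/19 entry falls three academic years after Year 11 and age 19/20 entry four years after, and that the data end with 2007-08 entry. The function and constant names are hypothetical.

```python
# Assumption: the linked data stop at HE entry in academic year 2007-08.
LAST_HE_YEAR_START = 2007


def academic_year(start):
    """Format an academic year from its starting calendar year, e.g. 2001 -> '2001-02'."""
    return f"{start}-{(start + 1) % 100:02d}"


def potential_entry(year11_start):
    """Return potential HE entry years for a cohort in Year 11 in `year11_start`.

    Assumed lags: 3 academic years to age 18/19 entry, 4 to age 19/20 entry.
    Entry years beyond the end of the data are returned as None.
    """
    out = {}
    for label, lag in (("age 18/19", 3), ("age 19/20", 4)):
        start = year11_start + lag
        out[label] = academic_year(start) if start <= LAST_HE_YEAR_START else None
    return out


for cohort in (2001, 2002, 2003, 2004):
    print(academic_year(cohort), potential_entry(cohort))
```

Under these assumptions the last cohort is only observed for age 18/19 entry, which matches the three age 19/20 entry years listed above.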
Data
 Socio-economic background
 Free school meals status from PLASC
 IMD quintiles based on home postcode (age 16)
 Gender, MOB and school ID available for all
 Ethnicity, EAL, SEN from PLASC
 Missing for private school kids
 Neighbourhood measure of parental education
based on 2001 Census
 Based on home postcode for state school analysis
 Based on school postcode when include private school
kids
Data
 Prior attainment
 State school :
 Average point score at Key Stage 2, 3, 4 and 5 (plus indicators of
reaching expected level at Key Stage 4 and 5)
 Private school :
 Key Stage 4 and 5 results only
Integrated administrative data set
 School data
 Census of school children with individual
characteristics of all pupils e.g. gender, ethnicity
 Prior achievement from age 11 through to 18
 Individual Learner Record
 FE college attended
 Participation and qualifications achieved
 Higher Education data
 Detailed information on degree subject, institution,
degree class awarded for all those participating in HE
Destinations of Leavers from Higher
Education survey (DLHE)
 Early DLHE Survey (surveys graduates 6 months out of
university) – only preliminary snapshot of graduate
success
 In 2006, HESA carried out a follow up to the Early
DLHE Survey → Longitudinal DLHE – 3 years after
graduation
 Contains full details of HE plus wages / occupation 3
years after graduation
Longitudinal DLHE
 Can tell us early value of degrees
 By subject
 By institution
 Possibly by subject and institution (subject to sample
size)
 Data essentially owned by universities so would need
their permission to do this
What data is included within NPD?
 Foundation Stage Profile (keys: PupilID, Academic Year, Lea/Estab)
 Key Stage 1 Results (keys: PupilID, Academic Year, Lea/Estab)
 Key Stage 2 Results (keys: PupilID, Academic Year, Lea/Estab)
 Year 7 Progress Test Results (keys: PupilID, Academic Year, Lea/Estab)
 Key Stage 3 Results (keys: PupilID, Academic Year, Lea/Estab)
 Key Stage 4 Candidate (keys: PupilID, Academic Year, Lea/Estab)
 Key Stage 4 Indicators
 Key Stage 4 Results
 Key Stage 5 Candidate (keys: PupilID, Academic Year, Lea/Estab)
 Key Stage 5 Indicators
 Key Stage 5 Results
 Core Pupil (keys: PupilID, Academic Year, Lea/Estab, Pupil postcode)
 Schools census (formerly PLASC) (keys: PupilID, Academic Year, Lea/Estab, Pupil postcode)
 Information Learner Record - Aims (keys: PupilID, Academic Year, Lea/Estab)
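Because the NPD tables share the keys PupilID, Academic Year and Lea/Estab, records can in principle be joined on them. The sketch below is a minimal illustration in plain Python, not the DfE's linkage procedure. The field names and the example records are hypothetical, and it is an assumption that linking across different academic years would use PupilID alone.

```python
# Hypothetical field names standing in for PupilID, Academic Year and Lea/Estab.
KEYS = ("pupil_id", "academic_year", "lea_estab")


def index_by_keys(rows, keys=KEYS):
    """Build a lookup from the key tuple to the record."""
    return {tuple(row[k] for k in keys): row for row in rows}


def left_join(left, right, keys=KEYS):
    """Attach matching `right` fields to every `left` record.

    Unmatched left records are kept unchanged; on a field name clash
    the value from `left` wins.
    """
    lookup = index_by_keys(right, keys)
    joined = []
    for row in left:
        match = lookup.get(tuple(row[k] for k in keys), {})
        joined.append({**match, **row})
    return joined


# Made-up records for illustration only.
core_pupil = [
    {"pupil_id": 1, "academic_year": "2003-04", "lea_estab": "0001", "sex": "F"},
    {"pupil_id": 2, "academic_year": "2003-04", "lea_estab": "0001", "sex": "M"},
]
ks4_results = [
    {"pupil_id": 1, "academic_year": "2003-04", "lea_estab": "0001", "ks4_points": 40},
]

linked = left_join(core_pupil, ks4_results)
print(linked)

# Linking on PupilID only (assumed to be the approach across years).
linked_by_pupil = left_join(core_pupil, ks4_results, keys=("pupil_id",))
```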
Main fixed pupil characteristics
from School Census
 Main indicators:
 Sex of child
 Age (month of birth is standard release)
 Ethnic group
 English as an additional language
 Are they time-invariant?
 We might collect several measures of each, e.g. one from each of
KS4, KS2, KS1 sweeps and also up to nine years of Pupil Census
reports from schools
 We think of these characteristics as fairly time-invariant, yet they
vary for a tiny minority of children
 You can place greatest weight on most recent reports, or
alternatively place greatest weight on the modal report of their
characteristic
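The two rules for reconciling repeated reports of a "fixed" characteristic (most recent versus modal) can be sketched as follows. This is an illustration only: the report values "A" and "B" are placeholders, and breaking ties in the modal rule by recency is an assumption, not something stated above.

```python
from collections import Counter


def most_recent(reports):
    """reports: list of (year, value) pairs. Return the value from the latest year."""
    return max(reports, key=lambda r: r[0])[1]


def modal(reports):
    """Return the most frequently reported value.

    Assumption: ties are broken in favour of the most recent report
    among the tied values.
    """
    counts = Counter(value for _, value in reports)
    top = max(counts.values())
    tied = {value for value, n in counts.items() if n == top}
    return most_recent([r for r in reports if r[1] in tied])


# Placeholder reports for one child across five census years.
reports = [(2002, "A"), (2003, "A"), (2004, "B"), (2005, "A"), (2006, "B")]
print(most_recent(reports))  # value from 2006
print(modal(reports))        # value reported most often
```

For the tiny minority of children whose reports disagree, the two rules can give different answers, as in this example.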
Time-variant pupil characteristics
 FSM eligible
 SEN
 Postcode, LLSOA, IDACI rank
 Connexions, gifted and talented (variable school
recording of this)
 Mode of travel (new)
 Part-time, boarder
Obtaining geo-classifications for home addresses
 Standard release:
 DCSF will release a lower level super output area to indicate where
the child lives
 LLSOA – geographical area with a minimum population of 1,000,
nested within census ward boundaries
 Secure release:
 DCSF will release child’s home postcode to researchers who make a
case for it and can show data will be held securely
 Home postcode – geographical area with an average of 11
households, giving a relatively precise (within 100m) geo-location
 WILL NOT release if you just want to attach geo-data to the
postcode (they will do this for you)
 WILL NOT release if you just want to calculate home-school
distances, find the nearest school etc (they will do this for you)
Access to NPD data
 Most researchers can access this data
 Have to outline their research question, the data they
need, make a case for any special additional variables
that are thought to be disclosive (e.g. date of birth,
postcode) and provide evidence that data will be held
securely (never on laptop or desktop etc)
 Data is transferred via an encrypted electronic transfer
 If want to use data for new research project, need to
approach DfE again before using data
NI Schools Data
 Have similar data though not so detailed results data.
Basic outcomes at KS2, KS4 and KS5
 Census data comparable and in some cases more rich
 But have potential to link this to HESA data and
graduate destinations survey as well
Access to linked HESA/NPD data
 This access occurs through BIS who have done the
linkage
 Again need to outline research question and make case
for data
 Again transfer is via electronic encrypted transfer (FTP
site) and host organisation has to demonstrate it has
secure facilities where data will be kept
DWP and HMRC data: WPLS
 The DWP has linked all DWP benefit and program
participants to HMRC employment and earnings data
(from P14 returns) since 1998
 This is called the WPLS (Work and Pensions Longitudinal
Study)
 Permission to link this to FRS, NCDS, MCS and ELSA
surveys as well (consent obtained from individuals in these
surveys)
 A summary of its uses can be found here:
http://statistics.dwp.gov.uk/asd/longitudinal_study/WPLS_Uses.pdf
WPLS
 Researchers have had access to this data when carrying out
work/evaluations for DWP
 What the data does not include is HMRC records for
individuals who have not been on a DWP program or
benefits, so it is not as good as it could be
 But surveys that have sought permission to link to DWP
and HMRC data can link to this additional HMRC data
(e.g. FRS, ELSA, NCDS and MCS)
 Collecting data on benefit receipt is typically difficult to do
in surveys, so this linkage is extremely valuable and saves
survey time costs
 This data covers whole of Great Britain – not just England
HMRC NIC data
 HMRC has records on individual NI contributions since NI
was introduced in 1948
 Originally only 1% of sample was held electronically but
now all of these records are electronically held by HMRC
 The English Longitudinal Study of Ageing (ELSA) has
linked all individuals in its survey who gave consent for
linkage to this NIC data, which means they have earnings
and employment history for their sample from 1948
 Up until recent changes in NI for those above UEL, do not
know earnings above UEL, but this is a reasonably small
proportion for most time periods and no longer an issue
 This data is going to be linked to NCDS and MCS (where
consent rates were in excess of 80%)
Other data
 GP registration data (NILS at forefront here)
 Hospital Episodes Data
 Home Office data on crimes (have individual level
information)
 Birth, marriages and death registration data (NILS
again at forefront here)
How has this linked ADMIN data
been used by researchers?
 Going to shamelessly focus on some of the work I have
done with this data
 Not always successful as I will demonstrate – and this
linked administrative data not always up to research
task
 But has great potential to answer lots of policy relevant
questions
Widening participation in HE
 Joint work with Chowdry, Crawford, Goodman and
Vignoles
 Shows that prior school attainment is main reason
for large gap between rich and poor in:
 HE participation
 Participation in a ‘high status’ university
 Suggests HE funding reforms are not best tool for addressing
social mobility/‘access’ issues.
 Focus instead must be on improving school
attainment amongst poor children
 Uses linked school, FE and HE administrative data to assess
schooling roots of large SEP gap
Month of birth effects
 Joint work with Crawford and Meghir
 Children born in September start school aged 5
whereas those born in August are almost a year
younger
 Does this impact on longer term educational
outcomes?
 Used same linked data to look at this question
 Found being born in August has prolonged impact on
educational outcomes and even reduces probability of
entering HE
Raw differences (proportion getting expected level)
[Figure: proportion reaching the expected level by day of birth (Nov 1985 to Aug 1987), males and females. Panels: Key Stage 2 (age 11), Key Stage 3 (age 14), Key Stage 4 (age 16), Key Stage 5 (age 18), HE participation (age 19/20).]
Summary of findings
 August-born children experience significantly
poorer education outcomes than September-born
children
 Almost entirely due to differences in the age at
which they sit the tests
 Starting school earlier/having more terms of school
is marginally better for August born children at
younger ages
Ethnic Parity in JCP services in UK?
 Joint work with Crawford, Mesnard, Shaw and
Sianesi at IFS
 Ethnic parity:
 No difference on average between Ethnic Minority
and “otherwise identical” White entering the same
JCP office and accessing same program/benefit
 Our aim:
 Get as close as possible to “otherwise identical”
White and see what difference remains
 Calculate results for a range of JCP benefits and
programs
Programs and Benefits
 Incapacity benefit (IB): paid to individuals who are assessed as being incapable of
work and who meet certain National Insurance contributions conditions.
 Income support (IS): a benefit for individuals on low income; usually claimants are
lone parents, sick or disabled, or carers.
 Jobseeker’s allowance (JSA): a benefit paid to individuals of working age who are
unemployed, or who work fewer than 16 hours per week and are looking for full-time work.
 New Deal for Lone Parents (NDLP): a voluntary programme whose aim is to
encourage lone parents to improve their work prospects and help them into work.
 New Deal for individuals aged 25 plus (ND25plus): a programme to
help unemployed individuals aged 25 and over to find and keep a job. Participation is
compulsory for individuals who have been claiming JSA for at least 18 of the previous 21
months.
 New Deal for Young People (NDYP): similar to ND25plus except that it is
targeted on individuals aged 18-24. Participation is compulsory for those who have been
claiming JSA for at least six months.
Controlling for selection
 Control for differences in observed characteristics
between ethnic groups that may affect outcomes
 Data:
 Detailed labour market histories
 Individual background characteristics
 Methods:
 Primarily propensity score matching (PSM)
 Also regression-based methods and conditional
difference in differences (DID)
 Previous LM history may have been affected by
discrimination but nothing we can do about this
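The matching idea can be illustrated with a minimal nearest-neighbour sketch. It is not the authors' implementation: it assumes propensity scores have already been estimated, matches each Ethnic Minority client to the closest White client (with replacement), and drops matches further apart than a caliper. The caliper value, function name and example numbers are all hypothetical.

```python
def match_nearest(minority, white, caliper=0.05):
    """Nearest-neighbour matching on a pre-estimated propensity score.

    minority, white: lists of (propensity_score, outcome) pairs.
    Matching is with replacement. Minority clients whose nearest White
    client is further away than `caliper` are left unmatched.
    Returns (mean outcome gap, number matched, number unmatched).
    """
    gaps = []
    unmatched = 0
    for p_m, y_m in minority:
        p_w, y_w = min(white, key=lambda w: abs(w[0] - p_m))
        if abs(p_w - p_m) <= caliper:
            gaps.append(y_m - y_w)
        else:
            unmatched += 1
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    return mean_gap, len(gaps), unmatched


# Made-up scores and employment outcomes (1 = in employment).
minority = [(0.30, 1), (0.50, 0), (0.95, 1)]
white = [(0.28, 0), (0.52, 1), (0.60, 0)]

gap, n_matched, n_unmatched = match_nearest(minority, white)
print(gap, n_matched, n_unmatched)
```

The count of unmatched clients matters as much as the estimated gap: if many Ethnic Minority clients have no comparable White client, the estimate only describes the subgroup that could be matched.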
Sampling frame
 Sample selected on inflow into programme
 Addresses differential selection off programme
 Inflow window is 2003, allowing:
 3-year pre-inflow labour market history
 1-year follow-up
[Timeline, Jan 2000 to Dec 2004: previous labour market history, inflow window (2003), outcomes.]
Outcomes of interest
 Two dimensions of labour market status
 In employment (15+ days in the month)
 On benefit (15+ days in the month)
 Benefit definition includes:
 IS, IB, JSA, New Deal options, Basic Skills and Work-Based Learning for Adults
 Measured monthly
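The "15+ days in the month" rule can be sketched from spell data as below. This is a minimal illustration, assuming spells are stored as inclusive start and end dates and that overlapping spells should not be double counted. The function names and example spells are hypothetical.

```python
import calendar
from datetime import date, timedelta


def days_covered(spells, year, month):
    """Count distinct days in the given month covered by any spell.

    spells: list of (start_date, end_date) pairs, both inclusive.
    """
    first = date(year, month, 1)
    last = date(year, month, calendar.monthrange(year, month)[1])
    covered = set()
    for start, end in spells:
        day = max(start, first)
        stop = min(end, last)
        while day <= stop:
            covered.add(day)
            day += timedelta(days=1)
    return len(covered)


def in_status(spells, year, month, threshold=15):
    """True if the person is in the status for at least `threshold` days of the month."""
    return days_covered(spells, year, month) >= threshold


# Made-up, partly overlapping employment spells.
spells = [
    (date(2004, 1, 20), date(2004, 2, 10)),
    (date(2004, 2, 8), date(2004, 2, 15)),
]
print(in_status(spells, 2004, 1), in_status(spells, 2004, 2))
```

The same function can be applied separately to employment spells and benefit spells to give the two monthly outcome indicators.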
Data
 Primarily Work and Pensions Longitudinal Study
(WPLS)
 Benefit and employment spells for anyone on a DWP
benefit since mid-1999
 Also contains limited demographics including sex, DOB,
ethnicity and postcode
 Also used National Benefit Database (NBD) and
census information
X variables
 Employment and benefit history
 Past participation in voluntary programmes
 Past participation in Basic Skills
 Individual characteristics
 Gender, age, month of inflow
 Proxies for education and wealth (from census)
 Local area characteristics (region, travel-to-work-area
unemployment)
 Other programme-related information
What did we find?
 For most programs and benefits (with exception of IS and
IB), Minorities and Whites are simply too different for
satisfactory estimates to be calculated and results are
sensitive to the methodology used.
 MASSIVE COMMON SUPPORT PROBLEMS
 This calls into question previous results based on simple
regression techniques, which may hide the fact that
observationally different ethnic groups are being compared
by parametric extrapolation.
 In some cases, e.g. NDLP, depending on the method used we
could find significant ethnic penalties in employment (raw
and DID), no ethnic penalty (regression methods) and
a significant ethnic premium (PSM)
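A common support problem can be made concrete with a simple overlap check on propensity scores. The sketch below uses the min/max rule, which is one standard diagnostic and not necessarily the criterion used in this work. The function name and the scores are hypothetical.

```python
def common_support(minority_scores, white_scores):
    """Min/max common support rule.

    The overlap region runs from the larger of the two group minima to the
    smaller of the two group maxima. Returns (low, high, share of Ethnic
    Minority scores falling outside that region).
    """
    low = max(min(minority_scores), min(white_scores))
    high = min(max(minority_scores), max(white_scores))
    off_support = [p for p in minority_scores if p < low or p > high]
    return low, high, len(off_support) / len(minority_scores)


# Made-up propensity scores for illustration.
minority_scores = [0.4, 0.6, 0.8, 0.9]
white_scores = [0.1, 0.2, 0.5, 0.7]

low, high, share_off = common_support(minority_scores, white_scores)
print(low, high, share_off)
```

A large off-support share means any regression estimate is comparing groups that barely overlap and is relying on parametric extrapolation.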
IB: raw labour market status
[Figure: raw employment and on-benefit rates for Ethnic Minorities and Whites; x-axis from -5 to 12. Panels: Employment, On benefit.]
IB: overall employment result
[Figure: employment differences between Ethnic Minorities and Whites; x-axis from -5 to 12. Panels: Raw Differences, Ethnic Parity.]
Raw Differences (values shown): -0.060**, -0.069**, -0.070**, -0.070**
Ethnic Parity (values shown): -0.003, -0.004, 0.004, 0.006
Reliability of matching: CS(0), UC(28) (i.e. reliable according to our criteria)
IB: overall benefit result
[Figure: on-benefit differences between Ethnic Minorities and Whites; x-axis from -5 to 12. Panels: Raw Differences, Ethnic Parity.]
Raw Differences (values shown): 0.022**, 0.055**, 0.066**, 0.064**
Ethnic Parity (values shown): 0.004, 0.012*, 0.014, 0.007
Reliability of matching: CS(0), UC(28) (i.e. reliable according to our criteria)
Need other methods to do this
properly
 Using administrative data to analyse this question very
problematic
 Problem due to the fact that the Ethnic Minority and
White clients accessing the same JCP office are very
different in the UK with exception of IS and IB
recipients
 Might not be a problem in other countries, but could
be
 Not problem with ADMIN data – just can’t be used for
this question