Report into the feasibility of matching teenage conception data with school census data
Vera Ruddock and Peter Davies, Office for National Statistics
1. Background
There is considerable research and policy interest in teenage pregnancy. The Office for National
Statistics (ONS) publishes statistics on the numbers of women conceiving under the age of 18 [1]. These
figures are derived from birth and abortion records; they hold no information about the social
background and educational achievement of those who conceive.
The Department for Education (DfE) maintains the National Pupil Database (NPD) and the School Census [2, 3].
These two sources contain a wealth of information on the characteristics of pupils in England.
The Teenage Pregnancy Unit commissioned ONS to carry out a study linking the conceptions and
education data sources. The ultimate aim was to produce a dataset that could be analysed to
identify factors associated with teenage pregnancy.
Consent to match the ONS conceptions data to the DfE education data was obtained from the data
custodians for the NPD and School Census, the ONS Caldicott Guardian (for the conceptions leading
to maternity data) and the Chief Medical Officer for England (for the conceptions leading to abortion
data). Access to data was restricted to ONS staff directly involved in the project.
This report describes the process and results of the linkage study.
2. Data sources
The study attempted to link data for 3 cohorts of girls (Table 1).
Table 1 Cohorts of girls included in study

Cohort  Dates of birth included             Conceptions data included  School census data included  Age at end of 2008 (when follow up ended)
A       1 September 1989 – 30 August 1990   2003-2008                  2003-2008                    18
B       1 September 1990 – 30 August 1991   2004-2008                  2004-2008                    17
C       1 September 1991 – 30 August 1992   2005-2008                  2005-2008                    16

This design was chosen to provide a cohort of girls who were followed up until the age of 18
(cohort A). The inclusion of cohorts B and C increases the sample of girls who conceived under the
age of 16. Without this boost the number of conceptions to girls under 16 would be very small and
any analysis of the linked data would lack statistical power.
2.1 Preparing the data for matching – National Pupil Database records
Table 2 shows the number of NPD records in each of the cohorts by survey year.
Table 2 NPD data for England

Year of survey  Cohort A  Cohort B  Cohort C  All cohorts
2003            296,842   -         -         296,842
2004            296,663   300,437   -         597,100
2005            296,144   300,146   297,439   893,729
2006            292,520   299,609   297,358   889,487
2007            107,430   296,357   296,955   700,742
2008            86,757    108,726   294,051   489,534

Each girl will have many NPD records, one for each year where she was in school, and also a unique
NPD identifier. A dataset was constructed with one record for each girl for every combination of
date of birth and postcode in the NPD data. A girl who remained at the same postcode throughout
the surveys and had consistent dates of birth would be included once. Those who had changes in
postcode or recorded date of birth were included a number of times. This was to allow us to match
conceptions occurring throughout the study period.
The year of the survey used for matching was also recorded on the dataset. Where there was more
than one year with the same combination of pupil id number, date of birth and postcode the earliest
survey record was used.
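The construction of this matching dataset can be sketched as follows. This is a minimal illustration in Python, not the code used in the study; the field names (pupil_id, dob, postcode, year) and the sample rows are assumptions made for the example.

```python
def build_npd_matching_records(npd_rows):
    """Keep one record per (pupil_id, dob, postcode), using the earliest survey year."""
    earliest = {}
    for row in npd_rows:
        key = (row["pupil_id"], row["dob"], row["postcode"])
        if key not in earliest or row["year"] < earliest[key]["year"]:
            earliest[key] = row
    return sorted(earliest.values(), key=lambda r: (r["pupil_id"], r["year"]))


# Hypothetical annual NPD rows: pupil P1 moves house in 2006, pupil P2 does not move.
npd_rows = [
    {"pupil_id": "P1", "dob": "1989-10-02", "postcode": "AB1 2CD", "year": 2004},
    {"pupil_id": "P1", "dob": "1989-10-02", "postcode": "AB1 2CD", "year": 2003},
    {"pupil_id": "P1", "dob": "1989-10-02", "postcode": "EF3 4GH", "year": 2006},
    {"pupil_id": "P2", "dob": "1990-01-15", "postcode": "IJ5 6KL", "year": 2003},
    {"pupil_id": "P2", "dob": "1990-01-15", "postcode": "IJ5 6KL", "year": 2004},
]
matching_records = build_npd_matching_records(npd_rows)
```

In this example P1 contributes two records (one per postcode) and P2 contributes one, each carrying the earliest survey year for that combination.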
2.2 Preparing the data for matching – ONS conception records
There were 91,476 conceptions to girls under 18 in the three cohorts. These conception records
were matched using postcode of residence and the women’s dates of birth to create pseudo
conception histories.
A dataset was created containing the first conception record for each unique date of birth/postcode
combination (86,560 records). Eighty eight second or subsequent conceptions were included in the
denominator in error. Since this number is very small relative to the total the impact on match rates
is low.
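A minimal sketch of how pseudo conception histories and the first-conception dataset might be derived is given below. It is illustrative only: the field names and sample records are hypothetical, and the key cannot distinguish two women who happen to share a date of birth and postcode, which is presumably why the histories are described as "pseudo".

```python
from collections import defaultdict


def build_pseudo_histories(conceptions):
    """Group conception records by (date of birth, postcode), ordered by conception date."""
    histories = defaultdict(list)
    for rec in conceptions:
        histories[(rec["dob"], rec["postcode"])].append(rec)
    for recs in histories.values():
        # ISO-format date strings sort chronologically.
        recs.sort(key=lambda r: r["conception_date"])
    return dict(histories)


def first_conceptions(histories):
    """Return the earliest conception record for each date of birth/postcode combination."""
    return [recs[0] for recs in histories.values()]


# Hypothetical conception records; the first two share a date of birth and postcode.
conceptions = [
    {"dob": "1990-02-01", "postcode": "AB1 2CD", "conception_date": "2007-03-10", "outcome": "maternity"},
    {"dob": "1990-02-01", "postcode": "AB1 2CD", "conception_date": "2005-06-20", "outcome": "abortion"},
    {"dob": "1991-05-09", "postcode": "EF3 4GH", "conception_date": "2006-01-05", "outcome": "maternity"},
]
histories = build_pseudo_histories(conceptions)
firsts = first_conceptions(histories)
```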
2.3 Matching variables
The matching was carried out in two stages using purely statistical data and then using names to
improve matching of conceptions leading to a maternity. ONS does not have access to the names of
women who have an abortion.
Only two variables were available on both the conception dataset and the National Pupil Database:
postcode of residence and date of birth of the woman. Names are not included on the statistical
conception dataset.
2.4 The matching process
Matching was carried out in two stages as shown in Figures 1 and 2. The conception records were
separated into those where there was a single conception for a given postcode/date of birth
combination (Figure 1) and those where there was more than one conception (Figure 2).
Match rates were analysed using a dataset containing only the first conception for each date of
birth/postcode combination.
The first stage matched records using only date of birth and postcode. Subsequently conceptions
leading to a maternity were matched using names recorded at birth registration.
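The first-stage match on date of birth and postcode can be illustrated with the sketch below. It is not the study's code; the field names and records are hypothetical. The status labels follow the terms used in Figures 1 and 2: a conception linking to exactly one NPD pupil is a unique match, and one linking to two or more pupils is a possible match.

```python
from collections import defaultdict


def classify_first_stage(conceptions, npd_records):
    """Classify each conception by the number of NPD pupils sharing its date of birth and postcode."""
    pupils_by_key = defaultdict(set)
    for rec in npd_records:
        pupils_by_key[(rec["dob"], rec["postcode"])].add(rec["pupil_id"])

    results = {}
    for c in conceptions:
        pupils = pupils_by_key.get((c["dob"], c["postcode"]), set())
        if len(pupils) == 1:
            status = "unique match"
        elif len(pupils) > 1:
            status = "possible match"
        else:
            status = "no match"
        results[c["conception_id"]] = (status, sorted(pupils))
    return results


# Hypothetical data: pupils P3 and P4 share a date of birth and postcode.
npd_records = [
    {"pupil_id": "P1", "dob": "1990-02-01", "postcode": "AB1 2CD"},
    {"pupil_id": "P3", "dob": "1991-05-09", "postcode": "EF3 4GH"},
    {"pupil_id": "P4", "dob": "1991-05-09", "postcode": "EF3 4GH"},
]
conceptions = [
    {"conception_id": "C1", "dob": "1990-02-01", "postcode": "AB1 2CD"},
    {"conception_id": "C2", "dob": "1991-05-09", "postcode": "EF3 4GH"},
    {"conception_id": "C3", "dob": "1990-02-01", "postcode": "ZZ9 9ZZ"},
]
first_stage = classify_first_stage(conceptions, npd_records)
```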
2.5 Matching conceptions leading to a maternity using names
In addition to the statistical conceptions dataset ONS also holds information on the names of
mothers for births registered in England and Wales. This information may be used by ONS for
statistical purposes, such as this project. These names were matched to the conceptions leading to a
maternity. ONS does not have access to information on the names of women who have had an
abortion.
The Department for Education supplied ONS with the names of girls on the NPD which were then
linked to the NPD data. These two named datasets were matched together.
The ONS conceptions leading to maternity records were split into 3 groups:
• Group 1 – Conception records already matched using the key fields of date of birth of mother
and postcode (N = 18,806)
• Group 2 – Conception records that matched using the key fields above, but the match was
not unique, resulting in two or more NPD pupils being identified (N = 191)
• Group 3 – Conception records that could not be matched to the NPD data using these key
fields (N = 23,356), or where two or more ONS records matched to two or more NPD records
(N = 57)
Matching using names – Group 1
These data had previously been matched on a one to one basis. If the ONS and NPD first names
matched, the match was confirmed. If the first names did not match, records were manually checked
and were marked as matched if the forenames were the same allowing for differences in spelling.
Matching using names – Group 2
These conceptions were checked automatically using forenames. Records which did not match
automatically were checked manually. The record was marked as matched if the ONS forenames
matched the NPD forenames, again allowing for differences in spelling.
Matching using names – Group 3
It was assumed that the most likely reason for these records not matching originally was that the girl
had moved house before having the baby. The enhanced matching used the date of birth of the
mother as a first match, then used the names to confirm the match or otherwise. The surname was
included in the matching, with the maiden name being used as well as the married surname, if
available. The records were matched in 3 stages:
• a match was attempted using the forenames and the surname(s), and confirmed if successful
(Quality=1).
• if no match was obtained then a match was attempted using the forenames and the first four
characters of the surname (Quality=2).
• if no match was obtained then a match was attempted using just the forenames. These records
were then also manually checked for confirmation of the match (Quality=3).
Table 3 shows the impact of using names to match conceptions in the three groups. The weakest
matches (Quality=3) will probably include some false positives. However, many of the matches share
similar postcodes; in 560 records the postcode recorded at birth shared the same first two characters
as the postcode on the school records. These are highly likely to be true matches. For this
reason it was decided to include these quality 3 matched records in the analysis dataset.
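The three-stage name matching for Group 3 can be sketched as a cascade that returns the quality code of the strongest stage that succeeds. This is an illustration under stated assumptions, not the algorithm used by ONS: the field names are hypothetical, names are compared exactly after lower-casing and stripping non-letters (the study also allowed for spelling differences through manual checks), and a result of 3 only flags a candidate for manual confirmation.

```python
def normalise(name):
    """Lower-case a name and keep letters only (an assumed normalisation)."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def match_quality(birth_rec, pupil_rec):
    """Return 1, 2 or 3 for the strongest stage that matches, or None if there is no match."""
    if birth_rec["dob"] != pupil_rec["dob"]:
        return None
    if normalise(birth_rec["forenames"]) != normalise(pupil_rec["forenames"]):
        return None
    pupil_surname = normalise(pupil_rec["surname"])
    # Surname and, if available, maiden name from the birth registration.
    surnames = [
        normalise(s)
        for s in (birth_rec["surname"], birth_rec.get("maiden_name"))
        if s
    ]
    if pupil_surname in surnames:
        return 1  # forenames and full surname agree
    if any(s[:4] == pupil_surname[:4] for s in surnames):
        return 2  # forenames and first four characters of surname agree
    return 3  # forenames only: candidate for manual checking


# Hypothetical birth registration record and NPD pupil records.
birth = {"dob": "1990-03-04", "forenames": "Anna Marie", "surname": "Brown", "maiden_name": "Smith"}
q_full = match_quality(birth, {"dob": "1990-03-04", "forenames": "Anna Marie", "surname": "Smith"})
q_prefix = match_quality(birth, {"dob": "1990-03-04", "forenames": "anna-marie", "surname": "Smithson"})
q_forename = match_quality(birth, {"dob": "1990-03-04", "forenames": "Anna Marie", "surname": "Jones"})
q_none = match_quality(birth, {"dob": "1991-03-04", "forenames": "Anna Marie", "surname": "Smith"})
```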
2.6 Matching rates
2.6.1 Match rates using only date of birth and postcode
Table 4 shows the number of matches and match rate by outcome (maternity/abortion) and girl’s
age. There were 47,808 unique matches (where one conception record linked to a single NPD
record). For conceptions to girls under 18 years the match rate was 55 per cent. The match rate
was much higher for conceptions leading to abortion (67 per cent) than for conceptions leading to a
maternity (44 per cent). There is a very good reason for this. The postcode recorded on the
conceptions leading to a maternity is the postcode where the girl was living at the time of the birth,
not the conception. Girls who moved out of the family home before the birth would have a different
postcode than when they were at school. Unless they returned to school after the birth, the
conception record would not match with the school records.
For conceptions leading to abortion the postcode recorded is the usual residence at the time of the
abortion. This is far less likely to be different from the postcode recorded in school records.
Match rates decrease with age at conception. In the case of conceptions leading to abortion this
decline is small (from 72 per cent for girls aged under 15 at the time of the conception to 61 per
cent for 17 year olds).
For conceptions leading to maternity the decline is much larger. While 68 per cent of conceptions to
girls under 15 years match to NPD records only 34 per cent of conceptions to 17 year olds match.
This is likely to be because older girls are more likely to move to a new address separate from the
family home before giving birth.
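The headline rates quoted in this section can be reproduced from the under-18 totals reported in Table 4. The short Python calculation below is included only as a worked check of the arithmetic; the variable names are illustrative.

```python
def match_rate(matched, total):
    """Percentage of conceptions uniquely matched, rounded to one decimal place."""
    return round(100 * matched / total, 1)


# (number uniquely matched, total number) for conceptions to girls under 18, from Table 4.
table4_under_18 = {
    "maternity": (18_806, 42_963),
    "abortion": (29_002, 43_597),
}

rates = {outcome: match_rate(m, t) for outcome, (m, t) in table4_under_18.items()}
all_matched = sum(m for m, _ in table4_under_18.values())
all_total = sum(t for _, t in table4_under_18.values())
overall = match_rate(all_matched, all_total)
```

The two outcomes together give 47,808 unique matches out of 86,560 first conceptions, which is the 55 per cent overall rate quoted above.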
2.6.2 Match rates using date of birth, postcode and names
Including names in the matching algorithm for conceptions leading to a maternity had a big impact
on match rates.
The overall match rate for conceptions leading to a maternity increased from 44 per cent to 71 per
cent. The biggest improvement in match rate was for the older girls; the rate for 17 year olds
increased from 34 per cent to 68 per cent.
2.7 Investigating poor match rates for conceptions leading to a maternity
When a birth is registered the Registrar records the type of birth registration. There are four
possibilities:
• Births inside marriage
• Births outside marriage registered by two parents living at the same address
• Births outside marriage registered by two parents living at different addresses
• Births registered only by the mother (sole registrations).
Table 5 shows the match rates by type of registration and age. Match rates are similar for sole
registrations and births registered by two parents living at different addresses (73 per cent and 79
per cent respectively). Rates for the small number of maternities where the birth occurred inside
marriage are much lower (31 per cent). Including names in the matching algorithm had a big impact
on match rates for births outside marriage registered by two parents living at the same address. The match rate for this group
increased from 20 per cent to 63 per cent. This is not surprising since many of these mothers will
have moved away from the parental home.
2.8 Match rates by region
Table 6 shows match rates by outcome and region. The variation in match rates is larger for
conceptions leading to a maternity than conceptions leading to abortion. The lowest match rates for
both maternities and abortions occur in London. One reason for this may be poorer coverage of the
NPD in London. In 2011 ten per cent of girls aged 14 living in London attended an independent
school compared with four per cent of girls in the North East [4]. Information on these girls would not
be included in the NPD data.
2.9 Match rates by the Income Deprivation Affecting Children Index (IDACI) [5]
The IDACI was constructed by the Social Disadvantage Research Centre at the University of Oxford as
part of the English Indices of Deprivation 2010 for the Department for Communities and Local
Government. This index represents the proportion of children under 15 living in income deprived
households.
The IDACI was assigned to each conception record using the postcode. This
enabled an analysis of match rates by IDACI quintile (Tables 7a and 7b).
Match rates for conceptions to girls under 18 years were lower for those living in more deprived
areas (quintile 1). This was true for both conceptions leading to a maternity and leading to an
abortion. For abortions the match rates varied from 63 per cent in the most deprived areas to 70 per
cent in the least deprived. Match rates for conceptions leading to a maternity varied from 67 per
cent to 77 per cent.
2.10 Possible reasons for poor match rates
There are a number of reasons why conception records may not match with NPD records. It is
important to examine these and consider whether they are likely to cause bias in the matched
dataset.
The main difficulty is matching records where the girl’s postcode changed between the ages of 13
and 18 and she was not studying in an educational setting covered by the NPD until the age of 18. A
high proportion of girls in England aged 16 and 17 do not attend either maintained schools or
academies. In 2008 only 35 per cent of girls aged 16 attended a maintained school, city technology
college or academy [6]. An additional 34 per cent attended further education colleges and 12 per cent
sixth form colleges. Neither of these two groups is covered by the NPD. Sixteen year olds
attending further education colleges in England are nearly twice as likely to be eligible for free school
meals as those attending a state funded school (15.2 per cent vs 7.8 per cent) [7].
3. Creation of linked pupil level dataset
The second output of the matching study is the creation of a pupil level dataset with one record per
girl on the NPD dataset. All matched conceptions would be included on the same record.
There were six stages to this process.
1. The original matched dataset (not using names) was used to identify matched conception
records.
2. This was supplemented by maternity records matched using names.
3. Second and subsequent conceptions were linked into the data.
4. Conception records with the same NPD pupil ID were linked together. This created a
conception history for girls who moved house between conceptions.
5. The resulting file was joined to NPD data for girls who did not have a conception
6. Explanatory variables from the NPD were merged into the final data file.
The final dataset contained one record for each girl on the NPD with linked in conception data. A
single matching variable was added describing the match status of the first conception as matched,
possible match or non-match. Possible matches were where two or more NPD records had linked to
a single conception history (but see point 2 below).
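The final assembly step (one record per girl, with any linked conceptions attached) might look like the sketch below. It is a simplified illustration rather than the process ONS ran: the field names, the fsm_eligible variable and the sample data are hypothetical, and giving girls with no linked conception the status "non-match" is an assumption made for the example.

```python
from collections import defaultdict


def build_pupil_level(npd_pupils, matched_conceptions):
    """Return one record per NPD pupil with her linked conception history attached."""
    by_pupil = defaultdict(list)
    for c in matched_conceptions:
        by_pupil[c["pupil_id"]].append(c)

    dataset = []
    for pupil in npd_pupils:
        history = sorted(by_pupil.get(pupil["pupil_id"], []),
                         key=lambda c: c["conception_date"])
        record = dict(pupil)  # carries the explanatory NPD variables
        record["n_conceptions"] = len(history)
        record["conceptions"] = [(c["conception_date"], c["outcome"]) for c in history]
        # Assumption: girls with no linked conception are labelled "non-match".
        record["first_match_status"] = history[0]["match_status"] if history else "non-match"
        dataset.append(record)
    return dataset


# Hypothetical inputs: P1 has two linked conceptions, P2 has none.
npd_pupils = [
    {"pupil_id": "P1", "fsm_eligible": True},
    {"pupil_id": "P2", "fsm_eligible": False},
]
matched_conceptions = [
    {"pupil_id": "P1", "conception_date": "2007-03-10", "outcome": "maternity", "match_status": "matched"},
    {"pupil_id": "P1", "conception_date": "2005-06-20", "outcome": "abortion", "match_status": "matched"},
]
pupil_level = build_pupil_level(npd_pupils, matched_conceptions)
```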
During the process of creating the file two small errors occurred:
1. In the initial matching (without names) there were 2,697 abortions followed by one or more
further conceptions that linked to a single NPD record. These were categorised as unique or
‘true’ matches. Most of these records were misclassified in the final dataset as possible
matches. The algorithm created to carry out the matching checked whether the names of
NPD and conception records matched before assigning match status. Since abortion records
were not matched using names they were reclassified as possible matches.
2. In the original matching there were 553 conception histories where a girl had a maternity
followed by one or more further conceptions which did not match at the first stage. When
names were added to the algorithm 251 of these records matched. Unfortunately only the
first conception (the maternity) was added to the analysis dataset. Of these 251, 134 were
for a maternity followed by an abortion and 117 for a maternity followed by a maternity.
The impact of this error will be on the analyses of repeat maternities in cohort A. Out of the
117 repeat maternities, 59 were to girls in cohort A.
3.1 Research access
Identifiable information (such as postcodes, names, dates of birth) was removed from the matched
data file. The resulting file was loaded into the ONS secure Virtual Microdata Laboratory (VML) [8].
The Approved Researchers accessed the data in the VML at ONS to carry out their analyses. Outputs
were disclosure controlled by ONS before being released to the researchers.
4. Conclusions
This report shows that it is possible to link NPD pupil data to ONS conception data. However, using
only date of birth and postcode, match rates were low, particularly for conceptions leading to a
maternity. For conceptions to girls under 18 years the overall match rate was 55 per cent, with rates
of 67 per cent for abortions and 44 per cent for maternities.
Including names in the matching algorithm for conceptions leading to maternities increased the
match rate for these conceptions to 71 per cent.
There is clear evidence of differences between the matched sample and conceptions that did not
match.
• Match rates were lower for maternities where the birth either occurred within marriage or
was registered by 2 people at the same address.
• Match rates were lower for conceptions to girls living in deprived areas.
• Match rates were lower in London and the East of England.
A number of possible reasons for non-matches were identified:
• Girls who moved house and changed their postcode of residence during their secondary
school career.
• The NPD includes nearly all girls under the age of 16, but it does not include records for
those at sixth form colleges and other forms of further education. If a girl moved after she
left maintained school and then conceived, the conception would not match.
• Girls living in care are more likely to change address than other girls. The matched sample
probably does not include as many girls in care as it should.
Acknowledgement
ONS would like to acknowledge the assistance of Sarah Butt, formerly of the Department for
Education, in supplying data and background information.
References
1. Office for National Statistics. Conception statistics, England and Wales, available at:
http://www.ons.gov.uk/ons/rel/vsob1/conception-statistics--england-andwales/2010/index.html2010
2. Information on the National Pupil Database is available at:
http://www.education.gov.uk/vocabularies/educationtermsandtags/5385
3. Information on the School Census is available at:
http://www.education.gov.uk/rsgateway/schoolcensus.shtml
4. Figures derived from: Department for Education: Schools, Pupils and their characteristics 2011, Tables 9b, 9c and 9d, available at:
http://www.education.gov.uk/researchandstatistics/statistics/statistics-bytopic/schoolpupilcharacteristics/a00196810/schools-pupils-and-their-characteristics-january-2
5. Department for Communities and Local Government 2010. Available at:
http://www.communities.gov.uk/publications/corporate/statistics/indices2010
6. Department for Education: Participation in Education, Training and Employment by 16-18 Year Olds in England, Table 6, available at:
http://www.education.gov.uk/rsgateway/DB/SFR/s000938/index.shtml
7. Department for Education, Pupil Census 2007/8. Personal communication from Tim Thair.
8. Virtual Microdata Laboratory. Office for National Statistics. Details of this service are available at:
http://www.ons.gov.uk/ons/about-ons/who-we-are/services/virtual-microdatalaboratory/index.html
Table 3 Number and percentage of conceptions to girls under 18 leading to a maternity matched to education records using different matching strategies
Conceptions in England in 3 cohorts, 2003-2008

Group 1. Status of match using date of birth and postcode only: unique match. Number: 18,806. Additional variables used: forenames.
Method                    Number of matches  Total   Percentage matched
Automated                 17,895             17,903  99.3
Manual                    791                903     86.8
Total                     18,686             18,806  99.4

Group 2. Status of match using date of birth and postcode only: matched to more than 1 NPD record (one ONS record). Number: 191. Additional variables used: forenames.
Method                    Number of matches  Total   Percentage matched
Manual                    189                191     99.0
Total                     189                191     99.0

Group 3. Status of match using date of birth and postcode only: matched to more than 1 NPD record (more than 1 ONS record). Number: 57. Additional variables used: forenames.
Method                    Number of matches  Total   Percentage matched
Automated                 24                 57      42.1
Total                     24                 57      42.1

Group 3. Status of match using date of birth and postcode only: did not match at all. Number: 23,909.
Additional variables used                                               Quality  Method                    Number of matches  Total   Percentage matched
Date of birth, forename, surname, maiden name                           1        Automated                 9,723              23,909  40.7
Date of birth, forename, first four characters of surname, maiden name  2        Automated                 130                14,186  0.9
Date of birth, forenames                                                3        Automated + manual check  1,648              14,056  11.7
Total                                                                                                      11,501
Did not match                                                                                              12,408

Total. Number: 42,963. Number of matches: 30,400. Percentage matched: 70.8.

Table 4 Numbers and per cent of conceptions matched to education records
Conceptions in England in 3 cohorts, 2003-2008

Conceptions leading to an abortion (matching using only postcode and date of birth)
Age in years    Total number  Number matched to 1 NPD  Percentage matched (1)
Under 15        3,817         2,736                    71.7
15              9,303         6,769                    72.8
16              16,140        10,722                   66.4
17              14,337        8,775                    61.2
Under 18        43,597        29,002                   66.5

Conceptions leading to a maternity
                              Matching using only postcode and date of birth   Matching using names
Age in years    Total number  Number matched to 1 NPD  Percentage matched (1)  Number matched using names  Percentage matched (1)
Under 15        2,116         1,431                    67.6                    1,650                       78.0
15              6,564         3,977                    60.6                    4,999                       76.2
16              16,117        7,161                    44.4                    11,337                      70.3
17              18,166        6,237                    34.3                    12,414                      68.3
Under 18        42,963        18,806                   43.8                    30,400                      70.8

(1) Unique matches only included

Table 5 Numbers of cases and percentage of maternities matched to education records by type of registration
Conceptions in England in 3 cohorts, 2003-2008

Conceptions leading to a maternity
                          Using only postcode and date of birth   Adding names
Age (1)     Total number  Number matched  Percentage matched      Number matched  Percentage matched

Births inside marriage
Under 15    25            2               8.0                     4               16.0
15          71            9               12.7                    10              14.1
16          350           77              22.0                    121             34.6
17          621           110             17.7                    198             31.9
Under 18    1,067         198             18.6                    333             31.2

Births outside marriage to parents living at the same address
Under 15    219           102             46.6                    139             63.5
15          1,401         470             33.5                    893             63.7
16          5,305         1,102           20.8                    3,255           61.4
17          7,264         1,152           15.9                    4,601           63.3
Under 18    14,189        2,826           19.9                    8,888           62.6

Births outside marriage to parents living at different addresses
Under 15    807           613             76.0                    684             84.8
15          2,733         1,968           72.0                    2,276           83.3
16          6,184         3,733           60.4                    4,904           79.3
17          6,396         3,245           50.7                    4,883           76.3
Under 18    16,120        9,559           59.3                    12,747          79.1

Sole registrations
Under 15    1,065         714             67.0                    823             77.3
15          2,359         1,530           64.9                    1,820           77.2
16          4,278         2,249           52.6                    3,057           71.5
17          3,885         1,730           44.5                    2,732           70.3
Under 18    11,587        6,223           53.7                    8,432           72.8

(1) Age at conception

Table 6 Numbers of cases and percentage of maternities matched to education records by region
Conceptions in England in 3 cohorts, 2003-2008

Conceptions leading to a maternity
                                        Using only postcode and date of birth   Matching using names
Region                    Total number  Number matched  Percentage matched      Number matched  Percentage matched
North East                3,077         1,548           50.3                    2,243           72.9
North West                7,138         3,308           46.3                    5,199           72.8
Yorkshire and The Humber  5,716         2,530           44.3                    4,159           72.8
East Midlands             4,060         1,838           45.3                    3,018           74.3
West Midlands             5,489         2,510           45.7                    4,012           73.1
East of England           3,720         1,456           39.1                    2,562           68.9
London                    4,651         1,717           36.9                    2,444           52.5
South East                5,406         2,347           43.4                    3,983           73.7
South West                3,706         1,552           41.9                    2,780           75.0
England                   42,963        18,806          43.8                    30,400          70.8

Conceptions leading to an abortion (using only postcode and date of birth)
Region                    Total number  Number matched  Percentage matched
North East                2,372         1,682           70.9
North West                6,731         4,539           67.4
Yorkshire and The Humber  4,794         3,317           69.2
East Midlands             3,373         2,280           67.6
West Midlands             5,292         3,591           67.9
East of England           3,915         2,551           65.2
London                    7,345         4,449           60.6
South East                5,937         4,055           68.3
South West                3,838         2,538           66.1
England                   43,597        29,002          66.5

Table 7a Match rates by IDACI quintile - conceptions to under 18s
Conceptions in England in 3 cohorts, 2003-2008

Conceptions leading to a maternity
                          Using only postcode and date of birth   Matching using names
IDACI quintile  Number    Matched         Percentage matched      Matched         Percentage matched
1               18,467    7,852           42.5                    12,448          67.4
2               11,506    5,051           43.9                    8,242           71.6
3               6,816     2,962           43.5                    5,033           73.8
4               3,866     1,789           46.3                    2,905           75.1
5               2,308     1,152           49.9                    1,772           76.8
England         42,963    18,806          43.8                    30,400          70.8

Conceptions leading to an abortion (using postcode and date of birth)
IDACI quintile  Number    Matched         Percentage matched
1               12,970    8,145           62.8
2               10,112    6,717           66.4
3               7,841     5,278           67.3
4               6,686     4,663           69.7
5               5,988     4,199           70.1
England         43,597    29,002          66.5

Table 7b Match rates by IDACI quintile - conceptions to under 16s
Conceptions in England in 3 cohorts, 2003-2008

Conceptions leading to a maternity
                          Using only postcode and date of birth   Matching using names
IDACI quintile  Number    Matched         Percentage matched      Matched         Percentage matched
1               3,898     2,361           60.6                    2,887           74.1
2               2,313     1,486           64.2                    1,809           78.2
3               1,292     814             63.0                    1,008           78.0
4               712       440             61.8                    555             77.9
5               465       307             66.0                    390             83.9
England         8,680     5,408           62.3                    6,649           76.6

Conceptions leading to an abortion (using postcode and date of birth)
IDACI quintile  Number    Matched         Percentage matched
1               4,062     2,784           68.5
2               3,194     2,343           73.4
3               2,322     1,699           73.2
4               1,921     1,452           75.6
5               1,621     1,227           75.7
England         13,120    9,505           72.4

Figure 1 Single conception for date of birth/postcode
Conceptions to girls in cohorts A, B and C under 18: N=91,476
Single conception: N=81,742, matched on date of birth/postcode to NPD records (1 record per postcode)
• Unique match: N=44,207
  Maternity N=17,902; match using names: match N=17,791, no match N=111
  Abortion N=26,305
• Possible match (>1 NPD matches): N=554
  Maternity N=191; match using names: match N=189, no match N=2
  Abortion N=363
• No match: N=36,981
  Maternity N=23,356; match using names: match N=11,250, no match N=12,106
  Abortion N=13,625

Figure 2 More than one conception for date of birth/postcode
Conceptions to girls in cohorts A, B and C under 18: N=91,476
More than 1 conception: N=9,733, matched on date of birth/postcode to NPD records (1 record per postcode)
1st conception: N=4,730
• Unique match: N=3,601
  Abortion N=2,697 (AA=1,640, AM=1,057)
  Maternity N=904; match using names: matched N=895, not matched N=9
• Possible match: N=80
  Abortion N=56
  Maternity N=24; match using names: matched N=24, not matched N=0
• No match: N=1,049
  Abortion N=496
  Maternity N=553; match using names: matched N=251, not matched N=302
Subsequent conceptions: N=5,003
• Unique match: N=3,820
• Possible match: N=88
• No match: N=1,095
AA = two abortions; AM = abortion followed by maternity