Comparing linked maternity data sets to check data quality in SPSS

Preeti Datta-Nemdharry, Nirupa Dattani and Alison
Macfarlane
Background (1)
Birth registration
• By law, live births must be registered within 42 days
of birth
• Information recorded from parents is mainly sociodemographic, such as names, address of residence,
occupation of parents, marital status and country of
birth
Background (2)
NHS Numbers for Babies (NN4B)
• Central Issuing System introduced in 2002 for issuing
NHS numbers at birth for babies born in England,
Wales and the Isle of Man
• A small set of data is collected, including gestational
age for live births, ethnicity of baby and date and
time of birth
Background (3)
Maternity Hospital Episode Statistics (HES)
• Data should be collected for all births occurring in
England
• Core admitted patient care record for mother plus
‘maternity tail’ with details of delivery and the baby.
• Core birth record for baby plus ‘baby tail(s)’
Background (4)
National Community Child Health database
(NCCHD) and Patient Episode Database for
Wales (PEDW)
• Data collected for all births occurring in Wales
• Information collected on maternity similar to HES
Method
• Link data for 2005 and 2006 for England and
Wales
• Phase 1 involving linkage of birth registration
data to NN4B data
• Phase 2 involving linkage of registration/NN4B
data to Maternity HES for England and Child
Health/PEDW databases for Wales
Method cont…
Phase 2
• Linkage to maternity HES carried out by
Northgate Solutions using algorithm devised
by City University
• Key data items for linkage, e.g. NHS number, date of
birth and unique ID, compiled by ONS and sent to
Northgate Solutions
• Linkage to Child Health and PEDW databases
carried out by NHS Wales Informatics Service
using the same algorithm
After the linkage was done…
• HES records linked to registration/NN4B data
included multiple records for the same mother
for each episode.
• Duplicates therefore had to be removed, keeping
the record with the most information.
• This ensures one-to-one linkage to
registration/NN4B.
Identifying duplicates, triplicates..
• GET FILE='C:\Users\trial\Desktop\exampleHES.sav'.
• Dataset name DataSet1 Window=Front.
• * Identify duplicate cases after sorting by id and, within id, by epikeys.
• Dataset activate DataSet1.
• Sort cases by id(D) epikeys(D). /* sorts the cases first by id and then by epikeys within id, both descending */.
• compute flag=1. /* creates a variable called flag with a default value of 1 */.
• if id=lag(id) flag=0. /* resets flag to 0 if id is the same as the id in the row before */.
• exe.
[Screenshot: id and epikeys sorted in descending order; flag = 1.00 allocated to the highest epikey per id]
Creating a file with only one row per id…
• *Create wodups - a dataset without duplicates.
• Dataset activate DataSet1. /* exampleHES dataset is the active dataset */.
• Dataset copy wodups.
• Dataset activate wodups. /* DATASET COPY does not switch datasets, so activate the copy before selecting */.
• Select if (flag=1). /* selects the record with the most information, i.e. the highest epikey */.
• Exe.
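As a cross-check on the flag-and-select steps above, the same de-duplication can be sketched with the /FIRST subcommand of MATCH FILES, followed by a count of rows per id. This is only a sketch on made-up data: the dataset name toyHES, the variable nPerId and the values are hypothetical, and id is sorted ascending here (rather than descending) so that the file is already in the order needed for a later merge.

```spss
* Toy data with hypothetical ids and epikeys, not real HES records.
DATA LIST FREE / id epikeys.
BEGIN DATA
1 101
1 102
2 201
3 301
3 302
3 303
END DATA.
DATASET NAME toyHES.

* Sort so that the highest epikey comes first within each id.
SORT CASES BY id (A) epikeys (D).

* FIRST sets flag to 1 on the first case of each id and to 0 on the rest.
MATCH FILES /FILE=* /BY id /FIRST=flag.
SELECT IF (flag = 1).
EXECUTE.

* Check that each id now occurs once, so nPerId should be 1 on every row.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=id /nPerId=N.
FREQUENCIES VARIABLES=nPerId.
```

Any value of nPerId above 1 in the frequency table would mean that duplicates are still present.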
Merge with exampleNN4BREG data
• *merge exampleHES with exampleNN4BREG.
• *first sort both datasets by the key variable, e.g. id.
• *main dataset.
• Dataset activate wodups.
• Sort cases by id(A). /* make sure the cases are sorted in both datasets */.
• *dataset to be merged (assumed to be already open and named NN4BREG).
• Dataset activate NN4BREG.
• Sort cases by id(A).
• *merging - no full stop until the last subcommand.
• Match files /file=wodups
• /file=NN4BREG
• /by id.
• Exe.
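To see how many records actually linked, the merge can also carry /IN indicators that record which file contributed to each row. The following is a minimal sketch on made-up data. The dataset names toyHES, toyReg and toyMerged and the variables bwtHES, bwtReg, inHES and inReg are hypothetical.

```spss
* Toy stand-ins for the two files with hypothetical ids and values.
DATA LIST FREE / id bwtHES.
BEGIN DATA
1 3200
2 2950
4 3510
END DATA.
DATASET NAME toyHES.

DATA LIST FREE / id bwtReg.
BEGIN DATA
1 3200
2 2900
3 4100
END DATA.
DATASET NAME toyReg.

* Both inputs must be sorted by the key before MATCH FILES.
DATASET ACTIVATE toyHES.
SORT CASES BY id (A).
DATASET ACTIVATE toyReg.
SORT CASES BY id (A).

* IN creates a 1/0 indicator showing whether each file contributed to the row.
MATCH FILES /FILE=toyHES /IN=inHES
  /FILE=toyReg /IN=inReg
  /BY id.
EXECUTE.
DATASET NAME toyMerged.

* Rows with inHES=1 and inReg=1 are linked pairs and the rest are unmatched.
CROSSTABS /TABLES=inHES BY inReg.
```

In the toy data, ids 1 and 2 link, id 4 appears only in the HES stand-in and id 3 only in the registration stand-in.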
Data quality checks
• Quality of maternity HES assessed on the
completeness and consistency of the HES data
in relation to birth registration data wherever
possible
• NN4B data used to validate maternity HES
where information was not available from
registration.
Missing data
• *Missing data - for string variables, e.g. NHS number.
• Dataset activate wodups.
• missing values NHSnoHES (" ").
• freq var=NHSnoHES /format=notable. /* gives only the total numbers */.
• *OR.
• compute var1=(length(rtrim(NHSnoHES))=0). /* 1 = blank NHS number, 0 = stated */.
• execute.
• desc var=var1
• /statistics=sum.
• *Missing data - for dates, after checking formats.
• freq var=dobHES /format=notable.
• *Missing data - for numeric variables, e.g. birthweight.
• Freq var=birthweightHES /format=notable.
• *OR.
• Compute noBWT=missing(birthweightHES). /* codes missing as 1, otherwise 0 */.
• Exe.
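When several numeric fields need checking at once, the COUNT command can tally the missing items per record in one pass. A minimal sketch on made-up data follows. The variable nMissing is hypothetical and the values are invented.

```spss
* Toy data where a lone full stop is read as system-missing.
DATA LIST LIST / id birthweightHES gestHES.
BEGIN DATA
1 3200 40
2 . 39
3 2950 .
4 . .
END DATA.

* Count how many of the listed variables are missing on each record.
COUNT nMissing = birthweightHES gestHES (MISSING).
EXECUTE.

* Shows how many records have 0, 1 or 2 of the items missing.
FREQUENCIES VARIABLES=nMissing.
```

The MISSING keyword counts both system-missing and user-missing values, so any codes declared with MISSING VALUES are included.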
Cross checking dates…
• *Cross checking baby's dob.
• *1) Formatting dates.
• *if one date is a string - convert it to a date.
• Compute datevar2=Number(dobReg,ADATE10). /* converts a date held as a string, e.g. 01/01/2005, into date format */.
• Formats datevar2 (ADATE10).
• Execute.
• *if both are in date format but need to be displayed as e.g. yyyy/mm/dd.
• formats dobHES (SDATE10). /* for the other way around, i.e. mm/dd/yyyy, use ADATE10 */.
• execute.
• *2) cross checking dates (use datevar2 in place of dobReg if it was converted above).
• compute equal=(dobHES=dobReg). /* 1 = same dates, 0 = dates differ */.
• Execute.
• freq var=equal /format=notable. /* shows how many are equal */.
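Because a comparison returns system-missing when either date is missing, it can help to record "not comparable" as its own category. The sketch below does this on made-up data. The variable dobStatus and its codes are hypothetical, and the second record shows a day/month swap of the kind such a check would pick up.

```spss
* Toy data with hypothetical dates in mm/dd/yyyy form.
DATA LIST LIST / id (F4) dobHES (ADATE10) dobReg (ADATE10).
BEGIN DATA
1 01/15/2005 01/15/2005
2 02/03/2005 03/02/2005
3 04/20/2005 04/20/2005
END DATA.

* Blank out one date to mimic a missing date of birth.
IF (id = 3) dobHES = $SYSMIS.

* Start from 9 and overwrite it only when both dates are present.
COMPUTE dobStatus = 9.
IF (NOT MISSING(dobHES) AND NOT MISSING(dobReg)) dobStatus = (dobHES = dobReg).
EXECUTE.

FORMATS dobStatus (F1).
VALUE LABELS dobStatus 0 'Dates differ' 1 'Same date' 9 'One or both dates missing'.
FREQUENCIES VARIABLES=dobStatus.
```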
Birthweight
• *cross checking birthweight between two datasets.
• *one way - create another variable which takes the value 1 if equal and 0 if not equal.
• DATASET ACTIVATE wodups.
• Compute birthweight3=(birthweightHES=birthweightReg).
• Execute.
• Freq var=birthweight3.
• *OR group birthweight into categories and see how many cases fall into each category.
• *recoding birthweight data for HES.
• Recode birthweightHES (0=0) (9998=0) (MISSING=0) (1 thru 499=1) (500 thru 999=2)
(1000 thru 1499=3) (1500 thru 1999=4) (2000 thru 2499=5) (2500 thru 2999=6)
(3000 thru 3499=7) (3500 thru 3999=8) (4000 thru 4499=9) (4500 thru 4999=10)
(5000 thru 5499=11) (5500 thru Highest=12) INTO BWTgroupHES.
• Var labels BWTgroupHES 'BWTgroupHES'.
• Exe.
• *recoding birthweight data for registration.
• Recode birthweightReg (0=0) (9998=0) (MISSING=0) (1 thru 499=1) (500 thru 999=2)
(1000 thru 1499=3) (1500 thru 1999=4) (2000 thru 2499=5) (2500 thru 2999=6)
(3000 thru 3499=7) (3500 thru 3999=8) (4000 thru 4499=9) (4500 thru 4999=10)
(5000 thru 5499=11) (5500 thru Highest=12) INTO BWTgroupReg.
• Var labels BWTgroupReg 'BWTgroupReg'.
• Exe.
• Crosstabs
• /tables=BWTgroupHES BY BWTgroupReg
• /format=avalue tables
• /cells=count /* add row and/or column here for row or column percentages */
• /count round cell.
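A figure for concordance where both values are stated can be sketched by declaring the not-stated codes as user-missing before comparing. This assumes, as the RECODE above suggests, that 0 and 9998 mean "not stated". The variable sameBWT and the data are hypothetical.

```spss
* Toy data with hypothetical birthweights in grams.
DATA LIST LIST / id birthweightHES birthweightReg.
BEGIN DATA
1 3200 3200
2 2950 2900
3 9998 4100
4 0 3510
5 3510 3510
END DATA.

* Assumption is that 0 and 9998 are not-stated codes so declare them missing.
MISSING VALUES birthweightHES birthweightReg (0, 9998).

* The comparison is system-missing whenever either value is missing.
COMPUTE sameBWT = (birthweightHES = birthweightReg).
EXECUTE.

* Valid Percent for sameBWT = 1 is the concordance among pairs with both values stated.
FREQUENCIES VARIABLES=sameBWT.
```

In the toy data, records 3 and 4 drop out of the valid cases, so the comparison rests on records 1, 2 and 5 only.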
Gestational age
• *recoding gestational age data.
• Recode gestNN4B (0=0) (missing=0) (1 thru 21=1) (44 thru
Highest=2) (Else=Copy) into GestGroupNN4B.
• Var Labels GestGroupNN4B 'GestGroupNN4B'.
• Execute.
• Recode gestHES (0=0) (missing=0) (1 thru 21=1) (44 thru
Highest=2) (else=Copy) into GestGroupHES.
• Var labels GestGroupHES 'GestGroupHES'.
• Execute.
• Crosstabs
• /tables=GestGroupHES BY GestGroupNN4B
• /format=avalue tables
• /cells=count row column total
• /count round cell.
Ethnicity
• *Recoding ethnicity.
• *blank (missing) codes are grouped with 'Z'.
• Recode ethnicNN4B ('A'=1) ('B'=1) ('C'=1) ('D'=9) ('E'=9) ('F'=9) ('G'=9)
('H'=2) ('J'=3) ('K'=4) ('L'=9) ('M'=6) ('N'=5) ('P'=7) ('R'=8)
('S'=9) ('Z'=10) (' '=10) into ethnicgroupNN4B.
• Var labels ethnicgroupNN4B 'ethnicgroupNN4B'.
• Execute.
• Recode ethnicHES ('A'=1) ('B'=1) ('C'=1) ('D'=9) ('E'=9) ('F'=9) ('G'=9)
('H'=2) ('J'=3) ('K'=4) ('L'=9) ('M'=6) ('N'=5) ('P'=7) ('R'=8)
('S'=9) ('Z'=10) (' '=10) into ethnicgroupHES.
• Var labels ethnicgroupHES 'ethnicgroupHES'.
• Execute.
• *also label the values of the new variables with the relevant ethnic group names.
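The labelling step mentioned in the last comment could look like the sketch below, run on made-up data. The group names are an assumption inferred from the standard NHS ethnic category codes (A to C White, H Indian, J Pakistani, K Bangladeshi, M Caribbean, N African, P other Black, R Chinese, Z not stated); they are not taken from the slides and should be checked against the coding frame actually used.

```spss
* Toy data with hypothetical ethnic category codes.
DATA LIST LIST / id (F4) ethnicHES (A1) ethnicNN4B (A1).
BEGIN DATA
1 A A
2 H H
3 M N
4 Z A
END DATA.

* Same grouping as in the slides applied to both variables in one command.
RECODE ethnicHES ethnicNN4B
  ('A','B','C'=1) ('D','E','F','G','L','S'=9) ('H'=2) ('J'=3) ('K'=4)
  ('M'=6) ('N'=5) ('P'=7) ('R'=8) ('Z',' '=10)
  INTO ethnicgroupHES ethnicgroupNN4B.
EXECUTE.

* Labels are assumed group names and are not given in the source.
VALUE LABELS ethnicgroupHES ethnicgroupNN4B
  1 'White' 2 'Indian' 3 'Pakistani' 4 'Bangladeshi' 5 'Black African'
  6 'Black Caribbean' 7 'Other Black' 8 'Chinese' 9 'Mixed or other'
  10 'Not stated'.

* Off-diagonal cells are records where the two sources disagree.
CROSSTABS /TABLES=ethnicgroupHES BY ethnicgroupNN4B /CELLS=COUNT.
```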
Results
91% of maternity HES delivery records could
be linked to the birth registration/NN4B
records
Linked records for singleton births with missing data items in common data fields, 2005

Data item        NN4B                 Registration         Maternity HES
                 Number    Percent    Number    Percent    Number    Percent
Mother NHS No    164,458   30         NA        NA         16,685    3
Mother’s DOB     960       0.2        0         0          0         0
Ethnicity        59,865    11         NA        NA         77,771    14
Gestation        3,829     1          NA        NA         264,877   48
Birthweight      2,721     1          874       0.2        135,144   25
Birth status     615       0.1        0         0          176,455   32
Sex baby         1,098     0.2        0         0          144,115   26
Comparison of sex for singletons in the linked records, 2005

Maternity HES*   Birth registration
                 Male      Female    Total     Percentage
Male             204,613   791       205,404   51
Female           2,814     196,524   199,338   49
Total            207,427   197,315   404,742   100
Concordance in data items between NN4B and maternity HES, 2005

                   Percentage
                   Stated    Missing   Concordance where stated
Birthweight*       75        25        99
Gestational age    52        48        89
Ethnicity          81        19        87
* using birth registration rather than NN4B
Conclusion
• A good linkage rate (91%) was obtained
• To gain maximum benefit, data quality and
completeness need to improve in maternity
HES
• SPSS is useful for data quality checks.