Comparing linked maternity data sets to check data quality in SPSS
Preeti Datta-Nemdharry, Nirupa Dattani and Alison Macfarlane

Background (1)
Birth registration
• By law, live births must be registered within 42 days of birth
• Information recorded from parents is mainly sociodemographic, such as names, address of residence, occupation of parents, marital status and country of birth

Background (2)
NHS Numbers for Babies (NN4B)
• Central Issuing System introduced in 2002 for issuing NHS numbers at birth for babies born in England, Wales and the Isle of Man
• A small set of data is collected, including gestational age for live births, ethnicity of baby and date and time of birth

Background (3)
Maternity Hospital Episode Statistics (HES)
• Data should be collected for all births occurring in England
• Core admitted patient care record for mother plus 'maternity tail' with details of delivery and the baby
• Core birth record for baby plus 'baby tail(s)'

Background (4)
National Community Child Health database (NCCHD) and Patient Episode Database for Wales (PEDW)
• Data collected for all births occurring in Wales
• Information collected on maternity similar to HES

Method
• Link data for 2005 and 2006 for England and Wales
• Phase 1 involving linkage of birth registration data to NN4B data
• Phase 2 involving linkage of registration/NN4B data to Maternity HES for England and Child Health/PEDW databases for Wales

Method cont…
Phase 2
• Linkage to maternity HES carried out by Northgate Solutions using algorithm devised by City University
• Key data items for linkage, e.g. NHS no, DOB and unique ID, compiled by ONS and sent to Northgate Solutions for linkage
• Linkage to Child Health and PEDW databases carried out by NHS Wales Informatics Service using the same algorithm

After the linkage was done…
• HES records, linked to registration/NN4B data, had multiple records for the same mother for each episode.
• So needed to omit the duplicates by keeping the record with the most information.
• Ensure one-to-one linkage to registration/NN4B

Identifying duplicates, triplicates..
GET
  FILE='C:\Users\trial\Desktop\exampleHES.sav'.
Dataset name DataSet1 Window=Front.
* Identify duplicate cases after sorting by id and within id by epikeys.
Dataset activate DataSet1.
Sort cases by id(D) epikeys(D). /*sorts the cases first by id and then by epikeys, both descending*/.
Compute flag=1. /*creates a variable called flag with a default value of 1*/.
If id=lag(id) flag=0. /*changes flag from 1 to 0 if id is the same as the id in the row before*/.
Exe.
Result: id and epikeys sorted in descending order; flag = 1.00 allocated to the highest epikey per id.

Creating a file with only one row per id…
*Create wodups - without duplicates dataset.
Dataset activate DataSet1. /*exampleHES dataset is the active dataset*/.
Dataset copy wodups.
Dataset activate wodups. /*Dataset copy does not activate the copy, so activate it before selecting*/.
Select if (flag=1). /*selecting the record with the most information ie the highest epikey*/.
Exe.

Merge with exampleNN4BREG data
*Merge exampleHES with exampleNN4BREG.
*First sort both datasets by the key variable e.g. id.
*Main dataset.
Dataset activate wodups.
Sort cases by id(A). /*make sure the cases are sorted in both the datasets*/.
*Dataset to be merged (must already be open and named NN4BREG).
Dataset activate NN4BREG.
Sort cases by id(A).
*Merging.
Match files /file=wodups
  /file=NN4BREG
  /by id.
Exe.

Data quality checks
• Quality of maternity HES based on completeness and consistency of the HES data in relation to birth registration data wherever possible
• NN4B data used to validate maternity HES where information not available from registration.

Missing data
*Missing data - for string variables eg NHS No.
Dataset activate wodups.
Missing values NHSnoHES (" ").
Freq var=NHSnoHES /format=notable. /*gives only the total numbers*/.
*OR.
Compute var1=(length(rtrim(NHSnoHES))=0).
Execute.
Desc var=var1 /statistics=sum.
*Missing data - for dates, after checking formats.
Freq var=dobHES /format=notable.
*Missing data for numeric variables e.g. birthweight.
Freq var=birthweightHES /format=notable.
*OR.
Compute noBWT=missing(birthweightHES). /*codes missing as 1*/.
Exe.

Cross checking dates…
*Cross checking baby's dob.
*1) Formatting dates.
*If one date is string - reformat to date.
Compute datevar2=Number(dobReg,ADATE10). /*converting date in string eg 01/01/2005 into date format*/.
Formats datevar2 (ADATE10).
Execute.
*If both are in date format but need to reformat into eg yyyy/mm/dd.
Formats dobHES (sDate10). /*other way around ie mm/dd/yyyy (aDate10)*/.
Execute.
*2) Cross checking dates (use datevar2 in place of dobReg if it was converted from string above).
Compute equal=(dobHES=dobReg). /*gives value of 1 = same dates and 0 = dates differ*/.
Execute.
Freq var=equal. /*shows how many are equal*/.

Birthweight
*Cross checking birthweight between two datasets.
*One way - create another variable which will give value of 0 if not equal and 1 if equal.
Dataset activate wodups.
Compute birthweight3=(birthweightHES=birthweightReg).
Execute.
Freq var=birthweight3.

*OR group birthweight into categories and see how many cases fall into each category.
*Recoding birthweight data for HES.
Recode birthweightHES (0=0) (9998=0) (MISSING=0) (1 thru 499=1) (500 thru 999=2)
  (1000 thru 1499=3) (1500 thru 1999=4) (2000 thru 2499=5) (2500 thru 2999=6)
  (3000 thru 3499=7) (3500 thru 3999=8) (4000 thru 4499=9) (4500 thru 4999=10)
  (5000 thru 5499=11) (5500 thru Highest=12) INTO BWTgroupHES.
Var labels BWTgroupHES 'BWTgroupHES'.
Exe.
*Recoding birthweight data for registration.
Recode birthweightReg (0=0) (9998=0) (MISSING=0) (1 thru 499=1) (500 thru 999=2)
  (1000 thru 1499=3) (1500 thru 1999=4) (2000 thru 2499=5) (2500 thru 2999=6)
  (3000 thru 3499=7) (3500 thru 3999=8) (4000 thru 4499=9) (4500 thru 4999=10)
  (5000 thru 5499=11) (5500 thru Highest=12) INTO BWTgroupReg.
Var labels BWTgroupReg 'BWTgroupReg'.
Exe.
*Cross-tabulate the grouped birthweights.
*Add ROW and/or COLUMN to /cells if row or column percentages are wanted.
Crosstabs
  /tables=BWTgroupHES BY BWTgroupReg
  /format=avalue tables
  /cells=count
  /count round cell.

Gestational age
*Recoding gestational age data.
Recode gestNN4B (0=0) (missing=0) (1 thru 21=1) (44 thru Highest=2) (Else=Copy) into GestGroupNN4B.
Var labels GestGroupNN4B 'GestGroupNN4B'.
Execute.
Recode gestHES (0=0) (missing=0) (1 thru 21=1) (44 thru Highest=2) (Else=Copy) into GestGroupHES.
Var labels GestGroupHES 'GestGroupHES'.
Execute.
Crosstabs
  /tables=GestGroupHES BY GestGroupNN4B
  /format=avalue tables
  /cells=count row column total
  /count round cell.

Ethnicity
*Recoding ethnicity.
Recode ethnicNN4B ('A'=1) ('B'=1) ('C'=1) ('D'=9) ('E'=9) ('F'=9) ('G'=9) ('H'=2) ('J'=3)
  ('K'=4) ('L'=9) ('M'=6) ('N'=5) ('P'=7) ('R'=8) ('S'=9) ('Z'=10) (missing=10)
  into ethnicgroupNN4B.
Var labels ethnicgroupNN4B 'ethnicgroupNN4B'.
Execute.
Recode ethnicHES ('A'=1) ('B'=1) ('C'=1) ('D'=9) ('E'=9) ('F'=9) ('G'=9) ('H'=2) ('J'=3)
  ('K'=4) ('L'=9) ('M'=6) ('N'=5) ('P'=7) ('R'=8) ('S'=9) ('Z'=10) (missing=10)
  into ethnicgroupHES.
Var labels ethnicgroupHES 'ethnicgroupHES'.
Execute.
*Also label the recoded values with the relevant ethnic group names.
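
The labelling step mentioned in the last comment could look something like the sketch below. The group names are an assumption: they are inferred from the standard NHS ethnic category letter codes (A to Z) that the recode appears to use, and are not taken from the slides. The cross-tabulation simply mirrors the one shown for gestational age.

```spss
* Sketch only: label text is assumed from the NHS ethnic category codes.
* Group 9 collects the codes recoded to 9 above (D-G, L and S).
VALUE LABELS ethnicgroupNN4B ethnicgroupHES
  1 'White'
  2 'Indian'
  3 'Pakistani'
  4 'Bangladeshi'
  5 'Black African'
  6 'Black Caribbean'
  7 'Other Black'
  8 'Chinese'
  9 'Mixed, other Asian and other'
  10 'Not stated or missing'.

* Cross-tabulate the two grouped variables, as for gestational age.
CROSSTABS
  /TABLES=ethnicgroupHES BY ethnicgroupNN4B
  /FORMAT=AVALUE TABLES
  /CELLS=COUNT ROW COLUMN TOTAL
  /COUNT ROUND CELL.
```

Cases on the diagonal of the resulting table are those where the two sources agree on the ethnic group.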

Results
91% of maternity HES delivery records could be linked to the birth registration/NN4B records

Linked records for singleton births with missing data items in common data fields, 2005

                 NN4B                Registration        Maternity HES
Data item        Number    Percent   Number    Percent   Number    Percent
Mother NHS No    164,458   30        NA        NA        16,685    3
Mother's DOB     960       0.2       0         0         0         0
Ethnicity        59,865    11        NA        NA        77,771    14
Gestation        3,829     1         NA        NA        264,877   48
Birthweight      2,721     1         874       0.2       135,144   25
Birth status     615       0.1       0         0         176,455   32
Sex of baby      1,098     0.2       0         0         144,115   26

Comparison of sex for singletons in the linked records, 2005

                      Maternity HES*
Birth registration    Male      Female    Total     Percentage
Male                  204,613   791       205,404   51
Female                2,814     196,524   199,338   49
Total                 207,427   197,315   404,742   100

Concordance in data items between NN4B and maternity HES, 2005

                   Percentage
Data item          Stated   Missing   Concordance where stated
Birthweight*       75       25        99
Gestational age    52       48        89
Ethnicity          81       19        87
* using birth registration rather than NN4B

Conclusion
• Good linkage rate was obtained
• To gain maximum benefit, data quality and completeness need to improve in maternity HES
• SPSS is useful in data quality checks.
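
As an illustration of how a "concordance where stated" figure such as the one reported for birthweight might be obtained, here is a minimal sketch. It assumes that concordance means exact equality of the two values and that 0 and 9998 mark birthweights that were not stated, as the birthweight recode suggests. The variable name agree is hypothetical.

```spss
* Sketch only: assumes 0 and 9998 mean "not stated" in both sources.
MISSING VALUES birthweightHES birthweightReg (0, 9998).

* agree is a hypothetical helper: 1 = same value, 0 = different.
* It is system-missing when either birthweight is missing.
COMPUTE agree = (birthweightHES = birthweightReg).
VARIABLE LABELS agree 'Birthweight agrees between HES and registration'.
VALUE LABELS agree 0 'Differ' 1 'Same'.

* Valid Percent for 'Same' is the concordance where both are stated.
FREQUENCIES VARIABLES=agree.
```

Because the comparison is system-missing whenever either value is missing, the Valid Percent column excludes those cases, while the Percent column shows agreement as a share of all linked records.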