Detailed Report SANT Data Link Demonstration Project

Child Development and Education
Research Partnership Project
Contents
1. Background
2. Aims and objectives
3. Project rationale
4. Data-linkage methodology
5. Study sample and data
6. Project outcomes
7. Anticipated timeline
Appendix 1. Data items selected for linkage analysis
A1. Department of Education Data
A2. Department of Health Data
A3. DEEWR Data (AEDI)
1. Background
The NT Government has contributed $400,000 over the past four years as a founding
partner of the SA NT Datalink consortium. This has enabled the development of the
technical capacity to support research that uses the de-identified linkage of NT
population datasets to conduct policy-relevant analyses not previously possible.
From 2009 to 2013, Menzies CCDE has partnered with NT Health Gains and Planning to
conduct the SA NT Datalink Early Childhood Development Demonstration Study (Silburn,
Lynch, Guthridge & McKenzie, 2009). The design of the NT study was developed in
collaboration with Professor John Lynch, Professor of Public Health, University of Adelaide,
so that a parallel study could be conducted in South Australia with comparable SA data
using key variables relevant to children’s early health and development matched to those
used in the NT study.
The objectives of both the NT and SA demonstration studies have been to:
a) develop robust methods to link and de-identify population-wide perinatal health, early
child development, school education and other relevant datasets for the purpose of
describing the population dynamics of childhood growth and development;
b) document the precision, consistency and completeness of the data-linkage across
different databases, and establish the number of unique individuals within each
dataset and across datasets; and
c) demonstrate the feasibility of using de-identified linked data to quantify the
prevalence, associations and consequences of factors in early life of key relevance to
children’s longer-term outcomes in health, behaviour and learning.
The current status of the NT data-linkage demonstration study is that most of the
lead-up work required in the de-identification and linkage of selected data items from
each of the first four datasets being used in the study is now complete. This has
required securing HREC and agency approvals for the creation of the anonymous
linkage-keys and the extraction of the relevant data items attached to their linkage
keys from the respective health, education and AEDI datasets for all children born in
the NT from 1993 to 2009.
The reliability of the linkage keys used for de-identified linkage of perinatal,
immunisation, school enrolment, attendance, NAPLAN and the 2009/10 AEDI data
has been examined to document the consistency and quality of key identifying
variables and content variables of particular interest within each dataset (e.g.
date of birth, gender, Indigenous status).
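A consistency check of this kind can be illustrated with a minimal sketch. It assumes each dataset is held as a mapping from linkage key to a small record; the keys, field names and values below are invented for illustration and are not drawn from the study data.

```python
# Toy records keyed by a de-identified linkage key (all values are made up).
perinatal = {
    "K001": {"dob": "2001-03-14", "sex": "F", "indigenous": "Y"},
    "K002": {"dob": "2002-07-01", "sex": "M", "indigenous": "N"},
    "K003": {"dob": "2003-11-23", "sex": "F", "indigenous": "Y"},
}
enrolment = {
    "K001": {"dob": "2001-03-14", "sex": "F", "indigenous": "Y"},
    "K002": {"dob": "2002-01-07", "sex": "M", "indigenous": "N"},  # day and month swapped
    "K004": {"dob": "2004-05-30", "sex": "M", "indigenous": "N"},  # no perinatal record
}


def field_agreement(a, b, fields):
    """Per field, the share of linked records (keys in both datasets) whose values agree."""
    linked = sorted(a.keys() & b.keys())
    if not linked:
        return {f: None for f in fields}
    return {
        f: sum(a[k][f] == b[k][f] for k in linked) / len(linked)
        for f in fields
    }


agreement = field_agreement(perinatal, enrolment, ["dob", "sex", "indigenous"])
print(agreement)
```

Fields with low agreement (here the date of birth) would be the natural candidates for closer inspection before any analysis that draws on both datasets.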
New statistical methods of multiple imputation have been tested and are being used
to correct discrepancies between identifying variables across datasets to minimise
sample loss when performing analyses involving data on individuals drawn from
multiple datasets.
The success of the SA NT Datalink Demonstration Study in establishing the
feasibility of data-linkage analysis to make better use of existing NT administrative
data has provided the foundation for the proposed ‘Project’ component of the
NTG-Menzies CCDE Research Partnership.
2. Aims and objectives
2.1 Aims
The overall aim of the NTG-Menzies CCDE Research Partnership project is to build
upon the experience of the SA NT Datalink Early Childhood Development Study in
creating an NT-specific study population based on de-identified linked data spanning
the first antenatal health care visit through to school year 9 and covering the 1993-2006
birth cohorts.
This will involve a three-year program of research aimed at identifying specific early
life conditions and experiences that adversely or beneficially influence child
outcomes, with a view to informing and supporting the policies and programs that have
the greatest likelihood of success in improving child outcomes in the NT.
2.2 Objectives
Objective 1. Investigate and report the social, individual, health and family
factors that influence achievement in AEDI and NAPLAN literacy and numeracy
tests among Northern Territory children.
The key research question to address this objective is: “Given the unique
socio-demographic characteristics of the NT, what are the most salient and potentially
modifiable early life determinants which should be the focus of policy and practice to
improve the longer-term human capability of the NT population?” Establishing the
associations between AEDI and NAPLAN outcomes and early life socio-demographic and
health circumstances will require controlling for potential
confounding and mediating effects, examining effects in sub-populations and
identifying multi-level effects. The covariates to be considered in these analyses will
depend on the specific questions but will include family-level variables such as
maternal education, age and occupation, and community/neighbourhood-level
characteristics assessed through community-level indices of environmental health
(e.g. housing overcrowding), social functioning (e.g. per capita rates of child
protection and domestic violence notifications, police call-outs etc.) and family support
of school education (e.g. average annual school attendance rates).
Research questions related to social factors could include:
a) Identifying the relative contribution of family socio-economic factors (including
parental education and occupation and area of residence) that influence school
readiness and achievement in literacy and numeracy tests.
b) Identifying the relative contribution of socio-demographic factors (including family
size and structure, ethnicity, language background, maternal and paternal age and
family mobility) to school readiness, school attendance and achievement in literacy
and numeracy tests.
Research questions relating to health factors that influence school readiness
(AEDI), attendance and achievement in literacy and numeracy tests (NAPLAN)
among NT children are:
a) Determining the relative contribution of clinical factors (including birth weight,
Apgar score, birth length, head circumference, and infant growth and nutritional status
(GAA data)) that influence school readiness and achievement in literacy and
numeracy tests.
b) Identifying the relative contribution of maternal and child health factors (including
mother’s gestational health and medical conditions/health issues identified from birth,
e.g. early childhood anaemia) that influence school readiness and achievement in
literacy and numeracy tests.
The final research question under this objective concerns the combined effect of all
of these factors and the appropriate covariates: what are the interactions between
them, which factors can enhance or mute the beneficial or detrimental effect of
others, and are there discrete developmental pathways which can be identified for
different groups of children?
Objective 2. Investigate the significant differences between the number of
children in birth, school enrolment and school attendance cohorts.
There is significant mobility of children between the Northern Territory and other
states and territories which is poorly described in the currently available data. There
are also significant numbers of children that remain in the Northern Territory but do
not engage with the school system (not enrolling or not attending regularly). Can the
information available from the linked datasets inform our understanding of these
cohorts of children?
The specific research questions that require investigation include:
a) What are the numbers and demographic profiles of children entering or
leaving the NT in their early years?
b) How many children of school age are not enrolled in school or attend
irregularly?
c) What are their demographic characteristics? How useful are linked birth and
education data in studying these questions?
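Once each dataset carries linkage keys, questions (a) and (b) reduce largely to set comparisons between cohorts. The following is a minimal sketch under stated assumptions: the keys, attendance figures and the `REGULAR_THRESHOLD` cut-off are hypothetical and are not taken from the project's data or definitions.

```python
# Linkage keys present in each cohort (invented for illustration).
birth_keys = {"K001", "K002", "K003", "K004"}
enrolment_keys = {"K002", "K003", "K005"}

# Days attended in a school year for enrolled children (invented values).
attendance_days = {"K002": 180, "K003": 45, "K005": 150}
REGULAR_THRESHOLD = 100  # hypothetical cut-off for "regular" attendance

# Born in the NT but with no enrolment record: moved away or never enrolled.
born_not_enrolled = birth_keys - enrolment_keys

# Enrolled in the NT but with no NT birth record: moved into the NT.
enrolled_not_born = enrolment_keys - birth_keys

# Enrolled but attending irregularly under the assumed cut-off.
irregular = {k for k, days in attendance_days.items() if days < REGULAR_THRESHOLD}

print(sorted(born_not_enrolled), sorted(enrolled_not_born), sorted(irregular))
```

Separating children who left the NT from those who stayed but never enrolled would need a further data source, which is the role the immunisation data plays as a censoring variable in this study.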
Objective 3. Investigate the child health, parental, family and community
factors that relate to children’s vulnerability to child abuse and neglect and the
longer-term developmental consequences of such vulnerability.
Child protection notifications and substantiations are a significant indication of the
disadvantage and vulnerability of Aboriginal children and families (AIHW, 2012). In
2010-2011, Aboriginal and Torres Strait Islander children were almost 8 times as
likely to be the subject of substantiated child abuse and neglect as non-Indigenous
children (rates of 34.6 and 4.5 per 1,000 children, respectively). In June 2011, the
rate of Aboriginal and Torres Strait Islander children on care and protection orders
was over 9 times the rate of non-Indigenous children (rates of 51.4 and 5.4 per 1,000
children, respectively). Similarly, the rate of Aboriginal and Torres Strait Islander
children in out-of-home care was 10 times the rate of non-Indigenous children (rates
of 51.7 and 5.1 per 1,000 children, respectively).
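The rate ratios quoted above follow directly from the paired rates per 1,000 children. A short check of the arithmetic, using only the figures given in the text (the dictionary labels are shorthand introduced here):

```python
# (Indigenous rate, non-Indigenous rate) per 1,000 children, as quoted in the text.
rates = {
    "substantiations, 2010-2011": (34.6, 4.5),
    "care and protection orders, June 2011": (51.4, 5.4),
    "out-of-home care": (51.7, 5.1),
}

# Rate ratio = Indigenous rate divided by non-Indigenous rate.
rate_ratios = {
    measure: round(indigenous / non_indigenous, 1)
    for measure, (indigenous, non_indigenous) in rates.items()
}

for measure, ratio in rate_ratios.items():
    print(f"{measure}: {ratio}")
```

The ratios come to about 7.7, 9.5 and 10.1, consistent with the descriptions "almost 8 times", "over 9 times" and "10 times".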
Concerns about the level of Indigenous over-representation in out of home
placements constituting another ‘Stolen Generation’ have been challenged in recent
years by the view that child removal remains a necessary response to the high
prevalence of neglect in some communities where high rates of social adversity,
family breakdown, chronic stress and ill health, low levels of parental education and
employment are reproduced in a ‘vicious cycle’ of disadvantage (Delfabbro et al.,
2010).
Given the unique socio-demographic characteristics of the NT there are several
research questions where data-linkage analysis could provide a more nuanced
understanding of the key drivers and consequences of childhood vulnerability in the
NT. Pending discussion with the Office of Children and Families, these questions
could include:
a) What are the most salient and potentially modifiable early life determinants which
should be the focus of policy and practice to reduce children’s vulnerability to
abuse and neglect?
b) What are the combined effects of all of the early life determinants and their
appropriate covariates? What are the interactions between them, which factors
can enhance or mute the beneficial or detrimental effect of others, and are there
discrete developmental pathways of vulnerability which can be identified for
different groups of children?
c) Can the above analysis be used to establish an index of vulnerability which could
be used with the available population data to answer the question of whether
current child protection practice in the NT represents an under- or over-response
of services and agencies, and whether there have been identifiable trends over
time in terms of the levels of service response?
d) What are the longer-term pathways of development to age 18 of children who
have been in out-of-home care? Do these outcomes differ with regard to the age
of the child at the time of placement, whether these are kinship placements, the
number and frequency of placements, and the total time spent in alternative
care? Key outcomes which should be examined include: developmental
functioning at age 5 years (AEDI), school attendance and retention, academic
outcomes (NAPLAN years 3, 5, 7 & 9), contact with the juvenile justice and
mental health systems etc.
Investigating these child protection related questions will involve securing the
approvals needed for the linkage of OCF service data with the already linked
datasets. Once the de-identified data are available, the analysis will involve
establishing the prevalence and relative contribution of early life circumstances that
predict the likelihood of a child’s involvement with the NT child protection system.
This will include consideration of the relative contribution of child clinical factors
(e.g. intrauterine alcohol and nicotine exposure, birth weight and perinatal health
status, infant growth and nutritional status); parental and family factors (e.g.
maternal age and education, parents’ health and mental status, family composition,
functioning and mobility); and community factors (e.g. housing overcrowding,
indicators of community safety and community social functioning).
Objective 4. Investigate the extent to which NT early childhood development
data and its markers match and diverge from those in South Australia.
Given that the SA NT Datalink Early Childhood Development Study was set up to be
done in parallel with a comparable South Australian study, there is an opportunity to
investigate the extent to which NT data and its markers match and diverge from
those in South Australia. This will need to occur after Research Objectives 1 to
3 have been completed in the NT. Objective 4 is technically feasible but will require the
preparation of requests for variation to the existing HREC approvals in the NT and
SA for the merging of two independently confidentialised linked datasets. Each of the
data custodians of the various datasets in each jurisdiction will also need to consent
to their data being combined with the data from the other jurisdiction to address
some broader questions relating to both jurisdictions. On the basis of current
experience and assuming no unforeseen complications, it could take anywhere from
12 to 18 months to obtain all the administrative consents and to complete the analysis
and reporting of findings.
3. Project rationale
There is widespread scientific agreement that the early years of a child’s life are of
critical importance in shaping longer-term outcomes in health, development, learning
and wellbeing across the lifespan. The Commonwealth, state and territory
governments through the Council of Australian Governments (COAG) have
established a comprehensive agenda for investing in early childhood development
and wellbeing to “ensure that by 2020 all children have the best start in life to create
a better future for themselves and for the nation.” Key goals of the National Strategy
are to reduce the impact of risk factors on children’s development, reduce
inequalities in outcomes between groups and to improve outcomes for all children.
Building better information and robust evidence was highlighted as one of six
priorities to progress the goals of the National Strategy
(http://www.coag.gov.au/coag_meeting_outcomes/2009-0702/docs/national_ECD_strategy.pdf).
In the Lancet special series on child development in developing countries, Engle et al.
(2007b) concluded that the most effective early child development programs are
those which: are targeted towards disadvantaged children; provide services to
younger children (less than age 3); have continued duration throughout early
childhood; are of high quality, defined by structure (e.g. child-staff ratio, staff training)
and processes (e.g. responsive interactions and a variety of activities); provide
services directly to children and parents; and are integrated into existing health
programs. It is acknowledged that children’s development is shaped by a complex
interplay between individual biological factors and a range of social, economic and
environmental factors.
For government policy to be better informed by evidence we need to improve our
understanding of how various factors impact at the population level and for significant
sub-populations. Understanding how these factors influence children’s developmental
trajectories and their capacity to participate in life and learning is essential to the
effective targeting and delivery of services and to investigating the extent to which
our policies are working in achieving their stated aims.
While there is a growing body of international work exploring the relationship
between specific risk factors and outcomes in early childhood, much less is known
about how these risk and protective factors cumulatively impact in whole populations.
Given the continuing poor child health and educational outcomes in the NT and the
new policy emphasis on the development and delivery of more effective early
childhood and family support services, it is vital that the design, implementation and
evaluation of these services is based on reliable evidence and a systematic
understanding of the complex interplay between individual, environmental and social
forces shaping the lives of children in the NT population context.
4. Data-linkage methodology
The mechanics of the data linkage process are as follows:
- Only the identification/linkage data (e.g. date of birth and name, but not birth weight)
from each dataset are supplied to the SA NT Datalink linkage unit.
- The SA NT Datalink linkage service generates a linkage key (unique to each
individual) and attaches it to each record supplied.
- These records are returned to the custodians in each agency, who attach the linkage
keys to their content data and remove the identifying data before supplying the
“information” datasets to the researchers.
- The researchers now have de-identified datasets but can use the linkage keys to
match records from different sources.
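The four steps above can be sketched in a few lines of code. This is an illustrative sketch only, not the SA NT Datalink implementation: the class and function names are hypothetical, the records are invented, and the exact match on normalised name and date of birth is a simplifying assumption (operational linkage services typically use more tolerant matching).

```python
import uuid


def split_record(record, id_fields):
    """Custodian step: separate identifying fields from content fields."""
    ids = {f: record[f] for f in id_fields}
    content = {f: v for f, v in record.items() if f not in id_fields}
    return ids, content


class LinkageUnit:
    """Hypothetical stand-in for the linkage unit: it sees identifiers only."""

    def __init__(self):
        self._keys = {}

    def key_for(self, ids):
        # Assumption: exact match on normalised name and date of birth.
        signature = (ids["name"].strip().lower(), ids["dob"])
        if signature not in self._keys:
            self._keys[signature] = uuid.uuid4().hex  # random, carries no identity
        return self._keys[signature]


def deidentify(records, id_fields, unit):
    """Custodian step: attach linkage keys, drop identifiers, release content."""
    released = []
    for rec in records:
        ids, content = split_record(rec, id_fields)
        released.append({"linkage_key": unit.key_for(ids), **content})
    return released


ID_FIELDS = ("name", "dob")
health = [{"name": "Alex Example", "dob": "2001-03-14", "birth_weight_g": 3200}]
education = [{"name": "alex example ", "dob": "2001-03-14", "naplan_y3_reading": 410}]

unit = LinkageUnit()
health_out = deidentify(health, ID_FIELDS, unit)
education_out = deidentify(education, ID_FIELDS, unit)

# Researcher step: join the de-identified records on the linkage key.
by_key = {r["linkage_key"]: dict(r) for r in health_out}
for r in education_out:
    by_key.setdefault(r["linkage_key"], {}).update(r)
linked = list(by_key.values())
print(linked)
```

The point of the design is the separation of roles: the linkage unit never receives content such as birth weight, and the researchers never receive names or dates of birth.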
SA NT-Datalink’s systems and protocols are based on the highest ethical and privacy
standards and strong security measures have been implemented to prevent inappropriate
use or disclosure of personal information. Only the data custodians have access to personal
identifying information and only de-identified linkage keys will be provided to the research
team. The linkage process is carefully designed to ensure that no identified information
(other than that used for the actual linkage) is supplied by the data custodians and that they
receive no identifiable data from other sources.
The de-identified datasets with their anonymous linkage keys are stored separately on a
secure computer server at CDU. The nominated Menzies CCDE researchers working on the
project (Messrs Silburn and McKenzie) have secure access to these de-identified
linked datasets.
The data cleaning stage of the project has included cross-validation analysis to examine the
internal consistency and accuracy of the merged datasets, an audit of data completeness,
and an analysis of possible determinants of missing data and of how these might inform the
treatment of missing data through standard multiple imputation methods.
With regard to the public reporting of findings from the analysis, our approach is consistent
with the data cell size guidelines for use of AIHW data
(http://www.aihw.gov.au/committees/simc/guidelines_statistical_purposes.doc). While there is
no national standard for public reporting of small cell sizes, for the purposes of this project
we will suppress any positive cell size less than 10, as well as adjacent cells, so that back
calculation is not technically possible.
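The suppression rule can be sketched for a single table row that is published alongside its total. The function name and counts are hypothetical, and reading "adjacent" as the neighbouring cell in the same row is an assumption; the source does not spell out how adjacent cells are chosen.

```python
def suppress_row(counts, threshold=10):
    """Return a copy of `counts` with suppressed cells replaced by None."""
    out = list(counts)

    # Primary suppression: hide any positive count below the threshold.
    primary = [i for i, n in enumerate(counts) if 0 < n < threshold]
    for i in primary:
        out[i] = None

    # Secondary suppression: a single hidden cell could be recovered by
    # subtracting the visible cells from the row total, so hide a neighbour too.
    # Assumption: "adjacent" means the next cell, or the previous one at row end.
    if len(primary) == 1 and len(counts) > 1:
        i = primary[0]
        neighbour = i + 1 if i + 1 < len(counts) else i - 1
        out[neighbour] = None

    return out


age_group_counts = [120, 4, 57, 33]  # invented counts
published = suppress_row(age_group_counts)
print(published)
```

A full implementation would also need to guard against back calculation from column totals and from related tables, which this single-row sketch does not attempt.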
5. Study design and analysis
A range of statistical methods will be used in addressing the four research objectives and
their associated research questions.
Multinomial logistic regression analysis will be used to investigate whether a cumulative risk
index can be developed and to examine its predictive validity using multivariable risk
prediction methods, including ROC curves, discrimination, and net reclassification indices.
This index of cumulative risk will be used to examine its levels, social patterning and
predictive value in different SES, ethnic and geographic groups.
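As a simplified illustration of the discrimination step, the sketch below scores each child by a plain count of risk factors and computes the area under the ROC curve (AUC) by pairwise comparison. The count-based index is a swapped-in simplification, not the regression-based index proposed above, and the risk factor names, records and outcomes are invented.

```python
def cumulative_risk(flags):
    """Count how many binary risk factors are present for one child."""
    return sum(1 for present in flags.values() if present)


def auc(scores, outcomes):
    """Area under the ROC curve: the share of (case, non-case) pairs in which
    the case has the higher score, with ties counted as one half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# (risk factor flags, adverse outcome 0/1); hypothetical toy data.
children = [
    ({"low_birth_weight": False, "young_maternal_age": False, "overcrowding": False}, 0),
    ({"low_birth_weight": True, "young_maternal_age": False, "overcrowding": False}, 0),
    ({"low_birth_weight": False, "young_maternal_age": True, "overcrowding": False}, 1),
    ({"low_birth_weight": True, "young_maternal_age": True, "overcrowding": False}, 0),
    ({"low_birth_weight": True, "young_maternal_age": True, "overcrowding": True}, 1),
    ({"low_birth_weight": True, "young_maternal_age": True, "overcrowding": True}, 1),
]

scores = [cumulative_risk(flags) for flags, _ in children]
outcomes = [outcome for _, outcome in children]
discrimination = auc(scores, outcomes)
print(scores, round(discrimination, 3))
```

An AUC of 0.5 indicates no discrimination and 1.0 perfect discrimination; computing it separately within SES, ethnic and geographic groups would show whether the index performs comparably across them.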
Structural equation modelling (SEM) using LISREL will be used to explore models of
children’s developmental status (AEDI at age 5 years) and successful learning (NAPLAN
year 3 Reading and Numeracy). SEM involves specifying models of predictor variables
available within the various linked data sets covering the period from the first antenatal
health care visit to age 8 years. These variables together represent ‘latent’ constructs such
as ‘prenatal health’, ‘health at birth’, ‘maternal deprivation’, ‘community stress’, ‘family
functioning’ and ‘developmental readiness for school learning’ etc. SEM then identifies the
strength of association between these ‘latent’ constructs and the key outcomes of interest
(i.e. AEDI and NAPLAN). The final step of SEM is to assess how well each of the
conceptual models matches the observed data (i.e. its ‘goodness of fit’).
A number of different SEM models will be explored for Indigenous and non-Indigenous
children and for special populations such as Indigenous children residing in ‘Town Camps’.
Our analytic team has access to biostatistical experts who are proficient in the use of these
methods, now available with the STATA and SPSS statistical software packages. Multilevel
modelling methods will also be used to tease out community/school effects from individual
factors using the MLwiN statistical software package.
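The idea of separating school-level from individual-level variation can be illustrated with a one-way intraclass correlation (ICC), the share of outcome variance that lies between schools. This is a much simpler technique than the multilevel models to be fitted in MLwiN and is offered only as a sketch; the function name and scores are invented, and equal group sizes are assumed.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way ANOVA intraclass correlation for equally sized groups."""
    sizes = {len(g) for g in groups}
    if len(groups) < 2 or len(sizes) != 1 or min(sizes) < 2:
        raise ValueError("need at least two groups of equal size (two or more each)")
    k = len(groups[0])  # children per school
    j = len(groups)     # number of schools
    grand = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]

    # Mean square between schools and mean square within schools.
    msb = k * sum((m - grand) ** 2 for m in group_means) / (j - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g) / (j * (k - 1))

    return (msb - msw) / (msb + (k - 1) * msw)


# Invented test scores for three schools with three children each.
schools = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
icc = icc_oneway(schools)
print(round(icc, 3))
```

A value near 1 means most of the variation lies between schools; a value near 0 means children differ about as much within schools as across them.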
6. Study sample and data
The following NT population-based administrative datasets have so far been linked by the
SA NT Datalink service:
- NT Dept of Health: perinatal health data, client master index (for data linkage purposes)
and childhood immunisation data (for identifying study children who may have relocated
from the NT);
- NT Dept of Education: school enrolment, attendance and NAPLAN (2008-2011); and
- DEEWR: AEDI data (2009).
The relevance of the selected data items linked from these datasets to support analysis of
the population-level dynamics of early child development from the first antenatal health care
visit through to school completion is summarised in the shaded boxes shown in Figure 1 below.
Note: The un-shaded boxes in this figure indicate the additional NTG administrative datasets
which the project will aim to link during the course of the study. The figure also shows the
community-level data relevant to four different developmental epochs which the project aims
to assemble and link for the longitudinal analysis being conducted over the next several
years.
Figure 1. Datasets informing the NTG Menzies CCDE research partnership analysis
[Figure: timeline running from -9 months and birth through ages 5-6, 8-9, 10-11, 12-13 and
14-15. Community-level demographics & SEP environmental factors are shown for four epochs
(perinatal period, early childhood years, primary school years, middle school years).
Datasets shown: Perinatal dataset (all NT born children 1993-2005); Immunisation dataset
(NT & non-NT born children since 1999; censoring variable); AEDI dataset (children enrolled
in NT schools at age 5yrs in 2009 & 2012); Education dataset (all enrolments, school,
attendance & NAPLAN data; NAPLAN assessments from 2008 onwards; school years 3, 5, 7 & 9);
Child protection dataset (still to be linked); Child health & hospital dataset (still to be
linked).]
7. Project outcomes
Year 1 project outcomes (June 2013 to June 2014)
The main outcomes in the first year of the project will be:
a) A CCDE Senior Research Officer is appointed to support data-linkage utilisation
from July 2013 to June 2016.
b) A draft data linkage protocol and communication strategy is prepared.
c) A draft technical report is prepared documenting the precision and completeness of the
data linkage of the AEDI data with the birth, perinatal, school attendance and
achievement, and community-level data.
d) An NHMRC Partnership Grant application is developed and approved for submission, to
leverage additional Australian Government funding for a two-year program of
data-linkage research based on a study population comprising the 1993-2008 NT birth
cohorts.
Year 2 project outcomes (June 2014 to June 2015)
The main outcomes in the second year of the project will be:
a) Departmental amendments to data linkage protocol and communication strategy are
incorporated into a final version ready for publication on NTG and CCDE websites
b) The finalised AEDI Data Quality and Linkage Report is approved
c) A report detailing the findings of the data-linkage analysis of the perinatal, child
health, AEDI, family and community-level determinants of school attendance and
NAPLAN in the NT is delivered to the Steering Committee one month prior to its
November meeting for approval for publication as a joint NTG/Menzies publication;
and as separate journal articles in peer-reviewed scientific journals.
d) Ethics Approval is obtained for the incorporation of the child protection records and a
data dictionary and data quality and linkage report are drafted.
e) A technical report documenting the precision and completeness of the data linkage of
the child protection records with the birth, perinatal and school attendance and
achievement and community-level data is drafted.
f) A report detailing the findings of a feasibility study of the integration of comparable
data fields in NT and SA datasets linked through the SANT Datalink facility will be
drafted.
g) Ethics Approval is obtained for the incorporation of the juvenile justice records and a
data dictionary and data quality and linkage report are drafted.
h) A technical report documenting the precision and completeness of the data linkage of
the juvenile justice records with the birth, perinatal and school attendance and
achievement and community-level data is drafted.
Year 3 project outcomes (June 2015 to June 2016)
The main outcomes in the third year of the project will be:
a) A report is prepared detailing the findings of the data-linkage analysis of the early life
determinants of children’s involvement with the NT child protection system, and is
approved for publication as separate journal articles in peer-reviewed scientific journals.
b) A report is prepared detailing the findings of the data-linkage analysis of the early life
determinants of children’s involvement with the NT juvenile justice system, and is
approved for publication as a joint NTG/Menzies publication and as separate journal
articles in peer-reviewed scientific journals.
c) The final report is delivered on the overall project outcomes, implications for policy
and service planning, and recommendations for further development of data-linkage
capacity in the NT.
8. Project timeline
The following timeline shows the expected project outputs given no unexpected
administrative or technical difficulties in the linkage of new datasets.
[Timeline table: one row per output, with columns for Oct, Nov, Feb and May in each of
Year 1 (2013-2014), Year 2 (2014-2015) and Year 3 (2015-2016), and an x marking the
scheduled period for each output.]
Outputs listed in the timeline:
- A CCDE Senior Research Officer is allocated.
- Draft data linkage protocol and communication strategy
- Technical report on the AEDI data linkage and quality
- NHMRC Partnership Grant is developed and approved.
- Finalise data linkage protocol & communication strategy
- Finalise AEDI Data quality and linkage report
- A report detailing the health, AEDI, family and community-level determinants of school
outcomes
- Ethics Approval, and draft data dictionary and data quality and linkage report for child
protection records
- A feasibility report on the integration of NT and SA datasets is drafted
- Report on the early life determinants of children’s involvement with the NT child
protection system
- Ethics Approval, and draft data dictionary and data quality and linkage report for
juvenile justice records
- Report on the early life determinants of children’s involvement with the NT juvenile
justice system
- The final report is delivered on the overall project outcomes, implications for policy
and service planning, and recommendations for further development of data-linkage
capacity in the NT.
(X indicates the delivery of a report to the NTG-Menzies CCDE Steering Committee)