Child Development and Education Research Partnership Project

Contents

1. Background
2. Aims and objectives
3. Project rationale
4. Data-linkage methodology
5. Study design and analysis
6. Study sample and data
7. Project outcomes
8. Project timeline
Appendix 1. Data items selected for linkage analysis
A1. Department of Education Data
A2. Department of Health Data
A3. DEEWR Data (AEDI)

1. Background

The NT Government has contributed $400,000 over the past four years as a founding partner of the SA NT Datalink consortium. This has enabled the development of the technical capacity to support research based on the de-identified linkage of NT population datasets, making possible policy-relevant analyses that could not previously be conducted. From 2009 to 2013, Menzies CCDE has partnered with NT Health Gains and Planning to conduct the SA NT Datalink Early Childhood Development Demonstration Study (Silburn, Lynch, Guthridge & McKenzie, 2009). The design of the NT study was developed in collaboration with Professor John Lynch, Professor of Public Health, University of Adelaide, so that a parallel study could be conducted in South Australia with comparable SA data, using key variables relevant to children’s early health and development matched to those used in the NT study.

The objectives of both the NT and SA demonstration studies have been to:

a) develop robust methods to link and de-identify population-wide perinatal health, early child development, school education and other relevant datasets for the purpose of describing the population dynamics of childhood growth and development;
b) document the precision, consistency and completeness of the data-linkage across different databases, and establish the number of unique individuals within each dataset and across datasets; and
c) demonstrate the feasibility of using de-identified linked data to quantify the prevalence, associations and consequences of factors in early life of key relevance to children’s longer-term outcomes in health, behaviour and learning.

Most of the lead-up work required for the de-identification and linkage of selected data items from each of the first four datasets used in the NT data-linkage demonstration study is now complete. This has required securing HREC and agency approvals for the creation of the anonymous linkage keys and the extraction of the relevant data items attached to their linkage keys from the respective health, education and AEDI datasets for all children born in the NT from 1993 to 2009. The reliability of the linkage keys used for de-identified linkage of perinatal, immunisation, school enrolment, attendance, NAPLAN and the 2009/10 AEDI data has been examined to document the consistency and quality of key identifying variables and content variables of particular interest within each dataset (e.g. date of birth, gender, Indigenous status). New statistical methods of multiple imputation have been tested and are being used to correct discrepancies between identifying variables across datasets, so as to minimise sample loss when performing analyses involving data on individuals drawn from multiple datasets.

The success of the SA NT Datalink Demonstration Study in establishing the feasibility of data-linkage analysis to make better use of existing NT administrative data has provided the foundation for the proposed ‘Project’ component of the NTG-Menzies CCDE Research Partnership.

2. Aims and objectives

2.1 Aims

The overall aim of the NTG-Menzies CCDE Research Partnership project is to build upon the experience of the SA NT Datalink Early Childhood Development Study in creating an NT-specific study population based on de-identified linked data spanning the period from the first antenatal health care visit through to school year 9, covering the 1993-2006 birth cohorts. This will involve a three-year program of research aimed at identifying specific early life conditions and experiences that adversely or beneficially influence child outcomes, with a view to informing and supporting the policies and programs most likely to succeed in improving child outcomes in the NT.

2.2 Objectives

Objective 1. Investigate and report the social, individual, health and family factors that influence achievement in AEDI and NAPLAN literacy and numeracy tests among Northern Territory children.

The key research question to address this objective is “given the unique sociodemographic characteristics of the NT what are the most salient and potentially modifiable early life determinants which should be the focus of policy and practice to improve the longer-term human capability of the NT population?” Establishing the associations between AEDI and NAPLAN outcomes and early life sociodemographic and health circumstances will require controlling for potential confounding and mediating effects, examining effects in sub-populations and identifying multi-level effects.
The covariates to be considered in these analyses will depend on the specific questions but will include such family-level variables as maternal education, age and occupation, and community/neighbourhood-level characteristics assessed through community-level indices of environmental health (e.g. housing overcrowding), social functioning (e.g. per capita rates of child protection and domestic violence notifications, police call-outs etc.) and family support of school education (e.g. average annual school attendance rates).

Research questions related to social factors could include:

a) Identifying the relative contribution of family socio-economic factors (including parental education and occupation and area of residence) that influence school readiness and achievement in literacy and numeracy tests.
b) Identifying the relative contribution of socio-demographic factors (including family size and structure, ethnicity, language background, maternal and paternal age and family mobility) that influence school readiness, school attendance and achievement in literacy and numeracy tests.

Research questions relating to health factors that influence school readiness (AEDI), attendance and achievement in literacy and numeracy tests (NAPLAN) among NT children are:

a) Determining the relative contribution of clinical factors (including birth weight, Apgar score, birth length, head circumference, infant growth and nutritional status (GAA data)) that influence school readiness and achievement in literacy and numeracy tests.
b) Identifying the relative contribution of maternal and child health factors (including mother’s gestational health and medical conditions or health issues identified from birth, e.g. early childhood anaemia) that influence school readiness and achievement in literacy and numeracy tests.

The final research question under this objective concerns the combined effect of all of these factors and the appropriate covariates: what are the interactions between them, which factors can enhance or mute the beneficial or detrimental effect of others, and are there discrete developmental pathways which can be identified for different groups of children?

Objective 2. Investigate the significant differences between the number of children in birth, school enrolment and school attendance cohorts.

There is significant mobility of children between the Northern Territory and other states and territories, which is poorly described in the currently available data. There are also significant numbers of children who remain in the Northern Territory but do not engage with the school system (not enrolling or not attending regularly). Can the information available from the linked datasets inform our understanding of these cohorts of children? The specific research questions that require investigation include:

a) What are the numbers and demographic profiles of children entering or leaving the NT in their early years?
b) How many children of school age are not enrolled in school or attend irregularly?
c) What are their demographic characteristics?

How useful are linked birth and education data in studying these questions?

Objective 3. Investigate the child health, parental, family and community factors that relate to children’s vulnerability to child abuse and neglect and the longer-term developmental consequences of such vulnerability.

Child protection notifications and substantiations are a significant indication of the disadvantage and vulnerability of Aboriginal children and families (AIHW, 2012).
In 2010-2011, Aboriginal and Torres Strait Islander children were almost 8 times as likely to be the subject of substantiated child abuse and neglect as non-Indigenous children (rates of 34.6 and 4.5 per 1,000 children, respectively). In June 2011, the rate of Aboriginal and Torres Strait Islander children on care and protection orders was over 9 times the rate of non-Indigenous children (rates of 51.4 and 5.4 per 1,000 children, respectively). Similarly, the rate of Aboriginal and Torres Strait Islander children in out-of-home care was 10 times the rate of non-Indigenous children (rates of 51.7 and 5.1 per 1,000 children, respectively).

Concerns that the level of Indigenous over-representation in out-of-home placements constitutes another ‘Stolen Generation’ have been challenged in recent years by the view that child removal remains a necessary response to the high prevalence of neglect in some communities, where high rates of social adversity, family breakdown, chronic stress and ill health, and low levels of parental education and employment are reproduced in a ‘vicious cycle’ of disadvantage (Delfabbro et al, 2010).

Given the unique socio-demographic characteristics of the NT, there are several research questions where data-linkage analysis could provide a more nuanced understanding of the key drivers and consequences of childhood vulnerability in the NT. Pending discussion with the Office of Children and Families, these questions could include:

a) What are the most salient and potentially modifiable early life determinants which should be the focus of policy and practice to reduce children’s vulnerability to abuse and neglect?
b) What are the combined effects of all of the early life determinants and their appropriate covariates: what are the interactions between them, which factors can enhance or mute the beneficial or detrimental effect of others, and are there discrete developmental pathways of vulnerability which can be identified for different groups of children?
c) Can the above analysis be used to establish an index of vulnerability which could be used with the available population data to answer the question of whether current child protection practice in the NT represents an under- or over-response of services and agencies, and whether there have been identifiable trends over time in terms of the levels of service response?
d) What are the longer-term pathways of development to age 18 of children who have been in out-of-home care? Do these outcomes differ with regard to the age of the child at the time of placement, whether these are kinship placements, the number and frequency of placements, and the total time spent in alternative care? Key outcomes which should be examined include: developmental functioning at age 5 years (AEDI), school attendance and retention, academic outcomes (NAPLAN years 3, 5, 7 & 9), and contact with the juvenile justice and mental health systems.

Investigating these child protection related questions will involve securing the approvals needed for the linkage of OCF service data with the already linked datasets. Once the de-identified data are available, the analysis will involve establishing the prevalence and relative contribution of early life circumstances that predict the likelihood of a child’s involvement with the NT child protection system. This will include consideration of the relative contribution of child clinical factors (e.g. intra-uterine alcohol and nicotine exposure, birth weight and perinatal health status, infant growth and nutritional status); parental and family factors (e.g. maternal age and education, parents’ health and mental status, family composition, functioning and mobility); and community factors (e.g. housing overcrowding, indicators of community safety and community social functioning).

Objective 4. Investigate the extent to which NT early childhood development data and its markers match and diverge from those in South Australia.

Given that the SA NT Datalink Early Childhood Development Study was set up to be done in parallel with a comparable South Australian study, there is an opportunity to investigate the extent to which NT data and its markers match and diverge from those in South Australia. This will need to occur after Research Objectives 1 to 3 have been completed in the NT. Objective 4 is technically feasible but will require the preparation of requests for variation to the existing HREC approvals in the NT and SA for the merging of two independently confidentialised linked datasets. Each of the data custodians of the various datasets in each jurisdiction will also need to consent to their data being combined with the data from the other jurisdiction to address some broader questions relating to both jurisdictions. On the basis of current experience and assuming no unforeseen complications, it could take anywhere from 12 to 18 months to obtain all the administrative consents and to complete the analysis and reporting of findings.

3. Project rationale

There is widespread scientific agreement that the early years of a child’s life are of critical importance in shaping longer-term outcomes in health, development, learning and wellbeing across the lifespan.
The Commonwealth, state and territory governments, through the Council of Australian Governments (COAG), have established a comprehensive agenda for investing in early childhood development and wellbeing to “ensure that by 2020 all children have the best start in life to create a better future for themselves and for the nation.” Key goals of the National Strategy are to reduce the impact of risk factors on children’s development, reduce inequalities in outcomes between groups and improve outcomes for all children. Building better information and robust evidence was highlighted as one of six priorities to progress the goals of the National Strategy (http://www.coag.gov.au/coag_meeting_outcomes/2009-0702/docs/national_ECD_strategy.pdf).

In the Lancet special series on child development in developing countries, Engle et al (2007b) concluded that the most effective early child development programs are those which: are targeted towards disadvantaged children; provide services to younger children (less than age 3); have continued duration throughout early childhood; are of high quality, defined by structure (e.g. child-staff ratio, staff training, processes which allow responsive interactions and a variety of activities); provide services directly to children and parents; and are integrated into existing health programs.

It is acknowledged that children’s development is shaped by a complex interplay between individual biological factors and a range of social, economic and environmental factors. For government policy to be better informed by evidence, we need to improve our understanding of how various factors impact at the population level and for significant sub-populations.
Understanding how these factors influence children’s developmental trajectories and their capacity to participate in life and learning is essential to the effective targeting and delivery of services and to investigating the extent to which our policies are achieving their stated aims. While there is a growing body of international work exploring the relationship between specific risk factors and outcomes in early childhood, much less is known about the cumulative impact of these risk and protective factors in whole populations.

Given the continuing poor child health and educational outcomes in the NT and the new policy emphasis on the development and delivery of more effective early childhood and family support services, it is vital that the design, implementation and evaluation of these services are based on reliable evidence and a systematic understanding of the complex interplay between the individual, environmental and social forces shaping the lives of children in the NT population context.

4. Data-linkage methodology

The mechanics of the data linkage process are as follows:

- The identification/linkage data only (e.g. date of birth and name but not birth weight) from each dataset are supplied to the SA NT data linkage unit.
- The SA NT Datalink linkage service generates and attaches linkage keys (unique to each individual) to each record supplied.
- These are returned to the custodians in each agency, who attach the linkage keys but remove the identifying data before supplying the “information” datasets to the researchers.
- The researchers now have de-identified datasets but can use the linkage keys to match records from different sources.

SA NT Datalink’s systems and protocols are based on the highest ethical and privacy standards, and strong security measures have been implemented to prevent inappropriate use or disclosure of personal information.
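
The separation of roles in the linkage steps above can be illustrated with a minimal Python sketch. It is not SA NT Datalink's actual procedure: the function names, field names and sample values are hypothetical, and matching on an exact normalised name and date of birth is a simplifying assumption (real linkage services typically use more tolerant matching).

```python
import uuid


def assign_linkage_keys(identifier_records):
    """Linkage unit: sees identifying fields only and returns record_id -> key.

    Assumption: two records belong to the same person when the normalised
    name and the date of birth match exactly.
    """
    key_by_person = {}
    key_by_record = {}
    for rec in identifier_records:
        person = (rec["name"].strip().lower(), rec["dob"])
        if person not in key_by_person:
            key_by_person[person] = uuid.uuid4().hex  # random, carries no meaning
        key_by_record[rec["record_id"]] = key_by_person[person]
    return key_by_record


def deidentify(full_records, key_by_record, identifying_fields=("name", "dob")):
    """Custodian: attach the linkage key, then drop identifiers and local ids."""
    released = []
    for rec in full_records:
        clean = {
            field: value
            for field, value in rec.items()
            if field not in identifying_fields and field != "record_id"
        }
        clean["linkage_key"] = key_by_record[rec["record_id"]]
        released.append(clean)
    return released


def link(left, right):
    """Researcher: join two de-identified datasets on the linkage key."""
    right_by_key = {rec["linkage_key"]: rec for rec in right}
    return [
        {**rec, **right_by_key[rec["linkage_key"]]}
        for rec in left
        if rec["linkage_key"] in right_by_key
    ]


# Hypothetical toy records held by two custodians (values are made up).
perinatal = [{"record_id": "P1", "name": "Child A", "dob": "2004-01-01",
              "birth_weight_g": 3200}]
education = [{"record_id": "E7", "name": "child a ", "dob": "2004-01-01",
              "attendance_rate": 0.85}]

# Only the identifying fields travel to the linkage unit.
identifiers = [{k: r[k] for k in ("record_id", "name", "dob")}
               for r in perinatal + education]
keys = assign_linkage_keys(identifiers)

linked = link(deidentify(perinatal, keys), deidentify(education, keys))
```

In this sketch the linkage unit never sees content fields such as birth weight, and the researcher never sees names or dates of birth; the random key is the only field shared across the released datasets.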
Only the data custodians have access to personal identifying information, and only de-identified linkage keys will be provided to the research team. The linkage process is carefully designed to ensure that no identified information (other than that used for the actual linkage) is supplied by the data custodians and that they receive no identifiable data from other sources. The de-identified datasets with their anonymous linkage keys are stored separately on a secure computer server at CDU. The nominated Menzies CCDE researchers working on the project (Messrs Silburn and McKenzie) have secure access to these de-identified linked datasets.

The data cleaning stage of the project has included cross-validation analysis to examine the internal consistency and accuracy of the merged datasets, an audit of data completeness, and analysis of possible determinants of missing data and how these might inform the treatment of missing data through standard multiple imputation methods.

With regard to the public reporting of findings from the analysis, our approach is consistent with the data cell size guidelines for use of AIHW data (http://www.aihw.gov.au/committees/simc/guidelines_statistical_purposes.doc). While there is no national standard for public reporting of small cell sizes, for the purposes of this project we will suppress any positive cell size less than 10, as well as adjacent cells, so that back calculation is technically not possible.

5. Study design and analysis

A range of statistical methods will be used in addressing the four research objectives and their associated research questions. Multinomial logistic regression analysis will be used to investigate whether a cumulative risk index can be developed and to examine its predictive validity using multivariable risk prediction methods, including ROC curves, discrimination, and net reclassification indices.
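
As a rough illustration of what discrimination means for a cumulative risk index, the Python sketch below scores each child by a simple count of risk factors present and computes the area under the ROC curve from pairwise comparisons. The count-based index is a deliberate simplification standing in for a regression-derived index, and the risk factor names and data are hypothetical.

```python
def cumulative_risk_index(child, risk_factors):
    """Count how many of the named risk factors are present for one child.

    Assumption: an unweighted count; a regression-based index would
    weight each factor by its estimated coefficient instead.
    """
    return sum(1 for factor in risk_factors if child.get(factor))


def auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen child with the outcome
    has a higher score than a randomly chosen child without it, with
    ties counted as one half.
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical risk factors and toy records (made-up values).
RISK_FACTORS = ["low_birth_weight", "maternal_age_under_20", "overcrowded_housing"]
children = [
    {"low_birth_weight": True, "maternal_age_under_20": True, "overcrowded_housing": True},
    {"low_birth_weight": False, "maternal_age_under_20": True, "overcrowded_housing": False},
    {"low_birth_weight": False, "maternal_age_under_20": False, "overcrowded_housing": False},
]
vulnerable = [1, 0, 0]  # 1 = outcome of interest observed (toy labels)

scores = [cumulative_risk_index(c, RISK_FACTORS) for c in children]
discrimination = auc(scores, vulnerable)
```

An AUC of 0.5 indicates that the index discriminates no better than chance, and 1.0 indicates perfect separation of children with and without the outcome.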
The levels, social patterning and predictive value of this cumulative risk index will then be examined in different SES, ethnic and geographic groups.

Structural equation modelling (SEM) using LISREL will be used to explore models of children’s developmental status (AEDI at age 5 years) and successful learning (NAPLAN year 3 Reading and Numeracy). SEM involves specifying models of predictor variables available within the various linked datasets covering the period from the first antenatal health care visit to age 8 years. These variables together represent ‘latent’ constructs such as ‘prenatal health’, ‘health at birth’, ‘maternal deprivation’, ‘community stress’, ‘family functioning’ and ‘developmental readiness for school learning’. SEM then identifies the strength of association between these ‘latent’ constructs and the key outcomes of interest (i.e. AEDI and NAPLAN). The final step of SEM is to assess how well each of the conceptual models matches the observed data (i.e. its ‘goodness of fit’).

A number of different SEM models will be explored for Indigenous and non-Indigenous children and for special populations such as Indigenous children residing in ‘Town Camps’. Our analytic team has access to biostatistical experts who are proficient in the use of these methods, which are now available in the STATA and SPSS statistical software packages. Multilevel modelling methods will also be used to tease out community/school effects from individual factors using the MLwiN statistical software package.

6. Study sample and data

The following NT population-based administrative datasets have been linked to date by the SA NT Datalink service:

- NT Dept of Health: perinatal health data, client master index (for data linkage purposes) and childhood immunisation data (for identifying study children who may have relocated from the NT);
- NT Dept of Education: school enrolment, attendance and NAPLAN (2008-2011); and
- DEEWR: AEDI data (2009).

The relevance of the selected data items linked from these datasets to support analysis of the population-level dynamics of early child development, from the first antenatal health care visit through to school completion, is summarised in the shaded boxes of Figure 1 below. Note: the un-shaded boxes in this figure indicate the additional NTG administrative datasets which the project will aim to link during the course of the study. The figure also shows the community-level data relevant to four different developmental epochs which the project aims to assemble and link for the longitudinal analysis being conducted over the next several years.

Figure 1. Datasets informing the NTG-Menzies CCDE research partnership analysis

[Figure: a timeline running from -9 months through birth and ages 5-6, 8-9, 10-11, 12-13 and 14-15, with community-level demographics and SEP environmental factors shown for four epochs (perinatal period, early childhood years, primary school years, middle school years). Datasets shown: Perinatal dataset (all NT-born children 1993-2005); Immunisation dataset (NT and non-NT born children since 1999; censoring variable); AEDI dataset (children enrolled in NT schools at age 5 years in 2009 and 2012); Education dataset (all enrolments, school attendance and NAPLAN data; NAPLAN assessments from 2008 onwards; school years 3, 5, 7 and 9); Child protection dataset (still to be linked); Child health and hospital dataset (still to be linked).]

7. Project outcomes

Year 1 project outcomes (June 2013 - June 2014)

The main outcomes in the first year of the project will be:

a) Appointment of a CCDE Senior Research Officer to support data-linkage utilisation from July 2013 to June 2016.
b) A draft data linkage protocol and communication strategy is prepared.
c) A draft technical report documenting the precision and completeness of the data linkage of the AEDI data with the birth, perinatal, school attendance and achievement, and community-level data is prepared.
d) An NHMRC Partnership Grant application to leverage additional Australian Government funding for a two-year program of data-linkage research, based on a study population comprising the 1993-2008 NT birth cohorts, is developed and approved for submission.

Year 2 project outcomes (June 2014 - June 2015)

The main outcomes in the second year of the project will be:

a) Departmental amendments to the data linkage protocol and communication strategy are incorporated into a final version ready for publication on the NTG and CCDE websites.
b) The finalised AEDI Data Quality and Linkage Report is approved.
c) A report detailing the findings of the data-linkage analysis of the perinatal, child health, AEDI, family and community-level determinants of school attendance and NAPLAN in the NT is delivered to the Steering Committee one month prior to its November meeting, for approval for publication as a joint NTG/Menzies publication and as separate journal articles in peer-reviewed scientific journals.
d) Ethics Approval is obtained for the incorporation of the child protection records, and a data dictionary and data quality and linkage report are drafted.
e) A technical report documenting the precision and completeness of the data linkage of the child protection records with the birth, perinatal, school attendance and achievement, and community-level data is drafted.
f) A report detailing the findings of a feasibility study of the integration of comparable data fields in NT and SA datasets linked through the SA NT Datalink facility is drafted.
g) Ethics Approval is obtained for the incorporation of the juvenile justice records, and a data dictionary and data quality and linkage report are drafted.
h) A technical report documenting the precision and completeness of the data linkage of the juvenile justice records with the birth, perinatal, school attendance and achievement, and community-level data is drafted.

Year 3 project outcomes (June 2015 - June 2016)

The main outcomes in the third year of the project will be:

a) A report is prepared detailing the findings of the data-linkage analysis of the early life determinants of children’s involvement with the NT child protection system, and approved for publication as separate journal articles in peer-reviewed scientific journals.
b) A report is prepared detailing the findings of the data-linkage analysis of the early life determinants of children’s involvement with the NT juvenile justice system, for approval for publication as a joint NTG/Menzies publication and as separate journal articles in peer-reviewed scientific journals.
c) The final report is delivered on the overall project outcomes, implications for policy and service planning, and recommendations for further development of data-linkage capacity in the NT.

8. Project timeline

The following timeline shows the expected project outputs, given no unexpected administrative or technical difficulties in the linkage of new datasets.

Timeline columns: Year 1 (2013-2014), Year 2 (2014-2015) and Year 3 (2015-2016), each divided into Oct, Nov, Feb and May.

Timeline rows (project outputs):

- A CCDE Senior Research Officer is allocated.
- Draft data linkage protocol and communication strategy.
- Technical report on the AEDI data linkage and quality.
- NHMRC Partnership Grant is developed and approved.
- Finalise data linkage protocol and communication strategy.
- Finalise AEDI Data Quality and Linkage Report.
- A report detailing the health, AEDI, family and community-level determinants of school outcomes.
- Ethics Approval, and draft data dictionary and data quality and linkage report for child protection records.
- A feasibility report on the integration of NT and SA datasets is drafted.
- Report on the early life determinants of children’s involvement with the NT child protection system.
- Ethics Approval, and draft data dictionary and data quality and linkage report for juvenile justice records.
- Report on the early life determinants of children’s involvement with the NT juvenile justice system.
- The final report is delivered on the overall project outcomes, implications for policy and service planning, and recommendations for further development of data-linkage capacity in the NT.

(X indicates the delivery of a report to the NTG-Menzies CCDE Steering Committee)