Longitudinal Workforce Analysis using Routinely Collected Data: Challenges and Possibilities
Shereen Hussein, BSc MSc PhD
King’s College London
29/5/2012

Longitudinal Analysis
General advantages
• Can control for individual heterogeneity
• Subjects serve as their own controls
• Between-subject variation excluded from error
• Can better assess causality than cross-sectional data
General challenges
• Conventional statistical methods require independence between observations
• Longitudinal data are likely to violate this assumption
• Missing data due to attrition
• Data availability

Workforce Data Example: NMDS-SC
• Structure
• Design
• Coverage
• Time span
• Type of information collected
• Data collection and archiving
• Size

NMDS-SC data structure
• Social care providers in England complete NMDS-SC returns
• Aggregate information on the workforce → Providers’ database
• Detailed information on all or some individual workers → Linkable workers’ database

NMDS-SC longitudinal analysis: potential
• Data coverage
• Wide range of providers and individual workers’ information
• Sector-specific: uniqueness
• Hierarchical structure
• Workforce development and business sustainability
• Timely
  – Demographics, austerity, unemployment
• Economics
  – Care costs, including turnover costs
  – Pay
• Linkable to local data characteristics

Challenges in NMDS-SC longitudinal analysis
• No sampling framework
• No regular intervals for data collection
• Irregularities in data completion by different providers
• Additions/alterations of variables and fields
• Cumulative nature and consequences on data size and structure
• Archiving

Challenges in NMDS-SC longitudinal analysis (continued)
Computational
• Data size
  – Innovation in system design and architecture
• Accumulative property
  – Scalability of the system
• Changes in data fields
• Variable additions and omissions
• Data over-ride and archiving
  – Software and hardware issues
Methodological
• Unusual patterns of follow-up
  – Censoring
• Variability in the database over time
• Unbalanced cohort design
• Missing data
  – Update frequency
  – Attrition
  – True exit
• Other methodological issues

Providers’ level longitudinal mapping
• From December 2007 to March 2011
• Linked 18 separate databases on the providers’ level
• Each has between 13,095 and 25,266 records
• 421,671 valid records included in the construction
• Number of updates ranged from 0 to 18 per provider
• Continuous process, more records added every 3 months

Meta-data analysis: providers with different number of events
[Figure: number of providers (N Provider, 0 to 12,000) by number of events (N Events, 2 to 18)]

Specific example 1: Providers with 18 updates
[Figure]

Specific example 2: Providers with 2 updates
[Figure]

Density distribution plot of providers with at least 2 updates during the period December 2007 to March 2011
[Figure: density by update date, 2008-May-01 to 2011-Jan-26]

Density distributions of number of days elapsed between two updated providers’ events
[Figure: density by N days between 2 update points per worker (0 to 1,000), in panels (1,180], (180,360], (360,540], (540,720], (720,900], (900,1080]]

Simple example using providers’ database: workforce stability over time
• Longitudinal changes in care workers’ turnover and vacancy rates over time
  – From January 2008 to January 2010
• Changes in reasons for leaving the sector, identified by employers
  – Differentiating between those with improved (reduced) turnover rates and those with worse (increased) turnover rates

Pre-analysis
• Selecting and constructing providers’ panel
  – Including those with at least two updates within +/- 3 months of T1 and T2
  – 2,953 providers with mean coverage duration of 602 days
• Investigate sample representation
• Data quality checks
• Data manipulation/imputation

Some findings: changes in turnover rates
[Figure]

Reason for leaving and turnover rate changes
[Figure]

Analysis expansion: next steps
• Consider changes over a longer period of time
• Examine other providers’ characteristics
• Different take on panel inclusion criteria
• Link to individual workers’ longitudinal databases to examine relations with detailed workforce structure
  – Pay, qualifications, profile etc.
• Build economic elements, e.g. specific turnover costs, within the longitudinal model

Workers’ level longitudinal analysis
• A much larger database
  – Same period of time: over 11M records
• Providers not required to complete information for ‘all’ workers
  – Structural/design missing data
  – True missing data
• Linkage issues
  – More data fields required for identification and linkage
• Considerably large number of variables and fields
  – Careful planning; analysis-tailored data retrieval
• Changes in database
  – Amendments, new variables etc.
  – Programming-intensive and demanding models (may not be replicable for different databases)

[Figure: N by data set index, in panels: Records available; Records cannot be used; Records with missing worker ID; Valid records; Records with no update date]

Issues to consider
• Suitability of models
  – Longitudinal structure
  – Competing risks
• Measurement window
  – Late entry into risk sets
    • Use proxies, other variables in the dataset
    • Adopt suitable approach/model
  – Censoring (LHS and RHS)
    • Assumptions guided by:
      – Sector-specific knowledge
      – Intelligence from other variables in the data

Current longitudinal research
Watch this space!!
• Workforce mobility within the sector
• Occupation durations
• Characteristic-specific probabilities of exiting or remaining in the sector
• Characteristic-specific probabilities of moving employer within the sector or having multiple jobs
• Career pathways within the sector

Acknowledgments
• Thanks to the Department of Health for funding this work
• Thanks to Skills for Care for providing the data on a regular basis
• Thanks to Analytical Research Ltd for their technical and quantitative support

Further information
• Shereen.hussein@kcl.ac.uk
• 02078481669
• See: http://www.kcl.ac.uk/sspp/departments/sshm/scwru/res/knowledge/nmdslong.aspx
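
Illustrative sketch: panel selection
The panel inclusion rule described earlier (providers with at least two updates within +/- 3 months of T1 and T2) could be sketched roughly as below. This is not the code used for the analysis. The exact dates for T1 and T2, the 91-day reading of "3 months", the choice of the update closest to each reference point, the function names, and the sample records are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumed reference points and tolerance: the slides give only
# "January 2008", "January 2010" and "+/- 3 months".
T1 = date(2008, 1, 1)
T2 = date(2010, 1, 1)
WINDOW = timedelta(days=91)


def closest_update(dates, target, window=WINDOW):
    """Return the update date nearest to target within the window, or None."""
    in_window = [d for d in dates if abs(d - target) <= window]
    return min(in_window, key=lambda d: abs(d - target)) if in_window else None


def build_panel(updates, t1=T1, t2=T2, window=WINDOW):
    """Keep providers that have an update near both t1 and t2.

    `updates` maps a provider id to a list of update dates.
    Returns {provider_id: (date_near_t1, date_near_t2, coverage_days)}.
    """
    panel = {}
    for provider_id, dates in updates.items():
        first = closest_update(dates, t1, window)
        second = closest_update(dates, t2, window)
        if first is not None and second is not None:
            panel[provider_id] = (first, second, (second - first).days)
    return panel


# Made-up records, for illustration only.
updates = {
    "A": [date(2008, 2, 10), date(2009, 6, 1), date(2009, 12, 15)],
    "B": [date(2008, 1, 20)],                   # no update near T2
    "C": [date(2008, 9, 1), date(2010, 1, 5)],  # no update near T1
}
panel = build_panel(updates)
mean_coverage = sum(v[2] for v in panel.values()) / len(panel)
```

Only provider "A" enters the panel here. Widening or narrowing `WINDOW` is one way to explore the "different take on panel inclusion criteria" listed under next steps: a wider window admits more providers at the cost of less comparable measurement points.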