Statistical challenges in hospital-acquired infection data: trying to get it right?

INFERENCE FOR EPIDEMIC-RELATED RISK (InFER2011 CONFERENCE)
Emma McBryde
Royal Melbourne Hospital & University of Melbourne & Burnet Institute, Australia
March 2011

RMH Intensive Care
Methicillin-resistant Staphylococcus aureus

Hospital-acquired infections (HAIs)
• High morbidity
• High mortality
• Greater duration of stay*
• Greater cost
• ….
Large burden

Engemann, J. J., Y. Carmeli, et al. (2003). "Adverse clinical and economic outcomes attributable to methicillin resistance among patients with Staphylococcus aureus surgical site infection." Clin Infect Dis 36(5): 592-598.

Challenges in statistical inference
• Serial dependence of transmission data
• Data have a complex relational structure
  – bidirectional causality
  – confounding
• Experimental options are limited
  – randomised controlled trial infeasible, unethical
  – may not answer any valuable question
• Interrupted time series
  – has some advantages
  – numerous ways they can lead to incorrect inference
• Partial observation

Basic science informs transmission models
• Hand-to-hand (contact) transmission is the commonest way that Staphylococcus aureus spreads
• In the ICU, most patient-to-patient transmission is from colonised to uncolonised patients via the hands of health care workers (HCWs)
• Environmental contamination certainly plays a role for some hospital pathogens
• It must be considered if it is an influential driver of transmission dynamics, particularly if the environment remains contaminated after the colonised patient has gone

Ross-Macdonald model
[Diagram label: HCW]

Serial dependence in data
• RCT: contaminated by the effect of treatment in neighbouring patients
• Cluster RCT: OK, but
  – feasibility: how many similar ICUs are there?
  – some effects can't be ethically compared in an RCT
    • hand hygiene, for example
  – inference can be limited (variance of events >> mean)
• Interrupted time series (ITS) is a more convenient alternative but has the potential to lead to false inference

ITS common mistakes
• Wait until there is an epidemic
• Institute numerous measures at once
• Disregard
  – important confounding effects
  – that observations are partial
  – dependency in the data

[Figure: number of colonised patients against time (days), 0 to 800 days]

Simulation SI model
[Figure: number of colonised patients against time (days), 0 to 1800 days]

Length of stay -> infection
• Estimate effect of hospital infection on length of stay
• Confounds effect of other covariates on infection

Many solutions
• Different approaches taken
  – Competing risk models
  – Instrumental variables
• Survival analysis with discharge day as the "failure event"
  – Infection and other known factors as covariates
  – If day of infection is known, can model this as a time-dependent covariate
• Assumes the effect of infection on the daily hazard of discharge (the hazard ratio) is constant over time
• Data imputation (risk model for day of acquisition) if time of infection is unknown

Some common mistakes
• Take LOS as a "time invariant" covariate or binary covariate
  "risk factors for the development of HA-MRSA on multivariate analysis included length of stay >7 days"
• Use a statistical model that allows LOS to confound other potential risk factors for HAI, such as antibiotics

[Diagram: uncolonised (Uncol) to infected (Infec), with and without antibiotics (A); labels 0.1, 0.1, 10 days, 50 days]

Results of simple univariate regression
Parameter: effect of antibiotics on infection risk
True value: odds ratio = 1
Estimated value: OR = 1.05 (^7)

Hazard is not constant
Marshall, C., D. Spelman, et al. (2009). "Daily hazard of acquisition of methicillin-resistant Staphylococcus aureus infection in the intensive care unit." Infect Control Hosp Epidemiol 30(2): 125-129.

Partial observation in hospital data
• Colonisation unseen
  – date of colonisation
  – presence of colonisation
• Missing data
  – infections not correctly diagnosed, for example
• When and from whom the transmission occurred
• With perfect data we could learn a lot about transmission
• Solutions?
  – Assume perfect data
    • Underestimate true effects, overestimate false effects

Missing data imputation
• Impute missing data given the current state of the model, using the partial likelihood values as the sampling distribution
• Calculating the likelihood is difficult on observed data
  – observed infection times
• Fully observed dataset is readily soluble using a model
  – actual infection times

RMH: Does detection and isolation work?

Study
• Planned ITS at Melbourne Hospital Intensive Care Unit
• Pre-intervention 15 months
  – standard care
  – add swabs for MRSA on admission, discharge and Mondays, Thursdays
  – no reporting of results back to treating team
  – no routine isolation for MRSA colonisation (unethical?)
• Post-intervention 15 months
  – swabs and rapid PCR, returned within hours, but only on swab days
  – report results of colonised patients within the day
  – patient isolation: add in contact precautions, put sign on patient room, aprons and gloves, single room (or cohorting)

Complexities
• Censored data: 3-4 days between swabs
• Covariates
  – Patient factors: age, treating unit, risk of death
  – Ward factors: infection control compliance
  – Colonised patients: new versus old, antibiotic exposure

Statistical model: hazard of colonisation
• Patient factors: antibiotic use, age, sex, admitting unit
• Colonised patients
  – total number, or all or nothing
  – new colonisation or known to be colonised
  – antibiotic exposure
• Ward factors
  – staff ratio
  – adherence to infection control precautions
• Treatment (phase of trial)
• Nuisance?
  – Baseline hazard changes due to time since admission

Results: segmented regression model
[Figure: incident colonisations (0 to 0.015) against time (months, 0 to 30)]

Segmented regression
• Negative binomial regression
• 4 parameters
• Using vague priors
• Estimated expected rate at end of intervention compared with change-point
• Estimated expected rate at end compared with extrapolated rate

Segmented regression model: posteriors for change in MRSA rates
[Figure: two posterior histograms of the change in MRSA incidence rate during phase 2. Left panel: inferred difference in expectation (axis -0.02 to 0.01). Right panel: actual difference in expectation (axis -10 to 2, x 10^-3)]

Back to the richer dataset

Making the most out of the data
• First attempts at incorporating full patient histories
• Just concentrating on colonisation pressure (the number of other patients on the ward who were colonised)
  – Reed-Frost
  – Greenwood
• Phase
• Interaction between phase and colonisation

Mathematical model
[Diagram: compartments S, I and S, I, Q]

Method of likelihood estimation
• Piecewise hazard
  – time interval of one day
• Hazard calculated based on patient factors, ward factors, phase of intervention and colonised patient factors
• Hazard of a transmission for patient p on day t given data, augmented data (exact transmission times of patients) and parameters

Day of acquisition was imputed using partial likelihood: the components of the likelihood that depend on the imputed value
[Diagram: time axis]

Method of data augmentation: MCMC
• Calculate likelihood and update augmented dataset, using a Gibbs step
• Recalculate likelihood and update parameters, using a Metropolis or Metropolis-Hastings step

Assumptions
• Fully sensitive test
• New colonisation in the first 48 hours is assumed to be pre-colonisation
• Individual infectiousness unchanged with time, except in presence of a time-dependent covariate (antibiotic use)
• All of these assumptions could be relaxed

Methods
Factors included
• Colonised patients
• Phase of study
• Phase x colonised
Effect of phase 2
• Alone
• On colonised
[Diagram: compartments S, I, Q with background]

Factor                          Univariate          Multivariate
I(Colonisation>0), Greenwood    1.56                4.1 (1.04-15.8)
Phase                           0.40 (0.25-0.64)    0.63 (0.32-1.2)
Phase*colonisation              1.02 (0.78-1.34)    0.41 (0.20-0.95)

Some more results
• Patient factors
  – Use of antibiotics with anaerobic activity
• Colonised patient factors
  – Antibiotic use is not estimated to be important

Conclusions
• The rapid detection and isolation intervention led to a reduced risk of MRSA colonisation, particularly in the presence of a colonised patient but also when no known colonised patient was in the ward
• Other risk factors include anaerobic antibiotics
• Further developments
  – Allow for fully undetected acquisitions
  – Look at the PFGE typing data
  – Look at physical space; bed numbers
  – Individual infectiousness unchanged with time

Acknowledgements
• Dr Caroline Marshall
• RN Leanne Redl
• NHMRC Grant #454495
• Fairfield photo collection

[Figure: incidence (0 to 0.015) against time (months, 0 to 30)]

The ORION statement: guidelines for transparent reporting of outbreak reports and intervention studies of nosocomial infection
"Unless outcomes are independent, statistical approaches able to account for dependencies in the outcome data should be used, adjusting, where necessary, for potential confounders."

Model
infcontin1 = unifrnd(10, 20, 1000, 1);               % v1
los = infcontin1 + randn(size(infcontin1));          % los depends on v1
infcontin2 = infcontin1 + randn(size(los)) + los;    % v2 depends on v1 and los
• Given v2, v1 has no effect on los
• v1 is functionally related to v2
• The effect of v1 on los will show the direct effect of v2 on los
[Diagram: v1, v2, los]

regress(infcontin2, infcontin1)           % 2.0032
regress(los, infcontin2)                  % 0.5008
regress(los, [infcontin1, infcontin2])    % -0.0370, 0.5192
regress(los, infcontin1)                  % 1.0032 = answer
regress(infcontin2, [los, infcontin1])    % 1.0108, 0.9872

[Figure: incidence per 100 bed-days (0 to 0.9) against month of phase 2 of study (0 to 14); annotation: full compliance with isolation]

Problem with this approach
• Type 2 errors
  – Lose information from the data
  – Loss of power to detect a true difference
  – Loss of important cofactors in the analysis
• Type 1 errors
  – Overstate an effect due to clustered data and high variance/mean

• $\log \lambda(t \mid X) = \log \lambda_0 + \beta_1 X(t) + \beta_2 Y(t) + \beta_3 W(t)$
• Likelihood of infection on day t using a piecewise hazard function
• $S(t) = \lambda_t \exp\{-\sum_{i=1}^{t-1} \lambda_i\}$

Model
[Diagram: compartmental model with the following states; transition rates depend on H_c/N_h, P_c0/N_p, P_c1/N_p and E]
  P_u0: uncolonised patients not receiving antibiotics
  P_u1: uncolonised patients receiving antibiotics
  P_c0: colonised patients not receiving antibiotics
  P_c1: colonised patients receiving antibiotics
  H_u: uncontaminated HCW
  H_c: contaminated HCW
  E: environmental contamination

Change in transmission parameter

Multivariate estimates by transmission model
Factor                Greenwood, I(Colonisation>0)    Reed-Frost, colonisation
Colonisation          4.1 (1.04-15.8)                 1.10 (0.72-1.69)
Phase                 0.63 (0.32-1.2)                 0.42 (0.23-0.75)
Phase*colonisation    0.41 (0.20-0.95)                1.0 (0.64-1.63)

Hidden Markov Model
Hidden states: number colonised
Observations: number detected
Two components to the HMM
  – horizontal component: transition model
  – vertical component: observation model
MacDonald and Zucchini, Hidden Markov Models for Discrete Valued Time Series

Ananda-Rajah, M. R., E. S. McBryde, et al. (2010). "The role of general quality improvement measures in decreasing the burden of endemic MRSA in a medical-surgical intensive care unit." Intensive Care Med 36(11): 1890-1898.
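
The Hidden Markov Model slide (hidden states: number colonised; observations: number detected) can be illustrated with a small forward-algorithm sketch in Python. Everything specific here is an assumption made for illustration and is not taken from the talk: the ward size `n_beds`, a chain-binomial transition model with a background acquisition rate `alpha`, a transmission rate `beta` and a daily clearance or discharge probability `mu`, a binomial observation model with swab sensitivity `sens`, and a uniform prior on the initial state.

```python
from math import comb, exp, log


def binom_pmf(n, k, p):
    """Binomial probability of k successes in n trials."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def transition_matrix(n_beds, alpha, beta, mu):
    """T[i][j] = P(j colonised tomorrow | i colonised today).

    Hypothetical transition model: each uncolonised patient acquires with
    probability 1 - exp(-(alpha + beta * i / n_beds)); each colonised
    patient is replaced by an uncolonised one with probability mu.
    """
    T = [[0.0] * (n_beds + 1) for _ in range(n_beds + 1)]
    for i in range(n_beds + 1):
        p_acq = 1.0 - exp(-(alpha + beta * i / n_beds))
        for gained in range(n_beds - i + 1):
            for lost in range(i + 1):
                T[i][i + gained - lost] += (
                    binom_pmf(n_beds - i, gained, p_acq) * binom_pmf(i, lost, mu)
                )
    return T


def forward_loglik(detected, n_beds, alpha, beta, mu, sens):
    """Scaled forward algorithm.

    Returns the log-likelihood of the detected counts and the filtered
    distribution of the number colonised on the last day.
    """
    T = transition_matrix(n_beds, alpha, beta, mu)
    n_states = n_beds + 1
    f = [1.0 / n_states] * n_states  # uniform initial distribution (assumption)
    loglik = 0.0
    for t, y in enumerate(detected):
        if t > 0:
            # horizontal component: transition model
            f = [sum(f[i] * T[i][j] for i in range(n_states)) for j in range(n_states)]
        # vertical component: observation model, detected ~ Binomial(k, sens)
        f = [f[k] * binom_pmf(k, y, sens) for k in range(n_states)]
        norm = sum(f)
        loglik += log(norm)
        f = [v / norm for v in f]
    return loglik, f


if __name__ == "__main__":
    # Made-up daily counts of detected colonised patients on a 10-bed ward.
    counts = [1, 1, 2, 2, 3, 2, 1]
    ll, filtered = forward_loglik(counts, n_beds=10, alpha=0.01, beta=0.2, mu=0.1, sens=0.8)
    print("log-likelihood:", ll)
    print("most probable number colonised on last day:", filtered.index(max(filtered)))
```

The returned log-likelihood could be passed to an optimiser or an MCMC sampler to estimate the transmission parameters. The scaling step inside the loop keeps the forward probabilities from underflowing on long series.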