Developing an adverse event prediction system: a neural network and Bayesian pilot study

Associate Professor Liza Heslop & Mahdi Bazargani
Acknowledgements: Dean Athan and Gitesh Raikundalia
May 14th 2013
vu.edu.au CRICOS Provider No:

Three year study with three stages
• Stage One (pilot study): develop a structured neural network based on first-day admission case mix indicators, to discover the most sensitive indicators that impact on adverse events (AEs) and to refine the neural network threshold values. Approach: neural networks and Bayesian.
• Stage Two: daily aggregate adverse events based on daily hospital workload indicators (DHWI). Approach: Bayesian.
• Stage Three: discover the relationship of common comorbidity indices with patients' different main CHADx adverse event categories. Approach: Bayesian.

Surgeons blame pressure from management for poor safety at Lincolnshire trust (BMJ 2013;346:f1094)
Fourteen hospital trusts are to be investigated for higher than expected mortality rates (BMJ 2013;346:f960)
“It has been estimated that across the 14 hospitals around 6000 more patients died than expected, with mortality rates 20% higher” (BMJ 2013;346:f960)

How has current research developed understandings of hospital-based workload intensity?
• Nurse workforce (measured as nurse overtime working hours) and nurse-sensitive patient outcome indicators are positively correlated (Liu et al. 2012)
• Nurse staffing (fewer RNs), increased workload, and unstable nursing unit environments were linked to negative patient outcomes, including falls and medication errors, on medical/surgical units in a mixed method study combining longitudinal data (5 years) and primary data collection (Duffield et al. 2012)
• Workload levels and sources of stressors can vary across different professional groups (Mazur et al. 2012)

Current measures/variables of workload intensity

Measure | Source
--- | ---
Patient transfers | Blay et al. 2012
Composition based on the Clinical Classification System, complications identified by patient safety indicators, and in-hospital mortality | Studnicki et al. 2011
Workload of inpatient doctors measured as the “difficulty of the tasks they perform while admitting patients” | Lamba et al. 2012
Nurse staffing levels, hours of nursing care per patient day (HPPD) | Numerous
Volume measures: total census (the midnight census); number of surgeries (the total number of scheduled and unscheduled surgeries performed that day); add-ons (the number of unscheduled surgeries); percentage add-ons (the number of add-ons as a percentage of the total surgeries performed); and behavioural health admissions | Pedroja 2008
Job satisfaction of doctors measured in survey as ‘workload’ | Khuwaja et al. 2004
Subjective measure of doctors’ workflow interruptions | Weigl et al. 2011
Nurses' perceived last-shift patient workload | Kalisch et al. 2011
Generic data, but with a need for additional data: time that nurses are off the unit (for code blue response, patient transfers and accompanying patients for tests), internal transfers/bed moves to accommodate patient-specific issues and particularly to address infection control issues, and deaths | Fram et al. 2012

Workforce intensity measures: no common standard
• A range of internal and external research instruments, such as audit, subjective responses to surveys, and administrative and clinical data records
• Very few sourced coded episode-based hospital administrative data (HAD)

It is necessary to accurately measure workload
• A factor that impacts upon the safety and quality of health care
• A useful measure to validate in the prediction model

Objectives of the first stage pilot study
• Develop a structured neural network based on first-day admission case mix indicators
• Discover the sensitivity of each input and controlling indicator toward occurrences of AEs
• Identify the most sensitive indicators that impact on AEs
• Establish neural network threshold values
• Compare two main machine learning algorithms: Neural Networks (NN) and the Naive Bayes Classifier (NBC)

Methodological objective: develop a complex computational relational model
Machine learning methodologies: Neural Network (NN) and Naive Bayes Classifier (NBC). Both contribute in different ways. Although the NN has a complex structure, it is suitable for establishing the relational model, which is based on dependent and inter-correlated indicators. NBC was employed with a pre-optimization algorithm and was trained with independent indicators. The accuracy of the two methodologies (NN and NBC) is compared using ‘confusion matrices’ and the rates of true positive and true negative AEs. Sensitivity analyses are reported for the NN model, which is finally established on all incorporated indicators.

Generalized Feed Forward Multilayer Perceptron Neural Network (input, hidden and output layers)
The hidden layer consists of four processing elements (PEs) that use the TanhAxon function as the transfer function. Weights are updated by back propagation using the momentum rule (momentum = 0.7, step size = 0.1) and batch learning.
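The configuration described above (four tanh hidden units, momentum 0.7, step size 0.1, batch updates) can be sketched in plain Python. This is a minimal illustration, not the study's implementation: the logistic output unit, the cross-entropy gradient, the weight initialisation range, the epoch count and the function names `train_mlp` and `predict` are all assumptions, and the direct input-to-output connections of a generalized feed forward network are omitted.

```python
import math
import random


def _forward(w1, w2, x):
    """Return the biased input, biased hidden activations and the output."""
    xb = list(x) + [1.0]
    hb = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in w1] + [1.0]
    z = sum(w * v for w, v in zip(w2, hb))
    return xb, hb, 1.0 / (1.0 + math.exp(-z))  # logistic output (assumption)


def train_mlp(X, y, hidden=4, momentum=0.7, step=0.1, epochs=2000, seed=0):
    """Batch-train a one-hidden-layer tanh network with the momentum rule."""
    rng = random.Random(seed)
    n_in = len(X[0])
    # Each weight row ends with a bias weight.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    v1 = [[0.0] * (n_in + 1) for _ in range(hidden)]
    v2 = [0.0] * (hidden + 1)
    n = len(X)
    for _ in range(epochs):
        # Batch learning: accumulate gradients over all cases, then update once.
        g1 = [[0.0] * (n_in + 1) for _ in range(hidden)]
        g2 = [0.0] * (hidden + 1)
        for x, t in zip(X, y):
            xb, hb, out = _forward(w1, w2, x)
            d_out = out - t  # cross-entropy gradient at the logistic output
            for j in range(hidden + 1):
                g2[j] += d_out * hb[j]
            for j in range(hidden):
                d_h = d_out * w2[j] * (1.0 - hb[j] ** 2)  # tanh derivative
                for i in range(n_in + 1):
                    g1[j][i] += d_h * xb[i]
        # Momentum rule: velocity = momentum * velocity - step * mean gradient.
        for j in range(hidden + 1):
            v2[j] = momentum * v2[j] - step * g2[j] / n
            w2[j] += v2[j]
        for j in range(hidden):
            for i in range(n_in + 1):
                v1[j][i] = momentum * v1[j][i] - step * g1[j][i] / n
                w1[j][i] += v1[j][i]
    return w1, w2


def predict(w1, w2, x):
    """Return the network output (a score between 0 and 1) for one case."""
    return _forward(w1, w2, x)[2]
```

Calling `train_mlp` on a feature matrix and 0/1 AE labels returns the two weight layers; `predict` then yields a score that can be compared with a classification threshold.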
Batch learning improves the speed of training. The neural network is employed to develop a prediction model based on dependent and intercorrelated input indicators.

There are many structures for a neural network and many methods for training them. This study employed a Generalized Feed Forward Multilayer Perceptron Neural Network with three layers: input, hidden and output. This simple structure is suitable for the current static coded episode dataset, in the absence of any time series objective for prediction of AEs. The input layer is composed of independent variables based on first-day-of-admission information: the DHVIs (Table 1); patient demographic information and patients’ diagnosis and episode characteristics, used as controlling input indicators (Table 2); and a numeric score derived from comorbidity indexes.

Conceptual design for building the relational model
Inputs: Daily Hospital Volume Indicators (DHVIs); patient demographic information; patient’s diagnosis and episode characteristics; comorbidity indices
Output: likelihood of patient adverse events (CHADx)

Identify daily hospital volume indicators
• Daily hospital volume indicators (DHVI): measures of work intensity
• DHVI: capability for extraction from a coded Australian episode data set

Table 1. Daily Hospital Volume Indicators (DHVI) employed as independent or input variables

No. | Volume Indicator Name | Description
--- | --- | ---
1 | Number of admissions | Daily number of admissions on the patient’s admitted day
2 | Number of discharges | Daily number of discharges on the patient’s admitted day
3 | Number of emergency admissions | Daily number of admissions where the admission type is Casualty (A & E)
4 | Percentage of emergency admissions | The percentage of all daily emergency admissions, calculated from all admissions
5 | Number of surgeries | Calculated from the DRG code type ‘surgical’ assigned
6 | Number of mid-point surgeries | Surgical type calculated from the mid-point of the admission and discharge dates
7 | Number of patients each day | The number of patients in the hospital each day
8 | Number of deaths | Extracted from discharge information within the episode dataset
9 | Number of Adverse Events | Extracted by applying CHADx business rules to all episodes of care; the time of a possible AE is assigned to the mid-point date of hospitalization

A defined set of controlling indicators
• Patient demographic information and patients’ diagnosis and episode characteristics
• These indicators are itemised in Table 2

Table 2. Patient demographic information and patients’ diagnosis and episode characteristics (used as controlling input indicators)

No. | Indicator Name | Category | Description
--- | --- | --- | ---
1 | Age | Patient demographic information | Age of patient extracted from the episode dataset
2 | Sex | Patient demographic information | Sex of patient extracted from the episode dataset
3 | Admission type | Patient demographic information | The type of admission, defined as: 1 Casualty (A&E); 2 Waiting List; 3 Qualified/Unqualified New-born; 4 Transfers (Other Acute Hospital, External Care, Rehabilitation); 5 Change from Psychiatric Unit or Psychogeriatric; 6 Other, including referrals from Local Medical Officer (LMO) etc. Corresponding LOS scores for each category were obtained from the coded episode dataset
4 | Primary procedure | Patient’s diagnosis and episode characteristics | Corresponding patient primary procedure LOS score extracted from the National Hospital Cost Data Collection (NHCDC, 2001)
5 | Secondary procedure | Patient’s diagnosis and episode characteristics | Corresponding patient secondary procedure LOS score extracted from the National Hospital Cost Data Collection (NHCDC, 2001)
6 | Primary diagnosis | Patient’s diagnosis and episode characteristics | Corresponding patient primary diagnosis LOS score extracted from the National Hospital Cost Data Collection (NHCDC, 2001)
7 | Secondary diagnosis | Patient’s diagnosis and episode characteristics | Corresponding patient secondary diagnosis LOS score extracted from the National Hospital Cost Data Collection (NHCDC, 2001)

Determine each patient’s corresponding length of stay (LOS) scores related to patient-specific characteristics
• Primary procedure
• Secondary procedure
• Primary diagnosis
• Secondary diagnosis
LOS scores from the NHCDC (2001) are assigned to each primary and secondary diagnosis and procedure for each patient in the coded episode data set.

Comorbidity classification indices used for obtaining comorbidity LOS scores
• Charlson Comorbidity Index (CCI) (Deyo et al. 1992)
• Elixhauser Index (Elixhauser et al. 1998)
• Disease count (Stineman et al. 1998)
• Shwartz Index (Shwartz et al. 1996)

Dealing with a coded data set without an ‘onset flag’
Identifying possible comorbidity and complication diseases in the absence of an onset flag
A step was necessary to identify and include the possible comorbidities from the coded episode dataset.
This study used a coded data set that did not have an onset flag. Onset flags were introduced in 2008, when hospital-acquired conditions (HAC) began to be flagged in the codes. A difficulty of this dataset was therefore the absence of an indicator (an onset flag) on each secondary diagnosis to identify it as either a comorbidity or a complication.

Identify an operation for AEs: output from the neural network
Each patient episode of care is identified as containing an AE if it satisfies the business rules of any CHADx major category. According to Utz et al. (2012): “The CHADx offers a comprehensive classification of hospital-acquired conditions available for use with ICD10-AM. The CHADx was developed as a tool for use within hospitals, allowing hospitals to monitor (assuming constant casemix) and reduce hospital-acquired illness and injury. Within Queensland in 2010/2011, 9.0% of all admissions included at least one hospital-acquired condition (as defined by the CHADx)”.

Results: building of the relational model (training and validation components)
The relational model ascertains the relationships between all intercorrelated (dependent) input and controlling indicators and the output variable, AEs. To some extent these variables are inter-correlated; for example, emergency admissions are correlated with the number of admissions.
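One way to quantify how strongly each input relates to the output of a trained model is a sensitivity-about-the-mean analysis. The sketch below assumes that this is what the sensitivity values in Table 3 report: each input is swept across its mean plus or minus a coefficient times its standard deviation while all other inputs are held at their means, and the sensitivity is the standard deviation of the model output over the sweep. The function name, the `steps` default and the use of population standard deviations are illustrative assumptions, not details taken from the study.

```python
import statistics


def sensitivity_about_mean(model, X, coefficient=1.0, steps=50):
    """Return one sensitivity value per input column of X.

    model: any callable that maps a list of input values to a numeric output.
    X: list of cases, each a list of input values.
    """
    columns = list(zip(*X))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    sensitivities = []
    for i, (mean_i, sd_i) in enumerate(zip(means, sds)):
        outputs = []
        for k in range(steps + 1):
            point = list(means)  # all other inputs stay at their means
            low = mean_i - coefficient * sd_i
            point[i] = low + 2.0 * coefficient * sd_i * k / steps
            outputs.append(model(point))
        # Spread of the output while only input i varies.
        sensitivities.append(statistics.pstdev(outputs))
    return sensitivities
```

Sorting the returned values in descending order gives an ordering of indicators of the kind shown in Table 3.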
Table 3. Ordered sensitivity of input indicators (standard deviation) with coefficient = 1 SD and step size = 1000

Indicator Name | Ordered Sensitivity Values (SD) | Indicator Type
--- | --- | ---
Secondary Diagnosis LOS | 0.09454 | Patient Diagnoses Types
Primary Procedure LOS | 0.06732 | Patient Diagnoses Types
Number of Adverse Events | 0.05526 | DHVI
Number of Emergency Admissions | 0.04406 | DHVI
Elixhauser Index | 0.04397 | Comorbidity Index
Sex | 0.04148 | Patient Demographic Characteristics
Charlson Index | 0.03726 | Comorbidity Index
Mid Point Number of Surgeries | 0.03074 | DHVI
Age | 0.02915 | Patient Demographic Characteristics
Number of Surgeries | 0.02790 | DHVI
Secondary Procedure LOS | 0.02713 | Patient Diagnoses Types
Primary Diagnosis | 0.02649 | Patient Diagnoses Types
Number of Discharges | 0.02557 | DHVI
Percentage of Emergency Admissions | 0.02382 | DHVI
Shwartz Index | 0.01897 | Comorbidity Index
Number of Admissions | 0.01866 | DHVI
Admission Source | 0.01784 | Patient Demographic Characteristics
Number of Patients Each Day | 0.01456 | DHVI
Number of Deaths | 0.01312 | DHVI
Disease Count Index | 0.00706 | Comorbidity Index

Discussion on Table 3
• Most of the DHVIs have small sensitivity toward the output
• Among the DHVIs, the ‘number of adverse events’ and ‘emergency admissions’ on the date of admission have the most sensitivity toward AE occurrences
• Among patient diagnoses indicators, all show strong sensitivity toward the likelihood of adverse events, with Secondary Diagnosis LOS having the most effect among all input indicators employed in this pilot study
• Among comorbidity indices, Elixhauser and Charlson show rather strong sensitivity values
• Sex and Age have the highest sensitivities among demographic characteristic indicators

Table 4. Accuracy of the patient AE classification system using the neural network and naïve Bayes with different threshold values

Classifier | Accuracy (%) | Sensitivity (%) | Specificity (%)
--- | --- | --- | ---
NN (Threshold=0.55) | 65.83 | 38.09 | 93.57
NN (Threshold=0.50) | 67.75 | 42.85 | 92.66
NN (Threshold=0.45) | 71.33 | 50 | 92.66
NN (Threshold=0.40) | 72.79 | 54.76 | 90.82
NN (Threshold=0.35) | 74.25 | 59.52 | 88.99
NN (Threshold=0.30) | 75.71 | 64.28 | 87.15
NN (Threshold=0.25) | 73.23 | 66.66 | 79.81
NN (Threshold=0.20) | 74.60 | 78.57 | 70.64
NN (Threshold=0.15) | 70 | 85.71 | 52.29
Enhanced NBC | 64.1 | 33.6 | 94.6

Discussion on Table 4
The neural network achieves higher overall accuracy than the optimized NBC at every threshold tested. Because the goal of this prediction is a high true positive rate for AEs (sensitivity), the thresholds 0.15 (sensitivity 85%) and 0.20 (sensitivity 78%) were selected; of the two, 0.20 achieved the higher overall accuracy (74% versus 70%). The choice of threshold could also depend on the problem specification and the application of the prediction model. The NBC, by contrast, had lower overall accuracy (64%) and a low sensitivity (33%).

Summary of key findings
• A neural network and an NBC trained on the smallest set of indicators that achieves the highest accuracy
• An ordering of the sensitivity values
• Number of adverse events and number of emergency admissions on the date of admission showed the most sensitivity among the DHVIs
• Elixhauser and Charlson indices showed the most sensitivity among the comorbidity indices
• Sex and Age showed the most sensitivity toward occurrences of an AE among the patient characteristics
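The threshold trade-off in Table 4 can be illustrated by computing sensitivity and specificity from model scores at a chosen cut-off. The accuracy values reported in Table 4 are numerically close to the mean of sensitivity and specificity (for example, (78.57 + 70.64) / 2 ≈ 74.6 at threshold 0.20), but the slides do not state the formula, so treating accuracy as that mean is an assumption here. The function name and the convention that a score at or above the threshold counts as a predicted AE are also assumptions.

```python
def classification_rates(scores, labels, threshold):
    """Return (sensitivity, specificity, mean of the two), all in percent.

    scores: model outputs between 0 and 1.
    labels: 1 for an episode with an AE, 0 otherwise.
    A score at or above the threshold is treated as a predicted AE.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_ae = score >= threshold
        if label == 1 and predicted_ae:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_ae:
            fp += 1
        else:
            tn += 1
    sensitivity = 100.0 * tp / (tp + fn)  # true positive rate
    specificity = 100.0 * tn / (tn + fp)  # true negative rate
    return sensitivity, specificity, (sensitivity + specificity) / 2.0
```

Lowering the threshold can only keep or raise sensitivity and keep or lower specificity, which is the pattern seen down the rows of Table 4.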
Overall, the results show the neural network outperforming the Naive Bayes Classifier, with an overall accuracy of 74% (threshold = 0.20) versus 64%.

Lessons for the three stage study
Indicator sensitivities depend strongly on the current state of the trained neural network and may differ if the network is trained with a different structure or if new indicators are employed.

Outcomes
• A simply structured relational model and neural network that can generate complex computational calculations based on several weights for each node, as well as several input and hidden nodes: a first step toward a relational model to predict AEs
• Various training iterations were conducted to obtain the highest accuracy on the validation dataset. This avoided overtraining and over-fitting of the network on which the sensitivity analyses are based
• Sensitivity values for the independent indicators have been obtained

Study limitations
• Use of a coded episode data set without an onset flag
• Inclusion of complicated steps to distinguish complications arising after admission
• Results are not conclusive without further machine computational processing

Implications of this pilot study for the next stages of this research
• The procedures to overcome the lack of an onset flag have been complex. The accuracy of identifying hospital-acquired conditions in the overall relational model will be improved in the main study.
• The sensitivity results will help with refinements to this pilot study when a larger data set (including an onset flag) is used
• The DHVIs on the date of admission may be eliminated, as they do not show sufficient strength for AE prediction. Comorbidities and demographic characteristics, along with diagnosis types, are involved: 1. age; 2. sex; 3. primary procedure; 4. secondary diagnosis; 5. Elixhauser index
• Workload indicators did not seem to be involved in the highest accuracy of this relational model, but this finding will require validation in the refinements to the pilot study
• This may not support the many research findings which suggest that workload indicators are heavily associated with adverse events

Next stage of research (continued)
• Develop a case mix of input indicators (CMI) from all employed indicators, to reach the highest possible classification accuracy for the machine learning algorithm employed
• This CMI will hold the smallest number of indicators that achieves the highest classification accuracy
• Firmly establish which indicators to eliminate because their inclusion will not improve the overall accuracy of the model

Direction of change as a result of the pilot study
• Further testing of machines other than the neural network and Bayesian network
• Consider an ensemble of REPTree learners, which may yield further accuracy
• There are different machines (e.g. Bayes, Neural Networks, Decision Trees, Logistic Regression) that can be combined with different optimization algorithms (Greedy Search, Genetic Algorithm, Ensembles). The next stage will be to obtain the set of episode data indicators that results in the highest possible accuracy for each machine and for each corresponding optimization algorithm
• Correlation (tipping point or non-linear relationships) may be examined in stage three based on the average rate of DHWIs during all days of the patient's hospitalization, rather than on the first day of admission only.
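The difference between a first-day indicator and an indicator averaged over the whole stay can be sketched as follows. This is only an illustration using the daily number of admissions; the episode field names `admitted` and `discharged`, the function names, and the choice to count both the admission and the discharge day as days of stay are assumptions, not details from the study.

```python
from collections import Counter
from datetime import date, timedelta


def daily_admissions(episodes):
    """Count admissions per calendar day across all episodes."""
    return Counter(ep["admitted"] for ep in episodes)


def first_day_indicator(episode, daily_counts):
    """Value of the daily indicator on the patient's admission day."""
    return daily_counts.get(episode["admitted"], 0)


def mean_indicator_over_stay(episode, daily_counts):
    """Average the daily indicator over every day from admission to discharge."""
    days = (episode["discharged"] - episode["admitted"]).days + 1
    total = sum(
        daily_counts.get(episode["admitted"] + timedelta(days=offset), 0)
        for offset in range(days)
    )
    return total / days


# Hypothetical episodes, for illustration only.
example_episodes = [
    {"admitted": date(2013, 5, 1), "discharged": date(2013, 5, 3)},
    {"admitted": date(2013, 5, 1), "discharged": date(2013, 5, 1)},
    {"admitted": date(2013, 5, 3), "discharged": date(2013, 5, 4)},
]
example_counts = daily_admissions(example_episodes)
```

For the first hypothetical episode, the first-day value is the admission count on 1 May, while the stay average spreads over 1 to 3 May, so the two inputs can differ noticeably for longer stays.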
The correlation structure learned by a neural network is very complex and is suitable only for classification and prediction results; hence Bayes is recommended for this study instead.

Development of a composite measure of hospital workload intensity
A composite measure of hospital workload intensity may be valuable to policy and health service officials at many levels. A valid and reliable composite measure of workload intensity will:
• Help clinicians define suitable workload standards for hospital organisations
• Help hospital organisational officials to monitor their hospitals’ workload intensity and possibly even capacity
• Support health services researchers to standardize measures of workload intensity for benchmarking
• Help examine relationships between practice environment features (for example, as rated on measures of job satisfaction, turnover intentions and assessments of quality of care) and workload intensity in a systematic and standardized way

Development of a composite measure of hospital workload intensity (cont’d)
• Make better use of coded activity-based data to improve the effectiveness of operational decision-making. For example, Pedroja (2008:36), who used composite indexes to measure hospital workload intensity, suggested: “Through the identification of a set of indicators that predict stresses on the system, leaders would have the ability to provide additional resources or system fixes that would make the operation less vulnerable to health care error and patient harm”
• Support national studies that aim to develop a systemic picture of workload intensity. Most current studies on workload intensity are small-scale or localised and use a range of proxy measures for the effort needed for inpatient medical and nursing work

References
Australian Commission on Safety and Quality in Health Care. Classification of Hospital Acquired Diagnoses (CHADx), 2011.
Thomas JW, Guire KE, Horvat GG. Is patient length of stay related to quality of care? Hospital & Health Services Administration. 1997;42(4):489-507.

CONTACT DETAILS
NAME: Liza Heslop
DEPARTMENT: Western Centre for Health Research and Education, Sunshine Hospital
PHONE: +0407886201
EMAIL: liza.heslop@vu.edu.au
www.vu.edu.au